Registering in Batches

Adding pre-existing data to a container.

Maniac lets you upload existing completions directly into a container. These might be inference logs from a different inference provider, or a labeled dataset.

Boilerplate

from maniac import Maniac

# Assumes MANIAC_API_KEY is set in your environment
maniac = Maniac()

# First, create a container if you don't already have one
container = maniac.containers.create(
    label="my-container",
    initial_model="some-base-model", # If you want to track the model that originated these completions
    initial_system_prompt="Your system prompt here",
)

# Second, create the messages object that will populate your container
dataset = [
    {
        "input": {
            "messages": [
                {"role": "system", "content": "..."},
                {"role": "user", "content": "..."},
            ]
        },
        "output": {
            "choices": [
                {"message": {"role": "assistant", "content": "..."}}
            ]
        },
    }
]

# Register completions. These will now show up on the Maniac dashboard in your container.
maniac.chat.completions.register(
    model="maniac:my-container",
    dataset=dataset,
)

Each dataset entry consists of:

input : the messages sent to the model (system prompt & user prompt)
output : the assistant response
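
As a minimal sketch of that shape, the helper below builds one entry from plain strings and checks that an entry has the expected keys. `make_entry` and `is_valid_entry` are hypothetical names used for illustration, not part of the Maniac SDK, and the check reflects only the structure shown above.

```python
from typing import Any, Dict


def make_entry(system_prompt: str, user_prompt: str, response: str) -> Dict[str, Any]:
    """Build one dataset entry in the input/output shape shown above."""
    return {
        "input": {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ]
        },
        "output": {
            "choices": [
                {"message": {"role": "assistant", "content": response}}
            ]
        },
    }


def is_valid_entry(entry: Dict[str, Any]) -> bool:
    """Return True if the entry has non-empty messages and an assistant reply."""
    try:
        messages = entry["input"]["messages"]
        reply = entry["output"]["choices"][0]["message"]
    except (KeyError, IndexError, TypeError):
        return False
    return (
        isinstance(messages, list)
        and bool(messages)
        and isinstance(reply, dict)
        and reply.get("role") == "assistant"
        and "content" in reply
    )


entry = make_entry("You are a classifier.", "Classify this clause.", "Governing Laws")
print(is_valid_entry(entry))
```

Running a check like this before registering can catch malformed entries locally instead of after a partial upload.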

Example: Uploading a HuggingFace Dataset

Let's walk through an example using the LEDGAR dataset (Tuggener et al., 2020), which is well suited to training and testing legal clause classification models.

Prerequisites

export MANIAC_API_KEY=...
pip install maniac datasets

Load the HuggingFace Dataset

from datasets import load_dataset

DATASET = "coastalchp/ledgar"

# Load dataset
dataset = load_dataset(DATASET)
train_split = dataset["train"]

# Extract inputs and labels
clauses = train_split["text"]      # clause text, one string per example
label_ids = train_split["label"]   # integer class id for each clause
label_names = train_split.features["label"].names
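
The `label` column holds integer ids, and `label_names` is the list that maps each id to a readable name. The sketch below uses made-up toy values (not actual LEDGAR rows) to show the lookup that the upload loop relies on later, plus a range check you may want to run first.

```python
# Toy stand-ins for the three variables above; values are illustrative only.
clauses = [
    "This Agreement shall be governed by ...",
    "Any notice shall be in writing ...",
]
label_ids = [1, 0]
label_names = ["Notices", "Governing Laws"]

# Guard against ids that fall outside the names list before uploading.
assert all(0 <= i < len(label_names) for i in label_ids)

# Each integer id indexes into label_names to recover the readable label.
labels = [label_names[i] for i in label_ids]

for clause, label in zip(clauses, labels):
    print(f"{label}: {clause[:40]}")
```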

Define the System Prompt

system_prompt = (
    "You are a legal clause classifier.\n"
    "Given a clause, return exactly one label from this list:\n"
    + "\n".join(f"- {label}" for label in label_names)
    + "\nRespond with only the label name."
)
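
To see what the assembled prompt looks like, here is the same construction wrapped in a function and run on a two-label toy list. `build_system_prompt` is a hypothetical helper name, and the labels are placeholders; the real list comes from the dataset features.

```python
def build_system_prompt(label_names):
    """Assemble the classifier prompt from a list of label names."""
    return (
        "You are a legal clause classifier.\n"
        "Given a clause, return exactly one label from this list:\n"
        + "\n".join(f"- {label}" for label in label_names)
        + "\nRespond with only the label name."
    )


# Toy labels for illustration only.
prompt = build_system_prompt(["Notices", "Governing Laws"])
print(prompt)
```

Each label lands on its own bulleted line, so the prompt grows by one line per class.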

Create a container

You can also skip this step and upload a dataset to an existing container, where it will be combined with any existing inference logs.

from maniac import Maniac

maniac = Maniac()

container = maniac.containers.create(
    label="LEDGAR-register",
    default_system_prompt=system_prompt,
)

Upload Data in Batches

For large datasets, upload in batches to avoid request timeouts.

BATCH_SIZE = 1500
START = 0
MAX_SAMPLES = 60000

end = min(len(clauses), START + MAX_SAMPLES)

for batch_start in range(START, end, BATCH_SIZE):
    batch_end = min(batch_start + BATCH_SIZE, end)
    dataset = []

    for i in range(batch_start, batch_end):
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": clauses[i]},
        ]

        dataset.append({
            "input": {"messages": messages},
            "output": {
                "choices": [
                    {
                        "message": {
                            "role": "assistant",
                            "content": label_names[label_ids[i]],
                        }
                    }
                ]
            },
        })

    maniac.chat.completions.register(
        container=container,
        dataset=dataset,
    )

    print(f"Uploaded samples {batch_start}–{batch_end - 1}")
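
The batching logic above can also be factored into a reusable generator, optionally paired with a retry wrapper for transient failures. This is a sketch under assumptions: `batched` and `upload_with_retry` are hypothetical helpers, the uploader is a stand-in callable rather than a real SDK call, and catching every `Exception` is a simplification you would likely narrow to the errors your client actually raises.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def upload_with_retry(
    upload: Callable[[List[T]], None],
    batch: List[T],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> None:
    """Call upload(batch), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            upload(batch)
            return
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the last error
            time.sleep(base_delay * 2 ** attempt)


# Stand-in uploader that only records batch sizes; swap in a function that
# registers the batch with your container.
uploaded_sizes = []
for batch in batched(list(range(10)), 4):
    upload_with_retry(lambda b: uploaded_sizes.append(len(b)), batch)

print(uploaded_sizes)  # [4, 4, 2]
```

One caveat on the design: if a request times out after the server has already accepted it, retrying may register the same entries twice. Whether the service deduplicates is not stated here, so treat retries with care.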

Note: When you generate completions inside a container, the container's system prompt is applied automatically. Registering (logging) existing completions is different: the system prompt must be included explicitly in each messages object, because registered completions do not inherit the container-level system prompt.
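
If your existing logs were captured without a system message, one way to satisfy this requirement is to prepend the prompt before registering. The helper below is a minimal sketch; `with_system_prompt` is a hypothetical name, and it assumes the system message, when present, is always the first message.

```python
from typing import Dict, List

Message = Dict[str, str]


def with_system_prompt(messages: List[Message], system_prompt: str) -> List[Message]:
    """Return a copy of messages that starts with a system message."""
    if messages and messages[0].get("role") == "system":
        return list(messages)  # already has one; leave it unchanged
    return [{"role": "system", "content": system_prompt}] + list(messages)


logged = [{"role": "user", "content": "Classify this clause."}]
fixed = with_system_prompt(logged, "You are a legal clause classifier.")
print([m["role"] for m in fixed])  # ['system', 'user']
```

The function returns a new list rather than mutating its input, so the original log records stay untouched.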


