REST API

The Maniac API provides OpenAI-compatible inference endpoints for interacting with both frontier models and your custom models. This reference details the available endpoints.

List evaluation runs

get

List evaluation runs for the authenticated project. Optionally filter by container and status.

Authorizations
Authorization · string · Required

API key passed in the Authorization header using the Bearer scheme.

Query parameters
container · string | null · Optional

Container ID or label to filter by.

status · string | null · Optional

Filter by run status (e.g. 'running', 'completed', 'error').

limit · integer · min: 1 · max: 100 · Optional · Default: 20
offset · integer · Optional · Default: 0
Responses
200

Successful Response

application/json
object · const: list · Optional

Object type identifier. Default: list

total · integer · Required

Total number of items available for this resource.

get
/v1/evaluation/runs

Create an evaluation run

post

Launch an evaluation run. Validates access to the specified container, evaluators, data sources, and models, then dispatches the run through the backend gateway interface.

Authorizations
Authorization · string · Required

API key passed in the Authorization header using the Bearer scheme.

Body

Request body for creating an evaluation run.

Each side of the evaluation (sample and ground_truth) is described by a single data-source object whose type discriminator determines how data is obtained:

  • "dataset" — pull from a dataset.
  • "container" — pull from the container's task logs.
  • "generate" — generate completions using one or more models.

Both fields are optional, but at least one must be provided. When a side is omitted it defaults to the top-level container's task logs. At least one resolved side must not be type='generate' so there is seed input to evaluate against.

container · string · Required

Container id or label.

evaluators · string[] · min: 1 · Required

Evaluator ids or labels.

sample · data-source object | null · Optional

Sample-side data source. Omit to default to the container's task logs. Use type='dataset' to pull from a dataset, type='container' to pull from task logs, or type='generate' to generate completions with the specified models.

ground_truth · data-source object | null · Optional

Ground-truth-side data source. Omit to default to the container's task logs. Use type='dataset' to pull from a dataset, type='container' to pull from task logs, or type='generate' to generate completions with a model.

baseline · data-source object | null · Optional

Baseline-side data source for pairwise evaluation. Use type='dataset' to pull from a dataset, type='container' to pull from task logs, or type='generate' to generate completions with a model.

metadata · object | null · Optional

Optional metadata.

environment · string | null · Optional · Default: main

Execution environment name (maps to Modal app suffix).
Responses
post
/v1/evaluation/runs

Get an evaluation run

get

Retrieve a single evaluation run by ID within the authenticated project.

Authorizations
Authorization · string · Required

API key passed in the Authorization header using the Bearer scheme.

Path parameters
run_id · string · Required

Responses
200

Successful Response

application/json

Response model for an evaluation run.

created_at · string · Required
finished_at · string | null · Optional
error_at · string | null · Optional
status · string · Required
error · any | null · Optional
object · const: evaluation.run · Optional

Object type. Default: evaluation.run

id · string · Required

Evaluation run id (run group id).

process_id · string | null · Optional

Process id for lifecycle tracking.

evaluators · string[] | null · Optional

Evaluator ids used in this run.

container · string | null · Optional

Container id.

dataset_id · string | null · Optional

Dataset id (if a dataset was used).

sample · data-source object | null · Optional

Resolved sample-side data source.

ground_truth · data-source object | null · Optional

Resolved ground-truth-side data source.

baseline · data-source object | null · Optional

Resolved baseline-side data source for pairwise evaluation.

results · object | null · Optional

Evaluation results (populated on completion).

metrics · object | null · Optional

Evaluation metrics (populated on completion).

config · object | null · Optional

Run configuration as submitted.

spend · number | null · Optional

Estimated spend.

metadata · object | null · Optional

Optional metadata.
get
/v1/evaluation/runs/{run_id}
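A response from this endpoint can be inspected with a small helper, for example when polling until a run finishes. These helpers are hypothetical sketches that use only fields documented above; the terminal statuses assume the 'running'/'completed'/'error' values listed for the status filter.

```python
# Assumed terminal values, from the documented status filter examples.
TERMINAL_STATUSES = {"completed", "error"}

def is_finished(run: dict) -> bool:
    """True once a run has reached a terminal status."""
    return run["status"] in TERMINAL_STATUSES

def summarize_run(run: dict) -> str:
    """One-line summary of a GET /v1/evaluation/runs/{run_id} response body."""
    line = f"{run['id']}: {run['status']}"
    if run.get("spend") is not None:
        line += f" (spend: {run['spend']})"
    if run.get("error"):
        line += f" error: {run['error']}"
    return line
```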

Healthz

get

Health check endpoint for load balancers and uptime monitors.

Responses
200

Successful Response

application/json

Health check response.

ok · boolean · Required
get
/healthz