Evaluation
API key in the Authorization header, using the Bearer scheme.
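As a minimal sketch of the authentication scheme described above, the helper below builds the request header from an API key. The function name `build_auth_headers` is hypothetical and used only for illustration; the only detail taken from this page is that the key travels in the Authorization header as a Bearer token.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key as a Bearer token.

    Hypothetical helper: not part of any official client library.
    """
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    # The Authorization header value is the word "Bearer", a space, then the key.
    return {"Authorization": f"Bearer {api_key.strip()}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.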
200 Successful Response
Object type identifier. Example: list
Total number of items available for this resource.
Bad Request
Unauthorized
Forbidden
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Container label.
Evaluator type.
Judge model slug.
Judge prompt.
Code evaluator source.
Optional requirements list.
Optional pass threshold.
Evaluation mode. Example: pointwise
Optional metadata.
Evaluator name.
Evaluator description.
API type.
Optional inference parameters for judge evaluators (e.g. response_format, temperature).
Successful Response
Bad Request
Unauthorized
Forbidden
Not Found
Conflict
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Successful Response
Bad Request
Unauthorized
Forbidden
Not Found
Conflict
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Patchable fields for an evaluator.
Container label.
Judge model slug.
Judge prompt.
Code evaluator source.
Optional requirements list.
Optional pass threshold.
Evaluation mode.
Optional metadata.
Evaluator name.
Evaluator description.
API type.
Optional inference parameters for judge evaluators (e.g. response_format, temperature).
Successful Response
Bad Request
Unauthorized
Forbidden
Not Found
Conflict
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Successful Response
Object type. Example: evaluator
Evaluator id.
Deletion status. Example: true
Bad Request
Unauthorized
Forbidden
Not Found
Conflict
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Container ID or label to filter by.
Filter by run status (e.g. 'running', 'completed', 'error').
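The two filters above could be combined into a query string along these lines. This is a sketch only: the parameter names `container` and `status` are assumptions, since this page gives the filter descriptions but not the parameter names, and `build_run_filter_query` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def build_run_filter_query(container: Optional[str] = None,
                           status: Optional[str] = None) -> str:
    """Build a query string for listing evaluation runs.

    The keys "container" and "status" are assumed names, not confirmed
    by the reference page.
    """
    params = {}
    if container is not None:
        params["container"] = container  # container ID or label
    if status is not None:
        params["status"] = status  # e.g. "running", "completed", "error"
    # urlencode percent-escapes values and joins the pairs with "&".
    return urlencode(params)
```

Omitting both arguments yields an empty string, which corresponds to an unfiltered listing.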
200 Successful Response
Object type identifier. Example: list
Total number of items available for this resource.
Bad Request
Unauthorized
Forbidden
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Request body for creating an evaluation run.
Each side of the evaluation (sample and ground_truth) is described
by a single data-source object whose type discriminator determines how
data is obtained:
"dataset": pull from a dataset.
"container": pull from the container's task logs.
"generate": generate completions using one or more models.
Both fields are optional, but at least one must be provided. When a
side is omitted it defaults to the top-level container's task logs. At
least one resolved side must not be type='generate' so there is seed
input to evaluate against.
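The rules above can be sketched as a small client-side check. This is an illustration under stated assumptions, not the service's actual validation: `resolve_sides` is a hypothetical helper, and the default for an omitted side is modelled as a bare `{"type": "container"}` object, whereas real data-source objects likely carry additional fields not listed here.

```python
from typing import Optional, Tuple

VALID_TYPES = {"dataset", "container", "generate"}


def resolve_sides(sample: Optional[dict] = None,
                  ground_truth: Optional[dict] = None) -> Tuple[dict, dict]:
    """Apply the defaulting and validation rules described above.

    Hypothetical helper; returns the resolved (sample, ground_truth) pair.
    """
    if sample is None and ground_truth is None:
        raise ValueError("at least one of sample or ground_truth must be provided")

    resolved = []
    for name, side in (("sample", sample), ("ground_truth", ground_truth)):
        # An omitted side defaults to the container's task logs (assumed shape).
        side = dict(side) if side is not None else {"type": "container"}
        if side.get("type") not in VALID_TYPES:
            raise ValueError(f"{name}.type must be one of {sorted(VALID_TYPES)}")
        resolved.append(side)

    # At least one resolved side must supply seed input, so both cannot generate.
    if all(side["type"] == "generate" for side in resolved):
        raise ValueError("at least one resolved side must not be type='generate'")
    return resolved[0], resolved[1]
```

For example, passing only `sample={"type": "generate"}` resolves the ground-truth side to the container's task logs, while passing type='generate' on both sides is rejected.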
Container id or label.
Evaluator ids or labels.
Sample-side data source. Omit to default to the container's task logs. Use type='dataset' to pull from a dataset, type='container' to pull from task logs, or type='generate' to generate completions with the specified models.
Ground-truth-side data source. Omit to default to the container's task logs. Use type='dataset' to pull from a dataset, type='container' to pull from task logs, or type='generate' to generate completions with a model.
Optional metadata.
Execution environment name (maps to Modal app suffix). Example: main
Successful Response
Bad Request
Unauthorized
Forbidden
Not Found
Conflict
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable
API key in the Authorization header, using the Bearer scheme.
Successful Response
Response model for an evaluation run.
Object type. Example: evaluation.run
Evaluation run id (run group id).
Process id for lifecycle tracking.
Evaluator ids used in this run.
Container id.
Dataset id (if a dataset was used).
Resolved sample-side data source.
Resolved ground-truth-side data source.
Evaluation results (populated on completion).
Evaluation metrics (populated on completion).
Run configuration as submitted.
Estimated spend.
Optional metadata.
Bad Request
Unauthorized
Forbidden
Not Found
Conflict
Validation Error
Too Many Requests
Internal Server Error
Not Implemented
Upstream Unavailable