
Experiment (Zero-shot Deployment)

Experiment is the fastest way to try the Emissary platform. In under a minute, you can spin up a customized model and call it from your application through a ready-made API endpoint — no training data, no tuning, no infrastructure setup required.

Every experiment starts as a zero-shot model built from the class definitions you provide. You can begin sending traffic immediately, then iterate by uploading data, labeling, and re-training as your use case matures.

Experiment currently supports six modes:

Mode          What it does
judge         Evaluate outputs against a single criterion (LLM-as-Judge)
decision      Classify inputs into one of several mutually exclusive categories
routing       Route requests to the right downstream model, agent, or workflow
tool_calling  Pick the right tool and extract its arguments from a user query
regression    Score inputs on a continuous numeric scale
ner           Extract named entities (people, places, custom types) from text

Common Workflow

Each mode follows the same two-step pattern:

  1. Create the experiment by POSTing to /v1/experiments with a name, a mode, and a list of classes. The response contains an id and a latest_version.
  2. Call the model at the mode's inference endpoint, passing EXPERIMENT_ID/VERSION as the model parameter.

All requests require your API key in the X-API-Key header.
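
As a rough sketch of the two-step pattern, the helpers below build the request headers and the EXPERIMENT_ID/VERSION string from a create response. The function names are illustrative and not part of the Emissary API; only the X-API-Key header and the id and latest_version fields come from the description above.

```python
def auth_headers(api_key: str) -> dict:
    """Headers sent with every request; X-API-Key carries the API key."""
    return {"Content-Type": "application/json", "X-API-Key": api_key}


def model_ref(create_response: dict) -> str:
    """Join id and latest_version from step 1 into EXPERIMENT_ID/VERSION."""
    return f"{create_response['id']}/{create_response['latest_version']}"


# Example with a create response shaped like the ones in this guide.
created = {"id": "ex-ahejacuehandheha", "latest_version": "0.0.0"}
print(model_ref(created))  # ex-ahejacuehandheha/0.0.0
```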


judge — LLM-as-Judge

Score a model's output against a single quality criterion. The model returns a probability that the criterion holds, which you can threshold or log for offline analysis.

Use it for: automated evaluation of LLM outputs in CI, online quality monitoring of a production assistant (helpfulness, safety, groundedness), or filtering low-quality generations from a synthetic dataset.

Create the experiment:

import json
import requests

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyJudge',
        'mode': 'judge',
        'classes': [
            {'name': 'helpful', 'description': 'Is the response to the query helpful?'}
        ]
    })
)
print(response.json())

Response:

{"id": "ex-ahejacuehandheha", "latest_version": "0.0.0"}

Call the model:

response = requests.post(
    'https://api.withemissary.com/v1/classification',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandheha/0.0.0',  # EXPERIMENT_ID/VERSION
        'input': "User: Explain quantum computing in simple terms.\nAssistant: I don't know, Google it.",
        'data_format': 'probs'
    })
)
print(response.json())

Response:

{
    "id": "classify-3c52592c7a404f97aa494861a79db220",
    "model": "ex-ahejacuehandheha/0.0.0",
    "data": [{"index": 0, "probs": {"helpful": 0}}],
    "created": 1779906329
}
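
To turn the returned probability into a pass/fail signal, a threshold check along these lines may be enough. This is a minimal sketch: the function name and the 0.5 default threshold are assumptions, and the response shape is taken from the sample above.

```python
def passes_criterion(response: dict, criterion: str, threshold: float = 0.5) -> bool:
    """Return True when the judge's probability for `criterion` meets the threshold."""
    probs = response["data"][0]["probs"]
    return probs[criterion] >= threshold


# Sample response copied from the judge example in this guide.
sample = {
    "id": "classify-3c52592c7a404f97aa494861a79db220",
    "model": "ex-ahejacuehandheha/0.0.0",
    "data": [{"index": 0, "probs": {"helpful": 0}}],
    "created": 1779906329,
}
print(passes_criterion(sample, "helpful"))  # False
```

In practice the threshold would be tuned against a set of outputs you have already judged by hand.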

decision

Classify an input into exactly one of several mutually exclusive labels. The model returns a probability for every class, and these probabilities sum to 1.

Use it for: sentiment analysis, intent detection, content moderation, support ticket triage, or any task where each input belongs to one and only one category.

Create the experiment:

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyDecision',
        'mode': 'decision',
        'classes': [
            {'name': 'positive', 'description': 'The text expresses a positive sentiment, satisfaction, or approval.'},
            {'name': 'negative', 'description': 'The text expresses a negative sentiment, dissatisfaction, or complaint.'},
            {'name': 'neutral', 'description': 'The text is factual or does not express a strong opinion.'}
        ]
    })
)
print(response.json())

Response:

{"id": "ex-ahejacuehandhehe", "latest_version": "0.0.0"}

Call the model:

response = requests.post(
    'https://api.withemissary.com/v1/classification',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehe/0.0.0',
        'input': 'This product is absolutely wonderful!',
        'data_format': 'probs'
    })
)
print(response.json())

Response:

{
    "id": "classify-eac7451c54174ebc8da47be77dfe9581",
    "model": "ex-ahejacuehandhehe/0.0.0",
    "data": [{
        "index": 0,
        "probs": {
            "negative": 0.017637422308325768,
            "neutral": 0.024233082309365273,
            "positive": 0.9581295251846313
        }
    }],
    "created": 1779907078
}
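
Since the response is a probability per class, the predicted label is simply the class with the highest value. A small sketch, assuming the response shape shown above (the helper name is illustrative):

```python
def top_label(response: dict) -> tuple[str, float]:
    """Return the highest-probability class and its probability."""
    probs = response["data"][0]["probs"]
    label = max(probs, key=probs.get)
    return label, probs[label]


# Probabilities copied from the decision example in this guide.
sample = {
    "data": [{
        "index": 0,
        "probs": {
            "negative": 0.017637422308325768,
            "neutral": 0.024233082309365273,
            "positive": 0.9581295251846313,
        },
    }]
}
label, confidence = top_label(sample)
print(label)  # positive
```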

routing

A specialized form of classification that picks the right downstream destination for each request. It behaves like decision but is optimized for routing patterns in which each class represents a target model, agent, or pipeline.

Use it for: sending simple queries to a fast/cheap model and complex ones to a larger one, dispatching tasks across specialized agents (research, coding, creative), or routing customer messages to the correct team.

Create the experiment:

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyRouter',
        'mode': 'routing',
        'classes': [
            {'name': 'simple_task', 'description': 'Simple, straightforward queries that can be handled by a small, fast model.'},
            {'name': 'complex_task', 'description': 'Complex queries requiring reasoning, code generation, or multi-step analysis.'},
            {'name': 'creative_task', 'description': 'Creative writing, brainstorming, or content generation tasks.'}
        ]
    })
)
print(response.json())

Response:

{"id": "ex-ahejacuehandheheee", "latest_version": "0.0.0"}

Call the model:

response = requests.post(
    'https://api.withemissary.com/v1/classification',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandheheee/0.0.0',
        'input': 'Write a poem about autumn leaves',
        'data_format': 'probs'
    })
)
print(response.json())

Response:

{
    "id": "classify-23d895d52c534bfca2c24e5492d3321d",
    "model": "ex-ahejacuehandheheee/0.0.0",
    "data": [{
        "index": 0,
        "probs": {
            "complex_task": 0.011172994039952755,
            "creative_task": 0.9710367918014526,
            "simple_task": 0.017790259793400764
        }
    }],
    "created": 1779907385
}
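
One way to act on a routing response is to map each class to a downstream target and fall back to a default when the router is unsure. The sketch below is illustrative only: the target names, the fallback, and the 0.6 confidence floor are all assumptions, not Emissary features.

```python
def pick_target(response: dict, targets: dict, fallback: str,
                min_confidence: float = 0.6) -> str:
    """Choose the downstream target for the top class, or the fallback if unsure."""
    probs = response["data"][0]["probs"]
    top_class = max(probs, key=probs.get)
    if probs[top_class] < min_confidence or top_class not in targets:
        return fallback
    return targets[top_class]


# Hypothetical downstream model names, keyed by the classes defined above.
TARGETS = {
    "simple_task": "small-fast-model",
    "complex_task": "large-reasoning-model",
    "creative_task": "creative-writing-model",
}

# Probabilities copied from the routing example in this guide.
sample = {
    "data": [{
        "index": 0,
        "probs": {
            "complex_task": 0.011172994039952755,
            "creative_task": 0.9710367918014526,
            "simple_task": 0.017790259793400764,
        },
    }]
}
print(pick_target(sample, TARGETS, fallback="large-reasoning-model"))
# creative-writing-model
```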

tool_calling

Select the right tool for a user query and extract the arguments needed to call it. Each class is a tool defined with a JSON Schema for its parameters; the model returns a probability for each tool together with a structured generation payload of extracted arguments, ready to pass to your function.

Use it for: building lightweight, low-latency function-calling agents without a general-purpose LLM in the hot path — database query dispatchers, API gateways, voice assistants, or any system where you need both routing and structured argument extraction in one call.

Create the experiment:

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyToolCalling',
        'mode': 'tool_calling',
        'classes': [
            {
                'name': 'sql.execute',
                'description': 'Execute SQL queries based on user-defined parameters like SQL keyword, table name, column names, and conditions.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'sql_keyword': {'type': 'string', 'enum': ['SELECT', 'INSERT', 'UPDATE', 'DELETE', 'CREATE']},
                        'table_name': {'type': 'string'},
                        'columns': {'type': 'array', 'items': {'type': 'string'}}
                    },
                    'required': ['sql_keyword', 'table_name']
                }
            },
            {
                'name': 'Movies_3_FindMovies',
                'description': 'Retrieves a list of movies based on the director, genre, and cast specified by the user.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'directed_by': {'type': 'string'},
                        'genre': {'type': 'string', 'enum': ['Fantasy', 'Mystery', 'Thriller', 'Comedy', 'Drama', 'Action']},
                        'cast': {'type': 'string'}
                    },
                    'required': []
                }
            },
            {
                'name': 'Weather_1_GetWeather',
                'description': 'Retrieves the weather forecast for a specified city on a particular date.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'city': {'type': 'string'},
                        'date': {'type': 'string'}
                    },
                    'required': ['city']
                }
            }
        ]
    })
)
print(response.json())

Response:

{"id": "ex-ahejacuehandhehedfe", "latest_version": "0.0.0"}

Call the model:

Tool calling uses the /v1/classify-generate endpoint, which returns both the selected tool and its arguments.

response = requests.post(
    'https://api.withemissary.com/v1/classify-generate',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehedfe/0.0.0',
        'input': 'Find action movies directed by Christopher Nolan'
    })
)
print(response.json())

Response:

{
    "id": "classify-6829e3d6987641a1badd9b16d8098ef1",
    "model": "ex-ahejacuehandhehedfe/0.0.0",
    "data": [{
        "index": 0,
        "generation": [{
            "directed_by": "Christopher Nolan",
            "genre": "Action"
        }],
        "probs": {
            "Movies_3_FindMovies": 0.9871819019317627,
            "NO_TOOL": 0.001537600066512823,
            "Weather_1_GetWeather": 0.007378291338682175,
            "sql.execute": 0.0039021370466798544
        }
    }],
    "created": 1779908032
}

Note: A built-in NO_TOOL class is added automatically so the model can decline to call any tool when none applies.
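
A dispatcher built on this response might look like the sketch below. It assumes, based only on the sample response, that generation[0] holds the arguments for the highest-probability tool; the find_movies function and the registry are hypothetical stand-ins for your own code.

```python
def find_movies(directed_by=None, genre=None, cast=None):
    """Hypothetical local implementation of the Movies_3_FindMovies tool."""
    return f"searching movies: director={directed_by}, genre={genre}, cast={cast}"


# Maps tool names from the experiment to local callables (illustrative).
REGISTRY = {"Movies_3_FindMovies": find_movies}


def dispatch(response: dict, registry: dict):
    """Call the top-probability tool with the extracted arguments.

    Returns None when the model selects the built-in NO_TOOL class.
    """
    item = response["data"][0]
    probs = item["probs"]
    tool_name = max(probs, key=probs.get)
    if tool_name == "NO_TOOL":
        return None
    args = item["generation"][0]  # assumed to belong to the top tool
    return registry[tool_name](**args)


# Sample copied from the tool_calling example in this guide.
sample = {
    "data": [{
        "index": 0,
        "generation": [{"directed_by": "Christopher Nolan", "genre": "Action"}],
        "probs": {
            "Movies_3_FindMovies": 0.9871819019317627,
            "NO_TOOL": 0.001537600066512823,
            "Weather_1_GetWeather": 0.007378291338682175,
            "sql.execute": 0.0039021370466798544,
        },
    }]
}
print(dispatch(sample, REGISTRY))
# searching movies: director=Christopher Nolan, genre=Action, cast=None
```

Checking for NO_TOOL before reading the generation payload avoids relying on what that payload contains when no tool applies, which this guide does not specify.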


regression

Score inputs on a continuous scale rather than choosing a discrete label. You define each scale by its minimum and maximum values and by what each end of the range means; the model returns a single numeric value per input.

Use it for: scoring sentiment intensity, response quality on a 1–5 scale, urgency or risk levels, readability, toxicity severity — anything where "how much" matters more than "which category."

Create the experiment:

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyRegressor',
        'mode': 'regression',
        'classes': [{
            'name': 'valence',
            'description': 'How positive the emotional tone is.',
            'min_value': 0,
            'max_value': 1,
            'low_description': 'very negative emotional tone',
            'high_description': 'very positive emotional tone',
            'higher_means': 'more positive emotional tone'
        }]
    })
)
print(response.json())

Response:

{"id": "ex-ahejacuehandhehedfedaew", "latest_version": "0.0.0"}

Call the model:

response = requests.post(
    'https://api.withemissary.com/v1/regression',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehedfedaew/0.0.0',
        'input': 'I absolutely love this product — best purchase I have made all year!'
    })
)
print(response.json())

Response:

{
    "id": "classify-56abbf04ce9b4fc793461a21538eabcb",
    "model": "ex-ahejacuehandhehedfedaew/0.0.0",
    "data": [{"index": 0, "logits": 0.9821727275848389}],
    "created": 1779908552
}
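
The sample response carries the score in a field named logits. Below is a small sketch for reading it. Clamping the value into the min_value/max_value range declared at creation time is a defensive assumption on our part; this guide does not say whether the API already guarantees that range.

```python
def read_score(response: dict, min_value: float = 0.0, max_value: float = 1.0) -> float:
    """Read the regression score and clamp it into [min_value, max_value]."""
    raw = float(response["data"][0]["logits"])
    return min(max(raw, min_value), max_value)


# Sample copied from the regression example in this guide.
sample = {
    "id": "classify-56abbf04ce9b4fc793461a21538eabcb",
    "model": "ex-ahejacuehandhehedfedaew/0.0.0",
    "data": [{"index": 0, "logits": 0.9821727275848389}],
    "created": 1779908552,
}
print(round(read_score(sample), 3))  # 0.982
```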

ner — Text Extraction

Extract structured entities from unstructured text. Unlike a classifier, NER returns spans of text grouped by type, and you fully define what counts as an entity through the class descriptions — so you're not limited to a fixed taxonomy like PERSON/ORG/LOC.

Use it for: pulling structured data from emails, contracts, or chat messages; extracting PII for redaction; lifting custom entities like ticker symbols, drug names, product SKUs, or contract clauses from your domain.

Create the experiment:

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyExtractor',
        'mode': 'ner',
        'classes': [
            {'name': 'PERSON', 'description': 'human names'},
            {'name': 'ORGANIZATION', 'description': 'companies and institutions'},
            {'name': 'LOCATION', 'description': 'cities and places'}
        ]
    })
)
print(response.json())

Response:

{"id": "ex-ahejacuehandhehedqeef", "latest_version": "0.0.0"}

Call the model:

response = requests.post(
    'https://api.withemissary.com/v1/ner',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehedqeef/0.0.0',
        'input': 'Sundar Pichai announced that Google will open a new research lab in Zurich next year.'
    })
)
print(response.json())

Response:

{
    "id": "ner-8cf67c5c3ff94550a930874cc261edbe",
    "model": "ex-ahejacuehandhehedqeef/0.0.0",
    "entities": {
        "LOCATION": [{"entity": "Zurich"}],
        "ORGANIZATION": [{"entity": "Google"}],
        "PERSON": [{"entity": "Sundar Pichai"}]
    },
    "created": 1779908815
}
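
For the redaction use case, the grouped entities can be substituted back into the original text. The sample response shows entity strings but no character offsets, so this sketch falls back to plain string replacement; the helper name is illustrative, and the approach will not handle overlapping or ambiguous matches well.

```python
def redact(text: str, response: dict) -> str:
    """Replace each extracted entity string with a [TYPE] placeholder."""
    redacted = text
    for entity_type, items in response["entities"].items():
        for item in items:
            redacted = redacted.replace(item["entity"], f"[{entity_type}]")
    return redacted


# Input and entities copied from the ner example in this guide.
text = "Sundar Pichai announced that Google will open a new research lab in Zurich next year."
sample = {
    "entities": {
        "LOCATION": [{"entity": "Zurich"}],
        "ORGANIZATION": [{"entity": "Google"}],
        "PERSON": [{"entity": "Sundar Pichai"}],
    }
}
print(redact(text, sample))
# [PERSON] announced that [ORGANIZATION] will open a new research lab in [LOCATION] next year.
```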

Endpoint Reference

Mode                      Endpoint
judge, decision, routing  POST /v1/classification
tool_calling              POST /v1/classify-generate
regression                POST /v1/regression
ner                       POST /v1/ner
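
If you call several modes from the same client, the table above can be captured as a lookup. This is a convenience sketch; the constant and function names are our own, while the base URL and paths are the ones used throughout this guide.

```python
BASE_URL = "https://api.withemissary.com"

# Inference path for each experiment mode, as listed in the endpoint table.
ENDPOINTS = {
    "judge": "/v1/classification",
    "decision": "/v1/classification",
    "routing": "/v1/classification",
    "tool_calling": "/v1/classify-generate",
    "regression": "/v1/regression",
    "ner": "/v1/ner",
}


def inference_url(mode: str) -> str:
    """Return the full inference URL for a mode; raises KeyError if unknown."""
    return BASE_URL + ENDPOINTS[mode]


print(inference_url("tool_calling"))
# https://api.withemissary.com/v1/classify-generate
```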

Next Steps

The zero-shot model created here is just the starting point. Once you've validated the mode and class definitions against real traffic, you can:

  • Upload labeled examples to fine-tune the model for higher accuracy
  • Publish new versions and roll them out incrementally
  • Monitor inference logs in the dashboard to spot edge cases

Head to the Datasets and Training sections of the docs to continue.