Experiment (Zero-shot Deployment)

Experiment is the fastest way to try the Emissary platform. In under a minute, you can spin up a customized model and call it from your application through a ready-made API endpoint — no training data, no tuning, no infrastructure setup required.

Every experiment starts as a zero-shot model built from the class definitions you provide. You can begin sending traffic immediately, then iterate by uploading data, labeling, and re-training as your use case matures.

Experiment currently supports six modes:

| Mode | What it does |
|------|--------------|
| `judge` | Evaluate outputs against a single criterion (LLM-as-Judge) |
| `decision` | Classify inputs into one of several mutually exclusive categories |
| `routing` | Route requests to the right downstream model, agent, or workflow |
| `tool_calling` | Pick the right tool and extract its arguments from a user query |
| `regression` | Score inputs on a continuous numeric scale |
| `ner` | Extract named entities (people, places, custom types) from text |

Common Workflow

Each mode follows the same two-step pattern:

1. Create the experiment by POSTing to `/v1/experiments` with a `mode` and a list of `classes`. The response contains the experiment's `id` and `latest_version`.
2. Call the model at the mode's inference endpoint, passing `EXPERIMENT_ID/VERSION` (for example, `ex-ahejacuehandheha/0.0.0`) as the `model` parameter.

All requests require your API key in the `X-API-Key` header.
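The two steps above can be wrapped in a pair of small helpers. This is a minimal sketch that assumes only what the examples on this page show: JSON request bodies, the `X-API-Key` header, and a creation response containing `id` and `latest_version`. The helper names `emissary_post` and `model_ref` are hypothetical, and the sketch uses only the Python standard library.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = 'https://api.withemissary.com'


def emissary_post(path: str, payload: dict, api_key: Optional[str] = None) -> dict:
    """POST a JSON payload to an Emissary endpoint and return the decoded JSON reply.

    Hypothetical helper: falls back to the YOUR_API_KEY environment variable,
    mirroring the cURL examples.
    """
    key = api_key or os.environ['YOUR_API_KEY']
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json', 'X-API-Key': key},
        method='POST',
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def model_ref(created: dict) -> str:
    """Build the EXPERIMENT_ID/VERSION string used as the `model` parameter."""
    return f"{created['id']}/{created['latest_version']}"
```

For example, `model_ref({'id': 'ex-ahejacuehandheha', 'latest_version': '0.0.0'})` returns `'ex-ahejacuehandheha/0.0.0'`, the `model` value used in the `judge` example below.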

judge — LLM-as-Judge

Score a model's output against a single quality criterion. The model returns a probability that the criterion holds, which you can threshold or log for offline analysis.

Use it for: automated evaluation of LLM outputs in CI, online quality monitoring of a production assistant (helpfulness, safety, groundedness), or filtering low-quality generations from a synthetic dataset.

Create the experiment:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyJudge',
        'mode': 'judge',
        'classes': [
            {'name': 'helpful', 'description': 'Is the response to the query helpful?'}
        ]
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "name": "MyJudge",
    "mode": "judge",
    "classes": [
      {"name": "helpful", "description": "Is the response to the query helpful?"}
    ]
  }'
```

Example response:

```json
{"id": "ex-ahejacuehandheha", "latest_version": "0.0.0"}
```

Call the model:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/classification',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandheha/0.0.0',  # EXPERIMENT_ID/VERSION
        'input': "User: Explain quantum computing in simple terms.\nAssistant: I don't know, Google it.",
        'data_format': 'probs'
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/classification \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "model": "ex-ahejacuehandheha/0.0.0",
    "input": "User: Explain quantum computing in simple terms.\nAssistant: I don'\''t know, Google it.",
    "data_format": "probs"
  }'
```

Example response:

```json
{
    "id": "classify-3c52592c7a404f97aa494861a79db220",
    "model": "ex-ahejacuehandheha/0.0.0",
    "data": [{"index": 0, "probs": {"helpful": 0}}],
    "created": 1779906329
}
```
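Because the judge returns a probability rather than a verdict, the pass/fail cut-off is yours to choose. A minimal sketch, assuming the response shape shown above; the `passes` helper and its 0.5 default threshold are illustrative choices, not API behavior.

```python
def passes(response: dict, criterion: str, threshold: float = 0.5) -> bool:
    """Return True when the judged probability for `criterion` meets the threshold."""
    # The classification endpoint returns one entry per input under "data".
    probability = response['data'][0]['probs'][criterion]
    return probability >= threshold


example = {
    'id': 'classify-3c52592c7a404f97aa494861a79db220',
    'model': 'ex-ahejacuehandheha/0.0.0',
    'data': [{'index': 0, 'probs': {'helpful': 0}}],
    'created': 1779906329,
}
print(passes(example, 'helpful'))  # False: the "Google it" answer is judged unhelpful
```

In CI you might assert on `passes(...)`; for online monitoring, logging the raw probability keeps the option to re-threshold later.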

decision

Classify an input into exactly one of several mutually exclusive labels. The model returns a probability distribution across all classes that sums to 1.

Use it for: sentiment analysis, intent detection, content moderation, support ticket triage, or any task where each input belongs to one and only one category.

Create the experiment:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyDecision',
        'mode': 'decision',
        'classes': [
            {'name': 'positive', 'description': 'The text expresses a positive sentiment, satisfaction, or approval.'},
            {'name': 'negative', 'description': 'The text expresses a negative sentiment, dissatisfaction, or complaint.'},
            {'name': 'neutral',  'description': 'The text is factual or does not express a strong opinion.'}
        ]
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "name": "MyDecision",
    "mode": "decision",
    "classes": [
      {"name": "positive", "description": "The text expresses a positive sentiment, satisfaction, or approval."},
      {"name": "negative", "description": "The text expresses a negative sentiment, dissatisfaction, or complaint."},
      {"name": "neutral",  "description": "The text is factual or does not express a strong opinion."}
    ]
  }'
```

Example response:

```json
{"id": "ex-ahejacuehandhehe", "latest_version": "0.0.0"}
```

Call the model:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/classification',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehe/0.0.0',
        'input': 'This product is absolutely wonderful!',
        'data_format': 'probs'
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/classification \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "model": "ex-ahejacuehandhehe/0.0.0",
    "input": "This product is absolutely wonderful!",
    "data_format": "probs"
  }'
```

Example response:

```json
{
    "id": "classify-eac7451c54174ebc8da47be77dfe9581",
    "model": "ex-ahejacuehandhehe/0.0.0",
    "data": [{
        "index": 0,
        "probs": {
            "negative": 0.017637422308325768,
            "neutral":  0.024233082309365273,
            "positive": 0.9581295251846313
        }
    }],
    "created": 1779907078
}
```
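Since the probabilities form a single distribution over mutually exclusive labels, the predicted label is the class with the highest probability. A minimal sketch against the response above; the `top_label` name is hypothetical.

```python
def top_label(response: dict) -> tuple[str, float]:
    """Return the most probable class and its probability for the first input."""
    probs = response['data'][0]['probs']
    label = max(probs, key=probs.get)
    return label, probs[label]


example = {
    'data': [{
        'index': 0,
        'probs': {
            'negative': 0.017637422308325768,
            'neutral': 0.024233082309365273,
            'positive': 0.9581295251846313,
        },
    }],
}
label, confidence = top_label(example)
print(label, round(confidence, 3))  # positive 0.958
```

Returning the probability alongside the label lets you send low-confidence items to human review instead of trusting the argmax blindly.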

routing

A specialized form of classification for picking the right downstream destination for each request. It behaves like `decision`, but it is optimized for routing patterns where each class represents a target model, agent, or pipeline.

Use it for: sending simple queries to a fast/cheap model and complex ones to a larger one, dispatching tasks across specialized agents (research, coding, creative), or routing customer messages to the correct team.

Create the experiment:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyRouter',
        'mode': 'routing',
        'classes': [
            {'name': 'simple_task',   'description': 'Simple, straightforward queries that can be handled by a small, fast model.'},
            {'name': 'complex_task',  'description': 'Complex queries requiring reasoning, code generation, or multi-step analysis.'},
            {'name': 'creative_task', 'description': 'Creative writing, brainstorming, or content generation tasks.'}
        ]
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "name": "MyRouter",
    "mode": "routing",
    "classes": [
      {"name": "simple_task",   "description": "Simple, straightforward queries that can be handled by a small, fast model."},
      {"name": "complex_task",  "description": "Complex queries requiring reasoning, code generation, or multi-step analysis."},
      {"name": "creative_task", "description": "Creative writing, brainstorming, or content generation tasks."}
    ]
  }'
```

Example response:

```json
{"id": "ex-ahejacuehandheheee", "latest_version": "0.0.0"}
```

Call the model:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/classification',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandheheee/0.0.0',
        'input': 'Write a poem about autumn leaves',
        'data_format': 'probs'
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/classification \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "model": "ex-ahejacuehandheheee/0.0.0",
    "input": "Write a poem about autumn leaves",
    "data_format": "probs"
  }'
```

Example response:

```json
{
    "id": "classify-23d895d52c534bfca2c24e5492d3321d",
    "model": "ex-ahejacuehandheheee/0.0.0",
    "data": [{
        "index": 0,
        "probs": {
            "complex_task":  0.011172994039952755,
            "creative_task": 0.9710367918014526,
            "simple_task":   0.017790259793400764
        }
    }],
    "created": 1779907385
}
```
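In a router, the winning class typically selects a downstream handler. The sketch below is hypothetical throughout: the `handle_*` functions stand in for your own model, agent, or pipeline calls, and falling back to `complex_task` when confidence is low is just one possible policy.

```python
from typing import Callable


def handle_simple(query: str) -> str:
    return f'small model: {query}'  # placeholder for a fast, cheap model call


def handle_complex(query: str) -> str:
    return f'large model: {query}'  # placeholder for a reasoning-capable model


def handle_creative(query: str) -> str:
    return f'creative model: {query}'  # placeholder for a writing-focused model


ROUTES: dict[str, Callable[[str], str]] = {
    'simple_task': handle_simple,
    'complex_task': handle_complex,
    'creative_task': handle_creative,
}


def route(query: str, response: dict, min_confidence: float = 0.6) -> str:
    """Send the query to the handler of the most probable route."""
    probs = response['data'][0]['probs']
    label = max(probs, key=probs.get)
    if probs[label] < min_confidence:
        label = 'complex_task'  # when unsure, prefer the most capable handler
    return ROUTES[label](query)


example = {'data': [{'index': 0, 'probs': {
    'complex_task': 0.011172994039952755,
    'creative_task': 0.9710367918014526,
    'simple_task': 0.017790259793400764,
}}]}
print(route('Write a poem about autumn leaves', example))
# creative model: Write a poem about autumn leaves
```

Falling back to the most capable route trades some cost for safety on ambiguous inputs; a cost-sensitive deployment might fall back to the cheapest route instead.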

tool_calling

Select the right tool for a user query and extract the arguments needed to call it. Each class is a tool whose parameters are defined with a JSON Schema; the model returns a probability for each tool along with a structured `generation` payload containing arguments ready to pass to your function.

Use it for: building lightweight, low-latency function-calling agents without a general-purpose LLM in the hot path — database query dispatchers, API gateways, voice assistants, or any system where you need both routing and structured argument extraction in one call.

Create the experiment:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyToolCalling',
        'mode': 'tool_calling',
        'classes': [
            {
                'name': 'sql.execute',
                'description': 'Execute SQL queries based on user-defined parameters like SQL keyword, table name, column names, and conditions.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'sql_keyword': {'type': 'string', 'enum': ['SELECT', 'INSERT', 'UPDATE', 'DELETE', 'CREATE']},
                        'table_name':  {'type': 'string'},
                        'columns':     {'type': 'array', 'items': {'type': 'string'}}
                    },
                    'required': ['sql_keyword', 'table_name']
                }
            },
            {
                'name': 'Movies_3_FindMovies',
                'description': 'Retrieves a list of movies based on the director, genre, and cast specified by the user.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'directed_by': {'type': 'string'},
                        'genre':       {'type': 'string', 'enum': ['Fantasy', 'Mystery', 'Thriller', 'Comedy', 'Drama', 'Action']},
                        'cast':        {'type': 'string'}
                    },
                    'required': []
                }
            },
            {
                'name': 'Weather_1_GetWeather',
                'description': 'Retrieves the weather forecast for a specified city on a particular date.',
                'parameters': {
                    'type': 'object',
                    'properties': {
                        'city': {'type': 'string'},
                        'date': {'type': 'string'}
                    },
                    'required': ['city']
                }
            }
        ]
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "name": "MyToolCalling",
    "mode": "tool_calling",
    "classes": [
      {
        "name": "sql.execute",
        "description": "Execute SQL queries based on user-defined parameters like SQL keyword, table name, column names, and conditions.",
        "parameters": {
          "type": "object",
          "properties": {
            "sql_keyword": {"type": "string", "enum": ["SELECT", "INSERT", "UPDATE", "DELETE", "CREATE"]},
            "table_name":  {"type": "string"},
            "columns":     {"type": "array", "items": {"type": "string"}}
          },
          "required": ["sql_keyword", "table_name"]
        }
      },
      {
        "name": "Movies_3_FindMovies",
        "description": "Retrieves a list of movies based on the director, genre, and cast specified by the user.",
        "parameters": {
          "type": "object",
          "properties": {
            "directed_by": {"type": "string"},
            "genre":       {"type": "string", "enum": ["Fantasy", "Mystery", "Thriller", "Comedy", "Drama", "Action"]},
            "cast":        {"type": "string"}
          },
          "required": []
        }
      },
      {
        "name": "Weather_1_GetWeather",
        "description": "Retrieves the weather forecast for a specified city on a particular date.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "date": {"type": "string"}
          },
          "required": ["city"]
        }
      }
    ]
  }'
```

Example response:

```json
{"id": "ex-ahejacuehandhehedfe", "latest_version": "0.0.0"}
```

Call the model:

Tool calling uses the `/v1/classify-generate` endpoint, which returns both the selected tool and its arguments.

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/classify-generate',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehedfe/0.0.0',
        'input': 'Find action movies directed by Christopher Nolan'
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/classify-generate \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "model": "ex-ahejacuehandhehedfe/0.0.0",
    "input": "Find action movies directed by Christopher Nolan"
  }'
```

Example response:

```json
{
    "id": "classify-6829e3d6987641a1badd9b16d8098ef1",
    "model": "ex-ahejacuehandhehedfe/0.0.0",
    "data": [{
        "index": 0,
        "generation": [{
            "directed_by": "Christopher Nolan",
            "genre": "Action"
        }],
        "probs": {
            "Movies_3_FindMovies":   0.9871819019317627,
            "NO_TOOL":               0.001537600066512823,
            "Weather_1_GetWeather":  0.007378291338682175,
            "sql.execute":           0.0039021370466798544
        }
    }],
    "created": 1779908032
}
```

Note: A built-in `NO_TOOL` class is added automatically so the model can decline to call any tool when none applies.
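On the client side, the response can be turned into an actual function call: pick the most probable tool, stop if it is `NO_TOOL`, check the `required` fields from the tool's JSON Schema, then pass the generated arguments along. A minimal sketch, assuming the response shape shown above and that `generation` holds a single argument object for the selected tool; `find_movies`, `get_weather`, and the `TOOLS` registry are hypothetical stand-ins for your own code.

```python
from typing import Any, Callable, Optional


# Hypothetical local implementations of two of the tools defined above.
def find_movies(directed_by: str = '', genre: str = '', cast: str = '') -> str:
    return f'movies by {directed_by or "anyone"} ({genre or "any genre"})'


def get_weather(city: str, date: str = 'today') -> str:
    return f'weather in {city} on {date}'


# Each entry: the local function plus the `required` list from the tool's JSON Schema.
TOOLS: dict[str, tuple[Callable[..., Any], list[str]]] = {
    'Movies_3_FindMovies': (find_movies, []),
    'Weather_1_GetWeather': (get_weather, ['city']),
}


def dispatch(response: dict) -> Optional[Any]:
    """Run the most probable tool with its generated arguments, or return None."""
    result = response['data'][0]
    probs = result['probs']
    tool = max(probs, key=probs.get)
    if tool == 'NO_TOOL' or tool not in TOOLS:
        return None  # the model declined, or there is no local implementation
    func, required = TOOLS[tool]
    # Assumes `generation` holds one argument object for the selected tool.
    args = result['generation'][0] if result.get('generation') else {}
    missing = [name for name in required if name not in args]
    if missing:
        raise ValueError(f'{tool} is missing required arguments: {missing}')
    return func(**args)


example = {'data': [{
    'index': 0,
    'generation': [{'directed_by': 'Christopher Nolan', 'genre': 'Action'}],
    'probs': {
        'Movies_3_FindMovies': 0.9871819019317627,
        'NO_TOOL': 0.001537600066512823,
        'Weather_1_GetWeather': 0.007378291338682175,
        'sql.execute': 0.0039021370466798544,
    },
}]}
print(dispatch(example))  # movies by Christopher Nolan (Action)
```

Checking `required` before calling keeps a malformed generation from reaching your function; full JSON Schema validation (types, `enum` values) would need a schema validator.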

regression

Score inputs on a continuous scale rather than choosing a discrete label. You define each scale by its `min_value`/`max_value` range and by what the low and high ends mean; the model returns a single numeric value, reported in the `logits` field of the response.

Use it for: scoring sentiment intensity, response quality on a 1–5 scale, urgency or risk levels, readability, toxicity severity — anything where "how much" matters more than "which category."

Create the experiment:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyRegressor',
        'mode': 'regression',
        'classes': [{
            'name': 'valence',
            'description': 'How positive the emotional tone is.',
            'min_value': 0,
            'max_value': 1,
            'low_description':  'very negative emotional tone',
            'high_description': 'very positive emotional tone',
            'higher_means':     'more positive emotional tone'
        }]
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "name": "MyRegressor",
    "mode": "regression",
    "classes": [{
      "name": "valence",
      "description": "How positive the emotional tone is.",
      "min_value": 0,
      "max_value": 1,
      "low_description":  "very negative emotional tone",
      "high_description": "very positive emotional tone",
      "higher_means":     "more positive emotional tone"
    }]
  }'
```

Example response:

```json
{"id": "ex-ahejacuehandhehedfedaew", "latest_version": "0.0.0"}
```

Call the model:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/regression',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehedfedaew/0.0.0',
        'input': 'I absolutely love this product — best purchase I have made all year!'
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/regression \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "model": "ex-ahejacuehandhehedfedaew/0.0.0",
    "input": "I absolutely love this product — best purchase I have made all year!"
  }'
```

Example response:

```json
{
    "id": "classify-56abbf04ce9b4fc793461a21538eabcb",
    "model": "ex-ahejacuehandhehedfedaew/0.0.0",
    "data": [{"index": 0, "logits": 0.9821727275848389}],
    "created": 1779908552
}
```
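The score comes back as a bare number in the `logits` field. A defensive client may clamp it to the range declared at creation time and rescale it for display. A minimal sketch; the clamping step, the `read_score`/`rescale` helpers, and the 1 to 5 display scale are illustrative assumptions, not documented API behavior.

```python
def read_score(response: dict, min_value: float, max_value: float) -> float:
    """Extract the regression score for the first input, clamped to the declared range."""
    value = float(response['data'][0]['logits'])
    return min(max(value, min_value), max_value)


def rescale(value: float, lo: float, hi: float, new_lo: float, new_hi: float) -> float:
    """Linearly map `value` from [lo, hi] onto [new_lo, new_hi]."""
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)


example = {'data': [{'index': 0, 'logits': 0.9821727275848389}]}
valence = read_score(example, min_value=0, max_value=1)
print(round(rescale(valence, 0, 1, 1, 5), 2))  # 4.93 on a 1-5 scale
```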

ner — Text Extraction

Extract structured entities from unstructured text. Unlike a classifier, NER returns spans of text grouped by type, and you fully define what counts as an entity through the class descriptions — so you're not limited to a fixed taxonomy like PERSON/ORG/LOC.

Use it for: pulling structured data from emails, contracts, or chat messages; extracting PII for redaction; lifting custom entities like ticker symbols, drug names, product SKUs, or contract clauses from your domain.

Create the experiment:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/experiments',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'name': 'MyExtractor',
        'mode': 'ner',
        'classes': [
            {'name': 'PERSON',       'description': 'human names'},
            {'name': 'ORGANIZATION', 'description': 'companies and institutions'},
            {'name': 'LOCATION',     'description': 'cities and places'}
        ]
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/experiments \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "name": "MyExtractor",
    "mode": "ner",
    "classes": [
      {"name": "PERSON",       "description": "human names"},
      {"name": "ORGANIZATION", "description": "companies and institutions"},
      {"name": "LOCATION",     "description": "cities and places"}
    ]
  }'
```

Example response:

```json
{"id": "ex-ahejacuehandhehedqeef", "latest_version": "0.0.0"}
```

Call the model:

```python
import json
import os

import requests

# Read the key from the environment, as the cURL examples do.
YOUR_API_KEY = os.environ['YOUR_API_KEY']

response = requests.post(
    'https://api.withemissary.com/v1/ner',
    headers={
        'Content-Type': 'application/json',
        'X-API-Key': YOUR_API_KEY
    },
    data=json.dumps({
        'model': 'ex-ahejacuehandhehedqeef/0.0.0',
        'input': 'Sundar Pichai announced that Google will open a new research lab in Zurich next year.'
    })
)
print(response.json())
```

```bash
curl https://api.withemissary.com/v1/ner \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $YOUR_API_KEY" \
  -d '{
    "model": "ex-ahejacuehandhehedqeef/0.0.0",
    "input": "Sundar Pichai announced that Google will open a new research lab in Zurich next year."
  }'
```

Example response:

```json
{
    "id": "ner-8cf67c5c3ff94550a930874cc261edbe",
    "model": "ex-ahejacuehandhehedqeef/0.0.0",
    "entities": {
        "LOCATION":     [{"entity": "Zurich"}],
        "ORGANIZATION": [{"entity": "Google"}],
        "PERSON":       [{"entity": "Sundar Pichai"}]
    },
    "created": 1779908815
}
```
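For the redaction use case, the grouped entities can be substituted back into the text. A minimal sketch with a hypothetical `redact` helper: the example response carries entity strings but no character offsets, so this replaces every occurrence of each string, which can over-redact when the same text also appears in an unrelated sense.

```python
def redact(text: str, entities: dict[str, list[dict]]) -> str:
    """Replace every occurrence of each extracted entity with its type in brackets."""
    pairs = [(item['entity'], label) for label, items in entities.items() for item in items]
    # Replace longer strings first so a span like "New York City" is not split by "New York".
    for surface, label in sorted(pairs, key=lambda pair: len(pair[0]), reverse=True):
        text = text.replace(surface, f'[{label}]')
    return text


entities = {
    'LOCATION': [{'entity': 'Zurich'}],
    'ORGANIZATION': [{'entity': 'Google'}],
    'PERSON': [{'entity': 'Sundar Pichai'}],
}
text = 'Sundar Pichai announced that Google will open a new research lab in Zurich next year.'
print(redact(text, entities))
# [PERSON] announced that [ORGANIZATION] will open a new research lab in [LOCATION] next year.
```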

Endpoint Reference

| Mode | Endpoint |
|------|----------|
| `judge`, `decision`, `routing` | `POST /v1/classification` |
| `tool_calling` | `POST /v1/classify-generate` |
| `regression` | `POST /v1/regression` |
| `ner` | `POST /v1/ner` |
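The endpoint table can be encoded as a lookup so client code picks the right inference URL from an experiment's mode. A sketch; the `ENDPOINTS` dict and `endpoint_for` helper are hypothetical names, and the paths are taken directly from the table.

```python
BASE_URL = 'https://api.withemissary.com'

# Inference endpoint for each experiment mode, per the endpoint table.
ENDPOINTS = {
    'judge': '/v1/classification',
    'decision': '/v1/classification',
    'routing': '/v1/classification',
    'tool_calling': '/v1/classify-generate',
    'regression': '/v1/regression',
    'ner': '/v1/ner',
}


def endpoint_for(mode: str) -> str:
    """Return the full inference URL for a mode, failing loudly on unknown modes."""
    try:
        return BASE_URL + ENDPOINTS[mode]
    except KeyError:
        raise ValueError(f'unknown experiment mode: {mode!r}') from None
```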

Next Steps

The zero-shot model created here is just the starting point. Once you've validated the mode and class definitions against real traffic, you can:

- Upload labeled examples to fine-tune the model for higher accuracy.
- Publish new versions and roll them out incrementally.
- Monitor inference logs in the dashboard to spot edge cases.

Head to the Datasets and Training sections of the docs to continue.
