CLIP Embedding
1. Objective
This guide provides step-by-step instructions on fine-tuning an embedding model for visual similarity tasks on Emissary. The model learns to encode images into a high-dimensional embedding space where visually similar images produce similar vectors.
Note: This task is only supported with the DINOv2 and SigLIP backbones.
2. Dataset Preparation
Prepare your dataset in JSON format with the following structure.
CLIP-Embedding Data Format
[
  {
    "itemA": {
      "image": "<base64_or_url>",
      "text": "<optional_text>"
    },
    "itemB": {
      "image": "<base64_or_url>",
      "text": "<optional_text>"
    },
    "similarity": 1
  }
]
| Field | Description | Required |
|---|---|---|
| itemA.image | First image (URL or base64-encoded string) | Yes |
| itemB.image | Second image (URL or base64-encoded string) | Yes |
| itemA.text | Text description for first image | No |
| itemB.text | Text description for second image | No |
| similarity | Similarity label: 1 = similar, 0 = not similar | Yes |
Tip: The text field is optional and only used by SigLIP. DINOv2 will ignore it, so you can use the same dataset format for both backbones.
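For reference, here is a minimal sketch that assembles a training file in this format from local images. The file names and the encode_image helper are illustrative and not part of the Emissary platform.

```python
import base64
import json

def encode_image(path: str) -> str:
    # Read a local image file and return it as a base64-encoded string.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Illustrative pairs: (first image, second image, similarity label).
pairs = [
    ("shoe_front.jpg", "shoe_side.jpg", 1),  # similar: same product
    ("shoe_front.jpg", "backpack.jpg", 0),   # not similar
]

dataset = [
    {
        "itemA": {"image": encode_image(a)},
        "itemB": {"image": encode_image(b)},
        "similarity": label,
    }
    for a, b, label in pairs
]

with open("train.json", "w") as f:
    json.dump(dataset, f)
```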
3. Finetuning Preparation
Please refer to the in-depth guide on fine-tuning on Emissary - the Quickstart Guide.
Create Training Project
Navigate to the Dashboard; Training, the default page on the Emissary platform, opens.
- Click + NEW PROJECT in the dashboard.
- In the pop-up, enter a new training project name, and click CREATE.
Upload Dataset
A tile is created for your task. Click Manage to enter the task workspace.
- Click Manage Datasets in the Datasets Available tile.
- Click + UPLOAD DATASET and select your training and test datasets.
- Name the dataset and upload the files.
4. Model Finetuning
Now, go back one panel by clicking OVERVIEW and then click Manage Training Jobs in the Training Jobs tile.

Click the + NEW TRAINING JOB button and fill in the configuration.


Required Fields
- Name: Name of your training job (fine-tuned model)
- Base Model: Choose the backbone pre-trained / fine-tuned model from the drop-down list
- Training Technique: Choose the training technique to use; clip-embedding is only supported with SFT
- Task Type: Select the clip-embedding task type
- Train Dataset: Select the dataset you would like to use to train the backbone model
Optional Fields
- Test Dataset: You can provide a test dataset, which will then be used in the testing (evaluation) phase. If none is selected, the testing phase is skipped.
  - Split Train/Test Dataset: Use a ratio of the train dataset as the test set
  - Select existing dataset: Upload a separate dataset for testing
- Hyper Parameters: All hyperparameters are set to good default values, but you can adjust them if you want.
  Note: The loss_type for the DINOv2 backbone only supports "dino" and "contrastive".
- Test Functions: When you select any Test Dataset option, you can also provide your own test functions, which produce aggregate results. By default, clip-embedding calculates the similarity between the two images of each pair in your test dataset and reports the cutoff that maximizes your matching score (a sketch of this cutoff search follows below).
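As a rough illustration of what this default evaluation does, the sketch below scores pairs by cosine similarity and sweeps candidate thresholds to find the cutoff that classifies the most pairs correctly. The random embeddings and labels are placeholders, and the exact metric Emissary reports may differ.

```python
import numpy as np

def best_cutoff(emb_a, emb_b, labels):
    # Cosine similarity between each pair of L2-normalized embeddings.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = (a * b).sum(axis=1)

    # Try every observed similarity as a candidate threshold and keep
    # the one that classifies the most pairs (1 = similar) correctly.
    best_t, best_acc = 0.0, 0.0
    for t in np.sort(sims):
        acc = ((sims >= t).astype(int) == labels).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Placeholder data: 4 pairs of 8-dimensional embeddings with 0/1 labels.
rng = np.random.default_rng(0)
emb_a, emb_b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
labels = np.array([1, 0, 1, 0])
print(best_cutoff(emb_a, emb_b, labels))
```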
After initiating the training job, you will see it in the list.

If you click the row, you will be navigated to the training job detail page.

You can check the Status and Progress from the summary, and view the live logs and loss graph by clicking the tabs on the side.


Go to the Artifacts tab to check checkpoints and test results (if a test dataset and test functions were provided).

5. Deployment
From the Artifacts tab, you can deploy any checkpoint from the training job by hitting the DEPLOY button.

(Optional) You can also configure resource management when creating a deployment. Setting an inactivity timeout will shut down your deployment (inference engine) after a period of inactivity. You can also schedule your deployment to run at a specific date and time.
Once you initiate your deployment, go to the Inference dashboard, where you will see your recent and previous deployments.

By clicking the card, you can see the details of your deployment (inference engine).

Once your deployment status becomes deployed, your inference server is ready to be used. You can test your deployment by calling the API as shown below.
import requests

url = "https://api.withemissary.com/v1/image-embeddings"
API_KEY = "your_api_key_here"  # your Emissary API key
MODEL_ID = "Embedding-model-1-checkpoint-5"  # name of the deployed checkpoint
image_url_or_base64 = "image_url_or_base64_encoded_string_here"

HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "X-API-Key": API_KEY,
}

payload = {
    "model": MODEL_ID,
    "image": image_url_or_base64,
    "type": "pooled",  # or "patch-wise" for per-patch embeddings
}

response = requests.post(url, json=payload, headers=HEADERS)
response.raise_for_status()
data = response.json()
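Building on the snippet above (reusing url, MODEL_ID, and HEADERS), here is a minimal sketch that compares two images by requesting a pooled embedding for each and computing their cosine similarity. The assumption that the response JSON exposes the vector under an "embedding" field is illustrative; verify it against your deployment's actual response schema.

```python
import numpy as np

def get_embedding(image: str) -> np.ndarray:
    # Illustrative helper; assumes the embedding vector is returned
    # under an "embedding" field. Check your actual response schema.
    resp = requests.post(
        url,
        json={"model": MODEL_ID, "image": image, "type": "pooled"},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return np.asarray(resp.json()["embedding"])

emb_a = get_embedding("https://example.com/shoe_front.jpg")
emb_b = get_embedding("https://example.com/shoe_side.jpg")

# Cosine similarity between the two pooled embeddings.
cos = emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
print(f"cosine similarity: {cos:.4f}")
```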