Model Deployment
Model Deployment is the final step in the machine learning lifecycle. It takes a versioned artifact from the Model Registry and exposes it as a high-performance, scalable endpoint ready to serve real-time predictions.
Deploying a Registered Model
To deploy a model, you need the ID of the logged model (which you obtained in the previous step). The deployment process allocates the necessary computational resources (CPU, RAM, or GPU) and sets up the inference runtime.
Create a Deployment
Use the deployment create command to launch your model. You must specify the model ID and the desired serving size.
fathom intelligence machine-learning deployment create logged-model --model-id 6174cc98-55fb-4818-9370-f75cafade62e --name "iris-classifier" --description "Production endpoint for Iris flower classification" --serving-size small
| Option | Requirement | Description |
|---|---|---|
| --model-id | Required | The UUID of the model from the registry. |
| --name | Required | A unique name for your deployment. |
| --serving-size | Optional | Resource tier: small, large, or extra-large. |
| --serving-gpu | Optional | Attach a GPU for heavy models (nvidia-l4, nvidia-l4-2x). |
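For heavier models, the sizing options can be combined. The following is an illustrative sketch pairing the large tier with an NVIDIA L4 accelerator; the model ID and name here are placeholders, not values from the walkthrough above:

fathom intelligence machine-learning deployment create logged-model --model-id <MODEL_ID> --name "image-classifier" --serving-size large --serving-gpu nvidia-l4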
Tag a Deployment
Use the deployment tag command to attach tags to your deployment; you must specify the deployment ID. Tags can later be removed with deployment untag.
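A minimal sketch, assuming the tag value is passed as a positional argument after the deployment ID (the exact argument syntax may differ; check the command's help output). The deployment ID is the one from the listing example below:

fathom intelligence machine-learning deployment tag 379f103f-45cd-4c00-aec3-0fa4af756cae production
fathom intelligence machine-learning deployment untag 379f103f-45cd-4c00-aec3-0fa4af756cae production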
Monitoring Deployment Status
Deployments happen asynchronously. After creating one, you should monitor its state to ensure it transitions to running:
fathom intelligence machine-learning deployment list
Example output of deployment list run with the --watch option, first while the deployment is still pending, then after it transitions to running:
id                                   | created_at                     | name            | kind          | description                                        | status  | state | tags
-------------------------------------+--------------------------------+-----------------+---------------+----------------------------------------------------+---------+-------+------------
379f103f-45cd-4c00-aec3-0fa4af756cae | 2026-03-25 08:06:17.003811 UTC | iris-classifier | logged-models | Production endpoint for Iris flower classification | pending | N/A   | production

id                                   | created_at                     | name            | kind          | description                                        | status  | state | tags
-------------------------------------+--------------------------------+-----------------+---------------+----------------------------------------------------+---------+-------+------------
379f103f-45cd-4c00-aec3-0fa4af756cae | 2026-03-25 08:06:17.003811 UTC | iris-classifier | logged-models | Production endpoint for Iris flower classification | running | hot   | production
Resource Sizing
For the Iris Classifier (ONNX), a small serving size is more than sufficient. Choose large or attach a GPU only for complex models.

Updating a Deployment
Once a deployment is running, you can update it to point at a new version of your model (e.g., a newly trained logged model ID) or to change its resource allocation (e.g., upgrading from small to large).
The platform performs a rolling update, ensuring that your endpoint remains available while the new model version is being provisioned.
fathom intelligence machine-learning deployment update <DEPLOYMENT_ID> logged-model <OPTIONS>
| Option | Description |
|---|---|
| --model-id | The new Logged Model UUID from the registry. |
| --name | Update the display name of the deployment. |
| --description | Update the deployment’s metadata/description. |
| --serving-size | Scale resources (small, large, extra-large). |
| --serving-gpu | Change or add a GPU accelerator. |
Example: Update Logged Model
To promote a new model version to an existing deployment, use the update logged-model command. You will need the Deployment ID and the new Model ID.
fathom intelligence machine-learning deployment update 3cdec2ec-f51e-420c-937a-6c65af770084 logged-model --model-id 93096f6a-3a8a-4315-bc18-615ef72c7bcc
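The same command can also scale a deployment in place. For example, to move this endpoint from small to large without swapping the model (assuming --model-id can be omitted when only resources change):

fathom intelligence machine-learning deployment update 3cdec2ec-f51e-420c-937a-6c65af770084 logged-model --serving-size large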
Model Inference
Once your deployment reports status running and state hot, you can begin making predictions. Fathom Intelligence supports three primary inference modes, depending on your model type: General Tensor Inference, Chat Completions, and Embeddings.
General Tensor Inference (V2 Protocol)
This mode is used for classic ML models (Scikit-learn, ONNX, XGBoost) and computer vision. It follows the NVIDIA Triton V2 Predict Protocol.
Pipe via Standard Input (Recommended for Scripts)
You can pipe a JSON payload directly into the CLI. This is ideal for integration with tools like jq or automated data pipelines.
echo '{
"inputs": [
{
"name": "float_input",
"shape": [1, 4],
"datatype": "FP32",
"data": [7.0, 3.2, 4.7, 1.4]
}
],
"outputs": [
{
"name": "label"
}
]
}' | fathom intelligence machine-learning deployment infer <DEPLOYMENT_ID> --data -
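Because the payload is plain JSON, you can also construct it programmatically. A small sketch using jq to build the request body on the fly; the feature values here are arbitrary test inputs:

jq -n '{
  inputs: [
    {
      name: "float_input",
      shape: [1, 4],
      datatype: "FP32",
      data: [6.3, 3.3, 6.0, 2.5]
    }
  ],
  outputs: [
    {
      name: "label"
    }
  ]
}' | fathom intelligence machine-learning deployment infer <DEPLOYMENT_ID> --data -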
Inline JSON Payload
For quick manual testing, you can pass the full JSON object directly as a string.
fathom intelligence machine-learning deployment infer <DEPLOYMENT_ID> --data '{
"inputs": [
{
"name": "float_input",
"shape": [2, 4],
"datatype": "FP32",
"data": [5.1, 3.5, 1.4, 0.2]
}
],
"outputs": [
{
"name": "label"
}
]
}'
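To score several samples in one request, increase the leading shape dimension and flatten the rows into a single data array; the V2 protocol expects row-major ordering. For example, a batch of two Iris samples:

fathom intelligence machine-learning deployment infer <DEPLOYMENT_ID> --data '{
  "inputs": [
    {
      "name": "float_input",
      "shape": [2, 4],
      "datatype": "FP32",
      "data": [5.1, 3.5, 1.4, 0.2, 7.0, 3.2, 4.7, 1.4]
    }
  ],
  "outputs": [
    {
      "name": "label"
    }
  ]
}'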
Protocol Compatibility
MLflow ONNX models (like the Iris Classifier we registered earlier) support only the Triton inference protocol, exposed through the infer command.
Generative commands like chat and embed are reserved for LLMs and specialized transformers (e.g., from Hugging Face), which will be covered in the following sections.