Machine Learning
- 1: Custom Models
- 1.1: Tracking Experiments
- 1.2: Model Registry
- 1.3: Model Deployment
- 2: Hugging Face Models
- 3: Managed LLMs
1 - Custom Models
1.1 - Tracking Experiments
Fathom enables seamless experiment tracking by integrating with platform experiment tracking. This allows you to log parameters, metrics, and metadata during your training phase to ensure full reproducibility of your machine learning models.
How It Works
The platform does not require a custom logging library. Instead, it leverages the standard MLflow Python SDK (version 3.1.4 or higher).
When you execute your training scripts via the Fathom CLI, the system automatically injects the necessary environment variables and authentication contexts. This ensures that all data logged via the SDK is correctly routed to your organization’s private experiment registry on the platform.
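If you want to confirm that the injected context is present, a minimal sketch (run through the Fathom CLI, as shown below) that simply prints the tracking URI the wrapper provides:
import os
import mlflow
# The Fathom CLI wrapper sets MLFLOW_TRACKING_URI (and auth tokens) before your
# script starts, so nothing needs to be hardcoded here.
print("Tracking URI:", os.environ.get("MLFLOW_TRACKING_URI", "<not set - run via the Fathom CLI>"))
print("MLflow resolves to:", mlflow.get_tracking_uri())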
Integrated Workflow: Code & Run
The most efficient way to track an experiment is to write a standard Python script and execute it through the Fathom CLI's mlflow run wrapper. This ensures that your session is authenticated and linked to the correct project.
Write your Training Script
Create a file (e.g., train.py) using the standard MLflow library. The platform handles the backend connection automatically.
import mlflow
# Create or select the experiment
mlflow.set_experiment("fraud-detection-v1")
# Add or update tags on the created experiment
mlflow.set_experiment_tags({
    "project_name": "Fraud Prevention",
    "team": "Data Science Core",
    "priority": "High"
})
with mlflow.start_run():
    # Log parameters (hyperparameters)
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 10)
    # Log metrics (performance)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("loss", 0.05)
    # You can tag each run under an experiment independently
    mlflow.set_tag("version", "1.0")
    # Log the model (Logged Model)
    # This makes the model visible in the Fathom Model Registry
    # mlflow.sklearn.log_model(sk_model, "model")
print("Run completed and logged to Fathom.")
Local Setup
To quickly prepare your local environment, we recommend using a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
pip install "mlflow>=3.1.4"
Execute with Fathom Context
To run your script and ensure the MLflow context is correctly injected, use the fathom intelligence mlflow run command. This command wraps your execution and handles all backend communication.
fathom intelligence mlflow run <COMMAND>
To execute your local Python script:
fathom i mlflow run python3 train.py
Accessing Results
Metrics and experiment history are accessible via the intelligence platform portal.
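Run history can also be queried programmatically with the standard MLflow SDK. A minimal sketch (executed through the Fathom CLI wrapper so the tracking context is injected), assuming the fraud-detection-v1 experiment from the script above:
import mlflow
# Fetch the most recent runs of the experiment created in train.py
runs = mlflow.search_runs(experiment_names=["fraud-detection-v1"], max_results=5)
print(runs[["run_id", "status", "metrics.accuracy", "params.learning_rate"]])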
Key Benefits
The CLI automatically manages MLFLOW_TRACKING_URI and authentication tokens. No need to hardcode credentials or endpoints.
Use the tools you already know (Python, Scikit-learn, PyTorch) without custom Fathom-specific logging libraries.
Models logged during training are immediately visible in the platform and ready for deployment.
Requirements
The Fathom integration requires MLflow SDK version 3.1.4 or higher. Check your version using pip show mlflow.
1.2 - Model Registry
The Model Registry is a centralized repository where your trained machine learning models are stored, versioned, and prepared for deployment. Models enter the registry primarily through the MLflow tracking integration.
Registering a Model
To ensure maximum interoperability and performance, we recommend exporting models to the ONNX format. Below is a practical example using a small, public dataset (Iris) to train a model and push it to the Model Registry.
The following script trains a simple classifier and logs it as an ONNX artifact.
import mlflow
import mlflow.onnx
import onnx
import numpy as np
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, TensorSpec
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# 1. Prepare data and train a small model
iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier(n_estimators=10)
model.fit(X, y)
input_schema = Schema([
    TensorSpec(np.dtype(np.float32), [1, 4], name="float_input")
])
output_schema = Schema([
    TensorSpec(np.dtype(np.int64), [-1], name="label")
])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)
# 2. Convert the Scikit-learn model to ONNX format
initial_type = [('float_input', FloatTensorType([None, 4]))]
options = {type(model): {'zipmap': False}}
onnx_model = convert_sklearn(model, initial_types=initial_type, options=options, target_opset=17)
mlflow.set_experiment("test-experiment")
# Add or update tags to the created experiment.
mlflow.set_experiment_tags({
    "project_name": "Fraud Prevention",
    "team": "Data Science Core",
    "priority": "High"
})
# 3. Log to Fathom via MLflow
with mlflow.start_run():
    # Log hyperparameters for context
    mlflow.log_param("n_estimators", 10)
    # You can tag each run under an experiment independently
    mlflow.set_tag("version", "1.0")
    # Register the model in the registry
    mlflow.onnx.log_model(
        onnx_model=onnx_model,
        artifact_path="iris_classifier",
        signature=signature,
        input_example=X[:1].astype(np.float32)  # cast to match the float32 signature
    )
print("Model successfully pushed to the Fathom Registry.")
Dependencies
To use the example above, ensure you have the conversion libraries installed in your environment:
pip install skl2onnx onnxruntime
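Before logging, you can optionally sanity-check the converted model locally with onnxruntime (installed above). A minimal sketch that reuses onnx_model and X from the training script:
import numpy as np
import onnxruntime as ort
# Load the in-memory ONNX model and inspect its declared inputs and outputs
session = ort.InferenceSession(onnx_model.SerializeToString(), providers=["CPUExecutionProvider"])
print("Inputs: ", [(i.name, i.shape, i.type) for i in session.get_inputs()])
print("Outputs:", [(o.name, o.shape, o.type) for o in session.get_outputs()])
# The input must be float32 and match the declared [1, 4] shape
sample = X[:1].astype(np.float32)
labels = session.run(["label"], {"float_input": sample})[0]
print("Predicted label:", labels)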
Why Signatures Matter
A missing or incorrect signature is the most common cause of Deployment Failures. The platform requires exact tensor specifications (dtype and shape) to prepare the serving infrastructure.
Version Compatibility
Unsupported model IR version? If your deployment fails with an “Unsupported IR version” error, it means your local onnx library is newer than the platform’s runtime. Fix: Always specify target_opset=17 (or lower) when converting models to ONNX to ensure compatibility with the production Inference Engine.
Dimension Mismatch (1 vs 2)
If your deployment fails with a “1 dimension vs 2” error, it means the auto-batching logic is conflicting with your flat ONNX model. Fix: Set the first dimension of your input to a fixed number (e.g., 1) in the TensorSpec. This disables implicit batching, allowing the engine to map your 1D model correctly.
Naming Convention
The default output name for Scikit-learn classifiers in ONNX is label. Ensure your output_schema uses this exact name. Using custom names like output_label will result in an Invalid argument error during inference.
Execution
Run the script using the Fathom CLI to ensure the registry context is correctly injected:
fathom intelligence mlflow run python3 train_onnx.py
Listing models
Once the script finishes, you can confirm that the model was received and stored correctly by querying the platform’s model list. This ensures your model is now an immutable asset ready for deployment.
fathom intelligence machine-learning model list
1.3 - Model Deployment
Model Deployment is the final step in the machine learning lifecycle. It takes a versioned artifact from the Model Registry and wraps it into a high-performance, scalable endpoint ready to serve real-time predictions.
Deploying a Registered Model
To deploy a model, you need the id of the logged model (which you obtained in the previous step). The deployment process allocates the necessary computational resources (CPU, RAM, or GPU) and sets up the inference runtime.
Create a Deployment
Use the deployment create command to launch your model. You must specify the model ID and the desired serving size.
fathom intelligence machine-learning deployment create logged-model --model-id 6174cc98-55fb-4818-9370-f75cafade62e --name "iris-classifier" --description "Production endpoint for Iris flower classification" --serving-size small
| Option | Requirement | Description |
|---|---|---|
| --model-id | Required | The UUID of the model from the registry. |
| --name | Required | A unique name for your deployment. |
| --serving-size | Optional | Resource tier: small, large, or extra-large. |
| --serving-gpu | Optional | Attach a GPU for heavy models (nvidia-l4, nvidia-l4-2x). |
Tagging a Deployment
Use the deployment tag command to add tags to your deployment. You must specify the deployment ID. Tags can be removed with deployment untag.
Monitoring Deployment Status
Deployments happen asynchronously. After creating one, you should monitor its state to ensure it transitions to running:
fathom intelligence machine-learning deployment list
Output example of a command run with the --watch option. The first snapshot shows the deployment in the pending state; once provisioning completes it transitions to running / hot:
id                                   | created_at                     | name            | kind          | description                                        | status  | state | tags
-------------------------------------+--------------------------------+-----------------+---------------+----------------------------------------------------+---------+-------+------------
379f103f-45cd-4c00-aec3-0fa4af756cae | 2026-03-25 08:06:17.003811 UTC | iris-classifier | logged-models | Production endpoint for Iris flower classification | pending | N/A   | production
id                                   | created_at                     | name            | kind          | description                                        | status  | state | tags
-------------------------------------+--------------------------------+-----------------+---------------+----------------------------------------------------+---------+-------+------------
379f103f-45cd-4c00-aec3-0fa4af756cae | 2026-03-25 08:06:17.003811 UTC | iris-classifier | logged-models | Production endpoint for Iris flower classification | running | hot   | production
Resource Sizing
For the Iris Classifier (ONNX), a small serving size is more than sufficient. Choose large or attach a GPU only for complex models.
Updating a Deployment
Once a deployment is running, you can update it to point to a new version of your model (e.g., a newly trained logged-model-id) or change its resource allocation (e.g., upgrading from small to large).
The platform performs a rolling update, ensuring that your endpoint remains available while the new model version is being provisioned.
fathom intelligence machine-learning deployment update <DEPLOYMENT_ID> logged-model <OPTIONS>
| Option | Description |
|---|---|
| --model-id | The new Logged Model UUID from the registry. |
| --name | Update the display name of the deployment. |
| --description | Update the deployment’s metadata/description. |
| --serving-size | Scale resources (small, large, extra-large). |
| --serving-gpu | Change or add a GPU accelerator. |
Example: Update Logged Model
To promote a new model version to an existing deployment, use the update logged-model command. You will need the Deployment ID and the new Model ID.
fathom intelligence machine-learning deployment update 3cdec2ec-f51e-420c-937a-6c65af770084 logged-model --model-id 93096f6a-3a8a-4315-bc18-615ef72c7bcc
Model Inference
Once your deployment is in the running and hot state, you can begin making predictions. Fathom Intelligence supports three primary inference modes depending on your model type: General Tensor Inference, Chat Completions, and Embeddings.
General Tensor Inference (V2 Protocol)
This mode is used for classic ML models (Scikit-learn, ONNX, XGBoost) and computer vision. It follows the NVIDIA Triton V2 Predict Protocol.
Pipe via Standard Input (Recommended for Scripts)
You can pipe a JSON payload directly into the CLI. This is ideal for integration with tools like jq or automated data pipelines.
echo '{
"inputs": [
{
"name": "float_input",
"shape": [1, 4],
"datatype": "FP32",
"data": [7.0, 3.2, 4.7, 1.4]
}
],
"outputs": [
{
"name": "label"
}
]
}' | fathom intelligence machine-learning deployment infer <DEPLOYMENT_ID> --data -
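The same pipe pattern works from Python. A minimal sketch (assuming the Fathom CLI is installed and you substitute your own deployment UUID) that builds the V2 payload matching the Iris model signature and feeds it to the CLI over standard input:
import json
import subprocess
DEPLOYMENT_ID = "<DEPLOYMENT_ID>"  # placeholder: replace with your deployment UUID
# V2 (Triton) payload: float32 input with shape [1, 4], requesting the "label" output
payload = {
    "inputs": [
        {"name": "float_input", "shape": [1, 4], "datatype": "FP32",
         "data": [7.0, 3.2, 4.7, 1.4]}
    ],
    "outputs": [{"name": "label"}]
}
result = subprocess.run(
    ["fathom", "intelligence", "machine-learning", "deployment",
     "infer", DEPLOYMENT_ID, "--data", "-"],
    input=json.dumps(payload), capture_output=True, text=True, check=True
)
print(result.stdout)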
Inline JSON Payload
For quick manual testing, you can pass the full JSON object directly as a string.
fathom intelligence machine-learning deployment infer <DEPLOYMENT_ID> --data '{
"inputs": [
{
"name": "float_input",
"shape": [2, 4],
"datatype": "FP32",
"data": [5.1, 3.5, 1.4, 0.2]
}
],
"outputs": [
{
"name": "label"
}
]
}'
Protocol Compatibility
MLflow ONNX Models (like the Iris Classifier we registered earlier) support only the Triton Inference Protocol, accessed via the infer command.
Generative commands like chat and embed are reserved for LLMs and specialized transformers (e.g., from Hugging Face), which will be covered in the following sections.
Developer Resources
Ready to integrate these models into your code? Check out our API Integration Guide for detailed documentation on endpoints, authentication headers, and code examples in Rust and Python.
2 - Hugging Face Models
For Large Language Models (LLMs) and Text Embedding models, Fathom Intelligence provides a direct integration with Hugging Face. This allows you to skip the Model Registry and deploy industry-standard models with a single CLI command.
You can deploy any supported model by providing its Hugging Face repository ID (e.g., mistralai/Mistral-7B-v0.1). The platform automatically handles the weights download, environment setup, and API wrapping.
Chat Model
For conversational AI, Fathom Intelligence supports Instruct models. Unlike base models that simply “complete” text, Instruct models are fine-tuned to follow directions and maintain a dialogue. When deployed, these models expose an OpenAI-compatible API, allowing you to use them as a drop-in replacement for existing AI integrations.
To deploy a chat-optimized model:
fathom intelligence machine-learning deployment create hugging-face --model-id "Qwen/Qwen2.5-0.5B-Instruct" --name "qwen-tiny-chat" --description "Fast 0.5B parameter chat model" -s large-high-mem
Once the deployment status reaches Running / Hot, you can interact with the model using the chat command. This command automatically handles the complex formatting (roles like user and assistant) required by the model’s internal chat template.
fathom intelligence machine-learning deployment chat <DEPLOYMENT_ID> --prompt "Explain the concept of 'Open Source' in one sentence."
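Because the deployment exposes an OpenAI-compatible API, existing SDKs can also talk to it directly. The snippet below is only a sketch: the base URL, token, and model name are hypothetical placeholders, and the real endpoint and authentication headers are documented in the API Integration Guide.
from openai import OpenAI
# Hypothetical values: replace with the endpoint and credentials for your deployment,
# as described in the API Integration Guide.
client = OpenAI(
    base_url="https://<your-fathom-endpoint>/v1",
    api_key="<your-api-token>",
)
response = client.chat.completions.create(
    # The model name expected by the endpoint may differ (e.g., "default");
    # check your deployment details.
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "Explain the concept of 'Open Source' in one sentence."}],
)
print(response.choices[0].message.content)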
Embedding Model
Fathom Intelligence allows you to bypass the manual model registration process for industry-standard architectures. You can deploy models directly from the Hugging Face Hub using their repository ID.
To start, we will deploy a lightweight but high-performance embedding model. This model converts text into 384-dimensional vectors.
fathom intelligence machine-learning deployment create hugging-face --model-id "sentence-transformers/all-MiniLM-L6-v2" --name "tiny-embed" --description "Small embedding model for testing" --serving-size large
The embed command is designed for models that perform Feature Extraction (e.g., BERT, RoBERTa, BGE). It converts raw text into high-dimensional numerical vectors (embeddings), which are essential for semantic search, clustering, and Retrieval-Augmented Generation (RAG).
fathom intelligence machine-learning deployment embed <DEPLOYMENT_ID> --input "The quick brown fox jumps over the lazy dog"
The command returns a JSON object containing the vector (embedding) for your input. For a standard model like all-MiniLM-L6-v2, the output will look like this:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
0.0125, -0.0456, 0.0892, ... 384 dimensions total
]
}
],
"model": "default",
"usage": {
"prompt_tokens": 8,
"total_tokens": 8
}
}
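Since the CLI returns the raw vector as JSON, downstream tasks such as semantic search reduce to simple vector math. A minimal sketch, assuming you have saved two embed responses to local files (e.g., by redirecting the command output), that compares them with cosine similarity:
import json
import numpy as np
# Load responses previously captured from the embed command, e.g.
# fathom intelligence machine-learning deployment embed <DEPLOYMENT_ID> --input "..." > a.json
def load_vector(path: str) -> np.ndarray:
    with open(path) as f:
        return np.asarray(json.load(f)["data"][0]["embedding"], dtype=np.float32)
a = load_vector("a.json")
b = load_vector("b.json")
# Cosine similarity: close to 1.0 means semantically similar, close to 0 means unrelated
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"Cosine similarity: {similarity:.4f}")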
Developer Resources
Ready to integrate these models into your code? Check out our API Integration Guide for detailed documentation on endpoints, authentication headers, and code examples in Rust and Python.
3 - Managed LLMs
The Managed LLMs service provides a unified interface to access Large Language Models provided natively by the platform, as well as external models running as a service (such as OpenAI, Claude, or Gemini).
Access and Permissions
To ensure security and cost management, access to these models is governed by API keys managed at two levels:
- Platform Level: Global models provided by the infrastructure.
- Organization Level: Custom integrations where organization administrators can plug in their own provider keys.
This architecture allows teams to use state-of-the-art models without managing individual credentials, while administrators maintain full control over which models are available to specific organizations.
Listing Available Models
Before interacting with an LLM, you can list all models currently available in your active context. This list includes both native and third-party models (e.g., gpt-4o, claude-3-5-sonnet).
fathom intelligence llms model list
Chat Completions
The chat command is the primary way to interact with Managed LLMs via the CLI. It is an excellent tool for testing connectivity, validating model behavior, or quickly generating content.
fathom intelligence llms model chat <MODEL_NAME> --prompt <PROMPT_TEXT> [OPTIONS]
Key Options
| Option | Default | Description |
|---|---|---|
| <MODEL_NAME> | Required | The model ID to use (e.g., gpt-4o, gemini-1.5-pro). |
| --prompt, -p | Required | The message to send to the model. |
| --system, -s | “You are a helpful…” | Sets the behavior/persona of the assistant. |
| --temperature, -t | 0.7 | Controls creativity (0.0 = deterministic, 1.0 = creative). |
| --max-tokens, -n | N/A | Limits the length of the generated response. |
| --no-stream | N/A | Disables real-time streaming of the response to the terminal. |
Example: Basic Interactive Chat
To send a simple query to gemma-3-12b-it:
fathom intelligence llms model chat google/gemma-3-12b-it --prompt 'Explain quantum entanglement in one sentence.'
Example: Advanced System Behavior
You can override the default assistant behavior to act as a specific persona:
fathom intelligence llms model chat Qwen/Qwen2.5-VL-3B-Instruct --system 'You are a senior Rust developer. Provide code examples only.' --prompt 'How do I implement a trait in Rust?' --temperature 0.2
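For scripted use, the --no-stream flag from the options table makes it easy to capture the full response. A minimal sketch (assuming the Fathom CLI is installed and the listed model is available in your context) that calls the CLI from Python and reads the output:
import subprocess
# Capture a single completion without terminal streaming (--no-stream)
result = subprocess.run(
    ["fathom", "intelligence", "llms", "model", "chat", "google/gemma-3-12b-it",
     "--prompt", "Summarize the Managed LLMs service in one sentence.",
     "--temperature", "0.2", "--no-stream"],
    capture_output=True, text=True, check=True
)
print(result.stdout.strip())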