Hugging Face Models
For Large Language Models (LLMs) and text embedding models, Fathom Intelligence provides a direct integration with Hugging Face. This lets you skip the Model Registry and deploy industry-standard models with a single CLI command.
You can deploy any supported model by providing its Hugging Face repository ID (e.g., mistralai/Mistral-7B-v0.1). The platform automatically handles downloading the weights, setting up the environment, and wrapping the model in an API.
Chat Model
For conversational AI, Fathom Intelligence supports Instruct models. Unlike base models that simply “complete” text, Instruct models are fine-tuned to follow directions and maintain a dialogue. When deployed, these models expose an OpenAI-compatible API, allowing you to use them as a drop-in replacement for existing AI integrations.
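Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to a deployed chat model. The sketch below builds a minimal /chat/completions request body; the base URL placeholder is hypothetical, and the parameter values are illustrative defaults rather than documented Fathom settings.

```python
import json

# Hypothetical placeholder; substitute your deployment's actual endpoint URL.
BASE_URL = "https://<your-fathom-endpoint>/v1"

def build_chat_request(user_message: str,
                       system_prompt: str = "You are a helpful assistant.") -> dict:
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": "default",  # deployments serve a single model under this name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    }

payload = build_chat_request("Explain 'Open Source' in one sentence.")
print(json.dumps(payload, indent=2))
```

Any existing integration that speaks the OpenAI chat format can send this payload to the deployment unchanged.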
To deploy a chat-optimized model:
fathom intelligence machine-learning deployment create hugging-face --model-id "Qwen/Qwen2.5-0.5B-Instruct" --name "qwen-tiny-chat" --description "Fast 0.5B parameter chat model" -s large-high-mem
Once the deployment status reaches Running / Hot, you can interact with the model using the chat command. This command automatically handles the complex formatting (roles like user and assistant) required by the model’s internal chat template.
fathom intelligence machine-learning deployment chat <DEPLOYMENT_ID> --prompt "Explain the concept of 'Open Source' in one sentence."
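Under the hood, the chat command renders your messages through the model's chat template before inference. Qwen-family models use the ChatML format; the function below is an illustrative re-implementation of that rendering, not the platform's actual code, which applies the template bundled with the model automatically.

```python
def apply_chatml_template(messages: list) -> str:
    """Render role-tagged messages in ChatML, the template used by Qwen models.

    Illustrative sketch only: the serving stack normally applies the chat
    template shipped with the model's tokenizer configuration.
    """
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to begin its reply
    return "".join(parts)

prompt = apply_chatml_template([
    {"role": "user",
     "content": "Explain the concept of 'Open Source' in one sentence."},
])
print(prompt)
```

This is why you can pass a plain --prompt string: the CLI wraps it in the role structure the model was fine-tuned to expect.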
Embedding Model
Embedding models follow the same workflow: deploy directly from the Hugging Face Hub using the repository ID, with no manual model registration.
To start, we will deploy a lightweight but high-performance embedding model that converts text into 384-dimensional vectors.
fathom intelligence machine-learning deployment create hugging-face --model-id "sentence-transformers/all-MiniLM-L6-v2" --name "tiny-embed" --description "Small embedding model for testing" --serving-size large
The embed command is designed for models that perform Feature Extraction (e.g., BERT, RoBERTa, BGE). It converts raw text into high-dimensional numerical vectors (embeddings), which are essential for semantic search, clustering, and Retrieval-Augmented Generation (RAG).
fathom intelligence machine-learning deployment embed <DEPLOYMENT_ID> --input "The quick brown fox jumps over the lazy dog"
The command returns a JSON object containing the vector (embedding) for your input. For a standard model like all-MiniLM-L6-v2, the output will look like this:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0125, -0.0456, 0.0892, ... 384 dimensions total
      ]
    }
  ],
  "model": "default",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
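Embeddings become useful when you compare them. Assuming a response with the shape shown above, the sketch below extracts the vector and scores two embeddings with cosine similarity, the standard metric for semantic search. The sample vectors are made-up 3-dimensional stand-ins for real 384-dimensional output.

```python
import json
import math

def extract_embedding(response_json: str) -> list:
    """Pull the first embedding vector out of an OpenAI-style response."""
    return json.loads(response_json)["data"][0]["embedding"]

def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy responses standing in for two calls to the embed command.
resp_a = json.dumps({"data": [{"embedding": [0.1, 0.2, 0.3]}]})
resp_b = json.dumps({"data": [{"embedding": [0.1, 0.2, 0.25]}]})

vec_a = extract_embedding(resp_a)
vec_b = extract_embedding(resp_b)
print(round(cosine_similarity(vec_a, vec_b), 4))
```

In a semantic-search or RAG pipeline, you would embed a query the same way and rank stored document vectors by this score.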