Integrating ALLaM-7B-Instruct-preview with Ollama

Ollama provides a convenient way to run large language models locally. While many models are available directly via `ollama pull`, you can also import custom models, like `ALLaM-AI/ALLaM-7B-Instruct-preview`, by creating a `Modelfile`.
Understanding Model Formats: Safetensors vs. GGUF
Before diving into the import process, it's helpful to understand the different model formats involved:
- Safetensors / PyTorch Format:
  - What it is: These formats (`.safetensors`, `.bin`, `.pth`) are standard for distributing models used in training and within frameworks like Hugging Face Transformers. They typically store high-precision model weights (e.g., 16-bit or 32-bit floating point). The official `ALLaM-AI/ALLaM-7B-Instruct-preview` model uses Safetensors.
  - Use Case: Primarily for model training, fine-tuning, and running inference using Python libraries (`transformers`, `torch`), often requiring powerful hardware (especially GPUs).
- GGUF (GPT-Generated Unified Format):
  - What it is: A binary format specifically designed by the `llama.cpp` project for efficient inference (running the model) on a wider range of hardware, including CPUs and Apple Silicon (Metal).
  - Key Feature - Quantization: GGUF files usually contain quantized weights. Quantization reduces the precision (e.g., to 4-bit or 5-bit integers), significantly shrinking file size and memory usage and making large models accessible on consumer hardware.
  - Self-Contained: Bundles model weights, architecture details, and tokenizer information into a single file.
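As a rough back-of-envelope sense of scale: a 7B-parameter model stored at 16-bit precision needs about 7 × 10⁹ × 2 bytes ≈ 14 GB for the weights alone, while a ~4.5-bit-per-weight quantization such as `q4_K_M` brings that down to roughly 4 GB — small enough to fit comfortably in the RAM of a typical laptop.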
Why Ollama Prefers GGUF
Ollama leverages the `llama.cpp` library internally. GGUF is the native format for `llama.cpp`, offering several advantages for Ollama's goal of easy local LLM execution:
- Efficiency: Quantized GGUF models run faster and use less RAM/VRAM.
- Accessibility: Enables running large models on standard laptops and desktops.
- Simplicity: Users interact with a single file format managed seamlessly by Ollama.
The Challenge with ALLaM and Ollama
Ollama works best with models in the GGUF format. The official `ALLaM-AI/ALLaM-7B-Instruct-preview` repository on Hugging Face primarily provides weights in Safetensors/PyTorch format.
Key Point: Directly importing Safetensors into Ollama using a `Modelfile` is often not straightforward and might require manual conversion to GGUF first. Tools like `llama.cpp` can perform this conversion, but it's an advanced process.
This guide outlines the steps using a `Modelfile`, assuming you either:
a) Find a pre-converted GGUF version of the model (community contributions sometimes provide these), or
b) Have successfully converted the original model weights to GGUF format yourself.
Importing ALLaM into Ollama (Requires GGUF)
1. Obtain the GGUF Model File
- Ollama Installed: Ensure you have Ollama running on your system. Visit ollama.com for installation instructions.
- GGUF File: As Ollama works best with GGUF, you need the ALLaM model in this format.
  - Option A (Recommended): Search the Hugging Face community (e.g., search for "ALLaM GGUF") or other model repositories for a pre-converted GGUF version of `ALLaM-7B-Instruct-preview`. Download a suitable quantization level (e.g., `q4_K_M`).
  - Option B (Advanced): Download the original Safetensors weights from the official repository and convert them to GGUF yourself using `llama.cpp`'s conversion scripts (e.g., `convert.py` and `quantize`). This is a technical process requiring familiarity with Python and compiling `llama.cpp`; a sketch of the workflow follows this list.
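If you take Option B, the workflow looks roughly like the sketch below. Treat it as a hedged outline, not an official recipe: the script and binary names have changed across `llama.cpp` versions (older checkouts ship `convert.py` and `quantize`; newer ones ship `convert_hf_to_gguf.py` and `llama-quantize`), and the converter only works if it recognizes the model's architecture, which is worth verifying for ALLaM before investing time. All local paths are placeholders.

```bash
# Sketch: converting ALLaM's Safetensors weights to a quantized GGUF with llama.cpp.
# Names and paths are illustrative; adjust them to your llama.cpp version and layout.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt          # Python deps for the conversion script
cmake -B build && cmake --build build    # builds the quantize tool, among others

# 1) Convert the Hugging Face checkout (Safetensors) to a 16-bit GGUF file.
python convert_hf_to_gguf.py /path/to/ALLaM-7B-Instruct-preview \
    --outfile allam-7b-instruct-preview-f16.gguf --outtype f16

# 2) Quantize the 16-bit file down to q4_K_M for everyday local use.
./build/bin/llama-quantize allam-7b-instruct-preview-f16.gguf \
    allam-7b-instruct-preview-q4_K_M.gguf q4_K_M
```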
2. Create the Modelfile
Create a file named `Modelfile` (no extension) in a directory of your choice. This file tells Ollama how to configure and run the model.
```
# Modelfile for ALLaM-7B-Instruct-preview (GGUF)

# IMPORTANT: Replace './path/to/your/allam-7b-instruct-preview.gguf'
# with the actual path to your downloaded or converted GGUF file.
FROM ./path/to/your/allam-7b-instruct-preview.gguf

# Define the chat template based on the model's expected format.
# This mirrors the structure used by Hugging Face's apply_chat_template
# for models expecting user/assistant turns. Check model card if specific tokens are needed.
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""

# Set model parameters (refer to model card for specifics if available)
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
PARAMETER temperature 0.6
PARAMETER top_k 50
PARAMETER top_p 0.95
# Context size from the model card
PARAMETER num_ctx 4096

# Optional: Set a default system prompt (can be overridden during chat)
SYSTEM """You are ALLaM, a helpful bilingual English and Arabic AI assistant developed by SDAIA."""

# Optional: License information (refer to the model's actual license)
LICENSE """
Apache License 2.0
(Verify the exact license from the Hugging Face repository)
"""
```
Explanation:
- `FROM`: Specifies the path to your GGUF model file. Crucially, update this path.
- `TEMPLATE`: Defines how prompts and responses are formatted for the model. This example uses a common chat format with `<|im_start|>` and `<|im_end|>` tokens, often used by instruct models. You might need to adjust this based on the specific GGUF conversion or model requirements.
- `PARAMETER`: Sets default generation parameters like stop tokens, temperature, and context window size (`num_ctx`).
- `SYSTEM`: Sets a default personality or instruction for the model.
- `LICENSE`: Includes the model's license information.
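For intuition, here is roughly what the `TEMPLATE` above renders for a single user turn, given the default `SYSTEM` prompt (the user message "Hello" is just an example):

```
<|im_start|>system
You are ALLaM, a helpful bilingual English and Arabic AI assistant developed by SDAIA.<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
```

The model then generates text from the `assistant` marker onward, and the `PARAMETER stop` lines tell Ollama to cut generation when the model emits `<|im_end|>`, so the reply doesn't run into the next turn.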
3. Create the Ollama Model
Open your terminal, navigate to the directory containing your `Modelfile` and the GGUF file, and run the `ollama create` command:

```bash
ollama create allam-instruct-preview -f Modelfile
```

- `allam-instruct-preview`: This is the name you'll use to refer to the model in Ollama. You can choose a different name.
- `-f Modelfile`: Specifies the Modelfile to use.
Ollama will process the `Modelfile` and import the GGUF model into its library. This might take some time depending on the model size.
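You can confirm the import succeeded before chatting. Both commands below are standard Ollama CLI; the model name assumes you kept `allam-instruct-preview`:

```bash
# List locally available models; the new entry should appear here.
ollama list

# Inspect the imported model, including the Modelfile Ollama stored for it.
ollama show allam-instruct-preview --modelfile
```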
4. Run the Model
Once the creation process is complete, you can run the model using:

```bash
ollama run allam-instruct-preview
```

You can now chat with the model directly in your terminal. Try prompts in English or Arabic (the Arabic example below asks "How do I prepare a cup of tea?"):

```
>>> كيف أجهز كوب شاهي؟
>>> Explain the concept of Large Language Models.
>>> /bye (to exit)
```
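The terminal isn't the only interface. Ollama also exposes a local REST API (on port 11434 by default), which is handy for scripting; a minimal sketch, assuming the model name used above:

```bash
# Send a single chat turn to the local Ollama server and print the JSON reply.
# "stream": false returns one JSON object instead of a token-by-token stream.
curl http://localhost:11434/api/chat -d '{
  "model": "allam-instruct-preview",
  "messages": [
    { "role": "user", "content": "Explain the concept of Large Language Models." }
  ],
  "stream": false
}'
```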
Publishing Your Custom Model to Ollama Registry
Once you have successfully created a local Ollama model from a GGUF file (e.g., `allam-instruct-preview`), you can share it on the ollama.com registry for others to use.
1. Create an Ollama Account
If you haven't already, sign up for an account at ollama.com/signup. Your username will be part of the model's public name (e.g., `<your_username>/allam-instruct-preview`).
2. Add Your Public Key
- Find your local Ollama public key. The location is typically:
  - macOS: `~/.ollama/id_ed25519.pub`
  - Linux: `/usr/share/ollama/.ollama/id_ed25519.pub` or `~/.ollama/id_ed25519.pub`
  - Windows: `C:\Users\<username>\.ollama\id_ed25519.pub`
- Go to your account settings on ollama.com/settings/keys.
- Click "Add Ollama Public Key" and paste the entire content of your `.pub` file.
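On macOS or Linux, printing the key for copy-paste is a one-liner (adjust the path per the list above):

```bash
# Print the public key so you can paste it into ollama.com/settings/keys.
cat ~/.ollama/id_ed25519.pub
```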
3. Tag Your Local Model
Before pushing, you need to tag your local model with your Ollama username. Use the `ollama cp` (copy) command:

```bash
# ollama cp <local_model_name> <your_username>/<new_model_name>
ollama cp allam-instruct-preview your_ollama_username/allam-instruct-preview
```

Replace `your_ollama_username` with your actual Ollama username.
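A quick sanity check before pushing:

```bash
# Both allam-instruct-preview and the namespaced copy should now be listed.
ollama list
```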
4. Push the Model
Now, push the tagged model to the registry:

```bash
ollama push your_ollama_username/allam-instruct-preview
```

This will upload your model. Once complete, others can pull and run it using:

```bash
ollama pull your_ollama_username/allam-instruct-preview
ollama run your_ollama_username/allam-instruct-preview
```
Important: You can only publish models that you have successfully created locally, which for non-standard models like ALLaM typically requires starting with a GGUF file. Ensure you have the rights to redistribute the model weights according to the original model's license (ALLaM uses Apache 2.0, which generally permits redistribution).
Conclusion
Integrating custom models like `ALLaM-7B-Instruct-preview` into Ollama requires a `Modelfile` and, most importantly, the model weights in the GGUF format. While the official repository doesn't provide a GGUF version, finding a community conversion or converting it yourself allows you to leverage this powerful Arabic/English model within your local Ollama environment. Remember to adjust the `FROM` path and potentially the `TEMPLATE` in the `Modelfile` based on your specific GGUF file.