Getting Started with ALLaM-7B-Instruct-preview

ALLaM-7B-Instruct-preview is a large language model with 7 billion parameters, developed by the National Center for Artificial Intelligence (NCAI) at the Saudi Data and Artificial Intelligence Authority (SDAIA). It was trained specifically for Arabic and English, which makes it a useful tool for bilingual applications. This tutorial walks you through setting up and using the model directly in Python, and explains how to interact with it from JavaScript through a hosted API endpoint.

Introduction to ALLaM

The model belongs to the ALLaM series, which is designed to advance Arabic Language Technologies (ALT). This particular release (ALLaM-7B-Instruct-preview) is instruction-tuned, meaning it has been optimized to follow the user instructions given in prompts. It is built on an autoregressive transformer architecture and supports a context length of 4096 tokens.
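Because the 4096-token window holds both the prompt and the generated continuation (the usual arrangement for an autoregressive model), the room left for new tokens shrinks as the prompt grows. A minimal sketch of that arithmetic; `max_new_tokens_budget` is a hypothetical helper, not part of transformers:

```python
CONTEXT_LENGTH = 4096  # ALLaM-7B context window in tokens (from the model description)


def max_new_tokens_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after a prompt of the given length.

    Hypothetical helper: assumes the prompt and the output share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Whatever the prompt does not use is left for generation, never below zero.
    return max(context_length - prompt_tokens, 0)


if __name__ == "__main__":
    for n in (20, 1000, 4096, 5000):
        print(f"prompt of {n} tokens -> up to {max_new_tokens_budget(n)} new tokens")
```

With a tokenized prompt, the budget would be `max_new_tokens_budget(inputs["input_ids"].shape[1])`, a value that could be passed as `max_new_tokens` to `generate()`.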

Using Python and the transformers library

The primary way to interact with ALLaM is through the Hugging Face transformers library in Python.

Setup

Install the libraries: you will need transformers and torch. A CUDA-capable GPU is strongly recommended for reasonable performance.

```
pip install transformers torch
# Or, for CUDA support (make sure the PyTorch build matches your CUDA version), install torch
# from the PyTorch index first, then transformers from PyPI:
# pip install torch --index-url https://download.pytorch.org/whl/cu118  # example for CUDA 11.8
# pip install transformers
```
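Before downloading several gigabytes of weights, it can help to confirm the setup. The sketch below reports the installed transformers and torch versions, flags a transformers release older than the 4.40.1 listed in the example's requirements, and checks CUDA availability. The version comparison is a deliberately rough stand-in that ignores pre-release tags; `packaging.version.Version` is the more robust choice when that package is available. `as_tuple`, `meets_minimum`, and `report` are hypothetical names.

```python
"""Rough pre-flight check of the packages the ALLaM example needs (a sketch)."""
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

MIN_TRANSFORMERS = "4.40.1"  # from the example script's requirements


def as_tuple(ver: str) -> tuple:
    """Leading numeric release parts, e.g. '2.3.0+cu118' -> (2, 3, 0)."""
    parts = []
    for piece in ver.split("."):
        # Keep the leading digits of each piece; text after them ('0+cu118') is ignored.
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # a piece with no leading digits (e.g. 'dev0') ends the release part
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum` (pre-release tags are ignored)."""
    return as_tuple(installed) >= as_tuple(minimum)


def report() -> None:
    for pkg in ("transformers", "torch"):
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            print(f"{pkg}: not installed")
            continue
        note = ""
        if pkg == "transformers" and not meets_minimum(installed, MIN_TRANSFORMERS):
            note = f" (older than {MIN_TRANSFORMERS})"
        print(f"{pkg}: {installed}{note}")
    try:
        import torch  # imported here so the helpers above work without torch
        print("CUDA available:", torch.cuda.is_available())
    except (ImportError, OSError):
        print("CUDA check skipped: torch could not be imported")


if __name__ == "__main__":
    report()
```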

Hugging Face account (optional): depending on the model's access permissions, you may need to log in to your Hugging Face account. You can log in from the command-line interface (CLI):
```
huggingface-cli login
```
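In non-interactive environments such as a CI job or a Space, where nobody can type at the `huggingface-cli login` prompt, the token can come from an environment variable instead. A sketch assuming the token is stored in `HF_TOKEN`; recent `huggingface_hub` versions also read that variable on their own, so the explicit `login(token=...)` call mainly makes the step visible. `login_from_env` is a hypothetical helper.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub with a token stored in an environment variable.

    Returns False (and does nothing) when the variable is unset or blank.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        return False
    # Imported lazily so this helper can be imported without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
    return True


if __name__ == "__main__":
    if login_from_env():
        print("Logged in to Hugging Face using HF_TOKEN.")
    else:
        print("HF_TOKEN is not set; gated models may fail to download.")
```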

Code example

The following script loads the model and tokenizer, prepares an input prompt (in Arabic), generates a response, and prints it.

```python
# -*- coding: utf-8 -*-
"""
Example usage for the ALLaM-AI/ALLaM-7B-Instruct-preview model from Hugging Face.

This script demonstrates how to load the model and tokenizer using the
transformers library and generate text based on a sample prompt.

Requirements:
- transformers>=4.40.1
- torch
- A CUDA-enabled GPU is highly recommended for performance.
"""

import sys

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# --- Configuration ---
MODEL_NAME = "ALLaM-AI/ALLaM-7B-Instruct-preview"
CONTEXT_LENGTH = 4096  # Context window shared by the prompt and the generated text
# Set device to CUDA if available, otherwise CPU (will be very slow on CPU)
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {DEVICE}")

# --- Load Model and Tokenizer ---
try:
    print(f"Loading model: {MODEL_NAME}...")
    # Consider adding torch_dtype=torch.bfloat16 if memory is an issue and the GPU supports it
    allam_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    print(f"Loading tokenizer: {MODEL_NAME}...")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    print("Model and tokenizer loaded successfully.")

    # Move model to the selected device
    allam_model = allam_model.to(DEVICE)

except Exception as e:
    print(f"Error loading model or tokenizer: {e}")
    print("Please ensure you have the necessary libraries installed and are logged in to Hugging Face if required.")
    sys.exit(1)

# --- Prepare Input ---
# Example prompt (Arabic)
messages = [
    {"role": "user", "content": "كيف أجهز كوب شاهي؟"},  # "How do I prepare a cup of tea?"
]

# Apply the chat template (handles formatting for the model)
# Note: The model card mentions the system prompt is integrated here.
# You could optionally add a system message like:
# messages = [
#     {"role": "system", "content": "You are ALLaM, a bilingual English and Arabic AI assistant."},
#     {"role": "user", "content": "كيف أجهز كوب شاهي؟"},
# ]
try:
    print("Applying chat template...")
    # tokenize=False first to get the formatted string, then tokenize
    formatted_input_string = tokenizer.apply_chat_template(messages, tokenize=False)
    print(f"Formatted input string:\n{formatted_input_string}")

    print("Tokenizing input...")
    inputs = tokenizer(formatted_input_string, return_tensors="pt", return_token_type_ids=False)

    # Move inputs to the selected device
    inputs = {k: v.to(DEVICE) for k, v in inputs.items()}
    # Number of prompt tokens: used to budget generation and to strip the prompt from the output
    prompt_len = inputs["input_ids"].shape[1]
    print(f"Input prepared for the model ({prompt_len} tokens).")

except Exception as e:
    print(f"Error preparing input: {e}")
    sys.exit(1)

# --- Generate Response ---
print("Generating response...")
try:
    # Generation parameters (adjust as needed)
    generation_params = {
        # Max tokens for the *newly generated* text; prompt + output must fit in the context window
        "max_new_tokens": max(CONTEXT_LENGTH - prompt_len, 1),
        "do_sample": True,    # Use sampling for more varied output
        "top_k": 50,          # Consider only the top K most likely tokens
        "top_p": 0.95,        # Nucleus sampling: smallest token set with cumulative probability >= 0.95
        "temperature": 0.6,   # Controls randomness (lower = more deterministic)
    }
    print(f"Generation parameters: {generation_params}")

    with torch.no_grad():  # Disable gradient calculations for inference
        response_ids = allam_model.generate(**inputs, **generation_params)

    print("Decoding response...")
    # generate() returns the prompt tokens followed by the new tokens, so slice off the
    # first prompt_len tokens to keep only the reply. This is more reliable than matching
    # the decoded text against formatted_input_string, which can differ once special
    # tokens are skipped.
    new_token_ids = response_ids[:, prompt_len:]
    # skip_special_tokens=True removes tokens like <s> and </s>; [0] is the first (only) sequence
    final_output = tokenizer.batch_decode(new_token_ids, skip_special_tokens=True)[0].strip()

    print("\n--- Generated Response ---")
    print(final_output)
    print("--------------------------\n")

except Exception as e:
    print(f"Error during generation or decoding: {e}")
    sys.exit(1)

print("Script finished successfully.")
```

Running the Python example

Save the code above as allam_example.py and run it from the terminal:

```
python allam_example.py
```

The script loads the model (on the first run this can take a while, because the model files must be downloaded), processes the input, generates text, and prints the result.

Using JavaScript (via a hosted API)

Running a 7-billion-parameter model such as ALLaM directly inside a standard JavaScript environment (such as a web browser or Node.js) with libraries like @xenova/transformers is generally impractical, because of the model's size and its high resource requirements (RAM/VRAM and CPU/GPU compute).
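To make "large" concrete: the weights alone take roughly the parameter count times the bytes per parameter. A back-of-the-envelope sketch, assuming a nominal 7 × 10^9 parameters and ignoring activations, the KV cache, and runtime overhead (all of which add more); `weight_memory_gib` is a hypothetical helper:

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for the model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7e9  # nominal "7B"; the exact parameter count differs somewhat

if __name__ == "__main__":
    # float32 stores 4 bytes per parameter, bfloat16 stores 2
    for dtype_name, nbytes in (("float32", 4), ("bfloat16", 2)):
        print(f"{dtype_name}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

That comes to about 26 GiB in float32 and 13 GiB in bfloat16. It is also why the bfloat16 option mentioned in the Python script matters on a 24 GB card such as the A10G: the bfloat16 weights fit, the float32 weights do not.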

The practical way to interact with such a model from JavaScript is to call an API endpoint, with the model hosted on suitable backend infrastructure. Platforms such as Hugging Face Spaces or dedicated Inference Endpoints let you deploy the model and expose it through an API.

Example JavaScript client code (calling a hosted API)

This code shows how to use fetch to send a prompt to a hypothetical API endpoint hosted on Hugging Face Spaces. You will need to deploy the model first and replace the placeholder URL.

```javascript
/**
 * Example JavaScript usage for interacting with the ALLaM model hosted on Hugging Face Spaces.
 *
 * IMPORTANT: This script assumes you have deployed the ALLaM model within a
 * backend application (e.g., using Python FastAPI/Gradio and the `transformers` library)
 * on Hugging Face Spaces. You need to replace the placeholder URL below
 * with the actual public URL of your deployed Space's API endpoint.
 */

// !!! REPLACE THIS WITH YOUR ACTUAL HUGGING FACE SPACE API ENDPOINT URL !!!
// It might look something like: https://your-username-your-space-name.hf.space/generate
const HF_SPACE_API_URL = "https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/generate"; // Placeholder URL

// Example prompt ("How do I prepare a cup of tea?")
const userPrompt = "كيف أجهز كوب شاهي؟";

async function generateTextViaHFSpace(promptText) {
  console.log(`Sending prompt to HF Space: "${promptText}" at ${HF_SPACE_API_URL}`);

  if (HF_SPACE_API_URL.includes("YOUR_USERNAME-YOUR_SPACE_NAME")) {
    console.error("Error: Please replace the placeholder HF_SPACE_API_URL with your actual Space endpoint URL.");
    return null;
  }

  try {
    const response = await fetch(HF_SPACE_API_URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // "Authorization": "Bearer YOUR_HF_TOKEN_IF_NEEDED", // Add if your Space is private
      },
      body: JSON.stringify({ prompt: promptText }), // Adjust payload as needed by your API
    });

    if (!response.ok) {
      // A response body can only be read once: read it as text, then try to parse it as JSON.
      const textBody = await response.text();
      let bodyDetails = textBody;
      try {
        bodyDetails = JSON.stringify(JSON.parse(textBody));
      } catch {
        // Not JSON; keep the raw text.
      }
      throw new Error(`HTTP error! status: ${response.status}, Body: ${bodyDetails}`);
    }

    const data = await response.json();
    console.log("Received response from HF Space.");
    // Adjust the key based on what your Space API returns
    return data.generated_text || data.response || data;

  } catch (error) {
    console.error("Error calling Hugging Face Space API:", error);
    throw error;
  }
}

// Execute
(async () => {
  try {
    const generatedText = await generateTextViaHFSpace(userPrompt);
    if (generatedText !== null) {
      console.log("\n--- Generated Response (from HF Space) ---");
      console.log(generatedText);
      console.log("------------------------------------------\n");
    }
  } catch (error) {
    console.error("Failed to get generation from Hugging Face Space.");
  }
})();

// To run this (Node.js 18 or later, which provides fetch globally):
// 1. Deploy ALLaM in a Hugging Face Space with an API endpoint.
// 2. Update the HF_SPACE_API_URL constant in this script.
// 3. Save it as allam_hf_space_example.js and run `node allam_hf_space_example.js`.
```
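The same endpoint can also be called from Python, which is handy for checking a deployed Space before wiring up the JavaScript client. The sketch below uses only the standard library (`urllib.request`). Like the JavaScript version, it assumes the backend accepts `{"prompt": ...}` as JSON and returns a `generated_text` or `response` field; `SPACE_URL` is a placeholder, and `build_request` and `generate` are hypothetical names.

```python
import json
import os
import urllib.request
from typing import Optional

# Placeholder in the same shape as the JavaScript example; replace with your Space's URL.
SPACE_URL = "https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/generate"


def build_request(url: str, prompt: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request carrying {"prompt": ...} as UTF-8 JSON (payload shape is assumed)."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # only needed for private Spaces
    body = json.dumps({"prompt": prompt}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def generate(url: str, prompt: str, token: Optional[str] = None, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (raises urllib.error.HTTPError on failure)."""
    req = build_request(url, prompt, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    # Mirror the JavaScript client's fallback chain of response keys.
    if isinstance(data, dict):
        return data.get("generated_text") or data.get("response") or json.dumps(data, ensure_ascii=False)
    return str(data)


if __name__ == "__main__":
    if "YOUR_USERNAME" in SPACE_URL:
        print("Replace SPACE_URL with your Space's endpoint first.")
    else:
        print(generate(SPACE_URL, "كيف أجهز كوب شاهي؟", os.environ.get("HF_TOKEN")))
```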

Prerequisite: this JavaScript approach requires the ALLaM model to be hosted behind an API first. The next section outlines the conceptual steps using Hugging Face Spaces.

Hosting ALLaM for API access (Hugging Face Spaces)

Hugging Face Spaces is a platform for hosting machine-learning applications, including serving models through APIs. Below is a conceptual overview of deploying ALLaM in a Space:

Create a new Space: go to Hugging Face and create a new Space, choosing a suitable SDK (for example Docker, which can run a FastAPI app, or Gradio). You will most likely need to select paid hardware (for example, a GPU instance such as an A10G) to run a 7B model effectively.

Define the dependencies: create a requirements.txt file that lists the required Python libraries:

```
transformers>=4.40.1
torch
fastapi
uvicorn
accelerate # Often needed for efficient model loading
# Add other libraries as needed
```

Create the backend application (for example, app.py with FastAPI):
- Import the required libraries (FastAPI, transformers, torch, etc.).
- Load the ALLaM model and tokenizer at startup (with AutoModelForCausalLM.from_pretrained, as in the Python example). Make sure it is loaded onto the correct device (the GPU, if the Space has one).
- Define a Pydantic model for the request body (for example, one with a prompt field).
- Create a FastAPI POST endpoint (such as /generate).
- Inside the endpoint function:
  - Read the prompt from the request body.
  - Prepare the input with tokenizer.apply_chat_template.
  - Generate the response with model.generate().
  - Decode the response with tokenizer.batch_decode.
  - Return the generated text in a JSON response.
Configure the Space: make sure your Space configuration runs the correct Python file (app.py) and installs the dependencies from requirements.txt.
Deploy: commit and push your files (app.py, requirements.txt, etc.) to the Space repository. Hugging Face then builds and deploys the application.
Get the API URL: once deployed, your Space has a public URL. The API endpoint will be at https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space/YOUR_ENDPOINT (for example, /generate). Use this URL in your JavaScript client code.
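The backend steps above can be sketched as a single `app.py`. This is a minimal illustration under several assumptions, not a production server: the `/generate` route and `{"prompt": ...}` body match the JavaScript client, `max_new_tokens` defaults to a hypothetical 512, the sampling settings are copied from the Python example, and the model is loaded inside a `create_app()` factory (a hypothetical name) so it happens once at startup. One deliberate swap: it decodes only the newly generated tokens with `tokenizer.decode` instead of running `tokenizer.batch_decode` on the full output, so the prompt is not echoed back.

```python
"""app.py: minimal FastAPI backend for ALLaM (a sketch, not production code)."""
from typing import Dict, List

MODEL_NAME = "ALLaM-AI/ALLaM-7B-Instruct-preview"


def to_messages(prompt: str) -> List[Dict[str, str]]:
    """Wrap a raw prompt as a single user turn for the chat template."""
    return [{"role": "user", "content": prompt}]


def create_app():
    # Heavy imports live here so the helper above can be imported without them.
    import torch
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if device == "cuda" else torch.float32
    # Load once at startup, on the GPU when the Space has one.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=dtype).to(device)

    class GenerateRequest(BaseModel):
        prompt: str
        max_new_tokens: int = 512  # hypothetical default

    app = FastAPI()

    @app.post("/generate")
    def generate(req: GenerateRequest) -> Dict[str, str]:
        text = tokenizer.apply_chat_template(to_messages(req.prompt), tokenize=False)
        inputs = tokenizer(text, return_tensors="pt", return_token_type_ids=False).to(device)
        with torch.no_grad():
            output_ids = model.generate(
                **inputs,
                max_new_tokens=req.max_new_tokens,
                do_sample=True,
                top_k=50,
                top_p=0.95,
                temperature=0.6,
            )
        # Keep only the tokens generated after the prompt.
        new_ids = output_ids[0, inputs["input_ids"].shape[1]:]
        return {"generated_text": tokenizer.decode(new_ids, skip_special_tokens=True)}

    return app
```

Assuming a Docker Space (whose default exposed port is 7860), the container could start it with `uvicorn app:create_app --factory --host 0.0.0.0 --port 7860`; uvicorn's `--factory` flag tells it to call `create_app()` to obtain the application.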

Note: this is a high-level overview. Building and deploying robust applications on HF Spaces involves details such as error handling, resource management, and possibly authentication. See the Hugging Face Spaces documentation for detailed guides.

System Prompts

ALLaM is optimized to work without a default system prompt. You can still supply one if needed by adding a message with role: "system" to the messages list, before the user prompt, in the Python code.

Examples:

English: {"role": "system", "content": "You are ALLaM, a bilingual English and Arabic AI assistant."}
Arabic: {"role": "system", "content": "أنت علام، مساعد ذكاء اصطناعي مطور من الهيئة السعودية للبيانات والذكاء الاصطناعي..."} (roughly: "You are ALLaM, an AI assistant developed by the Saudi Data and Artificial Intelligence Authority...")
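A small helper makes the ordering rule concrete: the optional system turn goes first, followed by the user turn. `build_messages` is a hypothetical name; the English system prompt is the one quoted above (the Arabic example is elided in the text, so it is not reproduced here).

```python
from typing import Dict, List, Optional

# English system prompt quoted from the examples above.
SYSTEM_EN = "You are ALLaM, a bilingual English and Arabic AI assistant."


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Return a chat message list with the optional system turn placed before the user turn."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


if __name__ == "__main__":
    for m in build_messages("كيف أجهز كوب شاهي؟", SYSTEM_EN):
        print(m["role"], "->", m["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False)` exactly as `messages` is in the Python script.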

Ethical Considerations

Keep in mind that large language models such as ALLaM can sometimes produce incorrect or biased output. It is essential to put safety measures in place and to evaluate whether the model is suitable for your specific application. Generated output does not represent official statements from NCAI or SDAIA.

Reference: the ALLaM-AI/ALLaM-7B-Instruct-preview model card on Hugging Face.