Artificial Intelligence · 9 min read

Intel Core Ultra 5 vs Apple M1: Which Wins for Large Language Model Inference?

This article compares the inference performance of a high‑end Intel Core Ultra 5 AI workstation with an Apple M1 MacBook Air using the IPEX‑LLM library, detailing installation steps, minimal code changes, resource usage, and benchmark results for small and large language models.


A high‑end Intel Core Ultra 5 AI workstation with 96 GB of RAM and 48 GB of VRAM is compared against a 16 GB Apple MacBook Air M1 at a similar price point, to evaluate their large‑language‑model (LLM) inference performance using a translation model.

Intel provides various neural‑network libraries for CPU, discrete GPU and integrated GPU (iGPU). To minimise code changes, the author first tried the IPEX‑LLM library.

Install Python dependencies

<code>pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/</code>

Set environment variables

<code>set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1</code>
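The two `set` lines above use Windows shell syntax. As a sketch (assuming the same variable names), the flags can also be set from Python, before `ipex_llm` is imported:

```python
import os

# Set the IPEX-LLM flags from Python instead of the shell.
# This must happen before importing ipex_llm / torch extensions.
os.environ["SYCL_CACHE_PERSISTENT"] = "1"
os.environ["BIGDL_LLM_XMX_DISABLED"] = "1"

print(os.environ["SYCL_CACHE_PERSISTENT"])
```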

Code changes

Only two modifications are required:

<code>import torch
# CHANGE 1: AutoModelForSeq2SeqLM from ipex_llm.transformers
from transformers import AutoTokenizer, pipeline, GenerationConfig  # , AutoModelForSeq2SeqLM
from ipex_llm.transformers import AutoModelForSeq2SeqLM
import re
from datafile import print_err

class Translator:
    models_dict = {
        'nllb-1.3B': 'facebook/nllb-200-1.3B',
        'nllb-3.3B': 'facebook/nllb-200-3.3B',
        'nllb-distilled-600M': 'facebook/nllb-200-distilled-600M',
        'nllb-distilled-1.3B': 'facebook/nllb-200-distilled-1.3B',
    }
    models_url = {
        'nllb-1.3B': 'https://huggingface.co/facebook/nllb-200-1.3B',
        'nllb-3.3B': 'https://huggingface.co/facebook/nllb-200-3.3B',
        'nllb-distilled-600M': 'https://huggingface.co/facebook/nllb-200-distilled-600M',
        'nllb-distilled-1.3B': 'https://huggingface.co/facebook/nllb-200-distilled-1.3B',
    }
    models_symbol = {
        'nllb-1.3B': '🦬',
        'nllb-3.3B': '🐘',
        'nllb-distilled-600M': '🐏',
        'nllb-distilled-1.3B': '🐂',
    }
    models_size = {
        'nllb-1.3B': 1.3,
        'nllb-3.3B': 3.3,
        'nllb-distilled-600M': 0.6 / 2,
        'nllb-distilled-1.3B': 1.3 / 2,
    }

    def __init__(self, model_name, source_lang, target_lang):
        self.model_name = model_name
        self.source_lang = source_lang
        self.target_lang = target_lang
        # CHANGE 2: device is now "xpu" for the Intel GPU (previously "cpu");
        # "mps" is kept so the same script still runs on the Apple M1
        self.device = torch.device("mps" if torch.backends.mps.is_available() else "xpu")
        self.model, self.tokenizer = self.load_model()
        repetition_penalty = 1.5
        print_err(f"repetition_penalty={repetition_penalty}")
        self.translator = pipeline(
            'translation',
            model=self.model,
            tokenizer=self.tokenizer,
            src_lang=self.source_lang,
            tgt_lang=self.target_lang,
            device=self.device,
            repetition_penalty=repetition_penalty
        )

    def load_model(self):
        model = AutoModelForSeq2SeqLM.from_pretrained(Translator.models_dict[self.model_name])
        tokenizer = AutoTokenizer.from_pretrained(Translator.models_dict[self.model_name])
        generation_config = GenerationConfig.from_pretrained(Translator.models_dict[self.model_name])
        print_err(f"{generation_config}")
        return model, tokenizer

    def translate(self, text, batch_size=1):
        translation = self.translator(text, batch_size=batch_size)
        if torch.backends.mps.is_available():
            torch.mps.empty_cache()
        return translation, 'translation_text'

    @staticmethod
    def split_text(text, delimiters=('.', ',', '-')):
        # Split text by specified delimiters
        regex_pattern = '|'.join(map(re.escape, delimiters))
        segments = re.split(regex_pattern, text)
        return [segment.strip() for segment in segments if segment.strip()]

    def translate_segments(self, segments, delimiter=''):
        # Translate segments and merge
        translated_segments = []
        for segment in segments:
            if segment:
                translations, key = self.translate(segment)
                translated_segments.append(translations[key])
        return delimiter.join(translated_segments)</code>
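The `split_text` helper above is plain Python and can be exercised without loading any model; for example:

```python
import re

def split_text(text, delimiters=('.', ',', '-')):
    # Same splitting logic as Translator.split_text above:
    # split on any delimiter, then drop empty segments
    regex_pattern = '|'.join(map(re.escape, delimiters))
    segments = re.split(regex_pattern, text)
    return [segment.strip() for segment in segments if segment.strip()]

print(split_text("Hello, world. How are you?"))
# → ['Hello', 'world', 'How are you?']
```

Splitting long inputs into short segments like this keeps each translation call within the model's comfortable sequence length.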

Resource usage

When running the 3.3 B model, the system primarily utilizes the GPU rather than the NPU, with a total power draw of about 40 W and noticeable fan noise.

Performance comparison

Overall, the performance gap between the two machines is minimal for small models, but the 16 GB M1 occasionally crashes when running the 3.3 B model, whereas the Core Ultra 5 remains stable.

(Chart: characters per second; higher is better.)

Performance comparison analysis

When comparing Intel Core Ultra 5 and Apple M1 for LLM inference, we can analyse several aspects in detail.

Small model performance

Similarity: For lightweight tasks, both devices deliver comparable and stable performance.

Applicable scenarios: Simple translation or small‑model inference can be handled comfortably by either platform, allowing users to choose based on preference or budget.

Large model stability

Intel Core Ultra 5 advantage: Running a 3.3 B model shows clear superiority; the hardware processes complex inference without crashes.

Apple M1 limitation: Although efficient for everyday use, the 16 GB memory limit can cause resource exhaustion and crashes with large models.
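A rough back-of-envelope explains why 16 GB is tight: the weights of a 3.3 B‑parameter model at fp16 alone occupy about 6 GB, before activations, framework overhead, and the operating system's share of memory.

```python
# Approximate fp16 weight footprint of the 3.3 B model
params = 3.3e9
bytes_per_param = 2  # fp16
weights_gb = params * bytes_per_param / 1024**3
print(f"{weights_gb:.1f} GB")  # → 6.1 GB
```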

Resource utilization and power consumption

Resource utilization: The Core Ultra 5 fully exploits its GPU for large‑model workloads, indicating a hardware architecture better suited for massive computations, whereas the M1's GPU/NPU may become bottlenecks.

Power and cooling: The Core Ultra 5 draws roughly 40 W under heavy load and produces noticeable fan noise, while the M1 typically consumes less power but may underperform when resources are insufficient.

Summary and recommendations

Based on the comparison, the following conclusions can be drawn:

Small‑model tasks: Both devices perform similarly and can satisfy everyday translation or lightweight inference needs.

Large‑model tasks: Intel Core Ultra 5 offers higher stability and performance, making it the better choice for demanding LLM workloads.

Resources and power: Core Ultra 5 consumes more power but delivers stronger computational capability for large models.

If your primary requirement is handling large language models or intensive computation, the Intel Core Ultra 5 is the more suitable option; for routine use and lightweight tasks, the Apple M1 remains an efficient and energy‑saving device.

Tags: Large Language Models, AI inference, Apple M1, hardware comparison, Intel Core Ultra, IPEX-LLM
Written by

Code Mala Tang

Read source code together, write articles together, and enjoy spicy hot pot together.
