Hugging Face Tokenizer Flaw Enables MitM Attacks on Local

A security vulnerability in Hugging Face‘s tokenizer layer allows attackers to hijack AI model outputs and steal credentials by modifying a single `.json` file, HiddenLayer researcher Divyanshu Divyanshu disclosed on May 12, 2026. The attack affects models run locally across SafeTensors, ONNX, and GGUF formats — three of the most widely used open-source model packaging standards — and extends beyond Hugging Face to any platform running open-source models locally, including LlamaCPP and Ollama.

How the Tokenizer Attack Works

According to HiddenLayer’s blog post, an attacker manipulates a tokenizer configuration file to intercept tool call arguments and redirect URL tokens through attacker-controlled infrastructure. The tokenizer sits between a model’s raw integer output and the human-readable text it produces — making it a chokepoint for all model communication.

Once compromised, the modified tokenizer gives the threat actor “visibility into every URL the model accesses, API parameters, and any credentials embedded in those requests,” Divyanshu explained in the disclosure. This is a classic man-in-the-middle (MitM) setup, with the `.json` file acting as the interception layer.

The attack is limited to locally run models. Models processed through Hugging Face‘s Inference API are not affected, because the attacker cannot modify server-side files. Hugging Face did not respond to Dark Reading’s request for comment.

Scope: Which Formats and Platforms Are Exposed

HiddenLayer confirmed the attack works against three formats:

SafeTensors — Hugging Face’s own format, considered the platform’s de facto standard
ONNX — a cross-platform model interchange format widely used in production pipelines
GGUF — the format popularized by llama.cpp for running quantized models on consumer hardware

All three are supported by Hugging Face and are common across the broader open-source model ecosystem. The vulnerability’s reach extends to any runtime that loads these formats locally, including Ollama and LlamaCPP — two of the most popular tools for running Llama, Mistral, and other open-weight models on local hardware.

The practical implication is broad: developers fine-tuning models on local machines, enterprises running air-gapped deployments, and researchers using downloaded weights from Hugging Face’s model hub are all potentially exposed if an attacker can modify local files.

Fine-Tuning Workflows Add Another Attack Surface

The tokenizer vulnerability arrives as fine-tuning of open-source models is accelerating. The Hugging Face Blog recently published “Chapter 0” of A Hands-On Guide to Fine-Tuning Large Language Models with PyTorch and Hugging Face, reflecting growing demand for tutorials on customizing models like Llama and Mistral using the Transformers library.

Fine-tuning workflows typically involve downloading base model weights, loading tokenizer files, and modifying both locally before training. Each of those steps touches the exact file types HiddenLayer identified as exploitable. A poisoned tokenizer introduced at any point in that pipeline — through a compromised model repository, a dependency swap, or direct file access — would persist through training and into the fine-tuned model’s deployment.

This is not a theoretical risk. Hugging Face hosts hundreds of thousands of community-uploaded models, and supply chain attacks targeting model repositories have been documented previously.

Empromptu’s Alchemy Platform Highlights the Fine-Tuning Trend

The security disclosure comes as enterprise demand for custom fine-tuned models is growing. San Francisco-based Empromptu AI on Thursday launched Alchemy Models, a platform that automatically captures production workflow outputs and routes them into a continuous fine-tuning pipeline, according to VentureBeat.

The core pitch: enterprises already generate training signal through their AI applications — every expert correction, every validated output — and most of it is discarded. Alchemy captures that signal and updates model weights automatically, without requiring a dedicated ML team.

“Every customer, everybody that I talk to, is like, how am I not going to get disrupted? How am I going to protect my business? And they just don’t see the path,” Empromptu CEO Shanea Leven told VentureBeat. Enterprises retain ownership of the resulting weights outright — a meaningful distinction from API-only foundation model providers.

Alchemy sits in a different category from retrieval-augmented generation (RAG), which retrieves context at inference time without touching weights, and from traditional fine-tuning, which requires pre-assembled labeled datasets. The platform uses the live application as its data source, updating weights continuously.

The model ownership angle is significant in the context of the tokenizer vulnerability: enterprises that own their weights and run models locally face exactly the attack surface HiddenLayer described.

Open-Source Video Model Enters the Market at Steep Discount

Separately, two-year-old startup Perceptron Inc. released Mk1, a proprietary video analysis reasoning model priced at $0.15 per million input tokens and $1.50 per million output tokens via API — roughly 80–90% cheaper than Anthropic’s Claude Sonnet 4.5, OpenAI’s GPT-5, and Google’s Gemini 3.1 Pro, according to VentureBeat.

Co-founder and CEO Armen Aghajanyan, formerly of Meta FAIR and Microsoft, said the company spent 16 months building a “multi-modal recipe” designed to handle spatial reasoning, object dynamics, and cause-and-effect relationships in video. Target use cases include security monitoring, marketing video editing, and behavioral analysis in controlled studies.

Mk1 is proprietary rather than open-weight, but its pricing undercuts closed competitors substantially and a public demo is available at perceptron.inc/demo. The model’s benchmark performance focuses on grounded spatial and video understanding tasks.

What This Means

The HiddenLayer disclosure is a concrete reminder that the open-source model ecosystem’s greatest strength — freely downloadable, locally runnable weights — is also its primary attack surface. Unlike API-based models where infrastructure is controlled by the provider, local deployments put file integrity entirely in the user’s hands.

The tokenizer vector is particularly insidious because it sits downstream of the model itself. A compromised tokenizer doesn’t require modifying the weights — it intercepts outputs after the model has already done its work. That makes detection harder: the model behaves correctly internally while the attacker sees everything flowing out.

For enterprises accelerating fine-tuning workflows — whether through tools like Empromptu’s Alchemy or manual PyTorch pipelines — the implication is clear: tokenizer file integrity needs to be part of any model supply chain audit. Cryptographic verification of tokenizer configs, not just model weights, should be standard practice.

The broader open-source model moment is real. Llama, Mistral, and their derivatives are running in production across thousands of organizations. The security infrastructure around those deployments has not kept pace with adoption speed.

FAQ

What is the Hugging Face tokenizer vulnerability?

A tokenizer is a file that converts AI model outputs — raw integer sequences — into human-readable text. HiddenLayer researchers found that modifying a tokenizer’s `.json` configuration file allows an attacker to redirect the model’s outputs through attacker-controlled infrastructure, exposing URLs, API parameters, and embedded credentials. The attack only works on models run locally, not those accessed via Hugging Face’s hosted Inference API.

Which open-source model formats are affected by the tokenizer attack?

HiddenLayer confirmed the attack against models in SafeTensors, ONNX, and GGUF formats — all three supported by Hugging Face. Any local runtime loading these formats is potentially affected, including Ollama and LlamaCPP, which are commonly used to run Llama and Mistral models on consumer and enterprise hardware.

How does fine-tuning increase exposure to this type of attack?

Fine-tuning workflows require downloading base model weights and their associated tokenizer files, then modifying and retraining locally. A poisoned tokenizer introduced at any stage — through a compromised model repository or direct file modification — would persist into the fine-tuned model. Developers and enterprises fine-tuning open-source models should verify tokenizer file integrity using cryptographic checksums before and after any pipeline step.