AI Model Releases and Incidents: June 2026

Three significant AI announcements landed in June 2026 alongside a wave of supply-chain security incidents that exposed gaps in how major labs ship software. Cohere and Stability AI released new models and Cerebras posted an independently verified inference-speed result, while OpenAI, Anthropic, and Meta grappled with four supply-chain incidents in 50 days that hit release pipelines rather than the models themselves.

Cohere Releases First Apache 2.0 Open Model, Command A+

Cohere on Tuesday unveiled Command A+, a 218-billion-parameter sparse Mixture-of-Experts model released under an Apache 2.0 license — the first fully open-source release in the company’s history. Weights are available free on Hugging Face, giving enterprises, governments, and developers the right to run, modify, and commercialize the model without licensing fees.

Despite its 218B total parameters, Command A+ activates only 25 billion parameters (about 11.5%) per generation step, according to Cohere’s blog post. That sparsity sharply cuts per-token compute relative to a dense model of the same total size, although all 218B parameters must still be held in memory. The model is engineered for complex reasoning, multimodal document processing, and agentic workflows.
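To make the sparse-activation idea concrete, here is a minimal sketch of generic top-k expert routing. It is an illustration under stated assumptions, not Cohere’s implementation: the eight scalar "experts", the router logits, and k=2 are hypothetical, and real MoE layers route each token through feed-forward sub-networks. Only the selected experts run, which is why per-token compute tracks the active parameters rather than the total.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum over the selected experts only; unselected experts never run.

    All experts still exist in memory, which is why total parameter count
    still drives memory requirements.
    """
    return sum(gate * experts[i](x) for i, gate in top_k_route(router_logits, k))

def active_fraction(active_params, total_params):
    """Share of total parameters that participate in one token's forward pass."""
    return active_params / total_params

# Hypothetical toy setup: eight scalar "experts" that each scale their input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

print(top_k_route(logits, k=2))                # experts 4 and 1 are chosen
print(moe_forward(1.0, experts, logits, k=2))
print(f"{active_fraction(25e9, 218e9):.1%}")   # 11.5%, using Command A+ figures
```

Real routers also add load-balancing terms so tokens spread across experts; that detail is omitted here.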

The release also introduces what Cohere calls lossless W4A4 quantization, which compresses both weights (W4) and activations (A4) to 4-bit precision without measurable performance degradation, and native citation generation, which surfaces sources inline during retrieval-augmented generation tasks.
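The memory arithmetic behind 4-bit quantization, and the basic mechanics of mapping floats to 4-bit integers, can be sketched as follows. This is plain symmetric round-to-nearest quantization with per-group scales, a textbook baseline rather than Cohere’s lossless method, which the announcement does not detail; round-to-nearest on its own usually does lose some accuracy. The sample weights and group sizes are hypothetical, and the byte counts cover weights only, ignoring the small overhead of storing scales.

```python
def quantize_int4(values, group_size=32):
    """Symmetric round-to-nearest int4 quantization with one scale per group.

    Each group is scaled so its largest magnitude maps to 7; codes fit in [-8, 7].
    """
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(v / scale))) for v in group)
    return codes, scales

def dequantize_int4(codes, scales, group_size=32):
    """Map 4-bit codes back to approximate floats using each group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]

def weight_bytes(num_params, bits):
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_params * bits / 8

weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.3]  # hypothetical values
codes, scales = quantize_int4(weights, group_size=4)
restored = dequantize_int4(codes, scales, group_size=4)
print(codes)
print([round(r, 3) for r in restored])

# 218B parameters at 16-bit versus 4-bit precision.
print(f"BF16: {weight_bytes(218e9, 16) / 1e9:.0f} GB, 4-bit: {weight_bytes(218e9, 4) / 1e9:.0f} GB")
# BF16: 436 GB, 4-bit: 109 GB
```

W4A4 schemes apply the same kind of mapping to activations at inference time so that matrix multiplications can run in 4-bit arithmetic; that runtime step is not shown here.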

Aidan Gomez, Cohere’s CEO and co-author of the original “Attention Is All You Need” transformer paper, announced the Apache 2.0 licensing on X, framing the decision around “sovereign AI” — the thesis that organizations should control frontier-grade models entirely within their own secure environments. The move comes shortly after Cohere announced a merger with German AI startup Aleph Alpha.

Stability AI Launches Stable Audio 3.0 with Six-Minute Compositions

Stability AI released Stable Audio 3.0, a family of four audio models capable of generating professional-grade music up to 6 minutes and 20 seconds long — more than double the roughly 3-minute ceiling of Stable Audio 2.0, released in 2024, according to TechCrunch.

The four models span a wide range of deployment targets:

Small SFX (459M parameters) — on-device sound effects, up to 2 minutes
Small (459M parameters) — on-device music generation, up to 2 minutes
Medium (1.4B parameters) — full compositions up to 6 minutes 20 seconds
Large (2.7B parameters) — full compositions up to 6 minutes 20 seconds, API and self-hosting only
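The lineup can be encoded as a small table and queried by clip length and licensing need. This is a minimal sketch, assuming the durations and open-weight status listed above (6 minutes 20 seconds is 380 seconds); the `AudioTier` type, the `pick_tier` helper, and the rule that sound-effect requests map only to the SFX tier are hypothetical, and the enterprise-license threshold is deliberately not modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AudioTier:
    name: str
    params_millions: int
    max_seconds: int
    open_weights: bool
    use: str  # "sfx" or "music" (hypothetical labels)

# Reported Stable Audio 3.0 lineup; 6 min 20 s = 380 s, 2 min = 120 s.
TIERS = [
    AudioTier("Small SFX", 459, 120, True, "sfx"),
    AudioTier("Small", 459, 120, True, "music"),
    AudioTier("Medium", 1400, 380, True, "music"),
    AudioTier("Large", 2700, 380, False, "music"),
]

def pick_tier(seconds, use, need_open_weights):
    """Return the smallest tier that covers the request, or None if none does."""
    candidates = [
        t for t in TIERS
        if t.use == use
        and t.max_seconds >= seconds
        and (t.open_weights or not need_open_weights)
    ]
    return min(candidates, key=lambda t: t.params_millions, default=None)

choice = pick_tier(240, use="music", need_open_weights=True)
print(choice.name if choice else "no tier fits")  # Medium
```

Preferring the smallest qualifying tier mirrors the lineup’s on-device versus server split; a real selector would also weigh output quality and the licensing terms described below.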

Small SFX, Small, and Medium are available with open weights. The Large model requires a paid API or self-hosting arrangement, and companies with more than $1 million in annual revenue must obtain an enterprise license. The previous open release, Stable Audio Open, was limited to 47-second clips.

Stability AI said the models were trained on fully licensed data, citing existing deals with Warner Music Group and Universal Music Group. The company also announced it is building a professional musician product suite, though no feature details were disclosed. Ethan Kaplan, former chief digital officer at Universal Audio and Fender, has joined the company.

Cerebras Runs Trillion-Parameter Kimi K2.6 at 981 Tokens per Second

Cerebras Systems announced it is serving Kimi K2.6 — a trillion-parameter open-weight model from Beijing-based Moonshot AI — at 981 output tokens per second, a result independently verified by benchmarking firm Artificial Analysis, according to VentureBeat. That speed is 6.7 times faster than the next-fastest GPU-based cloud provider and 23 times faster than the median GPU provider.
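Working backward from the reported ratios gives the approximate throughput of the GPU providers being compared. Because 6.7x and 23x are rounded figures, these back-calculated numbers are estimates; the variable names are illustrative. The implied median lines up with the roughly 43 tokens per second cited in the FAQ below.

```python
CEREBRAS_TPS = 981            # Artificial Analysis-verified output tokens/s
SPEEDUP_VS_NEXT_FASTEST = 6.7
SPEEDUP_VS_MEDIAN = 23

def implied_tps(reference_tps, speedup):
    """Throughput a competitor would need for `reference_tps` to be `speedup` times faster."""
    return reference_tps / speedup

next_fastest = implied_tps(CEREBRAS_TPS, SPEEDUP_VS_NEXT_FASTEST)
median = implied_tps(CEREBRAS_TPS, SPEEDUP_VS_MEDIAN)
print(f"next-fastest GPU cloud ~{next_fastest:.0f} tok/s, median ~{median:.0f} tok/s")
# next-fastest GPU cloud ~146 tok/s, median ~43 tok/s
```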

For a standard agentic coding task with 10,000 input tokens and 500 output tokens, Cerebras completed the full request in 5.6 seconds. The same request took 163.7 seconds on the official Kimi endpoint — a 29-fold difference in time-to-answer.
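A simple latency model shows how the Cerebras end-to-end figure might decompose. This sketch assumes, hypothetically, that all 500 output tokens were emitted at the steady 981 tokens-per-second rate; under that assumption, generation accounts for about half a second and the remaining ~5 seconds goes to processing the 10,000-token prompt plus queueing and network time. That split is an inference from the published totals, not a measured breakdown.

```python
def decode_seconds(output_tokens, tokens_per_second):
    """Time to generate the output at a constant decode rate (simplifying assumption)."""
    return output_tokens / tokens_per_second

def non_decode_seconds(total_seconds, output_tokens, tokens_per_second):
    """Time left for prefill, queueing, and network once decode is subtracted."""
    return total_seconds - decode_seconds(output_tokens, tokens_per_second)

CEREBRAS_TOTAL_S = 5.6
KIMI_ENDPOINT_TOTAL_S = 163.7
OUTPUT_TOKENS = 500

speedup = KIMI_ENDPOINT_TOTAL_S / CEREBRAS_TOTAL_S
decode = decode_seconds(OUTPUT_TOKENS, 981)
overhead = non_decode_seconds(CEREBRAS_TOTAL_S, OUTPUT_TOKENS, 981)
print(f"speedup {speedup:.1f}x, decode {decode:.2f} s, other {overhead:.2f} s")
# speedup 29.2x, decode 0.51 s, other 5.09 s
```

For agentic workloads with long prompts, this suggests prompt processing can dominate time-to-answer even when output speed is very high.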

“We’re really wanting to be very clear and show that we can do the largest models,” James Wang, Cerebras’ director of product marketing, told VentureBeat. “In this case, Kimi K2.6 — a trillion-parameter MoE model on the wafer-scale architecture — and it runs also at this same incredible speed that we’re famous for.”

Kimi K2.6 is the first trillion-parameter model Cerebras has served in production, addressing a longstanding perception that its wafer-scale chips could only handle smaller models. The announcement came less than a week after Cerebras completed what it described as the largest tech IPO of 2026, giving it a $95 billion market cap and $5.55 billion in IPO proceeds.

Four Supply-Chain Incidents Hit AI Labs in 50 Days

Four supply-chain incidents struck OpenAI, Anthropic, and Meta between mid-April and early June 2026 — three adversary-driven attacks and one packaging failure — none of which targeted AI models directly, according to VentureBeat’s analysis.

The most technically striking incident involved a self-propagating worm called Mini Shai-Hulud, which on May 11, 2026, published 84 malicious package versions across 42 @tanstack/* npm packages in six minutes. The worm exploited a `pull_request_target` misconfiguration, GitHub Actions cache poisoning, and OIDC token extraction from runner memory. Critically, the malicious packages carried valid SLSA Build Level 3 provenance because they were published from the correct repository by the correct workflow using a legitimately minted token. No credentials were phished.
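The misconfiguration class described above can be checked mechanically. Below is a minimal heuristic sketch, assuming the workflow file has already been parsed into a Python dict (for example with a YAML loader). The `audit_workflow` function and its three rules are illustrative, not a reconstruction of the actual TanStack configuration (which this article does not reproduce) and not a substitute for a dedicated Actions security scanner. It flags privileged `pull_request_target` workflows that check out pull-request code, write to the Actions cache, or can mint OIDC tokens.

```python
def _as_dict(value):
    """Normalize YAML shapes (string, list, None) into a dict."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str):
        return {value: None}
    if isinstance(value, list):
        return {item: None for item in value}
    return {}

def audit_workflow(workflow):
    """Return heuristic findings for a parsed GitHub Actions workflow."""
    findings = []
    # PyYAML (YAML 1.1) loads a bare `on:` key as the boolean True, so check both.
    triggers = _as_dict(workflow.get("on", workflow.get(True)))
    if "pull_request_target" not in triggers:
        return findings  # these rules target the privileged trigger only
    top_perms = _as_dict(workflow.get("permissions"))
    for job_name, job in workflow.get("jobs", {}).items():
        # Job-level permissions replace workflow-level ones rather than merging.
        perms = _as_dict(job["permissions"]) if "permissions" in job else top_perms
        for step in job.get("steps", []):
            uses = step.get("uses", "")
            ref = str((step.get("with") or {}).get("ref", ""))
            if uses.startswith("actions/checkout") and "pull_request.head" in ref:
                findings.append(f"{job_name}: checks out untrusted PR code with base-repo privileges")
            if uses.startswith("actions/cache"):
                findings.append(f"{job_name}: writes cache entries that trusted workflows may restore")
        # "write-all" grants every scope, including id-token.
        if perms.get("id-token") == "write" or "write-all" in perms:
            findings.append(f"{job_name}: can mint OIDC tokens while handling PR input")
    return findings

# Hypothetical risky workflow combining all three patterns.
risky = {
    "on": {"pull_request_target": {"types": ["opened"]}},
    "permissions": {"id-token": "write", "contents": "read"},
    "jobs": {
        "test": {
            "steps": [
                {"uses": "actions/checkout@v4",
                 "with": {"ref": "${{ github.event.pull_request.head.sha }}"}},
                {"uses": "actions/cache@v4", "with": {"path": "~/.npm", "key": "npm"}},
            ]
        }
    },
}
for finding in audit_workflow(risky):
    print(finding)
```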

Two days later, OpenAI confirmed that two employee devices were compromised and credential material was exfiltrated from internal code repositories. OpenAI is revoking its macOS security certificates and requiring all desktop users to update by June 12, 2026. The company said it had already been hardening its CI/CD pipeline following an earlier incident, but the two affected devices had not yet received the updated configurations.

@The_Calda summarized the SLSA provenance failure on X: “If an attacker controls your CI runner, they control your attestations. Policy-based security is failing at scale.”
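The quote’s point is easiest to see with a toy verifier. In this sketch the `Attestation` and `Policy` types, the repository name, and the digests are all hypothetical placeholders, and signature checking is left out because in the TanStack case the signing token was legitimately minted and would have verified anyway. A check that only asks which repository, workflow, and branch built an artifact accepts a poisoned artifact as readily as a clean one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attestation:
    """Simplified stand-in for a signed provenance statement (fields are hypothetical)."""
    repository: str
    workflow: str
    ref: str
    artifact_digest: str

@dataclass(frozen=True)
class Policy:
    """Which builder identity the consumer is willing to trust."""
    repository: str
    workflow: str
    ref: str

def verify(att, policy, digest):
    """Accept if the artifact matches the attested digest and the builder identity matches policy.

    Signature checks are omitted: with a legitimately minted token,
    a signature over this statement would verify as well.
    """
    return (
        att.artifact_digest == digest
        and (att.repository, att.workflow, att.ref) == (policy.repository, policy.workflow, policy.ref)
    )

policy = Policy("example-org/example-lib", ".github/workflows/release.yml", "refs/heads/main")

# Clean release: the trusted workflow builds and attests the expected artifact.
legit = Attestation(policy.repository, policy.workflow, policy.ref, "sha256:aaaa")
# Poisoned release: same repo, workflow, and ref, but attacker code on the runner
# changed what was built, and the runner attests to whatever it produced.
poisoned = Attestation(policy.repository, policy.workflow, policy.ref, "sha256:bbbb")

print(verify(legit, policy, "sha256:aaaa"))     # True
print(verify(poisoned, policy, "sha256:bbbb"))  # True: provenance cannot tell them apart
```

Defenses therefore have to protect the runner and its inputs (triggers, caches, token scope), not only the attestation.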

All four incidents exposed the same structural gap: release pipelines, dependency hooks, CI runners, and packaging gates that no system card, government AI safety evaluation, or red-team exercise has formally scoped.

What This Means

June 2026’s AI news cycle splits cleanly into two tracks: capability releases and infrastructure risk.

On the capability side, Cohere’s Apache 2.0 release of Command A+ is the most consequential licensing decision any enterprise-focused lab has made this year. Fully permissive weights at 218B parameters, with lossless quantization and native citations, give regulated industries a viable path to on-premises frontier AI without vendor lock-in. Stable Audio 3.0’s open-weight Medium tier similarly extends what developers can build locally, particularly in creative tooling.

Cerebras’ Kimi K2.6 numbers are striking, but the more important signal is the market positioning: a freshly public company with $95B in market cap is now directly challenging GPU cloud providers on inference speed for the largest open models. If Artificial Analysis’ independent verification holds up under broader scrutiny, GPU cloud vendors face real pricing pressure on latency-sensitive workloads.

The supply-chain incidents are the underreported story. Four incidents in 50 days, including malicious packages that carried valid provenance, demonstrate that the AI industry’s security investment is concentrated in model evaluation while release infrastructure remains largely unaudited. OpenAI’s certificate revocation is a reactive measure; the structural fix requires formal threat modeling of CI/CD pipelines as a distinct attack surface.

FAQ

What is Cohere’s Command A+ and how is it licensed?

Command A+ is a 218-billion-parameter sparse Mixture-of-Experts language model released by Cohere in June 2026. It is available free on Hugging Face under an Apache 2.0 license, meaning anyone can use, modify, and commercialize it without paying Cohere licensing fees — the first time the company has released a model under fully open-source terms.

How does Cerebras run Kimi K2.6 faster than GPU clouds?

Cerebras builds wafer-scale processors: each chip is a single large die with memory placed alongside compute, which reduces the memory-bandwidth and inter-chip communication bottlenecks that slow token generation on GPU clusters. Benchmarking firm Artificial Analysis independently verified Cerebras hitting 981 tokens per second on Kimi K2.6, versus a median of roughly 43 tokens per second across GPU-based providers.

What was the Mini Shai-Hulud supply-chain attack?

Mini Shai-Hulud was a self-propagating worm that on May 11, 2026, injected 84 malicious package versions into 42 TanStack npm packages in six minutes by exploiting a GitHub Actions misconfiguration and extracting OIDC tokens from CI runner memory. The packages passed SLSA Build Level 3 provenance checks because the attack used the legitimate release workflow rather than stolen developer credentials.

Sources

Four AI supply-chain attacks in 50 days exposed the release pipeline red teams aren’t covering – VentureBeat
Stability AI releases a new audio model that can create six-minute songs – TechCrunch
Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+ – VentureBeat
All of the updates from Elon Musk and Sam Altman’s battle over OpenAI – The Verge
Cerebras says its chips run a trillion-parameter AI model nearly 7 times faster than GPU clouds – VentureBeat