The Distillation Treadmill - Mosaic Theory Blog

Two frontier-class open-weight models dropped in seven days, both from Chinese labs, both priced to undercut the closed labs by an order of magnitude. Meanwhile, the open ecosystem is now industrially distilling Claude 4.6 Opus reasoning into 35B-parameter packages and shipping abliterated variants of every major release within days. The price-performance gap is closing. The safety-tuning gap is closing in the wrong direction.

DeepSeek V4 arrived Friday, and the positioning was deliberate. DeepSeek open-sourced the V4 series on April 24 with V4-Flash, V4-Pro-Base, and V4-Flash-Base, and the developer community settled on "almost on the frontier, a fraction of the price." TechCrunch framed it as closing the gap, and MIT Technology Review listed three reasons it matters. SGLang shipped Day-0 serving support with verified RL integration. The HuggingFace blog post promoted a million-token context that agents can actually use. If you are running production inference on a closed-API frontier tier and have not yet costed out a switch to V4-Flash via SGLang or vLLM, that exercise just became higher priority.
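For anyone starting that costing exercise, a back-of-envelope sketch follows. Every number in it is a placeholder assumption rather than a published rate for V4-Flash or any closed API, and the names `Pricing`, `monthly_cost`, and `self_hosted_cost` are hypothetical helpers invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD. All values used below are placeholders."""
    input_per_mtok: float
    output_per_mtok: float


def monthly_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Monthly metered spend for a given token volume."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_mtok
        + output_tokens / 1_000_000 * pricing.output_per_mtok
    )


def self_hosted_cost(gpu_hourly_usd: float, gpus: int, hours: float = 730.0) -> float:
    """Flat monthly cost of a serving fleet; ignores ops and engineering overhead."""
    return gpu_hourly_usd * gpus * hours


# Hypothetical inputs: swap in your own invoices, quotes, and traffic logs.
closed_api = Pricing(input_per_mtok=10.0, output_per_mtok=30.0)
open_hosted = Pricing(input_per_mtok=1.0, output_per_mtok=3.0)
volume_in, volume_out = 2_000_000_000, 500_000_000  # tokens per month

closed = monthly_cost(closed_api, volume_in, volume_out)
hosted = monthly_cost(open_hosted, volume_in, volume_out)
fleet = self_hosted_cost(gpu_hourly_usd=2.5, gpus=8)

print(f"closed API:  ${closed:,.0f}")
print(f"hosted open: ${hosted:,.0f}")
print(f"self-hosted: ${fleet:,.0f}")
```

The metered options scale with traffic while the self-hosted fleet is a flat cost, so the crossover point depends entirely on your own volume and utilisation. That is the number worth computing before any migration discussion.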

Qwen3.6 is the bigger story for the indie developer economy. The Qwen3.6-35B-A3B sparse mixture-of-experts model on HuggingFace activates only 3B of its 35B parameters per token, and Unsloth's GGUF distribution has accumulated more than a million downloads of the quantised builds. The 27B dense model is being pitched in technical write-ups as delivering flagship-level coding. The downstream activity is what makes this release matter. The uploader hesamation shipped a Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF, explicitly marketed as carrying Anthropic's reasoning traces into a locally runnable package. A z-lab DFlash variant, FP8 builds from Qwen directly, and a Qwopus-GLM-18B merge are all circulating. This is now a production pipeline that turns every closed-frontier release into an open-weight derivative within weeks. Whoever is paying for the original reasoning traces is functionally subsidising the open ecosystem.
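To see why a sparse mixture-of-experts model only touches a fraction of its weights per token, here is a toy sketch of generic top-k routing. It is not Qwen's actual router: the five scalar "experts", the logits, and the function names are all made up for illustration, and real models route per layer over learned feed-forward blocks.

```python
import math


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Select the k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Run only the selected experts and mix their outputs by routing weight."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Toy setup: expert i simply multiplies its input by (i + 1).
experts = [lambda x, scale=i + 1: x * scale for i in range(5)]
logits = [0.1, 2.0, -1.0, 2.0, 0.5]  # hypothetical router scores for one token

weights = top_k_route(logits, k=2)
output = moe_forward(2.0, experts, logits, k=2)

print("experts used:", sorted(weights))
print("share of experts evaluated:", 2 / len(experts))
print("headline ratio from the model name: 3B / 35B =", round(3 / 35, 3))
```

The 3B-of-35B headline figure will not equal k divided by the expert count, because dense components such as attention and embeddings run for every token. The routing step is still what keeps per-token compute close to that of a small dense model.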

The abliteration pipeline is industrialising

One uploader, HauhauCS, is now running what looks like a factory for stripping refusal behaviour out of frontier open weights. The HauhauCS account shipped Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive (526K downloads by Monday) and a matching 27B variant within days of the Qwen3.6 base release. The earlier Gemma-4 generation produced JANG_4M-CRACK, supergemma4-26b-uncensored, and OBLITERATUS's abliterated gemma-4-E4B-it. Each base release now reliably spawns a same-named "aggressive uncensored" variant on HuggingFace within a week, with hundreds of thousands of downloads. A recent technical article making the rounds argues the abliterated models cannot fully escape pretraining-baked refusal heuristics, but that nuance is not what is driving download counts. The pattern matters because abliteration has moved from a one-off curiosity to a predictable, scalable industrial step that consistently follows frontier open-weight releases. Whatever the genuine capability uplift, it is now trivially available, and the downstream liability shifts to the platforms that host these variants.

The Moonshot Kimi K2.6 release was a useful timing indicator. Moonshot AI released Kimi K2.6 (1T parameters, attention optimisations) ahead of SiliconAngle's coverage on April 20. The model has since hit 443K downloads on HuggingFace as an image-text-to-text MoE. Whether Kimi K2.6 and DeepSeek V4 cannibalise each other in the same niche or target distinct deployment profiles remains open. Independent comparative benchmarks against Qwen3.6 at similar active parameter counts are still sparse. For procurement teams trying to standardise on an open frontier baseline, the answer is probably to defer that decision by a few weeks and let the head-to-head benchmarks settle.

Agent memory as a category resolved this week. mem0 (54K stars), MemPalace (50K stars), and several markdown-wiki-for-agents projects have been converging on a shared shape for the better part of a month. On Monday morning, AWS dropped Company-wise memory in Amazon Bedrock with Amazon Neptune and Mem0, formally bringing the leading independent agent memory library into Bedrock's reference architecture. For anyone building agent memory as a startup category, AWS has now picked a reference partner, and that choice effectively sets the winning interface for the rest of the year. The differentiation question is no longer "do you have memory" but "does your agent's memory federation match the shape Bedrock-shaped customers now expect."

Agent harnesses are converging on a common skill/memory/hook pattern. The projects everything-claude-code (168K stars), awesome-claude-code, deer-flow, hermes-agent (119K stars), and AutoGPT (184K stars) now all ship variants of the same primitives: skills as portable specifications, memory layers as separate services, and hooks as deterministic interception points. The harness layer is rapidly commoditising. If you are an early-stage company whose pitch is the harness, the moat just got thinner. The defensible layer is moving down to evaluation, verification, and long-horizon skill-chaining benchmarks such as SkillLearnBench, which quantify how badly current agents degrade across iterative sessions.
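The skill/memory/hook pattern is small enough to sketch in a few dozen lines, which is part of why it commoditises so quickly. The sketch below is a minimal illustration under assumed names. `Skill`, `MemoryStore`, `Harness`, and `no_force_push` are hypothetical and are not taken from any of the projects listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class Skill:
    """A portable specification: a name, instructions, and the tools it may use."""
    name: str
    instructions: str
    allowed_tools: tuple[str, ...] = ()


class MemoryStore(Protocol):
    """Memory as a separate service behind a narrow interface."""
    def add(self, key: str, value: str) -> None: ...
    def search(self, query: str) -> list[str]: ...


class InMemoryStore:
    """Stand-in backend; a real deployment would call out to a memory service."""
    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def add(self, key: str, value: str) -> None:
        self._items[key] = value

    def search(self, query: str) -> list[str]:
        return [v for v in self._items.values() if query.lower() in v.lower()]


# A hook inspects a proposed tool call and returns True to allow it, False to block it.
Hook = Callable[[str, dict], bool]


@dataclass
class Harness:
    memory: MemoryStore
    skills: dict[str, Skill] = field(default_factory=dict)
    pre_tool_hooks: list[Hook] = field(default_factory=list)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def call_tool(self, skill_name: str, tool: str, args: dict) -> str:
        skill = self.skills[skill_name]
        if tool not in skill.allowed_tools:
            return "blocked: tool not in skill spec"
        if not all(hook(tool, args) for hook in self.pre_tool_hooks):
            return "blocked: hook"
        self.memory.add(f"{skill_name}:{tool}", f"{tool} called with {args}")
        return "ok"


def no_force_push(tool: str, args: dict) -> bool:
    """Deterministic policy check: never allow a forced git push."""
    return not (tool == "git" and "--force" in args.get("argv", []))


harness = Harness(memory=InMemoryStore(), pre_tool_hooks=[no_force_push])
harness.register(Skill("release", "Tag and push a release.", ("git",)))

allowed = harness.call_tool("release", "git", {"argv": ["push"]})
forced = harness.call_tool("release", "git", {"argv": ["push", "--force"]})
off_spec = harness.call_tool("release", "shell", {"argv": ["rm"]})
print(allowed, forced, off_spec, sep=" | ")
```

Nothing in this sketch is hard to replicate. The harder problems sit in measuring whether the agent behind `call_tool` still behaves well after many sessions.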

Anthropic had a rough week on the Claude Code side. Independent commentary tried to make sense of the $100/month Claude Code pricing rumour, Anthropic publicly acknowledged real Claude Code quality regressions, and the Opus 4.7 "obsessive malware-checking" behaviour started appearing in developer complaints. None of this is existential. But it lands at a moment when DeepSeek V4 and Qwen3.6-27B-dense are both being credibly described as flagship-tier coding models at a fraction of the cost. Anthropic has the brand and the enterprise relationships. The execution surface is more exposed than it was a quarter ago.

On our radar

CUDA dependency removal as a portability trend. The TRELLIS.2 Apple Silicon port replaced flash_attn, nvdiffrast, and custom sparse convolution kernels to run a 4B-parameter image-to-3D model natively on MPS. If this pattern repeats with the next few high-profile vision models, Apple's local-inference position improves materially without Apple having to ship a single SDK update.
European sovereign AI is consolidating fast. Cohere and Aleph Alpha announced a merger backed by $600M in new funding, and Sifted's reporting frames it as a direct response to US policy pressure. Verda raising €100m to build a European hyperscaler the day before fits the same trajectory. The sovereign-cloud question is moving from policy white papers to actual procurement decisions.
LLM safety detection via internal representations. HiLight proposes using intermediate-layer activations rather than terminal outputs to detect harmful content. If the latency and accuracy claims hold under independent evaluation, this changes the architecture of every commercial moderation pipeline. The abliteration trend above makes this more urgent, not less.
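On the last radar item, the general idea of classifying from internal representations can be illustrated with a difference-of-means linear probe. This is a generic textbook-style sketch and not HiLight's method. The "activations" are synthetic Gaussian vectors standing in for real hidden states, and every name and parameter here is an assumption.

```python
import random


def mean_vector(rows: list[list[float]]) -> list[float]:
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_probe(harmful: list[list[float]], benign: list[list[float]]):
    """Direction = difference of class means; threshold = projection of their midpoint."""
    mu_h, mu_b = mean_vector(harmful), mean_vector(benign)
    direction = [h - b for h, b in zip(mu_h, mu_b)]
    midpoint = [(h + b) / 2 for h, b in zip(mu_h, mu_b)]
    return direction, dot(direction, midpoint)


def flag(activation: list[float], direction: list[float], threshold: float) -> bool:
    """True when the activation projects onto the 'harmful' side of the threshold."""
    return dot(activation, direction) > threshold


# Synthetic stand-in for intermediate-layer activations (toy data, not model output).
rng = random.Random(0)
DIM, SHIFTED_DIMS = 16, 4


def synth(n: int, shift: float) -> list[list[float]]:
    """n vectors of unit Gaussian noise, with the first few dimensions shifted."""
    return [
        [rng.gauss(0, 1) + (shift if j < SHIFTED_DIMS else 0.0) for j in range(DIM)]
        for _ in range(n)
    ]


train_harmful, train_benign = synth(200, 3.0), synth(200, 0.0)
test_harmful, test_benign = synth(100, 3.0), synth(100, 0.0)

direction, threshold = fit_probe(train_harmful, train_benign)
correct = sum(flag(a, direction, threshold) for a in test_harmful)
correct += sum(not flag(a, direction, threshold) for a in test_benign)
accuracy = correct / (len(test_harmful) + len(test_benign))
print(f"held-out accuracy on toy data: {accuracy:.2%}")
```

A probe like this costs one dot product per check, which is why the latency claims are plausible in principle. Whether real activations separate as cleanly as this toy data is exactly what independent evaluation would need to show.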

Signal data for this briefing is provided by HiddenState, Mosaic Theory's signal intelligence platform.

- Cosmo