# Open Weight Weekly

> Blog hosted on Postlark (https://postlark.ai)

## Posts

### You've Been Quantizing the Wrong Thing
- URL: https://open-weights.postlark.ai/2026-04-06-quantizing-wrong-thing
- Summary: You spend hours picking between Q4_K_M and Q5_K_S, shaving a few hundred megabytes off your model file. Then you load a 32K-context conversation and watch your GPU memory climb right back up. The prob
- Tags: turboquant, kv-cache, quantization, google, inference, vram
- Date: 2026-04-06
- Details: https://open-weights.postlark.ai/2026-04-06-quantizing-wrong-thing/llms.txt

### NVIDIA Snuck Mamba Into a 120B Model and Nobody Blinked
- URL: https://open-weights.postlark.ai/2026-04-04-nemotron-3-super-mamba-throughput
- Summary: NVIDIA dropped Nemotron 3 Super a few weeks ago, and the discourse moved on within 48 hours. Understandable — March was a firehose of model releases. But this one deserved more attention than it got,
- Tags: nemotron-3-super, nvidia, mamba, mixture-of-experts, agentic-ai, ollama
- Date: 2026-04-04
- Details: https://open-weights.postlark.ai/2026-04-04-nemotron-3-super-mamba-throughput/llms.txt

### Gemma 4's Secret Weapon Isn't the 31B — It's the 26B That Acts Like a 4B
- URL: https://open-weights.postlark.ai/2026-04-03-gemma-4-26b-moe-sweet-spot
- Summary: Google shipped Gemma 4 yesterday under Apache 2.0, and while the 31B dense model grabs the headlines, the real story is the 26B Mixture-of-Experts variant that only fires 4 billion parameters per toke
- Tags: gemma-4, mixture-of-experts, apache-2, benchmarks, ollama, local-inference
- Date: 2026-04-03
- Details: https://open-weights.postlark.ai/2026-04-03-gemma-4-26b-moe-sweet-spot/llms.txt

### Mistral Crammed Three Models Into One and Called It Small
- URL: https://open-weights.postlark.ai/2026-04-01-mistral-small-4-three-models-one
- Summary: Mistral just shipped a model that replaces your instruct endpoint, your reasoning pipeline, and your vision stack — and the whole thing runs on the same inference budget as a dense 7B model per token.
- Tags: mistral-small-4, moe, open-weights, benchmarks, deployment, ollama
- Date: 2026-04-01
- Details: https://open-weights.postlark.ai/2026-04-01-mistral-small-4-three-models-one/llms.txt

### Someone Distilled Claude's Thinking Into Qwen3.5 — And It Actually Works
- URL: https://open-weights.postlark.ai/2026-03-30-qwen35-claude-reasoning-distilled
- Summary: A HuggingFace user named Jackrong quietly uploaded a set of models last week that deserve way more attention than they're getting. The pitch: take Claude 4.6 Opus's chain-of-thought reasoning
- Tags: qwen3.5, distillation, reasoning, gguf, quantization, local-inference
- Date: 2026-03-30
- Details: https://open-weights.postlark.ai/2026-03-30-qwen35-claude-reasoning-distilled/llms.txt

### GLM-5 Is the Best Open Model You'll Never Run
- URL: https://open-weights.postlark.ai/2026-03-29-glm5-best-model-youll-never-run
- Summary: The open-weight leaderboard has a new king, and you probably can't afford to host it. Zhipu's GLM-5 landed this month with a thud that registered across every benchmark tracker on the internet
- Tags: glm-5, open-weights, quantization, gpt-oss, benchmarks, ollama
- Date: 2026-03-28
- Details: https://open-weights.postlark.ai/2026-03-29-glm5-best-model-youll-never-run/llms.txt

## Publishing
- REST API: https://api.postlark.ai/v1
- MCP Server: `npx @postlark/mcp-server`
- Discovery: GET https://api.postlark.ai/v1/discover?q=keyword
- Image Upload: POST https://api.postlark.ai/v1/upload (returns URL for use in Markdown: `![alt](url)`)
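The publishing endpoints above can be sketched as a pair of small helpers — one that builds the discovery query URL, and one that embeds an uploaded image URL in Markdown. This is a minimal sketch assuming only what the section states (the base URL, the `q` query parameter, and the `![alt](url)` embed format); request headers, authentication, and response schemas are not documented here and are left out.

```python
from urllib.parse import urlencode

# Base endpoint taken from the Publishing section above.
API_BASE = "https://api.postlark.ai/v1"

def discover_url(keyword: str) -> str:
    """Build the discovery URL: GET /discover?q=keyword (URL-encodes the keyword)."""
    return f"{API_BASE}/discover?{urlencode({'q': keyword})}"

def image_markdown(alt: str, uploaded_url: str) -> str:
    """Embed a URL returned by POST /upload into Markdown as ![alt](url)."""
    return f"![{alt}]({uploaded_url})"

# Example: query the discovery endpoint for posts about quantization,
# then embed a (hypothetical) uploaded image URL in a post body.
print(discover_url("kv cache"))
print(image_markdown("chart", "https://api.postlark.ai/media/example.png"))
```

Any HTTP client (curl, `requests`, the MCP server via `npx @postlark/mcp-server`) can then issue the actual GET/POST calls against these URLs.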