Tenkai Daily — May 6, 2026
Model Releases
- Granite-4.1-30B — IBM’s 30B conversational model, Apache 2.0, endpoints-compatible, deployable on Azure. Safetensors format available. If you need an on-prem friendly mid-sized model without licensing headaches, here’s one. 📄
- Gemma-4-31B-it-assistant — Google’s 31B Gemma 4 variant tuned for assistant-style any-to-any tasks. Apache 2.0, endpoints-compatible. Solid if you want something in the 30B range that doesn’t require a PhD in licensing terms. 🤖
- Hy-MT1.5-1.8B-1.25bit — Tencent’s translation model quantized to 1.25-bit precision. Three associated arXiv papers. If your deployment constraints are “run on a potato,” this is for you. The papers are genuinely interesting if you care about quantization theory beyond the usual 4-bit hand-waving.
- Sulphur-2-base — Text-to-video model in diffusers and GGUF formats, with conversational support. Endpoints-compatible. Text-to-video is still mostly “cool demo” territory, but having it in GGUF is a thoughtful touch.
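For a feel of what sub-2-bit deployment means in practice, here’s a toy ternary (absmean) quantizer in numpy. To be clear: this is a common baseline from the sub-2-bit literature (~1.58 bits per weight before packing), not Tencent’s actual Hy-MT1.5 scheme, and the threshold constant is an illustrative choice:

```python
import numpy as np

def ternary_quantize(w, sparsity_thresh=0.7):
    """Quantize weights to {-1, 0, +1} with one per-tensor scale."""
    scale = np.abs(w).mean()                       # absmean scale
    q = np.where(np.abs(w) < sparsity_thresh * scale, 0, np.sign(w))
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)
q, s = ternary_quantize(w)
w_hat = dequantize(q, s)
print("levels:", np.unique(q))
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

The point of the exercise: at this precision you’re no longer storing weights, you’re storing signs plus a sparsity mask, which is why the papers attached to the release spend so much time on where the remaining information should live.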
Open Source Releases
- TabPFN — Foundation model for tabular data with fast inference and strong results on small-to-medium datasets, no heavy tuning needed. If you’re tired of “just throw XGBoost at it,” this is worth a look.
- Claude Code v2.1.129 — Adds a --plugin-url flag for fetching plugin zips at runtime, a CLAUDE_CODE_FORCE_SYNC_OUTPUT=1 env var to force synchronous output when terminal auto-detection is busted (looking at you, Emacs), and package manager auto-update support. Quality-of-life stuff that actually matters when you’re trying to get work done. 🔥
- DGLZ 0.1.10.post4 — EEG emotion recognition library using CNN-Transformer encoders with hierarchical classifiers. If you’re building biosignal pipelines, this is niche but well-structured.
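The hierarchical-classifier idea in DGLZ generalizes beyond EEG, so here’s a minimal sketch of coarse-then-fine routing. The label sets and linear heads below are made up for illustration and are not DGLZ’s actual API:

```python
import numpy as np

# Hypothetical two-stage head: a coarse classifier picks valence, then a
# fine classifier conditioned on that branch picks a specific emotion.
COARSE = ["negative", "neutral", "positive"]
FINE = {
    "negative": ["anger", "fear", "sadness"],
    "neutral": ["calm"],
    "positive": ["joy", "surprise"],
}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hierarchical_predict(features, coarse_w, fine_ws):
    """Route one feature vector through the coarse head, then the chosen fine head."""
    coarse_p = softmax(features @ coarse_w)
    branch = COARSE[int(np.argmax(coarse_p))]
    fine_p = softmax(features @ fine_ws[branch])
    return branch, FINE[branch][int(np.argmax(fine_p))]

rng = np.random.default_rng(0)
d = 16
coarse_w = rng.normal(size=(d, len(COARSE)))
fine_ws = {b: rng.normal(size=(d, len(FINE[b]))) for b in FINE}
branch, emotion = hierarchical_predict(rng.normal(size=d), coarse_w, fine_ws)
print(branch, emotion)
```

The design win is that fine heads only ever compete within their branch, which matters when the coarse distinction (valence) is much easier to learn from biosignals than the fine one.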
Research Worth Reading
- eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization — Frames the KV cache as a low-rank shared context plus full-rank per-token residual under the spiked random matrix model, then applies optimal singular value shrinkage and quantization. Two-stage pipeline that actually delivers near-lossless compression. If you’re squeezing inference on long-context models, this is the paper to read. 📄
- When Safety Geometry Collapses: Fine-Tuning Vulnerabilities in Agentic Guard Models — Guard models like LlamaGuard and Granite Guardian can lose all safety alignment through standard domain specialization alone, even when fine-tuned on benign data. A real problem for anyone building agentic pipelines that rely on guardrails. Slightly depressing but important. 🔥
- StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing — Fully self-contained lossless compressor using a Mamba SSM with sparse n-gram mixing and arithmetic coding. No pre-trained weights, no GPU, no external deps — trains token-by-token from scratch. Elegant piece of work if you ever need compression that actually runs on a laptop. 🛠️
- On the Invariants of Softmax Attention — Defines the “energy field” (row-centered attention logit) and identifies two classes of invariants in softmax attention that hold across models, architectures, and inputs. Gives you something more rigorous than “attention is like a key-value lookup” hand-waving. 📄
- Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR — Studies how systematic (not just random) verifier errors affect RLVR training for LLM reasoning. Directly relevant if you’re using code verifiers as reward signals and wondering why your model isn’t getting better. Practical implications for anyone training with verifiable rewards. 🔥
- Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning — Covers how trajectories are sampled in LLM RL, from prompt to termination, including reasoning steps and tool interactions. The rollout strategy shapes what the optimizer learns from, so this is a genuinely important engineering decision. Worth pinning. 📄
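The “energy field” definition in the softmax-invariants paper is easy to sanity-check yourself: softmax is invariant to per-row shifts of the logits, so only the row-centered logits carry information. A minimal numpy check (the setup is mine, not the paper’s):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))                        # one attention row per query
energy = logits - logits.mean(axis=-1, keepdims=True)   # row-centered logits
shifted = logits + rng.normal(size=(4, 1))              # arbitrary per-row shift

# Row shifts cancel inside softmax, so all three give identical attention weights.
assert np.allclose(softmax(logits), softmax(energy))
assert np.allclose(softmax(logits), softmax(shifted))
print("softmax depends only on the row-centered logits")
```

That shift invariance is the entry point; the paper’s contribution is classifying the invariants that follow from it across models and inputs.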
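If you want a feel for the eOptShrinkQ decomposition before reading the paper, here’s a deliberately simplified sketch: recover a low-rank “shared context” via singular value shrinkage, then uniformly quantize the full-rank residual. The soft-threshold rule and per-tensor bit allocation below are crude stand-ins, not the paper’s optimal estimators:

```python
import numpy as np

def compress_kv(kv, rank=8, bits=4):
    """Two-stage compression: shrunk low-rank part + quantized residual."""
    u, s, vt = np.linalg.svd(kv, full_matrices=False)
    s_shrunk = np.maximum(s - s[rank], 0.0)            # soft-threshold stand-in for optimal shrinkage
    low_rank = (u[:, :rank] * s_shrunk[:rank]) @ vt[:rank]
    residual = kv - low_rank                           # full-rank per-token part
    scale = np.abs(residual).max() / (2 ** (bits - 1) - 1)
    q = np.round(residual / scale).astype(np.int8)     # uniform quantization
    return low_rank, q, scale

# Synthetic "KV cache": a shared low-rank context plus per-token noise,
# matching the spiked-model assumption in spirit.
rng = np.random.default_rng(0)
kv = rng.normal(size=(128, 4)) @ rng.normal(size=(4, 64)) \
     + 0.05 * rng.normal(size=(128, 64))
low_rank, q, scale = compress_kv(kv)
kv_hat = low_rank + q.astype(np.float32) * scale
rel_err = np.linalg.norm(kv - kv_hat) / np.linalg.norm(kv)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Even this crude version reconstructs the synthetic cache to well under a percent of error, which is the intuition behind “near-lossless”: once the shared context is captured at full precision, the residual tolerates aggressive quantization.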
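The random-vs-systematic distinction in the RLVR paper is worth internalizing, and a toy simulation (the setup is mine, not the paper’s) shows why: symmetric random verifier flips shrink the reward gap between two policies but preserve the ranking, while a systematic false-negative rate on just one policy can invert it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true_pass = {"A": 0.6, "B": 0.5}        # policy A is genuinely better

def mean_verified_reward(policy, random_flip=0.0, systematic_fn=0.0):
    correct = rng.random(n) < true_pass[policy]
    r = correct.astype(float)
    flip = rng.random(n) < random_flip
    r[flip] = 1.0 - r[flip]             # symmetric random verifier error
    fn = correct & (rng.random(n) < systematic_fn)
    r[fn] = 0.0                         # systematic false negatives
    return r.mean()

# Random error: gap shrinks (0.56 vs 0.50 in expectation) but A still wins.
a_rand = mean_verified_reward("A", random_flip=0.2)
b_rand = mean_verified_reward("B", random_flip=0.2)
# Systematic false negatives on A alone: A now looks worse (0.42 vs 0.50).
a_sys = mean_verified_reward("A", systematic_fn=0.3)
b_sys = mean_verified_reward("B")
print(a_rand > b_rand, a_sys < b_sys)
```

That inversion is exactly the failure mode to watch for with code verifiers: a test harness that systematically rejects one style of correct solution doesn’t just add noise, it points the optimizer in the wrong direction.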
AI Dev Tools
- InsForge — Postgres-based backend platform for coding agents: auth, storage, compute, hosting, AI gateway in one. If you’re building an agent platform and don’t want to glue together six different services, this might save you a week.
- Claude Code v2.1.131 — Fixes VS Code extension activation failure on Windows (hardcoded SDK path causing a createRequire polyfill bug) and Mantle auth failures from a missing x-api-key header. Unsexy bugs, but the kind that eat an afternoon if you hit them.
Today’s Synthesis
If you’re deploying Sulphur-2-base for text-to-video generation, pair it with eOptShrinkQ’s two-stage KV cache compression to cut memory usage on long prompts without tanking quality — the paper reports near-lossless results under the spiked random matrix model, and video generation’s long sequential attention is exactly the regime where KV memory hurts. The two-stage pipeline — optimal singular value shrinkage followed by quantization — is what makes the compression near-lossless, not the usual 4-bit hand-waving you see in other papers. Then use Claude Code v2.1.129’s new --plugin-url flag to fetch the plugins you need at runtime, and set CLAUDE_CODE_FORCE_SYNC_OUTPUT=1 to fix the Emacs terminal issue that’s eaten more hours than anyone admits. The result is a deployable text-to-video stack that fits on hardware you actually have, instead of needing a dedicated GPU cluster for every 30-second clip. eOptShrinkQ, Sulphur-2-base, Claude Code v2.1.129.