Tenkai Daily — May 5, 2026
Model Releases
IBM Granite 4.1: 8B and 30B Models — IBM dropped Granite 4.1 in 8B and 30B flavors, Apache 2.0 licensed with Azure deployment hooks and eval numbers published. Solid option if you’re already in the IBM ecosystem or want a commercially clean license without the usual caveats.
InclusionAI Ling-2.6-Flash Released — A lightweight text gen model from InclusionAI built on the Bailing Hybrid architecture, MIT licensed. The “flash” branding suggests speed-optimized inference — worth benchmarking if you’re comparing small-model throughput.
Poolside Laguna-XS.2 Released — Poolside’s conversational model update with vLLM support and Apache 2.0 licensing. The vLLM compatibility is the practical win here — drop-in serving without custom infra gymnastics.
Open Source Releases
Claude Code v2.1.128 — MCP tool count visibility so you can see which connected servers are actually pulling their weight (or returning 0 tools), .zip plugin support for --plugin-dir, random session colors, and --channels console improvements. Small quality-of-life patches that add up 🛠️
OpenCode v1.14.37 — Task cancellation now propagates to child subtasks, killing the orphaned background work problem. v2 sessions get cleaner tool states, better compaction summaries, and warp-to-workspace functionality. The cancellation fix alone is worth the update.
code-outline-graph 0.2.35 — Symbol-level code indexer served via MCP, built for token-efficient AI-assisted editing. The confirm-before-read flow is a smart touch — it keeps context windows lean by not pulling code until the agent actually needs it.
puvinoise-sdk 0.3.55 — A vendor-neutral OpenTelemetry SDK for AI agent observability with built-in Anthropic, OpenAI, and Ollama support. If you’re tired of stitching together vendor-specific tracing, this gives you one telemetry layer for your whole agent stack.
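For readers wondering what "cancellation propagates to child subtasks" looks like in practice (the OpenCode v1.14.37 item above), here is a minimal sketch of the general pattern using Python's asyncio. It is not OpenCode's implementation; the names child, parent, and the subtask labels are hypothetical and exist only for illustration.

```python
import asyncio

async def child(name, cancelled):
    try:
        await asyncio.sleep(60)  # stand-in for long-running background work
    except asyncio.CancelledError:
        cancelled.append(name)   # record that this subtask was told to stop
        raise                    # re-raise so the cancellation is not swallowed

async def parent(cancelled):
    # Cancelling the task that awaits gather() also cancels the gathered children,
    # so no subtask is left running as an orphan.
    await asyncio.gather(child("subtask-a", cancelled), child("subtask-b", cancelled))

async def main():
    cancelled = []
    task = asyncio.create_task(parent(cancelled))
    await asyncio.sleep(0.01)    # give the children time to start
    task.cancel()                # cancel only the parent
    try:
        await task
    except asyncio.CancelledError:
        pass
    return sorted(cancelled)

result = asyncio.run(main())
print(result)
```

Only the parent is cancelled explicitly; both children still receive the cancellation, which is the behavior that prevents orphaned background work.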
Research Worth Reading
Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents 📄 — Finds that tool-augmented reasoning doesn’t always beat plain chain-of-thought, especially with semantic distractors in play. The “tool-use tax” framing is useful: it forces you to measure whether adding a tool actually helps for your specific task rather than assuming it does.
AgentFloor: Tool Use Capability of Small Open-Weight Models 📄 — Maps which agent workflow steps actually need frontier models vs. which ones small open-weight models handle fine. Directly actionable for cost optimization — route the easy steps to cheaper models and reserve the big guns for what matters.
Minimal, Local, Causal Explanations for Jailbreak Success in LLMs 📄 — Moves past “this prompt bypassed safety” to actually explaining why at a causal, localized level. If you’re doing red-teaming or building safety layers, this is the kind of rigor the field needs more of.
TUR-DPO: Topology- and Uncertainty-Aware Direct Preference Optimization 📄 — Upgrades DPO by modeling preference topology and uncertainty instead of treating preferences as flat binary pairs. Should make preference-tuning pipelines more robust to noisy preference data, which describes most preference data.
AEM: Adaptive Entropy Modulation for Multi-Turn Agentic RL 📄 — Tackles credit assignment in multi-turn RL by adaptively modulating entropy, specifically for sparse outcome-only rewards. If you’re training agents on complex multi-step tasks where intermediate feedback is thin, this is worth a close read.
Thinking in Text and Images: Interleaved Vision-Language Reasoning for Robot Manipulation 📄 — Interleaved text-and-image reasoning traces for long-horizon robot manipulation, blending CoT causal ordering with spatial visual prediction. Relevant for anyone building VLA policies or robot planning systems.
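For context on the "flat binary pairs" that the TUR-DPO item above wants to move beyond, here is a minimal sketch of the standard DPO loss for a single (chosen, rejected) pair. This is the baseline objective only, not TUR-DPO's method; the function name, argument names, and the beta default are illustrative assumptions.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, from sequence log-probabilities."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, measured relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)), written as a numerically stable softplus(-z)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# When the policy matches the reference, the margin is zero and the loss is log(2).
print(dpo_pair_loss(-5.0, -5.0, -5.0, -5.0))
```

Every pair contributes through the same scalar margin, with no notion of how reliable the label is or how pairs relate to one another. That is the gap the paper's topology and uncertainty modeling targets.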
AI Dev Tools
OpenCode v1.14.34 — PTY websocket auth tickets for more reliable terminal connections, v2 failure events so clients can surface failed runs properly, better shell handling across Bash/PowerShell/cmd, and structured HTTP error bodies. Meaningful reliability improvements across the board 🛠️
msitarzewski/agency-agents — A curated collection of specialized AI agents covering roles from frontend dev to community management, each with defined personalities, processes, and deliverables. More useful as a design reference for role-specific agent architectures than as a plug-and-play toolkit.
Today’s Synthesis
A few threads worth weaving together from today’s batch. If you’re building agent workflows and care about cost, start with AgentFloor: it gives you a framework for deciding which steps in your pipeline actually need a frontier model and which ones a small open-weight model handles just fine. That cost-saving routing only works, though, if you’re honest about whether each tool in the loop is actually pulling its weight. That’s where the Tool-Use Tax paper comes in: it shows tool-augmented reasoning can underperform plain chain-of-thought when semantic distractors creep in. So the play is: map your workflow with AgentFloor’s lens, then benchmark each tool-augmented step against a no-tool baseline before committing. And if you’re running multi-turn agents where feedback is sparse, AEM’s adaptive entropy modulation might help with the credit assignment problem that makes those long-horizon tasks so painful to train. Three papers, one coherent agent-building checklist.
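The checklist above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a method taken from either paper: the StepEval record, the "small" and "frontier" tier labels, the step names, and every accuracy and cost number are made-up placeholders that you would replace with measurements from your own eval set.

```python
from dataclasses import dataclass

@dataclass
class StepEval:
    step: str         # workflow step name (hypothetical)
    model: str        # model tier label, e.g. "small" or "frontier" (hypothetical)
    uses_tool: bool   # whether this configuration calls a tool
    accuracy: float   # measured on your own eval set
    cost: float       # relative cost per call

def tool_use_tax(evals, step, model):
    """No-tool accuracy minus tool accuracy; a positive value means the tool hurts."""
    by_tool = {e.uses_tool: e.accuracy
               for e in evals if e.step == step and e.model == model}
    return by_tool[False] - by_tool[True]

def cheapest_config(evals, step, min_accuracy):
    """Lowest-cost configuration for a step that clears the accuracy bar, or None."""
    ok = [e for e in evals if e.step == step and e.accuracy >= min_accuracy]
    return min(ok, key=lambda e: e.cost) if ok else None

# Placeholder numbers for illustration only, not results from any paper.
evals = [
    StepEval("extract", "small", False, 0.91, 1.0),
    StepEval("extract", "small", True, 0.88, 1.5),
    StepEval("extract", "frontier", False, 0.95, 10.0),
    StepEval("extract", "frontier", True, 0.96, 12.0),
    StepEval("plan", "small", False, 0.62, 1.0),
    StepEval("plan", "small", True, 0.70, 1.5),
    StepEval("plan", "frontier", False, 0.89, 10.0),
    StepEval("plan", "frontier", True, 0.93, 12.0),
]

for step in ("extract", "plan"):
    print(step, cheapest_config(evals, step, min_accuracy=0.9))
```

With these placeholder numbers, the "extract" step routes to the small model without a tool, while "plan" still needs the frontier tier. The point is that both decisions come from measured baselines rather than assumptions.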