Model Releases

  • google/gemma-4-26B-A4B-it-assistant — A 26B MoE (4B active) any-to-any multimodal assistant from Google, Apache 2.0. The MoE architecture keeps inference costs reasonable while still handling text, images, and audio. Endpoints compatibility means it slots into existing serving infra without drama. 🤖

Open Source Releases

  • Claude Code v2.1.132 — Adds CLAUDE_CODE_SESSION_ID env var for session tracking in Bash subprocesses, CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN=1 to kill fullscreen rendering, and a --plugin-url flag to pull plugin zips from a remote URL. Also includes CLAUDE_CODE_FORCE_SYN — small quality-of-life fixes that add up if you live in this tool. 🛠️
  • opencode v1.14.40 — Supports .well-known/opencode config files pointing to remote configs, so you can centralize configuration across teams. Also fixes assistant text preservation when replaying signed reasoning blocks and normalizes not-found errors for missing sessions.
  • optillm 0.3.15 — An optimizing inference proxy that sits between your clients and LLM backends, routing and managing traffic across them. If you’re running production LLM services and care about throughput and cost, this is worth a look.
  • pydtnn 3.8.6 — Python library for distributed neural network training across multiple nodes. Straightforward tooling for engineers scaling training jobs beyond a single machine.
  • cheahjs/free-llm-api-resources — A curated list of free LLM inference APIs. Useful for prototyping and experimentation when you don’t want to burn credits — or when your expense report is already questionable.
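A tiny sketch of how the new CLAUDE_CODE_SESSION_ID variable from the Claude Code release might be consumed by a script running in a Bash subprocess — only the env var name comes from the release notes; the tagging helper and its format are my own:

```python
import os

def session_tag(default="no-session"):
    """Return a short tag for log lines, using the session id that
    Claude Code exposes to Bash subprocesses when available.
    The bracketed 8-char format is an illustrative choice."""
    sid = os.environ.get("CLAUDE_CODE_SESSION_ID")
    if sid is None:
        return f"[{default}]"
    return f"[{sid[:8]}]"
```

Handy for correlating output from multiple tool invocations back to the session that spawned them.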
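The proxy layer optillm occupies is easiest to see in a toy dispatcher — this is not optillm’s actual logic; the backend URLs, model-name prefix convention, and length threshold are all made up for illustration:

```python
def pick_backend(model: str, prompt: str) -> str:
    """Route a request to a cheap small-model backend or a frontier
    backend. Short or explicitly 'small-' requests go to the cheap
    server; everything else gets the expensive one."""
    if model.startswith("small-") or len(prompt) < 200:
        return "http://localhost:8001/v1"      # hypothetical small-model server
    return "https://api.frontier.example/v1"   # hypothetical frontier endpoint
```

The point of centralizing this decision in a proxy is that clients stay dumb: they speak one API, and cost/latency policy lives in one place.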

Research Worth Reading

  • Terminus-4B — Asks whether a 4B-parameter model can handle subagent tasks like search and debugging, keeping a frontier model’s context window free for the heavy reasoning.
  • Learning Correct Behavior from Examples — Shows that sequential agent execution can be validated from as few as 2-10 passing examples, with no manual specs required.

Today’s Synthesis

The thread running through today’s picks is smaller models doing serious work in constrained roles. Terminus-4B asks whether a 4B parameter model can handle subagent tasks like search and debugging to keep a frontier model’s context window free — and the answer is “surprisingly often, yes.” That pairs naturally with optillm 0.3.15, an optimizing inference proxy that lets you route and manage LLM traffic between clients and backends. If you’re building a multi-agent system where a cheap small model handles the grunt work and a larger model does the heavy reasoning, you need exactly this kind of proxy layer to keep costs predictable and latency in check.

Meanwhile, Learning Correct Behavior from Examples shows you can validate sequential agent execution from as few as 2-10 passing examples — no manual specs required. Put these together: a practical recipe for building multi-agent pipelines where small models are validated cheaply, routed intelligently, and kept on a tight leash. The frontier model stays in reserve for what it’s actually needed for.
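The few-example validation idea can be sketched like this — a deliberately crude stand-in, not the paper’s method; the per-position representation and function names are assumptions:

```python
def learn_allowed_steps(passing_traces):
    """Infer a minimal 'spec' from a handful of passing traces:
    for each position in the sequence, record every action that
    some passing example took there."""
    allowed = {}
    for trace in passing_traces:
        for i, action in enumerate(trace):
            allowed.setdefault(i, set()).add(action)
    return allowed

def validate(trace, allowed):
    """A new trace passes if every step was observed at that
    position in at least one passing example."""
    return all(action in allowed.get(i, set())
               for i, action in enumerate(trace))
```

Even with only a few examples, a check this simple catches the worst failure mode: a subagent wandering into actions no passing run ever took.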