Model Releases

MiniMaxAI/MiniMax-M3 — Multimodal MoE from MiniMax that handles image-text-to-text, video understanding, coding, and agent tasks. Paper at arXiv:2606.13392 if you want the details. Custom code required, but endpoints-compatible.

nex-agi/Nex-N2-Pro — Qwen3.5-based MoE for image-text-to-text, Apache 2.0 with published evals. Targets multimodal chat and generation. Endpoints-compatible, so plug-and-play if that’s your stack.

Open Source Releases

LMCache/LMCache — High-performance KV cache layer for LLM inference. Reuses and manages key-value caches across requests to cut redundant computation. If you’re serving models at scale, this is the kind of infra that actually moves the needle.

django-cfg 2.2.70 — Replaces settings.py with Pydantic v2 models, throws in a Next.js admin, WebSockets, and 8 enterprise apps. Claims 90% less config code and auto-generated TS clients. Sounds like a lot, but validated config is a genuine pain point.

kagura-code-reviewer 0.3.0 — Free code review agent running on local Ollama, integrates with Claude Code. No API costs, runs offline. Useful if you’re privacy-conscious or just cheap.

RedStone MCP — Real-time and historical crypto prices across 1,300+ assets, 110+ chains, 50+ DEX / 30+ CEX sources. $4.9B TVS. No API key needed. Built for agent-driven DeFi pipelines — finally, oracle data that doesn’t require a contract negotiation.

Claude Code v2.1.176 — Multilingual session titles (finally), footerLinksRegexes for custom footer badges, and better Bedrock credential caching. Quality-of-life stuff that matters if you live in this tool daily.

cobol-bridge-mcp 1.1.10 — Parses COBOL and maps to Python, Java, or Go with test harnesses and migration plans. Targets banks, insurers, government — the places where legacy mainframes go to die slowly. 🏦

AI Dev Tools

rtk-ai/rtk — CLI proxy that cuts LLM token consumption 60-90% on common dev commands. Single Rust binary, zero deps, 62K+ stars. Built for agentic coding workflows where token bills add up fast.

JuliusBrussee/caveman — Claude Code skill using “caveman-style” minimal prompts to slash token usage ~65%. 72K+ stars. Proof that prompt engineering isn’t magic — sometimes you just need to stop being verbose. 🤷‍♂️

Today’s Synthesis

The token-cost problem has three distinct attack surfaces, and today’s releases hit all of them. rtk sits at the proxy layer and claims a 60-90% cut in token consumption on common dev commands. caveman works the prompt-engineering angle: “caveman-style” minimal prompts cut usage by a reported ~65%. LMCache tackles the inference engine itself, reusing KV caches across requests to eliminate redundant computation.

Stack them: caveman for prompt hygiene, rtk as a drop-in CLI proxy for agentic workflows, and LMCache on your inference servers if you self-host. The reductions stack because each tool works on a different part of the bill: prompts shrink, command output shrinks before it reaches the model, and on self-hosted models repeated context reuses cached KV instead of being recomputed. If you’re running Claude Code or similar agents at team scale, the first two are cheap to try, and the claimed numbers are easy to check against your own token bills. The best part? All three are open source with zero vendor lock-in, and rtk ships as a single Rust binary. 🛠️
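To put rough numbers on the stacking claim, here is a minimal back-of-the-envelope sketch. It assumes the prompt-side cut (caveman's ~65%) and the command-output cut (the low end of rtk's 60-90%) apply to separate token streams. The `Workload` split, the function names, and the 50% cache hit rate are all hypothetical placeholders, not figures from any of the three projects. A KV cache hit is modeled only as prefill work avoided, not as fewer tokens sent.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical per-session token counts, split by where the tokens come from."""
    prompt_tokens: float        # instructions and skill text sent to the model
    tool_output_tokens: float   # shell command output fed back to the model


def remaining_tokens(w: Workload, prompt_cut: float, tool_cut: float) -> float:
    """Tokens left after cutting each stream by its own fraction (0.0 to 1.0)."""
    return (w.prompt_tokens * (1 - prompt_cut)
            + w.tool_output_tokens * (1 - tool_cut))


def overall_reduction(w: Workload, prompt_cut: float, tool_cut: float) -> float:
    """Fraction of the original token total that was removed."""
    total = w.prompt_tokens + w.tool_output_tokens
    return 1 - remaining_tokens(w, prompt_cut, tool_cut) / total


def prefill_tokens_to_compute(tokens: float, cache_hit_rate: float) -> float:
    """Tokens whose prefill must still be computed when a share hits a KV cache."""
    return tokens * (1 - cache_hit_rate)


if __name__ == "__main__":
    # Made-up split: 40% of tokens are prompts, 60% are command output.
    w = Workload(prompt_tokens=40_000, tool_output_tokens=60_000)
    left = remaining_tokens(w, prompt_cut=0.65, tool_cut=0.60)
    print(f"tokens left: {left:,.0f}")
    print(f"overall reduction: {overall_reduction(w, 0.65, 0.60):.0%}")
    print(f"prefill still computed: {prefill_tokens_to_compute(left, 0.5):,.0f}")
```

With this made-up 40/60 split, roughly 62% of the tokens go away. That is less than multiplying the headline percentages would suggest, because each cut only touches its own stream. Swap in your own split and measured cuts before trusting any total.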