Tenkai Daily — May 8, 2026
Model Releases
- ZAYA1-8B — Apache-2.0-licensed 8B model from Zyphra, linked to arxiv:2605.05365, with eval results and safetensors weights. A decent option if you want an open-weight model with an actual technical report attached.
Open Source Releases
- PageIndex: Document Index for Vectorless, Reasoning-based RAG — Replaces vector embeddings with a structured document index built for reasoning-based retrieval. If you’ve been knee-deep in embedding pipelines, this is worth poking at: the premise of ditching vectors entirely is either refreshing or naive. 🛠️
- OpenHands Agent Server 1.21.0 — REST/WebSocket interface for the OpenHands AI Agent, with openhands-tools and openhands-sdk at the same version. Lets you script and integrate AI agents via boring web protocols instead of fighting someone’s custom API.
- AgenticAI Framework 3.0.2 — Production-grade Python SDK for multi-agent systems with orchestration, monitoring, and observability built in. If your team is actually deploying agents to production, this targets that space without pretending it’s a weekend hack project. 🛠️
- ZettaBrain RAG 0.5.4 — Local RAG pipeline with a web GUI and zero cloud dependency. Supports NFS, SMB, and object storage — designed for air-gapped or privacy-sensitive setups where you can’t ship docs to some API. 📄
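PageIndex’s vectorless premise above can be sketched in a few lines: the “index” is a tree of section titles, and a reasoner walks the tree instead of comparing embeddings. This is a toy, not PageIndex’s actual API; the index structure and the keyword-overlap “reasoner” standing in for an LLM are both made up:

```python
# Toy sketch of reasoning-based retrieval over a document index.
# A real system would have an LLM pick the branch; keyword overlap
# is a crude stand-in so the sketch runs on its own.
index = {
    "Annual Report": {
        "Financials": {"Revenue": "Revenue grew 12% to $4.2B...",
                       "Expenses": "Opex rose 8%..."},
        "Risk Factors": {"Supply Chain": "We depend on a single supplier..."},
    }
}

def score(title, query):
    # stand-in "reasoner": word overlap between section title and query
    return len(set(title.lower().split()) & set(query.lower().split()))

def retrieve(node, query):
    if isinstance(node, str):   # leaf: the passage itself
        return node
    best = max(node, key=lambda t: score(t, query))  # pick a branch, descend
    return retrieve(node[best], query)

print(retrieve(index["Annual Report"], "what was revenue growth"))
# Revenue grew 12% to $4.2B...
```

The interesting property is that retrieval cost scales with tree depth, not corpus size, and no embedding model ever runs.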
Research Worth Reading
- Are Flat Minima an Illusion? — Function-preserving reparameterization can inflate the Hessian by two orders of magnitude without changing predictions. Either flat minima are less meaningful than we thought, or our measurements were always lying; either way, this matters for SAM and related sharpness-aware optimization methods. 📄
- ZAYA1-8B Technical Report — MoE model with 700M active / 8B total params on Zyphra’s MoE++ architecture. Full pretraining pipeline on AMD hardware. Under 1B active params during inference, which is the number that actually matters for your GPU bill. 🤖
- BALAR: A Bayesian Agentic Loop for Active Reasoning — Gives LLM agents a principled Bayesian framework to reason about missing info and decide what to ask next in multi-round dialogue. Addresses the “I’ll just hallucinate an answer” problem with actual information-seeking behavior. 📄
- PRISM: Perception Reasoning Interleaved for Sequential Decision Making — Couples perception and reasoning for LLM-based embodied agents by interleaving the two processes instead of doing perception first, reason second. Tackles a real gap in VLMs that treat perception as a one-shot preprocessing step. 📄
- Agentic Retrieval-Augmented Generation for Financial Document Question Answering — Multi-step numerical reasoning over tables, narratives, and footnotes scattered across corporate filings. Moves past single-pass retrieve-then-generate pipelines that choke on composition. If you work with financial docs, this is relevant. 📄
- SAT: Sequential Agent Tuning for Coordinator-Free Plug-and-Play Multi-LLM Training — Trains teams of smaller LLMs to match or beat a single large model, with monotonic improvement guarantees. No central coordinator needed, which is a meaningful practical constraint. Addresses compounding distribution shift that kills multi-agent training. 🤖
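The reparameterization behind the flat-minima result can be demonstrated with the classic ReLU scaling symmetry: for f(x) = w2·relu(w1·x), sending (w1, w2) to (a·w1, w2/a) preserves every prediction but rescales curvature. A toy sketch of the general phenomenon, not the paper’s specific construction:

```python
# Scaling symmetry demo: same function, very different curvature.
def relu(z):
    return max(z, 0.0)

def f(w1, w2, x):
    return w2 * relu(w1 * x)

def loss(w1, w2, x=1.0, y=0.0):
    return (f(w1, w2, x) - y) ** 2

def curvature_w2(w1, w2, eps=1e-4):
    # finite-difference second derivative of the loss w.r.t. w2
    return (loss(w1, w2 + eps) - 2 * loss(w1, w2) + loss(w1, w2 - eps)) / eps**2

w1, w2 = 1.0, 1.0
alpha = 0.1
w1p, w2p = alpha * w1, w2 / alpha   # function-preserving reparameterization

assert abs(f(w1, w2, 1.0) - f(w1p, w2p, 1.0)) < 1e-12   # identical predictions
print(curvature_w2(w1, w2))     # ~2.0
print(curvature_w2(w1p, w2p))   # ~0.02: curvature along w2 shrinks 100x
```

Along w1 the curvature grows by the same factor of 100, so a single scalar a picks how sharp the minimum looks — which is exactly why Hessian-based flatness measures are suspect.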
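The 700M-active / 8B-total arithmetic in the ZAYA1 report comes from sparse expert routing: only the selected experts’ weights run per token. A minimal top-1 router with toy shapes (this is the generic MoE mechanism, not ZAYA1’s MoE++ architecture):

```python
import numpy as np

# Toy mixture-of-experts layer: a router scores experts per token and
# only the top-k experts' parameters are touched in the forward pass.
rng = np.random.default_rng(0)
n_experts, d = 8, 16
experts = rng.standard_normal((n_experts, d, d))  # one weight matrix per expert
router = rng.standard_normal((d, n_experts))

def moe_forward(x, k=1):
    logits = x @ router
    top = np.argsort(logits)[-k:]                 # indices of top-k experts
    return sum(x @ experts[i] for i in top) / k, top

x = rng.standard_normal(d)
_, chosen = moe_forward(x)

active = chosen.size * d * d   # params actually used for this token
total = n_experts * d * d      # params stored on disk / in memory
print(active / total)          # 0.125: only 1/8 of expert params ran
```

That active/total ratio is the number driving the “under 1B active params” claim: FLOPs per token track the active count while the total count only costs memory.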
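The information-seeking idea behind BALAR can be illustrated with a toy Bayesian loop: keep a posterior over user-intent hypotheses and ask whichever question minimizes expected posterior entropy. This is a generic expected-information-gain sketch, not BALAR’s actual algorithm; the hypotheses and answer model below are invented:

```python
import math

# Hypotheses about what the user actually wants, with a uniform prior.
hypotheses = ["refund", "exchange", "tracking"]
prior = {h: 1 / 3 for h in hypotheses}

# Answer model: P(answer = yes | hypothesis) for each candidate question.
questions = {
    "Did you already receive the item?": {"refund": 0.9, "exchange": 0.9, "tracking": 0.1},
    "Is the item damaged?":              {"refund": 0.5, "exchange": 0.6, "tracking": 0.5},
}

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def expected_posterior_entropy(q):
    # Average over both possible answers, weighting by answer probability.
    total = 0.0
    for ans_yes in (True, False):
        like = {h: questions[q][h] if ans_yes else 1 - questions[q][h] for h in hypotheses}
        p_ans = sum(prior[h] * like[h] for h in hypotheses)
        post = {h: prior[h] * like[h] / p_ans for h in hypotheses}
        total += p_ans * entropy(post)
    return total

best = min(questions, key=expected_posterior_entropy)
print(best)  # the question that best separates "tracking" from the rest
```

The point is the selection criterion: a hallucinating agent answers under the prior, while an information-seeking one asks the question with the largest expected entropy drop first.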
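The interleaving in PRISM can be reduced to a control-flow change: instead of perceiving once and then reasoning to completion, each reasoning step may request a fresh observation before committing to an action. A stubbed sketch of that loop (the environment, `perceive`, and `reason` are all stand-ins, not PRISM’s components):

```python
# Interleaved perception/reasoning loop, stubbed end to end.
def perceive(env, query):
    return env.get(query, "unknown")   # stand-in for a VLM observation call

def reason(belief):
    # Decide either to look again or to act, based on current belief.
    if "door" not in belief:
        return ("look", "door")
    return ("act", "open " + belief["door"])

env = {"door": "red door, closed"}
belief, trace = {}, []
for _ in range(5):                      # bounded loop instead of while-True
    kind, arg = reason(belief)
    trace.append((kind, arg))
    if kind == "act":
        break
    belief[arg] = perceive(env, arg)    # perception happens mid-reasoning
print(trace)
# [('look', 'door'), ('act', 'open red door, closed')]
```

A one-shot perceive-then-reason pipeline cannot express the first trace entry: the decision to look was itself a reasoning output.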
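Why single-pass retrieve-then-generate chokes on filings is easy to see concretely: a question like “year-over-year revenue growth” needs operands from two different sections, plus arithmetic to compose them. A toy multi-hop sketch (the filing data and section names are invented):

```python
# Two operands live in different filing sections; no single retrieved
# passage contains the answer, so the agent must plan hops and compose.
filings = {
    "income_statement_2025": {"revenue": 120.0},
    "income_statement_2024": {"revenue": 100.0},
}

def retrieve(section, field):
    return filings[section][field]

# hop 1 and hop 2: fetch each operand; hop 3: numeric composition
rev_now = retrieve("income_statement_2025", "revenue")
rev_prev = retrieve("income_statement_2024", "revenue")
growth = (rev_now - rev_prev) / rev_prev
print(f"{growth:.0%}")  # 20%
```

Agentic RAG moves the hop-planning step into the loop; a single-pass pipeline would have retrieved one passage and guessed.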
AI Dev Tools
- graphify: Codebase Knowledge Graph Skill for AI Coding Assistants — Turns code, SQL, docs, papers, images, videos into a queryable knowledge graph via GraphRAG with Leiden clustering. Works with Claude Code, Codex, OpenCode, Cursor, Gemini CLI. If your AI assistant keeps hallucinating about your codebase, this is the antidote. 🛠️
- Claude Code v2.1.133 — Adds a `worktree.baseRef` setting (`fresh` | `head`) to control whether worktrees branch from origin or from local HEAD. The default, `fresh`, changes `EnterWorktree`’s base back to origin, which gives you less surprising branch behavior. Minor but appreciated. 🔥
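Assuming the setting goes in Claude Code’s settings.json (the file location and flat key shape are assumptions here; only the key name and its `fresh`/`head` values come from the changelog), opting back into branching from local HEAD would look something like:

```json
{
  "worktree.baseRef": "head"
}
```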
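The knowledge-graph premise behind graphify reduces to typed edges between code entities plus edge-walking queries. A toy sketch with an invented schema (this is not graphify’s actual data model, and a real deployment would use GraphRAG with Leiden clustering rather than a flat edge list):

```python
# Minimal code knowledge graph: (source, relation, target) triples,
# queried by walking typed edges from a node.
edges = [
    ("auth.py::login", "calls", "db.py::get_user"),
    ("auth.py::login", "calls", "crypto.py::verify_hash"),
    ("api.py::POST /login", "routes_to", "auth.py::login"),
]

def neighbors(node, rel=None):
    # all targets reachable from `node`, optionally filtered by relation type
    return [dst for src, r, dst in edges if src == node and (rel is None or r == rel)]

print(neighbors("auth.py::login", "calls"))
# ['db.py::get_user', 'crypto.py::verify_hash']
```

An assistant querying this structure gets grounded answers (“login calls get_user”) instead of hallucinating call sites from token statistics.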
Today’s Synthesis
If you’re running RAG in an environment where shipping embeddings to a cloud API is a non-starter — think air-gapped, think regulated — ZettaBrain RAG 0.5.4 gives you a local pipeline with a web UI and NFS/SMB support, but you’re still stuck with the usual vector embedding model underneath. PageIndex offers a different path: build a structured document index that ditches embeddings entirely and relies on reasoning-based retrieval. Pair that with ZAYA1-8B , a 8B MoE model with a published report and under 1B active params at inference time, and you have a stack that’s local, smaller than you’d expect on the GPU bill, and not dependent on the embedding pipeline you’ve been maintaining. Whether PageIndex’s vectorless approach holds up in practice is still an open question, but it’s a concrete experiment worth running.