Tenkai Daily — March 25, 2026
Open Source Releases
- claude-code v2.1.83 adds managed-settings.d drop-ins and new hook events 🛠️ — Introduces a managed-settings.d/ directory for dropping in policy fragments that merge alphabetically with managed-settings.json. Adds CwdChanged and FileChanged hook events for reactive environment tweaks and a sandbox.failIfUnavailable flag to fail fast when required resources are missing.
- ghostaudit 1.0.0 🛠️ — Provides tamper‑evident auditing of AI workspaces using Merkle‑tree sealed findings and encrypted evidence capsules, targeting SOC 2 and HIPAA compliance by giving verifiable provenance for model outputs and data transformations.
- openai-agents 0.13.1 🛠️ — SDK that simplifies building agentic applications with abstractions for agent loops, tool usage, memory management, and sync/async execution against OpenAI’s API, letting developers compose complex behaviors with little boilerplate.
- llmhosts 0.7.0 🛠️ — Acts as a personal AI cloud: an intelligent proxy/router/cache for LLMs that deduplicates requests, selects models dynamically based on workload, and persists responses to cut latency and cost on repeated queries.
- microsoft/markitdown 🛠️ — Python library and CLI that converts Office documents, PDFs, and images into Markdown using format-specific converters, offering a unified API for normalizing content in documentation pipelines.
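The claude-code release above describes policy fragments in managed-settings.d/ that "merge alphabetically" with the base file. A minimal sketch of that mechanism, in Python: the directory name and the alphabetical ordering come from the release notes, but the exact merge semantics (recursive, last-fragment-wins) are an assumption for illustration.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def merge_settings(base: dict, fragment: dict) -> dict:
    """Merge a fragment over base; nested dicts merge recursively,
    scalar conflicts are won by the later fragment (assumed semantics)."""
    merged = dict(base)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged

def load_managed_settings(fragment_dir: Path, base_file: Path) -> dict:
    """Apply *.json fragments from a managed-settings.d/-style directory
    in alphabetical order on top of the base managed-settings.json."""
    settings = json.loads(base_file.read_text())
    for fragment_path in sorted(fragment_dir.glob("*.json")):
        settings = merge_settings(settings, json.loads(fragment_path.read_text()))
    return settings

with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "managed-settings.json").write_text(
        json.dumps({"sandbox": {"failIfUnavailable": False}, "model": "default"}))
    frag_dir = root / "managed-settings.d"
    frag_dir.mkdir()
    # Fragments apply in filename order: 10-policy.json, then 20-team.json.
    (frag_dir / "10-policy.json").write_text(
        json.dumps({"sandbox": {"failIfUnavailable": True}}))
    (frag_dir / "20-team.json").write_text(json.dumps({"model": "team-pinned"}))
    result = load_managed_settings(frag_dir, root / "managed-settings.json")
```

Numeric filename prefixes (10-, 20-) are the usual drop-in-directory convention for making the alphabetical order explicit.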
Research Worth Reading
- STEM Agent: A Self-Adapting, Tool-Enabled, Extensible Architecture for Multi-Protocol AI Agent Systems 📄 — Presents a modular architecture that dynamically adapts interaction protocols, tool integrations, and user models at runtime via a plugin system, supporting REST, WebSocket, gRPC and allowing hot‑swapping of tools.
- Scaling Attention via Feature Sparsity 📄 — Introduces feature‑sparse attention that learns to attend to only a subset of feature dimensions, reducing compute from O(n²d) to O(n²k) with k≪d while preserving accuracy, using differentiable sparsification compatible with standard transformer blocks.
- Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores 📄 — Proposes a lightweight uncertainty estimator that measures agreement between intermediate layer representations in a single forward pass; the score correlates with model confidence and detects out‑of‑distribution inputs with minimal overhead.
- Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models 📄 — Describes a training‑free hallucination detector that estimates the complexity of an LLM’s conditional distribution via sampling‑based transform costs, requiring only black‑box access and showing strong correlation with hallucination across factuality and summarization tasks.
- Benchmarking Multi-Agent LLM Architectures for Financial Document Processing: A Comparative Study of Orchestration Patterns, Cost-Accuracy Tradeoffs and Production Scaling Strategies 📄 — Benchmarks sequential, parallel, hierarchical, and mesh multi‑agent LLM setups on financial document extraction (XBRL, earnings calls), measuring cost, latency, accuracy, and scaling; finds hierarchical orchestration delivers the best cost‑accuracy tradeoff.
- Understanding LLM Performance Degradation in Multi-Instance Processing: The Roles of Instance Count and Context Length 📄 — Analyzes how throughput and latency drop when processing multiple independent instances, attributing the degradation to increased context length and memory contention, and provides empirical scaling laws across model sizes and hardware, recommending batching and context partitioning strategies.
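The feature-sparse attention paper above claims an O(n²d) → O(n²k) reduction by scoring over only k of d feature dimensions. A toy sketch of that scoring change, in pure Python: the fixed `dims` subset here stands in for the paper's learned, differentiable sparsification, which this does not implement.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def feature_sparse_attention(Q, K, V, dims):
    """Single-head attention whose QK scores sum over only the feature
    dimensions in `dims` (k of d), so scoring costs O(n^2 * k) rather
    than the dense O(n^2 * d)."""
    n, k = len(Q), len(dims)
    out = []
    for i in range(n):
        scores = [sum(Q[i][d] * K[j][d] for d in dims) / math.sqrt(k)
                  for j in range(n)]
        weights = softmax(scores)  # convex combination over the n value rows
        out.append([sum(w * V[j][t] for j, w in enumerate(weights))
                    for t in range(len(V[0]))])
    return out

# Toy example: d = 2 features, but only dimension 0 participates in scoring.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = feature_sparse_attention(Q, K, V, dims=[0])
```

Row 0 scores itself higher on the kept dimension and so leans toward V[0]; row 1 has a zero query on that dimension, so its scores tie and it averages the value rows evenly.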
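The intra-layer uncertainty paper above measures agreement between intermediate layer representations in a single forward pass. A hedged illustration of the general idea, not the paper's actual score: here agreement is just mean cosine similarity between consecutive layers' hidden states, with low agreement read as high uncertainty.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def layer_agreement_score(layer_states):
    """Mean cosine similarity between consecutive layers' representations
    of the same input; a proxy for the paper's intra-layer score."""
    sims = [cosine(a, b) for a, b in zip(layer_states, layer_states[1:])]
    return sum(sims) / len(sims)

# Representations that barely rotate across layers -> high agreement.
stable = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1]]
# Representations that flip direction layer to layer -> low agreement.
unstable = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
confident = layer_agreement_score(stable)
uncertain = layer_agreement_score(unstable)
```

Because the score only reuses activations already computed in the forward pass, the overhead is a handful of dot products per input, which matches the paper's "minimal overhead" framing.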
AI Dev Tools
- usestrix/strix 🛠️ — Deploys AI‑powered agents that autonomously probe applications for security vulnerabilities, suggest fixes, and validate remediation by blending offensive security techniques with LLM‑driven reasoning.
- agentscope-ai/agentscope 🛠️ — Framework for building, debugging, and observing AI agents with transparent logging, state inspection, and visualizable workflows, including built‑in tracing and performance metrics for trustworthy development.
- ruvnet/ruflo 🛠️ — Agent orchestration platform for Claude that enables deployment of intelligent multi‑agent swarms, autonomous workflow coordination, and conversational AI; features enterprise‑grade architecture, distributed swarm intelligence, RAG integration, and native Claude Code/Codex support.
- langgenius/dify 🛠️ — Production‑ready platform for developing agentic workflows, offering visual orchestration, API integration, and deployment tools for LLM‑based applications, with built‑in monitoring, versioning, and support for rapid prototyping and scaling.
- hsliuping/TradingAgents-CN 🛠️ — Chinese‑language enhancement of the TradingAgents multi‑agent LLM trading framework, providing localized documentation, examples, and agent configurations for Chinese financial markets while preserving the core multi‑agent architecture and adding region‑specific data sources.
- mvanhorn/last30days-skill 🛠️ — Skill that equips AI agents to research topics across Reddit, X, YouTube, Hacker News, Polymarket, and the wider web, then synthesize a grounded, cited summary; integrates scraping, retrieval, and summarization components for autonomous knowledge gathering.
MCP Servers & Integrations
- Browserbase 🔥 — Delivers cloud browser automation via Stagehand, letting LLMs interact with web pages, capture screenshots, and run parallel sessions at scale; enables AI agents to perform web‑based tasks such as form filling, data extraction, and UI testing without local infrastructure.
Today’s Synthesis
Engineers looking to cut LLM inference cost while keeping agents responsive can combine three recent releases. The new claude-code v2.1.83 🛠️ adds managed‑settings.d/ drop‑ins and hook events like CwdChanged and FileChanged, letting you inject configuration or trigger scripts the moment a project file changes. Pair that with the openai‑agents 0.13.1 🛠️ SDK, which gives you a ready‑made agent loop, tool abstraction, and sync/async execution against OpenAI’s API. Inside the agent loop, apply the insight from Scaling Attention via Feature Sparsity 📄: replace the dense self‑attention layer with a feature‑sparse variant that learns to attend to only a small subset of dimensions (k≪d), cutting the O(n²d) cost to O(n²k) with little accuracy loss; note that this step assumes you serve a model whose attention implementation you control, rather than a hosted API. By configuring claude‑code to reload the agent’s settings via a hook whenever the codebase changes, you can hot‑swap the sparse‑attention model in real time, giving you a self‑tuning, low‑latency agent pipeline that adapts to workload shifts and reduces compute bills.
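The hook-driven reload described above might look like the following settings fragment. Treat it as a sketch: the FileChanged event and sandbox.failIfUnavailable flag come from the v2.1.83 release notes, the matcher/command entry shape follows the existing Claude Code hooks format, and the reload script path is hypothetical.

```json
{
  "hooks": {
    "FileChanged": [
      {
        "matcher": "**/*.py",
        "hooks": [
          { "type": "command", "command": "./scripts/reload-agent-settings.sh" }
        ]
      }
    ]
  },
  "sandbox": { "failIfUnavailable": true }
}
```

Dropped into managed-settings.d/, a fragment like this would merge with the base policy rather than replace it, so the reload behavior can be added or removed per machine without touching managed-settings.json.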