Tenkai Daily — May 25, 2026

🌌 Model Releases

meituan-longcat/LongCat-Video-Avatar-1.5: Audio-Driven Avatar Video Generation — Generates and continues avatar videos from audio, image, and text inputs in English and Chinese. MIT-licensed, with Diffusers, ONNX, and safetensors support, because the universe apparently needed more ways to animate disembodied heads. 🏊

🐋 Open Source Releases

cache-dit 1.3.8 — PyTorch-native inference engine for Diffusion Transformers with caching, parallelism, and quantization built in. If your diffusion pipeline is slow, this is the towel you should grab. 🌻
clustertrace 0.14.0 — Local-first LLM agent observability with decorator-based instrumentation and OpenTelemetry ingestion. Groups traces by execution pattern, tracks cost per call, and spits out shareable HTML snapshots — because debugging multi-agent chaos shouldn’t require a second agent. 🌌
Show HN: Cursor IDE now remembers your coding prefs using MCP — Cursor persists user coding preferences across sessions via the Model Context Protocol (MCP). A practical demo that MCP is good for more than buzzword bingo. 🐋
orchemist 0.10.0 — Scenario-driven orchestration engine for multi-agent pipelines. You define the scenario, it coordinates the actors — which is basically what the mice have been doing with the Earth all along. 🏊
sourcecode 1.31.20 — Deterministic codebase context generation for AI coding agents. Keeps context windows consistent across large repos so your agent isn’t hallucinating the repo structure on every run. 🌻
greatminds 1.2.6 — File-based multi-agent coordination protocol with per-role queues and a layered plugin system for Claude Code and OpenAI Codex. Structured inter-agent communication — because even paranoid androids need a protocol. 🌌
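
For a feel of what "decorator-based instrumentation" with per-call cost tracking (the clustertrace item above) can look like, here is a minimal sketch. It is not clustertrace's API: the names `traced`, `TRACES`, and `cost_by_step`, and the flat `cost_per_call` figures, are all hypothetical and chosen only for illustration.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory trace store; a real tool would persist or export these.
TRACES = []

def traced(cost_per_call=0.0):
    """Record the name, duration, and an assumed flat cost for every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call returned or raised.
                TRACES.append({
                    "name": func.__name__,
                    "seconds": time.perf_counter() - start,
                    "cost": cost_per_call,
                })
        return wrapper
    return decorator

def cost_by_step(traces):
    """Sum the recorded cost for each traced function name."""
    totals = defaultdict(float)
    for trace in traces:
        totals[trace["name"]] += trace["cost"]
    return dict(totals)

# Two stand-in agent steps with made-up costs.
@traced(cost_per_call=0.002)
def plan(task):
    return f"plan for {task}"

@traced(cost_per_call=0.01)
def execute(plan_text):
    return plan_text.upper()

result = execute(plan("ship release"))
summary = cost_by_step(TRACES)
```

The appeal of the decorator route is that the agent code itself stays untouched: instrumentation is one line above each function, and the grouping logic lives elsewhere.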

🌻 Research Worth Reading

Latent Cache Flow: Model-to-Model Communication Without Text — LLM agents exchange latent KV cache representations instead of text, cutting latency and sidestepping the information lost when state is decoded to text and re-encoded. Builds on Cache-to-Cache by learning adapters that translate the sharer model’s KV matrices into a form the receiver model can use. The Babel fish just got a hardware upgrade. 🏊
Tensor Cache: Eviction-conditioned Associative Memory for Transformers — Two-level KV cache: a sliding-window softmax cache as L1, paired with a fixed-size associative memory that holds evicted tokens. Standard sliding-window caches throw tokens away like the universe discards middling coffee; this keeps the evidence. 🐋
FuRA: Full-Rank Parameter-Efficient Fine-Tuning with Spectral Preconditioning — Argues both full fine-tuning and LoRA ignore spectral structure from pretraining, letting noisy gradients mangle robust features. Adds spectral preconditioning as a fix. Solid idea that probably should’ve been obvious sooner. 🌌
PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning — Calibrates reasoning trajectories using explicit reflection markers (‘wait’, ‘but’, ‘alternatively’) as hesitation signals. State-aware calibration aims to make long chain-of-thought (CoT) traces less of a token hemorrhage. 🌻
Transcoders Trace Visual Grounding and Hallucinations in Vision-Language Models — Swaps Sparse Autoencoders (SAEs) for transcoders to trace how visual inputs get turned into text in VLMs. Captures functional updates SAEs miss, giving new interpretability hooks for hallucination debugging. 🏊
When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions — Uses entropy phase transitions to explain when CoT actually helps versus when it just burns tokens. Provides theoretical backing for the “why am I paying for reasoning tokens on a simple lookup” feeling. 🐋
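
To make the PathCal item above concrete, here is a rough sketch of treating reflection markers as a hesitation signal. Only the three marker words come from the summary; the helper names and the "markers per word" rate are assumptions for illustration, not the paper's calibration method.

```python
import re

# Marker words taken from the PathCal summary; the scoring rule is an assumption.
REFLECTION_MARKERS = ("wait", "but", "alternatively")

def tokenize(trace):
    """Lowercase the trace and keep alphabetic words only."""
    return re.findall(r"[a-z']+", trace.lower())

def marker_counts(trace):
    """Count how often each reflection marker appears as a whole word."""
    words = tokenize(trace)
    return {marker: words.count(marker) for marker in REFLECTION_MARKERS}

def hesitation_rate(trace):
    """Fraction of words that are reflection markers (0.0 for an empty trace)."""
    words = tokenize(trace)
    if not words:
        return 0.0
    return sum(marker_counts(trace).values()) / len(words)

trace = (
    "The answer is 12. Wait, the units differ. "
    "But 12 still holds. Alternatively, check by division."
)
counts = marker_counts(trace)
rate = hesitation_rate(trace)
```

A crude rate like this could flag traces that keep second-guessing themselves; anything state-aware would need more than word counts.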

🌌 AI Dev Tools

earendil-works/pi — Full-stack AI agent toolkit: coding agent CLI, unified LLM API layer, TUI/web UI libs, Slack bot, and vLLM pod management. One-stop shop if you’re building agent infrastructure from scratch and don’t want to Frankenstein six repos. 🌻
ruvnet/ruflo — Multi-Agent Swarm Orchestration Platform for Claude — Orchestrates multi-agent swarms with RAG, self-learning swarm intelligence, and native Claude Code/Codex integration. Billed as enterprise-grade; the Guide would file it under “mostly harmless” at best. 🏊

Today’s Synthesis

Towel Day is the perfect excuse to rethink the ‘tools’ you actually carry in your LLM pipeline, starting with memory management. While everyone debates model sizes, a quieter bottleneck in agent-to-agent communication is the latency of turning latent state into text and back again. Latent Cache Flow proposes skipping that translation entirely and letting agents sync via KV caches, treating memory as a shared bus rather than a mailbox. A shared bus is only as good as what it still remembers, though; Tensor Cache addresses this with an associative layer that preserves evicted tokens, avoiding the ‘amnesia’ of standard sliding windows. On the deployment side, cache-dit brings the same cache-first instinct to Diffusion Transformers: a PyTorch-native engine that bundles caching, quantization, and parallelism, so you don’t have to write your own CUDA kernels. The pitch is simple: handing over cached state skips the decode-and-re-encode round trip that text demands. Don’t Panic: grab the towel, cache the state, and stop paying for translation overhead that isn’t doing any thinking.
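
The two-level idea described for Tensor Cache can be sketched in a few lines. This is a toy under loud assumptions: the class name `TwoLevelCache` is made up, and the second level here is a classic linear associative memory (a running sum of key-value outer products), swapped in because it is fixed-size and easy to read. It is not claimed to be the paper's actual mechanism.

```python
import math
from collections import deque

class TwoLevelCache:
    """Toy cache: exact sliding window (L1) plus fixed-size memory for evictions."""

    def __init__(self, window, dim):
        self.window = window
        self.l1 = deque()  # most recent (key, value) pairs, stored exactly
        # dim x dim matrix; stays the same size no matter how many tokens are evicted.
        self.memory = [[0.0] * dim for _ in range(dim)]

    def append(self, key, value):
        self.l1.append((key, value))
        if len(self.l1) > self.window:
            old_key, old_value = self.l1.popleft()
            # Fold the evicted pair into memory as outer(key, value).
            for i, k in enumerate(old_key):
                for j, v in enumerate(old_value):
                    self.memory[i][j] += k * v

    def read_window(self, query):
        """Softmax attention over the pairs still in the window (needs >= 1 pair)."""
        scores = [sum(q * k for q, k in zip(query, key)) for key, _ in self.l1]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        return [
            sum(w * value[j] for w, (_, value) in zip(weights, self.l1)) / total
            for j in range(len(query))
        ]

    def read_memory(self, query):
        """Linear associative recall: query (as a row vector) times the memory matrix."""
        dim = len(query)
        return [sum(query[i] * self.memory[i][j] for i in range(dim)) for j in range(dim)]

cache = TwoLevelCache(window=2, dim=3)
cache.append([1, 0, 0], [5, 0, 0])
cache.append([0, 1, 0], [0, 7, 0])
cache.append([0, 0, 1], [0, 0, 9])  # evicts the first pair into memory

recalled = cache.read_memory([1, 0, 0])  # the evicted value is still retrievable
recent = cache.read_window([0, 1, 0])    # attention over the two surviving pairs
```

With orthogonal keys, as in this example, recall of the evicted value is exact; with correlated keys a memory like this blurs entries together, which is the trade-off any fixed-size store has to manage.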