Tenkai Daily — June 12, 2026
Model Releases
zai-org/SCAIL-2 — ZAI Lab drops a diffusion-based pose-driven character animation model for image-to-video under MIT license. If you’ve been waiting for open weights to animate consistent characters from reference frames, this is worth a look. Paper at arXiv:2512.05905.
Open Source Releases
opencode v1.17.4 — Adds cwd support for local MCP servers so they actually start from your workspace directory instead of wherever the process launched. Also brings connector-based auth and v2 config APIs. The MCP integration ergonomics are finally catching up to where they should’ve been.
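Why the launch directory matters: a local MCP server is just a child process, and any relative path it opens resolves against whatever directory it was started in. Here's a minimal Python sketch of that general idea using the standard library's subprocess module. It is not opencode's code or config format, and the helper name launch_and_report_cwd is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def launch_and_report_cwd(workspace: str) -> str:
    """Start a child process from `workspace` and return the directory it sees.

    Hypothetical helper for illustration; the child here is a one-line
    Python script standing in for a local server process.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=workspace,        # without this, the child inherits the parent's cwd
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        seen = launch_and_report_cwd(workspace)
        # realpath smooths over symlinked temp dirs (e.g. /var vs /private/var)
        print(os.path.realpath(seen) == os.path.realpath(workspace))
```

Drop the cwd argument and the child reports the parent's directory instead, which is the failure mode the release note describes.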
remembrane 0.4.0 — SQLite-backed persistent memory for agents with zero required deps, pluggable embeddings, and an MCP server included. Local-first, no external services, framework-agnostic. If you’re building agents that need to remember things across sessions without sending data to a cloud vector DB, this solves the boring infrastructure part.
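For a feel of what "SQLite-backed, zero deps" means in practice, here's a rough sketch built only on Python's standard sqlite3 module. This is not remembrane's API: the SqliteMemory class, its method names, and the table layout are all assumptions made up for illustration, and plain substring matching stands in for the pluggable embeddings.

```python
import sqlite3
import time


class SqliteMemory:
    """Hypothetical minimal agent memory; not remembrane's actual interface."""

    def __init__(self, path: str = ":memory:"):
        # A file path persists across runs; ":memory:" is throwaway.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, "
            "session TEXT NOT NULL, "
            "content TEXT NOT NULL, "
            "created_at REAL NOT NULL)"
        )
        self.conn.commit()

    def remember(self, session: str, content: str) -> int:
        """Store one memory and return its row id."""
        cur = self.conn.execute(
            "INSERT INTO memories (session, content, created_at) VALUES (?, ?, ?)",
            (session, content, time.time()),
        )
        self.conn.commit()
        return cur.lastrowid

    def recall(self, query: str, limit: int = 5) -> list[str]:
        """Return the newest memories containing `query` (substring match)."""
        rows = self.conn.execute(
            "SELECT content FROM memories WHERE content LIKE ? "
            "ORDER BY id DESC LIMIT ?",
            (f"%{query}%", limit),
        ).fetchall()
        return [row[0] for row in rows]

    def close(self) -> None:
        self.conn.close()


if __name__ == "__main__":
    memory = SqliteMemory()  # in-memory database for the demo
    memory.remember("session-1", "User prefers dark mode")
    memory.remember("session-1", "Deploy target is staging")
    print(memory.recall("dark"))
    memory.close()
```

Point the constructor at a file instead of ":memory:" and a later process can reopen the same database, which is the "remember things across sessions" part. A real library would swap the LIKE query for embedding similarity.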
tramdag 0.2.0 — PyTorch implementation of TRAM-DAGs: causal normalizing flows supporting observational, interventional, and counterfactual queries in one model. Niche but genuinely useful if you’re doing causal inference and want differentiable structure learning without the usual identifiability headaches.
Research Worth Reading
ToolSense: A Diagnostic Framework for Auditing Parametric Tool Knowledge in LLMs — Embedding-based tool retrieval fails on specialized tool semantics because compact encoders under-capture parametric knowledge. This paper builds a diagnostic framework to audit what the model actually knows about tools vs. what retrieval surfaces. Relevant if you’re scaling agent tool catalogs past a few hundred functions.
Strategic Decision Support for AI Agents — Flips the traditional decision-support script: agents act, humans and tools support. Frames the problem as helping agents make better sequential decisions under uncertainty with human-in-the-loop feedback. More decision theory than prompt engineering, which is refreshing.
Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation — Tackles the compute bottleneck in Lean theorem proving by augmenting formalization data and shortening reasoning traces. If you’re tracking the “AI for math” space, this is the efficiency angle — less compute, more clever data augmentation.
PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation — Traffic agents in simulators usually behave identically. This gives them distinct “personas” via retrieval-augmented VLAs so non-ego traffic actually drives like heterogeneous humans. Matters for training robust driving policies that don’t overfit to average behavior.
“Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms — Lie detectors need ground truth where models verifiably believe the opposite of what they say. This constructs belief-verified model organisms across scales to test detection methods. The eval methodology alone is worth stealing for any deception/alignment work.
TrajGenAgent: A Hierarchical LLM Agent for Human Mobility Trajectory Generation — Synthetic human mobility trajectories for urban planning/epidemiology without privacy nightmares. Hierarchical LLM agent generates realistic patterns at scale. If you’ve worked with mobility data, you know how painful real datasets are — this is a pragmatic generative approach.
AI Dev Tools
NVIDIA/SkillSpector — Security scanner for AI agent skills. Detects vulns, malicious patterns, and supply-chain risks in skill definitions. As agent skill markets grow, this is the static analysis layer that should’ve existed yesterday. 🛠️
hexo-ai/sia — Framework for autonomous self-improvement of any AI system on a benchmark task. Iterative optimization without human intervention. The recursive self-improvement pipe dream, now with a CLI. 🤖
Cline v3.89.2 — Critical compat fix: upgrades Anthropic and Vertex AI SDKs for VS Code 1.123+’s Node 24 runtime. If your Cline stopped working after updating VS Code, this is why. 🔧
Today’s Synthesis
If you’re building agent systems that need to persist across sessions, remembrane gives you local-first SQLite memory with an MCP server built in, no cloud vector DB required. Pair that with ToolSense’s diagnostic framework to audit what your model actually knows about its tool catalog versus what retrieval surfaces, and you’ve got a feedback loop: deploy, measure parametric knowledge gaps, augment the memory store, repeat. Then run SkillSpector over your skill definitions before they hit production; it flags vulnerabilities, malicious patterns, and supply-chain risks, the way static analysis does for code. Together the three pieces form a coherent stack: persistent context, audited tool knowledge, and a security gate. Start by wiring remembrane’s MCP server into your agent loop, add a nightly ToolSense eval on your top-200 functions, and gate skill installs through SkillSpector’s CLI. The aim is to catch hallucinated tool calls, stale embeddings, and compromised skill definitions before they become incidents.