Model Releases

  • Qwen3.6-35B-A3B GGUF: Unsloth Quantized MoE — Unsloth’s GGUF quantized build of Qwen3.6-35B-A3B. Packs reasoning, CoT, LoRA, SFT, multimodal vision, tool-use, function-calling, and long context into a format that actually runs on consumer hardware. Multi-language support across en, zh, es, ru, ja. If you’ve been waiting for a quantized MoE you can tinker with locally, this is your moment. 🦙
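If you want to kick the tires, here's a minimal local-inference sketch using llama-cpp-python. The quant filename, context size, and offload settings are illustrative guesses, not Unsloth's actual artifact names:

```python
# Sketch of a local chat call against a GGUF quant via llama-cpp-python.
# MODEL_PATH is a hypothetical filename; adjust n_ctx / n_gpu_layers to your hardware.
import os

MODEL_PATH = "Qwen3.6-35B-A3B-Q4_K_M.gguf"  # hypothetical quant filename

def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-completion message format."""
    return [{"role": "user", "content": prompt}]

if os.path.exists(MODEL_PATH):
    from llama_cpp import Llama  # pip install llama-cpp-python
    llm = Llama(
        model_path=MODEL_PATH,
        n_ctx=8192,        # long-context builds support more; tune to your RAM
        n_gpu_layers=-1,   # offload everything the GPU can hold
    )
    out = llm.create_chat_completion(messages=build_messages("Explain MoE routing briefly."))
    print(out["choices"][0]["message"]["content"])
```

The `n_gpu_layers=-1` trick is what makes MoE quants viable on a single consumer GPU: layers that don't fit stay on CPU, and only the active experts do real work per token anyway.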

Open Source Releases

  • openhuman: Personal AI Super Intelligence — Self-hosted, privacy-first AI assistant. “Super Intelligence” is doing a lot of heavy lifting in that name, but the pitch is simple: run it locally, keep your data yours, don’t send your life to a cloud. Worth a look if you’re tired of trusting your notes and docs to someone else’s API. 🤖
  • agentcloak 0.1.0 — Browser automation toolkit for AI agents with CLI and MCP dual interfaces, multi-backend stealth, and remote bridge support. Basically gives your agents a hoodie and sunglasses for web scraping. Useful if you’re building agent-driven web pipelines that need to not get blocked. 🕶️
  • mesh-context-layer 0.4.4 — Shared memory layer for multi-agent systems, available as both an MCP server and Python library. If you’ve got agents that need to share state without you duct-taping Redis into everything, this handles the plumbing. 🧠
  • sniffly-lzwind 0.4.10 — AI Code Analytics Dashboard supporting Claude Code and OpenCode. Gives you visibility into what your AI-assisted dev workflow is actually doing. Because “vibe coding” is fun until someone asks what the agent changed and why. 📊
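To make the shared-memory idea concrete: here's a purely hypothetical sketch of the pattern mesh-context-layer describes. None of these class or method names come from the library itself; they only illustrate what a shared state layer between agents looks like without duct-taped Redis:

```python
# Hypothetical sketch of a shared memory layer for multi-agent systems.
# These names are illustrative only, not the mesh-context-layer API.
import threading

class SharedContext:
    """Thread-safe key-value store that multiple agents read and write."""
    def __init__(self):
        self._store: dict[str, object] = {}
        self._lock = threading.Lock()

    def put(self, key: str, value):
        with self._lock:
            self._store[key] = value

    def get(self, key: str, default=None):
        with self._lock:
            return self._store.get(key, default)

ctx = SharedContext()
ctx.put("plan", ["search docs", "draft answer"])  # planner agent writes
next_step = ctx.get("plan")[0]                    # worker agent reads
```

The value of shipping this as both an MCP server and a Python library is that in-process agents and external tool-calling agents can hit the same state through whichever interface fits.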

Research Worth Reading

  • QuIDE: Unified Metric for Quantized Neural Network Efficiency — Collapses the compression-accuracy-latency trade-off into a single score: I = (C × P)/log₂(T+1). Validated on SimpleCNN, ResNet-18, and Llama. If you’re tired of quoting three different numbers to explain why your quantized model is “good enough,” this gives you one number to argue with instead. 📄
  • Steering Discrete Diffusion Language Models Without Quality Degradation — Turns out uniformly intervening at every denoising step (the naive port from autoregressive models) trashes quality in discrete diffusion LMs. The paper proposes mechanistically informed intervention schedules instead. If you’re working with diffusion-based text gen, this saves you a few weeks of wondering why your outputs read like soup. 📄
  • Rotation-Preserving Supervised Fine-Tuning for Better OOD Generalization — SFT quietly murders your out-of-domain generalization. This work preserves the dominant singular subspaces of pretrained weights during fine-tuning, keeping your model from forgetting everything it learned. No Hessian/Fisher overhead, which is the part that’ll make practitioners actually use it. 📄
  • LEAP: Lookahead Early-Convergence Token Detection for dLLM Parallelism — Unlocks more parallelism in diffusion language models by detecting tokens that converge early and not wasting compute on them. Relaxes the strict conditional independence assumptions that were throttling dLLM throughput. Speedup for diffusion text gen without changing the architecture — chef’s kiss. 📄
  • TMPO: Trajectory Matching Policy Optimization for Diffusion Alignment — Identifies mode-seeking as the root cause of reward hacking in RL-based diffusion alignment and proposes TMPO to keep generative diversity intact. Addresses both mode collapse and runaway reward amplification. If your diffusion model keeps producing the same thing no matter the prompt, this is the reading for you. 📄
  • SkillGen: Verified Inference-Time Agent Skill Synthesis — Multi-agent framework that auto-synthesizes auditable skills from base agent trajectories. Saves you from hand-writing skills for every agent task, which is the bottleneck nobody talks about when scaling agent systems. Skipping retraining is a nice touch. 📄
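The QuIDE formula is simple enough to code directly. A sketch, with one caveat: the variable meanings below are my reading (C as compression ratio, P as accuracy retention, T as latency), not the paper's exact definitions:

```python
# QuIDE score from the paper's formula: I = (C * P) / log2(T + 1).
# Assumed semantics: C = compression ratio, P = accuracy retention (0-1),
# T = latency. These readings are mine, not confirmed from the paper.
import math

def quide(C: float, P: float, T: float) -> float:
    return (C * P) / math.log2(T + 1)

# Compare two hypothetical quants of the same model:
fast_lossy = quide(C=4.0, P=0.92, T=15.0)   # heavier quant, some accuracy lost
slow_clean = quide(C=2.0, P=0.99, T=31.0)   # lighter quant, near-lossless
print(fast_lossy > slow_clean)              # one number to argue with instead of three
```

Note the log on latency: the metric rewards speedups but with diminishing returns, so a quant can't buy its way to a good score on raw throughput alone.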

AI Dev Tools

  • Claude Code v2.1.140 — Agent tool subagent_type matching now handles case- and separator-insensitive values (e.g., “Code Reviewer” → “code-reviewer”). Also fixes the /goal command silently hanging when disableAllHooks or allowManagedHooksOnly is set — it now shows a proper error instead of just… staring at you. Small quality-of-life stuff that actually matters when you’re mid-flow. 🛠️
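What case- and separator-insensitive matching implies, sketched in a few lines. This is an illustration of the described behavior, not Claude Code's actual source:

```python
# Normalize a subagent_type value the way the changelog describes:
# "Code Reviewer" and "code_reviewer" should both match "code-reviewer".
# Illustrative sketch only, not Claude Code's implementation.
import re

def normalize_subagent_type(value: str) -> str:
    """Lowercase, then collapse runs of spaces/underscores into a hyphen."""
    return re.sub(r"[\s_]+", "-", value.strip().lower())

print(normalize_subagent_type("Code Reviewer"))  # code-reviewer
```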

Tutorials & Guides

  • LLMs-from-scratch — Step-by-step PyTorch implementation of a ChatGPT-like LLM from scratch. Highly educational if you want to understand what’s actually happening under the hood instead of just calling an API and crossing your fingers. The kind of resource that makes the difference between someone who uses LLMs and someone who understands them. 📚

Today’s Synthesis

If you’re planning to run Qwen3.6-35B-A3B GGUF locally and fine-tune it on a domain-specific task, you’ve got two problems worth solving in tandem. First, quantizing trades accuracy for speed, but “how much accuracy for how much speed” has always been answered with three squishy numbers you argue about in Slack. The QuIDE metric collapses that into a single score — useful for deciding whether your quantized build is actually holding up before you invest GPU hours in fine-tuning. Second, fine-tuning any model risks wrecking the general knowledge baked in during pretraining, and quantization error compounds that risk fast. The rotation-preserving SFT approach keeps the dominant singular subspaces of the pretrained weights intact during training, which is exactly the kind of guardrail a quantized model needs. Pair these two and you get a clean workflow: validate your quantized baseline with QuIDE, then fine-tune under rotation-preserving constraints so you don’t burn the efficiency gains you already paid for.
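That two-step workflow reads naturally as a gate: score the quantized baseline first, fine-tune only if it clears the bar. A sketch under assumptions: the QuIDE variable meanings and the 0.8 threshold are mine, and the fine-tuning step is a placeholder comment, not the rotation-preserving method itself:

```python
# Hypothetical gating workflow: QuIDE check before spending GPU hours.
# Threshold and variable semantics (C=compression, P=accuracy retention,
# T=latency) are illustrative assumptions, not from the papers.
import math

def quide(C: float, P: float, T: float) -> float:
    # Single efficiency score from the QuIDE paper: (C * P) / log2(T + 1)
    return (C * P) / math.log2(T + 1)

def should_finetune(C: float, P: float, T: float, threshold: float = 0.8) -> bool:
    """Gate GPU spend: only fine-tune a quant whose baseline still scores well."""
    return quide(C, P, T) >= threshold

if should_finetune(C=4.0, P=0.92, T=15.0):
    # Placeholder for rotation-preserving SFT: constrain weight updates so the
    # dominant singular subspaces of the pretrained weights stay intact.
    print("baseline holds up; proceed to fine-tuning")
```

The point of the gate is ordering: a quant that fails the efficiency check before fine-tuning won't be rescued by fine-tuning, so you find out before the GPU bill, not after.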