Open Source Releases

  • Qwen3.5-9B Reasoning Distilled Model — A distilled version of Qwen3.5-9B fine-tuned for reasoning with the Unsloth library. A practical example of compressing and adapting a model for specific conversational tasks without training from scratch.
  • vmlx 1.0.0 — A library for running AI inference locally on Apple Silicon, handling text, image, video, and audio. Useful if you want to prototype or deploy on-device AI without cloud dependencies.
  • Qwen 2.5 VL 7B with FP8 Scaling — A vision-language model quantized with FP8 scaling to cut memory footprint and speed up inference. A solid option for deploying multimodal models where resources are tight.
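The distillation behind a release like the first one boils down to training the student on the teacher's softened output distribution. A minimal sketch of that objective, assuming the standard temperature-scaled KL soft-target loss (the logits here are made-up illustrative numbers, not from either model):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) at temperature T, scaled by T^2 --
    the classic soft-target distillation objective. Higher T softens
    both distributions so the student learns the teacher's relative
    preferences, not just its argmax."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * T * T

# A student that matches the teacher exactly incurs zero loss;
# any divergence is penalized.
print(distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))   # ~0.0
print(distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))   # > 0
```

In practice a framework like Unsloth wraps this in a full fine-tuning loop (often combined with a hard-label cross-entropy term), but the objective is the same shape.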
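To make the FP8 item concrete: per-tensor FP8 scaling divides a tensor by a scale chosen so its largest value lands at the FP8 maximum, rounds to the nearest representable FP8 number, and multiplies back at compute time. A rough simulation of the E4M3 format (4 exponent bits, 3 mantissa bits, max normal 448) in pure Python — this illustrates the rounding behavior, not the actual kernels the release ships:

```python
import math

def quantize_e4m3(x):
    """Round x to the nearest FP8 E4M3 value (sketch: ignores NaN).
    E4M3 has 3 mantissa bits, min normal exponent -6, max normal 448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)               # saturate at max normal
    e = max(math.floor(math.log2(a)), -6)  # clamp into normal/subnormal range
    m = a / (2.0 ** e)                    # mantissa, in [1, 2) for normals
    q = round(m * 8) / 8                  # keep 3 mantissa bits
    return sign * q * (2.0 ** e)

def quantize_tensor_fp8(xs):
    """Per-tensor scaling: map the absolute max onto 448, quantize,
    then rescale. Keeps large-magnitude weights nearly exact while
    small ones absorb the rounding error."""
    amax = max(abs(x) for x in xs) or 1.0
    scale = amax / 448.0
    return [quantize_e4m3(x / scale) * scale for x in xs]
```

The practical upshot is visible in a toy tensor: `quantize_tensor_fp8([1000.0, -250.0, 3.0])` preserves the two large values exactly (they map onto representable points) while the small value picks up a ~2% rounding error — which is why FP8 deployments pair the format with per-tensor or per-block scales.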

Research Worth Reading

AI Dev Tools

Today’s Synthesis

The common thread today is making AI agents more efficient and capable without just throwing more compute at them. ToolTree tackles the planning problem, giving agents a smarter way to sequence tool use. But a smart plan is useless if the agent’s memory is bloated and slow; that’s where Structured Distillation comes in, slashing the token cost of remembering past interactions. Meanwhile, Efficient Reasoning with Balanced Thinking addresses the core reasoning loop itself, preventing the model from wasting cycles on unnecessary deliberation. For an engineer, this suggests a practical stack: use structured distillation to manage memory, apply balanced thinking to the core reasoning step, and then layer on something like ToolTree for complex, multi-step planning. It’s a blueprint for building agents that are cheaper to run and more reliable in their actions.
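The planning layer of that stack can be sketched with a toy search over tool sequences. Everything below is an illustrative assumption, not ToolTree's actual algorithm: the tool names, token costs, and the synergy bonus (rewarding "search" before "summarize") are made up to show the shape of the idea — exhaustively scoring orderings under a token budget instead of greedily calling tools one at a time:

```python
# Hypothetical tool registry: per-call token cost and standalone value.
TOOLS = {"search": 3, "retrieve": 2, "summarize": 4}
BASE_VALUE = {"search": 2, "retrieve": 1, "summarize": 3}
# Ordering matters: summarizing is worth more after a search has run.
SYNERGY = {("search", "summarize"): 5}

def value(seq):
    """Score a tool sequence: base values plus order-dependent bonuses."""
    v = sum(BASE_VALUE[t] for t in seq)
    for (a, b), bonus in SYNERGY.items():
        if a in seq and b in seq and seq.index(a) < seq.index(b):
            v += bonus
    return v

def plan(budget, seq=()):
    """Depth-first search over tool orderings within a token budget,
    returning (best_value, best_sequence). Fine for a handful of tools;
    a real planner would prune or beam-search this tree."""
    best = (value(seq), seq)
    for tool, cost in TOOLS.items():
        if tool not in seq and cost <= budget:
            best = max(best, plan(budget - cost, seq + (tool,)))
    return best

print(plan(7))  # with 7 tokens, search->summarize beats any other plan
```

The same budget knob is where the memory and reasoning layers pay off: structured distillation and balanced thinking free up tokens, which directly widens the plans the search can afford.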