Model Releases

  • netflix/void-model — Diffusion-based video inpainting model for object removal and editing, built on CogVideoX and released under Apache 2.0. Useful for automated VFX pipelines, though it won’t replace a human editor for nuanced continuity work. 🎬
  • google/gemma-4-31B — Base 31B multimodal model with image-text-to-text capabilities and safetensors support. A solid mid-weight checkpoint for fine-tuning if you have the VRAM and prefer open weights over API rate limits. 🤖
  • google/gemma-4-E2B-it — Compact instruction-tuned variant with image-to-text and any-to-any routing capabilities. Fits comfortably on consumer hardware for rapid inference testing, provided you handle the routing logic yourself.
  • Jackrong/Qwopus3.5-9B-v3-GGUF — GGUF-quantized 9B reasoning model fine-tuned on competitive programming and chain-of-thought tasks, with multilingual support. Ready to run on CPU-only setups for algorithmic workflows without torching your GPU budget; see the loading sketch after this list. 🧠
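
If you want to kick the tires on the GGUF checkpoint without a GPU, a minimal sketch with llama-cpp-python (plus huggingface_hub for the download) looks like the following. The quant filename is an assumption; pick whichever .gguf file the repo actually ships.

```python
# Minimal CPU-only sketch: pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Jackrong/Qwopus3.5-9B-v3-GGUF",
    filename="*Q4_K_M.gguf",   # assumed quant level; check the repo's file list
    n_ctx=8192,                # room for long chain-of-thought traces
    n_threads=8,               # match your physical core count
)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Write an O(n log n) solution for longest increasing subsequence.",
    }],
    max_tokens=1024,
)
print(response["choices"][0]["message"]["content"])
```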

Open Source Releases

  • Local Deep Research: Local LLM-Powered Research Assistant — Automated research agent that scores ~95% on SimpleQA by querying arXiv, PubMed, and private docs via local or cloud LLMs. Keeps everything encrypted on your machine, which beats feeding your proprietary queries into a corporate data lake. 🔍
  • light-llm-hp 0.3.2 — Minimalist inference framework focused on fast model loading, efficient batching, and low-latency serving. Another option in the crowded inference space; worth a look if vLLM’s memory overhead is bottlenecking your deployment. 🛠️
  • memgraph-sdk 0.7.0 — Adds persistent memory to AI agents for belief tracking, semantic similarity search, and decision logging. Solves the “turn-by-turn amnesia” problem without forcing you to build a custom vector DB wrapper from scratch; a sketch of the pattern follows this list. 💾
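
To make the “turn-by-turn amnesia” fix concrete, here is a hedged toy of the pattern memgraph-sdk targets. The SDK’s real API isn’t documented here, so AgentMemory and its methods are hypothetical stand-ins, not the library’s actual interface.

```python
# Hypothetical sketch of the agent-memory pattern; AgentMemory and its
# methods are illustrative stand-ins, not memgraph-sdk's real API.
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy persistent store: beliefs keyed by topic, plus a decision log."""
    beliefs: dict[str, str] = field(default_factory=dict)
    decisions: list[str] = field(default_factory=list)

    def record_belief(self, topic: str, claim: str) -> None:
        self.beliefs[topic] = claim          # later turns can reuse this

    def log_decision(self, rationale: str) -> None:
        self.decisions.append(rationale)     # audit trail across turns

    def recall(self, topic: str) -> str | None:
        return self.beliefs.get(topic)       # no more turn-by-turn amnesia


memory = AgentMemory()
memory.record_belief("retrieval", "Abstract-only PubMed search beat full texts")
memory.log_decision("Chose abstract-only retrieval to cut token cost")

# A later turn consults memory instead of recomputing the conclusion.
print(memory.recall("retrieval"))
```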

Today’s Synthesis

Combining the newly released google/gemma-4-31B with Local Deep Research and the persistent memory layer from memgraph-sdk 0.7.0 gives you a practical, fully on‑prem research loop that avoids both API rate limits and the “turn‑by‑turn amnesia” of vanilla agents. Start by quantizing Gemma‑4‑31B to fit your VRAM (or run a GGUF build on CPU if needed), then plug it into Local Deep Research as the backend LLM for arXiv/PubMed queries and private document retrieval. Wrap each research turn with memgraph-sdk’s belief‑tracking store so the agent can reuse procedural knowledge (problem reframings, intermediate hypotheses) across iterations instead of recomputing from scratch; a wiring sketch follows below.

The result is a self‑contained agent that builds a growing semantic graph of what it has learned, letting you scale reasoning depth without burning extra tokens or hitting external quotas. For engineers, that means you can prototype a domain‑specific deep‑research assistant in a day, iterate on prompts and retrieval strategies locally, and pay only the compute cost of the model itself: no hidden API fees, no data leakage, and a clear path to add more sophisticated memory or reasoning modules later.
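
Here is a hedged wiring sketch of that loop. The `ask` and `search_sources` callables are hypothetical seams, not real APIs: point `ask` at your quantized Gemma‑4‑31B (e.g. via llama-cpp-python, as in the GGUF sketch earlier) and point `search_sources` at Local Deep Research’s retrieval layer; the plain dict stands in for memgraph-sdk’s belief store.

```python
# Hedged sketch of the on-prem research loop; `ask`, `search_sources`,
# and the dict-based memory are illustrative placeholders.
from typing import Callable


def research_loop(
    question: str,
    ask: Callable[[str], str],             # local LLM completion
    search_sources: Callable[[str], str],  # arXiv/PubMed/private-doc retrieval
    memory: dict[str, str],                # stand-in for the belief store
    max_turns: int = 3,
) -> str:
    """Iteratively refine an answer, persisting beliefs between turns."""
    answer = ""
    for turn in range(max_turns):
        prior = "\n".join(f"- {k}: {v}" for k, v in memory.items()) or "- none yet"
        evidence = search_sources(question)
        answer = ask(
            f"Question: {question}\n"
            f"Beliefs so far:\n{prior}\n"
            f"New evidence:\n{evidence}\n"
            "Update the working answer and state one new belief."
        )
        # Persisted beliefs survive into the next turn and the next session,
        # so the agent stops recomputing intermediate hypotheses from scratch.
        memory[f"turn_{turn}"] = answer
    return answer
```

Swapping the dict for memgraph-sdk’s actual store would add semantic similarity search over accumulated beliefs, so later turns retrieve only the relevant hypotheses instead of replaying the whole log.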