Writing

Thoughts on AI agents, LLM systems, MLOps, and building intelligent software at scale.

Agent Teams vs Sub-Agents: Multi-Agent Architectures for LLMs

Claude Code's Agent Teams let AI agents communicate peer-to-peer. How does this compare to hub-and-spoke sub-agents and other multi-agent patterns?

A deep dive into Skills vs MCP for building production AI agents—with enterprise use cases, security considerations, and practical recommendations.

How to design memory systems that give AI agents the right context at the right time—from short-term buffers to persistent knowledge graphs.

Using LLMs in content pipelines that generate search-optimized, geo-targeted content for multi-market deployments.

Building low-latency autocomplete and suggestion systems powered by streaming language model inference.

How to serve dozens of LoRA adapters from a single base model in production using vLLM, PEFT, and SageMaker.

Training small models to reason in latent space—achieving chain-of-thought quality without the token overhead.

Applying data science to real-world Taobao data to surface lipstick buying trends and generate product recommendations.

Earthquake analysis using web-crawled data, visualized with heatmaps.