Blog
Writing
Thoughts on AI agents, LLM systems, MLOps, and building intelligent software at scale.
Agent Teams vs Sub-Agents: Multi-Agent Architectures for LLMs
Claude Code's Agent Teams let AI agents communicate peer-to-peer. How does this compare to hub-and-spoke sub-agents and other multi-agent patterns?
Skills vs MCP: Tool Architecture for AI Agents
A deep dive into Skills vs MCP for building production AI agents—with enterprise use cases, security considerations, and practical recommendations.
Context Engineering: Building Intelligent Memory Systems
How to design memory systems that give AI agents the right context at the right time—from short-term buffers to persistent knowledge graphs.
Scaling SEO & GEO Content with LLMs
Using LLMs to generate search-optimized and geo-targeted content pipelines for multi-market deployments.
Real-Time Personalized Quick Suggest with WebSockets
Building low-latency autocomplete and suggestion systems powered by streaming language model inference.
Hot-Swappable LoRA: Serving 16 Models on One GPU
How to serve multiple LoRA adapters on top of a single base model in production using vLLM, PEFT, and SageMaker.
Latent Reasoning: Teaching Small LLMs to Think Without Thinking
Training small models to reason in latent space—achieving chain-of-thought quality without the token overhead.
Lipstick Recommendation
Applying data science to real-world Taobao data to surface lipstick buying trends and generate recommendations.
Earthquake Analysis
Earthquake analysis based on web-crawled data, with heatmap visualization.