Writing

Thoughts on AI agents, LLM systems, MLOps, and building intelligent software at scale.

Agent Teams vs Sub-Agents: Multi-Agent Architectures for LLMs

Claude Code's Agent Teams let AI agents communicate peer-to-peer. How does this compare to hub-and-spoke sub-agents and other multi-agent patterns?

Read article

Skills vs MCP: Tool Architecture for AI Agents

A deep dive into Skills vs MCP for building production AI agents—with enterprise use cases, security considerations, and practical recommendations.

Read article

Context Engineering: Building Intelligent Memory Systems

How to design memory systems that give AI agents the right context at the right time—from short-term buffers to persistent knowledge graphs.

Read article

Scaling SEO & GEO Content with LLMs

Using LLMs to generate search-optimized and geo-targeted content pipelines for multi-market deployments.

Read article

Real-Time Personalized Quick Suggest with WebSockets

Building low-latency autocomplete and suggestion systems powered by streaming language model inference.

Read article

Hot-Swappable LoRA: Serving 16 Models on One GPU

How to serve dozens of LoRA adapters from a single base model in production using vLLM, PEFT, and SageMaker.

Read article

Latent Reasoning: Teaching Small LLMs to Think Without Thinking

Training small models to reason in latent space—achieving chain-of-thought quality without the token overhead.

Read article

Lipstick Recommendation

Applying data science to real-world Taobao data to reveal lipstick buying trends and generate recommendations.

Read article

Earthquake Analysis

Earthquake analysis based on web-crawled data, with heatmap visualization.

Read article