Hello, I'm Alvin
AI Engineer · Developer Tools · Full-Stack
Building independent AI products and developer tools.
Alvin Xu
Software engineer — AI agent systems, retrieval, developer tools
- CiteLoom v1: agent-driven research canvas — Plan-Execute v3 loop, two-stage BGE + RRF retrieval, MCP-compatible inspection server
- NeuralLens Phase 1: mechanistic interpretability primitives for VLA / world models / RL policies, with live simulation-loop hooks
- Internal A/B benchmarks: ranking evaluation (NDCG / MRR), agent cost-aware execution, run-to-run variance reduction
- Solo build discipline: 29 ADRs, A/B-validated subsystem migrations, pixel-level E2E test harness over Chrome DevTools Protocol
- LLM agent design: Plan-Execute and ReAct patterns, cost-ceiling reasoning, multi-provider abstraction
- Retrieval engineering: BGE cross-encoder reranking, Reciprocal Rank Fusion (RRF), embeddings, graded-query evaluation
- Systems delivery: Tauri + Rust + React desktop apps, 4-layer Rust DB architectures, MCP automation servers
- Full-stack range: Next.js / Node.js / MongoDB / Azure, Kotlin Android companion apps, ROS robotics fundamentals
- A/B everything: pick the winner by metric, not intuition — NDCG, MRR, picks, run-to-run variance
- Document decisions: ADRs over heroic memory; future-you cannot debug yesterday-you
- Diagnose before patching: root-cause the bottleneck (e.g. cost-ceiling truncation), not the symptom
- Honest engineering: real-pixel tests, internally graded benchmarks, no production-scale claims without users
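NDCG and MRR, the ranking metrics named above, have standard textbook definitions. The Python sketch below assumes graded relevance labels for each ranked result and the linear-gain form of DCG. The function names and sample labels are hypothetical, and this is not the benchmark code from any project listed here.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevances (linear gain)."""
    # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering of the same results."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def mrr(ranked_relevances: Sequence[Sequence[float]]) -> float:
    """Mean reciprocal rank of the first relevant (rel > 0) result for each query."""
    total = 0.0
    for rels in ranked_relevances:
        for rank, rel in enumerate(rels, start=1):
            if rel > 0:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevances) if ranked_relevances else 0.0


if __name__ == "__main__":
    # Hypothetical graded labels (0 = irrelevant, 3 = highly relevant) for two queries.
    queries = [[3, 2, 0, 1], [0, 0, 2, 3]]
    for rels in queries:
        print(f"NDCG@3 = {ndcg_at_k(rels, 3):.3f}")
    print(f"MRR = {mrr(queries):.3f}")
```

NDCG rewards placing highly graded results early. MRR looks only at the position of the first relevant hit, which is why the two are typically reported together.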
Tauri-based AI research canvas combining a Plan-Execute LLM agent loop with a two-stage BGE + RRF retrieval pipeline. The project is backed by 29 ADRs and includes an MCP-compatible inspection/automation server (163 endpoints across 6 namespaces) and a 4-layer Rust DB architecture.
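Reciprocal Rank Fusion merges several ranked lists by giving each document a score of 1 / (k + rank) in every list where it appears, then summing those scores. The Python sketch below follows that standard formula and assumes k = 60, a commonly used default. The document IDs and retriever names are hypothetical. This is a generic illustration, not CiteLoom's implementation, and it omits the BGE reranking stage.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical first-stage results from two different retrievers.
    lexical = ["doc_a", "doc_b", "doc_c"]
    dense = ["doc_c", "doc_a", "doc_d"]
    for doc_id, score in reciprocal_rank_fusion([lexical, dense]):
        print(doc_id, round(score, 4))
```

RRF uses only ranks, not raw scores, so it can merge retrievers whose score scales are not comparable without any calibration. In a two-stage design, a cross-encoder would then rerank the fused top results.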
Open-source Python library (in development) for inspecting and steering robot foundation models (VLA models, world models, and RL policies). It provides activation inspection, feature steering, ablation, and live simulation-loop hooks.
Stock dashboard (Angular, Node.js, MongoDB, Azure) that integrates the Finnhub and Polygon APIs for live market data and presents it with Highcharts visualizations. A Kotlin Android companion app shares the same backend.
Motion planning and localization framework (RRT trajectory planning, LiDAR-based Monte Carlo Localization) plus grasping (TSR-based IK sampling and Jacobian methods), validated both in RViz simulation and on a real robotic arm.
Sentiment analysis, POS tagging, and NER pipelines on Amazon Reviews and CoNLL-2003. Benchmarked classical ML models (SVM, HMM, LR) against neural approaches (Word2Vec and GloVe embeddings, BiLSTM, GRU, Transformer); BiLSTM + GloVe reached 88% F1 on NER.