Meta - Toronto ON
- Lead E2E GenAI inference performance on MTIA, architecting a modern inference engine with distributed-parallelism support and serving features such as disaggregated inference
- Coordinated model enablement and performance for a flagship MTIA initiative, leading 20+ engineers to secure a 1-gigawatt hardware commitment
- Optimized async D2H pipelining and load balancing, cutting host overhead by 20% and TTFT by 16% on Llama 3
- Developed fused compute and collective kernels to mask communication latency in distributed workloads
- Ranked #1 in engineering velocity within Foundation Org, averaging 400+ commits and 1,000+ code reviews per half while scaling the team by 5+ engineers