David Ramón Prados

GenAI Performance Engineer

Toronto, ON
+1 (226) 505-5305
dramonpr@uwaterloo.ca

linkedin.com/in/davidramonprados/

Summary

Performance Engineer specializing in inference architecture, hardware/software co-design, and E2E performance for distributed GenAI systems. Proven lead for large-scale silicon enablement, with deep expertise in optimizing the full execution stack and architecting high-utilization inference engines.

Work experience

Senior Software Engineer

2025/07 - Present

Meta - Toronto, ON

Lead E2E GenAI inference performance on MTIA, architecting a modern inference engine with distributed parallelism support and serving features such as disaggregated inference
Coordinated model enablement and performance for a flagship MTIA initiative, leading 20+ engineers to secure a 1-gigawatt hardware commitment
Optimized async D2H pipelining and load-balancing, cutting host overhead 20% and TTFT 16% on Llama3
Developed fused compute and collective kernels to mask communication latency in distributed workloads
Ranked #1 in engineering velocity within Foundation Org, averaging 400+ commits and 1,000+ code reviews per half while scaling the team by 5+ engineers

Senior Staff Software Engineer - Tech Lead

2024/07 - 2025/06

Untether AI - Toronto, ON

Led the design and architecture of next-generation inference compiler technologies for LLMs
Delivered industry-leading MLPerf results across BERT Large and ResNet-50 benchmarks
Drove silicon spin readiness, analyzing ECO cost-risk tradeoffs and guiding compiler-related silicon decisions
Managed the deployment of Llama3 8B and 70B across multi-chip, rack-scale inference systems

Staff Software Engineer - Deep Learning Performance

2023/01 - 2024/07

Untether AI - Toronto, ON

Led hardware/software co-design and performance optimization for a generative AI inference accelerator
Managed and trained a team of 17 contractors, deploying CNN workloads across a spatial compute fabric
Designed kernel implementations and contributed to core software stack architecture
Developed spatial placement and routing strategies to maximize inference throughput and bandwidth
Collaborated with clients on custom neural network applications
Owned hardware bring-up planning and execution, collaborating across the entire stack to enable first-silicon readiness

Deep Learning Engineer

2021/08 - 2023/01

Untether AI - Toronto, ON

Developed ingestion, quantization, and post-training optimization pipelines for CNNs and transformers
Designed INT8 and FP8 quantization algorithms, and developed PTQ algorithms such as knowledge distillation
Implemented graph-level optimizations including pruning, fusion, and post-training layer swapping

Support Researcher

2020/04 - 2021/08

Huawei R&D Laboratory - Waterloo, ON

Co-developed and benchmarked a novel Remote Differential Compression algorithm for efficient file synchronization
Published and patented methods for scalable delta encoding across heterogeneous devices

Education

B.A.Sc. in Systems Design Engineering - Artificial Intelligence

2016/08 - 2021/05

University of Waterloo

Summa Cum Laude / Dean's Honors List.

Publications

Bhatt, Ramón et al. "Unsupervised Detection of Lung Nodules in Chest Radiography Using Generative Adversarial Networks", EMBC Annual Meeting 2021
Borzov, Ramón et al. "Method and Apparatus for Replicating a Target File between Devices", World Intellectual Property Organization, WO2023000915A1 / US20230087778. Issued Jan 26, 2023
Kitamura, Ramón et al. "Mapping Attention Mechanisms (Transformer) Function to Spatial Architecture (SIMD or At-Memory Processing)", US Patent Office, 63/608,539. Filed Dec 22, 2023, Patent Pending