Summary

Experienced software engineer specializing in inference compiler technologies, hardware/software co-design of next-generation silicon, and performance engineering for generative and deep learning systems. Proven track record of leading cross-functional teams to architect and deliver scalable AI solutions in high-performance computing environments. Seeking to drive innovation in AI infrastructure through multidisciplinary leadership and technical excellence.

Work experience

Senior Staff Software Engineer - Tech Lead

2024/07 - Present
Untether AI - Toronto, ON
  • Led the design and architecture of next-generation inference compiler technologies for large language models (LLMs)
  • Delivered industry-leading MLPerf results across BERT Large and ResNet-50 benchmarks
  • Drove silicon spin readiness, analyzing ECO cost-risk tradeoffs and guiding compiler-related silicon decisions
  • Managed cross-functional teams responsible for deployment of Llama3 8B and 70B across multi-chip, rack-scale inference systems

Staff Software Engineer - Deep Learning Performance

2023/01 - 2024/07
Untether AI - Toronto, ON
  • Led hardware/software co-design and performance optimization for a generative AI inference accelerator
  • Managed and trained a team of 17 contractors, deploying CNN workloads across a spatial compute fabric
  • Designed kernel implementations and contributed to core software stack architecture
  • Developed spatial placement and routing strategies to maximize inference throughput and memory bandwidth
  • Collaborated with clients on custom neural network applications
  • Owned hardware bring-up planning and execution, collaborating across compiler, silicon, and validation teams to enable first-silicon readiness

Deep Learning Engineer

2021/09 - 2023/01
Untether AI - Toronto, ON
  • Developed ingestion, quantization, and post-training optimization pipelines for CNNs and transformers
  • Designed INT8 and FP8 quantization algorithms and supported QAT and sequential knowledge distillation workflows
  • Implemented graph-level optimizations including pruning, fusion, and post-training layer swapping

Support Researcher

2020/04 - 2021/09
Huawei R&D Laboratory - Waterloo, ON
  • Co-developed and benchmarked a novel Remote Differential Compression algorithm for efficient file synchronization
  • Published and patented methods for scalable delta encoding across heterogeneous devices

Education

B.A.Sc. in Systems Design Engineering - Artificial Intelligence

2016/08 - 2021/05
University of Waterloo
  • Summa Cum Laude / Dean's Honors List.

Publications

  • Bhatt, Ramón et al. "Unsupervised Detection of Lung Nodules in Chest Radiography Using Generative Adversarial Networks", EMBC Annual Meeting 2021
  • Borzov, Ramón et al. "Method and Apparatus for Replicating a Target File between Devices", World Intellectual Property Organization, WO2023000915A1 / US20230087778. Issued Jan 26, 2023
  • Kitamura, Ramón et al. "Mapping Attention Mechanisms (Transformer) Function to Spatial Architecture (SIMD or At-Memory Processing)", US Patent Office, 63/608,539. Filed Dec 22, 2023, Patent Pending

Notable Projects

ChexScan - Capstone Project

2020/08 - 2021/04
University of Waterloo
  • ML model for chest X-ray disease screening. Finished in first place and published results at EMBC 2021.