
Previous Experience

Quant Researcher (full time)

Oct 2024 - Today
Quants Inc (Hedge Fund)

Responsible for modelling ETF prices and volatility on intraday high-frequency data, using modern time series analysis with self-attention and time tokenization, convolutional NNs for pattern extraction and recognition, and NLP for analysing the impact of events on the market. The deep learning framework used is PyTorch with CUDA.

Staff Data Scientist (full time)

Jan 2022 - Oct 2024
Monashees (Venture Capital)

At Monashees, a venture capital firm focused on seed and Series A investment rounds, I was responsible for building data teams and data products across the firm's portfolio of startups. Below are my contributions to each company:

  • Facio, credit industry
    • developed a new credit model based on boosting algorithms, raising our F1 score by 7% while maintaining the same loan risk levels for our population of clients. This meant a 4% increase in credit revenue.
    • built MLOps infrastructure for credit risk, default probability and model performance monitoring using MLflow, Great Expectations, Weights & Biases and AWS.
    • rebuilt data warehouse tables on S3 and Athena, removing the bottleneck in the main queries and speeding up our data delivery by 31%.
  • Gringo, automotive industry
    • built an antifraud model for credit card transactions using both structured models (Isolation Forest) and a deep learning variational autoencoder. Reduced the monthly chargeback amount (BRL 3MM) by 83%.
    • built the first MLOps infrastructure and pipelines, using Vertex AI, BigQuery, GCS, Apache Airflow, Looker and Weights & Biases. The infrastructure included experiment and model versioning, model and data drift monitoring and an automatic retraining trigger. Sped up development and shipping by 200%.
    • built an LLM sentiment extractor based on PyABSA (triplet extractors) to extract and process app and service reviews and provide analytics for the CX team
    • responsible for the technical training of an 18-person team in data science and machine learning
  • Liti Saúde, healthcare industry
    • built the data platform from the ground up using GCP (BigQuery, GCS, Apache Beam, Apache Airflow, Spark, CloudSQL, Looker), dbt and Terraform, providing data to all internal and external stakeholders.
    • hired and structured the first data team of 5 people
    • built data strategy for the company and our digital product, growing from 7 to 1500+ customers in the first year

Lead Data Scientist (full time)

Mar 2021 - Jan 2022
Kompa Saúde, seed investment round founding team (health industry)
  • built data platform from the ground up using GCP products with Terraform orchestration
  • built a conversational anamnesis chatbot using a transformer-based neural network on top of Rasa. Reduced appointment time by 40% and raised our NPS from 15 to 74 points.
  • managed and developed an intelligent medical record system. Using our chatbot data, we clustered patients and diseases and provided our health team with personalized records for each patient

Machine Learning Engineer (full time)

Apr 2020 - Dec 2020
Cloudwalk (finance, payments)
  • built an antifraud model for credit card transactions with a variational autoencoder architecture and business rules in pre- and post-processing. The model was trained with a custom loss function on a heavily imbalanced dataset and achieved a 0.73 F1 score on a BRL 20MM+ daily flow.
  • built the first MLOps pipelines, based on the ML Test Score paper. Using tools such as TFX pipelines, Kubeflow and BigQuery as a feature store, we cut the development cycle (idea to deployment) from 2 months to 3 weeks.

Data Science Teacher (part time)

Mar 2019 - Mar 2020
Digital House Brasil (education)
  • restructured the Data Science & Machine Learning curriculum (5-month course). Turned around a negative NPS and raised it to 83
  • graduated 12 classes. Very proud of this achievement, as my students now work at top Brazilian startups such as iFood, Nubank, 99 etc

Data Scientist (full time)

Nov 2018 - May 2019
Visagio (consulting)
  • built a recommender system (collaborative + content-based hybrid system) for a Brazilian retail group. The first deployed version led to a 22% increase in page views and a 7% increase in sales
  • developed an in-company data science course and founded V-Labs, Visagio's analytics- and innovation-focused cell

Junior Data Scientist (full time)

May 2017 - Nov 2018
InfoPrice (retail industry)
  • built a computer vision OCR system (ConvNet) to clean images and recognize price tags in different supermarket contexts. Reduced our spending on third-party solutions by 78%
  • responsible for data analysis, modelling and reporting (via OLAP-based dashboards) of Brazil's physical and virtual retail prices. Built a time-series model to predict prices

Strategy Director (university activity)

Jan 2016 - Apr 2017
Grupo Turing, founding member

Grupo Turing is a Data & AI research group at Universidade de São Paulo composed of undergraduate and graduate students. As a founding member leading the team, I helped grow the group from half a dozen members to 70+ people and organize many AI-related events, online lectures and a datathon for 80 competitors. We also developed projects in NLP, Reinforcement Learning and Computer Vision.

Brain-Machine Interfaces Researcher (university activity)

Jan 2015 - Feb 2018
Escola Politécnica da USP

The objective of my research was to interpret electromagnetic signals from the brain in order to control an upper-body exoskeleton. By using digital signal processing techniques, image manipulation with OpenCV and a convolutional neural network architecture, we were able to control a robotic arm with 2 degrees of freedom.

Education

Courses & Certificates

Google Cloud Certified Professional Data Engineer

Google Cloud

https://www.credential.net/3e2f0fa6-dfa3-4055-9e5b-1655b55689c7?key=6a2c463b38ce9c852984a50917dc79d9f204325198cdeef224c26e8ac77e60f7

Skills

Data Modelling | Machine Learning | Large Language Models | Prompt Engineering | A.I. | Statistics | Data Visualization | NLP (Natural Language Processing) | Computer Vision | Deep Learning | Data Pipelines (Batch & Streaming) | Cloud solutions (AWS and GCP) | Model Deployment | Model Monitoring and Maintenance | MLOps | Data Governance | Data Lakes | Data Warehousing | Feature Stores

Languages

Portuguese (Native) - English (Fluent) - French (Basic) - Japanese (Basic)

Software & Programming Languages

Python (Fluent, complete data stack) | SQL (PostgreSQL and MySQL) | NoSQL (MongoDB, Redis, Neo4j) | GCP | AWS | Apache Airflow | Apache Beam | Hadoop Ecosystem | Apache Kafka | TensorFlow 2 | PyTorch | dbt | Weights & Biases | Microsoft Office (advanced) | LLMs and RAG