
Leopoldo Corona

Verified Expert in Engineering

Machine Learning Engineer and Developer

Location
Guadalajara, Mexico
Toptal Member Since
April 27, 2020

Leopoldo is an AWS Certified Machine Learning specialist who has worked in all data-related positions. He started his career as a data research analyst and then became a data scientist, developing risk and fraud assessment models. When Leopoldo began struggling to bring those models to production, he transitioned to an ML position focused on data-centric engineering, eventually becoming a data engineer. Currently, Leopoldo is the head of data engineering at the startup Clara.

Portfolio

Clara
Scala, Amazon Web Services (AWS), Databricks, Apache Kafka, Spark SQL, Spark...
PayClip
Python, Snowflake, Databricks, AWS Lambda, ETL
Kavak
Amazon SageMaker, Machine Learning, Python 3, Amazon Web Services (AWS)...

Experience

Availability

Full-time

Preferred Environment

Amazon Web Services (AWS), Python, Databricks, Scala

The most amazing...

...thing I've developed is an identity verification model that extracted and matched faces from an ID card and a selfie with over 95% accuracy.

Work Experience

Head of Data and ML Engineering

2022 - PRESENT
Clara
  • Built the data engineering team from scratch and grew it to a team of 9+ engineers.
  • Developed Clara's global data lake and data lakehouse. Started with Redshift and AWS Glue and migrated all processes and ETLs to Databricks.
  • Delivered global data for insights and reporting. Reported directly to the director of data.
Technologies: Scala, Amazon Web Services (AWS), Databricks, Apache Kafka, Spark SQL, Spark, Redshift, AWS Glue, Data Modeling, AWS Lambda, ETL

Senior Data Scientist

2021 - 2022
PayClip
  • Deployed a fraud assessment model to production, which reduced fraudulent transactions by over 50%.
  • Utilized Databricks notebooks and Snowflake to conduct analysis and report on fraud and risk key performance indicators (KPIs).
  • Oversaw stakeholder requirements and delivered presentations.
Technologies: Python, Snowflake, Databricks, AWS Lambda, ETL

Machine Learning Engineer

2020 - 2021
Kavak
  • Developed and improved feature engineering jobs for the ML models to consume.
  • Provided analysis support for credit risk, the financial branch of the company.
  • Provided analysis and development support for computer vision projects.
Technologies: Amazon SageMaker, Machine Learning, Python 3, Amazon Web Services (AWS), PySpark, AWS Lambda, ETL

Lead Data Scientist | ML Engineer

2017 - 2020
Kueski
  • Developed and deployed to production a fraud prevention model for a streaming loan application process, reducing monthly losses by more than 10%.
  • Monitored and continuously improved the fraud prevention model, preventing performance drops of more than 1%.
  • Coached junior team members by transferring fraud modeling knowledge and sharing general know-how.
  • Proposed a standardized project template for data science model service repositories that made model deployment 80% more efficient and experiment-trackable.
  • Led a high-performance ML engineering team and proposed a balanced team workflow based on restricted WIP Kanban Agile methodology. This proposal increased the team's productivity by 100%.
  • Developed a face-image-matching deep-learning model with over 95% accuracy when verifying our client's identity.
  • Built a feature store project using Hopsworks and Databricks.
Technologies: Amazon Web Services (AWS), Git, Ansible, Jupyter, Matplotlib, NumPy, Pandas, Keras, XGBoost, Python, Data Science, Machine Learning, AWS Lambda, ETL

Data Scientist

2016 - 2017
Intelimetrica
  • Co-developed the nearest-neighbors model in the company's main platform product, which shows the houses most similar and geographically closest to the property selected in the platform. This model helped detect anomalies in house appraisals.
  • Collaborated in the continuous improvement of the house-pricing prediction model for the two main clients of the firm.
  • Created a model to predict optimal delivery routes as a PoC for a client, with potential savings of over 30% in logistics expenses.
  • Co-led the data science team while reporting directly to the CEO.
Technologies: NumPy, Scikit-learn, Pandas, Python, Data Science, Machine Learning

Research Assistant

2015 - 2016
UNAM Physics Institute
  • Co-authored a conference paper on research in which I implemented both independent image reconstruction and image registration optimization, using affine transformations combined with a non-linear transformation.
  • Helped with preclinical studies by preparing and configuring the microCT unit, which generated over 2 GB of data in every study.
  • Conducted research on medical imaging physics, manipulating more than 500 GB of data on the university's supercomputer cluster.
Technologies: Python, Bash, MATLAB, Machine Learning

Research Intern

2015 - 2015
National Institute of Neurology and Neuroscience
  • Spearheaded the development of a CT and PET brain atlas on a healthy Mexican population to help improve automatic digital segmentation for radiotherapy and radiosurgery.
  • Helped with dosimetry measurements in radiotherapy and radiosurgery sessions.
  • Supported experimental setups and data analysis to calibrate the radiotherapy and radiosurgery equipment based on measurement data.
Technologies: MATLAB, Bash, Python, ITK

Optimization of Dual-energy Subtraction for Preclinical Studies Using a Commercial MicroCT Unit

This is a protocol designed to optimize DE image subtraction for contrast-enhanced studies in rodents, employing iodine-based contrast medium (CM).

Our investigation used an Albira ARS commercial unit not explicitly designed for quantitative CT tasks. DE subtraction was divided into stages that were analyzed independently: acquisition, volume reconstruction, image registration, and image weighting. The DE radiological techniques (low- and high-energy) had been previously optimized to enhance the visualization of iodine-based CM.

An independent reconstruction was needed to guarantee linearity between iodine intensity and its concentration for the high-energy acquisition; it also reduced the structured noise occasionally produced by the microCT reconstruction software over uniform regions and improved bone visualization. Image registration was optimized by combining an affine transformation with a non-linear transformation determined with the Free-Form Deformation algorithm.

Two subtraction weight factors were identified: one that maximized the contrast-to-noise ratio (CNR) of iodine mixed with soft-tissue-equivalent resin and another that minimized CNR between bone-like rods and soft-tissue-equivalent material.

Intelimétrica Banca

While working at Intelimetrica (a machine learning and data consulting startup), I was involved in the continuous improvement of a platform to manage mortgage portfolio quality.

This platform featured a house-pricing model with a KPI of less than 5% error and a model that finds similar, geographically close houses to prevent fraud in house appraisals. The similar-houses model provided a similarity score as a secondary indicator of appraisal quality.

I co-developed both machine learning models and contributed to their operationalization, working closely with the engineering team.
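The anomaly-detection idea behind the similar-houses model can be illustrated with a small sketch: compare an appraisal against its geographically nearest comparables and flag large deviations. The coordinates, prices, neighbor count, and tolerance below are hypothetical, not the platform's actual data or thresholds.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers between (lat, lon) pairs in degrees.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Hypothetical comparable properties: (latitude, longitude, price per m^2).
comps = np.array([
    [19.4326, -99.1332, 1500.0],
    [19.4350, -99.1400, 1450.0],
    [19.4310, -99.1290, 1520.0],
    [20.6597, -103.3496, 900.0],
])

def flag_appraisal(lat, lon, price_m2, k=3, tolerance=0.25):
    # Compare an appraisal against its k geographically nearest comparables;
    # flag it when it deviates more than `tolerance` from their median price.
    d = haversine_km(lat, lon, comps[:, 0], comps[:, 1])
    nearest_prices = comps[np.argsort(d)[:k], 2]
    median = np.median(nearest_prices)
    return abs(price_m2 - median) / median > tolerance

flag_appraisal(19.433, -99.134, 1480.0)  # in line with nearby comps
flag_appraisal(19.433, -99.134, 2600.0)  # far above nearby comps
```

An appraisal close to the median of its neighbors passes; one far above it is flagged for review.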

Face Similarity Identification Model

When I joined Kueski (a fast-growing fintech startup financed by Silicon Valley VCs) early on, identity verification was a manual process of comparing the face on an ID card against a selfie. This process became unscalable and very error-prone.

I proposed and developed an application that extracted the faces and fed them to a model that returned the probability that both belonged to the same person, automatically verifying the loan applicant's identity. The model was developed and trained from scratch on proprietary data and achieved state-of-the-art performance.
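A minimal sketch of the final matching step, assuming the two face crops have already been reduced to embedding vectors by a CNN: compare the embeddings with cosine similarity and squash the result into a probability. The 128-dimensional vectors, logistic scale, and function names below are illustrative, not the production system.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person_probability(id_embedding, selfie_embedding, scale=5.0):
    # Map similarity into (0, 1) with a logistic; in a real system the
    # scale (and any bias) would be calibrated on labeled face pairs.
    sim = cosine_similarity(id_embedding, selfie_embedding)
    return 1.0 / (1.0 + np.exp(-scale * sim))

# Toy embeddings: a "selfie" of the same person is the ID embedding plus
# small noise; a different person is an unrelated random vector.
rng = np.random.default_rng(7)
id_face = rng.normal(size=128)
selfie_same = id_face + rng.normal(scale=0.1, size=128)
selfie_other = rng.normal(size=128)

p_same = same_person_probability(id_face, selfie_same)
p_other = same_person_probability(id_face, selfie_other)
```

With these toy vectors, the same-person pair scores near 1, while the unrelated pair hovers around chance, which is the behavior a verification threshold would exploit.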

Libraries/APIs

Scikit-learn, XGBoost, Keras, Pandas, NumPy, Matplotlib, CatBoost, Dask, PySpark, PyTorch

Tools

Jupyter, Jira, AWS Glue, Seaborn, Git, MATLAB, ITK, Apache Airflow, Ansible, Amazon SageMaker, Spark SQL

Paradigms

Data Science, ETL

Other

Machine Learning, Model Validation, Technical Consulting, Data Modeling, Data Analysis, Deep Learning, Data Engineering, Leadership, Algorithms, Engineering, Physics

Languages

Python, SQL, Bash, Python 3, Scala, Snowflake

Platforms

Amazon Web Services (AWS), AWS Lambda, Databricks, Apache Kafka

Storage

MySQL, PostgreSQL, Data Validation, Redshift

Frameworks

LightGBM, Spark

2021 - 2021

Master's Degree in Informatics and Applied Mathematics

Higher School of Economics - Moscow

2011 - 2015

Bachelor of Science Degree in Engineering Physics

Monterrey Institute of Technology and Higher Education - Monterrey, Mexico

NOVEMBER 2020 - NOVEMBER 2023

AWS Certified Machine Learning - Specialty

Amazon Web Services
