Jesus Caro, Developer in Seattle, WA, United States

Jesus Caro

Verified Expert in Engineering

Data Engineer and Developer

Location
Seattle, WA, United States
Toptal Member Since
August 23, 2023

Jesus is an experienced data engineer skilled in Python, ETL, and cloud infrastructure on AWS and Azure. He's also proficient in Spark, massively parallel processing (MPP) databases, Delta Lake, SQL, Databricks, machine learning, Apache Hive, and Snowflake. Jesus has a record of leading successful data modeling and ELT implementations while communicating clearly and efficiently with clients.

Portfolio

First American Financial
Python, PySpark, Spark, Apache Airflow, AWS IoT, AWS Glue, Snowflake
3Si
Databricks, SQL, Git, Scikit-learn, TensorFlow, Python, Spark, Snowflake, Azure...
Stahmanns Pecans
Python, Tableau, Microsoft Power BI, SQL, PHP, R

Availability

Part-time

Preferred Environment

Visual Studio, Databricks, Jupyter

The most amazing...

...system I've implemented for a client is a cutting-edge ML NLP model for entity resolution, leveraging sparse demographic data from diverse sources.

Work Experience

Data Engineer

2023 - PRESENT
First American Financial
  • Contributed to developing ETL pipelines using PySpark on AWS Glue, with a focus on optimizing entity resolution processes.
  • Played a central role in integrating TransUnion data, improving the existing pipelines by enriching credit reporting data.
  • Advanced the entity resolution capabilities by improving the NLP ML code in PySpark.
  • Tested and debugged the pipeline code, using Apache Airflow for workflow management and output analysis.
Technologies: Python, PySpark, Spark, Apache Airflow, AWS IoT, AWS Glue, Snowflake

Senior Data Engineer

2020 - 2023
3Si
  • Led the development of a standardized ML pipeline, leveraging active learning to aid in entity resolution of data across disparate systems.
  • Implemented ETL pipelines on Databricks using big data tools such as Spark and PySpark. These pipelines primarily cleaned and aggregated data from public sources.
  • Onboarded clients and led the creation and configuration of resources on Azure and AWS cloud platforms.
  • Handled the mapping of client data to our proprietary model by documenting and assessing client ERDs, data models, and integrations.
  • Implemented big data pipelines using Delta Lake and MPP databases such as Trino, Databricks, and Snowflake to decrease the latency of pipelines and OLAP queries.
  • Introduced automated ETL pipelines from client SQL, SFTP, or data lake sources via Azure Data Factory or Apache Airflow.
Technologies: Databricks, SQL, Git, Scikit-learn, TensorFlow, Python, Spark, Snowflake, Azure, Azure Data Factory, Synapse, AWS IoT, Amazon SageMaker, Apache Airflow, PySpark, Delta Lake, Trino

Data and Systems Analyst

2018 - 2020
Stahmanns Pecans
  • Created and maintained SQL databases that stored sensor and system process data.
  • Developed a production forecasting model in R to allocate products for future contracts and facilitated weekly presentations to monitor manufacturing KPIs.
  • Optimized and automated business processes, such as collecting QC and QA data.
Technologies: Python, Tableau, Microsoft Power BI, SQL, PHP, R

Carpark Vacancy in Singapore: A Geo-spatial Analysis

https://607f9ef90597535dcfdc202c--jolly-wright-eba598.netlify.app/portfolio/carpark/
This project analyzed parking lots to identify those with consistent availability patterns. Alongside this objective, I addressed the following questions:

• Which nearby parking facilities should drivers avoid or choose based on availability trends during regular business hours and off-business hours?
• During off-business hours, which parking lots are frequently full, and which ones maintain reasonable availability rates?
• How does availability fluctuate over weekends?

Education

2016 - 2018

Master's Degree in Astrophysics

Washington State University - Pullman, USA

2012 - 2016

Bachelor's Degree in Physics

The University of Texas - El Paso, USA

Libraries/APIs

PySpark, Scikit-learn, TensorFlow, NumPy, Pandas

Tools

Visual Studio, Jupyter, Apache Airflow, AWS Glue, Git, Synapse, Amazon SageMaker, Tableau, Microsoft Power BI, Plotly

Platforms

Databricks, AWS IoT, Azure

Frameworks

Spark, Trino

Languages

Python, Snowflake, SQL, PHP, R, C

Storage

Databases

Other

Programming, Data Visualization, Mathematics, Statistics, Azure Data Factory, Physics, Delta Lake
