Jesus Caro
Verified Expert in Engineering
Data Engineer and Developer
Jesus is an experienced data engineer skilled in Python, ETL, and cloud infrastructure in AWS and Azure. He's also proficient in Spark, massively parallel processing (MPP) databases, Delta Lake, SQL, Databricks, machine learning, Apache Hive, and Snowflake. Jesus has a record of leading successful data modeling and ELT implementations while communicating clearly and efficiently with clients.
Preferred Environment
Visual Studio, Databricks, Jupyter
The most amazing...
...system I've implemented for a client is a cutting-edge ML NLP model for entity resolution, leveraging sparse demographic data from diverse sources.
Work Experience
Data Engineer
First American Financial
- Contributed to developing ETL pipelines utilizing PySpark on AWS Glue, with a dedicated emphasis on optimizing entity resolution processes.
- Assumed a central role in integrating TransUnion data, resulting in notable improvements to the existing pipelines by seamlessly enriching credit reporting data.
- Made notable contributions to entity resolution capabilities by advancing the NLP ML code using PySpark.
- Conducted comprehensive testing and precise debugging of the pipeline code, employing Apache Airflow for streamlined workflow management and methodical output analysis.
Senior Data Engineer
3Si
- Led the development of a standardized ML pipeline, leveraging active learning to aid in entity resolution of data across disparate systems.
- Implemented ETL pipelines using big data tools on Databricks such as Spark and PySpark. These data pipelines primarily cleaned and aggregated data from public sources.
- Onboarded clients and led the creation and configuration of resources on Azure and AWS cloud platforms.
- Handled the mapping of client data to our proprietary model by documenting and assessing client ERDs, data models, and integrations.
- Implemented big data pipelines using Delta Lake and MPP databases such as Trino, Databricks, and Snowflake to decrease the latency of pipelines and OLAP queries.
- Introduced automated ETL pipelines from client SQL, SFTP, or data lake sources via Azure Data Factory or Apache Airflow.
Data and Systems Analyst
Stahmanns Pecans
- Created and maintained SQL databases that stored sensor and system process data.
- Developed a production forecasting model in R to allocate products for future contracts and facilitated weekly presentations to monitor manufacturing KPIs.
- Optimized and automated business processes, such as collecting QC and QA data.
Experience
Carpark Vacancy in Singapore: A Geo-spatial Analysis
https://607f9ef90597535dcfdc202c--jolly-wright-eba598.netlify.app/portfolio/carpark/
• Which nearby parking facilities should drivers avoid or choose based on availability trends during regular business hours and off-business hours?
• During off-business hours, which parking lots are frequently full, and which ones maintain reasonable availability rates?
• How do availability fluctuations manifest over weekends?
Education
Master's Degree in Astrophysics
Washington State University - Pullman, USA
Bachelor's Degree in Physics
The University of Texas - El Paso, USA
Skills
Libraries/APIs
PySpark, Scikit-learn, TensorFlow, NumPy, Pandas
Tools
Visual Studio, Jupyter, Apache Airflow, AWS Glue, Git, Synapse, Amazon SageMaker, Tableau, Microsoft Power BI, Plotly
Platforms
Databricks, AWS IoT, Azure
Frameworks
Spark, Trino
Languages
Python, Snowflake, SQL, PHP, R, C
Storage
Databases
Other
Programming, Data Visualization, Mathematics, Statistics, Azure Data Factory, Physics, Delta Lake