Jaime Leal, Data Cleaning Developer in Monterrey, Mexico
Member since September 20, 2019
Jaime has four years of experience working as a data scientist in all stages of the data science pipeline, including data cleaning, feature engineering, model building, and model deployment, mainly with structured data. He has developed in R and Python, and he has created automatic reports, visualizations, dashboards, RESTful APIs, and packages. Jaime is continuously working to improve his skills.


  • Linksbridge
    RStudio, R, RShiny, HTML, CSS, SQL, database, ETL, Dplyr, GitHub, Git...
  • Strong Analytics
    RStudio, GitHub, Dplyr, Unit Testing, Data Visualization, HTML, CSS, Git, R
  • Teranalytics
    Twitter API, RStudio, Sentiment Analysis, Dplyr, Caret, Data Analysis...






Preferred Environment

RStudio, Spyder, Visual Studio, Git

The most amazing thing I've coded is a two-phase fluid flow simulator in Python.


  • R Developer

    2020 - 2020
    Linksbridge
    • Developed an RShiny application to search vaccination campaigns in a database and allow the user to update old entries and create new ones.
    • Wrote SQL queries to allow the application to interact with the database.
    • Wrote code documentation and used Git for version control.
    Technologies: RStudio, R, RShiny, HTML, CSS, SQL, Databases, ETL, Dplyr, GitHub, Git, Web Forms, Data Wrangling, Web Development
  • Data Engineer

    2020 - 2020
    Strong Analytics
    • Worked with a team of developers to create two R libraries.
    • Developed a dashboard in RShiny to interact with a data analytics engine.
    • Wrote unit tests with the testthat library to verify that the R packages we developed worked as intended.
    Technologies: RStudio, GitHub, Dplyr, Unit Testing, Data Visualization, HTML, CSS, Git, R
  • Data Scientist

    2017 - 2020
    Teranalytics
    • Worked in all stages of the data science pipeline including data cleaning, feature engineering, model building, and model deployment, mainly with structured data.
    • Developed several interactive RShiny dashboards and deployed them to shinyapps.io.
    • Managed and used Amazon Web Services (AWS) S3, EC2, Lambda, API Gateway, Cognito, and RDS.
    • Contributed to projects in various industries including manufacturing, transportation, clinical trials, and marketing.
    • Built an app to collect status updates and comments from Twitter and Facebook and perform text analysis and sentiment analysis.
    Technologies: Twitter API, RStudio, Sentiment Analysis, Dplyr, Caret, Data Analysis, Data Cleaning, AWS RDS, AWS Lambda, AWS S3, AWS EC2, Amazon Web Services (AWS), AWS API Gateway, GitHub, ETL, Natural Language Processing (NLP), Data Visualization, Machine Learning, Data Science, SQL, Git, Python, R


  • Data Science Certification (Other amazing things)

    Coursera data science specialization projects.

  • Text Prediction Application (Development)
  • Porous Media Flow Simulator (Development)

    A porous media flow simulator that I built to simulate flow in a reservoir. It handles single-phase and two-phase flow, as well as vertical wells, boundary conditions, and different solvers.

  • Training a Speech Synthesis Model (Development)

    To train speech synthesis models, you need a dataset consisting of thousands of pairs of audio clips and their transcriptions. Extracting audio clips from recordings is easy. The difficult part is matching each audio clip to its transcript. Even when you already have an accurate transcript, as with audiobooks, manually matching each audio clip to its corresponding text is tedious and time-consuming.
    A workaround is to use an automatic speech transcription service to transcribe each audio clip.
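
    As a rough illustration of how an automatic transcription could be paired with an existing accurate transcript, here is a minimal Python sketch. It is not the project's code: the matching step with `difflib`, the `match_clip` helper, and the sample sentences are all assumptions made for the example, and the speech-to-text output is a hard-coded hypothetical string rather than a call to a real service.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and split into words for a fair comparison."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def match_clip(asr_text, transcript_sentences):
    """Return (index, score) of the transcript sentence closest to the ASR text."""
    asr_words = normalize(asr_text)
    best_index, best_score = -1, 0.0
    for i, sentence in enumerate(transcript_sentences):
        # Word-level similarity in [0, 1]; 1.0 means identical word sequences.
        score = SequenceMatcher(None, asr_words, normalize(sentence)).ratio()
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical data: sentences from an accurate transcript, and the imperfect
# text a speech-to-text service might return for one audio clip.
sentences = [
    "It was the best of times, it was the worst of times.",
    "Call me Ishmael.",
    "The quick brown fox jumps over the lazy dog.",
]
asr_output = "the quick brown fox jumped over the lazy dog"

index, score = match_clip(asr_output, sentences)
print(index, round(score, 2))
```

    A low best score can be used to flag clips for manual review instead of accepting a doubtful match.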

  • Visualizing Mexico's Fishing Industry (Development)

    A Tableau dashboard that visualizes the performance of Mexico's fishing industry from 2008 to 2014 using data from CONAPESCA. The dashboard shows production per state and by top species, segmented by origin (capture or aquaculture).

  • Movie Recommender System (Development)

    Select a movie from the search bar and the app will recommend other movies like the one that you selected.

    The app uses a content-based recommender system, trained on movie descriptions to suggest movies that are similar to each other. The dataset for the analysis comes from "The Movies Dataset" and it contains 45,000 movies released on or before July 2017.
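
    The idea behind a content-based recommender can be sketched in a few lines of Python. This is an illustration only, not the app's code: the description above does not say which model is used, so TF-IDF weighting with cosine similarity is an assumption, and the `recommend` helper and the four-movie catalogue are hypothetical stand-ins for "The Movies Dataset".

```python
import math
from collections import Counter


def tfidf_vectors(descriptions):
    """Turn each description into a {word: tf-idf weight} dictionary."""
    docs = [Counter(text.lower().split()) for text in descriptions]
    n_docs = len(docs)
    # Number of descriptions in which each word appears at least once.
    doc_freq = Counter(word for doc in docs for word in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in doc.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(title, movies, top_n=2):
    """Return the top_n titles whose descriptions are most similar to `title`."""
    titles = list(movies)
    vectors = tfidf_vectors([movies[t] for t in titles])
    query = vectors[titles.index(title)]
    scored = [
        (cosine(query, vec), other)
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


# Hypothetical catalogue of titles and descriptions.
movies = {
    "Space Quest": "astronauts explore a distant planet in deep space",
    "Star Voyage": "a crew of astronauts travels through deep space",
    "Love in Paris": "two strangers fall in love in paris",
    "Baking Day": "a family bakes bread together",
}

print(recommend("Space Quest", movies, top_n=1))
```

    With a real catalogue, common words such as "a" and "in" would normally be removed as stop words before weighting.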


  • Languages

    Python, R, Markdown, SQL, HTML, CSS, Excel VBA, JavaScript
  • Frameworks

    RStudio Shiny, Django
  • Libraries/APIs

    Caret, REST APIs, Twitter API, Facebook API, Keras, TensorFlow, NumPy, SciPy, Pandas, Matplotlib
  • Tools

    Jupyter, Git, GitHub, Dplyr, Bitbucket, Spyder, Microsoft Access, Tableau, Visual Studio
  • Paradigms

    Data Science, ETL, Unit Testing, REST
  • Platforms

    Jupyter Notebook, Windows, Visual Studio Code, Amazon Web Services (AWS), AWS EC2, AWS Lambda, Linux, Docker, Heroku, RStudio
  • Other

    Machine Learning, Data Cleaning, Data Analysis, Natural Language Processing (NLP), Sentiment Analysis, Speech to Text, Text to Speech (TTS), AWS API Gateway, Linear Algebra, Numerical Methods, Deep Learning, Big Data, R, RShiny, HTML, CSS, SQL, database, ETL, Data Wrangling, AWS, Data Visualization
  • Storage

    AWS S3, AWS RDS, PostgreSQL, Databases, Web Forms, Redis
  • Industry Expertise

    Web Development


  • Master's degree in Petroleum Engineering
    2015 - 2016
    Texas A&M University - College Station, Texas, USA


  • Big Data Modeling and Management Systems
    JULY 2020 - PRESENT
  • Introduction to Big Data
    JULY 2020 - PRESENT
  • Deep Learning Specialization
  • Data Science
    MARCH 2019 - PRESENT
