Angel Ruiz Reche

Verified Expert in Engineering

Data Scientist and Software Developer

Location

Barcelona, Spain

Toptal Member Since

January 27, 2021

Angel is a data scientist with more than five years of research and business experience and a passion for data, pattern finding, and problem-solving. He is communicative and proactive and enjoys learning new things daily. He specializes in building complete solutions in Python, from data parsing to specialized machine learning models. So far, he has contributed to startups and large companies in banking, eCommerce, real estate, and bioinformatics.

Predictive Analytics, Data Analysis, Machine Learning, Data Analytics, Python, Pandas, Jupyter, Scikit-learn, Python 3, SQL, Regex, Computer Science, Algorithms, Statistics, Data Mining, Bioinformatics, Optical Character Recognition, Unsupervised Learning

Portfolio

Treat Technologies, Inc

Google BigQuery, Python, Data Science, Google Cloud Platform (GCP), Vertex...

Visibly Works LLC

Python, Machine Learning, Data Science, Data Modeling, SQL, NumPy, Matplotlib...

Lurtis Rules

Predictive Analytics, OCR, SQL, Commercial Real Estate, Forecasting...

Experience

Python - 5 years, Pandas - 5 years, Bioinformatics - 4 years, Statistics - 4 years, Data Science - 4 years, SQL - 4 years, Machine Learning - 4 years, Time Series Analysis - 3 years

Availability

Part-time

Preferred Environment

Regex, Time Series Analysis, NumPy, Visual Studio Code (VS Code), Machine Learning, Bioinformatics, Scikit-learn, Pandas, Python, MacOS

The most amazing...

...deep learning app I've developed is called ReorientExpress. It allows the deciphering of genetic sequences (RNA splicing code) without a reference.

Work Experience

Data Scientist

2023 - 2023

Treat Technologies, Inc

Created ML models using BigQuery ML to predict the likelihood of customers making repeated purchases after their first interaction with the merchant.
Created ML models using Google's Vertex AI to predict the estimated customer lifetime value of online buyers.
Performed exhaustive EDA and data preparation on big datasets using Jupyter Notebooks, BigQuery, and Google's Dataprep.

Technologies: Google BigQuery, Python, Data Science, Google Cloud Platform (GCP), Vertex, BigQuery, Machine Learning, Visual Studio Code (VS Code), Visualization, Data Analytics, SQL, Supervised Learning, Google Cloud, Cloud Storage, Google Cloud Functions

Data Scientist and ML Engineer

2021 - 2022

Visibly Works LLC

Designed and developed a model that suggests which ads to show on Amazon, and in which order, to maximize the conversion rate of specific products. It used traffic, conversion, geographic, and demographic data.
Created a model that classifies eCommerce ad campaigns into classes according to their content, performance, keywords, and more. This helped standardize the ad campaigns from different advertisers and improve their performance according to their goals.
Set up a pipeline that forecast intraday campaign expenditure to predict when a campaign would run out of budget and to suggest a new budget, along with the potential losses in traffic and conversions.
Developed an app that generates synthetic advertising data. This data could be shown to potential clients to showcase the product without exposing private data.
Created a tool that periodically ran over all our clients' databases and found potentially wrong entries. This helped curate the databases and increase trust with our clients.
Built a model that suggested which keywords to include in an ad campaign, based on the target product and past performance, and how much to bid on them to reach a specific goal.
Created a web scraping tool to extract Amazon's product categories. It deals with nested links and keeps track of the links already visited. The output is saved into an Excel file.
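
The visited-link tracking mentioned in the last item can be illustrated with a minimal sketch. This is not the actual tool: the `crawl_categories` function, the `get_links` callback, and the in-memory `SAMPLE_LINKS` graph are hypothetical stand-ins for real page fetching and parsing, and the Excel export is left out.

```python
from collections import deque


def crawl_categories(start, get_links):
    """Breadth-first walk over nested category links, visiting each link once."""
    visited = {start}        # links already seen, so cycles are not followed twice
    order = []               # links in the order they were processed
    queue = deque([start])
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Hypothetical link graph standing in for real category pages (assumption).
SAMPLE_LINKS = {
    "/root": ["/books", "/toys"],
    "/books": ["/books/fiction", "/root"],  # links back to the root
    "/toys": ["/books"],                    # links to an already seen page
    "/books/fiction": [],
}


def sample_get_links(url):
    """Return the outgoing links of a page in the sample graph."""
    return SAMPLE_LINKS.get(url, [])


if __name__ == "__main__":
    print(crawl_categories("/root", sample_get_links))
```

In a real scraper, `get_links` would download and parse each page; keeping it as a parameter makes the traversal logic easy to test on its own.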

Technologies: Python, Machine Learning, Data Science, Data Modeling, SQL, NumPy, Matplotlib, Scikit-learn, Jupyter, eCommerce, PostgreSQL, Statistics, Supervised Learning, Unsupervised Learning, Elasticsearch, Amazon Athena, Bitbucket, APIs, Forecasting, Amazon Web Services (AWS), Google Cloud, ETL, Web Scraping, Google Cloud Platform (GCP), Python 3, Data Analysis, REST APIs, Visual Studio Code (VS Code), Regex, Algorithms, Git, Text Classification, Neural Networks, TensorFlow, Jupyter Notebook, Visualization, Data Analytics, Natural Language Processing (NLP), GitHub, MongoDB, MySQL, Time Series, Deep Neural Networks, Data Mining, Pandas, AWS Lambda, Web Crawlers

Lead Data Scientist

2020 - 2021

Lurtis Rules

Developed several pipelines for parsing, structuring, and analyzing commercial real estate data. Used the data and analysis to build machine learning-based prediction and forecasting tools to maximize investors' revenue.
Created several machine learning models to help investors decide which real estate buildings to invest in according to demographic, geographic, and macroeconomic data.
Used econometric analysis to give investors insights into upcoming macroeconomic trends.
Worked in close contact with the client, product owner, and product manager to achieve project goals and the client's needs.
Used agile methodologies with Jira and performed continuous code maintenance with GitHub.
Created a Python web scraper tool that extracts data from real estate property portals. It continuously extracts the most recent data, parses the properties' descriptions, and extracts the relevant information into tables.

Technologies: Predictive Analytics, OCR, SQL, Commercial Real Estate, Forecasting, Time Series Analysis, Machine Learning, Data Analytics, Data Science, Python, Artificial Intelligence (AI), APIs, Matplotlib, Jupyter, Scikit-learn, Supervised Learning, Unsupervised Learning, ETL, Web Scraping, Python 3, Data Analysis, REST APIs, Macroeconomic Forecasting, Econometrics, Visual Studio Code (VS Code), Regex, Algorithms, Git, Text Classification, TensorFlow, Jupyter Notebook, Amazon Web Services (AWS), Visualization, GitHub, MySQL, Time Series, Data Modeling, Data Mining, Pandas, BigQuery, Google BigQuery, Web Crawlers, Beautiful Soup

Data Scientist and Team Leader

2019 - 2020

Banco Santander

Developed Python and R packages end to end, from the initial idea through coding and testing to the final standalone dockerized package.
Created NLP tools to automatically process different documents to classify them into the most likely kind of document and extract relevant information to be stored in databases.
Led a small team of developers and coordinated them. Maintained close communication with other departments to ensure fast results and directly reported to upper management.

Technologies: Predictive Analytics, Regex, Machine Learning, Data Science, Git, Docker, R, OCR, Python, Artificial Intelligence (AI), Matplotlib, Jupyter, Scikit-learn, Supervised Learning, Unsupervised Learning, Web Scraping, Python 3, Data Analysis, SQL, MySQL, PostgreSQL, MongoDB, GitHub, Natural Language Processing (NLP), Visual Studio Code (VS Code), Algorithms, Text Classification, Jupyter Notebook, Visualization, REST APIs, Data Analytics, ETL, Data Modeling, Data Mining, Pandas

Data Scientist

2018 - 2018

Cambridge Cancer Research Institute

Developed machine learning-based tools to extract, analyze, and classify papers from the biggest medical journal repository, PubMed.
Created a deep learning NLP tool to learn patterns from authors' papers and their metadata. It can guess who wrote an article and distinguish between authors with the same name.
Used the tools created to extract insights on how authors from different fields, countries, and universities behave and relate to other authors and topics.

Technologies: Predictive Analytics, Data Mining, Text Classification, Machine Learning, Data Science, R, Python, APIs, Matplotlib, Jupyter, Scikit-learn, Supervised Learning, Unsupervised Learning, Python 3, Data Analysis, REST APIs, Natural Language Processing (NLP), Deep Learning, Keras, Visual Studio Code (VS Code), Algorithms, Neural Networks, TensorFlow, Jupyter Notebook, Visualization, Data Analytics, Data Modeling, Pandas

Data Scientist and Bioinformatics Developer

2017 - 2018

Parc de Recerca Biomèdica de Barcelona

Researched alternative splicing with machine learning models and data science tools.
Developed a deep learning tool that can predict with 99% accuracy which tissue a sample came from.
Developed another deep learning tool that can predict the genetic expression of specific tissues, their potential response to specific drugs, and whether or not they are in a healthy state.

Technologies: Predictive Analytics, Keras, Deep Learning, Deep Neural Networks, RESTful Development, REST APIs, R, Python, Biopython, Biotechnology, Bioinformatics, Machine Learning, Data Science, Artificial Intelligence (AI), Matplotlib, Jupyter, Scikit-learn, Supervised Learning, Unsupervised Learning, SQL, Python 3, Data Analysis, TensorFlow, Next-generation Sequencing, Jupyter Notebook, Data Modeling, Pandas

Experience

ReorientExpress: Deep Learning Tool for Gene Expression Prediction

https://github.com/comprna/reorientexpress

A deep learning tool created using Python and TensorFlow that can rebuild genomes and evaluate their expression without having a reference from that species. It works by simulating RNA splicing, a biological process that uses a language (similar to DNA) that has not yet been fully deciphered. Therefore, ReorientExpress can predict the results of splicing without explicitly needing to decipher the splicing code.

This highlights one of the biggest advantages of deep learning: it can simulate complex systems without having to reduce the process to simple rules. Rather, it can learn complex interactions that other machine learning models cannot.

DeepOracle

https://github.com/angelrure/DeepOracle

A deep learning-based application in Python that helps software testers by selecting the best samples to test. It first creates a model that tries to replicate the program (the Oracle) using deep neural networks. Then, given a new dataset, it selects the samples that are more likely to find bugs in the software under testing.

Intraday Campaign Budget Predictor

A Python-based application that connects to an AWS Athena service, extracts hourly advertising data from Amazon ad campaigns, and forecasts the traffic and conversion data of the active campaigns.

Those forecasts are sent to a web app in which clients can see which campaigns are likely to go out of budget during the day and by how much.

They also get an estimate of the potentially missed traffic and conversion events and a suggested budget increase to avoid going out of budget. As a result, their campaigns are always on budget.
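
The run-out check and budget suggestion described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the application's actual model: the `budget_outlook` function and its parameters are hypothetical, and the hourly spend forecast is taken as a given input rather than produced by a forecasting model.

```python
def budget_outlook(spent_so_far, hourly_forecast, daily_budget):
    """Return (run_out_index, suggested_budget) for one campaign.

    run_out_index is the position in hourly_forecast at which cumulative
    spend first reaches the daily budget, or None if the budget lasts.
    suggested_budget is the budget needed to cover the whole forecast.
    """
    cumulative = spent_so_far
    run_out_index = None
    for i, spend in enumerate(hourly_forecast):
        cumulative += spend
        if run_out_index is None and cumulative >= daily_budget:
            run_out_index = i
    # Never suggest less than the current budget.
    suggested_budget = max(daily_budget, cumulative)
    return run_out_index, suggested_budget


if __name__ == "__main__":
    # Hypothetical numbers: 40 spent so far, four forecast hours, budget of 80.
    print(budget_outlook(40.0, [10.0, 20.0, 30.0, 15.0], 80.0))
```

The difference between the suggested and current budget corresponds to the suggested increase; estimating missed traffic and conversions would need the forecast for those metrics as well.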

Augmented Introspection: Emel

https://store.steampowered.com/app/2189350/Augmented_Introspection_Emel/?curator_clanid=4777282&utm_source=SteamDB

I created a conversation-based videogame using GML, a language based on C++. I designed and coded the structure, the graphical interface, and all the behind-the-scenes logic.

In this conversation-based videogame, the user communicates with an AI assistant using text inputs and can take several quizzes, psychological tests, games, and more. It uses Google Cloud services such as:
• Storage: To store user behavior data and gameplay event data.
• Functions: To allow communication between GCP and the videogame. It uses several endpoints for specific tasks.
• A text-to-speech API: In combination with Functions, it allows the AI assistant to speak.

The game explores topics such as transhumanism, hedonism, and individualism.

ETL Orchestration using AWS

I created an ETL system in which data from different APIs and an on-premise PostgreSQL DB were dumped into AWS S3 using AWS's Lambda Functions.

The data was then parsed, processed, cleaned, and uploaded to AWS's Redshift. Data was also homogenized so the different data sources could be queried together. The pipeline was scheduled to run automatically at midnight every day. The process was fully logged and fully developed in just 4 days.

Finally, the data was connected to an external dashboarding solution (Metabase), where it could be visualized in real time.
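
The homogenization step, which lets records from different sources be queried together, might look roughly like the sketch below. The source names, field names, and the `homogenize` function are all hypothetical, since the real schemas are not described here.

```python
# Hypothetical per-source field mappings onto one shared schema (assumption).
FIELD_MAPS = {
    "api_a": {"id": "customer_id", "ts": "created_at"},
    "postgres": {"cust_id": "customer_id", "created": "created_at"},
}


def homogenize(source, record):
    """Rename a record's fields to the shared schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    # Keep only mapped fields so every source yields the same columns.
    row = {mapping[key]: value for key, value in record.items() if key in mapping}
    row["source"] = source
    return row


if __name__ == "__main__":
    print(homogenize("api_a", {"id": 1, "ts": "2021-01-01", "extra": "x"}))
    print(homogenize("postgres", {"cust_id": 2, "created": "2021-01-02"}))
```

Once every source produces rows with the same column names, they can be loaded into a single table and queried together.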

Skills

Languages

Python, Regex, SQL, Python 3, R, GML

Libraries/APIs

Pandas, Scikit-learn, Keras, TensorFlow, Matplotlib, NumPy, REST APIs, Beautiful Soup, PySpark

Tools

Jupyter, Git, Bitbucket, GitHub, Biopython, Amazon Athena, BigQuery, Amazon CloudWatch

Paradigms

Data Science, RESTful Development, ETL, Business Intelligence (BI), Software Testing

Platforms

Jupyter Notebook, Visual Studio Code (VS Code), AWS Lambda, Docker, Amazon Web Services (AWS), Google Cloud Platform (GCP), Steam, HubSpot, Databricks

Other

Machine Learning, Data Analytics, Predictive Analytics, Supervised Learning, Data Analysis, Time Series Analysis, Algorithms, Mathematics, Statistics, Computer Science, Visualization, Forecasting, OCR, Text Classification, Data Mining, Deep Neural Networks, Deep Learning, Neural Networks, Artificial Intelligence (AI), APIs, Unsupervised Learning, Data Modeling, Web Scraping, Time Series, Natural Language Processing (NLP), Commercial Real Estate, Biotechnology, Next-generation Sequencing, Biomedical Skills, Monte Carlo Simulations, Reinforcement Learning, eCommerce, Macroeconomic Forecasting, Econometrics, Psychology, Philosophy, Cloud Storage, Google Cloud Functions, Text to Speech (TTS), Google BigQuery, Vertex, Metabase, Web Crawlers

Storage

PostgreSQL, MySQL, Elasticsearch, Google Cloud, MongoDB, Google Cloud Storage, Redshift

Industry Expertise

Bioinformatics

Education

2019 - 2020

Master's Degree in Data Science

Valencia International University - Valencia, Spain

2016 - 2018

Master's Degree in Bioinformatics

Pompeu Fabra University - Barcelona, Spain

2012 - 2016

Bachelor's Degree in Biotechnology

Lleida University - Lleida, Spain

Certifications

NOVEMBER 2017 - PRESENT

Machine Learning Nanodegree

Udacity
