
Angel Ruiz Reche
Verified Expert in Engineering
Data Scientist and Software Developer
Angel is a data scientist with more than five years of research and business experience and a passion for data, pattern finding, and building solutions to problems. He's very communicative and proactive, and he likes to learn new things daily. He specializes in building complete solutions in Python, from data parsing to specialized machine learning models. So far, he's contributed to startups and large companies in banking, eCommerce, real estate, and bioinformatics.
Preferred Environment
Regex, Time Series Analysis, NumPy, Visual Studio Code (VS Code), Machine Learning, Bioinformatics, Scikit-learn, Pandas, Python, macOS
The most amazing...
...deep learning app I've developed is called ReorientExpress. It can decipher genetic sequences (the RNA splicing code) without a reference.
Work Experience
Data Scientist
Treat Technologies, Inc
- Created ML models using BigQuery ML to predict the likelihood of customers making repeat purchases after their first interaction with the merchant.
- Created ML models using Google's Vertex AI to predict the customer lifetime value of online buyers.
- Performed exhaustive exploratory data analysis (EDA) and data preparation on big datasets using Jupyter Notebooks, BigQuery, and Google's Dataprep.
Data Scientist and ML Engineer
Visibly Works LLC
- Designed and developed a model that suggests which ads to show on Amazon, and in which order, to maximize the conversion rate of specific products. It used traffic, conversion, geographic, and demographic data.
- Created a model that classifies eCommerce ad campaigns into classes according to their content, performance, keywords, and more. This helped standardize the ad campaigns from different advertisers and improve their performance according to their goals.
- Set up a pipeline that forecast intraday campaign expenditure to predict when a campaign would run out of budget, suggest a new budget, and estimate the potential losses in traffic and conversions.
- Developed an app that generates synthetic advertising data. This data could be shown to potential clients to showcase the product without exposing private data.
- Created a tool that periodically scanned all our clients' databases and flagged potentially wrong entries. This helped curate the databases and increase trust with our clients.
- Built a model that suggested which keywords to include in an ad campaign, based on the target product and past performance, and how much to bid on them to reach a specific goal.
- Created a web scraping tool to extract Amazon's product categories. It deals with nested links and keeps track of the links already visited. The output is saved into an Excel file.
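The category scraper described above, which follows nested links while tracking the ones already visited, can be sketched as a breadth-first crawl with a visited set. This is a minimal, hypothetical Python sketch using only the standard library, not the original tool: `crawl`, `LinkExtractor`, and the injected `fetch` callable are illustrative names, `fetch` stands in for whatever HTTP client the real scraper used, and restricting the crawl to the start URL's host is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl that follows nested links and never revisits a URL.

    `fetch` is a hypothetical callable returning a page's HTML, e.g. a thin
    wrapper around an HTTP client. Only links on the start URL's host are followed.
    """
    host = urlparse(start_url).netloc
    visited = set()
    order: List[str] = []
    queue = deque([start_url])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # the same link may be queued from several parent pages
        visited.add(url)
        order.append(url)
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates are recognized.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in visited:
                queue.append(absolute)
    return order
```

The returned list of URLs (or the rows parsed from each page) could then be written to Excel with, for example, pandas `DataFrame.to_excel`, which requires an engine such as openpyxl to be installed.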
Lead Data Scientist
Lurtis Rules
- Developed several pipelines for parsing, structuring, and analyzing commercial real estate data. Used the data and analysis to build machine learning-based prediction and forecasting tools to maximize investors' revenue.
- Created several machine learning models to help investors decide which real estate buildings to invest in according to demographic, geographic, and macroeconomic data.
- Used econometric analysis to give investors insights into upcoming macroeconomic trends.
- Worked in close contact with the client, product owner, and product manager to achieve the project goals and meet the client's needs.
- Used agile methodologies with Jira and performed continuous code maintenance with GitHub.
- Created a Python web scraper that extracts data from real estate property portals. It continuously extracts the most recent data, parses the property descriptions, and extracts the relevant information into tables.
Data Scientist and Team Leader
Banco Santander
- Developed Python and R packages end to end, from the initial idea through coding and testing to the final, independent Dockerized package.
- Created NLP tools that automatically process documents, classify each one into its most likely document type, and extract the relevant information to be stored in databases.
- Led and coordinated a small team of developers. Maintained close communication with other departments to ensure fast results and reported directly to upper management.
Data Scientist
Cambridge Cancer Research Institute
- Developed machine learning-based tools to extract, analyze, and classify papers from PubMed, the largest repository of biomedical literature.
- Created a deep learning NLP tool that learns patterns from authors' papers and their metadata. It can infer who wrote an article and distinguish between authors with the same name.
- Used these tools to extract insights into how authors from different fields, countries, and universities behave and relate to other authors and topics.
Data Scientist and Bioinformatics Developer
Parc de Recerca Biomèdica de Barcelona
- Researched alternative splicing with machine learning models and data science tools.
- Developed a deep learning tool that can predict which tissue a sample came from with 99% accuracy.
- Developed another deep learning tool that can predict the gene expression of specific tissues, their potential response to specific drugs, and whether or not they are in a healthy state.
Experience
ReorientExpress: Deep Learning Tool for Gene Expression Prediction
https://github.com/comprna/reorientexpress
This highlights one of the biggest advantages of deep learning: it can simulate complex systems without having to reduce the process to simple rules. Instead, it can learn complex interactions that other machine learning models cannot.
DeepOracle
https://github.com/angelrure/DeepOracle
Intraday Campaign Budget Predictor
These forecasts are sent to a web app where clients can see which campaigns are likely to run out of budget during the day and by how much.
They also get an estimate of the potentially missed traffic and conversion events, along with a suggested budget increase to avoid running out of budget. As a result, their campaigns always stay on budget.
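The run-out estimate can be illustrated with a deliberately simple, hypothetical model: fit a constant hourly spend rate to the cumulative spend observed so far with ordinary least squares, extrapolate it to the end of the day, and convert the projected overspend into missed clicks with an assumed clicks-per-dollar ratio. The function and field names are illustrative, and the actual predictor most likely used a richer time-series model that accounts for intraday seasonality.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BudgetForecast:
    run_out_hour: Optional[float]  # hour of day when spend hits the budget; None if it doesn't
    projected_spend: float         # spend projected at hour 24 with no budget cap
    suggested_budget: float        # budget that would cover the projected spend
    missed_clicks: float           # clicks lost between run-out and end of day


def forecast_budget(
    hours: Sequence[float],
    cumulative_spend: Sequence[float],
    daily_budget: float,
    clicks_per_dollar: float,  # hypothetical conversion ratio from spend to clicks
) -> BudgetForecast:
    """Linearly extrapolate cumulative intraday spend to the end of the day."""
    n = len(hours)
    mean_h = sum(hours) / n
    mean_s = sum(cumulative_spend) / n
    # Ordinary least squares fit: spend ~= intercept + rate * hour
    cov = sum((h - mean_h) * (s - mean_s) for h, s in zip(hours, cumulative_spend))
    var = sum((h - mean_h) ** 2 for h in hours)
    if var == 0:
        raise ValueError("need observations at two or more distinct hours")
    rate = cov / var
    intercept = mean_s - rate * mean_h
    projected = intercept + rate * 24
    if rate <= 0 or projected <= daily_budget:
        # Spend is flat or stays under the cap: no run-out expected today.
        return BudgetForecast(None, projected, daily_budget, 0.0)
    run_out = (daily_budget - intercept) / rate
    shortfall = projected - daily_budget
    return BudgetForecast(run_out, projected, projected, shortfall * clicks_per_dollar)


f = forecast_budget([1, 2, 3, 4], [10, 20, 30, 40], daily_budget=200, clicks_per_dollar=0.5)
print(f.run_out_hour, f.suggested_budget, f.missed_clicks)  # 20.0 240.0 20.0
```

Here a campaign spending 10 per hour against a budget of 200 is projected to run out at hour 20, and raising the budget to the projected 240 would recover the estimated missed clicks.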
Augmented Introspection: Emel
https://store.steampowered.com/app/2189350/Augmented_Introspection_Emel/?curator_clanid=4777282&utm_source=SteamDB
In this conversation-based video game, the user communicates with an AI assistant through text input and can take quizzes, play psychological test games, and more. It uses Google Cloud services such as:
• Storage: To store user behavior data and gameplay event data.
• Functions: To enable communication between GCP and the video game, using several endpoints for specific tasks.
• A text-to-speech API: Combined with Functions, it allows the AI assistant to speak.
The game explores topics such as transhumanism, hedonism, and individualism.
ETL Orchestration using AWS
Then the data was parsed, processed, cleaned, and uploaded to AWS's Redshift. The data was also homogenized so the different data sources could be queried together. The pipeline was scheduled to run automatically at midnight every day. The process was fully logged, and the pipeline was fully developed in just four days.
Finally, the data was connected to an external dashboarding solution (Metabase), where it could be visualized in real time.
Skills
Languages
Python, Regex, SQL, Python 3, R, GML
Libraries/APIs
Pandas, Scikit-learn, Keras, TensorFlow, Matplotlib, NumPy, REST APIs, Beautiful Soup, PySpark
Tools
Jupyter, Git, Bitbucket, GitHub, Biopython, Amazon Athena, BigQuery, Amazon CloudWatch
Paradigms
Data Science, RESTful Development, ETL, Business Intelligence (BI), Software Testing
Platforms
Jupyter Notebook, Visual Studio Code (VS Code), AWS Lambda, Docker, Amazon Web Services (AWS), Google Cloud Platform (GCP), Steam, Databricks
Other
Machine Learning, Data Analytics, Predictive Analytics, Supervised Learning, Data Analysis, Time Series Analysis, Algorithms, Mathematics, Statistics, Computer Science, Visualization, Forecasting, OCR, Text Classification, Data Mining, Deep Neural Networks, Deep Learning, Neural Networks, Artificial Intelligence (AI), APIs, Unsupervised Learning, Data Modeling, Web Scraping, Time Series, Natural Language Processing (NLP), Commercial Real Estate, Biotech, Next-generation Sequencing, Biomedical Skills, Monte Carlo Simulations, Reinforcement Learning, eCommerce, Macroeconomic Forecasting, Econometrics, Psychology, Philosophy, Cloud Storage, Google Cloud Functions, Text to Speech (TTS), Google BigQuery, Vertex, Metabase, HubSpot, Web Crawlers
Storage
PostgreSQL, MySQL, Elasticsearch, Google Cloud, MongoDB, Google Cloud Storage, Redshift
Industry Expertise
Bioinformatics
Education
Master's Degree in Data Science
Valencia International University - Valencia, Spain
Master's Degree in Bioinformatics
Pompeu Fabra University - Barcelona, Spain
Bachelor's Degree in Biotechnology
Lleida University - Lleida, Spain
Certifications
Machine Learning Nanodegree
Udacity