Juan Luis Ruiz - Tagle
Verified Expert in Engineering
Data Scientist and Developer
Juan Luis is a data scientist with expertise in spatial analytics and optimization. He has a background in computer science and four years of professional experience working in spatial data science, finance, and advertising technology. He combines his deep knowledge of machine learning with software engineering best practices to build robust and reliable ML solutions. Juan Luis has strong analytical skills and addresses problems from a business perspective, prioritizing the client's needs.
macOS, Google Cloud, BigQuery, Git, Slack, Python
The most amazing...
...system I've developed is a set of spatial ML algorithms in SQL, which run at scale on cloud data warehouses like Google BigQuery.
Senior Data Scientist
Pelikan Mobility SAS
- Developed vehicle routing problem (VRP) algorithms for electric vehicles, incorporating charging-time and limited-range constraints.
- Researched alternative VRP solvers and integrated them into the project, resulting in significant cost reductions.
- Contributed to open-source VRP projects.
Data Analytics Lead Instructor
IESE Business School
- Taught a two-week intensive course on Python and Data Analytics to 60+ MiM students at IESE Business School.
- Accommodated students' varying Python levels, making sure beginners had a solid understanding of the fundamentals while providing the more advanced students with extra material.
- Evaluated the students based on the effort they made to get the most out of the course, regardless of their initial Python skills.
- Coordinated with two teaching assistants who helped me with the classes and with a second lead instructor who taught a parallel class.
- Implemented spatial statistics and ML algorithms in SQL to run them at scale on cloud data warehouses.
- Developed spatial models for estimating accumulated litter in cities at a granular level.
- Built optimization solutions for vehicle routing and territory management, connected to Google BigQuery as remote functions.
- Designed spatial indexes for clients, which combined target demographics, POI presence density, and mobility data.
- Identified trends in retail hotspot areas during the pandemic by applying time series analysis to human mobility data (origin-destination matrices) and POI data.
- Created ETL processes with Apache Airflow to recurrently ingest spatial data from several data sources into CARTO's platform.
ETS Asset Management Factory
- Applied state-of-the-art techniques to make more accurate predictions of financial markets' behavior, contributing to the financial advisory firm's primary purpose of making stock market investment recommendations driven by data science.
- Developed a RESTful API that serves synthetic stock series created by generative adversarial networks on demand.
- Put into production a novel deep learning portfolio investment strategy and deployed it to internal servers to automate portfolio recommendations.
- Developed a funnel for the company's video advertising campaigns, which provided insight into how the business was progressing.
- Built ETL processes that aggregated data periodically from ads stored in a MongoDB database and displayed the current state of the ad flow in a dashboard.
- Assisted the CEO in preparing the company's next funding round by analyzing revenue and client loyalty.
Local MX Refinement | ML Tool for Out-of-Home Advertising Campaign Optimization
https://carto.com/blog/carto-havas-media-big-data-ai-world-madrid/
The client wanted to measure the impressions (number of visits) and coverage (number of distinct visitors) that each of their billboards in Spain received weekly. They also wanted this information segmented by several categorical variables: type of day, hourly range, age, gender, and income level. For this, our models were trained on data from several sources (telco, SDK data, sociodemographic, POI, etc.). An optimization algorithm then ranked the billboards by how well they fit the target campaign.
I got involved in this project at a calibration stage, in which I:
• Tweaked the ML models and algorithms to align with client expectations
• Automated background processes for telco data ingestion, automatic enablement of new billboards in the tool, etc.
• Extended the tool's coverage to the Canary Islands by computing SDK routes in this region with OSRM
• Handled the communication with the client for all technical matters
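To make the two metrics above concrete, here is a minimal Python sketch on made-up data. The visit log, the identifiers, and the `weekly_metrics` function are hypothetical and purely illustrative; the actual tool relied on models trained on several data sources rather than a direct count like this one.

```python
from collections import defaultdict

# Hypothetical weekly visit log: one (billboard_id, visitor_id) pair per visit.
visits = [
    ("bb_1", "u1"), ("bb_1", "u1"), ("bb_1", "u2"),
    ("bb_2", "u3"),
]

def weekly_metrics(visits):
    """Return {billboard_id: (impressions, coverage)}.

    impressions = total number of visits
    coverage    = number of distinct visitors
    """
    counts = defaultdict(int)
    visitors = defaultdict(set)
    for billboard_id, visitor_id in visits:
        counts[billboard_id] += 1
        visitors[billboard_id].add(visitor_id)
    return {b: (counts[b], len(visitors[b])) for b in counts}

metrics = weekly_metrics(visits)
for billboard_id, (impressions, coverage) in sorted(metrics.items()):
    print(billboard_id, impressions, coverage)
```

A repeat visitor raises impressions but not coverage, which is why the two figures are reported separately.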
Sales KPI Calculation Automation for an International Beverage Company
Together with my team, I launched a Spark cluster in Databricks to automate the KPI calculations. This allowed us to leverage distributed computing and easily process the massive amounts of data the client was working with. I worked closely with their team to understand their specific requirements. Then I implemented the Spark-based solution that automated the calculations, eliminating the need for manual intervention and saving many work hours.
The tweets of both accounts are analyzed using NLP techniques, including sentiment and emotion prediction, topic modeling, and tweeting behavior statistics. These results are presented in a dashboard and sent to the paying user.
Despite its complexity, the system is fully autonomous and requires minimal maintenance on my part. It comprises multiple seamlessly integrated microservices that take care of payment processing, tweet fetching, sentiment inference, dashboard generation, email communication, and other tasks.
Black Friday Analysis
https://www.safegraph.com/blog/2021-black-friday
Spatial Data Science Conference 2022
https://www.youtube.com/watch?v=6kNqsQY_e90
I presented the CARTO Analytics Toolbox, an SQL library for spatial analysis and modeling in cloud data warehouses.
Scraper App for Official State Documents in PDF
• Generating fake data with pandas, very quickly
• What to expect when throwing dice and adding them up
• Scraping Google Search (without getting caught)
• Can neural networks predict the stock market just by reading
A couple of examples follow:
1. YouTuber Chatbot
I fine-tuned LLMs to mimic the style of a YouTuber (MKBHD).
For this, I scraped the transcripts of his YouTube videos and generated a dataset of conversations based on these transcripts.
I tried two approaches:
• Fine-tuning Llama 2 7B as a quantized model and deploying it as an inference endpoint in Hugging Face
• Fine-tuning GPT-3.5 from OpenAI via their API
I also gave the chatbot access to a vector database with info about the YouTuber's videos in order to make recommendations.
2. Doctor Chatbot
For a client, I developed a chatbot that would interact with a database of doctors in the US and answer questions about it.
It would transform the questions asked by the user into an SQL query and then return the results of the executed query to the user.
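The question-to-SQL flow described above can be sketched roughly as follows. This is a minimal illustration, not the client implementation: the `doctors` table, its columns, and the sample rows are invented, and `question_to_sql` is a hypothetical stand-in for the language-model call, replaced here by a simple keyword rule so the example runs on its own.

```python
import sqlite3

def question_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM step that writes the SQL query.

    A real system would prompt a language model with the table schema
    and the user's question; a keyword rule is used here instead.
    """
    if "cardiolog" in question.lower():
        return ("SELECT name FROM doctors "
                "WHERE specialty = 'Cardiology' ORDER BY name")
    return "SELECT name FROM doctors ORDER BY name"

def answer(conn: sqlite3.Connection, question: str) -> list:
    """Translate the question to SQL, execute it, and return the names."""
    sql = question_to_sql(question)
    rows = conn.execute(sql).fetchall()
    return [row[0] for row in rows]

# Invented sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doctors (name TEXT, specialty TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO doctors VALUES (?, ?, ?)",
    [
        ("Dr. Adams", "Cardiology", "NY"),
        ("Dr. Baker", "Dermatology", "CA"),
        ("Dr. Chen", "Cardiology", "TX"),
    ],
)

result = answer(conn, "Which doctors are cardiologists?")
print(result)
```

In a setup like this, generated SQL would typically run against a read-only connection, since the query text comes from a model rather than from trusted code.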
3. Firecitadel Bot
A Twitter bot that imagines stories and generates images using open-source models. (https://x.com/firecitadel)
Pandas, Scikit-learn, NumPy, REST APIs, Keras, TensorFlow, Stripe, PyTorch
BigQuery, GIS, Git, Apache Airflow, Google Sheets, Slack, Jenkins, Celery, Google Analytics
Data Science, Agile Software Development
Google Cloud, Databases, MongoDB, Google Cloud Storage, Database Administration (DBA)
Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Deep Learning, Spatial Analysis, Data Analytics, Big Data, Data Visualization, Data Analysis, Analytics, Data Management, Data Modeling, Data Scientist, Data Engineering, Geographic Information Systems, Optimization, PySAL, APIs, Predictive Analytics, ETL Development, Business Analysis, API Integration, Spatial Reasoning, GPT, Generative Pre-trained Transformers (GPT), Data Governance, OpenAI GPT-3 API, Finance, Decision Trees, Regression, Recommendation Systems, Vehicle Routing, Machine Learning Operations (MLOps), Image Processing, Image Generation, Text to Image, Labeling, OpenAI API, OpenAI GPT-4 API, Chatbots, Algorithms, Data Structures, Time Series, Computer Vision, GeoPandas, Time Series Analysis, Generative Adversarial Networks (GANs), Presentations, Communication, Web Scraping, Technical Writing, Excel 365, Scraping, OCR, eCommerce, Marketplaces, University Teaching, OpenAI, Business to Business (B2B), Business to Consumer (B2C), Cloud Tasks, BERT, Sentiment Analysis, Google Cloud Functions, Azure Databricks, Large Language Models (LLMs), LangChain
Docker, Databricks, Google App Engine, Amazon Web Services (AWS), Azure
Spark, Flask, Apache Spark, Bootstrap
Master's Degree in Data Science
Universidad Politécnica de Madrid - Madrid, Spain
Bachelor's Degree in Computer Science
KTH Royal Institute of Technology - Stockholm, Sweden