Juan Luis Ruiz - Tagle, Developer in Barcelona, Spain

Verified Expert in Engineering

Data Scientist and Developer

Location
Barcelona, Spain
Toptal Member Since
September 29, 2022

Juan Luis is a data scientist with expertise in spatial analytics and optimization. He has a background in computer science and four years of professional experience working in spatial data science, finance, and advertising technology. He combines his deep knowledge of machine learning with software engineering best practices to build robust and reliable ML solutions. Juan Luis has strong analytical skills and addresses problems from a business perspective, prioritizing the client's needs.

Portfolio

Pelikan Mobility SAS
Python, Vehicle Routing, Geographic Information Systems, Algorithms...
IESE Business School
University Teaching
CARTO
Pandas, NumPy, GIS, GeoPandas, Optimization, PySAL, SQL, Docker, Spark...

Availability

Part-time

Preferred Environment

MacOS, Google Cloud, BigQuery, Git, Slack, Python

The most amazing...

...system I've developed is a set of spatial ML algorithms in SQL, which run at scale on cloud data warehouses like Google BigQuery.

Work Experience

Senior Data Scientist

2023 - 2023
Pelikan Mobility SAS
  • Developed VRP algorithms to accommodate electric vehicles, incorporating the constraints of charging time and limited range.
  • Conducted comprehensive research to identify alternative solutions for VRP solvers and successfully implemented them into the project, resulting in significant cost reductions.
  • Contributed to open-source projects related to vehicle routing.
Technologies: Python, Vehicle Routing, Geographic Information Systems, Algorithms, Artificial Intelligence (AI)

Data Analytics Lead Instructor

2022 - 2022
IESE Business School
  • Taught a two-week intensive course on Python and Data Analytics to 60+ MiM students at IESE Business School.
  • Managed a classroom with mixed Python levels, making sure inexperienced students gained a solid understanding of the fundamentals while providing the more advanced students with extra material.
  • Evaluated the students on the effort they made to get the most out of the course, regardless of their initial Python skills.
  • Coordinated with two teaching assistants who helped me with the classes and with another lead instructor who taught a second classroom.
Technologies: University Teaching

Data Scientist

2020 - 2022
CARTO
  • Implemented spatial statistics and ML algorithms in SQL to run them at scale on cloud data warehouses.
  • Developed spatial models for estimating accumulated litter in cities at a granular level.
  • Built optimization solutions for vehicle routing and territory management, connected to Google BigQuery as remote functions.
  • Designed spatial indexes for clients, which combined target demographics, POI presence density, and mobility data.
  • Identified trends in retail hotspot areas during the pandemic by applying time series analysis to human mobility data (origin-destination matrices) and POI data.
  • Created ETL processes with Apache Airflow to recurrently ingest spatial data from several data sources into CARTO's platform.
Technologies: Pandas, NumPy, GIS, GeoPandas, Optimization, PySAL, SQL, Docker, Spark, Apache Airflow, Databricks, TensorFlow, Spatial Reasoning, Data Science, Data Analytics, REST APIs, Big Data, JavaScript, Data Visualization, Apache Spark, Predictive Analytics, Data Analysis, Analytics, eCommerce, Marketplaces, Data Management, Data Governance, Azure, Keras, Scikit-learn, Databases, Data Modeling, Database Administration (DBA), Decision Trees, Snowflake, Regression, Data Scientist, Recommendation Systems, Data Engineering, Google Cloud, BigQuery, Git, ETL Development, Vehicle Routing, Geographic Information Systems, Artificial Intelligence (AI), Amazon Web Services (AWS), Image Processing

Data Scientist

2019 - 2020
ETS Asset Management Factory
  • Applied state-of-the-art techniques to make more accurate predictions of financial markets' behavior, contributing to the financial advisory firm's primary purpose of making stock market investment recommendations driven by data science.
  • Developed a RESTful API that serves synthetic stock series created by generative adversarial networks on demand.
  • Put into production a novel deep learning portfolio investment strategy and deployed it to internal servers to automate portfolio recommendations.
Technologies: Time Series Analysis, Generative Adversarial Networks (GANs), APIs, Flask, Jenkins, Data Analytics, REST APIs, Data Analysis, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), GPT, Analytics, Keras, Scikit-learn, Finance, Data Modeling, Decision Trees, Regression, Data Scientist, Google Cloud, Git, Artificial Intelligence (AI), Amazon Web Services (AWS)

Data Analyst

2019 - 2019
Seedtag
  • Developed a funnel analysis for the company's video advertising campaigns, which gave insight into how the business was progressing.
  • Built ETL processes that aggregated data periodically from ads stored in a MongoDB database and displayed the current state of the ad flow in a dashboard.
  • Assisted the CEO in preparing the company's next funding round by analyzing revenue and client loyalty.
Technologies: MongoDB, SQL, Python, Google Sheets, Data Analytics, Data Analysis, Analytics, eCommerce, Business Analysis

Local MX Refinement | ML Tool for Out-of-Home Advertising Campaign Optimization

https://carto.com/blog/carto-havas-media-big-data-ai-world-madrid/
While working at CARTO, I took full ownership of the ML models and optimization algorithms of Local MX, a tool built for the Havas Media Group that predicts coverage and impacts at a very granular level and computes a selection of billboards that maximizes those metrics.

The client wanted to measure the impressions (number of visits) and coverage (number of distinct visitors) that each of their billboards in Spain received weekly. They also wanted this information segmented by different categorical variables: type of day, hourly range, age, gender, and income level. For this, our models were trained on data from several sources (telco, SDK data, sociodemographic, POI, etc.). Then an optimization algorithm ranked the billboards best suited to the target campaign.
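The difference between the two metrics, and the idea of selecting billboards that maximize them, can be illustrated with a minimal sketch. It is not the Local MX implementation: the visit log, the identifiers, and the greedy selection rule are all assumptions made for illustration, and the real tool worked from model predictions rather than raw visit records.

```python
from collections import defaultdict

# Hypothetical visit log: (billboard_id, visitor_id, iso_week).
VISITS = [
    ("bb_1", "u1", "2022-W01"),
    ("bb_1", "u1", "2022-W01"),
    ("bb_1", "u2", "2022-W01"),
    ("bb_2", "u2", "2022-W01"),
    ("bb_2", "u3", "2022-W01"),
    ("bb_3", "u3", "2022-W01"),
]


def weekly_metrics(visits):
    """Count visits (impressions) and distinct visitors (coverage) per billboard and week."""
    impressions = defaultdict(int)
    visitors = defaultdict(set)
    for billboard, visitor, week in visits:
        impressions[(billboard, week)] += 1
        visitors[(billboard, week)].add(visitor)
    return {
        key: {"impressions": impressions[key], "coverage": len(visitors[key])}
        for key in impressions
    }


def greedy_selection(visitors_by_billboard, k):
    """Pick up to k billboards, each time adding the one that reaches the most new visitors.

    Greedy maximum coverage is an assumed stand-in for the optimization step.
    """
    chosen, covered = [], set()
    remaining = dict(visitors_by_billboard)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda b: len(remaining[b] - covered))
        if not remaining[best] - covered:
            break  # no remaining billboard adds a new visitor
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

In this toy data, bb_1 has three impressions but a coverage of two, because visitor u1 passed it twice in the same week.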

I got involved in this project at a calibration stage, in which I:

• Tweaked the ML models and algorithms to align with client expectations
• Automated background processes for telco data ingestion, automatic enablement of new billboards in the tool, etc.
• Extended the usage of the tool to the Canary Islands by computing SDK routes in this region with OSRM
• Handled the communication with the client for all technical matters

Sales KPI Calculation Automation for an International Beverage Company

During my time at CARTO, I had the privilege to work with one of the leading beverage companies in the world. They had a vast amount of sales data to analyze in order to calculate various KPIs related to their fleet, stock, clients, and other business areas. At the time, processing this data and performing the calculations were done purely in Excel, which was time-consuming and prone to errors.

My team and I launched a Spark cluster in Databricks to automate the KPI calculations. This allowed us to leverage distributed computing and process the large volume of data the client was working with. I worked closely with their team to understand their specific requirements. Then I implemented the Spark-based solution that automated the calculations, eliminating the need for manual intervention and saving many work hours.

TweetWars

http://tweetwars.wtf
TweetWars is a web app that enables users to compare two Twitter accounts by analyzing their latest 200 tweets and displaying insightful results in an interactive dashboard.

The tweets of both accounts are analyzed using NLP techniques, including sentiment and emotion prediction, topic modeling, and tweeting behavior statistics. These results are presented in a dashboard and sent to the paying user.

Despite its complexity, the system is fully autonomous and requires minimal maintenance on my part. It is comprised of multiple seamlessly integrated microservices which take care of payment processing, tweet fetching, sentiment inference, dashboard generation, email communication, and other tasks.

Black Friday Analysis

https://www.safegraph.com/blog/2021-black-friday
I did a thorough spatial analysis of the effects of the pandemic on retail stores during Black Friday in four different cities across the US using SafeGraph's human mobility data. I compared data for 2019, 2020, and 2021 to obtain insights into the evolution of footfall traffic and presented my results in a webinar and an article published on SafeGraph's blog.

Spatial Data Science Conference 2022

https://www.youtube.com/watch?v=6kNqsQY_e90
I participated as a speaker in the Spatial Data Science Conference held in May 2022 at the Royal Geographical Society in London. This conference is among the most renowned events for geographic information systems and spatial data.

I presented the CARTO Analytics Toolbox, an SQL library for spatial analysis and modeling on cloud data warehouses.

Scraper App for Official State Documents in PDF

A script that scrapes the BOE (the official gazette published daily by the Spanish government) in PDF format. It extracts relevant information about newly registered brands, including the registrant's name, telephone number, company website, etc. It also generates an Excel file containing all the scraped data in a structured way.

Personal Blog

http://juanluis.me
I write about data science, geographic information systems, and math. Some articles have also been published on CARTO's blog as well as in Towards Data Science and Cantor's Paradise, two well-known publications on the Medium platform.

Some examples:
• Generating fake data with pandas, very quickly
https://towardsdatascience.com/generating-fake-data-with-pandas-very-quickly-b99467d4c618
• What to expect when throwing dice and adding them up
https://www.cantorsparadise.com/what-to-expect-when-throwing-dice-and-adding-them-up-5231f3831d7
• Scraping Google Search (without getting caught)
https://juanluisrto.medium.com/scraping-google-search-without-getting-caught-e43bb91b363e
• Can neural networks predict the stock market just by reading newspapers?
https://quantdare.com/can-neural-networks-predict-the-stock-market-just-by-reading-newspapers/

Scraping Orchestra

https://github.com/juanluisrto/Scraping-orchestra
I created a scraping master-slave system based on Google App Engine. The main problem with scraping is that sites can block your IP if they detect suspicious behavior. As a solution, a local process orchestrates a scraper deployed in Google App Engine. The main idea is to start scraping and redeploy the scraper to get a new IP whenever the current IP gets blocked.
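The control loop behind this idea can be sketched as follows. This is a simplified illustration, not the code in the repository: the `Blocked` exception, the `FakeRemoteScraper` class, and the `max_redeploys` limit are hypothetical names, and the fake scraper simulates both the remote worker and the redeployment so the example runs locally.

```python
class Blocked(Exception):
    """Raised when the target site has blocked the scraper's current IP."""


class FakeRemoteScraper:
    """Stand-in for the deployed scraper: gets blocked after a fixed number of requests."""

    def __init__(self, requests_before_block=2):
        self.limit = requests_before_block
        self.served = 0

    def scrape(self, url):
        if self.served >= self.limit:
            raise Blocked(url)
        self.served += 1
        return f"content of {url}"

    def redeploy(self):
        # In the real system this step would redeploy the scraper to obtain a new IP.
        self.served = 0


def orchestrate(urls, scraper, max_redeploys=5):
    """Scrape every URL, redeploying the scraper whenever it reports a block."""
    results, redeploys = {}, 0
    pending = list(urls)
    while pending:
        url = pending[0]
        try:
            results[url] = scraper.scrape(url)
            pending.pop(0)  # only drop the URL once it succeeded
        except Blocked:
            if redeploys >= max_redeploys:
                raise
            scraper.redeploy()
            redeploys += 1
    return results, redeploys
```

A blocked URL stays at the head of the queue, so it is retried after the redeployment instead of being lost.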

Svenska Scraper

https://github.com/juanluisrto/SvenskaScraper
I built a web scraper during college to help me study the Swedish language. Given a list of words in Spanish, it gathers their translations and example sentences. The software also generates exercises to practice.

LLM Chatbots

I have experience creating LLM-based chatbots for different use cases. Some examples follow:

1. YouTuber Chatbot
I fine-tuned LLMs to mimic the style of a YouTuber (MKBHD).
For this, I scraped the transcripts of his YouTube videos and generated a dataset of conversations based on these transcripts.
I tried two approaches:
• Fine-tuning Llama 2 7B as a quantized model and deploying it as an inference endpoint in Hugging Face
• Fine-tuning GPT-3.5 from OpenAI via their API
I also gave the chatbot access to a vector database with info about the YouTuber's videos in order to make recommendations.

2. Doctor Chatbot
For a client, I developed a chatbot that would interact with a database of doctors in the US and answer questions about it.
It would transform the questions asked by the user into an SQL query and then return the results of the executed query to the user.

3. Firecitadel Bot
A Twitter bot that imagines stories and generates images using open-source models. (https://x.com/firecitadel)
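The question-to-SQL flow described for the Doctor Chatbot can be sketched as below. Everything here is an assumption for illustration: the `doctors` schema, the sample rows, and the `question_to_sql` function, which returns a canned query where the real system would call an LLM. The read-only guard is also an addition of this sketch, not something the project description mentions.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical doctors table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE doctors (name TEXT, specialty TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO doctors VALUES (?, ?, ?)",
        [
            ("Dr. A", "Cardiology", "TX"),
            ("Dr. B", "Cardiology", "TX"),
            ("Dr. C", "Dermatology", "CA"),
        ],
    )
    return conn


def question_to_sql(question):
    """Placeholder for the LLM call that translates a question into SQL."""
    canned = {
        "How many cardiologists are in Texas?": (
            "SELECT COUNT(*) FROM doctors "
            "WHERE specialty = 'Cardiology' AND state = 'TX'"
        )
    }
    return canned[question]


def is_read_only(sql):
    """Accept a single SELECT statement only (assumed safety check)."""
    body = sql.strip().rstrip(";")
    return body.upper().startswith("SELECT") and ";" not in body


def answer(conn, question):
    """Translate the question, validate the query, run it, and return the rows."""
    sql = question_to_sql(question)
    if not is_read_only(sql):
        raise ValueError("refusing to run a statement that is not a single SELECT")
    return conn.execute(sql).fetchall()
```

Calling `answer(build_demo_db(), "How many cardiologists are in Texas?")` runs the canned query against the demo table and returns the matching count as a list of rows.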

Education

2020 - 2020

Master's Degree in Data Science

Universidad Politécnica de Madrid - Madrid, Spain

2015 - 2018

Bachelor's Degree in Computer Science

KTH Royal Institute of Technology - Stockholm, Sweden

Libraries/APIs

Pandas, Scikit-learn, NumPy, REST APIs, Keras, TensorFlow, Stripe, PyTorch

Tools

BigQuery, GIS, Git, Apache Airflow, Google Sheets, Slack, Jenkins, Celery, Google Analytics

Languages

Python, SQL, R, JavaScript, Snowflake

Paradigms

Data Science, Agile Software Development

Storage

Google Cloud, Databases, MongoDB, Google Cloud Storage, Database Administration (DBA)

Frameworks

Spark, Flask, Apache Spark, Bootstrap

Platforms

Docker, Databricks, Google App Engine, Amazon Web Services (AWS), Azure

Other

Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Deep Learning, Spatial Analysis, Data Analytics, Big Data, Data Visualization, Data Analysis, Analytics, Data Management, Data Modeling, Data Scientist, Data Engineering, Geographic Information Systems, Optimization, PySAL, APIs, Predictive Analytics, ETL Development, Business Analysis, API Integration, Spatial Reasoning, GPT, Generative Pre-trained Transformers (GPT), Data Governance, OpenAI GPT-3 API, Finance, Decision Trees, Regression, Recommendation Systems, Vehicle Routing, Machine Learning Operations (MLOps), Image Processing, Image Generation, Text to Image, Labeling, OpenAI API, OpenAI GPT-4 API, Chatbots, Algorithms, Data Structures, Time Series, Computer Vision, GeoPandas, Time Series Analysis, Generative Adversarial Networks (GANs), Presentations, Communication, Web Scraping, Technical Writing, Excel 365, Scraping, OCR, eCommerce, Marketplaces, University Teaching, OpenAI, Business to Business (B2B), Business to Consumer (B2C), Cloud Tasks, BERT, Sentiment Analysis, Google Cloud Functions, Azure Databricks, Large Language Models (LLMs), LangChain
