Juan Luis Ruiz - Tagle
Verified Expert in Engineering
Data Scientist and Developer
Barcelona, Spain
Toptal member since September 29, 2022
Juan Luis is a data scientist with expertise in spatial analytics and optimization. He has a background in computer science and four years of professional experience working in spatial data science, finance, and advertising technology. He combines his deep knowledge of machine learning with software engineering best practices to build robust and reliable ML solutions. Juan Luis has strong analytical skills and addresses problems from a business perspective, prioritizing the client's needs.
Preferred Environment
MacOS, Google Cloud, BigQuery, Git, Slack, Python
The most amazing...
...system I've developed is a set of spatial ML algorithms in SQL, which run at scale on cloud data warehouses like Google BigQuery.
Work Experience
LLM Engineer
Gartner - Engineering
- Worked within the AskGartner team to develop and deploy a chatbot using retrieval-augmented generation (RAG) pipelines, which efficiently answers user queries by retrieving relevant data from Gartner reports.
- Implemented and deployed an internal code observability tool, available across the organization, that improved monitoring and debugging and led to better code quality.
- Automated evaluation testing for large language models (LLMs) by deploying an AWS Batch pipeline.
Senior Data Scientist
Pelikan Mobility SAS
- Developed vehicle routing problem (VRP) algorithms to accommodate electric vehicles, incorporating the constraints of charging time and limited range.
- Researched alternative VRP solvers and integrated them into the project, resulting in significant cost reductions.
- Contributed to open-source projects related to the VRP.
Data Analytics Lead Instructor
IESE Business School
- Taught a two-week intensive course on Python and Data Analytics to 60+ MiM students at IESE Business School.
- Accommodated students with different levels of Python experience, making sure beginners had a solid understanding of the fundamentals while providing the more advanced students with extra material.
- Evaluated the students based on the effort they made to get the most out of the course, regardless of their initial Python skills.
- Coordinated with two teaching assistants who helped me with the classes and with another lead instructor who taught a second classroom.
Data Scientist
CARTO
- Implemented spatial statistics and ML algorithms in SQL to run them at scale on cloud data warehouses.
- Developed spatial models for estimating accumulated litter in cities at a granular level.
- Built optimization solutions for vehicle routing and territory management, connected to Google BigQuery as remote functions.
- Designed spatial indexes for clients, which combined target demographics, POI presence density, and mobility data.
- Identified trends in hotspot areas for retail during the pandemic using human mobility data (origin-destination matrices), POI data, and time series analysis.
- Created ETL processes with Apache Airflow to recurrently ingest spatial data from several data sources into CARTO's platform.
Data Scientist
ETS Asset Management Factory
- Applied state-of-the-art techniques to make more accurate predictions of financial markets' behavior, contributing to the financial advisory firm's primary purpose of making stock market investment recommendations driven by data science.
- Developed a RESTful API that serves synthetic stock series created by generative adversarial networks on demand.
- Put into production a novel deep learning portfolio investment strategy and deployed it to internal servers to automate portfolio recommendations.
Data Analyst
Seedtag
- Developed a funnel for the company's video advertising campaigns, which provided insight into how the business was progressing.
- Built ETL processes that aggregated data periodically from ads stored in a MongoDB database and displayed the current state of the ad flow in a dashboard.
- Assisted the CEO in preparing the company's next funding round by analyzing revenue and client loyalty.
Experience
Local MX Refinement | ML Tool for out of Home Advertising Campaign Optimization
https://carto.com/blog/carto-havas-media-big-data-ai-world-madrid/
The client's interest was to measure the impressions (number of visits) and coverage (number of distinct visitors) each of their billboards in Spain received weekly. They also wanted this information segmented by different categorical variables: type of day, hourly range, age, gender, and income level. For this, our models were trained on data from several sources (telco, SDK data, sociodemographic, POI, etc.). Then an optimization algorithm ordered the billboards best adapted to the target campaign.
I got involved in this project at a calibration stage, in which I:
• Tweaked the ML models and algorithms to align with client expectations
• Automated background processes for telco data ingestion, automatic enablement of new billboards in the tool, etc.
• Extended the usage of the tool to the Canary Islands by computing SDK routes in this region with OSRM
• Handled the communication with the client for all technical matters
Sales KPI Calculation Automation for an International Beverage Company
Together with my team, we launched a Spark cluster in Databricks to automate the KPI calculations. This allowed us to leverage the power of distributed computing and easily process the massive amounts of data the client was working with. I worked closely with their team to understand their specific requirements. Then I implemented the Spark-based solution that automated the calculations, eliminating the need for manual intervention and saving countless work hours.
TweetWars
http://tweetwars.wtf
The tweets of both accounts are analyzed using NLP techniques, including sentiment and emotion prediction, topic modeling, and tweeting behavior statistics. These results are presented in a dashboard and sent to the paying user.
Despite its complexity, the system is fully autonomous and requires minimal maintenance on my part. It consists of multiple integrated microservices that handle payment processing, tweet fetching, sentiment inference, dashboard generation, email communication, and other tasks.
Black Friday Analysis
https://www.safegraph.com/blog/2021-black-friday
Spatial Data Science Conference 2022
https://www.youtube.com/watch?v=6kNqsQY_e90
I presented the CARTO Analytics Toolbox, an SQL library for spatial analysis and modeling on cloud data warehouses.
Scraper App for Official State Documents in PDF
Personal Blog
http://juanluis.me
Some examples:
• Generating fake data with pandas, very quickly
https://towardsdatascience.com/generating-fake-data-with-pandas-very-quickly-b99467d4c618
• What to expect when throwing dice and adding them up
https://www.cantorsparadise.com/what-to-expect-when-throwing-dice-and-adding-them-up-5231f3831d7
• Scraping Google Search (without getting caught)
https://juanluisrto.medium.com/scraping-google-search-without-getting-caught-e43bb91b363e
• Can neural networks predict the stock market just by reading newspapers?
https://quantdare.com/can-neural-networks-predict-the-stock-market-just-by-reading-newspapers/
Scraping Orchestra
https://github.com/juanluisrto/Scraping-orchestra
Svenska Scraper
https://github.com/juanluisrto/SvenskaScraper
LLM Chatbots
Some examples follow:
1. YouTuber Chatbot
I fine-tuned LLMs to mimic the style of a YouTuber (MKBHD).
For this, I scraped the transcripts of his YouTube videos and generated a dataset of conversations based on these transcripts.
I tried two approaches:
• Fine-tuning Llama 2 7B as a quantized model and deploying it as an inference endpoint in Hugging Face
• Fine-tuning GPT-3.5 from OpenAI via their API
I also gave the chatbot access to a vector database with info about the YouTuber's videos in order to make recommendations.
2. Doctor Chatbot
For a client, I developed a chatbot that would interact with a database of doctors in the US and answer questions about it.
It would transform the questions asked by the user into an SQL query and then return the results of the executed query to the user.
3. Firecitadel Bot
A Twitter bot that imagines stories and generates images using open-source models. (https://x.com/firecitadel)
Education
Master's Degree in Data Science
Universidad Politécnica de Madrid - Madrid, Spain
Bachelor's Degree in Computer Science
KTH Royal Institute of Technology - Stockholm, Sweden
Skills
Libraries/APIs
Pandas, Scikit-learn, NumPy, REST APIs, Keras, OpenAI API, TensorFlow, Stripe, PyTorch
Tools
BigQuery, GIS, Git, Apache Airflow, Google Sheets, Slack, Jenkins, Celery, Google Analytics, Solr
Languages
Python, SQL, R, JavaScript, Snowflake
Storage
Google Cloud, Databases, MongoDB, Google Cloud Storage, Database Administration (DBA)
Platforms
Docker, Databricks, Google App Engine, Amazon Web Services (AWS), Azure
Frameworks
Spark, Flask, Apache Spark, Bootstrap
Paradigms
Agile Software Development
Other
Artificial Intelligence (AI), Data Science, Natural Language Processing (NLP), Machine Learning, Deep Learning, Spatial Analysis, Data Analytics, Big Data, Data Visualization, Data Analysis, Analytics, Data Management, Data Modeling, Data Scientist, Data Engineering, Geographic Information Systems, Optimization, PySAL, APIs, Predictive Analytics, ETL Development, Business Analysis, API Integration, Spatial Reasoning, Generative Pre-trained Transformers (GPT), Data Governance, OpenAI GPT-3 API, Finance, Decision Trees, Regression, Recommendation Systems, Vehicle Routing, Machine Learning Operations (MLOps), Image Processing, Image Generation, Text to Image, Labeling, OpenAI GPT-4 API, Chatbots, Algorithms, Data Structures, Time Series, Computer Vision, GeoPandas, Time Series Analysis, Generative Adversarial Networks (GANs), Presentations, Communication, Web Scraping, Technical Writing, Excel 365, Scraping, OCR, eCommerce, Marketplaces, University Teaching, OpenAI, Business to Business (B2B), Business to Consumer (B2C), Cloud Tasks, BERT, Sentiment Analysis, Google Cloud Functions, Azure Databricks, Large Language Models (LLMs), LangChain, FastAPI, Generative Artificial Intelligence (GenAI), Text to Image AI