Juan Luis Ruiz - Tagle, Data Scientist and Developer in Madrid, Spain

Member since September 29, 2022
Juan Luis is a data scientist with expertise in spatial analytics and optimization. He has a background in computer science and four years of professional experience in spatial data science, finance, and advertising technology. He combines deep knowledge of machine learning with software engineering best practices to build robust and reliable ML solutions. Juan Luis has strong analytical skills and approaches problems from a business perspective, prioritizing the client's needs.

Location

Madrid, Spain

Availability

Part-time

Preferred Environment

MacOS, Google Cloud, BigQuery, Git, Slack, Python

The most amazing...

...system I've developed is a set of spatial ML algorithms in SQL, which run at scale on cloud data warehouses like Google BigQuery.

Employment

  • Data Scientist

    2020 - 2022
    CARTO
    • Implemented spatial statistics and ML algorithms in SQL to run them at scale on cloud data warehouses.
    • Developed spatial models for estimating accumulated litter in cities at a granular level.
    • Built optimization solutions for vehicle routing and territory management, connected to Google BigQuery as remote functions.
    • Designed spatial indexes for clients that combined target demographics, POI density, and mobility data.
    • Identified trends in retail hotspot areas during the pandemic from human mobility data, using origin-destination matrix computation, POI analysis, and time series analysis.
    • Created ETL processes with Apache Airflow to recurrently ingest spatial data from several data sources into CARTO's platform.
    Technologies: Pandas, NumPy, GIS, GeoPandas, Optimization, Pysal, SQL, Docker, Spark, Apache Airflow, Databricks, TensorFlow, Spatial Data Science, Data Analytics, REST APIs, Big Data, JavaScript, Data Visualization, Apache Spark, Predictive Analytics, Data Analysis, Analytics, eCommerce, Marketplaces
  • Data Scientist

    2019 - 2020
    ETS Asset Management Factory
    • Applied state-of-the-art techniques to make more accurate predictions of financial markets' behavior, contributing to the financial advisory firm's primary purpose of making stock market investment recommendations driven by data science.
    • Developed a RESTful API that serves synthetic stock series created by generative adversarial networks on demand.
    • Put into production a novel deep learning portfolio investment strategy and deployed it to internal servers to automate portfolio recommendations.
    Technologies: Time Series Analysis, Generative Adversarial Networks (GANs), APIs, Flask, Jenkins, Data Analytics, REST APIs, Data Analysis, Natural Language Processing (NLP), Analytics
  • Data Analyst

    2019 - 2019
    Seedtag
    • Developed a funnel for the company's video advertising campaigns, which provided insight into how the business was progressing.
    • Built ETL processes that aggregated data periodically from ads stored in a MongoDB database and displayed the current state of the ad flow in a dashboard.
    • Assisted the CEO in preparing the company's next funding round by analyzing revenue and client loyalty.
    Technologies: MongoDB, SQL, Python, Google Sheets, Data Analytics, Data Analysis, Analytics, eCommerce, Business Analysis

Experience

  • Local MX Refinement | ML tool for Out of Home Advertising Campaign Optimization
    https://carto.com/blog/carto-havas-media-big-data-ai-world-madrid/

    While working at CARTO, I took full ownership of the ML models and optimization algorithms of Local MX, a tool built for the Havas Media Group. It predicts coverage and impressions at a very granular level and computes a selection of billboards that maximizes those metrics.

    The client wanted to measure the weekly impressions (number of visits) and coverage (number of distinct visitors) of each of their billboards in Spain. They also wanted this information segmented by several categorical variables: type of day, hourly range, age, gender, and income level. To provide this, our models were trained on data from several sources (telco, SDK data, sociodemographic, POI, etc.). An optimization algorithm then ranked the billboards best suited to the target campaign.

    I got involved in this project at a calibration stage, in which I:

    • Tweaked the ML models and algorithms to align with client expectations
    • Automated background processes for telco data ingestion, automatic enablement of new billboards in the tool, etc.
    • Extended the usage of the tool to the Canary Islands by computing SDK routes in this region with OSRM
    • Handled the communication with the client for all technical matters

  • Spatial Data Science Conference 2022
    https://www.youtube.com/watch?v=6kNqsQY_e90

    I spoke at the Spatial Data Science Conference held in May 2022 at the Royal Geographical Society in London. This conference is among the most renowned events for geographic information systems and spatial data.

    I presented the CARTO Analytics Toolbox, a SQL library for spatial analysis and modeling in cloud data warehouses.

  • Black Friday Analysis
    https://www.safegraph.com/blog/2021-black-friday

    I did a thorough spatial analysis of the effects of the pandemic on retail stores during Black Friday in four different cities across the US using SafeGraph's human mobility data. I compared data for 2019, 2020, and 2021 to obtain insights into the evolution of footfall traffic and presented my results in a webinar and an article published on SafeGraph's blog.

  • Scraper App for Official State Documents in PDF

    A script that scrapes the BOE (the official gazette published daily by the Spanish government) in PDF format. It extracts relevant information about newly registered brands, including the registrant's name, telephone number, company website, etc. It also generates an Excel file containing all the scraped data in a structured way.

  • Personal Blog
    http://juanluis.me

    I write about data science, geographic information systems, and math. Some articles have also been published on CARTO's blog and in Towards Data Science and Cantor's Paradise, two important publications on the Medium platform.

    Some examples:
    • Generating fake data with pandas, very quickly
    https://towardsdatascience.com/generating-fake-data-with-pandas-very-quickly-b99467d4c618
    • What to expect when throwing dice and adding them up
    https://www.cantorsparadise.com/what-to-expect-when-throwing-dice-and-adding-them-up-5231f3831d7
    • Scraping Google Search (without getting caught)
    https://juanluisrto.medium.com/scraping-google-search-without-getting-caught-e43bb91b363e
    • Can neural networks predict the stock market just by reading newspapers?
    https://quantdare.com/can-neural-networks-predict-the-stock-market-just-by-reading-newspapers/

  • Scraping Orchestra
    https://github.com/juanluisrto/Scraping-orchestra

    I created a scraping master-slave system based on Google App Engine. The main problem with scraping is that sites can block your IP if they detect suspicious behavior. As a solution, a local process orchestrates a scraper deployed in Google App Engine. The main idea is to start scraping and redeploy the scraper to get a new IP whenever the current one gets blocked.
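
    The redeploy-on-block loop described above could look roughly like the sketch below. It is only an illustration, not the code from the repository: `BlockedError`, `scrape_with_redeploy`, `fetch`, `redeploy`, and `FakeRemoteScraper` are hypothetical names, and the actual Google App Engine deployment step is replaced by a plain callback.

```python
from typing import Callable, Iterable, List


class BlockedError(Exception):
    """Hypothetical signal that the target site blocked the current IP."""


def scrape_with_redeploy(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    redeploy: Callable[[], None],
    max_redeploys: int = 5,
) -> List[str]:
    """Fetch every URL, redeploying the remote scraper whenever it is blocked."""
    results: List[str] = []
    redeploys = 0
    for url in urls:
        while True:
            try:
                results.append(fetch(url))
                break  # this URL succeeded, move on to the next one
            except BlockedError:
                if redeploys >= max_redeploys:
                    raise  # give up instead of redeploying forever
                redeploy()  # assumed to give the scraper a fresh IP
                redeploys += 1
    return results


class FakeRemoteScraper:
    """Stand-in for the deployed scraper: gets blocked after a few requests."""

    def __init__(self, requests_before_block: int = 2) -> None:
        self.limit = requests_before_block
        self.served = 0
        self.deployments = 1

    def fetch(self, url: str) -> str:
        if self.served >= self.limit:
            raise BlockedError(url)
        self.served += 1
        return f"content of {url}"

    def redeploy(self) -> None:
        # A new deployment resets the block counter (simulating a new IP).
        self.served = 0
        self.deployments += 1


if __name__ == "__main__":
    scraper = FakeRemoteScraper(requests_before_block=2)
    pages = scrape_with_redeploy(
        ["a", "b", "c", "d", "e"], scraper.fetch, scraper.redeploy
    )
    print(len(pages), "pages fetched using", scraper.deployments, "deployments")
```

    Passing `fetch` and `redeploy` as callbacks keeps the local orchestration logic separate from however the remote scraper is actually deployed, and the `max_redeploys` cap is an added safeguard that the original description does not mention.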

  • Svenska Scraper
    https://github.com/juanluisrto/SvenskaScraper

    I built a web scraper during college to help me study the Swedish language. Given a list of words in Spanish, it gathers their translations and example sentences. The software also generates exercises to practice.

Skills

  • Languages

    Python, SQL, JavaScript
  • Libraries/APIs

    Pandas, NumPy, REST APIs, TensorFlow
  • Tools

    BigQuery, GIS, Git, Apache Airflow, Google Sheets, Slack, Jenkins, Celery
  • Paradigms

    Data Science, Agile Software Development
  • Other

    Machine Learning, Spatial Analysis, Data Analytics, Big Data, Data Visualization, Data Analysis, Analytics, Natural Language Processing (NLP), Deep Learning, Optimization, Pysal, APIs, Spatial Data Science, Predictive Analytics, ETL Development, Business Analysis, Algorithms, Data Structures, Artificial Intelligence (AI), Time Series, Computer Vision, GeoPandas, Time Series Analysis, Generative Adversarial Networks (GANs), Presentations, Communication, Web Scraping, Technical Writing, Excel 365, Scraping, OCR, eCommerce, Marketplaces
  • Platforms

    Docker, MacOS, Databricks, Google App Engine, Amazon Web Services (AWS)
  • Storage

    Google Cloud, MongoDB
  • Frameworks

    Spark, Flask, Apache Spark

Education

  • Master's Degree in Data Science
    2020 - 2020
    Universidad Politécnica de Madrid - Madrid, Spain
  • Bachelor's Degree in Computer Science
    2015 - 2018
    KTH Royal Institute of Technology - Stockholm, Sweden
