Daniel Pérez Rubio
Verified Expert in Engineering
Data Scientist and Developer
Guadalajara, Spain
Toptal member since November 29, 2021
Daniel is an experienced data scientist with a master's in signal theory (telecommunications). He has eight years of professional experience, from impactful seed-phase startups like Ketekelo (CTO, two years) to global companies like BASF (senior data scientist, two years). Daniel thrives on challenges, so he's decided to become a freelance data scientist to help Toptal clients achieve excellence in developing their machine learning, deep learning, NLP, and big data solutions.
Portfolio
Experience
- Machine Learning - 7 years
- APIs - 7 years
- Scraping - 7 years
- Docker - 5 years
- Generative Pre-trained Transformers (GPT) - 5 years
- Natural Language Processing (NLP) - 5 years
- Python - 5 years
- Deep Learning - 3 years
Preferred Environment
Windows, Windows Subsystem for Linux (WSL), Visual Studio Code (VS Code), Docker
The most amazing...
...product I've developed was an internal service desk ticket prioritization model, which helped reduce escalations to 60% within the same workforce.
Work Experience
Data Scientist
Non-disclosable NLP startup from MIT (Toptal engagement)
- Developed a production pipeline based on model explainability with Shapley values for the analysis of complex dependencies between language and cultural trends in a company.
- Designed a robust replicability setup ensuring AutoML capabilities featuring multiple overfitting, dimensionality, and signal/noise control processes like SMOTE, hyperparameter tuning, SHAP-based feature selection, cross-validation, and seed control.
- Refactored and optimized two existing big data pipelines, improving stability and reducing resource allocation with a cost reduction of 75% for one.
- Implemented and optimized a topic modeling pipeline based on a large language model (BERT), which helped validate their custom topic modeling approach.
- Implemented a flexible fine-tuning process for large language model architectures like BERT and GPT2 and used it to train several topic classification models, which were used to refine their custom topic modeling pipeline.
- Implemented a clause parsing and classification pipeline based on large language models for a clause sentiment classification tool.
- Designed and proved a semi-supervised learning concept for the iterative refinement of large language models based on auto-labeling techniques.
- Implemented a production runtime predictive resource allocation concept to avoid GPU and system memory issues. The process is based on resource usage logging and a polynomial interpolation pipeline, which helped reduce most memory allocation errors.
- Conducted several viability analyses over nine months for different functional features devised by the client, followed by implementation upon the client's decision and the completion of all open points in their product roadmap.
- Kept daily contact with the CTO and CEO, providing all necessary insights and low-level details for them to be able to steer product development, always coming forward with proposals and my expert opinion but prioritizing their will.
Senior Data Scientist
Daimler
- Developed three big data after-sales time series forecasting products: the timing of tire replacement, the timing of brake disc replacement, and the timing of brake pad replacement.
- Kickstarted the creation of an experimentation library to allow multiple data scientists to run experiments on the same product, so the results from those experiments could be compared, replicated, and easily communicated to business partners.
- Fostered the improvement of the branching model and CI/CD pipelines to eliminate human error and operations overhead and unlock the possibility of developing software packages instead of scripts in notebooks.
- Created two new data sources for the team's data lake: worldwide elevation with 30-meter resolution (Aster 30) and regional name localization, including common countries, cities, provinces, and names written in over ten languages.
- Collaborated in the organization of the 2021 Daimler Innovation Days, a two-day event focused on creating fresh product designs and getting familiar with the most modern technologies.
Senior Data Scientist
BASF
- Developed two successful NLP products: a fuzzy-logic expert system for customer name matching and a topic modeling dashboard for patent search engine monitoring.
- Built three general-purpose products: a recommendation system for inventory health management, a threat-level classifier for domain name trademark fraud detection, and an escalation probability forecaster for a service desk's ticket prioritization.
- Produced a topic and sentiment analysis report on Spain's 2020-2021 employee survey for HR to help them process thousands of valuable free-text feedback fields.
- Ran multiple workshops on machine learning, Git, open-source software, and remote Docker environments.
- Supported company culture by fostering and co-organizing local and global initiatives: 10% innovation time, cross-squad collaboration initiatives, and custom training plans.
- Led, together with my colleagues, the introduction of a modern Python workflow in a global company, seamlessly using best code practices, CI/CD pipelines, containerization, and remote environments.
- Supported the hiring process by conducting multiple technical interviews.
- Assumed the shared role of product owner for more than half of the squad's lifetime.
- Worked successfully and efficiently under the Scrum and Kanban Agile frameworks, delivering five successful products in two years.
Senior Data Scientist
Rebold
- Implemented and maintained a daily CD pipeline for model training, optimization, and deployment for an ad-buying agent.
- Developed a complete audience-enriched analytics solution for email campaigns.
- Took ownership of three daily-running big data products: machine learning ad-buying agent training, email campaign audience-enriched analytics, and cookie-based audience classification.
- Supported business intelligence (BI) colleagues, implementing custom Python scripts and SQL queries to improve their processes and help them work more efficiently.
- Developed, with the assistance of a freelance DevOps engineer, a Python tool for creating and running Ansible templates based on playbooks.
- Kickstarted the development of a citizen development web platform based on Flask.
Data Scientist
Human Forecast
- Worked autonomously as the only technical employee in the company.
- Developed several PoC solutions with a value proposition based on machine learning, most of which can be found on my GitHub profile.
- Sold and developed four final products: a topic discovery engine for market research, a real-time social brand image observatory, an Edge AI handrail use advisor, and a chatbot-based smart contract solution for international commerce tracking.
- Established the presales strategies together with the CEO.
- Gave several product presentations to large companies such as Airbus, Navantia, Vall d'Hebron Hospital, and Cemex Ventures.
- Worked in diverse fields like topic modeling, human pose recognition, hyperspectral imaging, Edge AI, sentiment analysis, chatbots, smart contracts, data mining, dashboarding, and APIs.
CTO
Ketekelo
- Worked as technical lead and full-stack developer, setting the development roadmap and executing it together with an intern.
- Implemented several custom WordPress/WooCommerce components, API integrations, and a scraping tool.
- Pitched at multiple events. Received the best pitch award from Madrid's local government and attracted the interest of investors like Kike Sarasola and Fundación José Manuel Entrecanales.
- Gained admission to acceleration programs from IE Business School, Lanzadera, and Madrid's local government.
Experience
Topic Discovery Engine for Market Research
https://github.com/danielperezr88/TOM
It allows the user to define fine-grained searches for fields of interest, keep track of the different topics found per field and their relevance with time, and find out quickly if a new topic of interest appears in that field.
Logistics Dapp: Smart Contract Chatbot app for Freight Transport Tracking
https://github.com/danielperezr88/logistics-dapp
The app is in the active MVP phase. It has been tested and proved useful, but it is currently not supported because of changes in Coinbase's Dapp platform and the end of the relationship with the sponsor.
Handrail Advisor: On-site Human Pose Tracking Camera for Improved Worker Security
https://github.com/danielperezr88/idoonet-rpi-mvncs
Once placed at a point with good visibility of a handrail-guarded area and configured with labels of the handrail positions and associated areas of use, it tracks the correct use of the handrail by all workers in the area and gives them real-time feedback in a preferred way (sound, video, and lightbulb feedback).
Education
Postgraduate Course in Artificial Intelligence
Stanford University - Stanford, CA
Bachelor’s Degree and Master's Degree in Telecommunications
Universidad de Alcalá - Alcalá de Henares, Madrid, Spain
Certifications
Stanford Reinforcement Learning
Stanford University | Online
Stanford Natural Language Processing with Deep Learning
Stanford University | Online
AI for Trading Nanodegree
Udacity
Startup Acceleration and Consolidation
IE Business School
Introduction to Artificial Intelligence
Sebastian Thrun and Peter Norvig
Skills
Libraries/APIs
Scikit-learn, Natural Language Toolkit (NLTK), Pandas, NumPy, Beautiful Soup, PySpark, PyTorch, Shapely, Matplotlib, Spark ML, jQuery, OpenCV, Node.js, TensorFlow, D3.js, Asyncio, Web3.js, SciPy, SpaCy
Tools
Spark SQL, Amazon Elastic MapReduce (EMR), Git, Supervisord, Apache HTTP Server, Apache Airflow, Named-entity Recognition (NER), GitLab CI/CD, GitHub, Jira, Seaborn, MATLAB, Amazon SageMaker, Plotly, NGINX, Tableau, GIS, Ansible, Helm, StatsModels
Languages
Python, Python 3, C, C++, SQL, PHP, JavaScript, HTML, Solidity, R, Java
Platforms
Visual Studio Code (VS Code), Jupyter Notebook, Windows, Docker, Unix, Raspberry Pi, Arduino, Amazon Web Services (AWS), Databricks, WooCommerce, Kubernetes
Frameworks
Flask, Django, Bootstrap, Express.js, Jinja
Paradigms
Continuous Delivery (CD), ETL, Azure DevOps, Agile, Dynamic Programming
Storage
MySQL, PostgreSQL, Amazon S3 (AWS S3), MongoDB, Google Cloud, Microsoft SQL Server, SAP HANA SQLScript
Other
Statistics, Natural Language Processing (NLP), Machine Learning, K-nearest Neighbors (KNN), TextRank, Data Science, Data Analysis, Data Visualization, Predictive Modeling, Generative Pre-trained Transformers (GPT), Windows Subsystem for Linux (WSL), Numerical Methods, Programming, Embedded Systems, Optimization, Computer Vision, Signal Processing, Deep Learning, Linux Administration, Scraping, APIs, HyperOpt, Apache Superset, Google Data Studio, Support Vector Machines (SVM), Neural Networks, K-means Clustering, Bayesian Statistics, Information Retrieval, Transformers, Word Embedding, Linguistic Tagging, FastAPI, Multiprocessing, lxml, MLflow, Time Series Analysis, Recurrent Neural Networks (RNNs), Sentiment Analysis, Business Planning, Chatbots, Topic Modeling, BERT, Dashboards, Product Leadership, Data Engineering, Electronics, Telematics, Evolutionary Computation, Ajax, Bokeh, Robotics, Motion Planning, Language Models, Azure Data Lake, Azure Data Factory, Encoder-Decoder Neural Architecture, Sequence Models, Fundamental Analysis, Quantitative Analysis, Portfolio Optimization, Risk Models, Attribution Modeling, Backtesting Trading Strategies, Negotiation, Tax Accounting, Business Modeling, Business Model Canvas, Partnerships, Google Custom Search, Clustering, Unsupervised Learning, Predictive Maintenance, Reinforcement Learning, Monte Carlo Simulations, Deep Reinforcement Learning, Temporal Difference Learning, Monte Carlo