Iván Sánchez
Verified Expert in Engineering
Data Scientist and Developer
Valencia, Spain
Toptal member since May 31, 2022
Iván is a data scientist with experience in Python, TensorFlow, exploratory data analysis, natural language processing, computer vision, and Google Cloud Platform. He has worked on many projects that involved building the machine learning lifecycle behind AI systems, processing digital documents, performing data analytics, and normalizing databases. Iván is passionate about applying the state-of-the-art solutions that constantly arise in the fast-growing field of AI.
Preferred Environment
Ubuntu Linux, Visual Studio Code (VS Code), Docker, Docker Compose, Python 3
The most amazing...
...thing I've designed and developed is a neural network to extract data from semi-structured documents with minimal labeling effort.
Work Experience
Senior Data Scientist
Zyte
- Implemented several proofs of concept (PoCs) using ChatGPT and the OpenAI API (GPT-3.5 and GPT-4) for a variety of tasks, including automatic labeling, active learning, automatic report generation, and data extraction.
- Set up a load testing pipeline and automatic reporting to track the staging and production performance of a machine learning service.
- Researched open-source large language models (LLMs) and ran cost comparisons between OpenAI and open-source LLMs for various NLP tasks.
- Added features and verticals to existing machine learning products and performed iterations of the machine learning lifecycle to ensure the best possible quality in the available development time.
Data Scientist | MLOps
GFT Technologies
- Designed and implemented a pipeline to extract specific target data from semi-structured documents, such as the total amount paid in scanned bills.
- Developed a custom neural network from scratch to perform cognitive data extraction from semi-structured documents.
- Implemented automatic page-rotation detection software, optimizing execution time while preserving detection quality.
- Deployed a document classification solution developed by another team on the Google Cloud Platform.
- Helped integrate and productionize a data extraction pipeline behind a REST API by building a correct, stateful application containing the AI system.
- Devised and implemented an active learning procedure for state-of-the-art named-entity recognition models that drastically reduced the labeling time and costs for a particular entity extraction project.
- Mediated key communications with an important American automobile manufacturer in a data engineering and processing project to detect and solve on-site issues with the IoT sensors that obtained the information.
Data Scientist for NLP
PFS Group
- Developed, tested, and integrated a natural language processing system for text classification, entity recognition, and information retrieval for legal documents.
- Researched state-of-the-art solutions for intelligent document navigability and explainable AI for document classification.
- Mediated key communication with the stakeholders who had business knowledge of legal documents (the end users) to capture the requirements for the AI system.
Data Scientist for Computer Vision
Mybrana
- Developed and tested the face tracking module in C++ for a mobile SDK for face detection and face alignment. This module allowed the placement of augmented reality content on the users' faces with the phone camera.
- Built and tested the simultaneous localization and mapping (SLAM) module in C++ for a mobile SDK for camera localization. This module allowed the placement of augmented reality content on a scanned surface using visual odometry technology.
- Helped integrate and test the augmented reality modules in iOS and Android mobile apps.
Experience
Document Classification and Data Extraction System for Legal Documents
I developed the library and worked on integrating it into a scalable service deployed in Kubernetes. I also ensured the reusability of this library's modules and its proper documentation.
Insurance Document Processing with Complete Machine Learning Lifecycle
As a data scientist, I developed the page-wise classifier and extracted information from the pages using Google's data extraction services and a custom named-entity recognition solution.
The application UI lets the user correct the machine learning system's predictions. I was in charge of using this user feedback to retrain the models and redeploy them using a continuous delivery pipeline.
Customer Lifetime Value Analysis and Prediction for a Bank Client
I acted as the data scientist, gathering, exploring, and analyzing data from the bank to understand the bank's requirements and get valuable insights and statistics about the customer lifetime value of the bank's clients.
In particular, I stratified the bank's clients into groups by age, income, and financial score before the yearly analysis of their behavior, in order to statistically model their attrition, product reorder probability, and other valuable financial information.
IoT Data Processing and Analysis Tool for a Car Manufacturing Client
The client needed a data visualization tool to quickly obtain insights into the manufacturing plant and how its machines operated. The client also needed a correlation analysis between changeable input material properties (such as oiling, blank thickness, or elongation) and blank output quality, measured with a provided vision system.
As the data scientist, I oversaw consuming the particular messages needed for the analysis required by the client, creating the visualization app with Plotly, generating a correlation analysis with Python, and reporting to the client. I also found several on-site problems in the data and reported them promptly so that the client had enough information to fix them and provide quality data.
Education
Master's Degree in Artificial Intelligence, Pattern Recognition, and Digital Imaging
Polytechnic University of Valencia - Valencia, Spain
Bachelor's Degree in Computer Engineering
Polytechnic University of Valencia - Valencia, Spain
Skills
Libraries/APIs
Pandas, Scikit-learn, TensorFlow, Keras, NumPy, SciPy, PyTorch
Tools
Git, Jupyter, ChatGPT, Docker Compose, Named-entity Recognition (NER), Hidden Markov Model, Scikit-image, Google AI Platform, Plotly, Google Compute Engine (GCE)
Languages
Python 3, C++11
Paradigms
Object-oriented Programming (OOP), Distributed Computing, Continuous Deployment, Load Testing
Platforms
Docker, Jupyter Notebook, Visual Studio Code (VS Code), Google Cloud Platform (GCP), Ubuntu Linux, Kubernetes
Frameworks
Locust, Flask
Storage
Databases, Google Cloud Storage, Google Cloud
Other
Machine Learning, Artificial Intelligence, Data Science, Pattern Recognition, Programming, Natural Language Processing (NLP), Neural Networks, Deep Neural Networks (DNNs), Classification Algorithms, Text Classification, Computer Science, LSTM Networks, Generative Pre-trained Transformers (GPT), GPT-4, OpenAI GPT-3 API, Language Models, Deep Learning, Data Representation, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Regular Expressions, PIP, Hugging Face, BERT, Active Learning, Data Extraction, Exploratory Data Analysis, DVC, MLflow, Algorithms, Metaheuristics, Operating Systems, Sorting Algorithms, Logistic Regression, Linear Regression, Information Theory, Information Retrieval, Linear Optimization, Genetic Algorithms, Stochastic Modeling, Computer Vision, Simultaneous Localization & Mapping (SLAM), Eye Tracking, Facial Recognition, Ensemble Methods, Visual Odometry, Documentation, Reporting, Analysis, Customer Data, Data Analytics, Data Processing, Correspondence Analysis (CA), Data Quality, Data Quality Analysis, Tracking, Data Scraping