
Iván Sánchez

Verified Expert in Engineering

Data Scientist and Developer

Location

Valencia, Spain

Toptal Member Since

May 31, 2022

Iván is a data scientist with experience in Python, TensorFlow, exploratory data analysis, natural language processing, computer vision, and Google Cloud Platform. He has worked on many projects that involved building the machine learning lifecycle behind AI systems, processing digital documents, performing data analytics, and normalizing databases. Iván is passionate about applying the state-of-the-art solutions that constantly arise in the fast-growing field of AI.

Computer Science, Machine Learning, Artificial Intelligence (AI), Programming, Natural Language Processing (NLP), Deep Neural Networks, OpenAI GPT-4 API, Python 3, Git, Docker, Jupyter, Pandas, Visual Studio Code (VS Code), ChatGPT, Deep Learning, Facial Recognition, Exploratory Data Analysis, Named-entity Recognition (NER)

Portfolio

Zyte

Python 3, PyTorch, ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Active Learning...

GFT Technologies

Algorithms, Artificial Intelligence (AI), Classification Algorithms, Python 3...

PFS Group

Classification Algorithms, BERT, Artificial Intelligence (AI), Algorithms...

Experience

Artificial Intelligence (AI) - 5 years
Deep Learning - 5 years
Scikit-learn - 4 years
Natural Language Processing (NLP) - 4 years
PyTorch - 2 years
ChatGPT - 1 year
Language Models - 1 year
OpenAI GPT-4 API - 1 year

Availability

Part-time

Preferred Environment

Ubuntu Linux, Visual Studio Code (VS Code), Docker, Docker Compose, Python 3

The most amazing...

...thing I've designed and developed is a neural network to extract data from semi-structured documents with minimal labeling effort.

Work Experience

Senior Data Scientist

2022 - PRESENT

Zyte

Implemented several proof-of-concept (PoC) projects using ChatGPT and the OpenAI API (GPT-3.5 and GPT-4) for a variety of tasks, including automatic labeling, active learning, automatic report generation, and data extraction.
Set up a load testing pipeline and automatic reporting to track the staging and production performance of a machine learning service.
Researched open-source large language models (LLMs) and ran cost comparisons between OpenAI and open-source LLMs for various NLP tasks.
Added features and verticals to existing machine learning products and performed iterations of the machine learning lifecycle to ensure the best possible quality in the available development time.

Technologies: Python 3, PyTorch, ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Active Learning, Data Extraction, Kubernetes, Docker, Ubuntu Linux, Pandas, Google Compute Engine (GCE), Artificial Intelligence (AI), Data Scraping, Locust, Load Testing, Language Models

Data Scientist | MLOps

2019 - PRESENT

GFT Technologies

Designed and implemented a data extraction pipeline to extract specific target data from semi-structured documents, such as total quantity paid in bill scans.
Developed a custom neural network from scratch to perform the cognitive extraction from semi-structured documents.
Implemented automatic page rotation detection software, optimizing its execution time while preserving detection quality.
Deployed a document classification solution developed by another team on the Google Cloud Platform.
Helped integrate and productionize a data extraction pipeline behind a REST API by building a correct, stateful application containing the AI system.
Devised and implemented an active learning procedure for state-of-the-art named-entity recognition models that drastically reduced the labeling time and costs for a particular entity extraction project.
Mediated key communications with an important American automobile manufacturer in a data engineering and processing project to detect and solve on-site issues with the IoT sensors that obtained the information.

Technologies: Algorithms, Artificial Intelligence (AI), Classification Algorithms, Python 3, PIP, Scikit-learn, Scikit-image, TensorFlow, Keras, Hugging Face, NumPy, SciPy, BERT, Data Representation, Databases, Text Classification, Named-entity Recognition (NER), Active Learning, Data Extraction, Exploratory Data Analysis, Continuous Deployment, Google Cloud Platform (GCP), DVC, MLflow, Pandas

Data Scientist for NLP

2018 - 2019

PFS Group

Developed, tested, and integrated a natural language processing system for text classification, entity recognition, and information retrieval for legal documents.
Researched state-of-the-art solutions for intelligent document navigability and explainable AI for document classification.
Mediated key communication with the stakeholders who had business knowledge of legal documents (the end users) to capture the requirements for the AI system.

Technologies: Classification Algorithms, BERT, Artificial Intelligence (AI), Algorithms, Text Classification, Named-entity Recognition (NER), Python 3, Flask, Docker, Docker Compose, Jupyter, Jupyter Notebook, Computer Science, LSTM Networks, Neural Networks

Data Scientist for Computer Vision

2017 - 2018

Mybrana

Developed and tested the face tracking module in C++ for a mobile SDK for face detection and face alignment. This module allowed the placement of augmented reality content on the users' faces with the phone camera.
Built and tested the simultaneous localization and mapping (SLAM) module in C++ for camera localization in a mobile SDK. This module allowed the placement of augmented reality content on a scanned surface using visual odometry technology.
Helped integrate and test the augmented reality modules in iOS and Android mobile apps.

Technologies: C++11, Algorithms, Computer Vision, Simultaneous Localization & Mapping (SLAM), Eye Tracking, Tracking, Facial Recognition, Artificial Intelligence (AI), Ensemble Methods, Visual Odometry

Experience

Document Classification and Data Extraction System for Legal Documents

A Python-based library for digitizing legal documents, classifying them into a preset number of possible classes, determining the document's author (jury, lawyer, or attorney), and extracting valuable information such as the process identification number, attorney, defendant, and date.

I developed the library and worked on integrating it into a scalable service deployed in Kubernetes. I also ensured the reusability of this library's modules and its proper documentation.

Insurance Document Processing with Complete Machine Learning Lifecycle

A serverless application in Google Cloud Platform to process insurance documents.

As a data scientist, I developed the page-wise classifier and extracted information from the pages using Google's data extraction services and a custom named-entity recognition solution.

The application UI lets the user correct the machine learning system's predictions. I was in charge of using this user feedback to retrain the models and redeploy them using a continuous delivery pipeline.

Customer Lifetime Value Analysis and Prediction for a Bank Client

An exploratory data analysis and predictive analytics project on the customer lifetime value of a bank's clients.

I acted as the data scientist, gathering, exploring, and analyzing data from the bank to understand the bank's requirements and get valuable insights and statistics about the customer lifetime value of the bank's clients.

In particular, I stratified the bank's clients into groups by age, income, and financial score before analyzing their behavior year by year to statistically model their attrition, product reorder probability, and other valuable financial information.
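
A minimal sketch of this kind of stratified attrition analysis is shown below. It is not the project's code: the age cut points, the `income_tier` and `churned` field names, and the sample records are all hypothetical, and the financial score dimension is omitted for brevity.

```python
from collections import defaultdict


def age_band(age):
    """Map an age to a band. The cut points are hypothetical."""
    if age < 30:
        return "under_30"
    if age < 60:
        return "30_to_59"
    return "60_plus"


def attrition_by_stratum(clients):
    """Return {(age_band, income_tier): attrition_rate} for one year of data.

    Each client is a dict with the hypothetical keys
    'age', 'income_tier', and 'churned'.
    """
    totals = defaultdict(int)
    churned = defaultdict(int)
    for client in clients:
        key = (age_band(client["age"]), client["income_tier"])
        totals[key] += 1
        churned[key] += int(client["churned"])
    # Attrition rate = clients lost / clients in the stratum.
    return {key: churned[key] / totals[key] for key in totals}


# Made-up records for illustration only.
clients = [
    {"age": 25, "income_tier": "low", "churned": True},
    {"age": 27, "income_tier": "low", "churned": False},
    {"age": 45, "income_tier": "high", "churned": False},
    {"age": 70, "income_tier": "high", "churned": True},
]
rates = attrition_by_stratum(clients)
```

Running the same function on each year's records gives one rate per stratum per year, which is the kind of table a statistical model of attrition could then be fitted to.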

IoT Data Processing and Analysis Tool for a Car Manufacturing Client

A Google Vertex AI-based exploratory data analysis tool that processes IoT messages from a car manufacturer's plant.

The client needed a data visualization tool to quickly obtain insights into the manufacturing plant and how its machines operated. They also needed a correlation analysis between changeable input material properties (such as oiling, blank thickness, or elongation) and blank output quality, measured with a provided vision system.

As the data scientist, I was responsible for consuming the messages needed for the client's analysis, creating the visualization app with Plotly, generating a correlation analysis with Python, and reporting to the client. I also found several on-site problems in the data and reported them promptly so that the client had enough information to fix them and provide quality data.
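
As an illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between one input property and an output quality score in plain Python. The sample values and the `quality_score` scale are made up, and the source does not say which correlation measure or library the project used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up measurements for illustration only.
blank_thickness = [0.8, 0.9, 1.0, 1.1, 1.2]  # input material property
quality_score = [70, 74, 78, 82, 86]         # hypothetical vision-system score

r = pearson(blank_thickness, quality_score)
```

A coefficient near 1 or -1 suggests a strong linear relationship between the property and output quality, while a value near 0 suggests little linear relationship. It does not establish causation on its own.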

Education

2016 - 2017

Master's Degree in Artificial Intelligence, Pattern Recognition, and Digital Imaging

Polytechnic University of Valencia - Valencia, Spain

2012 - 2016

Bachelor's Degree in Computer Engineering

Polytechnic University of Valencia - Valencia, Spain

Skills

Libraries/APIs

Pandas, Scikit-learn, TensorFlow, Keras, NumPy, SciPy, PyTorch

Tools

Git, Jupyter, ChatGPT, Docker Compose, Named-entity Recognition (NER), Hidden Markov Model, Scikit-image, Google AI Platform, Plotly, Google Compute Engine (GCE)

Languages

Python 3, C++11

Paradigms

Data Science, Object-oriented Programming (OOP), Distributed Computing, Continuous Deployment, Load Testing

Platforms

Docker, Jupyter Notebook, Visual Studio Code (VS Code), Google Cloud Platform (GCP), Ubuntu Linux, Kubernetes

Storage

Databases, Google Cloud Storage, Google Cloud

Frameworks

Locust, Flask

Other

Machine Learning, Artificial Intelligence (AI), Pattern Recognition, Programming, Natural Language Processing (NLP), Neural Networks, Deep Neural Networks, Classification Algorithms, Text Classification, Computer Science, LSTM Networks, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, OpenAI GPT-3 API, Language Models, Deep Learning, Data Representation, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Regular Expressions, PIP, Hugging Face, BERT, Active Learning, Data Extraction, Exploratory Data Analysis, DVC, MLflow, Algorithms, Metaheuristics, Operating Systems, Sorting Algorithms, Logistic Regression, Linear Regression, Information Theory, Information Retrieval, Linear Optimization, Genetic Algorithms, Stochastic Modeling, Computer Vision, Simultaneous Localization & Mapping (SLAM), Eye Tracking, Facial Recognition, Ensemble Methods, Visual Odometry, Documentation, Reporting, Analysis, Customer Data, Data Analytics, Data Processing, Correspondence Analysis (CA), Data Quality, Data Quality Analysis, Tracking, Data Scraping
