Javier Garcia de Leaniz, Developer in Madrid, Spain

Javier Garcia de Leaniz

Verified Expert in Engineering

Natural Language Processing (NLP) Developer

Madrid, Spain
Toptal Member Since
October 30, 2021

Javier is an engineer with over nine years of experience in AI and data science. Beyond his expertise in machine learning, natural language processing, and software engineering, Javier's distinctive strength lies in aligning business with technology. During consulting tenures at EY and Accenture, he implemented data and AI technology across diverse industries and geographies.






Preferred Environment

Azure, Visual Studio Code (VS Code), MacOS, Linux, Amazon Web Services (AWS), Windows

The most amazing...

...product I've developed is a generative AI tool that structures business data, documents, emails, and voice recordings. It has processed millions of documents.

Work Experience

AI and Full-stack Engineer

2023 - PRESENT
  • Developed a web app that lets users search for restaurants in natural language based on their characteristics.
  • Designed and deployed a simple architecture on AWS using Amazon EC2, Amazon RDS, Amazon S3, and Elastic Load Balancing (ELB), among other services.
  • Designed a RAG pipeline, including a prompt-engineered query-parsing module and keyword matching based on full-text search.
Technologies: Amazon Web Services (AWS), Flask, Natural Language Processing (NLP), React, SQL, PostgreSQL, Python, OpenAI GPT-3 API, SQLAlchemy, OpenAI GPT-4 API, Information Retrieval, Prompt Engineering, Cognitive Computing, OpenAI, Custom Models, Web Scraping, Generative Pre-trained Transformer 3 (GPT-3)
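
The query-parsing and keyword-matching step described above could look roughly like the sketch below. It is an illustration under assumptions, not the app's actual code: a small stopword filter stands in for the prompt-engineered parser, and the `restaurants` table, its `description` column, and the function names are hypothetical. The SQL relies on PostgreSQL's built-in `to_tsvector`, `to_tsquery`, and `ts_rank` functions.

```python
import re

# Small stopword list standing in for the LLM-based query parser (assumption).
STOPWORDS = {"a", "an", "the", "with", "in", "for", "and", "or", "of",
             "me", "find", "that", "has", "near"}

def parse_query(text: str) -> list[str]:
    """Lowercase the query, split it into words, drop stopwords and duplicates."""
    keywords = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords

def build_tsquery(keywords: list[str]) -> str:
    """Join keywords with '&' so that every term must match (to_tsquery syntax)."""
    return " & ".join(keywords)

# Parameterized SQL against a hypothetical `restaurants` table.
SEARCH_SQL = """
SELECT id, name,
       ts_rank(to_tsvector('english', description),
               to_tsquery('english', %(q)s)) AS rank
FROM restaurants
WHERE to_tsvector('english', description) @@ to_tsquery('english', %(q)s)
ORDER BY rank DESC
LIMIT 10;
"""

keywords = parse_query("Find me an Italian restaurant with a terrace")
tsquery = build_tsquery(keywords)  # value to bind to the %(q)s parameter
```

Requiring every keyword with `&` favors precision; joining with `|` instead would favor recall and leave the ordering to `ts_rank`.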


2023 - PRESENT
Smart Retrieval
  • Led the technical strategy and product development of the company.
  • Deployed the platform to Azure using Azure DevOps CI/CD pipelines.
  • Developed a retrieval-augmented generation (RAG) pipeline that enables natural-language search over business documents such as financial statements, invoices, and contracts, leveraging OpenAI's GPT services.
Technologies: Artificial Intelligence (AI), ChatGPT, OpenAI GPT-3 API, OpenAI GPT-4 API, Azure, Python, OpenAI, B2B, Automation, Generative Pre-trained Transformer 3 (GPT-3)
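
As a rough illustration of the retrieve-then-generate flow in a RAG pipeline, the sketch below ranks document chunks against a question and assembles a grounded prompt. It rests on several assumptions: plain token overlap stands in for whatever retrieval method the platform uses, the sample chunks are invented, and the call to the language model is left out entirely.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def overlap_score(question: str, chunk: str) -> int:
    """Count tokens shared by the question and the chunk."""
    shared = Counter(tokenize(question)) & Counter(tokenize(chunk))
    return sum(shared.values())

def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks with the highest overlap score."""
    ranked = sorted(chunks, key=lambda c: overlap_score(question, c), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )

# Invented sample chunks, for illustration only.
chunks = [
    "Invoice 1042 from Acme totals 1,250 EUR and is due on 15 March.",
    "The lease contract renews automatically every twelve months.",
    "Total revenue in the 2022 financial statement was 4.1 million EUR.",
]
question = "When is invoice 1042 due?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)  # this string would be sent to the model
```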

Lead Data Scientist

2016 - 2023
  • Led a multidisciplinary product team of 50+ members building AI-driven products with a focus on NLP and generative AI.
  • Developed, trained, and evolved multiple models for different functionalities, including layout detection, document classification, named-entity recognition, question answering, and section ranking.
  • Developed and trained a deep learning model (CNN) that cleans lines, stains, scribbles, and other imperfections on invoice images to improve the downstream accuracy of an OCR engine.
  • Trained a gradient-boosting classifier to evaluate the severity of changes in baseline FATCA and CRS regulatory texts compared to local implementations. Achieved a cross-validated F1 score of 92.5.
  • Implemented an LDA topic model on US FDA reports to obtain insights for a wealth and asset management firm seeking investments in pharmaceutical companies.
Technologies: Python, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Machine Learning, Docker, Artificial Intelligence (AI), Azure, ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Agile Software Development, Azure Cognitive Services, OpenAI, Data Scraping, Architecture, API Integration, B2B, Automation, Integration, Consulting

Data Engineer

2014 - 2016
  • Designed and developed risk assessment processes for multichannel applications (smartphone app, web, ATM, bank branch) of a Spanish international bank.
  • Developed data pipelines for the risk assessment process of credit cards and online personal loans.
  • Developed SQL queries to analyze customer risk indicators.
Technologies: SQL, Scrum

Insurance Claim Payment Automation

NLP models that I trained and developed to identify, extract, and structure data from veterinary invoices to allow for the reimbursement of animal health insurance claims.

I developed a pipeline consisting of OCR, layout detection techniques, and named-entity recognition (NER) models to extract the relevant information from the invoices accurately. I also built the validation module to identify and validate medical diagnoses against the policyholder coverage.
Finally, I developed the extraction confidence methodology that determines whether a claim reimbursement is processed automatically or reviewed by a human, based on the confidence of the different models and on business rules.
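
A confidence-based routing step of this kind could be sketched as follows. This is a minimal illustration under assumptions: the thresholds, field names, and the `route_claim` function are hypothetical, and real values would come from validation data and the insurer's policy.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # model confidence in [0, 1]

# Hypothetical thresholds, chosen only for illustration.
MIN_FIELD_CONFIDENCE = 0.90
MAX_AUTO_AMOUNT = 500.0

def route_claim(fields: list[ExtractedField], amount: float,
                diagnosis_covered: bool) -> str:
    """Return 'auto' when every check passes, otherwise 'review'."""
    if not fields:
        return "review"  # nothing was extracted
    if not diagnosis_covered:
        return "review"  # business rule: coverage must be confirmed
    if amount > MAX_AUTO_AMOUNT:
        return "review"  # business rule: large payouts need a human
    # The weakest field decides: one uncertain value can invalidate the claim.
    lowest = min(field.confidence for field in fields)
    return "auto" if lowest >= MIN_FIELD_CONFIDENCE else "review"

fields = [
    ExtractedField("clinic_name", "Happy Paws Vet", 0.98),
    ExtractedField("total_amount", "120.00", 0.95),
]
decision = route_claim(fields, amount=120.0, diagnosis_covered=True)
```

Using the minimum confidence rather than the average is a conservative choice, since a single misread amount or date is enough to make an automatic payment wrong.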

Mortgage Contract Audit Automation

NLP models that I trained and developed to identify key data points from mortgage contracts to allow automatic audit and validation of data in actual contracts against the ERP.

I trained a classifier on TF-IDF features to distinguish main contracts from their annexes, extensions, and modifications. I also developed the validation module that disambiguates and matches contracts to database rows and compares them to highlight differences.
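
To illustrate how TF-IDF features can separate document types, here is a small self-contained sketch. It is not the project's model: the training snippets are invented, and a nearest-neighbor rule over cosine similarity stands in for whichever classifier was actually trained.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Map a document to an L2-normalized TF-IDF vector stored as a dict."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Invented training snippets, for illustration only.
train = [
    ("mortgage loan agreement between the bank and the borrower", "main"),
    ("principal amount interest rate and repayment schedule of the loan", "main"),
    ("annex listing the appraisal of the property", "annex"),
    ("amendment extending the term of the original agreement", "modification"),
]
idf = fit_idf([text for text, _ in train])
vectors = [(tfidf(text, idf), label) for text, label in train]

def classify(doc):
    """Label a document with the class of its most similar training example."""
    vec = tfidf(doc, idf)
    return max(vectors, key=lambda pair: cosine(vec, pair[0]))[1]

label = classify("annex with the property appraisal report")
```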

Tax Relief Application Eligibility

NLP models that I built and trained to extract key data points from various documents such as invoices, mortgage payments, paychecks, and more.

The goal was to increase the efficiency of the application process for a government tax relief program, introduced in response to COVID-19, that received millions of requests.

I developed a pipeline consisting of handwritten text detection, layout detection techniques, classification (to detect the document type), and NER (named-entity recognition) models to extract the relevant information from the documents accurately. I also developed the confidence module that prioritized manual review of applications based on business rules and models' confidence.

Invoice Validation Automation

I developed an AI-driven system to automate the analysis of financial documents in construction processes, targeting the high volume of invoices, purchase orders, and goods received notes. This project aimed to detect mismatches and inconsistencies to prevent financial losses.

The project's pipeline started with OCR technology to extract text from scanned documents accurately. I employed named-entity recognition (NER) models to identify and categorize key data points within these texts, such as vendor names, dates, and amounts. An important part of the project was the development of classification models that detect each document's type so that its relevant data points can be extracted automatically.

Additionally, I implemented fuzzy matching algorithms to link items listed in invoices with corresponding entries in purchase orders. This approach was key in identifying mismatches and inconsistencies.
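
The fuzzy-matching idea can be sketched with Python's standard `difflib` module. This is an illustrative example under assumptions, not the project's algorithm: the item descriptions are invented, the 0.7 threshold is arbitrary, and `match_items` is a hypothetical helper.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1], ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_items(invoice_items, po_items, threshold=0.7):
    """Pair each invoice item with its most similar purchase-order item.

    Items whose best candidate scores below the threshold map to None,
    which flags them as potential mismatches.
    """
    matches = {}
    for item in invoice_items:
        best = max(po_items, key=lambda po: similarity(item, po), default=None)
        if best is not None and similarity(item, best) >= threshold:
            matches[item] = best
        else:
            matches[item] = None
    return matches

# Invented line items, for illustration only.
invoice_items = ["Portland cement 25kg", "Steel rebar 12 mm", "Safety helmets"]
po_items = ["Portland Cement 25 kg", "Steel rebar 12mm", "Plywood sheets"]
matches = match_items(invoice_items, po_items)
```

Here the first two invoice lines pair with their purchase-order counterparts despite spacing and casing differences, while "Safety helmets" has no close counterpart and maps to `None`.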

Natural Language Processing Nanodegree



Machine Learning Engineer Nanodegree



SpaCy, Scikit-learn, Pandas, Azure Cognitive Services, Keras, React, SQLAlchemy


ChatGPT, Named-entity Recognition (NER)


Python, SQL


Automation, Scrum, Agile Software Development, B2B




Docker, Azure, Visual Studio Code (VS Code), MacOS, Linux, Amazon Web Services (AWS), Windows




Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Machine Learning, Deep Learning, OpenAI GPT-4 API, OpenAI GPT-3 API, OpenAI, Data Scraping, Computer Vision, OCR, Handwriting Recognition, Text Classification, Tf-idf, Information Retrieval, Prompt Engineering, Cognitive Computing, Custom Models, Web Scraping, Architecture, API Integration, Integration, Consulting, Generative Pre-trained Transformer 3 (GPT-3)
