Javier Garcia de Leaniz, Developer in Madrid, Spain

Javier Garcia de Leaniz

Verified Expert in Engineering

Natural Language Processing (NLP) Developer

Madrid, Spain

Toptal member since October 30, 2021

Bio

Javier is an engineer with over nine years of experience in AI and data science. Beyond his expertise in natural language processing, large language models, machine learning, and software engineering, Javier's distinctive strength is aligning business needs with technology. As a consultant at EY and Accenture, he implemented data and AI technology across diverse industries and geographies.

Portfolio

Self-employed
Amazon Web Services (AWS), Flask, Natural Language Processing (NLP), React, SQL...
Smart Retrieval
Artificial Intelligence (AI), ChatGPT, OpenAI GPT-3 API, OpenAI GPT-4 API...
Explore My Store Pty Ltd
Artificial Intelligence (AI), Azure, Azure Cognitive Services, OpenAI GPT-4 API...

Experience

  • Python - 7 years
  • Natural Language Processing (NLP) - 7 years
  • Machine Learning - 7 years
  • Artificial Intelligence (AI) - 7 years
  • Azure - 5 years
  • Amazon Web Services (AWS) - 1 year
  • OpenAI GPT-3 API - 1 year
  • OpenAI GPT-4 API - 1 year

Availability

Part-time

Preferred Environment

Azure, Visual Studio Code (VS Code), MacOS, Linux, Amazon Web Services (AWS), Windows

The most amazing...

...product I've developed was a generative AI tool that structures business data, documents, emails, and voice recordings and has processed millions of documents.

Work Experience

AI and Full-stack Engineer

2023 - PRESENT
Self-employed
  • Developed a web app that allows users to search for restaurants in natural language based on their characteristics. Developed the full user interface, back end, AI modules, and CI/CD pipelines.
  • Designed and deployed the whole architecture on AWS, using services such as Amazon EC2, Amazon RDS, Amazon S3, and Elastic Load Balancing (ELB).
  • Designed a RAG pipeline with a prompt-engineered query parsing module and keyword matching built on full-text search and LLMs such as GPT-3.5 and GPT-4o.
Technologies: Amazon Web Services (AWS), Flask, Natural Language Processing (NLP), React, SQL, PostgreSQL, Python, OpenAI GPT-3 API, SQLAlchemy, OpenAI GPT-4 API, Information Retrieval, Prompt Engineering, Cognitive Computing, OpenAI, Custom Models, Web Scraping, Generative Pre-trained Transformer 3 (GPT-3), Data Science, Full-stack, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), AI Prompts, APIs, Product Management, Supervised Learning, OpenAI API, Claude

CTO

2023 - PRESENT
Smart Retrieval
  • Led the technical strategy and product development of the company.
  • Deployed the platform to Azure using Azure DevOps CI/CD pipelines.
  • Developed a retrieval-augmented generation (RAG) pipeline to allow search in natural language over business documents such as financial statements, invoices, contracts, and more, leveraging OpenAI's GPT services.
  • Optimized the performance of the LLM-based functionalities by applying prompt engineering, fine-tuning LLMs, and using open-source LLMs such as Llama 3.
Technologies: Artificial Intelligence (AI), ChatGPT, OpenAI GPT-3 API, OpenAI GPT-4 API, Azure, Python, OpenAI, B2B, Automation, Generative Pre-trained Transformer 3 (GPT-3), Data Science, Fine-tuning, Llama 3, Full-stack, Open-source LLMs, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), LlamaIndex, Prompt Engineering, AI Prompts, Machine Learning Operations (MLOps), PDF, Llama, APIs, Product Management, Supervised Learning, Document Parsing, Pattern Recognition, OpenAI API

AI Engineer

2023 - 2024
Explore My Store Pty Ltd
  • Created a web scraping process that obtained full details on 1.5 million products from 1,400 eCommerce sites and optimized it to detect product changes without requiring a full re-scrape. This optimization reduced costs by over 60%.
  • Developed an ETL to load and transform the scraped data into an Azure Cosmos DB, enriching the data using LLMs to determine product categories, create embeddings to allow vector search, and more.
  • Configured and optimized an Azure Search service, working with the product team to integrate it with the web application. This included search functionalities such as full-text search, vector search, facets, filters, spelling correction, etc.
  • Developed a web scraping process to obtain full details on eCommerce stores, such as the company logo, about us section, payment methods accepted, social media links, and more.
  • Developed a process to determine if a website is down, detecting edge cases such as deactivated Shopify stores, domains for sale, websites under maintenance, and more.
Technologies: Artificial Intelligence (AI), Azure, Azure Cognitive Services, OpenAI GPT-4 API, ChatGPT, Generative Artificial Intelligence (GenAI), Azure Search, Python, Azure Cosmos DB, Web Scraping, Large Language Models (LLMs), Prompt Engineering, Data Science, Full-stack, AI Prompts, Large Data Sets, eCommerce, Artificial Neural Networks (ANN), Supervised Learning, OpenAI API
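
The profile does not say how product changes were detected without a full re-scrape. One common way to approach this kind of problem is to store a fingerprint of each product and compare it against a fresh lightweight snapshot, so that only changed or new products are fetched in full. The sketch below illustrates that idea only; the field names, product IDs, and the `fingerprint` and `changed_products` helpers are hypothetical and are not taken from the original project.

```python
import hashlib
import json


def fingerprint(product: dict) -> str:
    """Return a stable SHA-256 hash of a product's summary fields."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(product, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_products(previous: dict, current: dict) -> list:
    """Return the IDs whose fingerprint is new or differs from the stored one.

    previous maps product ID -> stored fingerprint (hypothetical storage format).
    current maps product ID -> freshly fetched summary fields.
    """
    return [
        product_id
        for product_id, product in current.items()
        if previous.get(product_id) != fingerprint(product)
    ]


# Hypothetical example data: one stored product and a fresh snapshot.
stored = {"sku-1": fingerprint({"title": "Mug", "price": 9.5})}
latest = {
    "sku-1": {"title": "Mug", "price": 10.0},   # price changed
    "sku-2": {"title": "Plate", "price": 4.0},  # not seen before
}
to_rescrape = changed_products(stored, latest)
```

Only the IDs in `to_rescrape` would then need a full detail scrape, which is where a cost saving of this kind could come from.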

Lead Data Scientist

2016 - 2023
EY
  • Led a multidisciplinary product team of 50+ team members building AI-driven products with a focus on NLP and generative AI.
  • Developed, trained, and evolved multiple models for different functionalities, including layout detection, document classification, named-entity recognition, question answering, and section ranking.
  • Developed and trained a deep learning model (CNN) that cleans lines, stains, scribbles, and other imperfections on invoice images to improve the downstream accuracy of an OCR engine.
  • Trained a gradient-boosting classifier to evaluate the severity of changes in baseline FATCA and CRS regulatory texts compared to local implementations. Achieved a cross-validated F1 score of 92.5.
  • Implemented a topic modeling LDA model on US FDA reports to obtain insights for a wealth and asset management firm seeking investments in pharmaceutical companies.
Technologies: Python, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Docker, Artificial Intelligence (AI), Azure, ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Agile Software Development, Azure Cognitive Services, OpenAI, Data Scraping, Architecture, API Integration, B2B, Automation, Integration, Consulting, Data Science, Full-stack, Large Data Sets, Machine Learning Operations (MLOps), PDF, APIs, Databricks, GPU Computing, Speech Recognition, Speech to Text, Graphics Processing Unit (GPU), Speech Analytics, Deep Learning, PyTorch, Electronic Health Records (EHR), Machine Learning Algorithms, Product Management, Data Analysis, Supervised Learning, Document Parsing, Pattern Recognition, OpenAI API, LangChain

Data Engineer

2014 - 2016
Accenture
  • Designed and developed risk assessment processes for multichannel applications (smartphone app, web, ATM, bank branch) of a Spanish international bank.
  • Developed data pipelines for the risk assessment process of credit cards and online personal loans.
  • Developed SQL queries to analyze customer risk indicators.
Technologies: SQL, Scrum, Data Engineering, Data Analysis

Insurance Claim Payment Automation

NLP models that I trained and developed to identify, extract, and structure data from veterinary invoices to allow for the reimbursement of animal health insurance claims.

I developed a pipeline consisting of OCR, layout detection techniques, and named-entity recognition (NER) models to extract the relevant information from the invoices accurately. I also built the validation module to identify and validate medical diagnoses against the policyholder coverage.
Finally, I developed the extraction confidence methodology to help determine claims reimbursements to be processed automatically or reviewed by a human, depending on the different models' confidence and business rules.
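
The routing step described above, which combines model confidence with business rules, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the project's actual logic: the `Extraction` type, the 0.90 threshold, the "weakest field decides" policy, and the `covered` flag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str
    value: str
    confidence: float  # model score in [0, 1]


AUTO_THRESHOLD = 0.90  # hypothetical cut-off, not a value from the project


def route_claim(extractions, covered, threshold=AUTO_THRESHOLD):
    """Decide whether a claim is processed automatically or reviewed by a human."""
    # Business rule (assumed): uncovered diagnoses are never auto-processed.
    if not covered:
        return "human_review"
    # Nothing was extracted, so there is nothing to trust.
    if not extractions:
        return "human_review"
    # The least confident field decides the outcome.
    weakest = min(e.confidence for e in extractions)
    return "auto_process" if weakest >= threshold else "human_review"


# Hypothetical extractions from one invoice.
confident = [
    Extraction("diagnosis", "otitis", 0.97),
    Extraction("total", "84.50", 0.93),
]
uncertain = confident + [Extraction("date", "2021-03-04", 0.61)]
```

Taking the minimum confidence is a conservative choice: a single doubtful field is enough to send the claim to a reviewer.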

Mortgage Contract Audit Automation

NLP models that I trained and developed to identify key data points from mortgage contracts to allow automatic audit and validation of the data in the actual contracts against the ERP.

I trained a classifier on TF-IDF features to distinguish main contracts from their annexes, extensions, and modifications. I also developed the validation module that disambiguates contracts, matches them to DB rows, and compares the two to highlight differences.

Tax Relief Application Eligibility

NLP models that I built and trained to extract key data points from various documents such as invoices, mortgage payments, paychecks, and more.

The goal was to increase the efficiency of the application process for a tax relief program offered by the government due to COVID-19 that received millions of requests.

I developed a pipeline consisting of handwritten text detection, layout detection techniques, classification (to detect the document type), and NER (named-entity recognition) models to extract the relevant information from the documents accurately. I also developed the confidence module that prioritized manual review of applications based on business rules and models' confidence.

Invoice Validation Automation

I developed an AI-driven system to automate the analysis of financial documents in construction processes, targeting the high volume of invoices, purchase orders, and goods received notes. This project aimed to detect mismatches and inconsistencies to prevent financial losses.

The project's pipeline started with OCR technology to extract text from scanned documents accurately. I employed named-entity recognition (NER) models to identify and categorize key data points within these texts, such as vendor names, dates, and amounts. An important part of the project was the development of classification models to accurately detect and categorize different document types and automatically detect their relevant data points.

Additionally, I implemented fuzzy matching algorithms to link items listed in invoices with corresponding entries in purchase orders. This approach was key in identifying mismatches and inconsistencies.
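
The fuzzy matching step described above can be illustrated with a short sketch. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in, since the profile does not name the algorithm that was used; the item descriptions, the 0.8 threshold, and the `similarity` and `match_items` helpers are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two item descriptions."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_items(invoice_items, po_items, min_score=0.8):
    """Map each invoice item to its closest purchase-order item, or None.

    An item with no candidate at or above min_score maps to None,
    which flags it as a potential mismatch.
    """
    matches = {}
    for item in invoice_items:
        best = max(po_items, key=lambda po: similarity(item, po), default=None)
        if best is not None and similarity(item, best) >= min_score:
            matches[item] = best
        else:
            matches[item] = None
    return matches


# Hypothetical line items with small formatting differences.
invoice = ["Portland cement 25kg bag", "Steel rebar 12 mm", "Safety gloves"]
purchase_order = ["Portland Cement 25 kg bag", "Steel rebar 12mm", "Copper pipe 15mm"]
links = match_items(invoice, purchase_order)
```

In this example, the first two invoice lines are linked to their purchase-order entries despite spacing and casing differences, while "Safety gloves" has no counterpart and is left unmatched.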

Certifications

DECEMBER 2018 - PRESENT

Natural Language Processing Nanodegree

Udacity

DECEMBER 2017 - PRESENT

Machine Learning Engineer Nanodegree

Udacity

Libraries/APIs

OpenAI API, SpaCy, Scikit-learn, Pandas, Azure Cognitive Services, Keras, React, SQLAlchemy, PyTorch

Tools

ChatGPT, AI Prompts, Named-entity Recognition (NER), Azure Search

Languages

Python, SQL

Frameworks

LlamaIndex, Flask

Paradigms

Automation, Scrum, Agile Software Development, B2B

Platforms

Azure, Docker, Amazon Web Services (AWS), Visual Studio Code (VS Code), MacOS, Linux, Windows, Databricks

Storage

PostgreSQL, Azure Cosmos DB

Other

Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Prompt Engineering, OpenAI, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), Document Parsing, Pattern Recognition, Machine Learning, Deep Learning, OpenAI GPT-4 API, OpenAI GPT-3 API, Data Scraping, Data Science, Large Data Sets, Machine Learning Operations (MLOps), PDF, Llama, APIs, Artificial Neural Networks (ANN), Machine Learning Algorithms, Product Management, Supervised Learning, Computer Vision, Optical Character Recognition (OCR), Handwriting Recognition, Text Classification, Tf-idf, Information Retrieval, Cognitive Computing, Custom Models, Web Scraping, Architecture, API Integration, Integration, Consulting, Generative Pre-trained Transformer 3 (GPT-3), Generative Artificial Intelligence (GenAI), Fine-tuning, Llama 3, Full-stack, Open-source LLMs, Data Engineering, GPU Computing, Speech Recognition, Speech to Text, Graphics Processing Unit (GPU), Speech Analytics, eCommerce, Electronic Health Records (EHR), Data Analysis, Claude, LangChain
