Javier Garcia de Leaniz
Verified Expert in Engineering
Natural Language Processing (NLP) Developer
Madrid, Spain
Toptal member since October 30, 2021
Javier is an engineer with over nine years of experience in AI and data science. Beyond his expertise in natural language processing, large language models, machine learning, and software engineering, his particular strength is aligning business goals with technology. His consulting roles at EY and Accenture gave him hands-on experience implementing data and AI solutions across diverse industries and regions.
Experience
- Python - 7 years
- Natural Language Processing (NLP) - 7 years
- Machine Learning - 7 years
- Artificial Intelligence (AI) - 7 years
- Azure - 5 years
- Amazon Web Services (AWS) - 1 year
- OpenAI GPT-3 API - 1 year
- OpenAI GPT-4 API - 1 year
Preferred Environment
Azure, Visual Studio Code (VS Code), macOS, Linux, Amazon Web Services (AWS), Windows
The most amazing product I've developed was a generative AI tool that structures business data, documents, emails, and voice recordings and has processed millions of documents.
Work Experience
AI and Full-stack Engineer
Self-employed
- Built a web app that lets users search for restaurants by their characteristics using natural language, developing the full user interface, back end, AI modules, and CI/CD pipelines.
- Designed and deployed the full architecture on AWS, including Amazon EC2, Amazon RDS, Amazon S3, and Elastic Load Balancing (ELB).
- Designed a RAG pipeline with a prompt-engineered query-parsing module and keyword-matching features built on full-text search and LLMs such as GPT-3.5 and GPT-4o.
CTO
Smart Retrieval
- Led the technical strategy and product development of the company.
- Deployed the platform to Azure using Azure DevOps CI/CD pipelines.
- Developed a retrieval-augmented generation (RAG) pipeline that enables natural-language search over business documents such as financial statements, invoices, and contracts, using OpenAI's GPT models.
- Optimized the LLM-based features through prompt engineering, fine-tuning, and open-source LLMs such as Llama 3.
AI Engineer
Explore My Store Pty Ltd
- Created a web scraping process that collected full details on 1.5 million products from 1,400 eCommerce sites, then optimized it to detect product changes without a full re-scrape, reducing costs by over 60%.
- Developed an ETL pipeline to transform and load the scraped data into Azure Cosmos DB, enriching it with LLMs to assign product categories and with embeddings to enable vector search, among other steps.
- Configured and optimized an Azure Search service and worked with the product team to integrate it into the web application, with features including full-text search, vector search, facets, filters, and spelling correction.
- Developed a web scraping process to collect store-level details from eCommerce sites, such as the company logo, "About Us" section, accepted payment methods, and social media links.
- Developed a process to determine whether a website is down, handling edge cases such as deactivated Shopify stores, domains for sale, and sites under maintenance.
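The profile describes the change-detection step only at a high level. One common way to avoid full re-scrapes, shown here as a minimal sketch under assumptions, is to fingerprint a lightweight summary of each product (for example, from a category page) and re-scrape only products whose fingerprint is new or has changed. The field names (`title`, `price`, `in_stock`) and the functions `fingerprint` and `plan_rescrape` are hypothetical and not taken from the original system.

```python
import hashlib
import json


def fingerprint(summary):
    # Hypothetical choice of fields assumed to signal a change.
    # sort_keys=True makes the hash independent of dict key order.
    relevant = {key: summary.get(key) for key in ("title", "price", "in_stock")}
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def plan_rescrape(listing, stored):
    """Return URLs whose lightweight summary is new or differs from the stored fingerprint.

    listing: {product_url: summary dict} gathered in a cheap pass.
    stored: {product_url: fingerprint} saved after the last successful full scrape.
    """
    return [url for url, summary in listing.items() if stored.get(url) != fingerprint(summary)]


listing = {
    "/products/mug": {"title": "Mug", "price": 9.5, "in_stock": True},
    "/products/tee": {"title": "Tee", "price": 20.0, "in_stock": True},
}
stored = {url: fingerprint(s) for url, s in listing.items()}
listing["/products/tee"]["price"] = 18.0  # simulate a price drop
print(plan_rescrape(listing, stored))  # ['/products/tee']
```

Stored fingerprints would be updated only after the full scrape of a product succeeds, so a failed scrape is retried on the next run.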
Lead Data Scientist
EY
- Led a multidisciplinary product team of more than 50 members building AI-driven products focused on NLP and generative AI.
- Developed, trained, and iterated on multiple models for different functions, including layout detection, document classification, named-entity recognition, question answering, and section ranking.
- Developed and trained a convolutional neural network (CNN) that removes lines, stains, scribbles, and other imperfections from invoice images to improve downstream OCR accuracy.
- Trained a gradient-boosting classifier to evaluate the severity of differences between the baseline FATCA and CRS regulatory texts and their local implementations, achieving a cross-validated F1 score of 92.5%.
- Applied latent Dirichlet allocation (LDA) topic modeling to US FDA reports to generate insights for a wealth and asset management firm evaluating investments in pharmaceutical companies.
Data Engineer
Accenture
- Designed and developed risk assessment processes for the multichannel applications (smartphone app, web, ATM, and bank branch) of an international Spanish bank.
- Developed data pipelines for the risk assessment process of credit cards and online personal loans.
- Developed SQL queries to analyze customer risk indicators.
Experience
Insurance Claim Payment Automation
I developed a pipeline combining OCR, layout detection, and named-entity recognition (NER) models to accurately extract the relevant information from invoices. I also built the validation module that identifies medical diagnoses and checks them against the policyholder's coverage.
Finally, I developed the extraction-confidence methodology that determines whether a claim reimbursement is processed automatically or routed to human review, based on the models' confidence scores and business rules.
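The routing decision described above can be sketched as follows. This is a minimal illustration, not the production methodology: the `ExtractedField` record, the `route_claim` function, the 0.9 confidence threshold, and the amount limit are all hypothetical assumptions. A claim goes to human review if any business rule fails or if the least confident extracted field falls below the threshold.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    # Hypothetical record: one value extracted by the OCR/NER pipeline plus its confidence.
    name: str
    value: str
    confidence: float  # model confidence in [0, 1]


def route_claim(fields, amount, coverage_ok, min_confidence=0.9, max_auto_amount=1000.0):
    """Return "auto" or "human_review". Thresholds are illustrative assumptions."""
    # Business rules first: uncovered diagnoses or large amounts always go to a person.
    if not coverage_ok or amount > max_auto_amount:
        return "human_review"
    # Nothing extracted means nothing to trust.
    if not fields:
        return "human_review"
    # Conservative model rule: the least confident field decides.
    weakest = min(f.confidence for f in fields)
    return "auto" if weakest >= min_confidence else "human_review"


fields = [
    ExtractedField("diagnosis_code", "J18.9", 0.97),
    ExtractedField("invoice_total", "250.00", 0.93),
]
print(route_claim(fields, amount=250.0, coverage_ok=True))  # auto
```

Using the weakest field rather than an average is a conservative choice: a single unreliable value, such as a misread diagnosis code, is enough to require review.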
Mortgage Contract Audit Automation
I trained a classifier on TF-IDF features to distinguish main contracts from their annexes, extensions, and modifications. I also developed the validation module that disambiguates and matches contracts to database records and compares them to highlight differences.
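To illustrate the TF-IDF approach, the sketch below builds TF-IDF vectors with only the Python standard library and assigns a document to the class whose centroid has the highest cosine similarity. The nearest-centroid classifier, the made-up training snippets, and all names are assumptions for illustration; the profile does not say which classifier was used, and in practice scikit-learn's `TfidfVectorizer` paired with a linear model would be the usual choice.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would also strip punctuation.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed IDF, same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def normalize(vec):
    # Scale a sparse vector to unit L2 length (all-zero vectors are left unchanged).
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def tfidf(text, idf):
    # Raw term counts weighted by IDF; terms unseen during training are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


class NearestCentroidTfidf:
    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        sums = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            sums[label].update(tfidf(doc, self.idf))
        self.centroids = {label: normalize(vec) for label, vec in sums.items()}
        return self

    def predict(self, text):
        vec = tfidf(text, self.idf)

        def score(label):
            # Dot product of unit vectors equals cosine similarity.
            centroid = self.centroids[label]
            return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

        return max(self.centroids, key=score)


# Made-up snippets for illustration only.
train_texts = [
    "mortgage loan agreement between bank and borrower",
    "mortgage agreement principal interest rate and term",
    "annex schedule of payments",
    "annex list of insured properties",
    "modification of interest rate clause",
    "amendment modifying payment clause",
]
train_labels = ["main", "main", "annex", "annex", "modification", "modification"]

model = NearestCentroidTfidf().fit(train_texts, train_labels)
print(model.predict("annex with schedule of insured properties"))
```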
Tax Relief Application Eligibility
The goal was to streamline the application process for a government tax relief program, introduced in response to COVID-19, that received millions of requests.
I developed a pipeline combining handwritten text detection, layout detection, document-type classification, and named-entity recognition (NER) models to accurately extract the relevant information from the documents. I also developed the confidence module that prioritized applications for manual review based on business rules and model confidence.
Invoice Validation Automation
The project's pipeline started with OCR to extract text from scanned documents. I then used named-entity recognition (NER) models to identify and categorize key data points in the text, such as vendor names, dates, and amounts. A central part of the project was developing classification models that detect each document's type and automatically identify the relevant data points for that type.
I also implemented fuzzy matching algorithms to link invoice line items to the corresponding purchase order entries, which was key to identifying mismatches and inconsistencies.
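A minimal sketch of fuzzy line-item matching, assuming items are `(description, quantity, unit_price)` tuples and using `difflib.SequenceMatcher` from the Python standard library as the similarity measure. The greedy strategy, the 0.8 threshold, and the function names are illustrative assumptions, not the algorithms used in the project. Invoice lines without a sufficiently similar purchase order line are reported as `no_match`, and matched lines whose quantity or price differ are reported as `mismatch`.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # difflib ratio in [0, 1] on case- and whitespace-normalized descriptions.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_items(invoice_items, po_items, threshold=0.8):
    """Greedily pair each invoice line with its most similar unused PO line.

    Items are (description, quantity, unit_price) tuples. Returns (matches, issues).
    """
    unused = list(range(len(po_items)))
    matches, issues = [], []
    for desc, qty, price in invoice_items:
        best, best_score = None, 0.0
        for j in unused:
            score = similarity(desc, po_items[j][0])
            if score > best_score:
                best, best_score = j, score
        if best is None or best_score < threshold:
            issues.append(("no_match", desc))
            continue
        unused.remove(best)
        po_desc, po_qty, po_price = po_items[best]
        matches.append((desc, po_desc, round(best_score, 2)))
        # Flag quantity or price discrepancies on matched lines.
        if qty != po_qty or abs(price - po_price) > 0.005:
            issues.append(("mismatch", desc))
    return matches, issues


invoice = [
    ("Laptop, 14-inch", 2, 950.0),
    ("USB-C Docking Station", 2, 120.0),
    ("Office chair", 1, 200.0),
]
purchase_order = [
    ("LAPTOP, 14-inch", 2, 950.0),
    ("USB C Docking Station", 3, 120.0),
]
matches, issues = match_items(invoice, purchase_order)
print(matches)
print(issues)
```

A greedy pass is simple but can mis-assign items when several descriptions are similar; an optimal assignment over the full similarity matrix (for example, with the Hungarian algorithm) avoids that at extra cost.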
Certifications
Natural Language Processing Nanodegree
Udacity
Machine Learning Engineer Nanodegree
Udacity
Skills
Libraries/APIs
OpenAI API, spaCy, scikit-learn, Pandas, Azure Cognitive Services, Keras, React, SQLAlchemy, PyTorch
Tools
ChatGPT, AI Prompts, Named-entity Recognition (NER), Azure Search
Languages
Python, SQL
Frameworks
LlamaIndex, Flask
Paradigms
Automation, Scrum, Agile Software Development, B2B
Platforms
Azure, Docker, Amazon Web Services (AWS), Visual Studio Code (VS Code), macOS, Linux, Windows, Databricks
Storage
PostgreSQL, Azure Cosmos DB
Other
Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Prompt Engineering, OpenAI, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), Document Parsing, Pattern Recognition, Machine Learning, Deep Learning, OpenAI GPT-4 API, OpenAI GPT-3 API, Data Scraping, Data Science, Large Data Sets, Machine Learning Operations (MLOps), PDF, Llama, APIs, Artificial Neural Networks (ANN), Machine Learning Algorithms, Product Management, Supervised Learning, Computer Vision, Optical Character Recognition (OCR), Handwriting Recognition, Text Classification, TF-IDF, Information Retrieval, Cognitive Computing, Custom Models, Web Scraping, Architecture, API Integration, Integration, Consulting, Generative Pre-trained Transformer 3 (GPT-3), Generative Artificial Intelligence (GenAI), Fine-tuning, Llama 3, Full-stack, Open-source LLMs, Data Engineering, GPU Computing, Speech Recognition, Speech to Text, Graphics Processing Unit (GPU), Speech Analytics, eCommerce, Electronic Health Records (EHR), Data Analysis, Claude, LangChain