Javier Garcia de Leaniz
Verified Expert in Engineering
Natural Language Processing (NLP) Developer
Javier is an engineer with over nine years of experience in AI and data science. Beyond his expertise in machine learning, natural language processing, and software engineering, Javier's particular strength lies in aligning business needs with technology. During his consulting tenures at EY and Accenture, he implemented data and AI solutions across diverse industries and geographies.
Preferred Environment
Azure, Visual Studio Code (VS Code), MacOS, Linux, Amazon Web Services (AWS), Windows
The most amazing...
...product I've developed is a generative AI tool that structures business data, documents, emails, and voice recordings; it has processed millions of documents.
Work Experience
AI and Full-stack Engineer
Self-employed
- Developed a web app that lets users search for restaurants by their characteristics using natural language.
- Designed and deployed a simple architecture on the AWS stack, including Amazon EC2, Amazon RDS, Amazon S3, and Elastic Load Balancing (ELB).
- Designed a retrieval-augmented generation (RAG) pipeline, including a prompt-engineered query-parsing module and keyword matching based on full-text search.
CTO
Smart Retrieval
- Led the technical strategy and product development of the company.
- Deployed the platform to Azure using Azure DevOps CI/CD pipelines.
- Developed a retrieval-augmented generation (RAG) pipeline, built on OpenAI's GPT services, that enables natural-language search over business documents such as financial statements, invoices, and contracts.
Lead Data Scientist
EY
- Led a multidisciplinary product team of more than 50 members building AI-driven products focused on NLP and generative AI.
- Developed, trained, and iteratively improved models for layout detection, document classification, named-entity recognition, question answering, and section ranking.
- Developed and trained a convolutional neural network (CNN) that removes lines, stains, scribbles, and other imperfections from invoice images to improve the downstream accuracy of an OCR engine.
- Trained a gradient-boosting classifier to assess the severity of differences between the baseline FATCA and CRS regulatory texts and their local implementations, achieving a cross-validated F1 score of 92.5%.
- Built a latent Dirichlet allocation (LDA) topic model on US FDA reports to extract insights for a wealth and asset management firm evaluating investments in pharmaceutical companies.
Data Engineer
Accenture
- Designed and developed risk assessment processes for the multichannel applications (smartphone app, web, ATM, and bank branch) of an international Spanish bank.
- Developed data pipelines for the risk assessment process of credit cards and online personal loans.
- Wrote SQL queries to analyze customer risk indicators.
Experience
Insurance Claim Payment Automation
I developed a pipeline combining OCR, layout detection, and named-entity recognition (NER) models to accurately extract the relevant information from invoices. I also built the validation module that identifies medical diagnoses and checks them against the policyholder's coverage.
Finally, I developed the extraction-confidence methodology that determines whether a claim reimbursement is processed automatically or reviewed by a human, based on the models' confidence scores and business rules.
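The routing decision described above can be illustrated with a minimal sketch. It is not the actual methodology. The field names, the 0.9 confidence threshold, the "weakest field decides" rule, and the claim-amount cap are all hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ExtractedField:
    """One value extracted from a claim document, with the model's confidence."""
    name: str
    value: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]


@dataclass
class Claim:
    claim_id: str
    amount: float
    fields: list[ExtractedField] = field(default_factory=list)


# Hypothetical settings; real thresholds would be tuned on validation data.
CONFIDENCE_THRESHOLD = 0.9
MAX_AUTO_AMOUNT = 1_000.0
REQUIRED_FIELDS = {"diagnosis_code", "invoice_total", "policy_number"}


def route_claim(claim: Claim) -> tuple[str, list[str]]:
    """Return ("auto" or "human", reasons) for a single claim."""
    reasons = []
    extracted = {f.name: f for f in claim.fields}

    # Business rule: every required field must have been extracted.
    missing = REQUIRED_FIELDS - extracted.keys()
    if missing:
        reasons.append(f"missing fields: {sorted(missing)}")

    # Model confidence: one weak required field is enough to send the claim to a human.
    for name in sorted(REQUIRED_FIELDS & extracted.keys()):
        if extracted[name].confidence < CONFIDENCE_THRESHOLD:
            reasons.append(f"low confidence on {name}")

    # Business rule: large reimbursements always get a human check.
    if claim.amount > MAX_AUTO_AMOUNT:
        reasons.append("amount above auto-approval limit")

    return ("human" if reasons else "auto", reasons)


claim = Claim("C-001", 250.0, [
    ExtractedField("diagnosis_code", "J06.9", 0.97),
    ExtractedField("invoice_total", "250.00", 0.95),
    ExtractedField("policy_number", "P-12345", 0.82),
])
print(route_claim(claim))  # ('human', ['low confidence on policy_number'])
```

In this sketch, returning the reasons together with the decision lets a reviewer see why a claim was escalated, not just that it was.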
Mortgage Contract Audit Automation
I trained a classifier on TF-IDF features to distinguish main contracts from their annexes, extensions, and modifications. I also developed the validation module that disambiguates and matches contracts to database rows and compares them to highlight differences.
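The profile does not say which classifier was trained on the TF-IDF features. The sketch below therefore swaps in a simple nearest-centroid rule with cosine similarity, implemented with the standard library only. The class name, the toy training snippets, and the two labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


class TfidfNearestCentroid:
    """TF-IDF features plus a nearest-centroid classifier (cosine similarity)."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF: terms found in fewer documents get larger weights.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        sums = defaultdict(Counter)
        counts = Counter(labels)
        for toks, label in zip(tokenized, labels):
            sums[label].update(self._vectorize(toks))
        # Centroid = mean of the L2-normalized TF-IDF vectors of each class.
        self.centroids = {
            label: {t: w / counts[label] for t, w in vec.items()}
            for label, vec in sums.items()
        }
        return self

    def _vectorize(self, toks):
        tf = Counter(toks)
        # Terms never seen during training are ignored.
        vec = {t: c * self.idf[t] for t, c in tf.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict(self, text):
        vec = self._vectorize(tokenize(text))
        return max(self.centroids, key=lambda label: self._cosine(vec, self.centroids[label]))


# Hypothetical toy corpus.
train_docs = [
    "mortgage loan agreement between the lender and the borrower principal interest rate term",
    "the borrower agrees to repay the principal of the mortgage loan with interest",
    "annex to the mortgage agreement schedule of payments attached hereto",
    "addendum amending clause of the original agreement the parties agree to modify the interest rate",
]
train_labels = ["main_contract", "main_contract", "annex_or_modification", "annex_or_modification"]

model = TfidfNearestCentroid().fit(train_docs, train_labels)
print(model.predict("annex schedule attached hereto"))  # annex_or_modification
```

In practice a library implementation such as scikit-learn's TF-IDF vectorizer paired with a linear model would be the more usual choice. The hand-rolled version only makes each step visible.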
Tax Relief Application Eligibility
The goal was to make application processing more efficient for a government tax relief program, introduced in response to COVID-19, that received millions of requests.
I developed a pipeline combining handwritten text detection, layout detection, document-type classification, and named-entity recognition (NER) models to accurately extract the relevant information from the documents. I also developed the confidence module that prioritized applications for manual review based on business rules and the models' confidence.
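The confidence module's prioritization can be sketched as a priority queue. This is a hypothetical illustration, not the original system. The application fields (`field_confidences`, `rule_checks`), the 0.85 threshold, the rule names, and the ordering (most failed business rules first, then lowest model confidence) are all assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ReviewTask:
    # Only `priority` takes part in comparisons; heapq pops the smallest first.
    priority: tuple
    application_id: str = field(compare=False)
    failed_rules: list = field(compare=False, default_factory=list)


def build_review_queue(applications, confidence_threshold=0.85):
    """Return IDs of applications needing manual review, most urgent first."""
    queue = []
    for app in applications:
        # An application is only as reliable as its least confident extracted field.
        min_conf = min(app["field_confidences"].values(), default=0.0)
        failed_rules = [rule for rule, passed in app["rule_checks"].items() if not passed]
        if failed_rules or min_conf < confidence_threshold:
            # Sort key: more failed rules first, then lower model confidence first.
            task = ReviewTask((-len(failed_rules), min_conf), app["id"], failed_rules)
            heapq.heappush(queue, task)
    # Applications that pass every check are never queued (processed automatically).
    return [heapq.heappop(queue).application_id for _ in range(len(queue))]


applications = [
    {"id": "A1", "field_confidences": {"name": 0.99, "income": 0.97},
     "rule_checks": {"income_under_cap": True, "documents_complete": True}},
    {"id": "A2", "field_confidences": {"name": 0.70, "income": 0.95},
     "rule_checks": {"income_under_cap": True, "documents_complete": True}},
    {"id": "A3", "field_confidences": {"name": 0.98, "income": 0.90},
     "rule_checks": {"income_under_cap": False, "documents_complete": True}},
    {"id": "A4", "field_confidences": {"name": 0.60, "income": 0.92},
     "rule_checks": {"income_under_cap": True, "documents_complete": True}},
]
print(build_review_queue(applications))  # ['A3', 'A4', 'A2']
```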
Invoice Validation Automation
The project's pipeline started with OCR to accurately extract text from scanned documents. I then applied named-entity recognition (NER) models to identify and categorize key data points within these texts, such as vendor names, dates, and amounts. An important part of the project was developing classification models that detect each document's type so that its relevant data points are extracted automatically.
Additionally, I implemented fuzzy matching algorithms to link items listed in invoices with the corresponding entries in purchase orders. This approach was key to identifying mismatches and inconsistencies.
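The profile does not specify which fuzzy matching algorithm was used. The sketch below shows one common approach: it scores normalized item descriptions with the standard library's `difflib.SequenceMatcher` ratio and then assigns invoice lines to purchase-order lines greedily, one to one. The `LineItem` fields, the 0.8 similarity threshold, and the 1% price tolerance are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so formatting noise does not affect the score."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_invoice_to_po(invoice, purchase_order, threshold=0.8, price_tolerance=0.01):
    """Pair each invoice line with its most similar unused PO line and list the issues."""
    unused = list(range(len(purchase_order)))
    issues = []
    for item in invoice:
        best = max(unused, key=lambda j: similarity(item.description, purchase_order[j].description),
                   default=None)
        if best is None or similarity(item.description, purchase_order[best].description) < threshold:
            issues.append(f"no PO match for '{item.description}'")
            continue
        unused.remove(best)  # greedy one-to-one assignment
        po = purchase_order[best]
        if item.quantity != po.quantity:
            issues.append(f"quantity mismatch for '{item.description}': {item.quantity} vs {po.quantity}")
        if abs(item.unit_price - po.unit_price) > price_tolerance * po.unit_price:
            issues.append(f"price mismatch for '{item.description}': {item.unit_price} vs {po.unit_price}")
    for j in unused:
        issues.append(f"PO line not invoiced: '{purchase_order[j].description}'")
    return issues


invoice = [LineItem("HP LaserJet Toner 26A", 2, 79.99), LineItem("A4 Paper, 500 sheets", 10, 4.50)]
purchase_order = [LineItem("A4 paper 500 sheets", 10, 4.50), LineItem("HP Laserjet toner (26A)", 3, 79.99)]
print(match_invoice_to_po(invoice, purchase_order))
# ["quantity mismatch for 'HP LaserJet Toner 26A': 2 vs 3"]
```

A greedy assignment is the simplest option. An optimal one-to-one matching over the full similarity matrix (for example, the Hungarian algorithm) is more robust when many descriptions look alike.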
Skills
Languages
Python, SQL
Other
Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Machine Learning, Deep Learning, OpenAI GPT-4 API, OpenAI GPT-3 API, OpenAI, Data Scraping, Computer Vision, OCR, Handwriting Recognition, Text Classification, TF-IDF, Information Retrieval, Prompt Engineering, Cognitive Computing, Custom Models, Web Scraping, Architecture, API Integration, Integration, Consulting, Generative Pre-trained Transformer 3 (GPT-3)
Libraries/APIs
spaCy, scikit-learn, pandas, Azure Cognitive Services, Keras, React, SQLAlchemy
Tools
ChatGPT, Named-entity Recognition (NER)
Paradigms
Automation, Scrum, Agile Software Development, B2B
Platforms
Docker, Azure, Visual Studio Code (VS Code), MacOS, Linux, Amazon Web Services (AWS), Windows
Frameworks
Flask
Storage
PostgreSQL
Certifications
Natural Language Processing Nanodegree
Udacity
Machine Learning Engineer Nanodegree
Udacity