Akhil Lohia, Developer in Bengaluru, Karnataka, India
Akhil is available for hire

Akhil Lohia

Verified Expert in Engineering

Bio

Akhil is a data scientist and economist by training with experience across academia and a wide variety of AI and ML corporate projects. He has modeled large volumes of customer clickstream data in end-to-end machine learning pipelines built with tools such as Python and SQL, and has analyzed census, questionnaire, and randomized controlled trial (RCT) data in research settings. He communicates extremely well and has worked with teams across time zones. Akhil is also adept at picking up new skills quickly.

Portfolio

Amazon.com
Amazon Athena, Python, SQL, Data Analytics, ComfyUI, Stable Diffusion...
UCSF - Main
Data Engineering, Dashboards, Python, CSV, Data Visualization, ETL, AWS Lambda...
Google
Python, SQL, Data Analytics, Looker, Data Management, Business Intelligence (BI)

Experience

  • Python - 5 years
  • Artificial Intelligence (AI) - 4 years
  • Machine Learning - 3 years
  • Data Science - 3 years
  • Large Language Models (LLMs) - 3 years
  • Data Analytics - 3 years
  • Amazon Web Services (AWS) - 2 years
  • Natural Language Processing (NLP) - 2 years

Availability

Full-time

Preferred Environment

Python, Git, PyCharm, Jupyter, Unix

The most amazing project I've worked on was an LLM-based customer support chatbot for the largest online travel agency in India.

Work Experience

Applied Scientist II

2023 - PRESENT
Amazon.com
  • Implemented and productionized the inter-ranking of grocery buying options, accounting for cart-level delivery fees.
  • Achieved continuous improvements to the FMA model through more than 20 online experiments, resulting in over 500 million annualized units across all Amazon marketplaces.
  • Developed an internal LLM chatbot built on Anthropic Claude that generates SQL queries from natural-language questions, helping democratize the use of FMA logs for deep dives.
  • Conducted deep dives on business metrics such as Lost Featured Offer to maximize the utility sellers derive from model updates.
Technologies: Amazon Athena, Python, SQL, Data Analytics, ComfyUI, Stable Diffusion, AWS Lambda, Deep Neural Networks (DNNs), Retrieval-augmented Generation (RAG), LangChain, Generative Artificial Intelligence (GenAI), ChatGPT, OpenAI, FastAPI, Computer Vision, AI Agents, Anthropic, Claude, MySQL, Data Management, Business Intelligence (BI)

Data Engineer

2022 - 2023
UCSF - Main
  • Developed data pipelines to migrate customer responses into Amazon Aurora from a custom mobile app that tracks patients' medical conditions.
  • Created multiple analytics dashboards that extract insights from medical data on patient performance over time.
  • Collaborated with stakeholders and another contractor to develop an API for fetching patient demographic data from a third-party system.
Technologies: Data Engineering, Dashboards, Python, CSV, Data Visualization, ETL, AWS Lambda, MySQL, Business Intelligence (BI)

Quantitative User Experience Researcher

2021 - 2022
Google
  • Conducted research to explain the relationship between real and perceived ROI for Google Ads users.
  • Identified customer segments likely to adopt new features for query translation and asset translation in Google Ads.
  • Developed custom dashboards, built on ETL jobs, that let leadership effectively track customer retention and progress.
Technologies: Python, SQL, Data Analytics, Looker, Data Management, Business Intelligence (BI)

Senior Data Scientist

2020 - 2021
eka.care
  • Developed a module that extracts relevant information from medical documents such as prescriptions, pathology lab reports, and vaccination certificates and makes them digitally available and searchable.
  • Used the LayoutLM model, which incorporates the position of text on the page, to extract key terms from medical documents.
  • Developed an end-to-end pipeline on AWS, from document upload to entity extraction, including document classification and manual data annotation steps.
  • Collaborated on designing medically relevant hierarchies for different medical conditions and symptoms using SNOMED CT, which helped provide contextual options to doctors in their prescription pad.
Technologies: Python, Amazon Web Services (AWS), Deep Learning, Machine Learning, Data Science, Amazon S3 (AWS S3), Amazon Athena, Jupyter Notebook, Data Analysis, Requirements Analysis, TensorFlow, AWS Lambda, Deep Neural Networks (DNNs), Docker, Text Analytics, Text Classification, REST APIs, Optical Character Recognition (OCR), FastAPI, MongoDB, Computer Vision, MySQL, Data Management, Business Intelligence (BI)

Data Scientist

2020 - 2020
MYRM Technologies, LLC
  • De-duplicated and cross-referenced customer records from a disorganized collection of spreadsheets before inserting them into Salesforce.
  • Designed a database used to migrate Salesforce data to a Ruby on Rails (RoR)-based system.
  • Led imports from various sources into Salesforce to efficiently track leads and their progression through the stages of deal completion.
Technologies: Pandas, Salesforce, Matching Systems, Jupyter Notebook, Amazon Athena, Data Analysis, Business Intelligence (BI)

Lead Data Scientist

2017 - 2020
MakeMyTrip
  • Developed a hotel-ranking model that used a user's recent interactions to show relevant results.
  • Built a user intent prediction model based on a customer's activity in the eCommerce funnel.
  • Built the NLP component of a chatbot that handled the business's post-sales requirements.
  • Collaborated on the design of a feature marketplace, a kind of data warehouse that combined data from several sources for use by data science models.
  • Created a universal search for the travel domain that allowed users to search for hotels and flights using free text. This involved applying NLP techniques to extract relevant fields from the text.
Technologies: Amazon SageMaker, PyTorch, Amazon Web Services (AWS), PySpark, Data Science, NumPy, Pandas, Apache Airflow, Redshift, Spark, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Python, Artificial Intelligence (AI), Algorithms, Data Analysis, Amazon S3 (AWS S3), NoSQL, Amazon Athena, Jupyter Notebook, Microsoft Power BI, Data Build Tool (dbt), Snowflake, AWS Lambda, Deep Neural Networks (DNNs), Docker, Text Analytics, Text Classification, REST APIs, Chatbots, Chatbot Conversation Design, Azure, Azure Data Factory (ADF), MySQL, Data Management

Data Scientist | Analyst

2019 - 2019
Mix Tech (via Toptal)
  • Set up dashboards using Redshift and Metabase to understand how the product performed across different customer segments and devices.
  • Analyzed customer data and monitored metrics such as user retention, app installation and uninstallation rates, user engagement, daily/weekly/monthly/quarterly performance, and customer movement through the funnel.
  • Developed a churn model using PySpark and Python that was used to target customers based on their predicted probability of churn.
Technologies: Amazon Web Services (AWS), Data Analytics, Spark, Machine Learning, Metabase, SQL, Redshift, Python, Data Analysis, Data Modeling, Amazon S3 (AWS S3), Amazon Athena, Jupyter Notebook, Microsoft Power BI, Advertising Technology (Adtech), REST APIs, Azure, MySQL

Research Assistant

2015 - 2017
Universitat Pompeu Fabra
  • Developed a model linking household wealth to female infanticide in India through the marriage market.
  • Estimated the structural model and conducted counterfactual policy simulations to inform interventions, using Amazon Web Services (AWS) for the computationally heavy tasks.
  • Derived the model's theoretical solutions, including the equilibrium equations, and checked the proofs. Simulated the model economy in MATLAB.
Technologies: Mathematica, MATLAB, Python, Economics, Data Modeling

Projects

Natural Language to SQL Query Generation

An LLM-based chatbot application that generates optimized SQL queries for the custom database tables used by the team. The LLM is customized to understand the team's domain-specific vocabulary, which helps extend the reach of data analytics to non-technical leadership. I was the lead developer, and the tool improved the efficiency of data analysts across the organization.

Chatbot Intent Classifier

I created a deep learning-based intent classification model for the chatbot of MakeMyTrip, the largest online travel agency (OTA) in India. The classifier was based on the ULMFiT model and assigns each user message to one of more than 100 intent classes.

Feature Marketplace for Data Science

I developed a feature store in Amazon Redshift that collates data from a number of different sources and makes it available in the desired format. The resulting data was clean, always up to date, and ready for use by machine learning models in production.

Data Tagging Tool

I improved an open-source, Django-based data labeling tool to create training data for an NLP classifier used in a chatbot. The improvements added support for dynamic label options for each instance to be labeled.

Ranking

I developed a machine learning model that personalizes rankings for users based on their historical and recent interactions with products, as well as their similarity to other users.

South India Community Study

I worked on research projects on the economics of social networks in South India that involved a randomized controlled trial.
I developed and customized a name-matching algorithm to match incoming patients to the project’s census data.

Predict 'em All

I developed an R Shiny-based machine learning application that predicts which Pokémon you would encounter at a given location and time in the Pokémon GO mobile game. The ML model was trained on a large, publicly available dataset from the game.

Real-time Multiplayer Game

I developed a real-time multiplayer game integrating Microsoft Kinect and Windows Phone, in which one player uses the phone to generate obstacles for the other player, who uses the Kinect.

Slot Extraction and Intent Classification

I developed a joint model based on a sequence-to-sequence (Seq2Seq) architecture that extracts both the intent and the slot values from a user's utterance to a chatbot. It was used to understand users' requirements as they chatted with the chatbot.

Medical Document Understanding

A Python-based app for classifying and parsing medical documents, such as lab reports, prescriptions, and vaccination certificates.

The app makes these documents digitally available and searchable, much as Google Photos does for unorganized photos: it sorts all of a user's medical documents into proper categories and makes them easily searchable by relevant medical terms, even if they are handwritten.

Education

2016 - 2017

Master's Degree in Data Science

Barcelona Graduate School of Economics - Barcelona, Spain

2011 - 2015

Bachelor's Degree in Economics

Indian Institute of Technology Kanpur - Kanpur, India

Skills

Libraries/APIs

Pandas, PySpark, NumPy, spaCy, PyTorch, REST APIs, TensorFlow

Tools

Git, Jupyter, Redash, Apache Airflow, Amazon Elastic MapReduce (EMR), Amazon SageMaker, Amazon Athena, Microsoft Power BI, ChatGPT, Amazon QuickSight, MATLAB, STATA, LaTeX, Looker, AI Prompts, ComfyUI, Claude, Odoo, PyCharm, Mathematica

Languages

Python, SQL, R, Snowflake, C, Java, Scala

Frameworks

Spark, Django, Seq2Seq

Paradigms

Business Intelligence (BI), Requirements Analysis, ETL

Platforms

Linux, MacOS, Amazon Web Services (AWS), Jupyter Notebook, AWS Lambda, Docker, Azure, Unix, Salesforce, Firebase

Storage

MySQL, Redshift, Apache Hive, Amazon S3 (AWS S3), Data Pipelines, Elasticsearch, NoSQL, MongoDB

Other

Deep Learning, Statistics, Predictive Learning, Predictive Modeling, Data Visualization, Data Engineering, Analytics, Big Data, Economics, Machine Learning, Data Science, Natural Language Processing (NLP), Data Analytics, Artificial Intelligence (AI), Algorithms, Data Analysis, Machine Learning Operations (MLOps), Generative Pre-trained Transformers (GPT), Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), Deep Neural Networks (DNNs), Retrieval-augmented Generation (RAG), Text Analytics, Text Classification, Data Matching, Statistical Modeling, Computer Vision, Inventory Management Systems, Recommendation Systems, Data Modeling, Dashboards, Advertising Technology (Adtech), Data Build Tool (dbt), API Integration, Chatbots, AI Chatbots, Prompt Engineering, Stable Diffusion, NLU, LangChain, OpenAI, Optical Character Recognition (OCR), Chatbot Conversation Design, FastAPI, Azure Data Factory (ADF), AI Agents, Anthropic, Data Management, Metabase, Custom Audio Embedding, Matching Systems, CSV
