
Ankur Dixit

Verified Expert in Engineering

Data Engineer and Developer

Pune, Maharashtra, India

Toptal member since April 29, 2025

Expertise

Data Analysis, CSS, Data Scraping, Web Scraping, Data Engineering, Prompt Engineering, Python, SQL, MySQL, Git, Agile Development, NumPy, Excel Development

Bio

Ankur has more than 12 years of experience in application development and in building test automation frameworks. He has two years of hands-on experience with Azure Web Apps, Logic Apps, Functions, WebJobs, virtual machines, and Power BI, and is skilled in implementing Azure technologies, including Data Factory, Databricks, and Storage. Ankur also has four years of experience developing automation frameworks in Python.

Portfolio

Open3D

AI Agents

OpenX

Databricks, PySpark, Apache Spark, Delta Lake, ETL, ELT, Lakehouse Architecture...

OpenAI

Python, Go, Machine Learning, Data Science, Pandas, Flask, LangChain...

Experience

Pandas - 8 years
Windows 10 - 8 years
Python 3 - 8 years
Amazon Web Services (AWS) - 5 years
Spark - 5 years
FastAPI - 4 years
Flask - 4 years
Azure - 4 years

Preferred Environment

Linux, Windows

The most amazing...

...thing I've developed is an application, along with its APIs, for a banking client.

Work Experience

Software AI Engineer

2025 - 2026

Open3D

Built an AI agent to ingest and validate inputs for a client whose data was fragmented across multiple sources (Excel, APIs, and databases) and whose manual validation and decision-making were slow and error-prone.
Ingested and pre-processed inputs from files and APIs, then used structured prompts and tool integrations to validate the data against defined rules, flag inconsistencies, and classify error types; a decision layer then suggested corrective actions.
Defined agent workflows visually as step-by-step logic, configured prompt templates for validation, classification, and reasoning, integrated external tools and APIs for data fetching and updates, and set up conditional flows with retry and error-handling logic.
Reduced manual validation effort by roughly 60–70%, improved data accuracy and turnaround time, and enabled non-technical stakeholders to interact with the system through simple prompts.

Technologies: AI Agents

Senior Data Engineer

2024 - 2025

OpenX

Built a Databricks Lakehouse platform on PySpark and Delta Lake using a Medallion architecture (Bronze, Silver, and Gold layers) to support enterprise analytics.
Implemented scalable ETL/ELT pipelines with data-quality validation, Spark performance tuning, production monitoring, and Unity Catalog-style governance.
Applied Delta Lake optimizations and governance controls that reduced pipeline failures, improved query performance, accelerated analytics delivery, and increased the reliability of production data workflows.

Technologies: Databricks, PySpark, Apache Spark, Delta Lake, ETL, ELT, Lakehouse Architecture, Medallion Architecture, Data Engineering, SQL, Python 3, Role-based Access Control (RBAC), Unity Catalog, Cloudera Data Platform, CI/CD Pipelines, Partitioning Strategies, Distributed Data Processing, Data Ingestion Framework, Spark Performance Optimization

Lead Developer

2023 - 2024

OpenAI

Solved complex problems in Python, using LangChain and ML techniques, to produce training data for ChatGPT; wrote detailed, well-commented solutions that were used across multiple models to improve their understanding and performance in production environments.
Collaborated with cross-functional teams to evaluate and compare solutions from multiple engineers across various models, improving model accuracy and interpretability on real-world problem statements.
Trained language models with curated, well-documented code solutions from multiple engineers, ensuring high-quality input data that improved ChatGPT’s contextual reasoning and problem-solving capabilities.

Technologies: Python, Go, Machine Learning, Data Science, Pandas, Flask, LangChain, Agentic AI, React

Full-stack Developer

2021 - 2023

Flexiple

Developed an AI-powered remote interview monitoring platform that analyzes video, audio, and screen-sharing behavior to detect fraudulent activity and ensure interview integrity using ML, NLP, and computer vision.
Tracked potential cheating during interviews by analyzing lip-sync patterns, eye movements, and background noise, and integrated real-time alerts for anomalies such as remote desktop access or off-screen prompts.
Launched the platform on leading hiring platforms, including First IPO, Mongen, iTakeE, RiceBird, and FloCareer, improving remote interview authenticity and reducing candidate fraud by over 70%.
Tested engineer-submitted solutions across multiple models to evaluate their effectiveness, improving overall model accuracy and response interpretability for production-scale deployments.
Trained language models using peer-reviewed, highly annotated codebases with thorough in-line documentation, resulting in better comprehension of user intent by generative AI systems.
Developed detailed, step-by-step solutions to complex problems using Python, LangChain, and ML techniques, which were used to fine-tune and enhance the performance of ChatGPT across various reasoning tasks.

Technologies: Amazon Web Services (AWS), Python, Azure, ADF, Azure Databricks, PySpark, Terraform, Kubernetes, Docker

Lead Developer

2017 - 2018

IBM

Contributed to IBM Watson, an AI platform for business that uncovers insights, enables new engagement models, supports confident decision-making, and improves operational productivity.
Built chatbots and virtual agents that answer customers’ questions and respond to their needs quickly and efficiently.
Designed and developed frameworks that work with Watson to analyze and interpret all types of data, including unstructured text, images, audio, and video.

Technologies: Python 3, Spark, Amazon Web Services (AWS), Windows 10, JavaScript, React

Lead Developer

2016 - 2017

Harman

Contributed to Novus, a leading portfolio intelligence platform with more than $2 trillion in client assets under management (AUM). Novus brings all historical and ongoing positions to life through an intuitive analytics platform and provides true aggregation.
Enabled the platform to aggregate and process data, apply analytics, and surface insights on exposures and performance across equity, hedge fund, private equity, venture capital, and real asset allocations on a multi-asset-class, multi-level basis.
Designed and developed frameworks to perform ETL on market, public, and private data; applied various strategies and algorithms to financial data for data modeling and data mining; and contributed formulas to the analytics library.

Technologies: Spark, Amazon Web Services (AWS), Windows 10, Pandas, JavaScript

Experience

Platform Migration & Cloud Integration

The bank’s legacy financial analytics platform for credit risk assessment, built on traditional on-premises infrastructure, had become inefficient and costly to maintain. The platform is essential for assessing credit risk, which requires processing massive volumes of financial data to evaluate clients' creditworthiness.

Key components of the migration plan:

DATA MIGRATION
• Data lake formation: Migrate data from the legacy system to Amazon S3 for cost-effective storage and scalability. Utilize AWS Glue for ETL processes to clean, transform, and load data into Amazon S3.
• Data warehouse: Use Amazon Redshift as the data warehouse solution for efficient querying and analysis of structured data, providing seamless integration with existing analytics tools.

COMPUTE MIGRATION
• Processing: Replace on-premises big data processing with Amazon Elastic MapReduce (EMR), which provides scalability and cost-efficiency for handling large datasets.

DATABASE MIGRATION
• Transactional databases: Migrate relational databases to Amazon Relational Database Service (RDS), ensuring high availability, automatic backups, and scalability.
• Non-relational databases: For NoSQL data, use Amazon DynamoDB.

Education

2007 - 2011

Bachelor's Degree in Computer Science

Indian Institute of Technology Kanpur - Kanpur, Uttar Pradesh, India

Certifications

OCTOBER 2024 - PRESENT

Google Analytics Certification

Google

FEBRUARY 2024 - PRESENT

AWS Certified Solutions Architect

Amazon Web Services (AWS)

JULY 2023 - PRESENT

Deep Learning

IBM

MARCH 2022 - PRESENT

Certified Scrum Product Owner

IBM

JUNE 2020 - PRESENT

Master Project Manager (MPM)

IBM

Skills

Libraries/APIs

Pandas, NumPy, SciPy, React, Node.js, Binance API, PySpark, TensorFlow, PyTorch

Tools

Git, GitHub, Apache Airflow, Microsoft Excel, GitLab CI/CD, Microsoft Power BI, AWS Glue, Amazon SageMaker, Terraform, Claude

Languages

Python 3, SQL, HTML5, JavaScript, Python, PHP, Snowflake, Go, Java, C, C++

Frameworks

Selenium, Spark, Flask, Angular, Core ML, Core Data, ADF, Apache Spark

Paradigms

ETL, Agile, Scrum, Role-based Access Control (RBAC), OLAP, Business Intelligence (BI), Management

Storage

MySQL

Platforms

Amazon Web Services (AWS), Azure, Linux, Databricks, Firebase, Kubernetes, Docker, Windows, Cloudera Data Platform

Other

FastAPI, Windows 10, CSS, APIs, Data Scraping, Web Scraping, Data Modeling, Data Engineering, Software Engineering, Data Cleaning, Data Transformation, Data Analysis, Quality Assurance (QA), Real-time Processing, Scalable Architecture, ELT, Prompt Engineering, Product Management, Full-stack Development, Machine Learning, Artificial Intelligence (AI), Chatbots, CI/CD Pipelines, OpenAI, Reverse Engineering, API Testing, QA Automation, Regression Testing, Analytics, Data Warehousing, DAX, Data Governance, Modeling, CDC, Event-driven Systems, Backtesting, Trading, Algorithms, Amazon Redshift, Data Science, LangChain, Agentic AI, Azure Databricks, IT Projects, Scrum Master, Scrum Product Owner, Deep Learning, Multimodal GenAI, Large Language Models (LLMs), AI Agents, Delta Lake, Lakehouse Architecture, Medallion Architecture, Unity Catalog, Partitioning Strategies, Distributed Data Processing, Data Ingestion Framework, Spark Performance Optimization
