
Ankur Dixit
Verified Expert in Engineering
Data Engineer and Developer
Pune, Maharashtra, India
Toptal member since April 29, 2025
Ankur has 12+ years of experience in application development and in building test automation frameworks. He has two years of experience with Azure Web Apps, Logic Apps, Functions, WebJobs, virtual machines, and Power BI, and is skilled in implementing Azure technologies, including Data Factory, Databricks, and Storage. Ankur also has four years of experience developing automation frameworks in Python.
Experience
- Pandas - 8 years
- Windows 10 - 8 years
- Python 3 - 8 years
- Amazon Web Services (AWS) - 5 years
- Spark - 5 years
- FastAPI - 4 years
- Flask - 4 years
- Azure - 4 years
Preferred Environment
Linux, Windows
The most amazing...
...thing I've developed is an application and APIs for a banking client.
Work Experience
Software AI Engineer
Open3D
- Built an AI agent to ingest and validate inputs for a client whose data was fragmented across multiple sources (Excel, APIs, and databases) and whose manual validation and decision-making were slow and error-prone.
- Ingested inputs from files and APIs and pre-processed them. Used structured prompts and tool integrations to validate data against rules, flag inconsistencies, and classify error types. Added a decision layer that suggested corrective actions.
- Defined agent workflows visually as step-by-step logic. Configured prompt templates for validation, classification, and reasoning. Integrated external tools and APIs for data fetching and updates. Set up conditional flows and retry/error-handling logic.
- Reduced manual validation effort by around 60–70%, improved data accuracy and turnaround time, and enabled non-technical stakeholders to interact with the system via simple prompts.
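The validation flow described in these bullets (check a record against rules, classify each error type, suggest a corrective action, retry flaky fetches) can be illustrated with a minimal sketch. It is not the client's implementation: the field names `customer_id` and `amount`, the error-type labels, and the helper names `Issue`, `validate`, and `with_retry` are all hypothetical, and plain Python rules stand in for the prompt-driven checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Issue:
    field: str        # which field failed
    error_type: str   # classification of the problem (hypothetical labels)
    suggestion: str   # corrective action proposed by the decision layer


def validate(record: dict) -> List[Issue]:
    """Check one record against two illustrative rules and classify failures."""
    issues = []
    if not record.get("customer_id"):
        issues.append(Issue("customer_id", "missing_value",
                            "request the value from the source system"))
    amount = record.get("amount")
    if amount is not None:
        try:
            value = float(amount)
        except (TypeError, ValueError):
            issues.append(Issue("amount", "type_error", "convert to a number"))
        else:
            if value < 0:
                issues.append(Issue("amount", "out_of_range",
                                    "confirm the sign with the data owner"))
    return issues


def with_retry(fetch: Callable[[], dict], attempts: int = 3) -> dict:
    """Call fetch() up to `attempts` times, retrying on ConnectionError."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Calling `validate({"customer_id": "", "amount": "-5"})` returns one `missing_value` issue and one `out_of_range` issue, each carrying its suggested action.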
Senior Data Engineer
OpenX
- Built a Databricks Lakehouse platform using PySpark and Delta Lake with Bronze, Silver, and Gold layers, implementing scalable ETL pipelines, data-quality validation, Spark optimization, operational monitoring, and Unity Catalog-style governance for enterprise analytics.
- Built scalable ETL/ELT pipelines in Databricks and PySpark, optimized Medallion Lakehouse architectures, implemented data validation, tuned Spark jobs, and maintained production monitoring and governance.
- Developed scalable Databricks and PySpark Lakehouse pipelines with Delta optimization and governance controls, resulting in reduced pipeline failures, improved query performance, accelerated analytics delivery, and increased reliability of production data workflows.
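The Bronze, Silver, and Gold layering mentioned above can be sketched in a few lines. This is an illustration only: it uses plain Python lists in place of Delta tables and PySpark DataFrames, and the column names `event_id`, `region`, and `amount` and the functions `to_silver` and `to_gold` are assumptions, not the platform's actual schema or code.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Bronze -> Silver: drop rows that fail quality checks and de-duplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        event_id = row.get("event_id")
        amount = row.get("amount")
        # Data-quality validation: require a key and a numeric amount.
        if event_id is None or not isinstance(amount, (int, float)):
            continue
        # De-duplicate on the (hypothetical) event_id key.
        if event_id in seen:
            continue
        seen.add(event_id)
        silver.append({
            "event_id": event_id,
            "region": str(row.get("region", "unknown")).lower(),
            "amount": float(amount),
        })
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: aggregate cleaned rows into totals per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)
```

In a Databricks setting, each function would typically read one Delta table and write the next; the split matters because raw data stays replayable in Bronze while consumers query only the validated Gold output.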
Lead Developer
OpenAI
- Worked on solving complex problems to train ChatGPT using Python, LangChain, and ML. Wrote detailed, well-commented solutions used across multiple models to improve understanding and performance in production environments.
- Collaborated with cross-functional teams to evaluate and compare multiple engineer-provided solutions on various models, enhancing model accuracy and interpretability for real-world problem statements.
- Trained language models using curated, well-documented code solutions from multiple engineers, ensuring high-quality input data that improved ChatGPT’s contextual reasoning and problem-solving capabilities.
Full-stack Developer
Flexiple
- Developed an AI-powered remote interview monitoring platform that analyzes video, audio, and screen-sharing behavior to detect fraudulent activity and ensure interview integrity using ML, NLP, and computer vision.
- Tracked potential cheating behavior during interviews by analyzing lip sync patterns, eye movements, and background noises; integrated real-time alerts for anomalies like remote desktop access or off-screen prompts.
- Launched the platform across leading hiring platforms such as First IPO, Mongen, iTakeE, RiceBird, and FloCareer, enhancing remote interview authenticity and reducing candidate fraud by over 70%.
- Tested engineer-submitted solutions across multiple models to evaluate effectiveness, improving overall model accuracy and response interpretability for production-scale deployments.
- Trained language models using peer-reviewed, highly annotated codebases with thorough in-line documentation, resulting in better comprehension of user intent by generative AI systems.
- Developed detailed, step-by-step solutions to complex problems using Python, LangChain, and ML techniques, which were used to fine-tune and enhance the performance of ChatGPT across various reasoning tasks.
Lead Developer
IBM
- Contributed to IBM Watson, an AI platform for business that uncovers insights, enables new engagement models, supports confident decision-making, and improves operational productivity.
- Built chatbots and virtual agents that answer customers’ questions and respond to their needs quickly and efficiently.
- Designed and developed frameworks that work with Watson to analyze and interpret all types of data, including unstructured text, images, audio, and video.
Lead Developer
Harman
- Contributed to Novus, the world's leading portfolio intelligence platform with $2+ trillion in client AUM. It brings all historical and ongoing positions to life through an intuitive analytics platform and provides true aggregation.
- Enabled the platform to aggregate data, process it, apply analytics, and gain insights across equity, hedge fund, private equity, venture capital, and real asset allocations on a multi-asset class, multi-level basis of exposures and performance.
- Designed and developed frameworks to perform ETL on market, public, and private data. Applied various strategies and algorithms on financial data, data modeling, and data mining. Contributed to the analytics library using various formulas.
Experience
Platform Migration & Cloud Integration
Key components of the migration plan:
DATA MIGRATION
• Data lake formation: Migrate data from the legacy system to Amazon S3 for cost-effective storage and scalability. Utilize AWS Glue for ETL processes to clean, transform, and load data into Amazon S3.
• Data warehouse: Use Amazon Redshift as the data warehouse solution for efficient querying and analysis of structured data, providing seamless integration with existing analytics tools.
COMPUTE MIGRATION
• Processing: For big data processing, replace on-premises processing with Amazon Elastic MapReduce (EMR), which provides scalability and cost-efficiency for handling large datasets.
DATABASE MIGRATION
• Transactional databases: Migrate relational databases to Amazon Relational Database Service (RDS), ensuring high availability, automatic backups, and scalability.
• Non-relational databases: For NoSQL data, use Amazon DynamoDB.
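The plan above amounts to a mapping from workload category to target AWS service. A minimal sketch of that mapping follows; the category keys, the `plan_migration` helper, and the example workload names are hypothetical, while the service assignments are taken from the plan itself.

```python
# Workload category -> AWS target, as listed in the migration plan.
MIGRATION_TARGETS = {
    "data_lake": "Amazon S3",
    "etl": "AWS Glue",
    "data_warehouse": "Amazon Redshift",
    "big_data_processing": "Amazon EMR",
    "relational_database": "Amazon RDS",
    "nosql_database": "Amazon DynamoDB",
}


def plan_migration(workloads):
    """Assign each legacy workload a target service; collect unmapped ones."""
    planned, unmapped = {}, []
    for name, category in workloads.items():
        target = MIGRATION_TARGETS.get(category)
        if target is None:
            unmapped.append(name)
        else:
            planned[name] = target
    return planned, unmapped


# Hypothetical legacy inventory, for illustration only.
planned, unmapped = plan_migration({
    "orders_db": "relational_database",
    "clickstream": "big_data_processing",
    "mainframe_batch": "legacy_batch",
})
```

Keeping unmapped workloads in a separate list makes gaps in the plan visible instead of silently skipping them.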
Education
Bachelor's Degree in Computer Science
Indian Institute of Technology Kanpur - Kanpur, Uttar Pradesh, India
Certifications
Google Analytics Certification
IBM
AWS Certified Solutions Architect
IBM
Deep Learning
IBM
Certified Scrum Product Owner
IBM
Master Project Manager (MPM)
IBM
Skills
Libraries/APIs
Pandas, NumPy, SciPy, React, Node.js, Binance API, PySpark, TensorFlow, PyTorch
Tools
Git, GitHub, Apache Airflow, Microsoft Excel, GitLab CI/CD, Microsoft Power BI, AWS Glue, Amazon SageMaker, Terraform, Claude
Languages
Python 3, SQL, HTML5, JavaScript, Python, PHP, Snowflake, Go, Java, C, C++
Frameworks
Selenium, Spark, Flask, Angular, Core ML, Core Data, ADF, Apache Spark
Paradigms
ETL, Agile, Scrum, Role-based Access Control (RBAC), OLAP, Business Intelligence (BI), Management
Storage
MySQL
Platforms
Amazon Web Services (AWS), Azure, Linux, Databricks, Firebase, Kubernetes, Docker, Windows, Cloudera Data Platform
Other
FastAPI, Windows 10, CSS5, APIs, Data Scraping, Web Scraping, Data Modeling, Data Engineering, Software Engineering, Data Cleaning, Data Transformation, Data Analysis, Quality Assurance (QA), Real-time Processing, Scalable Architecture, ELT, Prompt Engineering, Product Management, Full-stack Development, Machine Learning, Artificial Intelligence (AI), Chatbots, CI/CD Pipelines, OpenAI, Reverse Engineering, API Testing, QA Automation, Regression Testing, Analytics, Data Warehousing, DAX, Data Governance, Modeling, CDC, Event-driven Systems, Backtesting, Trading, Algorithms, Amazon Redshift, Data Science, LangChain, Agentic AI, Azure Databricks, IT Projects, Scrum Master, Scrum Product Owner, Deep Learning, Multimodal GenAI, Large Language Models (LLMs), AI Agents, Delta Lake, Lakehouse Architecture, Medallion Architecture, Unity Catalog, Partitioning Strategies, Distributed Data Processing, Data Ingestion Framework, Spark Performance Optimization