Aditya Andra

Verified Expert in Engineering

Analyst and Developer

Location

Hyderabad, Telangana, India

Toptal Member Since

June 8, 2020

Aditya is a developer with experience building machine learning and statistical models with large-scale data sets on cloud platforms using the latest big data technologies. Thanks to master's degrees from the IE Business School and IIT (ISM) Dhanbad, Aditya has a solid understanding of data science in various business scenarios. He is also a former quantitative researcher specializing in time-series and machine learning-based strategies and risk models in financial markets.

Statistics, Time Series, Machine Learning, Data Analysis, Time Series Analysis, Statistical Analysis, Predictive Analytics, Data Analytics, Data Visualization, Natural Language Processing (NLP), Deep Learning, Python, Pandas, NumPy, Scikit-learn, Data Warehouse, Dashboard, Linear Programming

Portfolio

Novo Nordisk

Python, Deep Learning, Time Series, Machine Learning, Azure Machine Learning...

Zvoid

Machine Learning, Python, Quantitative Modeling, Quantitative Finance...

COGNIZER AI

Natural Language Processing (NLP), GPT...

Experience

Machine Learning - 7 years
Data Science - 7 years
Quantitative Research - 6 years
Time Series Analysis - 6 years
Python - 6 years
Risk Modeling - 6 years
Generative Pre-trained Transformers (GPT) - 3 years
Natural Language Processing (NLP) - 3 years

Availability

Part-time

Preferred Environment

Machine Learning, Python, Git, Jupyter, Data Science

The most amazing...

...project I've developed is a custom trial optimization model for a pharmaceutical company, which outperformed all existing ML models.

Work Experience

Senior Data Scientist

2022 - 2023

Novo Nordisk

Built time series forecasting models using state-of-the-art deep learning algorithms like N-HiTS and N-BEATS, which outperformed traditional ARIMA and Holt-Winters exponential smoothing models.
Built a proprietary trial optimization algorithm to predict the end date of trials, which outperformed all the time series models.
Built ensemble models for demand and sales forecasting.

Technologies: Python, Deep Learning, Time Series, Machine Learning, Azure Machine Learning, Databricks, Supply Chain Optimization

Machine Learning Developer

2022 - 2022

Zvoid

Created a tweet listener that captures tweets from a given list of authors and prepares the data for the decision engine.
Built the automated trading capability using the Alpaca API.
Developed the end-to-end analysis of a particular Twitter IPO hypothesis.
Worked on the decision engine, a random forest regressor that takes the tweet and the stock price as inputs and outputs a buy or sell recommendation for the stock.

Technologies: Machine Learning, Python, Quantitative Modeling, Quantitative Finance, Data Science

Senior Data Scientist

2020 - 2021

COGNIZER AI

Developed a BERT-based conversational AI solution based on business requirements.
Converted natural language queries into SQL queries using BERT-based deep-learning architecture.
Contributed to significant parts of the back-end flow and took ownership of those flows.
Extracted various fields from contract PDFs using regex and deep learning models and optimized the models to increase processing speed using TensorRT.
Put the DL models into production using APIs and Docker. Used AWS and GCP to enable autoscaling features.
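
As an illustration of the regex side of the contract field extraction mentioned above, here is a minimal sketch. The field names, the patterns, and the `extract_fields` helper are hypothetical and chosen only for this example; they are not taken from the original project, and the deep learning models are not shown.

```python
import re

# Hypothetical patterns for two example fields; real contracts would need
# many more variants than these.
EFFECTIVE_DATE = re.compile(
    r"effective\s+date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
)
DOLLAR_AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")


def extract_fields(text):
    """Return the first effective date and dollar amount found, or None for each."""
    date_match = EFFECTIVE_DATE.search(text)
    amount_match = DOLLAR_AMOUNT.search(text)
    return {
        "effective_date": date_match.group(1) if date_match else None,
        "amount": (
            float(amount_match.group(1).replace(",", "")) if amount_match else None
        ),
    }


sample = "This Agreement has an Effective Date: 2021-03-15. The total fee is $12,500.00."
fields = extract_fields(sample)
```

In a setup like this, regex would typically handle the rigidly formatted fields, leaving free-form clauses to a learned model.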

Technologies: Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Custom BERT, APIs, Python 3, Google Cloud Platform (GCP), Deep Learning, Amazon Web Services (AWS), Machine Learning Operations (MLOps), Flask, REST APIs, Docker, Autoscaling

Data Scientist | Researcher

2020 - 2020

Freelance

Built data pipelines for data coming from multiple sources like the Quandl API and a SQL database.
Performed an exploratory data analysis on the resulting dataset, derived insights, and presented them to stakeholders using Jupyter Notebook and Tableau.
Modeled the data using decision tree-based regression models.

Technologies: Amazon Web Services (AWS), Tableau, Jupyter Notebook, Redshift, NumPy, Pandas, Python, Data Science, Data Analytics, Statistical Analysis, Machine Learning, Git, Docker, Amazon EC2, APIs, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), PostgreSQL, Jupyter, Python 3

CTO

2020 - 2020

WiseLike

Competed at the IE Business School's startup lab and won the investors' choice award and the most innovative project award.
Developed the whole machine learning pipeline from scratch, starting with a web scraper for pictures, extracting properties of a picture, and training the model using the data.
Served the model using a REST API (Flask) on the website wiselike.pythonanywhere.com.
Performed A/B and hypothesis testing to test the validity of the model.

Technologies: Deep Learning, Computer Vision, NumPy, Pandas, Python, Machine Learning, Social Media Marketing (SMM), Websites, Scikit-learn, Flask

Quantitative Analyst

2013 - 2019

Futures First

Performed an exploratory data analysis on large-scale financial datasets and derived insights that led to tradable strategies, using Python and visualizing data through dashboards in Tableau.
Implemented a time series analysis (SARIMA and GARCH) of prices in commodity markets, considering CFTC reports and external factors like currency.
Developed regression-based mean-reverting strategies in fixed-income markets of the US and Brazil.
Deployed ETL pipelines and ML pipelines working on GCP.
Performed backtesting and forward testing of strategies by tracking their Sharpe ratios.
Performed hypothesis testing and evaluated the risk for strategies based on Monte Carlo simulations and historical value at risk.
Built natural language pipelines to track news sentiment.
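
The risk and performance measures named above (Sharpe ratio, historical value at risk, and Monte Carlo simulation) can be sketched as follows. This is a minimal illustration, not the original implementation: the function names, the 252-period annualization, and the normal-distribution assumption in the Monte Carlo step are all assumptions made for the example.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)
    if std == 0:
        return float("nan")  # undefined when returns do not vary
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def historical_var(returns, confidence=0.95):
    """Historical VaR: the empirical loss quantile, reported as a positive number."""
    returns = np.asarray(returns, dtype=float)
    return float(-np.quantile(returns, 1.0 - confidence))


def monte_carlo_var(returns, confidence=0.95, n_sims=10_000, seed=0):
    """VaR from returns simulated under an assumed normal distribution."""
    returns = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    simulated = rng.normal(returns.mean(), returns.std(ddof=1), size=n_sims)
    return float(-np.quantile(simulated, 1.0 - confidence))


sample_returns = np.array([0.01, -0.02, 0.015, -0.005, 0.02])
sr = sharpe_ratio(sample_returns)
hist_var = historical_var(sample_returns, confidence=0.8)
mc_var = monte_carlo_var(sample_returns)
```

Historical VaR uses only observed returns, while the Monte Carlo variant depends entirely on the assumed return distribution, so the two can diverge when returns are fat-tailed.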

Technologies: Google Cloud Platform (GCP), NumPy, Pandas, Python, Data Science, Data Analytics, Statistical Analysis, Machine Learning, Derivatives, Bloomberg API, Reuters Eikon, Git, Jupyter, Excel VBA

Research Intern

2012 - 2012

Next Sapiens

Developed a novel 4D (degrees of freedom) solution for the simultaneous localization and mapping of an unmanned aerial vehicle to reduce the computation cost and published research on the same (ieeexplore.ieee.org/document/6461785).
Combined location data from various sources like LIDAR, proximity sensors, inertial measurement units, and camera using extended Kalman filters to update the state information of the robot.
Developed a fuzzy logic-based PID controller for the unmanned aerial vehicle to maintain stability during flight.
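
The sensor fusion idea above can be illustrated with a scalar, linear Kalman filter. This is a deliberately simplified sketch: the original work used extended Kalman filters on a multi-dimensional robot state, whereas the example below tracks a single position value, and the function names and noise variances are assumptions for illustration.

```python
def kalman_predict(x, p, u, q):
    """Propagate the estimate x (variance p) by a motion u with process noise q."""
    return x + u, p + q


def kalman_update(x, p, z, r):
    """Fuse the estimate x (variance p) with a measurement z of variance r."""
    gain = p / (p + r)          # how much to trust the measurement
    x_new = x + gain * (z - x)  # move the estimate toward the measurement
    p_new = (1.0 - gain) * p    # uncertainty shrinks after every update
    return x_new, p_new


def fuse_measurements(x, p, measurements):
    """Apply successive updates, one per (value, variance) pair from any sensor."""
    for z, r in measurements:
        x, p = kalman_update(x, p, z, r)
    return x, p


# Two hypothetical sensors reporting the same position with different noise.
position, variance = fuse_measurements(0.0, 1.0, [(2.0, 1.0), (2.0, 0.5)])
```

An extended Kalman filter follows the same predict and update cycle but linearizes nonlinear motion and measurement models around the current estimate at each step.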

Technologies: Embedded C, C++, MATLAB

Experience

Churn Prediction for a Book Publisher

https://github.com/adia4/Churn-publisher/blob/master/datathon-final.ipynb

The problem statement involved predicting which classes were about to change from using the publisher's books to online material. After implementing feature engineering using a genetic algorithm and clustering, the best prediction results were achieved using a random forest model.

Stock Suggestions | Distributed System with PySpark

https://github.com/adia4/Financial-Analysis/blob/master/Spark-Financial_data_Analysis.ipynb

This is an attempt to understand the relationship between the financials of a company and its performance in the stock market. There is also an attempt to identify cheap buying opportunities based on various risk profiles. The dataset was huge, so it was stored in a distributed file system, and we used PySpark for the transformations.

Word Recommendation System for Movie and Series Reviews

This is a natural language processing project where we used various methods like part-of-speech tagging, named-entity recognition, readability, sentiment score, topic modeling, and more to train a regression model for good and bad reviews scraped from websites concerning different topics. The recommendations were made based on how various features impacted the score and what measures could be taken to improve it.

SQL Database for North American Oil and Gas and Visualization through Tableau

I developed the database using ETL processes on the data from online resources, normalized the data to create a star schema using MySQL Workbench, and used this output to visualize the data using Tableau.

Machine Learning Model to Suggest Better Pictures for Social Media

I created a database by scraping the web for pictures and trained a machine learning model on several characteristics of images available on social media, along with their number of likes, to suggest which picture works better. I also deployed the model using a Flask API.

Generating Insights in Stock Market Data

I created data pipelines for merging data from various sources like several data APIs and the PostgreSQL database. I also performed exploratory data analysis and modeling on the new data to derive new insights, running JupyterLab on an AWS EC2 instance.

Predicting the Probability of a Default of a Company to Make Loan Decisions

https://github.com/MBD-RiskandFraud/fintech_platform_ie

The project involved retrieving financial data of the company from a database and building a random forest model. The project's scope included a variable interest rate based on the probability of default for different sectors. Finally, the model was deployed using a Flask API.

Live Tweet Sentiment Tracking

The project involved ingesting live tweet data using the API into Kafka topics. Then we used Spark Streaming as a subscriber and did sentiment analysis and feature engineering. This data was then aggregated and passed on to a Shiny dashboard. The data was then stored in a MongoDB database.

Cancer Prediction Using VOC Data

https://github.com/adia4/voc_cancer_prediction

This project is based on research showing that volatile organic compounds (VOCs) released by humans have predictive power for cancer. Here we are using a VOC database with labeled cancer data. The results are deployed using a Flask API which predicts the kind of cancer based on the VOC content.

Sales Forecast Model for FMCG, Taking the COVID Scenario Into Account

I developed the sales forecasting model for an FMCG client. We trained an ARIMA model and decomposed the data into its component sine waves using FFT. This data was then fed to a machine learning model along with some external factors to predict the sales.
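
The FFT decomposition step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the choice to keep the strongest few components, and the synthetic monthly series are hypothetical, and the downstream machine learning model and external factors are not shown.

```python
import numpy as np


def dominant_sine_components(series, n_components=3):
    """Split a series into its mean and its strongest sinusoidal components."""
    values = np.asarray(series, dtype=float)
    n = values.size
    mean = values.mean()
    spectrum = np.fft.rfft(values - mean)
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per sample

    # Rank the non-zero-frequency bins by magnitude, strongest first.
    ranked = np.argsort(np.abs(spectrum[1:]))[::-1][:n_components] + 1
    t = np.arange(n)
    components = []
    for k in ranked:
        # The Nyquist bin (even n only) has no mirrored partner, so no doubling.
        scale = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
        amplitude = scale * np.abs(spectrum[k]) / n
        phase = np.angle(spectrum[k])
        wave = amplitude * np.cos(2.0 * np.pi * freqs[k] * t + phase)
        components.append(
            {"frequency": float(freqs[k]), "amplitude": float(amplitude), "wave": wave}
        )
    return mean, components


# Synthetic example: ten years of monthly data with a yearly cycle.
months = np.arange(120)
sales = 10.0 + 3.0 * np.sin(2.0 * np.pi * months / 12.0)
mean, components = dominant_sine_components(sales, n_components=1)
```

Each recovered wave can then serve as an input feature, which is one way seasonal structure can be passed to a downstream model.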

Time Series Forecasting

Built an ensemble model that combined outputs from deep learning time series models like N-HiTS and N-BEATS with a traditional linear regression that outperformed all the existing forecasts. Also built a Twitter scraper to get tweet data for the products and their associated sentiments.
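
One possible reading of the ensemble described above is stacking, where a linear regression learns how to weight the forecasts of the base models. The sketch below assumes that reading; the function names and toy numbers are hypothetical, and the base forecasts stand in for outputs of models such as N-HiTS and N-BEATS rather than being produced by them.

```python
import numpy as np


def fit_stacking_weights(base_forecasts, actuals):
    """Least-squares intercept and weights mapping base forecasts to actuals."""
    base_forecasts = np.asarray(base_forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    design = np.column_stack([np.ones(len(actuals)), base_forecasts])
    coef, *_ = np.linalg.lstsq(design, actuals, rcond=None)
    return coef


def combine_forecasts(base_forecasts, coef):
    """Apply fitted stacking weights to new base forecasts."""
    base_forecasts = np.asarray(base_forecasts, dtype=float)
    design = np.column_stack([np.ones(base_forecasts.shape[0]), base_forecasts])
    return design @ coef


# Toy validation data: two base models, one column of forecasts each.
model_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model_b = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
actuals = 0.5 * model_a + 0.5 * model_b
coef = fit_stacking_weights(np.column_stack([model_a, model_b]), actuals)
blended = combine_forecasts(np.array([[6.0, 5.0]]), coef)
```

The weights should be fitted on held-out forecasts rather than in-sample fits, otherwise the combiner tends to favor whichever base model overfits most.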

End-to-end NLP Model Deployment

Trained BERT-based solutions to fit the given use case.
Built APIs to allow its interaction with external modules.
Dockerized the whole application.
Connected it with AWS and GCP solutions like Lambda, container registry, etc., to achieve autoscaling of the API.

Skills

Languages

Python, SQL, R, C++, Python 3, SAS, Embedded C, Excel VBA

Libraries/APIs

Pandas, NumPy, Scikit-learn, Keras, REST APIs, Spark ML, TensorFlow, Natural Language Toolkit (NLTK), Spark Streaming, Bloomberg API, PySpark

Paradigms

Functional Programming, Data Science, Quantitative Research, Object-oriented Programming (OOP), ETL, Linear Programming

Other

Quantitative Modeling, Statistics, Finance, Time Series, Mathematics, Natural Language Processing (NLP), Financial Modeling, Machine Learning, Time Series Analysis, Risk Modeling, Automated Trading Software, Statistical Analysis, Quantitative Analysis, Predictive Analytics, Data Analysis, Data Analytics, Statistical Modeling, Regression Modeling, Hypothesis Testing, Data Visualization, ARIMA, Forecasting, Deep Learning, GPT, Generative Pre-trained Transformers (GPT), Recommendation Systems, Multivariate Statistical Modeling, Data Modeling, Machine Learning Operations (MLOps), Data Warehousing, Algorithms, Dashboards, Web Scraping, Neural Networks, Computer Vision, Data Warehouse Design, Websites, Social Media Marketing (SMM), A/B Testing, Trading, Data Engineering, Big Data, APIs, Derivatives, Decision Trees, Signal Analysis, Custom BERT, Quantitative Finance, General Management, Supply Chain Optimization, Autoscaling, Gunicorn

Frameworks

Flask, Spark, Apache Spark

Tools

Tableau, DataViz, Spark SQL, Jupyter, Git, MATLAB, Bloomberg, Reuters Eikon, Azure Machine Learning

Platforms

Linux, Amazon Web Services (AWS), Windows, Amazon EC2, Apache Kafka, Google Cloud Platform (GCP), Docker, Pentaho, Jupyter Notebook, Databricks

Storage

MongoDB, Redshift, NoSQL, PostgreSQL, Azure SQL Databases, MySQL, Databases

Industry Expertise

Social Media

Education

2022 - 2023

Accelerated General Management Program in General Management

IIM Ahmedabad - Ahmedabad

2019 - 2020

Master's Degree in Business Analytics and Big Data

IE Business School - Madrid, Spain

2009 - 2013

Bachelor of Technology Degree in Electrical Engineering

Indian Institute of Technology (ISM), Dhanbad - Dhanbad, India

Certifications

SEPTEMBER 2020 - PRESENT

Data Engineering, Big Data, and Machine Learning on GCP Specialization

Coursera

SEPTEMBER 2018 - PRESENT

Certification in Quantitative Finance

Fitch Learning
