Aditya Andra, Analyst and Developer in Kolkata, West Bengal, India

Member since April 18, 2020
Aditya is a developer with experience building machine learning and statistical models on large-scale datasets using cloud platforms and big data technologies. Thanks to a master’s degree in business analytics and big data from IE Business School, Aditya has a solid understanding of how data science applies across business scenarios. Aditya is also a former quantitative researcher who specialized in time-series and machine learning-based trading strategies and risk models for financial markets.

Experience

Location

Kolkata, West Bengal, India

Availability

Full-time

Preferred Environment

Machine Learning, Python, Git, Jupyter, Tableau

The most amazing project I've developed combines a time-series forecasting model with ML techniques to price financial instruments more accurately.

Employment

  • Senior Data Scientist

    2020 - 2021
    COGNIZER AI
    • Developed a BERT-based conversational AI solution tailored to the business requirements.
    • Converted natural language queries into SQL queries using a BERT-based deep learning architecture.
    • Worked on major parts of the company's back-end flows and took ownership of them.
    Technologies: Natural Language Processing (NLP), Custom BERT, APIs, Python 3, Google Cloud Platform (GCP), Deep Learning
  • Data Scientist | Researcher

    2020 - 2020
    Freelance
    • Built data pipelines to ingest data from multiple sources, such as the Quandl API and a SQL database.
    • Performed exploratory data analysis on the resulting dataset, derived insights, and presented them to stakeholders using Jupyter Notebook and Tableau.
    • Modeled the data using decision tree-based regression models.
    Technologies: Amazon Web Services (AWS), Tableau, Jupyter Notebook, Redshift, NumPy, Pandas, Python, Data Science, Data Analytics, Statistical Analysis, Machine Learning, Git, Docker, AWS EC2, APIs, Natural Language Processing (NLP), PostgreSQL, Jupyter, Python 3
  • CTO

    2020 - 2020
    WiseLike
    • Competed in IE Business School's startup lab and won both the investors' choice award and the most innovative project award.
    • Developed the entire machine learning pipeline from scratch: a web scraper for pictures, feature extraction from each picture, and model training on the resulting data.
    • Served the model through a Flask REST API on wiselike.pythonanywhere.com.
    • Performed A/B and hypothesis testing to validate the model.
    Technologies: Deep Learning, Computer Vision, NumPy, Pandas, Python, Machine Learning, Social Media Marketing, Websites, Scikit-learn, Flask
  • Quantitative Analyst

    2013 - 2019
    Futures First
    • Performed exploratory data analysis on large-scale financial datasets in Python, derived insights that led to tradable strategies, and visualized the data in Tableau dashboards.
    • Implemented time-series analysis (SARIMA, GARCH) of commodity market prices, taking into account CFTC reports and external factors such as currency movements.
    • Developed regression-based mean-reverting strategies in the US and Brazilian fixed-income markets.
    • Deployed ETL and ML pipelines on GCP.
    • Backtested and forward-tested strategies, tracking their Sharpe ratios.
    • Performed hypothesis testing and evaluated strategy risk using Monte Carlo simulations and historical value at risk (VaR).
    • Built natural language processing pipelines to track news sentiment.
    Technologies: Google Cloud Platform (GCP), NumPy, Pandas, Python, Data Science, Data Analytics, Statistical Analysis, Machine Learning, Fixed-income Derivatives, Derivatives, Bloomberg API, Reuters Eikon, Git, Jupyter, Excel VBA
  • Research Intern

    2012 - 2012
    Next Sapiens
    • Developed a novel four-degrees-of-freedom (4-DOF) solution for simultaneous localization and mapping (SLAM) of an unmanned aerial vehicle to reduce computational cost, and published the research (ieeexplore.ieee.org/document/6461785).
    • Fused location data from sources such as LIDAR, proximity sensors, inertial measurement units, and a camera using extended Kalman filters to update the robot's state estimate.
    • Developed a fuzzy logic-based PID controller for the unmanned aerial vehicle to maintain stability during flight.
    Technologies: Embedded C, C++, MATLAB
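As a rough illustration of two risk metrics named in the Futures First role (tracking Sharpe ratios and historical value at risk), here is a minimal NumPy sketch. It is not code from that role; the 252-trading-day annualization, the zero default risk-free rate, the 95% confidence level, and the synthetic returns are all assumptions.

```python
import numpy as np

def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from daily returns (assumes 252 trading days)."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    # Mean excess return over its sample standard deviation (ddof=1), scaled by sqrt(periods)
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def historical_var(daily_returns, confidence=0.95):
    """One-day historical VaR, reported as a positive loss fraction."""
    returns = np.asarray(daily_returns, dtype=float)
    # The (1 - confidence) quantile of the empirical return distribution, sign-flipped
    return -np.quantile(returns, 1.0 - confidence)

if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    # Synthetic daily returns for illustration only, not real strategy data
    returns = rng.normal(loc=0.0005, scale=0.01, size=1000)
    print(f"Sharpe: {annualized_sharpe(returns):.2f}")
    print(f"95% 1-day VaR: {historical_var(returns):.2%}")
```

Historical VaR reads the loss threshold straight from past returns without assuming a distribution; a Monte Carlo approach would instead simulate returns from a fitted model and take the same quantile.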

Projects

  • Churn Prediction for a Book Publisher
    https://github.com/adia4/Churn-publisher/blob/master/datathon-final.ipynb

    The goal was to predict which classes were likely to switch from the publisher's books to online material. After feature engineering with a genetic algorithm and clustering, a random forest model gave the best predictions.

  • Stock Suggestions Using Financial Data Stored on a Distributed System Using PySpark
    https://github.com/adia4/Financial-Analysis/blob/master/Spark-Financial_data_Analysis.ipynb

    This project explores the relationship between a company's financials and its stock market performance, and attempts to identify cheap buying opportunities for various risk profiles. Because the dataset was large, it was stored in a distributed file system and transformed with PySpark.

  • Word Recommendation System for Movie and Series Reviews

    In this natural language processing project, we used features such as part-of-speech tagging, named-entity recognition, readability, sentiment scores, and topic modeling to train a regression model on good and bad reviews scraped from websites covering different topics. Recommendations were based on how each feature affected the score and what changes could improve it.

  • SQL Database for North American Oil and Gas and Visualization through Tableau

    I built the database by running ETL processes on data from online sources, modeled the data as a star schema in MySQL Workbench, and visualized the output in Tableau.

  • Machine Learning Model to Suggest Better Pictures for Social Media
    http://wiselike.pythonanywhere.com/

    I created a database by scraping pictures from social media and trained a machine learning model on several image characteristics and their like counts to suggest which picture would perform better. I also deployed the model with a Flask API.

  • Generating Insights in Stock Market Data

    I created data pipelines that merged data from several data APIs and a PostgreSQL database. I then performed exploratory data analysis and modeling on the merged data to derive new insights, running JupyterLab on an AWS EC2 instance.

  • Predicting the Probability of a Default of a Company to Make Loan Decisions
    https://github.com/MBD-RiskandFraud/fintech_platform_ie

    The project involved retrieving companies' financial data from a database and building a random forest model to estimate the probability of default. The scope included setting variable interest rates based on the probability of default across sectors. Finally, the model was deployed with a Flask API.

  • Live Tweet Sentiment Tracking

    The project ingested live tweets from the Twitter API into Kafka topics. Spark Streaming consumed these topics as a subscriber and performed sentiment analysis and feature engineering. The results were aggregated and passed to a Shiny dashboard, then stored in a MongoDB database.

  • Cancer Prediction Using VOC Data
    https://github.com/adia4/voc_cancer_prediction

    This project builds on research suggesting that volatile organic compounds (VOCs) released by the human body have predictive power for cancer. It uses a VOC database with labeled cancer data, and the resulting model is deployed with a Flask API that predicts the type of cancer from the VOC content.

  • Sales Forecast Model for FMCG, Taking the COVID Scenario Into Account

    I developed a sales forecasting model for an FMCG client. We trained an ARIMA model and used the fast Fourier transform (FFT) to decompose the data into its component sine waves. These components were then fed into a machine learning model, along with external factors, to predict sales.
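The FMCG forecast above decomposes sales into component sine waves with the FFT. A minimal NumPy sketch of that idea follows; it assumes an evenly sampled series and keeps only the `k` strongest frequencies. The function name, the choice of `k`, and the synthetic weekly data are illustrative assumptions, not details of the client project.

```python
import numpy as np

def top_fft_components(series, k=3):
    """Reconstruct a series from its k strongest FFT frequency components.

    Assumes an evenly sampled, real-valued series. The zero-frequency term
    (the mean) is always kept so the reconstruction stays at the same level.
    """
    x = np.asarray(series, dtype=float)
    coeffs = np.fft.rfft(x)
    # Rank non-zero frequencies by amplitude, largest first; index 0 is the mean term
    order = np.argsort(np.abs(coeffs[1:]))[::-1] + 1
    keep = np.zeros_like(coeffs)
    keep[0] = coeffs[0]
    keep[order[:k]] = coeffs[order[:k]]
    # Inverse transform back to the time domain, same length as the input
    return np.fft.irfft(keep, n=len(x))

if __name__ == "__main__":
    t = np.arange(104)  # two years of synthetic weekly data
    rng = np.random.default_rng(seed=1)
    sales = 100 + 10 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 2, size=t.size)
    seasonal = top_fft_components(sales, k=1)
    print(np.round(seasonal[:5], 1))
```

The reconstructed seasonal signal (or the individual kept components) could then be used as features alongside external factors in a downstream regression model.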

Skills

  • Languages

    Python, SQL, R, C++, Python 3, SAS, Embedded C, Excel VBA
  • Libraries/APIs

    Pandas, NumPy, Scikit-learn, Keras, REST APIs, Spark ML, TensorFlow, NLTK, Spark Streaming, Bloomberg API, PySpark
  • Paradigms

    Functional Programming, Data Science, Quantitative Research, Object-oriented Programming (OOP), ETL
  • Other

    Quantitative Modeling, Statistics, Finance, Time Series, Mathematics, Natural Language Processing (NLP), Financial Modeling, Machine Learning, Time Series Analysis, Risk Modeling, Automated Trading Software, Statistical Analysis, Quantitative Analysis, Predictive Analytics, Data Analysis, Data Analytics, Statistical Modeling, Regression Models, Hypothesis Testing, Data Visualization, ARIMA, Forecasting, Deep Learning, Recommendation Systems, Multivariate Statistical Modeling, Data Modeling, Machine Learning Operations (MLOps), Data Warehousing, Algorithms, AWS, Dashboards, Web Scraping, Neural Networks, Computer Vision, Data Warehouse Design, Websites, Social Media Marketing, Flask API, A/B Testing, Trading, Data Engineering, Big Data, APIs, Derivatives, Decision Trees, Signal Analysis, Custom BERT
  • Frameworks

    Flask, Apache Spark
  • Tools

    Tableau, DataViz, Spark SQL, Jupyter, Git, MATLAB, Bloomberg, Reuters Eikon
  • Platforms

    Linux, Amazon Web Services (AWS), Windows, AWS EC2, Apache Kafka, Google Cloud Platform (GCP), Docker, Pentaho, Jupyter Notebook
  • Storage

    MongoDB, Redshift, NoSQL, PostgreSQL, Azure SQL Databases, MySQL, Databases
  • Industry Expertise

    Social Media

Education

  • Master's degree in Business Analytics and Big Data
    2019 - 2020
    IE Business School - Madrid, Spain
  • Bachelor of Technology degree in Electrical Engineering
    2009 - 2013
    Indian Institute of Technology (ISM), Dhanbad - Dhanbad, India

Certifications

  • Data Engineering, Big Data, and Machine Learning on GCP Specialization
    September 2020 - present
    Coursera
  • Certification in Quantitative Finance
    September 2018 - present
    Fitch Learning
