Rajeev Gupta

Verified Expert in Engineering

Artificial Intelligence (AI) Developer

Location

Delhi, India

Toptal Member Since

July 22, 2019

Rajeev is passionate about data and machine learning and has more than five years of experience in data science projects across numerous industries and applications. He's currently focused on cutting-edge technologies such as TensorFlow, Keras, deep learning, and most of the Python data science stack. Rajeev has used these skills to solve many real business problems in NLP, image processing, and time series domains.

Portfolio

Availyst LLC

Data Engineering, Data Scraping, Amazon Web Services (AWS), Scraping...

JSS Information Technology Business Incubator

Google Cloud Platform (GCP), Git, Jupyter Notebook, Keras, TensorFlow...

Forbes Media - Q.ai

Python, Data Science, Data Analysis

Experience

Machine Learning - 5 years
Data Science - 5 years
Artificial Intelligence (AI) - 5 years
Natural Language Processing (NLP) - 4 years
Image Processing - 4 years
Generative Pre-trained Transformers (GPT) - 4 years
Deep Learning - 4 years

Availability

Full-time

Preferred Environment

Google Cloud, Jupyter Notebook, Spyder, Git

The most amazing...

...project I've implemented was an NLP attention-boosted sequential inference model that automated a business process.

Work Experience

Data Developer

2021 - PRESENT

Availyst LLC

Worked with a US-based food aggregator startup on data engineering and scraping, using the Python data science stack, Jupyter Notebook, and AWS services.
Handled the user-facing recommendation engine, which recommends food and restaurants.
Developed the scraping application using Python and deployed it using AWS services.

Technologies: Data Engineering, Data Scraping, Amazon Web Services (AWS), Scraping, JavaScript, CSS, Python, MySQL, Tango

Independent Consultant — Data Scientist

2017 - PRESENT

JSS Information Technology Business Incubator

Served as a data science mentor at JSS Information Technology Business Incubator.
Helped small companies and startups take advantage of their data.
Created predictive models using machine learning.
Applied neural networks to natural language processing.
Developed classification and regression algorithms.
Implemented time-series forecasting.
Developed image detection with deep learning.

Technologies: Google Cloud Platform (GCP), Git, Jupyter Notebook, Keras, TensorFlow, Scikit-learn, Python

Data Scientist – Fintech Project

2021 - 2022

Forbes Media - Q.ai

Managed the business intelligence team, acting as a senior data scientist for the client.
Worked as a quant researcher, using advanced forms of quantitative techniques and artificial intelligence to generate investment recommendations across multiple asset classes, including stocks, ETFs, options, and cryptocurrencies.
Created a dashboard for the growth and marketing and leadership teams using Dash, Plotly, and Tableau.

Technologies: Python, Data Science, Data Analysis

Senior Data Scientist and Data Analyst

2021 - 2021

Premier Global Management Consultancy

Worked as a data scientist and senior analyst with the client and its team.
Worked on demand space segmentation for a large US fashion retailer.
Mapped 6 million customer records to demand space segments.

Technologies: Python 3, Amazon Elastic MapReduce (EMR), PySpark

Data Scientist

2019 - 2019

A Telecommunications and Media Company in the US

Worked with a telecommunications and media company in the US on identifying fake news.
Developed two models to identify sarcasm and quantification fallacies in articles.

Technologies: PyTorch, TensorFlow, Python

Independent Consultant – Data Scientist

2019 - 2019

IBM

Worked for IBM US to optimize the facility leases it uses to run its US operations.
Developed a Python model to improve facility utilization and reduce facility operations and lease costs, subject to a number of business constraints.

Technologies: Linear Programming, Plotly, Python

Independent Consultant – Data Scientist

2018 - 2018

AbbVie, Inc.

Worked closely with the C-level executive and product management teams to analyze the survey and produce data and reports.
Helped the product and executive teams make more informed decisions and increase market share by identifying new opportunities and target segments and by devising ingenious new ways of resolving constraints.

Technologies: Association Rule Learning, Cluster, Regression, Matplotlib, Plotly, R, Python

Independent Consultant – Data Scientist

2017 - 2018

Newristics

Developed a Python app that uses natural language processing with deep neural network sequence-to-sequence learning to automate a business process.
Reduced the cost of business operations.

Technologies: Google Cloud Platform (GCP), Git, Jupyter Notebook, Keras, TensorFlow, Scikit-learn, Natural Language Toolkit (NLTK), SpaCy, GloVe, Gensim, LSTM, Python

Data Scientist

2016 - 2017

Sopra Steria Singapore

Worked with the Land Transport Authority, Singapore to implement the vision to convert the city into a digital and intelligent one to improve the efficiency of services for the citizens, using machine learning, predictive modeling, and data mining.

Technologies: Git, Jupyter Notebook, Keras, TensorFlow, Scikit-learn, Tableau, Python

Data Scientist

2014 - 2015

Steria India

Built a recommendation system for an eCommerce site; it recommended the best possible items to buy based on customer history and collaborative filtering.
Helped with customer churn prediction by developing a classification algorithm for a retail bank to identify customers likely to churn balances by at least 50% in the next quarter compared with the current quarter.
Created a classification algorithm for a retail bank to improve sales from existing customers by cross-selling one of its products, the personal loan (customer cross-sales).

Technologies: Classification, Cluster, Regression, Matplotlib, Plotly, R, Python

Technical Program Manager

1997 - 2014

Steria India — Barclays Bank

Set up business benefits of around £43 million over five years in customer retention, cost savings, and new business opportunities at an estimated cost of around £12 million.
Acted as a vital member of the steering committee that identified user needs and developed customized solutions for around 250,000 Barclaycard acquiring merchants.
Led a project team of 147 members including solution architects, designers, developers, and testers spread across multi-geographical locations through the entire project development life cycle.
Consistently stayed within around 5% of resource and budget forecast monthly.
Recognized as a problem solver within a team of 22 project managers in a portfolio with an annual spend of over £70 million.

Technologies: Oracle, Content Management, Ab Initio, WebSphere, XML, Java, COBOL, JCL, Virtual Storage Access Method (VSAM), IBM Db2, CICS

Experience

IBM

IBM US leases several facilities across the US to run its operations. The objective of this project was to improve facility utilization and reduce facility operations and lease costs, subject to many business constraints.

I developed a Python integer programming algorithm to solve this problem. The business constraints made the problem interesting and unique. I parameterized the optimization period (how far the algorithm looks into the future) so that it could provide multiple solutions. The client especially appreciated this feature.

Technologies: Python, Plotly, Linear Programming, PuLP
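To illustrate how a parameterized optimization period can change the recommended plan, here is a minimal sketch. It is not the project's model: it swaps the integer programming solver for a brute-force search over keep/drop decisions, and the facility names, seat counts, costs, exit fees, and the single capacity constraint are all hypothetical.

```python
from itertools import product

# Hypothetical data: name -> (seats, monthly lease + operations cost, one-time exit fee)
FACILITIES = {
    "site_a": (400, 62_000, 300_000),
    "site_b": (250, 37_000, 500_000),
    "site_c": (300, 45_000, 200_000),
}


def cheapest_lease_plan(facilities, required_seats, horizon_months):
    """Brute-force the cheapest keep/drop decision for each leased facility.

    Cost over the horizon = monthly cost of every kept facility times the
    number of months, plus the exit fee of every dropped facility.
    Returns (kept_names, total_cost), or None if no plan has enough seats.
    """
    names = sorted(facilities)
    best = None
    for decisions in product((0, 1), repeat=len(names)):
        kept = [n for n, keep in zip(names, decisions) if keep]
        dropped = [n for n, keep in zip(names, decisions) if not keep]
        if sum(facilities[n][0] for n in kept) < required_seats:
            continue  # violates the (assumed) capacity constraint
        cost = horizon_months * sum(facilities[n][1] for n in kept)
        cost += sum(facilities[n][2] for n in dropped)
        if best is None or cost < best[1]:
            best = (kept, cost)
    return best


# Different optimization periods can yield different plans.
for horizon in (3, 12):
    print(horizon, cheapest_lease_plan(FACILITIES, required_seats=600, horizon_months=horizon))
```

With these made-up numbers, a short horizon favors keeping every lease because the exit fees outweigh the savings, while a longer horizon favors dropping one site. Exhaustive search only works for a handful of facilities; a real portfolio would need a proper integer programming solver.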

Newristics

Newristics is a US-based global leader in applying decision-heuristic science to marketing. Using heuristic psychology (500+ different heuristics), it rewrites each marketing message.

I automated the message scorer process where a team compares the new message against the old one and analyzes it to rate how closely it depicts the heuristic.

The text data is then cleaned and normalized, and unigrams and bigrams are generated from the normalized text. I built two main models to solve this problem: XGBoost and deep neural network seq-to-seq learning.

For XGBoost, I created around 900 features, divided into three groups:
• Basic NLP features: counts and ratios of words and characters in the message, TF-IDF of unigrams and bigrams, gensim TF-IDF similarity, and so on
• Word embedding features: similarity of TF-IDF-weighted average embedding vectors from self-trained and pre-trained Word2vec/GloVe models, etc.
• Graph features: degree of nodes, the intersection of neighbors, k-core/k-clique, degree of separation, etc.
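A few of the basic NLP features from the first group could look roughly like the sketch below. It is an illustration in plain Python, not the project's code: the function names, the simple regex normalization, and the smoothed IDF formula are assumptions, and the original work used gensim and scikit-learn rather than hand-written TF-IDF.

```python
import math
import re
from collections import Counter


def normalize(text):
    """Lowercase and keep alphanumeric tokens (a stand-in for text cleaning)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(documents):
    """Sparse TF-IDF vectors over unigrams and bigrams, using a smoothed IDF."""
    term_lists = []
    for doc in documents:
        tokens = normalize(doc)
        term_lists.append(tokens + ngrams(tokens, 2))
    n_docs = len(term_lists)
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def basic_features(old_message, new_message):
    """Count, ratio, and TF-IDF similarity features for a message pair."""
    old_tokens, new_tokens = normalize(old_message), normalize(new_message)
    old_vec, new_vec = tfidf_vectors([old_message, new_message])
    return {
        "word_count_ratio": len(new_tokens) / max(len(old_tokens), 1),
        "char_count_ratio": len(new_message) / max(len(old_message), 1),
        "tfidf_cosine": cosine(old_vec, new_vec),
    }


# Hypothetical message pair
print(basic_features("Save time today", "Save time and money today"))
```

Each message pair yields one row of such features; the embedding and graph groups would add further columns to the same row before training the gradient-boosted model.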

I used the deep learning seq-to-seq model to enhance the sequence inference neural network architecture.

Technologies: Python, LSTM, gensim, GloVe, SpaCy, NLTK, Scikit-learn, TensorFlow, Keras, Jupyter Notebook, Git, Google Cloud Platform

AbbVie, Inc.

AbbVie, Inc., a leading pharmaceutical company, had introduced a drug whose market share slipped from 65% to 49%. The company conducted a physician survey on three themes to help with strategic planning.

We interviewed 119 physicians about the HCV regimen attributes that affect the market drivers, 55 physicians about patient treatment, and 60 physicians about sales rep interactions and their impressions of the message and the interaction.

I worked closely with the C-level executive and product management teams to analyze the survey and produce data and reports. This helped the product and executive teams make more informed decisions and increase market share by identifying new opportunities and target segments and by devising ingenious new ways of resolving constraints.

Technologies: Python, R, Plotly, Matplotlib, Regression, Cluster, Association Rule

Classify H&E Stained Histological Breast Cancer Images

I participated in a hackathon to classify H&E stained histological breast cancer images. We were given a minimal set of training data (a few hundred images). To increase the robustness of the classifier, I used strong data augmentation and deep convolutional feature extraction at different scales with CNNs pre-trained on ImageNet. On this feature set, I applied a highly accurate gradient boosting algorithm. I avoided training neural networks on this amount of data to prevent suboptimal generalization.
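The profile does not say which augmentations were used. As one assumed example, tissue tiles have no canonical orientation, so a common choice is to generate the eight rotation and flip variants of each image. The sketch below shows that idea on a plain 2D grid of pixel values; the function names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in image]


def dihedral_augmentations(image):
    """Return the 8 rotation/flip variants of an image."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# A tiny stand-in for an image tile
tile = [[1, 2], [3, 4]]
for variant in dihedral_augmentations(tile):
    print(variant)
```

Each variant would then pass through the pre-trained feature extractor, multiplying the effective number of training examples for the gradient boosting step.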

Technologies: Python 3, Keras, NumPy, Pandas, SciPy, Scikit-learn

Demand Forecast at an SKU-level for a Brewery Company

Problem: The brewery has a large portfolio of products distributed to retailers through wholesalers (agencies). There are thousands of unique wholesaler-SKU/product combinations.

To plan its production and distribution, and to help wholesalers with their planning, the company needs an accurate estimate of demand at the SKU level (34 SKUs) for each wholesaler (60 agencies).

Data: Four years of data for 60 agencies and 34 SKUs were used for prediction.
• Price, sales, and promotion (dollars/hectoliter): The price, sales, and promotion in dollar value per hectoliter at an agency-SKU-month level
• Historical volume (hectoliters): Sales data at an agency-SKU-month level
• Weather (degrees Celsius): The average maximum temperature at an agency-month level
• Industry soda sales (hectoliters): Industry-level soda sales
• Event calendar: Event details (sports, carnivals, and so on)
• Industry volume (hectoliters): Industry actual beer volume
• Demographics: Demographic details (yearly income in dollars)

Method: I used deep neural network sequence-to-sequence learning for demand prediction.
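Sequence-to-sequence training typically needs each monthly series cut into pairs of input and target windows. The sketch below shows one way to do that for a single agency-SKU series; the window lengths and the sample volumes are assumptions, not details from the project.

```python
def make_windows(series, input_len, output_len):
    """Split one monthly series into (input, target) pairs.

    Each pair holds input_len consecutive months followed by the next
    output_len months that the model should learn to predict.
    """
    pairs = []
    for start in range(len(series) - input_len - output_len + 1):
        split = start + input_len
        pairs.append((series[start:split], series[split:split + output_len]))
    return pairs


# Hypothetical monthly volumes (hectoliters) for one agency-SKU pair over four years
volumes = [100 + (month % 12) * 5 for month in range(48)]
windows = make_windows(volumes, input_len=12, output_len=6)
print(len(windows), "training pairs from", len(volumes), "months")
```

The same slicing would apply to the price, weather, and event series so that every input window carries all covariates for the same months.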

Satellite Imagery Feature Detection Using Deep Learning

I developed a model for satellite imagery feature detection using deep learning. The 1 km x 1 km satellite images come in both 3-band and 16-band formats. The multi-band imagery covers the multispectral (400-1040 nm) and short-wave infrared (SWIR) (1195-2365 nm) ranges.

Skills

Languages

Python, Python 3, SQL, R, CICS, COBOL, Java, XML, JavaScript, CSS

Frameworks

LightGBM, Apache Spark

Libraries/APIs

TensorFlow, TensorFlow Deep Learning Library (TFLearn), Matplotlib, Scikit-learn, Pandas, NumPy, XGBoost, CatBoost, Keras, PyTorch, SciPy, Dask, LSTM, SpaCy, Natural Language Toolkit (NLTK), PySpark

Tools

Jupyter, GitHub, Seaborn, Plotly, Git, Spyder, Gensim, Cluster, Tableau, JCL, Ab Initio, Amazon Elastic MapReduce (EMR)

Paradigms

Data Science, Agile Software Development, Linear Programming

Platforms

Docker, Amazon Web Services (AWS), Jupyter Notebook, Google Cloud Platform (GCP), WebSphere, Oracle, Tango

Storage

Data Pipelines, Google Cloud, IBM Db2, Virtual Storage Access Method (VSAM), MySQL

Other

Data Analysis, Data Analytics, Data Scraping, Data Engineering, Quantitative Modeling, Quantitative Analysis, Mixed-integer Linear Programming, Deep Learning, Deep Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Long Short-term Memory (LSTM), Natural Language Processing (NLP), Image Processing, Time Series Analysis, Artificial Intelligence (AI), Machine Learning, Modeling, Statistical Modeling, Statistical Methods, Statistical Learning, Analytics, GPT, Generative Pre-trained Transformers (GPT), Statistics, Numba, Optimization, Reinforcement Learning, Deep Reinforcement Learning, Dash, GloVe, Regression, Association Rule Learning, Classification, Content Management, Scraping

Education

1991 - 1994

Master's Degree in Computer Science

Jawaharlal Nehru University - New Delhi, India

1987 - 1990

Bachelor's Degree in Mathematics

Delhi University - Delhi, India
