
Dhaval Patel

Verified Expert in Engineering

Data Scientist and Developer

Location

London, United Kingdom

Toptal Member Since

August 18, 2020

Dhaval is a data scientist and engineer with a proven track record of applying ahead-of-the-curve technologies to a range of data-driven problems. Examples include extracting information from natural language to help fact-checkers make decisions, classifying tweets in real time to stop the spread of misinformation, and analyzing large volumes of news articles. Dhaval is always interested in new opportunities to apply and extend his expertise and to explore new areas.

Software Engineering, Machine Learning, Natural Language Processing (NLP), Data Analysis, Deep Learning, Clustering, Computer Vision, Ubuntu Linux, Python, PyCharm, TensorFlow, PyTorch, Natural Language Toolkit (NLTK), Keras, Spark, Sqoop

Portfolio

Logically LTD

Generative Pre-trained Transformers (GPT), GPT...

Tata Consultancy Services

Amazon Web Services (AWS), Scikit-learn, ETL, Apache Sqoop, Apache Hive, Hadoop...

Experience

Python - 5 years, Machine Learning - 4 years, TensorFlow - 3 years, PyTorch - 3 years, Generative Pre-trained Transformers (GPT) - 3 years, Deep Learning - 3 years, Data Science - 3 years

Availability

Part-time

Preferred Environment

Jupyter Notebook, Ubuntu Linux, PyCharm

The most amazing achievement was securing 24th place out of 4,551 teams worldwide, with a final ROC-AUC score of 0.9877, in the Toxic Comment Classification Challenge on Kaggle.

Work Experience

Senior Data Scientist

2018 - PRESENT

Logically LTD

Developed a multi-document abstractive text summarization system for news stories using denoising sequence-to-sequence architecture.
Created a scalable algorithm to identify automated accounts (bots) on Twitter that can serve up to 900 million requests per day.
Constructed a stance classification model to identify the stance between a claim and a perspective, helping fact-checkers work more effectively.
Developed an end-to-end pipeline (using Kubernetes) for a topic categorization system, covering everything from collecting training data to deploying the model in a production environment.
Implemented a hate-speech detection model using a state-of-the-art RoBERTa encoder.
Improved the F1 score of the existing headline click-bait detection system by 8%.

Technologies: Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Science, Flask, Google Compute Engine (GCE), Keras, Kubernetes, MongoDB, Scikit-learn, Natural Language Toolkit (NLTK), SpaCy, TensorFlow, PyTorch, Python

Data Engineer

2013 - 2017

Tata Consultancy Services

Worked with different big data technologies to develop ML models for default rate prediction, cluster the client base into different groups, and optimize production jobs.
Improved an existing default rate model’s accuracy from 79% to 84.5% by introducing relevant new features.
Developed an ETL tool for data extraction, filtering, and cleaning using Sqoop, Python, Apache Spark, and Apache Hive.
Developed new functionalities for TCS BaNCS (the core banking product) using COBOL and SQL.

Technologies: Amazon Web Services (AWS), Scikit-learn, ETL, Apache Sqoop, Apache Hive, Hadoop, Python, Scala, Spark

Experience

Analysis of Data Efficiency for Model-free Deep Reinforcement Learning Algorithms

https://github.com/Patel-Dhaval-M/MSC_project

I implemented different variants of model-free deep reinforcement learning algorithms in Python and analyzed the data efficiency of each of the variants.

The overall objectives of the project were to:
• Configure the MuJoCo simulator to work on Windows.
• Implement the deep deterministic policy gradient (DDPG) algorithm with generalized advantage estimation (GAE), and asynchronous DDPG with multiple updates.
• Analyze the data efficiency of both algorithms, along with the effect of the number of update steps.
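
Generalized advantage estimation, named in the objectives above, can be sketched roughly as follows. This is a minimal illustration and not the project's code: the function name, the list-based interface, and the default values of gamma and lam are all assumptions made for the example.

```python
from typing import List


def generalized_advantage_estimates(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,  # discount factor (assumed default)
    lam: float = 0.95,    # GAE smoothing parameter (assumed default)
) -> List[float]:
    """Compute GAE advantages for one trajectory.

    `values` must hold one more entry than `rewards`: the last entry is the
    value estimate of the state reached after the final step (use 0.0 if
    that state is terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later steps.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With lam set to 0 the estimate reduces to the one-step temporal-difference error, and with lam set to 1 it becomes the discounted return minus the value baseline; values in between trade bias against variance.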

The results and analysis can be found at the GitHub link.

News Story Headline Generation

Generate a headline for a news story (a group of related news articles).

I architected and developed the entire pipeline, which takes a set of news articles, applies the LexRank algorithm to select candidate sentences, and passes them to a natural language generation algorithm that produces the headline of the news story.

I deployed this pipeline on Kubernetes to generate headlines in real time.
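
The sentence-selection step can be illustrated with a simplified LexRank sketch. It is not the production pipeline: the tokenizer, the TF-IDF weighting, the similarity threshold, the damping factor, and all function names are assumptions, and self-links are left out of the graph for brevity.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) + 1.0 for term, count in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences: List[str], threshold: float = 0.1,
            damping: float = 0.85, iterations: int = 50) -> List[float]:
    """Score sentences by centrality in a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    # Link two different sentences when their cosine similarity is high enough.
    adj = [[i != j and cosine(vecs[i], vecs[j]) >= threshold for j in range(n)]
           for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    # PageRank-style power iteration; isolated sentences keep only the base score.
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[j] / degree[j] for j in range(n) if adj[j][i])
            for i in range(n)
        ]
    return scores


def top_sentences(sentences: List[str], k: int = 2) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    scores = lexrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

In a pipeline of this kind, the output of a function like top_sentences would be what is handed to the generation model, so that the headline is based on the sentences most central to the group of articles.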

Large Scale Clustering on a Stack Overflow Dataset Using Apache Spark

https://github.com/Patel-Dhaval-M/Large-Scale-Clustering-using-Apache-Spark

This project applies the K-means clustering algorithm to the Stack Overflow dataset to group similar users and posts. Appropriate features are selected to extract the skill sets of users and the relevance of the posts.

The algorithm is implemented entirely in PySpark to take advantage of the parallel computation of Spark and HDFS. It was first written without Spark's machine learning library (MLlib); the results are discussed and then compared with those obtained using MLlib.
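
For illustration, K-means without a machine learning library can be sketched as below. This is plain single-machine Python rather than PySpark and is not the repository's code; the function names, the random initialization, and the iteration cap are assumptions. In a distributed version, the assignment loop would correspond to a map step and the centroid update to a reduce step.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 20, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def wcss(points: List[Point], centroids: List[Point]) -> float:
    """Within-cluster sum of squares for a set of centroids."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)
```

The within-cluster sum of squares returned by wcss is the quantity usually plotted against the number of clusters when choosing k: it always shrinks as k grows, and the point where the improvement levels off is taken as a reasonable choice.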

The elbow method was applied to find the optimal number of clusters for both the user and post datasets. Two additional functions normalize the data and one-hot encode string-type fields (e.g., badges, tags).
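
The two helper functions can be pictured along the following lines. This is a sketch only: the project description does not say which normalization was used, so min-max scaling is an assumption, as are the function names and the sample badge values.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1] (min-max scaling is an assumed choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Map each string label to a 0/1 vector over the sorted vocabulary."""
    vocabulary = sorted(set(labels))
    index = {label: i for i, label in enumerate(vocabulary)}
    encoded = []
    for label in labels:
        row = [0] * len(vocabulary)
        row[index[label]] = 1
        encoded.append(row)
    return vocabulary, encoded


print(min_max_normalize([10, 20, 30]))            # [0.0, 0.5, 1.0]
print(one_hot_encode(["gold", "bronze", "gold"]))  # (['bronze', 'gold'], [[0, 1], [1, 0], [0, 1]])
```

Scaling matters for K-means because the algorithm relies on Euclidean distance, so a feature with a large numeric range would otherwise dominate the clustering.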

Education

2017 - 2018

Master's Degree in Big Data Science

Queen Mary University of London - London, UK

2009 - 2013

Bachelor of Engineering Degree in Computer Science

University of Mumbai - Mumbai, India

Certifications

APRIL 2020 - PRESENT

Nanodegree in Data Structures and Algorithms

Udacity

APRIL 2018 - PRESENT

Deep Learning Specialization

Deeplearning.ai via Coursera

JULY 2017 - PRESENT

Machine Learning Specialization

University of Washington via Coursera

Skills

Libraries/APIs

TensorFlow, PyTorch, SpaCy, Natural Language Toolkit (NLTK), Keras, Scikit-learn

Tools

PyCharm, Google Compute Engine (GCE), Apache Sqoop

Languages

C++, Python, Scala

Paradigms

Data Science, ETL

Frameworks

Spark, Flask, Hadoop

Platforms

Ubuntu Linux, Jupyter Notebook, Kubernetes, Amazon Web Services (AWS)

Storage

Database Structure, MongoDB, Apache Hive

Other

Natural Language Processing (NLP), Computer Vision, Machine Learning, Reinforcement Learning, Data Analysis, Big Data, Bayesian Inference & Modeling, Data Mining, Operating Systems, Artificial Intelligence (AI), Web Programming, Graph Theory, Data Structures, Algorithms, Computer Graphics, Software Engineering, Deep Learning, Regression, Classification, Clustering, Generative Pre-trained Transformers (GPT)
