
Dhaval Patel
Verified Expert in Engineering
Data Scientist and Developer
London, United Kingdom
Toptal member since August 18, 2020
Dhaval is a data scientist and engineer with a proven track record of applying ahead-of-the-curve technologies to a range of data-driven problems. These have included extracting information from natural language to aid fact-checkers' decision-making, classifying tweets in real time to stop the spread of misinformation, and analyzing large volumes of news articles. Dhaval is always interested in new opportunities to apply and extend his expertise and to explore new areas.
Experience
- Python - 5 years
- Machine Learning - 4 years
- TensorFlow - 3 years
- PyTorch - 3 years
- Generative Pre-trained Transformers (GPT) - 3 years
- Deep Learning - 3 years
- Data Science - 3 years
- Natural Language Processing (NLP) - 3 years
Preferred Environment
Jupyter Notebook, Ubuntu Linux, PyCharm
The most amazing achievement was securing 24th place out of 4,551 teams worldwide, with a final ROC-AUC score of 0.9877, in Kaggle's Toxic Comment Classification Challenge.
Work Experience
Senior Data Scientist
Logically LTD
- Developed a multi-document abstractive text summarization system for news stories using a denoising sequence-to-sequence architecture.
- Created a scalable algorithm to identify automated accounts (bots) on Twitter that can serve up to 900 million requests per day.
- Constructed a stance classification model to identify the stance between a claim and a perspective, helping fact-checkers work more effectively.
- Developed an end-to-end pipeline (using Kubernetes) for a topic categorization system, covering everything from collecting training data to deploying the model in a production environment.
- Implemented a hate-speech detection model using a state-of-the-art RoBERTa encoder.
- Improved the F1 score of the existing clickbait headline detection system by 8%.
Data Engineer
Tata Consultancy Services
- Worked with various big data technologies to develop ML models for default rate prediction, cluster the client base into groups, and optimize production jobs.
- Improved an existing default rate model's accuracy from 79% to 84.5% by introducing relevant new features.
- Developed an ETL tool for data extraction, filtering, and cleaning using Sqoop, Python, Apache Spark, and Apache Hive.
- Developed new functionalities for TCS BaNCS (the core banking product) using COBOL and SQL.
Projects
Analysis of Data Efficiency for Model-free Deep Reinforcement Learning Algorithms
https://github.com/Patel-Dhaval-M/MSC_project
The overall objectives of the project can be summarized as follows:
• Configure the MuJoCo simulator to work on Windows.
• Implement the deep deterministic policy gradient (DDPG) algorithm with generalized advantage estimation (GAE), and asynchronous DDPG with multiple updates.
• Analyze the data efficiency of both algorithms as the number of update steps varies.
The results and analysis can be found at the GitHub link.
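The GAE step named in the objectives above can be sketched as follows. This is a minimal plain-Python illustration of the standard GAE recursion, not code from the linked repository; the function name `compute_gae`, its argument layout, and the defaults γ = 0.99 and λ = 0.95 are assumptions.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: generalized advantage estimation over one rollout.

    values[t] is the critic's estimate V(s_t); last_value is V(s_T) for the
    state after the final step, used to bootstrap if the episode did not end.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk backwards so each advantage can reuse the one computed after it.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, reset at episode boundaries
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages
```

Setting `lam=0` reduces the estimate to one-step TD errors, while `lam=1` gives discounted returns minus the value baseline; values in between trade bias against variance.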
News Story Headline Generation
I architected and developed the entire pipeline, which takes a set of news articles, applies the LexRank algorithm to select candidate sentences, and passes them to a natural language generation model that generates the news story's headline.
I deployed this pipeline on Kubernetes to generate headlines in real time.
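LexRank (Erkan and Radev) scores each sentence by its centrality in a sentence-similarity graph. A minimal, dependency-free sketch of the candidate-selection step follows; it is not the production pipeline, and the helper names (`lexrank`, `select_candidates`), the smoothed TF-IDF weighting, the 0.1 similarity threshold, and the 0.85 damping factor are all illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _tokenize(sentence: str) -> List[str]:
    # Simple lowercase word tokens; a real pipeline would use a proper tokenizer.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def _tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    n = len(docs)
    # Document frequency: the number of sentences containing each term.
    df = Counter(term for tokens in docs for term in set(tokens))
    # Smoothed IDF so that terms appearing in every sentence keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in docs]


def _cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences: List[str], threshold: float = 0.1,
            damping: float = 0.85, iterations: int = 100) -> List[float]:
    n = len(sentences)
    if n == 0:
        return []
    vectors = _tfidf([_tokenize(s) for s in sentences])
    # Unweighted graph: link sentences whose similarity exceeds the threshold.
    # The diagonal is always linked, so every node has a nonzero degree.
    adj = [[1.0 if i == j or _cosine(vectors[i], vectors[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    # Power iteration for the damped stationary distribution (PageRank-style).
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * adj[i][j] / degree[i] for i in range(n))
                  for j in range(n)]
    return scores


def select_candidates(sentences: List[str], k: int = 3,
                      threshold: float = 0.1) -> List[str]:
    scores = lexrank(sentences, threshold=threshold)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Keep the k most central sentences, restored to their original order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Returning the selected sentences in their original order keeps the input to the generation model readable, since news sentences often depend on the ones before them.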
Large Scale Clustering on a Stack Overflow Dataset Using Apache Spark
https://github.com/Patel-Dhaval-M/Large-Scale-Clustering-using-Apache-Spark
The algorithm is implemented entirely in PySpark to take advantage of Spark's parallel computation and HDFS. The clustering is implemented without Spark's Machine Learning library (MLlib); the results are discussed and then compared with those obtained using MLlib.
The elbow method was applied to find the optimal number of clusters for both the users and posts datasets. Additionally, two helper functions were written: one to normalize the data and one to one-hot encode string-type data (e.g., badges, tags).
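The steps described above (normalization, one-hot encoding, k-means without MLlib, and the elbow method) can be sketched on a single machine in plain Python. This is an illustrative stand-in rather than the project's PySpark code: in Spark, the assignment step would typically be a `map` over an RDD and the centroid update a `reduceByKey`. All function names here are hypothetical, and the fixed iteration count and random initialization are simplifying assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = List[float]


def min_max_normalize(rows: List[Point]) -> List[Point]:
    # Rescale each column to [0, 1]; constant columns become 0.
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def one_hot_encode(rows: Sequence[Sequence[str]]) -> Tuple[List[str], List[Point]]:
    # One column per distinct string (e.g., a badge or tag); a row with
    # several tags gets a 1.0 in each matching column.
    vocab = sorted({v for row in rows for v in row})
    index = {v: i for i, v in enumerate(vocab)}
    encoded = []
    for row in rows:
        vec = [0.0] * len(vocab)
        for v in row:
            vec[index[v]] = 1.0
        encoded.append(vec)
    return vocab, encoded


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 20,
           seed: int = 0) -> Tuple[List[Point], float]:
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step (a map in Spark terms): nearest centroid per point.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step (a reduceByKey in Spark terms): mean of each cluster.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Within-cluster sum of squared errors (WSSSE), used for the elbow plot.
    wssse = sum(min(_sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wssse


def elbow_curve(points: List[Point], max_k: int) -> Dict[int, float]:
    # WSSSE for k = 1..max_k; the "elbow" is where the drop levels off.
    return {k: kmeans(points, k)[1] for k in range(1, max_k + 1)}
```

To apply the elbow method, plot the returned WSSSE values against k and pick the k after which the curve stops dropping sharply.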
Education
Master's Degree in Big Data Science
Queen Mary University of London - London, UK
Bachelor of Engineering Degree in Computer Science
University of Mumbai - Mumbai, India
Certifications
Nanodegree in Data Structures and Algorithms
Udacity
Deep Learning Specialization
Deeplearning.ai via Coursera
Machine Learning Specialization
University of Washington via Coursera
Skills
Libraries/APIs
TensorFlow, PyTorch, spaCy, Natural Language Toolkit (NLTK), Keras, Scikit-learn
Tools
PyCharm, Google Compute Engine (GCE), Apache Sqoop
Languages
C++, Python, Scala
Frameworks
Spark, Flask, Hadoop
Paradigms
ETL
Platforms
Ubuntu Linux, Jupyter Notebook, Kubernetes, Amazon Web Services (AWS)
Storage
Database Structure, MongoDB, Apache Hive
Other
Data Science, Natural Language Processing (NLP), Computer Vision, Machine Learning, Reinforcement Learning, Data Analysis, Big Data, Bayesian Inference & Modeling, Data Mining, Operating Systems, Artificial Intelligence (AI), Web Programming, Graph Theory, Data Structures, Algorithms, Computer Graphics, Software Engineering, Deep Learning, Regression, Classification, Clustering, Generative Pre-trained Transformers (GPT)