Dragos Tudor, Developer in London, United Kingdom

Dragos Tudor

Verified Expert in Engineering

Technical Leader and Developer

Location
London, United Kingdom
Toptal Member Since
March 6, 2019

Dragos is a technical leader who touched the lives of 1.5 million users and generated $50 million in business value by building and deploying machine learning implementations for international enterprises, SMEs, and startups. Dragos has worked across the entire engineering pipeline with both executives and data scientists, building production-ready recommender systems, advanced NLP models, time-series forecasting data products and classifiers, and other custom advanced analytics capabilities.

Portfolio

Quasar Labs
Data Analysis, Data Engineering, Computer Vision, Deep Learning, XGBoost...
DataZip
Amazon Web Services (AWS), Data Analysis, Data Engineering, Computer Vision...
Tessian
Data Analysis, Data Engineering, Deep Learning, XGBoost...

Experience

Availability

Part-time

Preferred Environment

Google Cloud Platform (GCP), Amazon Web Services (AWS), Amazon WorkSpaces, Amazon SageMaker, Python, R, TensorFlow, Linux

The most amazing...

...projects I've built are transformer-based deep learning models and custom embeddings on 1.5 billion multilingual emails for detecting spear-phishing attacks.

Work Experience

Founder | Senior Data Scientist

2018 - PRESENT
Quasar Labs
  • Consulted enterprise, SMB, and startup clients on the implementation of cutting-edge machine learning capabilities for a variety of use cases with the express goal of increasing performance and impact.
  • Communicated with executives, senior managers, and teams of data scientists from over 20 companies and over 40 countries.
  • Implemented deep learning neural networks using CNNs in TensorFlow for object detection and recognition, including earthquake impact detection, receipt text detection, valve defect detection, and wear-and-tear detection.
  • Built custom learners for revenue forecasting in retail using seasonal ARIMA, RNNs, and 85GB of hourly sampled data. Deployed models in a real-time production environment using Docker, Flask, AWS, PostgreSQL, and MySQL Server.
  • Implemented optical character recognition (OCR) for automated receipt text extraction and classification using Google OCR, TensorFlow, Flask, and Keras.
  • Developed an end-to-end training pipeline to predict user churn for a telecom client from the Bahamas. The architecture leveraged time-to-event RNNs and gradient boosted decision trees.
Technologies: Data Analysis, Data Engineering, Computer Vision, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Python, Data Science, Data Analytics, Spark Streaming, Flask, Sentiment Analysis, Technical Leadership, Data Reporting, Machine Learning, Exploratory Data Analysis, Statistical Analysis, SQL, TensorFlow, R

Founder

2019 - 2020
DataZip
  • Collected, processed, and controlled the distribution of auto dual dash-cam imagery and telematics data, as well as healthcare imagery.
  • Built pipelines for cleaning, processing, classification, and anomaly detection applied to 1080p and 720p footage at 30fps.
  • Synchronized the telematics and dash-cam video footage using audio recordings, Fast Fourier Transform (FFT) convolutions, de-noising, and signal processing techniques.
  • Implemented image semantic segmentation, road object classification, identification of rapid decelerations/braking, and detection of near-misses and collisions.
  • Managed client interactions, projects, and development.
Technologies: Amazon Web Services (AWS), Data Analysis, Data Engineering, Computer Vision, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Python, Data Science, Data Analytics, Statistical Data Analysis, Flask, Sentiment Analysis, Technical Leadership, Android, Data Reporting, Machine Learning, Exploratory Data Analysis, Statistical Analysis, TensorFlow, OpenCV

Data Scientist | Natural Language Research Engineer

2019 - 2019
Tessian
  • Developed language models, transfer learning, text analysis, classification and clustering, few-shot learning, embeddings, and attention-based RNNs across 100GB of email data.
  • Pioneered techniques such as unsupervised data augmentation, weak supervision in Snorkel MeTaL, and multi-task learning for malicious data classification.
  • Implemented end-to-end machine learning models in production, using TensorFlow, AWS S3 and Athena, and SageMaker on both CPU and GPU-based architectures.
  • Proactively explored and analyzed the compatibility of string similarity matching using one-shot learning and siamese networks across multiple use cases.
  • Implemented various codebase improvements, testing automation, parallelized processing, and documentation design.
Technologies: Data Analysis, Data Engineering, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Python, Data Science, Data Analytics, Statistical Data Analysis, Spark Streaming, Flask, Sentiment Analysis, Technical Leadership, Data Reporting, Machine Learning, Exploratory Data Analysis, Statistical Analysis, Amazon Athena, Amazon DynamoDB, Amazon S3 (AWS S3), Docker, Bash, TensorFlow

Data Scientist

2018 - 2018
Apsara Capital
  • Led the development and implementation of the data analysis and research infrastructure.
  • Developed the AWS S3, Lambda, EC2, and Docker orchestration for extracting, processing, and storing financial, economic, and market data from the Thomson Reuters Eikon API.
  • Built an NLP language model using Snorkel and MeTaL for the analysis of earnings call transcripts.
  • Created the technical analysis infrastructure using R and a set of 20 customizable technical indicators.
  • Designed the codebase, automated the testing, integrated the production, and generated and managed documentation.
Technologies: Quantitative Modeling, Data Analysis, Data Engineering, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Python, Data Science, Data Analytics, Statistical Data Analysis, Flask, Sentiment Analysis, Technical Leadership, Data Reporting, Machine Learning, Exploratory Data Analysis, Statistical Analysis, R, Amazon Kinesis Data Firehose, AWS Glue, Amazon Athena, Amazon S3 (AWS S3)

Data Scientist

2017 - 2018
Tracktics GmbH
  • Analyzed time series data for motion classification and identification of activity bursts using CNN, Bayesian models, and Monte Carlo simulations.
  • Supported the development of the analytical pipeline and user segmentation capabilities using AWS S3, AWS Lambda, and EC2.
  • Implemented data management and visualization with AWS SQS, S3, DynamoDB, Python, Pandas, and Bokeh.
  • Developed a general motion analysis over triaxial accelerometer, gyroscope, and magnetometer data in addition to GPS and video.
  • Proactively researched sports analytics, documentation management, scrum integration, and agile methodologies.
Technologies: Data Analysis, Data Engineering, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Python, Data Science, Data Analytics, Statistical Data Analysis, Flask, Data Reporting, Machine Learning, Exploratory Data Analysis, Statistical Analysis, JavaScript, Amazon Web Services (AWS), Django

Data Scientist | Analyst

2017 - 2018
PredictX
  • Took the initiative and improved sales forecasting capabilities by more than 20% as part of an MVP for a retail client with 700 POS. Used tree-based and linear models with 40TB+ of external variables such as weather, events, and client-specific metrics.
  • Drove business decisions by researching, testing, and integrating various regression and classification-based models using Python Scikit-learn, TensorFlow, and Keras.
  • Led the implementation of end-to-end ETL processes using Python, MySQL, PostgreSQL, and Knime.
  • Applied association rule mining with Neo4j Graph data representations for product recommendations in retail. Replicated results in production and supported the transition of the research initiative to a new market-ready product.
  • Developed an insurance algorithm for seismic and flood risk computation using MCMC.
  • Delivered codebase improvements via the use of in-memory processing with Spark and Hadoop.
Technologies: Quantitative Modeling, Data Analysis, Data Engineering, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Python, Data Science, Data Analytics, Statistical Data Analysis, Flask, Sentiment Analysis, Data Reporting, Machine Learning, Exploratory Data Analysis, Statistical Analysis, KNIME, TensorFlow, Neo4j, MySQL, JavaScript

Research Assistant

2016 - 2017
University of Glasgow — Urban Big Data Centre
  • Started with no knowledge of machine learning and coding and ended up building an eCommerce recommender system that relied on RNNs and collaborative filtering to predict user-product relevance.
  • Learned C# from scratch and developed an Android app with Xamarin, which aimed to collect sensitive data from mobile devices. Developed the solution end-to-end (front end, back end, and documentation) and paired it with a MySQL database for storage.
  • Manipulated high-dimensional datasets of 120GB+ for feature creation using Python Pandas, PostgreSQL, RDD in Hadoop DFS, and Spark. Visualized the data using Tableau, Stata, and LaTeX.
  • Reviewed, replicated, and analyzed a variety of state-of-the-art research papers about recommender systems, information retrieval, and distributed systems.
  • Used GPU and parallel computing, along with Spark and Hadoop, to model 100GB+ datasets in a research environment on an on-premise cluster.
Technologies: Data Analysis, Data Engineering, Deep Learning, XGBoost, Artificial Intelligence (AI), Keras, Neural Networks, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Python, Data Science, Data Analytics, Statistical Data Analysis, Sentiment Analysis, Android, Machine Learning, Exploratory Data Analysis, Statistical Analysis, LaTeX, Spark, Hadoop, Java, C#

Assistant Brand Manager

2015 - 2015
Procter & Gamble
  • Led a competitive analysis initiative across nine SEE regions.
  • Co-led a team of 5-10 people for launching Pampers Premium Care’s biggest innovation in the past five years and a Pampers UNICEF PR campaign across four SEE regions.
  • Identified pricing gaps and researched and presented viable solutions to increase the company’s competitiveness in four SEE regions.
Technologies: Data Analysis, Data Science, Data Analytics, Statistical Data Analysis, Analytics, Marketing, Branding, Management, Microsoft Excel, Microsoft PowerPoint

Co-founder

2014 - 2015
Crowd Augur
  • Designed the project to harness video gamers’ actions for augmenting data analysis algorithms. Started in collaboration with five McGill-based bioinformatics and computer science researchers.
  • Took the initiative to secure meetings with top executives, which led to several partnership agreements and four qualified clients from healthcare and finance.
  • Proposed and developed a unique business model for bringing more accurate data analysis to genomics and finance.
  • Ranked 5th out of 150+ in McGill University’s Dobson Startup Cup.
Technologies: Data Analysis, Data Science, Statistical Data Analysis, Leadership, Research, Analysis, Microsoft PowerPoint, Strategy, Business Strategy, Microsoft Excel

Assistant Manager

2010 - 2014
Maximal Group
  • Proposed, built, and promoted the company’s first online store using SEO, Google AdWords, and Analytics. This initiative led to a fourfold increase in new customer acquisition and a 13% increase in sales in the first three months.
  • Took the initiative to propose and coordinate a Kaizen/Lean-inspired waste reduction program that contributed to a 30% leftover reduction.
  • Managed suppliers and negotiated bulk purchases, which led to a 5% reduction in raw material costs.
Technologies: Data Analysis, Data Science, Statistical Data Analysis, Technical Leadership, Management, Lean, Warehouses, Statistical Modeling, WordPress, CSS, HTML, Python 3

Inventory Depletion Modeling

One of the projects that I've worked on is calculating the probability of a warehouse running out of stock, given X_t = X_{t-1} + e^{W_t}, where X_t is the accumulated orders at time t and W_t is a normally distributed random variable with mean 0 and standard deviation 1.5. Orders at t = 0 are X_0 = 10, and the warehouse has 100 units in stock. The task is to estimate the probability of running out of stock at t = 10.
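
The estimate described above can be sketched with a simple Monte Carlo simulation. This is a minimal illustration, not the original project code: it assumes that "running out of stock at t = 10" means accumulated orders X_10 reaching or exceeding the 100 units in stock, and the function name, number of simulated paths, and seed are hypothetical choices.

```python
import math
import random


def stockout_probability(n_paths=100_000, horizon=10, x0=10.0,
                         stock=100.0, sigma=1.5, seed=0):
    """Monte Carlo estimate of P(X_horizon >= stock).

    Simulates X_t = X_{t-1} + exp(W_t) with W_t ~ Normal(0, sigma),
    starting from X_0 = x0. All defaults besides x0, stock, sigma, and
    horizon are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    hits = 0
    for _ in range(n_paths):
        x = x0
        for _ in range(horizon):
            # Each period adds a strictly positive, log-normal order volume.
            x += math.exp(rng.gauss(0.0, sigma))
        if x >= stock:
            hits += 1
    return hits / n_paths


if __name__ == "__main__":
    print(f"Estimated stock-out probability: {stockout_probability():.4f}")
```

Because every increment e^{W_t} is positive, accumulated orders never decrease, so checking X_10 against the stock level is the same as checking whether the warehouse ran out at any point up to t = 10.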

Satellite Building Damage Detection

https://github.com/tudoriliuta/CollapseView
I trained a CNN (convolutional neural network) in TensorFlow to recognize houses from satellite imagery. The aim was to re-run the model on a post-earthquake image to identify collapsed units. The model achieved 97%+ accuracy.

Traffic Accident Modeling

https://github.com/tudoriliuta/RoadAccidentPrediction
I built a model for visualizing clusters of road accidents across the UK. I used kernel density estimation (KDE) and XGBoost (XGB) to visualize and model the accidents, analyzing thousands of them in the process.
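
As a rough illustration of how KDE can surface accident clusters, here is a minimal sketch of a two-dimensional Gaussian kernel density estimate. It is not taken from the linked repository: the function name, the bandwidth, and the coordinates are made up for the example, and it treats coordinates as planar rather than as latitude and longitude.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates density at (x, y) from 2D points."""
    n = len(points)
    # Normalizing constant of an isotropic 2D Gaussian kernel, averaged over n points.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Synthetic, made-up coordinates (not real accident data):
# four points close together and one isolated point.
accidents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
density = gaussian_kde_2d(accidents, bandwidth=0.5)

if __name__ == "__main__":
    print("near cluster:", density(0.05, 0.05))
    print("isolated point:", density(5.0, 5.0))
```

Locations surrounded by many nearby points get a higher estimated density, which is what makes a KDE surface useful for highlighting hotspots on a map.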

Mood Music

https://github.com/tudoriliuta/MoodMusic
This is a project where the music adapts to your emotions with data extracted from your own webcam.

Smart Notification Management System

I designed a smart notification management system. It is an adaptive notifications system that aims to group together multiple notifications based on the users' circadian cycle, popularity, and behavior.

Association Rule Learning for eCommerce

I boosted a UK-based industrial retail client's revenues by 11% by recommending opportunities to upsell.

Housing Market Price Prediction

This project consists of two main parts:
1. London housing market price predictions—stacked learners and seasonal ARIMA-based models.
2. Forecasts of the error of Zillow's internal model that beat 93% of other submitted models, built with stacked models in Python.

DermaView: Skin Lesion Detection, Segmentation, and Categorization

I used RCNN/DCNN and CRF on 50,000+ samples (ISIC, scraped and generated imagery) for identifying over 1,000 skin condition subtypes from HD images.

Allergen-aware Food Recipe Recommendations Using Graph Embeddings

The project's goal was to unify and structure the existing knowledge about dietary preferences, allergens, intolerances, and their interactions. Given the distribution of various allergens, both IgE and non-IgE (delayed response), the goal was to identify the likelihood of one of them being present in a specific ingredient and therefore posing a threat to the user of an app.

Some users might be allergic to peanuts, which might not be an issue if the dish contains Brazil nuts. Similarly, a user might be intolerant to peanuts but able to tolerate them if the amount in a given dish is small.

For the two types of users, the perceived risk can differ. In the first case, the user perceives Brazil nuts as dangerous, while their real risk is low (although the restaurant might also process groundnuts), and in the second case, the perceived risk is medium, but the user can decide if it’s acceptable. All of these allergen-ingredient 'risk' relationships are approved by an expert and categorized.

Secure Aggregation, Analysis, and Sharing of DICOM Radiology Data

Hospitals' DICOM imagery is curated, securely stored, anonymized, calibrated, and automatically annotated by using custom computer vision algorithms, at scale.

Access to anonymized imagery is offered on-demand, to verified research departments, startups, and other partners, via virtual machines (VMs) hosted in a private cloud with strict data management and exfiltration prevention protocols.

Education

2017 - 2021

Graduate Diploma in Mathematics

London School of Economics - London, UK

2012 - 2016

Master's Degree in Economics, Econometrics, and Management

University of Glasgow - Glasgow, Scotland

2014 - 2015

Exchange in Strategy and Computer Science

McGill University - Montreal, Canada

2011 - 2012

Bachelor's Degree in Mathematics and Management

University of Babes-Bolyai - Cluj-Napoca, Romania

Libraries/APIs

SciPy, NumPy, Scikit-learn, TensorFlow, PySpark, XGBoost, Keras, Pandas, Matplotlib, Natural Language Toolkit (NLTK), OpenCV, Spark ML, Amazon EC2 API, NetworkX, Spark Streaming, PyTorch

Tools

Tableau, Microsoft PowerPoint, Microsoft Excel, Amazon Athena, PyCharm, IPython Notebook, Amazon SageMaker, Amazon WorkSpaces, LaTeX, Reuters Eikon, Amazon Simple Queue Service (SQS), TensorBoard, AWS Glue, Amazon Kinesis Data Firehose

Frameworks

Flask, Spark, Scrapy, Hadoop, Django

Languages

Python 3, Python, SQL, Bash, HTML, CSS, C#, R, Java, JavaScript

Paradigms

Requirements Analysis, Object-oriented Programming (OOP), Data Science, Management, Siamese Neural Networks

Platforms

Amazon Web Services (AWS), WordPress, Amazon EC2, iOS, Windows, Jupyter Notebook, AWS Lambda, Google Cloud Platform (GCP), Linux, Docker, Azure, Ubuntu, KNIME, Android, Kubernetes

Storage

MySQL, Amazon S3 (AWS S3), MongoDB, Databases, Graph Databases, Amazon DynamoDB, Neo4j

Industry Expertise

Project Management, Retail & Wholesale, Marketing, Healthcare

Other

Machine Learning, Data Analysis, Data, Unstructured Data Analysis, Complex Data Analysis, Scientific Data Analysis, Exploratory Data Analysis, Prescriptive Analytics, Prescriptive Modeling, Predictive Analytics, Statistical Analysis, Random Forest Regression, Regression, Decision Tree Regression, Logistic Regression, Linear Regression, Regression Modeling, Classification, Classification Algorithms, Text Classification, Decision Tree Classification, Stacked Ensemble, Startups, Early-stage Startups, Enterprise Startups, High-tech Startups, Lean Startups, Startup Consulting, Time Series Analysis, Predictive Modeling, Data Reporting, Statistics, Lean, Analytics, Analysis, Research, Data Engineering, OCR, Image Analysis, Statistical Modeling, Statistical Data Analysis, Neural Networks, Statistical Forecasting, Communication, Data Analytics, Natural Language Processing (NLP), Image Recognition, Computer Vision, Natural Language Understanding (NLU), Artificial Intelligence (AI), Artificial Neural Networks (ANN), Deep Neural Networks, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Gradient Boosting, Gradient Boosted Trees, Ensemble Methods, Bootstrapping, Deep Learning, Demand Sizing & Segmentation, Modeling, Pharmaceuticals, Generative Pre-trained Transformers (GPT), Image Processing, Signal Processing, Technical Leadership, Sentiment Analysis, Warehouses, Branding, Business Strategy, Software Engineering, GNN, Quantitative Modeling, Leadership, Strategy, BERT, Computer Vision Algorithms, Explainable Artificial Intelligence (XAI), Unsupervised Learning, Parquet, Education, Radiology, Fintech, Machine Learning Operations (MLOps), Recommendation Systems, Lean Project Management, Grakn, Directed Acyclic Graphs (DAG), GraphSAGE, Food Safety, Food Science, DICOM, Healthcare IT, Healthcare Management Systems
