
Dragos Tudor
Technical Leader and Developer
Dragos is a technical leader who touched the lives of 1.5 million users and generated $50 million in business value by building and deploying machine learning implementations for international enterprises, SMEs, and startups. Dragos has worked across the entire engineering pipeline with both executives and data scientists, building production-ready recommender systems, advanced NLP models, time-series forecasting data products and classifiers, and other custom advanced analytics capabilities.
Portfolio
Experience
Python - 5 years
Data Science - 5 years
Natural Language Processing (NLP) - 4 years
GPT - 4 years
Neural Networks - 4 years
Generative Pre-trained Transformers (GPT) - 4 years
Technical Leadership - 4 years
XGBoost - 3 years
Preferred Environment
Google Cloud Platform (GCP), Amazon Web Services (AWS), Amazon WorkSpaces, Amazon SageMaker, Python, R, TensorFlow, Linux
The most amazing...
...projects I've built are transformer-based deep learning models and custom embeddings on 1.5 billion multilingual emails for detecting spear-phishing attacks.
Work Experience
Founder | Senior Data Scientist
Quasar Labs
- Consulted enterprise, SMB, and startup clients on the implementation of cutting-edge machine learning capabilities for a variety of use cases with the express goal of increasing performance and impact.
- Communicated with executives, senior managers, and teams of data scientists from over 20 companies and over 40 countries.
- Implemented deep learning neural networks using CNNs in TensorFlow for object detection and recognition—earthquake impact detection, receipt text detection, valve defect, and wear and tear detection.
- Built custom learners for revenue forecasting in retail using seasonal ARIMA and RNNs on 85GB of hourly sampled data. Deployed models in a real-time production environment using Docker, Flask, AWS, PostgreSQL, and MySQL Server.
- Implemented optical character recognition (OCR) for automated receipt text extraction and classification using Google OCR, TensorFlow, Flask, and Keras.
- Developed an end-to-end training pipeline to predict user churn for a telecom client from the Bahamas. The architecture leveraged time-to-event RNNs and gradient boosted decision trees.
Founder
DataZip
- Collected, processed, and controlled the distribution of auto dual dash-cam imagery and telematics data, as well as healthcare imagery.
- Built pipelines for cleaning, processing, classification, and anomaly detection applied to 1080p and 720p footage at 30fps.
- Synchronized the telematics and dash-cam video footage using audio recordings, Fast Fourier Transform (FFT) convolutions, de-noising, and signal processing techniques.
- Implemented image semantic segmentation, road object classification, identification of rapid decelerations/braking, and detection of near-misses and collisions.
- Managed client interactions, projects, and development.
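The audio-based synchronization mentioned above can be illustrated with a minimal sketch, assuming two mono audio tracks sampled at the same rate. It estimates the offset between them with FFT-based cross-correlation in NumPy. The function name `estimate_lag` and the synthetic signals are hypothetical and are not taken from the DataZip codebase; de-noising is left out.

```python
import numpy as np

def estimate_lag(reference, delayed):
    """Return how many samples `delayed` lags behind `reference` (negative if it leads)."""
    n = len(reference) + len(delayed) - 1
    # Zero-pad both signals to the full linear-correlation length
    # so the circular FFT product does not wrap around.
    f_ref = np.fft.rfft(reference, n)
    f_del = np.fft.rfft(delayed, n)
    # Cross-correlation via the convolution theorem.
    corr = np.fft.irfft(f_del * np.conj(f_ref), n)
    peak = int(np.argmax(corr))
    # Indices at or beyond len(delayed) correspond to negative lags.
    return peak if peak < len(delayed) else peak - n

# Synthetic example: the second track is the first one shifted by 37 samples.
rng = np.random.default_rng(0)
track_a = rng.standard_normal(1000)
track_b = np.concatenate([np.zeros(37), track_a])[:1000]
lag = estimate_lag(track_a, track_b)
```

Dividing the estimated lag by the sampling rate gives the time offset to apply when aligning the video with the telematics stream.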
Data Scientist | Natural Language Research Engineer
Tessian
- Developed language models, transfer learning, text analysis, classification and clustering, few-shot learning, embeddings, and attention mechanisms for RNNs across 100GB of email data.
- Pioneered techniques such as unsupervised data augmentation, weak supervision in Snorkel MeTaL, and multi-task learning for malicious data classification.
- Implemented end-to-end machine learning models in production, using TensorFlow, AWS S3 and Athena, and SageMaker on both CPU and GPU-based architectures.
- Proactively explored and analyzed the compatibility of string similarity matching using one-shot learning and siamese networks across multiple use cases.
- Implemented various codebase improvements, testing automation, parallelized processing, and documentation design.
Data Scientist
Apsara Capital
- Led the development and implementation of the data analysis and research infrastructure.
- Developed the AWS S3, Lambda, EC2, and Docker orchestration for extracting, processing, and storing financial, economic, and market data from the Thomson Reuters Eikon API.
- Built an NLP language model using Snorkel and MeTaL for the analysis of earnings call transcripts.
- Created the technical analysis infrastructure using R and a set of 20 customizable technical indicators.
- Designed the codebase, automated the testing, integrated the production, and generated and managed documentation.
Data Scientist
Tracktics GmbH
- Analyzed time series data for motion classification and identification of activity bursts using CNNs, Bayesian models, and Monte Carlo simulations.
- Supported the development of the analytical pipeline and user segmentation capabilities using AWS S3, AWS Lambda, and EC2.
- Implemented data management and visualization with AWS SQS, S3, DynamoDB, Python, Pandas, and Bokeh.
- Developed a general motion analysis over triaxial accelerometer, gyroscope, and magnetometer data, in addition to GPS and video.
- Proactively researched sports analytics, documentation management, scrum integration, and agile methodologies.
Data Scientist | Analyst
PredictX
- Took the initiative and improved sales forecasting capabilities by more than 20% as part of an MVP for a retail client with 700 POS. Used tree-based/linear models and 40TB+ extraneous variables such as weather, events, and client-specific metrics.
- Drove business decisions by researching, testing, and integrating various regression and classification-based models using Python Scikit-learn, TensorFlow, and Keras.
- Led the implementation of end-to-end ETL processes using Python, MySQL, PostgreSQL, and Knime.
- Applied association rule mining with Neo4j Graph data representations for product recommendations in retail. Replicated results in production and supported the transition of the research initiative to a new market-ready product.
- Developed an insurance algorithm for seismic and flood risk computation using MCMC.
- Delivered codebase improvements via the use of in-memory processing with Spark and Hadoop.
Research Assistant
University of Glasgow — Urban Big Data Centre
- Started with no knowledge of machine learning and coding and ended up building an eCommerce recommender system that relied on RNNs and collaborative filtering to predict user-product relevance.
- Learned C# from scratch and developed an Android app with Xamarin, which aimed to collect sensitive data from mobile devices. Developed the solution end-to-end (front end, back end, and documentation) and paired it with a MySQL database for storage.
- Manipulated high-dimensional datasets of 120 GB+ for feature creation using Python Pandas, PostgreSQL, RDD in Hadoop DFS, and Spark. Visualized the data using Tableau, Stata, and LaTeX.
- Reviewed, replicated, and analyzed a variety of state-of-the-art research papers about recommender systems, information retrieval, and distributed systems.
- Used GPU and parallel computing, together with Spark and Hadoop, to model 100 GB+ datasets in a research environment on an on-premise cluster.
Assistant Brand Manager
Procter & Gamble
- Led a competitive analysis initiative across nine SEE regions.
- Co-led a team of 5-10 people to launch Pampers Premium Care’s biggest innovation in the past five years and a Pampers UNICEF PR campaign across four SEE regions.
- Identified pricing gaps and researched and presented viable solutions to increase the company’s competitiveness in four SEE regions.
Co-founder
Crowd Augur
- Designed the project to harness video gamers’ actions for augmenting data analysis algorithms. Started in collaboration with five McGill-based bioinformatics and computer science researchers.
- Took the initiative to secure meetings with top executives, which led to several partnership agreements and four qualified clients from healthcare and finance.
- Proposed and developed a unique business model for bringing more accurate data analysis to genomics and finance.
- Ranked 5/150+ in McGill University’s Dobson Startup Cup.
Assistant Manager
Maximal Group
- Proposed and built the company’s first online store and promoted it using SEO, Google AdWords, and Analytics. This initiative led to a fourfold increase in new customer acquisition and a 13% increase in sales in the first three months.
- Took the initiative to propose and coordinate a Kaizen/Lean-inspired waste reduction program that contributed to a 30% leftover reduction.
- Managed suppliers and negotiated bulk purchases, which led to a 5% reduction in raw material costs.
Experience
Inventory Depletion Modeling
Satellite Building Damage Detection
https://github.com/tudoriliuta/CollapseView
Traffic Accident Modeling
https://github.com/tudoriliuta/RoadAccidentPrediction
Mood Music
https://github.com/tudoriliuta/MoodMusic
Smart Notification Management System
Association Rule Learning for eCommerce
Housing Market Price Prediction
1. London housing market price predictions—stacked learners and seasonal ARIMA-based models.
2. Forecasted the error of Zillow's internal model better than 93% of other submitted models; used stacked models in Python.
DermaView: Skin Lesion Detection, Segmentation, and Categorization
Allergen-aware Food Recipe Recommendations Using Graph Embeddings
Some users might be allergic to peanuts, which might not be an issue if the dish contains Brazil nuts. Similarly, a user might be intolerant to peanuts, but not if the amount in a given dish is small.
For the two types of users, the perceived risk can differ. In the first case, the user perceives Brazil nuts as dangerous, while their real risk is low (the restaurant might also process groundnuts), and in the second case, the perceived risk is medium, but the user can decide if it's acceptable. All of these allergen-ingredient 'risk' relationships are approved by an expert and categorized.
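As a rough illustration of the categorized allergen-ingredient relationships described above, the sketch below stores them as a plain lookup table and returns the most severe category for a dish. It is a simplification under stated assumptions, not the project's graph-embedding model: the category names, the example entries, and the function `dish_risk` are all hypothetical.

```python
# Hypothetical risk categories, ordered from least to most severe.
RISK_ORDER = ["none", "low", "medium", "high"]

# Hypothetical expert-approved (allergen, ingredient) -> risk category entries.
EXPERT_RISK = {
    ("peanut", "peanut"): "high",
    ("peanut", "brazil nut"): "low",      # e.g. possible cross-contamination
    ("peanut", "trace peanut"): "medium",
}

def dish_risk(user_allergens, ingredients):
    """Return the most severe expert-assigned risk category for a dish."""
    worst = "none"
    for allergen in user_allergens:
        for ingredient in ingredients:
            level = EXPERT_RISK.get((allergen, ingredient), "none")
            if RISK_ORDER.index(level) > RISK_ORDER.index(worst):
                worst = level
    return worst

brazil_nut_dish = dish_risk({"peanut"}, ["brazil nut", "rice"])
peanut_dish = dish_risk({"peanut"}, ["peanut", "rice"])
```

A recommender could then surface the category next to each recipe and leave the final decision to the user, as in the medium-risk case described above.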
Secure Aggregation, Analysis, and Sharing of DICOM Radiology Data
Access to anonymized imagery is offered on-demand, to verified research departments, startups, and other partners, via virtual machines (VMs) hosted in a private cloud with strict data management and exfiltration prevention protocols.
Skills
Languages
Python 3, Python, SQL, Bash, HTML, CSS, C#, R, Java, JavaScript
Frameworks
Flask, Spark, Scrapy, Hadoop, Django
Libraries/APIs
SciPy, NumPy, Scikit-learn, TensorFlow, PySpark, XGBoost, Keras, Pandas, Matplotlib, Natural Language Toolkit (NLTK), OpenCV, Spark ML, Amazon EC2 API, NetworkX, Spark Streaming, PyTorch
Tools
Tableau, Microsoft PowerPoint, Microsoft Excel, Amazon Athena, PyCharm, IPython Notebook, Amazon SageMaker, Amazon WorkSpaces, LaTeX, Reuters Eikon, Amazon Simple Queue Service (SQS), TensorBoard, AWS Glue
Paradigms
Requirements Analysis, Object-oriented Programming (OOP), Data Science, Management, Siamese Neural Networks
Platforms
Amazon Web Services (AWS), WordPress, Amazon EC2, iOS, Windows, Jupyter Notebook, AWS Lambda, Google Cloud Platform (GCP), Linux, Docker, Azure, Ubuntu, KNIME, Android, Kubernetes
Storage
MySQL, Amazon S3 (AWS S3), MongoDB, Databases, Graph Databases, Amazon DynamoDB, Neo4j
Industry Expertise
Project Management, Retail & Wholesale, Marketing, Healthcare
Other
Machine Learning, Data Analysis, Data, Unstructured Data Analysis, Complex Data Analysis, Scientific Data Analysis, Exploratory Data Analysis, Prescriptive Analytics, Prescriptive Modeling, Predictive Analytics, Statistical Analysis, Random Forest Regression, Regression, Decision Tree Regression, Logistic Regression, Linear Regression, Regression Modeling, Classification, Classification Algorithms, Text Classification, Decision Tree Classification, Stacked Ensemble, Startups, Early-stage Startups, Enterprise Startups, High-tech Startups, Lean Startups, Startup Consulting, Time Series Analysis, Predictive Modeling, Data Reporting, Statistics, Lean, Analytics, Analysis, Research, Data Engineering, OCR, Image Analysis, Statistical Modeling, Statistical Data Analysis, Neural Networks, Statistical Forecasting, Communication, Data Analytics, Natural Language Processing (NLP), Image Recognition, Computer Vision, Natural Language Understanding (NLU), Artificial Intelligence (AI), Artificial Neural Networks (ANN), Deep Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks (RNN), Gradient Boosting, Gradient Boosted Trees, Ensemble Methods, Bootstrapping, Deep Learning, Demand Sizing & Segmentation, Modeling, Pharmaceuticals, GPT, Generative Pre-trained Transformers (GPT), Image Processing, Signal Processing, Technical Leadership, Sentiment Analysis, Warehouses, Branding, Business Strategy, Software Engineering, GNN, Quantitative Modeling, Leadership, Strategy, BERT, Computer Vision Algorithms, Explainable Artificial Intelligence (XAI), Unsupervised Learning, Parquet, Education, Radiology, Fintech, Machine Learning Operations (MLOps), Recommendation Systems, Amazon Kinesis Data Firehose, Lean Project Management, Grakn, Directed Acyclic Graphs (DAG), GraphSAGE, Food Safety, Food Science, DICOM, Healthcare IT, Healthcare Management Systems
Education
Graduate Diploma in Mathematics
London School of Economics - London, UK
Master's Degree in Economics, Econometrics, and Management
University of Glasgow - Glasgow, Scotland
Exchange in Strategy and Computer Science
McGill University - Montreal, Canada
Bachelor's Degree in Mathematics and Management
University of Babes-Bolyai - Cluj-Napoca, Romania