Dragos Tudor

Natural Language Processing (NLP) Developer in London, United Kingdom

Member since June 10, 2018
For Dragos, automation, data science, and deep learning have a fundamental appeal. He’s worked across the entire engineering pipeline and has built production-ready recommender systems (rule mining, SVM, RNN), NLP models (language models, transfer learning, embeddings), time-series forecasting data products applied to financial, market, economic, and retail data (ARIMA, SVR, GBM, LSTM, ensemble models), and time-series classifiers (CNN, DTW).

Portfolio

  • Tessian
    Python, NLP, TensorFlow, Bash, Docker, Keras, AWS S3, DynamoDB, Athena
  • Quasar Labs
    Python, R, TensorFlow, SQL, Keras, Flask, TensorFlow Lite
  • Apsara Capital
    Python, AWS S3, Athena, Glue, Firehose, R

Experience

  • Data Science, 5 years
  • Python, 5 years
  • Neural Networks, 4 years
  • Natural Language Processing (NLP), 4 years
  • XGBoost, 3 years
  • Keras, 3 years
  • TensorFlow, 3 years
  • Computer Vision, 2 years

Location

London, United Kingdom

Availability

Part-time

Preferred Environment

Python, AWS, Linux, Ubuntu, TensorFlow, Bash, R

The most amazing...

...project involved custom-made email embeddings, weak supervision, and unsupervised data augmentation for classifying malicious content across 200 million emails.

Employment

  • Data Scientist (NLP Research)

    2019 - PRESENT
    Tessian
    • Developed language models, transfer learning, text analysis, classification and clustering, few-shot learning, embeddings, and attention-based RNNs on 100 GB of email data.
    • Worked on unsupervised data augmentation, weak supervision with Snorkel MeTaL, and multi-task learning for classifying malicious emails.
    • Implemented end-to-end machine learning models in production using TensorFlow, AWS S3/Athena, and SageMaker on both CPU- and GPU-based architectures.
    • Worked on string similarity and matching with one-shot learning and Siamese networks (a sketch appears at the end of this section).
    • Implemented various codebase improvements, testing automation, parallelized processing, and documentation design.
    Technologies: Python, NLP, TensorFlow, Bash, Docker, Keras, AWS S3, DynamoDB, Athena
  • Founder

    2018 - PRESENT
    Quasar Labs
    • Consulted on the implementation of cutting-edge machine learning research for various companies with the express goal of increasing performance and impact.
    • Implemented deep learning using CNNs in TensorFlow for object detection and recognition (earthquake impact detection and receipt text detection).
    • Developed an end-to-end training pipeline for churn prediction in telecom using a time-to-event RNN and gradient-boosted decision trees.
    • Built custom learners for revenue forecasting in retail using seasonal ARIMA and RNNs over 85 GB of hourly sampled data (a SARIMAX sketch appears at the end of this section). Deployed them in production for near real-time prediction using Bash and Docker on private infrastructure running PostgreSQL and MySQL Server.
    • Implemented OCR (optical character recognition) for automated receipt text extraction and classification using Google OCR, TensorFlow, Flask, and Keras.
    Technologies: Python, R, TensorFlow, SQL, Keras, Flask, TensorFlow Lite
  • Data Scientist | Engineer (Contract)

    2018 - 2018
    Apsara Capital
    • Led the development and implementation of the data analysis and research infrastructure.
    • Developed the AWS S3, Lambda, EC2, and Docker orchestration for extracting, processing, and storing financial, economic, and market data from the Thomson Reuters Eikon API.
    • Built an NLP language model using Snorkel and MeTaL for the analysis of earnings call transcripts.
    • Created the technical analysis infrastructure using R and a set of 20 customizable technical indicators.
    • Designed the codebase, automated the testing, integrated it into production, and generated and managed the documentation.
    Technologies: Python, AWS S3, Athena, Glue, Firehose, R
  • Data Scientist | Analyst (Contract)

    2017 - 2018
    Tracktics GmbH
    • Analyzed time series data for motion classification and identification of activity bursts using CNN, Bayesian models, and Monte Carlo simulations.
    • Supported the development of the analytical pipeline and user segmentation capabilities using AWS S3, AWS Lambda, and EC2.
    • Implemented data management and visualization with AWS SQS, S3, DynamoDB, Python, and Pandas/Bokeh.
    • Developed a general motion analysis over triaxial accelerometer, gyroscope, and magnetometer data, in addition to GPS and video.
    • Researched sports analytics, documentation management, and Scrum integration.
    Technologies: Python, Django, Amazon Web Services (AWS), JavaScript
  • Data Scientist | Analyst

    2017 - 2018
    Predict X
    • Implemented forecasting models, including retail sales analysis, using more than 40 TB of external data such as weather, events, and client-specific metrics. Used proprietary infrastructure based on PostgreSQL, Vector, Bash, and TensorFlow/Scikit-learn.
    • Drove business decisions by researching, testing, and integrating various regression- and classification-based models using Python’s Scikit-learn, TensorFlow, and Keras.
    • Implemented end-to-end ETL processes using Python, MySQL, PostgreSQL, and Knime. Managed and refined the codebase, creating numerous efficiencies through multiprocessing and the introduction of Spark and Hadoop.
    • Applied association rule mining over a Neo4j graph database for product recommendations in retail (a sketch appears at the end of this section). Replicated the results in production and supported the transition of the research initiative to a new market-ready product.
    • Developed an insurance algorithm for seismic and flood risk computation using MCMC.
    Technologies: Python, JavaScript, MySQL, Neo4j, Keras, TensorFlow, Knime
  • Research Assistant

    2016 - 2017
    University of Glasgow — Urban Big Data Centre
    • Built an eCommerce recommendation system that predicted user-product relevance via RNNs and collaborative filtering.
    • Developed a C# app in Xamarin for sensitive data collection from Android mobile devices. Created a full end-to-end solution from front-end to back-end automation using a remote MySQL database for data storage.
    • Manipulated high-dimensional datasets (120 GB+) for feature creation using Python Pandas, PostgreSQL, and Spark RDDs over Hadoop HDFS. Visualized the data using Tableau, Stata, and LaTeX.
    • Worked on machine learning research and paper replication, using GPU and parallel computing to model 100 GB+ datasets with Spark and Hadoop on an on-premise cluster.
    Technologies: C#, Java, Python, Xamarin, Hadoop, Spark, LaTeX, Stata
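
The string similarity and matching work at Tessian, referenced above, compared strings through a Siamese (shared-weight) encoder. Below is a minimal sketch of that general idea in tf.keras; the character-level GRU encoder, the toy dimensions, and the L1-distance-plus-sigmoid head are illustrative assumptions, not the production architecture.

    # Minimal Siamese string-similarity sketch (illustrative, not Tessian's code).
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    MAX_LEN, VOCAB_SIZE, EMB_DIM = 40, 128, 32  # assumed toy dimensions

    def char_encoder():
        """Shared character-level encoder: embedding + bidirectional GRU."""
        inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
        x = layers.Embedding(VOCAB_SIZE, EMB_DIM, mask_zero=True)(inp)
        x = layers.Bidirectional(layers.GRU(64))(x)
        return Model(inp, x, name="char_encoder")

    encoder = char_encoder()
    left = layers.Input(shape=(MAX_LEN,), dtype="int32")
    right = layers.Input(shape=(MAX_LEN,), dtype="int32")

    # The same encoder (shared weights) embeds both strings.
    l_vec, r_vec = encoder(left), encoder(right)

    # Element-wise L1 distance feeds a sigmoid "same/different" head.
    diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([l_vec, r_vec])
    score = layers.Dense(1, activation="sigmoid")(diff)

    siamese = Model([left, right], score)
    siamese.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

Training pairs would be integer-encoded character sequences with a binary matched/mismatched label; sharing the encoder is what lets the model generalize to strings it has seen only once.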
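
The retail revenue forecasting at Quasar Labs used seasonal ARIMA over hourly data. Here is a minimal sketch of a seasonal model with statsmodels; the synthetic series and the (1, 1, 1)x(1, 1, 1, 24) orders are assumptions for illustration, not the tuned production values.

    # Seasonal ARIMA (SARIMAX) sketch on a synthetic hourly revenue series.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    idx = pd.date_range("2019-01-01", periods=24 * 30, freq="H")
    y = pd.Series(100 + 10 * np.sin(np.arange(len(idx)) * 2 * np.pi / 24)
                  + np.random.normal(0, 2, len(idx)), index=idx)

    # Daily (24-step) seasonality; the orders are placeholders, not tuned values.
    model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 24),
                    enforce_stationarity=False, enforce_invertibility=False)
    fit = model.fit(disp=False)

    # Forecast the next 24 hours with a 95% prediction interval.
    forecast = fit.get_forecast(steps=24)
    print(forecast.predicted_mean.head())
    print(forecast.conf_int(alpha=0.05).head())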
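
The Predict X recommendations applied association rule mining over a Neo4j graph; as a rough stand-in for that pipeline, the sketch below mines rules from a plain transaction list with mlxtend. The baskets and thresholds are made up.

    # Association rule mining sketch with mlxtend (illustrative transactions).
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [["drill", "drill bits", "gloves"],
                    ["gloves", "safety glasses"],
                    ["drill", "drill bits"],
                    ["drill", "gloves", "safety glasses"]]

    # One-hot encode the baskets, then mine frequent itemsets and rules.
    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)
    itemsets = apriori(onehot, min_support=0.3, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)

    # Rank upsell candidates by lift: "customers who bought X also bought Y."
    print(rules.sort_values("lift", ascending=False)
               [["antecedents", "consequents", "confidence", "lift"]])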

Projects

  • Satellite Building Damage Detection (Development)
    https://github.com/tudoriliuta/CollapseView

    I trained a CNN (convolutional neural network) in TensorFlow to recognize houses in satellite imagery. The aim was to rerun the model on post-earthquake imagery to identify collapsed units; it reached 97%+ accuracy. A transfer-learning sketch appears after the project list.

  • Traffic Accident Modeling (Development)
    https://github.com/tudoriliuta/RoadAccidentPrediction

    I built a model for visualizing clusters of road accidents across the UK, using kernel density estimation (KDE) for the hotspot maps and XGBoost for the modeling (a KDE sketch appears after the project list).

  • Mood Music (Development)
    https://github.com/tudoriliuta/MoodMusic

    This project adapts music to your emotions, using data extracted from your own webcam.

  • Association Rule Learning for eCommerce (Other amazing things)

    I boosted a UK-based industrial retail client's revenues by 11% by recommending opportunities to upsell.

  • Housing Market Price Prediction (Development)
    https://drive.google.com/open?id=1XxDoQr46Nm_gBBSBpdTimzsENL83neyK

    This project consists of two main parts:
    1. Predicted London housing market prices using stacked learners and seasonal ARIMA-based models.
    2. Forecasted the error of Zillow's internal model better than 93% of other submitted models, using stacked models in Python.

  • DermaView: Skin Lesion Detection, Segmentation, and Categorization (Development)

    I used RCNN/DCNN and CRF on 50,000+ samples (ISIC, scraped and generated imagery) for identifying over 1,000 skin condition subtypes from HD images.
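
For the satellite building damage project above, a hedged sketch of the general approach: fine-tuning a pretrained backbone as a binary building classifier in tf.keras. The MobileNetV2 backbone, input size, and random test tile are assumptions; the actual CollapseView code is in the linked repository.

    # Transfer-learning sketch for a binary "building / collapsed" classifier.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False  # freeze pretrained features for the first pass

    inputs = layers.Input(shape=(224, 224, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Sanity check on a random tile; real training would use labeled pre/post-event crops.
    dummy_tile = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0
    print(model.predict(dummy_tile))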
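
And for the traffic accident project, a minimal KDE hotspot sketch with scikit-learn; the coordinates and bandwidth are dummy values, not the UK accident data used in the repository.

    # Kernel density estimation over accident coordinates (dummy data).
    import numpy as np
    from sklearn.neighbors import KernelDensity

    # Accident locations as (latitude, longitude) pairs.
    coords = np.array([[51.51, -0.13], [51.52, -0.12], [53.48, -2.24],
                       [53.49, -2.25], [52.49, -1.90]])

    # The haversine metric expects radians; the bandwidth (in radians) is a guess.
    kde = KernelDensity(bandwidth=0.0005, metric="haversine", kernel="gaussian")
    kde.fit(np.radians(coords))

    # Score candidate points; higher log-density means a denser accident cluster.
    grid = np.radians(np.array([[51.515, -0.125], [53.0, -2.0]]))
    print(kde.score_samples(grid))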

Skills

  • Languages

    Python, R, SQL, C#, Java, JavaScript
  • Frameworks

    Spark, Scrapy, Django, Hadoop
  • Libraries/APIs

    Sklearn, TensorFlow, PySpark, XGBoost, Keras, Pandas, Matplotlib, NLTK, OpenCV, AWS EC2 API, Spark ML, Spark Streaming
  • Tools

    AWS Athena, PyCharm, IPython Notebook, Amazon SageMaker, Amazon WorkSpaces, Reuters Eikon, Amazon SQS, TensorBoard
  • Paradigms

    Object-oriented Programming (OOP), Data Science, Siamese Neural Networks
  • Platforms

    AWS EC2, iOS, Windows, Jupyter Notebook, AWS Lambda, Linux, Docker
  • Storage

    AWS S3, MySQL, AWS DynamoDB
  • Other

    Statistical Modeling, Statistical Data Analysis, Neural Networks, Statistical Forecasting, Communication, Data Analytics, Natural Language Processing (NLP), Image Recognition, Computer Vision, Natural Language Understanding, Artificial Intelligence (AI), Artificial Neural Networks (ANN), Deep Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Gradient Boosting, Gradient Boosted Trees, Ensemble Methods, Bootstrapping, Deep Learning, Project Management, Leadership, Strategy, BERT, Computer Vision Algorithms, Explainable Artificial Intelligence (XAI), Unsupervised Learning, Parquet, Education, Healthcare, Radiology, Demand Sizing & Segmentation

Education

  • Graduate diploma in Mathematics
    2017 - 2019
    London School of Economics - London, UK
  • Master's degree in Economics, Econometrics, and Management
    2012 - 2016
    University of Glasgow - Glasgow, Scotland
  • Bachelor's degree in Mathematics and Management
    2011 - 2012
    University of Babes-Bolyai - Cluj-Napoca, Romania
