Data Scientist (NLP Research), Tessian, 2019 - PRESENT
Technologies: Python, NLP, TensorFlow, Bash, Docker, Keras, AWS S3, DynamoDB, Athena
- Developed language models and attention-based RNNs for text classification and clustering over 100 GB of email data, applying transfer learning, few-shot learning, and embedding techniques.
- Worked on unsupervised data augmentation, weak supervision with Snorkel MeTaL, and multi-task learning for malicious data classification.
- Implemented end-to-end machine learning models in production using TensorFlow, AWS S3/Athena, and SageMaker on both CPU- and GPU-based architectures.
- Worked on string similarity and matching with one-shot learning/Siamese networks.
- Implemented various codebase improvements, testing automation, parallelized processing, and documentation design.
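The one-shot string-matching work above follows the Siamese idea: one shared encoder applied to both inputs, with similarity measured between the resulting vectors. A minimal sketch of that structure, swapping the learned neural encoder for a character-bigram count vector (a hypothetical stand-in for illustration, not the production model):

```python
from collections import Counter
import math

def encode(s: str) -> Counter:
    """Shared 'encoder': map a string to character-bigram counts.
    In a Siamese network this would be a learned neural encoder
    applied identically to both inputs."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Compare the two encodings in 'embedding' space."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def string_similarity(s1: str, s2: str) -> float:
    # Siamese structure: one encoder, two inputs, distance between outputs.
    return cosine_similarity(encode(s1), encode(s2))

print(string_similarity("john.smith@acme.com", "jon.smith@acme.com"))
print(string_similarity("john.smith@acme.com", "billing@other.org"))
```

Near-duplicate strings score close to 1.0, unrelated strings close to 0; a trained Siamese encoder replaces the bigram counts but keeps this exact comparison shape.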
Founder, Quasar Labs, 2018 - PRESENT
Technologies: Python, R, TensorFlow, SQL, Keras, Flask, TensorFlow Lite
- Consulted for companies across industries on implementing state-of-the-art machine learning research, with the goal of improving model performance and business impact.
- Implemented deep learning using CNNs in TensorFlow for object detection and recognition (earthquake impact detection and receipt text detection).
- Developed end-to-end training pipeline for churn prediction in Telecom using time-to-event RNN and gradient boosted decision trees.
- Built custom learners for revenue forecasting in retail using seasonal ARIMA and RNNs over 85 GB of hourly sampled data. Deployed them in production for near real-time prediction using Bash and Docker, on private infrastructure including PostgreSQL and MySQL Server.
- Implemented OCR (optical character recognition) for automated receipt text extraction and classification using Google OCR, TensorFlow, Flask, and Keras.
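Seasonal forecasting work like the ARIMA/RNN pipeline above is normally benchmarked against a seasonal-naive baseline, which simply repeats the last full season. A toy sketch of that baseline on hypothetical hourly data (not the production SARIMA model):

```python
import numpy as np

def seasonal_naive_forecast(series: np.ndarray, season: int,
                            horizon: int) -> np.ndarray:
    """Forecast each future point as the value one full season earlier.
    For hourly retail data, a daily cycle means season=24."""
    history = list(series)
    forecast = []
    for _ in range(horizon):
        forecast.append(history[-season])  # copy from one season back
        history.append(forecast[-1])
    return np.array(forecast)

# Hypothetical hourly revenue with a clean daily cycle.
hours = np.arange(24 * 7)
revenue = 100 + 20 * np.sin(2 * np.pi * hours / 24)

pred = seasonal_naive_forecast(revenue, season=24, horizon=24)
```

Any seasonal ARIMA or RNN model worth deploying should beat this one-liner baseline on held-out data; on a perfectly periodic signal the two coincide.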
Data Scientist | Engineer (Contract), Apsara Capital, 2018
Technologies: Python, AWS S3, Athena, Glue, Firehose, R
- Led the development and implementation of the data analysis and research infrastructure.
- Developed the AWS S3, Lambda, EC2, and Docker orchestration for extracting, processing, and storing financial, economic and market data from Thomson Reuters Eikon API.
- Built an NLP language model using Snorkel and MeTaL for the analysis of earnings call transcripts.
- Created the technical analysis infrastructure using R and a set of 20 customizable technical indicators.
- Designed the codebase, automated testing, integrated production deployments, and generated and maintained the documentation.
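Technical indicators of the kind mentioned above are short rolling computations over price series. As one illustration, a simple moving average, among the most common indicators, can be sketched in a few lines (Python here for brevity; the original infrastructure was in R):

```python
import numpy as np

def sma(prices: np.ndarray, window: int) -> np.ndarray:
    """Simple moving average: mean of the last `window` prices,
    one value per fully covered window."""
    kernel = np.ones(window) / window
    return np.convolve(prices, kernel, mode="valid")

prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
print(sma(prices, window=3))  # → [11. 12. 13. 14.]
```

Most other indicators (EMA, RSI, Bollinger bands) follow the same pattern of a windowed transform with one or two tunable parameters, which is what makes a set of customizable indicators practical to maintain.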
Data Scientist | Analyst (Contract), Tracktics GmbH, 2017 - 2018
- Analyzed time series data for motion classification and identification of activity bursts using CNN, Bayesian models, and Monte Carlo simulations.
- Supported the development of the analytical pipeline and user segmentation capabilities using AWS S3, AWS Lambda, and EC2.
- Implemented data management and visualization with AWS SQS, S3, DynamoDB, Python, and Pandas/Bokeh.
- Developed a general motion analysis over triaxial accelerometer, gyroscope, and magnetometer data, in addition to GPS and video.
- Researched sports analytics, managed documentation, and supported Scrum integration.
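Motion classification over triaxial accelerometer streams typically starts from windowed summary features fed to a downstream classifier (such as the CNNs mentioned above). A minimal sketch of that preprocessing step, with hypothetical data and window size rather than the actual pipeline:

```python
import numpy as np

def window_features(accel: np.ndarray, window: int) -> np.ndarray:
    """Split a (T, 3) accelerometer stream into fixed-length windows
    and compute, per window, the mean and std of the signal
    magnitude — typical inputs for a motion classifier."""
    magnitude = np.linalg.norm(accel, axis=1)           # (T,)
    n = len(magnitude) // window
    windows = magnitude[: n * window].reshape(n, window)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

rng = np.random.default_rng(0)
accel = rng.normal(size=(1000, 3))   # hypothetical 3-axis samples
feats = window_features(accel, window=100)
```

Activity bursts then show up as windows whose magnitude statistics deviate sharply from the resting baseline, which is where Bayesian models and Monte Carlo thresholds come in.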
Data Scientist | Analyst, Predict X, 2017 - 2018
- Implemented forecasting models, including retail sales analysis, using more than 40 TB of external data such as weather, events, and client-specific metrics. Used proprietary infrastructure based on PostgreSQL, Vector, Bash, and TensorFlow/Scikit-learn.
- Drove business decisions by researching, testing and integrating various regression and classification-based models using Python Scikit-learn, TensorFlow, and Keras.
- Implemented end-to-end ETL processes using Python, MySQL, PostgreSQL, and Knime. Managed and refined the codebase, creating significant efficiencies through multiprocessing and the introduction of Spark and Hadoop.
- Applied association rule mining over a Neo4j Graph database for product recommendations in retail. Replicated results in production and supported the transition of the research initiative to a new market-ready product.
- Developed an insurance algorithm for seismic and flood risk computation using MCMC.
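Monte Carlo risk computation of the kind underlying the insurance algorithm above reduces, in its simplest form, to simulating loss events and averaging. A toy sketch with made-up event rates and loss distributions (not the actual seismic/flood model, which used MCMC over fitted parameters):

```python
import numpy as np

def expected_annual_loss(event_rate: float, loss_mean: float,
                         loss_sigma: float, n_sims: int = 50_000,
                         seed: int = 42) -> float:
    """Monte Carlo estimate of expected annual loss: Poisson event
    counts combined with lognormal per-event loss severities
    (hypothetical distributional choices for illustration)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(event_rate, size=n_sims)   # events per year
    totals = np.array([
        rng.lognormal(loss_mean, loss_sigma, size=c).sum()
        for c in counts
    ])
    return totals.mean()

eal = expected_annual_loss(event_rate=0.2, loss_mean=10.0, loss_sigma=1.0)
```

The simulated mean should approach the analytic value rate * exp(mu + sigma^2 / 2) as the number of simulations grows, which gives a convenient sanity check on the sampler.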
Research Assistant, University of Glasgow — Urban Big Data Centre, 2016 - 2017
Technologies: C#, Java, Python, Xamarin, Hadoop, Spark, LaTeX, Stata
- Built an eCommerce recommendation system that predicted user-product relevance via RNNs and collaborative filtering.
- Developed a C# app in Xamarin for sensitive data collection from Android mobile devices. Created a full end-to-end solution from front-end to back-end automation using a remote MySQL database for data storage.
- Manipulated high-dimensional datasets (120 GB+) for feature creation using Python Pandas, PostgreSQL, RDD in Hadoop DFS and Spark. Visualized the data using Tableau, Stata, and LaTeX.
- Conducted machine learning research and paper replication, using GPU and parallel computing to model 100 GB+ datasets with Spark and Hadoop on an on-premise cluster.
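The collaborative-filtering half of the recommendation system above can be sketched as plain matrix factorization trained by stochastic gradient descent: learn user and item factors so their dot products approximate observed ratings. A toy numpy version with synthetic ratings (not the RNN-based production system):

```python
import numpy as np

def factorize(R: np.ndarray, k: int = 2, lr: float = 0.01,
              reg: float = 0.01, epochs: int = 1000, seed: int = 0):
    """Learn user factors P and item factors Q so that P @ Q.T
    approximates the observed entries of R (0 = unobserved)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = np.argwhere(R > 0)
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            # SGD step with L2 regularization on both factor vectors.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)
P, Q = factorize(R)
pred = P @ Q.T   # filled-in matrix; zeros of R become predictions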