Lead Data Scientist
2022 - PRESENT
Koble
- Developed an NLP service that describes a company's activities and products from its raw website content.
- Created an NLP service that classifies companies according to a taxonomy of sectors.
- Devised a classification/prediction pipeline that predicts startup success probability given market, team, and funding features.
- Introduced good data science practices and processes to the team.
Technologies: Amazon S3 (AWS S3), APIs, REST, Kubernetes, AWS Lambda, Amazon SageMaker, Elasticsearch, BERT, Large Language Models (LLM), Sentiment Analysis, Text Classification, Data Analysis
Machine Learning Consultant
2022 - 2022
Telescope
- Developed an automated email writing service based on historical emails and generative AI.
- Created a recommender system that suggests people to contact based on previously searched people, keywords, and company descriptions.
- Built an email classification system that classifies emails by type (e.g., sales, inquiry, and follow-up).
Technologies: Amazon Web Services (AWS), Amazon S3 (AWS S3), AWS Lambda, Elasticsearch, Recommendation Systems, BERT, Sentiment Analysis, Computational Linguistics, Large Language Models (LLM), Text Classification, Data Analysis
Machine Learning Consultant
2020 - 2022
Springbok
- Developed a system using GPT-3 to automatically create questions and answers from documents and feed them to a chatbot system that answers customers' queries.
- Created a microservice for a linguistically aware natural language understanding (NLU) model that recommends text style corrections for technical requirements.
- Produced machine learning (ML) services for multiple natural language processing (NLP) tasks, including text classification and recommendation.
Technologies: Python 3, Rasa NLU, Generative Pre-trained Transformer 3 (GPT-3), PyTorch, TensorFlow, APIs, Microservices, Recommendation Systems, Natural Language Understanding (NLU), NLU, PostgreSQL, Amazon S3 (AWS S3), Artificial Intelligence (AI), Docker, Kubernetes, Machine Learning, Data Science, Jupyter, REST APIs, Annotations, Amazon Web Services (AWS), Flask, Chatbots, Linux, MacOS, PyCharm, Mathematics, Statistics, Programming, Computational Linguistics, Natural Language Processing (NLP), Machine Learning Operations (MLOps), Continuous Delivery (CD), Continuous Integration (CI), Software Engineering, NumPy, Data Reporting, SciPy, Jupyter Notebook, Matplotlib, NLTK, Convolutional Neural Networks, Long Short-term Memory (LSTM), Applied Research, Supervised Machine Learning, Git, Topic Modeling, Python, SQL, NoSQL, Deep Learning, Neural Networks, Code Review, Task Analysis, Large Language Models (LLM), Sentiment Analysis, Text Classification, Data Analysis
Senior Machine Learning Engineer
2016 - 2020
ComplyAdvantage
- Built the ML models for the article-reading pipeline, an information extraction system that feeds a knowledge graph of criminal and adverse media entities.
- Improved the company's entity extraction and classification system eight times, increasing the number of unique entities in the knowledge graph by implementing the latest NLU research.
- Enhanced the ease of ML model deployment and development by redesigning the monolithic ML pipeline into a microservices-based scalable one.
- Guided the team's growth by helping with research and development projects and managed a junior ML engineer.
- Drove the inclusion of the latest NLP research in the company's solutions.
- Built the ML pipelines and led data collection automation through MLOps practices.
- Introduced additional entity metadata to the company's extraction system by developing a relation extractor using distant supervision methods.
- Led a significant refactoring project following a microservices approach to split the company's main article-reading ML pipeline into multiple projects, using Elasticsearch, Kubernetes, Docker, AWS, and CI/CD.
Technologies: Python 3, Amazon Web Services (AWS), Kubernetes, Docker, Helm, CI/CD Pipelines, Natural Language Understanding (NLU), Natural Language Processing (NLP), Machine Learning, Machine Learning Operations (MLOps), ETL, TensorFlow, Keras, PyTorch, Pandas, NLU, Amazon SageMaker, MTurk API, Annotations, Amazon S3 (AWS S3), Continuous Delivery (CD), Continuous Integration (CI), Big Data, Software Engineering, Flask, APIs, Artificial Intelligence (AI), Data Science, NumPy, Data Reporting, Scikit-learn, SciPy, Jupyter Notebook, Matplotlib, NLTK, Convolutional Neural Networks, Long Short-term Memory (LSTM), Applied Research, Supervised Machine Learning, Git, SQL, Topic Modeling, Linux, MacOS, PyCharm, Mathematics, Statistics, Programming, Social Network Analysis, Computational Linguistics, Rasa NLU, Microservices, PostgreSQL, Jupyter, REST APIs, Python, NoSQL, Deep Learning, Data Analytics, Neural Networks, Technical Hiring, Source Code Review, Code Review, Task Analysis, Interviewing, Team Management, Sentiment Analysis, Text Classification, Data Analysis