CTO | 2017 - PRESENT | Bot MD
Technologies: PostgreSQL, Chatbots, WebSockets, Django, Python
- Led and managed a remote team of two back-end engineers, two Android engineers, and an assortment of freelancers for Bot MD, a clinical AI assistant for doctors, as part of Y Combinator's S18 batch.
- Spearheaded the development of a full-featured Android chat application with various productivity features for doctors.
- Built the chat engine from scratch, leveraging my deep understanding of linguistics and NLP.
Technical Advisor | 2016 - PRESENT | Stravito
Technologies: Search Engines, Information Retrieval, Machine Learning, Natural Language Processing (NLP), Elasticsearch, Python
- Provided technical expertise and advised on extracting information from unstructured text.
- Led a team of two engineers to build a customized text search algorithm for market research documents.
Scientist | 2016 - PRESENT | Institute for Infocomm Research
Technologies: Natural Language Processing (NLP), Machine Learning
- Researched novel techniques for improving state-of-the-art NLP systems.
Advisor and Data Scientist in Residence | 2016 - PRESENT | Intelllex
Technologies: Machine Learning, Information Retrieval, Natural Language Processing (NLP)
- Advised and collaborated with the engineering team on topics and techniques related to natural language processing, information retrieval, and machine learning.
- Provided domain knowledge and input on product roadmaps.
Technical Advisor | 2015 - PRESENT | AirPR, Inc.
Technologies: Amazon Web Services (AWS), AWS EMR, MongoDB, MySQL, Java, Ruby on Rails (RoR), Ruby, Flask, Elasticsearch, Spark, Scala, Python
- Built an automatic key phrase extraction module for PR news (soundbites).
- Designed customized author ranking algorithms for LinkedIn publishers using social and influence metrics.
- Improved the Elasticsearch relevance ranking algorithm by designing custom features and metrics, raising the relevance of ranked results by 30%.
- Implemented a state-of-the-art customized sentiment classifier for Tweets using crowdsourcing and ensemble methods.
- Built a data processing pipeline for handling millions of articles using Spark and Elasticsearch.
- Built an NLP pipeline for processing millions of news articles.
- Utilized techniques that included logistic regression, support vector machines (SVM), random forests, and ensemble methods.
Visiting PhD Scholar | 2015 - 2016 | University of Washington
- Performed a variety of academic duties as a scholar in residence with the University of Washington Computer Science and Engineering department.
Graduate Research Assistant | 2011 - 2016 | Carnegie Mellon University
Technologies: LaTeX, Julia, Python, C++, Java
- Assisted with the courses Introduction to Natural Language Processing (NLP) and Graduate Seminar on Advanced NLP.
- Pursued research interests in Machine Learning (ML), Natural Language Processing (NLP), and Computational Social Science (CSS).
- Applied NLP techniques to text mining and information extraction tasks.
- Built tools to support the automatic discovery and analysis of decision making in the U.S. Supreme Court.
- Built tools to help political scientists analyze and explore speeches of U.S. presidential candidates.
- Gained expert knowledge of statistical models, probabilistic graphical models, MCMC and variational methods, deep learning, and topic modeling.
Research Intern | 2013 - 2013 | Google, Inc.
- Worked with the Google Knowledge team to improve their state-of-the-art NLP pipeline.
- Proposed and implemented a novel model for joint inference on named entity recognition/tagging and coreference resolution.
- Developed efficient algorithms for performing inference in high-dimensional combinatorial spaces using dual decomposition.
- Utilized techniques including dual decomposition, support vector machines (SVM), conditional random fields (CRF), and graphical models.
Research Officer | 2010 - 2011 | Institute for Infocomm Research
Technologies: Apache UIMA, Java
- Built a state-of-the-art entity resolution system by leveraging unsupervised latent topic features.
- Designed a robust, high-precision acronym identification module using carefully crafted features.
- Ranked #3 in the 2011 Knowledge Base Population shared task.
- Utilized SVM, Naive Bayes, and Latent Dirichlet Allocation topic modeling, with UIMA as the framework for the NLP pipeline.