Data Science Contractor
2022 - 2023
Shell
- Designed solutions for carbon sequestration. Mined structured and unstructured data on the molecular processes driving the carbon cycle in the soil. Identified interventions to optimize the carbon cycle.
- Validated findings against the published literature (20M articles) using NLP.
- Developed an interactive dashboard presenting the results and published it to end users.
Technologies: Python 3, Scikit-learn, NumPy, Pandas, Git, Machine Learning, Artificial Intelligence (AI), Natural Language Processing (NLP), Statistics

Machine Learning CTO for carbon emission reduction project
2022 - 2022
Carbon Connect Enterprise Strategies Inc.
- Developed a platform for monitoring forest growth and carbon credit budgeting.
- Mined and segmented Lidar and satellite images for the identification of trees and tree growth.
- Built a dashboard using Streamlit in Python, deployed on GCP, to let users query their data.
Technologies: Data Science, Machine Learning, Large-Scale Computing, CTO

Data Science Contractor
2020 - 2022
AstraZeneca
- Developed a machine learning workflow to leverage and interpret genetic data. This included parsing and preprocessing patient data, normalization, dimensionality reduction, statistical tests, and supervised analysis.
- Created a natural language solution for mining biomedical literature. The data was structured in an Elasticsearch database, cleaned, tokenized using the Natural Language Toolkit (NLTK), vectorized, and then used in a text classification framework.
- Built dashboards and UI using Streamlit in Python. Deployed using Nginx.
Technologies: Python 3, Bash Script, Data Science, Machine Learning, Natural Language Processing (NLP), Scikit-learn, Keras, TensorFlow, Streamlit, NGINX, Python, Data Analysis, Spotfire, Flask, Git, Data Visualization

Data Science Contractor
2019 - 2020
Arm
- Built a machine learning framework for maximizing coverage in CPU verification. Development was in Python; deployed on HPC using the Slurm Workload Manager.
- Developed adversarial learning workflows using GANs, implemented in Python with Keras.
- Addressed numerical optimization problems using genetic algorithms with a custom GA implementation.
Technologies: Python 3, Scikit-learn, Keras, TensorFlow, Generative Adversarial Networks (GANs), Bash, Jenkins, Git, Slurm Workload Manager, GitHub, Python, Deep Learning, Genetic Algorithms, Numerical Methods, Convex Optimization, Data Visualization

Principal Data Scientist
2016 - 2019
UCB Celltech
- Built machine learning workflows to predict patient response to candidate drugs. Developed in R.
- Led a team of three developers to create exploratory analytics solutions and dashboards for visualizing high-dimensional data. Results were pre-calculated in R, then imported into TIBCO Spotfire.
- Designed machine learning solutions to predict drug activity in assays. Used LSTMs to model chemical structures as free text and applied methods from text classification.
Technologies: R, Python 3, Spotfire, Linux, H2O, Keras, LSTM, Git, Python, Data Analysis, Data Analytics, Data Science, Machine Learning, Bioinformatics, Genomics, Data Visualization

Postdoctoral Research Fellow
2014 - 2016
U.S. Food & Drug Administration
- Developed a solution for predicting adverse drug events based on the drugs' transcriptomic profiles.
- Created a linear programming formulation to model the structure of directed graphs.
- Applied the solution to predict the adverse effects of new compounds.
Technologies: R, Linux, C, Slurm Workload Manager, Linear Optimization, NetworkX, Bioinformatics, Genomics, Drug Development, Python, Data Science, Data Analytics