Senior Python Web Scraping Specialist | 2022 - 2022 | Number Five House Ltd
Technologies: Python, Web Scraping, Google Sheets API, Google Sheets
- Developed and deployed a web scraping pipeline with Python and Selenium.
- Collected profile and network data from a large online social media platform.
- Performed ETL on the data. Used network analysis and machine learning to enrich collected information.
Undergraduate Researcher | 2021 - 2022 | Imperial College London
Technologies: Python 3, HPCC Systems, Machine Learning, Applied Mathematics, TensorFlow, PyTorch, Data Analytics, Data Visualization, Data Science, Artificial Intelligence (AI), Python, APIs, Jupyter Notebook, Pandas, Pytest, Data Reporting, Statistical Modeling, Web Scraping, Data Analysis, Big Data, Google Sheets API, Google Sheets
- Built a TensorFlow machine learning pipeline using Python to predict the properties of high-energy X-ray pulses at ultrafast rates.
- Deployed machine learning pipelines to the university's high-performance computing cluster over Secure Shell (SSH).
- Developed simulations of quantum many-body physics in Python and devised a new measurement scheme to analyze simulations.
- Used advanced statistics and machine learning, including restricted Boltzmann machine neural networks, to extract knowledge from our simulations.
- Co-authored two papers currently in preparation, both applying machine learning to different physics regimes.
Research Intern | 2020 - 2020 | The Institute of Cancer Research
Technologies: R, Python 3, Data Science, Genetics, Machine Learning, Data Analytics, Data Visualization, Artificial Intelligence (AI), Python, SQL, Jupyter Notebook, Pandas, Data Engineering, Pytest, Data Reporting, Statistical Modeling, Tableau, Data Mining, Data Analysis, Big Data, STATA, Google Sheets API, Google Sheets
- Developed an unsupervised learning pipeline to analyze genetic risk factor pathways for brain tumors in adults.
- Wrote and debugged R and Python code in support of interdisciplinary research.
- Implemented my pipeline on a high-performance computing cluster.
- Migrated the legacy code from Python 2 to Python 3.
- Performed tissue-specific analysis to identify risk factors that reach significance only when tissue differences are accounted for.