Freelance Tech Lead | Machine Learning Engineer
2019 - PRESENT
European-based Investment Company (via Toptal)
- Led a team of Toptal engineers in a project to create an MVP of a platform for configuring and testing models for building stock portfolios.
- Researched architectures for portfolio-build models, including those using neural network approximation.
- Created a highly configurable system for training and backtesting models based on point-in-time financial data, and automated the search over configurations by setting up hyperparameter optimization with Ray Tune.
- Oversaw the build of the ingestion/ETL framework capable of discovering potential stocks from third-party data suppliers (based on region/industry/market cap) and extracting historical market and fundamental data from them into a central DB.
- Ensured rich reporting of model performance via user-defined metrics and information about portfolio constituents. The reports were directly consumable via an API and through a Dash/Plotly dashboard with interactive graphics.
- Reviewed and maintained code for the Angular web app through which an administrator can test models and view tabular reports.
- Designed the database model used by the ORM (SQLAlchemy), which also underpinned the RESTful API, with data marshalled using Marshmallow.
- Set up the Google Cloud Platform-based infrastructure, including a Docker-based CI/CD pipeline for both the UI and back-end service, a Kubernetes engine for parallelizing long-running worker processes, worker queues (Pub/Sub), and scheduled tasks.
Technologies: Deep Learning, Kubernetes, Docker, ETL, Flask, SQL, Google Cloud Platform (GCP), Python, Neural Networks, Data Science, Machine Learning, Flask-RESTful, PostgreSQL, SQLAlchemy, Marshmallow, Angular, Keras, TensorFlow, Pandas, Alembic, Dash, Plotly
Computer Vision Engineer
2021 - 2021
UK-based eCommerce Company
- Devised and implemented a system for receiving an image via API and finding the closest matching products in a catalog.
- Isolated the encoder of the MobileNet neural network architecture to translate images to compressed vectors which could then be queried using an efficient similarity search index.
- Used MLflow for model storage and experiment tracking, and to serve the model through an API.
- Derived an autoencoder so that the encoder model could be fine-tuned with any available training data.
- Leveraged TensorFlow datasets to move image data through the pipeline without overwhelming memory.
- Created a Python package that consumes RabbitMQ messages to create and wait on multiple subprocesses, all performed asynchronously using asyncio.
Technologies: Python, TensorFlow, Asyncio, MLflow, Image Processing, Scikit-learn
Freelance Speech Recognition Engineer
2019 - 2019
US-based Tech Startup (via Toptal)
- Created speech recognition models using custom neural networks with Keras, along with Baidu's DeepSpeech architecture via TensorFlow.
- Handled preprocessing of audio file input types (PCM/WAV), including conversions to and manipulations of MFCCs/spectrograms.
- Wrapped the processes for data uploading, model training, persistence, and inference in RESTful APIs, with capabilities for complex versioning and live training updates.
- Deployed and configured APIs to leverage GPUs for faster training/inference and a Celery distributed-task queue for training in parallel.
Technologies: Deep Learning, Audio Processing, Data Science, Neural Networks, Machine Learning, MongoDB, Flask, Celery, Keras, Python
Freelance Data Scientist
2019 - 2019
Kalepa Corporation (via Toptal)
- Trained and evaluated various models for classifying text data using GloVe word representations, a bag-of-words model, and XGBoost, among other ML/NLP techniques.
- Created and managed Amazon Mechanical Turk tasks for deriving labels for classifier training, including detection of consistently inaccurate workers.
- Productionized the inference process as a pip-installable Python package.
Technologies: Natural Language Processing (NLP), Machine Learning, Data Science, SpaCy, MongoDB, Scikit-learn, XGBoost, Gensim, Pandas, Python
Chief Information Officer
2019 - 2019
Lawli Ltd
- Led the technical direction for a UK government-funded legal AI app to help deliver the initial version of the product.
- Implemented initial NLP solutions in Python for providing document services at the heart of the app.
- Created the first version of an Angular 7 web app using Node.js and MongoDB.
- Conducted interviews to find people for the company's first development team to carry on my work.
Technologies: Natural Language Processing (NLP), WordPress, RabbitMQ, Gensim, Python, Node.js, Angular
Freelance NLP Expert
2018 - 2018
Zugata (via Toptal)
- Improved and developed a system for key-phrase extraction from texts using a trained ML classifier and a variety of extraction techniques (including those involving the statistical analysis of word collocations).
- Developed a library that applies a dependency parser (such as the SpaCy or Stanford parsers) to texts and then extracts phrases according to grammatical rules that have been automatically inferred from training texts (using graph theory and NetworkX).
- Implemented frameworks to support research, including caches for extracted phrases and objects for persisting models with metadata, so that consumers know how each model was formed.
- Created a Flask API that outputs results and lets users specify which methodologies to apply.
- Enhanced an in-house evaluator of extractor performance and integrated traditional evaluators (BLEU/ROUGE); also set up cross-validation tests for classifier performance.
- Verified and advised on statistical/confidence tests for the company's studies, which went toward a paper that won an award at KDD 2018.
Technologies: SQL, Graph Theory, Natural Language Processing (NLP), Data Science, Machine Learning, NumPy, Pandas, MySQL, Flask, Scikit-learn, SpaCy, NLTK, Python
Freelance Data Scientist | Freelance Machine Learning Specialist
2017 - 2018
US-based Investment Management Firm (via Toptal)
- Researched and tested prediction models with a Python stack using machine learning regressors and natural language processing techniques.
- Derived features from various sources, including forming vector representations of words/documents using a bag-of-words model (with NLTK) and neural networks (with TensorFlow).
- Developed a configurable model backtesting (and backfilling) system that makes extensive use of Pandas functionality.
- Improved the reliability of a Selenium-based framework for scraping websites to source data for model training, including improved logging and reports of nightly performance.
- Created a framework for mining and structuring of data from particular sections of PDF files.
- Enhanced and fixed bugs in a React/Redux web app used for showing predictions.
Technologies: Deep Learning, Bokeh, Sentiment Analysis, Amazon Web Services (AWS), Neural Networks, Machine Learning, Data Science, NLTK, React, Jupyter Notebook, MongoDB, Pandas, TensorFlow, Scikit-learn, Python
Freelance Machine Learning Engineer
2016 - 2017
Wedifique (via Toptal)
- Implemented a collaborative filtering learning algorithm using Python libraries for use in a product recommendation system.
- Enabled the learning algorithm to be influenced by administrator suggestions when deciding feature weightings.
- Updated aspects of the main web app, where necessary, on both the Node.js back-end and AngularJS front-end.
- Queried (using MongoDB) and derived data for use in user/trend analysis and to populate reports/graphs.
- Set up a multi-server web/worker infrastructure on Heroku using AMQP.
Technologies: Heroku, RabbitMQ, AMQP, Machine Learning, AngularJS, Node.js, MongoDB, NumPy, Pandas, Scikit-learn, Python
Freelance Full-stack Developer
2016 - 2016
Swtch (via Toptal)
- Created a proof of concept for a geolocation web app, primarily using AngularJS.
- Made extensive asynchronous use of the Google Maps JavaScript API to create a map canvas, plot markers, and geocode them from addresses.
- Developed a complex user registration and booking system that was persisted to a PostgreSQL database using the pg-promise library in Node.js.
- Styled the app using Bootstrap so that it is responsive and can be used on a variety of devices.
- Implemented a RESTful API using Express.js and Node.js.
Technologies: Sass, CSS, HTML5, HTML, Gulp.js, PostgreSQL, Express.js, Bootstrap, Node.js, AngularJS
Associate Developer
2013 - 2016
Goldman Sachs
- Collaborated with the global market risk business to design and maintain a platform that produced the bank’s risk metrics.
- Used the firm's Python-like proprietary language to build and test a framework to collate big data sets and to automate the creation of stress test reports for regulators.
- Led the development team that produced an AngularJS web app and RESTful API to allow users to adjust risk measures and audit these changes.
- Conducted interviews of lateral hires and of interns/analysts for the tech division.
- Assisted in integrating a platform into a new distributed computing framework, including occasional examination of the platform's core C++ code.
- Co-created a Java-based versioning system for report configurations that could be controlled via an AngularJS web app.
- Investigated machine learning methods for possible use in the department.
Technologies: SQL, Sybase, D3.js, Gulp.js, jQuery, DataTables, Sass, Java, Python, C++, Node.js, JavaScript, AngularJS
Analytics Developer
2010 - 2013
RBS Markets & International Banking
- Developed and maintained a .NET web-based application for analyzing and visualizing time series data for a range of financial products.
- Performed extensive regression testing and other analysis as part of the regular upgrades to the pricing libraries the tool depended on.
- Applied Agile methodologies while delivering several C# coding assignments to add new analytics.
- Trained the support and development teams around the globe, including on a trip to Singapore.
- Created a VBA tool that logged emails sent to the support inbox and detected whether they had been responded to; the results were summarized in a management report.
Technologies: .NET, SQL, Visual Basic for Applications (VBA), Oracle, ASP.NET, Microsoft SQL Server, C#
Inside Licensing Specialist
2010 - 2010
Microsoft
- Automated the building of a spreadsheet that was compiled from various sources and kept track of deal progress for a Munich-based licensing team.
- Created a tool using VBA to identify discrepancies between two customer pricing sheets (taking into account that entries may be present in both, but in different row locations).
- Presented these tools at team calls and wrote up documentation for them in English and German.
- Added new statistics to the account planning sheet, including data mining of past discounts given to customers.
- Assisted the licensing sales specialists with price and product migration queries.
Technologies: Microsoft Excel, Visual Basic for Applications (VBA)