Thomas Wood, Developer in London, United Kingdom

Thomas Wood

Verified Expert in Engineering

Data Scientist and Developer

Location
London, United Kingdom
Toptal Member Since
July 13, 2020

Thomas has been working in machine learning and natural language processing for more than ten years. He initially studied physics to master's level, then moved into machine learning, completing a second master's degree in computer speech, text, and internet technology at the University of Cambridge in 2008. Thomas has worked for companies in industries including consulting, computer science, recruitment, retail, and security, and also has some research experience.

Availability

Part-time

Preferred Environment

MacOS, Linux, Windows, SpaCy, Scikit-learn, TensorFlow, PyCharm, Python

The most amazing...

...thing I've delivered for a client is a 7% increase in customers by adding machine learning to their signup form.

Work Experience

Director | Freelance Consultant Data Scientist

2018 - PRESENT
Fast Data Science Ltd.
  • Provided consulting in many areas of machine learning to clients across industries, building and deploying machine learning models, focusing on natural language processing.
  • Conducted AI due diligence of startups for investors.
  • Provided training and upskilling in data science for analytics teams.
  • Assisted consultancies with public sector procurement in a variety of countries.
Technologies: Amazon Web Services (AWS), TensorFlow, Azure, Python

Consultant Data Scientist

2020 - 2020
National Health Service
  • Investigated factors behind junior doctor attrition from the NHS and developed a machine learning model to predict who is going to leave the organization.
  • Provided consulting services to management and insights on causes of employee turnover.
  • Provided general data strategy consulting to the NHS management.
Technologies: Scikit-learn, Azure, Python

Data Scientist

2019 - 2020
Boehringer Ingelheim
  • Trained a text classification model to predict 75 parameters of complexity from 200-page clinical trial protocol PDFs, allowing the clinical operations team to run financial modeling on more reliable data.
  • Analyzed text reports of manufacturing defects and performed unsupervised clustering with LDA, allowing the manufacturing division to see key areas of faults.
  • Identified molecules in scientific publications linked to molecules discovered by Boehringer Ingelheim, allowing the pre-clinical research team to connect with researchers around the world experimenting with the same compounds.
Technologies: TensorFlow, PostgreSQL, Python

Data Scientist

2018 - 2019
Tesco
  • Designed and trained a regression model using PySpark/Spark MLlib to predict customers' order weights in kilograms before they even place the order.
  • Worked on recommendation systems for recommending online shopping purchases.
  • Trained a predictive model to predict vehicle turnaround and loading times.
Technologies: MLlib, PySpark, Spark, Python

Data Scientist

2017 - 2018
CV-Library
  • Used machine learning to predict information about candidates, allowing the company to simplify the registration process and improve registrations by 7%.
  • Deployed a recommender system to send job alerts to candidates by email with a 7% conversion rate.
  • Trained deep learning models (CNN, RNN, LSTM, Word2Vec, Seq2Seq) to analyze candidates' CVs and job descriptions, using Google GPU instances.
  • Deployed machine learning projects through to production on the live site as scalable Docker instances behind a load balancer.
  • Worked on new techniques to recommend a job to a candidate based on past behavior (like the recommendations you see when you buy a product on Amazon).
Technologies: Google Cloud Platform (GCP), TensorFlow, Python

Computer Vision Scientist

2015 - 2017
Veridium
  • Designed and trained—using a team of five developers and five testers/data annotators—neural network solutions for face recognition that ran on Android, iOS, and Windows.
  • Collected training data from sources such as web scraping and arranged annotators to manually clean data.
  • Worked on and patented cryptographic measures to protect biometric data (irises, fingerprints, and so on).
  • Trained convolutional neural networks on GPU using deep learning software Caffe and was able to classify images such as fingerprints or pharmaceutical pill bottles.
Technologies: C++, TensorFlow, Python

Solution Architect

2009 - 2016
CID GmbH
  • Worked in a team of five computational linguists who designed methods for monitoring market sentiment on the internet and specialized in focused web crawling.
  • Communicated designs for natural-language-processing programs to a team of developers who implemented these into products marketed to corporate clients.
  • Worked on the development of a machine learning NLP pipeline.
Technologies: Machine Learning, .NET, C#

Knowledge Engineer

2011 - 2015
Artificial Solutions
  • Worked in a multilingual team on the architecture of human-like natural language dialog systems for use on mobile, web, and in consumer electronics, becoming the team expert on advanced parsing of user input.
  • Made frequent visits to blue-chip companies in Silicon Valley and Asia while presenting technology solutions to potential clients.
  • Defined requirements, estimated time scales, and prototyped during project planning.
  • Provided consulting services to clients and partners developing their own dialog systems using my company's proprietary software.
Technologies: Python

Customer Conversion on an Online Form

I established that the signup form on a job board was causing the company to lose customers because users were confused by some of the fields.

Users also uploaded their CV, which contains a lot of explicit personal information as well as implicit information, such as the job type or salary that someone is looking for. Using several years of past signup data, I trained a deep neural network to analyze the CV and fill out some of the signup form fields automatically. This allowed a field to be removed, which boosted the conversion rate of the form by 7%, as measured by A/B testing.

Vehicle Unloading Times

A client in the retail industry had a fleet of vehicles delivering produce at different times of the day. They used third-party logistics software to plan the delivery schedules. However, one element that was hard to plan was the unloading time of the vehicle when it arrived at the store.

Fortunately, there was a system in place for recording vehicle ignition events, GPS location, and geofencing to identify the arrival and departure times of delivery vehicles, and past schedules were available to identify the quantity and type of product delivered on each drop, which driver was in charge, and the time of day and type of vehicle used.

Using this trove of logged data, I was able to train a simple regression model that predicts the unloading time of any future delivery at the time the schedule is generated.

This allowed the client to save money on driver overtime, disruption caused by late deliveries, and fines due to drivers working longer than their legally permitted hours.
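As a rough illustration of the idea only (not the model built for the client), the sketch below fits an ordinary least squares line that predicts unloading time from a single feature. The feature choice (pallets delivered), the function name `fit_line`, and the numbers are all made up for the example; the real model would have used the richer logged features described above, such as product type, driver, time of day, and vehicle type.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy log: pallets delivered per drop, and minutes between
# geofence arrival and departure. These are invented values, not client data.
pallets = [4, 8, 12, 16, 20]
minutes = [22, 34, 46, 58, 70]

intercept, slope = fit_line(pallets, minutes)

# Predict the unloading time for a planned drop of 10 pallets.
predicted = intercept + slope * 10
print(f"Predicted unloading time: {predicted:.1f} minutes")
```

Because the prediction only needs information available when the schedule is generated, an estimate like this could be fed straight into the planning step.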

Analysis of Clinical Trials

When a pharmaceutical company develops a drug, it needs to pass through several phases of trials before regulators can approve it.

Before the trial is run, the drug developer writes a document called a protocol. This contains vital information such as how long the trial will run, what the risks to participants are, what kind of treatment is being investigated, and so on.

The problem is that each protocol is up to 200 pages long, and the structure can vary.

For one pharmaceutical company, I developed and trained a deep learning tool to predict more than 50 output variables from a clinical trial protocol. This lets pharma companies and regulators analyze and quantify large numbers of protocols, enabling more accurate cost estimation.

The technique can be extended to other industries where large unstructured or semi-structured documents are the norm.

Finding Molecules and Proteins in Scientific Literature

I have worked on several different projects where a client needed to parse scientific literature and identify occurrences of molecules or proteins.

As an example, take aspirin. The name is still a trademark of Bayer in some countries, but in a paper the molecule could appear as acetylsalicylic acid, 2-acetoxybenzenecarboxylic acid, C9H8O4, or an identifier such as DB00945. There could also be identifiers that refer to other molecules or identifiers that refer to only one version of a molecule.

I have developed several tried and tested techniques to disambiguate these terms. Usually, I start from a set of annotated examples and train a machine learning model that learns from them and annotates new publications as they come in.
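To make the synonym problem concrete, here is a minimal sketch of a dictionary lookup that maps the aspirin names mentioned above to one canonical identifier. It is not the machine learning approach described here; it is the kind of simple baseline that might produce candidate mentions for annotation. The table `SYNONYMS` and the function `find_mentions` are hypothetical names introduced for this example.

```python
import re

# Hypothetical synonym table built from the names mentioned in the text.
# A real system would cover many molecules and be learned from annotations.
SYNONYMS = {
    "aspirin": "DB00945",
    "acetylsalicylic acid": "DB00945",
    "2-acetoxybenzenecarboxylic acid": "DB00945",
    "c9h8o4": "DB00945",
    "db00945": "DB00945",
}


def find_mentions(text, synonyms=SYNONYMS):
    """Return (surface form, canonical id, start offset) tuples, in text order."""
    mentions = []
    for name, canonical_id in synonyms.items():
        # Require that the match is not embedded in a longer word or
        # hyphenated chemical name.
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            mentions.append((match.group(0), canonical_id, match.start()))
    return sorted(mentions, key=lambda mention: mention[2])


sample = "Patients received Aspirin (acetylsalicylic acid, C9H8O4) daily."
for surface, canonical_id, start in find_mentions(sample):
    print(f"{start:>3}  {surface!r} -> {canonical_id}")
```

A plain lookup like this cannot resolve identifiers that are ambiguous between molecules or that refer to only one version of a molecule, which is where a model trained on annotated examples becomes necessary.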

Languages

Python, Python 3, SQL, Java, R, C++, C#

Libraries/APIs

Scikit-learn, NumPy, SpaCy, Natural Language Toolkit (NLTK), SciPy, TensorFlow, PySpark, MLlib

Tools

Azure ML Studio, Azure Machine Learning, PyCharm

Paradigms

Data Science

Platforms

Azure, Google Cloud Platform (GCP), Docker, MacOS, Unix, Amazon Web Services (AWS), Windows, Linux

Other

Natural Language Understanding (NLU), Natural Language Processing (NLP), Text Classification, Machine Learning, Classification Algorithms, Convolutional Neural Networks (CNN), Dialog Systems, Natural Language Generation (NLG), Spanish, German, Generative Pre-trained Transformers (GPT), Programming, Physics, Clustering, Graphics Processing Unit (GPU), Custom BERT, Computer Vision, Speech Recognition, Speech Synthesis, Sentiment Analysis

Frameworks

Flask, Spark, .NET

Storage

PostgreSQL

2007 - 2008

Master's Degree in Computer Speech, Text, and Internet Technology

University of Cambridge - Cambridge, UK

2003 - 2007

Master's Degree in Physics

University of Durham - Durham, UK

APRIL 2020 - APRIL 2022

Azure Data Science Associate

Microsoft
