Neil du Toit, Developer in Cape Town, Western Cape, South Africa

Neil du Toit

Verified Expert in Engineering

Software Developer

Cape Town, Western Cape, South Africa

Toptal member since December 14, 2021

Bio

Neil is a data scientist specializing in natural language processing, including text classification, summarization, regular expressions, OCR, and data visualization. Most recently, he analyzed court record repositories from 15 different countries. Neil has also developed Django back ends and has experience with many SQL and NoSQL databases. He has worked in data strategy consulting in the insurance and retail sectors.

Portfolio

Artbrain Ltd.
Python, MongoDB, PyTorch, Amazon Web Services (AWS), NumPy, Pandas...
Henry Stewart
Python, Generative Pre-trained Transformers (GPT)...
University of Cape Town
ArangoDB, Python, Text Classification, Regular Expressions, MySQL, Docker...

Experience

  • Python - 5 years
  • Generative Pre-trained Transformers (GPT) - 5 years
  • Natural Language Processing (NLP) - 5 years
  • DataViz - 4 years
  • Regular Expressions - 3 years
  • Elasticsearch - 3 years
  • Text Classification - 3 years
  • Automated Summarization - 3 years

Availability

Part-time

Preferred Environment

Linux, Vim Text Editor, Slack, Bash, Git, Docker, Virtualenv, SSH, Tmux

The most amazing...

...text classifier I've built classified court judgments from 15 countries, giving thousands of users access to a more powerful legal research tool.

Work Experience

Python Developer

2022 - 2023
Artbrain Ltd.
  • Identified and fixed bugs in the system that were causing it to produce incorrect results.
  • Optimized the system to run on less powerful infrastructure and reduced its inference time.
  • Ran end-to-end testing and profiling of the system.
Technologies: Python, MongoDB, PyTorch, Amazon Web Services (AWS), NumPy, Pandas, Amazon DynamoDB, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), APIs

Senior Python Developer

2022 - 2022
Henry Stewart
  • Developed an MVP from scratch that can extract the required information from medical documents.
  • Created a production environment and deployed a production-ready build of the service.
  • Worked with the UI developer to integrate the service into the application.
Technologies: Python, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), SpaCy, Flask, Django, Docker, PostgreSQL, Text Mining, Deployment, Sentiment Analysis, Entity Extraction, Data Mining, Optical Character Recognition (OCR), Relational Databases, Natural Language Toolkit (NLTK), APIs, Document Parsing

Data Scientist

2018 - 2022
University of Cape Town
  • Extracted references automatically from thousands of court records.
  • Created a network graph based on the extracted references and visualized it interactively in the browser.
  • Developed a taxonomy and classified all court records according to the taxonomy using machine learning.
  • Created an automated summary generator able to summarize every court record in the collection.
  • Configured an Elasticsearch search engine to allow for the searching of court records.
Technologies: ArangoDB, Python, Text Classification, Regular Expressions, MySQL, Docker, DataViz, Django, Optical Character Recognition (OCR), Machine Learning, Servers, TensorFlow, Tesseract, Pytesseract, Image Recognition, REST APIs, HTML, CSS, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), JavaScript, Topic Modeling, Elasticsearch, Automated Summarization, Cython, SciPy, Matplotlib, Seaborn, Natural Language Toolkit (NLTK), APIs, Full-stack, Selenium, Robotic Process Automation (RPA), Document Parsing

Head of Data Division

2017 - 2018
Q Division
  • Developed architecture for churn prediction and customer lifetime value measurement for an insurance provider, requiring the integration of separate systems managing customer acquisition data and on-book customer data.
  • Audited an insurance provider's third-party service providers, specifically marketing agencies, and developed pipelines to integrate their data into the provider's analytics.
  • Developed a Sankey diagram data visualization that comprehensively displayed the customer acquisition channels, the costs associated with each channel, and the value obtained in return.
Technologies: Python 3, Django, Machine Learning, Servers, Data Engineering, REST APIs, Microsoft Power BI, Relational Databases, Natural Language Toolkit (NLTK), RapidMiner, APIs, Selenium

Data Strategist

2017 - 2017
Q Division
  • Developed a dashboard for a large retailer, including sales trends, KPI tracking, relevant open data, and predictive analytics.
  • Created the game-play "quests" for an educational adaptive-learning tablet game that taught financial literacy to children.
  • Developed the calculations for a debt repayment calculator application, which provided a breakdown of the snowball and avalanche debt repayment methods.
Technologies: SQL, Python 3, Django, C#, Unity, Servers, REST APIs, Relational Databases, RapidMiner, APIs
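
The snowball and avalanche methods named in the debt repayment calculator above differ only in which debt receives the money left over after minimum payments. The following is a minimal sketch of that comparison, not the calculator's actual code. The function name `simulate`, the tuple layout, monthly compounding at one twelfth of the annual rate, and the sample figures are all assumptions made for illustration.

```python
def simulate(debts, monthly_budget, method):
    """Simulate repayment month by month; return (months, total_interest).

    debts: list of (name, balance, annual_rate, minimum_payment) tuples.
    method: "snowball" targets the smallest balance first;
            "avalanche" targets the highest interest rate first.
    """
    balances = {name: float(balance) for name, balance, _, _ in debts}
    rates = {name: rate for name, _, rate, _ in debts}
    minimums = {name: minimum for name, _, _, minimum in debts}
    if monthly_budget < sum(minimums.values()):
        raise ValueError("budget does not cover the minimum payments")

    months, total_interest = 0, 0.0
    # Treat anything under half a cent as paid off.
    while any(balance > 0.005 for balance in balances.values()):
        months += 1
        if months > 1200:  # guard: interest outpaces the payments
            raise ValueError("debts are not being paid down")

        # Accrue one month of interest on every open debt (assumed monthly compounding).
        for name in balances:
            if balances[name] > 0:
                interest = balances[name] * rates[name] / 12
                balances[name] += interest
                total_interest += interest

        # Pay the minimum on every debt first.
        available = monthly_budget
        for name in balances:
            payment = min(minimums[name], balances[name])
            balances[name] -= payment
            available -= payment

        # Put whatever is left toward the open debts in priority order.
        open_debts = [name for name in balances if balances[name] > 0.005]
        if method == "snowball":
            open_debts.sort(key=lambda name: balances[name])
        else:  # avalanche
            open_debts.sort(key=lambda name: -rates[name])
        for name in open_debts:
            payment = min(available, balances[name])
            balances[name] -= payment
            available -= payment

    return months, round(total_interest, 2)


# Hypothetical debts: (name, balance, annual rate, minimum payment).
sample_debts = [("card", 5000, 0.20, 100), ("loan", 2000, 0.05, 50)]
snowball_months, snowball_interest = simulate(sample_debts, 500, "snowball")
avalanche_months, avalanche_interest = simulate(sample_debts, 500, "avalanche")
```

With these sample figures, the avalanche method pays less total interest because the extra money goes to the 20% debt first, while the snowball method clears the smaller loan sooner.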

Court Precedent Citation Network Graph

https://ojs.law.cornell.edu/index.php/joal/article/view/89
Most legal jurisdictions operate on a system of precedent, which means that legal principles established in one court judgment must be followed in subsequent judgments. When judges follow a precedent, they cite the case that established it. I extracted over a million of these citations from cases spanning 15 jurisdictions across Africa using pattern matching. The citations were not standardized, so data cleaning formed a large part of the project. The cases could then be organized into a network graph, allowing for a mathematical treatment of how legal principles propagate through the courts.
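
As a rough illustration of the approach described above (not the project's actual code), the sketch below extracts citations with a regular expression, normalizes them, and builds a directed citation graph. The citation format `[year] COURT number`, the sample judgments, and all function names are assumptions for illustration. Real citations across 15 jurisdictions would need many more patterns and cleaning rules.

```python
import re
from collections import Counter, defaultdict

# Assumed citation format, e.g. "[2015] ZACC 12": year, court code, number.
# Tolerates stray spaces inside the brackets and mixed-case court codes.
CITATION_RE = re.compile(r"\[\s*(\d{4})\s*\]\s+([A-Za-z]{2,10})\s+(\d+)")


def normalize(year, court, number):
    """Return a canonical key such as '[2015] ZACC 12'."""
    return f"[{year}] {court.upper()} {int(number)}"


def extract_citations(text):
    """Return the distinct citations found in a judgment, in order of appearance."""
    found = []
    for year, court, number in CITATION_RE.findall(text):
        key = normalize(year, court, number)
        if key not in found:
            found.append(key)
    return found


def build_graph(judgments):
    """Map each judgment ID to the set of cases it cites (directed edges)."""
    graph = defaultdict(set)
    for case_id, text in judgments.items():
        for cited in extract_citations(text):
            if cited != case_id:  # ignore a judgment citing itself
                graph[case_id].add(cited)
    return dict(graph)


def in_degree(graph):
    """Count how many judgments cite each case."""
    counts = Counter()
    for cited_cases in graph.values():
        counts.update(cited_cases)
    return counts


# Hypothetical judgments keyed by their own citation.
judgments = {
    "[2015] ZACC 12": "As held in [2010] ZACC 3 and [ 2008 ] zasca 045 ...",
    "[2018] ZASCA 7": "Following [2015] ZACC 12 and [2010] ZACC 3, ...",
}
graph = build_graph(judgments)
most_cited = in_degree(graph).most_common(1)
```

In-degree is only the simplest measure available on such a graph. Once the edges exist, the same structure can feed other network measures.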

An Evaluation of Four-team-per-contest Swiss Power Paired Tournaments

This paper, published in the Monash Debate Review volume 12 in 2014, views tournament structures as sorting algorithms and evaluates their performance. It concludes that the most popular tournament structure for debating tournaments, the Swiss-system tournament, is ill-suited to British parliamentary-style debating and leads to unfair results compared with group stage tournaments, round robins, and elimination tournaments.

Artbrain AI

https://www.artbrain.ai/
I debugged and optimized the AI model to fix stability issues and allow the service to move to cheaper infrastructure. Working in an existing codebase with many bugs, I profiled, tested, and debugged the system before optimizing it.

Education

2014 - 2016

Bachelor’s Degree in Law and Justice Administration

University of Stellenbosch - Stellenbosch, South Africa

2011 - 2013

Bachelor’s Degree in Mathematics

University of Cape Town - Cape Town, South Africa

Libraries/APIs

REST APIs, NumPy, D3.js, Matplotlib, Natural Language Toolkit (NLTK), Pandas, SciPy, PyTorch, TensorFlow, SpaCy

Tools

DataViz, Vim Text Editor, Slack, Git, Virtualenv, Tmux, Seaborn, Microsoft Power BI

Languages

Python, Python 3, SQL, Bash, Falcon, C#, R, Octave, Java, HTML, CSS, JavaScript

Frameworks

Django, Selenium, Flask, Unity

Platforms

Docker, Linux, RapidMiner, Amazon Web Services (AWS)

Storage

MySQL, Elasticsearch, Relational Databases, MongoDB, ArangoDB, Amazon DynamoDB, PostgreSQL

Other

Natural Language Processing (NLP), Regular Expressions, Optical Character Recognition (OCR), Automated Summarization, Law, Tesseract, APIs, Robotic Process Automation (RPA), Document Parsing, Generative Pre-trained Transformers (GPT), Legal Documentation, Data Science, Text Classification, Topic Modeling, SSH, Cython, Mathematics, Economics, Civil Law, Business Law, Machine Learning, Servers, Artificial Intelligence (AI), Data Engineering, Pytesseract, Image Recognition, Web Scraping, Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-3 API, Pipelines, Text Mining, Deployment, Sentiment Analysis, Entity Extraction, Data Mining, Full-stack
