Neil du Toit, Developer in Cape Town, Western Cape, South Africa

Neil du Toit

Software Developer

Cape Town, Western Cape, South Africa

Toptal member since December 14, 2021

Bio

Neil is a data scientist specializing in natural language processing, including classification, OCR, and entity extraction, as well as retrieval-augmented generation (RAG), including semantic search, image search, image models, reranking, and tabular data. He has developed systems to summarize thousands of commercial lease agreements at a major property company and worked on its internal AI chatbot. Neil has particular expertise in the legal industry and experience in healthcare.

Portfolio

Jones Lang LaSalle
Python, Azure, Azure Function App, Azure AI Document Intelligence...
Artbrain Ltd.
Python, MongoDB, PyTorch, Amazon Web Services (AWS), NumPy, Pandas...
Henry Stewart
Python, Generative Pre-trained Transformers (GPT)...

Experience

  • Python - 5 years
  • Generative Pre-trained Transformers (GPT) - 5 years
  • Natural Language Processing (NLP) - 5 years
  • DataViz - 4 years
  • Regular Expressions - 3 years
  • Elasticsearch - 3 years
  • Text Classification - 3 years
  • Automated Summarization - 3 years

Preferred Environment

Linux, Vim Text Editor, Slack, Bash, Git, Docker, Virtualenv, SSH, Tmux

The most amazing...

...text classifier I've built classified court judgments from 15 countries, enabling thousands of users to access a more powerful legal research tool.

Work Experience

Data Scientist

2023 - PRESENT
Jones Lang LaSalle
  • Developed a RAG solution to summarize commercial lease agreements, extracting key terms and calculating time periods, with the reference text highlighted in the original lease.
  • Developed a solution to digitize tens of thousands of order documents to help the client identify outstanding debtors, leading to significant direct cash ROI.
  • Integrated lease summarization into a property portfolio management solution.
  • Improved the performance of retrieval in the RAG pipeline for the company's internal central AI assistant.
Technologies: Python, Azure, Azure Function App, Azure AI Document Intelligence, RAG Pipelines, RAG Architecture, Text Classification

Python Developer

2022 - 2023
Artbrain Ltd.
  • Identified and fixed bugs in the system that were causing it to produce incorrect results.
  • Optimized the system to run on downgraded infrastructure and take less time for inference.
  • Ran end-to-end testing and profiling of the system.
Technologies: Python, MongoDB, PyTorch, Amazon Web Services (AWS), NumPy, Pandas, Amazon DynamoDB, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), APIs

Senior Python Developer

2022 - 2022
Henry Stewart
  • Developed an MVP from scratch that can extract the required information from medical documents.
  • Created a production environment and deployed a production-ready build of the service.
  • Worked with the UI developer to integrate the service into the application.
Technologies: Python, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), SpaCy, Flask, Django, Docker, PostgreSQL, Text Mining, Deployment, Sentiment Analysis, Entity Extraction, Data Mining, Optical Character Recognition (OCR), Relational Databases, Natural Language Toolkit (NLTK), APIs, Document Parsing

Data Scientist

2018 - 2022
University of Cape Town
  • Extracted references automatically from thousands of court records.
  • Created a network graph based on the extracted references and visualized it interactively in the browser.
  • Developed a taxonomy and classified all court records according to the taxonomy using machine learning.
  • Created an automated summary generator able to summarize every court record in the collection.
  • Configured an Elasticsearch search engine to allow for the searching of court records.
Technologies: ArangoDB, Python, Text Classification, Regular Expressions, MySQL, Docker, DataViz, Django, Optical Character Recognition (OCR), Machine Learning, Servers, TensorFlow, Tesseract, Pytesseract, Image Recognition, REST APIs, HTML, CSS, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), JavaScript, Topic Modeling, Elasticsearch, Automated Summarization, Cython, SciPy, Matplotlib, Seaborn, Natural Language Toolkit (NLTK), APIs, Full-stack, Selenium, Robotic Process Automation (RPA), Document Parsing

Head of Data Division

2017 - 2018
Q Division
  • Developed architecture for churn prediction and customer lifetime value measurement for an insurance provider, requiring the integration of separate systems managing customer acquisition data and on-book customer data.
  • Audited third-party service providers for an insurance provider, specifically marketing agencies, and developed pipelines to integrate third-party data into the insurance provider's analytics.
  • Developed a Sankey diagram data visualization that comprehensively displayed the customer acquisition channels, the costs associated with each channel, and the value obtained in return.
Technologies: Python 3, Django, Machine Learning, Servers, Data Engineering, REST APIs, Microsoft Power BI, Relational Databases, Natural Language Toolkit (NLTK), RapidMiner, APIs, Selenium

Data Strategist

2017 - 2017
Q Division
  • Developed a dashboard for a large retailer, including sales trends, KPI tracking, relevant open data, and predictive analytics.
  • Created the game-play "quests" for an educational adaptive-learning tablet game that taught financial literacy to children.
  • Developed the calculations for a debt repayment calculator application, which provided a breakdown of the snowball and avalanche debt repayment methods.
Technologies: SQL, Python 3, Django, C#, Unity, Servers, REST APIs, Relational Databases, RapidMiner, APIs
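
The snowball and avalanche methods mentioned above differ only in which debt receives the spare money each month: snowball targets the smallest balance first, avalanche the highest interest rate first. The following is a minimal sketch of that comparison, not the calculator's actual code. The function name `simulate`, the debt fields, monthly compounding at one twelfth of the annual rate, and the example figures are all assumptions made for illustration.

```python
def simulate(debts, monthly_budget, strategy):
    """Simulate repayment and return (months, total_interest).

    debts: list of (name, balance, annual_rate, minimum_payment) tuples.
    strategy: "snowball" (smallest balance first) or
              "avalanche" (highest rate first).
    Assumes interest compounds monthly at annual_rate / 12.
    """
    if strategy == "snowball":
        priority = lambda d: d["balance"]
    elif strategy == "avalanche":
        priority = lambda d: -d["rate"]
    else:
        raise ValueError("strategy must be 'snowball' or 'avalanche'")

    debts = [dict(name=n, balance=b, rate=r, minimum=m) for n, b, r, m in debts]
    months = 0
    total_interest = 0.0

    while any(d["balance"] > 0 for d in debts):
        months += 1
        if months > 1200:  # guard against a budget too small to ever finish
            raise ValueError("budget does not cover the interest")

        # 1. Accrue one month of interest on every open debt.
        for d in debts:
            if d["balance"] > 0:
                interest = d["balance"] * d["rate"] / 12
                d["balance"] += interest
                total_interest += interest

        # 2. Pay the minimum on every open debt.
        available = monthly_budget
        for d in debts:
            if d["balance"] > 0:
                pay = min(d["minimum"], d["balance"], available)
                d["balance"] -= pay
                available -= pay

        # 3. Put whatever is left toward debts in strategy order.
        for d in sorted((d for d in debts if d["balance"] > 0), key=priority):
            pay = min(available, d["balance"])
            d["balance"] -= pay
            available -= pay
            if available <= 0:
                break

    return months, round(total_interest, 2)


# Hypothetical example: a high-rate card and a smaller low-rate loan.
example = [("card", 5000, 0.24, 100), ("loan", 2000, 0.06, 50)]
snowball = simulate(example, 500, "snowball")
avalanche = simulate(example, 500, "avalanche")
```

Comparing the second element of `snowball` and `avalanche` gives the difference in total interest paid, which is the kind of breakdown such a calculator would present.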

Experience

Court Precedent Citation Network Graph

https://ojs.law.cornell.edu/index.php/joal/article/view/89
Most legal jurisdictions operate on a system of precedent, which means that legal principles established in one court judgment must be followed in subsequent judgments. When precedents are followed, judges cite the cases that established them. I extracted over a million of these citations from cases spanning 15 jurisdictions across Africa using pattern matching. The citations were not standardized, so data cleaning formed a large part of this project. After that, the cases could be organized into a network graph, allowing for a mathematical treatment of how legal principles propagate through the courts.
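
As a rough illustration of the approach described above (pattern matching, normalization, then a graph), here is a minimal sketch. It is not the project's code: the neutral-citation format "[year] COURT number", the regular expression, and the function names are assumptions, and the real citations were far less uniform than this pattern allows.

```python
import re
from collections import defaultdict

# Assumed citation format for illustration only, e.g. "[2015] ZACC 12".
CITATION_RE = re.compile(r"\[(\d{4})\]\s*([A-Z]{2,10})\s*(\d+)")


def extract_citations(text):
    """Return normalized citations found in text, in order, without repeats."""
    found = []
    for year, court, number in CITATION_RE.findall(text):
        # Normalize spacing and strip leading zeros from the case number.
        citation = f"[{year}] {court} {int(number)}"
        if citation not in found:
            found.append(citation)
    return found


def build_graph(judgments):
    """Map each case id to the set of cases it cites (self-citations dropped)."""
    return {
        case_id: {c for c in extract_citations(text) if c != case_id}
        for case_id, text in judgments.items()
    }


def in_degree(graph):
    """Count how many cases cite each case."""
    counts = defaultdict(int)
    for cited_cases in graph.values():
        for cited in cited_cases:
            counts[cited] += 1
    return dict(counts)


# Hypothetical judgments keyed by their own citation.
judgments = {
    "[2015] ZACC 12": "As held in [2010]  ZACC 5 and [2008] ZASCA 034, the rule applies.",
    "[2010] ZACC 5": "See [2008] ZASCA 34.",
}
graph = build_graph(judgments)
cited_counts = in_degree(graph)
```

In this sketch `cited_counts` gives a simple measure of how often each case is relied on; the normalization step stands in for the much larger data cleaning effort the project required.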

An Evaluation of Four-team-per-contest Swiss Power Paired Tournaments

This paper, published in volume 12 of the Monash Debate Review in 2014, views tournament structures as sorting algorithms and evaluates their performance. It concludes that the most popular structure for debating tournaments, the Swiss-system tournament, is ill-suited to British parliamentary-style debating and leads to unfair results compared with group stage tournaments, round robins, and elimination tournaments.

Artbrain AI

https://www.artbrain.ai/
I debugged and optimized the AI model to fix stability issues and allow the service to move to cheaper infrastructure. The existing codebase had many bugs, so I profiled, tested, and debugged the system before optimizing it.

Education

2014 - 2016

Bachelor’s Degree in Law and Justice Administration

University of Stellenbosch - Stellenbosch, South Africa

2011 - 2013

Bachelor’s Degree in Mathematics

University of Cape Town - Cape Town, South Africa

Skills

Libraries/APIs

REST APIs, NumPy, D3.js, Matplotlib, Natural Language Toolkit (NLTK), Pandas, SciPy, PyTorch, TensorFlow, SpaCy

Tools

DataViz, Vim Text Editor, Slack, Git, Virtualenv, Tmux, Seaborn, Microsoft Power BI

Languages

Python, Python 3, SQL, Bash, Falcon, C#, R, Octave, Java, HTML, CSS, JavaScript

Frameworks

Django, Selenium, Flask, Unity

Platforms

Docker, Linux, RapidMiner, Amazon Web Services (AWS), Azure

Storage

MySQL, Elasticsearch, Relational Databases, MongoDB, ArangoDB, Amazon DynamoDB, PostgreSQL

Other

Natural Language Processing (NLP), Regular Expressions, Optical Character Recognition (OCR), Automated Summarization, Law, Tesseract, APIs, Robotic Process Automation (RPA), Document Parsing, Generative Pre-trained Transformers (GPT), Legal Documentation, Data Science, AI Integration, Large Language Models (LLMs), Document Processing, Text Classification, Topic Modeling, SSH, Cython, Mathematics, Economics, Civil Law, Business Law, Machine Learning, Servers, Artificial Intelligence (AI), Data Engineering, Pytesseract, Image Recognition, Web Scraping, Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-3 API, Pipelines, Text Mining, Deployment, Sentiment Analysis, Entity Extraction, Data Mining, Full-stack, Azure Function App, Azure AI Document Intelligence, RAG Pipelines, RAG Architecture, Data Anonymization
