Daniel is available for hire

Daniel Burfoot

Verified Expert in Engineering

Data Scientist, Software Engineer, and Developer

Location

Portsmouth, NH, United States

Toptal Member Since

July 26, 2017

Daniel is an experienced software engineer, data scientist, and NLP researcher with expertise in Java and Python programming and a Ph.D. in machine learning. He has worked with Hadoop, the AWS cloud, SQL databases (MySQL/PostgreSQL), front-end web programming in HTML/JavaScript, machine learning algorithms, TensorFlow, and more.

Natural Language Processing (NLP), Java, Machine Learning, Linux, Amazon S3 (AWS S3), SQL, Amazon EC2, PostgreSQL, Python, MySQL, Hadoop, JavaScript, JEE, XML, React

Portfolio

Java, Scala, Spark, Big Data, Machine Language, ChatGPT...

Ozora Research

Amazon EC2, Natural Language Processing (NLP), PostgreSQL, Java

Cargo Chief

Amazon Web Services (AWS), Flask, MySQL, Natural Language Processing (NLP)...

Experience

Java - 12 years, Linux - 8 years, SQL - 6 years, Python - 4 years, Natural Language Processing (NLP) - 4 years, Generative Pre-trained Transformers (GPT) - 4 years, Machine Learning - 4 years

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Linux, Python, SQL, Jakarta EE

The most amazing...

...project I've built is a combined sentence parser and text compressor; the former finds the parse tree that produces the shortest code length for the latter.

Work Experience

Senior Machine Learning Engineer

2019 - 2023

LinkedIn

Redesigned and rebuilt the Language Inference module, greatly increasing coverage and precision (from around 500,000 to 50 million for Chinese and from 0 to 21 million for Hindi, among others).
Worked on a large-scale migration of people search to a new internal platform. Careful use of Java language features allowed the previous codebase to be reused, which sped up project delivery considerably.
Built a toolbox for LLM experimentation, allowing fast iteration and exploration of possibilities for LLM usage. Developers can try out a new LLM experiment in one hour.
Helped maintain and refine a large internal Java REST service that distributes my team's data to the rest of the company. The key issue was versioning: the service mixes different data versions together to enable A/B testing.

Technologies: Java, Scala, Spark, Big Data, Machine Language, ChatGPT, Large Language Models (LLMs), Apache Lucene, Rest.li, Python, SQLite

Founder

2014 - 2019

Ozora Research

Developed machine-learning algorithms for sentence parsing and modeling.
Designed, developed, and performance-tuned back-end SQL databases.
Worked on the user interface and visualization for the system’s admin console (JavaScript and HTML5).
Worked on DevOps to enable the code to run on Linux instances on the AWS cloud (S3, EC2, RDS, and Spot Market).
Designed the software architecture in Java to ensure that all the pieces interacted smoothly.

Technologies: Amazon EC2, Natural Language Processing (NLP), PostgreSQL, Java

Python Developer

2018 - 2018

Cargo Chief

Developed algorithms in Python to extract truck information (location, truck type, and so on) from email text; the challenge lay mainly in the widely varying text structure.
Built a suite of evaluation, management, and analysis tools for the system using MySQL, EC2, and a CI tool.
Created an admin web app console in Flask to help developers control, analyze, and debug the core NLP components.

Technologies: Amazon Web Services (AWS), Flask, MySQL, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Python

NLP Consultant

2017 - 2018

User Testing

Helped to develop an NLP system to detect sentiment in user experience narration transcripts; used Python and Keras.
Tackled the main challenge, the limited amount of available training data; a key insight was how to use information from other datasets to help with our problem.
Created a visualization tool that used the neural network to highlight key phrases of strong sentiment.

Technologies: Keras, TensorFlow, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Python

Lead Scientist

2011 - 2014

Digilant

Worked as the primary developer of a big data audience analysis system.
Programmed Hadoop, using native Java SDK, to process big data from real-time ad exchanges.
Developed a system to connect the Hadoop output to a machine learning algorithm.
Built a visualization/analysis back-end in MySQL to enable clients to understand the audience profile and characteristics.
Integrated the audience analysis system with other components of the company's stack (the bidder system and the operations console).
Wrote additional significant ETL code in Java for the company's reporting system.

Technologies: Machine Learning, Amazon Elastic MapReduce (EMR), Amazon EC2, Amazon S3 (AWS S3), MySQL, Hadoop, Java

Software Developer

2009 - 2010

Rodale Press (Contract)

Developed SmartCoach and SmartCoachPlus—an automated training program generator for runners.
Programmed the initial version in JavaScript and the second version primarily in Java/JSP.
Developed a MySQL back-end for the second version.
Implemented complex training program generation rules.

Technologies: JavaScript, Java

Experience

Mirror Encoding Library

https://github.com/comperical/MirrorEncode

An implementation of a novel method for data compression called the "mirror" technique. Here are some examples that use the technique.

This technique avoids a crucial difficulty in compression: the need to keep the encoder and decoder perfectly in sync. Using this library, developers can easily create new data compression algorithms with just a few lines of code.

Flow Diagram

https://github.com/comperical/FlowState

To use this tool, a developer first writes an algorithm or software process using a special set of conventions. The developer can then automatically extract a visual diagram describing the algorithm.

The diagram is very useful for documentation purposes; other developers (or the original developer, at a later point in time) can easily understand the way the code works just by looking at the diagram, without needing to dive into the specific details.

Ozora Research Sentence Parser

http://ozoraresearch.com/crm/public/parseview/UserParseView.jsp

At Ozora Research, I built a broad grammar sentence parser without using labeled training data (almost all other work in the area of parsing depends on labeled "treebank" data).

The parser is built in combination with a specialized text compressor which compresses text by using a parse tree. The parser produces the tree that will produce the smallest code length for the given sentence. You can demo the parser at the link provided.
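As a rough illustration of the selection rule only (this is not the Ozora parser or compressor), the sketch below scores two candidate parses of one sentence under a toy grammar and keeps the one with the shorter code length. The grammar rules, their probabilities, and the names `RULE_PROB`, `CANDIDATES`, and `code_length_bits` are all hypothetical; the code length of a derivation is taken to be the sum of -log2 of the probabilities of the rules it uses.

```python
import math

# Hypothetical rule probabilities for a toy grammar (illustrative values only).
RULE_PROB = {
    "S -> NP VP": 1.0,
    "VP -> V NP": 0.6,
    "VP -> VP PP": 0.4,
    "NP -> Pron": 0.3,
    "NP -> Det N": 0.5,
    "NP -> NP PP": 0.2,
    "PP -> P NP": 1.0,
}

# Two candidate parses of "I saw the man with the telescope",
# each written as the list of rules used in its derivation.
CANDIDATES = {
    "verb_attachment": [
        "S -> NP VP", "NP -> Pron", "VP -> VP PP", "VP -> V NP",
        "NP -> Det N", "PP -> P NP", "NP -> Det N",
    ],
    "noun_attachment": [
        "S -> NP VP", "NP -> Pron", "VP -> V NP", "NP -> NP PP",
        "NP -> Det N", "PP -> P NP", "NP -> Det N",
    ],
}


def code_length_bits(rules):
    """Ideal code length of a derivation: sum of -log2 P(rule)."""
    return sum(-math.log2(RULE_PROB[rule]) for rule in rules)


# Score every candidate and keep the one that is cheapest to encode.
bits = {name: code_length_bits(rules) for name, rules in CANDIDATES.items()}
best = min(bits, key=bits.get)
print(best, bits)
```

With these made-up probabilities the two derivations differ only in one rule, so the choice comes down to which attachment rule is cheaper to encode.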

Notes on a New Philosophy of Empirical Science

https://arxiv.org/abs/1104.5466

This is a book that I wrote about a new approach to empirical science based on lossless data compression. In this philosophy, a researcher proposes a theory, builds the theory into a data compressor, and measures the quality of the theory by invoking the compressor on a large shared data set. If the theory achieves a lower net code length (including the size of the compressor itself) than previous theories, it is confirmed as the new "champion" theory.

This philosophy guided my work at Ozora Research. In this case, the relevant data set was English newspaper text. To compress this data, I developed theories of grammar and syntax, and built those theories into a data compressor.
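The comparison rule can be sketched in a few lines. This is a toy under stated assumptions, not the procedure from the book: the "theories" are two per-character probability models, the data set is a short synthetic string, and the size of each compressor is approximated by a hypothetical flat cost of 32 bits per stored parameter. The names `data_bits` and `net_code_length` are illustrative.

```python
import math
from collections import Counter


def data_bits(text, prob):
    """Ideal code length in bits of `text` under a per-character model."""
    return sum(-math.log2(prob[ch]) for ch in text)


def net_code_length(text, prob, model_bits):
    """Net code length: cost of the model itself plus the encoded data."""
    return model_bits + data_bits(text, prob)


# Synthetic stand-in for a shared data set.
text = "aaaaaaab" * 100
alphabet = sorted(set(text))

# Theory 1: every character is equally likely (no parameters to store).
uniform = {ch: 1 / len(alphabet) for ch in alphabet}

# Theory 2: characters follow their observed frequencies.
# Assumption: each stored frequency costs 32 bits.
counts = Counter(text)
skewed = {ch: counts[ch] / len(text) for ch in alphabet}

uniform_total = net_code_length(text, uniform, model_bits=0)
skewed_total = net_code_length(text, skewed, model_bits=32 * len(alphabet))

# The theory with the lower net code length becomes the champion.
champion = "skewed" if skewed_total < uniform_total else "uniform"
print(champion, uniform_total, skewed_total)
```

Charging for the model is what keeps the comparison honest: a theory that simply memorized the data would encode it in zero bits but pay for it in model size.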

Statistical Modeling as a Search for Randomness Deficiencies | Ph.D. Thesis

My Ph.D. thesis developed an approach to statistical modeling based on the search for randomness deficiencies in an encoded form of the data.

According to algorithmic information theory, if a given model is a perfect fit for a data set, then when you encode the data using the model, the resulting encoded data (typically a bit string) is completely random. This implies that if you encode the data using a model and find a randomness deficiency in the encoded data, then there is a flaw in your model. Furthermore, an analysis of the randomness deficiency illustrates a way to improve the model.

The thesis developed a suite of machine learning algorithms that work by using this idea.
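A minimal sketch of the idea, which is not one of the algorithms from the thesis: assume the current model treats the data as independent fair coin flips, so each bit costs exactly one bit and the encoded string equals the data itself. The hypothetical helper `conditional_entropy` then measures how predictable each encoded bit is from the one before it; any shortfall below 1 bit per symbol is a randomness deficiency, and it points to a specific fix, namely conditioning the model on the previous bit.

```python
import math
import random
from collections import Counter


def conditional_entropy(bits):
    """Empirical H(next bit | previous bit), in bits per symbol."""
    pairs = Counter(zip(bits, bits[1:]))
    prev_totals = Counter(bits[:-1])
    n = len(bits) - 1
    entropy = 0.0
    for (prev, _next), count in pairs.items():
        entropy -= (count / n) * math.log2(count / prev_totals[prev])
    return entropy


# Encoded data under the fair-coin model: a strictly alternating string.
encoded = [0, 1] * 500
deficiency = 1.0 - conditional_entropy(encoded)

# For comparison, pseudo-random bits should show almost no deficiency.
rng = random.Random(0)
noise = [rng.randint(0, 1) for _ in range(10_000)]
noise_deficiency = 1.0 - conditional_entropy(noise)

print(deficiency, noise_deficiency)
```

A large deficiency for the alternating string says the fair-coin model wastes close to one bit per symbol, while the pseudo-random string gives no such signal.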

Education

2006 - 2010

Ph.D. in Machine Learning

University of Tokyo - Tokyo, Japan

2004 - 2006

Master of Science in Artificial Intelligence

McGill University - Montreal, Canada

2002 - 2004

Master of Science in Physics

University of Connecticut - Storrs, CT, USA

1995 - 1999

Bachelor of Arts in Applied Math and Computer Science

Harvard University - Cambridge, MA, USA

Skills

Languages

Java, SQL, Python, XML, JavaScript, Scala

Other

Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Algorithms, Big Data, Machine Language, Large Language Models (LLMs)

Frameworks

Hadoop, Flask, Spark, Rest.li

Paradigms

Object-oriented Design (OOD)

Platforms

Linux, Amazon EC2, Jakarta EE, Amazon Web Services (AWS), JEE

Storage

MySQL, PostgreSQL, Amazon S3 (AWS S3), JSON, SQLite

Libraries/APIs

Keras, React, TensorFlow, Apache Lucene

Tools

Azure Machine Learning, Amazon Elastic MapReduce (EMR), ChatGPT

