
Daniel Burfoot
Verified Expert in Engineering
Data Scientist, Software Engineer, and Developer
Daniel is an experienced software engineer, data scientist, and NLP researcher with expertise in Java and Python programming and a Ph.D. in machine learning. He has worked with Hadoop, the AWS cloud, SQL databases (MySQL/PostgreSQL), front-end web programming in HTML/JavaScript, machine learning algorithms, TensorFlow, and more.
Preferred Environment
Amazon Web Services (AWS), Linux, Python, SQL, Jakarta EE
The most amazing...
...project I've built is a combined sentence parser and text compressor; the former finds the parse tree that produces the shortest code length for the latter.
Work Experience
Founder
Ozora Research
- Developed machine-learning algorithms for sentence parsing and modeling.
- Designed, developed, and performance-tuned back-end SQL databases.
- Worked on the user interface and visualization for the system’s admin console (JavaScript and HTML5).
- Worked on DevOps to enable the code to run on Linux instances on the AWS cloud (S3, EC2, RDS, and the Spot Market); a sketch of this kind of glue code follows this list.
- Designed the software architecture in Java to ensure that all the pieces interacted smoothly.
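Below is a minimal sketch, in Python with boto3, of the kind of AWS glue code this DevOps work involved; the bucket name, key, AMI ID, and instance settings are placeholders, not the actual Ozora deployment scripts.

    # Illustrative only: bucket, key, AMI ID, and instance settings are placeholders.
    import boto3

    def deploy_artifact(local_path, bucket="example-models", key="latest/model.tar.gz"):
        """Upload a build artifact to S3 so worker instances can pull it on boot."""
        s3 = boto3.client("s3")
        s3.upload_file(local_path, bucket, key)

    def request_spot_worker(ami_id="ami-0123456789abcdef0", instance_type="m5.large",
                            max_price="0.05"):
        """Request a single Spot instance to run a batch job."""
        ec2 = boto3.client("ec2")
        response = ec2.request_spot_instances(
            SpotPrice=max_price,
            InstanceCount=1,
            LaunchSpecification={"ImageId": ami_id, "InstanceType": instance_type},
        )
        return response["SpotInstanceRequests"][0]["SpotInstanceRequestId"]

    if __name__ == "__main__":
        deploy_artifact("build/model.tar.gz")
        print(request_spot_worker())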
Python Developer
Cargo Chief
- Developed algorithms in Python to extract truck information (location, truck type, and so on) from email text; the challenge lay mainly in the widely varying text structure. A simplified extraction sketch follows this list.
- Built a suite of evaluation, management, and analysis tools for the system using MySQL, EC2, and CI tooling.
- Created an admin web app console in Flask to help developers control, analyze, and debug the core NLP components.
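The extraction work above can be illustrated with a simplified sketch; the regex pattern, truck-type list, and field names below are invented for illustration, and the production system had to cope with far messier text.

    # Illustrative only: the pattern, truck types, and field names are invented.
    import re

    TRUCK_TYPES = ["reefer", "flatbed", "dry van", "step deck"]

    # Matches phrases like "out of Dallas, TX" or "near Atlanta, GA".
    CITY_STATE = re.compile(r"\b(?:in|out of|near)\s+([A-Z][a-zA-Z ]+?)[,\s]+([A-Z]{2})\b")

    def extract_truck_info(email_text):
        """Pull a (truck_type, city, state) record out of free-form email text."""
        lowered = email_text.lower()
        truck_type = next((t for t in TRUCK_TYPES if t in lowered), None)

        match = CITY_STATE.search(email_text)
        city, state = (match.group(1).strip(), match.group(2)) if match else (None, None)
        return {"truck_type": truck_type, "city": city, "state": state}

    print(extract_truck_info("Hi, we have a reefer out of Dallas, TX available Friday."))
    # -> {'truck_type': 'reefer', 'city': 'Dallas', 'state': 'TX'}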
NLP Consultant
User Testing
- Helped develop an NLP system to detect sentiment in user-experience narration transcripts, using Python and Keras; a minimal model sketch follows this list.
- Addressed the main challenge, which was the limited amount of available training data; a key insight was to use information from other datasets to supplement our own.
- Created a visualization tool that used the neural network to highlight key phrases of strong sentiment.
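The sketch below shows one way to handle the limited-data problem in Keras: seed a frozen embedding layer with vectors trained on a much larger corpus. The vocabulary size, dimensions, and the random matrix standing in for pretrained embeddings are placeholders, not the actual system.

    # Illustrative only: sizes and the "pretrained" matrix are placeholders.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 100, 200

    # Stand-in for embeddings trained on a much larger corpus; reusing such
    # vectors is one way to compensate for a small labeled dataset.
    pretrained_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

    embedding = layers.Embedding(VOCAB_SIZE, EMBED_DIM, trainable=False)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        embedding,
        layers.GlobalAveragePooling1D(),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),   # probability of positive sentiment
    ])
    embedding.set_weights([pretrained_matrix])   # seed with the external vectors
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()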
Lead Scientist
Digilant
- Worked as the primary developer of a big data audience analysis system.
- Programmed Hadoop jobs, using the native Java SDK, to process big data from real-time ad exchanges; the map/reduce shape of this work is sketched after this list.
- Developed a system to connect the Hadoop output to a machine learning algorithm.
- Built a visualization/analysis back-end in MySQL to enable clients to understand the audience profile and characteristics.
- Integrated the audience analysis system with other components of the company's stack (the bidder system and the operations console).
- Wrote additional significant ETL code in Java for the company's reporting system.
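The audience-counting work had the usual map/reduce shape. The actual jobs used the native Java SDK; the sketch below shows the same shape as a Hadoop Streaming-style script in Python, with an invented "user_id<TAB>segment" log format.

    # Illustrative Hadoop Streaming-style sketch in Python; the real jobs used the
    # native Java SDK, and the "user_id<TAB>segment" log format is invented here.
    import sys
    from itertools import groupby

    def mapper(lines):
        """Emit (segment, 1) for every bid-log line."""
        for line in lines:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 2:
                _, segment = parts
                print(f"{segment}\t1")

    def reducer(lines):
        """Sum the counts per segment (Hadoop sorts mapper output by key)."""
        keyed = (line.rstrip("\n").split("\t") for line in lines)
        for segment, group in groupby(keyed, key=lambda kv: kv[0]):
            print(f"{segment}\t{sum(int(count) for _, count in group)}")

    if __name__ == "__main__":
        # Runs as the mapper by default, or as the reducer with a "reduce" argument.
        reducer(sys.stdin) if sys.argv[1:] == ["reduce"] else mapper(sys.stdin)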
Software Developer
Rodale Press (Contract)
- Developed SmartCoach and SmartCoachPlus—an automated training program generator for runners.
- Programmed the initial version in JavaScript and the second version primarily in Java/JSP.
- Developed a MySQL back-end for the second version.
- Implemented complex training-program generation rules; the sketch after this list illustrates the rule-driven approach.
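The actual SmartCoach rules belonged to Rodale; the sketch below uses invented placeholder rules (ramp-up, taper, alternating quality sessions) purely to illustrate rule-driven plan generation.

    # Illustrative only: these scheduling rules are invented placeholders, not the
    # actual SmartCoach logic.
    def weekly_plan(week, total_weeks, base_miles):
        """Build one week of a running plan from a few simple rules."""
        if week > total_weeks - 2:
            miles = base_miles * 0.6                 # taper in the final two weeks
        else:
            miles = base_miles * (1 + 0.08 * week)   # gradual build phase
        return {
            "week": week,
            "total_miles": round(miles, 1),
            "long_run_miles": round(miles * 0.35, 1),           # one long run ~35% of volume
            "quality_session": "intervals" if week % 2 else "tempo run",
        }

    def generate_program(total_weeks=12, base_miles=20):
        return [weekly_plan(w, total_weeks, base_miles) for w in range(1, total_weeks + 1)]

    for row in generate_program():
        print(row)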
Experience
Mirror Encoding Library
https://github.com/comperical/MirrorEncode
This technique addresses a crucial difficulty in compression: keeping the encoder and decoder perfectly in sync. Using this library, developers can easily create new data compression algorithms with just a few lines of code.
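This is not the MirrorEncode API itself, but a minimal toy sketch of the sync problem the library addresses: an adaptive rank coder in which the decoder must mirror every model update the encoder makes, or it drifts out of step.

    # Illustrative only: not the MirrorEncode API, just a toy adaptive rank coder
    # showing why the encoder's and decoder's models must stay perfectly in sync.
    from collections import Counter

    def ranking(counts, alphabet):
        """Symbols ordered by decreasing count (ties broken alphabetically)."""
        return sorted(alphabet, key=lambda s: (-counts[s], s))

    def encode(text, alphabet):
        counts, codes = Counter(), []
        for symbol in text:
            codes.append(ranking(counts, alphabet).index(symbol))  # emit current rank
            counts[symbol] += 1                                    # model update
        return codes

    def decode(codes, alphabet):
        counts, out = Counter(), []
        for code in codes:
            symbol = ranking(counts, alphabet)[code]  # same ranking the encoder saw
            out.append(symbol)
            counts[symbol] += 1                       # mirror the encoder's update
        return "".join(out)

    alphabet = list("abcdefghijklmnopqrstuvwxyz ")
    message = "abracadabra abracadabra"
    assert decode(encode(message, alphabet), alphabet) == message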
Flow Diagram
https://github.com/comperical/FlowState
The diagram is very useful for documentation purposes; other developers (or the original developer, at a later point in time) can easily understand the way the code works just by looking at the diagram, without needing to dive into the specific details.
Ozora Research Sentence Parser
http://ozoraresearch.com/crm/public/parseview/UserParseView.jsp
The parser is built in combination with a specialized text compressor which compresses text by using a parse tree. The parser produces the tree that will produce the smallest code length for the given sentence. You can demo the parser at the link provided.
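A toy sketch of the selection criterion: each candidate parse is scored by the code length it implies under a probability model, and the shortest one wins. The candidate parses, decisions, and probabilities below are invented for illustration; the real parser and compressor are far more elaborate.

    # Illustrative only: hypothetical parse decisions and probabilities.
    import math

    def code_length_bits(parse, prob_model):
        """Shannon code length: -sum(log2 P(decision)) over the parse's decisions."""
        return -sum(math.log2(prob_model(decision)) for decision in parse)

    def best_parse(candidates, prob_model):
        """Pick the candidate parse whose encoding of the sentence is shortest."""
        return min(candidates, key=lambda parse: code_length_bits(parse, prob_model))

    def toy_model(decision):
        return {"NP->det noun": 0.5, "VP->verb NP": 0.4,
                "NP->noun": 0.2, "VP->verb": 0.1}.get(decision, 0.01)

    candidates = [
        ["NP->det noun", "VP->verb NP"],   # ~2.3 bits
        ["NP->noun", "VP->verb"],          # ~5.6 bits
    ]
    print(best_parse(candidates, toy_model))  # the first, cheaper parse wins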
Notes on a New Philosophy of Empirical Science
https://arxiv.org/abs/1104.5466
This philosophy guided my work at Ozora Research. In this case, the relevant data set was English newspaper text. To compress this data, I developed theories of grammar and syntax and built those theories into a data compressor.
Statistical Modeling as a Search for Randomness Deficiencies | Ph.D. Thesis
According to algorithmic information theory, if a given model is a perfect fit for a data set, then encoding the data with the model produces an encoded string (typically a bit string) that is completely random. This implies that if you encode the data using a model and find a randomness deficiency in the encoded data, then there is a flaw in the model. Furthermore, an analysis of the randomness deficiency points to a way to improve the model.
The thesis developed a suite of machine learning algorithms based on this idea.
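A small sketch of the idea in a deliberately simple setting: under an i.i.d. Bernoulli(0.5) model, the optimal code for a bit string is essentially the raw bits themselves, so any statistical regularity left in those bits is a randomness deficiency. Here the data actually comes from a correlated (Markov) source, and a transition-count test exposes the flaw in the i.i.d. model. The generator and test below are illustrative, not the algorithms from the thesis.

    # Illustrative only: not the thesis algorithms. Under a Bernoulli(0.5) model the
    # optimal code for a bit string is essentially the string itself, so structure
    # left in the "encoded" bits is a randomness deficiency pointing at the model.
    import random

    def markov_bits(n, stay_prob=0.8, seed=0):
        """Correlated data: each bit repeats the previous one with prob stay_prob."""
        rng, bits = random.Random(seed), [0]
        for _ in range(n - 1):
            bits.append(bits[-1] if rng.random() < stay_prob else 1 - bits[-1])
        return bits

    def transition_z_score(bits):
        """Compare the number of bit flips to the (n-1)/2 expected from a fair coin."""
        n = len(bits)
        flips = sum(b1 != b2 for b1, b2 in zip(bits, bits[1:]))
        expected, std = (n - 1) / 2, ((n - 1) / 4) ** 0.5
        return (flips - expected) / std

    encoded = markov_bits(10000)   # the "encoded" data under the Bernoulli(0.5) model
    print(f"z = {transition_z_score(encoded):.1f}")   # strongly negative: too few flips
    # A large |z| is a randomness deficiency: the i.i.d. model misses the sequential
    # dependence, and the deficiency itself suggests the fix (model the transitions).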
Skills
Languages
Java, SQL, Python, XML, JavaScript
Other
Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Algorithms
Frameworks
Hadoop, Flask
Paradigms
Object-oriented Design (OOD)
Platforms
Linux, Amazon EC2, Jakarta EE, Amazon Web Services (AWS)
Storage
MySQL, PostgreSQL, Amazon S3 (AWS S3), JSON
Libraries/APIs
Keras, React, TensorFlow
Tools
Azure Machine Learning, Amazon Elastic MapReduce (EMR)
Education
Ph.D. in Machine Learning
University of Tokyo - Tokyo, Japan
Master of Science in Artificial Intelligence
McGill University - Montreal, Canada
Master of Science in Physics
University of Connecticut - Storrs, CT, USA
Bachelor of Arts in Applied Math and Computer Science
Harvard University - Cambridge, MA, USA