Tal Perry, Developer in Berlin, Germany

Tal Perry

Verified Expert in Engineering

Machine Learning Developer

Location
Berlin, Germany
Toptal Member Since
March 20, 2020

Tal is a Google Developer Expert in machine learning and a former NLP researcher at Citi. He is the founder and CTO of LightTag, a profitable NLP SaaS platform. His experience spans ML, ops, and human-machine interfaces. The solutions he has put into production include language-based compliance monitoring systems, high-frequency trading systems trading hundreds of millions a day, and NLP-based alternative data offerings for competitive intelligence and financial analysis.

Portfolio

LightTag
Amazon Web Services (AWS), Docker, Django, Python, React, TensorFlow
Citi
TensorFlow, Scikit-learn, Python
Superfly
Amazon Web Services (AWS), Elasticsearch, PostgreSQL, Redis, Celery, Django...

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), SQL, PostgreSQL, Docker, Django, Redux, React, TensorFlow, TypeScript, JavaScript, Python

The most amazing thing I've built is a patented system for analyzing trader behavior based on the behavioral finance literature.

Work Experience

Founder, CEO, CTO

2018 - 2020
LightTag
  • Created a SaaS business for NLP annotation with customers including Viasat, Microsoft, and Pitchbook.
  • Deployed a language-agnostic machine learning model that correctly generates 70% of entity annotations on the platform.
  • Built a multi-tenant SaaS supporting thousands of tenants while maintaining strong guarantees on tenant data isolation and low infrastructure expenses.
  • Invented and implemented a patent-pending interface for drag-and-drop relationship annotation supporting constituency and dependency grammar.
  • Conducted customer interviews and implemented findings to increase conversions, retention rates, and customer delight.
  • Designed and deployed deep NLP models that can adapt to customer data without incurring significant compute costs.
Technologies: Amazon Web Services (AWS), Docker, Django, Python, React, TensorFlow

Data Scientist

2016 - 2020
Citi
  • Applied behavioral finance theory to create a patented system for detecting bias in credit trader behavior.
  • Used rule-based and deep learning NLP to create multilingual compliance and CRM solutions for sell-side credit and rates trading.
  • Reduced labor costs and turnaround time for institutional loan origination by developing ML-based document classification, routing, and extraction systems.
Technologies: TensorFlow, Scikit-learn, Python

CTO

2014 - 2016
Superfly
  • Grew the engineering team from one person to a cohesive and productive team of 12.
  • Reduced turnaround time on POCs from three weeks on average to less than 48 hours by making core data assets accessible to the business side.
  • Led the technological and product shift of the company from a zero-revenue consumer-facing service to a multimillion-dollar alternative data provider.
  • Maintained an acceptable infrastructure cost as we grew our data processing scale 1,000X.
  • Increased return on data annotation costs by developing a "human-friendly" domain-specific language for semi-structured text analytics.
  • Drove data acquisition throughput by deploying a terabyte-scale Elasticsearch cluster and designing a custom interface to find "needles in haystacks."
Technologies: Amazon Web Services (AWS), Elasticsearch, PostgreSQL, Redis, Celery, Django, Python

Research Engineer

2013 - 2014
Fluent Trade Technologies
  • Deployed high-frequency algorithmic trading systems capable of trading hundreds of millions in notional volume a day.
  • Implemented ML algorithms with single-digit millisecond latency to maintain an edge in HFT.
  • Contributed to API design, usability testing, and QA as the company expanded into HFT PaaS offerings.
  • Liaised between the research team, engineering, and senior management and helped frame objectives and challenges in an accessible form to each group.
Technologies: Python, C++

Algorithmic Trader

2011 - 2013
Self-Employed
  • Designed, developed, and deployed a multi-equity long/short algorithmic trading system in C++.
  • Implemented a multi-exchange and multi-threaded order management system.
  • Developed backtesting infrastructure and data warehousing for equities data.
Technologies: C++

LightTag - Text Annotation SaaS

http://www.lighttag.io
A text annotation SaaS I built and run myself. LightTag includes interfaces for annotation, management of a distributed workforce, data quality assurance, and active learning.
I built LightTag because I needed it and turned it into a profitable business through a combination of ML and UX.

YLabel - Serverless, In-Browser Full Text Search and Annotation

https://github.com/LightTag/ylabel
YLabel is a proof-of-concept project demonstrating annotation powered by full-text search. The core of the project is storing an inverted index in the browser's IndexedDB, which allows persistent full-text search without any server infrastructure.
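As a rough illustration of the inverted-index idea (not YLabel's actual code, which runs in the browser against IndexedDB), here is a minimal in-memory sketch in Python. The function names and the sample documents are hypothetical, and the AND-style query semantics are an assumption made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the postings of the first token, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, keyed by id.
docs = {
    1: "Convolutional methods for text",
    2: "How to label text data",
    3: "Reinforcement learning for portfolios",
}
index = build_index(docs)
```

Calling search(index, "label text") looks up each query token in the index and intersects the resulting id sets, so no document has to be rescanned at query time. A persistent version would write the token-to-ids mapping to a store instead of keeping it in memory.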

Dense Continuous Sentences - NLP Variational Autoencoder Using DenseNet

https://github.com/talolard/DenseContinuousSentances
This project was an attempt to implement the "Generating Sentences from a Continuous Space" paper using DenseNet as a variational encoder. Convolutional architectures are very effective with text, and I did this project to see whether they would make variational embeddings easier to train (they don't).

Article - Convolutional Methods For Text

https://medium.com/@TalPerry/convolutional-methods-for-text-d5260fd5675f
A long-form article aimed at practitioners discussing the merits and techniques of processing text with convolutional models as opposed to recurrent neural networks. Convolutions offer significant compute advantages because they are easily parallelizable, which shortens research cycles and lowers production compute costs and complexity.

Article - How To Label Data

https://www.lighttag.io/how-to-label-data/
A long-form guide I wrote as part of my work on LightTag. In the guide, I go over the typical lifecycle of a text annotation project as a precursor to NLP and highlight best practices I've learned from my own experience and the experience of my customers.

Introductory Course To NLP

https://github.com/LightTag/NLPCourse
Problem sets and presentations for a two-day course I teach on NLP for practitioners. The course revolves around a single problem, converting German numbers into digits (Fünftausendvierhunderteinunddreißig => 5431), and focuses on pre-processing, regex, and tokenization before introducing sequential NLP models.
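To give a feel for the regex-and-tokenization stage of that problem, here is a minimal sketch in Python. It is not taken from the course materials: the vocabulary, the function name german_to_int, and the restriction to numbers below one million are all assumptions made for this example.

```python
import re

# Values of basic German number words (a small, non-exhaustive vocabulary).
VALUES = {
    "ein": 1, "eins": 1, "zwei": 2, "drei": 3, "vier": 4, "fünf": 5,
    "sechs": 6, "sieben": 7, "acht": 8, "neun": 9, "zehn": 10,
    "elf": 11, "zwölf": 12, "dreizehn": 13, "vierzehn": 14, "fünfzehn": 15,
    "sechzehn": 16, "siebzehn": 17, "achtzehn": 18, "neunzehn": 19,
    "zwanzig": 20, "dreißig": 30, "vierzig": 40, "fünfzig": 50,
    "sechzig": 60, "siebzig": 70, "achtzig": 80, "neunzig": 90,
}
MULTIPLIERS = {"hundert": 100, "tausend": 1000}

# Try longer words first so "dreißig" is matched before "drei".
_WORDS = sorted([*VALUES, *MULTIPLIERS, "und"], key=len, reverse=True)
TOKEN_RE = re.compile("|".join(re.escape(word) for word in _WORDS))


def german_to_int(word):
    """Convert a compound German number word (below one million) to an int."""
    word = word.strip().lower()
    tokens = TOKEN_RE.findall(word)
    # findall silently skips unknown text, so check that nothing was dropped.
    if not tokens or "".join(tokens) != word:
        raise ValueError(f"cannot tokenize {word!r}")
    total, current = 0, 0
    for token in tokens:
        if token == "und":
            continue  # "und" only joins units and tens
        if token == "hundert":
            current = max(current, 1) * 100
        elif token == "tausend":
            total += max(current, 1) * 1000
            current = 0
        else:
            current += VALUES[token]
    return total + current
```

For the example above, the tokenizer splits the word into fünf, tausend, vier, hundert, ein, und, dreißig, and the accumulator then combines them into 5431. Rules like these get brittle as the vocabulary and edge cases grow, which is one motivation for moving on to sequential models.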

RLStocks - Real Time Portfolio Rebalancing with Transaction Costs Solved with Reinforcement Learning

https://github.com/talolard/rlstocks
As a side project on a vacation, I implemented a few reinforcement learning algorithms attempting to optimally structure a portfolio of equities taking transaction costs into account.
After a foray into modern methods, I focused on an older paper (Learning to Trade via Direct Reinforcement by Moody and Saffell, 2001) that offers a much more domain-focused approach to policy gradient algorithms.
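To make "taking transaction costs into account" concrete, here is a minimal sketch in Python of a per-step return net of proportional costs. It is a simplified illustration and not code from the repository: the function name net_returns, the assumption of a single asset with positions in [-1, 1], the flat starting position, and the proportional cost model are all assumptions for this example.

```python
def net_returns(positions, price_returns, cost_rate):
    """Per-step returns of a position sequence, net of proportional costs.

    positions[t] is the position held after the decision at step t.
    price_returns[t] is the asset return realized over step t.
    cost_rate is the cost paid per unit of position change (an assumption).
    """
    if len(positions) != len(price_returns):
        raise ValueError("positions and price_returns must have equal length")
    results = []
    previous = 0.0  # assume the strategy starts with no position
    for position, asset_return in zip(positions, price_returns):
        gross = previous * asset_return  # earned by the position carried in
        cost = cost_rate * abs(position - previous)  # paid for rebalancing
        results.append(gross - cost)
        previous = position
    return results
```

Because the cost term depends on the previous position, the reward at each step depends on the last action as well as the current one, which is why frequent rebalancing can be penalized even when each individual trade looks attractive.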

Languages

Python, SQL, TypeScript 3, JavaScript, TypeScript, C++

Frameworks

Django, Redux

Libraries/APIs

PyTorch, TensorFlow, React, Pandas, Scikit-learn

Paradigms

Data Science

Platforms

MetaTrader, MetaTrader 4, Docker, Amazon Web Services (AWS)

Other

Data Analysis, Data Analytics, Natural Language Processing (NLP), Regular Expressions, Text Mining, Deep Learning, Machine Learning, GPT, Generative Pre-trained Transformers (GPT), Statistics, FIX Protocol, Trading Applications, Forex Trading

Storage

PostgreSQL, Redis, Elasticsearch

Industry Expertise

Trading Systems

Tools

Celery

2009 - 2013

Bachelor of Science Degree in Mathematics

Tel Aviv University - Tel Aviv, Israel

NOVEMBER 2017 - PRESENT

Google Developer Expert (Machine Learning)

Google
