Tal Perry, Natural Language Processing (NLP) Developer in Berlin, Germany

Member since March 20, 2020
Tal is a Google Developer Expert in machine learning and a former NLP researcher at Citi. He is the founder and CTO of LightTag, a profitable NLP SaaS platform. His experience spans ML, ops, and human-machine interfaces. The solutions he's put in production include language-based compliance monitoring systems, high-frequency trading systems trading hundreds of millions a day, and NLP-based alternative data offerings for competitive intelligence and financial analysis.

Portfolio

  • LightTag
    Amazon Web Services (AWS), Docker, Django, Python, React, TensorFlow
  • Citi
    TensorFlow, Scikit-learn, Python
  • Superfly
    Amazon Web Services (AWS), Elasticsearch, PostgreSQL, Redis, Celery, Django...

Experience

Location

Berlin, Germany

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), SQL, PostgreSQL, Docker, Django, Redux, React, TensorFlow, TypeScript, JavaScript, Python

The most amazing...

...thing I've built is a patented system for analyzing trader behavior based on the behavioral finance literature.

Employment

  • Founder, CEO, CTO

    2018 - 2020
    LightTag
    • Created a SaaS business for NLP annotation with customers including Viasat, Microsoft, and Pitchbook.
    • Deployed a language-agnostic machine learning model that correctly generates 70% of entity annotations on the platform.
    • Built a multi-tenant SaaS supporting thousands of tenants while maintaining strong guarantees on tenant data isolation and low infrastructure expenses.
    • Invented and implemented a patent-pending interface for drag and drop relationship annotation supporting constituency and dependency grammar.
    • Conducted customer interviews and implemented findings to increase conversions, retention rates, and customer delight.
    • Designed and deployed deep NLP models that can adapt to customer data without incurring significant compute costs.
    Technologies: Amazon Web Services (AWS), Docker, Django, Python, React, TensorFlow
  • Data Scientist

    2016 - 2020
    Citi
    • Applied behavioral finance theory to create a patented system for detecting bias in credit trader behavior.
    • Used rule-based and deep learning NLP to create multilingual compliance and CRM solutions for sell-side credit and rates trading.
    • Reduced labor costs and turnaround time for institutional loan origination by developing ML-based document classification, routing, and extraction systems.
    Technologies: TensorFlow, Scikit-learn, Python
  • CTO

    2014 - 2016
    Superfly
    • Grew the engineering team from one person to a cohesive, productive team of 12.
    • Reduced turnaround time on POCs from three weeks on average to less than 48 hours by making core data assets accessible to the business side.
    • Led the technological and product shift of the company from a $0 revenue consumer-facing service to a multi-million dollar alternative data provider.
    • Maintained an acceptable infrastructure cost as we grew our data processing scale 1,000X.
    • Increased return on data annotation costs by developing a "human-friendly" domain-specific language for semi-structured text analytics.
    • Drove data acquisition throughput by deploying a terabyte-scale Elasticsearch cluster and designing a custom interface to find "needles in haystacks."
    Technologies: Amazon Web Services (AWS), Elasticsearch, PostgreSQL, Redis, Celery, Django, Python
  • Research Engineer

    2013 - 2014
    Fluent Trade Technologies
    • Deployed high-frequency algorithmic trading systems capable of trading hundreds of millions in notional volume a day.
    • Implemented ML algorithms with single-digit millisecond latency to maintain an edge in HFT.
    • Contributed to API design, usability testing, and QA as the company expanded into HFT PaaS offerings.
    • Liaised between the research team, engineering, and senior management and helped frame objectives and challenges in an accessible form to each group.
    Technologies: Python, C++
  • Algorithmic Trader

    2011 - 2013
    Self Employed
    • Designed, developed, and deployed a multi-equity long/short algorithmic trading system in C++.
    • Implemented a multi-exchange and multi-threaded order management system.
    • Developed backtesting infrastructure and data warehousing for equities data.
    Technologies: C++

Experience

  • LightTag - Text Annotation SaaS
    http://www.lighttag.io

    A text annotation SaaS I built and run myself. LightTag includes interfaces for annotation, management of a distributed workforce, data quality assurance, and active learning.
    I built LightTag because I needed it and turned it into a profitable business through a combination of ML and UX.

  • YLabel - Serverless, In-Browser Full Text Search and Annotation
    https://github.com/LightTag/ylabel

    YLabel is a POC project demonstrating annotation powered by full-text search. The core of the project is storing an inverted index in the browser's IndexedDB, which allows persistent full-text search without any infrastructure.

  • Dense Continuous Sentences - NLP Variational Autoencoder Using Densenet
    https://github.com/talolard/DenseContinuousSentances

    This project was an attempt to implement the "Generating Sentences from a Continuous Space" paper using Densenet as a variational encoder. Convolutional architectures are very effective with text, and I did this project to see whether they would make variational embeddings easier to train (they don't).

  • Article - Convolutional Methods For Text
    https://medium.com/@TalPerry/convolutional-methods-for-text-d5260fd5675f

    A long-form article aimed at practitioners, discussing the merits and techniques of processing text with convolutional models as opposed to recurrent neural networks. Convolutions offer significant compute advantages because they are easily parallelizable, which shortens research cycles and lowers production compute costs and complexity.

  • Article - How To Label Data
    https://www.lighttag.io/how-to-label-data/

    A long-form guide I wrote as part of my work on LightTag. In the guide, I go over the typical lifecycle of a text annotation project as a precursor to NLP and highlight best practices I've learned from my own experience and the experience of my customers.

  • Introductory Course To NLP
    https://github.com/LightTag/NLPCourse

    Problem sets and presentations for a 2-day course I teach on NLP for practitioners. The course revolves around a single problem, converting German numbers into digits (Fünftausendvierhunderteinunddreißig => 5431), and focuses on pre-processing, regex, and tokenization before introducing sequential NLP models.

  • RLStocks - Real Time Portfolio Rebalancing with Transaction Costs Solved with Reinforcement Learning
    https://github.com/talolard/rlstocks

    As a side project on a vacation, I implemented a few reinforcement learning algorithms attempting to optimally structure a portfolio of equities taking transaction costs into account.
    After a foray into modern methods, I focused on a paper from the early '90s (Learning to Trade via Direct Reinforcement by Moody) that offers a much more domain-focused approach to policy gradient algorithms.
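
The inverted-index idea behind the YLabel project above can be illustrated with a minimal sketch. This is not YLabel's code: it assumes a plain in-memory Python dictionary in place of the browser's IndexedDB, and the function names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Use .get so that a missing token is not added to the index as a side effect.
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical sample documents, made up for this sketch.
documents = {
    1: "Convolutional models for text",
    2: "Recurrent models for sequences",
    3: "Annotating text with full-text search",
}
index = build_index(documents)
matches = search(index, "text models")
```

Because the index maps tokens straight to document ids, a query only touches the posting sets for its own tokens rather than scanning every document. A persistent variant would write the same token-to-ids mapping to a store such as IndexedDB instead of holding it in memory.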

Skills

  • Languages

    Python, SQL, TypeScript 3, JavaScript, TypeScript, C++
  • Frameworks

    Django, Redux
  • Libraries/APIs

    PyTorch, TensorFlow, React, Pandas, Scikit-learn
  • Paradigms

    Data Science
  • Platforms

    MetaTrader, MetaTrader 4, Docker, Amazon Web Services (AWS)
  • Other

    Data Analysis, Data Analytics, Natural Language Processing (NLP), Regular Expressions, Text Mining, Deep Learning, Machine Learning, Statistics, FIX Protocol, Trading Applications, Forex Trading
  • Storage

    PostgreSQL, Redis, Elasticsearch
  • Industry Expertise

    Trading Systems
  • Tools

    Celery

Education

  • Bachelor of Science Degree in Mathematics
    2009 - 2013
    Tel Aviv University - Tel Aviv, Israel

Certifications

  • Google Developer Expert (Machine Learning)
    NOVEMBER 2017 - PRESENT
    Google
