Jeff Carter, Ph.D., Developer in Temecula, CA, United States

Jeff Carter, Ph.D.

Verified Expert in Engineering

Data Engineer and Developer

Location
Temecula, CA, United States
Toptal Member Since
June 24, 2020

Jeff is a full-stack data professional, well-versed in both data science and data engineering. He has a passion for building predictive data models, data flow processes, and custom infrastructure. With over 15 years in the data arena, his experience spans everything from statistical modeling and data visualization to building out real-time data-streaming infrastructures.

Portfolio

Pechanga Resort & Casino
Cloudera, Tableau, Apache Kafka, Kinetica, Kudu, Spark, IBM Db2, Oracle...
Picarro
Amazon Web Services (AWS), Amazon EC2, Spark, Logstash, Elasticsearch, RabbitMQ...

Experience

Availability

Part-time

Preferred Environment

Apache Kafka, Apache Hive, IntelliJ IDEA, Sublime Text, Git, Tableau, CouchDB, ZeroMQ, RabbitMQ, Kinetica, Kudu, Spark, StreamSets, Oracle, PostgreSQL, Microsoft SQL Server, Java, Python, Linux

The most amazing thing I've built

A real-time streaming infrastructure with more than seven data sources, moving over 10 million records daily into multiple destinations.

Work Experience

Data Engineer | Data Scientist

2016 - 2020
Pechanga Resort & Casino
  • Developed real-time streaming data pipelines processing 10 million records daily.
  • Designed and built a data warehouse in Kinetica that tracks all Type 2 slowly changing dimensions (SCD II), consolidating 3TB of data from previously isolated sources.
  • Wrote custom MCMC algorithms to calculate ROI on marketing events in a high-dimensional space, generating over a million dollars of additional annual revenue.
  • Built custom ETL processes that screen millions of daily records for potential money laundering.
  • Performed advanced customer segmentation of 3+ million individuals, using a combination of custom behavioral metrics, traditional RFM (recency, frequency, monetary) metrics, and geolocation data.
Technologies: Cloudera, Tableau, Apache Kafka, Kinetica, Kudu, Spark, IBM Db2, Oracle, Microsoft SQL Server, StreamSets, Java, Python
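For readers unfamiliar with RFM, the traditional part of the segmentation above can be sketched as follows. This is a generic illustration, not the production code. It assumes transactions arrive as hypothetical `(customer_id, visit_date, amount)` tuples and scores each metric by rank into `bins` buckets (quintiles by default). The custom behavioral metrics and geolocation features are not shown.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Aggregate (customer_id, visit_date, amount) tuples into per-customer
    (recency_days, frequency, monetary) values as of a reference date."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        if customer not in last_seen or day > last_seen[customer]:
            last_seen[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def rank_scores(values, bins, higher_is_better=True):
    """Assign each value a 1..bins score by rank (bins=5 gives quintiles)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = rank * bins // len(values) + 1
    return scores


def rfm_scores(table, bins=5):
    """Score recency (fewer days is better), frequency, and monetary value."""
    customers = list(table)
    r = rank_scores([table[c][0] for c in customers], bins, higher_is_better=False)
    f = rank_scores([table[c][1] for c in customers], bins)
    m = rank_scores([table[c][2] for c in customers], bins)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(customers)}


if __name__ == "__main__":
    txns = [
        ("a", date(2020, 1, 1), 100.0), ("a", date(2020, 3, 1), 50.0),
        ("b", date(2020, 2, 15), 500.0),
        ("c", date(2019, 12, 1), 20.0), ("c", date(2020, 1, 10), 20.0),
        ("c", date(2020, 1, 20), 20.0),
    ]
    print(rfm_scores(rfm_table(txns, date(2020, 3, 31)), bins=3))
    # prints {'a': (3, 2, 2), 'b': (2, 1, 3), 'c': (1, 3, 1)}
```

Rank-based binning keeps the buckets evenly populated regardless of how skewed spend is, which is one common reason to prefer it over fixed thresholds.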

Data Scientist

2013 - 2016
Picarro
  • Redesigned a configurable and modular real-time data pipeline framework to process data from several IoT sensors in a unified manner.
  • Developed machine learning algorithms, based on Bayesian statistics, to predict the ROI of taking additional measurements with the Surveyor product.
  • Conducted sensitivity analysis of critical model parameters of a highly non-linear, multi-dimensional algorithm.
  • Built a complete software package that collects real-time streaming data from IoT sensors, visualizes multiple time series, conducts on-the-fly statistical calculations, and allows the user to control and interact with hardware firmware.
Technologies: Amazon Web Services (AWS), Amazon EC2, Spark, Logstash, Elasticsearch, RabbitMQ, ZeroMQ, Microsoft SQL Server, Python
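The sensitivity-analysis bullet above does not say which method was used. One standard starting point is one-at-a-time local sensitivity, sketched below under that assumption. The normalized coefficient S_i = (p_i / f) * df/dp_i is estimated with central finite differences, and `toy_model` is a hypothetical stand-in for the real non-linear algorithm.

```python
import math


def local_sensitivities(model, params, rel_step=1e-4):
    """One-at-a-time normalized sensitivities S_i = (p_i / f) * df/dp_i,
    with each derivative estimated by a central finite difference.
    Assumes the model output at the nominal parameters is nonzero."""
    f0 = model(params)
    result = {}
    for name, value in params.items():
        # Relative step size, with an absolute fallback for zero-valued parameters.
        h = rel_step * abs(value) if value != 0 else rel_step
        up = {**params, name: value + h}
        down = {**params, name: value - h}
        derivative = (model(up) - model(down)) / (2 * h)
        result[name] = derivative * value / f0
    return result


def toy_model(p, x=2.0):
    """Hypothetical non-linear model y = a * exp(b * x)."""
    return p["a"] * math.exp(p["b"] * x)


if __name__ == "__main__":
    # Analytically, S_a = 1 and S_b = b * x = 1.0 at these values.
    print(local_sensitivities(toy_model, {"a": 3.0, "b": 0.5}))
```

Because the result is normalized, it reads as "percent change in output per percent change in parameter", which makes parameters with different units directly comparable. Local derivatives can miss interactions in a strongly non-linear model, so global methods are often used alongside them.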

Postdoctoral Researcher

2011 - 2013
Lawrence Livermore National Laboratory
  • Performed nonlinear regression modeling of multi-dimensional experimental data with custom models.
  • Built a framework to enable physics-based computer simulations of state-of-the-art experiments to better understand experimental results and sources of potential errors.
  • Published experimental data and modeling results in peer-reviewed scientific journals.
Technologies: Python

Research Assistant

2005 - 2011
University of Illinois
  • Automated real-time data collection and on-the-fly regression modeling from multiple sensors.
  • Developed a framework to simulate quantum dynamics resulting from external perturbations.
  • Published experimental results and data models in peer-reviewed scientific journals.
Technologies: Python, Data Analysis
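"On-the-fly regression modeling" usually means updating a fit as each sample arrives rather than refitting from scratch. The minimal sketch below assumes a simple straight-line model. It uses Welford-style running means and co-moments, one standard numerically stable way to do this. The original work's models and sensor interface are not described, so everything here is illustrative.

```python
class OnlineLinearFit:
    """Ordinary least-squares fit of y = intercept + slope * x, updated one
    sample at a time with Welford-style running means and co-moments."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.sxy = 0.0  # running sum of cross-deviations of x and y

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x                     # deviation from the old mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.sxx += dx * (x - self.mean_x)       # uses the updated mean of x
        self.sxy += dx * (y - self.mean_y)       # uses the updated mean of y

    @property
    def slope(self):
        # Requires at least two distinct x values (otherwise sxx is zero).
        return self.sxy / self.sxx

    @property
    def intercept(self):
        return self.mean_y - self.slope * self.mean_x


if __name__ == "__main__":
    fit = OnlineLinearFit()
    for x in range(5):
        fit.update(x, 2 + 3 * x)
    print(fit.slope, fit.intercept)
```

Each update is O(1) in time and memory, which is what makes this approach practical inside a live data-acquisition loop.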

Real-time Data into a Data Lake and Data Warehouse

Real-time data flows from Microsoft SQL Server change data capture (CDC) tables, Oracle LogMiner, and other sources such as message queues into a data lake that mirrors the production data. From there, it is extracted, transformed, and loaded (ETL) into the data warehouse (DWH). The pipelines for this project were primarily implemented in StreamSets. However, the open-source version of StreamSets lacked some of the required functionality, so additional pipeline stages were written in Java and integrated into the StreamSets framework.
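The core step in keeping a data lake mirror in sync with a CDC source is replaying change rows in log order. The sketch below illustrates that logic in plain Python rather than as a StreamSets stage. It assumes change rows shaped like SQL Server CDC output, where `__$operation` is 1 for delete, 2 for insert, 3 for the pre-update image, and 4 for the post-update image. It also assumes a hypothetical single-column primary key named `id` and an in-memory dictionary in place of the real lake table.

```python
# SQL Server CDC __$operation codes.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def apply_changes(mirror, changes, key="id"):
    """Replay CDC change rows, in log order, onto a mirror keyed by primary key."""
    ordered = sorted(
        changes,
        key=lambda r: (r["__$start_lsn"], r["__$seqval"], r["__$operation"]),
    )
    for row in ordered:
        op = row["__$operation"]
        # Drop the CDC metadata columns, keeping only the source table's columns.
        data = {k: v for k, v in row.items() if not k.startswith("__$")}
        if op == DELETE:
            mirror.pop(data[key], None)
        elif op in (INSERT, UPDATE_AFTER):
            mirror[data[key]] = data
        # UPDATE_BEFORE rows hold pre-update values; a mirror does not need them.
    return mirror


if __name__ == "__main__":
    changes = [
        {"__$start_lsn": 1, "__$seqval": 1, "__$operation": 2, "id": 1, "name": "a"},
        {"__$start_lsn": 2, "__$seqval": 1, "__$operation": 2, "id": 2, "name": "b"},
        {"__$start_lsn": 3, "__$seqval": 1, "__$operation": 3, "id": 1, "name": "a"},
        {"__$start_lsn": 3, "__$seqval": 1, "__$operation": 4, "id": 1, "name": "a2"},
        {"__$start_lsn": 4, "__$seqval": 1, "__$operation": 1, "id": 2, "name": "b"},
    ]
    print(apply_changes({}, changes))  # prints {1: {'id': 1, 'name': 'a2'}}
```

In SQL Server the LSN values are `binary(10)`; fetched as Python `bytes`, they still sort correctly, so the ordering logic carries over unchanged.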

Custom Python code automated the build-out of the entire data lake schema by querying each source database and generating the corresponding tables, including the mapping of source data types to data lake types. This infrastructure-as-code approach enables rapid prototyping and rebuilding from scratch with minimal effort.
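A schema generator of the kind described above can be sketched as a function from column metadata to DDL. In this sketch, the input rows are assumed to look like SQL Server `INFORMATION_SCHEMA.COLUMNS` rows (`COLUMN_NAME`, `DATA_TYPE`, `NUMERIC_PRECISION`, `NUMERIC_SCALE`). The `TYPE_MAP` dictionary is a hypothetical Impala-style mapping chosen for illustration, not the project's actual mapping. Target-specific clauses such as storage format or partitioning are left out.

```python
# Hypothetical mapping from SQL Server types to Impala-style column types.
TYPE_MAP = {
    "tinyint": "TINYINT", "smallint": "SMALLINT", "int": "INT", "bigint": "BIGINT",
    "bit": "BOOLEAN", "real": "FLOAT", "float": "DOUBLE",
    "char": "STRING", "nchar": "STRING", "varchar": "STRING", "nvarchar": "STRING",
    "date": "TIMESTAMP", "datetime": "TIMESTAMP", "datetime2": "TIMESTAMP",
}


def column_type(data_type, precision=None, scale=None):
    """Translate one source column type, keeping precision/scale for decimals."""
    t = data_type.lower()
    if t in ("decimal", "numeric"):
        return f"DECIMAL({precision},{scale})"
    return TYPE_MAP.get(t, "STRING")  # fall back to STRING for unmapped types


def create_table_ddl(table, columns, primary_key):
    """Build a CREATE TABLE statement from INFORMATION_SCHEMA-style column rows."""
    lines = [
        f"  {c['COLUMN_NAME']} "
        + column_type(c["DATA_TYPE"], c.get("NUMERIC_PRECISION"), c.get("NUMERIC_SCALE"))
        for c in columns
    ]
    lines.append(f"  PRIMARY KEY ({', '.join(primary_key)})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"


if __name__ == "__main__":
    cols = [
        {"COLUMN_NAME": "id", "DATA_TYPE": "int"},
        {"COLUMN_NAME": "amount", "DATA_TYPE": "decimal",
         "NUMERIC_PRECISION": 18, "NUMERIC_SCALE": 2},
        {"COLUMN_NAME": "name", "DATA_TYPE": "nvarchar"},
    ]
    print(create_table_ddl("lake.customers", cols, ["id"]))
```

Keeping the mapping in one dictionary means a newly encountered source type is a one-line change, after which the whole lake schema can be regenerated.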

The surrogate keys for the DWH are generated from a unique combination of primary keys and source database log IDs. These meaningful surrogate keys not only provide a way to track changes within mutable data but also carry intrinsic, built-in data lineage.
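The sketch below shows one way a "meaningful" surrogate key can carry its own lineage. It assumes a readable, delimiter-joined format of source name, primary key, and log ID. The separator, source names, and log-ID format are all hypothetical, and the project's actual key layout is not specified. For simplicity it handles a single-column primary key.

```python
from typing import NamedTuple

SEP = "|"  # hypothetical separator; must never occur inside a component


class KeyLineage(NamedTuple):
    source: str
    primary_key: str
    log_id: str


def make_surrogate_key(source, primary_key, log_id):
    """Compose a readable surrogate key from source, primary key, and log ID."""
    parts = [str(source), str(primary_key), str(log_id)]
    if any(SEP in p for p in parts):
        raise ValueError("key components must not contain the separator")
    return SEP.join(parts)


def parse_surrogate_key(key):
    """Recover the lineage (where and when the row version came from)."""
    source, primary_key, log_id = key.split(SEP)
    return KeyLineage(source, primary_key, log_id)


if __name__ == "__main__":
    # Two versions of the same source row get distinct keys, one per log ID.
    v1 = make_surrogate_key("crm", 42, "0000002A000001F80003")
    v2 = make_surrogate_key("crm", 42, "0000002A000002100001")
    print(v1, v2, parse_surrogate_key(v2))
```

Because each version of a mutable row gets a distinct key, change history falls out naturally, and parsing any key tells you which source and which log position produced it. Hashing the components would give fixed-length keys but would give up that readability.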

Skills

Languages

Python, SQL, Java

Libraries/APIs

Pandas, ZeroMQ

Paradigms

Functional Programming, ETL, Object-oriented Programming (OOP)

Other

Statistics, Data Processing, Bayesian Inference & Modeling, Data Analysis, StreamSets, Data Visualization, Machine Learning, Streaming Data

Frameworks

Spark

Tools

Git, Kudu, Kinetica, Tableau, RabbitMQ, Sublime Text, IntelliJ IDEA, Cloudera, Logstash

Platforms

Linux, Apache Kafka, Amazon EC2, Amazon Web Services (AWS), Oracle

Storage

NoSQL, Apache Hive, PostgreSQL, Microsoft SQL Server, CouchDB, IBM Db2, Elasticsearch

Education

2005 - 2011

Ph.D. in Chemical Physics

University of Illinois at Urbana-Champaign - Champaign, IL, USA

2001 - 2005

Bachelor of Science Degree in Chemistry

Virginia Tech - Blacksburg, VA, USA
