Jeff Carter, Ph.D.
Verified Expert in Engineering
Data Engineer and Developer
Jeff is a full-stack data professional, well-versed in both data science and data engineering. He has a passion for building predictive data models, data flow processes, and custom infrastructures. With over 15 years in the data arena, his experience ranges from statistical modeling and data visualization to building real-time data-streaming infrastructures.
Preferred Environment
Apache Kafka, Apache Hive, IntelliJ IDEA, Sublime Text, Git, Tableau, CouchDB, ZeroMQ, RabbitMQ, Kinetica, Kudu, Spark, StreamSets, Oracle, PostgreSQL, Microsoft SQL Server, Java, Python, Linux
The most amazing...
...thing that I've built is a real-time streaming infrastructure with more than seven data sources, moving 10+ million records daily into multiple destinations.
Work Experience
Data Engineer | Data Scientist
Pechanga Resort & Casino
- Developed real-time streaming data pipelines processing 10 million records daily.
- Designed and built a data warehouse in Kinetica that tracks all SDCII dimensions, with 3TB of data coming from previously isolated sources.
- Wrote custom MCMC algorithms to calculate ROI on marketing events in a high-dimensional space, generating over a million dollars of additional annual revenue.
- Built custom ETL to process millions of daily records detecting potential money laundering.
- Performed advanced customer segmentation of 3+ million individuals, using a combination of custom behavioral metrics, traditional RFM (recency, frequency, monetary) metrics, and geolocation data.
Data Scientist
Picarro
- Redesigned a configurable and modular real-time data pipeline framework to process several IoT sensors in a unified manner.
- Developed machine learning algorithms to predict the ROI of making additional measurements of the Surveyor product, using Bayesian statistics.
- Conducted sensitivity analysis of critical model parameters of a highly non-linear, multi-dimensional algorithm.
- Built a complete software package that collects real-time streaming data from IoT sensors, visualizes multiple time series, conducts on-the-fly statistical calculations, and allows the user to control and interact with hardware firmware.
Postdoctoral Researcher
Lawrence Livermore National Laboratory
- Performed nonlinear regression modeling of multi-dimensional experimental data with custom models.
- Built a framework to enable physics-based computer simulations of state-of-the-art experiments to better understand experimental results and sources of potential errors.
- Published experimental data and modeling results in peer-reviewed scientific journals.
Research Assistant
University of Illinois
- Automated real-time data collection and on-the-fly regression modeling from multiple sensors.
- Developed a framework to simulate quantum dynamics resulting from external perturbations.
- Published experimental results and data models in peer-reviewed scientific journals.
Experience
Real-time Data into a Data Lake and Data Warehouse
Custom Python code automated the build-out of the entire data lake schema by querying each source database and generating the appropriate tables, including the mapping of data types. This infrastructure-as-code approach enables rapid prototyping and rebuilding from scratch with minimal effort.
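The schema build-out described above might look roughly like the following minimal sketch. It is not the project's actual code: it uses an in-memory SQLite database as a stand-in for a source database, and the `TYPE_MAP` entries, the `lake` schema name, the `customers` table, and all function names are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical mapping from source column types to data lake column types.
TYPE_MAP = {
    "INTEGER": "BIGINT",
    "TEXT": "STRING",
    "REAL": "DOUBLE",
}


def list_tables(conn):
    """Return the names of all tables in the source database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def build_ddl(conn, table, target_schema="lake"):
    """Generate a CREATE TABLE statement for one source table."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    col_defs = []
    for _cid, name, src_type, _notnull, _default, _pk in columns:
        # Fall back to STRING for any source type not in the mapping.
        target_type = TYPE_MAP.get(src_type.upper(), "STRING")
        col_defs.append(f"  {name} {target_type}")
    body = ",\n".join(col_defs)
    return f"CREATE TABLE {target_schema}.{table} (\n{body}\n);"


def build_all_ddl(conn, target_schema="lake"):
    """Generate DDL for every table found in the source database."""
    return {t: build_ddl(conn, t, target_schema) for t in list_tables(conn)}


# Stand-in source database with one illustrative table.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, balance REAL)"
)
ddl = build_all_ddl(source)
print(ddl["customers"])
```

Because the DDL is derived entirely from the source catalog, rerunning the script after a source schema change regenerates the matching tables without hand-editing.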
The surrogate keys for the DWH are generated from a unique combination of primary keys and source database log IDs. These meaningful surrogate keys provide not only a way to track changes within mutable data but also intrinsic, built-in data lineage.
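One way such a key could be composed is sketched below. The exact key format used in the project is not stated, so the `|` separator, the inclusion of a source database name, and the names `make_surrogate_key` and `parse_surrogate_key` are all assumptions made for illustration.

```python
from typing import NamedTuple

SEPARATOR = "|"  # Assumed delimiter; must not occur in any component.


class SurrogateKey(NamedTuple):
    source_db: str
    primary_key: str
    log_id: int


def make_surrogate_key(source_db, primary_key_values, log_id):
    """Combine the source name, primary key value(s), and log ID into one key."""
    pk = "-".join(str(value) for value in primary_key_values)
    return SEPARATOR.join([source_db, pk, str(log_id)])


def parse_surrogate_key(key):
    """Recover the lineage components from a surrogate key."""
    source_db, pk, log_id = key.split(SEPARATOR)
    return SurrogateKey(source_db, pk, int(log_id))


# Two versions of the same mutable source row, captured at different log IDs.
v1 = make_surrogate_key("crm", [42], 1001)
v2 = make_surrogate_key("crm", [42], 1057)

print(v1, v2)
print(parse_surrogate_key(v2))
```

Each change to a source row yields a distinct key because the log ID differs, while the shared primary-key portion ties the versions together and the parsed components point back to where the row came from.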
Skills
Languages
Python, SQL, Java
Libraries/APIs
Pandas, ZeroMQ
Paradigms
Functional Programming, ETL, Object-oriented Programming (OOP)
Other
Statistics, Data Processing, Bayesian Inference & Modeling, Data Analysis, StreamSets, Data Visualization, Machine Learning, Streaming Data
Frameworks
Spark
Tools
Git, Kudu, Kinetica, Tableau, RabbitMQ, Sublime Text, IntelliJ IDEA, Cloudera, Logstash
Platforms
Linux, Apache Kafka, Amazon EC2, Amazon Web Services (AWS), Oracle
Storage
NoSQL, Apache Hive, PostgreSQL, Microsoft SQL Server, CouchDB, IBM Db2, Elasticsearch
Education
Ph.D. in Chemical Physics
University of Illinois at Urbana-Champaign - Champaign, IL, USA
Bachelor of Science Degree in Chemistry
Virginia Tech - Blacksburg, VA, USA