Data Engineer | Data Scientist, 2016 - 2020, Pechanga Resort & Casino
Technologies: Cloudera, Tableau, Apache Kafka, Kinetica, Kudu, Spark, IBM DB2, Oracle, Microsoft SQL Server, Streamsets, Java, Python
- Developed real-time streaming data pipelines processing 10 million records daily.
- Designed and built a data warehouse in Kinetica that tracks all SCD Type II dimensions, with 3 TB of data coming from previously isolated sources.
- Wrote custom MCMC algorithms to calculate ROI on marketing events in a high-dimensional space, generating over a million dollars of additional annual revenue.
- Built a custom ETL process that screens millions of daily records to detect potential money laundering.
- Performed advanced customer segmentation of 3+ million individuals, using a combination of custom behavioral metrics, traditional RFM (recency, frequency, monetary) metrics, and geolocation data.
Data Scientist, 2013 - 2016, Picarro
Technologies: Amazon Web Services (AWS), AWS EC2, Spark, Logstash, Elasticsearch, RabbitMQ, ZeroMQ, Microsoft SQL Server, Python
- Redesigned a configurable, modular real-time data pipeline framework to process data from several IoT sensors in a unified manner.
- Developed machine learning algorithms to predict the ROI of making additional measurements of the Surveyor product, using Bayesian statistics.
- Conducted sensitivity analysis of critical model parameters of a highly non-linear, multi-dimensional algorithm.
- Built a complete software package that collects real-time streaming data from IoT sensors, visualizes multiple time series, conducts on-the-fly statistical calculations, and allows the user to control and interact with hardware firmware.
Postdoctoral Researcher, 2011 - 2013, Lawrence Livermore National Laboratory
- Performed nonlinear regression modeling of multi-dimensional experimental data with custom models.
- Built a framework to enable physics-based computer simulations of state-of-the-art experiments to better understand experimental results and sources of potential errors.
- Published experimental data and modeling results in peer-reviewed scientific journals.
Research Assistant, 2005 - 2011, University of Illinois
Technologies: Python, Data Analysis
- Automated real-time data collection and on-the-fly regression modeling from multiple sensors.
- Developed a framework to simulate quantum dynamics resulting from external perturbations.
- Published experimental results and data models in peer-reviewed scientific journals.