Data Engineer | Data Scientist
2016 - 2020
Pechanga Resort & Casino
- Developed real-time streaming data pipelines processing 10 million records daily.
- Designed and built a data warehouse in Kinetica that tracks all SCD II (slowly changing, Type II) dimensions, with 3TB of data coming from previously isolated sources.
- Wrote custom MCMC algorithms to calculate ROI on marketing events in a high-dimensional space, generating over a million dollars of additional annual revenue.
- Built a custom ETL process that scans millions of daily records to detect potential money laundering.
- Performed advanced customer segmentation of 3+ million individuals, using a combination of custom behavioral metrics, traditional RFM (recency, frequency, monetary) metrics, and geolocation data.
Technologies: Cloudera, Tableau, Apache Kafka, Kinetica, Kudu, Spark, IBM DB2, Oracle, Microsoft SQL Server, Streamsets, Java, Python

Data Scientist
2013 - 2016
Picarro
- Redesigned a configurable and modular real-time data pipeline framework to process data from several IoT sensors in a unified manner.
- Developed machine learning algorithms to predict the ROI of making additional measurements of the Surveyor product, using Bayesian statistics.
- Conducted sensitivity analysis of critical model parameters of a highly non-linear, multi-dimensional algorithm.
- Built a complete software package that collects real-time streaming data from IoT sensors, visualizes multiple time series, conducts on-the-fly statistical calculations, and allows the user to control and interact with hardware firmware.
Technologies: Amazon Web Services (AWS), AWS EC2, Spark, Logstash, Elasticsearch, RabbitMQ, ZeroMQ, Microsoft SQL Server, Python

Postdoctoral Researcher
2011 - 2013
Lawrence Livermore National Laboratory
- Performed nonlinear regression modeling of multi-dimensional experimental data with custom models.
- Built a framework to enable physics-based computer simulations of state-of-the-art experiments to better understand experimental results and sources of potential errors.
- Published experimental data and modeling results in peer-reviewed scientific journals.
Technologies: Python

Research Assistant
2005 - 2011
University of Illinois
- Automated real-time data collection and on-the-fly regression modeling from multiple sensors.
- Developed a framework to simulate quantum dynamics resulting from external perturbations.
- Published experimental results and data models in peer-reviewed scientific journals.
Technologies: Python, Data Analysis