Data Engineer
2020 - PRESENT
Apple
- Owned two projects end-to-end as a data engineer; both involved collecting data from third-party cloud vendors.
- Developed scheduled ETLs in Python and Spark that collected data from various APIs and loaded it into AWS S3 and PostgreSQL databases. The ETLs were deployed to Airflow and Kubernetes.
- Built a number of APIs that exposed data from the data warehouse to its consumers.
- Created and modified ETLs based on AWS Glue. Created a serverless ETL based on Amazon SQS and Lambda.
Technologies: Python 3, Python API, Amazon EKS, Docker, Kubernetes, Amazon S3 (AWS S3), Amazon Simple Queue Service (SQS), AWS EMR, Redshift, PostgreSQL
Data Engineer
2019 - 2020
BJ's Wholesale Club
- Developed an ETL pipeline based on PySpark running on AWS EMR for the extraction of data from Redshift to S3.
- Contributed to a product recommendation engine based on Spark machine learning.
- Developed a data quality assessment tool in PySpark.
- Owned cloud cost reporting. Managed EMR cluster creation and termination using the AWS CLI and AWS console.
- Fully automated the ETL/marketing pipeline in Jenkins.
- Contributed to the algorithm for identifying new prospective members based on third-party data.
Technologies: Jenkins, AWS CLI, Amazon S3 (AWS S3), Redshift, Python 3, Spark, AWS EMR
Senior Database Marketing Analyst
2017 - 2018
eBay
- Developed targeting scripts for flagship marketing campaigns with an emphasis on email, mobile push notification, social, and on-site channels. The campaigns often targeted over 50 million users and sometimes resulted in over $100,000 in iGMB annually.
- Designed, developed, implemented, and maintained multi-armed bandit algorithms written in Python while adhering to marketing standards and processes within eBay. The algorithm was measured to generate $5 million annually.
- Trained an algorithm for send-time optimization, which resulted in a 15% increase in click-through rate in the campaigns where it was implemented.
- Assessed existing email, social, and mobile marketing campaigns in terms of KPIs such as iGMB, OR, and CTR.
- Created dashboards in Tableau that reported on the performance of the marketing algorithms I created.
- Created scripts that moved data between Hive and Teradata servers.
- Worked with the largest Teradata DWH in the world and often queried tables with 100+ billion rows.
- Communicated with stakeholders across multiple time zones.
Technologies: SQL, TensorFlow, Scikit-learn, Tableau, PySpark, Apache Hive, Python, Teradata
Machine Learning SW Developer
2016 - 2017
Valeo
- Developed and trained a machine vision algorithm for recognition of pedestrians in front of a vehicle. The algorithm has since been implemented in a number of vehicle models, including the GM 2019 Chevy.
- Trained an algorithm for detection of dirt on the camera lens. This algorithm played a crucial role in supporting other, more complex self-driving functionalities.
- Assessed the quality of unstructured annotated video data used for algorithm training.
- Created a script for synchronization of both structured and unstructured data between the multiple teams that participated in the project.
- Attended computer science conferences and studied scientific literature to keep up to date with new trends in machine learning and computer science. Exchanged knowledge with other team members.
- Communicated and networked with teammates and stakeholders from France and Ireland.
Technologies: Protocol Buffers, Intel TBB, C++, OpenCV, SQL, MATLAB, Python
Credit Risk Analyst
2014 - 2015
Erste Group
- Calculated the risk parameters CCF, LGD, and PD according to Basel II.
- Reduced the overall reserve requirements of Erste Bank subsidiaries by over 7% by introducing improvements to the statistical engine that calculates the risk parameters CCF, LGD, and PD.
- Designed and trained a mathematical model in SAS for prediction of the overall loss in the event of a client default. This helped Erste improve the repossession process and reduce expenses.
- Performed ad hoc stress tests for Erste subsidiaries. The results were later submitted directly to the European National Bank.
- Assessed risk portfolio stability via bootstrapping and Monte Carlo methods.
- Created interactive dashboards for risk parameter reporting in MS SQL and Excel.
- Developed a data quality testing system.
Technologies: Microsoft Excel, MATLAB, Microsoft SQL Server, SAS
Teaching and Research Assistant
2012 - 2014
University of Rochester
- Led lab lectures for undergraduate students.
- Developed software for automation of experiments and analyzed data produced by the experiments.
- Wrote several scientific papers that are available online.
Technologies: MATLAB