Data Engineer
2022 - 2022
Meta
- Delivered machine learning (ML) training data and analytics data to improve video understanding models and insights.
- Collaborated with project managers and data scientists to build Hive datasets and Unidash dashboards to understand trends in static and video time spent and support long-range infrastructure investments.
- Developed a reusable migration framework and supporting documentation using Python and Configurator to aid the migration of ad training data to privacy-enhanced secure storage.
- Created a training data and metadata pipeline using Spark and Dataswarm to enable training of video understanding models with a potential NE gain of 0.03.
Technologies: Python, Apache Hive, Spark SQL, Presto DB

Lead BI Engineer
2020 - 2022
Unity
- Built a data integration framework using Airflow, Python, and YAML templates to ingest data from Cloud SQL and BigQuery into Snowflake, unifying data and enabling insights across databases and organizations while reducing development time.
- Overhauled legacy internal and external reporting analytics pipelines, making them more efficient and scalable and saving $14,000 per year.
- Developed a revenue forecasting app for Unity Ads using Google Apps Script (JavaScript-based) with CRUD functionality, logging, and traceability of cross-functional input, improving the productivity of the cross-functional forecasting team.
- Built Looker KPI dashboards to monitor account activity, sales quota attainment, and growth metrics.
- Automated manual customer support reports by creating scripts to pull data from several APIs to create reports, saving 5+ hours weekly.
- Led a three-person team to deliver a foundational data warehouse and infrastructure to enable analytics on new products and services.
- Created a development environment using Docker, enabling developers to test locally and increasing their productivity.
- Built a Python-based data validation framework to detect data anomalies and alert on them, leading to proactive resolutions.
- Performed regular on-call duties and code reviews, maintaining and debugging Spark jobs, Airflow pipelines, GCP infrastructure, and Imply dashboards.
- Led technical training sessions for analysts on Git, Looker, Airflow, and ETL best practices.
Technologies: Apache Airflow, Docker, Google BigQuery, Google Cloud Platform (GCP), Data Warehousing, Python, Looker, Snowflake, REST APIs, Google Cloud SQL, Git, Terraform

Business Intelligence Engineer
2017 - 2020
Pratt & Whitney Canada
- Developed and managed ETL scripts to power analytics and business intelligence tools, and implemented processes to ensure data integrity across databases.
- Built a rental-demand forecast algorithm using Python to support informed budget planning decisions of over $10 million on asset investment and maintenance.
- Created a Python-based asset exposure data application that alerts internal users so they can deploy assets proactively, increasing service levels and eliminating exposure.
- Developed and deployed business intelligence tools and monitoring dashboards using Python and Microsoft Power BI to drive business decisions and KPI performance.
- Conducted research to optimize the warehouse location and proposed a solution that increased in-region coverage by 15%.
- Led and managed multiple business strategy and process automation initiatives, saving over 20 hours monthly.
- Performed data analysis and wrangling using Python, Pandas, NumPy, and Microsoft Power BI to analyze the market landscape accurately.
- Used NLP to automate the multi-label classification of comments to identify root causes, saving two hours weekly.
Technologies: Python, SQL, SAP, Visual Basic for Applications (VBA), Microsoft Power BI, Apache Hive