Senior Data Engineer | 2018 - Present | Walmart Labs
Technologies: Unix, Spark, Apache Hive, MapReduce, Hadoop, SQL, Python, Data Warehouse Design, Data Warehousing, Databases, Kubernetes, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping
- Architected, developed, and supported new data-flow features that calculated cumulative and daily metrics, such as converted visitors and first-time buyers, for the home and search pages.
- Queried parsed sensor and beacon data in Hive for ad-hoc analysis of user behavior.
- Automated the existing ETL pipeline by using Python to generate SQL dynamically that populates Hive map columns, shortening the 2-3 week development cycle previously required for each new feature.
- Wrote a Hive UDF to calculate p-values within the Hive pipeline, replacing the previous R-based calculation.
- Supported existing processes and tools, mentored fellow engineers, and triaged data issues for timely resolution.
- Participated in the effort to migrate on-premises jobs to Google Cloud Platform (GCP).
Senior Software Engineer | 2012 - 2020 | eBay
Technologies: Teradata, Presto DB, Apache Hive, Spark, Hadoop, Python, Databases, Data Warehouse Design, Data Warehousing, AWS, Docker, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping
- Converted Teradata SQL to Spark SQL for a migration project and developed regex-based string-processing UDFs for Spark.
- Wrote Pig, Hive, and MapReduce jobs on user-behavior clickstream data. Scheduled Unix scripts with cron to run analyses such as first-time buyer counts and conversion metrics on listings data.
- Prepared data for predictive and prescriptive modeling.
- Built Python tools and custom wrapper scripts to automate Hadoop DistCp commands and log processing.
- Developed and supported production ETL jobs built from both Teradata and Hadoop scripts.
Database Analyst | 2008 - 2012 | PeakPoint Technologies
Technologies: Python, Teradata, SQL, T-SQL, PL/SQL, Databases, Data Warehousing, Data Warehouse Design, Data, Data Engineering, Data Modeling, Data Pipelines
- Performed data modeling and data mapping, and developed and deployed ETL code. Wrote advanced Teradata SQL.
- Developed extended stored procedures, database links, packages, and parameterized dynamic PL/SQL to migrate schema objects according to business requirements.
- Designed a logical data model and implemented it as a physical data model.
- Developed automated ETL jobs, scheduled them in UC4, and deployed them to production.