Senior Data Engineer | 2020 - Present | Stout Technologies
Technologies: Apache Hive, Python 3, Unidash, GitHub, Spark
- Managed the Facebook videos pipeline in Python/SQL, maintaining attribute data such as genre, PG rating, and trending status.
- Optimized production SQL queries to improve throughput.
- Developed queries and built dashboards for business-critical video attributes.
Senior Data Engineer | 2018 - 2021 | Walmart Labs
Technologies: Unix, Spark, Apache Hive, MapReduce, Hadoop, SQL, Python, Data Warehouse Design, Data Warehousing, Databases, Kubernetes, Customer Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, BigQuery, Google Cloud Composer, Looker
- Architected, developed, and supported new features in the project’s data flow that calculated cumulative/daily metrics such as converted visitors and first-time buyers on the home and search pages.
- Performed ad-hoc analysis of user behavior on sensor- and beacon-parsed data in Hive.
- Automated the ETL pipeline in Python, generating SQL on the fly against Hive map columns and reducing the 2-3 week development cycle required for each new feature.
- Wrote a Hive UDF to replace R for p-value calculation in the Hive pipeline. Supported existing processes and tools, mentored fellow engineers, and triaged data issues to timely resolution.
- Participated in the effort to migrate on-premises jobs to GCP.
Senior Software Engineer | 2012 - 2018 | eBay
Technologies: Teradata, Presto DB, Apache Hive, Spark, Hadoop, Python, Databases, Data Warehousing, Data Warehouse Design, AWS, Docker, Customer Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, BigQuery, Unix Shell Scripting
- Converted Teradata SQL to Spark SQL for a migration project. Developed regex-based string-processing UDFs for Spark.
- Wrote Pig, Hive, and MapReduce jobs on user-behavior clickstream data. Scheduled Unix scripts via crontab to run analyses such as first-time buyer counts and conversion metrics on listings data.
- Prepared data for predictive and prescriptive modeling.
- Built tools and custom wrapper scripts in Python to automate Hadoop DistCp commands and log processing.
- Developed and supported production ETL jobs comprising both Teradata and Hadoop scripts.
Database Analyst | 2008 - 2012 | PeakPoint Technologies
Technologies: Python, Teradata, SQL, T-SQL, PL/SQL, Databases, Data Warehousing, Data Warehouse Design, Data Engineering, Data Modeling, Data Pipelines, Relational Databases, Dimensional Modeling, DevOps, ETL, Apache Spark
- Performed data modeling and mapping; developed and deployed ETL code. Wrote advanced Teradata SQL.
- Developed extended stored procedures, database links, packages, and parameterized dynamic PL/SQL to migrate schema objects per business requirements.
- Designed a logical data model and translated it into a physical data model.
- Developed and deployed to production automated ETL jobs scheduled with the UC4 tool.