Senior Data Engineer
2020 - PRESENT, Stout Technologies
- Managed the Facebook videos pipeline, which carries attribute data such as genre, PG rating, and trending status, using Python and SQL.
- Optimized production SQL for throughput.
- Developed queries and built dashboards for business-critical video attributes (a representative query is sketched after this entry).
Technologies: Apache Hive, Python 3, Unidash, GitHub, Spark
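A minimal sketch of the kind of attribute query behind those dashboards, assuming a hypothetical video_attributes Hive table and PyHive for connectivity; the table, columns, and host are illustrative placeholders, not the production setup:

```python
# Hypothetical example: count trending videos by genre and rating for a dashboard.
# Table name, column names, and host are illustrative placeholders.
from pyhive import hive

ATTRIBUTE_QUERY = """
    SELECT genre,
           pg_rating,
           COUNT(*) AS trending_videos
    FROM   video_attributes
    WHERE  is_trending = TRUE
      AND  ds = '{ds}'
    GROUP  BY genre, pg_rating
"""

def fetch_trending_by_genre(ds, host="hive.example.com"):
    """Run the attribute query for one partition date and return the rows."""
    conn = hive.connect(host=host, port=10000)
    try:
        cursor = conn.cursor()
        cursor.execute(ATTRIBUTE_QUERY.format(ds=ds))
        return cursor.fetchall()
    finally:
        conn.close()
```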
Senior Data Engineer
2018 - 2021, Walmart Labs
- Architected, developed, and supported new features in the project’s data flow that calculated cumulative and daily metrics, such as converted visitors and first-time buyers, on the home and search pages.
- Performed ad-hoc analysis of user behavior on sensor and beacon data parsed into Hive.
- Automated the existing ETL pipeline with Python code that builds SQL on the fly against Hive map columns (sketched after this entry), cutting the 2-3 week development cycle for each new feature.
- Wrote a Hive UDF to replace the use of R for p-value calculation in the Hive pipeline. Supported existing processes and tools, mentored fellow engineers, and triaged data issues for timely resolution.
- Participated in the effort to migrate on-premises jobs to GCP.
Technologies: Unix, Spark, Apache Hive, MapReduce, Hadoop, SQL, Python, Data Warehouse Design, Data Warehousing, Databases, Kubernetes, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, Apache Spark, BigQuery, Google Cloud Composer, Looker
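A minimal sketch of the SQL-on-the-fly idea behind that automation, assuming hypothetical metric definitions and table names; the real pipeline, schema, and metric logic are not shown here:

```python
# Hypothetical sketch: generate HiveQL on the fly that packs configured metrics
# into a single Hive map column. Metric names and tables are illustrative only.
METRICS = {
    "converted_visitors": "SUM(IF(converted = 1, 1, 0))",
    "first_time_buyers":  "SUM(IF(is_first_purchase = 1, 1, 0))",
}

def build_daily_metrics_sql(source_table, target_table, ds):
    """Assemble an INSERT that writes every configured metric into one map column.

    Adding a new metric means adding one entry to METRICS -- no hand-written SQL.
    """
    map_entries = ",\n               ".join(
        f"'{name}', CAST({expr} AS STRING)" for name, expr in METRICS.items()
    )
    return f"""
        INSERT OVERWRITE TABLE {target_table} PARTITION (ds = '{ds}')
        SELECT page,
               MAP({map_entries}) AS daily_metrics
        FROM   {source_table}
        WHERE  ds = '{ds}'
        GROUP  BY page
    """

print(build_daily_metrics_sql("page_events", "daily_page_metrics", "2021-01-01"))
```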
Senior Software Engineer
2012 - 2018, eBay
- Converted Teradata SQL to Spark SQL for a migration project. Developed regex-based string-processing UDFs for Spark (see the first sketch after this entry).
- Wrote Pig, Hive, and MapReduce jobs on user-behavior clickstream data. Automated analyses, such as first-time buyer counts and conversion metrics on listings data, with Unix scripts scheduled through crontab.
- Prepared data for predictive and prescriptive modeling.
- Built tools and custom wrapper scripts in Python to automate Hadoop DistCp commands and log processing (see the second sketch after this entry).
- Developed ETL jobs, deployed them to production, and supported them; the jobs entailed both Teradata and Hadoop scripts.
Technologies: Teradata, Presto DB, Apache Hive, Spark, Hadoop, Python, Databases, Data Warehousing, Data Warehouse Design, AWS, Docker, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, Apache Spark, BigQuery, Unix Shell Scripting
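First sketch: one way a regex string-processing UDF could be registered for Spark SQL so migrated queries can call it the way a Teradata function would. The pattern, table, column names, and sample rows are assumptions for illustration only:

```python
# Hypothetical sketch of a regex UDF registered for Spark SQL during a
# Teradata-to-Spark migration. Pattern, table, and columns are illustrative.
import re
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("regex-udf-example").getOrCreate()

ITEM_ID_PATTERN = re.compile(r"item[=/](\d+)")

def extract_item_id(url):
    """Pull the listing item id out of a clickstream URL, or return None."""
    if url is None:
        return None
    match = ITEM_ID_PATTERN.search(url)
    return match.group(1) if match else None

# Register the function so Spark SQL can call it like a built-in.
spark.udf.register("extract_item_id", extract_item_id, StringType())

# Tiny illustrative dataset standing in for clickstream logs.
rows = [("https://www.example.com/item=12345",),
        ("https://www.example.com/item=12345",),
        ("https://www.example.com/help",)]
spark.createDataFrame(rows, ["page_url"]).createOrReplaceTempView("clickstream")

spark.sql("""
    SELECT extract_item_id(page_url) AS item_id, COUNT(*) AS views
    FROM   clickstream
    GROUP  BY extract_item_id(page_url)
""").show()
```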
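Second sketch: one plausible shape for a Python wrapper that automates hadoop distcp runs and captures logs. Cluster URIs, paths, mapper count, and the log file are placeholders, not the actual tooling:

```python
# Hypothetical sketch of a DistCp wrapper with basic log capture.
# Source/destination URIs and the log file name are illustrative placeholders.
import logging
import subprocess

logging.basicConfig(filename="distcp_runs.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def run_distcp(src, dest, num_mappers=20):
    """Copy a directory between clusters with DistCp and log the outcome."""
    cmd = ["hadoop", "distcp", "-m", str(num_mappers), "-update", src, dest]
    logging.info("Starting DistCp: %s", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        logging.error("DistCp failed (%s): %s", result.returncode, result.stderr.strip())
    else:
        logging.info("DistCp finished: %s", result.stdout.strip())
    return result.returncode

if __name__ == "__main__":
    run_distcp("hdfs://source-cluster/data/listings",
               "hdfs://backup-cluster/data/listings")
```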
Database Analyst
2008 - 2012, PeakPoint Technologies
- Performed data modeling and mapping, developed and deployed ETL code, and wrote advanced Teradata SQL.
- Developed extended stored procedures, DB links, packages, and parameterized dynamic PL/SQL to migrate schema objects per business requirements.
- Designed a logical data model and implemented it as a physical data model.
- Developed automated ETL jobs scheduled in the UC4 tool and placed them into production.
Technologies: Python, Teradata, SQL, T-SQL, PL/SQL, Databases, Data Warehousing, Data Warehouse Design, Data, Data Engineering, Data Modeling, Data Pipelines, Relational Databases, Dimensional Modeling, DevOps, ETL, Apache Spark