Senior Data Engineer | 2019 - 2021 | Curvo Labs
Technologies: Apache Airflow, Amazon Web Services (AWS), Python 3, Web Scraping, Scala, Spark, Spark SQL, Pandas, Amazon Virtual Private Cloud (VPC), TypeScript, GraphQL, Elasticsearch, NestJS, React, Beautiful Soup, Selenium WebDriver, Jest
- Implemented an Amazon Redshift data warehouse from scratch to serve as the reporting database.
- Designed ETL jobs in AWS Data Pipeline to move data from the production database to Redshift.
- Built data pipelines in Python for data transformation and web scraping with BeautifulSoup and Selenium WebDriver (see the scraping sketch after this list).
- Implemented an Airflow instance from scratch to orchestrate data pipelines and later migrated it to AWS MWAA when the service became available (see the DAG sketch after this list).
- Implemented Amazon QuickSight as the primary reporting tool and designed reports that were later embedded into a web application.
- Built data pipelines in Spark and Scala for distributed data processing and transformation, and deployed them on AWS Glue (see the Glue job sketch after this list).
- Implemented a portion of a web application that embedded Amazon QuickSight reports; the stack used Node.js, React, TypeScript, GraphQL, Ant Design, and Jest for testing. Features included user authentication, data access, and dashboard embedding (see the embedding sketch after this list).
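A minimal sketch of the scraping pattern referenced above; the URL and CSS selectors are hypothetical placeholders, not the actual targets:

```python
# A sketch only: the URL and CSS selectors below are hypothetical
# placeholders, not the actual scraping targets.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

options = Options()
options.add_argument("--headless")  # no visible browser window

driver = webdriver.Chrome(options=options)
try:
    # Selenium renders JavaScript-heavy pages that a plain HTTP client cannot.
    driver.get("https://example.com/products")  # hypothetical URL
    html = driver.page_source
finally:
    driver.quit()

# BeautifulSoup parses the rendered HTML.
soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "name": item.select_one(".name").get_text(strip=True),
        "price": item.select_one(".price").get_text(strip=True),
    }
    for item in soup.select(".product")  # hypothetical selector
]
df = pd.DataFrame(rows)  # hand off to pandas for the transformation step
```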
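A minimal sketch of the kind of Airflow DAG used for orchestration; the DAG name and the extract/load callables are hypothetical:

```python
# A sketch only: the DAG name and callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    ...  # hypothetical: pull from the production database


def load():
    ...  # hypothetical: load transformed data into Redshift


with DAG(
    dag_id="daily_reporting_pipeline",  # hypothetical name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # extract runs before load
```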
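The Glue jobs themselves were written in Scala; this PySpark sketch shows the same shape of job, with hypothetical paths and column names:

```python
# A PySpark sketch of the Scala Glue jobs; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-transform").getOrCreate()

# Read raw extracts from S3, aggregate, and write the result back partitioned.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
daily = (
    orders.withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date", "product_id")
    .agg(F.sum("amount").alias("total_amount"))
)
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_orders/"
)
```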
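The embedding service was built in Node.js/TypeScript; this Python sketch shows the equivalent QuickSight embed-URL call via boto3, with placeholder IDs:

```python
# A sketch only: account and dashboard IDs are placeholders.
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

response = quicksight.get_dashboard_embed_url(
    AwsAccountId="123456789012",         # placeholder
    DashboardId="example-dashboard-id",  # placeholder
    IdentityType="IAM",
    SessionLifetimeInMinutes=60,
)
embed_url = response["EmbedUrl"]  # returned to the React client for an iframe
```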
Senior Data Engineer | 2017 - 2020 | Colorescience
Technologies: Salesforce, MySQL, PostgreSQL, Python, AWS, Data Pipelines
- Designed, developed, and maintained a reporting data warehouse built using PostgreSQL (AWS RDS).
- Built data pipelines to move data from a production system and third-party systems to a centralized data warehouse.
- Connected to third-party APIs to import data incrementally, e.g., Salesforce, Sailthru, and CrowdTwist (see the incremental-pull sketch after this list).
- Managed resources on AWS.
- Created reports and dashboards in Looker to provide insights on the data.
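A sketch of the incremental-pull pattern described above, shown against the Salesforce REST query endpoint; the instance URL, API version, token handling, and watermark storage are placeholders:

```python
# A sketch only: the instance URL, API version, token handling, and
# watermark storage are placeholders.
import requests

BASE = "https://example.my.salesforce.com"  # placeholder instance
TOKEN = "..."  # obtained elsewhere via OAuth


def fetch_accounts_since(last_sync: str) -> list[dict]:
    """Pull only Account records modified after the stored watermark."""
    soql = (
        "SELECT Id, Name, LastModifiedDate FROM Account "
        f"WHERE LastModifiedDate > {last_sync}"  # e.g. 2020-01-01T00:00:00Z
    )
    url = f"{BASE}/services/data/v52.0/query"
    params = {"q": soql}
    records: list[dict] = []
    while url:
        resp = requests.get(
            url, params=params, headers={"Authorization": f"Bearer {TOKEN}"}
        )
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["records"])
        # Salesforce paginates; follow nextRecordsUrl until done is true.
        url = BASE + payload["nextRecordsUrl"] if not payload["done"] else None
        params = None  # the next-page URL already encodes the query
    return records
```

Each run stores the newest LastModifiedDate it saw and passes it in as the next watermark, so only changed records move.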
Senior Data Engineer | 2019 - 2019 | Curvo Labs
Technologies: Web Scraping, AWS Glue, Docker, AWS S3, Redshift, Apache Airflow, Spark ML, Scala, Apache Spark, Python, AWS, Data Pipelines
- Built a new data pipeline framework orchestrated in Airflow, with Airflow itself set up in Docker.
- Wrote new data pipelines in Python and scheduled them in Airflow.
- Performed a variety of on-demand tasks using Apache Spark and Scala and deployed them on AWS Glue.
- Established Redshift as the centralized data warehouse and moved data into Redshift from S3, production systems, and third-party applications (see the COPY sketch after this list).
- Set up Mode to create enterprise reports from the data moved to Redshift.
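A sketch of the S3-to-Redshift load path described above; connection details, schema, bucket, and IAM role are placeholders:

```python
# A sketch only: connection details, schema, bucket, and IAM role are
# placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="warehouse",
    user="etl_user",
    password="...",
)
with conn, conn.cursor() as cur:
    # COPY is Redshift's parallel bulk-load path, far faster than row INSERTs.
    cur.execute(
        """
        COPY analytics.orders
        FROM 's3://example-bucket/extracts/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET;
        """
    )
```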
Senior Business Intelligence Engineer | 2016 - 2018 | Altus Group Limited
Technologies: Reporting, Pentaho, Informatica, Microsoft SQL Server, PostgreSQL, Data Pipelines
- Built a reporting data warehouse using Pentaho, PostgreSQL, and Informatica.
- Designed a database schema in PostgreSQL to represent the reporting use case.
- Created ETL tasks in Informatica to move data from the production systems into PostgreSQL.
- Built reports and dashboards using Pentaho Report Designer and deployed them to the BI server.
Data Engineer | 2014 - 2016 | Wave Accounting, Inc.
Technologies: Pentaho, Ansible, Sqoop, Apache Hive, MySQL, Hadoop, PostgreSQL, Sisense, Microsoft SQL Server, Python, Data Pipelines
- Designed, developed, and maintained big data and business intelligence solutions at Wave.
- Designed and scheduled complex ETL workflows and jobs using Pentaho Data Integration (Kettle) to load data into the data systems.
- Wrote custom Python scripts to access third-party APIs and download data into the data systems.
- Developed complex SQL queries, including joins, subqueries, and common table expressions, to address ad hoc business analytics and other requirements (see the query sketch after this list).
- Coordinated with the product and executive teams to gather and understand business requirements.
- Built an end-to-end relational data warehouse, including infrastructure, schema design, optimization, and administration.
- Designed and developed a Hadoop cluster using Hortonworks HDP 2.0; tasks included installing and configuring the Hadoop ecosystem and designing the HDFS layout.
- Designed and scheduled Sqoop jobs to load data into HDFS from the production systems.
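A sketch of the kind of CTE-plus-join query described above, with hypothetical table and column names, held in a Python string so it can run through any warehouse cursor:

```python
# A sketch only: table and column names are hypothetical.
query = """
WITH monthly_revenue AS (
    SELECT customer_id,
           DATE_TRUNC('month', invoiced_at) AS month,
           SUM(amount) AS revenue
    FROM invoices
    GROUP BY customer_id, DATE_TRUNC('month', invoiced_at)
)
SELECT c.name,
       m.month,
       m.revenue,
       m.revenue / NULLIF(SUM(m.revenue) OVER (PARTITION BY m.month), 0)
           AS share_of_month
FROM monthly_revenue m
JOIN customers c ON c.id = m.customer_id
ORDER BY m.month, m.revenue DESC;
"""
```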
Business Intelligence Developer | 2011 - 2014 | Eyereturn Marketing, Inc.
Technologies: MySQL, Sqoop, Apache Pig, Hadoop, Apache Hive, Pentaho, SSAS, SQL Server Integration Services (SSIS), Microsoft SQL Server
- Designed real-time reporting solutions using SQL Server (SSIS, SSAS, and SSRS) and the Pentaho BI stack (MySQL, Mondrian, and Pentaho).
- Created custom automated/scheduled reports using Eclipse BIRT and Pentaho Report Designer.
- Built custom ETL tasks to transform data for custom reports using Kettle (Pentaho Data Integration).
- Designed and optimized database schemas to make reporting faster and more efficient.
- Created, maintained, and scheduled custom data processors to pull and manipulate data from HDFS using Pig, Sqoop, and Oozie (Cloudera Hadoop).
Database Analyst | 2010 - 2011 | George Brown College
Technologies: Raiser's Edge, Microsoft SQL Server
- Administered the organization's database using Blackbaud's Raiser's Edge.
- Updated and maintained the alumni database using Microsoft SQL Server.
- Conducted data validation and verification to ensure the accuracy and quality of the data.
- Wrote complex queries to generate reports and provide information for divisional and marketing purposes.
- Provided support to the project managers.
Software Engineer | 2007 - 2009 | Tata Consultancy Services
Technologies: Oracle, SQL, HTML, Java
- Provided post-implementation support and training for an enterprise-level banking application (TCS B@ncs) to 25,000+ corporate end users.
- Handled different modules of the banking operations such as routine banking, loans and mortgages, capital markets, and foreign exchange.
- Analyzed client business needs and translated them into functional/operational requirements.
- Communicated with a variety of stakeholders, including subject matter experts, business units, development teams, and support teams, to establish a technical vision.