
Abarajithan Arunachalam

Verified Expert in Engineering

BI and Database Developer

Location
Karaikudi, Tamil Nadu, India
Toptal Member Since
March 11, 2021

Abarajithan is a data engineering expert with over thirteen years of experience in dynamic startups and Fortune 500 companies in the USA, Germany, and India. At Amazon, he built a sub-second data discovery and search platform for the merchant analytics team. He excels in data engineering, ETL/data warehousing, BI, data analytics, and QA, with solid experience in various platforms and technologies, including AWS, GCP, Hive, EMR, Elasticsearch, NoSQL, MPP, and columnar/relational databases.

Portfolio

Freelance
ETL, Data Engineering, GitHub, Python 3, Redshift, Databricks, Data Modeling...
Blinkist
AWS Lambda, Redshift, Serverless Framework, Node.js, Amazon Kinesis, Terraform...
GoodRx
Redshift, SQL, Data Warehouse Design, Data Warehousing...

Experience

Availability

Part-time

Preferred Environment

Python 3, SQL, Amazon Web Services (AWS), Google Cloud Platform (GCP), Serverless Architecture, Data Warehousing, Data Modeling, Data Engineering, ETL, GitHub

The most amazing...

...product I built was a sub-second data discovery and search platform for Amazon's merchant analytics team, using Redshift, Elasticsearch, and Kibana.

Work Experience

Freelance Data Engineer

2021 - PRESENT
Freelance
  • Worked on an experimental COVID-19 data engineering and business intelligence project using a serverless architecture with Pub/Sub, Cloud Functions, Cloud Storage, BigQuery, Airflow, and Google Data Studio on Google Cloud Platform (GCP).
  • Collaborated with Toptal clients for data engineering, modeling, SQL data extraction, business intelligence, and data warehouse optimization projects.
  • Implemented social media analytics and a natural language processing (NLP) project using the GCP serverless stack and BigQuery.
  • Conceptualized and built data architecture from scratch in AWS for real-time and batch data processing using Kinesis, Lambda, Glue (PySpark), and Step Functions (the streaming path is sketched below).
Technologies: ETL, Data Engineering, GitHub, Python 3, Redshift, Databricks, Data Modeling, Google Cloud Platform (GCP), BigQuery, Python, Data Pipelines, Amazon Kinesis, AWS Lambda, AWS Glue, AWS Step Functions, PySpark
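
A minimal, hypothetical sketch of the real-time path described above: a Lambda handler drains a Kinesis stream and lands raw events in S3, where Step Functions-orchestrated Glue (PySpark) jobs pick them up for batch processing. The bucket name, key prefix, and event shape are placeholders, not the client's actual setup.

    # Hypothetical sketch: decode Kinesis records in Lambda and stage them in S3.
    import base64
    import json
    import uuid

    import boto3

    s3 = boto3.client("s3")
    LANDING_BUCKET = "example-raw-events"  # placeholder, not the client's bucket


    def handler(event, context):
        """Decode Kinesis records and write them to S3 as newline-delimited JSON."""
        rows = []
        for record in event.get("Records", []):
            payload = base64.b64decode(record["kinesis"]["data"])
            rows.append(json.loads(payload))

        if not rows:
            return {"written": 0}

        key = f"landing/{uuid.uuid4()}.json"
        body = "\n".join(json.dumps(r) for r in rows)
        s3.put_object(Bucket=LANDING_BUCKET, Key=key, Body=body.encode("utf-8"))
        return {"written": len(rows), "key": key}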

Senior Data Engineer

2019 - 2020
Blinkist
  • Developed the data infrastructure on Node.js using the Serverless Framework and a serverless architecture.
  • Developed AWS Lambda functions that post events to Amplitude, a third-party behavioral analytics API (see the sketch below).
  • Built the AWS Kinesis streaming platform for pushing mobile events into the Redshift data warehouse. Developed ERB templates with Terraform to deploy AWS infrastructure components and set up CircleCI for CI/CD.
Technologies: AWS Lambda, Redshift, Serverless Framework, Node.js, Amazon Kinesis, Terraform, CircleCI, Data Engineering, Amplitude, GitHub, Data Pipelines
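
A hedged illustration of the event-forwarding pattern from this role, written in Python for brevity even though the production Lambdas were Node.js. The endpoint and payload shape follow Amplitude's public HTTP API as I understand it; the API key and field mappings are placeholders.

    # Illustrative only: consume mobile events from Kinesis and forward to Amplitude.
    import base64
    import json
    import os
    import urllib.request

    AMPLITUDE_URL = "https://api2.amplitude.com/2/httpapi"  # assumed public endpoint
    API_KEY = os.environ.get("AMPLITUDE_API_KEY", "changeme")  # placeholder


    def handler(event, context):
        """Decode mobile events from Kinesis and forward them to Amplitude."""
        events = []
        for record in event.get("Records", []):
            raw = base64.b64decode(record["kinesis"]["data"])
            evt = json.loads(raw)
            events.append(
                {
                    "user_id": evt.get("user_id"),
                    "event_type": evt.get("event_type", "unknown"),
                    "event_properties": evt.get("properties", {}),
                }
            )

        if not events:
            return {"forwarded": 0}

        payload = json.dumps({"api_key": API_KEY, "events": events}).encode("utf-8")
        req = urllib.request.Request(
            AMPLITUDE_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return {"forwarded": len(events), "status": resp.status}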

Senior Data Engineer

2017 - 2018
GoodRx
  • Built complex star schema-based dimensional models to capture the full customer lifecycle for the customer acquisition and retention marketing programs.
  • Monitored the Redshift database and performance-tuned analysts' SQL queries. Designed, orchestrated, and implemented Jinja-templated SQL/Python data pipelines in Amazon Redshift for product managers and analysts (sketched below).
  • Collaborated with analysts on Looker reporting and dove deep into the business analysis.
Technologies: Redshift, SQL, Data Warehousing, Data Warehouse Design, Amazon Web Services (AWS), Business Intelligence (BI), Data Engineering, Looker, Data Modeling, GitHub, Python 3, Python, Data Pipelines
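
A minimal sketch of the Jinja-templated SQL pattern mentioned above: render a parameterized transform and run it against Redshift. The schemas, table names, and connection string are illustrative, not GoodRx's actual models.

    # Hypothetical daily load step using a Jinja-templated SQL transform.
    import psycopg2
    from jinja2 import Template

    DAILY_LOAD_SQL = Template(
        """
        INSERT INTO {{ target_schema }}.fact_orders
        SELECT o.order_id, o.customer_id, d.date_key, o.amount
        FROM {{ staging_schema }}.orders o
        JOIN {{ target_schema }}.dim_date d ON d.calendar_date = o.order_date
        WHERE o.order_date = '{{ run_date }}'
        """
    )


    def run_daily_load(run_date: str) -> None:
        """Render the templated SQL for one run date and execute it on Redshift."""
        sql = DAILY_LOAD_SQL.render(
            target_schema="analytics", staging_schema="staging", run_date=run_date
        )
        conn = psycopg2.connect("dbname=warehouse host=example-redshift port=5439")
        try:
            with conn, conn.cursor() as cur:
                cur.execute(sql)
        finally:
            conn.close()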

Data Engineer II

2015 - 2017
Amazon.com
  • Designed and developed legacy Oracle and Amazon Redshift SQL data pipelines for the key reporting needs of product managers. Led and implemented the data/ETL migration effort from Oracle to Amazon Redshift for over 150 TB of data.
  • Implemented a Python-based ETL for data ingestion from Redshift into a sub-second query platform on Elasticsearch and Kibana (see the sketch below). Performed a full-scale POC for a next-generation reporting platform, covering open-source and AWS offerings.
  • Onboarded long-running ETL pipelines onto the Hive/EMR platform. Interacted with business users to solve data problems. Participated in operational activities for the entire suite of ETL and BI systems on a rotational basis.
Technologies: Amazon Web Services (AWS), SQL, Business Intelligence (BI), Elasticsearch, Data Warehousing, Data Warehouse Design, Amazon Elastic MapReduce (EMR), Apache Hive, Python 3, ETL, Data Engineering, Kibana, Oracle Business Intelligence Enterprise Edition 11g (OBIEE), Amazon QuickSight, Data Modeling, Data Pipelines
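
An illustrative sketch of the Redshift-to-Elasticsearch ingestion pattern behind the sub-second query platform. Connection strings, the index name, and the merchant_metrics table are assumptions for the example; the actual Amazon implementation differed in scope and schema.

    # Stream rows out of Redshift and bulk-index them into Elasticsearch for Kibana.
    import psycopg2
    from elasticsearch import Elasticsearch, helpers

    REDSHIFT_DSN = "dbname=warehouse host=example-redshift port=5439"  # placeholder
    ES_HOSTS = ["http://example-es:9200"]  # placeholder
    INDEX = "merchant_metrics"  # hypothetical index name


    def ingest(batch_size: int = 5000) -> int:
        """Export rows from Redshift and bulk-load them into Elasticsearch."""
        es = Elasticsearch(ES_HOSTS)
        conn = psycopg2.connect(REDSHIFT_DSN)
        total = 0
        try:
            with conn.cursor(name="export_cursor") as cur:  # server-side cursor
                cur.itersize = batch_size
                cur.execute(
                    "SELECT merchant_id, metric, value, snapshot_date "
                    "FROM reporting.merchant_metrics"
                )
                actions = (
                    {
                        "_index": INDEX,
                        "_source": {
                            "merchant_id": row[0],
                            "metric": row[1],
                            "value": float(row[2]),
                            "snapshot_date": row[3].isoformat(),
                        },
                    }
                    for row in cur
                )
                ok, _ = helpers.bulk(es, actions, chunk_size=batch_size)
                total = ok
        finally:
            conn.close()
        return total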

Data Engineer

2014 - 2015
Lyft
  • Optimized 12-node Redshift clusters to handle over 60 TB of data and built toolsets around Amazon Redshift (a tuning helper is sketched below).
  • Built comprehensive customer/passenger mileage-tracking data models using GIS data in Redshift to be used by insurance and operations teams for payouts to third-party vendors.
  • Handled operational tasks for Python-based ETL job dependencies and failures, and performed SQL query analysis and review for business analysts and data scientists.
Technologies: Redshift, SQL, Python 3, Amazon Web Services (AWS), Data Engineering, Business Intelligence (BI), ETL, Data Modeling, GitHub, Python, Data Pipelines
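
An example of the kind of Redshift toolset referenced above: a small helper that flags tables with a high unsorted percentage or row skew using the svv_table_info system view. The thresholds and connection string are arbitrary placeholders.

    # Flag Redshift tables that likely need VACUUM or a distribution-key review.
    import psycopg2

    HEALTH_SQL = """
    SELECT "schema", "table", tbl_rows, unsorted, skew_rows
    FROM svv_table_info
    WHERE unsorted > %s OR skew_rows > %s
    ORDER BY tbl_rows DESC;
    """


    def report_unhealthy_tables(max_unsorted_pct=20.0, max_skew=4.0):
        """Print tables whose unsorted percentage or slice skew exceeds the thresholds."""
        conn = psycopg2.connect("dbname=warehouse host=example-redshift port=5439")
        try:
            with conn.cursor() as cur:
                cur.execute(HEALTH_SQL, (max_unsorted_pct, max_skew))
                for schema, table, rows, unsorted, skew in cur.fetchall():
                    print(f"{schema}.{table}: rows={rows} unsorted={unsorted}% skew={skew}")
        finally:
            conn.close()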

Senior Data Warehouse Engineer

2013 - 2014
YouSendIt
  • Designed and developed a scalable Splunk ETL using the Splunk Java SDK and Talend with an Amazon Redshift MPP database for processing unstructured web log data.
  • Helped the product development team with Cassandra NoSQL data modeling by developing custom Cassandra ETL, using DataStax Java drivers for partner reporting.
  • Implemented data integration packages using Talend and built reports/visualizations in Tableau for user growth, mobile platform, sales, and other business teams.
Technologies: Java, Redshift, Talend ETL, SQL, NoSQL, Cassandra, Tableau, Amazon Web Services (AWS), Data Engineering, Data Warehouse Design, Data Warehousing, Business Intelligence (BI), SQL Server Integration Services (SSIS), Data Modeling, Data Pipelines

Software Engineer | Data Warehouse Developer

2013 - 2013
Guidewire
  • Enhanced a Java-based ETL engine and SQL transform packages.
  • Designed star schema insurance domain data models.
  • Performed unit and functional tests for different product components.
Technologies: Java, SQL, Data Modeling

Software Engineer 2 | EDW and Business Intelligence

2010 - 2013
Citrix
  • Developed the ETL infrastructure using Oracle Data Integrator, Informatica Cloud, and PL/SQL.
  • Built analytical data models and reports using BusinessObjects and Microsoft SSAS for customer insights and marketing quarterly business reviews.
  • Developed a custom Java-based ETL solution to migrate data from over 80 PostgreSQL-backed audio bridges spread across eight data centers.
Technologies: Oracle, Java, SSAS, Oracle ODI, Data Warehouse Design, Data Warehousing, Business Intelligence (BI), ETL, Data Engineering, SAP BusinessObjects (BO), Data Modeling, PostgREST, Informatica Cloud, Data Pipelines

Software Quality Engineer 2

2006 - 2008
Adobe
  • Prepared test cases, completed bug reports and triage, and tested bug fixes.
  • Assisted in designing a scalable keyword-driven test automation framework using Borland Silk Test.
  • Oversaw all quality assurance engineering for Adobe Captivate.
Technologies: Test Automation, Manual Software Testing, Test Automation Frameworks

Software Test Engineer

2005 - 2006
Infosys
  • Executed test plans, test scenarios, and test cases based on client requirements.
  • Served as my team's software-testing point of contact with the client.
  • Oversaw the test strategy, which included tracking test execution and metrics.
Technologies: Manual Software Testing, Test Automation, Test Automation Frameworks

COVID-19 Worldwide Analytics

https://github.com/abarajithan-a/gcp-covid19-cloudfunctions
I built a data engineering and business intelligence analytics platform based on worldwide COVID-19 data collated by Johns Hopkins University. It is implemented on Google Cloud Platform using a serverless architecture, with a tech stack that includes Python-based ETL in Cloud Functions, Pub/Sub, Cloud Storage, BigQuery, Google Data Studio, and Dockerized Airflow for pipeline orchestration (a condensed sketch of the core Cloud Function follows the links below).

For more information about the COVID-19 project, check out the following links.

For COVID-19 GCP data architecture
https://github.com/abarajithan-a/gcp-covid19-cloudfunctions

For COVID-19 GCP Business Intelligence
https://github.com/abarajithan-a/gcp-covid19-businessintelligence
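
For orientation, here is a condensed, illustrative version of the Pub/Sub-triggered Cloud Function at the core of the pipeline: fetch the daily CSV, stage it in Cloud Storage, and load it into BigQuery. The source URL, bucket, and table names are placeholders; the repositories above contain the actual implementation.

    # Simplified, illustrative Cloud Function: stage the daily CSV, then load to BigQuery.
    import datetime

    import requests
    from google.cloud import bigquery, storage

    SOURCE_URL = "https://example.com/jhu/daily.csv"  # placeholder for the JHU feed
    BUCKET = "example-covid19-raw"                    # placeholder bucket
    TABLE = "covid19.daily_cases"                     # placeholder dataset.table


    def ingest_daily(event, context):
        """Pub/Sub entry point: stage the daily CSV in GCS, then load it into BigQuery."""
        day = datetime.date.today().isoformat()

        # 1. Download the daily extract and stage it in Cloud Storage.
        csv_bytes = requests.get(SOURCE_URL, timeout=60).content
        blob_name = f"raw/{day}.csv"
        storage.Client().bucket(BUCKET).blob(blob_name).upload_from_string(
            csv_bytes, content_type="text/csv"
        )

        # 2. Load the staged file into BigQuery, letting it infer the schema.
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = bigquery.Client().load_table_from_uri(
            f"gs://{BUCKET}/{blob_name}", TABLE, job_config=job_config
        )
        load_job.result()  # wait for the load job to finish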

Redshift Data Migration Project

At Amazon, I led a time-sensitive, complex data migration effort of over 150 TB from Oracle to AWS EMR and Amazon Redshift. I migrated thousands of ETL jobs and completed the project on deadline without compromising data quality or consistency.
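
The load side of a migration like this typically relies on Redshift's parallel COPY from S3 once data has been exported from the source system. The sketch below shows that general pattern with placeholder paths, an example IAM role, and a generic table name; it is not Amazon's internal tooling.

    # Hedged sketch: bulk-load one table's exported S3 prefix into Redshift via COPY.
    import psycopg2

    COPY_SQL = """
    COPY {table}
    FROM 's3://example-migration-bucket/{prefix}/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy'
    FORMAT AS CSV
    GZIP
    TIMEFORMAT 'auto';
    """


    def bulk_load(table: str, prefix: str) -> None:
        """Run a single COPY for one table's exported files."""
        conn = psycopg2.connect("dbname=warehouse host=example-redshift port=5439")
        try:
            with conn, conn.cursor() as cur:
                cur.execute(COPY_SQL.format(table=table, prefix=prefix))
        finally:
            conn.close()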

Serverless Streaming Data Pipeline

At Blinkist, I built a serverless streaming data pipeline using AWS Lambda and AWS Kinesis to consume, enrich, and push mobile app events to Amplitude for behavioral analytics and to Redshift for downstream data warehousing.

Education

2008 - 2010

Master's Degree in Management Information Systems

Eller College of Management, The University of Arizona - Tucson, Arizona, United States

2001 - 2005

Bachelor's Degree in Electronics and Communications Engineering

College of Engineering Guindy, Anna University - Chennai, Tamil Nadu, India

Skills

Libraries/APIs

PySpark, Node.js, PostgREST

Tools

AWS Glue, GitHub, Amazon Elastic MapReduce (EMR), Talend ETL, Tableau, BigQuery, SSAS, Terraform, CircleCI, Looker, Kibana, Oracle Business Intelligence Enterprise Edition 11g (OBIEE), Amazon QuickSight, Apache Airflow, AWS Step Functions

Languages

SQL, Python, Python 3, Java

Storage

Redshift, Elasticsearch, Data Pipelines, Apache Hive, NoSQL, Cassandra, SQL Server Integration Services (SSIS)

Paradigms

Business Intelligence (BI), Serverless Architecture, ETL, Oracle ODI, Test Automation

Platforms

Amazon Web Services (AWS), Google Cloud Platform (GCP), AWS Lambda, Oracle, Docker, Databricks

Frameworks

Serverless Framework

Other

Data Warehousing, Data Modeling, Google BigQuery, Data Warehouse Design, Data Engineering, Amazon Kinesis, Manual Software Testing, Test Automation Frameworks, Google Cloud Functions, Google Data Studio, Data Mining, SAP BusinessObjects (BO), Amplitude, Informatica Cloud
