Abarajithan Arunachalam, Developer in Karaikudi, Tamil Nadu, India

Abarajithan Arunachalam

Verified Expert  in Engineering

BI and Database Developer

Karaikudi, Tamil Nadu, India

Toptal member since March 11, 2021

Bio

Abarajithan is a data engineering expert with over thirteen years of experience in dynamic startups and Fortune 500 companies in the USA, Germany, and India. At Amazon, he built a sub-second data discovery and search platform for the merchant analytics team. He excels in data engineering, ETL/data warehousing, BI, data analytics, and QA, with solid experience in platforms and technologies including AWS, GCP, Hive, EMR, Elasticsearch, NoSQL, MPP, and columnar/relational databases.

Portfolio

Freelance
ETL, Data Engineering, GitHub, Python, Redshift, Databricks, Data Modeling...
Blinkist
AWS Lambda, Redshift, Serverless Framework, Node.js, AWS Kinesis, Terraform...
GoodRx
Redshift, SQL, Data Warehouse, AWS...

Experience

Availability

Part-time

Preferred Environment

Python 3, SQL, Amazon Web Services (AWS), Google Cloud Platform (GCP), Serverless Architecture, Data Warehousing, Data Modeling, Data Engineering, ETL, GitHub

The most amazing...

...product I built was for Amazon, a sub-second data discovery and search platform for the merchant analytics team utilizing Redshift, Elasticsearch, and Kibana.

Work Experience

Freelance Data Engineer

2021 - PRESENT
Freelance
  • Worked on an experimental COVID-19 data engineering and business intelligence project using a serverless architecture with Pub/Sub, Cloud Functions, Cloud Storage, BigQuery, Airflow, and Google Data Studio on Google Cloud Platform (GCP).
  • Collaborated with Toptal clients for data engineering, modeling, SQL data extraction, business intelligence, and data warehouse optimization projects.
  • Implemented social media analytics and a natural language processing (NLP) project using the GCP serverless stack and BigQuery.
  • Conceptualized and built data architecture from scratch in AWS for real-time and batch data processing using Kinesis, Lambda, Glue (PySpark), and Step Function.
Technologies: ETL, Data Engineering, GitHub, Python, Redshift, Databricks, Data Modeling, Cloud Engineering, BigQuery, Database, AWS Kinesis, AWS Lambda, AWS Glue, AWS, PySpark

Senior Data Engineer

2019 - 2020
Blinkist
  • Developed the data infrastructure on Node.js using the Serverless Framework and a serverless architecture.
  • Developed AWS Lambda functions that post events to Amplitude, a third-party behavioral analytics API.
  • Built the AWS Kinesis streaming platform for pushing mobile events into a Redshift data warehouse. Developed ERB-templated Terraform configurations to deploy AWS infrastructure components and set up CircleCI for CI/CD.
Technologies: AWS Lambda, Redshift, Serverless Framework, Node.js, AWS Kinesis, Terraform, CircleCI, Data Engineering, Amplitude, GitHub, Database
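The Kinesis-to-Amplitude flow described above can be sketched in Python (the production version ran on Node.js). This is a minimal, hedged sketch: the event fields and Amplitude payload shape are illustrative assumptions, and the forwarding call is injected so the HTTPS POST to Amplitude's batch endpoint can be swapped out in tests.

```python
import base64
import json


def parse_kinesis_records(event):
    """Decode the base64-encoded Kinesis record payloads into event dicts."""
    events = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        events.append(json.loads(payload))
    return events


def enrich(event):
    """Map a raw mobile event onto an Amplitude-style payload (hypothetical mapping)."""
    return {
        "user_id": event.get("user_id"),
        "event_type": event.get("name"),
        "event_properties": event.get("properties", {}),
    }


def handler(event, context=None, send=None):
    """Lambda entry point: decode, enrich, and forward a batch of events.

    `send` is injected so the forwarding call (in production, an HTTPS POST
    to Amplitude's batch API) can be replaced by a stub when testing.
    """
    batch = [enrich(e) for e in parse_kinesis_records(event)]
    if send is not None and batch:
        send(batch)
    return {"forwarded": len(batch)}
```

The same decoded batch can be fanned out to a second consumer (e.g. a Redshift-bound firehose) without re-reading the stream, which is why decode and enrich are kept as separate steps.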

Senior Data Engineer

2017 - 2018
GoodRx
  • Built complex star schema-based dimensional models to capture the full customer lifecycle for the customer acquisition and retention marketing programs.
  • Monitored the Redshift database and performance-tuned the analysts' SQL queries. Worked on SQL/Python Jinja templates, data pipeline design, orchestration, and implementation in Amazon Redshift for product managers and analysts.
  • Collaborated with analysts for Looker reporting and took a deep dive into the business analysis.
Technologies: Redshift, SQL, Data Warehouse, AWS, Business Intelligence Development, Data Engineering, Looker, Data Modeling, GitHub, Python, Database
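A star schema like the one described pairs a central fact table with conformed dimensions so lifecycle questions become simple joins. The sketch below illustrates the pattern with SQLite; the table and column names are hypothetical stand-ins, not GoodRx's actual model.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to conformed dimensions.
DDL = """
CREATE TABLE dim_customer (
    customer_key     INTEGER PRIMARY KEY,   -- surrogate key
    customer_id      TEXT,                  -- natural key from the source system
    acquired_channel TEXT                   -- acquisition attribute for marketing
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,          -- e.g. 20180101
    full_date TEXT
);
CREATE TABLE fact_fill (
    customer_key INTEGER REFERENCES dim_customer,
    date_key     INTEGER REFERENCES dim_date,
    fill_count   INTEGER,
    revenue_usd  REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO dim_customer VALUES (1, 'C100', 'paid_search')")
conn.execute("INSERT INTO dim_date VALUES (20180101, '2018-01-01')")
conn.execute("INSERT INTO fact_fill VALUES (1, 20180101, 2, 19.98)")

# A typical lifecycle query: revenue by acquisition channel.
row = conn.execute("""
    SELECT c.acquired_channel, SUM(f.revenue_usd)
    FROM fact_fill f JOIN dim_customer c USING (customer_key)
    GROUP BY 1
""").fetchone()
```

Keeping acquisition attributes on the customer dimension (rather than the fact) is what lets retention and acquisition programs slice the same facts without duplicating data.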

Data Engineer II

2015 - 2017
Amazon.com
  • Designed and developed legacy Oracle and Amazon Redshift SQL data pipelines for key reporting needs of product managers. Led and implemented the data/ETL migration effort from Oracle to Amazon Redshift for over 150 TB of data.
  • Implemented a Python-based ETL for data ingestion from Redshift into a sub-second query platform in Elasticsearch and Kibana. Performed a full-scale POC for a next-generation reporting platform, evaluating open source and AWS offerings.
  • Onboarded long-running ETL pipelines onto the Hive/EMR platform. Interacted with business users to solve data problems. Participated in operational activities for the entire suite of ETL and BI systems on a rotational basis.
Technologies: AWS, SQL, Business Intelligence Development, Elasticsearch, Data Warehouse, Amazon EMR, Hadoop, Python, ETL, Data Engineering, Kibana, Oracle Development, Amazon QuickSight, Data Modeling, Database
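The Redshift-to-Elasticsearch ingestion step in a pipeline like this largely reduces to rendering extracted rows as Elasticsearch `_bulk` request bodies (NDJSON: an action line followed by a source line per document). A minimal sketch, with illustrative field names rather than the real Amazon schema:

```python
import json


def to_bulk_ndjson(rows, index):
    """Render rows as an Elasticsearch _bulk request body (NDJSON).

    Each document gets an `index` action line followed by its source
    document. The `id` field used as the document _id is an assumption
    for illustration.
    """
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": row["id"]}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"
```

In a real loader this body would be POSTed to the cluster's `_bulk` endpoint in bounded chunks so a single oversized request can't stall the pipeline.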

Data Engineer

2014 - 2015
Lyft
  • Optimized 12-node Redshift clusters to handle over 60 TB of data and built toolsets around Amazon Redshift.
  • Built comprehensive customer/passenger mileage-tracking data models using GIS data in Redshift to be used by insurance and operations teams for payouts to third-party vendors.
  • Handled operational tasks for Python-based ETL job dependencies and failures, and performed SQL query analysis and review for business analysts and data scientists.
Technologies: Redshift, SQL, Python, AWS, Data Engineering, Business Intelligence Development, ETL, Data Modeling, GitHub, Database
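Mileage-tracking models like these typically reduce ordered GPS pings to summed great-circle segment distances. The sketch below shows that core calculation only; it is a simplified assumption about the approach (the actual models also used GIS data inside Redshift).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MI * asin(sqrt(a))


def trip_mileage(points):
    """Sum segment distances over an ordered list of (lat, lon) pings."""
    return sum(haversine_miles(*a, *b) for a, b in zip(points, points[1:]))
```

Production pipelines usually add map-matching or at least outlier filtering on top of this, since raw pings overstate mileage on noisy GPS traces.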

Senior Data Warehouse Engineer

2013 - 2014
YouSendIt
  • Designed and developed a scalable Splunk ETL using the Splunk Java SDK and Talend with an Amazon Redshift MPP database for processing unstructured web log data.
  • Helped the product development team with Cassandra NoSQL data modeling by developing custom Cassandra ETL, using DataStax Java drivers for partner reporting.
  • Implemented data integration packages using Talend and built reports/visualizations in Tableau for user growth, mobile platform, sales, and other business teams.
Technologies: Java, Redshift, Talend ETL, SQL, NoSQL, Cassandra, Tableau Development, AWS, Data Engineering, Data Warehouse, Business Intelligence Development, SSIS, Data Modeling, Database

Software Engineer | Data Warehouse Developer

2013 - 2013
Guidewire
  • Enhanced a Java-based ETL engine and SQL transform packages.
  • Designed star schema insurance domain data models.
  • Performed unit and functional tests for different product components.
Technologies: Java, SQL, Data Modeling

Software Engineer 2 | EDW and Business Intelligence

2010 - 2013
Citrix
  • Developed the ETL infrastructure using Oracle Data Integrator, Informatica Cloud, and PL/SQL.
  • Built analytical data models and reports using BusinessObjects and Microsoft SSAS for customer insights and marketing quarterly business reviews.
  • Developed a custom Java-based ETL solution to migrate data from over 80 PostgREST audio bridges spread across eight data centers.
Technologies: Oracle Development, Java, SSAS, Data Warehouse, Business Intelligence Development, ETL, Data Engineering, SAP BusinessObjects (BO), Data Modeling, PostgREST, Informatica Cloud, Database

Software Quality Engineer 2

2006 - 2008
Adobe
  • Prepared test cases, completed bug reports and triage, and tested bug fixes.
  • Assisted in the scalable keyword-driven test automation framework design using Borland Silk Test.
  • Oversaw all quality assurance engineering for Adobe Captivate.
Technologies: Test Automation, Manual Software Testing, Test Automation Frameworks

Software Test Engineer

2005 - 2006
Infosys
  • Executed test plans, test scenarios, and test cases based on client requirements.
  • Served as my team's software-testing point of contact with the client.
  • Oversaw the test strategy, which included tracking test execution and metrics.
Technologies: Manual Software Testing, Test Automation, Test Automation Frameworks

COVID-19 Worldwide Analytics

https://github.com/abarajithan-a/gcp-covid19-cloudfunctions
I built a data engineering and business intelligence analytics platform based on worldwide COVID-19 data collated by Johns Hopkins University. It is implemented on Google Cloud Platform using a serverless architecture, with a tech stack that includes Python-based ETL in Cloud Functions, Pub/Sub, Cloud Storage, BigQuery, Google Data Studio, and Dockerized Airflow for pipeline orchestration.

For more information about the COVID-19 project, check out the following links.

For COVID-19 GCP data architecture
https://github.com/abarajithan-a/gcp-covid19-cloudfunctions

For COVID-19 GCP Business Intelligence
https://github.com/abarajithan-a/gcp-covid19-businessintelligence
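The Cloud Functions ETL step in a pipeline like this can be sketched as a pure transform plus a thin Pub/Sub-triggered entry point. The column names below are simplified stand-ins for the real JHU schema (which changed several times), and the BigQuery load is omitted:

```python
import base64
import csv
import io


def transform_jhu_csv(raw_csv):
    """Normalize a (simplified) JHU daily-report CSV into BigQuery-ready rows.

    Column names here are illustrative assumptions; the real files need
    per-date schema handling.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "country": rec["Country_Region"].strip(),
            "confirmed": int(rec["Confirmed"] or 0),
            "deaths": int(rec["Deaths"] or 0),
        })
    return rows


def entrypoint(event, context=None):
    """Pub/Sub-triggered Cloud Function body (sketch): decode and transform.

    Pub/Sub delivers the message body base64-encoded in event["data"].
    The deployed version would stream `rows` into BigQuery via the client
    library; here it just returns them.
    """
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return transform_jhu_csv(raw)
```

Keeping the transform pure makes it unit-testable without any GCP emulator, which is the main reason to separate it from the trigger plumbing.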

Redshift Data Migration Project

At Amazon, I led a time-sensitive, complex data migration effort of over 150 TB from Oracle to AWS EMR and Amazon Redshift. I migrated thousands of ETL jobs and completed the project on deadline without compromising data quality or consistency.

Serverless Streaming Data Pipeline

At Blinkist, I built a serverless streaming data pipeline using AWS Lambda and AWS Kinesis to consume, enrich, and push mobile app events to Amplitude for behavioral analytics and to Redshift for downstream data warehousing.
Education

2008 - 2010

Master's Degree in Management Information Systems

Eller College of Management, The University of Arizona - Tucson, Arizona, United States

2001 - 2005

Bachelor's Degree in Electronics and Communications Engineering

College of Engineering Guindy, Anna University - Chennai, Tamil Nadu, India

Libraries/APIs

PySpark, Node.js, PostgREST

Tools

AWS Glue, GitHub, Amazon EMR, Talend ETL, Tableau Development, BigQuery, SSAS, Terraform, CircleCI, Looker, Kibana, Oracle Development, Amazon QuickSight, Apache Airflow, AWS

Languages

SQL, Python, Java

Paradigms

Business Intelligence Development, Serverless Architecture, ETL, Oracle Development, Test Automation

Storage

Redshift, Elasticsearch, Database, Hadoop, NoSQL, Cassandra, SSIS

Frameworks

Serverless Framework

Platforms

AWS, Cloud Engineering, AWS Lambda, Oracle Development, Docker, Databricks

Other

Data Warehouse, Data Modeling, Google BigQuery, Data Engineering, AWS Kinesis, Manual Software Testing, Test Automation Frameworks, Google Cloud Functions, Google Data Studio, Data Mining, SAP BusinessObjects (BO), Amplitude, Informatica Cloud
