Abarajithan Arunachalam
Verified Expert in Engineering
BI and Database Developer
Karaikudi, Tamil Nadu, India
Toptal member since March 11, 2021
Abarajithan is a data engineering expert with over thirteen years of experience in dynamic startups and Fortune 500 companies in the USA, Germany, and India. At Amazon, he built a sub-second data discovery and search platform for the merchant analytics team. He excels in data engineering, ETL/data warehousing, BI, data analytics, and QA, with solid experience in various platforms and technologies, including AWS, GCP, Hive, EMR, Elasticsearch, NoSQL, MPP, and columnar/relational databases.
Preferred Environment
Python 3, SQL, Amazon Web Services (AWS), Google Cloud Platform (GCP), Serverless Architecture, Data Warehousing, Data Modeling, Data Engineering, ETL, GitHub
The most amazing...
...product I built was for Amazon, a sub-second data discovery and search platform for the merchant analytics team utilizing Redshift, Elasticsearch, and Kibana.
Work Experience
Freelance Data Engineer
Freelance
- Worked on an experimental COVID-19 data engineering and business intelligence project using a serverless architecture built on Pub/Sub, Cloud Functions, Cloud Storage, BigQuery, Airflow, and Google Data Studio on Google Cloud Platform (GCP).
- Collaborated with Toptal clients for data engineering, modeling, SQL data extraction, business intelligence, and data warehouse optimization projects.
- Implemented social media analytics and a natural language processing (NLP) project using the GCP serverless stack and BigQuery.
- Conceptualized and built a data architecture from scratch in AWS for real-time and batch data processing using Kinesis, Lambda, Glue (PySpark), and Step Functions.
Senior Data Engineer
Blinkist
- Developed the data infrastructure using a serverless framework and architecture on Node.js.
- Developed AWS Lambda functions for posting events to third-party API Amplitude for behavioral analytics.
- Built the AWS Kinesis streaming platform for pushing mobile events into a Redshift data warehouse. Developed ERB templates with Terraform to deploy AWS infrastructure components and used CircleCI for CI/CD.
Senior Data Engineer
GoodRx
- Built complex star schema-based dimensional models to capture the full customer lifecycle for the customer acquisition and retention marketing programs.
- Monitored the Redshift database and performance-tuned the analysts' SQL queries. Worked on SQL/Python Jinja templates, data pipeline design, orchestration, and implementation in Amazon Redshift for product managers and analysts.
- Collaborated with analysts for Looker reporting and took a deep dive into the business analysis.
Data Engineer II
Amazon.com
- Designed and developed legacy Oracle and Amazon Redshift SQL data pipelines for key reporting needs of product managers. Led and implemented the data/ETL migration effort from Oracle to Amazon Redshift for over 150 TB of data.
- Implemented a Python-based ETL for data ingestion from Redshift into a sub-second query platform in Elasticsearch and Kibana. Performed a full-scale POC for a next-generation reporting platform, including open source and AWS offerings.
- Onboarded long-running ETL pipelines onto the Hive/EMR platform. Interacted with business users to solve data problems. Participated in operational activities for the entire suite of ETL and BI systems on a rotational basis.
Data Engineer
Lyft
- Optimized 12-node Redshift clusters to handle over 60 TB of data and built toolsets around Amazon Redshift.
- Built comprehensive customer/passenger mileage-tracking data models using GIS data in Redshift to be used by insurance and operations teams for payouts to third-party vendors.
- Handled operational tasks for Python-based ETL jobs, including job dependencies and job failures, and performed SQL query analysis/review for business analysts and data scientists.
Senior Data Warehouse Engineer
YouSendIt
- Designed and developed a scalable Splunk ETL using the Splunk Java SDK and Talend with an Amazon Redshift MPP database for processing unstructured web log data.
- Helped the product development team with Cassandra NoSQL data modeling by developing custom Cassandra ETL, using DataStax Java drivers for partner reporting.
- Implemented data integration packages using Talend and built reports/visualizations in Tableau for user growth, mobile platform, sales, and other business teams.
Software Engineer | Data Warehouse Developer
Guidewire
- Enhanced a Java-based ETL engine and SQL transform packages.
- Designed star schema insurance domain data models.
- Performed unit and functional tests for different product components.
Software Engineer 2 | EDW and Business Intelligence
Citrix
- Developed the ETL infrastructure using Oracle Data Integrator, Informatica Cloud, and PL/SQL.
- Built analytical data models and reports using BusinessObjects and Microsoft SSAS for customer insights and marketing quarterly business reviews.
- Developed a custom Java-based ETL solution to migrate data from over 80 PostgREST audio bridges spread across eight data centers.
Software Quality Engineer 2
Adobe
- Prepared test cases, completed bug reports/triage, and tested bug fixes.
- Assisted in the scalable keyword-driven test automation framework design using Borland Silk Test.
- Oversaw all of the quality assurance engineering for the product Adobe Captivate.
Software Test Engineer
Infosys
- Executed test plans, test scenarios, and test cases based on client requirements.
- Served as my team's software-testing point of contact with the client.
- Oversaw the test strategy, which included tracking the test execution and metrics.
Experience
COVID-19 Worldwide Analytics
https://github.com/abarajithan-a/gcp-covid19-cloudfunctions
For more information about the COVID-19 project, check out the following links.
For COVID-19 GCP data architecture
https://github.com/abarajithan-a/gcp-covid19-cloudfunctions
For COVID-19 GCP Business Intelligence
https://github.com/abarajithan-a/gcp-covid19-businessintelligence
Redshift Data Migration Project
Serverless Streaming Data Pipeline
Education
Master's Degree in Management Information Systems
Eller College of Management, The University of Arizona - Tucson, Arizona, United States
Bachelor's Degree in Electronics and Communications Engineering
College of Engineering Guindy, Anna University - Chennai, Tamil Nadu, India
Skills
Libraries/APIs
PySpark, Node.js, PostgREST
Tools
AWS Glue, GitHub, Amazon EMR, Talend ETL, Tableau Development, BigQuery, SSAS, Terraform, CircleCI, Looker, Kibana, Oracle Development, Amazon QuickSight, Apache Airflow, AWS
Languages
SQL, Python, Java
Paradigms
Business Intelligence Development, Serverless Architecture, ETL, Oracle Development, Test Automation
Storage
Redshift, Elasticsearch, Database, Hadoop, NoSQL, Cassandra, SSIS
Frameworks
Serverless Framework
Platforms
AWS, Cloud Engineering, AWS Lambda, Oracle Development, Docker, Databricks
Other
Data Warehouse, Data Modeling, Google BigQuery, Data Engineering, AWS Kinesis, Manual Software Testing, Test Automation Frameworks, Google Cloud Functions, Google Data Studio, Data Mining, SAP BusinessObjects (BO), Amplitude, Informatica Cloud