Harish Chander Ramesh, Developer in Dubai, United Arab Emirates

Verified Expert in Engineering

Bio

Harish is a data engineer who has been consuming, engineering, analyzing, exploring, testing, and visualizing data, personally and professionally, for the last ten years. His passion for data has led him to work with multiple Fortune 50 organizations, including Amazon and Verizon. Harish loves challenges and believes he learns and delivers best when out of his comfort zone.

Portfolio

United Talent Agency - Main
Data Engineering, Azure, Snowflake, Spark, Hadoop, Azure Machine Learning...
MH Alshaya
Apache Airflow, Apache Spark, Google Cloud Platform (GCP), Google Analytics...
Verizon Media
Apache Airflow, Apache Spark, Python, Tableau, ELK (Elastic Stack), Datadog...

Experience

Availability

Part-time

Preferred Environment

Google Cloud Platform (GCP), Tableau, Microsoft Power BI, SQL, ETL, Business Intelligence (BI), Data Visualization, Amazon Web Services (AWS), Google BigQuery, Azure SQL Databases, Data Engineering, AWS Data Pipeline Service, Data Management, Collibra, Informatica Cloud, Informatica ETL, Informatica, Oracle, JavaScript, Data Architecture, Excel 365, CSV File Processing, Excel VBA, Data Extraction, MySQL, Real-time Data

The most amazing...

...data platform I've built from scratch is for a video conferencing app; it had no downtime despite a 600% usage increase during the pandemic.

Work Experience

Data Engineer and Architect

2023 - 2024
United Talent Agency - Main
  • Designed and implemented a visualization tool for monitoring queries across all environments, enabling the early identification and resolution of potential issues, which improved system reliability by 30% and optimized query performance by 25%.
  • Created an automated service that effectively detects and resolves data quality issues throughout the development stages, leading to a 50% decrease in incidents and ensuring high data integrity and trustworthiness in the data lake project.
  • Established a robust testing platform that identified reliability issues during the pre-production stages, enhancing the overall system stability and reducing downtime by 20% before full-scale deployment.
  • Led a team of data engineers in identifying and addressing infrastructure gaps through the development of automated solutions, which streamlined operations and increased the team's productivity by 35%.
  • Contributed significantly to the design, development, and maintenance of existing data warehousing and data lake projects.
  • Developed and deployed a comprehensive framework for the data engineering team, significantly enhancing feature impact analysis and ensuring thorough testing before deployment, resulting in a 40% reduction in customer disruptions due to releases.
  • Architected and executed a scalable data lake solution in Azure, integrating Snowflake, DBT, and Spark to support advanced analytics and machine learning projects, which increased data accessibility by 50% and reduced data processing time by 40%.
  • Pioneered the use of machine learning tools and frameworks to automate data quality checks and anomaly detection, reducing manual data verification efforts by 70% and improving data accuracy for downstream analytics and ML model training (an anomaly-detection sketch follows this entry).
  • Implemented a CI/CD pipeline for seamless integration and delivery of data engineering and ML projects, which accelerated deployment cycles by 50% and fostered a culture of continuous improvement and innovation within the data engineering team.
Technologies: Data Engineering, Azure, Snowflake, Spark, Hadoop, Azure Machine Learning, Data Architecture, Data Build Tool (dbt), Python, Orchestration, Data Processing, DevOps, Infrastructure as Code (IaC), Query Optimization, English, Data Cleaning, Cloud Dataflow, Metabase, GitHub Actions, REST APIs, Amazon Athena, Streamlit
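
The ML-driven data quality checks and anomaly detection mentioned above aren't detailed in the source; as one illustration, here is a minimal sketch of statistical anomaly detection on a table's daily row counts, assuming pandas and hypothetical numbers:

```python
import pandas as pd

def flag_row_count_anomalies(daily_counts: pd.Series,
                             z_threshold: float = 3.0) -> pd.Series:
    """Flag days whose row count deviates from the trailing baseline
    by more than z_threshold standard deviations."""
    # Baseline excludes the current day so an anomaly can't mask itself.
    baseline = daily_counts.shift(1).rolling(window=30, min_periods=3)
    z_scores = (daily_counts - baseline.mean()) / baseline.std()
    return z_scores.abs() > z_threshold

# Hypothetical daily row counts for one table; the fourth day drops sharply.
counts = pd.Series(
    [10_200, 10_450, 10_310, 2_100, 10_390],
    index=pd.date_range("2024-01-01", periods=5),
)
print(flag_row_count_anomalies(counts, z_threshold=2.0))  # flags day four
```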

Data Engineer Manager

2021 - 2022
MH Alshaya
  • Developed the organization's first data warehouse from scratch, incorporating product analytics at scale using various GCP services.
  • Developed the Golden Customer Record in real time, extending the loyalty program across 119 brands in 19 countries.
  • Developed and maintained an in-house data quality framework with the entire business team, using Great Expectations at scale; it also powered near-real-time fraud analytics across 50+ brands (a validation sketch follows this entry).
  • Led a team of six data engineers, the organization's first, and established a data-driven culture within the team.
Technologies: Apache Airflow, Apache Spark, Google Cloud Platform (GCP), Google Analytics, Tableau, ETL, Dashboards, Data Visualization, Amazon EC2, Amazon RDS, Databases, Redshift, Apache Flink, Amazon S3 (AWS S3), Data Pipelines, Spark, Apache Kafka, Data Warehouse Design, Data Lake Design, Big Data Architecture, Data Warehousing, Data Lakes, Cloud Native, Data Engineering, Google BigQuery, Data Modeling, Analytics, Google Cloud, Data Analysis, Data Analytics, Data Science, Terraform, Data Governance, Azure, PostgreSQL, Cloud Platforms, Looker, Parquet, BigQuery, Database Schema Design, Data Management, Azure Synapse, Collibra, Informatica Cloud, Informatica ETL, Informatica, Ads, User Interface (UI), Excel 2016, Data Architecture, Data Quality, Great Expectations Cloud, AWS Glue, Oracle Cloud, Excel 365, Office 365, CSV File Processing, MongoDB, ETL Implementation & Design, Data Migration, Finance, Mobile Analytics, Firebase, Data Extraction, Amazon Web Services (AWS), ELT, Database Architecture, Database Performance, Database Development, AWS Lambda, Docker, Microservices, Technical Architecture, ETL Tools, Monitoring, Cloud, Databricks, GitHub, NoSQL, Git, Pub/Sub, Warehouses, Machine Learning, BI Reporting, Amazon Aurora, Amazon CloudWatch, Web Analytics, ClickStream, Real-time Data, Kubernetes, Orchestration, Stitch Data, Data Processing, DevOps, Infrastructure as Code (IaC), Azure Kubernetes Service (AKS), Query Optimization, English, Data Cleaning, Cloud Dataflow, Metabase, GitHub Actions, DocumentDB, APIs, Matillion ETL for Redshift, REST APIs, Amazon Athena, Streamlit
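
As an illustration of the Great Expectations framework mentioned above, here is a minimal sketch that validates a batch of loyalty transactions before loading, using the classic pandas-backed API (pre-1.0 Great Expectations); the column names and value sets are hypothetical:

```python
import great_expectations as ge
import pandas as pd

# Hypothetical batch of loyalty transactions pulled from the warehouse.
batch = ge.from_pandas(pd.DataFrame({
    "customer_id": ["C1", "C2", "C3"],
    "brand_code": ["SB", "HM", "SB"],
    "amount": [42.0, 13.5, 7.25],
}))

# Declarative checks; each returns a validation result with a success flag.
checks = [
    batch.expect_column_values_to_not_be_null("customer_id"),
    batch.expect_column_values_to_be_between("amount", min_value=0),
    batch.expect_column_values_to_be_in_set("brand_code", ["SB", "HM"]),
]

# In a pipeline, a failed expectation would halt the load or raise an alert.
assert all(check.success for check in checks)
```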

Lead Data Engineer

2019 - 2021
Verizon Media
  • Developed the company's first streaming analytics platform for media stats from video conferencing solutions, using Apache Spark and Storm on AWS-managed services.
  • Built an autoscaling data pipeline that rode out the COVID-19 pandemic without incident, despite a 600% increase in daily usage volume as clients' teams moved to remote work.
  • Tested and implemented Apache Hudi in its early stages of development, bringing ACID transaction capability to historical data (an upsert sketch follows this entry).
  • Led a team of seven data engineers, including three seniors, two juniors, and one intern, and created opportunities to consult with large clients worldwide on technical solutions and architecture.
  • Migrated a live 2.2 PB legacy PostgreSQL database to Snowflake, with dbt in the process, in five days. Designed, implemented, and validated the migration on the fly with an error-reporting framework, keeping errors to 0.3%.
Technologies: Apache Airflow, Apache Spark, Python, Tableau, ELK (Elastic Stack), Datadog, Kafka Streams, ETL, Dashboards, Data Visualization, Amazon EC2, Amazon RDS, Databases, Redshift, Storm, Apache Flink, Amazon S3 (AWS S3), Data Pipelines, Amazon Web Services (AWS), Spark, Big Data, Apache Kafka, Data Warehouse Design, Data Lake Design, Spark Streaming, Big Data Architecture, Data Warehousing, PySpark, Data Lakes, Cloud Native, Data Engineering, Google BigQuery, Data Modeling, Looker, Analytics, Google Cloud, Data Analysis, Snowflake, Data Analytics, Data Governance, Azure, PostgreSQL, pgAdmin, Data Build Tool (dbt), Cloud Platforms, Parquet, BigQuery, AWS Data Pipeline Service, Django, Database Schema Design, Data Management, Azure Synapse, Collibra, Informatica Cloud, Informatica ETL, Informatica, Amazon QuickSight, Ads, User Interface (UI), Excel 2016, JavaScript, Data Architecture, Data Quality, Great Expectations Cloud, AWS Glue, Oracle Cloud, Excel 365, Office 365, CSV File Processing, MongoDB, ETL Implementation & Design, Microsoft SQL Server, Data Migration, Finance, Mobile Analytics, Firebase, Data Extraction, MySQL, ELT, Database Architecture, Database Performance, Database Development, AWS Lambda, AWS CloudFormation, Docker, Technical Architecture, ETL Tools, Monitoring, Cloud, Databricks, Delta Lake, GitHub, NoSQL, Linux, Git, Apache Beam, Pub/Sub, Warehouses, Machine Learning, BI Reporting, Amazon Aurora, Amazon CloudWatch, Web Analytics, Google Analytics, ClickStream, Social Media Web Traffic, Kubernetes, Orchestration, Data Processing, DevOps, Infrastructure as Code (IaC), Query Optimization, English, Data Cleaning, Cloud Dataflow, GitHub Actions, DocumentDB, REST APIs, Amazon Athena, Streamlit
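
The Hudi work above brought upsert and ACID semantics to historical data; here is a minimal PySpark sketch of a Hudi upsert, with the table name, keys, and storage path assumed for illustration (the Hudi Spark bundle must be on the classpath):

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hudi-upsert-sketch")
         .config("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())

# Hypothetical batch of corrected media-stats records.
updates = spark.createDataFrame(
    [("evt-1", "2021-03-01", 1614556800, 240)],
    ["event_id", "event_date", "ts", "duration_sec"],
)

hudi_options = {
    "hoodie.table.name": "media_stats",                     # assumed name
    "hoodie.datasource.write.recordkey.field": "event_id",  # dedup key
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.precombine.field": "ts",       # latest wins
    "hoodie.datasource.write.operation": "upsert",
}

# Upsert: existing records with the same key are rewritten atomically.
(updates.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://example-bucket/hudi/media_stats"))          # assumed path
```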

Data Engineer

2016 - 2018
Amazon
  • Contributed to the world's largest eCommerce platform, covering 16 marketplaces across the globe in different time zones, as part of the retail business team that handled worldwide retail data management and pipelines.
  • Handled high-pressure environments and tight deadlines while working alongside some of the best minds in the country and the world, and initiated a data engineering forum within the organization for cross-pollination of ideas.
  • Built real-time pipelines streaming data from different platforms into the Amazon data warehouse with a 2-minute service-level agreement (SLA), using Spark, Flink, and Tableau (a streaming sketch follows this entry).
  • Created a 360-degree dashboard with perspectives on Amazon's customers across different Amazon services. Made public on a forum, it gained massive popularity for how easy it made the data to understand.
Technologies: Apache Airflow, Apache Spark, Tableau, ETL, Dashboards, Data Visualization, Amazon EC2, Databases, Redshift, Storm, Apache Flink, Amazon S3 (AWS S3), Data Pipelines, Amazon Web Services (AWS), Spark, Big Data, Apache Kafka, Data Warehouse Design, Data Lake Design, Spark Streaming, Big Data Architecture, Data Warehousing, PySpark, Data Lakes, Cloud Native, Data Engineering, Google BigQuery, Data Modeling, Looker, Data Analysis, Data Analytics, Cloud Platforms, BigQuery, Azure SQL Databases, AWS Data Pipeline Service, Django, Data Management, Amazon QuickSight, Ads, Oracle, Data Architecture, Data Quality, Great Expectations Cloud, AWS Glue, Excel 365, Office 365, CSV File Processing, MongoDB, ETL Implementation & Design, Amazon Elastic Container Service (ECS), Microsoft SQL Server, Data Migration, Mobile Analytics, Firebase, Data Extraction, MySQL, ELT, Hadoop, Database Performance, Database Development, AWS Lambda, AWS CloudFormation, Technical Architecture, ETL Tools, Monitoring, Cloud, Databricks, Delta Lake, GitHub, NoSQL, Linux, Git, Apache Beam, Pub/Sub, Amazon EMR Studio, Warehouses, BI Reporting, Amazon Aurora, Amazon CloudWatch, Google Analytics, ClickStream, Social Media Web Traffic, Real-time Data, Orchestration, Data Processing, DevOps, Infrastructure as Code (IaC), Query Optimization, English, Data Cleaning, GitHub Actions, DocumentDB, REST APIs, Amazon Athena
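
A 2-minute SLA implies micro-batch latency well under the deadline; here is a minimal Spark Structured Streaming sketch of such a pipeline, with the broker, topic, schema, and sink paths all assumed (requires the spark-sql-kafka package):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("retail-stream-sketch").getOrCreate()

# Hypothetical event schema for retail catalog updates.
schema = StructType([
    StructField("sku", StringType()),
    StructField("marketplace", StringType()),
    StructField("updated_at", TimestampType()),
])

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "retail-updates")             # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# A 30-second trigger keeps each record well inside the 2-minute SLA.
query = (events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/retail/updates/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/retail/")
    .trigger(processingTime="30 seconds")
    .start())
```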

Data Engineer

2013 - 2016
NTT Data
  • Developed, tested, and deployed end-to-end real-time and batch ETL pipelines for a healthcare provider.
  • Documented every line of code and every change to the existing product from a business standpoint.
  • Learned new technologies with an open-minded approach and grew into a technology-agnostic developer.
  • Delivered two major data warehouse projects that cut data storage costs by 23% and maintenance costs by 26.5%.
Technologies: Abinitio, SQL, Teradata, Amazon RDS, Amazon EC2, Databases, Amazon S3 (AWS S3), Data Pipelines, Amazon Web Services (AWS), Big Data, Data Warehousing, PySpark, Data Engineering, Data Analysis, Snowflake, Microsoft Access, Cloud Platforms, BigQuery, Azure SQL Databases, AWS Data Pipeline Service, Data Management, Azure Synapse, Informatica Cloud, Informatica ETL, Informatica, Amazon QuickSight, Oracle, Excel 2016, Data Architecture, Data Quality, Oracle Cloud, Excel 365, Office 365, CSV File Processing, ETL Implementation & Design, Microsoft SQL Server, Data Migration, Data Extraction, ELT, Hadoop, Database Development, AWS Lambda, ETL Tools, Cloud, Databricks, Delta Lake, GitHub, NoSQL, Linux, Git, Apache Beam, Pub/Sub, Warehouses, BI Reporting, Amazon CloudWatch, Social Media Web Traffic, Orchestration, Data Processing, Query Optimization, English, Data Cleaning, GitHub Actions, REST APIs, Amazon Athena

Competitive Price Monitoring System for eCommerce Business

This data framework scrapes multiple eCommerce websites according to their super-competitiveness, an index that categorizes competitors by product category and determines how often each competitor's site is scraped (one to three times a day). The scraper writes its output to a data warehouse, where prices are compared at the product-to-product level in real time to generate a price competitiveness index (PCI), which measures whether the business's products are priced competitively against the super-important and important competitors (an assumed PCI computation is sketched below).
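
The source does not give the PCI formula; one plausible reading, sketched below in pandas, is the weighted share of products priced at or below the matched competitor price, with weights taken from the competitiveness tiers (all names, weights, and numbers are assumptions):

```python
import pandas as pd

# Placeholder weights reflecting the competitor tiers described above.
TIER_WEIGHTS = {"super_important": 2.0, "important": 1.0}

# Hypothetical product-level price matches from the warehouse.
matches = pd.DataFrame({
    "product_id":       ["P1", "P2", "P3", "P4"],
    "our_price":        [9.99, 24.50, 5.00, 12.00],
    "competitor_price": [10.49, 22.00, 5.00, 13.25],
    "tier":             ["super_important", "important",
                         "super_important", "important"],
})

def price_competitiveness_index(df: pd.DataFrame) -> float:
    """Weighted share of products priced at or below the competitor."""
    weights = df["tier"].map(TIER_WEIGHTS)
    competitive = (df["our_price"] <= df["competitor_price"]).astype(float)
    return (competitive * weights).sum() / weights.sum()

print(f"PCI: {price_competitiveness_index(matches):.2%}")  # PCI: 83.33%
```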

Real-time Pipelines for Fraud Alerting

This project was for a video conferencing application whose meeting IDs were prone to hijacking. The software was not yet mature enough to identify fraudulent additions to meetings, so I built a data layer that catches and reports a fraudulent meeting ID in under three seconds. It was implemented with a largely open-source stack: Kafka, MemSQL (now SingleStore), Storm, and Python, with Looker as the BI solution (a consumer sketch follows).
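
The stack described above suggests a Kafka consumer feeding a fast rule layer; here is a minimal sketch using kafka-python, with the topic, broker, and join-rate heuristic all assumed for illustration:

```python
import json
import time
from collections import defaultdict, deque

from kafka import KafkaConsumer  # pip install kafka-python

WINDOW_SEC = 3       # alert must land within the 3-second budget
JOIN_THRESHOLD = 20  # assumed heuristic: joins per window that look abusive

consumer = KafkaConsumer(
    "meeting-join-events",            # assumed topic
    bootstrap_servers="broker:9092",  # assumed broker
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

recent_joins: dict[str, deque] = defaultdict(deque)

for message in consumer:
    meeting_id = message.value["meeting_id"]
    now = time.time()
    joins = recent_joins[meeting_id]
    joins.append(now)
    # Drop join events that fell out of the sliding window.
    while joins and now - joins[0] > WINDOW_SEC:
        joins.popleft()
    if len(joins) > JOIN_THRESHOLD:
        # In the described stack, an alert row could land in MemSQL
        # (SingleStore) for Looker to surface.
        print(f"ALERT: possible hijack of meeting {meeting_id}")
```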

Driver's Incentives Framework

A real-time computational platform that calculates delivery drivers' target-versus-actual numbers, rewards them with instant bonuses, and encourages them to exceed their targets. It was built for a ride-hailing company where drivers' targets were not reported to them daily or intraday. A Grafana dashboard embedded in the drivers' mobile app keeps them aware of their performance, the incentives they have earned, and the targets achieved or still to be met (a bonus-calculation sketch follows).
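
The incentive logic is not specified in the source; here is a minimal sketch of a target-versus-actual bonus calculation, with tiers and bonus amounts as placeholder assumptions:

```python
from dataclasses import dataclass

# Placeholder tiers: fraction of target achieved -> instant bonus amount.
BONUS_TIERS = [(1.25, 150.0), (1.10, 75.0), (1.00, 40.0)]

@dataclass
class DriverDay:
    driver_id: str
    target_deliveries: int
    actual_deliveries: int

    @property
    def attainment(self) -> float:
        return self.actual_deliveries / self.target_deliveries

    def instant_bonus(self) -> float:
        """Return the bonus for the highest tier reached, or zero."""
        for threshold, bonus in BONUS_TIERS:
            if self.attainment >= threshold:
                return bonus
        return 0.0

# Hypothetical intraday snapshot, as it might feed the Grafana dashboard.
day = DriverDay("DRV-042", target_deliveries=20, actual_deliveries=23)
print(f"{day.attainment:.0%} of target -> bonus {day.instant_bonus():.2f}")
# prints "115% of target -> bonus 75.00"
```
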
Education

2009 - 2013

Bachelor of Engineering Degree in Electronics

Anna University - Chennai, India

Certifications

JANUARY 2023 - PRESENT

Google Cloud Certified - Professional Data Engineer

Google Cloud

Libraries/APIs

REST APIs, PySpark, Spark Streaming

Tools

Apache Airflow, Tableau, Microsoft Power BI, Abinitio, Kafka Streams, Google Analytics, BigQuery, Collibra, Informatica ETL, Excel 2016, AWS Glue, GitHub, Apache Beam, Amazon CloudWatch, Cloud Dataflow, Amazon Athena, ELK (Elastic Stack), Microsoft Access, pgAdmin, Amazon QuickSight, Amazon Elastic Container Service (ECS), Amazon CloudFront CDN, AWS CloudFormation, Git, Stitch Data, Azure Kubernetes Service (AKS), Matillion ETL for Redshift, Apache Storm, Logstash, Grafana, Terraform, Looker, Azure Machine Learning

Languages

SQL, Python, Snowflake, JavaScript, Excel VBA

Frameworks

Apache Spark, Spark, Streamlit, Storm, Hadoop, Django

Paradigms

ETL, Business Intelligence (BI), ETL Implementation & Design, Database Development, DevOps, Microservices

Platforms

Google Cloud Platform (GCP), Amazon EC2, Amazon Web Services (AWS), Firebase, AWS Lambda, Databricks, Linux, Kubernetes, Apache Flink, Azure, Airbyte, Azure Synapse, Oracle, Docker, Apache Kafka, Cloud Native

Storage

Teradata, Redshift, Databases, Amazon S3 (AWS S3), Data Pipelines, Data Lake Design, PostgreSQL, Azure SQL Databases, AWS Data Pipeline Service, MongoDB, Microsoft SQL Server, Database Architecture, Database Performance, NoSQL, Amazon Aurora, Datadog, Data Lakes, Google Cloud, Oracle Cloud, MySQL, Cloud Firestore, MemSQL, Elasticsearch

Other

Software, Dashboards, Data Visualization, Amazon RDS, Big Data, Data Warehouse Design, Data Warehousing, Data Engineering, Google BigQuery, Data Analysis, Data Build Tool (dbt), Cloud Platforms, Data Management, Informatica Cloud, Informatica, Data Architecture, Excel 365, Office 365, CSV File Processing, Data Migration, Data Extraction, ELT, Technical Architecture, ETL Tools, Cloud, Delta Lake, Pub/Sub, Azure Databricks, Warehouses, BI Reporting, Orchestration, Data Processing, Infrastructure as Code (IaC), Query Optimization, English, Data Cleaning, GitHub Actions, APIs, Big Data Architecture, Data Modeling, Analytics, Data Analytics, Data Science, Data Governance, Parquet, Database Schema Design, Fivetran, TIBCO, Ads, Data Quality, Finance, Mobile Analytics, Monitoring, CI/CD Pipelines, Amazon EMR Studio, Web Analytics, Social Media Web Traffic, Real-time Data, Metabase, DocumentDB, User Interface (UI), Great Expectations Cloud, Machine Learning, ClickStream
