
Harish Chander Ramesh
Verified Expert in Engineering
Data Engineer and Developer
Dubai, United Arab Emirates
Toptal member since April 22, 2022
Harish is a data engineer who has been consuming, engineering, analyzing, exploring, testing, and visualizing data for personal and professional purposes for the last ten years. His passion for data has led him to work with multiple Fortune 50 organizations, including Amazon and Verizon. Harish loves challenges and believes he can learn and deliver best when out of his comfort zone.
Portfolio
Experience
- BI Reporting - 10 years
- SQL - 9 years
- Apache Spark - 8 years
- Tableau - 8 years
- Python - 7 years
- Apache Airflow - 6 years
- Google Cloud Platform (GCP) - 5 years
- Microsoft Power BI - 4 years
Preferred Environment
Google Cloud Platform (GCP), Tableau, Microsoft Power BI, SQL, ETL, Business Intelligence (BI), Data Visualization, Amazon Web Services (AWS), Google BigQuery, Azure SQL Databases, Data Engineering, AWS Data Pipeline Service, Data Management, Collibra, Informatica Cloud, Informatica ETL, Informatica, Oracle, JavaScript, Data Architecture, Excel 365, CSV File Processing, Excel VBA, Data Extraction, MySQL, Real-time Data
The most amazing...
...data platform I've built from scratch is for a video conferencing app, which managed to have no downtime despite the 600% usage increase during the pandemic.
Work Experience
BigQuery Data Analyst
Guardian Service Holdings LLC
- Designed and implemented API-driven data pipelines integrating Vertafore, AgencyZoom, AMS 360, and PL Rating into a centralized BigQuery data lake, enabling unified analytics across core insurance systems.
- Built and optimized BigQuery data models to transform raw operational data into analytics-ready schemas, improving query performance and enabling faster business reporting.
- Implemented end-to-end data ingestion workflows using Python and SQL, ensuring reliable, scalable data flow from multiple third-party systems into Google Cloud.
- Enabled marketing and web funnel analytics by tagging, transforming, and dispatching event data into the data lake and Google Analytics for downstream analysis.
- Collaborated closely with non-technical stakeholders to translate business requirements into data models, dashboards, and actionable insights.
- Created an open-source customer ID generation project end to end.
Data Engineer and Architect
United Talent Agency - Main
- Designed and implemented a visualization tool for monitoring queries across all environments, enabling the early identification and resolution of potential issues, which improved system reliability by 30% and optimized query performance by 25%.
- Created an automated service that effectively detects and resolves data quality issues throughout the development stages, leading to a 50% decrease in incidents and ensuring high data integrity and trustworthiness in the data lake project.
- Established a robust testing platform that identified reliability issues during the pre-production stages, enhancing the overall system stability and reducing downtime by 20% before full-scale deployment.
- Led a team of data engineers in identifying and addressing infrastructure gaps through the development of automated solutions, which streamlined operations and increased the team's productivity by 35%.
- Contributed significantly to the design, development, and maintenance of existing data warehousing and data lake projects.
- Developed and deployed a comprehensive framework for the data engineering team, significantly enhancing feature impact analysis and ensuring thorough testing before deployment, resulting in a 40% reduction in customer disruptions due to releases.
- Architected and executed a scalable data lake solution in Azure, integrating Snowflake, dbt, and Spark to support advanced analytics and machine learning projects, which increased data accessibility by 50% and reduced data processing time by 40%.
- Pioneered the use of machine learning tools and frameworks to automate data quality checks and anomaly detection, reducing manual data verification efforts by 70% and improving data accuracy for downstream analytics and ML model training.
- Implemented a CI/CD pipeline for seamless integration and delivery of data engineering and ML projects, which accelerated deployment cycles by 50% and fostered a culture of continuous improvement and innovation within the data engineering team.
Data Engineer Manager
MH Alshaya
- Developed the organization's first data warehouse from scratch, incorporating product analytics at scale, using various GCP services.
- Developed the real-time Golden Customer Record, extending the loyalty program of 119 brands across 19 countries.
- Developed and maintained a data quality framework with the help of the entire business team in-house, using Great Expectations at scale. This was also used in fraud analytics across 50+ brands in near real-time.
- Led a team of six data engineers, the first set of data engineers in the organization, and started up a data-driven culture within the team.
Lead Data Engineer
Verizon Media
- Developed the first streaming analytics platform to handle media stats from videoconferencing solutions using Apache Spark and Storm on AWS-managed services.
- Built a self-autoscaling data pipeline that absorbed a 600% increase in daily usage volume during the COVID-19 pandemic, driven by remote work among clients' teams, without service impact.
- Tested and implemented Apache Hudi in its early stages of development, adding ACID transaction support on historical data.
- Led a team of seven data engineers, including three seniors, two juniors, and one intern. Created opportunities to interact with large clients worldwide on technical solution consultation and solution architecting.
- Migrated a live 2.2 PB legacy PostgreSQL database to Snowflake in five days, using dbt in the process. Designed, implemented, and validated the migration on the fly with the help of an error-reporting framework, finishing with a 0.3% error rate.
Data Engineer
Amazon
- Contributed to the world's largest eCommerce platform, covering 16 marketplaces across the globe in different time zones. I was part of the retail business team that handled worldwide retail business data management and pipelines.
- Handled high-pressure environments and met tight deadlines. Worked alongside the best minds in the country and the world, initiating a data engineer forum within the organization for cross-pollination of ideas.
- Built real-time pipelines to stream data from different platforms to the Amazon data warehouse under a service-level agreement (SLA) of a two-minute maximum delay, using Spark, Flink, and Tableau.
- Created a 360-degree dashboard with perspectives on Amazon's customers across different Amazon services. The dashboard was made public on a forum and gained massive popularity for the ease of data understanding by consumers.
Data Engineer
NTT Data
- Developed, tested, and deployed end-to-end real-time and batch ETL pipelines for a healthcare provider.
- Documented every line of code and changes to the existing product from a business standpoint.
- Learned new technologies with an open-minded approach and grew as a technology-agnostic developer.
- Developed two major data warehouse-related projects to save 23% of data storage cost and 26.5% of maintenance cost.
Experience
Competitive Price Monitoring System for eCommerce Business
Sub-3-Second Fraud Detection Pipeline for a Hyperscale Video Conferencing Platform
Shipped an end-to-end streaming pipeline that flags fraudulent join attempts in under three seconds, well inside the window where a meeting host can be alerted and act. Architecture: Apache Kafka for event ingestion, Apache Storm for low-latency stream processing, MemSQL (now SingleStore) as the hot store for sub-second lookups, Python for rule and ML signal evaluation, and Looker for trust-and-safety analyst tooling.
Deliberately chose an open-source stack to avoid vendor lock-in at the data layer. The same primitives were then reused for other real-time signals across the platform.
The pipeline ran with zero downtime through a 600% increase in traffic, processing millions of join events per day. It delivered a measurable reduction in fraudulent meeting incident reports to the trust-and-safety team.
Real-time Driver Incentives Platform for a Regional Ride Hailing Operator
The platform was built on an ELK-style stack (Elasticsearch and Logstash, with Grafana for visualization) running on Google Cloud Platform, with the dashboard embedded directly into the driver app so drivers can see targets, current progress, earned incentives, and what is still possible, all updated in near real time.
I designed the incentive rules engine to let the operations team change target structures without code changes, so promotion experiments could ship in days rather than sprints. This shifted the relationship between the data team and ops from a ticket-based model to a self-service model.
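As an illustration only, a rules engine of this kind can be sketched as rules expressed as plain data that operations staff edit, with a small evaluator that never needs to change. This is a minimal sketch, not the production design: the `IncentiveRule` fields, the `load_rules` and `evaluate` helpers, and the sample metric name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncentiveRule:
    """One incentive target, defined entirely by configuration (hypothetical shape)."""
    name: str
    metric: str    # key looked up in the driver's stats, e.g. "trips_completed"
    target: float  # value the driver must reach; assumed to be greater than zero
    reward: float  # payout once the target is reached


def load_rules(config: list[dict]) -> list[IncentiveRule]:
    """Build rules from config entries, e.g. rows parsed from JSON or a table."""
    return [IncentiveRule(**entry) for entry in config]


def evaluate(rules: list[IncentiveRule], driver_stats: dict) -> list[dict]:
    """Return progress, earned reward, and remaining gap for each rule."""
    results = []
    for rule in rules:
        current = driver_stats.get(rule.metric, 0)
        achieved = current >= rule.target
        results.append({
            "rule": rule.name,
            "progress": min(current / rule.target, 1.0),
            "earned": rule.reward if achieved else 0.0,
            "remaining": max(rule.target - current, 0),
        })
    return results


# Changing the promotion means editing this config, not the evaluator.
rules = load_rules([
    {"name": "weekday_trips", "metric": "trips_completed", "target": 20, "reward": 50.0},
])
print(evaluate(rules, {"trips_completed": 15}))
```

Because the evaluator only reads rule fields, a new target structure is a config change, which is what lets experiments ship without a code release.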
Impact: target attainment and daily active drivers improved measurably across the first two quarters, and the operations team ran far more incentive experiments per quarter than under the old reporting cadence.
2.2 PB Live PostgreSQL to Snowflake Migration
The final error rate landed at 0.3%, reconciled and resolved before the legacy system was decommissioned. The new Snowflake environment materially reduced downstream query latency and gave analytics and ML teams an ACID-compliant historical store for the first time, layered on top of Apache Hudi, which was still pre-1.0 at the time.
The architectural call: keep PostgreSQL writeable during migration, stream the delta, validate each batch against a hash-checksum, and cut over only once the error report falls below the threshold. That decision is why the business experienced zero downtime on the database underpinning a hyperscale video conferencing platform during its 600% surge in pandemic usage.
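The validate-then-cut-over step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the migration's actual code: the function names, the SHA-256 choice, the in-memory batches, and the 1% threshold are all hypothetical, and a real run would read batches from PostgreSQL and Snowflake rather than from Python lists.

```python
import hashlib
from typing import Iterable, Sequence

Row = Sequence[object]


def batch_checksum(rows: Iterable[Row]) -> str:
    """Order-independent checksum: hash every row, then hash the sorted row digests."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()


def mismatch_rate(source_batches, target_batches) -> float:
    """Fraction of paired batches whose source and target checksums differ."""
    total = 0
    mismatched = 0
    for src, tgt in zip(source_batches, target_batches):
        total += 1
        if batch_checksum(src) != batch_checksum(tgt):
            mismatched += 1
    return mismatched / total if total else 0.0


def ready_to_cut_over(error_rate: float, threshold: float = 0.01) -> bool:
    """Cut over only once the reported error rate is below the threshold (assumed value)."""
    return error_rate < threshold


source = [[(1, "a"), (2, "b")], [(3, "c")]]
target = [[(2, "b"), (1, "a")], [(3, "x")]]  # second batch has a corrupted row
rate = mismatch_rate(source, target)
print(rate, ready_to_cut_over(rate))
```

Sorting the per-row digests makes the checksum insensitive to row order, which matters when the two systems return a batch in different orders.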
Enterprise Data Reliability Platform at United Talent Agency
Together, they delivered: 50% reduction in data-quality incidents, 30% lift in system reliability, 25% improvement in query performance, 20% reduction in pre-production downtime, and a 40% drop in customer-visible disruptions from releases, all measured against pre-program baselines.
On the platform side, I architected a scalable data lake on Azure, integrating Snowflake, dbt, and Spark to support advanced analytics and ML, increasing data accessibility by 50% and cutting processing time by 40%. Layered on top: ML-based automated data-quality checks and anomaly detection that reduced manual verification effort by 70%, and a CI/CD pipeline for data and ML projects that accelerated deployment cycles by 50%.
The leadership angle: team productivity rose 35% once the automated infrastructure was in place. Engineers stopped firefighting and started shipping.
Sub-2-Minute Real-time Pipelines Across 16 Amazon Retail Marketplaces
I owned the architecture and SLA enforcement across regions, including the on-call rotation, the schema-evolution path for upstream changes, and the data contracts with retail business teams.
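For illustration, enforcing a two-minute delay SLA comes down to comparing each record's event time with the time it landed in the warehouse. The sketch below assumes a hypothetical record shape of `(record_id, event_time, loaded_time)` and a hypothetical `sla_breaches` helper; it is not the monitoring code used at Amazon.

```python
from datetime import datetime, timedelta

SLA = timedelta(minutes=2)  # the two-minute delay budget


def sla_breaches(records, sla=SLA):
    """Return the IDs of records whose load lag exceeded the SLA.

    records: iterable of (record_id, event_time, loaded_time) tuples.
    """
    return [rid for rid, event_time, loaded_time in records
            if loaded_time - event_time > sla]


t0 = datetime(2020, 1, 1, 12, 0, 0)
records = [
    ("order-1", t0, t0 + timedelta(seconds=45)),   # within SLA
    ("order-2", t0, t0 + timedelta(seconds=150)),  # 30 seconds over
]
print(sla_breaches(records))
```

In practice the breach list would feed alerting for the on-call rotation rather than a print statement.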
In addition to the pipelines, I designed a 360-degree customer dashboard that gives leaders cross-service visibility into Amazon customers. It was shared on Amazon's internal forum and adopted by teams well outside retail because of how cleanly it surfaced cross-product behavior, one of the first such dashboards inside the organization.
I founded an internal Data Engineer Forum to cross-pollinate ideas across teams. Small thing, but the kind of org-level move that gets noticed at a company that size.
Real-time Golden Customer Record and Fraud Analytics Across 50+ Retail Brands
On top of the warehouse, I shipped the real-time Golden Customer Record that extended the group's loyalty program across all 119 brands, unifying identity, transaction, and engagement signals into a single record consumed in real time by the loyalty engine, CRM, and merchandising teams.
I designed and rolled out a data quality framework using Great Expectations at scale, co-built with the business teams, ensuring the rules captured real domain logic rather than engineering assumptions. The same framework powered near-real-time fraud analytics across 50+ brands.
Leadership: built the data engineering team from zero (first six hires) and established the data-driven culture inside an org that had previously been report-driven. The team I built continued to run the platform after I rolled off.
23% Storage and 26.5% Maintenance Cost Reduction on a Healthcare Data Warehouse
The work covered both real-time and batch ETL pipelines, with cost reductions driven by schema and partition redesign, implementation of retention-tier policies, replacing redundant pipelines with consolidated ones, and turning off cost centers that were running on autopilot.
The maintenance side was equally important. Every line of code and product change was documented from a business standpoint, so the platform stayed cheap to operate after I rolled off. The savings stuck.
This was an end-to-end engagement: requirements gathering with the business, technical design, implementation, parallel-run validation, and handover documentation. The combined storage and maintenance savings paid back the engagement in under two quarters.
Enterprise Data Quality, Governance, and Catalog Program
Technical anchors used repeatedly: Collibra as the catalog and stewardship surface, Informatica for ETL lineage and data quality enforcement, and Great Expectations for in-pipeline validation. The non-technical anchors matter more: defining the ownership model, the steward escalation path, the PII classification rules, and the policy for what gets cataloged versus what stays dark.
Outcome pattern: incident reduction in the 40-50% range (consistent with the 50% I delivered at United Talent Agency), and a measurable drop in the 'I can't find the data I need' friction that paralyzes most enterprise analytics teams.
The strategic value: governance is what separates senior data engineers from principals. Most individual contributors avoid it. Principals own it end-to-end and bring the business along.
Education
Bachelor of Engineering Degree in Electronics
Anna University - Chennai, India
Certifications
AWS Certified Solutions Architect
Amazon Web Services
Google Cloud Certified - Professional Data Engineer
Google Cloud
Skills
Libraries/APIs
REST APIs, Pandas, PySpark, Spark Streaming
Tools
Apache Airflow, Tableau, Microsoft Power BI, Ab Initio, Kafka Streams, Google Analytics, Looker, BigQuery, Collibra, Informatica ETL, Excel 2016, AWS Glue, GitHub, Apache Beam, Amazon CloudWatch, Cloud Dataflow, Amazon Athena, Power Query, ELK (Elastic Stack), Microsoft Access, pgAdmin, Amazon QuickSight, Amazon Elastic Container Service (ECS), Amazon CloudFront CDN, AWS CloudFormation, Git, Stitch Data, Azure Kubernetes Service (AKS), Matillion ETL for Redshift, Apache Storm, Logstash, Grafana, Terraform, Azure Machine Learning
Languages
SQL, Python, Snowflake, Looker Modeling Language (LookML), R, SPARQL, JavaScript, Excel VBA
Frameworks
Apache Spark, Spark, Streamlit, Storm, Hadoop, Django
Paradigms
ETL, Business Intelligence (BI), ETL Implementation & Design, Database Development, DevOps, Application Architecture, Microservices
Platforms
Google Cloud Platform (GCP), Amazon EC2, Amazon Web Services (AWS), Azure, Firebase, AWS Lambda, Databricks, Linux, Kubernetes, Microsoft Fabric, AWS IoT, Apache Flink, Airbyte, Azure Synapse, Oracle, Docker, Apache Kafka, Cloud Native, Apache Hudi
Storage
Teradata, Redshift, Databases, Amazon S3 (AWS S3), Data Pipelines, Data Lake Design, PostgreSQL, Azure SQL Databases, AWS Data Pipeline Service, MongoDB, Microsoft SQL Server, Database Architecture, Database Performance, NoSQL, Amazon Aurora, Datadog, Data Lakes, Google Cloud, Oracle Cloud, MySQL, Cloud Firestore, MemSQL, Elasticsearch
Industry Expertise
Marketing
Other
Software, Dashboards, Data Visualization, Amazon RDS, Big Data, Data Warehouse Design, Data Warehousing, Data Engineering, Google BigQuery, Data Analysis, Data Build Tool (dbt), Cloud Platforms, Data Management, Informatica Cloud, Informatica, Data Architecture, Excel 365, Office 365, CSV File Processing, Data Migration, Data Extraction, ELT, Technical Architecture, ETL Tools, Cloud, Delta Lake, Pub/Sub, Azure Databricks, Warehouses, BI Reporting, Orchestration, Data Processing, Infrastructure as Code (IaC), Query Optimization, English, Data Cleaning, GitHub Actions, APIs, Reports, Distributed Systems, Looker Studio, Dashboard Design, Business Analysis, Google Analytics 4 (GA4), Data Strategy, Performance Tuning, Sharding, Serverless, Data Transformation, API Integration, Big Data Architecture, Data Modeling, Analytics, Data Analytics, Data Science, Data Governance, Parquet, Database Schema Design, Fivetran, TIBCO, Ads, Data Quality, Finance, Mobile Analytics, Monitoring, CI/CD Pipelines, Amazon EMR Studio, Web Analytics, Social Media Web Traffic, Real-time Data, Metabase, DocumentDB, SAP, Azure Data Lake, Solution Architecture, Architecture, Sales, Cloud Data Fusion, User Interface (UI), Great Expectations Cloud, Machine Learning, ClickStream, Amazon MQ