Harish Chander Ramesh

Data Engineer and Developer in Dubai, United Arab Emirates

Member since April 14, 2022
Harish is a data engineer who has been consuming, engineering, analyzing, exploring, testing, and visualizing data for personal and professional purposes for the last ten years. His passion for data has led him to work with multiple Fortune 50 organizations, including Amazon and Verizon. Harish loves challenges and believes he learns and delivers best when he's outside his comfort zone.

Portfolio

  • MH Alshaya
    Apache Airflow, Apache Spark, Google Cloud Platform (GCP), Google Analytics...
  • Verizon Media
    Apache Airflow, Apache Spark, AWS, Python, Tableau, ELK (Elastic Stack)...
  • Amazon
    Apache Airflow, Apache Spark, AWS, Tableau, ETL, Dashboards...

Location

Dubai, United Arab Emirates

Availability

Part-time

Preferred Environment

Apache Spark, Apache Airflow, Google Cloud Platform (GCP), Amazon Web Services (AWS), ELK (Elastic Stack), Tableau, Microsoft Power BI, SQL, Python, ETL, Business Intelligence (BI), Dashboards, Data Visualization, Google BigQuery

The most amazing...

...data platform I've built from scratch is for a video conferencing app; it had no downtime despite a 600% usage increase during the pandemic.

Employment

  • Data Engineer Manager

    2021 - 2022
    MH Alshaya
    • Developed the organization's first data warehouse from scratch, incorporating product analytics at scale, using various GCP services.
    • Developed a real-time Golden Customer Record, extending the loyalty program of 119 brands across 19 countries.
    • Developed and maintained an in-house data quality framework with the business team, using Great Expectations at scale; it was also used for near-real-time fraud analytics across 50+ brands. (A minimal sketch of such a check follows this employment list.)
    • Led a team of six data engineers, the organization's first, and fostered a data-driven culture within the team.
    Technologies: Apache Airflow, Apache Spark, Google Cloud Platform (GCP), Google Analytics, Tableau, ETL, Dashboards, Data Visualization, Amazon EC2, AWS RDS, Databases, Redshift, Apache Flink, AWS S3, Data Pipelines, Spark, Apache Kafka, Data Warehouse Design, Data Lake Design, Big Data Architecture, Data Warehousing, Data Lakes, Cloud Native, Data Engineering, Google BigQuery, Data Modeling, Looker.io, Analytics, Google Cloud, Data Analysis, Data Analytics, Data Science, Terraform, Data Governance, Azure, PostgreSQL, Cloud Platforms, Looker, Parquet
  • Lead Data Engineer

    2019 - 2021
    Verizon Media
    • Developed the first streaming analytics platform to handle media stats from video conferencing solutions, using Apache Spark and Storm on AWS-managed services. (A minimal streaming sketch follows this employment list.)
    • Built an autoscaling data pipeline that absorbed a 600% increase in daily usage volume, without downtime, as clients' teams moved to remote work during the COVID-19 pandemic.
    • Tested and implemented Apache Hudi in its early stages of development, enabling ACID transactions on historical data.
    • Led a team of seven data engineers, including three seniors, two juniors, and one intern, and created opportunities to consult with large clients worldwide on technical solutions and architecture.
    • Migrated a live 2.2 PB legacy PostgreSQL database to Snowflake in five days, using dbt in the process. Designed, implemented, and validated the migration on the fly with the help of an error reporting framework, with an error rate of 0.3%.
    Technologies: Apache Airflow, Apache Spark, AWS, Python, Tableau, ELK (Elastic Stack), Datadog, Kafka Streams, ETL, Dashboards, Data Visualization, Amazon EC2, AWS RDS, Databases, Redshift, Storm, Apache Flink, AWS S3, Data Pipelines, Amazon Web Services (AWS), Spark, Big Data, Apache Kafka, Data Warehouse Design, Data Lake Design, Spark Streaming, Big Data Architecture, Data Warehousing, PySpark, Data Lakes, Cloud Native, Data Engineering, Google BigQuery, Data Modeling, Looker.io, Analytics, Google Cloud, Data Analysis, Snowflake, Data Analyst, Data Analytics, Data Governance, Azure, PostgreSQL, pgAdmin, DBT, Cloud Platforms, Parquet
  • Data Engineer

    2016 - 2018
    Amazon
    • Contributed to the world's largest eCommerce platform, covering 16 marketplaces around the globe in different time zones, as part of the retail business team that handled worldwide retail data management and pipelines.
    • Handled high-pressure environments and tight deadlines, working alongside some of the best minds in the country and the world, and initiated a data engineering forum within the organization for cross-pollination of ideas.
    • Built real-time pipelines to stream data from different platforms to the Amazon data warehouse, with a 2-minute service-level agreement (SLA) on latency, using Spark, Flink, and Tableau.
    • Created a 360-degree dashboard with perspectives on Amazon's customers across different Amazon services. The dashboard was published on a forum and gained massive popularity for making the data easy for consumers to understand.
    Technologies: Apache Airflow, Apache Spark, AWS, Tableau, ETL, Dashboards, Data Visualization, Amazon EC2, Databases, Redshift, Storm, Apache Flink, AWS S3, Data Pipelines, Amazon Web Services (AWS), Spark, Big Data, Apache Kafka, Data Warehouse Design, Data Lake Design, Spark Streaming, Big Data Architecture, Data Warehousing, PySpark, Data Lakes, Cloud Native, Data Engineering, Google BigQuery, Data Modeling, Looker.io, Data Analysis, Data Analyst, Data Analytics, Cloud Platforms
  • Data Engineer

    2013 - 2016
    NTT Data
    • Developed, tested, and deployed end-to-end real-time and batch ETL pipelines for a healthcare provider.
    • Documented every line of code and every change to the existing product from a business standpoint.
    • Learned new technologies with an open-minded approach and grew as a technology-agnostic developer.
    • Delivered two major data warehouse projects that cut data storage costs by 23% and maintenance costs by 26.5%.
    Technologies: Ab Initio, SQL, Teradata, AWS RDS, Amazon EC2, Databases, AWS S3, Data Pipelines, Amazon Web Services (AWS), Big Data, Data Warehousing, PySpark, Data Engineering, Data Analysis, Snowflake, Data Analyst, Microsoft Access, Cloud Platforms
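
Below is a minimal sketch of the kind of check the MH Alshaya data quality framework above enabled, using the classic Pandas-backed Great Expectations API. The columns, values, and thresholds are illustrative assumptions, not the production setup.

    import great_expectations as ge
    import pandas as pd

    # Hypothetical sample of customer order records; column names are illustrative.
    orders = pd.DataFrame({
        "customer_id": ["C001", "C002", None],
        "order_total": [120.5, 89.9, 15000.0],
        "country": ["AE", "KW", "SA"],
    })

    df = ge.from_pandas(orders)

    # Basic expectations: completeness, plausibility, and domain checks.
    df.expect_column_values_to_not_be_null("customer_id")
    df.expect_column_values_to_be_between("order_total", min_value=0, max_value=10000)
    df.expect_column_values_to_be_in_set("country", ["AE", "KW", "SA", "BH", "QA"])

    result = df.validate()
    print(result.success)  # False here: one null customer ID and one outlier total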
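
The Verizon Media streaming platform above used Apache Spark on AWS-managed services; the sketch below shows the general shape of such a job with PySpark Structured Streaming. The broker, topic, schema, and window sizes are assumptions for illustration, not the production pipeline.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("media-stats-stream").getOrCreate()

    # Hypothetical schema for per-call media stats.
    schema = StructType([
        StructField("meeting_id", StringType()),
        StructField("packet_loss_pct", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
           .option("subscribe", "media-stats")                # assumed topic name
           .load())

    stats = (raw
             .select(F.from_json(F.col("value").cast("string"), schema).alias("s"))
             .select("s.*"))

    # Average packet loss per meeting over 1-minute event-time windows.
    windowed = (stats
                .withWatermark("event_time", "2 minutes")
                .groupBy(F.window("event_time", "1 minute"), "meeting_id")
                .agg(F.avg("packet_loss_pct").alias("avg_loss")))

    query = windowed.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()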

Experience

  • Competitive Price Monitoring System for eCommerce Business

    A data framework that scrapes multiple eCommerce websites at a frequency driven by their super-competitiveness, an index that categorizes competitors per product category and determines how often each competitor's website is scraped (one to three times a day). The scraper writes its output to a data warehouse, where prices are compared at the product-to-product level in real time to generate a price competitiveness index (PCI), a measure of whether the eCommerce business's products are priced competitively against the super-important and important competitors. (A minimal PCI sketch follows this list.)

  • Realtime Pipelines for Fraud Alerting

    Built for a video conferencing application whose meeting IDs were prone to hijacking. The application itself was not yet mature enough to identify fraudulent additions to meetings, so I built a data layer that catches and reports a fraudulent meeting ID in under three seconds, implemented mostly on an open-source stack: Kafka, MemSQL, Storm, and Python. (A simplified detection sketch follows this list.)

  • Drivers' Incentives Framework

    A real-time computation platform that calculates delivery drivers' target-versus-actual numbers, rewards them with instant bonuses, and encourages them to exceed their targets. It was built for a ride-hailing company where drivers' targets were not reported to them daily or intraday. A Grafana dashboard embedded in the drivers' mobile app keeps them aware of their performance, the incentives they have earned, and the targets achieved or still to be achieved. (A toy incentive rule appears after this list.)
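
One plausible reading of the PCI above, sketched in Python with pandas. The product data and the "share of matched products priced at or below the competitor" definition are assumptions for illustration, not the production formula.

    import pandas as pd

    # Hypothetical price snapshots: our price vs. a competitor's price per matched product.
    prices = pd.DataFrame({
        "product_id":       ["P1", "P2", "P3"],
        "our_price":        [100.0, 55.0, 210.0],
        "competitor_price": [95.0, 60.0, 200.0],
    })

    # PCI here: the share of matched products priced at or below the competitor.
    prices["competitive"] = prices["our_price"] <= prices["competitor_price"]
    pci = prices["competitive"].mean() * 100
    print(f"PCI: {pci:.1f}% of matched products are competitively priced")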
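
A simplified sketch of the fraud-alerting idea above, assuming join events arrive on a Kafka topic and that an abnormal join rate marks a suspicious meeting. The topic, broker, event format, and threshold are illustrative, and the production system used Storm and MemSQL rather than a single consumer loop.

    import json
    import time
    from collections import defaultdict, deque
    from kafka import KafkaConsumer  # kafka-python client

    WINDOW_SECONDS = 10
    JOIN_RATE_THRESHOLD = 20  # joins per window; illustrative value

    consumer = KafkaConsumer(
        "meeting-joins",                  # assumed topic name
        bootstrap_servers="broker:9092",  # assumed broker address
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    joins = defaultdict(deque)  # meeting_id -> timestamps of recent joins

    for message in consumer:
        event = message.value  # e.g. {"meeting_id": "abc123"}
        mid, now = event["meeting_id"], time.time()
        window = joins[mid]
        window.append(now)
        # Drop joins that fell out of the sliding window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) > JOIN_RATE_THRESHOLD:
            # The production system reported to an alerting sink; printing stands in here.
            print(f"ALERT: {len(window)} joins in {WINDOW_SECONDS}s for meeting {mid}")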
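
A toy version of the target-versus-actual incentive computation above. The bonus rule and amounts are assumptions for illustration; the real platform computed these in real time per driver.

    # Hypothetical rule: a flat bonus accrues for each delivery beyond the intraday target.
    def incentive(actual_deliveries: int, target: int, bonus_per_extra: float = 5.0) -> float:
        """Instant bonus for deliveries beyond the target (illustrative rule)."""
        return max(actual_deliveries - target, 0) * bonus_per_extra

    print(incentive(actual_deliveries=28, target=25))  # 15.0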

Skills

  • Languages

    SQL, Python, Snowflake
  • Frameworks

    Apache Spark, Spark, Storm
  • Tools

Apache Airflow, Tableau, Ab Initio, Kafka Streams, ELK (Elastic Stack), Microsoft Power BI, Microsoft Access, pgAdmin, Google Analytics, Apache Storm, Logstash, Grafana, Terraform, Looker
  • Paradigms

    ETL, Business Intelligence (BI), Data Science
  • Platforms

    Google Cloud Platform (GCP), Amazon EC2, Amazon Web Services (AWS), Apache Flink, Azure, Apache Kafka, Cloud Native
  • Storage

    Teradata, Redshift, Databases, AWS S3, Data Pipelines, Data Lake Design, Datadog, Data Lakes, Google Cloud, PostgreSQL, MemSQL, Elasticsearch
  • Other

    AWS, Software, Dashboards, Data Visualization, AWS RDS, Big Data, Data Warehouse Design, Data Warehousing, Data Engineering, Google BigQuery, Data Analysis, Cloud Platforms, Big Data Architecture, Data Modeling, Looker.io, Analytics, Data Analyst, Data Analytics, Data Governance, Parquet, DBT
  • Libraries/APIs

    PySpark, Spark Streaming

Education

  • Bachelor of Engineering Degree in Electronics
    2009 - 2013
    Anna University - Chennai, India
