Hassan Ashraf, Developer in Dubai, United Arab Emirates

Hassan Ashraf

Verified Expert in Engineering

Bio

Hassan has 18 years of experience, in increasingly responsible roles, developing high-performance on-premises and cloud data platforms. He has expertise in the telecommunications, fintech, logistics, transportation, healthcare, eCommerce, and media analytics industries.

Portfolio

Mindshare
Apache Spark, Python, DataRobot, Data Engineering, Azure...
JLL
Google Cloud Platform (GCP), Labelbox, Python, Data Pipelines...
Vezeeta.com
Data Engineering, Databases, Amazon Web Services (AWS), AWS Glue...

Experience

Availability

Full-time

Preferred Environment

Visual Studio Code (VS Code), Shell

The most amazing...

...experience was creating the vision, architecture, detailed design, and implementation of high-performance data platforms in multiple roles.

Work Experience

Principal Data Engineer

2021 - 2024
Mindshare
  • Developed an automated MLOps platform to deploy, run, and monitor machine learning models for performance modeling across several customers. A key requirement was supporting data in different formats from various sources.
  • Designed and implemented a data platform to power an advanced analytics team in digital marketing.
  • Worked on the "Metrics that Matter" project, writing a configurable codebase that runs for different customers with different data volumes and formats. Onboarding a new customer meant writing configuration instead of code.
Technologies: Apache Spark, Python, DataRobot, Data Engineering, Azure, Google Cloud Platform (GCP), SQL, Machine Learning Operations (MLOps)
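A minimal sketch of the configuration-over-code onboarding idea described above: each customer is described by a small config that names its data format, and a registry maps formats to parser functions. All names and formats here are illustrative, not the actual platform's API.

```python
import csv
import io
import json

# Hypothetical parser registry: adding support for a new format means adding
# one function here; adding a new customer means adding one config entry below.

def parse_csv(raw: str) -> list[dict]:
    """Parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def parse_jsonl(raw: str) -> list[dict]:
    """Parse newline-delimited JSON into a list of dicts."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

PARSERS = {"csv": parse_csv, "jsonl": parse_jsonl}

def ingest(customer_config: dict, raw: str) -> list[dict]:
    """Route raw data to the parser named in the customer's config."""
    parser = PARSERS[customer_config["format"]]
    return parser(raw)

# Onboarding a new customer is just a new config entry, not new code:
acme = {"name": "acme", "format": "csv"}
rows = ingest(acme, "metric,value\nclicks,42\n")
```

The design choice this illustrates: pipeline logic stays in one shared codebase, while everything customer-specific lives in data.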

Data Engineer

2021 - 2021
JLL
  • Developed data pipelines that export AI-generated image labels from Labelbox into Google Cloud Platform.
  • Evaluated several options, including Google Cloud Dataflow and Google Cloud Functions, for the end-to-end production pipeline.
  • Reduced pipeline runtime from more than 10 minutes to a couple of minutes by integrating Labelbox with Google Cloud Storage directly.
Technologies: Google Cloud Platform (GCP), Labelbox, Python, Data Pipelines, Google Cloud Storage, Google Cloud Dataproc, Google Cloud Dataflow, Google Cloud Functions
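To illustrate the export step, here is a sketch that flattens a per-image label export into newline-delimited JSON, the shape one would upload to a Cloud Storage bucket (e.g. with the google-cloud-storage client's `upload_from_string`). The export schema below is hypothetical, not Labelbox's actual export format.

```python
import json

# Flatten per-image label records into one JSON line per label. The input
# structure here is an illustrative stand-in for a label-export payload.

def labels_to_jsonl(export: list[dict]) -> str:
    """Produce newline-delimited JSON, one line per (image, label) pair."""
    lines = []
    for image in export:
        for label in image.get("labels", []):
            lines.append(json.dumps({
                "image_id": image["id"],
                "label": label["name"],
                "confidence": label["confidence"],
            }))
    return "\n".join(lines)

export = [{"id": "img-1", "labels": [{"name": "window", "confidence": 0.93}]}]
payload = labels_to_jsonl(export)
# payload is now ready to hand to a GCS blob upload call
```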

Lead Data Engineer

2019 - 2021
Vezeeta.com
  • Led, from concept to production, the design, implementation, and evolution of a raw data lake, data catalogs, a DWH, data science use case integrations, ETL pipelines for batch and streaming data from more than 20 sources, and a set of dashboards.
  • Designed logical and physical data models for the DWH to power self-service BI.
  • Provided engineering leadership to design, implement, and scale batch and streaming data ingestion from many internal and external data sources.
Technologies: Data Engineering, Databases, Amazon Web Services (AWS), AWS Glue, Amazon S3 (AWS S3), Amazon Athena, Python, PySpark, Redshift, Docker, Kubernetes, Apache Airflow, Prometheus, Tableau, SQL, Shell

Head of Data Science

2018 - 2019
Surface Mobility Consultants
  • Started and led a team of data scientists, data engineers, and business analysts to work on a transportation and traffic big data and data science project.
  • Led the team to deliver 17 data science use cases that required extensive data engineering, especially in geospatial data processing.
  • Developed a custom MicroStrategy visualization component to display advanced geospatial data.
Technologies: Apache Spark, Apache Hive, Impala, Python, SQL, Geospatial Data, Geospatial Analytics, MicroStrategy Development, Data Engineering, Data Science, Informatica

Lead Data Engineer

2017 - 2018
PegB Tech
  • Developed data platform architecture for enterprise data repository and supporting data science.
  • Developed a Kafka-based streaming pipeline that sustained 1,000 processed transactions per second.
  • Migrated large volumes of legacy data from a MySQL database into HDFS on Cloudera to kickstart Spark-based data analytics.
Technologies: Couchbase, Elasticsearch, Apache Kafka, HDFS, Vertica, SQL, Scala, Docker

Data Warehouse Engineer

2017 - 2017
QExpress
  • Designed a logical and physical data model of a data warehouse optimized for AWS Redshift.
  • Redesigned existing ETL packages for more fault-tolerant and optimized ETL jobs.
  • Developed a set of MicroStrategy dashboards and reports for management and operation teams.
Technologies: AWS Glue, Amazon Redshift, MicroStrategy Development, SQL

Data Warehouse Engineer

2011 - 2016
DesigNET
  • Redesigned data export and load steps as part of the ETL packages.
  • Developed a data warehouse model and ETL package to source data from around seven operational data sources.
  • Worked with a multi-agency team to improve the customer onboarding program, reducing onboarding time by about 30%.
Technologies: SQL, PostgreSQL, Business Intelligence (BI), BIRT, Java

Freelance DWH and BI Consultant

2010 - 2011
Self Employed
  • Handled business development for my freelance consulting, generating three customer engagements, one of which turned into a long-term role.
  • Developed a MicroStrategy-based dashboard for the office of the CFO of a major bank in the UAE.
  • Developed a reporting database and a set of reports for a warehouse based in Wisconsin, USA.
Technologies: SQL, MySQL, Pentaho Data Integration (Kettle), Oracle, PostgreSQL, Java, MicroStrategy Development, BIRT

Professional Services Consultant

2006 - 2010
Teradata
  • Led a team of BI developers to implement the BI schema, reports, and dashboards for a leading national telecom operator.
  • Developed a dashboard for the office of the CEO to re-engage customers on a DWH project.
  • Trained internal resources on BI and DWH. Participated in logical and physical data modeling for the enterprise DWH.
Technologies: Teradata, SQL, MicroStrategy Development, Data Warehouse Design, Business Intelligence (BI)

Projects

Raw Data Lake

As part of the AWS-based data platform, I designed and implemented a raw data lake that integrated both streaming and batch data from more than 20 internal and external data sources of different types, volumes, and formats.

We used AWS Glue, S3, Athena, Kafka and Kafka Connect, Python, PySpark, Docker, Airflow, and Kubernetes to implement this data lake.

We chose the Parquet file format with day-level partitioning for better read performance.

We used AWS-managed Kafka and hosted Kafka Connect on Kubernetes to get managed-service semantics end to end.
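The day-level partitioning scheme can be sketched as follows: each event lands under a Hive-style `dt=YYYY-MM-DD` key prefix so query engines like Athena and Spark can prune partitions by day and read only the files they need. Bucket and table names below are made up for illustration.

```python
from datetime import datetime, timezone

# Build the S3 key prefix for an event's day-level partition. A query filtered
# on dt touches only the matching prefixes instead of scanning the whole table.

def partition_prefix(table: str, event_time: datetime) -> str:
    """Return the Hive-style day partition prefix for this event."""
    return f"s3://raw-lake/{table}/dt={event_time:%Y-%m-%d}/"

ts = datetime(2020, 5, 17, 13, 45, tzinfo=timezone.utc)
prefix = partition_prefix("orders", ts)  # "s3://raw-lake/orders/dt=2020-05-17/"
```

In PySpark the same layout falls out of `df.write.partitionBy("dt").parquet(...)`; the point is that the partition column is encoded in the path, not in the file contents.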

Geospatial Data Engineering for Data Science Use Case Development

Led a team of data engineers and data scientists to implement 17 data science use cases in traffic and transportation. Because of the nature of the data, a large amount of geospatial data engineering had to be performed. Some of the key modules developed were:

1- Mapped bus stops onto bus routes by minimum distance, using a KDTree to partition the point space and speed up the search.

2- Converted a continuous stream of taxi data into discrete pickup and drop-off points in time and space.

3- Mapped taxi pickup, drop-off, and bus-stop points into polygons to provide community-based analytics.

4- Processed points, line strings, and polygons for various road-, stop-, and community-based analyses.

We used PostGIS, the ArcGIS library for Hadoop, GeoPandas, SciPy's spatial module, QGIS, and the ArcGIS API for JavaScript for this project.
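The stop-to-route mapping in module 1 boils down to a nearest-neighbor search under great-circle distance. The toy version below (coordinates invented) shows the mapping logic with a brute-force scan; in the project, a KDTree such as `scipy.spatial.cKDTree` partitioned the point space so each lookup was logarithmic rather than linear in the number of route points.

```python
import math

# Great-circle (haversine) distance in km between two (lat, lon) points.
def haversine_km(a: tuple, b: tuple) -> float:
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Map a bus stop to the closest route point by minimum distance. A KDTree
# replaces this O(n) scan in the real pipeline; the result is the same.
def nearest_route_point(stop, route_points):
    return min(route_points, key=lambda p: haversine_km(stop, p))

route = [(25.20, 55.27), (25.21, 55.28), (25.25, 55.30)]
stop = (25.211, 55.279)
nearest = nearest_route_point(stop, route)  # → (25.21, 55.28)
```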
Education

2002 - 2004

Master's Degree in Software Engineering (Distributed Systems)

COMSATS Institute of Information Technology - Islamabad, Pakistan

1999 - 2001

Bachelor of Science Degree in Mathematics and Physics

University of the Punjab - Lahore, Pakistan

Libraries/APIs

PySpark, SciPy, ArcGIS, Pandas

Tools

AWS Glue, Amazon Athena, Tableau, Apache Spark, Impala, Pentaho Data Integration (Kettle), Amazon QuickSight, Apache Airflow, Shell, Apache Beam, Terraform, Google Cloud Dataproc, DataRobot

Languages

Python, SQL, Snowflake, Scala, Java

Paradigms

ETL, Business Intelligence (BI)

Platforms

Apache Kafka, Visual Studio Code (VS Code), Docker, Kubernetes, Amazon Web Services (AWS), BIRT, Oracle, Google Cloud Platform (GCP), Labelbox, Azure

Storage

Databases, Distributed Databases, Amazon S3 (AWS S3), Redshift, Apache Hive, PostgreSQL, DB, Teradata, Microsoft SQL Server, Redis, Memcached, MongoDB, Amazon DynamoDB, Elasticsearch, Couchbase, HDFS, Vertica, MySQL, Data Pipelines, Google Cloud Storage

Frameworks

Hadoop

Other

Programming, Data Structures, Algorithms, Distributed Systems, Data Engineering, Big Data Architecture, Stream Processing, MicroStrategy Development, GeoPandas, Database Optimization, Data Management Platforms, Operating Systems, Software Engineering, Differential Equations, QGIS, Mathematics, Numerical Methods, IT Project Management, Web Programming, Applied Mathematics, Algebra, Linear Algebra, Calculus, Prometheus, Informatica, Parquet, Geospatial Data, Geospatial Analytics, Data Science, Amazon Redshift, Data Warehouse Design, Google Cloud Dataflow, Google Cloud Functions, Machine Learning Operations (MLOps)
