Hassan Ashraf, Data Engineering Developer in Dubai, United Arab Emirates

Member since June 24, 2021
Hassan has 16 years of experience in increasingly responsible roles, developing high-performance on-premises and cloud data platforms. He has expertise in the telecommunications, fintech, logistics, transportation, healthcare, eCommerce, and media analytics industries.

Portfolio

  • JLL
    Google Cloud Platform (GCP), Labelbox, Python, Data Pipelines...
  • Vezeeta.com
    Data Engineering, Databases, AWS, AWS Glue, Amazon S3 (AWS S3), Amazon Athena...
  • Surface Mobility Consultants
    Apache Spark, Apache Hive, Impala, Python, SQL, Geospatial Data...

Location

Dubai, United Arab Emirates

Availability

Part-time

Preferred Environment

Visual Studio Code, Shell

The most amazing...

...experience was creating the vision, architecture, detailed design, and implementation of high-performance data platforms in multiple roles.

Employment

  • Data Engineer

    2021 - 2021
    JLL
    • Developed data pipelines that take AI-generated labels of images from Labelbox and export them into the Google Cloud Platform.
    • Evaluated several options, including Google Cloud Dataflow and Cloud Functions, for running the pipeline end to end in production.
    • Reduced data pipeline time from more than 10 minutes to a couple of minutes by integrating Labelbox and Google Cloud Storage.
    Technologies: Google Cloud Platform (GCP), Labelbox, Python, Data Pipelines, Google Cloud Storage, Google Cloud Dataproc, Google Cloud Dataflow, Google Cloud Functions
  • Lead Data Engineer

    2019 - 2021
    Vezeeta.com
    • Provided leadership from concept to production to design, implement, and evolve a raw data lake, data catalogs, a DWH, integration of data science use cases, ETL pipelines for batch and streaming data from more than 20 data sources, and a set of dashboards.
    • Designed logical and physical data models for DWH to power up self-service BI.
    • Provided engineering leadership to design, implement, and scale batch and streaming data ingestion from many internal and external data sources.
    Technologies: Data Engineering, Databases, AWS, AWS Glue, Amazon S3 (AWS S3), Amazon Athena, Python, PySpark, Redshift, Docker, Kubernetes, Apache Airflow, Prometheus, Tableau, SQL, Shell
  • Head of Data Science

    2018 - 2019
    Surface Mobility Consultants
    • Started and led a team of data scientists, data engineers, and business analysts to work on a transportation and traffic big data and data science project.
    • Successfully led the team to deliver 17 data science use cases that involved a lot of data engineering, especially in geospatial data processing.
    • Developed a custom MicroStrategy visualization component to display advanced geospatial data.
    Technologies: Apache Spark, Apache Hive, Impala, Python, SQL, Geospatial Data, Geospatial Analytics, MicroStrategy, Data Engineering, Data Science, Informatica
  • Lead Data Engineer

    2017 - 2018
    PegB Tech
    • Developed data platform architecture for enterprise data repository and supporting data science.
    • Developed a Kafka-based streaming pipeline that supported 1,000 transactions processed per second.
    • Migrated large volumes of legacy data from a MySQL database into HDFS and Cloudera to kickstart Spark-based data analytics.
    Technologies: Couchbase, Elasticsearch, Apache Kafka, HDFS, HP Vertica, SQL, Scala, Docker
  • Data Warehouse Engineer

    2017 - 2017
    QExpress
    • Designed a logical and physical data model of a data warehouse optimized for AWS Redshift.
    • Re-designed existing ETL packages for more fault-tolerant and optimized ETL jobs.
    • Developed a set of MicroStrategy dashboards and reports for management and operation teams.
    Technologies: AWS Glue, Amazon Redshift, MicroStrategy, SQL
  • Data Warehouse Engineer

    2011 - 2016
    DesigNET
    • Re-designed data export and load as part of ETL packages.
    • Developed a data warehouse model and ETL package to source data from around seven operational data sources.
    • Worked with a multi-agency team to improve the customer onboarding program, reducing onboarding time by about 30%.
    Technologies: SQL, PostgreSQL, Business Intelligence (BI), BIRT, Java
  • Freelance DWH and BI Consultant

    2010 - 2011
    Self Employed
    • Worked on business development for my freelance consulting, generating three customer engagements, one of which turned into a long-term job.
    • Developed a MicroStrategy-based dashboard for the office of the CFO of a major bank in the UAE.
    • Developed a reporting DB and set of reports for a warehouse based out of Wisconsin, USA.
    Technologies: SQL, MySQL, Pentaho Data Integration (Kettle), Oracle, PostgreSQL, Java, MicroStrategy, BIRT
  • Professional Services Consultant

    2006 - 2010
    Teradata
    • Led a team of BI developers to implement BI schema, reports, and dashboards for a leading telecom operator in the country.
    • Developed a dashboard for the office of the CEO to re-engage customers on a DWH project.
    • Trained internal resources on BI and DWH. Participated in logical and physical data modeling for the enterprise DWH.
    Technologies: Teradata, SQL, MicroStrategy, Data Warehouse Design, Business Intelligence (BI)

Experience

  • Raw Data Lake

    As part of the AWS-based data platform, I designed and implemented a raw data lake that integrated both streaming and batch data from more than 20 internal and external data sources of different types, volumes, and formats.

    We used AWS Glue, S3, Athena, Kafka and Kafka Connect, Python, PySpark, Docker, Airflow, and Kubernetes to implement this data lake.

    We chose the Parquet file format with day-level partitioning for better read performance.

    We used AWS-managed Kafka and hosted Kafka Connect on Kubernetes to keep the setup close to "managed" semantics.
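
The day-level partitioning mentioned above can be illustrated with a minimal sketch. It assumes Hive-style `dt=YYYY-MM-DD` prefixes, which Athena and Glue can treat as partitions; the bucket name, the source name, and both helper functions are hypothetical and not taken from the actual platform. The point is that a reader asking for a date range only has to touch the prefixes for those days.

```python
from datetime import date, datetime, timedelta, timezone


def partition_prefix(base: str, source: str, event_time: datetime) -> str:
    """Return the day-level partition prefix for one event (UTC day)."""
    day = event_time.astimezone(timezone.utc).date()
    return f"{base}/{source}/dt={day.isoformat()}/"


def prefixes_for_range(base: str, source: str, start: date, end: date) -> list:
    """List the prefixes a reader must scan for the inclusive range [start, end]."""
    days = (end - start).days
    return [
        f"{base}/{source}/dt={(start + timedelta(days=i)).isoformat()}/"
        for i in range(days + 1)
    ]


# Hypothetical bucket and source names, for illustration only.
BASE = "s3://example-raw-lake"
written_to = partition_prefix(
    BASE, "bookings", datetime(2021, 3, 5, 23, 30, tzinfo=timezone.utc)
)
to_scan = prefixes_for_range(BASE, "bookings", date(2021, 3, 5), date(2021, 3, 7))
```

A query for three days scans three prefixes regardless of how much history the lake holds, which is where the read-performance benefit comes from.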

  • Geospatial Data Engineering for Data Science Use Case Development

    Led a team of data engineers and data scientists to implement 17 data science use cases in traffic and transportation. Because of the nature of the data, a lot of geospatial data engineering had to be performed. Some of the key modules developed are:

    1- Mapped bus stops to bus routes by finding the minimum distance, using a KDTree to partition the point space and speed up the search

    2- Converted a continuous stream of taxi data into discrete pickup and drop-off points in time and space

    3- Mapped taxi pickup, drop-off, and bus-stop points into polygons to provide community-based analytics

    4- Processed points, line strings, and polygons for various road, stop, and community-based analyses
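
The first module, matching bus stops to routes by minimum distance with a KDTree, can be sketched as follows. This is a small pure-Python 2-D k-d tree written for illustration, not the project's code (which, per the tool list, relied on libraries such as SciPy). It assumes planar, projected coordinates in meters; the route vertices, stop names, and function names are all hypothetical.

```python
import math


def build_kdtree(points, depth=0):
    """Build a 2-D k-d tree from (x, y, label) tuples."""
    if not points:
        return None
    axis = depth % 2  # alternate between splitting on x and on y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (distance, point) for the tree point closest to target."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    dist = math.dist(point[:2], target)
    if best is None or dist < best[0]:
        best = (dist, point)
    diff = target[axis] - point[axis]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, target, best)
    # Only cross the splitting plane if a closer point could lie beyond it.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


# Hypothetical route vertices (x, y, route id) and bus stops, in meters.
route_points = [
    (0.0, 0.0, "R1"), (100.0, 0.0, "R1"), (200.0, 0.0, "R1"),
    (0.0, 300.0, "R2"), (100.0, 300.0, "R2"),
]
stops = {"S1": (95.0, 10.0), "S2": (10.0, 280.0)}

tree = build_kdtree(route_points)
assignments = {stop: nearest(tree, xy)[1][2] for stop, xy in stops.items()}
```

The tree lets each lookup skip most route vertices instead of comparing a stop against every one of them. With raw latitude and longitude, the points would first need to be projected or the distance function replaced.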

    We used PostGIS, the ArcGIS library for Hadoop, GeoPandas, SciPy's spatial module, QGIS, and the ArcGIS JavaScript library for this project.
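
The second module, turning a continuous taxi stream into discrete pickup and drop-off points, might look roughly like the sketch below. The profile does not say how the stream was encoded, so the record layout is an assumption: each ping is taken to carry a taxi ID, a timestamp, a position, and an occupied flag, and pings are assumed to arrive in time order per taxi. `Ping`, `TripEvent`, and `trip_events` are hypothetical names.

```python
from typing import Iterable, Iterator, NamedTuple


class Ping(NamedTuple):
    taxi_id: str
    ts: int          # epoch seconds
    lat: float
    lon: float
    occupied: bool   # assumed meter/occupancy flag


class TripEvent(NamedTuple):
    taxi_id: str
    kind: str        # "pickup" or "dropoff"
    ts: int
    lat: float
    lon: float


def trip_events(pings: Iterable[Ping]) -> Iterator[TripEvent]:
    """Emit one event each time a taxi's occupied flag changes."""
    last_state = {}  # taxi_id -> last seen occupied flag
    for ping in pings:
        previous = last_state.get(ping.taxi_id)
        if previous is not None and previous != ping.occupied:
            kind = "pickup" if ping.occupied else "dropoff"
            yield TripEvent(ping.taxi_id, kind, ping.ts, ping.lat, ping.lon)
        last_state[ping.taxi_id] = ping.occupied


# Hypothetical pings for a single taxi.
pings = [
    Ping("T1", 100, 25.20, 55.27, False),
    Ping("T1", 160, 25.21, 55.28, True),
    Ping("T1", 220, 25.22, 55.29, True),
    Ping("T1", 280, 25.23, 55.30, False),
]
events = list(trip_events(pings))
```

Each resulting event is a single point in time and space, which is the form needed for the later step of mapping points into community polygons.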

Skills

  • Languages

    Python, SQL, Snowflake, Scala, Java
  • Libraries/APIs

    PySpark, SciPy, ArcGIS
  • Tools

    AWS Glue, Amazon Athena, Tableau, Apache Spark, Impala, Pentaho Data Integration (Kettle), Amazon QuickSight, Apache Airflow, Shell, Apache Beam, Terraform, Google Cloud Dataproc
  • Paradigms

    ETL, Data Science, Business Intelligence (BI)
  • Platforms

    Apache Kafka, Visual Studio Code, Docker, Kubernetes, Amazon Web Services (AWS), BIRT, Oracle, Google Cloud Platform (GCP)
  • Storage

    Databases, Distributed Databases, Amazon S3 (AWS S3), Redshift, Apache Hive, PostgreSQL, DB, Teradata, Microsoft SQL Server, Redis, Memcached, MongoDB, Amazon DynamoDB, Elasticsearch, Couchbase, HDFS, MySQL, Data Pipelines, Google Cloud Storage
  • Other

    Programming, Data Structures, Algorithms, Distributed Systems, Data Engineering, AWS, Big Data, Stream Processing, MicroStrategy, GeoPandas, AWS Data Platform, Database Optimization, Operating Systems, Software Engineering, Differential Equations, QGIS, Mathematics, Numerical Methods, IT Project Management, Web Programming, Applied Mathematics, Mathematical Methods, Algebra, Linear Algebra, Calculus, Prometheus, Informatica, Parquet, Geospatial Data, Geospatial Analytics, HP Vertica, Amazon Redshift, Data Warehouse Design, Labelbox, Google Cloud Dataflow, Google Cloud Functions
  • Frameworks

    Hadoop

Education

  • Master's Degree in Software Engineering (Distributed Systems)
    2002 - 2004
    COMSATS Institute of Information Technology - Islamabad, Pakistan
  • Bachelor of Science Degree in Mathematics and Physics
    1999 - 2001
    University of the Punjab - Lahore, Pakistan
