Hassan Ashraf
Verified Expert in Engineering
Data Engineer and Developer
Dubai, United Arab Emirates
Toptal member since June 24, 2021
Hassan has 18 years of experience, in increasingly responsible roles, developing high-performance on-premises and cloud data platforms. He has expertise in the telecommunications, fintech, logistics, transportation, healthcare, eCommerce, and media analytics industries.
Preferred Environment
Visual Studio Code (VS Code), Shell
The most amazing...
...experience was creating the vision, architecture, detailed design, and implementation of high-performance data platforms in multiple roles.
Work Experience
Principal Data Engineer
Mindshare
- Developed an automated MLOps platform to deploy, run, and monitor performance-modeling machine learning models for several customers. A key requirement was supporting data in different formats from various sources.
- Designed and implemented a data platform to power an advanced analytics team in digital marketing.
- Worked on the "Metrics that Matter" project to write a configurable codebase that runs for different customers with different data volumes and formats. Onboarding a customer involved writing configuration instead of code.
Data Engineer
JLL
- Developed data pipelines that take AI-generated labels of images from Labelbox and export them into the Google Cloud Platform.
- Evaluated several options, including Google Cloud Dataflow and Cloud Functions, for the end-to-end production pipeline.
- Reduced data pipeline time from more than 10 minutes to a couple of minutes by integrating Labelbox and Google Cloud Storage.
Lead Data Engineer
Vezeeta.com
- Provided leadership from concept to production to design, implement, and evolve a raw data lake, data catalogs, a DWH, data science use case integration, ETL pipelines for batch and streaming data from more than 20 data sources, and a set of dashboards.
- Designed logical and physical data models for the DWH to power self-service BI.
- Provided engineering leadership to design, implement, and scale batch and streaming data ingestion from many internal and external data sources.
Head of Data Science
Surface Mobility Consultants
- Started and led a team of data scientists, data engineers, and business analysts to work on a transportation and traffic big data and data science project.
- Led the team to deliver 17 data science use cases that involved substantial data engineering, especially in geospatial data processing.
- Developed a custom MicroStrategy visualization component to display advanced geospatial data.
Lead Data Engineer
PegB Tech
- Developed the data platform architecture for an enterprise data repository that supports data science.
- Developed a Kafka-based streaming pipeline that supported processing 1,000 transactions per second.
- Migrated huge volumes of legacy data from a MySQL database into HDFS and Cloudera to kickstart Spark-based data analytics.
Data Warehouse Engineer
QExpress
- Designed the logical and physical data model of a data warehouse optimized for Amazon Redshift.
- Redesigned existing ETL packages to make the ETL jobs more fault-tolerant and efficient.
- Developed a set of MicroStrategy dashboards and reports for management and operation teams.
Data Warehouse Engineer
DesigNET
- Redesigned the data export and load steps of the ETL packages.
- Developed a data warehouse model and ETL package to source data from around seven operational data sources.
- Worked with a multi-agency team to improve the customer onboarding program, reducing onboarding time by about 30%.
Freelance DWH and BI Consultant
Self Employed
- Worked on business development for my freelance consulting, generating three customer engagements, one of which turned into a long-term job.
- Developed a MicroStrategy-based dashboard for the office of the CFO of a major bank in the UAE.
- Developed a reporting database and a set of reports for a warehouse based in Wisconsin, USA.
Professional Services Consultant
Teradata
- Led a team of BI developers to implement BI schema, reports, and dashboards for a leading telecom operator in the country.
- Developed a dashboard for the office of the CEO to re-engage customers on a DWH project.
- Trained internal resources on BI and DWH. Participated in logical and physical data modeling for the enterprise DWH.
Experience
Raw Data Lake
We used AWS Glue, S3, Athena, Kafka and Kafka Connect, Python, PySpark, Docker, Airflow, and Kubernetes to implement this data lake.
We chose the Parquet file format with day-level partitioning for better read performance.
We used AWS-managed Kafka and hosted Kafka Connect on Kubernetes to give "managed" semantics.
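As a rough illustration of why day-level partitioning helps reads, the sketch below builds Hive-style partition prefixes of the form dt=YYYY-MM-DD and lists only the prefixes a date-bounded query would need to scan. It is not the project's code: the bucket name, the table name, the dt key, and the choice to bucket by UTC day are all assumptions made for the example.

```python
from __future__ import annotations

from datetime import date, datetime, timedelta, timezone


def day_partition(event_time: datetime) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) an event belongs to."""
    if event_time.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them instead of guessing.
        raise ValueError("event_time must be timezone-aware")
    return event_time.astimezone(timezone.utc).date().isoformat()


def partition_prefix(base: str, table: str, event_time: datetime) -> str:
    """Build a Hive-style day-level prefix under which a Parquet file would be written."""
    return f"{base.rstrip('/')}/{table}/dt={day_partition(event_time)}/"


def prefixes_for_range(base: str, table: str, first_day: date, last_day: date) -> list[str]:
    """List the day partitions covering [first_day, last_day], inclusive.

    A query filtered on the partition key only has to read these prefixes,
    which is where the read-performance benefit comes from.
    """
    n_days = (last_day - first_day).days
    return [
        f"{base.rstrip('/')}/{table}/dt={(first_day + timedelta(days=i)).isoformat()}/"
        for i in range(n_days + 1)
    ]


# Hypothetical names, used only for illustration.
BASE = "s3://example-bucket/raw/"
TABLE = "orders"

# 23:30 in a UTC+4 zone is still the same calendar day in UTC.
example_event = datetime(2021, 6, 24, 23, 30, tzinfo=timezone(timedelta(hours=4)))
example_prefix = partition_prefix(BASE, TABLE, example_event)
example_scan = prefixes_for_range(BASE, TABLE, date(2021, 6, 24), date(2021, 6, 26))
```

With this layout, a three-day query touches three prefixes regardless of how much history the table holds.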
Geospatial Data Engineering for Data Science Use Case Development
1- Mapped bus stops onto bus routes by finding the minimum distance, using a KD-tree to partition the point space and speed up the search.
2- Converted the continuous stream of taxi data into discrete pickup and drop-off points in time and space.
3- Mapped taxi pickup, drop-off, and bus-stop points into polygons to provide community-based analytics.
4- Processed points, line strings, and polygons for various road-, stop-, and community-based analyses.
We used PostGIS, the ArcGIS library for Hadoop, GeoPandas, SciPy spatial, QGIS, and the ArcGIS JavaScript library for this project.
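The first two steps in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the project used SciPy's spatial module, while this sketch swaps in a small standard-library 2-D KD-tree so it runs without dependencies. It also assumes planar (projected) coordinates, routes represented by their vertices, and taxi pings that carry an occupancy flag. All names and sample data are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Point
    label: str
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items, depth=0):
    """Build a 2-D KD-tree from (label, (x, y)) pairs, splitting on x and y in turn."""
    if not items:
        return None
    axis = depth % 2
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    label, point = items[mid]
    return Node(point, label, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node, target, best=None):
    """Return (squared distance, label) of the tree point closest to target."""
    if node is None:
        return best
    d = sq_dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.label)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def map_stops_to_routes(stops, route_points):
    """Step 1: assign each stop to the route owning its nearest route vertex."""
    tree = build(list(route_points))
    result = {}
    for name, xy in stops.items():
        d2, route = nearest(tree, xy)
        result[name] = (route, math.sqrt(d2))
    return result


def trip_events(pings):
    """Step 2: emit a pickup or drop-off whenever a taxi's occupancy flag flips.

    pings: (taxi_id, timestamp, x, y, occupied) tuples, time-ordered per taxi.
    """
    last_state = {}
    events = []
    for taxi_id, ts, x, y, occupied in pings:
        previous = last_state.get(taxi_id)
        if previous is not None and previous != occupied:
            events.append(("pickup" if occupied else "dropoff", taxi_id, ts, (x, y)))
        last_state[taxi_id] = occupied
    return events


# Hypothetical sample data in projected coordinates (meters).
route_points = [("R1", (0, 0)), ("R1", (100, 0)), ("R1", (200, 0)),
                ("R2", (0, 300)), ("R2", (100, 300)), ("R2", (200, 300))]
stops = {"S1": (90, 20), "S2": (10, 280), "S3": (190, 140)}
pings = [("T1", 0, 0, 0, False), ("T1", 10, 5, 5, True),
         ("T1", 20, 90, 20, True), ("T1", 30, 190, 140, False)]

stop_routes = map_stops_to_routes(stops, route_points)
events = trip_events(pings)
```

The tree replaces a scan over every route vertex per stop with a search that can skip whole regions of space. With latitude and longitude in degrees, the points would first need projecting (or a different distance measure) for "nearest" to be meaningful.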
Education
Master's Degree in Software Engineering (Distributed Systems)
COMSATS Institute of Information Technology - Islamabad, Pakistan
Bachelor of Science Degree in Mathematics and Physics
University of the Punjab - Lahore, Pakistan
Skills
Libraries/APIs
PySpark, SciPy, ArcGIS, Pandas
Tools
AWS Glue, Amazon Athena, Tableau, Adobe Spark, Impala, Pentaho Data Integration (Kettle), Amazon QuickSight, Apache Airflow, Shell, Apache Beam, Terraform, Google Cloud Dataproc, DataRobot
Languages
Python, SQL, Snowflake, Scala, Java
Paradigms
ETL, Business Intelligence (BI)
Platforms
Apache Kafka, Visual Studio Code (VS Code), Docker, Kubernetes, Amazon Web Services (AWS), BIRT, Oracle, Google Cloud Platform (GCP), Labelbox, Azure
Storage
Databases, Distributed Databases, Amazon S3 (AWS S3), Redshift, Apache Hive, PostgreSQL, DB, Teradata, Microsoft SQL Server, Redis, Memcached, MongoDB, Amazon DynamoDB, Elasticsearch, Couchbase, HDFS, Vertica, MySQL, Data Pipelines, Google Cloud Storage
Frameworks
Hadoop
Other
Programming, Data Structures, Algorithms, Distributed Systems, Data Engineering, Big Data Architecture, Stream Processing, MicroStrategy Development, GeoPandas, Database Optimization, Data Management Platforms, Operating Systems, Software Engineering, Differential Equations, QGIS, Mathematics, Numerical Methods, IT Project Management, Web Programming, Applied Mathematics, Algebra, Linear Algebra, Calculus, Prometheus, Informatica, Parquet, Geospatial Data, Geospatial Analytics, Data Science, Amazon Redshift, Data Warehouse Design, Google Cloud Dataflow, Google Cloud Functions, Machine Learning Operations (MLOps)