Hanee' Medhat Shousha

Verified Expert in Engineering

Big Data Architect and Developer

Location

Cairo, Cairo Governorate, Egypt

Toptal Member Since

June 18, 2020

Hanee' is a data expert who enjoys working on data analytics and segmentation to better target customers with campaigns. He is an experienced Java developer who has built enterprise applications that interact with millions of customers daily. Hanee' also has experience working with big data, Spark, and Python.

Portfolio

Multinational Healthcare Company

Python, SQL, Data Engineering, Azure Functions, Azure Databricks, Databricks...

US-based Market Research Company

Python, Spark, PySpark, Apache Airflow, Google Cloud Platform (GCP), BigQuery...

Top Beverages Company

Big Data, Databricks, Spark, PySpark, Python, Azure Data Lake...

Experience

SQL - 10 years
Python - 7 years
Apache Spark - 7 years
Big Data - 7 years
Apache Airflow - 3 years
Tableau - 2 years
Django REST Framework - 2 years
Machine Learning - 2 years

Availability

Part-time

Preferred Environment

Big Data, Git, Linux, OS X

The most amazing...

...project that I've implemented is a platform that optimized campaign scripts by analyzing user responses to identify the best way to interact with users.

Work Experience

Senior Data Engineer

2022 - 2023

Multinational Healthcare Company

Designed and built data pipelines to process EHR data.
Designed and built reporting data models for data collected from multiple systems.
Built APIs to integrate front applications with a unified data platform.
Built ETL jobs using PySpark on Azure Databricks.

Technologies: Python, SQL, Data Engineering, Azure Functions, Azure Databricks, Databricks, Message Bus, APIs, Azure API Management, Azure Logic Apps, Electronic Health Records (EHR), PostgreSQL, Delta Lake, GitLab CI/CD

Senior Data Engineer

2021 - 2022

US-based Market Research Company

Developed data engineering solutions in the Google Cloud Platform (GCP) environment.
Built and orchestrated complex data pipelines using Apache Airflow.
Developed streaming and batch-processing data pipelines.
Built complex data pipelines using different technologies and integrations.
Designed and modeled data for building a unified data warehouse.
Built and automated deployments using CI/CD pipelines.

Technologies: Python, Spark, PySpark, Apache Airflow, Google Cloud Platform (GCP), BigQuery, Google BigQuery, Big Data, Cloud Dataflow, Apache Beam, Flask, APIs, REST, Travis CI, GitHub Actions, Data Modeling, Data Architecture, Architecture, Pub/Sub, Presto, Streaming Data, Terraform, CI/CD Pipelines

Senior Data Engineer

2020 - 2021

Top Beverages Company

Built data ingestion pipelines to get data from various sources.
Transformed and cleaned data using PySpark over Databricks.
Designed a unified data model to combine all data from different sources and formats.
Built and stored all data in a centralized data lake.
Created an automated data pipeline to run all ETL logic using Data Factory.
Automated Databricks job deployments along with Data Factory pipelines using Azure DevOps.
Worked with and processed data in the Snowflake data warehouse.

Technologies: Big Data, Databricks, Spark, PySpark, Python, Azure Data Lake, Azure Data Factory, Azure Synapse, Data Engineering, Data Warehouse Design, Modeling, ETL, Delta Lake, Azure DevOps, Tableau, Snowflake

Big Data Architect

2019 - 2020

Vodafone Group

Designed data pipelines for different data source types using GCP cloud technologies.
Developed and implemented ETL jobs using Apache Spark.
Developed and implemented analytical jobs using Spark.
Developed and built geospatial analysis models with Spark to do parallel geoprocessing.
Implemented and developed data pipelines to ingest data from on-premise clusters into a cloud data lake.
Developed dashboards for businesses using Tableau.
Developed use cases on on-premise clusters.
Migrated data and jobs from on-premise to cloud clusters.
Designed and applied modeling for data stores to be used for reporting.

Technologies: Data Engineering, Apache Airflow, Data Warehouse Design, Data Warehousing, SQL, Apache Spark, Big Data Architecture, Big Data, Apache Beam, NiFi, Scala, Tableau, BigQuery, GeoPandas, Python, Spark, Hadoop, Google Cloud Platform (GCP), GIS, Data Architecture, GeoSpark, Apache NiFi, GitLab CI/CD, Cloud Dataflow, PostgreSQL, Apache Kafka, Data Modeling, ETL, Unix, Pandas, NumPy, Data Pipelines, Machine Learning, Jenkins, Data Science, Redis, Linux, Jupyter Notebook, Google Cloud Dataproc, Google BigQuery, Continuous Integration (CI)

Senior Python Developer

2018 - 2019

Rio Tinto (via Toptal)

Built a data processing platform to process seismic events.
Created a RESTful API to store and retrieve seismic data and files.
Used Kafka as a message bus between all modules.
Implemented Redis as a cache to store data frequently accessed by the pipeline.
Built an admin UI with Django to administer configurations and saved objects.
Integrated the API with different processing pipeline stages to trigger synchronous and asynchronous processing of data.
Migrated and converted a Flask API to a Django RESTful API.
Worked with Docker-containerized environments for different pipeline modules.
Worked with automated deployment pipelines on Kubernetes.
Developed and ran components on the Microsoft Azure cloud platform.

Technologies: Data Engineering, Apache Airflow, SQL, Big Data Architecture, Big Data, Azure, Kubernetes, Docker, MongoDB, Redis, Apache Kafka, Flask, Django REST Framework, Python, PostgreSQL, Data Architecture, GitLab CI/CD, Data Modeling, ETL, Unix, Data Pipelines, Linux, Continuous Integration (CI), Prometheus

Senior Big Data Engineer

2017 - 2019

Orange Business Services

Developed new business use cases with big data technologies.
Created analytical and ETL jobs using Spark.
Built data pipelines to ingest data into different data lakes, such as Azure Data Lake.
Developed new PoCs for customers to build big data platforms over cloud environments.
Constructed a real-time monitoring platform to monitor all customer servers hosted in the cloud.
Implemented a new centralized Elasticsearch deployment to collect metrics from all customer servers.
Designed and built multiple dashboards for systems monitoring use cases using Tableau and Power BI.
Developed multiple scripts to automate most day-to-day tasks.
Handled and optimized the performance of the big data platforms.
Managed the Hadoop clusters with all included services.
Developed scripts and modules that automate day-to-day tasks.
Led a squad for automation and self-monitoring activities.
Upgraded on-premise Hadoop cluster version.
Managed and added new nodes and disks to on-premise Hadoop.
Installed and built the security of Hadoop clusters using Kerberos, Knox, and Ranger.
Worked on different cloud platforms like Azure and AWS.

Technologies: Amazon Web Services (AWS), Data Engineering, SQL, Apache Spark, Big Data, Azure Data Lake, Amazon S3 (AWS S3), Azure, Microsoft Power BI, Tableau, MongoDB, Cassandra, Elasticsearch, Apache Hive, Apache Kafka, NiFi, Spark, Hadoop, PostgreSQL, MySQL, Google Cloud Platform (GCP), Automation, Data Architecture, Apache NiFi, Python, Data Modeling, ETL, Unix, Pandas, NumPy, Data Pipelines, Linux, Hortonworks Data Platform (HDP), Google Cloud Dataproc, Google BigQuery, HBase

DWH and Campaigns Senior Developer

2014 - 2017

Etisalat

Developed analysis and segmentation models to build customer profiles.
Created offering and campaign applications to create targeted and non-targeted campaigns that reach millions of customers daily.
Built real-time engines that serve and fulfill millions of customer requests per hour.
Designed and developed large, complex platforms that interact with many different systems.
Developed a real-time, location-based advertising platform to send users offers based on their current location.
Developed multiple data monetization solutions to be used by third-party advertisers.
Developed and integrated the campaign applications with many channels to enable the business to reach users through their preferred channels.
Built many web applications to enable business users to easily interact with the campaign platform.
Designed the architecture of DWH models for reporting and segmentation.
Developed ETL and integration jobs from different sources into the DWH.

Technologies: Data Warehouse Design, Data Warehousing, SQL, SQL Server Integration Services (SSIS), PrimeFaces, Microsoft SQL Server, Oracle, Teradata, Spark, Spring, JSF, Java, Python, Apache Spark, MySQL, Data Architecture, Data Modeling, ETL, Aprimo, Hortonworks Data Platform (HDP), HBase

MIS Specialist

2013 - 2014

ADIB

Designed and implemented new database models for reporting purposes.
Developed extraction jobs and stored procedures.
Implemented Business Objects universes and developed Business Objects reports.
Developed custom Crystal Reports.
Performed data transformation.

Technologies: Data Warehouse Design, Data Warehousing, SQL, Sybase, Crystal Reports, SAP BusinessObjects (BO), Data Architecture, Data Modeling

DWH Support Analyst

2012 - 2013

Etisalat

Deployed and fixed issues for production ETL jobs, data mining, and analytic models.
Developed new shell scripts for automatic monitoring and alarms for production issues.

Technologies: Data Warehouse Design, Data Warehousing, SQL, Teradata Warehouse Miner, Unix Shell Scripting, Datastage, Oracle, Teradata, Aprimo

Software Developer

2011 - 2012

ITS

Developed new modules in core banking applications.
Handled a full migration of the trade finance applications from Sybase to SQL Server.
Implemented a full-service interface for a trade finance application.
Developed custom reports using Crystal Reports.

Technologies: SQL, Java, Sybase, Oracle

Experience

Certified CCA Spark and Hadoop Developer (CCA175)

I gained a certification from Cloudera.
License No: 100-019-596.

Big Data Development | Mastery Award for Professionals 2016

https://www.youracclaim.com/badges/da6c7070-8fde-4799-b04e-f9d8719a49a3/linked_in_profile

I won an award from IBM.

Big Data Specialist with IBM BigInsights V2.1 Certificate

I earned the IBM Big Data Specialist certificate in March 2016 (license # 0717-1458-8215-5644).

Introduction to Big Data Certificate

https://www.coursera.org/account/accomplishments/certificate/CPH7HZ6TDEZN

I completed this Coursera course from the University of California, San Diego.

Hadoop Platform and Application Framework Certificate

https://www.coursera.org/account/accomplishments/certificate/Y6QNGTJMQFVV

I completed this Coursera course from the University of California, San Diego.

Introduction to Data Science in Python

I completed this course and earned a certificate from the University of Michigan via Coursera.

Publication

Apache Spark Streaming Tutorial: Identifying Trending Twitter Hashtags

https://www.toptal.com/apache/apache-spark-streaming-twitter

Skillset

Languages

SQL, Java, Python, Scala, C++, Snowflake

Frameworks

Apache Spark, Django REST Framework, Spark, Hadoop, Django, Flask, JSF, PrimeFaces, Spring, Presto

Libraries/APIs

Pandas, NumPy, PySpark, D3.js, Chart.js, Azure API Management

Tools

Azure HDInsight, Git, Apache Beam, Cloud Dataflow, Tableau, Cloudera, Google Cloud Dataproc, GIS, GitHub, Apache Airflow, Amazon Elastic MapReduce (EMR), Apache Impala, Apache Sqoop, Apache Avro, Apache NiFi, GitLab CI/CD, Jenkins, Microsoft Power BI, Teradata Warehouse Miner, BigQuery, Qlik Sense, Grafana, IBM InfoSphere (DataStage), Crystal Reports, Kibana, Travis CI, Azure Logic Apps, Terraform

Paradigms

Business Intelligence (BI), ETL, REST, Continuous Integration (CI), Automation, Data Science, Azure DevOps

Platforms

Databricks, Jupyter Notebook, Linux, Apache Kafka, Hortonworks Data Platform (HDP), Unix, Google Cloud Platform (GCP), Azure, Oracle, Docker, Kubernetes, Amazon Web Services (AWS), OS X, Azure Synapse, Azure Functions

Storage

MySQL, Teradata, Apache Hive, PostgreSQL, Microsoft SQL Server, Data Pipelines, Amazon S3 (AWS S3), MongoDB, HBase, Sybase, Elasticsearch, PostGIS, Redis, SQL Server Integration Services (SSIS), Cassandra, Datastage

Other

Azure Data Lake, Azure Data Factory, Data Warehouse Design, Big Data, Data Warehousing, Aprimo, APIs, Data Engineering, Data Architecture, Big Data Architecture, Data Analysis, Data Modeling, Modeling, NiFi, SAP BusinessObjects (BO), Parquet, Machine Learning, Google BigQuery, GeoPandas, GeoSpark, Scraping, Unix Shell Scripting, Prometheus, Apache Flume, Statistics, Delta Lake, GitHub Actions, Architecture, Pub/Sub, Streaming Data, Azure Databricks, Message Bus, Electronic Health Records (EHR), CI/CD Pipelines

Education

2010 - 2011

Diploma in Business Intelligence and Software Development

Information Technology Institute - Cairo, Egypt

2005 - 2010

Bachelor of Engineering Degree in Computer Engineering

Benha University - Banha, Egypt

Certifications

DECEMBER 2017 - DECEMBER 2019

CCA Spark and Hadoop Developer CCA175

Cloudera
