Mahmoud Mehdi
Verified Expert in Engineering
Apache Spark Developer
Mahmoud is a senior data engineer with a strong interest in building large-scale data processing systems. His passion for processing massive amounts of data helped him build his data skills rapidly. Mahmoud is a certified Apache Spark developer and has used the framework to help many clients process big data in various fields (music industry, retail, insurance, and fraud detection). He also contributes to Delta Lake, an open-source project developed by Databricks on top of Apache Spark.
Preferred Environment
Amazon Web Services (AWS), Java, Scala, Apache Spark, Data Engineering, Data Pipelines
The most amazing...
...things I've implemented were data pipelines for a European retail leader that allow it to retrieve product stock levels (terabytes of data) in real time.
Work Experience
Senior Data Engineer | Technical Lead
Intermarché
- Developed an Azure function that retrieves tickets from an Azure event hub and calculates the client's discounts in real time.
- Developed an Apache Spark job that calculates several client KPIs, such as the number of coupons used per client, the purchase frequency of each client, and the number of clients who bought a discounted product per campaign.
- Served as the team's technical lead, defined each project's technical architecture, and carried out FinOps studies to estimate the cloud budget.
- Developed an API using Azure Functions, Azure API Management, and Delta Lake. All the stores use the API to determine substitutes for each unavailable product while an order is being prepared.
- Developed an API that returns a client's tickets by querying an Azure Cosmos DB table enriched in real time whenever we receive a new ticket in the event hub.
AWS Solutions Architect | Senior Data Engineer
SeLoger
- Audited the Apache Spark jobs and recommended optimizations and changes to improve the workflows.
- Proposed and initiated the migration from Parquet to Delta Lake.
- Implemented an SCD2 (slowly changing dimension type 2) framework on pandas DataFrames for the company's data scientists.
- Implemented the Amazon Macie solution in order to detect PII data and automated its deployment with AWS CloudFormation.
- Implemented some data transformations with AWS Glue DataBrew to automatically handle the group's sensitive data (PII) by applying advanced transformations, such as replacement and encryption.
Senior Data Engineer
Believe
- Developed a big data system for the music industry that lets users calculate the royalties to pay producers depending on the source (Deezer, Spotify, iTunes, etc.) and the contract made with the company.
- Shared my Delta Lake knowledge as a contributor to the open-source project to help my client run ACID transactions on the stored Parquet files.
- Tuned the Spark jobs that handle a large amount of data.
- Orchestrated Apache Spark jobs using Apache Airflow.
- Wrote APIs to expose data using AWS Lambda and API Gateway.
Senior Data Engineer
Tekmetric
- Managed data migrations from different systems to the RDS database.
- Wrote Apache Spark jobs (using Scala) that ensured ETL processing on repair shops' data.
- Developed a "labor_guide" ETL that estimates how much time it will take to replace a specific part for all vehicles (https://www.tekmetric.com/blog-post/3-0-tekmetric-labor-guide).
- Tuned the Spark jobs and ensured they were efficiently running on EMR.
- Made query-intensive data available by migrating it from RDS to Elasticsearch with AWS DMS (Database Migration Service).
Senior Data Engineer
AXA
- Developed a big data system that detects fraud in the insurer's data.
- Designed the data platform on AWS that will handle AXA's data coming from different sources.
- Used Spark GraphFrames to model the data so that we could detect relationships between different claims.
- Made the data available for different teams using AWS Glue and Athena.
- Wrote Terraform scripts that allowed us to deploy the solution as code on AWS.
Data Engineer
Carrefour
- Developed a daily sales Spark application that calculates the daily sales generated by different stores and exposes this data through web services.
- Developed a daily sales comparator that compares sales amounts between the legacy system and the new big data system, which allowed us to detect anomalies in the data.
- Developed an assortment of jobs that process the product data and indicate the daily price of each product per store and region.
- Developed a framework that optimizes the writes to the Cassandra database.
- Developed real-time applications using Spark Streaming to calculate the generated sales revenue in real time.
- Indexed the product data in Elasticsearch so that it could be queried.
Data Engineer | Data Scientist
Zenika
- Created a big data application that predicts the results of football games using machine learning algorithms.
- Developed a web-scraping solution using Node.js in order to collect football data from different websites.
- Developed different Apache Spark jobs with Scala in order to process data, apply features, and launch several ML algorithms to train models and predict games' scores.
- Developed web services using the Play framework in order to interact with the machine learning models (for example, retrain the models and predict a game).
- Coded an AngularJS application in order to interact with the web services I created. We used this application to predict the UEFA Euro 2016 and Copa America games.
Experience
Delta Lake Contributor
https://github.com/delta-io/delta/
Implementing a Custom Data Source with Apache Spark for Carrefour's Daily Sales
In order to avoid such issues, I took the initiative to develop a custom Apache Spark data source that reads the data and transforms it.
My team members used that library and had their data ready for analysis with a single line of code.
Implementing an Optimized Spark GraphFrames Solution to Detect Frauds at AXA
Once the project was completed, we were able to detect fraud more rapidly than with the old solution.
Zenprono: A Spark ML Application That Predicts Football Scores
https://blog.zenika.com/2016/06/10/zenprono-resultats-des-matchs-euro-2016/
We had a good prediction rate: 77% of the predictions were correct.
Skills
Languages
Scala, SQL, Java, Python, Python 3
Frameworks
Spark, Apache Spark, Data Lakehouse, Hadoop, Scalatra, Swagger, Play, Play Framework
Libraries/APIs
PySpark, REST APIs, Spark ML, Amazon API, Azure API Management, Azure Blob Storage API, Pandas
Tools
ScalaTest, Spark SQL, Git, Terraform, Ansible, AWS Glue, Amazon Elastic MapReduce (EMR), BigQuery, Google Cloud Dataproc, Amazon Simple Queue Service (SQS), GIS, AWS CloudFormation, Apache Airflow
Paradigms
ETL, ETL Implementation & Design
Platforms
Databricks, Spark Core, AWS Lambda, Azure, Apache Kafka, Dataiku, Amazon, Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure Functions, Azure Event Hubs
Storage
Database Modeling, Database Architecture, Databases, Data Pipelines, Azure Cosmos DB, Data Lakes, DB, Data Integration, Amazon S3 (AWS S3), Apache Hive, MySQL, Elasticsearch, Azure SQL, Cassandra
Other
Data Engineering, ETL Testing, ETL Tools, Data, Data Analysis, Data Analytics, Data Modeling, Data Architecture, Big Data Architecture, Scraping, Data Scraping, Web Scraping, Datasets, Data Cleansing, Serverless, API Integration, Big Data, ETL Development, Architecture, Google BigQuery, APIs, Data Warehousing, Data Warehouse Design, Cloud Patterns, AWS Glue DataBrew, Amazon Macie, Delta Lake, Azure Databricks
Education
Bachelor of Engineering Degree in Computer Science
National Institute of Applied Science and Technology - Tunis, Tunisia
Certifications
Databricks Associate Developer (Apache Spark 2.4) with Scala
Databricks
Hadoop Programming
IBM
Scala Programming for Data Science
IBM