Faisal Falah
Verified Expert in Engineering
AWS Data Engineer & Developer
Dubai, United Arab Emirates
Toptal member since May 19, 2021
Faisal is a data engineer with nine years of experience implementing batch and stream data pipelines on both AWS and GCP, using the respective cloud services and open-source tools. He is a programmer with a strong command of Go and Python and hands-on experience across a range of data projects, from ETL development for a top global bank's data migration program to real-time clickstream processing on AWS and scaling a recommendation engine in PySpark (EMR) for a leading US real estate provider.
Portfolio
Experience
- SQL - 9 years
- Unix - 8 years
- Amazon Web Services (AWS) - 6 years
- Python - 5 years
- Spark - 4 years
- Unix Shell Scripting - 4 years
- Go - 3 years
- Google Cloud Platform (GCP) - 3 years
Preferred Environment
Unix, Amazon Web Services (AWS), PyCharm, Sublime Text 3, SQL, PySpark, Google Cloud Platform (GCP), RabbitMQ
The most amazing...
...thing I've created is a unified data platform in AWS for one of the leading media clients in India. Its architecture was featured on the official AWS blog.
Work Experience
Cloud Data Engineer
Leading Supply Chain Provider (Data CoE)
- Implemented a data lakehouse architecture. Batch data was collected from on-premises systems via DataSync, stream data via RabbitMQ into Kinesis, CRM data via REST APIs, and relational data from RDS via DMS. The lakehouse itself was built on S3, Glue, and Redshift.
- Designed the system entirely on serverless AWS services, relying heavily on Step Functions, Lambda, and Docker. Many transformations ran as Athena SQL INSERT INTO statements (see the sketch after this list), which was essential for keeping costs to a minimum.
- Used open-source Python libraries to automate Jupyter notebooks created by data scientists, wired into a seamless CI/CD pipeline.
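A minimal sketch of one such Athena-based transformation, run as a Lambda task inside a Step Functions workflow. The database, table, and bucket names are hypothetical stand-ins, not the client's actual names:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical names -- the real pipeline used client-specific databases and buckets.
DATABASE = "lakehouse_curated"
RESULT_LOCATION = "s3://example-athena-results/"

TRANSFORM_SQL = """
INSERT INTO curated.daily_shipments
SELECT order_id, warehouse, CAST(event_ts AS date) AS event_date, qty
FROM raw.shipment_events
WHERE CAST(event_ts AS date) = current_date - interval '1' day
"""

def handler(event, context):
    """Lambda task: kick off one Athena INSERT INTO transformation."""
    resp = athena.start_query_execution(
        QueryString=TRANSFORM_SQL,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": RESULT_LOCATION},
    )
    # A Step Functions state can poll get_query_execution with this ID
    # before moving on to the next transformation.
    return {"query_execution_id": resp["QueryExecutionId"]}
```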
Senior Data Engineer
Nissan Digital
- Served as a data engineer for a project at Nissan's car plants in Japan, collecting and transforming data generated during final car quality tests conducted by a third-party service provider. Took over the project after absorbing its existing architecture.
- Picked up a new technology, Apache NiFi, and implemented new modules quickly. The transformed data was initially stored in HBase; I reworked that part of the architecture to make it faster.
- Implemented a data pipeline on top of Snowflake using Python and SQL to automate data preparation for several models, with Pandas for the final data checks (see the sketch after this list).
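As a rough illustration of that Snowflake pipeline, the pattern was SQL for the heavy lifting inside Snowflake and Pandas only for the final checks. Connection details and table names here are hypothetical:

```python
import snowflake.connector  # pip install "snowflake-connector-python[pandas]"

# Hypothetical connection details and table names for illustration.
conn = snowflake.connector.connect(
    account="xy12345", user="etl_user", password="...",
    warehouse="ETL_WH", database="QUALITY", schema="STAGING",
)

# Data preparation runs as SQL inside Snowflake.
PREPARE_SQL = """
CREATE OR REPLACE TABLE model_input AS
SELECT vehicle_id, test_station, AVG(measurement) AS avg_measurement
FROM final_quality_tests
GROUP BY vehicle_id, test_station
"""

cur = conn.cursor()
cur.execute(PREPARE_SQL)

# Final sanity checks happen in Pandas before handing the table to the models.
df = cur.execute("SELECT * FROM model_input").fetch_pandas_all()
assert df["VEHICLE_ID"].notna().all(), "null vehicle IDs in model input"
print(f"prepared {len(df)} rows")
conn.close()
```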
Technical Specialist
Brillio
- Designed a data pipeline for a recommendation engine using Spark on AWS for a leading US real estate provider. Recommendations for each property were generated by a complex mathematical model involving nested loops, which we distributed across the cluster (see the sketch after this list).
- Gained hands-on experience operating a large EMR cluster, 80 r4.16xlarge nodes, against a data volume of roughly 15 TB. Results were populated to Elasticsearch, DynamoDB, and S3.
- Created a PoC for collecting data from IoT devices through Logstash, with the data landing in S3.
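A condensed PySpark sketch of the scaling idea: instead of scoring sequentially in nested loops, every (user, property) pair is scored in parallel across the cluster. The paths and the scoring formula are illustrative placeholders, not the client's model:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("property-recs").getOrCreate()

# Hypothetical inputs; the real job read ~15 TB from S3 on an 80-node EMR cluster.
props = spark.read.parquet("s3://example-bucket/properties/")
users = spark.read.parquet("s3://example-bucket/user-preferences/")

# The sequential nested loops become one distributed pairwise join:
scored = users.crossJoin(props).withColumn(
    "score",
    1.0 / (1.0 + F.abs(F.col("pref_price") - F.col("price"))),  # toy scoring formula
)

# Keep the top 10 properties per user and write the results back to S3.
w = Window.partitionBy("user_id").orderBy(F.desc("score"))
top10 = scored.withColumn("rank", F.row_number().over(w)).filter("rank <= 10")
top10.write.mode("overwrite").parquet("s3://example-bucket/recommendations/")
```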
Technical Lead
HiFX IT and Media Services
- Served as a key member of the data engineering team that created a unified data platform for a leading media house in India.
- Contributed to major parts of the project, including clickstream analytics for user churn and conversion prediction and a data lake and warehouse on AWS built with S3, Spark, and Redshift.
- Created a chatbot that acts as a virtual real estate broker using Amazon Lex and deployed it to production; also built a PoC of a news chatbot using Google Dialogflow.
- Evaluated flower carpet competition entries using AI: Google Cloud's Vision API extracted features from the submitted images, and a multi-class (one-vs-rest) classification model assigned the final category.
- Migrated a critical reporting data warehouse from Redshift to BigQuery for 10-20x better query performance at lower cost, using Redshift UNLOAD to S3 and the GCS transfer service to move the files from S3 to GCS (see the sketch after this list).
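A minimal sketch of the two migration steps on either side of the S3-to-GCS transfer job, with hypothetical cluster, bucket, and table names:

```python
import psycopg2
from google.cloud import bigquery

# Step 1: UNLOAD the Redshift table to S3 as compressed CSV.
# Hostname, credentials, bucket, and IAM role below are placeholders.
rs = psycopg2.connect(host="example.redshift.amazonaws.com", port=5439,
                      dbname="analytics", user="etl", password="...")
with rs, rs.cursor() as cur:
    cur.execute("""
        UNLOAD ('SELECT * FROM reporting.daily_metrics')
        TO 's3://example-migration/daily_metrics/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'
        GZIP DELIMITER ',' ALLOWOVERWRITE;
    """)

# Step 2: once the transfer job has copied the files from S3 to GCS,
# load them into BigQuery.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    "gs://example-migration/daily_metrics/*",
    "reporting.daily_metrics",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV, autodetect=True
    ),
)
job.result()  # block until the load job finishes
```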
Lead Engineer
View26 GmbH
- Contributed to the development of View26, a SaaS analytics solution on AWS built by a Germany-based startup to collect and integrate data from software testing tools such as HP ALM and Jira. I was in charge of data collection and storage.
- Created a highly concurrent data collection module in Go (illustrated after this list), storing data in MongoDB and later moving to PostgreSQL for performance and maintainability, which demanded a solid understanding of both SQL and NoSQL.
- Collaborated with the front-end team, which worked mainly with D3 and AngularJS; on the back end, we built REST APIs in Go using Mux and other web server frameworks.
- Contributed to product ideation and other business aspects, as is typical at a startup.
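The production collection module was written in Go; the following Python sketch only illustrates the same fan-out/fan-in pattern, against hypothetical Jira-style endpoints:

```python
import concurrent.futures as cf
import requests

# Hypothetical endpoints and projects -- the real module was a Go worker pool.
BASE_URL = "https://jira.example.com/rest/api/2/search"
PROJECTS = ["QA", "WEB", "API"]

def fetch(project: str) -> list:
    """Pull issue/test records for one project from the REST API."""
    resp = requests.get(BASE_URL, params={"jql": f"project={project}"}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("issues", [])

# Fan out one worker per project, then fan the results back in for storage.
with cf.ThreadPoolExecutor(max_workers=8) as pool:
    for issues in pool.map(fetch, PROJECTS):
        print(f"collected {len(issues)} issues")  # in production: upsert into PostgreSQL
```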
ETL Developer
Accenture
- Developed the data migration ETL for credit account information sharing (CAIS) for a world-leading bank, using Informatica and Unix text-processing utilities such as AWK and sed for data transformation (see the sketch after this list).
- Performed data integration for the Foreign Account Tax Compliance Act (FATCA), converting complex business logic into SQL for data transformation.
- Created report designs with a preliminary semantic layer using the SAP reporting tools Crystal Reports and Web Intelligence (WebI).
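As an illustration of the kind of line-by-line clean-up AWK and sed handled, here is an equivalent Python filter over a hypothetical pipe-delimited CAIS extract layout:

```python
import sys

# Hypothetical record layout: account ID | status | balance.
def transform(line: str) -> str:
    """Normalize one record: zero-pad the ID, uppercase the status, format the balance."""
    acct, status, balance = line.rstrip("\n").split("|")
    return "|".join([acct.strip().zfill(10), status.strip().upper(), f"{float(balance):.2f}"])

# Run as a filter, awk-style: python transform.py < cais_extract.txt > cais_clean.txt
for line in sys.stdin:
    print(transform(line))
```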
Experience
Unified Data Platform (UDP) on AWS
https://aws.amazon.com/solutions/case-studies/malayala-manorama/
Data was collected as events from the client's different properties using JavaScript, Android, and iOS SDKs, with Amazon Kinesis as the message queue. I set up batch ETL using Spark on EMR and used Python, Go, and shell scripts for production table loads, Apache Airflow for orchestration, and Athena and Redshift Spectrum for ad hoc queries on top of the data lake.
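A minimal sketch of how one clickstream event from an SDK reaches Kinesis on this platform; the stream name and event shape are assumptions for illustration:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical event shape: one clickstream event emitted by a web/mobile SDK.
event = {
    "property": "news-web",
    "user_id": "u-123",
    "action": "article_view",
    "article_id": "a-456",
    "ts": "2021-05-01T10:15:00Z",
}

kinesis.put_record(
    StreamName="udp-clickstream",           # assumed stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],          # keeps a user's events ordered per shard
)
```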
Data Lakehouse
Analytics SaaS Solution on AWS | View26.com
https://view26.com/
As a lead engineer in an early-stage startup, I got hands-on experience across many different systems. I mainly designed and implemented a highly concurrent data collection system using Go, MongoDB, PostgreSQL, and Redshift.
Live Clickstream Analytics on Google Cloud
Chatbot on AWS Lex
ETL to Data Warehouse
Redshift to BigQuery Migration
Scaling Recommendation Engine in Spark
Go Module to Generate Dates
https://github.com/kkfaisal/dates
Snowflake Community Blog
https://community.snowflake.com/s/article/PostgreSQL-to-Snowflake-ETL-Steps-to-Migrate-Data
Education
Bachelor's Degree in Computer Engineering
Government Engineering College, Kottayam - Kerala, India
Certifications
Databricks Certified Developer for Apache Spark 2.x for Python
Databricks
Google Cloud Certified Professional - Data Engineer
Google Cloud
AWS Certified Big Data - Specialty
Amazon Web Services
AWS Certified Solutions Architect – Associate
Amazon Web Services
C100DEV: MongoDB Certified Developer Associate Exam
MongoDB University
Skills
Libraries/APIs
PySpark, Pandas
Tools
Amazon Athena, AWS Step Functions, PyCharm, Apache Airflow, Sublime Text 3, RabbitMQ, Informatica ETL, Amazon Elastic MapReduce (EMR), Apache NiFi, Amazon Lex, Cloud Dataflow, Apache Beam
Languages
SQL, Go, Python, Python 3, Snowflake
Platforms
Amazon Web Services (AWS), Unix, AWS Lambda, Google Cloud Platform (GCP), Docker, Amazon EC2, Oracle
Storage
Redshift, Oracle 9i, MongoDB, PostgreSQL, HBase, Amazon S3 (AWS S3), Microsoft SQL Server
Frameworks
Spark
Paradigms
ETL
Other
Software Engineering, Unix Shell Scripting, Chatbots, Google BigQuery, Data Warehousing