
Ali Ashfaq
Verified Expert in Engineering
Data Engineer and Developer
Lahore, Punjab, Pakistan
Toptal member since June 7, 2022
Ali is a Google Certified Professional Data Engineer with 6+ years of experience in data engineering, database design, ETL development, and data warehouse testing with Google Cloud Platform (GCP) projects. He has delivered multiple data pipelines and end-to-end ETL processes in GCP. Ali is also skilled in data modeling for enterprise data warehouse implementation projects.
Portfolio
Experience
- Data Engineering - 8 years
- Data Visualization - 7 years
- Google Cloud Platform (GCP) - 7 years
- Python - 7 years
- Data Analytics - 7 years
- ETL Tools - 7 years
- Talend ETL - 7 years
- BigQuery - 7 years
Availability
Preferred Environment
PyCharm, Windows, Data Warehousing, Google Sheets, Data Warehouse Design, Analytics, Ad-hoc Reporting, Business Intelligence (BI), Dashboards, Data Engineering, Snowflake, PostgreSQL, Graph Databases, Real Estate, MySQL, Back-end Development, OLTP, OLAP, APIs
The most amazing...
...EDW I've developed powered a credit scoring model that streamlined the bank's loan approval and disbursement processes, increasing revenue by $500 million.
Work Experience
Senior Data Engineer
Coca-Cola Icecek
- Migrated SAP jobs to Apache Airflow on GCP using various Airflow operators, built data pipelines in Python, and performed data transformation and cleansing with SQL and Python. Followed CI/CD best practices using GitHub.
- Managed Jira by overseeing project workflows, assigning tasks, tracking progress, and ensuring timely delivery. Implemented efficient ticketing systems and collaborated with teams.
- Facilitated efficient data exchange between SQL Server and GCP, pushing order recommendations to a database connected to the mobile app. This streamlined sales, enhanced inventory management, and boosted profitability.
Graph Database Engineer
Syngenta
- Built scalable ETL pipelines using AWS Glue to automate data transformation and loading, reducing processing time by 40% while ensuring data accuracy for large-scale agricultural datasets.
- Developed and deployed API integrations using AWS Lambda to connect external data sources with internal systems, improving real-time data accessibility and reducing latency for mission-critical operations.
- Leveraged AWS Glue workflows and Lambda functions to support real-time analytics, improving decision-making for data-driven applications in the agriculture sector.
Data Analyst
Bach App Inc
- Integrated and streamlined data ingestion pipelines by consolidating data from third-party ETL tools like Hevo and Stitch, as well as the client’s proprietary application data, into a centralized Google BigQuery data warehouse.
- Collaborated extensively with the client to understand their complex customer classification model, including detailed segmentation and multi-dimensional attributes, and translated these requirements into a structured schema in BigQuery.
- Designed and implemented custom data transformations in BigQuery to create tables and views that accurately reflected the client’s intricate customer classifications, ensuring consistency and scalability.
- Automated the processing of ingested data using BigQuery scheduled queries and Cloud Functions, ensuring the data remained up-to-date and ready for analysis.
- Created interactive dashboards on Looker Studio, visualizing critical metrics and KPIs tailored to the client’s business model, enabling them to gain actionable insights into customer behavior and marketing performance.
- Built dynamic reporting workflows using Google Sheets integrated with BigQuery, enabling quick ad-hoc analysis and easy access for non-technical stakeholders.
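A minimal sketch of how such an automated transformation step could look as a Cloud Function refreshing a classification table in BigQuery. All project, dataset, table, and column names here are hypothetical, not taken from the client's actual model:

```python
"""Hedged sketch: a Cloud Function that refreshes a customer-classification
table in BigQuery. Names and segmentation rules are illustrative only."""


def build_refresh_sql(project: str, dataset: str) -> str:
    """Build the CREATE OR REPLACE statement for the classification table."""
    return (
        f"CREATE OR REPLACE TABLE `{project}.{dataset}.customer_segments` AS\n"
        f"SELECT customer_id,\n"
        f"       CASE WHEN lifetime_value >= 1000 THEN 'premium'\n"
        f"            WHEN lifetime_value >= 100  THEN 'standard'\n"
        f"            ELSE 'basic' END AS segment\n"
        f"FROM `{project}.{dataset}.customers`"
    )


def refresh_segments(event, context):
    """Cloud Function entry point (e.g. Pub/Sub- or scheduler-triggered)."""
    # Imported lazily so the pure SQL builder above works without GCP libs.
    from google.cloud import bigquery

    client = bigquery.Client()
    client.query(build_refresh_sql(client.project, "analytics")).result()
```

The same statement could equally run as a BigQuery scheduled query; a Cloud Function is useful when the refresh must react to an event rather than a fixed schedule.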
Marketing Scientist and Business Intelligence Analyst
Tawkify, Inc.
- Designed and implemented a comprehensive Sigma Compute Dashboard, enabling the client to track daily traffic trends and marketing spend, resulting in improved campaign performance monitoring.
- Uncovered and presented actionable insights from data analysis, empowering the business to make informed, data-driven decisions that enhanced strategic planning.
- Generated multiple reports to analyze frequently purchased packages, correlating trends with UTM campaign parameters to optimize marketing efforts and improve ROI.
- Identified critical data quality issues and proactively communicated their impact to the engineering team, streamlining data accuracy and reliability for reporting.
- Developed an in-depth User Behavior Dashboard to analyze and compare user versus client demographics, providing actionable insights to tailor marketing strategies.
BigQuery Expert
Peerly Inc
- Structured and optimized a 4TB dataset within BigQuery, leading to a 50% improvement in query performance and data retrieval times.
- Created multiple materialized views to enhance data analysis capabilities, facilitating faster and more accurate reporting for the client's P2P texting platform.
- Enhanced data ingestion pipelines from Pub/Sub and SQL external connections, ensuring seamless and efficient data flow into BigQuery, reducing latency by 30%.
- Assisted the client's team with architectural best practices in BigQuery, elevating their understanding and proficiency, which resulted in a 40% increase in team efficiency.
Graph DB Expert
Syngenta - Digital Product Engineering - Brazil 2024
- Developed and implemented graph database optimizations using Neo4j and AWS Neptune, resulting in a 40% reduction in query response time and improved data retrieval efficiency.
- Designed and deployed serverless functions with AWS Lambda to automate data processing workflows, reducing operational overhead by 30%.
- Created and maintained Node.js applications to facilitate seamless data integration and processing across various systems, enhancing data flow and system interoperability.
- Conducted comprehensive assessments to identify and resolve system bottlenecks, significantly improving the client's data processing capabilities.
- Leveraged AWS cloud services, including Lambda, S3, and EC2, to build scalable and reliable data pipelines, ensuring high availability and performance.
Lead Data Engineer, Web Analytics and Insights
Dr. Barbara Sturm
- Engineered an advanced ETL pipeline that consolidated Clickstream data into Google BigQuery, enhancing data-driven strategies for a leading European online cosmetic retailer.
- Developed complex SQL queries in BigQuery, processing and analyzing vast datasets to optimize digital marketing efforts, resulting in measurable improvements in customer engagement and sales.
- Implemented and fine-tuned data visualizations in Looker Studio, presenting real-time sales, vouchers, and promotional data, which significantly supported decision-making processes for marketing and sales teams.
- Streamlined the integration of Google Analytics data into BigQuery, employing custom SQL scripts to extract nuanced insights into user behavior, which informed and enhanced the online retail marketing strategy for a premier skincare brand.
Data Analyst
Uptraded GmbH
- Analyzed user interaction data within the Uptraded app using Mixpanel, providing insights that led to a 20% improvement in user conversion rates over four weeks.
- Streamlined the data collection framework to ensure the capture of meaningful analytics, optimizing the Mixpanel setup for future data-driven strategies.
- Conducted in-depth analysis of secondhand fashion consumer trends, contributing to a platform overhaul that emphasized circular fashion and increased app retention by 15%.
- Collaborated with cross-functional teams, translating complex data into actionable strategies aligned with Uptraded's mission of sustainable fashion consumption.
- Facilitated knowledge transfer sessions for the Uptraded team, empowering them with the analytical skills necessary to leverage Mixpanel for ongoing conversion optimization initiatives.
GCP Data Engineer
Patrianna Limited
- Led the optimization of ETL processes on GCP, which reduced data processing times by 40%, enhancing the app's performance for end-users.
- Collaborated with a team of data analysts and financial experts to translate complex financial concepts into clear, actionable insights within the app, driving a user satisfaction score increase of 20%.
- Implemented BigQuery solutions to handle complex queries over large datasets, enabling the app to calculate projected savings growth and net worth estimations rapidly.
- Orchestrated secure and compliant data storage mechanisms using Cloud Storage with encryption at rest and in transit, adhering to financial data security regulations and best practices.
- Automated data ingestion workflows using Cloud Dataflow and Cloud Composer, ensuring efficient and error-free data updates across user accounts for accurate financial tracking.
Senior Data Engineer
Tealbook
- Developed a robust web scraping solution using Python, Scrapy, and Selenium, enabling efficient extraction of certification data from various sources. Transformed and preprocessed the data using Apache Airflow, ensuring consistency and quality.
- Collaborated with cross-functional teams, applying data engineering best practices. Defined data models and ETL processes with dbt, and addressed integrity and consistency issues through robust data-quality checks and monitoring.
- Delivered reliable and accurate certification data that supports informed decision-making and drives business growth.
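The preprocessing step described above can be sketched as a small normalization helper applied to each scraped record before loading. The field names and quality rules are illustrative assumptions, not Tealbook's actual schema:

```python
"""Hedged sketch of normalizing scraped certification records; the fields
('supplier_name', 'certification', 'expiry') and rules are hypothetical."""

from datetime import datetime
from typing import Optional


def normalize_record(raw: dict) -> Optional[dict]:
    """Trim, standardize, and validate one scraped certification record.

    Returns None for records that fail basic quality checks, mirroring the
    data-quality gates mentioned above.
    """
    name = (raw.get("supplier_name") or "").strip()
    cert = (raw.get("certification") or "").strip().upper()
    expiry = (raw.get("expiry") or "").strip()
    if not name or not cert:
        return None  # reject records missing required fields
    try:
        expiry_date = datetime.strptime(expiry, "%Y-%m-%d").date().isoformat()
    except ValueError:
        expiry_date = None  # keep the record but flag the unparseable date
    return {"supplier_name": name, "certification": cert, "expiry": expiry_date}
```

In an Airflow setting, a function like this would typically run inside a transform task between the scraping and load steps.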
Senior Data Analytics Consultant
Systems Limited
- Built and architected multiple data pipelines and end-to-end ETL processes for data ingestion and transformation in GCP.
- Prepared and coordinated tasks among the team I managed.
- Designed and created various layers of the data lake.
- Executed test cases to identify potential issues with ETL jobs and ensure data sanity and integrity in the database.
- Integrated more than 20 sources to give customers a 360-degree view.
ETL/BI Developer
Analytics Private Limited
- Managed and mentored a team of five resources and collaborated with clients to understand data needs and maintain a close working relationship.
- Performed data modeling and led an enterprise data warehouse (EDW) implementation project.
- Designed and developed end-to-end ETL pipelines and tested processes for data validation before loading it into a data warehouse.
- Identified and implemented methodologies to ensure data integrity and quality.
Experience
Core Modules for Easy Buy
https://www.mattressfirm.com/
The project used analytics and business intelligence (BI) to improve customer experience and engagement, retain potential customers, offer them the best choice, enhance the system's efficiency, and increase sales based on historical and incremental data.
I designed and built multiple end-to-end ETL processes for data ingestion and transformation in Google Cloud Platform (GCP) and coordinated tasks among the team. The core modules allowed the system to integrate data from heterogeneous source platforms, including Google Cloud, an Amazon S3 bucket, a MuleSoft API, and SFTP. The ETL process incorporated data into the GCP deep learning and BigQuery DWH environments and ingested it into the Neo4j graph database.
I used Python, SQL, and the GCP stack for data processing and Apache Airflow for orchestration. A Google Cloud Function written in Python loaded on-arrival CSV files from the Google Cloud Storage (GCS) bucket into BigQuery, with raw files archived in Cloud Storage.
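An on-arrival CSV load of this kind can be sketched as a GCS-triggered Cloud Function. The bucket layout, dataset name, and table-mapping convention below are assumptions for illustration, not the project's actual configuration:

```python
"""Hedged sketch of a GCS-triggered Cloud Function that loads an arriving CSV
into BigQuery; bucket layout, dataset, and table names are hypothetical."""


def target_table_for(object_name: str) -> str:
    """Map an incoming object name, e.g. 'orders/2024-01-01.csv', to a
    BigQuery table name ('orders'), assuming one folder per table."""
    return object_name.split("/")[0].replace(".csv", "")


def load_csv_to_bq(event, context):
    """Entry point for a google.storage.object.finalize trigger."""
    from google.cloud import bigquery  # lazy import keeps the helper pure

    bucket, name = event["bucket"], event["name"]
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    uri = f"gs://{bucket}/{name}"
    table_id = f"{client.project}.raw.{target_table_for(name)}"
    client.load_table_from_uri(uri, table_id, job_config=job_config).result()
```

Because the load reads straight from the GCS URI, the original file can stay in the bucket untouched, which is what makes the raw-file archival mentioned above essentially free.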
EDW for a Private Credit Bureau
https://tasdeeq.com/
My main contributions included the ETL process, EDW architecture, and aggregated data set (ADS), using technologies such as Vertica, IBM InfoSphere DataStage, and IBM Cognos.
I developed the model's architecture that was successfully launched with over 90% accuracy and efficiency. It features the ETL process of extracting loan/lease transactional data from 70+ financial institutions and loading the transformed data into the DWH.
I also prepared an ADS to aid the formation of the statistical model. It can predict and categorize the customer/borrower as either good, average, or bad and assign a default probability to every customer based on the information provided to the model.
Unified Analytics Solution for Locallogy
I designed a robust data flow incorporating data from various APIs, such as Screaming Frog and AWR (Advanced Web Ranking). I created a potent combination of data sources by integrating Google Analytics and Google Search Console with BigQuery. Additionally, I implemented an interface on Google Sheets, seamlessly connected to BigQuery, facilitating collaboration across the organization. This adaptable solution allowed the team to efficiently insert, update, and delete data from BigQuery, enhancing their ability to deliver robust digital marketing solutions for their clients.
From a technical perspective, I leveraged several Google Cloud Platform (GCP) services, including Compute Engine, Cloud Scheduler, and BigQuery. I employed VM instances running Python ETL scripts to capture and process data, ensuring seamless integration with the APIs.
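One scheduled ETL step of the kind described above can be sketched as a fetch-and-flatten script. The endpoint, response shape, and field names are illustrative assumptions, not AWR's or Screaming Frog's actual APIs:

```python
"""Hedged sketch of one scheduled ETL step: pull keyword rankings from an
API and shape them into BigQuery-ready rows. Endpoint and fields are
hypothetical, not a real vendor API."""

import json
import urllib.request


def to_bq_rows(payload: dict, report_date: str) -> list:
    """Flatten an API response into rows ready for a BigQuery insert."""
    return [
        {
            "report_date": report_date,
            "keyword": item["keyword"],
            "rank": int(item["rank"]),
        }
        for item in payload.get("rankings", [])
    ]


def run_once(api_url: str, report_date: str) -> list:
    """Fetch and transform one day's rankings (invoked by Cloud Scheduler)."""
    with urllib.request.urlopen(api_url) as resp:  # hypothetical endpoint
        payload = json.load(resp)
    return to_bq_rows(payload, report_date)
```

On the VM, Cloud Scheduler (or cron) would call `run_once` daily and pass the resulting rows to a BigQuery insert, keeping extraction and transformation separable and testable.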
Optimizing Decision-making with M&M Data Warehouse Architecture
The M&M Data Warehouse architecture streamlines the information flow, enabling better insights and more informed decision-making. It encompasses multiple areas, including the extract, transform, load (ETL) process, which ensures seamless data incorporation into the warehouse after transformation and cleaning.
Data Engineering for an Online Cosmetic Retailer
https://eu.drsturm.com/
BI Solution for Crowdbotics
https://www.crowdbotics.com/
By implementing an end-to-end data pipeline, I gathered, transformed, and loaded data from various sources, such as Heroku, Stripe, HubSpot, Toggl, and Google Sheets, into a centralized BigQuery data warehouse. The system included comprehensive dashboards tailored for executives, project managers, team leads, and recruitment departments, enabling quick, informed, data-driven decisions.
I used dbt and SQL in this project, integrating with GitHub. To accommodate the substantial volume of daily data generated by over 5,000 projects and 1,000 developers, I designed and implemented efficient ETL jobs.
These jobs seamlessly combined data from multiple sources, resulting in a unified and reliable source of truth for the organization. Through this implementation, the company gained a comprehensive and streamlined data infrastructure, providing accurate insights for enhanced decision-making.
Technically, the core data model was designed using dbt, GCP BigQuery SQL, and Looker.
Education
Bachelor's Degree in Computer Science
University of Central Punjab (UCP) - Lahore, Pakistan
Certifications
Google Professional Cloud Architect
Google Cloud
Microsoft Certified Azure Data Engineer Associate
Microsoft Learn
Professional Data Engineer
Google Cloud
Advanced Google Analytics
Skills
Libraries/APIs
PySpark, REST APIs, Node.js, Pandas, Stripe
Tools
Talend ETL, Tableau, BigQuery, Apache Airflow, Google Compute Engine (GCE), Microsoft Excel, Screaming Frog, Google Sheets, Looker, Terraform, IBM InfoSphere (DataStage), Google Analytics, Microsoft Power BI, Power Query, Google Cloud Dataproc, Apache Beam, Git, PyCharm, IBM Cognos, GitHub, Cloud Dataflow, Google Kubernetes Engine (GKE), Jira, Stitch Data, Toggl, Zapier, Spark SQL, Google Cloud Composer, AWS Glue
Languages
Python, SQL, C++, Python 3, Snowflake, Java, JavaScript, R
Paradigms
ETL, Business Intelligence (BI), OLAP, ETL Implementation & Design, Search Engine Optimization (SEO)
Platforms
Google Cloud Platform (GCP), Azure, HubSpot, Amazon Web Services (AWS), AWS Lambda, Microsoft Fabric, Amazon, Databricks, Docker, Linux, Heroku, Azure Synapse, Mixpanel
Storage
Neo4j, PostgreSQL, Data Pipelines, Docker Cloud, Graph Databases, MySQL, Data Integration, Google Cloud, MongoDB, Data Lake Design, OLTP, Redshift, Database Replication, SQL Server DBA, SQL Server Integration Services (SSIS), Azure Cosmos DB, Vertica, Google Cloud Storage, Database Administration (DBA), Database Performance, Microsoft SQL Server, Databases, Database Migration, Database Architecture, Data Lakes
Frameworks
Spark, Apache Spark, Scrapy, Selenium
Other
Data Analytics, Data Visualization, ETL Tools, Google Data Studio, Google Cloud Functions, Data Engineering, Google BigQuery, Data Warehousing, Dashboards, Data Analysis, Google SEO, Technical Project Management, Data Warehouse Design, Data Build Tool (dbt), SAP, Data Modeling, Back-end Development, APIs, Parquet, Looker Studio, API Integration, English, Query Optimization, Debugging, IT Support, Data Migration, Data Transformation, Sales, Google Search Console, SEO Tools, Analytics, Ad-hoc Reporting, Amazon RDS, DAX, Real Estate, Pub/Sub, Performance Tuning, Data-level Security, Machine Learning, Full-stack, Scalability, Startups, Google Tag Manager, Marketing Reports, Data Reporting, Fivetran, Big Data, AWS Cloud Architecture, Data Science, Web Development, Google Analytics 4, Google Pub/Sub, Streaming Data, Google Container Engine, HubSpot CRM, AWR, Designing for Data, Azure Databricks, Azure Data Factory (ADF), eCommerce, Google Cloud Dataflow, Web Analytics, Social Media Web Traffic, Digital Marketing, Database Analytics, Data Processing, Unstructured Data Analysis, Dashboard Development, Reporting, Data Architecture, Sharding, Architecture, Back-end, Cloud Infrastructure, Data Structures, Advisory, Consulting, ELT, Infrastructure, Data, Data Flows, Business Intelligence (BI) Platforms, Marketing Analytics, Amazon Neptune, Data Processing Automation, Metadata, Marketing Strategy, Marketing Campaigns, Marketing Technology (MarTech), TikTok