Syed Akbar Naqvi
Verified Expert in Engineering
Data Engineer and Software Developer
Muscat, Muscat Governorate, Oman
Toptal member since June 18, 2020
Syed has over 18 years of experience working as a database developer, data engineer, data architect, and data analyst in the banking, insurance, retail, and agronomy sectors. He has designed and developed solutions for high-performance, multi-terabyte data warehouses on different technology stacks, including Oracle, SQL, PL/SQL, PostgreSQL, Redshift, AWS, Python, PySpark, Kafka, and other data-related tools. Syed is always excited about challenging projects where he can deliver lasting results.
Experience
- ETL - 10 years
- Data Modeling - 10 years
- ETL Implementation & Design - 9 years
- Amazon Web Services (AWS) - 8 years
- Python - 6 years
- Redshift - 6 years
- AWS Glue - 4 years
- Snowflake - 3 years
Preferred Environment
Amazon Web Services (AWS), Amazon EC2, Linux, PL/SQL, Python 3, Apache Airflow, Apache Kafka, Redshift, Snowflake, Amazon RDS
The most amazing...
...thing I've done was a real-time DWH labor scheduler that involved multiple databases and environments; the code interacts with data from different sources.
Work Experience
Data Engineer
CMA CGM
- Designed and developed the AWS infrastructure for the entire project, setting up development, UAT, and production environments in separate AWS accounts.
- Set up access and security roles, developed Glue jobs for large data pipelines, and implemented event-driven architecture.
- Designed and built the data architecture for Snowflake, including a data warehouse and data pipelines that process data from multiple sources into Snowflake, and worked extensively with Snowflake DWH features.
- Built IaC and CI/CD pipelines using Terraform, Jenkins, GitLab, AWS, Snowflake, and other tools.
- Integrated ETL pipelines using Airflow DAGs and tasks for multiple projects and improved existing libraries used across the organization.
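The Glue and Airflow work above typically comes together in one orchestration pattern: a DAG that triggers a Glue job and polls it to completion before downstream loads run. Below is a minimal sketch of that pattern only; the job name, schedule, and downstream step are assumptions, not the actual CMA CGM pipeline.

```python
# Minimal sketch: an Airflow DAG that triggers an AWS Glue job and polls it
# to completion. Job name, schedule, and downstream step are placeholders.
import time
from datetime import datetime, timedelta

import boto3
from airflow.decorators import dag, task

GLUE_JOB_NAME = "sample_raw_to_curated"  # hypothetical Glue job


@dag(
    schedule="@hourly",  # assumed cadence
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
)
def sample_glue_pipeline():
    @task
    def run_glue_job() -> str:
        """Start the Glue job and block until it reaches a terminal state."""
        glue = boto3.client("glue")
        run_id = glue.start_job_run(JobName=GLUE_JOB_NAME)["JobRunId"]
        while True:
            state = glue.get_job_run(JobName=GLUE_JOB_NAME, RunId=run_id)[
                "JobRun"
            ]["JobRunState"]
            if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"):
                break
            time.sleep(30)
        if state != "SUCCEEDED":
            raise RuntimeError(f"Glue job {GLUE_JOB_NAME} ended as {state}")
        return run_id

    @task
    def record_run(run_id: str) -> None:
        """Downstream placeholder, e.g., an audit log entry or a warehouse load."""
        print(f"Glue run {run_id} finished; downstream loads go here.")

    record_run(run_glue_job())


sample_glue_pipeline()
```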
Data Engineer
Varda AG
- Designed the architecture for asynchronous and synchronous flows of the main REST API, which ingests geospatial data from files and user input.
- Worked on multiple modules using Python, SQL, PostgreSQL, and PostGIS to validate incoming data and process large GeoJSON files and shapefiles.
- Built the data model for storing boundary and field ID data; this model serves as the main source of truth for all incoming and outgoing data.
- Worked on the POC data warehouse for bulk data delivery and monitoring purposes on the Snowflake database.
- Designed and developed the data model for geometric data and built a data pipeline using Python, SQL, Confluent Kafka, dbt, and Airflow to process data in near real time.
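A simplified sketch of the near-real-time ingestion step from the last bullet: validate incoming GeoJSON features and publish the valid ones to a Kafka topic for the downstream dbt and Airflow steps. The broker, topic, and field names are illustrative assumptions, not the production configuration.

```python
# Validate GeoJSON features and publish valid ones to Kafka for downstream
# processing. Broker, topic, and property names are assumed for illustration.
import json

from confluent_kafka import Producer
from shapely.geometry import shape

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker
TOPIC = "field-boundaries-raw"                                # hypothetical topic


def validate_feature(feature: dict) -> bool:
    """Basic structural and geometric checks on a single GeoJSON feature."""
    try:
        geom = shape(feature["geometry"])
    except (KeyError, ValueError, TypeError):
        return False
    # Reject empty or self-intersecting geometries before they reach the warehouse.
    return (not geom.is_empty) and geom.is_valid


def ingest(feature_collection: dict) -> int:
    """Publish every valid feature; return the number of messages produced."""
    produced = 0
    for feature in feature_collection.get("features", []):
        if not validate_feature(feature):
            continue
        key = str(feature.get("properties", {}).get("field_id", ""))
        producer.produce(TOPIC, key=key, value=json.dumps(feature))
        produced += 1
    producer.flush()  # block until all messages are delivered
    return produced


if __name__ == "__main__":
    with open("sample_fields.geojson") as f:  # placeholder input file
        print(f"Produced {ingest(json.load(f))} features")
```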
Senior Data Architect
ShopCircle
- Developed a Python library to extract data from Shopify APIs and load it into a Snowflake stage table (a simplified sketch follows this list).
- Handled Snowflake administration, creating and modifying database components such as users, schemas, tables, views, and permissions.
- Designed and developed a data model for Shopify event data and built KPIs for the Tableau dashboard.
- Created models in dbt using SQL extensively to transform the data and load the data to the Snowflake database for reporting purposes.
- Orchestrated the ETL pipeline to run every hour and process the delta data using DAGs in Airflow.
- Built multiple KPIs, such as ARR, MRR, Cohort, Churn, and Active Users, and created complex charts that utilize these KPIs.
- Created multiple dashboards and charts in Tableau Desktop and deployed them on Tableau Cloud.
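A simplified sketch of the Shopify-to-Snowflake extraction from the first highlight above. The shop domain, credentials, stage, and table names are placeholders; the real library also handles pagination, rate limits, and incremental cursors.

```python
# Pull one page of Shopify orders and land them in a Snowflake stage table.
# Shop, stage, and table names are placeholders for illustration only.
import json
import os
import tempfile

import requests
import snowflake.connector

SHOP = "example-store"          # hypothetical shop
API_VERSION = "2024-01"
TOKEN = os.environ["SHOPIFY_TOKEN"]


def fetch_orders(limit: int = 250) -> list[dict]:
    """Pull one page of orders from the Shopify Admin REST API."""
    url = f"https://{SHOP}.myshopify.com/admin/api/{API_VERSION}/orders.json"
    resp = requests.get(
        url,
        headers={"X-Shopify-Access-Token": TOKEN},
        params={"limit": limit, "status": "any"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["orders"]


def load_to_stage(orders: list[dict]) -> None:
    """Write orders as NDJSON, PUT to an internal stage, COPY into a stage table."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        for order in orders:
            f.write(json.dumps(order) + "\n")
        path = f.name

    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="LOAD_WH", database="RAW", schema="SHOPIFY",  # placeholders
    )
    try:
        cur = conn.cursor()
        cur.execute(f"PUT file://{path} @shopify_stage AUTO_COMPRESS=TRUE")
        cur.execute(
            "COPY INTO stage_orders (payload) "
            "FROM @shopify_stage FILE_FORMAT = (TYPE = 'JSON')"
        )
    finally:
        conn.close()


if __name__ == "__main__":
    load_to_stage(fetch_orders())
```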
Data Engineer
Yara International - DNU - Varda (formerly Shared Data Exchange: SDX (ODX))
- Participated in major design decisions to develop pipelines for processing large geospatial datasets by reading from Redshift and pushing data to Amazon DocumentDB and an Amazon S3 bucket for read-only access.
- Developed complex pipelines to build a data catalog for use with the web UI.
- Led the design of the data model for soil sample data for Amazon DocumentDB and worked on developing the pipeline to be used with Apache Airflow.
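A minimal sketch of the Redshift-to-DocumentDB flow described in this role: read a batch of rows from Redshift and upsert them into a DocumentDB collection. The soil-sample columns, hosts, and collection names are hypothetical.

```python
# Read (hypothetical) soil sample rows from Redshift and upsert them into
# DocumentDB. Hosts, credentials, and schemas are placeholders.
import os

import psycopg2
import psycopg2.extras
from pymongo import MongoClient, UpdateOne


def extract_samples(limit: int = 10_000) -> list[dict]:
    """Read soil sample rows from Redshift as dictionaries."""
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"], port=5439,
        dbname="analytics", user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        cur.execute(
            "SELECT sample_id, field_id, ph::float AS ph, "
            "organic_carbon::float AS organic_carbon, sampled_at "
            "FROM soil.samples ORDER BY sampled_at DESC LIMIT %s", (limit,)
        )
        return [dict(row) for row in cur.fetchall()]


def load_to_documentdb(samples: list[dict]) -> None:
    """Upsert samples by sample_id so reruns stay idempotent."""
    client = MongoClient(os.environ["DOCDB_URI"], tls=True)  # DocumentDB endpoint
    coll = client["agronomy"]["soil_samples"]
    ops = [
        UpdateOne({"sample_id": s["sample_id"]}, {"$set": s}, upsert=True)
        for s in samples
    ]
    if ops:
        coll.bulk_write(ops, ordered=False)


if __name__ == "__main__":
    load_to_documentdb(extract_samples())
```

Upserting on a natural key keeps the load idempotent, which matters when the pipeline is retried from Airflow.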
AWS QuickSight Expert
CartHook Inc
- Developed data marts and queries for reports and dashboards using SQL and Python to aggregate data beforehand.
- Built multiple charts, including ADU, MRU, Cohort, ARR, and MRR, among others.
- Designed and developed high-performance queries on existing Redshift tables so that results are returned in a split second.
Data Engineer
Yara International
- Worked independently on ETL pipeline development and improvement for event-based data from multiple sources. The warehouse housed approximately 50 data sources to be consolidated into one schema table after cleanup and normalization.
- Migrated the old ETL pipeline with a few data sources to a new stack with more data sources.
- Improved the overall performance of ETL pipelines by rewriting Redshift SQL queries to be more performant.
- Built multiple DAGs and orchestrated them using Airflow.
AWS Redshift Expert
CartHook Inc
- Designed and developed the complete DWH for analytical reporting independently.
- Developed and designed ETL pipelines for data transformation in near real time for use with a custom dashboard and QuickSight.
- Optimized the performance of the database and queries to perform massive operations in milliseconds, resulting in a lower cost of the Redshift infrastructure.
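One of the main levers behind this kind of Redshift optimization is physical table design: distribution and sort keys aligned with the dominant join and filter patterns. The sketch below shows the idea with hypothetical table and column names; it is not the CartHook schema.

```python
# Create a Redshift fact table whose distribution and sort keys match the
# dominant join/filter pattern, so dashboard queries avoid broadcasts and
# full scans. Table and column names are hypothetical.
import os

import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS analytics.fact_orders (
    order_id     BIGINT      NOT NULL,
    customer_id  BIGINT      NOT NULL,
    order_date   DATE        NOT NULL,
    order_total  DECIMAL(12, 2),
    status       VARCHAR(32)
)
DISTSTYLE KEY
DISTKEY (customer_id)             -- co-locate with the customer dimension
COMPOUND SORTKEY (order_date)     -- dashboards filter on recent dates
"""


def apply_physical_design() -> None:
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"], port=5439, dbname="dwh",
        user=os.environ["REDSHIFT_USER"], password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute("ANALYZE analytics.fact_orders")  # refresh planner statistics


if __name__ == "__main__":
    apply_physical_design()
```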
Data Engineer
PepsiCo Global - PepsiCo International Limited
- Worked on an application that was a POC for the UK region to find the best stores where PepsiCo has its products displayed on shelves for sale; the product was called Perfect Store.
- Developed the data pipeline used to process large volumes of images and data collected from each store and transform them into insights to set up the Perfect Store product for PepsiCo.
- Used Azure Databricks, Data Factory, and PySpark to develop pipelines for processing and enriching data from Nielsen and Trax.
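A short PySpark sketch of the kind of enrichment described above: join shelf-recognition output (as Trax might provide it) with store-level sales (as Nielsen might provide it) and compute a simple per-store score. Paths, columns, and the scoring formula are illustrative assumptions.

```python
# Join (hypothetical) Trax shelf data with Nielsen sales data and compute a
# naive per-store score. Paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("perfect_store_poc").getOrCreate()

# Shelf-level detections exported from Trax (placeholder path/columns).
trax = spark.read.parquet("abfss://lake@example.dfs.core.windows.net/trax/")
# Store-level sales from Nielsen (placeholder path/columns).
nielsen = spark.read.parquet("abfss://lake@example.dfs.core.windows.net/nielsen/")

shelf_share = (
    trax.groupBy("store_id")
    .agg(
        F.avg("shelf_share_pct").alias("avg_shelf_share"),
        F.countDistinct("sku").alias("skus_on_shelf"),
    )
)

store_scores = (
    nielsen.groupBy("store_id")
    .agg(F.sum("sales_value").alias("total_sales"))
    .join(shelf_share, on="store_id", how="inner")
    # Naive illustrative score: sales weighted by shelf presence.
    .withColumn("store_score", F.col("total_sales") * F.col("avg_shelf_share"))
    .orderBy(F.desc("store_score"))
)

store_scores.write.mode("overwrite").parquet(
    "abfss://lake@example.dfs.core.windows.net/curated/perfect_store_scores/"
)
```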
Senior Technical Architect
Nexgen Technology Services Pvt Ltd
- Designed and developed ETL customizations for US retailers' retail merchandising systems to support their daily business analytics on Oracle Database using ORDM, OWB, PL/SQL, and Oracle Scheduler. Coded the complex business logic in SQL and PL/SQL.
- Designed and developed the ETL for an AWS-cloud-based DWH using AWS Redshift. Integrated data from multiple sources, such as flat files on SFTP, AWS S3, Google Analytics extracts, and IBM Silverpop, into one source of truth.
- Made extensive use of open-source technologies like Python, TOS DI, the SOS scheduler, and others to minimize operational costs.
- Created a well-maintained, end-to-end architecture for data flows from different sources that executes independently with little or no user interaction.
- Performed day-to-day maintenance and advisory tasks on multiple platforms, including Unix, Linux, Windows, AWS, Redshift, and Oracle, along with other database administration activities.
- Performance-tuned queries and code as and when required.
- Led the team in resolving issues across all technical aspects of database and ETL-related tasks.
- Designed and developed the data model for a large project related to labor management in retail.
Technical Writer
IAmOnDemand (via Toptal)
- Worked on approximately 15 articles aimed at technology professionals such as CIOs, database administrators, developers, and cloud architects.
- Wrote several in-depth articles (5-10 pages long), complete with tables of contents. All articles were published and read by thousands. Some topics include Cloud Skills Set, AWS Redshift, RDS vs. On-prem DBaaS, and Aurora vs. RDS, to name a few.
- Fact-checked the article content and ensured it was not plagiarized.
Senior Consultant
Capgemini Consulting India Pvt Ltd
- Supported a large Java development team of 50 or more people by writing Oracle database queries and creating views, procedures, and functions. Worked as part of the core database team to deliver different use cases.
- Created hundreds of Oracle procedures and packages for all DML operations for one of the top clients in the PSU sector in the Netherlands. This was done using dynamic SQL to speed up development.
- Designed and worked on the deliverables for one of the top PSU-sector companies in the Netherlands, which later resulted in greater monetary gains for the organization.
- Worked as the only DBA supporting all instances of the Oracle databases used by the projects. Tasks involved setting up the databases, loading data, and tuning performance for the development team.
- Worked on site with a client in the Netherlands for requirements gathering and project deployment.
- Worked as a moderator to deliver a complex and challenging project related to a near real-time DWH. This involved using Oracle Streams and programming to load data from the OLTP environment to the OLAP environment.
Experience
RMS Connector
The RMS Connector integrates RMS data feeds into ORDM for all levels of sales and inventory reporting and is used by top retailers in the US and Central America.
Work done:
• Built all the ETL and data flows from flat files to the Oracle database using Oracle Warehouse Builder.
• Developed PL/SQL packages and procedures for all new files from RMS to ORDM.
• Made recommendations and was involved in the planning and setup of the database architecture for production.
• Tuned the database and report performance.
• Set up ETL automation using Oracle Scheduler Chains.
Customer Segmentation
The architecture includes:
• Redshift: for the storage and reporting of data
• Python: for data processing based on user input
• SOS Berlin Scheduler: a scheduler for executing Python scripts asynchronously based on user inputs
• UI: for user input
• Tableau: for reporting on segments created
• Environment: EC2, Redshift
My job was to design and develop the end-to-end data flow and the required APIs for data processing.
Flow:
1) The user creates a segment model by selecting different customer-related KPIs and then submits the job.
2) The job is submitted via a call to the SOS REST API, which invokes the Python library for customer segmentation.
3) The library examines the input provided in the REST call and, based on that, decides the next step.
4) The data is then processed and is ready to be picked up by Tableau.
An initial data load is required for customer transactions and KPI preparation.
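A minimal sketch of steps 2 and 3: a Python entry point that receives the user's KPI filters and materializes a segment table in Redshift for Tableau to read. The segment naming convention, KPI table, and columns are illustrative assumptions; the real library covered many more KPIs and was invoked through the SOS Berlin Scheduler.

```python
# Receive the user's KPI filters and materialize a segment table in Redshift.
# Table names, columns, and the payload shape are assumed for illustration.
import os

import psycopg2

# Example payload as it might arrive from the REST call (assumed shape).
payload = {
    "segment_name": "high_value_repeat",
    "filters": {"total_spend_min": 500, "orders_min": 3},
}


def build_segment(segment: dict) -> None:
    """Create a segment table from pre-aggregated customer KPIs."""
    table = f"segments.{segment['segment_name']}"  # name assumed validated upstream
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"], port=5439, dbname="dwh",
        user=os.environ["REDSHIFT_USER"], password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS {table}")
        cur.execute(
            f"""
            CREATE TABLE {table} AS
            SELECT customer_id
            FROM analytics.customer_kpis      -- built by the initial data load
            WHERE total_spend >= %(total_spend_min)s
              AND order_count >= %(orders_min)s
            """,
            segment["filters"],
        )


if __name__ == "__main__":
    build_segment(payload)
```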
Store Operations
Work done:
• Developed the data model for layering, including the dimensions, facts, and aggregates.
• Built ETL procedures using Talend, PLINK, Python, and the SOS Berlin Scheduler.
• Wrote shell scripts to manage the data feeds and Python scripts to process the files from Amazon S3 into a Redshift database.
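A simplified sketch of the S3-to-Redshift feed processing described above: list newly arrived files under a prefix and COPY them into a staging table. The bucket, prefix, staging table, and IAM role are placeholders.

```python
# List pending feed files in S3 and COPY them into a Redshift staging table.
# Bucket, prefix, table, and role names are placeholders; pagination and
# load bookkeeping are omitted for brevity.
import os

import boto3
import psycopg2

BUCKET = "example-store-ops-feeds"           # hypothetical bucket
PREFIX = "incoming/sales/"                   # hypothetical prefix
IAM_ROLE = os.environ["REDSHIFT_COPY_ROLE"]  # role ARN Redshift uses to read S3


def pending_files() -> list[str]:
    """Return S3 keys waiting to be loaded (single page for simplicity)."""
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    return [
        obj["Key"] for obj in resp.get("Contents", []) if obj["Key"].endswith(".csv")
    ]


def copy_to_redshift(keys: list[str]) -> None:
    """COPY each file into a staging table; downstream SQL merges it onward."""
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"], port=5439, dbname="dwh",
        user=os.environ["REDSHIFT_USER"], password=os.environ["REDSHIFT_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        for key in keys:
            cur.execute(
                f"""
                COPY staging.sales_feed
                FROM 's3://{BUCKET}/{key}'
                IAM_ROLE '{IAM_ROLE}'
                FORMAT AS CSV IGNOREHEADER 1
                """
            )


if __name__ == "__main__":
    copy_to_redshift(pending_files())
```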
Labor Scheduler
Work done:
• Created the end-to-end data model.
• Developed the procedures and APIs for data operations invoked from the REST APIs.
• Implemented row-level version control in the database (a minimal sketch follows this list).
• Set up AWS RDS PostgreSQL to keep costs within budget and achieve high throughput.
• Set up data interactions between multiple databases using PostgreSQL database links to a Redshift database for extracting analytical data.
• Made extensive use of PL/pgSQL, Python, PostgreSQL, and Redshift for managing data.
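The row-level version control mentioned above follows a common pattern: never update a row in place; instead, close out the current version and insert the next one inside a single transaction. A minimal sketch with hypothetical table and column names:

```python
# Close the current version of a schedule row and insert the next one in a
# single transaction. Table, columns, and connection details are hypothetical.
import json
import os

import psycopg2


def new_version(schedule_id: int, payload: dict) -> None:
    """Supersede the active version of a schedule and insert the next one."""
    conn = psycopg2.connect(
        host=os.environ["PG_HOST"], dbname="labor",
        user=os.environ["PG_USER"], password=os.environ["PG_PASSWORD"],
    )
    with conn, conn.cursor() as cur:
        # End-date the currently active version, if any.
        cur.execute(
            """
            UPDATE labor.schedule_versions
               SET valid_to = now(), is_current = FALSE
             WHERE schedule_id = %s AND is_current
            """,
            (schedule_id,),
        )
        # Insert the new version with an incremented version number.
        cur.execute(
            """
            INSERT INTO labor.schedule_versions
                   (schedule_id, version_no, payload, valid_from, is_current)
            SELECT %s, COALESCE(MAX(version_no), 0) + 1, %s::jsonb, now(), TRUE
              FROM labor.schedule_versions
             WHERE schedule_id = %s
            """,
            (schedule_id, json.dumps(payload), schedule_id),
        )
```

Running both statements in one transaction keeps exactly one current row per schedule while preserving the full history.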
IX Marketing
Work done:
• Created the data model and set up the environment using the Redshift database.
• Extracted and loaded data from multiple sources like SFTP, Amazon S3, Google Analytics API, and IBM Silverpop.
• Created Python scripts to automate data loading.
Ministerie van Defensie
Work done:
• Implemented the new design recommended during the POC.
• Installed Oracle streams between Oracle 10g and Oracle 9i databases for real-time extractions.
• Developed the new logic for ETL processes for near real-time transformation in ODM.
• Set up batch jobs for the periodic load of transformed data into CDM from ODM for Cognos reporting.
• Configured selected PeopleSoft HRMS tables for the Streams configuration.
• Performance-tuned their current system.
Eneco Energies
Work done:
• Successfully designed and implemented an MVS system into the SOA-enabled architecture.
• Tuned the physical model for performance improvement in ETL processes.
• Successfully segregated all the objects pertaining to one functional area into separate databases amounting to 300GB out of 2TB.
• Worked on data modeling, physical design, and database administration.
• Performance-tuned the largest tables with up to 150 partitions, amounting to 400GB alone.
• Developed a mechanism to automate the setup of a testing environment.
Policy Administration System of the Netherlands
Role: Database Team Member | DBA
Work done:
• Worked on all activities of an Oracle DBA and developer, from logical and physical design, administration, PL/SQL, and scripting to communication with the front office about the CRs and use cases.
• Created and modified the DSS and OLTP physical data model.
• Performed database administration—sizing, backup recovery strategy planning, and implementation.
• Database design and administration.
• Database maintenance and release activities.
• Performance tuning.
Global FieldID
http://varda.ag
I designed the data model and architecture flow and built the data pipelines. Tools and technologies used: Python, SQL, PL/pgSQL, PostgreSQL, Redshift, Confluent Kafka, Apache Airflow, PostGIS, etc.
Field Stories
http://varda.ag
Using this data, farmers can make informed decisions about using chemicals and fertilizers to achieve optimal cultivation and increased harvests. I analyzed the data and built models and pipelines for data from different sources.
I was also responsible for transforming the data and mapping it with different Global Boundary IDs to provide easy access from the GUI.
Shop Circle
I worked on an internal but very important dashboard that business owners use to assess app performance and then suggest business improvements to the client.
I built the data model for KPIs used to create dashboards and charts in Tableau Online. I built the transformations using SQL, dbt, Snowflake, and Python and developed charts for multiple KPIs, such as MRR, ARR, Churn, APR, and Cohorts.
Perfect Store
The POC was meant to identify the best-performing stores and capture shelf and product placement, competitor product sales, and the rows and locations of the shelves. All this data is then used to build and adjust the other stores so that similar sales can be achieved. I built the data pipelines using the Azure stack: Azure Databricks, Azure Data Lake, and Azure Data Factory.
Education
Master's Degree in Computer Science
Vinayaka Missions University - Patna, India
Certifications
1Z0-052 Oracle Database 11g Admin - 1
Oracle University
Skills
Libraries/APIs
PySpark, Segment.io
Tools
AWS Deployment, SOS Berlin Scheduler, Toad, Erwin, Postman, Amazon QuickSight, Amazon CloudWatch, AWS IAM, Talend ETL, Apache Airflow, Confluence, Jira, Grafana, Periscope Data, AWS Glue, Tableau, Terraform, Jenkins
Languages
SQL, Snowflake, Stored Procedure, PL/pgSQL, Python, Python 3
Paradigms
ETL, ETL Implementation & Design, Business Intelligence (BI), Serverless Architecture, DevOps
Platforms
Amazon Web Services (AWS), Linux, Azure, AWS Lambda, Unix, Amazon EC2, Windows, Talend, AIX, Databricks, Docker, Apache Kafka, Shopify, Oracle
Storage
Database as a Service (DBaaS), PostgreSQL, Oracle PL/SQL, Redshift, Oracle Rdb, PL/SQL, JSON, Database Architecture, RDBMS, Data Pipelines, Databases, MySQL, Amazon Aurora, Oracle DBA, Relational Databases, Amazon S3 (AWS S3), Datadog, Oracle SQL, Database Administration (DBA), PostGIS, Database Modeling, MongoDB
Frameworks
Spark
Other
Data Warehousing, Data Warehouse Design, Technical Architecture, Data Analysis, Writing & Editing, Data Modeling, Data Engineering, Performance Tuning, CSV, Amazon RDS, Data Architecture, Business Intelligence (BI) Platforms, Database Optimization, Data Analytics, Data, ELT, Oracle Streams, Shell Scripting, Virtualization, APIs, Lambda Functions, Data Science, Data Visualization, BI Reporting, eCommerce, Geospatial Data, Back-end Development, Exploratory Data Analysis, MySQL DBA, Data Build Tool (dbt), DocumentDB, Uber H3, Azure Data Factory, Computer Science, Azure Databricks, Microsoft Azure