Syed Akbar Naqvi, Developer in Muscat, Muscat Governorate, Oman

Syed Akbar Naqvi

Verified Expert in Engineering

Data Engineer and Software Developer

Muscat, Muscat Governorate, Oman

Toptal member since June 18, 2020

Bio

Syed has over 18 years of experience working as a database developer, data engineer, data architect, and data analyst in the banking, insurance, retail, and agronomy sectors. He has designed and developed solutions for high-performance, multi-terabyte data warehouses on different technology stacks, including Oracle, SQL, PL/SQL, PostgreSQL, Redshift, AWS, Python, PySpark, Kafka, and other data-related tools. Syed is always excited about challenging projects where he can deliver tangible results.

Portfolio

CMA CGM
Python, SQL, ETL, Amazon Web Services (AWS), Apache Airflow, Snowflake, Jenkins...
Varda AG
Data Engineering, SQL, ETL, Redshift, Python, Amazon RDS, Confluence, Jira...
ShopCircle
Snowflake, Data Build Tool (dbt), Tableau, SQL, ETL, Shopify, Data Architecture...

Experience

  • ETL - 10 years
  • Data Modeling - 10 years
  • ETL Implementation & Design - 9 years
  • Amazon Web Services (AWS) - 8 years
  • Python - 6 years
  • Redshift - 6 years
  • AWS Glue - 4 years
  • Snowflake - 3 years

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Amazon EC2, Linux, PL/SQL, Python 3, Apache Airflow, Apache Kafka, Redshift, Snowflake, Amazon RDS

The most amazing...

...thing I've done was a real-time DWH labor scheduler that involved multiple databases and environments; the code interacts with data from different sources.

Work Experience

Data Engineer

2023 - PRESENT
CMA CGM
  • Designed and developed the AWS infrastructure for the entire project and set up development, UAT, and production environments on separate AWS accounts.
  • Set up access and security roles, developed Glue jobs for large data pipelines, and implemented an event-driven architecture.
  • Designed and built the data architecture for Snowflake, including the data warehouse and the data pipelines that move data from multiple sources into Snowflake, working extensively with Snowflake DWH features.
  • Built IaC and CI/CD pipelines using Terraform, Jenkins, GitLab, AWS, Snowflake, and other tools.
  • Integrated ETL pipelines using Airflow DAGs and tasks for multiple projects and improved existing libraries used across the organization (a minimal DAG sketch follows this entry).
Technologies: Python, SQL, ETL, Amazon Web Services (AWS), Apache Airflow, Snowflake, Jenkins, AWS Glue, Terraform
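
As an illustration of the Airflow orchestration mentioned in this entry, below is a minimal DAG sketch with a hypothetical DAG ID and placeholder extract/load callables; it is not the project's actual pipeline.

```python
# Minimal Airflow DAG sketch (hypothetical DAG ID, task names, and logic).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull the daily batch from the source system.
    print("extracting batch for", context["ds"])


def load(**context):
    # Placeholder: load the staged batch into the warehouse.
    print("loading batch for", context["ds"])


with DAG(
    dag_id="example_daily_etl",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```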

Data Engineer

2022 - 2023
Varda AG
  • Designed the architecture for the asynchronous and synchronous flows of the main REST API for ingesting geospatial data from files and user input.
  • Worked on multiple modules using Python, SQL, PostgreSQL, and PostGIS to validate incoming data and process large GeoJSON and shapefiles (see the validation sketch after this entry).
  • Built the data model for storing boundary and field-ID data; this model is the main source of truth for all incoming and outgoing data.
  • Worked on a POC data warehouse for bulk data delivery and monitoring on the Snowflake database.
  • Designed and developed the data model for geometric data and built a data pipeline using Python, SQL, Confluent Kafka, dbt, and Airflow to process data in near real time.
Technologies: Data Engineering, SQL, ETL, Redshift, Python, Amazon RDS, Confluence, Jira, PostgreSQL, Grafana, PostGIS, Docker, Amazon Web Services (AWS), PL/SQL, Apache Kafka, Data Architecture, APIs, Stored Procedure, AWS Lambda, Lambda Functions, Databases, Geospatial Data, Technical Architecture, Relational Databases, Data, Snowflake, ELT
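
A minimal sketch of the kind of GeoJSON validation referenced in this entry, assuming the shapely library and a hypothetical payload layout; the real validation modules are more involved.

```python
# Sketch: validate an incoming GeoJSON boundary before ingestion.
# Assumes the shapely library; the payload structure is hypothetical.
import json

from shapely.geometry import shape
from shapely.validation import explain_validity


def validate_boundary(geojson_str: str) -> dict:
    """Parse a GeoJSON feature and return basic validity checks."""
    feature = json.loads(geojson_str)
    geom = shape(feature["geometry"])  # GeoJSON dict -> shapely geometry
    return {
        "is_valid": geom.is_valid,
        "validity_reason": explain_validity(geom),
        "geometry_type": geom.geom_type,
        "area": geom.area,  # in the units of the input CRS
    }


if __name__ == "__main__":
    sample = json.dumps({
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
        },
    })
    print(validate_boundary(sample))
```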

Senior Data Architect

2022 - 2022
ShopCircle
  • Developed a Python library to extract data from Shopify APIs and load it into a Snowflake stage table (a minimal sketch follows this entry).
  • Performed Snowflake administration, creating and modifying database components such as users, schemas, tables, views, and permissions.
  • Designed and developed a data model for Shopify event data and built KPIs for the Tableau dashboard.
  • Created models in dbt using SQL extensively to transform the data and load the data to the Snowflake database for reporting purposes.
  • Orchestrated the ETL pipeline to run every hour and process the delta data using DAGs in Airflow.
  • Built multiple KPIs like ARR, MRR, Cohort, Churn, Active Users, etc., and created complex charts to utilize these KPIs.
  • Created multiple dashboards and charts in Tableau Desktop and deployed them on Tableau Cloud.
Technologies: Snowflake, Data Build Tool (dbt), Tableau, SQL, ETL, Shopify, Data Architecture, APIs, Business Intelligence (BI) Platforms, Databases, Data Pipelines, Database Optimization, Database Modeling, Data Analysis, Technical Architecture, Relational Databases, Data, ELT
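
A minimal sketch of the extract-and-stage pattern referenced in this entry, assuming the Shopify Admin REST API and the Snowflake Python connector; the shop name, API version, warehouse, and table names are hypothetical placeholders, not the actual project code.

```python
# Sketch: pull one page of Shopify orders and land the raw JSON in a Snowflake
# staging table. Shop, API version, table, and credentials are hypothetical.
import json
import os

import requests
import snowflake.connector


def fetch_orders(shop: str, token: str, api_version: str = "2023-10") -> list:
    url = f"https://{shop}.myshopify.com/admin/api/{api_version}/orders.json"
    resp = requests.get(url, headers={"X-Shopify-Access-Token": token}, timeout=30)
    resp.raise_for_status()
    return resp.json().get("orders", [])


def stage_orders(orders: list) -> None:
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="LOAD_WH",  # hypothetical warehouse
        database="RAW",
        schema="SHOPIFY",
    )
    cur = conn.cursor()
    try:
        for order in orders:
            # Land each order as a VARIANT row in the raw staging table.
            cur.execute(
                "INSERT INTO ORDERS_STAGE (PAYLOAD) SELECT PARSE_JSON(%s)",
                (json.dumps(order),),
            )
    finally:
        cur.close()
        conn.close()


if __name__ == "__main__":
    stage_orders(fetch_orders(os.environ["SHOP"], os.environ["SHOPIFY_TOKEN"]))
```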

Data Engineer

2021 - 2022
Yara International - DNU - Varda (formerly Shared Data Exchange: SDX (ODX))
  • Participated in major design decisions to develop pipelines for processing large geospatial datasets, reading from Redshift and pushing data to Amazon DocumentDB and an Amazon S3 bucket for read-only access (a sketch of this step follows this entry).
  • Developed complex pipelines to build a data catalog for use with the web UI.
  • Led the design of the data model for soil sample data in Amazon DocumentDB and developed the pipeline to run under Apache Airflow.
Technologies: Data Engineering, SQL, Redshift, Amazon RDS, Python, Apache Kafka, ETL, Apache Airflow, Data Modeling, Amazon Web Services (AWS), DocumentDB, Jira, Uber H3, Data Architecture, APIs, Stored Procedure, AWS Lambda, Lambda Functions, Databases, MongoDB, Data Pipelines, Database Optimization, Technical Architecture, Data Build Tool (dbt), Relational Databases, Data, ELT, AWS Glue
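
A minimal sketch of the read-from-Redshift, push-to-DocumentDB-and-S3 step referenced in this entry. Connection strings, table, bucket, and collection names are hypothetical; DocumentDB is accessed through its MongoDB-compatible interface.

```python
# Sketch: move one batch of rows from Redshift to DocumentDB and S3 for
# read-only consumers. All names and connection details are hypothetical.
import json
import os

import boto3
import psycopg2
from pymongo import MongoClient


def move_batch(limit: int = 1000) -> None:
    rs = psycopg2.connect(os.environ["REDSHIFT_DSN"])  # e.g. "host=... dbname=..."
    docdb = MongoClient(os.environ["DOCDB_URI"])       # MongoDB-compatible endpoint
    s3 = boto3.client("s3")

    with rs.cursor() as cur:
        cur.execute("SELECT sample_id, payload FROM soil_samples LIMIT %s", (limit,))
        rows = [{"sample_id": r[0], "payload": r[1]} for r in cur.fetchall()]

    if rows:
        docdb["agronomy"]["soil_samples"].insert_many(rows)  # hypothetical db/collection
        s3.put_object(
            Bucket="example-readonly-bucket",                # hypothetical bucket
            Key="soil_samples/batch.json",
            Body=json.dumps(rows, default=str),
        )

    docdb.close()
    rs.close()
```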

AWS QuickSight Expert

2021 - 2021
CartHook Inc
  • Developed data marts and queries for reports and dashboards, using SQL and Python to pre-aggregate data.
  • Built multiple charts, including but not limited to ADU MRU, Cohort, ARR, and MRR.
  • Designed and developed high-performance queries on existing Redshift tables so that results render within a second of a click.
Technologies: Amazon Web Services (AWS), Amazon QuickSight, Periscope Data, Redshift, SQL, Python 3, Data Architecture, Business Intelligence (BI) Platforms, Databases, Database Optimization, Datadog, Database Modeling, Data Analysis, Relational Databases, Data, ELT

Data Engineer

2020 - 2021
Yara International
  • Worked independently on developing and improving ETL pipelines for event-based data from multiple sources. The warehouse consolidated approximately 50 data sources into a single schema after cleanup and normalization (a sketch of the normalization step follows this entry).
  • Migrated the old ETL pipeline, which covered a few data sources, to a new stack covering more data sources.
  • Improved the overall performance of the ETL pipelines by rewriting Redshift SQL queries to be more performant.
  • Built multiple DAGs and orchestrated them using Airflow.
Technologies: Data Engineering, Python, SQL, Amazon Web Services (AWS), Apache Kafka, Apache Airflow, PostgreSQL, Redshift, Segment.io, Data Architecture, Databases, Data Pipelines, Database Optimization, Data Analysis, Data Build Tool (dbt), PySpark, Relational Databases, Data, ELT, AWS Glue
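
As a small illustration of the consolidation step referenced in this entry, the sketch below maps source-specific event payloads onto one common schema; the source names and field mappings are hypothetical.

```python
# Sketch: normalize events from different sources into one common schema
# before loading. Source names and field mappings are hypothetical.
from datetime import datetime, timezone

# Per-source mapping from the source's field names to the common schema.
FIELD_MAPS = {
    "web_app": {"uid": "user_id", "event": "event_name", "ts": "event_time"},
    "mobile": {"userId": "user_id", "type": "event_name", "timestamp": "event_time"},
}


def normalize_event(source: str, raw: dict) -> dict:
    mapping = FIELD_MAPS[source]
    event = {common: raw.get(src) for src, common in mapping.items()}
    event["source"] = source
    event["loaded_at"] = datetime.now(timezone.utc).isoformat()
    return event


if __name__ == "__main__":
    print(normalize_event(
        "mobile",
        {"userId": 42, "type": "signup", "timestamp": "2021-05-01T10:00:00Z"},
    ))
```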

AWS Redshift Expert

2020 - 2020
CartHook Inc
  • Independently designed and developed the complete DWH for analytical reporting.
  • Designed and developed ETL pipelines for near real-time data transformation, used by a custom dashboard and QuickSight.
  • Optimized database and query performance so that massive operations complete in milliseconds, lowering the cost of the Redshift infrastructure.
Technologies: SQL, Redshift, Amazon Web Services (AWS), ETL, SOS Berlin Scheduler, AWS Glue, Data Architecture, Stored Procedure, Business Intelligence (BI) Platforms, Databases, Data Pipelines, Database Optimization, Datadog, Database Modeling, Data Analysis, Relational Databases, Data, ELT

Data Engineer

2020 - 2020
PepsiCo Global - PepsiCo International Limited
  • Worked on a POC application for the UK region to find the stores where PepsiCo products are best displayed on shelves for sale; the product was called Perfect Store.
  • Developed the data pipeline used to process the large volumes of images and data captured in each store and transform them into insights for setting up the Perfect Store product for PepsiCo.
  • Used Azure Databricks, Data Factory, and PySpark to develop pipelines for processing and enriching data from Nielsen and Trax (a minimal PySpark sketch follows this entry).
Technologies: Data Engineering, Python, Databricks, Azure Data Factory, PySpark, Azure Databricks, Microsoft Azure, Azure, Databases, Data Pipelines, Technical Architecture, Relational Databases, Data, ELT
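
A minimal PySpark sketch of the enrich-and-join pattern referenced in this entry: store-level sales joined to shelf-audit data and written back to curated storage. Paths, schemas, and column names are hypothetical.

```python
# Sketch: join sales data with shelf-audit data and score stores.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("perfect_store_poc_sketch").getOrCreate()

sales = spark.read.parquet("dbfs:/mnt/raw/nielsen_sales/")       # hypothetical path
audits = spark.read.parquet("dbfs:/mnt/raw/trax_shelf_audits/")  # hypothetical path

store_metrics = (
    sales.groupBy("store_id")
    .agg(F.sum("revenue").alias("total_revenue"))
    .join(audits.select("store_id", "shelf_share"), on="store_id", how="left")
    .withColumn(
        "score",
        F.col("total_revenue") * F.coalesce(F.col("shelf_share"), F.lit(0.0)),
    )
)

store_metrics.write.mode("overwrite").parquet("dbfs:/mnt/curated/store_scores/")
```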

Senior Technical Architect

2011 - 2019
Nexgen Technology Services Pvt Ltd
  • Designed and developed ETL customizations for US retailers' retail merchandising systems, supporting their daily business analytics on an Oracle database using ORDM, OWB, PL/SQL, and Oracle Scheduler. Coded the complex business logic in SQL and PL/SQL.
  • Designed and developed the ETL for an AWS-cloud-based DWH using Amazon Redshift, integrating data from multiple sources, such as flat files on SFTP, Amazon S3, Google Analytics extracts, and IBM Silverpop, into one source of truth (a sketch of the S3-to-Redshift load pattern follows this entry).
  • Rigorously used open-source technologies like Python, TOS DI, the SOS scheduler, and others to minimize the cost of operations.
  • Created a well-maintained, end-to-end architecture for data flows from different sources that execute with minimal or no user interaction.
  • Performed day-to-day maintenance and advisory tasks on multiple platforms, including Unix, Linux, Windows, AWS, Redshift, and Oracle, along with other database administration activities.
  • Implemented performance tuning of queries and code as and when required.
  • Led the team to sort out the issues on all technical aspects of the database and ETL-related tasks.
  • Designed and developed the data model for a large project related to the labor management in retail.
Technologies: Amazon Web Services (AWS), ETL, SOS Berlin Scheduler, Database Administration (DBA), Business Intelligence (BI), Data Warehouse Design, Data Warehousing, Data Modeling, Talend, Python, PostgreSQL, Redshift, Oracle SQL, Data Engineering, Oracle PL/SQL, SQL, PL/pgSQL, Amazon S3 (AWS S3), Technical Architecture, ETL Implementation & Design, Database Architecture, RDBMS, Data Architecture, Stored Procedure, Business Intelligence (BI) Platforms, Databases, Data Pipelines, Database Optimization, Database Modeling, Oracle, Relational Databases, Data
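
A minimal sketch of the S3-to-Redshift load pattern referenced in this role: stage a flat file in S3 and load it with COPY. The bucket, table, and IAM role are hypothetical placeholders.

```python
# Sketch: upload a flat file to S3 and COPY it into Redshift.
# Bucket, table, and IAM role are hypothetical.
import boto3
import psycopg2

BUCKET = "example-staging-bucket"
IAM_ROLE = "arn:aws:iam::123456789012:role/example-redshift-copy"  # hypothetical


def load_file(local_path: str, key: str, dsn: str) -> None:
    boto3.client("s3").upload_file(local_path, BUCKET, key)

    copy_sql = f"""
        COPY staging.daily_sales
        FROM 's3://{BUCKET}/{key}'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS CSV
        IGNOREHEADER 1;
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(copy_sql)
```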

Technical Writer

2016 - 2018
IAmOnDemand (via Toptal)
  • Wrote approximately 15 articles for technology audiences such as CIOs, database administrators, developers, and cloud architects.
  • Wrote several in-depth articles (5-10 pages each) with tables of contents; all were published and read by thousands. Topics include cloud skill sets, AWS Redshift, RDS vs. on-premise DBaaS, and Aurora vs. RDS, to name a few.
  • Fact-checked the article content and ensured it was not plagiarized.
Technologies: Amazon Web Services (AWS), DevOps, Database as a Service (DBaaS), Redshift, Databases

Senior Consultant

2005 - 2009
Capgemini Consulting India Pvt Ltd
  • Supported a large Java development team of 50 or more people by writing Oracle database queries and creating views, procedures, and functions. Worked as part of the core database team to deliver different use cases.
  • Created hundreds of Oracle procedures and packages for all DML operations for one of the top public-sector clients in the Netherlands, using dynamic SQL to speed up development.
  • Designed and worked on the deliverables for one of the top public-sector companies in the Netherlands, which later resulted in greater monetary gains for the organization.
  • Worked as the only DBA to support all the instances of the Oracle databases used by the projects. Tasks involved setting up the database, loading data, and tuning the performance for the development team.
  • Worked on site with a client in the Netherlands for requirement gathering and deployment of projects.
  • Worked as a moderator to deliver a complex and challenging project involving a near real-time DWH, using Oracle Streams and custom code to load data from the OLTP environment into the OLAP environment.
Technologies: Shell Scripting, Database Administration (DBA), Erwin, Oracle Streams, Toad, Linux, AIX, PL/SQL, Oracle SQL, Stored Procedure, Databases, Database Modeling, Oracle PL/SQL, Oracle, Relational Databases, Data

RMS Connector

RMS Connector works as an interface between a retail merchandising system (RMS) and an Oracle retail data warehouse (ORDM).

It easily integrates the RMS data feeds into ORDM for all levels of sales and inventory reporting and is used by the top retailers in the US and Central America.

Work Done:
• Built all the ETL and data flows from flat files to the Oracle database using Oracle Warehouse Builder.
• Developed PL/SQL packages and procedures for all new files from RMS to ORDM.
• Made recommendations and was involved in the planning and setup of the database architecture for production.
• Tuned the database and report performance.
• Set up ETL automation using Oracle Scheduler Chains.

Customer Segmentation

This project is a SaaS application running on AWS.

The architecture includes:
• Redshift: for the storage and reporting of data
• Python: for data processing based on user input
• SOS Berlin Scheduler: a scheduler for asynchronously executing Python scripts based on user inputs
• UI: for user input
• Tableau: for reporting on segments created
• Environment: EC2, Redshift

My job was to design and develop the end-to-end data flow and the required APIs for data processing.

Flow:
1) The user creates a segment model by selecting different customer-related KPIs and submits the job.
2) The UI submits the job by calling the SOS REST API, which invokes the Python library for customer segmentation.
3) The library examines the input provided in the REST call and, based on that, decides the next processing step (a sketch of this step follows below).
4) The data is then processed and is ready to be picked up by Tableau.

An initial data load is required for customer transactions and KPI preparation.
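
A minimal sketch of step 3 above: the payload from the REST call selects KPI filters, and the script materializes a segment table in Redshift for Tableau. The KPI names, schema, and table names are hypothetical, and identifier validation is omitted for brevity.

```python
# Sketch: build a customer segment table from the REST payload.
# KPI names, schema, and table names are hypothetical.
import psycopg2

ALLOWED_KPIS = {"total_spend", "visit_count", "days_since_last_visit"}  # hypothetical


def build_segment(payload: dict, dsn: str) -> None:
    filters, params = [], []
    for kpi in payload["kpis"]:
        if kpi["name"] not in ALLOWED_KPIS:
            raise ValueError(f"unsupported KPI: {kpi['name']}")
        filters.append(f"{kpi['name']} >= %s")
        params.append(kpi["min_value"])

    # Note: segment_name should be validated as an identifier in real code.
    sql = f"""
        CREATE TABLE segments.{payload['segment_name']} AS
        SELECT customer_id
        FROM analytics.customer_kpis
        WHERE {' AND '.join(filters)}
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(sql, params)


# Example payload as it might arrive from the UI via the REST call:
# build_segment({"segment_name": "high_value_q1",
#                "kpis": [{"name": "total_spend", "min_value": 500}]}, dsn="...")
```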

Store Operations

This is a mobile application that gives store managers a near real-time view of store performance in terms of sales, traffic, conversion, and associate contributions.

Work done:
• Developed the layered data model, including the dimensions, facts, and aggregates.
• Built ETL procedures using Talend, PLINK, Python, and the SOS Berlin Scheduler.
• Wrote shell scripts to manage the data feeds and Python scripts to process the files from Amazon S3 into a Redshift database.

Labor Scheduler

Labor Scheduler helps the store manager connect predicted traffic demand with staff productivity and availability to optimize conversion.

Work done:
• Created the end-to-end data model.
• Developed the procedures and APIs for the data operation from the REST APIs.
• Implemented version control in the database rows (see the sketch after this list).
• Set up AWS RDS PostgreSQL to keep costs within parameters and get high throughput.
• Set up data interactions between multiple databases using PostgreSQL database links to a Redshift database for extracting analytical data.
• Rigorously used PL/pgSQL, Python, PostgreSQL, and Redshift for managing data.
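
A minimal sketch of the row-versioning approach noted in the list above: each change closes the current row version and inserts a new one instead of updating in place. Table and column names are hypothetical.

```python
# Sketch: insert a new row version and close the previous one.
# Table and column names are hypothetical.
import psycopg2

VERSIONED_UPSERT = """
    UPDATE schedules
       SET valid_to = now()
     WHERE schedule_id = %(schedule_id)s
       AND valid_to IS NULL;

    INSERT INTO schedules (schedule_id, version, payload, valid_from, valid_to)
    SELECT %(schedule_id)s,
           COALESCE(MAX(version), 0) + 1,
           %(payload)s,
           now(),
           NULL
      FROM schedules
     WHERE schedule_id = %(schedule_id)s;
"""


def save_new_version(dsn: str, schedule_id: int, payload: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(VERSIONED_UPSERT, {"schedule_id": schedule_id, "payload": payload})
```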

IX Marketing

The IX Marketing module helps retailers understand and make the right decision at the right time to increase sales with customers online and in the store.

Work done:
• Created the data model and set up the environment using the Redshift database.
• Extracted and loaded data from multiple sources like SFTP, Amazon S3, Google Analytics API, and IBM Silverpop.
• Created Python scripts to automate data loading.

Ministrie Van Defencie

MOD is the biggest employer in the Netherlands, with nearly 90,000 employees worldwide.

Work done:
• Implemented the new design recommended during the POC.
• Installed Oracle streams between Oracle 10g and Oracle 9i databases for real-time extractions.
• Developed the new logic for ETL processes for near real-time transformation in ODM.
• Set up batch jobs for the periodical load of transformed data into CDM from ODM for Cognos reporting.
• Configured selected PeopleSoft HRMS tables for the Streams configuration.
• Performance-tuned their current system.

Eneco Energies

Eneco Energies is one of the largest energy companies in the Netherlands.

Work done:
• Successfully designed and implemented an MVS system into the SOA-enabled architecture.
• Tuned the physical model for performance improvement in ETL processes.
• Successfully segregated all the objects pertaining to one functional area into separate databases amounting to 300GB out of 2TB.
• Worked on data modeling, physical design, and database administration.
• Performance-tuned the largest tables with up to 150 partitions, amounting to 400GB alone.
• Developed a mechanism to automate the setup of a testing environment.

Policy Administration System of the Netherlands

The policy administration system interfaces primarily with the tax and administration department and is integral to reintegrating work processes and employee tax/income information processing.

Role: Database Team Member | DBA
Work Done:
• Worked on all activities of an Oracle DBA and developer, from logical and physical design, administration, PL/SQL, and scripting to communication with the front office about the CRs and use cases.
• Created and modified the DSS and OLTP physical data model.
• Performed database administration—sizing, backup recovery strategy planning, and implementation.
• Handled database design and administration.
• Handled database maintenance and release activities.
• Performed performance tuning.

Global FieldID

http://varda.ag
FieldID is a centralized database for processing millions of boundaries and assigning a unique ID to each boundary and its associated field. FieldID's primary goal is to provide field-level agronomy data that farmers and food producers can use to make informed farming decisions and control carbon compliance.

I designed the data model and architecture flow and built the data pipelines. Tools and technologies used: Python, SQL, PL/pgSQL, PostgreSQL, Redshift, Confluent Kafka, Apache Airflow, PostGIS, etc.
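
For illustration only, the sketch below shows one way a deterministic boundary ID could be derived by normalizing a geometry and hashing its WKT; this is an assumption made for the example, not the actual Global FieldID algorithm.

```python
# Illustrative sketch: derive a deterministic ID for a boundary by rounding
# its coordinates and hashing the normalized WKT. Not the product's algorithm.
import hashlib

from shapely import wkt
from shapely.geometry import shape


def boundary_id(geojson_geometry: dict, precision: int = 6) -> str:
    geom = shape(geojson_geometry)
    # Round coordinates so trivially different uploads of the same boundary
    # normalize to the same representation.
    normalized = wkt.loads(wkt.dumps(geom, rounding_precision=precision))
    return hashlib.sha256(normalized.wkt.encode()).hexdigest()[:16]


if __name__ == "__main__":
    square = {
        "type": "Polygon",
        "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
    }
    print(boundary_id(square))
```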

Field Stories

http://varda.ag
Field Stories gives end users access to large, processed agronomy datasets that farmers or food producers can purchase. The tool provides a GUI with maps and boundaries, showing different biological data about the soil of selected fields.

Using this data, the farmers can make informed decisions about using chemicals and fertilizers to achieve optimal cultivation and increased harvest. I analyzed the data and built models and pipelines for data from different sources.

I was also responsible for transforming the data and mapping it with different Global Boundary IDs to provide easy access from the GUI.

Shop Circle

Shop Circle is a startup focused on enhancing Shopify marketplace applications. They buy already running, well-performing applications and upscale their capabilities by capturing and mining event data from Shopify.

I worked on an internal but very important dashboard that business owners use to identify app performance and then suggest business improvements to the client.

I built the data model for the KPIs used to create dashboards and charts in Tableau Online, built the transformations using SQL, dbt, Snowflake, and Python, and developed charts for multiple KPIs such as MRR, ARR, Churn, APR, and Cohorts (a sketch of the MRR calculation follows below).
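
A minimal sketch of the MRR calculation referenced above, computed from subscription charge events with pandas; the column names and sample data are hypothetical.

```python
# Sketch: compute a monthly MRR series and month-over-month growth from
# charge events. Column names and sample data are hypothetical.
import pandas as pd

charges = pd.DataFrame({
    "shop_id": [1, 1, 2, 2, 3],
    "charged_at": pd.to_datetime(
        ["2022-01-15", "2022-02-15", "2022-01-20", "2022-02-20", "2022-02-01"]),
    "amount": [29.0, 29.0, 79.0, 79.0, 29.0],
})

charges["month"] = charges["charged_at"].dt.to_period("M")
kpis = charges.groupby("month")["amount"].sum().rename("mrr").to_frame()
kpis["mom_growth"] = kpis["mrr"].pct_change()  # simple derived KPI
print(kpis)
```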

Perfect Store

The Perfect Store project was a high-stakes project to identify the best-performing stores based on parameters such as how PepsiCo products are placed inside a particular store and the overall sales performance across all stores.

The POC identified the best-performing stores and then captured shelf and product placement, competitor product sales, and the number and location of shelf rows. All this data was then used to adjust the other stores so that similar sales could be achieved. I built the data pipelines using the Azure stack: Azure Databricks, Azure Data Lake, and Azure Data Factory.

Education

2005 - 2007

Master's Degree in Computer Science

Vinayaka Missions University - Patna, India

Certifications

SEPTEMBER 2013 - PRESENT

1Z0-052 Oracle Database 11g Admin - 1

Oracle University

Libraries/APIs

PySpark, Segment.io

Tools

AWS Deployment, SOS Berlin Scheduler, Toad, Erwin, Postman, Amazon QuickSight, Amazon CloudWatch, AWS IAM, Talend ETL, Apache Airflow, Confluence, Jira, Grafana, Periscope Data, AWS Glue, Tableau, Terraform, Jenkins

Languages

SQL, Snowflake, Stored Procedure, PL/pgSQL, Python, Python 3

Paradigms

ETL, ETL Implementation & Design, Business Intelligence (BI), Serverless Architecture, DevOps

Platforms

Amazon Web Services (AWS), Linux, Azure, AWS Lambda, Unix, Amazon EC2, Windows, Talend, AIX, Databricks, Docker, Apache Kafka, Shopify, Oracle

Storage

Database as a Service (DBaaS), PostgreSQL, Oracle PL/SQL, Redshift, Oracle Rdb, PL/SQL, JSON, Database Architecture, RDBMS, Data Pipelines, Databases, MySQL, Amazon Aurora, Oracle DBA, Relational Databases, Amazon S3 (AWS S3), Datadog, Oracle SQL, Database Administration (DBA), PostGIS, Database Modeling, MongoDB

Frameworks

Spark

Other

Data Warehousing, Data Warehouse Design, Technical Architecture, Data Analysis, Writing & Editing, Data Modeling, Data Engineering, Performance Tuning, CSV, Amazon RDS, Data Architecture, Business Intelligence (BI) Platforms, Database Optimization, Data Analytics, Data, ELT, Oracle Streams, Shell Scripting, Virtualization, APIs, Lambda Functions, Data Science, Data Visualization, BI Reporting, eCommerce, Geospatial Data, Back-end Development, Exploratory Data Analysis, MySQL DBA, Data Build Tool (dbt), DocumentDB, Uber H3, Azure Data Factory, Computer Science, Azure Databricks, Microsoft Azure
