Syed Akbar Naqvi
Verified Expert in Engineering
Data Engineer and Software Developer
Syed has over 18 years of experience working as a database developer, data engineer, data architect, and data analyst in the banking, insurance, retail, and agronomy sectors. He has designed and developed solutions for high-performance, multi-terabyte data warehouses (DWHs) on various technology stacks, including Oracle, SQL, PL/SQL, PostgreSQL, Redshift, AWS, Python, PySpark, Kafka, and other data tools. Syed is always excited about challenging projects where he can contribute to mutual success.
Amazon Web Services (AWS), Amazon EC2, Linux, PL/SQL, Python 3, Apache Airflow, Apache Kafka, Redshift, Snowflake, Amazon RDS
The most amazing thing I've done was a real-time DWH labor scheduler that involved multiple databases and environments, with code that interacted with data from different sources.
- Designed the architecture of the synchronous and asynchronous flows in the main REST API for ingesting geospatial data from files and user input.
- Worked on multiple modules using Python, SQL, PostgreSQL, and PostGIS to validate incoming data and process large GeoJSON files and shapefiles.
- Built the data model for storing boundary and field ID-related data. This data model serves as the main source of truth for all incoming and outgoing data.
- Worked on a proof-of-concept (POC) data warehouse in Snowflake for bulk data delivery and monitoring.
- Designed and developed the data model for geometric data and built a data pipeline using Python, SQL, Confluent Kafka, dbt, and Airflow to process data in near real time.
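The incoming-data validation mentioned above can be illustrated with a minimal sketch. It assumes each upload is a single GeoJSON Polygon geometry and checks only the structural rules from RFC 7946 (closed rings of at least four positions, longitude/latitude ranges). The helper name `validate_polygon` is hypothetical; full geometric validity (e.g., self-intersections) would more likely be checked in PostGIS with `ST_IsValid`, and very large files would need streaming rather than a single `json.loads`.

```python
import json


def validate_polygon(geojson_text: str) -> list[str]:
    """Return a list of structural problems in a GeoJSON Polygon geometry.

    Hypothetical helper: checks RFC 7946 ring rules and WGS 84 ranges only;
    it does not detect self-intersections (PostGIS ST_IsValid does).
    """
    try:
        geom = json.loads(geojson_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    if not isinstance(geom, dict) or geom.get("type") != "Polygon":
        return ["geometry type must be 'Polygon'"]

    rings = geom.get("coordinates")
    if not isinstance(rings, list) or not rings:
        return ["coordinates must be a non-empty list of linear rings"]

    errors: list[str] = []
    for i, ring in enumerate(rings):
        # A linear ring needs at least four positions, and it must be closed.
        if not isinstance(ring, list) or len(ring) < 4:
            errors.append(f"ring {i}: needs at least 4 positions")
            continue
        if ring[0] != ring[-1]:
            errors.append(f"ring {i}: first and last positions differ")
        for pos in ring:
            # Each position is [longitude, latitude, (optional altitude)].
            if not (
                isinstance(pos, list)
                and len(pos) >= 2
                and all(isinstance(c, (int, float)) for c in pos[:2])
            ):
                errors.append(f"ring {i}: malformed position {pos!r}")
                break
            lon, lat = pos[0], pos[1]
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                errors.append(f"ring {i}: position {pos!r} outside WGS 84 range")
                break
    return errors


square = '{"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}'
print(validate_polygon(square))  # []
```

Returning a list of problems instead of raising on the first one lets an ingestion API report every issue in a file back to the user in a single response.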
Senior Data Architect
- Developed a Python library to extract data from Shopify APIs and load it into a Snowflake staging table.
- Administered Snowflake, creating and modifying database objects such as users, schemas, tables, views, and permissions.
- Designed and developed a data model for Shopify event data and built KPIs for the Tableau dashboard.
- Built dbt models, written extensively in SQL, to transform data and load it into Snowflake for reporting.
- Orchestrated the ETL pipeline with Airflow DAGs to run hourly and process only the delta data.
- Built multiple KPIs, such as ARR, MRR, cohorts, churn, and active users, and created complex charts based on them.
- Created multiple dashboards and charts in Tableau Desktop and deployed them on Tableau Cloud.
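The hourly delta processing described above commonly relies on a high-water mark: each run picks up only the records changed since the previous run's watermark and then advances it. The sketch below shows that pattern in plain Python, assuming each record carries a timezone-aware `updated_at` timestamp; the `extract_delta` name and record shape are hypothetical. In an Airflow task, the same filter would typically be pushed down as a SQL `WHERE updated_at > ...` clause, with the watermark persisted between runs.

```python
from datetime import datetime, timezone
from typing import Iterable


def extract_delta(
    records: Iterable[dict], watermark: datetime
) -> tuple[list[dict], datetime]:
    """Return records updated after `watermark`, plus the new watermark.

    Hypothetical helper: the new watermark is the latest `updated_at`
    in the delta, or the old watermark when nothing changed.
    """
    delta = [r for r in records if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in delta), default=watermark)
    return delta, new_watermark


# Example run: the previous run's watermark is 10:00 UTC.
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 10, 15, tzinfo=timezone.utc)},
]
delta, new_wm = extract_delta(rows, datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc))
print([r["id"] for r in delta])  # [2]
print(new_wm.isoformat())  # 2024-01-01T10:15:00+00:00
```

The strict `>` comparison avoids reprocessing the boundary row, but a late-arriving record stamped exactly at the watermark would be skipped; many pipelines use `>=` together with idempotent upserts (e.g., `MERGE` in Snowflake) to close that gap.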
Yara International - DNU - Varda (formerly Shared Data Exchange: SDX (ODX))
- Participated in major design decisions for pipelines that process large geospatial datasets by reading from Redshift and pushing data to Amazon DocumentDB and an Amazon S3 bucket for read-only access.
- Developed complex pipelines to build a data catalog for use with the web UI.
- Led the design of the data model for soil sample data in Amazon DocumentDB and developed the pipeline orchestrated with Apache Airflow.
AWS QuickSight Expert
- Developed data marts and queries for reports and dashboards, using SQL and Python to pre-aggregate data.
- Built multiple charts, including ADU, MRU, cohort, ARR, and MRR.
- Designed and developed high-performance queries on existing Redshift tables so that results appear within a second of a click.
- Worked independently on developing and improving ETL pipelines for event-based data from multiple sources. The warehouse ingested approximately 50 data sources, consolidated into a single table after cleanup and normalization.
- Migrated the legacy ETL pipeline, which handled only a few data sources, to a new stack supporting more sources.
- Improved overall ETL pipeline performance by rewriting Redshift SQL queries for efficiency.
- Built multiple DAGs and orchestrated them using Airflow.
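Several of the KPIs named in this profile (MRR, ARR, churn) follow widely used SaaS definitions: MRR is active subscription revenue normalized to one month, ARR is commonly taken as MRR times 12, and customer churn rate is the share of customers present at the start of a period who are gone by its end. The sketch below assumes a hypothetical `Subscription` record; real definitions vary by business (treatment of discounts, trials, and annual plans), so this is an illustration of the formulas rather than the KPI logic used in these projects.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    customer_id: str
    amount: float        # price charged per billing period
    period_months: int   # 1 for monthly plans, 12 for annual plans
    active: bool


def mrr(subs: list[Subscription]) -> float:
    """Monthly recurring revenue: active revenue normalized to one month."""
    return sum(s.amount / s.period_months for s in subs if s.active)


def arr(subs: list[Subscription]) -> float:
    """Annual recurring revenue, using the common ARR = MRR * 12 convention."""
    return mrr(subs) * 12


def churn_rate(customers_at_start: set[str], customers_at_end: set[str]) -> float:
    """Share of customers present at the start who are gone at the end."""
    if not customers_at_start:
        return 0.0
    lost = customers_at_start - customers_at_end
    return len(lost) / len(customers_at_start)


subs = [
    Subscription("a", 100.0, 1, True),
    Subscription("b", 1200.0, 12, True),
    Subscription("c", 50.0, 1, False),
]
print(mrr(subs))  # 200.0
print(arr(subs))  # 2400.0
print(churn_rate({"a", "b", "c"}, {"a", "b"}))
```

In a warehouse, the same logic would usually live in a pre-aggregated SQL data mart so that dashboards only read the results.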
AWS Redshift Expert
- Designed and developed the complete DWH for analytical reporting independently.
- Designed and developed ETL pipelines for near-real-time data transformation, feeding a custom dashboard and Amazon QuickSight.
- Optimized database and query performance so that large operations completed in milliseconds, lowering Redshift infrastructure costs.
PepsiCo Global - PepsiCo International Limited
- Worked on Perfect Store, a POC application for the UK region that identified the best-performing stores where PepsiCo products are displayed on shelves for sale.
- Developed the data pipeline that processed large volumes of images and data captured in each store and transformed them into insights for the Perfect Store product.
- Used Azure Databricks, Data Factory, and PySpark to develop pipelines for processing and enriching data from Nielsen and Trax.
Senior Technical Architect
Nexgen Technology Services Pvt Ltd
- Designed and developed ETL customizations of the retail merchandising system for US retailers' daily business analytics on Oracle Database, using ORDM, OWB, PL/SQL, and Oracle Scheduler. Coded the complex business logic in SQL and PL/SQL.
- Designed and developed the ETL for an AWS-based DWH on Amazon Redshift. Integrated data from multiple sources, such as flat files on SFTP, Amazon S3, Google Analytics extracts, and IBM Silverpop, into a single source of truth.
- Extensively used open-source technologies such as Python, TOS DI, and SOS scheduler to minimize operating costs.
- Created a well-maintained end-to-end architecture for data flows from different sources that runs independently with little or no user interaction.
- Performed day-to-day maintenance, made recommendations, and carried out database administration across multiple platforms, including Unix, Linux, Windows, AWS, Redshift, and Oracle.
- Tuned the performance of queries and code as needed.
- Led the team in resolving technical issues across database and ETL tasks.
- Designed and developed the data model for a large retail labor-management project.
IAmOnDemand (via Toptal)
- Wrote approximately 15 articles for technology professionals such as CIOs, database administrators, developers, and cloud architects.
- Wrote several in-depth articles (5-10 pages long, with tables of contents). All articles were published and read by thousands. Topics included cloud skill sets, AWS Redshift, RDS vs. on-premises DBaaS, and Aurora vs. RDS.
- Fact-checked all article content and ensured it was not plagiarized.
Capgemini Consulting India Pvt Ltd
- Supported a Java development team of 50+ people by writing Oracle database queries and creating views, procedures, and functions. Worked as part of the core database team to deliver various use cases.
- Created hundreds of Oracle procedures and packages covering all DML operations for one of the top public-sector (PSU) clients in the Netherlands, using dynamic SQL to speed up development.
- Designed and delivered work for one of the top PSU-sector companies in the Netherlands, which later brought greater monetary gain to the organization.
- Served as the sole DBA supporting all Oracle database instances used by the projects, including database setup, data loading, and performance tuning for the development team.
- Worked on-site with a client in the Netherlands on requirements gathering and project deployment.
- Worked as a moderator on a complex and challenging near-real-time DWH project that used Oracle Streams and custom programming to load data from the OLTP environment into the OLAP environment.
The solution integrates RMS data feeds into ORDM for sales and inventory reporting at all levels and is used by top retailers in the US and Central America.
• Built all the ETL and data flows from flat files to the Oracle database using Oracle Warehouse Builder.
• Developed PL/SQL packages and procedures for all new files from RMS to ORDM.
• Made recommendations and was involved in the planning and setup of the database architecture for production.
• Tuned the database and report performance.
• Set up ETL automation using Oracle Scheduler Chains.
The architecture includes:
• Redshift: for the storage and reporting of data
• Python: for data processing based on user input
• SOS Berlin Scheduler: a scheduler for executing Python scripts asynchronously based on user input
• UI: for user input
• Tableau: for reporting on segments created
• Environment: EC2, Redshift
My job was to design and develop the end-to-end data flow and the required APIs for data processing.
1) The user creates a segment model by selecting customer-related KPIs and submits the job.
2) The job is submitted through a REST call to the SOS scheduler, which runs the Python library for customer segmentation.
3) The library examines the input provided in the REST call and decides the next step based on it.
4) The data is processed and made ready for Tableau to pick up.
An initial data load is required for customer transactions and KPI preparation.
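Step 3 above, where the library examines the REST input and acts on it, can be sketched as validating a request body against a whitelist of KPIs and operators and then applying the selected rules. The payload schema (`segment`, `rules`, `kpi`, `op`, `value`), the KPI names, and the function names are hypothetical; in the described system, the filtering would more likely be pushed down to Redshift as SQL rather than done in Python memory.

```python
import json
import operator

# Hypothetical whitelist of KPIs a user may select in the UI.
ALLOWED_KPIS = {"total_spend", "order_count", "days_since_last_order"}
# Comparison operators accepted in a rule, mapped to Python functions.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}


def parse_request(body: str) -> dict:
    """Validate a segmentation request body and return it as a dict."""
    payload = json.loads(body)
    if not payload.get("segment"):
        raise ValueError("missing segment name")
    rules = payload.get("rules") or []
    if not rules:
        raise ValueError("at least one KPI rule is required")
    for rule in rules:
        if rule.get("kpi") not in ALLOWED_KPIS:
            raise ValueError(f"unknown KPI: {rule.get('kpi')!r}")
        if rule.get("op") not in OPS:
            raise ValueError(f"unsupported operator: {rule.get('op')!r}")
    return payload


def segment_customers(payload: dict, customers: list[dict]) -> list[str]:
    """Return IDs of customers that satisfy every rule (logical AND)."""
    return [
        c["customer_id"]
        for c in customers
        if all(OPS[r["op"]](c[r["kpi"]], r["value"]) for r in payload["rules"])
    ]


request = json.dumps({
    "segment": "loyal_high_spenders",
    "rules": [
        {"kpi": "total_spend", "op": ">=", "value": 500},
        {"kpi": "order_count", "op": ">", "value": 5},
    ],
})
customers = [
    {"customer_id": "c1", "total_spend": 820, "order_count": 9, "days_since_last_order": 3},
    {"customer_id": "c2", "total_spend": 640, "order_count": 2, "days_since_last_order": 40},
    {"customer_id": "c3", "total_spend": 120, "order_count": 7, "days_since_last_order": 10},
]
print(segment_customers(parse_request(request), customers))  # ['c1']
```

Whitelisting KPI names and operators before use also matters when the rules are later translated into SQL, since it keeps user-supplied strings out of the generated query text.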
• Developed the layered data model, including dimensions, facts, and aggregates.
• Built ETL procedures using Talend, PLINK, Python, and the SOS Berlin Scheduler.
• Wrote shell scripts to manage the data feeds, which used Python scripts to load files from Amazon S3 into the Redshift database.
• Created the end-to-end data model.
• Developed the procedures and APIs for data operations invoked through the REST APIs.
• Implemented row-level versioning in the database.
• Set up Amazon RDS for PostgreSQL to keep costs within budget while achieving high throughput.
• Set up data exchange between multiple databases, using PostgreSQL database links to Redshift to extract analytical data.
• Extensively used PL/pgSQL, Python, PostgreSQL, and Redshift for data management.
• Created the data model and set up the environment using the Redshift database.
• Extracted and loaded data from multiple sources like SFTP, Amazon S3, Google Analytics API, and IBM Silverpop.
• Created Python scripts to automate data loading.
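Row-level versioning, as listed above, is often implemented in the style of a Type 2 slowly changing dimension: an update closes the current row and inserts a new version, so history is never overwritten. The sketch below uses Python's built-in `sqlite3` so it runs standalone; the original work used PostgreSQL and PL/pgSQL, and the `customer_version` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_version (
        customer_id TEXT    NOT NULL,
        version     INTEGER NOT NULL,
        email       TEXT,
        is_current  INTEGER NOT NULL DEFAULT 1,
        changed_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (customer_id, version)
    )
    """
)


def upsert_versioned(conn: sqlite3.Connection, customer_id: str, email: str) -> int:
    """Close the current row (if any) and insert a new version; return its number."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT MAX(version) FROM customer_version WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()
        next_version = (row[0] or 0) + 1
        conn.execute(
            "UPDATE customer_version SET is_current = 0 "
            "WHERE customer_id = ? AND is_current = 1",
            (customer_id,),
        )
        conn.execute(
            "INSERT INTO customer_version (customer_id, version, email) VALUES (?, ?, ?)",
            (customer_id, next_version, email),
        )
    return next_version


upsert_versioned(conn, "c1", "old@example.com")
upsert_versioned(conn, "c1", "new@example.com")
for r in conn.execute(
    "SELECT version, email, is_current FROM customer_version ORDER BY version"
):
    print(r)
# (1, 'old@example.com', 0)
# (2, 'new@example.com', 1)
```

Wrapping the close-and-insert steps in one transaction keeps exactly one current row per key; under concurrent writers in PostgreSQL, a row lock (e.g., `SELECT ... FOR UPDATE`) or a partial unique index on current rows would also be needed.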
Ministerie van Defensie
• Implemented the new design recommended during the POC.
• Installed Oracle Streams between Oracle 10g and Oracle 9i databases for real-time extraction.
• Developed new ETL logic for near-real-time transformation in the ODM.
• Set up batch jobs for the periodic loading of transformed data from the ODM into the CDM for Cognos reporting.
• Configured selected PeopleSoft HRMS tables for the Streams configuration.
• Performance-tuned the existing system.
• Designed and implemented the integration of an MVS system into the SOA-enabled architecture.
• Tuned the physical model to improve ETL performance.
• Segregated all objects belonging to one functional area into separate databases, moving 300 GB out of the 2 TB total.
• Worked on data modeling, physical design, and database administration.
• Performance-tuned the largest tables, with up to 150 partitions and 400 GB of data in those tables alone.
• Developed a mechanism to automate the setup of a testing environment.
Policy Administration System of the Netherlands
Role: Database Team Member | DBA
• Worked on all activities of an Oracle DBA and developer, from logical and physical design, administration, PL/SQL, and scripting to communicating with the front office about change requests (CRs) and use cases.
• Created and modified the DSS and OLTP physical data models.
• Performed database administration, including sizing and the planning and implementation of the backup and recovery strategy.
• Database design and administration.
• Database maintenance and release activities.
I designed the data model and architecture flow and built the data pipelines. Tools and technologies used: Python, SQL, PGSQL, PostgreSQL, Redshift, Confluent Kafka, Apache Airflow, and PostGIS, among others.
Using this data, farmers can make informed decisions about chemical and fertilizer use to optimize cultivation and increase harvests. I analyzed the data and built models and pipelines for data from different sources.
I was also responsible for transforming the data and mapping it to the corresponding Global Boundary IDs for easy access from the GUI.
I worked on an internal but important dashboard that business owners use to assess app performance and suggest business improvements to the client.
I built the data model for the KPIs used in dashboards and charts in Tableau Online, and built the transformations using SQL, dbt, Snowflake, and Python. I developed charts for multiple KPIs, including MRR, ARR, churn, APR, and cohorts.
The POC was designed to identify the best-performing store and then analyze its shelf and product placement, competitor product sales, and shelf rows and locations. This data was then used to set up and adjust other stores to achieve similar sales. I built the data pipelines using the Azure stack: Azure Databricks, Azure Data Lake, and Azure Data Factory.
SQL, Stored Procedure, PL/pgSQL, Python, Python 3, Snowflake
ETL, ETL Implementation & Design, Business Intelligence (BI), Serverless Architecture, Data Science, DevOps
Amazon Web Services (AWS), Linux, Azure, AWS Lambda, Unix, Amazon EC2, Windows, Talend, AIX, Databricks, Docker, Apache Kafka, Shopify, Oracle
PostgreSQL, Oracle PL/SQL, Redshift, Oracle Rdb, PL/SQL, JSON, Database Architecture, RDBMS, Data Pipelines, Databases, MySQL, Amazon Aurora, Oracle DBA, Relational Databases, Amazon S3 (AWS S3), Datadog, Oracle SQL, Database Administration (DBA), PostGIS, Database Modeling, MongoDB
Data Warehousing, Database as a Service (DBaaS), Data Warehouse Design, Technical Architecture, Data Analysis, Writing & Editing, Data Modeling, Data Engineering, Performance Tuning, CSV, Amazon RDS, Data Architecture, Business Intelligence (BI) Platforms, Database Optimization, Data Analytics, Data, ELT, Oracle Streams, Shell Scripting, Virtualization, APIs, Lambda Functions, Data Visualization, BI Reporting, eCommerce, Geospatial Data, Back-end Development, Exploratory Data Analysis, MySQL DBA, Data Build Tool (dbt), DocumentDB, Uber H3, Azure Data Factory, Computer Science, Azure Databricks, Microsoft Azure
AWS Deployment, SOS Berlin Scheduler, Toad, Erwin, Postman, Amazon QuickSight, Amazon CloudWatch, AWS IAM, Talend ETL, Apache Airflow, Confluence, Jira, Grafana, Periscope Data, AWS Glue, Tableau, Terraform
Master's Degree in Computer Science
Vinayaka Missions University - Patna, India
1Z0-052 Oracle Database 11g: Administration I