Biswadip Paul, Developer in Mumbai, Maharashtra, India

Biswadip Paul

Verified Expert in Engineering

Data Engineer and ETL Developer

Location
Mumbai, Maharashtra, India
Toptal Member Since
January 22, 2021

Biswadip is a SAS and Advanced SAS Certified Developer with 22 years of experience in business intelligence, technology selection, architecture, pre-sales, consulting, and project management. He excels in developing software architecture and execution involving the real-time processing of big data, including geospatial data. A career highlight was building an easy-to-use data visualization solution that overlaid data on Google Maps for customers to consume.

Portfolio

PepsiCo Global - Main
SQL, Apache Airflow, Data Build Tool (dbt), Python, GitHub, GitHub Pages...
Chegg
Data Engineering, SQL, Apache Airflow, Data, Data Modeling, ETL, Python...
Quenext
Google Cloud SQL, DevOps, Machine Learning Operations (MLOps), Data Pipelines...

Experience

Availability

Part-time

Preferred Environment

Tableau Desktop Pro, Python 3, PL/SQL, Pentaho Data Integration (Kettle), Oracle, MySQL, PostgreSQL, Linux, Data Engineering, ETL

The most amazing...

...project I've handled was the full-stack development and architecture of the AgriWatch product, which involved overlaying visualizations on Google Maps.

Work Experience

Data Engineer

2023 - 2024
PepsiCo Global - Main
  • Designed and developed two data marts in Snowflake, moving sales and operational data from different sources, using dbt (SQL) for transformation and Airflow (Python) as the orchestrator.
  • Developed dbt macros usable organization-wide, improving code reusability and cutting development time for other teams; once the macros were in place, the second data mart was built end to end in eight hours.
  • Coordinated with various teams across multiple time zones, from data owners to the infrastructure and security approval teams; risks were averted, and the project finished on time.
  • Recognized by the client, who offered to provide recommendations if required in the future.
  • Used Docker with Kubernetes on a local machine as a development environment for quick iteration.
  • Managed Jira tickets and linked them to GitHub commits.
Technologies: SQL, Apache Airflow, Data Build Tool (dbt), Python, GitHub, GitHub Pages, GitHub Actions, Docker, Snowflake, Azure Blobs, Amazon S3 (AWS S3), Data Processing, English, Query Optimization, Database Replication

Data/BI Engineer Lead

2021 - 2023
Chegg
  • Designed and developed data marts for various Chegg businesses. Created generic, object-oriented Python frameworks for loading data from various REST APIs.
  • Tracked and fixed bugs using the Jira tool. Tracked tasks using Kanban in Atlassian.
  • Created a new, reliable ETL flow that removed a bottleneck and a dependency on another team. The framework mirrors the source system's data types and column lengths (including Unicode lengths), automatically adding columns based on the source.
  • Created a data quality framework to check the consistency and freshness of the data loaded. Used and integrated Git as part of the Databricks jobs, used Airflow for orchestration, and triggered refreshes of Tableau reports.
  • Added CI/CD to move new changes from dev to production.
  • Optimized data pipelines using dbt, reducing processing time by 30%. Designed efficient data models and added data validation tests in dbt, leading to a 20% decrease in data-related errors and ensuring high data quality standards.
  • Created a generic SCD2 framework in Spark. The framework builds data marts for Calendly data via the Calendly REST API; other uses include loading data from Microsoft 365 cloud drive Excel files into data marts.
  • Orchestrated pipelines with Apache Airflow. Created fan-out jobs (parallel jobs with Salesforce as the source and Redshift as the destination) using the Databricks operator; XCom was used to pass the number of jobs dynamically.
Technologies: Data Engineering, SQL, Apache Airflow, Data, Data Modeling, ETL, Python, Data Science, Amazon Web Services (AWS), Databricks, Data Build Tool (dbt), ETL Development, Redshift, Tableau, Tableau Server, Tableau Desktop Pro, Orchestration, ELT, Data Analytics, Looker, Snowflake, PySpark, Apache Spark, Spark, Hadoop, Database Structure, Dashboard Development, Scripting, Automation, Education, Dynamic SQL, Data Migration, ETL Tools, Data Integration, Apache Hive, Presto, BI Reporting, Linux, Entity Relationships, Solution Architecture, Data Structures, Data Validation, Database Security, Scala, Data Lakes, Relational Databases, Data Transformation, APIs, Excel 365, Microsoft Excel, OLAP, Architecture, Web Crawlers, Scraping, Web Scraping, Amazon DynamoDB, Data Scientist, GitHub, Data Aggregation, Large Data Sets, SSH, REST APIs, Pipelines, Data Processing, English, Query Optimization, Database Replication
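The generic SCD2 framework above was built in Spark; as a rough illustration only, here is a hedged pandas sketch of the core Type 2 merge logic (close out changed rows, append new versions). This is not the actual framework, and all column names (`valid_from`, `valid_to`, `is_current`) are hypothetical:

```python
import pandas as pd

HIGH_DATE = pd.Timestamp("9999-12-31")  # sentinel "open" end date

def scd2_merge(current: pd.DataFrame, incoming: pd.DataFrame,
               key: str, tracked: list[str],
               load_ts: pd.Timestamp) -> pd.DataFrame:
    """Apply a Slowly Changing Dimension Type 2 merge.

    `current` holds history with valid_from/valid_to/is_current columns;
    `incoming` is the latest snapshot, one row per `key`.
    """
    cur_open = current[current["is_current"]].set_index(key)
    inc = incoming.set_index(key)

    # Keys whose tracked attributes changed, plus brand-new keys.
    common = cur_open.index.intersection(inc.index)
    changed = [k for k in common
               if list(cur_open.loc[k, tracked]) != list(inc.loc[k, tracked])]
    new_keys = inc.index.difference(cur_open.index)

    # 1. Expire the old versions of changed keys.
    hist = current.copy()
    mask = hist[key].isin(changed) & hist["is_current"]
    hist.loc[mask, ["valid_to", "is_current"]] = [load_ts, False]

    # 2. Append fresh versions for changed and new keys.
    fresh = inc.loc[list(changed) + list(new_keys)].reset_index()
    fresh["valid_from"] = load_ts
    fresh["valid_to"] = HIGH_DATE
    fresh["is_current"] = True
    return pd.concat([hist, fresh], ignore_index=True)
```

A production Spark version would replace the Python loop with a join and use a `MERGE INTO`-style write, but the bookkeeping is the same.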

Technology Director | Co-founder (Software Architecture and Full-stack Data Engineering)

2012 - 2020
Quenext
  • Developed two products end to end. One, for the power sector, forecasts power demand and optimizes purchase portfolios. The other, for the agricultural sector, performs historical analysis of land plots based on satellite images.
  • Contributed to the development of the front end and data visualization.
  • Implemented a microservice architecture using Flask.
  • Achieved 98% forecast accuracy for the power product, certified by the client, UPCL.
  • Received a patent for power demand forecasting (US10468883B2).
Technologies: Google Cloud SQL, DevOps, Machine Learning Operations (MLOps), Data Pipelines, Data Engineering, Quality Assurance (QA), Data Cleaning, Data Cleansing, Data Analysis, Complex Data Analysis, Python, Big Data, Data Modeling, Data Mining, Performance Tuning, Data Management, SQL, ETL, Business Intelligence (BI), Database Modeling, Database Design, Computer Vision, Machine Learning, Analytics, Amazon Web Services (AWS), Celery, RabbitMQ, Amazon S3 (AWS S3), D3.js, Bootstrap 3, HTML5, CSS, JavaScript, Google Cloud Platform (GCP), OpenCV, NumPy, Pandas, PostgreSQL, PostGIS, MySQL, Bash, Flask, Python 3, Python 2, Databases, CSV File Processing, Spatial Databases, Reports, Database Development, Data Profiling, Data Architecture, MongoDB, Microsoft Power BI, Reporting, GIS, Geospatial Data, Geospatial Analytics, Big Data Architecture, Data Governance, Data Gathering, NoSQL, Data Visualization, Apache Airflow, Data, ETL Development, ETL Implementation & Design, Google Data Studio, CSV, REST APIs, BigQuery, Realtime, Customer Data, Node.js, Apache Kafka, Data Build Tool (dbt), Back-end Development, Amazon RDS, Database Architecture, Database Performance, Orchestration, Docker, Amazon CloudWatch, ELT, Minimum Viable Product (MVP), Data Analytics, Business Analytics, PySpark, Apache Spark, Spark, Amazon EC2, Predictive Modeling, Hadoop, Database Structure, Database Transactions, Transactions, Dashboard Development, Scripting, Automation, MongoDB Atlas, Dynamic SQL, ETL Tools, Data Integration, Apache Hive, Presto, BI Reporting, GeoJSON, GeoPandas, Shapely, Linux, Entity Relationships, Solution Architecture, Data Structures, Data Validation, Database Security, Google BigQuery, Cloud Migration, AWS Lambda, Relational Databases, Data Transformation, AWS Data Pipeline Service, Message Queues, Amazon Athena, APIs, RESTful Microservices, Excel 365, Microsoft Excel, OLTP, OLAP, ELK (Elastic Stack), Kubernetes, SQL Stored Procedures, Architecture, Regulatory Compliance, 
Web Crawlers, Scraping, Web Scraping, Data Scientist, GitHub, Data Aggregation, Large Data Sets, Startups, Startup Funding, Venture Capital, Venture Funding, SSH, Pipelines, Data Processing, English, Query Optimization

Chief Architect (Architecture, Data Engineering, ETL, Data Modeling, Database Design, Dashboards)

2010 - 2012
Mzaya, Pvt., Ltd.
  • Handled the first large-scale implementation of a customer intelligence product at the National Stock Exchange and Reliance Mutual Fund.
  • Created insightful data visualizations in Panopticon and Tableau.
  • Implemented the large-scale handling of data volume while using the existing hardware.
Technologies: Machine Learning Operations (MLOps), Data Pipelines, Data Engineering, Quality Assurance (QA), Data Cleansing, Data Cleaning, Data Analysis, Complex Data Analysis, Data Warehousing, Data Warehouse Design, Big Data, Data Mining, Database Modeling, Performance Tuning, Data Management, PL/SQL, SQL, ETL, Master Data Management (MDM), Analytics, Business Intelligence (BI), Oracle, SAS, Base SAS, Tableau, Dashboards, Databases, Database Development, Reports, CSV File Processing, Data Profiling, Data Architecture, Reporting, Big Data Architecture, Data Governance, Data Visualization, Data, ETL Development, ETL Implementation & Design, CSV, Customer Data, Database Architecture, Database Performance, Orchestration, ELT, Data Analytics, Business Analytics, Database Structure, Database Transactions, Scripting, Automation, Dynamic SQL, Data Migration, ETL Tools, Data Integration, BI Reporting, Linux, Entity Relationships, Solution Architecture, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLAP, SQL Stored Procedures, Architecture, Web Crawlers, Scraping, Web Scraping, Data Scientist, Data Aggregation, Large Data Sets, SSH, Pipelines, Data Processing, English, Query Optimization

Software Specialist (ETL, Data Modeling, Database Design, Dashboard, MDM, Reports)

2009 - 2010
SAS R&D India, Pvt., Ltd.
  • Developed a SAS IIS (insurance intelligence solution) cross-sell, up-sell, and customer retention/segmentation solution for the forthcoming release.
  • Built and automated a SAS log reader so that source-to-target column transformation documents could be generated automatically.
  • Managed and mentored four team members in ETL development and review.
Technologies: Data Warehousing, Data Warehouse Design, Data Mining, Data Modeling, Performance Tuning, Data Management, SQL, ETL, Oracle, Unix, Database Modeling, Business Intelligence (BI), SAS Metadata Server, SAS Data Integration (DI) Studio, SAS, Base SAS, Databases, Database Development, Reports, Data Profiling, Data Architecture, Reporting, Big Data Architecture, Data Governance, Data, Big Data, ETL Development, ETL Implementation & Design, Data Analysis, Database Architecture, Orchestration, ELT, Data Analytics, Database Structure, Scripting, Automation, Dynamic SQL, ETL Tools, Data Integration, BI Reporting, Entity Relationships, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLAP, SQL Stored Procedures, Oracle Database, Architecture, Data Aggregation, SSH, Pipelines, Data Processing, English, Query Optimization

Software Specialist (ETL, Data Modeling, Database Design, Dashboards, Reports, MDM)

2007 - 2008
SAS India Institute, Pvt., Ltd.
  • Implemented the pilot of a SAS OpRisk solution (operation risk) for Axis Bank.
  • Worked on the SAS credit scoring UI, reports, and chart enhancement while consulting for SAS R&D.
  • Earned my Base SAS certification within six months of joining.
Technologies: Data Pipelines, Data Engineering, Quality Assurance (QA), Data Cleansing, Data Cleaning, Data Analysis, Complex Data Analysis, Data Warehouse Design, Data Warehousing, Data Modeling, Data Mining, Performance Tuning, Data Management, Database Modeling, Unix, SQL, ETL, Business Intelligence (BI), CSS, HTML, JavaScript, Java, Unix Shell Scripting, VMware, Flux, SAS Data Integration (DI) Studio, SAS, Base SAS, Databases, Database Development, CSV File Processing, Reports, Data Profiling, Data Architecture, Reporting, Data Governance, Data Visualization, Data, ETL Development, ETL Implementation & Design, Customer Data, Database Architecture, Orchestration, ELT, Data Analytics, Database Structure, Dashboard Development, Scripting, Automation, Dynamic SQL, ETL Tools, Data Integration, BI Reporting, Linux, Entity Relationships, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLAP, SQL Stored Procedures, Oracle Database, Architecture, Data Aggregation, SSH, Pipelines, Data Processing, English, Query Optimization

Associate Consultant (Oracle, ETL, Database Design, Business Intelligence, Reporting, Dashboards)

2000 - 2007
Tata Consultancy Services
  • Executed a Six Sigma DMADV project with no defects during the implementation of GEMMS software in a GE Superabrasives Florida manufacturing plant.
  • Facilitated the installation of Oracle 9i RAC and Oracle Apps by creating the necessary environment in Red Hat; an early adopter of Oracle 9i RAC on Red Hat Linux RH3. Migrated data from Oracle 7 to Oracle 9i.
  • Led a team of 30 in the creation of a new data warehouse for GE APAC, managing the design, development, and deployment of database schemas, stored procedures, triggers, and ETL.
  • Created intuitive BusinessObjects and Tableau dashboards and reports.
  • Developed and implemented Sybase database schemas, stored procedures, and triggers, enhancing application functionality.
  • Participated in the migration of Sybase versions for improved data management and system efficiency.
  • Wrote Windows 2000 batch scripts to push the PowerBuilder application to every shop-floor machine that used it, keeping the software version consistent across all machines.
Technologies: Data Pipelines, Data Engineering, Quality Assurance (QA), Data Analysis, Data Cleaning, Data Cleansing, Data Warehouse Design, Data Warehousing, Data Modeling, Data Mining, Data Management, Performance Tuning, Business Intelligence (BI), SQL, ETL, PL/SQL Tuning, Database Design, Database Modeling, PowerBuilder, Oracle, Manufacturing, FTP, Shell Scripting, SAP BusinessObjects Data Integrator, SAP BusinessObjects (BO), PL/SQL Developer, Oracle PL/SQL, Tableau, Dashboards, Databases, Database Development, CSV File Processing, Reports, Data Profiling, Data Architecture, Microsoft Access, Reporting, Data Governance, Data Visualization, Data, ETL Development, ETL Implementation & Design, Customer Data, Database Architecture, Database Performance, Orchestration, ELT, Data Analytics, Database Structure, Database Transactions, Transactions, Dashboard Development, Scripting, Automation, Dynamic SQL, Data Migration, ETL Tools, Data Integration, BI Reporting, Autosys, Linux, Entity Relationships, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLTP, OLAP, SQL Stored Procedures, Oracle Database, Architecture, Data Aggregation, SSH, Pipelines, Data Processing, English, Query Optimization, Database Replication, Sybase

AgriWatch

A SaaS product developed completely in-house using open-source tools and technologies and deployed on GCP as well as AWS. It virtually digitizes farmland records, provides insight into the land's productivity based on satellite images, and scores each plot relative to others within a 100-kilometer radius. All the data is presented as an easy-to-use, intuitive overlay on Google Maps. HDFC Bank has used it to make decisions on farming loans and to save the costs of manually inspecting farms.
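The relative-scoring idea can be sketched in plain Python. This is a hedged illustration only, not AgriWatch's actual algorithm; the coordinates, productivity index, and percentile-rank scoring are all made up for the example:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def relative_score(plots, radius_km=100.0):
    """Score each plot's productivity against its neighbors.

    `plots` is a list of dicts with id, lat, lon, and a productivity
    index; the score is the plot's percentile rank among all plots
    within `radius_km` of it (itself included).
    """
    scores = {}
    for p in plots:
        neighbors = [q for q in plots
                     if haversine_km(p["lat"], p["lon"], q["lat"], q["lon"]) <= radius_km]
        below = sum(1 for q in neighbors if q["productivity"] <= p["productivity"])
        scores[p["id"]] = round(100 * below / len(neighbors))
    return scores
```

At real scale this pairwise loop would be replaced by a spatial index (e.g., PostGIS, which appears in the skills list below), but the relative-ranking idea is the same.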

Powermatics/EnergyWatch

This application for electricity distribution companies forecasts, a day ahead, the power requirements for 96 fifteen-minute time blocks and optimizes the purchase portfolio by considering power exchange market prices. It also provides closed-envelope bidding prices for the market and intuitive data visualization using D3.js, including visualization-based data manipulation directly from the plots.
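For context on the 96-block structure, a naive day-ahead baseline might look like the following. This hedged sketch is nothing like the patented, UPCL-certified model; it simply averages each 15-minute block over recent days:

```python
from statistics import mean

BLOCKS_PER_DAY = 96  # 24 hours x four 15-minute blocks

def day_ahead_baseline(history, lookback_days=7):
    """Forecast tomorrow's demand for each 15-minute block.

    `history` is a flat list of demand readings, one per block, oldest
    first. Each block is predicted as the mean of that same block over
    the last `lookback_days` days.
    """
    if len(history) < lookback_days * BLOCKS_PER_DAY:
        raise ValueError("not enough history for the requested lookback")
    recent = history[-lookback_days * BLOCKS_PER_DAY:]
    days = [recent[d * BLOCKS_PER_DAY:(d + 1) * BLOCKS_PER_DAY]
            for d in range(lookback_days)]
    return [mean(day[b] for day in days) for b in range(BLOCKS_PER_DAY)]
```

A real forecaster would also condition on weather, day type, and market signals; this only shows the per-block shape of the problem.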

Brand-new Enterprise Data Warehouse Creation

As an architect and project manager, I led the creation of a brand-new data warehouse for GE's APAC region, using Oracle as the database and BusinessObjects Data Integrator.

My work ranged from design to team management, including coordinating with project module leaders and coordinators in six countries to make the project a success for the GE Plastics APAC region.

Edtech Data Warehouse and Datamarts of Various Chegg Businesses

I handled ETL, data warehouse, and data mart design and development for various Chegg businesses, including calling the Salesforce, Calendly, Microsoft Azure Drive, Databricks, and Tableau APIs to keep things simple and consistent. The technology stack was Airflow, Databricks, dbt, Redshift, Tableau, and Looker.
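The generic REST-API loading framework is only described at a high level above; a minimal sketch of the idea in plain Python follows. The pagination fields (`items`, `next_page`) and endpoint are hypothetical — real APIs such as Salesforce and Calendly paginate differently — and the transport is injectable, which is what makes one framework reusable across sources:

```python
import json
from urllib.request import Request, urlopen

class RestLoader:
    """Generic paginated REST-API loader with a pluggable transport."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url
        # `fetch` maps a URL to a parsed JSON payload; tests and
        # source-specific auth schemes supply their own.
        self.fetch = fetch or self._http_fetch

    def _http_fetch(self, url):
        req = Request(url, headers={"Accept": "application/json"})
        with urlopen(req) as resp:
            return json.load(resp)

    def load_all(self, endpoint):
        """Collect records from every page until the API is exhausted."""
        records, page = [], 1
        while True:
            payload = self.fetch(f"{self.base_url}/{endpoint}?page={page}")
            records.extend(payload.get("items", []))
            if not payload.get("next_page"):
                return records
            page += 1
```

Subclassing or wrapping `fetch` per source (OAuth headers, cursor tokens, rate limiting) keeps the loading loop itself unchanged.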

Languages

Python 3, SAS, SQL, Python, PowerBuilder, JavaScript, HTML, Python 2, Bash, HTML5, Snowflake, C, Fortran, Java, CSS, Scala

Tools

SAS Data Integration (DI) Studio, Tableau, Microsoft Power BI, GIS, Pentaho Data Integration (Kettle), Tableau Desktop Pro, VMware, Microsoft Access, Apache Airflow, BigQuery, GitHub, Looker, MongoDB Atlas, Autosys, Microsoft Excel, RabbitMQ, Celery, Amazon CloudWatch, Amazon Athena, ELK (Elastic Stack), GitHub Pages

Paradigms

Business Intelligence (BI), Database Design, ETL, Spatial Databases, Database Development, ETL Implementation & Design, OLAP, DevOps, Automation, Mechanical Design, Data Science

Platforms

Oracle, Google Cloud Platform (GCP), Amazon Web Services (AWS), Oracle Database, Linux, Unix, Databricks, Amazon EC2, AWS Lambda, Apache Kafka, Salesforce, Docker, Kubernetes, Jet Admin

Storage

PostGIS, PostgreSQL, MySQL, PL/SQL, Master Data Management (MDM), Database Modeling, Data Pipelines, Google Cloud SQL, Databases, Database Architecture, Database Structure, Dynamic SQL, Data Integration, Relational Databases, OLTP, SQL Stored Procedures, Oracle PL/SQL, PL/SQL Developer, Amazon S3 (AWS S3), MongoDB, NoSQL, Database Performance, Database Transactions, Apache Hive, Data Validation, Database Security, Data Lakes, AWS Data Pipeline Service, Amazon DynamoDB, Database Replication, Sybase, GeoServer, Redshift, Azure Blobs

Other

Base SAS, FTP, Business, Data Management, Data Modeling, Big Data, Data Warehousing, Complex Data Analysis, Data Analysis, Data Cleansing, Data Cleaning, Data Engineering, Machine Learning Operations (MLOps), Dashboards, CSV File Processing, Reports, Data Profiling, Data Architecture, Data Warehouse Design, Reporting, BI Reporting, Data Governance, Data Gathering, Data, ETL Development, CSV, Customer Data, ELT, Data Analytics, Scripting, Data Migration, ETL Tools, Entity Relationships, Solution Architecture, Data Transformation, Message Queues, Data Aggregation, Pipelines, Data Processing, English, Query Optimization, Shell Scripting, SAP BusinessObjects (BO), SAP BusinessObjects Data Integrator, Unix Shell Scripting, SAS Metadata Server, Computer Vision, SAP BusinessObjects Data Service (BODS), PL/SQL Tuning, Performance Tuning, Data Mining, Geospatial Data, Big Data Architecture, Data Visualization, Google Data Studio, Manufacturing, Data Build Tool (dbt), Back-end Development, Amazon RDS, Orchestration, Minimum Viable Product (MVP), Business Analytics, Predictive Modeling, Transactions, Dashboard Development, Education, GeoJSON, GeoPandas, Data Structures, Google BigQuery, Cloud Migration, APIs, RESTful Microservices, Excel 365, Architecture, Regulatory Compliance, Web Crawlers, Scraping, Web Scraping, Data Scientist, Large Data Sets, Startups, Startup Funding, Venture Capital, Venture Funding, SSH, Statistical Quality Control (SQC), Machine Design, Fluid Dynamics, Industrial Engineering, Operations Research, PLC, Analytics, Machine Learning, Quality Assurance (QA), Geospatial Analytics, Tableau Server, GitHub Actions

Frameworks

Flux, Flask, Realtime, Apache Spark, Spark, Hadoop, Presto, Bootstrap 3

Libraries/APIs

Pandas, NumPy, OpenCV, Google Maps API, REST APIs, PySpark, Shapely, D3.js, Plotly.js, Node.js

1994 - 2000

Bachelor of Technology Degree in Mechanical Engineering

North Eastern Regional Institute of Science and Technology - Nirjuli, Itanagar, Arunachal Pradesh, India

MARCH 2009 - PRESENT

SAS Certified Advanced Programmer for SAS9

SAS

MARCH 2008 - PRESENT

SAS Certified Base Programmer for SAS9

SAS
