Biswadip Paul, Developer in Mumbai, Maharashtra, India

Biswadip Paul

Bio

Biswadip is a SAS and Advanced SAS Certified Developer with 22 years of experience in business intelligence, technology selection, architecture, pre-sales, consulting, and project management. He excels at designing software architectures and delivering solutions for real-time big data processing, including geospatial data. A career highlight was developing an easy-to-use data visualization solution that overlaid data on Google Maps so customers could readily consume it.

Portfolio

The Interaction Company of California
ClickHouse, Database Optimization, Database Architecture, Cloud Storage...
Minh Nguyen
Python, Data Engineering, API Integration, JSON, Data Migration...
Oy Teboil Ab
Sybase, Database Replication, Databases, Architecture, Data Marts, Data Quality...

Experience

  • ETL - 20 years
  • Data Engineering - 20 years
  • Data Warehousing - 12 years
  • Oracle - 12 years
  • Data Warehouse Design - 12 years
  • PostgreSQL - 8 years
  • MySQL - 8 years
  • Python 3 - 3 years

Preferred Environment

Tableau Desktop Pro, Python 3, PL/SQL, Pentaho Data Integration (Kettle), Oracle, MySQL, PostgreSQL, Linux, Data Engineering, ETL

The most amazing...

...project I've handled was the full-stack development and architecture for the AgriWatch product, which involved overlaying visualizations on Google Maps.

Work Experience

ClickHouse Database Expert

2025 - 2025
The Interaction Company of California
  • Reduced the number of replica shards from six to three, cutting costs by 30%.
  • Optimized the table structure by moving wide ("fat") columns into a separate table and rebuilt projections to select only the required columns instead of using an asterisk (SELECT *). This improved query performance and reduced memory usage, lowering ClickHouse costs by 20%.
  • Redesigned table partitioning and the ordering (primary) key based on the queries actually executed, balancing partitions to take advantage of sparse indexing. This significantly reduced the amount of data read and made query execution 4x faster.
Technologies: ClickHouse, Database Optimization, Database Architecture, Cloud Storage, Databases, Database Table Optimization, Query Optimization, Database Performance, Data Engineering, Data Warehousing, Database Design, Data Marts, EDA, Data Quality Analysis, A/B Testing, Data Warehouse Implementation, Attribution Modeling, Code Refactoring, Legacy Code
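To illustrate why aligning the ordering key with the executed queries reduces the data read, here is a toy Python model of a sparse primary index. It is a simplification under stated assumptions, not ClickHouse's implementation: rows are assumed to be sorted by the ordering key, one mark stores the first key of each granule (ClickHouse's default `index_granularity` is 8192 rows; a far smaller value is used here), and a range filter reads only the granules whose key range may overlap it.

```python
import bisect


def build_sparse_index(sorted_keys, granule_size):
    """Return one mark per granule: the key of the granule's first row.

    Assumes sorted_keys is already sorted by the table's ordering key.
    """
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_range(marks, lo, hi):
    """Return indexes of granules that may contain keys in [lo, hi].

    Granule i spans keys from marks[i] up to marks[i + 1], so it can match
    only if marks[i] <= hi and marks[i + 1] >= lo. The result is
    conservative: it may include a granule with no matching rows, but it
    never skips one that has them.
    """
    start = max(bisect.bisect_left(marks, lo) - 1, 0)
    end = bisect.bisect_right(marks, hi)
    return list(range(start, end))


if __name__ == "__main__":
    keys = list(range(1_000))  # 1,000 rows already sorted by the ordering key
    marks = build_sparse_index(keys, granule_size=100)
    hit = granules_for_range(marks, 250, 349)
    print(f"granules read when filtering on the ordering key: {len(hit)} of {len(marks)}")
    # If the filter column is not a prefix of the ordering key, the index
    # cannot prune anything and every granule has to be scanned.
    print(f"granules read otherwise: {len(marks)} of {len(marks)}")
```

In this toy model, the filter touches 2 of 10 granules when it matches the ordering key, versus all 10 when it does not. That gap is the effect that redesigning the ordering key around real query patterns targets.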

Python Senior Data Engineer

2025 - 2025
Minh Nguyen
  • Migrated data out of OneTrust into Transcend.io, enabling Transcend to simplify client migrations from OneTrust using their own data.
  • Identified all business areas requiring data extraction from OneTrust and mapped the data into the OneTrust system.
  • Created a CLI that takes user input to map columns to the OneTrust system and make the migration seamless.
Technologies: Python, Data Engineering, API Integration, JSON, Data Migration, Data & Backup Management, Data Quality, Pydantic, EDA, Google Sheets, Data Quality Analysis, Cursor AI, Pandas, Beautiful Soup

Developer

2025 - 2025
Oy Teboil Ab
  • Fixed Sybase replication and other system syncs to allow inventory transactions to resume in the depots.
  • Synchronized a system-wide shutdown and fix within a one-hour window.
  • Restored automation, increasing the speed of transactions and the movement of a million-dollar inventory.
Technologies: Sybase, Database Replication, Databases, Architecture, Data Marts, Data Quality, EDA, Code Refactoring, Legacy Code

Senior Python Data Engineer

2025 - 2025
Transcend
  • Supported the previously developed code and created videos and presentations.
  • Tested the existing code and fixed bugs and issues.
  • Created documentation, including a README covering the architecture and scalability of the CLI, and pushed it to GitHub.
Technologies: Python, Data Engineering, API Integration, JSON, Data Build Tool (dbt), Data Migration, Data & Backup Management, SQLite, DuckDB, TypeScript, Data Quality, Pydantic, EDA, Google Sheets, Data Quality Analysis, Cursor AI, Pandas, Beautiful Soup

Senior Data Engineer

2025 - 2025
ADOC International Trading
  • Deployed high-frequency web scraping jobs to AWS ECS Fargate with safeguards that prevent overlapping runs.
  • Deployed the bronze layer of a medallion architecture using partitioned Parquet files.
  • Developed highly performant and cost-effective web crawlers in Python with the headless and headful Playwright library.
Technologies: Data Engineering, Amazon S3 (AWS S3), Python, Amazon Web Services (AWS), Docker, AWS ECS Fargate, Amazon Elastic Container Registry (ECR), Amazon CloudWatch, Serverless, Data Marts, Data Quality, EDA, Data Quality Analysis, Pandas, Beautiful Soup
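Preventing overlapping runs of a high-frequency scheduled job comes down to a lock that each run must acquire before it starts. This minimal Python sketch uses an atomically created local lock file (`os.O_CREAT | os.O_EXCL`). The lock path and the `run_scraper` function are hypothetical. On ECS Fargate, every run is a separate task with its own filesystem, so the lock would have to live in a shared store such as S3 or DynamoDB. Treat this only as an illustration of the pattern.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; a Fargate deployment would need a shared store instead.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "scraper.lock")


@contextmanager
def single_run(lock_path=LOCK_PATH):
    """Yield True if this run acquired the lock, False if another run holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists, so only one run wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        yield False
        return
    try:
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))  # record the owner to help debug stale locks
        yield True
    finally:
        os.remove(lock_path)  # release the lock even if the run fails


def run_scraper():
    # Hypothetical placeholder for the actual Playwright crawl.
    return "scraped"


def scheduled_job(lock_path=LOCK_PATH):
    """Entry point the scheduler would invoke on every tick."""
    with single_run(lock_path) as acquired:
        if not acquired:
            return "skipped: previous run still in progress"
        return run_scraper()
```

A production version would also need to expire stale locks left behind by crashed runs, for example by storing a timestamp alongside the owner and treating old locks as released.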

Software Developer

2025 - 2025
Oy Teboil Ab
  • Resolved the client-server remote database syncing issue to ensure smooth operations at oil depots.
  • Fixed issues with Topsy Orders generated from the terminal, eliminating the need for manual updates by drivers.
  • Saved millions of dollars in long-term operational costs and facilitated the transport of oil inventory, which is especially crucial as some terminals are slated for permanent shutdown and have millions of liters of oil to relocate.
Technologies: Sybase, Database Replication, Databases, Architecture, DataOps, Data Quality, EDA, Code Refactoring, Legacy Code

Senior Data Engineer

2025 - 2025
Transcend
  • Developed a command-line interface (CLI) that backs up data from OneTrust REST APIs for various subject areas and profiles.
  • Made it configurable using YAML, including dependencies between APIs (parent/child hierarchies and dependencies).
  • Created a translation layer to load the backed-up data into Transcend.io using the Transcend.io CLI and REST APIs.
Technologies: Python, Data Engineering, API Integration, JSON, Data Build Tool (dbt), Data Migration, Data & Backup Management, SQLite, DuckDB, Data Orchestration, ETL Pipelines, BI Reports, DataOps, TypeScript, Data Quality, Pydantic, EDA, Google Sheets, Data Quality Analysis, Cursor AI, Pandas, Beautiful Soup
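Configuring parent/child dependencies between APIs implies that endpoints must be backed up in dependency order. Below is a minimal sketch using Python's standard-library `graphlib` (Python 3.9+). The config is shown as the dictionary a YAML loader such as PyYAML's `yaml.safe_load` would return. The endpoint names, paths, and the `depends_on` key are hypothetical and do not reflect the actual CLI's schema or OneTrust's API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical config, as it might look after loading the YAML file.
CONFIG = {
    "endpoints": {
        "users": {"path": "/users", "depends_on": []},
        "assessments": {"path": "/assessments", "depends_on": ["users"]},
        "assessment_responses": {
            "path": "/assessments/{parent_id}/responses",  # child of assessments
            "depends_on": ["assessments"],
        },
        "vendors": {"path": "/vendors", "depends_on": []},
    }
}


def extraction_order(config):
    """Return endpoint names so that every parent is backed up before its children."""
    endpoints = config["endpoints"]
    graph = {}
    for name, spec in endpoints.items():
        parents = spec.get("depends_on", [])
        unknown = [p for p in parents if p not in endpoints]
        if unknown:
            raise ValueError(f"{name} depends on undefined endpoints: {unknown}")
        graph[name] = set(parents)  # graphlib expects node -> predecessors
    # static_order() raises CycleError if the configured dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(extraction_order(CONFIG))
```

Validating the config up front, so that unknown parents and cycles are rejected, catches configuration mistakes before any API call is made.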

Developer

2025 - 2025
Oy Teboil Ab
  • Assisted in diagnosing an issue with one of the loading arms at the oil depot by analyzing the Sybase database and reviewing Java logs.
  • Performed PLC machine and software checks and RS-232 serial communication checks.
  • Executed and documented hardware-based live test scenarios. The investigation concluded that faulty wiring was the likely cause, warranting further verification to confirm the findings.
Technologies: Sybase, Database Replication, Databases, Architecture, Back-end, NestJS, Prisma, Database Schema Design, SQL Performance, DataOps, Data Marts, Data Quality, EDA, Code Refactoring, Legacy Code

Senior Data Analytics Engineer

2024 - 2025
Twentyeight Health Inc.
  • Architected, designed, and developed a data warehouse from scratch for Twentyeighthealth.com.
  • Designed and implemented the data warehouse infrastructure using AWS services, including VPC, IAM, S3, Redshift, Secrets Manager, and AWS Glue, and established a secure connection between Amazon Redshift and Heroku Postgres.
  • Conducted a feasibility proof of concept (POC) for Metabase as a cost-effective alternative to Tableau, aiming to reduce licensing expenses.
  • Collaborated with business and operations teams to deliver data efficiently, securely, and with proper governance, ensuring an optimized user experience while minimizing the reporting load on operational systems.
  • Resolved issues with corrupted Tableau connectors caused by a version upgrade by analyzing the underlying XML and automating the repair process using Python.
Technologies: Ruby, ETL, Tableau, Data Analytics, Amazon Redshift, PostgreSQL, Data Migration, SQL, Amazon DynamoDB, Amazon CloudWatch, Amazon S3 (AWS S3), Apache Parquet, AWS Glue, Amazon Elastic Container Service (ECS), AWS Fargate, Data Build Tool (dbt), Metabase, Amazon Aurora, Back-end, Cloud Environments, DuckDB, Cloud, Database Schema Design, Tableau Desktop, Amazon Simple Queue Service (SQS), Jinja, Star Schema, SQL Performance, Role-based Access Control (RBAC), Amazon RDS, Amazon QuickSight, Data Orchestration, ETL Pipelines, BI Reports, DataOps, EMR, Apache Superset, Superset, Serverless, ClickHouse, Cloud Storage, Database Table Optimization, Amazon EventBridge, Relational Database Design, Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Google Sheets, Database Normalization, Data Quality Analysis, Product Analytics, Data Warehouse Implementation, Technical Leadership, Pandas

Snowflake and Airflow Data Engineer

2024 - 2024
PepsiCo Global - Main
  • Supported the business's CDP workflows for customer profiling, segmentation, and other use cases.
  • Transitioned an Apache Airflow job cluster to spot instances.
  • Tested the viability of Grafana for monitoring Airflow.
Technologies: Data Engineering, Python, SQL, Apache Airflow, Snowflake, Data Build Tool (dbt), Delta Lake, Grafana 2, Artificial Intelligence (AI), CI/CD Pipelines, Amazon Athena, Data Warehouse Testing, Database Optimization, API Integration, Back-end, Terraform, Cloud Environments, DuckDB, Cloud, Database Schema Design, Amazon Simple Queue Service (SQS), Jinja, Star Schema, SQL Performance, Data Orchestration, ETL Pipelines, AI Pipeline, Enterprise Data Warehouse (EDW), DataOps, Azure, Cloud Storage, Database Table Optimization, Amazon EventBridge, Data Marts, Data Quality, EDA, Data Quality Analysis, A/B Testing, Product Analytics, Data Warehouse Implementation, Attribution Modeling, Pandas, Beautiful Soup, Code Refactoring, Legacy Code

Data Engineer (Python/MongoDB)

2024 - 2024
Max Planck Institute
  • Developed scalable pipelines that use MongoDB collections as sources and Python for transformations, processing human-vs.-machine morality data collected through an online questionnaire.
  • Created a logging framework for the ETL covering both delta and full loads, leveraging MongoDB pipelines, Python Polars, and the MongoDB bulk loader.
  • Used GitLab pipeline schedules to run the jobs bi-monthly, saving on infrastructure costs.
Technologies: Python, MongoDB, Data Extraction, ETL, Data Transformation, CI/CD Pipelines, Data Warehouse Testing, Database Optimization, Back-end, NoSQL, Cloud Environments, Cloud, Database Schema Design, Data Orchestration, ETL Pipelines, BI Reports, DataOps, Cloud Storage, Database Table Optimization, Data Marts, Data Quality, EDA, Data Quality Analysis, Data Warehouse Implementation, Pandas, Code Refactoring, Legacy Code
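Supporting both delta and full loads typically relies on a high-water mark. The sketch below builds a MongoDB-style query filter for each mode and a log entry for the run. The `updated_at` field name and the log structure are hypothetical. In a real pipeline, the filter would be passed to a driver call such as pymongo's `collection.find()`; here an in-memory list stands in for the collection.

```python
from datetime import datetime, timezone


def build_filter(mode, last_watermark=None, field="updated_at"):
    """Return a MongoDB query filter for a full or delta load.

    `field` is a hypothetical last-modified timestamp on each document.
    A delta load without a previous watermark falls back to a full load.
    """
    if mode == "full" or last_watermark is None:
        return {}  # full load: every document
    if mode == "delta":
        return {field: {"$gt": last_watermark}}  # only documents changed since the last run
    raise ValueError(f"unknown load mode: {mode!r}")


def run_load(documents, mode, last_watermark=None, field="updated_at"):
    """Apply the load to an in-memory document list and return (loaded, log_entry)."""
    query = build_filter(mode, last_watermark, field)
    if query:
        # In-memory equivalent of the {"$gt": watermark} filter.
        loaded = [d for d in documents if d[field] > last_watermark]
    else:
        loaded = list(documents)
    # New high-water mark: latest timestamp loaded, or the previous one if nothing changed.
    new_watermark = max((d[field] for d in loaded), default=last_watermark)
    log_entry = {
        "mode": "delta" if query else "full",
        "rows": len(loaded),
        "previous_watermark": last_watermark,
        "new_watermark": new_watermark,
        "logged_at": datetime.now(timezone.utc),
    }
    return loaded, log_entry
```

Persisting `new_watermark` from each log entry gives the next delta run its starting point, while a full load simply ignores it.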

Developer

2024 - 2024
Oy Teboil Ab
  • Installed Sybase 17 on one of the depot servers, ensuring that the installation followed the same standards as the other depots running Sybase 16 and that the prebuilt queries would work without issues.
  • Fixed client-server replication issues between the headquarters Sybase 16 server and multiple depot servers (a SQL Remote system with FILE-based message types), carefully identifying the gaps and making sure nothing failed.
  • Ensured the Java code that uses Sybase to connect to multiple systems, such as PLCs, FTP, and printers, was tested and running as desired.
  • Fixed all the issues in live production with almost no documentation; implementing all the fixes took 20 days.
  • Earned the client's appreciation, leading to an ongoing hourly engagement for an indefinite period.
  • Documented the system and the fixes, giving the client documentation where none previously existed.
Technologies: Sybase, Database Replication, Databases, Architecture, SQL Anywhere, Performance Optimization, Data Warehouse Testing, Database Optimization, Back-end, Cloud Environments, Cloud, Database Schema Design, SQL Performance, DataOps, Data Marts, Data Quality, EDA, Code Refactoring, Legacy Code

Data Engineer

2023 - 2024
PepsiCo Global - Main
  • Designed and developed two data marts in Snowflake that bring together sales and operational data from different sources, using SQL and Python, with dbt for transformation and Airflow as the orchestrator.
  • Developed dbt macros that can be reused organization-wide, improving code reusability and reducing development time for other teams. Once the macros were in place, the second data mart was created end to end in eight hours.
  • Coordinated with teams across multiple time zones, from data owners to the infrastructure, security requirements, and approval teams, averting risks and finishing the project on time.
  • Recognized by the client, who is ready to provide recommendations if required in the future.
  • Set up Docker with Kubernetes on a local machine as a development environment for faster iteration.
  • Managed Jira tickets, developed the associated changes, and linked them to GitHub commits.
Technologies: SQL, Apache Airflow, Data Build Tool (dbt), Python, GitHub, GitHub Pages, GitHub Actions, Docker, Snowflake, Azure Blobs, Amazon S3 (AWS S3), Data Processing, English, Query Optimization, Database Replication, Digital Marketing, JSON, Distributed Systems, Git, Stitch Data, Data Extraction, Streamlit, Analytical Thinking, Requirements Analysis, Business Requirements, Performance Optimization, Artificial Intelligence (AI), CI/CD Pipelines, Data Warehouse Testing, Customer Data Platform (CDP), Database Optimization, API Integration, Back-end, Cloud Environments, Cloud, Database Schema Design, Amazon Simple Queue Service (SQS), Jinja, Star Schema, SQL Performance, Data Orchestration, ETL Pipelines, AI Pipeline, Enterprise Data Warehouse (EDW), DataOps, Cloud Storage, Database Table Optimization, Amazon EventBridge, Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Data Quality Analysis, Data Warehouse Implementation, Technical Leadership, Pandas, Beautiful Soup, Code Refactoring, Legacy Code

Data/BI Engineer Lead

2021 - 2023
Chegg
  • Designed and developed data marts for various Chegg businesses. Created generic object-oriented Python frameworks for loading data from various REST APIs.
  • Tracked and fixed bugs using the Jira tool. Tracked tasks using Kanban in Atlassian.
  • Created a new, reliable ETL flow that removed a bottleneck and a dependency on another team. The framework mirrors the source system's data types and lengths (including Unicode lengths) and automatically adds columns as they appear in the source.
  • Created a data quality framework to check the consistency and freshness of the data loaded. Used and integrated Git as part of the Databricks jobs, used Airflow for orchestration, and triggered refreshes of Tableau reports.
  • Added CI/CD to move new changes from dev to production.
  • Optimized data pipelines using dbt, reducing processing time by 30%. Designed efficient data models that enhanced collaboration and implemented data validation tests in dbt, leading to a 20% decrease in data-related errors and ensuring high data quality standards.
  • Created a generic SCD Type 2 framework in Spark. It builds data marts for Calendly data from the Calendly REST API and has other uses, including but not limited to loading Excel files from a Microsoft 365 cloud drive into a data mart.
  • Orchestrated the pipelines using Apache Airflow. Created fan-out jobs (parallel jobs with Salesforce as the source and Redshift as the destination) using the Databricks operator, with XCom used to pass the number of jobs dynamically.
Technologies: Data Engineering, SQL, Apache Airflow, Data, Data Modeling, ETL, Python, Data Science, Amazon Web Services (AWS), Databricks, Data Build Tool (dbt), ETL Development, Redshift, Tableau, Tableau Server, Tableau Desktop Pro, Orchestration, ELT, Data Analytics, Looker, Snowflake, PySpark, Apache Spark, Spark, Hadoop, Database Structure, Dashboard Development, Scripting, Automation, Education, Dynamic SQL, Data Migration, ETL Tools, Data Integration, Apache Hive, Presto, BI Reporting, Linux, Entity Relationships, Solution Architecture, Data Structures, Data Validation, Database Security, Scala, Data Lakes, Relational Databases, Data Transformation, APIs, Excel 365, Microsoft Excel, OLAP, Architecture, Web Crawlers, Scraping, Web Scraping, Amazon DynamoDB, Data Scientist, GitHub, Data Aggregation, Large Data Sets, SSH, REST APIs, Pipelines, Data Processing, English, Query Optimization, Database Replication, Digital Marketing, JSON, Data Scraping, Distributed Systems, Git, Scalability, Tableau Prep Flow, Data Extraction, Streamlit, Analytical Thinking, Requirements Analysis, Business Requirements, Amazon Redshift, Performance Optimization, Artificial Intelligence (AI), CI/CD Pipelines, Data Warehouse Testing, Database Optimization, API Integration, Looker Studio, Back-end, Cloud Environments, Cloud, Database Schema Design, Tableau Desktop, Amazon Simple Queue Service (SQS), Jinja, Star Schema, SQL Performance, Medallion Architecture, Matplotlib, Data Orchestration, ETL Pipelines, AI Pipeline, BI Reports, Enterprise Data Warehouse (EDW), Internet of Things (IoT), DataOps, EMR, Cloud Storage, Database Table Optimization, Amazon EventBridge, Relational Database Design, Amazon Managed Workflows for Apache Airflow (MWAA), Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Data Quality Analysis, Product Analytics, Data Warehouse Implementation, Attribution Modeling, Technical Leadership, Pandas, Beautiful Soup, Code Refactoring, Legacy Code
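The SCD Type 2 framework described above was built in Spark. The pure-Python sketch below only illustrates the technique itself, under several assumptions: there is a single business key (`id`), the set of tracked attributes is hypothetical, and the dimension uses `valid_from`/`valid_to`/`is_current` columns. A changed record closes the current version and opens a new one, and an unchanged record is left alone.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still valid" end date


def apply_scd2(dimension, incoming, load_date, key="id", tracked=("email", "plan")):
    """Merge an incoming snapshot into an SCD Type 2 dimension and return the new rows.

    dimension: list of dicts holding the key, tracked attributes, valid_from,
    valid_to (exclusive end), and is_current.
    incoming: list of dicts holding the key and tracked attributes.
    """
    result = [dict(row) for row in dimension]  # do not mutate the caller's data
    current = {row[key]: row for row in result if row["is_current"]}
    for rec in incoming:
        existing = current.get(rec[key])
        if existing and all(existing[c] == rec[c] for c in tracked):
            continue  # tracked attributes unchanged: keep the current version
        if existing:
            # Close the old version; the new one takes effect on load_date.
            existing["valid_to"] = load_date
            existing["is_current"] = False
        new_row = {key: rec[key], **{c: rec[c] for c in tracked},
                   "valid_from": load_date, "valid_to": OPEN_END, "is_current": True}
        result.append(new_row)
        current[rec[key]] = new_row
    return result
```

Re-applying the same snapshot produces no new versions, which keeps reruns idempotent. Handling records that disappear from the source (soft deletes) would need an extra rule that this sketch does not include.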

Technology Director | Co-founder (Software Architecture and Full-stack Data Engineering)

2012 - 2020
Quenext
  • Developed two products end to end: one for the power sector that forecasts power demand and optimizes purchase portfolios, and one for the agricultural sector that performs historical analysis of land plots based on satellite images.
  • Contributed to the development of the front end and data visualization.
  • Implemented a microservice architecture using Flask.
  • Achieved 98% forecast accuracy for the power product, as certified by the client, UPCL.
  • Received a patent for power demand forecasting: Patent No. US10468883B2.
Technologies: Google Cloud SQL, DevOps, Machine Learning Operations (MLOps), Data Pipelines, Data Engineering, Quality Assurance (QA), Data Cleaning, Data Cleansing, Data Analysis, Complex Data Analysis, Python, Big Data, Data Modeling, Data Mining, Performance Tuning, Data Management, SQL, ETL, Business Intelligence (BI), Database Modeling, Database Design, Computer Vision, Machine Learning, Analytics, Amazon Web Services (AWS), Celery, RabbitMQ, Amazon S3 (AWS S3), D3.js, Bootstrap 3, HTML5, CSS, JavaScript, Google Cloud Platform (GCP), OpenCV, NumPy, Pandas, PostgreSQL, PostGIS, MySQL, Bash, Flask, Python 3, Python 2, Databases, CSV File Processing, Spatial Databases, Reports, Database Development, Data Profiling, Data Architecture, MongoDB, Microsoft Power BI, Reporting, GIS, Geospatial Data, Geospatial Analytics, Big Data Architecture, Data Governance, Data Gathering, NoSQL, Data Visualization, Apache Airflow, Data, ETL Development, ETL Implementation & Design, Google Data Studio, CSV, REST APIs, BigQuery, Real-time Data, Customer Data, Node.js, Apache Kafka, Data Build Tool (dbt), Back-end Development, Amazon RDS, Database Architecture, Database Performance, Orchestration, Docker, Amazon CloudWatch, ELT, Minimum Viable Product (MVP), Data Analytics, Business Analytics, PySpark, Apache Spark, Spark, Amazon EC2, Predictive Modeling, Hadoop, Database Structure, Database Transactions, Transactions, Dashboard Development, Scripting, Automation, MongoDB Atlas, Dynamic SQL, ETL Tools, Data Integration, Apache Hive, Presto, BI Reporting, GeoJSON, GeoPandas, Shapely, Linux, Entity Relationships, Solution Architecture, Data Structures, Data Validation, Database Security, Google BigQuery, Cloud Migration, AWS Lambda, Relational Databases, Data Transformation, AWS Data Pipeline Service, Message Queues, Amazon Athena, APIs, RESTful Microservices, Excel 365, Microsoft Excel, OLTP, OLAP, ELK (Elastic Stack), Kubernetes, SQL Stored Procedures, Architecture, Regulatory Compliance, Web Crawlers, Scraping, Web Scraping, Data Scientist, GitHub, Data Aggregation, Large Data Sets, Startups, Startup Funding, Venture Capital, Venture Funding, SSH, Pipelines, Data Processing, English, Query Optimization, JSON, Data Scraping, Relational Database Services (RDS), Data Flows, Distributed Systems, Git, Scalability, Pentaho, Data Extraction, AWS Glue, Analytical Thinking, Requirements Analysis, Business Requirements, Ruby, Performance Optimization, Software as a Service (SaaS), Django, Artificial Intelligence (AI), Data Warehouse Testing, Database Optimization, API Integration, Go, Full-stack, Back-end, Apache Druid, Cassandra, Cloud Environments, Data & Backup Management, SQLite, GraphQL API, OAuth, Database Administration (DBA), Cloud, Database Schema Design, Tableau Desktop, Jinja, Star Schema, Kafka Connect, SQL Performance, Access Control, Role-based Access Control (RBAC), Matplotlib, Elasticsearch, Data Orchestration, ETL Pipelines, AI Pipeline, BI Reports, DigitalOcean, Internet of Things (IoT), Grafana, EMR, Google Cloud, TypeScript, ClickHouse, Cloud Storage, Database Table Optimization, Amazon EventBridge, Relational Database Design, Mapping, Dash, Ray, Data Marts, Data Quality, DAX, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Google Sheets, Database Normalization, Data Quality Analysis, MySQL DBA, Data Warehouse Implementation, Power BI Desktop, Power BI Report Server, Technical Leadership, Beautiful Soup

Chief Architect (Architecture, Data Engineering, ETL, Data Modeling, Database Design, Dashboards)

2010 - 2012
Mzaya, Pvt., Ltd.
  • Handled the first large-scale implementation of a customer intelligence product at the National Stock Exchange and Reliance Mutual Fund.
  • Created insightful data visualizations in Panopticon and Tableau.
  • Implemented large-scale handling of data volumes using the existing hardware.
Technologies: Machine Learning Operations (MLOps), Data Pipelines, Data Engineering, Quality Assurance (QA), Data Cleansing, Data Cleaning, Data Analysis, Complex Data Analysis, Data Warehouse Design, Data Warehousing, Big Data, Data Mining, Database Modeling, Performance Tuning, Data Management, PL/SQL, SQL, ETL, Master Data Management (MDM), Analytics, Business Intelligence (BI), Oracle, SAS, Base SAS, Tableau, Dashboards, Databases, Database Development, Reports, CSV File Processing, Data Profiling, Data Architecture, Reporting, Big Data Architecture, Data Governance, Data Visualization, Data, ETL Development, ETL Implementation & Design, CSV, Customer Data, Database Architecture, Database Performance, Orchestration, ELT, Data Analytics, Business Analytics, Database Structure, Database Transactions, Scripting, Automation, Dynamic SQL, Data Migration, ETL Tools, Data Integration, BI Reporting, Linux, Entity Relationships, Solution Architecture, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLAP, SQL Stored Procedures, Architecture, Web Crawlers, Scraping, Web Scraping, Data Scientist, Data Aggregation, Large Data Sets, SSH, Pipelines, Data Processing, English, Query Optimization, Data Scraping, Transact-SQL (T-SQL), Scalability, Pentaho, Analytical Thinking, Requirements Analysis, Performance Optimization, Artificial Intelligence (AI), Data Warehouse Testing, Database Optimization, Back-end, Fraud Detection, Database Schema Design, Tableau Desktop, Star Schema, Data Vault 2.0, SQL Performance, Data Orchestration, ETL Pipelines, BI Reports, DataOps, Database Table Optimization, Relational Database Design, Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Data Quality Analysis, Data Warehouse Implementation, Technical Leadership

Software Specialist (ETL, Data Modeling, Database Design, Dashboard, MDM, Reports)

2009 - 2010
SAS R&D India, Pvt., Ltd.
  • Developed a SAS IIS (insurance intelligence solution) cross-sell, up-sell, and customer retention/segmentation solution for the forthcoming release.
  • Built and automated a SAS log reader that automatically generates documents describing source-to-target column transformations.
  • Managed and mentored four team members in ETL development and review.
Technologies: Data Warehouse Design, Data Warehousing, Data Mining, Data Modeling, Performance Tuning, Data Management, SQL, ETL, Oracle, Unix, Database Modeling, Business Intelligence (BI), SAS Metadata Server, SAS Data Integration (DI) Studio, SAS, Base SAS, Databases, Database Development, Reports, Data Profiling, Data Architecture, Reporting, Big Data Architecture, Data Governance, Data, Big Data, ETL Development, ETL Implementation & Design, Data Analysis, Database Architecture, Orchestration, ELT, Data Analytics, Database Structure, Scripting, Automation, Dynamic SQL, ETL Tools, Data Integration, BI Reporting, Entity Relationships, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLAP, SQL Stored Procedures, Oracle Database, Architecture, Data Aggregation, SSH, Pipelines, Data Processing, English, Query Optimization, Scalability, Analytical Thinking, Requirements Analysis, Performance Optimization, Artificial Intelligence (AI), Data Warehouse Testing, Database Optimization, Fraud Detection, Database Schema Design, Star Schema, SQL Performance, Access Control, Role-based Access Control (RBAC), Data Orchestration, ETL Pipelines, BI Reports, Enterprise Data Warehouse (EDW), Database Table Optimization, Relational Database Design, Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Data Warehouse Implementation, Technical Leadership, Code Refactoring, Legacy Code

Software Specialist (ETL, Data Modeling, Database Design, Dashboards, Reports, MDM)

2007 - 2008
SAS India Institute, Pvt., Ltd.
  • Implemented the pilot of a SAS OpRisk (operational risk) solution for Axis Bank.
  • Worked on the SAS credit scoring UI, reports, and chart enhancement while consulting for SAS R&D.
  • Earned my Base SAS certification within six months of joining.
Technologies: Data Pipelines, Data Engineering, Quality Assurance (QA), Data Cleansing, Data Cleaning, Data Analysis, Complex Data Analysis, Data Warehousing, Data Warehouse Design, Data Modeling, Data Mining, Performance Tuning, Data Management, Database Modeling, Unix, SQL, ETL, Business Intelligence (BI), CSS, HTML, JavaScript, Java, Unix Shell Scripting, VMware, Flux, SAS Data Integration (DI) Studio, SAS, Base SAS, Databases, Database Development, CSV File Processing, Reports, Data Profiling, Data Architecture, Reporting, Data Governance, Data Visualization, Data, ETL Development, ETL Implementation & Design, Customer Data, Database Architecture, Orchestration, ELT, Data Analytics, Database Structure, Dashboard Development, Scripting, Automation, Dynamic SQL, ETL Tools, Data Integration, BI Reporting, Linux, Entity Relationships, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLAP, SQL Stored Procedures, Oracle Database, Architecture, Data Aggregation, SSH, Pipelines, Data Processing, English, Query Optimization, Transact-SQL (T-SQL), Scalability, Requirements Analysis, Performance Optimization, Artificial Intelligence (AI), Data Warehouse Testing, Database Optimization, Database Schema Design, Star Schema, Data Vaults, SQL Performance, Access Control, Role-based Access Control (RBAC), Data Orchestration, ETL Pipelines, BI Reports, Enterprise Data Warehouse (EDW), DataOps, Database Table Optimization, Relational Database Design, Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Data Quality Analysis, Data Warehouse Implementation, Code Refactoring, Legacy Code

Associate Consultant (Oracle, ETL, Database Design, Business Intelligence, Reporting, Dashboards)

2000 - 2007
Tata Consultancy Services
  • Executed a Six Sigma DMADV project with no defects during the implementation of GEMMS software in a GE Superabrasives Florida manufacturing plant.
  • Facilitated the installation of Oracle 9i RAC and Oracle Apps by setting up the necessary environment on Red Hat Linux, as one of the early adopters of Oracle 9i RAC on Red Hat Linux (RH3). Migrated the data from Oracle 7 to Oracle 9i.
  • Led a team of 30 in creating a new data warehouse for GE APAC, managing the design, development, and deployment of database schemas, stored procedures, triggers, and ETL.
  • Created intuitive BusinessObjects and Tableau dashboards and reports.
  • Developed and implemented Sybase database schemas, stored procedures, and triggers, enhancing application functionality.
  • Participated in the migration of Sybase versions for improved data management and system efficiency.
  • Wrote Windows 2000 batch scripts to push the PowerBuilder application to all shop-floor machines that used it, keeping the software version consistent across all machines.
Technologies: Data Pipelines, Data Engineering, Quality Assurance (QA), Data Analysis, Data Cleaning, Data Cleansing, Data Warehouse Design, Data Warehousing, Data Modeling, Data Mining, Data Management, Performance Tuning, Business Intelligence (BI), SQL, ETL, PL/SQL Tuning, Database Design, Database Modeling, PowerBuilder, Manufacturing, Oracle, FTP, Shell Scripting, SAP BusinessObjects Data Integrator, SAP BusinessObjects (BO), PL/SQL Developer, Oracle PL/SQL, Tableau, Dashboards, Databases, Database Development, CSV File Processing, Reports, Data Profiling, Data Architecture, Microsoft Access, Reporting, Data Governance, Data Visualization, Data, ETL Development, ETL Implementation & Design, Customer Data, Database Architecture, Database Performance, Orchestration, ELT, Data Analytics, Database Structure, Database Transactions, Transactions, Dashboard Development, Scripting, Automation, Dynamic SQL, Data Migration, ETL Tools, Data Integration, BI Reporting, Autosys, Linux, Entity Relationships, Data Structures, Data Validation, Database Security, Relational Databases, Data Transformation, Microsoft Excel, OLTP, OLAP, SQL Stored Procedures, Oracle Database, Architecture, Data Aggregation, SSH, Pipelines, Data Processing, English, Query Optimization, Database Replication, Sybase, Transact-SQL (T-SQL), Scalability, Data Extraction, SQL Anywhere, Requirements Analysis, Performance Optimization, Data Warehouse Testing, Database Optimization, Data & Backup Management, Database Schema Design, Star Schema, SQL Performance, Access Control, Role-based Access Control (RBAC), Data Orchestration, ETL Pipelines, BI Reports, Enterprise Data Warehouse (EDW), DataOps, Database Table Optimization, Relational Database Design, Data Marts, Data Quality, EDA, Key Performance Indicators (KPIs), Kimball Methodology, Database Normalization, Data Quality Analysis, Data Warehouse Implementation, Code Refactoring, Legacy Code

Experience

AgriWatch

A SaaS product developed completely in-house using open-source tools and technologies and deployed on GCP as well as AWS. The tool virtually digitizes farmland records, provides insight into land productivity based on satellite images, and scores each plot relative to other land within a 100-kilometer radius. All the data is overlaid on Google Maps through easy-to-use, intuitive visualizations. HDFC Bank has used it to make decisions on farming loans and to save the cost of manually inspecting farms.
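AgriWatch scores land relative to other plots within a 100-kilometer radius. The exact scoring method is not described here, so the sketch below assumes one plausible approach. It finds neighbors using the haversine great-circle distance and scores each plot as the percentile rank of its productivity index among plots within the radius. The productivity index could be a satellite-derived value such as NDVI. The `Plot` fields and the choice of percentile rank are illustrative assumptions, not the product's actual formula.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Plot:
    plot_id: str
    lat: float
    lon: float
    productivity: float  # hypothetical satellite-derived index, e.g. mean NDVI


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def relative_score(target, plots, radius_km=100.0):
    """Percentile (0-100) of target's productivity among plots within radius_km.

    The target itself is counted, so a plot with no neighbors scores 100.
    """
    nearby = [p for p in plots
              if haversine_km(target.lat, target.lon, p.lat, p.lon) <= radius_km]
    if target not in nearby:
        nearby.append(target)
    at_or_below = sum(1 for p in nearby if p.productivity <= target.productivity)
    return 100.0 * at_or_below / len(nearby)
```

Scoring relative to nearby plots rather than to a global threshold accounts for regional differences in soil and climate. That kind of context is what a lender would want when comparing loan applicants from the same area.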

Powermatics/EnergyWatch

This is an application for electricity distribution companies that forecasts day-ahead power requirements for 96 time blocks of 15 minutes each and optimizes the purchase portfolio by considering power exchange market prices. It also provides closed-envelope bidding prices for the market, along with intuitive D3.js data visualizations that allow data to be manipulated directly from the plots.
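Splitting a day into 96 blocks of 15 minutes means every timestamp maps to a block number from 1 to 96. The sketch below shows that mapping and one common way to express forecast accuracy: 100% minus the mean absolute percentage error (MAPE). Whether the product measured accuracy exactly this way is an assumption. The function names and sample values are illustrative.

```python
from datetime import datetime

BLOCK_MINUTES = 15
BLOCKS_PER_DAY = 24 * 60 // BLOCK_MINUTES  # 96


def time_block(ts: datetime) -> int:
    """Return the 1-based 15-minute block (1..96) that a timestamp falls in."""
    return (ts.hour * 60 + ts.minute) // BLOCK_MINUTES + 1


def mape_accuracy(actual, forecast):
    """Return 100 - MAPE (in percent) over paired actual/forecast values.

    Blocks with zero actual demand are skipped to avoid division by zero.
    """
    errors = [abs(a - f) / a for a, f in zip(actual, forecast) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to score")
    return 100.0 * (1 - sum(errors) / len(errors))


if __name__ == "__main__":
    print(time_block(datetime(2024, 1, 1, 23, 59)))  # last block of the day
    print(mape_accuracy([100, 200], [98, 204]))      # 2% error in each block
```

In a day-ahead setting, a forecast would be produced for all 96 blocks of the next day and scored block by block once the actual demand is known.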

Brand-new Enterprise Data Warehouse Creation

As an architect and project manager, I led the creation of a brand-new data warehouse for GE's APAC region, using Oracle as the database and SAP BusinessObjects Data Integrator for ETL.

My work ranged from design to team management, as well as coordinating with project module leaders and coordinators across six countries, and the project became a success for the GE Plastics APAC region.

Edtech Data Warehouse and Data Marts for Various Chegg Businesses

I handled ETL, data warehouse, and data mart design and development for various Chegg businesses, including calls to the Salesforce, Calendly, Microsoft Azure Drive, Databricks, and Tableau APIs to keep the data simple and consistent. The technology stack included Airflow, Databricks, dbt, Redshift, Tableau, and Looker.

Education

1994 - 2000

Bachelor of Technology Degree in Mechanical Engineering

North Eastern Regional Institute of Science and Technology - Nirjuli, Itanagar, Arunachal Pradesh, India

Certifications

MARCH 2009 - PRESENT

SAS Certified Advanced Programmer for SAS9

SAS

MARCH 2008 - PRESENT

SAS Certified Base Programmer for SAS9

SAS

Skills

Libraries/APIs

REST APIs, Pandas, NumPy, OpenCV, Google Maps API, Node.js, PySpark, Shapely, GraphQL API, Matplotlib, Pydantic, Beautiful Soup, D3.js, Plotly.js

Tools

SAS Data Integration (DI) Studio, Tableau, Microsoft Power BI, GIS, BigQuery, Microsoft Excel, Pentaho Data Integration (Kettle), Tableau Desktop Pro, SAP BusinessObjects Data Integrator, VMware, Microsoft Access, Apache Airflow, GitHub, Looker, MongoDB Atlas, Autosys, Amazon Athena, Git, Tableau Prep Flow, Stitch Data, AWS Glue, Prisma, Tableau Desktop, Amazon Simple Queue Service (SQS), Kafka Connect, Amazon QuickSight, Superset, Google Sheets, Power BI Desktop, Power BI Report Server, RabbitMQ, Celery, Amazon CloudWatch, ELK (Elastic Stack), GitHub Pages, Amazon Elastic Container Service (ECS), AWS Fargate, Terraform, Apache Druid, Grafana, Grafana k6, Amazon Elastic Container Registry (ECR)

Languages

Python 3, SAS, SQL, Python, Snowflake, PowerBuilder, JavaScript, HTML, Python 2, Bash, HTML5, Transact-SQL (T-SQL), Ruby, Go, C, Fortran, Java, CSS, Scala, TypeScript

Paradigms

Business Intelligence (BI), Database Design, ETL, Spatial Databases, Database Development, ETL Implementation & Design, OLAP, DevOps, Automation, Requirements Analysis, Role-based Access Control (RBAC), Kimball Methodology, Code Refactoring, Mechanical Design

Platforms

Oracle, Google Cloud Platform (GCP), Amazon Web Services (AWS), Oracle Database, Linux, Unix, Apache Kafka, Databricks, Amazon EC2, AWS Lambda, Pentaho, DigitalOcean, Azure, Salesforce, Docker, Kubernetes, Jet Admin, Confluent Kafka, Temporal Cloud

Storage

PostGIS, PostgreSQL, MySQL, PL/SQL, Oracle PL/SQL, Amazon S3 (AWS S3), Master Data Management (MDM), Database Modeling, Data Pipelines, Google Cloud SQL, Databases, Database Architecture, Database Structure, Dynamic SQL, Data Integration, Relational Databases, OLTP, SQL Stored Procedures, JSON, SQL Performance, ClickHouse, PL/SQL Developer, MongoDB, NoSQL, Database Performance, Redshift, Database Transactions, Apache Hive, Data Validation, Database Security, Data Lakes, AWS Data Pipeline Service, Amazon DynamoDB, Database Replication, Sybase, SQL Anywhere, Cloud Environments, SQLite, Database Administration (DBA), Elasticsearch, Google Cloud, GeoServer, Azure Blobs, Apache Parquet, Amazon Aurora, Cassandra

Frameworks

Flux, Flask, Apache Spark, Spark, Hadoop, Presto, Streamlit, NestJS, Jinja, Ray, Bootstrap 3, Django

Other

Base SAS, FTP, Business, Data Management, Data Modeling, Big Data, Data Warehousing, Complex Data Analysis, Data Analysis, Data Cleansing, Data Cleaning, Data Engineering, Machine Learning Operations (MLOps), Dashboards, CSV File Processing, Reports, Data Profiling, Data Architecture, Data Warehouse Design, Reporting, BI Reporting, Data Governance, Data Gathering, Data, ETL Development, CSV, Customer Data, Data Build Tool (dbt), Amazon RDS, ELT, Data Analytics, Scripting, Data Migration, ETL Tools, Entity Relationships, Solution Architecture, Data Transformation, Message Queues, APIs, Data Aggregation, Pipelines, Data Processing, English, Query Optimization, Relational Database Services (RDS), Data Extraction, Data Strategy, Data Warehouse Testing, Database Optimization, Back-end, Database Schema Design, Star Schema, Data Orchestration, ETL Pipelines, BI Reports, Enterprise Data Warehouse (EDW), DataOps, Cloud Storage, Database Table Optimization, Amazon EventBridge, Relational Database Design, EDA, Key Performance Indicators (KPIs), Data Quality Analysis, Data Warehouse Implementation, Shell Scripting, SAP BusinessObjects (BO), Unix Shell Scripting, SAS Metadata Server, Computer Vision, SAP BusinessObjects Data Service (BODS), PL/SQL Tuning, Performance Tuning, Data Mining, Geospatial Data, Big Data Architecture, Data Visualization, Google Data Studio, Real-time Data, Manufacturing, Back-end Development, Orchestration, Minimum Viable Product (MVP), Business Analytics, Predictive Modeling, Transactions, Dashboard Development, Education, GeoJSON, GeoPandas, Data Structures, Google BigQuery, Cloud Migration, RESTful Microservices, Excel 365, Architecture, Regulatory Compliance, Web Crawlers, Scraping, Web Scraping, Data Scientist, Large Data Sets, Startups, Startup Funding, Venture Capital, Venture Funding, SSH, Digital Marketing, Data Scraping, Data Flows, Distributed Systems, Scalability, Analytical Thinking, Business Requirements, Amazon Redshift, Performance Optimization, Software as a Service (SaaS), Delta Lake, Artificial Intelligence (AI), CI/CD Pipelines, Customer Data Platform (CDP), API Integration, Forecasting, Statistical Modeling, Metabase, Looker Studio, Full-stack, Fraud Detection, Data & Backup Management, DuckDB, OAuth, Cloud, Data Vaults, Data Vault 2.0, Access Control, Medallion Architecture, AI Pipeline, Internet of Things (IoT), EMR, Apache Superset, Scalable Platforms, Serverless, Mapping, Dash, Amazon Managed Workflows for Apache Airflow (MWAA), Data Marts, Data Quality, DAX, Database Normalization, MySQL DBA, A/B Testing, Product Analytics, Cursor AI, Attribution Modeling, Technical Leadership, Legacy Code, Statistical Quality Control (SQC), Machine Design, Fluid Dynamics, Industrial Engineering, Operations Research, PLC, Analytics, Machine Learning, Quality Assurance (QA), Geospatial Analytics, Data Science, Tableau Server, GitHub Actions, Data Curation, Grafana 2, Prometheus, AWS ECS Fargate
