Biswadip Paul
Verified Expert in Engineering
Data Engineer and ETL Developer
Mumbai, Maharashtra, India
Toptal member since January 22, 2021
Biswadip is a SAS and Advanced SAS Certified Developer with 22 years of experience in business intelligence, technology selection, architecture, pre-sales, consulting, and project management. He excels at designing software architecture and delivering solutions involving the real-time processing of big data, including geospatial data. A career highlight for Biswadip was developing an easy-to-use data visualization solution that overlaid data on Google Maps for customers to consume.
Preferred Environment
Tableau Desktop Pro, Python 3, PL/SQL, Pentaho Data Integration (Kettle), Oracle, MySQL, PostgreSQL, Linux, Data Engineering, ETL
The most amazing...
...project I've handled was the full-stack development and architecture of the AgriWatch product, which involved overlaying visualizations on Google Maps.
Work Experience
Developer
Oy Teboil Ab
- Installed Sybase 17 on one of the depot servers, ensuring the installation followed the same standards as the other depots running Sybase 16 and that the prebuilt queries worked without issues.
- Fixed client-server replication issues between the headquarters Sybase 16 server and multiple depot servers (an SQL Remote setup using FILE-based message types), carefully identifying the gaps and ensuring nothing failed.
- Ensured the Java code that uses Sybase to connect to multiple systems, such as PLCs, FTP servers, and printers, was tested and running as desired.
- Resolved every issue directly in live production with almost zero existing documentation; implementing all the fixes took 20 days.
- Received the client's appreciation and continue to work for them on an hourly basis for an indefinite period.
- Documented the system and the fixes, giving the client documentation where previously there was none.
Data Engineer
PepsiCo Global - Main
- Designed and developed two data marts in Snowflake to move sales and operational data from different sources, using dbt for transformation and Airflow for orchestration, with SQL and Python (see the sketch after this list).
- Developed dbt macros that can be used organization-wide, improving code reusability and cutting development time for other teams; once the macros were in place, the second data mart was built end to end in eight hours.
- Coordinated with various teams across multiple time zones, from data owners to the infrastructure and security approval teams, averting risks and finishing the project on time.
- Recognized by the client, who is willing to provide recommendations in the future if required.
- Used Docker with Kubernetes on a local machine as the development environment for quick iteration.
- Managed Jira tickets and linked them to GitHub commits during development.
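A minimal sketch of how a dbt-plus-Airflow setup like this can be wired together, assuming Airflow 2.4+ with the BashOperator; the DAG name, schedule, project directory, and target are illustrative placeholders, not the actual project configuration.

```python
# Hypothetical Airflow DAG that runs dbt transformations for a Snowflake data mart.
# DAG id, schedule, project path, and target are assumptions for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="sales_datamart_dbt",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/sales --target prod",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/sales --target prod",
    )
    # Run the models first, then validate them with dbt tests.
    dbt_run >> dbt_test
```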
Data/BI Engineer Lead
Chegg
- Designed and developed data marts for various Chegg businesses. Created generic, object-oriented Python frameworks for loading data from various REST APIs.
- Tracked and fixed bugs using Jira and tracked tasks on an Atlassian Kanban board.
- Created a new, reliable ETL flow that removed a bottleneck and a dependency on another team. The framework mirrors the source system's data types and lengths (including Unicode lengths) and automatically adds columns based on the source.
- Created a data quality framework to check the consistency and freshness of the loaded data. Integrated Git into the Databricks jobs, used Airflow for orchestration, and triggered refreshes of Tableau reports.
- Added CI/CD to move new changes from dev to production.
- Optimized data pipelines using dbt, reducing processing time by 30%. Designed efficient data models that improved collaboration and added data validation tests in dbt, leading to a 20% decrease in data-related errors and ensuring high data quality standards.
- Created a generic SCD2 framework in Spark that builds data marts for Calendly data via the Calendly REST API; other uses include loading Excel files from the Microsoft 365 cloud drive into data marts (see the SCD2 sketch after this list).
- Orchestrated workloads using Apache Airflow. Created fan-out jobs (parallel jobs with Salesforce as the source system and Redshift as the destination) using the Databricks operator, with XCom used to make the number of jobs dynamic (see the fan-out sketch after this list).
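A minimal sketch of the SCD2 (Type 2 slowly changing dimension) pattern such a framework generalizes, assuming Delta Lake on Databricks; the table names, key, tracked column, and file path are illustrative, not the actual framework.

```python
# Hedged SCD2 (Type 2) upsert sketch in PySpark + Delta Lake.
# Table names, keys, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

incoming = (
    spark.read.json("/mnt/raw/calendly_events.json")   # hypothetical extract
    .withColumn("effective_from", F.current_timestamp())
    .withColumn("effective_to", F.lit(None).cast("timestamp"))
    .withColumn("is_current", F.lit(True))
)

dim = DeltaTable.forName(spark, "analytics.dim_calendly_event")  # hypothetical target

# Step 1: close out current rows whose tracked attribute changed.
(
    dim.alias("t")
    .merge(incoming.alias("s"), "t.event_id = s.event_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.status <> s.status",   # tracked attribute(s)
        set={"is_current": "false", "effective_to": "s.effective_from"},
    )
    .execute()
)

# Step 2: append new current versions for new or changed keys.
new_rows = incoming.join(
    spark.table("analytics.dim_calendly_event").filter("is_current = true"),
    on="event_id",
    how="left_anti",
)
new_rows.write.format("delta").mode("append").saveAsTable("analytics.dim_calendly_event")
```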
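A hedged sketch of the Airflow fan-out pattern described above, assuming Airflow 2.4+ dynamic task mapping, where the XCom returned by an upstream task determines how many Databricks runs are created; the job ID, connection ID, and Salesforce object names are placeholders.

```python
# Hypothetical fan-out: one Databricks job run per Salesforce partition,
# with the partition list passed via XCom (dynamic task mapping).
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator


@dag(start_date=datetime(2023, 1, 1), schedule=None, catchup=False)
def salesforce_to_redshift_fanout():
    @task
    def list_partitions() -> list[dict]:
        # In a real pipeline this would come from Salesforce metadata; the
        # returned XCom drives how many parallel runs are expanded downstream.
        objects = ["Account", "Opportunity", "Contact"]
        return [{"sf_object": name} for name in objects]

    DatabricksRunNowOperator.partial(
        task_id="load_partition",
        databricks_conn_id="databricks_default",
        job_id=12345,                      # hypothetical Databricks job
    ).expand(notebook_params=list_partitions())


salesforce_to_redshift_fanout()
```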
Technology Director | Co-founder (Software Architecture and Full-stack Data Engineering)
Quenext
- Developed two products end to end: one for the power sector, forecasting power demand and optimizing portfolios, and one for the agricultural sector, performing historical analysis of plots based on satellite images.
- Contributed to the development of the front end and data visualization.
- Implemented a microservice architecture using Flask (see the sketch after this list).
- Ensured that the forecast accuracy of the power product was 98%, as certified by the client, UPCL.
- Received the following patent: Power Demand Forecasting Patent No US10468883B2.
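A minimal sketch of one service in a Flask-based microservice architecture like the one described above; the route, payload shape, and port are assumptions, and the forecasting call is stubbed out.

```python
# Hypothetical Flask microservice exposing a single forecasting endpoint.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/forecast/demand", methods=["POST"])
def forecast_demand():
    payload = request.get_json(force=True)
    region = payload.get("region", "unknown")
    horizon_hours = int(payload.get("horizon_hours", 24))
    # A real service would call the forecasting model here; this stub only
    # echoes the request so the service boundary is visible.
    return jsonify({"region": region, "horizon_hours": horizon_hours, "forecast": []})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```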
Chief Architect (Architecture, Data Engineering, ETL, Data Modeling, Database Design, Dashboards)
Mzaya, Pvt., Ltd.
- Handled the first large-scale implementation of a customer intelligence product at the National Stock Exchange and Reliance Mutual Fund.
- Created insightful data visualizations in Panopticon and Tableau.
- Implemented the large-scale handling of data volume while using the existing hardware.
Software Specialist (ETL, Data Modeling, Database Design, Dashboard, MDM, Reports)
SAS R&D India, Pvt., Ltd.
- Developed a SAS IIS (insurance intelligence solution) cross-sell, up-sell, and customer retention/segmentation solution for the forthcoming release.
- Constructed and automated a SAS log reader so that source-to-target column transformation documents could be generated automatically (see the sketch after this list).
- Managed and mentored four team members in ETL development and review.
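A simplified Python illustration of the log-reader idea: scan SAS logs for dataset reads and writes and emit a source-to-target summary. The original tool was built around SAS logs themselves; the regex patterns assume standard SAS NOTE lines, and the CSV output format is an assumption for illustration.

```python
# Hypothetical SAS log scanner that pairs datasets read with datasets written
# and writes a simple source-to-target mapping document.
import csv
import re
from pathlib import Path

READ_NOTE = re.compile(r"NOTE: There were \d+ observations read from the data set (\S+)\.")
WRITE_NOTE = re.compile(r"NOTE: The data set (\S+) has \d+ observations and \d+ variables\.")


def extract_lineage(log_path: str) -> list[dict]:
    sources, targets = [], []
    for line in Path(log_path).read_text(errors="ignore").splitlines():
        if m := READ_NOTE.search(line):
            sources.append(m.group(1))
        elif m := WRITE_NOTE.search(line):
            targets.append(m.group(1))
    # Crude pairing: every source seen in the log maps to every target written.
    return [{"source": s, "target": t} for s in sources for t in targets]


def write_mapping_doc(rows: list[dict], out_path: str) -> None:
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["source", "target"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_mapping_doc(extract_lineage("job_step.log"), "source_to_target.csv")
```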
Software Specialist (ETL, Data Modeling, Database Design, Dashboards, Reports, MDM)
SAS India Institute, Pvt., Ltd.
- Implemented the pilot of a SAS OpRisk solution (operation risk) for Axis Bank.
- Worked on the SAS credit scoring UI, reports, and chart enhancement while consulting for SAS R&D.
- Earned my Base SAS certification within six months of joining.
Associate Consultant (Oracle, ETL, Database Design, Business Intelligence, Reporting, Dashboards)
Tata Consultancy Services
- Executed a Six Sigma DMADV project with no defects during the implementation of GEMMS software at GE Superabrasives' Florida manufacturing plant.
- Facilitated the installation of Oracle 9i RAC and Oracle Apps by creating the necessary environment in Red Hat Linux, as one of the early adopters of Oracle 9i RAC on Red Hat Linux RH3. Migrated data from Oracle 7 to Oracle 9i.
- Led a team of 30 in the creation of a new data warehouse for GE APAC, managing the design, development, and deployment of database schemas, stored procedures, triggers, and ETL.
- Created intuitive BusinessObjects and Tableau dashboards and reports.
- Developed and implemented Sybase database schemas, stored procedures, and triggers, enhancing application functionality.
- Participated in the migration of Sybase versions for improved data management and system efficiency.
- Wrote Windows 2000 batch scripts to push the PowerBuilder application to all the shop-floor machines that used it, keeping the software version consistent across all of them.
Experience
AgriWatch
Powermatics/EnergyWatch
Brand-new Enterprise Data Warehouse Creation
My work ranged from design to team management, including coordination with project module leaders and coordinators in six different countries, to make the project a success for the GE Plastics APAC region.
Edtech Data Warehouse and Datamarts of Various Chegg Businesses
Education
Bachelor of Technology Degree in Mechanical Engineering
North Eastern Regional Institute of Science and Technology - Nirjuli, Itanagar, Arunachal Pradesh, India
Certifications
SAS Certified Advanced Programmer for SAS9
SAS
SAS Certified Base Programmer for SAS9
SAS
Skills
Libraries/APIs
REST APIs, Pandas, NumPy, OpenCV, Google Maps API, PySpark, Shapely, D3.js, Plotly.js, Node.js
Tools
SAS Data Integration (DI) Studio, Tableau, Microsoft Power BI, GIS, Microsoft Excel, Pentaho Data Integration (Kettle), Tableau Desktop Pro, SAP BusinessObjects Data Integrator, VMware, Microsoft Access, Apache Airflow, BigQuery, GitHub, Looker, MongoDB Atlas, Autosys, Amazon Athena, Git, Tableau Prep Flow, Stitch Data, AWS Glue, RabbitMQ, Celery, Amazon CloudWatch, ELK (Elastic Stack), GitHub Pages
Languages
Python 3, SAS, SQL, Python, PowerBuilder, JavaScript, HTML, Python 2, Bash, HTML5, Snowflake, T-SQL (Transact-SQL), Ruby, C, Fortran, Java, CSS, Scala
Paradigms
Business Intelligence (BI), Database Design, ETL, Spatial Databases, Database Development, ETL Implementation & Design, OLAP, DevOps, Automation, Requirements Analysis, Mechanical Design
Platforms
Oracle, Google Cloud Platform (GCP), Amazon Web Services (AWS), Oracle Database, Linux, Unix, Databricks, Amazon EC2, AWS Lambda, Pentaho, Apache Kafka, Salesforce, Docker, Kubernetes, Jet Admin
Storage
PostGIS, PostgreSQL, MySQL, PL/SQL, Master Data Management (MDM), Database Modeling, Data Pipelines, Google Cloud SQL, Databases, Database Architecture, Database Structure, Dynamic SQL, Data Integration, Relational Databases, OLTP, SQL Stored Procedures, JSON, Oracle PL/SQL, PL/SQL Developer, Amazon S3 (AWS S3), MongoDB, NoSQL, Database Performance, Redshift, Database Transactions, Apache Hive, Data Validation, Database Security, Data Lakes, AWS Data Pipeline Service, Amazon DynamoDB, Database Replication, Sybase, SQL Anywhere, GeoServer, Azure Blobs
Frameworks
Flux, Flask, Realtime, Apache Spark, Spark, Hadoop, Presto, Streamlit, Bootstrap 3, Django
Other
Base SAS, FTP, Business, Data Management, Data Modeling, Big Data, Data Warehousing, Complex Data Analysis, Data Analysis, Data Cleansing, Data Cleaning, Data Engineering, Machine Learning Operations (MLOps), Dashboards, CSV File Processing, Reports, Data Profiling, Data Architecture, Data Warehouse Design, Reporting, BI Reporting, Data Governance, Data Gathering, Data, ETL Development, CSV, Customer Data, Data Build Tool (dbt), ELT, Data Analytics, Scripting, Data Migration, ETL Tools, Entity Relationships, Solution Architecture, Data Transformation, Message Queues, Data Aggregation, Pipelines, Data Processing, English, Query Optimization, Relational Database Services (RDS), Data Extraction, Shell Scripting, SAP BusinessObjects (BO), Unix Shell Scripting, SAS Metadata Server, Computer Vision, SAP BusinessObjects Data Service (BODS), PL/SQL Tuning, Performance Tuning, Data Mining, Geospatial Data, Big Data Architecture, Data Visualization, Google Data Studio, Manufacturing, Back-end Development, Amazon RDS, Orchestration, Minimum Viable Product (MVP), Business Analytics, Predictive Modeling, Transactions, Dashboard Development, Education, GeoJSON, GeoPandas, Data Structures, Google BigQuery, Cloud Migration, APIs, RESTful Microservices, Excel 365, Architecture, Regulatory Compliance, Web Crawlers, Scraping, Web Scraping, Data Scientist, Large Data Sets, Startups, Startup Funding, Venture Capital, Venture Funding, SSH, Digital Marketing, Data Scraping, Data Flows, Distributed Systems, Scalability, Analytical Thinking, Business Requirements, Amazon Redshift, Performance Optimization, Software as a Service (SaaS), Statistical Quality Control (SQC), Machine Design, Fluid Dynamics, Industrial Engineering, Operations Research, PLC, Analytics, Machine Learning, Quality Assurance (QA), Geospatial Analytics, Data Science, Tableau Server, GitHub Actions, Data Curation