Rajib Baruah, Developer in Macungie, PA, United States
Rajib Baruah

Verified Expert in Engineering

Data Developer

Location
Macungie, PA, United States
Toptal Member Since
April 28, 2022

Rajib is a senior data engineer with 23 years of experience in T-SQL coding and building SQL Server databases. He builds ETL data pipelines that ingest and process data, either in the Azure cloud using Azure Data Factory (ADF) or on-premises using SQL Server Integration Services (SSIS). An expert in Python and PySpark, he creates Databricks notebooks for data transformation and loading. With his extensive experience, Rajib will be an excellent addition to any team.

Portfolio

Assurant
SQL, Python, PySpark, SQL Server Integration Services (SSIS), Databricks...
Innovative Control Systems
Microsoft SQL Server, T-SQL (Transact-SQL)...
RSG Media
SQL, SQL Server Integration Services (SSIS), Microsoft SQL Server, Media...

Availability

Part-time

Preferred Environment

SQL, Python, PySpark, Azure Data Factory, SQL Server Integration Services (SSIS), T-SQL (Transact-SQL), Databricks, Microsoft SQL Server, ETL, Databases, Relational Databases, Azure, Azure SQL, Agile, Scrum, Data Modeling, ETL Development

The most amazing data engineering solution I’ve developed prepares raw historical data for data scientists to use in forecasting the future pricing of online ads.

Work Experience

Data Engineer

2013 - PRESENT
Assurant
  • Created data pipelines using Azure Data Factory (ADF), Databricks, Python, and PySpark. Converted on-premises ETL processes into ADF pipelines for scalability.
  • Designed a relational and denormalized star schema-based database and built multiple SQL databases. Did extensive T-SQL coding in the form of stored procedures, functions, views, and ad hoc SQL scripts.
  • Handled ongoing performance tuning of SQL databases and T-SQL code. Identified and resolved performance-related issues in production.
  • Created many SQL Server Integration Services (SSIS) packages to load and process inbound files from external vendors and other internal systems. Also created SSIS packages to transform data and generate outbound file feeds.
  • Interacted with end customers and business analysts to get a thorough understanding of the business requirements and deliver the right solutions.
  • Created a few Power BI and SQL Server Reporting Services (SSRS) reports for business users.
  • Oversaw a team of engineers working on the project and program launch.
Technologies: SQL, Python, PySpark, SQL Server Integration Services (SSIS), Databricks, Azure Cosmos DB, Microsoft SQL Server, T-SQL (Transact-SQL), Azure Data Factory, ETL, Blob Storage, Software Development, Database Management, SQL Performance, Data Engineering, Apache Spark, Spark, Data Pipelines, Azure Synapse, Business Intelligence (BI), Data Visualization, Insurance, Microsoft Power BI, Telecommunications, Agile, Scrum, Excel 365, Microsoft Excel, Data Modeling, Data Analysis, GitHub, ETL Development

DBA | Data Architect

2012 - 2013
Innovative Control Systems
  • Designed relational and star schema-based databases for the online transaction processing (OLTP) and reporting systems, respectively.
  • Created the first SQL Data Warehouse for the company from the relational SQL Server database, providing the company's decision-makers with easy, almost real-time access to the sales data at the regional and store levels.
  • Set up SQL Server replication from the stores to the central database. Created SSIS packages to load data from the OLTP database to the SQL Data Warehouse.
  • Built cubes using SQL Server Analysis Services (SSAS), enabling the generation of sales and activity reports by time and location.
  • Maintained databases, handling various actions, such as backup and restore, indexing, performance tuning, and monitoring.
Technologies: Microsoft SQL Server, T-SQL (Transact-SQL), SQL Server Integration Services (SSIS), SSAS, SQL Server Reporting Services (SSRS), Software Development, SQL Performance, Data Engineering, Data Pipelines, Data Visualization, SQL Server Analysis Services (SSAS), Database Design, Databases, Relational Databases, Excel 365, Microsoft Excel, SQL, Data Modeling, Data Analysis, Data Warehousing, ETL Development

Consultant | Data Architect

2010 - 2012
RSG Media
  • Created SSIS packages to process raw-impression data, transform it into relational structured data, and load it into the SQL Server database that data scientists and analysts then used to predict future pricing and demand.
  • Collaborated with data scientists to understand their data needs and design the database accordingly.
  • Oversaw and mentored two database developers in the team.
Technologies: SQL, SQL Server Integration Services (SSIS), Microsoft SQL Server, Media, Software Development, SQL Performance, T-SQL (Transact-SQL), Data Engineering, Data Pipelines, Data Visualization, Database Design, Databases, Relational Databases, Excel 365, Microsoft Excel, Data Modeling, Data Analysis, ETL Development

Lead DBA

2005 - 2010
KGB.com
  • Designed, built, and maintained multiple databases for the company's application used by call center agents, their QA monitoring system, and their reporting database.
  • Created a data warehouse on top of the OLTP databases that contained the call center activities of the directory assistance business, generating almost real-time reports on the call volumes.
  • Implemented database partitioning to speed up search performance on a database table with over a billion records. The goal was to keep all searches under one second, helping the company process more calls with fewer call center agents.
  • Set up real-time offshore-to-onshore replication between SQL Server databases.
  • Handled ongoing performance tuning, indexing, database backup and restore, and log shipping.
Technologies: Microsoft SQL Server, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), Excel VBA, Software Development, SQL Performance, T-SQL (Transact-SQL), Data Engineering, Data Pipelines, Data Visualization, SSAS, Telecommunications, Database Design, Databases, Relational Databases, Excel 365, Microsoft Excel, SQL, Data Modeling, Data Analysis, Data Warehousing, ETL Development

Data Pipelines in Azure and Sync On-prem Database

I created data pipelines to ingest multiple daily input files from different sources. This process decrypts and loads the daily input files, takes data elements from these files, compares them against historical data, calculates other custom column values, and loads the data into the online transaction processing (OLTP) database. The data pipelines were created using Azure Data Factory, which integrates Databricks notebooks written in Python and PySpark.
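The compare-and-derive step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the actual pipelines ran as Databricks notebooks orchestrated by Azure Data Factory, and the `Record` type, the `customer_id` and `amount` fields, and the `is_new` and `delta` columns are hypothetical names, not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    customer_id: str  # hypothetical business key
    amount: float     # hypothetical data element read from the daily file


def enrich(incoming, history):
    """Compare each incoming record with the last known value for the same
    key and derive custom columns before loading."""
    rows = []
    for rec in incoming:
        previous = history.get(rec.customer_id)
        rows.append({
            "customer_id": rec.customer_id,
            "amount": rec.amount,
            # Custom column: flags keys that never appeared in history.
            "is_new": previous is None,
            # Custom column: change versus the historical value, if any.
            "delta": None if previous is None else round(rec.amount - previous, 2),
        })
    return rows


if __name__ == "__main__":
    history = {"C001": 100.0}  # stand-in for historical data
    daily_file = [Record("C001", 125.5), Record("C002", 40.0)]
    for row in enrich(daily_file, history):
        print(row)
```

In a Spark setting, the same comparison would typically be expressed as a join between the incoming and historical DataFrames rather than a dictionary lookup.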

Shipping Terminal Management System

I did extensive T-SQL coding for a seaport terminal management system that tracks shipments from overseas to US addresses. It is a complex system that integrates electronic data interchange (EDI) messages from overseas shipping lines. The system can locate a small package on a ship that carries thousands of containers and transfer those containers to trains and trucks for onshore distribution.

Automated Enrollment Correction

This is an SSIS-based, fully configurable back-end data correction system built on top of a SQL Server database to ensure proper customer billing. The system corrects data inconsistencies caused by human error. It examines the incoming data feed and compares it against the configured data mapping and data from multiple systems to assign customers to the proper products or programs. It also aligns underlying customer data with their contracts so customers are billed correctly, reducing disputes and unnecessary customer call volumes. The project was highly successful because it addressed a complex data correction requirement with many variables and inputs.
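A minimal sketch of mapping-driven correction follows, in plain Python rather than SSIS. Every name here is hypothetical (`PRODUCT_MAP`, `plan_code`, `region`, `product`), and treating the contract as the source of truth for the plan code is an assumption made for illustration, not a description of the real system's rules.

```python
# Hypothetical configured mapping: (plan_code, region) -> product name.
PRODUCT_MAP = {
    ("BASIC", "US"): "Basic Protection",
    ("PLUS", "US"): "Plus Protection",
}


def correct_enrollment(feed_row, contract_row, product_map=PRODUCT_MAP):
    """Return a corrected copy of feed_row and the list of changes applied.

    Each change is a (field, old_value, new_value) tuple for auditing.
    """
    corrected = dict(feed_row)
    changes = []

    # Assumption: the contract is authoritative for the plan code.
    if corrected.get("plan_code") != contract_row["plan_code"]:
        changes.append(("plan_code", corrected.get("plan_code"), contract_row["plan_code"]))
        corrected["plan_code"] = contract_row["plan_code"]

    # Assign the product that the configured mapping expects, if one exists.
    expected = product_map.get((corrected["plan_code"], corrected["region"]))
    if expected is not None and corrected.get("product") != expected:
        changes.append(("product", corrected.get("product"), expected))
        corrected["product"] = expected

    return corrected, changes


if __name__ == "__main__":
    feed = {"customer_id": "C1", "plan_code": "BASIC", "region": "US", "product": "Basic Protection"}
    contract = {"plan_code": "PLUS"}
    print(correct_enrollment(feed, contract))
```

Keeping the mapping as data rather than code is what makes a system like this configurable: new products or regions become new mapping rows instead of new logic.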

ETL for Predictive Analysis

SSIS packages processed large raw-impression data files from ads on top media websites. The raw data was loaded into the database, transformed, and loaded again into a SQL Server database. Analysts and data scientists then used that data to run their R programs for predicting future demand and pricing.
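The load, transform, and load-again pattern can be sketched with Python's built-in `sqlite3` module standing in for SSIS and SQL Server. The file layout, the table names `staging` and `daily_impressions`, and the sample rows are all assumptions for illustration.

```python
import sqlite3

# Hypothetical raw-impression lines: day, site, ad type, impression count.
RAW_LINES = [
    "2011-03-01,site-a,banner,1200",
    "2011-03-01,site-a,banner,800",
    "2011-03-01,site-b,video,300",
]


def run_etl(lines):
    """Load raw lines into a staging table, then aggregate them into a
    relational table that analysts can query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staging (day TEXT, site TEXT, ad_type TEXT, impressions INTEGER)"
    )
    conn.execute(
        "CREATE TABLE daily_impressions ("
        "day TEXT, site TEXT, ad_type TEXT, impressions INTEGER, "
        "PRIMARY KEY (day, site, ad_type))"
    )
    # First load: raw rows go into staging unchanged.
    conn.executemany(
        "INSERT INTO staging VALUES (?, ?, ?, ?)",
        (line.split(",") for line in lines),
    )
    # Transform and second load: one row per day, site, and ad type.
    conn.execute(
        "INSERT INTO daily_impressions "
        "SELECT day, site, ad_type, SUM(impressions) "
        "FROM staging GROUP BY day, site, ad_type"
    )
    rows = conn.execute(
        "SELECT day, site, ad_type, impressions FROM daily_impressions ORDER BY site"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_etl(RAW_LINES):
        print(row)
```

Staging the raw rows first keeps the initial load simple and lets the transformation run as set-based SQL inside the database.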

Languages

SQL, Python, T-SQL (Transact-SQL), Excel VBA

Paradigms

ETL, Database Design, Agile, Scrum, Business Intelligence (BI)

Platforms

Databricks, Azure, Azure Synapse

Storage

SQL Server Integration Services (SSIS), Microsoft SQL Server, Databases, Relational Databases, Azure SQL Databases, SQL Performance, Azure SQL, Data Pipelines, Database Management, Azure Cosmos DB, SQL Server Reporting Services (SSRS), SQL Server Analysis Services (SSAS)

Other

Azure Data Factory, Data Engineering, Data Analysis, Data Modeling, ETL Development, Media, Transportation & Shipping, Excel 365, Data Warehousing, Software Development, Software Architecture, Azure Data Lake, Blob Storage, Messaging, Electronic Data Interchange (EDI), MSMQ, Data Visualization, CSV

Frameworks

Apache Spark, Spark, .NET

Libraries/APIs

PySpark

Tools

Microsoft Excel, GitHub, Microsoft Power BI, SSAS

Industry Expertise

Insurance, Telecommunications

Education

1993 - 1997

Engineer's Degree in Computer Science

Maulana Azad National Institute of Technology Bhopal - Bhopal, India

Certifications

JUNE 2021 - PRESENT

Microsoft Certified AI Fundamentals

Microsoft

JUNE 2021 - PRESENT

Microsoft Certified: Azure Data Fundamentals

Microsoft
