Rajib Baruah, Data Developer in Macungie, PA, United States

Member since April 28, 2022
Rajib is a senior data engineer with 23 years of experience writing T-SQL and building SQL Server databases and ETL data pipelines to ingest and process data, both in the Azure cloud using Azure Data Factory (ADF) and on-premises using SQL Server Integration Services (SSIS). An expert in Python and PySpark, he creates Databricks notebooks for data transformation and loading. With his extensive experience, Rajib will be an excellent addition to any team.
Rajib is now available for hire

Portfolio

  • Assurant
    SQL, Python, PySpark, SQL Server Integration Services (SSIS), Databricks...
  • Innovative Control Systems
    Microsoft SQL Server, T-SQL, SQL Server Integration Services (SSIS), SSAS...
  • RSG Media
    SQL, SQL Server Integration Services (SSIS), Microsoft SQL Server...

Experience

Location

Macungie, PA, United States

Availability

Part-time

Preferred Environment

SQL, Python, PySpark, Azure Data Factory, SQL Server Integration Services (SSIS), T-SQL, Databricks, Microsoft SQL Server, ETL, Databases, Relational Databases, Azure, Azure SQL, Agile, Scrum, Data Modeling, ETL Development

The most amazing...

...data engineering solution I’ve developed prepares raw historical data that data scientists use to forecast the future pricing of online ads.

Employment

  • Data Engineer

    2013 - PRESENT
    Assurant
    • Created data pipelines using Azure Data Factory (ADF), Databricks, Python, and PySpark. Converted on-premises ETL processes into ADF pipelines for scalability.
    • Designed a relational database and a denormalized star schema-based database, and built multiple SQL databases. Wrote extensive T-SQL code in the form of stored procedures, functions, views, and ad hoc SQL scripts.
    • Handled ongoing performance tuning of SQL databases and T-SQL code. Identified and resolved performance-related issues in production.
    • Created many SQL Server Integration Services (SSIS) packages to load and process inbound files from external vendors and other internal systems. Also created SSIS packages to transform data and generate outbound file feeds.
    • Worked with end customers and business analysts to gain a thorough understanding of the business requirements and deliver the right solutions.
    • Created several Power BI and SQL Server Reporting Services (SSRS) reports for business users.
    • Oversaw a team of engineers working on the project and program launch.
    Technologies: SQL, Python, PySpark, SQL Server Integration Services (SSIS), Databricks, Azure Cosmos DB, Microsoft SQL Server, T-SQL, Azure Data Factory, ETL, Blob Storage, Software Development, Database Management, SQL Performance, Data Engineering, Apache Spark, Spark, Data Pipelines, Azure Synapse, Business Intelligence (BI), Data Visualization, Insurance, Insurance Industry, Microsoft Power BI, Telecommunications, Agile, Scrum, Excel 365, Data Analyst, Microsoft Excel, Data Modeling, Data Analysis, GitHub, ETL Development
  • DBA | Data Architect

    2012 - 2013
    Innovative Control Systems
    • Designed relational and star schema-based databases for the online transaction processing (OLTP) and reporting systems, respectively.
    • Created the first SQL Data Warehouse for the company from the relational SQL Server database, providing the company's decision-makers with easy, almost real-time access to the sales data at the regional and store levels.
    • Set up SQL Server replication from the stores to the central database. Created SSIS packages to load data from the OLTP database to the SQL Data Warehouse.
    • Built cubes using SQL Server Analysis Services (SSAS), enabling the generation of sales and activity reports by time and location.
    • Maintained databases, handling tasks such as backup and restore, indexing, performance tuning, and monitoring.
    Technologies: Microsoft SQL Server, T-SQL, SQL Server Integration Services (SSIS), SSAS, SSRS, Software Development, SQL Performance, Data Engineering, Data Pipelines, Data Visualization, SQL Server Analysis Services (SSAS), Database Design, Databases, Relational Databases, Excel 365, Data Analyst, Microsoft Excel, SQL, Data Modeling, Data Analysis, Data Warehousing, ETL Development
  • Consultant | Data Architect

    2010 - 2012
    RSG Media
    • Created SSIS packages to process raw-impression data, transform it into structured relational data, and load it into the SQL Server database that data scientists and analysts then used to predict future pricing and demand.
    • Collaborated with data scientists to understand their data needs and design the database accordingly.
    • Oversaw and mentored two database developers on the team.
    Technologies: SQL, SQL Server Integration Services (SSIS), Microsoft SQL Server, Media Industry, Software Development, SQL Performance, T-SQL, Data Engineering, Data Pipelines, Data Visualization, Database Design, Databases, Relational Databases, Excel 365, Data Analyst, Microsoft Excel, Data Modeling, Data Analysis, ETL Development
  • Lead DBA

    2005 - 2010
    KGB.com
    • Designed, built, and maintained multiple databases for the application used by the company's call center agents, for its QA monitoring system, and for its reporting system.
    • Created a data warehouse on top of the OLTP databases that contained the call center activities of the directory assistance business, generating near-real-time reports on call volumes.
    • Implemented database partitioning to speed up searches on a table with over a billion records, with the goal of keeping every search under one second. This helped the company process more calls with fewer call center agents.
    • Set up real-time offshore-to-onshore replication between SQL Server databases.
    • Handled ongoing performance tuning, indexing, database backup and restore, and log shipping.
    Technologies: Microsoft SQL Server, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), Excel VBA, Software Development, SQL Performance, T-SQL, Data Engineering, Data Pipelines, Data Visualization, SSAS, Telecommunications, Database Design, Databases, Relational Databases, Excel 365, Data Analyst, Microsoft Excel, SQL, Data Modeling, Data Analysis, Data Warehousing, ETL Development

Experience

  • Data Pipelines in Azure and Sync On-prem Database

    I created data pipelines to ingest multiple daily input files from different sources. The process decrypts and loads the daily input files, extracts data elements from them, compares these against historical data, calculates additional custom column values, and loads the data into the online transaction processing (OLTP) database. The pipelines were built using Azure Data Factory, which integrates Databricks notebooks written in Python and PySpark.

  • Shipping Terminal Management System

    I was heavily involved in T-SQL coding for a seaport terminal management system that tracks shipments from overseas to US addresses. It is a complex system that integrates electronic data interchange (EDI) messages from overseas shipping lines. The system can locate a small package on a ship carrying thousands of containers and manage the transfer of those containers to trains and trucks for onshore distribution.

  • Automated Enrollment Correction

    This is an SSIS-based, fully configurable back-end data correction system, built on top of a SQL Server database, that ensures customers are billed properly. The system corrects data inconsistencies caused by human error. It examines the incoming data feed and compares it against the configured data mappings and data from multiple systems to assign customers to the proper products or programs. It also aligns the underlying customer data with their contracts so customers are billed correctly, reducing disputes and unnecessary customer call volume. The project has been highly successful because it addressed a complex data correction requirement involving many variables and inputs.

  • ETL for Predictive Analysis

    This SSIS-based process handled large raw-impression data files from ads on top media websites. The raw data was loaded, transformed, and then loaded into a SQL Server database. Analysts and data scientists then used that data to run R programs that predict future demand and pricing.

Skills

  • Languages

    SQL, Python, T-SQL, Excel VBA
  • Paradigms

    ETL, Database Design, Agile, Scrum, Business Intelligence (BI)
  • Platforms

    Databricks, Azure
  • Storage

    SQL Server Integration Services (SSIS), Microsoft SQL Server, Databases, Relational Databases, Azure SQL Databases, SQL Performance, Azure SQL, Data Pipelines, Database Management, Azure Cosmos DB, SQL Server Analysis Services (SSAS)
  • Other

    Azure Data Factory, Data Engineering, Data Analysis, Data Modeling, ETL Development, Insurance Industry, Media Industry, Transportation & Shipping, Excel 365, Data Analyst, Data Warehousing, Software Development, Software Architecture, Azure Data Lake, Blob Storage, Messaging, EDI, MSMQ, Azure Synapse, Data Visualization, CSV
  • Frameworks

    Apache Spark, Spark, .NET
  • Libraries/APIs

    PySpark
  • Tools

    Microsoft Excel, GitHub, Microsoft Power BI, SSAS, SSRS
  • Industry Expertise

    Telecommunications

Education

  • Engineer's Degree in Computer Science
    1993 - 1997
    Maulana Azad National Institute of Technology Bhopal - Bhopal, India

Certifications

  • Microsoft Certified: AI Fundamentals
    JUNE 2021 - PRESENT
    Microsoft
  • Microsoft Certified: Azure Data Fundamentals
    JUNE 2021 - PRESENT
    Microsoft
