Sumit Rai, Developer in Patan, Nepal

Sumit Rai

Verified Expert in Engineering

Data Engineer and Developer

Location
Patan, Nepal
Toptal Member Since
September 1, 2022

Sumit is a senior data engineer specializing in building enterprise data warehouses, automating workflows, and tackling challenging architectural scalability problems. He has eight years of experience in the IT industry as a software developer, DevOps engineer, and data engineer. Drawing on that experience, Sumit meets business objectives, follows best practices to deliver quality products, and troubleshoots challenging technical problems.

Portfolio

Abacus Insights
Databricks, PySpark, Spark SQL, Python 3, Data Build Tool (dbt)...
CloudFactory
Snowflake, Data Build Tool (dbt), Linux, Prefect, Pipelines, Singer ETL...
CloudFactory
Amazon Web Services (AWS), Jenkins, SSH, MongoDB, Amazon EC2, Amazon RDS...

Experience

Availability

Full-time

Preferred Environment

MacOS, Linux, Amazon Web Services (AWS), Python, Snowflake, Data Build Tool (dbt), Visual Studio Code (VS Code), Docker

The most amazing...

...thing I've built is a data warehouse that centralizes an organization's data and turns it into business answers and value.

Work Experience

Principal Software Engineer

2023 - PRESENT
Abacus Insights
  • Standardized and centralized insurance claims data from several US healthcare insurance organizations, making the data usable for reporting.
  • Designed and built automated data pipelines for extracting, loading, and transforming data on the Databricks platform using PySpark or Spark SQL. Ensured data quality with unit tests built on the Great Expectations library.
  • Mentored colleagues, helping them understand the business objectives, system architectures, workflows, and technical challenges.
  • Ensured requirements were delivered on time and reviewed work for quality.
Technologies: Databricks, PySpark, Spark SQL, Python 3, Data Build Tool (dbt), Amazon S3 (AWS S3), Snowflake, Data Engineering, ETL, ELT, OLAP, OLTP, Delta Lake, Spark, Leadership, GitHub
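
The Great Expectations checks mentioned above boil down to column-level assertions on the data. As a minimal stdlib sketch of that idea, with hypothetical claim field names (the real checks use the Great Expectations library against Databricks tables):

```python
def validate_claims(rows):
    """Return a list of data-quality failures for claim records.

    Mimics expectation-style checks: required fields must be present
    and billed amounts must be non-negative.
    """
    failures = []
    for i, row in enumerate(rows):
        if not row.get("claim_id"):
            failures.append((i, "claim_id is missing"))
        if row.get("billed_amount", 0) < 0:
            failures.append((i, "billed_amount is negative"))
    return failures

# Example: the second record fails both checks
claims = [
    {"claim_id": "C-1", "billed_amount": 120.0},
    {"claim_id": "", "billed_amount": -5.0},
]
print(validate_claims(claims))
```

In a pipeline, a non-empty failure list would typically fail the run before bad records reach downstream reports.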

Senior Data Engineer

2020 - 2022
CloudFactory
  • Led the migration of all data pipelines from Xplenty to Prefect orchestration because Xplenty's return no longer justified its cost.
  • Recognized for dedicated work across different parts of the project and as a pillar of the data team.
  • Implemented documentation and tests, improving the team's performance by 80% and meeting the organization's goal of building a self-service BI platform.
  • Wrote transformations in dbt, following the Kimball approach, to ensure complete, accurate, and timely data.
  • Built a CI pipeline from scratch in GitHub Actions to give data model contributors quick feedback.
  • Introduced a SQL standard by adopting SQLFluff and evaluating SQL quality automatically in GitHub Actions.
  • Loaded data sources into the enterprise data warehouse using ETL/ELT tools and scripts.
  • Moved data pipelines running on EC2 under Prefect orchestration to enable centralized logging and proper visibility into pipeline runs.
  • Created and managed Snowflake objects, including databases, catalogs, schemas, tables, views, stages, Snowpipe, masks, and tasks.
Technologies: Snowflake, Data Build Tool (dbt), Linux, Prefect, Pipelines, Singer ETL, Fivetran, Amazon Web Services (AWS), Stitch Data, Terraform, Amazon Elastic Container Service (Amazon ECS), Xplenty, Snowpipe, ELT, Data Engineering, SQL, Data Warehousing, ETL, Zsh, Bash Script, Bash, Pandas, Kimball Methodology, GitHub Actions, Visual Studio Code (VS Code), Regex, Data Pipelines, OLAP, OLTP, Data Modeling, Leadership, GitHub, REST APIs, APIs

Software and DevOps Engineer

2018 - 2020
CloudFactory
  • Increased the availability and stability of a critical communication microservice from 95% to 99% and was awarded for the effort.
  • Reduced the startup time of EC2 Auto Scaling instances by 70% by replacing boot-time Ansible provisioning scripts with prebaked HashiCorp Packer images.
  • Wrote optimized SQL queries to process big data and produce results that informed business decisions.
  • Troubleshot and resolved issues across many apps to support ongoing platform development in the development, test, and production environments.
  • Monitored the applications' operation and enhanced their performance.
  • Updated and upgraded legacy infrastructure to support business operations.
  • Collaborated with the software engineering teams to meet Scrum sprint goals.
Technologies: Amazon Web Services (AWS), Jenkins, SSH, MongoDB, Amazon EC2, Amazon RDS, AWS Lambda, PostgreSQL, Amazon Athena, Ansible, RabbitMQ, AWS Certificate Manager, AWS CloudFormation, Docker, Amazon Elastic Container Service (Amazon ECS), SQL, Terraform, Packer, Rocket.Chat, AWS Auto Scaling, NGINX, Apache2, Zsh, Bash, Bash Script, Jupyter Notebook, Burp Suite, Visual Studio Code (VS Code), Regex, AWS Glue, AWS Step Functions, AWS IAM, GitHub, REST APIs, APIs

Software and DevOps Engineer

2016 - 2018
Leapfrog Technology
  • Developed Python and Ansible scripts to configure EC2 infrastructure.
  • Constructed Jenkins pipelines to deploy source code automatically.
  • Designed and structured AWS resources with Auto Scaling and Elastic Load Balancing via CloudFormation, then configured and troubleshot them.
  • Set up and configured the pfSense firewall for the intranet, including traffic shaping for a suitable working environment, VPN (OpenVPN), Snort IPS, and the Squid web filter.
  • Engineered, implemented, and monitored security measures to protect computer systems, networks, and information.
  • Set up FreeIPA and Active Directory identity management to centralize employee identities and manage access permissions across networks and servers.
Technologies: Amazon Web Services (AWS), Amazon Elastic Container Service (Amazon ECS), Amazon RDS, Amazon S3 (AWS S3), Docker, Networking, pfSense, OpenVPN, FreeIPA, Active Directory Federation, Amazon Route 53, Ansible, Jenkins, GoDaddy, SSL Certificates, SQL, Burp Suite, Sophos Firewall, Apache2, NGINX, Zsh, TCP/IP, AWS IAM, GitHub

Software Developer

2014 - 2016
Incessant Rain Studios
  • Designed, developed, tested, and maintained file pipelines that let animators across departments download the files they needed and upload the files they worked on.
  • Researched render batch commands and cooperated with the .NET programming team to build an automated render job handler.
  • Developed a remote services manager tool for the rendering department to simplify managing render jobs.
Technologies: Python, Regex, Microsoft SQL Server

Projects

CloudFactory's Enterprise Data Warehouse

A data warehouse that centralizes all of the organization's data and turns it into business answers and value.

I built fact and dimension tables by writing SQL queries and executing them with dbt so that data analysts could build reports. dbt also provided documentation and testing; combined with GitHub Actions, these features powered a CI/CD pipeline with automated tests, resulting in reliable SQL in production.
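
The Kimball-style modeling behind these fact and dimension tables splits raw records into dimensions keyed by surrogate keys and facts that reference them. As a toy stdlib illustration with a hypothetical orders source (the real models are dbt SQL running in the warehouse):

```python
def build_star_schema(raw_orders):
    """Split raw order rows into a customer dimension and an order fact table."""
    dim_customer = {}   # natural key (email) -> surrogate key
    fact_orders = []
    for order in raw_orders:
        natural_key = order["customer_email"]
        if natural_key not in dim_customer:
            # Assign the next surrogate key on first sight of this customer
            dim_customer[natural_key] = len(dim_customer) + 1
        fact_orders.append({
            "order_id": order["order_id"],
            "customer_sk": dim_customer[natural_key],
            "amount": order["amount"],
        })
    return dim_customer, fact_orders

raw = [
    {"order_id": 1, "customer_email": "a@x.com", "amount": 10.0},
    {"order_id": 2, "customer_email": "a@x.com", "amount": 7.5},
    {"order_id": 3, "customer_email": "b@x.com", "amount": 3.0},
]
dim, facts = build_star_schema(raw)
```

Analysts then join facts to dimensions by surrogate key, which is what makes the star schema convenient for BI reporting.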

ShareSansar Migration to Cloud and Containerization

https://www.sharesansar.com/
ShareSansar is Nepal's financial news portal and also owns Nepal's stock analysis product.
I migrated their entire web hosting setup from a hosting control panel to AWS cloud services: the web servers to EC2, the databases to an RDS cluster, and static files to S3. I also recommended and guided the developers in refactoring SQL to improve server performance.
They later reached out again to scale the web servers and improve performance. This time, I migrated EC2 to ECS and the RDS cluster to Aurora, and introduced a load balancer and ElastiCache. I recommended and guided the developers in upgrading Laravel to the latest stable version and revising how the source code handled caching. I wrote CloudWatch Logs Insights queries to find URLs with long response times and requested refactoring of the relevant source code.
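
The slow-URL analysis can be expressed as a CloudWatch Logs Insights query. A sketch, assuming the application's access logs expose hypothetical `url` and `response_time` (milliseconds) fields:

```
fields @timestamp, url, response_time
| filter response_time > 1000
| stats avg(response_time) as avg_ms, count(*) as hits by url
| sort avg_ms desc
| limit 20
```

This surfaces the twenty slowest endpoints by average response time, which is the shortlist handed to developers for refactoring.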

Worker Focus Telemetry Data

Worker Focus is a feature that captures telemetry data about a user's activity on a work machine. My contribution was orchestrating the AWS Glue jobs and catalogs with AWS Step Functions and storing the raw data in a Postgres database. The raw data was then processed to calculate the time spent on different URLs and applications, which were categorized as focused vs. unfocused.
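
The focus calculation described above amounts to turning a stream of timestamped activity events into per-URL durations and splitting the totals into focused and unfocused buckets. A stdlib sketch with a hypothetical event shape (the production version runs in Glue over the Postgres-stored raw data):

```python
def time_per_url(events, focused_domains):
    """Aggregate seconds spent per URL from ordered (timestamp, url) events,
    splitting the total into focused vs. unfocused time."""
    per_url = {}
    focused = unfocused = 0.0
    # Each event lasts until the next event's timestamp
    for (t0, url), (t1, _) in zip(events, events[1:]):
        duration = t1 - t0
        per_url[url] = per_url.get(url, 0.0) + duration
        if any(domain in url for domain in focused_domains):
            focused += duration
        else:
            unfocused += duration
    return per_url, focused, unfocused

events = [
    (0, "docs.example.com/task"),
    (30, "news.site.com"),
    (45, "docs.example.com/task"),
    (100, "session-end"),
]
totals, focused, unfocused = time_per_url(events, {"docs.example.com"})
```

The last event only marks the end of the session; classifying a URL as focused here is a simple domain-allowlist check standing in for whatever categorization the real feature uses.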
Education

2019 - 2022

Master of Science Degree in IT and Applied Security

London Metropolitan University - London, United Kingdom

2012 - 2014

Bachelor of Science (Hons) Degree in Computer Networking and IT Security

London Metropolitan University - London, United Kingdom

2011 - 2011

International Diploma in Information and Communication Technology

Informatics Academy - Victoria Street, Singapore

Certifications

AUGUST 2022 - AUGUST 2023

Databricks Lakehouse Fundamentals

Databricks

AUGUST 2022 - AUGUST 2024

dbt Fundamentals

dbt Labs

Libraries/APIs

Pandas, REST APIs, PySpark

Tools

NGINX, Zsh, GitHub, Stitch Data, Terraform, Amazon Elastic Container Service (Amazon ECS), Jenkins, Amazon Athena, Ansible, RabbitMQ, AWS CloudFormation, pfSense, OpenVPN, Sophos Firewall, Packer, Plotly, Seaborn, AWS Glue, AWS Step Functions, Amazon ElastiCache, Amazon CloudWatch, Spark SQL, AWS IAM

Paradigms

ETL, Kimball Methodology, OLAP

Frameworks

Spark, Laravel

Languages

Python, Snowflake, SQL, Regex, Bash, Bash Script, R, HTML, CSS, Java, Python 3

Platforms

Linux, Amazon EC2, Docker, Amazon Web Services (AWS), AWS Lambda, Burp Suite, Apache2, Visual Studio Code (VS Code), MacOS, Xplenty, Databricks, Rocket.Chat, Jupyter Notebook

Industry Expertise

Cybersecurity, Network Security

Storage

Amazon S3 (AWS S3), Microsoft SQL Server, Data Pipelines, OLTP, MongoDB, PostgreSQL, HDFS

Other

Data Build Tool (dbt), SSH, Amazon RDS, ELT, Data Engineering, AWS Auto Scaling, Big Data, Networking, Fivetran, Snowpipe, TCP/IP, Data Warehousing, Data Modeling, Delta Lake, Leadership, APIs, Software Project Management, Data Analytics, Prefect, Pipelines, Singer ETL, AWS Certificate Manager, FreeIPA, Active Directory Federation, Amazon Route 53, GoDaddy, SSL Certificates, Google Data Studio, GitHub Actions, Elastic Load Balancers, Google BigQuery
