Sumit Rai, Developer in Kathmandu, Central Development Region, Nepal

Sumit Rai

Verified Expert in Engineering

Data Engineer and Developer

Kathmandu, Central Development Region, Nepal

Toptal member since September 1, 2022

Bio

Sumit is a principal software engineer with nine years of IT experience. He specializes in solving challenges in the US health insurance and workforce solutions domains, drawing on a diverse set of skills and expertise. He excels at building enterprise data warehouses, automating workflows, writing software and scripts, and overcoming architectural and scalability obstacles. His combined background as a data, DevOps, and software engineer lets him tackle technical hurdles and deliver on business objectives with top-notch results.

Portfolio

Abacus Insights
Databricks, PySpark, Spark SQL, Python 3, Data Build Tool (dbt)...
CloudFactory
Snowflake, Data Build Tool (dbt), Linux, Prefect, Pipelines, Singer ETL...
CloudFactory
Amazon Web Services (AWS), Jenkins, SSH, MongoDB, Amazon EC2, Amazon RDS...

Experience

  • Python - 10 years
  • Amazon Web Services (AWS) - 8 years
  • SQL - 7 years
  • Docker - 6 years
  • Big Data - 3 years
  • Snowflake - 3 years
  • Data Build Tool (dbt) - 3 years
  • ELT - 3 years

Availability

Part-time

Preferred Environment

MacOS, Linux, Amazon Web Services (AWS), Python, Snowflake, Data Build Tool (dbt), Visual Studio Code (VS Code), Docker, Windows 10, Python 3

The most amazing...

...thing I've built is a data warehouse that centralizes data and extracts business answers and value.

Work Experience

Principal Software Engineer

2023 - PRESENT
Abacus Insights
  • Managed a team of six highly skilled software engineers, providing leadership and guidance in understanding business objectives, system architectures, workflows, and technical challenges.
  • Streamlined and consolidated insurance claims data from multiple US healthcare insurance organizations, enhancing data usability and enabling efficient report generation.
  • Architected and developed automated data pipelines on the Databricks platform using PySpark and Spark SQL for seamless extraction, loading, and transformation of data.
  • Ensured the timely delivery of project requirements and conducted thorough reviews to maintain high-quality standards.
  • Facilitated cross-department communication and collaboration with the business analyst (BA) and quality assurance (QA) teams, ensuring alignment and effective coordination throughout the project lifecycle.
Technologies: Databricks, PySpark, Spark SQL, Python 3, Data Build Tool (dbt), Amazon S3 (AWS S3), Snowflake, Data Engineering, ETL, ELT, OLAP, OLTP, Delta Lake, Spark, Leadership, GitHub, Terraform, Amazon Web Services (AWS), Python, Visual Studio Code (VS Code), Pipelines, Data Warehousing, Pandas, Data Pipelines, JSON, Microsoft Excel

Senior Data Engineer

2020 - 2022
CloudFactory
  • Migrated data pipelines from Xplenty to Prefect orchestration, improving cost-effectiveness.
  • Provided crucial support to the data team, overseeing the technical operations of the enterprise data warehouse. Facilitated cross-departmental understanding of business operations, processes, and data origins.
  • Implemented documentation and testing protocols, resulting in an 80% improvement in team performance.
  • Developed dbt transformations following the Kimball approach for complete, accurate, and timely data processing.
  • Supported the establishment of a self-service BI platform with accurate data delivery and user-friendly documentation.
  • Created a CI pipeline in GitHub Actions, ensuring code quality checks, model validation, and data testing.
  • Orchestrated data pipelines using Prefect, transitioning from EC2 to enhance visibility and optimize resource allocation in the AWS Fargate environment.
  • Created and managed Snowflake objects, including databases, catalogs, schemas, tables, views, stages, Snowpipes, masking policies, and tasks.
Technologies: Snowflake, Data Build Tool (dbt), Linux, Prefect, Pipelines, Singer ETL, Fivetran, Amazon Web Services (AWS), Stitch Data, Terraform, Amazon Elastic Container Service (ECS), Xplenty, Snowpipe, ELT, Data Engineering, SQL, Data Warehousing, ETL, Zsh, Bash Script, Bash, Pandas, Kimball Methodology, GitHub Actions, Visual Studio Code (VS Code), Regex, Data Pipelines, OLAP, OLTP, Data Modeling, Leadership, GitHub, REST APIs, APIs, MacOS, Python, Amazon S3 (AWS S3), Python 3, JSON, Microsoft Excel

Software and DevOps Engineer

2018 - 2020
CloudFactory
  • Improved the availability and stability of a critical communication microservice, raising it from 95% to 99%, and was recognized and rewarded for the result.
  • Implemented a game-changing optimization, reducing EC2 Auto Scaling instance startup time by 70%.
  • Replaced Ansible scripts with HashiCorp Packer for faster instance provisioning using custom AWS AMIs built with AWS CodeBuild.
  • Developed optimized SQL queries for processing big data, collaborating with data scientists on PySpark and AWS Glue jobs.
  • Orchestrated AWS Glue jobs and catalogs using AWS Step Functions and stored raw data in PostgreSQL for further processing and categorization.
  • Troubleshot and resolved issues across multiple applications in development, test, and production environments.
  • Monitored application performance proactively, optimizing operations and making necessary enhancements.
  • Led the upgrade of legacy infrastructure to align with evolving business operations and leverage advanced AWS services.
  • Migrated 200-300 legacy VPC-less instances to an AWS VPC environment, enhancing security and internal connectivity.
  • Collaborated with software engineering teams to achieve Scrum sprint goals, fostering effective teamwork and alignment.
Technologies: Amazon Web Services (AWS), Jenkins, SSH, MongoDB, Amazon EC2, Amazon RDS, AWS Lambda, PostgreSQL, Amazon Athena, Ansible, RabbitMQ, AWS Certificate Manager, AWS CloudFormation, Docker, Amazon Elastic Container Service (ECS), SQL, Terraform, Packer, Rocket.Chat, AWS Auto Scaling, NGINX, Apache2, Zsh, Bash, Bash Script, Jupyter Notebook, Burp Suite, Visual Studio Code (VS Code), Regex, AWS Glue, AWS Step Functions, AWS IAM, GitHub, REST APIs, APIs, MacOS, Python, Amazon S3 (AWS S3), Python 3, JSON, Microsoft Excel
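The Glue orchestration described above can be sketched as an Amazon States Language definition built as a Python dict; the state names and job names here are hypothetical illustrations, not the actual production configuration:

```python
import json

# Hypothetical state machine chaining two AWS Glue jobs: one ingest job
# followed by a categorization job. ".sync" resources make Step Functions
# wait for each Glue job to finish before moving on.
state_machine = {
    "Comment": "Run an ingest Glue job, then a categorization Glue job",
    "StartAt": "RunIngestJob",
    "States": {
        "RunIngestJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "raw-ingest-job"},  # illustrative name
            "Next": "RunCategorizationJob",
        },
        "RunCategorizationJob": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "categorize-job"},  # illustrative name
            "End": True,
        },
    },
}

# Serialize for upload (e.g., via CloudFormation or the AWS CLI)
definition_json = json.dumps(state_machine, indent=2)
```

Expressing the definition in code this way keeps it versionable alongside the Glue job sources.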

Software and DevOps Engineer

2016 - 2018
Leapfrog Technology
  • Developed Python and Ansible scripts to efficiently configure EC2 infrastructures, streamlining the deployment process and ensuring consistent configuration management.
  • Implemented Jenkins pipelines to automate the deployment of source codes, enabling seamless and efficient deployment workflows.
  • Designed and structured AWS resources with Auto Scaling and Elastic Load Balancing via CloudFormation, then configured and troubleshot those resources to optimize performance and scalability.
  • Deployed and configured the pfSense firewall for the intranet, including traffic shaping for optimal working environments.
  • Configured additional security measures such as VPN (OpenVPN), Snort IPS, and Squid web filter to enhance network security.
  • Engineered, implemented, and monitored comprehensive security measures to safeguard computer systems, networks, and sensitive information, ensuring the integrity and confidentiality of data.
  • Established FreeIPA and Active Directory identity managers to centralize employee identities and manage access permissions across networks and servers, enhancing security and simplifying user management processes.
Technologies: Amazon Web Services (AWS), Amazon Elastic Container Service (ECS), Amazon RDS, Amazon S3 (AWS S3), Docker, Networking, pfSense, OpenVPN, FreeIPA, Active Directory Federation, Amazon Route 53, Ansible, Jenkins, GoDaddy, SSL Certificates, SQL, Burp Suite, Sophos Firewall, Apache2, NGINX, Zsh, TCP/IP, AWS IAM, GitHub, Python, Python 3, JSON, Microsoft Excel

Software Developer

2014 - 2016
Incessant Rain Studios
  • Designed and maintained file pipelines for seamless file transfers across departments, supporting animators in accessing and uploading necessary files.
  • Collaborated with the .NET programming team to automate render job handling, resulting in a streamlined rendering process with improved efficiency and reduced manual intervention.
  • Developed a customized remote services manager tool for the rendering department, simplifying render job management and increasing productivity while minimizing administrative overhead.
Technologies: Python, Regex, Microsoft SQL Server, Python 3

Experience

CloudFactory's Enterprise Data Warehouse

A data warehouse that centralizes the organization's data and extracts business answers and value.

I built fact and dimension tables by writing SQL queries and executing them with dbt, which the data analysts used to build reports. dbt's built-in documentation and testing features, combined with GitHub Actions, allowed us to design a CI/CD pipeline with automated testing, resulting in reliable SQL in production.
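The fact/dimension split behind this kind of Kimball-style model can be sketched in plain Python; the column names and sample rows below are illustrative, not the warehouse's actual schema:

```python
# Toy source data standing in for raw order records.
raw_orders = [
    {"order_id": 1, "customer": "acme", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "amount": 75.5},
    {"order_id": 3, "customer": "acme", "amount": 60.0},
]

# dim_customer: one row per distinct customer, keyed by a surrogate id
# assigned in order of first appearance.
dim_customer = {}
for row in raw_orders:
    if row["customer"] not in dim_customer:
        dim_customer[row["customer"]] = len(dim_customer) + 1

# fact_orders: measures plus a foreign key into the dimension, so reports
# can aggregate amounts by any customer attribute.
fact_orders = [
    {
        "order_id": r["order_id"],
        "customer_key": dim_customer[r["customer"]],
        "amount": r["amount"],
    }
    for r in raw_orders
]
```

In the real warehouse, each of these tables would be a dbt model expressed in SQL, with dbt tests asserting key uniqueness and referential integrity.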

ShareSansar Migration to Cloud and Containerization

https://www.sharesansar.com/
ShareSansar is a Nepali financial news portal that also owns a stock analysis product.
I migrated their entire web hosting stack from a control panel to AWS cloud services: the web servers to EC2, the databases to an RDS cluster, and static files to S3. I also recommended and guided the developers in refactoring SQL to increase server performance.
I was engaged again later to scale the web servers and improve performance. This time, I migrated EC2 to ECS and the RDS cluster to Aurora, and introduced a load balancer and ElastiCache. I recommended and guided the developers to upgrade Laravel to the latest stable version and to change the caching approach in the source code. I also wrote CloudWatch Logs Insights queries to find URLs with long response times and flagged the corresponding code for refactoring.

Worker Focus Telemetry Data

Worker Focus is a feature that captures telemetry data about a user's activities on a work machine. My contribution was orchestrating the AWS Glue jobs and catalogs with AWS Step Functions and storing the output as raw data in a PostgreSQL database. The raw data was then processed to calculate the time spent on different URLs and applications, which were categorized as focused vs. unfocused.
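The focused-vs-unfocused categorization step can be sketched as follows; the event shape, domain allowlist, and function name are hypothetical (the real processing ran as AWS Glue jobs over raw data in PostgreSQL):

```python
from collections import defaultdict

# Hypothetical telemetry events: where the worker's attention was and for
# how many seconds.
events = [
    {"url": "workstream.example.com/tasks", "seconds": 300},
    {"url": "news.example.org", "seconds": 120},
    {"url": "workstream.example.com/review", "seconds": 180},
]

# Illustrative allowlist of work-related domains.
FOCUSED_DOMAINS = {"workstream.example.com"}

def categorize(events):
    """Sum time per category: 'focused' for work domains, else 'unfocused'."""
    totals = defaultdict(int)
    for event in events:
        domain = event["url"].split("/")[0]
        key = "focused" if domain in FOCUSED_DOMAINS else "unfocused"
        totals[key] += event["seconds"]
    return dict(totals)
```

Running `categorize(events)` on the sample data yields 480 focused seconds and 120 unfocused seconds.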

Education

2019 - 2022

Master of Science Degree in IT and Applied Security

London Metropolitan University - London, United Kingdom

2012 - 2014

Bachelor of Science (Hons) Degree in Computer Networking and IT Security

London Metropolitan University - London, United Kingdom

2011 - 2011

International Diploma in Information and Communication Technology

Informatics Academy - Victoria Street, Singapore

Certifications

AUGUST 2022 - AUGUST 2023

Databricks Lakehouse Fundamentals

Databricks

AUGUST 2022 - AUGUST 2024

dbt Fundamentals

dbt Labs

Skills

Libraries/APIs

Pandas, REST APIs, PySpark

Tools

NGINX, Zsh, GitHub, Microsoft Excel, Prefect, Stitch Data, Terraform, Amazon Elastic Container Service (ECS), Jenkins, Amazon Athena, Ansible, RabbitMQ, AWS CloudFormation, pfSense, OpenVPN, Sophos Firewall, Packer, Plotly, Seaborn, AWS Glue, AWS Step Functions, Amazon ElastiCache, Amazon CloudWatch, Spark SQL, AWS IAM

Languages

Python, Snowflake, SQL, Regex, Bash, Bash Script, R, HTML, CSS, Java, Python 3

Paradigms

ETL, Kimball Methodology, OLAP

Platforms

Linux, Amazon EC2, Docker, Amazon Web Services (AWS), AWS Lambda, Burp Suite, Apache2, Visual Studio Code (VS Code), MacOS, Xplenty, Databricks, Rocket.Chat, Jupyter Notebook

Storage

JSON, Amazon S3 (AWS S3), Microsoft SQL Server, Data Pipelines, OLTP, MongoDB, PostgreSQL, HDFS

Frameworks

Spark, Laravel

Industry Expertise

Cybersecurity

Other

Data Build Tool (dbt), SSH, Amazon RDS, ELT, Data Engineering, AWS Auto Scaling, Big Data, Networking, Fivetran, Snowpipe, TCP/IP, Data Warehousing, Data Modeling, Delta Lake, Leadership, APIs, Software Project Management, Data Analytics, Network Security, Pipelines, Singer ETL, AWS Certificate Manager, FreeIPA, Active Directory Federation, Amazon Route 53, GoDaddy, SSL Certificates, Google Data Studio, GitHub Actions, Elastic Load Balancers, Google BigQuery, Windows 10
