
Sumit Rai
Verified Expert in Engineering
Data Engineer and Developer
Kathmandu, Central Development Region, Nepal
Toptal member since September 1, 2022
Sumit is a principal software engineer with nine years of IT experience. He specializes in solving US health insurance and workforce solution challenges using his diverse skills and expertise. He excels in building enterprise data warehouses, automating workflows, writing software and scripts, and overcoming architectural and scalability obstacles. Sumit's role as a data, DevOps, and software engineer enables him to face technical hurdles and achieve business objectives with top-notch results.
Experience
- Python - 10 years
- Amazon Web Services (AWS) - 8 years
- SQL - 7 years
- Docker - 6 years
- Big Data - 3 years
- Snowflake - 3 years
- Data Build Tool (dbt) - 3 years
- ELT - 3 years
Preferred Environment
MacOS, Linux, Amazon Web Services (AWS), Python, Snowflake, Data Build Tool (dbt), Visual Studio Code (VS Code), Docker, Windows 10, Python 3
The most amazing...
...thing I've built is a data warehouse that centralizes data and extracts business answers and value.
Work Experience
Principal Software Engineer
Abacus Insights
- Managed a team of six highly skilled software engineers, providing leadership and guidance in understanding business objectives, system architectures, workflows, and technical challenges.
- Streamlined and consolidated insurance claims data from multiple US healthcare insurance organizations, enhancing data usability and enabling efficient report generation.
- Architected and developed automated data pipelines on the Databricks platform using PySpark or SparkSQL for seamless extraction, loading, and transformation of data.
- Ensured the timely delivery of project requirements and conducted thorough reviews to maintain high-quality standards.
- Facilitated cross-department communication and collaboration with the business analyst (BA) and quality assurance (QA) teams, ensuring alignment and effective coordination throughout the project lifecycle.
Senior Data Engineer
CloudFactory
- Migrated data pipelines from Xplenty to Prefect orchestration, improving cost-effectiveness.
- Provided crucial support to the data team, overseeing the technical operations of the enterprise data warehouse. Facilitated cross-departmental understanding of business operations, processes, and data origins.
- Implemented documentation and testing protocols, resulting in an 80% improvement in team performance.
- Developed dbt transformations following the Kimball approach for complete, accurate, and timely data processing.
- Supported the establishment of a self-service BI platform with accurate data delivery and user-friendly documentation.
- Created a CI pipeline in GitHub Actions, ensuring code quality checks, model validation, and data testing.
- Orchestrated data pipelines using Prefect, moving them from EC2 to AWS Fargate to improve visibility and optimize resource allocation.
- Created and managed Snowflake objects, including databases, catalogs, schemas, tables, views, stages, Snowpipe, masks, and tasks.
Software and DevOps Engineer
CloudFactory
- Improved the availability and stability of a critical communication microservice application, and was recognized and rewarded for increasing performance from 95% to 99%.
- Implemented an optimization that reduced EC2 Auto Scaling instance startup time by 70%.
- Replaced Ansible scripts with HashiCorp Packer for faster instance provisioning using custom AWS AMIs built with AWS CodeBuild.
- Developed optimized SQL queries for processing big data, collaborating with data scientists on PySpark and AWS Glue jobs.
- Orchestrated AWS Glue jobs and catalogs using AWS Step Functions and stored raw data in PostgreSQL for further processing and categorization.
- Troubleshot and resolved issues across multiple applications in development, test, and production environments.
- Monitored application performance proactively, optimizing operations and making necessary enhancements.
- Led the upgrade of legacy infrastructure to align with evolving business operations and leverage advanced AWS services.
- Migrated 200-300 old VPC-less instances to an AWS VPC environment, enhancing security and internal connectivity.
- Collaborated with software engineering teams to achieve Scrum sprint goals, fostering effective teamwork and alignment.
Software and DevOps Engineer
Leapfrog Technology
- Developed Python and Ansible scripts to efficiently configure EC2 infrastructures, streamlining the deployment process and ensuring consistent configuration management.
- Implemented Jenkins pipelines to automate source code deployment, enabling seamless and efficient deployment workflows.
- Designed and structured AWS resources with Auto Scaling and Elastic Load Balancing via CloudFormation.
- Configured and troubleshot these resources to optimize performance and scalability.
- Deployed and configured the pfSense firewall for the intranet, including traffic shaping for optimal working environments.
- Configured additional security measures such as VPN (OpenVPN), Snort IPS, and Squid web filter to enhance network security.
- Engineered, implemented, and monitored comprehensive security measures to safeguard computer systems, networks, and sensitive information, ensuring the integrity and confidentiality of data.
- Established FreeIPA and Active Directory identity managers to centralize employee identities and manage access permissions across networks and servers, enhancing security and simplifying user management processes.
Software Developer
Incessant Rain Studios
- Designed and maintained file pipelines for seamless file transfers across departments, supporting animators in accessing and uploading necessary files.
- Collaborated with the .NET programming team to automate render job handling, resulting in a streamlined rendering process with improved efficiency and reduced manual intervention.
- Developed a customized remote services manager tool for the rendering department, simplifying render job management and increasing productivity while minimizing administrative overhead.
Experience
CloudFactory's Enterprise Data Warehouse
I built fact and dimension tables by writing SQL queries and running them with dbt so that data analysts could build reports on top of them. dbt also supports documentation and testing; combining these features with GitHub Actions, I designed the CI/CD pipeline to run automated tests, resulting in reliable SQL in production.
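As a rough illustration of the fact and dimension modeling with automated tests described above, here is a minimal sketch. It uses Python's built-in sqlite3 in place of dbt and Snowflake, and every table name, column, and row is hypothetical. Each check is a query that returns violating rows, which is the same idea behind dbt's unique, not_null, and relationships tests.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
SCHEMA = """
CREATE TABLE dim_worker (worker_key INTEGER, worker_name TEXT);
CREATE TABLE fact_task (task_id INTEGER, worker_key INTEGER, minutes_spent REAL);
"""

# Each check selects the rows that break a rule; zero rows means the check passes.
CHECKS = {
    "dim_worker.worker_key is unique": """
        SELECT worker_key FROM dim_worker
        GROUP BY worker_key HAVING COUNT(*) > 1
    """,
    "fact_task.worker_key is not null": """
        SELECT task_id FROM fact_task WHERE worker_key IS NULL
    """,
    "fact_task.worker_key references dim_worker": """
        SELECT f.task_id FROM fact_task f
        LEFT JOIN dim_worker d ON d.worker_key = f.worker_key
        WHERE f.worker_key IS NOT NULL AND d.worker_key IS NULL
    """,
}


def build_warehouse(conn):
    """Create the tables and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_worker VALUES (?, ?)",
        [(1, "worker_a"), (2, "worker_b")],
    )
    conn.executemany(
        "INSERT INTO fact_task VALUES (?, ?, ?)",
        [(100, 1, 12.5), (101, 2, 7.0), (102, 1, 3.25)],
    )


def run_checks(conn):
    """Return a mapping of check name to the number of violating rows."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}


def minutes_per_worker(conn):
    """Example report query joining the fact table to its dimension."""
    return conn.execute(
        """
        SELECT d.worker_name, SUM(f.minutes_spent)
        FROM fact_task f
        JOIN dim_worker d ON d.worker_key = f.worker_key
        GROUP BY d.worker_name
        ORDER BY d.worker_name
        """
    ).fetchall()
```

In a CI job, a step could call `run_checks` and fail the build whenever any count is non-zero, so broken models never reach production.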
ShareSansar Migration to Cloud and Containerization
https://www.sharesansar.com/
I migrated their entire web hosting control panel setup to AWS cloud services. The web servers moved to EC2, the databases to an RDS cluster, and static files to S3. I recommended and guided the developer in refactoring the SQL to improve server performance.
I was contacted again to scale the web servers and improve performance. This time, I migrated EC2 to ECS and the RDS cluster to Aurora, and introduced a load balancer and ElastiCache. I recommended and guided the developer in upgrading Laravel to the latest stable version and changing how caching was done in the source code. I also wrote CloudWatch Logs Insights queries to find URLs with long response times and asked for the corresponding source code to be refactored.
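The slow-URL analysis mentioned above could look something like the following sketch. It is not the query used in the project: it assumes JSON log events with hypothetical fields `url` and `response_time_ms`, and the sample paths are made up. The first function builds a CloudWatch Logs Insights query string; the second is a local equivalent for checking the ranking logic on sample data.

```python
from collections import defaultdict


def build_insights_query(limit=10):
    """Build a Logs Insights query ranking URLs by average response time.

    Assumes the log events expose `url` and `response_time_ms` fields.
    """
    return (
        "stats avg(response_time_ms) as avg_ms, count(*) as hits by url\n"
        "| sort avg_ms desc\n"
        f"| limit {int(limit)}"
    )


def slowest_urls(events, limit=10):
    """Local equivalent of the query: average latency per URL, slowest first."""
    totals = defaultdict(lambda: [0.0, 0])  # url -> [summed latency, hit count]
    for event in events:
        entry = totals[event["url"]]
        entry[0] += event["response_time_ms"]
        entry[1] += 1
    ranked = [(url, total / hits, hits) for url, (total, hits) in totals.items()]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:limit]


# Made-up sample events for illustration only.
sample_events = [
    {"url": "/company/list", "response_time_ms": 1200},
    {"url": "/company/list", "response_time_ms": 800},
    {"url": "/", "response_time_ms": 90},
]
```

The query string would typically be pasted into the Logs Insights console or submitted through an AWS SDK, and the slowest URLs it returns point to the code paths worth refactoring first.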
Worker Focus Telemetry Data
Education
Master of Science Degree in IT and Applied Security
London Metropolitan University - London, United Kingdom
Bachelor of Science (Hons) Degree in Computer Networking and IT Security
London Metropolitan University - London, United Kingdom
International Diploma in Information and Communication Technology
Informatics Academy - Victoria Street, Singapore
Certifications
Databricks Lakehouse Fundamentals
Databricks
dbt Fundamentals
dbt Labs
Skills
Libraries/APIs
Pandas, REST APIs, PySpark
Tools
NGINX, Zsh, GitHub, Microsoft Excel, Prefect, Stitch Data, Terraform, Amazon Elastic Container Service (ECS), Jenkins, Amazon Athena, Ansible, RabbitMQ, AWS CloudFormation, pfSense, OpenVPN, Sophos Firewall, Packer, Plotly, Seaborn, AWS Glue, AWS Step Functions, Amazon ElastiCache, Amazon CloudWatch, Spark SQL, AWS IAM
Languages
Python, Snowflake, SQL, Regex, Bash, Bash Script, R, HTML, CSS, Java, Python 3
Paradigms
ETL, Kimball Methodology, OLAP
Platforms
Linux, Amazon EC2, Docker, Amazon Web Services (AWS), AWS Lambda, Burp Suite, Apache2, Visual Studio Code (VS Code), MacOS, Xplenty, Databricks, Rocket.Chat, Jupyter Notebook
Storage
JSON, Amazon S3 (AWS S3), Microsoft SQL Server, Data Pipelines, OLTP, MongoDB, PostgreSQL, HDFS
Frameworks
Spark, Laravel
Industry Expertise
Cybersecurity
Other
Data Build Tool (dbt), SSH, Amazon RDS, ELT, Data Engineering, AWS Auto Scaling, Big Data, Networking, Fivetran, Snowpipe, TCP/IP, Data Warehousing, Data Modeling, Delta Lake, Leadership, APIs, Software Project Management, Data Analytics, Network Security, Pipelines, Singer ETL, AWS Certificate Manager, FreeIPA, Active Directory Federation, Amazon Route 53, GoDaddy, SSL Certificates, Google Data Studio, GitHub Actions, Elastic Load Balancers, Google BigQuery, Windows 10