Sumit Rai
Verified Expert in Engineering
Data Engineer and Developer
Sumit is a senior data engineer specializing in building enterprise data warehouses, automating workflows, and solving challenging architectural scalability problems. He has eight years of experience in the IT industry as a software developer, DevOps engineer, and data engineer. Drawing on this experience, Sumit meets business objectives, follows best practices to deliver quality products, and troubleshoots difficult technical problems.
Preferred Environment
MacOS, Linux, Amazon Web Services (AWS), Python, Snowflake, Data Build Tool (dbt), Visual Studio Code (VS Code), Docker
The most amazing thing I've built is a data warehouse that centralizes data and extracts business answers and value.
Work Experience
Principal Software Engineer
Abacus Insights
- Generalized and centralized the insurance claims data of several US healthcare insurance organizations to make it usable for reporting.
- Designed and built automated data pipelines for extracting, loading, and transforming data on the Databricks platform using PySpark or Spark SQL. Ensured data quality with unit tests written with the Great Expectations library.
- Led colleagues by teaching and helping them to understand the business objectives, system architectures and workflows, and technical challenges.
- Ensured requirements were delivered on time and reviewed the work to ensure quality.
Senior Data Engineer
CloudFactory
- Led the migration of all data pipelines from Xplenty to Prefect orchestration because Xplenty returned too little value for its cost.
- Recognized for dedicated work on different parts of the project and for being the pillar of the data team.
- Implemented documentation and tests, improving the team's performance by 80% and meeting the organization's goals of building a self-service BI platform.
- Wrote transformations in dbt, following the Kimball approach, and ensured complete, accurate, and timely data.
- Built a CI pipeline from scratch in GitHub Actions to give contributors to the data models quick feedback.
- Enforced a SQL style standard by introducing SQLFluff and automatically evaluating SQL quality in GitHub Actions.
- Loaded the data sources into the enterprise data warehouse using ETL/ELT tools and scripts.
- Used Prefect to orchestrate the data pipelines that ran on EC2, enabling centralized logging and proper visibility of pipeline runs.
- Created and managed the Snowflake objects, including databases, catalogs, schemas, tables, views, stages, Snowpipe, masks, and tasks.
Software and DevOps Engineer
CloudFactory
- Increased the availability and stability of a critical communication microservice application from 95% to 99% and was awarded for this effort.
- Reduced the startup time of EC2 Auto Scaling instances by 70% by moving the Ansible scripts that ran on load into HashiCorp Packer.
- Wrote optimized SQL queries to process big data and produce results that supported business decisions.
- Troubleshot and resolved issues across many apps to support the ongoing platform development in development, test, and production environments.
- Monitored the application's operation and enhanced its performance.
- Updated and upgraded legacy infrastructure to support business operations.
- Collaborated with the software engineering teams to meet the Scrum sprint goals.
Software and DevOps Engineer
Leapfrog Technology
- Developed Python and Ansible scripts to configure EC2 infrastructure.
- Built Jenkins pipelines to deploy source code automatically.
- Designed and structured the AWS resources with Auto Scaling and Elastic Load Balancing via CloudFormation. Configured and troubleshot them.
- Set up and configured the pfSense firewall for the intranet. Configured the traffic shaping for suitable working environments, VPN (OpenVPN), Snort IPS, and Squid web filter.
- Engineered, implemented, and monitored security measures to protect computer systems, networks, and information.
- Set up FreeIPA and Active Directory identity managers to centralize the employees' identities and manage access permissions in networks and servers.
Software Developer
Incessant Rain Studios
- Designed, developed, tested, and maintained file pipelines for animators from different departments to enable them to download necessary files or upload files they worked on.
- Researched render batch commands and cooperated with the .NET programming team to build an automated render job handler.
- Developed a remote services manager tool for the rendering department to ease their render jobs.
Experience
CloudFactory's Enterprise Data Warehouse
I built fact and dimension tables by writing SQL queries and running them with dbt so that data analysts could build reports on them. dbt also supports documentation and testing; combining these features with GitHub Actions, I designed the CI/CD to run automated tests, resulting in reliable SQL in production.
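As a rough illustration of the fact-and-dimension pattern with automated tests, here is a minimal sketch that uses Python's built-in `sqlite3` module in place of dbt and the real warehouse. The table names (`raw_tasks`, `dim_worker`, `fct_task`), the sample rows, and the test names are hypothetical and not taken from the actual project. The checks follow the dbt convention that a test passes when its query returns no rows.

```python
import sqlite3

# Each test is a query that selects FAILING rows; zero rows means the test passes.
TESTS = {
    "unique_dim_worker_key": (
        "SELECT worker_key FROM dim_worker "
        "GROUP BY worker_key HAVING COUNT(*) > 1"
    ),
    "not_null_fct_worker_key": (
        "SELECT task_id FROM fct_task WHERE worker_key IS NULL"
    ),
    "relationships_fct_to_dim": (
        "SELECT f.worker_key FROM fct_task f "
        "LEFT JOIN dim_worker d ON d.worker_key = f.worker_key "
        "WHERE d.worker_key IS NULL"
    ),
}


def build_star_schema(conn):
    """Build one dimension and one fact table from a hypothetical raw table."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE raw_tasks (task_id INTEGER, worker_name TEXT, minutes REAL)"
    )
    cur.executemany(
        "INSERT INTO raw_tasks VALUES (?, ?, ?)",
        [(1, "asha", 12.5), (2, "bikash", 7.0), (3, "asha", 3.5)],
    )
    # Dimension: one row per worker, with a generated surrogate key.
    cur.execute(
        "CREATE TABLE dim_worker (worker_key INTEGER PRIMARY KEY, worker_name TEXT)"
    )
    cur.execute(
        "INSERT INTO dim_worker (worker_name) "
        "SELECT DISTINCT worker_name FROM raw_tasks ORDER BY worker_name"
    )
    # Fact: one row per task, pointing at the dimension through the surrogate key.
    cur.execute(
        "CREATE TABLE fct_task AS "
        "SELECT r.task_id, d.worker_key, r.minutes "
        "FROM raw_tasks r JOIN dim_worker d ON d.worker_name = r.worker_name"
    )
    conn.commit()


def run_tests(conn):
    """Return a mapping of test name to the number of failing rows."""
    results = {}
    for name, sql in TESTS.items():
        results[name] = conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]
    return results


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for test_name, failures in run_tests(connection).items():
        print(test_name, "PASS" if failures == 0 else f"FAIL ({failures} rows)")
```

In a CI job, a non-zero failure count for any test would fail the build, which is the feedback loop the paragraph above describes.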
ShareSansar Migration to Cloud and Containerization
https://www.sharesansar.com/
I migrated everything hosted on their web hosting control panel to AWS cloud services. The web servers moved to EC2, the databases to an RDS cluster, and the static files to S3. I recommended and guided the developer to refactor the SQL to increase server performance.
I was contacted again to scale the web servers and increase performance. This time, I migrated EC2 to ECS and the RDS cluster to Aurora, and introduced a load balancer and ElastiCache. I recommended and guided the developer to upgrade Laravel to the latest stable version and change how caching was done in the source code. I wrote CloudWatch Logs Insights queries to find URLs with long response times and asked the developer to refactor the corresponding source code.
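The slow-URL analysis was done with CloudWatch Logs Insights queries, which are not reproduced here. As a local analog only, the following Python sketch shows the same idea: group requests by URL, average the response time, and list the slowest ones first. The log line format (`METHOD URL STATUS SECONDS`) and the function name `slowest_urls` are assumptions made for illustration.

```python
from collections import defaultdict


def slowest_urls(log_lines, top_n=3):
    """Return up to top_n (url, average_seconds) pairs, slowest first.

    Assumes each line looks like "GET /path 200 0.532"; lines that do not
    match this hypothetical format are skipped.
    """
    totals = defaultdict(lambda: [0.0, 0])  # url -> [sum of seconds, request count]
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        _method, url, _status, seconds = parts
        try:
            elapsed = float(seconds)
        except ValueError:
            continue
        totals[url][0] += elapsed
        totals[url][1] += 1
    averages = [(url, total / count) for url, (total, count) in totals.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    sample = [
        "GET /company/list 200 1.5",
        "GET /company/list 200 0.5",
        "GET /home 200 0.2",
        "malformed line",
    ]
    for url, avg in slowest_urls(sample):
        print(f"{url}: {avg:.2f}s")
```

URLs at the top of this list are the candidates to hand to the developer for refactoring.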
Worker Focus Telemetry Data
Education
Master of Science Degree in IT and Applied Security
London Metropolitan University - London, United Kingdom
Bachelor of Science (Hons) Degree in Computer Networking and IT Security
London Metropolitan University - London, United Kingdom
International Diploma in Information and Communication Technology
Informatics Academy - Victoria Street, Singapore
Certifications
Databricks Lakehouse Fundamentals
Databricks
dbt Fundamentals
dbt Labs
Skills
Libraries/APIs
Pandas, REST APIs, PySpark
Tools
NGINX, Zsh, GitHub, Stitch Data, Terraform, Amazon Elastic Container Service (Amazon ECS), Jenkins, Amazon Athena, Ansible, RabbitMQ, AWS CloudFormation, pfSense, OpenVPN, Sophos Firewall, Packer, Plotly, Seaborn, AWS Glue, AWS Step Functions, Amazon ElastiCache, Amazon CloudWatch, Spark SQL, AWS IAM
Paradigms
ETL, Kimball Methodology, OLAP
Frameworks
Spark, Laravel
Languages
Python, Snowflake, SQL, Regex, Bash, Bash Script, R, HTML, CSS, Java, Python 3
Platforms
Linux, Amazon EC2, Docker, Amazon Web Services (AWS), AWS Lambda, Burp Suite, Apache2, Visual Studio Code (VS Code), MacOS, Xplenty, Databricks, Rocket.Chat, Jupyter Notebook
Industry Expertise
Cybersecurity, Network Security
Storage
Amazon S3 (AWS S3), Microsoft SQL Server, Data Pipelines, OLTP, MongoDB, PostgreSQL, HDFS
Other
Data Build Tool (dbt), SSH, Amazon RDS, ELT, Data Engineering, AWS Auto Scaling, Big Data, Networking, Fivetran, Snowpipe, TCP/IP, Data Warehousing, Data Modeling, Delta Lake, Leadership, APIs, Software Project Management, Data Analytics, Prefect, Pipelines, Singer ETL, AWS Certificate Manager, FreeIPA, Active Directory Federation, Amazon Route 53, GoDaddy, SSL Certificates, Google Data Studio, GitHub Actions, Elastic Load Balancers, Google BigQuery