Steven Rouk
Verified Expert in Engineering
Data Engineer and Developer
Arlington, VA, United States
Toptal member since October 6, 2021
Steven is an expert Python and SQL data engineer with strong data science and data analysis capabilities and eight years of experience. He has led the development of technical solutions, and his most recent role was as a senior member of a data engineering team at a Fortune 100 company. The team processed billions of rows of data in a Snowflake ecosystem with thousands of tables, and the client regarded Steven as one of the highest contributing members of the 10+ member team.
Portfolio
Experience
- Python - 10 years
- Data Engineering - 7 years
- SQL - 7 years
- Jupyter Notebook - 6 years
- Pandas - 6 years
- Neo4j - 1 year
- Amazon Web Services (AWS) - 1 year
- Snowflake - 1 year
Preferred Environment
Python, SQL, Snowflake, Jupyter Notebook, Tableau, Pandas, Data Build Tool (dbt), Fivetran, Google Cloud Platform (GCP), Amazon Web Services (AWS)
The most amazing...
...tools I've developed are a Python-based automated data profiling tool and an automated data quality tool.
Work Experience
Data Engineer Consultant
Slalom
- Served as the lead data engineer for complex ETL processes at a Fortune 100 telecommunications company. Regularly processed billions of rows of data in a Snowflake ecosystem with thousands of tables. One dataset improved data accuracy by more than 25%.
- Created an exploratory data analysis (EDA) process used by a team of 10 data engineers. The process included automated data profiling using Python and SQL and guidance for ad hoc analysis and data visualization using Jupyter notebooks.
- Led and managed a workflow migration from Apache Airflow to Jenkins.
- Created a centralized dashboard to monitor ETL workflows. This required pulling job data from Jenkins, writing it to Snowflake, and visualizing it in Sigma Computing.
- Developed an automated Snowflake table cleanup script that drops old temp tables nightly, saving our client thousands of dollars per month.
- Mentored five junior data engineers in Python, SQL, Snowflake, Jenkins, Bash, Git, and data engineering techniques and processes.
- Prototyped a data lineage solution using SQL parsing and a Neo4j graph database.
Analytics and Research Specialist
Mercy For Animals
- Analyzed year-end donation data, uncovering distinct email clusters and revenue trends.
- Created impact estimation methodologies for multiple programs.
- Developed a data pipeline prototype using Google Cloud Platform.
- Presented and led data and research workshops at four staff retreats.
Python Developer and Tableau Consultant
Boulder Insight
- Helped clients across a variety of industries with ETL, data visualization, and data dashboards.
- Served as the lead developer on a client-facing Python Flask web application to automate the distribution of Tableau dashboards.
- Constructed Python ETL pipelines to pull data from web APIs, process it, then save it to a MySQL database.
- Presented at Boulder Startup Week, Analyze Boulder, Boulder Tableau User Group, and Boulder Python.
Experience
Production Dataset for Network Customer Service Applications
As the sole developer on the project, I worked closely with end users to design the desired data schema, create data logic to meet business needs, and validate the accuracy of the dataset. As the requirements continually evolved, I brought together disparate teams at each step to reach a common understanding of the required data and data logic. In the end, we increased the data accuracy by 25%+ in a customer-facing production environment.
Data Infrastructure Setup | Airbyte, dbt, and BigQuery
I used Airbyte for the extract/load part of this process and dbt for the data transformations. Both raw and transformed data landed in BigQuery, and data visualizations and reports were built in Looker Studio.
Because of this new, automated analytics infrastructure, the team can now see metrics and data instantaneously rather than through the laborious manual approach they previously had to use.
Automated Data Profiling Script for SQL Databases
Evolution of Machine Learning | Analysis and Website
https://github.com/stevenrouk/evolution-of-machine-learning
Finding Patterns in Social Networks Using Graph Data
https://github.com/stevenrouk/social-network-graph-analysis
Key Activities
• Created a graph of the data using the NetworkX Python graph library.
• Experimented with creating my own graph data objects to load and traverse the graph.
• Analyzed connections (in-degree and out-degree), distinct networks (component analysis), sharing reciprocity, centrality, and PageRank.
• Experimented with ways to visualize large graph structures by randomly sampling neighbor nodes.
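The activities above can be illustrated with a minimal, standard-library-only sketch of a hand-rolled directed graph object. This is not the code from the linked repository and not the NetworkX API: the class name `SimpleDiGraph`, its methods, and the toy edge list are all hypothetical, chosen only to show what degree counts, component analysis, reciprocity, and PageRank compute on a small sharing network.

```python
from collections import defaultdict, deque


class SimpleDiGraph:
    """Minimal directed graph; a hypothetical stand-in, not the NetworkX API."""

    def __init__(self, edges):
        self.succ = defaultdict(set)  # node -> nodes it points to
        self.pred = defaultdict(set)  # node -> nodes pointing to it
        for src, dst in edges:
            self.succ[src].add(dst)
            self.pred[dst].add(src)
            # Touch both endpoints so every node has an entry in both maps.
            self.succ[dst]
            self.pred[src]

    def nodes(self):
        return sorted(self.succ)

    def in_degree(self, node):
        return len(self.pred[node])

    def out_degree(self, node):
        return len(self.succ[node])

    def components(self):
        """Weakly connected components, found by breadth-first traversal."""
        seen, result = set(), []
        for start in self.nodes():
            if start in seen:
                continue
            seen.add(start)
            queue, component = deque([start]), set()
            while queue:
                node = queue.popleft()
                component.add(node)
                # Ignore edge direction when deciding what is connected.
                for nxt in self.succ[node] | self.pred[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            result.append(component)
        return result

    def reciprocity(self):
        """Fraction of edges whose reverse edge also exists."""
        edges = [(s, d) for s in self.succ for d in self.succ[s]]
        mutual = sum(1 for s, d in edges if s in self.succ[d])
        return mutual / len(edges) if edges else 0.0

    def pagerank(self, damping=0.85, iterations=50):
        """Power-iteration PageRank; dangling nodes spread rank uniformly."""
        nodes = self.nodes()
        n = len(nodes)
        if n == 0:
            return {}
        rank = {node: 1.0 / n for node in nodes}
        for _ in range(iterations):
            dangling = sum(rank[v] for v in nodes if not self.succ[v])
            new_rank = {}
            for v in nodes:
                incoming = sum(rank[u] / len(self.succ[u]) for u in self.pred[v])
                new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
            rank = new_rank
        return rank


# Hypothetical toy data: each pair means "source shared with destination".
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("e", "f")]
graph = SimpleDiGraph(edges)

print(graph.in_degree("b"), graph.out_degree("b"))
print(graph.components())
print(graph.reciprocity())
print(graph.pagerank())
```

On real data, a library such as NetworkX would normally replace this class; the sketch only makes the underlying calculations explicit.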
Education
Bachelor's Degree in Mathematics
Arizona State University - Tempe, AZ, USA
Certifications
Neo4j Certified Professional
Neo4j
AWS Certified Machine Learning – Specialty
AWS
AWS Certified Solutions Architect – Associate
AWS
Skills
Libraries/APIs
Pandas, Scikit-learn, Matplotlib, NetworkX, NumPy
Tools
Jenkins, Git, Tableau, BigQuery, Amazon SageMaker, Apache Airflow, Optimizely, Seaborn
Languages
Python, SQL, Snowflake, Bash, Cypher, R
Platforms
Jupyter Notebook, Visual Studio Code (VS Code), Google Cloud Platform (GCP), Amazon Web Services (AWS), Amazon EC2, Airbyte
Paradigms
ETL
Storage
Data Pipelines, Neo4j, Graph Databases, MySQL
Frameworks
Flask
Other
Data Science, Data Engineering, Mathematics, Programming, Machine Learning, APIs, Data Modeling, Amazon RDS, Amazon Machine Learning, Natural Language Processing (NLP), Graph Theory, Social Network Analysis, Statistics, Generative Pre-trained Transformers (GPT), Data Build Tool (dbt), Fivetran