
Steven Rouk
Verified Expert in Engineering
Data Engineer and Developer
Steven is an expert Python and SQL data engineer with strong data science and data analysis capabilities and eight years of experience. He has led the development of technical solutions, and his most recent role was as a senior member of a data engineering team at a Fortune 100 company. The team processed billions of rows of data in a Snowflake ecosystem with thousands of tables, and the client regarded Steven as one of the highest-contributing members of the team of more than ten people.
Preferred Environment
Python, SQL, Visual Studio Code (VS Code), Snowflake, Jenkins, Jupyter Notebook, Git, Bash, Tableau, Pandas
The most amazing...
...tools I've developed are a Python-based automated data profiling tool and an automated data quality tool.
Work Experience
Data Engineer Consultant
Slalom
- Served as the lead data engineer for complex ETL processes at a Fortune 100 telecommunications company. Regularly processed billions of rows of data in a Snowflake ecosystem with thousands of tables. Improved data accuracy by 25%+ in one production dataset.
- Created an exploratory data analysis (EDA) process used by a team of ten data engineers. The process included automated data profiling using Python and SQL and guidance for ad hoc analysis and data visualization using Jupyter notebooks.
- Led and managed a workflow migration from Apache Airflow to Jenkins.
- Created a centralized dashboard to monitor ETL workflows. This required pulling job data from Jenkins, writing it to Snowflake, and visualizing it in Sigma Computing.
- Developed an automated Snowflake table cleanup script that drops old temp tables nightly, saving our client thousands of dollars per month.
- Mentored five junior data engineers in Python, SQL, Snowflake, Jenkins, Bash, Git, and data engineering techniques and processes.
- Prototyped a data lineage solution using SQL parsing and a Neo4j graph database.
Analytics and Research Specialist
Mercy For Animals
- Analyzed year-end donation data, uncovering distinct email clusters and revenue trends.
- Created impact estimation methodologies for multiple programs.
- Developed a data pipeline prototype using Google Cloud Platform.
- Presented and led data and research workshops at four staff retreats.
Python Developer and Tableau Consultant
Boulder Insight
- Helped clients across a variety of industries with ETL, data visualization, and data dashboards.
- Served as the lead developer on a client-facing Python Flask web application to automate the distribution of Tableau dashboards.
- Constructed Python ETL pipelines to pull data from web APIs, process it, then save it to a MySQL database.
- Presented at Boulder Startup Week, Analyze Boulder, Boulder Tableau User Group, and Boulder Python.
Experience
Production Dataset for Network Customer Service Applications
As the sole developer on the project, I worked closely with end users to design the desired data schema, come up with data logic to meet the business needs, and validate the accuracy of the dataset. As the requirements continually evolved, I brought together disparate teams at each step to reach a common understanding of the required data and data logic. In the end, we increased the data accuracy by 25%+ in a customer-facing production environment.
Automated Data Profiling Script for SQL Databases
Evolution of Machine Learning | Analysis and Website
https://github.com/stevenrouk/evolution-of-machine-learning
Finding Patterns in Social Networks Using Graph Data
https://github.com/stevenrouk/social-network-graph-analysis
Key Activities
• Created a graph of the data using the NetworkX Python graph library.
• Experimented with creating my own graph data objects to load and traverse the graph.
• Analyzed connections (in-degree and out-degree), distinct networks (component analysis), sharing reciprocity, centrality, and PageRank.
• Experimented with ways to visualize large graph structures by randomly sampling neighbor nodes.
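The activities above can be illustrated with a minimal sketch of a hand-rolled directed graph object. This is not code from the project repository: the class name, method names, and sample edges are hypothetical, and it covers only degree counts, reciprocity, and component analysis using the Python standard library.

```python
from collections import defaultdict, deque


class DiGraph:
    """Tiny directed graph stored as adjacency sets (hypothetical, for illustration)."""

    def __init__(self):
        self.succ = defaultdict(set)  # node -> nodes it points to
        self.pred = defaultdict(set)  # node -> nodes pointing to it

    def add_edge(self, u, v):
        self.succ[u].add(v)
        self.pred[v].add(u)
        # Touch the reverse maps so every node appears in both dicts.
        self.succ[v]
        self.pred[u]

    def nodes(self):
        return set(self.succ)

    def out_degree(self, n):
        return len(self.succ[n])

    def in_degree(self, n):
        return len(self.pred[n])

    def reciprocity(self):
        """Fraction of directed edges whose reverse edge also exists."""
        edges = [(u, v) for u in self.succ for v in self.succ[u]]
        if not edges:
            return 0.0
        mutual = sum(1 for u, v in edges if u in self.succ[v])
        return mutual / len(edges)

    def weak_components(self):
        """Connected components when edge direction is ignored (BFS)."""
        seen, components = set(), []
        for start in self.nodes():
            if start in seen:
                continue
            seen.add(start)
            queue, comp = deque([start]), set()
            while queue:
                n = queue.popleft()
                comp.add(n)
                for m in self.succ[n] | self.pred[n]:
                    if m not in seen:
                        seen.add(m)
                        queue.append(m)
            components.append(comp)
        return components


# Made-up sample edges: a and b share with each other, b shares with c,
# and d -> e forms a separate network.
g = DiGraph()
for u, v in [("a", "b"), ("b", "a"), ("b", "c"), ("d", "e")]:
    g.add_edge(u, v)

print(g.out_degree("b"), g.in_degree("b"))
print(g.reciprocity())
print(g.weak_components())
```

For larger graphs, a library such as NetworkX offers the same measures along with centrality and PageRank, so a custom object like this is mainly useful for learning how the traversal works.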
Skills
Languages
Python, SQL, Snowflake, Bash, Cypher, R
Libraries/APIs
Pandas, Scikit-learn, Matplotlib, NetworkX, NumPy
Paradigms
Data Science, ETL
Platforms
Jupyter Notebook, Visual Studio Code (VS Code), Amazon Web Services (AWS), Amazon EC2
Other
Data Engineering, Mathematics, Programming, Machine Learning, Data Modeling, Amazon RDS, Amazon Machine Learning, APIs, Natural Language Processing (NLP), Graph Theory, Social Network Analysis, Statistics, Generative Pre-trained Transformers (GPT)
Tools
Jenkins, Git, Tableau, Amazon SageMaker, Apache Airflow, Optimizely, Seaborn
Storage
Data Pipelines, Neo4j, Graph Databases, MySQL
Frameworks
Flask
Education
Bachelor's Degree in Mathematics
Arizona State University - Tempe, AZ, USA
Certifications
Neo4j Certified Professional
Neo4j
AWS Certified Machine Learning – Specialty
AWS
AWS Certified Solutions Architect – Associate
AWS