Derek Owens-Oas, Data Scientist and Developer in Ashland, OR, United States

Member since December 20, 2019
Derek has a Ph.D. in statistical science and has worked as a data scientist and software developer at Xylem. He is a published author in the Journal of Classification, and his expertise lies in delivering technical reports and insights with interactive visualizations. His extensive knowledge of Python and R libraries and state-of-the-art methods, together with his communication skills, make him an asset to any company. His specialties include text analysis and online social network analysis.
Derek is now available for hire

Location

Ashland, OR, United States

Availability

Full-time

Preferred Environment

GitHub, Microsoft Excel, Python, WordPress, Microsoft Word, R

The most amazing...

...contribution I've made at Xylem was an interactive app to help city utilities assess water-pipe-network quality in Dallas, DC, and Howard County.

Employment

  • Tutor and Consultant

    2018 - PRESENT
    Varsity Tutors
    • Developed a web application to visualize the distribution of costs in health insurance claims data.
    • Used machine learning with labeled data to estimate the sentiment of tweets.
    • Quantified wound volume reduction for treated and control groups of patients.
    • Estimated usernames from internet session activity data.
    • Reviewed and edited code for programming and statistics homework assignments with high school, college, and graduate students.
    Technologies: Algorithms, Computer Science, Statistics, SPSS, Microsoft Excel, SAS, R, Python, Natural Language Processing (NLP)
  • Data Science Developer

    2020 - 2020
    Shopper Media Group
    • Developed code to estimate the number of visitors to shopping centers from WiFi data.
    • Implemented methods for predicting shopper visits using a proxy center.
    • Imported a table with visitation frequency charts into the Redshift data warehouse.
    • Gave daily status reports over video and audio calls.
    • Wrote documentation covering the process from surveying to a presentation on the web application.
    Technologies: SQL Functions, Data Science, Amazon Redshift, Google Chrome, Zoom, Microsoft Excel, Microsoft Word, K Nearest Neighbors, ARIMA, Redshift, SQL, Python, Big Data, Pandas
  • Data Scientist and Software Developer

    2020 - 2020
    SureTint Technologies
    • Integrated customer relationship management (CRM) software into a beauty salon application.
    • Continued development of a Python package for color combination.
    • Reorganized the folder structure for data and code files.
    • Gathered new data and added it to the existing pipeline.
    • Tested the program to ensure reliable performance.
    • Deployed a basic Django app and experimented with an alternate methodology.
    • Wrote code in the Amazon SageMaker computing environment.
    • Trained multiple linear models to estimate the hair color produced by products.
    • Applied a nearest-neighbor method to convert hair formulas from one product line to another.
    Technologies: SQL Functions, SaaS, Data Science, Amazon Web Services (AWS), Statistical Modeling, Django, Git, Jupyter, Python, Amazon SageMaker, AWS, Pandas
  • Data Scientist

    2018 - 2019
    Xylem, Inc.
    • Developed a predictive model and application to efficiently prioritize water pipe inspection for major US city utilities.
    • Recruited talent to Xylem at an American Statistical Association event.
    • Wrote technical reports with data graphics and statistical language to inform management and a company executive.
    • Wrote blog posts to highlight and explain the company's impact.
    • Created and presented an interactive visualization of water quality and algae levels in Lake Erie.
    Technologies: SaaS, Data Science, Amazon Web Services (AWS), AWS EC2, AWS S3, AWS, Atlassian Confluence, Jira, GitHub, Python, R

Experience

  • Online Social Network Report and Application (Development)
    https://github.com/dmo11/political_blog_posts/blob/master/link_block_lda_results.pdf

    I developed features, a learning algorithm, and a web app that visualizes topics and connections in an online social network. Implementations are available in R and Python. The communication modes analyzed include blog posts, Facebook comments and messages, tweets, and courtroom transcriptions.

    Here is a link to the video showing this application:

    https://drive.google.com/file/d/1-Goo7OjKdGs9cvYxDfAu58GUuzDNSQg3/view?usp=sharing

  • Water Pipe Inspection Prioritizing Application (Development)

    This is a statistical report and web application for evaluating water pipe quality in DC, Dallas, and Howard County. I wrote code that applies machine learning algorithms to estimate the probability that each pipe breaks within the next three years and visualizes the results on an interactive map.

  • Lake Erie Water Quality Assessment (Development)

    I developed an interactive map that estimates water quality between sensor locations, using machine learning, optimized linear predictive modeling, and spatial statistics. I also wrote a technical report detailing the patterns of algae blooms.

  • Health Procedure Cost Explorer | Web App (Development)
    https://drive.google.com/file/d/1IwtWOAObd1aBcfm2IukvtzqNQaR_PjiP/view

    I set up a free, simple, full-stack server to host a web app. The link leads to an interactive box-plot visualization for exploring health procedure costs. It uses insurance claims data to show how expenses are distributed among the insurer, provider, and patient.

    A second, bar-graph version lets users mouse over various procedure choices for treating osteoarthritis. Here is the link:

    https://drive.google.com/file/d/10gVQWka51w0RA5wmO4_BPIeEt3nt-ZRr/view?usp=sharing

    Healthcare providers can review patient outcomes to guide future treatment choices.

  • Learning Topics and Communities in Political Blog Posts (Development)
    https://arxiv.org/pdf/1610.05756.pdf

    I designed and implemented the analysis and authored a publication that applies a statistical learning algorithm to political blog post data. The analysis identifies a latent group that comments on sensational crimes. The results are published in the Journal of Classification.

  • Learning Original Poster in Group Conversation Data (Development)
    https://arxiv.org/pdf/1809.03648.pdf

    I contributed to a dynamic programming algorithm and applied it to an Election Day megathread on Reddit and to courtroom transcriptions. The algorithm performs credit attribution, similar to the methods used in web advertising.

  • Statistics Web Blog (Development)

    I created a WordPress blog with posts about my experience in a Ph.D. program in statistics and as an early-career data scientist and quantitative consultant. The posts include graphics I've created and discussions of industries of interest.

  • Learning to Make a Tableau Dashboard (Development)
    https://drive.google.com/file/d/1ygKMZlXeIxfsyl8YjEJPGQGrVphbpYUg/view?usp=sharing

    Following a tutorial, I visualized CO2 emissions data by country and year. One graphic shows the amounts on a world map, and the other is a time-series plot. Users can filter the data geographically and mouse over the graphics to see specific observed values.

  • Salon Customer Brand Converter (Development)
    https://drive.google.com/file/d/1uVhkJSdCEioSStJNuitvSPb9NVxnSdJ7/view?usp=sharing

    LaRu, software from SureTint Technologies, enables beauty salons to record customers' hair formulas.

    I continued developing an application that converts formulas from one product line to another. The data are stored on AWS, the code is written in Python, and the conversion relies on a statistical model.

    Features I developed include a filter that ensures products conform to manufacturer recommendations.

Skills

  • Languages

    R, Python, SQL, JavaScript, HTML, SAS, CSS, Java
  • Frameworks

    RStudio Shiny, Django, Spark
  • Libraries/APIs

    Pandas, Scikit-learn, Caret, Facebook API, PySpark, PyTorch, Node.js, TensorFlow Deep Learning Library (TFLearn), Facebook Ads API, Twitter API, TensorFlow, Keras
  • Paradigms

    ETL, Automation, Data Science, App Development, Microservices, Quantitative Research, Business Intelligence (BI)
  • Industry Expertise

    Project Management, eCommerce, Financial Modeling, Web Development, Healthcare, Marketing
  • Storage

    Data Pipelines, Databases, SQL Functions, JSON, AWS S3, Redshift, PostgreSQL, AWS DynamoDB, MySQL
  • Other

    Data Analytics, Data Reporting, Data Visualization, Data Cleaning, Analytics, Algorithms, Natural Language Processing (NLP), Data Architecture, Data Modeling, Data Engineering, Analysis, Regression Models, Statistical Modeling, Excel Reporting, Artificial Intelligence (AI), Quantitative Models, A/B Testing, Topic Modeling, Classification, Visualization, Data Analyst, Predictive Analytics, SaaS, Big Data, Machine Learning, Technical Reports, Applied Mathematics, Statistics, Statistical Analysis, Data Analysis, Consulting, Time Series, Data Matching, Higher Education, Scraping, Web Scraping, AWS, Tableau Configuration, UI Development, Dashboards, APIs, Scheduling, Custom Audio Embedding, Deep Learning, Advertising, Serverless, ARIMA, K Nearest Neighbors, Computer Science, Amazon Redshift, Quantitative Finance, Data Handling, Software Development, Publishing, Blogging, Neural Networks
  • Tools

    Jira, Atlassian Confluence, Jupyter, Microsoft Excel, Microsoft Word, R Studio, Git, GitHub, Tableau, SPSS, Amazon SageMaker, Zoom
  • Platforms

    AWS EC2, WordPress, Docker, AWS Lambda, Google Chrome, Amazon Web Services (AWS)

Education

  • Doctor of Philosophy and Master of Science degrees in Statistical Science
    2013 - 2018
    Duke University - Durham, NC, USA
  • Bachelor of Arts degree in Mathematics
    2009 - 2013
    Pomona College - Claremont, CA, USA
