Jeffrey Halley, Data Engineering Developer in San Francisco, CA, United States

Member since June 24, 2020
Jeff is a data engineer, a software engineer, and a former geneticist and educator. He develops innovative uses for existing data, discovers and implements efficiencies, and helps others excel in their projects. As a scientist, he learned how to understand and explore complex problems. As an educator, he mastered the art of clearly communicating advanced topics. Jeff brings this rare combination of skills and experience to every data and software development project he takes on.







Preferred Environment

Snowflake, NoSQL, SQL, Pandas, Spark, Python

The most amazing thing I've developed is a tool to extract participation information from online meeting platform logs.


  • Data Engineer

    2020 - 2020
    • Provided data scientists and business analysts with reliable access to loan application and payment data by building an Airflow-orchestrated data pipeline between an RDS transactional database and a Snowflake data warehouse.
    • Ensured reliable service by writing automated tests for data pipelines using Pytest and Tox.
    • Increased team productivity by expanding documentation and writing Bash shell scripts to automate the installation of required tools and packages.
    Technologies: Pytest, PostgreSQL, Apache Airflow, Snowflake, Python
  • Data Engineer

    2019 - 2020
    Insight Data Science
    • Helped Google Ads (AdWords) users find the most cost-effective options for their ad purchases.
    • Created an application that identifies new trending words within social media communities devoted to a specific topic.
    • Provided a fast and resilient pipeline that ingests data from social media sites, processes the data with Spark to find trending topic-specific words, and stores the processed data in a PostgreSQL database that is updated by an Airflow DAG.
    • Built an easy-to-use Dash-based UI that converts user input into SQL queries against the database and returns a list of candidate words for Google Ads, along with plots of the words’ usage on Reddit.
    Technologies: Amazon Web Services (AWS), Plotly, Python, PostgreSQL, Spark
  • Instructor and Technology Committee Member

    2010 - 2019
    Stanford University
    • Enabled online teachers to quantitatively track their students’ participation and use of class time.
    • Developed a Python application that extracts student participation data from XML log files and generates easily understandable reports and charts using Bokeh.
    • Saved teachers approximately five hours per week by finding, testing, evaluating, and recommending new software for learning management, grade book, video recording, and video playback.
    • Increased new technology adoption rate by approximately 30% by giving talks, hosting workshops, and writing user guides for instructors and staff.
    Technologies: Python


  • WordEdge (Social Media NLP ETL Pipeline)

    I developed WordEdge to help users get the best deals on their Google Ads purchases. Businesses that advertise on Google Ads purchase search terms through an auction. Suppose you wanted to advertise "Basketball Shoes." You might want to purchase the search term "Basketball Shoes," but because of its popularity, it's likely too expensive to be cost-effective.

    WordEdge helped users identify newly trending words in a topic related to their business before those words became widely popular and therefore expensive. If you were the basketball-shoe seller described above, WordEdge would help you discover basketball fans' inside jokes, player nicknames, and the names of hot new rookies, all of which enable effective and affordable search term purchases.

  • Adobe Connect Participation Extractor (XML ETL)

    This Python script extracts participation information from the .xml files that are included with downloaded recordings of Adobe Connect sessions. For each participant, the script determines time on camera, time with camera paused, time on microphone, the number of chat messages sent, and a summary participation grade. The script writes all of these features, along with some related calculations, to a summary participation report in a .csv file. Additionally, the script generates a series of bar plots, one for each participation feature, and saves them as a .html file.
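
    The extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the XML layout in SAMPLE_LOG (an "event" element with "type", "user", "start", "end", and "time" attributes) is invented for the example and is not Adobe Connect's real log schema, and the function names are hypothetical. The paused-camera time, the participation grade, and the bar plots are left out because the profile does not say how they are computed.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical log layout, invented for this sketch (NOT the real Adobe Connect schema).
SAMPLE_LOG = """\
<session>
  <event type="camera" user="ana" start="0" end="120"/>
  <event type="camera" user="ben" start="30" end="60"/>
  <event type="microphone" user="ana" start="10" end="40"/>
  <event type="chat" user="ben" time="15"/>
  <event type="chat" user="ben" time="50"/>
</session>
"""

FIELDS = ["participant", "camera_seconds", "microphone_seconds", "chat_messages"]


def summarize(xml_text):
    """Return one summary row per participant, sorted by participant name."""
    totals = defaultdict(lambda: {"camera_seconds": 0.0,
                                  "microphone_seconds": 0.0,
                                  "chat_messages": 0})
    for event in ET.fromstring(xml_text).iter("event"):
        row = totals[event.get("user")]
        kind = event.get("type")
        if kind in ("camera", "microphone"):
            # Interval events contribute their duration in seconds.
            row[f"{kind}_seconds"] += float(event.get("end")) - float(event.get("start"))
        elif kind == "chat":
            # Chat events are counted, not timed.
            row["chat_messages"] += 1
    return [{"participant": name, **totals[name]} for name in sorted(totals)]


def to_csv(rows):
    """Render the summary rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

    Calling to_csv(summarize(SAMPLE_LOG)) yields one CSV row per participant; writing that text to a file would give a report comparable in spirit to the summary .csv described above.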

  • Predictive Text in R

    The Word Suggester app searches a database of commonly used phrases (wordlists) for a phrase that matches the text typed by a user. The app attempts to match all of the text input by the user (up to five words long), but if no matches are found, the app progressively removes words from the beginning of the phrase until a match is found. Once a match is found, the three words that most commonly follow the matched phrase are suggested to the user, ranked in order from most common to least common. If fewer than three words commonly follow a particular phrase, the app will suggest the words that follow that phrase and then trim words from the beginning of the phrase until a total of three words can be suggested to the user.
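
    The matching and trimming behavior described above can be sketched as follows. The original app is written in R; this is a Python illustration under stated assumptions, not the app's code. The function names, the in-memory lookup table, and the whitespace tokenization are all hypothetical choices made for the example. Only the five-word limit, the three suggestions, and the trim-from-the-beginning rule come from the description.

```python
from collections import Counter, defaultdict

MAX_CONTEXT = 5      # the description says up to five words are matched
NUM_SUGGESTIONS = 3  # the description says three words are suggested


def build_table(sentences, max_context=MAX_CONTEXT):
    """Map each phrase (a tuple of 1..max_context words) to counts of the next word."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break
                table[tuple(words[i - n:i])][words[i]] += 1
    return table


def suggest(table, text, k=NUM_SUGGESTIONS, max_context=MAX_CONTEXT):
    """Suggest up to k next words, trimming leading words until k are found."""
    context = tuple(text.lower().split()[-max_context:])
    suggestions = []
    while context and len(suggestions) < k:
        # Most common followers of the current phrase come first.
        for word, _count in table.get(context, Counter()).most_common():
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == k:
                break
        context = context[1:]  # drop the first word and retry with a shorter phrase
    return suggestions
```

    For example, with a table built from a handful of sentences, suggest(table, "they really want to") finds no match for the full phrase, trims it to "want to", and keeps trimming until it has collected up to three distinct suggestions or runs out of words.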


  • Languages

    SQL, Python 3, R, Snowflake
  • Frameworks

  • Libraries/APIs

  • Storage

    Database Modeling, NoSQL, PostgreSQL
  • Other

    Data Engineering, Natural Language Processing (NLP), AWS
  • Tools

    Apache Airflow, Pytest, Plotly
  • Platforms

    Amazon Web Services (AWS)


  • Ph.D. in Molecular and Cellular Biology
    2003 - 2009
    University of California, Berkeley - Berkeley, CA, USA
