Designing GitHub’s Octoverse: A Data Visualization Case Study

Designer Gemma Busquets shares how she created a responsive website and 20+ engaging charts and graphs for the software development platform’s annual report.

Last year I collaborated with GitHub to design the 2021 State of the Octoverse report. GitHub’s Octoverse analyzes real-world data from millions of developers and repositories in order to present the year’s software development industry insights. The 2021 report covers three major trends: improving performance and well-being by developing code, creating documentation, and supporting communities in a smarter, more sustainable way.

As the project’s creative liaison, it was my job to assist the GitHub team in making the data-heavy report easy to understand. Using data visualization, I designed 20+ charts, maps, and graphs to help readers unravel the information that GitHub data scientists collected.

In this data visualization case study, I explain my design process, showcase the website I helped to create for GitHub’s Octoverse, and share key learnings from the project.

Designing Engaging Digital Experiences With Data Visualization

State of the Octoverse 2021 is a sprawling report, with data collected from over 73 million GitHub developers and more than 61 million new repositories. It’s also the first time a survey on respondent demographics has been included. Making sense of the data required an extensive design effort.

Our small team, which included developer Jose Luis Garrido and project manager Miquel Lopez, was tasked with synthesizing this immense amount of information for readers. Despite a delayed start and other simultaneous projects, we delivered.

Kicking Off the Design Process

The first stage of my data visualization design process was discovery. GitHub’s data scientists had collected and analyzed information from developers and repositories, and they shared it with me as Excel files, PowerPoint presentations, and other data sets.

With this information, along with GitHub’s initial data visualization sketches and a 60-page context document, I began to think about how best to illustrate each data set. Then, I set about designing each chart, map, and diagram for maximum user engagement and an intuitive user experience.

Choosing Your Chart

There are three key points to choosing an effective data visualization:

1. Identify the Chart’s Purpose

Data can be represented in numerous ways: bar charts, line graphs, heatmaps, waterfall charts, and more. Each chart serves a purpose, and it’s important to use the right one to ensure that a clear and accurate message is conveyed.

For example, if you want to present the difference between two quantities, use a bar chart. If you want to show a trend over time, use a line graph.

2. Consider the End User

You also need to be aware of your users’ ability to read and analyze data. Most of us are familiar with pie, bar, and line charts. We see them everywhere, and we know how to read them.

On the other hand, fewer people know how to read box plots, which are used in many research publications to summarize the distribution of a variable (its median, quartiles, and outliers) in a single compact chart.

If you present users with unfamiliar visualizations, they’ll have a hard time interpreting the data.

3. Design With Clarity

Is the data visualization clear and concise, or is there too much noise? Bar charts can be a great way to display data, but not if there are 100 bars with individual labels. Likewise, streamgraphs are beautiful and functional, but only when there’s a clear data pattern. Sometimes less is more.

Designing Perfect Data Visualizations

Throughout the 2021 State of the Octoverse report, you’ll find a variety of data visualizations that have been carefully composed in accordance with the corresponding data insight.

The Butterfly Chart

On the Overview page, I needed to design an infographic comparing two sets of data: where respondents worked before the pandemic and where they worked after it. GitHub provided me with two pie charts that each mapped out four categories: collocated, hybrid, fully remote, and not applicable. However, pie charts are not particularly effective for comparing two sets of data.

Instead, I opted for a butterfly chart. Butterfly charts plot the data as two sets of horizontal bars that extend in opposite directions from a shared central axis, resembling butterfly wings. These charts clearly show the difference between two groups that share the same categories, and make comparing two sets of data much easier.

A butterfly chart for GitHub's Octoverse report showing two sets of data side by side. The data compares where respondents worked before (left) and after (right) the pandemic. There are four data points: collocated, hybrid, fully remote, and not applicable for both data sets.
GitHub provided me with two pie charts that identified where respondents worked before and after the pandemic. I felt it was more effective to produce a butterfly chart, which displays two sets of data next to each other for easy comparison.
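
To make the structure of a butterfly chart concrete, here is a minimal Python sketch. It is not the code behind the Octoverse chart; the function names are hypothetical and the percentages are placeholders rather than figures from the report. The idea is simply to negate one group's values so its bars grow leftward from the shared axis.

```python
def butterfly_rows(categories, left, right):
    """Return (category, left_extent, right_extent) rows for a butterfly chart.

    Left values are negated so their bars grow leftward from a shared
    central axis, while right values grow rightward.
    """
    if not (len(categories) == len(left) == len(right)):
        raise ValueError("categories, left, and right must have equal length")
    return [(c, -l, r) for c, l, r in zip(categories, left, right)]


def render_text(rows, width=20):
    """Draw the rows as text, scaling the longest bar to `width` characters."""
    peak = max(max(abs(l), abs(r)) for _, l, r in rows) or 1
    lines = []
    for label, l, r in rows:
        left_bar = "#" * round(abs(l) / peak * width)
        right_bar = "#" * round(r / peak * width)
        # Right-align the left bar and left-align the right bar around the axis.
        lines.append(f"{left_bar:>{width}} | {right_bar:<{width}} {label}")
    return "\n".join(lines)


# Placeholder percentages for illustration only; not figures from the report.
categories = ["Collocated", "Hybrid", "Fully remote", "Not applicable"]
before = [40, 30, 25, 5]
after = [10, 45, 40, 5]
print(render_text(butterfly_rows(categories, before, after)))
```

Because both groups share one row per category, the eye can compare the left and right bar for each category directly, which is the advantage over two separate pie charts.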

The Bump Chart

Another effective data visualization is the bump chart. We used this chart to present the information on the most popular computer programming languages used by developers over the past eight years. Bump charts are great for displaying changes in rank over a period of time, and they have become a staple in the Octoverse report.

A bump chart for GitHub's Octoverse report that shows the most popular computer programming languages used by developers over the past eight years. Each language is represented by a different colored line. There are 10 languages in total.
For this data visualization, I used an interactive bump chart to show the most popular computer programming languages used by developers over the past eight years.
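
A bump chart plots rank rather than raw value, so the data has to be converted to ranks for each period first. The following Python sketch shows one way to do that. The function name, the tie-breaking rule (alphabetical), and the sample scores are assumptions made for illustration; they do not come from GitHub's data.

```python
def ranks_by_period(scores):
    """Convert {period: {name: score}} into bump chart series.

    Returns (periods, series), where series maps each name to its rank
    in every period. Rank 1 is the highest score. A name that is missing
    from a period gets None so its line can be broken at that point.
    """
    periods = sorted(scores)
    names = sorted({name for period in scores.values() for name in period})
    series = {name: [] for name in names}
    for p in periods:
        # Highest score first; ties are broken alphabetically (an assumption).
        ordered = sorted(scores[p], key=lambda n: (-scores[p][n], n))
        position = {name: i + 1 for i, name in enumerate(ordered)}
        for name in names:
            series[name].append(position.get(name))
    return periods, series


# Placeholder scores for illustration only.
sample = {
    2020: {"Lang A": 50, "Lang B": 30, "Lang C": 20},
    2021: {"Lang A": 40, "Lang B": 45, "Lang C": 15},
}
periods, series = ranks_by_period(sample)
for name, ranks in series.items():
    print(name, dict(zip(periods, ranks)))
```

Each resulting series becomes one colored line, with the period on the horizontal axis and the rank on an inverted vertical axis so that rank 1 sits at the top.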

The Treemap

I needed to illustrate the different sectors to which respondents contribute code. The final decision came down to pie charts versus treemaps.

Pie charts are useful when you have three or four sectors and when the quantities are clearly different. However, our brains don’t process angles well, so when there’s a pie chart with lots of similarly sized wedges, people have a hard time deciphering which is bigger.

In contrast, treemaps allow users to easily compare segments to each other, as well as to the whole. The largest rectangles are placed in the top left, followed by progressively smaller rectangles. It’s easier to compare straight lines than it is to compare wedges or angles.

A treemap for GitHub's Octoverse report illustrates the different sectors to which respondents contributed code during 2021. Each sector is represented by a rectangle. The largest rectangles are placed in the top left, followed by progressively smaller rectangles. Each rectangle is a different color.
I used a treemap to present the different sectors to which the respondents had contributed code. In some cases, treemaps are preferable to pie charts because it’s easier to compare rectangles than slices.
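
The layout rule described above (largest rectangle in the top left, then progressively smaller ones) can be sketched in a few lines of Python. This is a deliberately simple strip-cutting layout, not the squarified algorithm that many charting libraries use, and not necessarily how the Octoverse treemap was built. The function name and the sample values are hypothetical.

```python
def treemap(values, x=0.0, y=0.0, w=1.0, h=1.0):
    """Lay out {label: value} as rectangles (label, x, y, width, height).

    Coordinates use a top-left origin with y increasing downward.
    Items are sorted largest first; each one takes a strip of the
    remaining area proportional to its share, cut across the longer side.
    """
    if any(v <= 0 for v in values.values()):
        raise ValueError("all values must be positive")
    items = sorted(values.items(), key=lambda kv: -kv[1])
    remaining = sum(v for _, v in items)
    rects = []
    for label, value in items:
        share = value / remaining
        if w >= h:
            # Remaining area is wide: cut a vertical strip off the left.
            strip = w * share
            rects.append((label, x, y, strip, h))
            x, w = x + strip, w - strip
        else:
            # Remaining area is tall: cut a horizontal strip off the top.
            strip = h * share
            rects.append((label, x, y, w, strip))
            y, h = y + strip, h - strip
        remaining -= value
    return rects


# Placeholder sector sizes for illustration only.
for rect in treemap({"Sector A": 50, "Sector B": 25, "Sector C": 25}):
    print(rect)
```

Every rectangle's area is proportional to its value, so readers compare areas and edge lengths instead of angles.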

The Cartogram

Finally, I needed to illustrate the geographical distribution of organizations using GitHub in 2021 by region or country. For this, I used a population cartogram. Cartograms are maps in which the geometry is distorted to accommodate a particular economic, social, political, or environmental feature.

In this data visualization, the size of the squares indicates the population size. Additionally, the saturation of the square’s color indicates how many organizations in that area are using GitHub.

A population cartogram for GitHub's Octoverse report represents the geographical distribution of organizations in 2021. This map alters the reality of physical location in order to better visualize a particular factor, in this case business. The saturation of the square's color indicates how many organizations are using GitHub, with lighter shades representing fewer and darker shades representing more.
Cartograms are great for thematic mapping, when you want to emphasize information other than physical location, such as an economic, social, political, or environmental feature.
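
The two encodings in this cartogram, square size for population and color depth for the number of organizations, boil down to two small mappings. The Python sketch below is an illustration under assumptions: the function names, the linear color scale, the area-proportional sizing, and the two endpoint colors are all hypothetical choices, not values taken from the Octoverse design.

```python
import math


def square_side(population, max_population, max_side):
    """Side length of a square whose area is proportional to population.

    Scaling the area (not the side) keeps a region with four times the
    population from looking sixteen times as large.
    """
    if max_population <= 0:
        raise ValueError("max_population must be positive")
    return math.sqrt(population / max_population) * max_side


def shade(count, max_count, light=(222, 235, 247), dark=(8, 48, 107)):
    """Hex color between `light` (few organizations) and `dark` (many)."""
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    t = min(max(count / max_count, 0.0), 1.0)  # clamp to [0, 1]
    rgb = [round(a + (b - a) * t) for a, b in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Placeholder numbers for illustration only.
print(square_side(25, 100, 40))  # 20.0
print(shade(0, 100))             # lightest shade
print(shade(100, 100))           # darkest shade
```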

Responsive Website Design For GitHub’s Octoverse 2021

In addition to designing data visualizations, I also helped the GitHub team produce a website for Octoverse 2021. This site was a hub for users to read, explore, and interact with the report’s data insights.

To encourage user engagement, we opted for a fully responsive website that adapts its rendering to viewports of different sizes. GitHub asked us to pay special attention to the desktop version after finding that larger devices drove the majority of Octoverse visits.

When designing the responsive site, I followed these best practices:

  • Composing text with desktop-friendly and mobile-friendly typefaces. This included choosing optimal font sizes, typefaces, and line length and height, and refining how the text looks at different breakpoints.
  • Laying out the visual elements on each page to encourage scrolling.
  • Designing a user-friendly top navigation bar that adapts its layout to the viewport size.

Because I designed the website with different devices in mind from the start, most charts rendered well on all screen sizes. I only needed to make minor adjustments for optimal viewability, such as to the circular dendrogram at the end of the “Sustainable communities” section.

A circular dendrogram for GitHub's Octoverse report. Each circle represents one of the 20 largest repositories by category and repository contributors. Each sector is represented by a different color.
Dendrograms are a great way to show the relationships between categories. You can click on the circles within this interactive dendrogram to discover the number of repository contributors and the percentage of contributors whose accounts are less than two years old.

Organizing the Information Architecture

I explored different options for the website’s information architecture. I didn’t want to overwhelm users with too much information, but I also didn’t want the site to be scattered or difficult to navigate.

With this in mind, I started by designing a long scrolling website, with all the content on the same page. When that became visually overwhelming, I tried placing each chart on a separate page. To help with navigation, I added a side navigation menu to each page with a table of contents, similar to what you might find in a book. The final design on the Octoverse website consists of separate webpages for the three main trends, plus a homepage that serves as a summary of the most important data.

After deciding on the information architecture, I moved on to designing the site’s content structure, navigation flow, images, and graphics. I created wireframes to map out the content and show paths between different pages.

Making the Website Interactive

The Scroll Progress Indicator

To satisfy GitHub’s request for an engaging, dynamic website, we added interactive elements. For instance, under the top navigation bar, I designed a scroll progress indicator so visitors could keep track of where they were on the site. As readers scroll down a page, the indicator bar scales incrementally, and each page has a different fill color for the bar: gray, purple, blue, or green.

A portion of the "Sustainable communities" webpage within the GitHub Octoverse 2021 website. The scroll progress indicator across the top is interactive. As the user scrolls down the page, the indicator bar changes from light gray to green.
Subtle touches of interactivity: The scroll progress indicator bar changes from light gray to green as you scroll down the page.
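
The arithmetic behind a scroll progress indicator is small enough to show in full. On a real site this would run in the browser in JavaScript; the Python sketch below only illustrates the calculation, and the function and parameter names are hypothetical rather than taken from the Octoverse code.

```python
def scroll_progress(scroll_top, document_height, viewport_height):
    """Fraction of the page scrolled so far, clamped to the range 0.0 to 1.0.

    The scrollable distance is the document height minus the viewport
    height, because the last screenful is visible without more scrolling.
    """
    scrollable = document_height - viewport_height
    if scrollable <= 0:
        return 1.0  # the whole page fits in the viewport
    return min(max(scroll_top / scrollable, 0.0), 1.0)


# A 3000px page in a 1000px viewport, scrolled 1000px down.
print(scroll_progress(1000, 3000, 1000))  # 0.5
```

The returned fraction can then be used as the horizontal scale of the indicator bar, so the bar reaches full width exactly when the reader hits the bottom of the page.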

Animated Headers, Images, and Data Visualization

To keep the website from looking flat, we decided to animate the section headers. I created the illustrations and our team’s developer animated them. We also animated the hero image for the homepage and each subsection, and their corresponding chapter cards at the bottom of each webpage.

Animated GIF of the three chapter cards that can be found at the bottom of each webpage in GitHub's Octoverse website. They are: Writing and shipping code faster (with an interactive purple header), Creating documentation to support developers (with an interactive blue header), and Supporting sustainable communities (with an interactive green header).
At the bottom of each webpage, you can find animated chapter cards for each of the three main trends presented in GitHub’s Octoverse.

We also made some of the static data visualization charts interactive. For example, as you hover over a line in the bump chart, the line thickens to emphasize the corresponding language. It’s a simple but effective animation that lets site visitors interact with the data and quickly compare languages.

Creating Successful Data Visualizations and Digital Designs for GitHub: Key Learnings

Data is only useful if you can make sense of it, and designing data-heavy content that users can easily decipher is challenging. This collaboration with GitHub broadened my knowledge of data visualization design. Here are the most important takeaways from this data visualization case study:

  • Know the brand: Being familiar with a brand’s core style guidelines, such as its use of type, color, and images, speeds up the design process because it frees designers to move on to the creative process. I was lucky that I knew a lot about GitHub’s brand before the collaboration, and I was able to use this knowledge to inform my designs.
  • Choose the right types of data visualizations: Selecting the correct visualization to represent a data point is essential. An incorrect representation can cause confusion or convey the wrong message.
  • Use color wisely: The right color combination will guide the reader’s eye and draw attention to a particular data point.
  • Stay curious: When you’re trying to tell a compelling data story, you’re bound to encounter complex design problems, so it’s important to be open to uncommon solutions and continuous learning.

Understanding the Basics

GitHub’s State of the Octoverse is a report that presents software development trends and insights. Data from millions of developers and repositories is collected and analyzed to make up the annual report. Trends include working habits, productivity, and career satisfaction.

Data visualization is the process of creating graphical representations of data sets, such as charts, graphs, and maps. This design technique is used to clearly communicate complex data to users.

GitHub is a code-hosting platform for version control and collaboration, widely used for open-source projects, where developers and programmers can download, review, and evaluate each other's work. It is the platform of choice for millions of developers.