Executive Summary

What is Get & Transform?
  • Get & Transform is a data-transformation tool built into Microsoft Excel and Power BI.
  • Data often arrives in unstructured formats, which turns the ETL (extract, transform, and load) process into a tedious series of manual workarounds.
  • Get & Transform automates and expedites the process of cleaning and organizing such raw data, which ultimately assists the analytical task of uncovering observations and trends.
  • Some examples of functionality provided by Get & Transform include removing columns, grouping data, splitting strings into substrings, and appending rows from another table.
  • For maintaining workflows within the Excel universe, Get & Transform is an excellent tool which can be easily explained and demonstrated to relevant stakeholders.
How can I use Get & Transform?
  • Access in Excel is via the Get & Transform Data section within the Data tab. In Power BI it exists in the External Data section of the Home tab.
  • Loading CSVs: Importing a CSV via Get & Transform lets you clean it and reshape it to be "narrower" or "wider" to assist with data pivoting. These steps are saved and can be repeated on future imports.
  • Handling text strings: As a significant improvement over Excel's Text to Columns feature, Get & Transform can quickly parse combined text-and-number strings into separate columns.
  • Different data sources: With a wide range of input files accepted, it is possible to work with disparate sources while maintaining consistent and normalized output quality.
  • Customizing with code: M is the functional language behind Get & Transform, and you can write custom queries in it for more bespoke requests.

In this age of data lakes and petabyte-scale databases, it is shocking how frequently I still receive data in the form of CSV, text, and Excel files. While modern-day analytics focuses on cutting-edge advances in machine learning algorithms, the day-to-day drudgery of data analysis is still a manual process of finding, compiling, and wrangling disparate data types.

For the financial analyst, data often arrives as an Excel spreadsheet, but just as often it is a data dump into a CSV or the result of a query against a SQL database. Sometimes the data is arranged in a confusing layout or lacks some of the components required for analysis. Time spent scrubbing this data is valuable time lost, yet the task is often accepted as a necessary evil.

What Does Get & Transform Do?

A solution to this common problem is actually quite accessible: Excel and Power BI have an entire set of data transformation tools that few users are aware of, named Get & Transform (formerly known as Power Query). Using its embedded extract, transform, and load (ETL) functionality enables financial analysts to seamlessly link to their data sources and get to insights quicker.

As we tee up data to load into Excel or Power BI, we usually have to perform some transformations to the data. Some examples of data manipulation would include:

  • Removing columns,
  • Filtering the data,
  • Grouping the data,
  • Pivoting/unpivoting the data,
  • Splitting strings into substrings,
  • Extracting keywords from strings,
  • Appending rows from another table, and
  • Joining two dimension tables.
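To make a couple of these operations concrete outside the Get & Transform interface, here is a minimal sketch in plain Python (not M) of removing a column and filtering rows. The column names and sample rows are hypothetical, and each function call simply stands in for one step of a query.

```python
# Plain-Python sketch (not M) of two Get & Transform-style steps.
# Column names and sample rows are hypothetical.

def remove_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


raw = [
    {"Date": "2018-01-02", "Store": "Store 1", "Sales": 120.0, "Notes": ""},
    {"Date": "2018-01-02", "Store": "Store 2", "Sales": 0.0, "Notes": "closed"},
    {"Date": "2018-01-03", "Store": "Store 1", "Sales": 95.5, "Notes": ""},
]

# Each call corresponds to one step in the query; the raw rows are untouched.
no_notes = remove_columns(raw, ["Notes"])
open_days = filter_rows(no_notes, lambda row: row["Sales"] > 0)

for row in open_days:
    print(row)
```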

In the diagram below, we see that Get & Transform performs this tedious role of pre-processing the data before it is loaded.

Diagram of Excel Get and Transform performing data pre-processing

Why Should You Use Get & Transform?

Why is it worthwhile learning how to use Get & Transform? Well, when I look at what I have personally used this functionality for, it has offered me a malleable set of tools for:

  • Loading an entire folder of text files into a single data table
  • Converting exported accounting files into a digestible layout
  • Loading millions of sales rows into Power Pivot directly
  • Grouping daily data into manageable monthly results before importing them into Excel
  • Splicing in data from another table by joining on matching columns
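Loading a folder of files boils down to appending the rows of every file into one table. The sketch below does this in plain Python rather than in Get & Transform, and assumes every file is a CSV sharing the same header row; the `load_folder` function and the `Source` column it adds are hypothetical names chosen for illustration.

```python
import csv
import tempfile
from pathlib import Path


def load_folder(folder, pattern="*.csv"):
    """Append every matching CSV in `folder` into a single list of rows.

    Assumes all files share one header row. Each row is tagged with the
    name of the file it came from in a hypothetical "Source" column.
    """
    combined = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                row["Source"] = path.name
                combined.append(row)
    return combined


# Demo: two hypothetical monthly exports in a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "2018-01.csv").write_text("Store,Sales\nStore 1,100\nStore 2,80\n")
    Path(tmp, "2018-02.csv").write_text("Store,Sales\nStore 1,110\n")
    for row in load_folder(tmp):
        print(row)
```

Values come back as text, exactly as they sit in the files; converting types would be a separate step.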

Generally, when I receive new data, I will explore it using Get & Transform before loading it into Power Pivot. This allows me to see what transformations might be necessary and quickly perform some pivots and groupings on the data to formulate a framework for analysis. In many cases, at this stage, I will find that I need more data, or that there are data issues. By using an Excel-based platform, I can quickly iterate with my data source to find these data anomalies.

Ultimately, the decision to stay in Excel or move the data analysis to another platform will depend on the audience and on how often the analysis must be repeated and distributed. If my clients only use Excel, then I will almost always use Get & Transform to load the data, Power Pivot to perform the analysis, and Excel to produce the PivotTables and charts. To the client, this will feel seamless since it is all housed within Excel.

However, if my client:

  1. Wants to use another visualization tool,
  2. Has multiple users who will be refreshing the data, or
  3. Needs to employ machine-learning models,

then I will use Get & Transform solely for the initial data exploration and move the heavy lifting into R.

How to Access Get & Transform in Excel or Power BI

In previous versions of Excel (2010 and 2013), Power Query was an add-in that could be installed to help with ETL functions. In Excel 2016 and Power BI, these tools are more tightly integrated. In Excel 2016, they can be accessed through the Data tab, in the Get & Transform Data section.

Screenshot of how to access the Get and Transform feature from Excel 2016's Data tab

In Power BI, the functionality exists on the Home tab, in the External Data section.

Screenshot of how to access the Get and Transform feature from the Home tab in the External Data section in Power BI

In this article, my examples take place in Power BI, but the interface is almost identical to Excel’s. I will point out the differences when they arise so the tutorial should make sense to both types of users.

1. Loading CSV Files

For this tutorial, I have created a few examples of sales data for a fictional retailer that sells outdoor gear and clothing. Each example delivers the data in a different way, to mimic the kinds of data dumps analysts actually receive.

As an initial example, the data arrives as a large data dump in a CSV file. The complicating factor is that each store is presented as its own column. Ideally, we would like to import the data and transform it into a more usable layout.

Below is a screenshot of what the raw CSV looks like:

Screenshot of raw data visualized from a CSV

Why would we want to change this? To take advantage of the relationship capabilities of Excel and Power BI, as we will see later in this article.

For now, let’s assume that we need to see the data as a “narrower and taller” structure, rather than a “wider and shorter” one. The first step is to load the CSV; then, we will start to “unpivot” the data.

Demonstration of the loading and unpivoting process

As you can see, the final structure of the data is narrower and much longer than the initial data. Note also that, as we click different actions, the Query Settings pane on the right-hand side builds up a list of Applied Steps that make up the query. It is important to understand that this happens in the background, as we will revisit it later.
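Unpivoting takes every store column and turns it into rows of (store, sales) pairs, repeating the date on each row. Here is a minimal plain-Python sketch of that reshaping; the store names and figures are hypothetical, and the new column names are simply parameters.

```python
def unpivot(rows, id_columns, attribute="Attribute", value="Value"):
    """Turn every non-id column into its own (attribute, value) row."""
    result = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for col, val in row.items():
            if col not in id_columns:
                result.append({**ids, attribute: col, value: val})
    return result


# Hypothetical wide layout: one column per store.
wide = [
    {"Date": "2018-01-01", "Store 1": 250, "Store 2": 310, "Store 3": 185},
    {"Date": "2018-01-02", "Store 1": 270, "Store 2": 295, "Store 3": 200},
]

tall = unpivot(wide, ["Date"], attribute="Store", value="Sales")
for row in tall:
    print(row)
```

Two dates across three store columns become six rows: the "narrower and taller" shape described above.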

Get & Transform looks and behaves similarly in Power BI and Excel for the most part. However, in Excel, clicking Close & Load brings up one additional prompt. In the window shown below, we can choose whether to load the data into:

  1. A table in Excel,
  2. A PivotTable created against the data,
  3. A PivotChart created against the data, or
  4. “Only Create a Connection.”

In addition, we are given the option to Add this data to the Data Model. Checking this box loads the data into a Power Pivot table. If we are going to analyze the data in Power Pivot, I advise choosing Only Create a Connection and making sure that Add this data to the Data Model is selected. If the data fits within Excel's row limit (1,048,576 rows per worksheet) and we prefer to conduct our analysis in Excel, then just choose Table.

Screenshot of the Import Data options window

The next clip shows why we reshaped the data to be long and skinny: it lets us analyze sales not only by store but also by region and state. To do this, we import a table that maps each store to a region and state, and then quickly create reports that show sales by these different groupings.

Demonstration of quick and easy report creation using region and state groupings
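Conceptually, a report like this is a left join on the store name followed by a grouped sum; in the Data Model, the relationship between the two tables plays the role of the join. The plain-Python sketch below spells out both operations. The lookup table, store names, and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup table mapping each store to a region and state.
store_map = {
    "Store 1": {"Region": "West", "State": "CO"},
    "Store 2": {"Region": "West", "State": "UT"},
    "Store 3": {"Region": "East", "State": "NY"},
}

# Long-format sales rows, the shape produced by an unpivot step.
sales = [
    {"Date": "2018-01-01", "Store": "Store 1", "Sales": 250},
    {"Date": "2018-01-01", "Store": "Store 2", "Sales": 310},
    {"Date": "2018-01-01", "Store": "Store 3", "Sales": 185},
    {"Date": "2018-01-02", "Store": "Store 1", "Sales": 270},
]


def left_join(rows, lookup, key, fields):
    """Add `fields` from `lookup` to each row; unmatched keys get None."""
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        joined.append({**row, **{f: match.get(f) for f in fields}})
    return joined


def total_by(rows, group_col, value_col):
    """Sum `value_col` for each distinct value of `group_col`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


enriched = left_join(sales, store_map, "Store", ["Region", "State"])
print(total_by(enriched, "Region", "Sales"))
print(total_by(enriched, "State", "Sales"))
```

Because the sales rows are long rather than wide, adding a new grouping only means adding a column to the lookup table.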

You can imagine how this type of capability for data transformation in Excel, or Power BI, can be powerfully applied to any case where we have dynamic groupings of data, such as:

  • Rolling up daily data into weeks, months, and quarters;
  • Grouping sales personnel into departments and regions; or
  • Mapping SKUs to product types.
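Each of these groupings is the same operation with a different key. As a sketch, assuming the data arrives as daily (date, amount) pairs, rolling up to months or quarters only changes the function that maps a date to its period. The figures are hypothetical.

```python
from collections import defaultdict
from datetime import date


def roll_up(daily, period):
    """Sum (date, amount) pairs by whatever period(date) returns."""
    totals = defaultdict(float)
    for day, amount in daily:
        totals[period(day)] += amount
    return dict(sorted(totals.items()))


def month_of(day):
    return (day.year, day.month)


def quarter_of(day):
    return (day.year, (day.month - 1) // 3 + 1)


# Hypothetical daily sales figures.
daily = [
    (date(2018, 1, 30), 100.0),
    (date(2018, 1, 31), 50.0),
    (date(2018, 2, 1), 75.0),
    (date(2018, 4, 2), 40.0),
]

print(roll_up(daily, month_of))    # {(2018, 1): 150.0, (2018, 2): 75.0, (2018, 4): 40.0}
print(roll_up(daily, quarter_of))  # {(2018, 1): 225.0, (2018, 2): 40.0}
```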

While this article covers CSV and Excel files, Get & Transform handles a wide range of data types. Once a query is created, it can be refreshed as the underlying data changes.

2. Handling Text Strings

To demonstrate Get & Transform’s ability to manipulate strings, I created another dataset that mimics a text file of accounting transactions from a firm’s general ledger (GL).

Table showing accounting transactions from a firm’s general ledger

Notice how the account number and name appear in the same string? In Power BI, we can effortlessly parse the account number and name into separate fields.

Demonstration of parsing the account number and name into separate fields

In this video, you can see that after I split the column, the tool guessed that the new left-hand part of the Account field should be a number and created a “Changed Type1” step. Since we ultimately want this field as text, we can delete that step manually from the applied steps.
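The split itself is simple to express. This Python sketch assumes the account number and name are separated by the first space, which is a guess about the layout rather than the exact format in the clip. Keeping the number as text means a hypothetical account such as "0100" keeps its leading zero instead of becoming 100.

```python
def split_account(account, delimiter=" "):
    """Split "number name" at the first delimiter; the number stays text."""
    number, _, name = account.partition(delimiter)
    return number, name.strip()


# Hypothetical GL account strings.
accounts = ["0100 Petty Cash", "1200 Accounts Receivable", "4000 Sales Revenue"]
for account in accounts:
    print(split_account(account))

# Converting the number to an integer, as an automatic type change would,
# silently drops the leading zero.
print(int(split_account("0100 Petty Cash")[0]))  # 100
```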

Next, we take the same data and create a chart of accounts with mappings to account categories.

Demonstration of creating a chart of accounts with mappings to account categories from the data

Why go through all those steps to map a few account numbers? A real general ledger can contain hundreds or even thousands of accounts, and the mapping query we just built would scale to that level with no additional work.
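One way to build a mapping that scales like this is to drive the categories from account-number ranges rather than listing every account. The ranges below are a hypothetical numbering scheme, not necessarily the one used in the clip; the lookup stays five entries long however many accounts the ledger contains.

```python
from bisect import bisect_right

# Hypothetical numbering scheme: each category starts at a range boundary.
RANGE_STARTS = [1000, 2000, 3000, 4000, 5000]
CATEGORIES = ["Assets", "Liabilities", "Equity", "Revenue", "Expenses"]


def categorize(account_number):
    """Map an account number (given as text) to its category."""
    i = bisect_right(RANGE_STARTS, int(account_number)) - 1
    return CATEGORIES[i] if i >= 0 else "Unmapped"


def chart_of_accounts(account_numbers):
    """Build {account: category} for every distinct account in the GL."""
    return {num: categorize(num) for num in sorted(set(account_numbers))}


gl_accounts = ["1010", "4000", "1010", "2100", "6500"]
print(chart_of_accounts(gl_accounts))
```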

3. Working with Different Data Sources

Get & Transform supports many different data sources. While not an exhaustive list, below are some examples:

  • Text File
  • Excel
  • Facebook
  • Adobe Analytics
  • Google Analytics
  • Salesforce
  • Azure
  • Redshift
  • Spark
  • SQL Server
  • SAP HANA
  • Teradata
  • Google BigQuery

Personally, I have tried only about half of the connectors on the list above. Each one I have used has been fairly robust, getting me from raw data to insights without a burdensome amount of work. Just as important, Get & Transform acts as a common layer across disparate data sources, so every output passes through the same cleaning steps and meets a consistent level of quality control.

4. Personalizing Code with the M Language

In the background, Get & Transform is generating code each time we click a button in the tool or make a selection. Below is an example of how you would access the code for the account-mapping query we created:

Demonstration of an example of how you would access the code for the account-mapping query

The code is written in a functional language named M, which the tool generates automatically for basic use cases. For more complicated data wrangling, however, we can edit the generated code or write our own. In most cases, I only make minor modifications to it. For more complicated transformations, I may write most of the code from scratch, to stage temporary tables or to perform more complicated joins.
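The snippet below is not M, but it sketches the idea behind the Applied Steps list in Python: a query is an ordered list of named steps that is replayed from the raw source on every refresh, and deleting a step simply removes it from the list. This is an analogy, not how the M engine works internally; apart from "Changed Type1", the step names and the sample account are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Step = Tuple[str, Callable[[List[Row]], List[Row]]]


def run_query(source: List[Row], steps: List[Step]) -> List[Row]:
    """Replay every step in order, starting again from the raw source."""
    table = source
    for _name, step in steps:
        table = step(table)
    return table


def delete_step(steps: List[Step], name: str) -> List[Step]:
    """Drop a step by name, like deleting it from the Applied Steps list."""
    return [s for s in steps if s[0] != name]


# Hypothetical query; only "Changed Type1" is a name taken from the text.
steps: List[Step] = [
    ("Split Account", lambda t: [{**r, "Number": r["Account"].split(" ", 1)[0]} for r in t]),
    ("Changed Type1", lambda t: [{**r, "Number": int(r["Number"])} for r in t]),
    ("Removed Columns", lambda t: [{k: v for k, v in r.items() if k != "Account"} for r in t]),
]

raw = [{"Account": "0100 Petty Cash", "Amount": 50}]
print(run_query(raw, steps))                                # Number becomes 100
print(run_query(raw, delete_step(steps, "Changed Type1")))  # Number stays "0100"
```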

The Limits of Get & Transform

Excel tends to reach its limits when you try to export more than a million rows. In cases where I have transformed millions of rows with Get & Transform, the only way to ship out ungrouped rows has been through tedious hacks or workarounds. I have also found that Get & Transform queries can be unstable when deployed to multiple users, especially if they use multiple data sources and joins. In those cases, I always use R to deploy reproducible data wrangling. Finally, Excel is not built for advanced data modeling: you can run linear regressions fairly quickly, but beyond that, you will need a more rigorous platform.

Having said all that, Excel is the tool most of my clients are comfortable with, and it is still the most important tool in a financial analyst’s arsenal. By incorporating Get & Transform, Excel and Power BI become even more powerful through the range of data sources they can accept.

Understanding the Basics

What is the concept of ETL?

Extract, transform, and load is the process of moving data from different sources into a centralized data warehouse.
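As a toy end-to-end illustration of the three stages, the sketch below uses Python's built-in `csv` and `sqlite3` modules, with an in-memory SQLite database standing in for the data warehouse. The data and the `sales` table name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; one row is missing its Sales figure.
RAW_CSV = (
    "Date,Store,Sales\n"
    "2018-01-01,Store 1,250\n"
    "2018-01-01,Store 2,\n"
    "2018-01-02,Store 1,270\n"
)


def extract(text):
    """Extract: read the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a Sales value and convert it to a number."""
    return [(r["Date"], r["Store"], float(r["Sales"])) for r in rows if r["Sales"]]


def load(records, conn):
    """Load: write the cleaned records into a central table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, store TEXT, sales REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT store, SUM(sales) FROM sales GROUP BY store").fetchall())
```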

About the author

Ellen Su, United States
member since September 15, 2016
Ellen is a fixed-income trader and portfolio manager who now freelances in financial modeling. She is excited to bring to Toptal clients a vast set of tools to employ on analytical projects. Her unique talent is a seamless combination of data sourcing, programming, financial analysis, storyboarding, and visualization.