What is Get & Transform?
- Get & Transform is a data-transformation tool for use within Microsoft Excel and Power BI software packages.
- Data often arrives in unstructured formats, which turns the ETL (extract, transform, and load) process into a tedious series of manual workarounds.
- Get & Transform automates and expedites the process of cleaning and organizing such raw data, which ultimately assists the analytical task of uncovering observations and trends.
- Examples of functionality provided by Get & Transform include removing columns, grouping data, splitting strings into substrings, and appending rows from another table.
- For maintaining workflows within the Excel universe, Get & Transform is an excellent tool that can be easily explained and demonstrated to relevant stakeholders.
How can I use Get & Transform?
- Access in Excel is via the Get & Transform Data section within the Data tab. In Power BI it exists in the External Data section of the Home tab.
- Loading CSVs: Importing a CSV via Get & Transform allows it to be cleaned and made "narrower" or "wider" to assist with data pivoting. These instructions can be saved and then repeated for future imports.
- Handling text strings: A significant improvement over Excel’s Text to Columns functionality, Get & Transform can quickly split strings that combine text and numbers into separate columns.
- Different data sources: With a wide range of input files accepted, it is possible to work with disparate sources while maintaining consistent and normalized output quality.
- Customizing with code: The M language is the functional code used within Get & Transform, and it is possible to write custom queries for more bespoke requests.
In this age of data lakes and petabyte-scale databases, it is shocking how frequently I still receive data in the form of CSV, text, and Excel files. While modern-day analytics focuses on cutting-edge advances in machine learning algorithms, the day-to-day drudgery of data analysis is still a manual process of finding, compiling, and wrangling disparate data types.
For the financial analyst, data often arrives as an Excel spreadsheet, but just as often, it is a data dump into a CSV or a query into a SQL database. Sometimes, the data is arranged in a confusing layout or does not have all the requisite components for analysis. Time spent scrubbing this data is time taken away from analysis, yet the task is often accepted as a necessary evil.
What Does Get & Transform Do?
A solution to this common problem is actually quite accessible: Excel and Power BI have an entire set of data transformation tools that few users are aware of, named Get & Transform (formerly known as Power Query). Its embedded extract, transform, and load (ETL) functionality enables financial analysts to link directly to their data sources and get to insights more quickly.
As we tee up data to load into Excel or Power BI, we usually have to perform some transformations to the data. Some examples of data manipulation would include:
- Removing columns,
- Filtering the data,
- Grouping the data,
- Pivoting/unpivoting the data,
- Splitting strings into substrings,
- Extracting keywords from strings,
- Appending rows from another table, and
- Joining two dimension tables.
In the diagram below, we see that Get & Transform performs this tedious role of pre-processing the data before it is loaded.
Why Should You Use Get & Transform?
Why is it worthwhile learning how to use Get & Transform? Well, when I look at what I have personally used this functionality for, it has offered me a malleable set of tools for:
- Loading an entire folder of text files into a single data table
- Converting exported accounting files into a digestible layout
- Loading millions of sales rows into Power Pivot directly
- Grouping daily data into manageable monthly results before importing them into Excel
- Splicing in data from another table by joining on matching columns
Generally, when I receive new data, I will explore it using Get & Transform before loading it into Power Pivot. This allows me to see what transformations might be necessary and quickly perform some pivots and groupings on the data to formulate a framework for analysis. In many cases, at this stage, I will find that I need more data, or that there are data issues. By using an Excel-based platform, I can quickly iterate with my data source to find these data anomalies.
Ultimately, the decision to stay in Excel or move the data analysis to another platform will depend upon the audience and the repeatability and distribution of the analysis. If my clients only use Excel, then I will almost always use Get & Transform to load the data, Power Pivot to perform the analysis, and Excel to produce the PivotTables and charts. To the client, this will feel seamless since it is all housed within Excel.
However, if my client:
- Wants to use another visualization tool,
- Has multiple users who will be refreshing the data, or
- Needs to employ machine-learning models,
then I will use Get & Transform solely for the initial data exploration and move the heavy lifting into R.
How to Access Get & Transform in Excel or Power BI
In previous versions of Excel, Power Query was an add-in that could be installed to help with ETL functions. However, in Excel 2016 and Power BI, these tools are more tightly integrated. In Excel 2016, they can be accessed through the Data tab, and then the Get & Transform Data section.
In Power BI, the functionality exists on the Home tab, in the External Data section.
In this article, my examples take place in Power BI, but the interface is almost identical to Excel’s. I will point out the differences when they arise so the tutorial should make sense to both types of users.
1. Loading CSV Files
To assist this tutorial, I have created a few examples of sales data for a fictional retailer that sells outdoor gear and clothing. In each of these examples, the data will be produced in different ways to demonstrate realistic methods of data dumps.
As an initial example, we will see the data presented as a large data dump into a CSV file. The complicating factor is that the data is presented with multiple columns representing various stores. We ideally would like to import and transform the data into a more usable layout.
Below is a screenshot of what the raw CSV looks like:
Why would we want to change this? To take advantage of the relationship capabilities that are possible in these applications. We will see this play out further on in the discussion.
For now, let’s assume that we need to see the data as a “narrower and taller” structure, rather than a “wider and shorter” one. The first step is to load the CSV; then, we will start to “unpivot” the data.
As you can see, the final structure of the data is narrower than the initial data, and a lot longer. Another point is that, as we are clicking on different actions, the tool on the right-hand side is generating a list of applied steps used to build the query. It is important to understand that this is going on in the background, as it will be revisited later.
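To make the reshaping concrete, here is a minimal plain-Python sketch of what an unpivot does to the rows. It is not the M code that Get & Transform generates. The column names, store names, and figures are invented for illustration, and the `unpivot` helper is hypothetical.

```python
import csv
import io

# Hypothetical wide-format extract: one column per store (all values invented).
RAW_CSV = """Date,Product,Store A,Store B,Store C
2019-01-01,Tent,120,80,0
2019-01-01,Jacket,60,45,30
"""


def unpivot(rows, id_columns, attribute_name="Store", value_name="Sales"):
    """Turn every non-id column into its own (attribute, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            # Copy the identifying columns, then add one store/sales pair.
            long_row = {key: row[key] for key in id_columns}
            long_row[attribute_name] = column
            long_row[value_name] = float(value)
            long_rows.append(long_row)
    return long_rows


wide_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
long_rows = unpivot(wide_rows, id_columns=("Date", "Product"))

for row in long_rows:
    print(row)
```

Two wide rows with three store columns become six narrow rows, each carrying a single store and a single sales figure. That is the "narrower and taller" layout that relationships and groupings work best with.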
Get & Transform looks and behaves similarly between Power BI and Excel for the most part. However, in Excel, after clicking Close and Load, there is one additional prompt. In the figure below, we can toggle between whether we wish to load the data into:
- A table in Excel,
- A PivotTable created against the data,
- A PivotChart created against the data, or
- “Only Create a Connection.”
In addition, we are also given the option of whether or not to Add this data to the Data Model. Checking this box loads the data into a Power Pivot table. If we are going to analyze the data in Power Pivot, I advise choosing Only Create a Connection and then making sure that the Add this data to the Data Model option is selected. If the data is within the Excel row limit, and we prefer to conduct our analysis in Excel, then just choose Table.
In the next clip, we will see that the reason why we formatted the data to be long and skinny is so that we are able to analyze the sales not only by store but also by region and state. To accomplish this task, we will import a table that maps each store to a region and state. We will see below that we can quickly create reports that show sales by these different groupings.
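The idea behind the mapping table can be sketched in a few lines of plain Python. This is only an illustration of the join-then-group logic, not what Power BI runs internally. The store names, regions, states, and the `total_by` helper are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical long-format sales rows (the shape produced by unpivoting).
sales = [
    {"Store": "Store A", "Sales": 120.0},
    {"Store": "Store B", "Sales": 80.0},
    {"Store": "Store C", "Sales": 30.0},
    {"Store": "Store A", "Sales": 60.0},
]

# Hypothetical mapping table: one entry per store.
store_map = {
    "Store A": {"Region": "West", "State": "CA"},
    "Store B": {"Region": "West", "State": "OR"},
    "Store C": {"Region": "East", "State": "NY"},
}


def total_by(rows, mapping, level):
    """Look up each row's store in the mapping, then sum sales at that level."""
    totals = defaultdict(float)
    for row in rows:
        attributes = mapping.get(row["Store"])
        # Stores missing from the mapping are kept visible rather than dropped.
        key = attributes[level] if attributes else "Unmapped"
        totals[key] += row["Sales"]
    return dict(totals)


sales_by_region = total_by(sales, store_map, "Region")
sales_by_state = total_by(sales, store_map, "State")
print(sales_by_region)
print(sales_by_state)
```

Because each sales row names exactly one store, adding a new grouping level only means adding a column to the mapping table. The sales data itself does not change.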
You can imagine how this type of capability for data transformation in Excel, or Power BI, can be powerfully applied to any case where we have dynamic groupings of data, such as:
- Rolling up daily data into weeks, months, and quarters;
- Grouping sales personnel into departments and regions; or
- Mapping SKUs to product types.
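The first of these groupings, rolling daily data up into larger periods, differs slightly from a lookup table because the grouping key is derived from the date itself. A minimal Python sketch, assuming invented dates and amounts and hypothetical helper names:

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (day, amount) pairs.
daily_sales = [
    (date(2019, 1, 30), 100.0),
    (date(2019, 1, 31), 150.0),
    (date(2019, 2, 1), 90.0),
    (date(2019, 4, 15), 200.0),
]


def month_key(day):
    """Label such as '2019-01'."""
    return f"{day.year}-{day.month:02d}"


def quarter_key(day):
    """Label such as '2019-Q1'."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def roll_up(rows, key_func):
    """Sum amounts for every period label produced by key_func."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[key_func(day)] += amount
    return dict(totals)


monthly = roll_up(daily_sales, month_key)
quarterly = roll_up(daily_sales, quarter_key)
print(monthly)
print(quarterly)
```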
While this article addresses CSV and other Excel files, Get & Transform tackles a wide range of data types. Once a query is created, it can be refreshed over time as the data changes.
2. Handling Text Strings
In order to demonstrate Get & Transform’s ability to manipulate strings, I created another dataset that mimics a text file showing accounting transactions from a firm’s general ledger (GL).
Notice how the account number and name appear in the same string? In Power BI, we can effortlessly parse the account number and name into separate fields.
In this video, you can see that after I split the column, the tool guessed that the new left-hand side of the Account field should be a number and created a “Changed Type1” step. Since we ultimately want this field as a string, we can go ahead and delete the step manually under the applied steps.
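The split itself is simple to express in code. The sketch below is a plain-Python illustration, assuming a "number, space, name" layout for the account string. The sample accounts and the `split_account` helper are invented. It also shows one common reason for keeping an account number as text: converting it to a number discards any leading zeros.

```python
# Hypothetical GL account strings; the "number name" layout is an assumption.
gl_accounts = ["01000 Cash", "04000 Sales Revenue", "06100 Rent Expense"]


def split_account(text):
    """Split at the first space only, so multi-word names stay intact."""
    number, _, name = text.strip().partition(" ")
    return number, name  # number is deliberately left as a string


parsed = [split_account(account) for account in gl_accounts]
for number, name in parsed:
    print(number, "|", name)

# What an automatic numeric conversion would do to the first account number:
as_number = int(parsed[0][0])
print(as_number)  # the leading zero is gone
```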
Next, we take the same data and create a chart of accounts with mappings to account categories.
Why would we go through all those steps to map a few account numbers? A real general ledger can contain hundreds or even thousands of accounts. This quick mapping query, as we have shown, would scale to that level with no additional work.
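As a rough illustration of why the mapping scales, the Python sketch below builds a chart of accounts from the distinct accounts found in the transactions and assigns each a category. The rule used here (category determined by the first digit of the account number) is purely an assumption for the example, as are the accounts and amounts. A real mapping would follow the firm's own chart of accounts.

```python
# Hypothetical GL transactions: (account number, account name, amount).
gl_transactions = [
    ("1000", "Cash", 500.0),
    ("4000", "Sales Revenue", -500.0),
    ("1000", "Cash", -200.0),
    ("6100", "Rent Expense", 200.0),
]

# Assumed rule for illustration only: the first digit decides the category.
CATEGORY_BY_FIRST_DIGIT = {"1": "Assets", "4": "Revenue", "6": "Expenses"}


def chart_of_accounts(transactions):
    """Return one entry per distinct account, with its mapped category."""
    chart = {}
    for number, name, _amount in transactions:
        category = CATEGORY_BY_FIRST_DIGIT.get(number[0], "Unmapped")
        # setdefault keeps the first occurrence and ignores repeats.
        chart.setdefault(number, {"Name": name, "Category": category})
    return chart


chart = chart_of_accounts(gl_transactions)
for number, details in chart.items():
    print(number, details["Name"], details["Category"])
```

The same function handles four transactions or four million, since the rule is applied per account rather than maintained by hand.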
3. Working with Different Data Sources
Get & Transform supports many different data sources. While not an exhaustive list, below are some examples:
Personally, I have only tried about half of the connections on the above list. Each of the connectors I have used has been fairly robust; I have gotten from raw data to insights without a burdensome amount of work. Equally important, Get & Transform serves as a validator between disparate sources of data, ensuring that the end outputs have a normalized level of quality control.
4. Personalizing Code with the M Language
In the background, Get & Transform is generating code each time we click a button in the tool or make a selection. Below is an example of how you would access the code for the account-mapping query we created:
The code uses a functional language named M, which auto-generates for basic use cases. However, for more complicated data wrangling, we can edit and write our own code. For most cases, I will only ever make minor modifications to this code. In more complicated transformations, I may write most of the code from scratch to stage temporary tables, or for performing more complicated joins.
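The generated M code is a chain of named steps, each taking the result of the previous step as its input. The Python sketch below mimics that structure to show why editing or deleting a single applied step is possible. It is an analogy, not M syntax, and the step functions and sample rows are invented for illustration.

```python
def promoted_headers(table):
    """Use the first row as column names."""
    header, *body = table
    return [dict(zip(header, row)) for row in body]


def changed_type(table):
    """Convert the Amount column from text to a number."""
    return [{**row, "Amount": float(row["Amount"])} for row in table]


def filtered_rows(table):
    """Drop rows whose amount is zero."""
    return [row for row in table if row["Amount"] != 0]


# The ordered list plays the role of the applied steps pane.
APPLIED_STEPS = [promoted_headers, changed_type, filtered_rows]


def run_query(source, steps):
    """Feed the output of each step into the next one."""
    result = source
    for step in steps:
        result = step(result)
    return result


source = [["Account", "Amount"], ["1000", "500"], ["4000", "0"]]
result = run_query(source, APPLIED_STEPS)
print(result)
```

Removing a function from `APPLIED_STEPS` corresponds to deleting a step in the tool. It works as long as the later steps do not depend on what the removed step produced.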
The Limits of Get & Transform
Excel reaches its limits when you try to export more than about a million rows, since a worksheet holds at most 1,048,576 rows. In the cases where I have transformed millions of rows with Get & Transform, the only way to ship out ungrouped rows is via tedious hacks or workarounds. I have also found that Get & Transform queries can be unstable when deployed to multiple users, especially if you use multiple data sources and joins. In those cases, I will always use R to deploy repeatable data wrangling. Finally, Excel is not built for more advanced data modeling. You can perform linear regressions pretty quickly, but beyond that, you will need to use a more rigorous platform.
Having said all that, I find that Excel is what most of my clients are most comfortable with. Excel is still the most important tool in a financial analyst’s arsenal. By incorporating the Get & Transform functionality, Excel and Power BI become even more powerful through the range of data sources that they can accept.
Understanding the basics
Extract, transform, and load is the process of moving data from different sources into a centralized data warehouse.
Power BI is a business analytics product from Microsoft. It provides deep capabilities for visualizing data and creating automated reports and dashboards.
A data lake is a single repository for all data sources within an organization. This may include structured and unstructured pockets of data, which ultimately can be drawn upon and processed whenever required.