Shaping data in Power Query, when working within the Power BI ecosystem, is a foundational concept that underpins the process of data transformation and preparation. It serves as the backbone for structuring, cleansing, and enriching your data before it’s used for reporting and visualizations. With shape data, you can seamlessly connect to diverse data sources, apply data cleansing operations, merge tables, and create calculated columns to ensure your data is in the right form for insightful and impactful data analysis and visualization in Power BI.
In Power Query, the “Home” tab serves as your starting point for data transformation and preparation. It provides a range of essential tools and options to manage your data. From the “Home” tab, you can connect to various data sources, load data into Power Query, and access other key functions to shape your data. The “Transform” tab in Power Query is where the real magic happens. It’s your central hub for applying a wide array of data transformation operations. Here, you can clean, filter, sort, and reshape your data to meet your specific requirements. The “Add Column” tab empowers you to create new columns by applying formulas, merging tables, or performing other calculations on your data. These added columns can be customized to suit your analytical needs, providing you with valuable insights and flexibility in your Power BI reports and visualizations.
The two charts above illustrate the tools that included in the “Transform” and “Add Column” Tabs. What we can observe is that lots of the tools exist in both tabs, such as Format, Standard Calculations, Extract, and Date. The difference is indicated by the name of the tabs: the ones in “Transform” tab changes the original column data, while the ones in the “Add Column” creates one or more new columns with the function used when keeping the original columns untouched. Here we will give an example using “Extract” function to parse text from the [Address] column within both tabs to see the difference. We will use the same dataset that we used for our Regex Exercise (https://www.thedataschool.com.au/juliet-ruan/3-ways-to-use-regex-regular-expression-in-alteryx-empowering-data-transformation-and-analysis/). And we are trying to get the postcode from the [Address] column. Here we will use the function ‘Extract’, which exists in both tabs.
First, within the ‘Transform’ tab, press ‘Extract’ and select ‘Text After Delimiter’.
It brings up the window to entre the delimiter, which in this case is comma, and then expand the ‘Advanced Options’. As we can see the postcode is after the third comma in the [Address] Column, so we would like to skip the first 2 comma. So here we entre 2 in ‘Number of delimiters to skip’.
As we press ‘ok’, we can see original column [Address] has been replaced with the postcode only. Double click the Column Name to rename it to [Postcode].
Now if we go back to the original dataset, and use the ‘Extract’ function in the ‘Add Column’ tab and repeat the same steps, we can see that instead of replacing the original column, it adds a new column called ‘Text After Delimiter’, double click the column name to rename it to [Postcode].
As a good data model will largely extend the flexibility in the visualisation in Power BI, Power Query is a powerful tool to shape our data ready for visualise. This blog introduced the difference between “Transform” and “Add Column’, with an example of parsing text under both tabs.