Before getting to know Alteryx, I used different tools, such as Excel, SQL, SSIS, and python, to process data; however, you may encounter some limitations when using these tools to perform ETL. During the first week of training at the data school, I had the chance to learn Alteryx Designer to process data. I realize the powerful data processing and analysis capabilities of Alteryx during the first-week training. Alteryx Designer simplifies this process by providing a repeatable workflow for self-service data analysis, resulting in deeper insights in hours instead of weeks. Alteryx Designer combines data preparation, data merging, and data analysis (forecasting, statistics, and spatial) by using the same intuitive user interface, thereby enhancing the capabilities of data analysts.

Powerful one-stop data analysis platform Alteryx

  1. Use the drag-and-drop method to generate workflow; the most significant advantage is Low-code or even No-code usage mode. You can see the results at every step, let you know the data flow processing flow clearly
  2. Seamlessly integrate internal data, third-party data, and cloud data, and the results can be published on the Alteryx server to achieve process automation
  3. Powerful analysis functions to achieve predictive analysis. Alteryx performs the large-scale spatial analysis by providing a repeatable workflow, including location data mixing and advanced spatial analysis
  4. Generate reports, data visualization, and data analysis applications in a simple way
  5. Complete the work within a few hours. If you use traditional tools, it may take several days

Case study – Using Alteryx to process my application dataset

The third phase of my interview with the Data school requires processing an 800+million rows of IOWA liquor sales data set with 64 fields and making a tableau dashboard. When processing this data before getting to know Alteryx, I spent much time dealing with this dataset. Now I use Alteryx to handle it, which dramatically shortens the processing time and processing difficulty. In addition, because the intermediate processing results are visible, I am more confident of the result.

  1. Data Processing: There are many missing values ​ and spelling errors (for example, misspelled store name, some county names use abbreviated, but some use full name). Fortunately, the product number is unique, which provides convenience for filling in missing values ​​and correcting spelling errors using my favorites Alteryx tool, “Multi-row formula.”
  2. Data Modelling: The current data set is a wide table contain 64 fields and 800+ million rows. The file size is more than 2000MB; if you import the file to tableau directly, it may lead to a performance issue. To address this issue, I extract the dimension fields from the dataset through Alteryx and then establish lookup Tables, leaving only the sales-related measure fields to generate a fact table. By using this method, the original dataset size is reduced, and the data redundancy is moved.
  3. Export Hyper file: Alteryx supports exporting Hyper file for tableau.
  4. Connect lookup table and fact table through relationship in Tableau

Alteryx workflow

Dataset Split and Aggregate by Alteryx; Original Dataset size reduce significantly from 2000MB+ to 23MB

Data Model in Tableau

Summary: By learning and using Alteryx to process data, I have experienced the power of Alteryx. Alteryx integrates a variety of data analysis and processing tools and generates workflows by dragging and dropping. The processing can be repeated, and the results can be seen, which significantly improves data processing efficiency. I believe that Alteryx will continue to stand out in the field of data analysis.

Gary Li
Author: Gary Li