1. Introduction

You encounter a date column where only the first letter of the month and the year are recorded, e.g. J07- January 2007. Each subsequent month is recorded by its first letter, but the year is only recorded for the month of January. Your job is to reformat the date correctly into a month column containing the first three letters of the month, and a separate year column.


2. The Solution

First, we connect the data to the formula tool that grabs the first letter from the date column into a month column. Then, using regex, we can parse out the year with the expression (\d+) that will get the year from the date column. Now we have both a year and month column.

Notice the Year column has null values. We can add a multi-row formula to fill in the Null values with the correct year. The expression we can use is :

IF isempty([Year]) THEN [Row-1:Year] ELSE [Year] ENDIF

This means if the year is empty, then fill it with the year from the previous row, otherwise if it is not empty leave the year as it is. The multi-row formula will evaluate each row of data and will fill it with the correct year. Next, we create another multi-row formula to create a recordID that starts from 1 and ends at 12 for each year, which is the ID for the months of the year. We do this by creating a new field called RecordID, select group by as year, and in the expression write:


Next, we create a text input with two columns, one column called Num from 1 to 12 and the second column called field2 from Jan to Dec. We connect the two data streams to a join tool, and join by the recordID to the num i.e. the ID for the months, and we get the following table:

Now we can add a sort tool, sorting by year and ascending. Next we connect the select tool to rename and select the columns to get the final solution.




The Data School
Author: The Data School