For my last bit of exploration into my own Netflix data, I wanted to look at the show preferences of my friends and me, specifically to identify where our interests overlap. I figured that a great way to visualise this would be a network graph, where I could see the number of shows watched in common between profiles and easily identify shows that have been watched by multiple profiles.

While looking for ways to create network visualisations in Tableau, I came across the Data Surfer’s blog post on the subject, which demonstrated how easy it is to use Gephi to create the structure needed for a network graph. That post covers everything you need to know about using Gephi and plotting in Tableau, so I will focus on how I prepared the data for Gephi and how I prepared Gephi’s output for plotting in Tableau.

Preparing the data for Gephi

You need two tables to create a network graph in Gephi:

  1. Nodes: Nodes, sometimes referred to as vertices, represent individual entities in the network. In the context of my Netflix data exploration, a node signifies a specific show or a user profile.
  2. Edges: Edges represent the relationships or connections between the nodes. In the network graph I’m creating for my Netflix data, an edge indicates that a specific profile has viewed a particular show.

 

Nodes: Creating the data for nodes is fairly straightforward. Simply assign unique IDs to all entities (profiles and shows in the data), then combine them into one table.

nodes.csv

Edges: Using the unique IDs created in the previous step, create a table that shows the relationship between entities, in my case the profiles and the shows they watched.

edges.csv
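For readers who would rather script this step, here is a minimal Python sketch of building the two tables. The sample viewing records, the "Kind" column and the `to_csv` helper are hypothetical, and the Id/Label and Source/Target headers are what I understand Gephi's spreadsheet import to look for, so check them against your version.

```python
import csv
import io

# Hypothetical viewing records as (profile, show) pairs; real data would
# come from your own Netflix viewing history.
views = [
    ("Alice", "Dark"),
    ("Alice", "Ozark"),
    ("Bob", "Dark"),
    ("Bob", "Dark"),   # a repeat view of the same show
    ("Cara", "Ozark"),
]

profiles = sorted({profile for profile, _ in views})
shows = sorted({show for _, show in views})

# One unique ID per entity. Keying on (kind, name) keeps a profile and a
# show that happen to share a name from colliding.
entities = [("profile", p) for p in profiles] + [("show", s) for s in shows]
node_ids = {entity: i for i, entity in enumerate(entities, start=1)}

nodes = [
    {"Id": node_ids[(kind, name)], "Label": name, "Kind": kind}
    for kind, name in entities
]

# A set drops repeat views, so each profile-show pair yields one edge.
edge_pairs = sorted(
    {(node_ids[("profile", p)], node_ids[("show", s)]) for p, s in views}
)
edges = [{"Source": source, "Target": target} for source, target in edge_pairs]


def to_csv(rows):
    """Render a list of dicts as CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


nodes_csv = to_csv(nodes)  # write this text to nodes.csv
edges_csv = to_csv(edges)  # write this text to edges.csv
```

Because both node types share one ID sequence, every edge row can point at a profile and a show without any ambiguity.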

Next up, simply import the nodes table and edges table into Gephi, and experiment with the layouts until you settle on something you like (more details here).

 

Gephi Output

Export your desired network graph as a .gexf file, then open it as an XML file in Excel. Click through the prompts and warnings, and Excel should load the contents of the file as a table.

If you scroll through the file, you will see that it comes in two sections.

The top section contains the coordinates for all the nodes in your network.

Top section

The next section contains all relationships in the network. This should be the same as the edges table that you imported into Gephi, so you can use that instead if you prefer.
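As an alternative to opening the file in Excel, the same two sections can be read straight from the .gexf file, since it is plain XML. The sketch below uses Python's standard `xml.etree.ElementTree` on a tiny hand-written sample, not real Gephi output. It assumes the usual GEXF layout, where each node carries a `viz:position` element with `x` and `y` attributes, and it matches tags by local name because the namespace version can differ between exports. `local_name` and `read_gexf` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the GEXF layout (an assumption, not real output).
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft"
      xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="1" label="Alice">
        <viz:position x="-10.5" y="20.0" z="0.0"/>
      </node>
      <node id="4" label="Dark">
        <viz:position x="35.25" y="-5.0" z="0.0"/>
      </node>
    </nodes>
    <edges>
      <edge id="0" source="1" target="4"/>
    </edges>
  </graph>
</gexf>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def read_gexf(xml_text):
    """Return (nodes, edges) as lists of dicts from GEXF text."""
    root = ET.fromstring(xml_text)
    nodes, edges = [], []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "node":
            # The layout coordinates live in a child <viz:position> element.
            pos = next(
                (child for child in elem if local_name(child.tag) == "position"),
                None,
            )
            nodes.append({
                "id": elem.get("id"),
                "label": elem.get("label"),
                "x": float(pos.get("x")) if pos is not None else None,
                "y": float(pos.get("y")) if pos is not None else None,
            })
        elif name == "edge":
            edges.append({
                "source": elem.get("source"),
                "target": elem.get("target"),
            })
    return nodes, edges


nodes, edges = read_gexf(SAMPLE_GEXF)
```

Here `nodes` corresponds to the top section (IDs, labels and coordinates) and `edges` to the bottom section (source and target pairs).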

Bottom Section

Save a copy of this workbook, and bring it into Alteryx.

 

Alteryx

During the initial input stage, I start by eliminating the unused columns (A-L), then partition the dataset into the two sections previously described. This can be achieved simply by filtering the data on whether [Id3] or [Id5] holds a valid value.

Node Coordinates

For the top section, I simply want to prepare a list of nodes and their coordinates for joining. This also doubles as the dimension table for nodes, as it contains the full name of each node alongside its unique ID.

Node Coordinate Table

Create Relationship

The bottom section contains the relationships between nodes. I want to be able to identify each edge and its nodes, and then join the coordinates from the previous table so that they can be plotted in Tableau.

Relationship Table

To visualise the edges (relationships) of the network in Tableau, we need to identify the pair of nodes associated with each edge by generating a unique ID for each relationship. For ease of use, I opted to concatenate the source and target nodes.
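The concatenation itself is a one-liner in most tools. As a rough Python sketch on hypothetical edge rows: the `relationship_id` helper and the "-" separator are my assumptions, with the separator added so that pairs such as 1 and 12 versus 11 and 2 cannot produce the same ID.

```python
# Hypothetical edge rows, with node IDs stored as strings.
edges = [
    {"source": "1", "target": "4"},
    {"source": "1", "target": "5"},
    {"source": "2", "target": "4"},
]


def relationship_id(source, target, sep="-"):
    """Build an edge identifier by concatenating the two node IDs."""
    return f"{source}{sep}{target}"


# Add the identifier as a new column on every edge row.
relationships = [
    {"relationship": relationship_id(edge["source"], edge["target"]), **edge}
    for edge in edges
]
```

Any scheme works as long as each edge ends up with an ID that no other edge shares.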

Source and Target Node Coordinates

What I want to achieve here is a table that contains the identifier for each edge (Relationship) and the coordinates for the node at each end of the edge.

End Product

First, I want to join the coordinates from Node Coordinates table to the source column in the Relationship table, resulting in something like this:

Source Node Coordinates

Next, I want to create another copy of the relationship table. This time, I want to join the coordinates to the target column in the Relationship table.

Target Node Coordinates

Swap the values between the source and target columns in the Target Node Coordinates table by adding a Select tool and renaming each column: change “source” to “target” and “target” to “source”.

Union Source Node Coordinates and Target Node Coordinates to get the following:

Output
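The join, swap and union steps can also be sketched outside Alteryx. The Python below is an illustration on hypothetical data rather than the workflow itself: `node_coords` stands in for the Node Coordinates table, `relationships` for the Relationship table, and `edge_endpoints` is a made-up helper name.

```python
# Hypothetical node coordinates keyed by node ID: (x, y).
node_coords = {
    "1": (-10.5, 20.0),
    "4": (35.25, -5.0),
    "5": (0.0, 12.0),
}

# Hypothetical relationship table with one row per edge.
relationships = [
    {"relationship": "1-4", "source": "1", "target": "4"},
    {"relationship": "1-5", "source": "1", "target": "5"},
]


def edge_endpoints(relationships, node_coords):
    """Return two rows per edge, one for the node at each end."""
    source_rows, target_rows = [], []
    for rel in relationships:
        # Join coordinates on the source column.
        x, y = node_coords[rel["source"]]
        source_rows.append({**rel, "x": x, "y": y})

        # Join coordinates on the target column, then swap the column
        # values so "source" always names the node the coordinates belong to.
        x, y = node_coords[rel["target"]]
        target_rows.append({
            "relationship": rel["relationship"],
            "source": rel["target"],
            "target": rel["source"],
            "x": x,
            "y": y,
        })
    # Union: stack the two tables on top of each other.
    return source_rows + target_rows


output = edge_endpoints(relationships, node_coords)
```

Each relationship now appears on exactly two rows, one per endpoint, so a line drawn through the rows that share a relationship ID runs from one node to the other.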

 

And that is all you need to plot your network graph in Tableau.

 

The Data School
Author: The Data School