As I mentioned in my post ‘A Dataset for Teaching Clustering – The Fruit Dataset’, one of the major challenges in understanding clustering results is visualizing them in a way that lets you see directly whether the clustering algorithm has grouped data points in a way that is relevant to your specific context.

In the fruit dataset, there is an image of each object in question, which makes it much easier to understand the results. Most datasets, however, do not come with images. In those cases, what strategies might be used to visually understand the clustering results?

My colleague Rich Webster spends a lot of time thinking about new and innovative ways to visualize data. During a discussion about ways to visualize high-level patterns in datasets, he mentioned a new package created by his colleague Nick Barrowman: vtree. vtree visualizes the overall composition of a dataset through variable trees. These variable trees show how different factors in the dataset interact with each other to partition the dataset into subsets with particular qualities.

In the context of clustering, this strategy could also be used to visualize the composition of clusters (which are subsets of the dataset) relative to the dataset as a whole. This would seem to open up new avenues for a more intuitive understanding of clustering results. I look forward to seeing what comes out of combining vtree and clustering analysis, and will aim to post an update as new work on this front develops.
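
To make the idea concrete, here is a minimal sketch in Python of the counting that sits behind a variable tree, with the cluster label treated as the first splitting variable. It is not vtree itself (vtree is a separate package with its own interface, which is not shown here). The function name `variable_tree`, the column names, and the eight toy records are all invented for illustration, and the cluster labels are assumed to come from some earlier clustering step.

```python
from collections import Counter


def variable_tree(rows, variables, indent=0):
    """Return the lines of a text tree of nested counts.

    Each level splits the current subset by the next variable and reports
    the size of each branch as a count and as a percentage of its parent.
    """
    lines = []
    if not variables or not rows:
        return lines
    first, rest = variables[0], variables[1:]
    counts = Counter(row[first] for row in rows)
    for value in sorted(counts):
        subset = [row for row in rows if row[first] == value]
        pct = 100 * len(subset) / len(rows)
        lines.append(f"{'  ' * indent}{first}={value}: {len(subset)} ({pct:.0f}%)")
        # Recurse into this branch using the remaining variables.
        lines.extend(variable_tree(subset, rest, indent + 1))
    return lines


# Toy records, invented for illustration (not taken from any real dataset).
# "cluster" stands in for labels assigned by a clustering algorithm.
rows = [
    {"cluster": 0, "colour": "red"},
    {"cluster": 0, "colour": "red"},
    {"cluster": 0, "colour": "red"},
    {"cluster": 0, "colour": "yellow"},
    {"cluster": 1, "colour": "yellow"},
    {"cluster": 1, "colour": "yellow"},
    {"cluster": 1, "colour": "yellow"},
    {"cluster": 1, "colour": "red"},
]

tree_lines = variable_tree(rows, ["cluster", "colour"])
print("\n".join(tree_lines))
```

Putting `cluster` first in the variable list shows what each cluster is made of. Putting it last (for example `["colour", "cluster"]`) would instead show how each value of a factor is spread across the clusters, which is the other direction one might want to read the same tree.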

Post Author: Jen Schellinck

Jen Schellinck is the principal of Sysabee and an adjunct professor at Carleton’s Institute of Cognitive Science. She founded Sysabee in 2012 with the goal of taking analysis techniques from machine learning and systems modeling and making them available to organizations seeking the benefits of technology-supported analysis and decision making. For each project, she draws from a pool of expert consultants to create a team customized to the specific needs of the project. She is also the founding member of the Data Science Experts Group, an association of data professionals who build flexible, customized solutions for data-driven companies and organizations. She remains an active participant in academic research via Carleton’s Cognitive Modeling Lab.