Mar 23 2012
 

There is an important problem in network visual analysis, and here is a tool to solve it.

The problem

The problem is that visual exploration of a network is most insightful when the network shows some interesting structure, and often there is not much structure to be seen. By “structure”, I mean that the network shows distinct regions with different densities, perhaps key players located not only in the center but also elsewhere: basically, anything that reveals diversity or interesting irregularities in the network.

But often, representing linked data as a network does not reveal much structure. As an example, here is the network of Twitter users interested in “data visualization”, made by Moritz Stefaner:

 

 

As you can see, this viz makes it hard to understand how this community is structured. We do find key players (big nodes), and vague sub-regions can be distinguished in the network, but that is very inconclusive.

(My point is not to criticize this particular viz. This problem occurs everywhere. If you are a biologist working on protein networks, a consultant drawing the social network of an organization, or someone working with semantic networks, you will surely be familiar with this problem!)

 

 

 

The solution

Let’s reconsider the linked data we have: people connected to people, in this case people following or mentioning others on Twitter (but imagine any other scenario, like proteins being linked to genes, etc.). Instead of representing these connections directly, let’s represent only the connections between people who share many connections. Simply put: two Twitter users will be connected not if one follows the other, but if they both follow a high proportion of the same other Twitter users.*

As an application, I used data provided by Jeff Clark:

person A, person B, 5000
person C, person B, 120
person B, person D, 234

(meaning, person A mentions person B with a frequency of 5000 (arbitrary scale), etc.)
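To make the idea concrete, here is a minimal Python sketch, not the code of the program described in this post. It assumes that each person is represented by the weighted list of people they mention, and that two persons are compared with the cosine similarity named in the footnote. The function names (`load_mentions`, `cosine`, `similarity_edges`) and the two extra sample rows are invented for illustration.

```python
import csv
import io
import math
from collections import defaultdict
from itertools import combinations

# Toy input in the same "source, target, weight" layout as the sample above.
# The last two rows are invented so that some pairs share a mentioned person.
SAMPLE = """\
person A, person B, 5000
person C, person B, 120
person B, person D, 234
person A, person D, 10
person C, person D, 300
"""

def load_mentions(text):
    """Return {source: {target: weight}} from comma-separated lines."""
    mentions = defaultdict(dict)
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        source, target, weight = (cell.strip() for cell in row)
        mentions[source][target] = float(weight)
    return mentions

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_edges(mentions):
    """Yield (person1, person2, similarity) for every pair with some overlap."""
    for a, b in combinations(sorted(mentions), 2):
        score = cosine(mentions[a], mentions[b])
        if score > 0:
            yield a, b, score

mentions = load_mentions(SAMPLE)
edges = list(similarity_edges(mentions))
for a, b, score in edges:
    print(f"{a} -- {b}: {score:.3f}")
```

In this toy data, person B and person C come out as the most similar pair, because both direct most of their mentions at person D. Note that the new link says nothing about whether B and C mention each other.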

I wrote a program called “Gaze” which takes these data and identifies which pairs of persons most frequently mention the same other persons. The resulting network looks like this:

 

(click here for a beautifully interactive version of this viz)

Sub-regions of the network now clearly appear, and distinct communities can be spotted. There would be much more to say about the parameters which can be modified to achieve this, but I’ll mention just one. Two persons are linked if they frequently mention the same persons in their tweets. But how “frequently” exactly? Well, that’s simply a parameter you can change, from “almost never the same persons” to “almost always the same persons”. This gives very interesting insights, since you can observe the consequences of your hypothesis on the structure of the network (with Gephi and its “filter” function, these changes in parameters can be observed instantaneously on the viz).
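The effect of this parameter can be sketched in a few lines of Python. This is only an illustration under assumptions: the similarity scores and user names below are invented, and `filter_edges` is a hypothetical helper, not part of Gaze or Gephi. It shows how raising the threshold from “almost never” toward “almost always” removes links.

```python
# Hypothetical similarity scores between pairs of users (invented values).
similarities = {
    ("ann", "bob"): 0.92,
    ("ann", "cat"): 0.37,
    ("bob", "cat"): 0.05,
    ("cat", "dan"): 0.61,
}

def filter_edges(similarities, threshold):
    """Keep only the pairs whose similarity is at least `threshold`."""
    return {pair: s for pair, s in similarities.items() if s >= threshold}

# Sweep the threshold and watch the network thin out.
for threshold in (0.0, 0.3, 0.6, 0.9):
    kept = filter_edges(similarities, threshold)
    print(f"threshold {threshold:.1f}: {len(kept)} edges -> {sorted(kept)}")
```

A low threshold keeps nearly every pair and tends to bring back the hairball. A high threshold keeps only the strongest similarities, at the risk of breaking the network into isolated fragments.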

 

 

 

 

The tool

- The software “Gaze” can be found on the software page of Clement Levallois (yours truly), here.

- A YouTube tutorial is available here (turn on the volume, and make it full screen and HD).

- The source code for Gaze can be checked on GitHub.

 

If you liked this post, you can follow me on Twitter, check my academic profile or suggest cool collaboration projects.

Clement Levallois

[EDIT March 25: the map has been updated, after a bug fix in the software. Previous version was incorrect]

*Technically, this is simply a similarity measure, very common in the field of information retrieval. I use the cosine similarity. The basic idea of using a similarity measure was suggested by my work in scientometrics, where the viz of Rafols et al. relies heavily on it.

 

 

  One Response to “The quest for structure in networks”

  1. [...] Putting this list into the free MS Excel add-in NodeXL and using the Import > From Twitter List Network lets you get data on which of these accounts follow each other. I played around with visualising the network in NodeXL but found it easier in the end to put the data into Gephi getting the image below. These ‘hairballs’ have limited value and you’re best having a play with the interactive version, which is an export of Gephi visualised using the gexf-js tool by Raphaël Velt (De-hairballing is something Clement Levallois (‏@seinecle) and he kindly sent me a post to a new tool he’s creating called Gaze). [...]
