Tuesday, June 27, 2006

info viz 9: dotplots

Here's your Shakespeare. All of it.

"The concatenation order of Shakespeare's works was determined by a combinatorial optimization algorithm that attempts to position dark grid boxes near the main diagonal, emphasizing cluster of similar documents. Without automatic reordering it would be impossible to see the large dark cluster in the upper left formed by the European Histories. The tokens matching the most in this cluster (the dominant vocabulary in the European Histories after term weighting) include Richard, God, Duke, John, Lord, Henry, Sir, death, Queen, York, France, hand, and blood."

More: "Dotplots were first used in Biology to study homology (self-similarity) in genetic sequences. These diagonals indicate an almost-perfect match between two DNA sequences. Unlike the early dotplots used in Biology, we have generalized the technique to allow arbitrary weighting functions and the plotting of much larger amounts of data."

In this case, a sort of visual expression of bibliometric data...

Build your own here (link). "This version uses character tokenization with no weighting, reconstruction, or approximation, so every pair of matching characters is plotted."
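For the curious, here is a rough sketch of what that simplest version seems to be doing, going only by the description quoted above: compare every character with every other character and mark each match. The function names (`dotplot`, `render`) and the sample text are my own placeholders, not anything taken from the linked tool.

```python
def dotplot(a, b):
    """Return a grid where grid[i][j] is True when a[i] == b[j].

    Character tokenization, no weighting: every matching pair is marked.
    (Hypothetical helper, written for illustration only.)
    """
    return [[ca == cb for cb in b] for ca in a]


def render(grid):
    """Draw the grid as text, one row per line: '#' for a match, '.' otherwise."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in grid
    )


# Plotting a text against itself: the main diagonal is always filled,
# and repeated phrases show up as shorter diagonals off to the side.
text = "to be or not to be"
grid = dotplot(text, text)
print(render(grid))
```

Since this compares every pair of characters, the work grows with the square of the text length, which is presumably why the full-Shakespeare version relies on the weighting and approximation the quote mentions.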


Reading: Black Hole by Charles Burns
