Google NGram Viewer — “Culturomics”?

The NGram Viewer from Google made a splash when it was introduced in December of 2010. It is essentially a data-mining application that enables queries against Google's massive corpus of digitized books. Researchers behind the Google Books project wrote about the Viewer in the ambitiously titled Science article "Quantitative Analysis of Culture Using Millions of Digitized Books." The authors called their approach "culturomics," and defined it as "the application of high-throughput data collection and analysis to the study of human culture." Some of the ways the tool has been used include exploring social and political change in China, the evolution of marketing history, and changes in the popularity of specific drugs.
According to that article, the "oldest works were published in the 1500s. The early decades are represented by only a few books per year... By 1800, the corpus grows to 60 million words per year; by 1900, 1.4 billion; and by 2000, 8 billion."

Before you dive in, though, keep in mind some of the limitations of the NGram Viewer:

  • the corpus contains only about 4% of all books ever published
  • the data end in 2008
  • the corpus contains only books, and most of them come from libraries, meaning that popular culture isn't really reflected
  • it takes about a decade for events or trends to start being reflected in literature
  • the graphs are sized for easy viewing, but the numbers on the Y axis are usually tiny (they are percentages of all the words in that year's books, not raw counts)
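
To see why the numbers on the vertical axis are so small, here is a minimal sketch in Python. It assumes the axis shows a word's share of all the words printed in a given year. The function name is made up for illustration, the 8 billion figure is the article's number for the year 2000, and the count of 80,000 occurrences is invented.

```python
def relative_frequency_percent(ngram_count: int, total_words: int) -> float:
    """Return one word's share of all words in a year, as a percentage."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * ngram_count / total_words


# 8 billion words per year is the article's figure for 2000.
# 80,000 occurrences is a hypothetical count, chosen only for illustration.
share = relative_frequency_percent(80_000, 8_000_000_000)
print(f"{share:.5f}%")  # prints 0.00100%
```

Even a word that turns up tens of thousands of times in a year is a tiny fraction of everything printed that year, so a tall-looking peak on the graph can still represent a very small percentage.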

Use commas to separate your words or phrases. Searches are CASE-SENSITIVE, so if you're looking up proper names, use capital letters. LET'S PLAY!

  • Greece, Italy, Athens, Rome -- Boy, Rome has really gotten attention over the years. (Not much recently, though.)
  • dogs, cats, dog, cat -- What on earth is that giant “cat” spike?? It’s between about 1612 and 1624. To look more closely, you can either enter those dates, or scroll to the bottom of the page, choose a date range, and see what kinds of books are listed. Oh, I see – it’s all kinds of usages of those three letters together, including abbreviations and non-English words. But it sure looks impressive.
  • Our interest in sharks certainly keeps growing. Note the blip at 1974, when Jaws was published.
  • black hole, worm hole, wormhole -- What’s that “black hole” peak between 1610 and 1618? When I focus the date, it looks like there are two peaks, at 1610 and 1618. A search of Google Books for 1610-1618 gives two results. They’re both referring to the same thing – a prison.
  • Facebook, Twitter, Instagram, Tumblr, Flickr -- Tumblr and Instagram got zero hits. That means that, among the books Google had scanned with publication dates through 2008, each name was mentioned in fewer than 40. No surprise, since they were launched in 2007 and 2010, respectively.
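
If you want to save or share searches like the ones above, one option is to build the link yourself. This is only a sketch: the function name is hypothetical, and the address and the `content`, `year_start`, and `year_end` parameters are assumptions based on the links the Viewer generates in the browser, so they may change.

```python
from urllib.parse import urlencode


def ngram_viewer_url(phrases, year_start=1800, year_end=2008):
    """Build an Ngram Viewer link for a list of words or phrases.

    Phrases are joined with commas, and capitalization is left untouched
    because searches are case-sensitive.
    """
    query = {
        "content": ",".join(p.strip() for p in phrases),
        "year_start": year_start,
        "year_end": year_end,
    }
    # Assumed base address; urlencode escapes spaces and commas for us.
    return "https://books.google.com/ngrams/graph?" + urlencode(query)


# Zoom in on the early "black hole" peaks mentioned above.
url = ngram_viewer_url(["black hole", "worm hole", "wormhole"], 1600, 1620)
print(url)
```

Pasting the printed link into a browser should open the same graph you would get by typing the phrases and dates into the Viewer by hand.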

What groups of words or phrases would you like to see displayed in the NGram Viewer? Google's help page explains how to do more advanced searches -- the Viewer is more powerful than you realize! If you want even more information, there is also a page about the datasets.

Good-bye, farewell, ta-ta, have a nice day!

 

About Sue Vazakas

Science/Engineering Librarian and devoted reader.
