Google parses the history of 500 billion words
Word lovers, rejoice. Google, with the help of Harvard researchers, has created the "Books Ngram Viewer," a dream database for understanding the historical and cultural changes recorded in 500 billion words. Scholars can download the data set culled from these words, but for the amateur student of lexicography, Google Labs built a simple online tool that lets people map the changes in our vocabulary since 1500 A.D. For instance, "women" and "men" did not meet in use until 1982. Now "women" appears more often than "men." It's an endlessly entertaining game. And the study, which appeared in the journal Science, has already come up with some fascinating tidbits:
1. The digitized texts make up only 4 percent of all books ever printed. They digitized 5,195,769 books. And that's only 4 percent!? That means 129 million books have been printed? Holy cannoli.
2. Even though this is only 4 percent of all the books ever printed, "If you tried to read only the entries from the year 2000 alone, at the reasonable pace of 200 words/minute, without interruptions for food or sleep, it would take eighty years."
3. I just read "The Professor and the Madman," about the monumental task of putting together the Oxford English Dictionary. Despite that Herculean effort (and 20 volumes of text), the computer scanning shows that the dictionary covers only about 25 percent of existing words.
4. Actors become famous at 30 years of age. Politicians at 50. Scientists at 60, and mathematicians never.
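For anyone who wants to check the back-of-the-envelope math in items 1 and 2, here is a minimal Python sketch. It takes the figures quoted above at face value and makes two assumptions of its own: that "4 percent" means exactly 0.04, and that a year averages 365.25 days. The function names are made up for illustration. The first result lands close to the 129 million figure (the gap comes from rounding the percentage), and the second is only the word count implied by the quote, not a number reported in the study.

```python
# Figures quoted in the post.
DIGITIZED_BOOKS = 5_195_769
SHARE_OF_ALL_BOOKS = 0.04      # assumption: "4 percent" treated as exact
WORDS_PER_MINUTE = 200
YEARS_OF_READING = 80


def implied_total_books(digitized, share):
    """Total books ever printed, if `digitized` is `share` of them."""
    return digitized / share


def words_read(words_per_minute, years, days_per_year=365.25):
    """Words covered by reading nonstop for `years` years.

    days_per_year=365.25 is an assumption (average year with leap days).
    """
    minutes = years * days_per_year * 24 * 60
    return words_per_minute * minutes


total_books = implied_total_books(DIGITIZED_BOOKS, SHARE_OF_ALL_BOOKS)
year_2000_words = words_read(WORDS_PER_MINUTE, YEARS_OF_READING)

print(f"Implied books ever printed: {total_books:,.0f}")
print(f"Words read in 80 nonstop years: {year_2000_words:,.0f}")
```

Run it and the first line comes out at roughly 130 million books, while the second suggests the year-2000 entries alone run to something on the order of 8 billion words.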
Play around with the tool here.
Update: Love this! One of our readers, Peter Pappas, points out that everyone needs to go to the Ngram site and type in the words "never gonna give you up." That is all. Thanks, Peter!
December 17, 2010; 1:14 PM ET
Categories: The Daily Catch