Document Security 101
Few things in the world of digital documents are as pesky and revealing as "metadata" -- the information automatically embedded in documents by popular software such as Microsoft Word or Adobe Acrobat. When the government or a business forgets to purge metadata from documents before releasing them to the public, the results can range from embarrassing to dangerous.
On Sunday, the New York Times ran a story on President Bush's Nov. 30 speech on the war in Iraq. White House officials said many federal departments contributed to the new national strategy on Iraq, but one look at the metadata stored in the 35-page National Security Council document, titled "Our National Strategy for Victory in Iraq," showed that the paper's original author was Peter D. Feaver, a Duke University political scientist. Feaver was recruited to join the NSC staff as a special adviser in June after he and several Duke colleagues presented the administration with an analysis of polls about the Iraq war. Their analysis concluded that Americans would support a war with mounting casualties if they believed the effort would ultimately succeed.
The Times piece didn't uncover a huge scandal here. But it's not the first time an official organization has published a document that contained a little more information than it realized. In October, the United Nations released a report on its investigation into the Valentine's Day assassination of former Lebanese Prime Minister Rafik Hariri. That document contained metadata showing that substantial revisions had been made, including the removal of the names of people closely tied to the Syrian government.
Even washingtonpost.com was once burned by hidden document data. In our coverage of the 2002 Washington-area sniper attacks, we published a letter, allegedly written by the snipers and sent to police, demanding that $10 million be deposited into a stolen credit card account. The editors here blacked out some of the more sensitive information in the scanned-to-PDF version of the document, including the bank account number where the loot was to be deposited. Unfortunately, as document sleuths later pointed out, anyone with the commercial version of Adobe Acrobat could easily remove the blacked-out areas and reveal the details they were meant to hide.
Finding metadata in a document takes only a few keystrokes. To locate metadata in an Adobe PDF document, check out this tutorial on Adobe's site. Microsoft has also published instructions that detail how to find and remove metadata from documents created in Microsoft Office. Harlan Carvey, a computer forensics expert here in Washington, has also posted some interesting findings on locating metadata online and in Microsoft Windows.
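For readers who would rather see the idea in code than click through menus, here is a minimal Python sketch. It rests on an assumption the documents discussed above do not share: it handles only the newer zipped Office formats (.docx, .xlsx, .pptx), which keep author and revision details in a file named docProps/core.xml inside the package. Older binary .doc files need different tooling. The function name read_core_properties and the field labels are my own, chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Illustrative labels (not an official naming scheme) mapped to XML tags.
FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(path_or_file):
    """Return the core metadata fields found in a zipped Office document.

    Accepts a filesystem path or a binary file-like object. Returns an
    empty dict if the package has no docProps/core.xml part.
    """
    with zipfile.ZipFile(path_or_file) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}

    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Skip fields that are absent or empty.
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Pointing read_core_properties at a file (say, a hypothetical report.docx) returns a dictionary of whichever fields the document still carries. An empty result means this particular part is missing, not that the file is free of other hidden data such as tracked changes or comments.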
If you're interested in scanning Web sites for documents that contain metadata, check out Trace, a free tool from document security company Workshare. I played with Trace the other day while browsing the White House Web site and found a few documents with some interesting revision history. One document, dated May 2002, lays out a draft of the White House's e-government strategy and includes a ton of metadata and revision information. Near the top of the document, for example, is a redacted warning not to apply the OMB seal until the last minute because, as Keith Thurston, assistant deputy associate administrator in the Office of e-Government and Technology at the U.S. General Services Administration, wrote, "it mungs the printers."
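I don't know how Trace works under the hood, so the following is not a description of it. It is only a rough Python sketch of the first step any such scan would need: pulling the links to Office and PDF files out of a Web page so they can be downloaded and inspected. The class and function names, the extension list and the example address are all my own assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# File extensions treated as "documents" here; an illustrative list only.
DOCUMENT_EXTENSIONS = (".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".pdf")


class DocumentLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links that point at documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page's own address.
        absolute = urljoin(self.base_url, href)
        # Compare only the path, so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(DOCUMENT_EXTENSIONS):
            self.links.append(absolute)


def find_document_links(html, base_url):
    """Return document links found in an HTML string, in page order."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Given the HTML of a page and the address it came from (for instance a made-up http://www.example.gov/index.html), find_document_links returns the document URLs in the order they appear. Fetching each file and checking it for metadata is left out on purpose.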