
NSA Issues 'Metadata' Guidelines for Agencies

Following a series of foibles in which federal agencies and even the White House issued documents that contained hidden data that readers weren't meant to see, the National Security Agency has issued guidelines for the federal government on removing revision histories and other so-called "metadata" from official documents before public release.

Metadata literally means "data about data", but that's not very descriptive. Essentially, metadata is automatically embedded in documents created with popular software such as Microsoft Word or Adobe Acrobat, and includes things like the document author's name, the date it was created, and often any changes or revisions that have been made and by whom.

Back in December, I wrote about an incident involving metadata that presented a rather embarrassing episode for the Bush administration's efforts to win more support for the war in Iraq.

But metadata isn't all bad -- sometimes it helps law enforcement officials track down the bad guys. Case in point: In August, the FBI and Moroccan authorities arrested 18-year-old hacker Farid Essebar, who went by the online screen name "Diabl0," for creating the "Zotob" worm that infected thousands of computers at a number of high-profile companies last summer.

In a presentation at the recent Shmoocon hacker conference, Joe Stewart, a senior security researcher at LURHQ, talked about how authorities seized Essebar's computer and found a copy of the worm's source code. When they dissected it they uncovered some interesting metadata: Apparently Essebar had compiled the worm's source code with Microsoft Visual Studio, which embedded the text string "C:\Documents and Settings\Farid." Possessing source code for a worm that whacked a bunch of Fortune 500 companies is bad enough, but having your name engraved in the heart of it is downright damning.
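For the curious, here is a minimal sketch of how that kind of leftover path can be spotted in a compiled file. It assumes Python, and the function name `find_profile_paths` and the sample bytes are invented for illustration; it is not the tool the investigators used.

```python
import re

# ASCII paths of the form C:\Documents and Settings\<profile name>.
# The character class stops at the next backslash, control bytes,
# non-ASCII bytes, and characters Windows forbids in file names.
PROFILE_PATH = re.compile(
    rb'[A-Za-z]:\\Documents and Settings\\[^\\\x00-\x1f"<>|:*?/\x7f-\xff]+'
)

def find_profile_paths(data: bytes) -> list[str]:
    """Return the distinct user-profile paths embedded in raw bytes."""
    hits = {m.group(0).decode("ascii") for m in PROFILE_PATH.finditer(data)}
    return sorted(hits)

if __name__ == "__main__":
    # Hypothetical bytes standing in for part of a compiled program.
    sample = b"\x00\x01MZ C:\\Documents and Settings\\Farid\\worm\\bot.pdb\x00"
    print(find_profile_paths(sample))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in. Paths stored as UTF-16 would need a second pattern, which this sketch leaves out.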

Anyway, if you're interested in finding metadata or making sure no one else finds it in your documents, check out the NSA's tips and the ones we posted a while back.

Here's one tip I didn't mention in the earlier post on this: a quick and dirty way to find metadata hidden in Word documents. Start up Word, click File, then Open; under the "Files of Type" drop-down menu select "Recover Text from Any File"; then select the file you want to open and it should display any metadata.
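If you don't have Word handy, a rough analogue of that trick is to pull every run of readable text out of the raw file, much like the Unix `strings` utility does. The sketch below assumes Python; `extract_strings` and the sample bytes are hypothetical, and this is not a description of what Word's "Recover Text from Any File" does internally.

```python
import re

def extract_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return printable text runs found in raw bytes.

    Looks for plain ASCII runs and for UTF-16LE runs (printable ASCII
    characters each followed by a zero byte), since binary Word files
    can hold text in either form.
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group(0).decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group(0).decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return found

if __name__ == "__main__":
    # Made-up bytes standing in for a fragment of a binary document.
    sample = (
        b"\x00\x01Author: Jane Doe\x00\xff"
        + "C:\\Temp\\draft.doc".encode("utf-16-le")
        + b"\x00\x00\x01"
    )
    for text in extract_strings(sample):
        print(text)
```

Expect plenty of noise on a real document. Author names, file paths and old text tend to stand out, but anything stored compressed or in a non-Latin script will be missed by this simple scan.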

Funnily enough, the PDF document released by the NSA also contains metadata. The text at the top of the document says it was created Dec. 13, 2005, but the metadata inside the PDF indicates it was created Jan. 10, 2006. The guy who pointed this out to me -- fellow security blogger Harlan Carvey, who is also a forensics expert -- says the discrepancy is due to the fact that the document was originally created in Microsoft Word, then converted to PDF on Jan. 10.
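A date discrepancy like that can be checked with a few lines of code. PDF files can carry `/CreationDate` and `/ModDate` entries written as `D:YYYYMMDDHHmmSS` plus an optional time zone. Here is a minimal sketch in Python, assuming the entries are stored uncompressed and unencrypted and include the full 14-digit timestamp; `pdf_dates` and the sample bytes are invented for illustration and are not taken from the NSA document.

```python
import re
from datetime import datetime

# Matches e.g. /CreationDate (D:20060110093000-05'00') and captures
# the key name plus the 14-digit timestamp (time zone is ignored).
DATE_ENTRY = re.compile(rb"/(CreationDate|ModDate)\s*\(D:(\d{14})")

def pdf_dates(data: bytes) -> dict[str, datetime]:
    """Return the first CreationDate/ModDate values found in raw PDF bytes."""
    dates: dict[str, datetime] = {}
    for m in DATE_ENTRY.finditer(data):
        key = m.group(1).decode("ascii")
        stamp = datetime.strptime(m.group(2).decode("ascii"), "%Y%m%d%H%M%S")
        dates.setdefault(key, stamp)  # keep the first occurrence of each key
    return dates

if __name__ == "__main__":
    # Hypothetical Info dictionary, not the contents of any real file.
    sample = (
        b"<< /Producer (Example Converter) "
        b"/CreationDate (D:20060110093000-05'00') "
        b"/ModDate (D:20060110093500-05'00') >>"
    )
    for key, stamp in pdf_dates(sample).items():
        print(key, stamp.isoformat())
```

Newer PDFs may keep this information inside compressed streams or in XMP metadata, in which case a proper PDF parser is the better tool.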

By Brian Krebs  |  January 24, 2006; 1:00 PM ET


Cool! Thanks for the link.

Posted by: William | January 24, 2006 3:15 PM | Report abuse

MS Word files are binary and can hide metadata. What your 'quickie' method reveals is only the metadata that MS wants you to have. If you even try to find their embedded copyright notices, for example, you are in violation of the EULA.

The problem of binary source files is so severe that NIH offers 'open source' alternatives:

Posted by: GTexas | January 24, 2006 5:53 PM | Report abuse


You're correct about the MSWord format...but I'm not sure what that has to do with the issue at hand. Sure, MSWord documents contain metadata...we all know that. Besides things like the last (up to 10) people to save the document and when the document was last saved/printed, you can also determine whether the document was created or revised on a Mac or Windows, along with a bunch of other interesting data.

However, this information isn't necessarily parsed and carried over when the file is converted to PDF.

As far as this "quickie" method you're referring to, you're way off. First off, the MS API provides the means for someone knowledgeable to retrieve the "copyright notices"...and that API is publicly available at the MS site. In addition, one can retrieve all of the embedded metadata from an MSWord document without having a Windows OS, MSOffice, or even the Word application installed. Since the Word document doesn't carry any warnings itself, what stops someone from opening the document in a hex editor, or parsing it using some scripting language?

H. Carvey
"Windows Forensics and Incident Recovery"

Posted by: keydet89 | January 25, 2006 6:53 AM | Report abuse


All more or less true, but with a caveat.

The original issue at hand was the "conversion" of MS Word documents to PDF. My point is that the metadata of a document is in the character entities (symbolic tokens), not the rendered glyphs (pictures of tokens).

Both Microsoft and Adobe bear the responsibility of ensuring that their customers' documents do not contain any "pictures" which may tell 1,000 unflattering words about the author.

Posted by: GTexas | January 25, 2006 3:33 PM | Report abuse


> Both Microsoft and Adobe bear the responsibility...

Since when? Since you decided this? I've used several conversion programs, some freeware...and not once did I see anything in a license agreement that stated what you said.

Making assumptions like that is simply a deluded way of avoiding personal responsibility. Assuming that some developer living thousands of miles away is somehow going to divine your needs and requirements and do everything he or she can to protect your secrets is mind-numbingly silly.

Neither application you mention makes any claims of protecting the user's privacy. That's up to the user.

H. Carvey
"Windows Forensics and Incident Recovery"

Posted by: keydet89 | January 26, 2006 7:49 AM | Report abuse


> ...Making assumptions like that is simply a deluded way of avoiding personal responsibility...Neither application you mention makes any claims of protecting the user's privacy. That's up to the user.

"Companies like ChoicePoint are realizing that it is a bad business practice to ignore the security of consumer data," Majoras (FTC Chairman Deborah Platt Majoras) said. The settlement, she said, sends the message that companies that deal in consumer data "must guard the front door -- through procedures for verifying and identifying customers -- as well as guard the backdoor against hackers."

The unintentional disclosure of metadata is data "going out the back door," to use her metaphor. Apparently this is a delusion I share with the Federal Trade Commission.

Posted by: GTexas | January 27, 2006 4:26 PM | Report abuse


Wow! You have to be commended for your innate ability to make connections where there simply are none.

You should really try to keep things in context, my friend. From the same ChoicePoint blog entry: "...ChoicePoint acknowledged that crooks had gained access to thousands of consumer records by posing as legitimate businesses."

Poor business verification practices have nothing whatsoever to do with metadata stored within documents.

Maybe your tin-foil helmet is a little too snug.

Posted by: keydet89 | January 30, 2006 9:17 AM | Report abuse

> Maybe your tin-foil helmet is a little too snug.

Doing a hell of a job there, keydet89.

> Poor business verification practices have nothing whatsoever to do with metadata stored within documents.

If you think that the audience and distribution have nothing to do with the metadata stored in a document, then I think you do not understand metadata -- sort of surprising since you live in the "Gotcha" capital of the world.

ChoicePoint's argument that "bad guys lied to us" carries no weight in a civil action -- bad guys lie; it's in the metadata of the profile.

Posted by: GTexas | January 30, 2006 4:34 PM | Report abuse

The comments to this entry are closed.


© 2010 The Washington Post Company