Metadata for document indexing, cataloguing, copyright protection and forensics

Most content produced by applications today (printouts, images, videos, Word documents, Excel spreadsheets, web pages, pretty much anything that can be published) comes with metadata.

Metadata is summary information about data (in short, data about data), embedded in the document itself (for example, in an image file or in the HTML code of a web page). It is used for:

  • Information indexing, catalog building, structuring unstructured data, and information retrieval. A typical example is how Google assigns keywords and categories to an image or video to optimize image and video search. As far as I can tell, Google does not assign keywords by analyzing the image content in the binary file itself, although it does look at keywords on the web page where the image is published, using its web page / keyword mapping index. Instead, it presumably reads the image header (the first bytes of the file, encoded in binary format) to extract keywords and other information, such as the image size. This approach is, however, vulnerable to manipulation: image creators can include false metadata in the image to gain visibility and appear in irrelevant search results.
  • Forensics. Example: someone prints and sends an anonymous threatening letter to President Obama. Many color laser printers add a nearly invisible pattern of tiny yellow dots to each printed page, encoding information such as when the document was printed and a unique identifier (the printer's serial number) that allows law enforcement agencies to identify the printer used. Whether other details, such as print parameters or your IP address, are also recorded is less clear. So would-be terrorists, beware! Your printer can get you arrested.
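To make the "read the header" idea concrete, here is a minimal sketch in Python that builds a tiny PNG in memory and then reads its size and keywords back from the first chunks of the file. It only illustrates how header metadata can be parsed; it is not a description of what Google actually does. The function names (`make_sample_png`, `read_png_metadata`) and the sample keywords are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: 4-byte length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_sample_png(width: int, height: int, text: dict) -> bytes:
    """Create a small all-black grayscale PNG carrying tEXt metadata (hypothetical helper)."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then three zero bytes
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    out = PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
    for key, value in text.items():
        # tEXt payload is keyword, a NUL separator, then the text (Latin-1)
        out += _chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
    # Each scanline is a filter byte (0) followed by `width` black pixels
    raw = (b"\x00" + b"\x00" * width) * height
    return out + _chunk(b"IDAT", zlib.compress(raw)) + _chunk(b"IEND", b"")


def read_png_metadata(data: bytes) -> dict:
    """Read image size and tEXt keywords, stopping before the pixel data."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    info = {"text": {}}
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, kind = struct.unpack_from(">I4s", data, offset)
        payload = data[offset + 8 : offset + 8 + length]
        if kind == b"IHDR":
            info["width"], info["height"] = struct.unpack(">II", payload[:8])
        elif kind == b"tEXt":
            key, _, value = payload.partition(b"\x00")
            info["text"][key.decode("latin-1")] = value.decode("latin-1")
        elif kind in (b"IDAT", b"IEND"):
            break  # metadata of interest sits before the pixel data in this sample
        offset += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return info


png = make_sample_png(4, 3, {"Keywords": "sunset, beach"})
print(read_png_metadata(png))
```

Note that nothing stops the creator from writing any keywords they like into the `tEXt` chunk, which is exactly the manipulation risk mentioned above.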

My questions:

  1. Is there any software that lets you embed an encrypted, invisible watermark or digital signature in images and videos, in such a way that Google can decode it? Or for copyright protection? How do you encode such metadata to make it both usable by Google and highly secure? By secure, I mean that even if the image is altered, the encrypted watermark/signature remains readable and usable, possibly because it relies on some redundancy mechanism.
  2. Is there a standard metadata format (an XML standard for metadata recorded as text, and another one for binary metadata embedded in binary files)?

Replies to This Discussion

The reason that we love standards is that there are so many of them! The same can be said about metadata standards: http://en.wikipedia.org/wiki/Metadata_standards. I do not know offhand if there is a standard for binary-coded metadata, but there is no reason it cannot be done. For example, a JPEG2000 image is a binary file; nevertheless, its metadata are in an XML format, stored as bytes within the image file: http://www.jpeg.org/jpeg2000/metadata.html . Here is a huge long report on this: http://www.jpeg.org/metadata/15444-2.PDF (which I did not bother to read). :)
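As a rough illustration of XML metadata living inside a binary file, the sketch below wraps a small Dublin Core-style XML record in a length-prefixed "box" next to some binary payload, then finds and parses it again. The layout (4-byte big-endian length, 4-byte type, payload) is only loosely modeled on the box structure used by JP2 files; treat the box type names, the `metadata` root element, and all function names as assumptions made for this example rather than a faithful implementation of the standard.

```python
import struct
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def build_dc_xml(fields: dict) -> bytes:
    """Serialize simple key/value metadata as Dublin Core-style XML (UTF-8 bytes)."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="utf-8")


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap a payload as: total length (incl. 8-byte header), 4-byte type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    return struct.pack(">I", 8 + len(payload)) + box_type + payload


def iter_boxes(data: bytes):
    """Yield (type, payload) for each box in a flat sequence of boxes."""
    offset = 0
    while offset + 8 <= len(data):
        length, box_type = struct.unpack_from(">I4s", data, offset)
        if length < 8:
            raise ValueError("corrupt box length")
        yield box_type, data[offset + 8 : offset + length]
        offset += length


def extract_xml_fields(data: bytes) -> dict:
    """Find the first XML box and return its child elements as a dict."""
    for box_type, payload in iter_boxes(data):
        if box_type == b"xml ":
            root = ET.fromstring(payload)
            # Tags look like "{namespace}title"; keep only the local name
            return {child.tag.split("}")[1]: child.text for child in root}
    return {}


container = (
    make_box(b"data", b"\x00\x01\x02 pretend these are pixels")
    + make_box(b"xml ", build_dc_xml({"title": "Sunset", "creator": "Jane Doe"}))
)
print(extract_xml_fields(container))
```

The point is simply that "text" metadata and "binary" files are not at odds: the XML is ordinary text, and the binary container only needs a way to say where it starts and how long it is.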

Another answer, from Patrick Maroney:

Steganography would be one method for achieving most of the objectives. One paper, "A Secure Robust Image Steganographic Model" (2000) by Yeuan-Kuen Lee and Ling-hwei Chen, discusses approaches to improving the survivability/resilience of the encoded "covert channel". It can be found at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.4617. (As always, take appropriate measures to mitigate any risks from accessing potentially malicious content.)
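As a toy illustration of the redundancy idea raised in the question (this is not the method from the Lee and Chen paper), the sketch below hides a short message in the least significant bits of grayscale pixel values, stores each bit several times in different parts of the image, and recovers it by majority vote. The function names and the `REPEAT` value are assumptions chosen for the example.

```python
REPEAT = 5  # assumption: every message bit is stored in 5 separate places


def to_bits(message: bytes) -> list:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]


def from_bits(bits: list) -> bytes:
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i : i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def embed(pixels: bytes, message: bytes, repeat: int = REPEAT) -> bytes:
    """Write each message bit into the lowest bit of `repeat` different pixels."""
    bits = to_bits(message)
    if len(bits) * repeat > len(pixels):
        raise ValueError("image too small for this message")
    marked = bytearray(pixels)
    for copy in range(repeat):
        for i, bit in enumerate(bits):
            # Interleave: copy k of the whole message occupies its own pixel region
            pos = copy * len(bits) + i
            marked[pos] = (marked[pos] & 0xFE) | bit
    return bytes(marked)


def extract(pixels: bytes, n_bytes: int, repeat: int = REPEAT) -> bytes:
    """Recover the message, taking a majority vote over the stored copies of each bit."""
    n_bits = n_bytes * 8
    bits = []
    for i in range(n_bits):
        votes = sum(pixels[copy * n_bits + i] & 1 for copy in range(repeat))
        bits.append(1 if votes * 2 > repeat else 0)
    return from_bits(bits)


image = bytes(range(256)) * 2          # stand-in for 512 grayscale pixel values
mark = b"(c) AB"
marked = embed(image, mark)

damaged = bytearray(marked)
for pos in range(2 * len(mark) * 8):   # corrupt two of the five copies entirely
    damaged[pos] ^= 1
print(extract(bytes(damaged), len(mark)))
```

Each pixel changes by at most one gray level, so the mark is invisible to the eye, and it survives the loss of a minority of its copies. A plain least-significant-bit scheme like this would not survive recompression or resizing, though; watermarks intended to be robust against such edits generally work in a transform domain instead.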

This has worked to watermark specific documents, logos, etc. (commonly used by modern adversaries to create attack packages using content from public-facing websites). This can be a highly effective methodology for identifying where an adversary is acquiring content in their Reconnaissance and Weaponization phases.

As for providing Google a way to securely index and search against metadata buried in a covert channel, a tongue-in-cheek comment: what makes one think they (Google) aren't already capable of this and more?

Addendum:

That link to the paper is stale. The NCTU paper can be found on their web site: http://debut.cis.nctu.edu.tw/Publications/pdfs/C22.pdf (same caveat on taking appropriate measures to mitigate any risks from accessing potentially malicious content).

© 2021 TechTarget, Inc.