[cairo] Some thoughts on metadata

Sat Apr 29 09:11:55 PDT 2006

On Wed, 18 Jan 2006 09:37:05 -0800, Carl Worth wrote:
> 
> Obviously, there _are_ a lot of printing-specific options, settings,
> and metadata that users will need to provide. And the point has been
> made above that these need to be available on a per-page basis.
...
> A few of us (and likely a group missing some printing experts) started
> talking about what this new options-setting API would look like at the
> GNOME summit. The strawman we came up with is here, and might make a
> good starting point for further discussion:
> 
> 	http://live.gnome.org/GtkTeamPrintingBreakout
> 	(See the "Cairo's printing API" section)
> 
> It's clearly not complete, but the question is whether this is a
> workable-basis for extending to something complete.

So one broken piece of that scheme is that it muddles some things that
are strictly document metadata (title, author, etc.) with things that
have more direct impact on printing/print preview such as
orientation---let's call these "printer options and settings".

My current goal for the cairo 1.2 release is to provide support for
the most essential printer options and settings. (In a separate mail,
I'll propose something for doing per-page paper size changes as well
as setting PPD options).

I don't plan to support the more general aspects of document metadata
in cairo 1.2. I've spent some time looking into the metadata issues
this week, and it's not yet obvious what the right approach
is. Meanwhile, the metadata doesn't seem essential for direct support
of printing, so it doesn't fit the priorities of the cairo 1.2 release.

But, since I've looked at this a bit, I'll at least summarize some of
what I've found here.

I haven't found any direct support in PostScript[1] for document
metadata, but the Document Structuring Conventions stuff[2] does
define comments for 7 metadata fields.

Meanwhile, PDF[3] provides simple support for 8 defined fields in the
Document Information Dictionary. In addition, PDF also provides for a
more elaborate scheme in which metadata can be attached to various
elements throughout a document's structure with an XML dialect known
as XMP[4].

Independent of PS and PDF, there is a standard scheme for metadata
terms known as the Dublin Core Metadata Initiative[5], parts of which
have been standardized by ISO. Dublin Core defines 15 basic metadata
terms and provides RDF schemas for using them.

I looked at the 7 terms from PostScript's DSC, the 8 from PDF's
Document Information Dictionary, and the 15 from Dublin Core and
attempted to group them according to related terms. I came up with the
following:

PS/DSC          PDF             Dublin Core
======          ===             ===========
Title           Title           Title
--              --              --
Creator         Creator         Creator
                Author          Contributor
                Producer        Publisher
--              --              --
CreationDate    CreationDate    Date
                ModDate
--              --              --
                Subject         Subject
                Keywords        Description
--              --              --
Copyright                       Rights
--              --              --

PostScript and Dublin Core provide a few more terms that don't seem to
have directly corresponding terms in the other groups. These are:

PS/DSC:		For, Routing

Dublin Core:	Coverage, Format, Resource Identifier, Language,
		Relation, Source, Type
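
To make the grouping above easier to experiment with, here is a rough
sketch that writes the table down as data. It is only an illustration:
the GROUPS structure and the related_terms() helper are hypothetical
names, and terms in the same group are treated as related, not as
exact equivalents. The terms with no counterpart (For, Routing,
Coverage, and so on) are deliberately left out, so looking them up
returns nothing.

```python
# Groups of related metadata terms, transcribed from the table above.
# Terms sharing a group are related, not necessarily interchangeable.
GROUPS = [
    {"PS/DSC": ["Title"],
     "PDF": ["Title"],
     "Dublin Core": ["Title"]},
    {"PS/DSC": ["Creator"],
     "PDF": ["Creator", "Author", "Producer"],
     "Dublin Core": ["Creator", "Contributor", "Publisher"]},
    {"PS/DSC": ["CreationDate"],
     "PDF": ["CreationDate", "ModDate"],
     "Dublin Core": ["Date"]},
    {"PS/DSC": [],
     "PDF": ["Subject", "Keywords"],
     "Dublin Core": ["Subject", "Description"]},
    {"PS/DSC": ["Copyright"],
     "PDF": [],
     "Dublin Core": ["Rights"]},
]


def related_terms(scheme, term, target):
    """Return the terms in `target` that share a group with `term`.

    `scheme` and `target` are one of "PS/DSC", "PDF", "Dublin Core".
    An empty list means the table offers no related term.
    """
    for group in GROUPS:
        if term in group[scheme]:
            return list(group[target])
    return []
```

For example, related_terms("Dublin Core", "Rights", "PS/DSC") gives
["Copyright"], while related_terms("PS/DSC", "Copyright", "PDF") gives
an empty list because the PDF column has nothing in that group.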

I did find one source recommending using XMP to embed Dublin
Core/RDF metadata into PDF files[6].

So I'm not quite sure what the right API would be to try to pull all
of that together. I've been talking about PostScript and PDF output
here. But it would be reasonable to embed some Dublin Core/RDF into a
PNG header as well.

Maybe the right API would just accept name/value pairs and could
recommend using Dublin Core names (or point to something like the
table above). One question is how much interpretation/conversion cairo
should do (if any), or whether it should just shove the names it gets
wherever it can without really looking at them. (The fact that the
PostScript/DSC stuff is implemented as comments and the PDF Document
Information is a dictionary means that both can naturally be
extended.)
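
As a rough illustration of the "shove the names wherever it can"
option, the sketch below takes name/value pairs and writes them out
untouched as DSC-style comments and as a PDF Document Information
dictionary. This is not a proposed cairo API: the function names are
hypothetical, names are assumed to be plain ASCII words, and values
are assumed to be single-line ASCII text (other text would need a
proper PDF string encoding, which is skipped here).

```python
def dsc_comments(pairs):
    """Render each (name, value) pair as a '%%Name: value' comment line."""
    return ["%%{}: {}".format(name, value) for name, value in pairs]


def pdf_string(text):
    """Wrap text as a PDF literal string, escaping '\\', '(' and ')'."""
    # The backslash must be escaped first so that the backslashes added
    # for the parentheses are not escaped a second time.
    for ch in ("\\", "(", ")"):
        text = text.replace(ch, "\\" + ch)
    return "(" + text + ")"


def pdf_info_dict(pairs):
    """Render the pairs as the body of a PDF information dictionary."""
    entries = " ".join(
        "/{} {}".format(name, pdf_string(value)) for name, value in pairs
    )
    return "<< " + entries + " >>"


# No interpretation happens: a Dublin Core name such as "Rights" is
# passed through as-is rather than being mapped to "Copyright".
pairs = [("Title", "Report (draft)"), ("Rights", "CC-BY")]
dsc_lines = dsc_comments(pairs)
info = pdf_info_dict(pairs)
```

The trade-off is visible in the output: nothing is lost or guessed at,
but a viewer that only knows the predefined names will not recognize
an entry like Rights unless the caller picks the name each format
expects.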

Anyway, thoughts from anyone would be appreciated---of particular
interest would be suggestions from people who have needs/experience in
this field. For example, are you already generating metadata like this
and need to maintain interoperability with something else as you
contemplate a switch to using cairo to generate your documents?

-Carl

[1] http://www.adobe.com/products/postscript/pdfs/PLRM.pdf
[2] http://partners.adobe.com/public/developer/en/ps/5001.DSC_Spec.pdf
[3] http://partners.adobe.com/public/developer/en/pdf/PDFReference16.pdf
[4] http://www.adobe.com/products/xmp/
[5] http://dublincore.org/
[6] http://creativecommons.org/technology/xmp-help