[cairo] PDF API for links and metadata

Adrian Johnson ajohnson at redneon.com
Sun Jun 5 13:09:30 UTC 2016


I have previously indicated that I intend to add support for PDF
hyperlinks in 1.16. PDF supports a large range of non-drawing features.
Based on the various PDF files I have seen over the last few years, the
majority of these features are never used. There are only a small number
of interactive and document interchange features that are regularly used
and would be reasonably easy to support in cairo with a minimal amount
of extra API.

These features are:
- metadata
- page labels
- thumbnails
- links
- bookmarks
- tagged pdf

The following outlines the API that I am planning to add to support
these features.

Metadata
--------
PDF can contain document metadata that can be displayed by PDF viewers.

The following API can be used to set the metadata.

typedef enum _cairo_pdf_metadata {
    CAIRO_PDF_METADATA_TITLE,
    CAIRO_PDF_METADATA_AUTHOR,
    CAIRO_PDF_METADATA_SUBJECT,
    CAIRO_PDF_METADATA_KEYWORDS,
    CAIRO_PDF_METADATA_CREATOR,
    CAIRO_PDF_METADATA_CREATE_DATE,
    CAIRO_PDF_METADATA_MOD_DATE,
} cairo_pdf_metadata_t;

void
cairo_pdf_surface_set_metadata (cairo_pdf_metadata_t metadata,
                                const char *utf8);

Setting utf8 to NULL removes any metadata previously set. The
_CREATE_DATE defaults to the current date and time. Date strings must be
in the PDF date format D:YYYYMMDDHHmmSSOHH'mm, e.g. D:199812231952-08'00.
Since most applications will use the "current time" default, I do not
see the need for a date-specific API for setting the time.
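
For an application that does want to set a date explicitly, the
following is a minimal sketch of producing a string in that format.
format_pdf_date() and its utc_offset_min parameter are made up for
illustration and are not part of cairo. The caller is assumed to know
the zone offset, and the sketch always writes the seconds field, which
the example above leaves out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper, not part of cairo: writes a PDF date string of the
 * form D:YYYYMMDDHHmmSSOHH'mm into buf. utc_offset_min is the zone offset
 * from UTC in minutes (e.g. -480 for UTC-08:00).
 * Returns 0 on success, -1 if buf is too small. */
static int
format_pdf_date (char *buf, size_t len, const struct tm *tm,
                 int utc_offset_min)
{
    char sign = utc_offset_min < 0 ? '-' : '+';
    int off = abs (utc_offset_min);

    int n = snprintf (buf, len, "D:%04d%02d%02d%02d%02d%02d%c%02d'%02d",
                      tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                      tm->tm_hour, tm->tm_min, tm->tm_sec,
                      sign, off / 60, off % 60);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The resulting string would then be passed as the utf8 argument with
CAIRO_PDF_METADATA_CREATE_DATE or CAIRO_PDF_METADATA_MOD_DATE.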


Page Labels
-----------
A PDF file may optionally define page labels that appear in the viewer
instead of the page index number. For example the document may use roman
numerals for the front matter and start the first chapter at page "1".

The following function sets the page label for the current page. Setting
utf8 to NULL removes any page label previously set.

void
cairo_pdf_surface_set_page_label (cairo_surface_t *surface,
                                  const char *utf8);
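
As a rough sketch of the roman numeral example above, an application
could compute the label for each page like this before passing it to
cairo_pdf_surface_set_page_label(). The helpers to_roman() and
make_page_label() are hypothetical application code, not cairo API, and
the sketch assumes the front matter is a fixed number of leading pages.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: lower-case roman numeral for 1..3999.
 * Returns 0 on success, -1 if n is out of range or buf is too small. */
static int
to_roman (int n, char *buf, size_t len)
{
    static const int values[] =
        { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    static const char *symbols[] =
        { "m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i" };
    size_t used = 0;

    if (n < 1 || n > 3999 || len == 0)
        return -1;

    for (size_t i = 0; i < sizeof values / sizeof values[0]; i++) {
        while (n >= values[i]) {
            size_t sl = strlen (symbols[i]);
            if (used + sl >= len)       /* keep room for the terminator */
                return -1;
            memcpy (buf + used, symbols[i], sl);
            used += sl;
            n -= values[i];
        }
    }
    buf[used] = '\0';
    return 0;
}

/* Label for the page at zero-based page_index, assuming the first
 * front_matter pages use roman numerals and the body restarts at "1". */
static int
make_page_label (int page_index, int front_matter, char *buf, size_t len)
{
    if (page_index < 0)
        return -1;
    if (page_index < front_matter)
        return to_roman (page_index + 1, buf, len);

    int n = snprintf (buf, len, "%d", page_index - front_matter + 1);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```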


Thumbnails
----------
PDF can store thumbnail images of the pages that can be displayed by the
viewer.

This function specifies the thumbnail size for the current page, and all
subsequent pages until the next invocation of this function.

void
cairo_pdf_surface_set_thumbnail_size (int width, int height);

Setting width and height to (0, 0) disables thumbnails. The default is
(0, 0).
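
The proposal leaves the choice of width and height to the application.
One plausible way to pick them, shown here only as a sketch, is to
scale the page size so that the longer side fits a maximum dimension
while keeping the aspect ratio. thumbnail_size() and max_dim are
assumptions for illustration, not part of the proposed API.

```c
/* Hypothetical helper: scale a page of page_w x page_h (any unit, both
 * assumed > 0) so that its longer side is max_dim pixels, rounding to
 * the nearest integer and never returning less than 1. */
static void
thumbnail_size (double page_w, double page_h, int max_dim,
                int *width, int *height)
{
    double longer = page_w > page_h ? page_w : page_h;
    double scale = max_dim / longer;

    *width = (int) (page_w * scale + 0.5);
    *height = (int) (page_h * scale + 0.5);

    if (*width < 1)
        *width = 1;
    if (*height < 1)
        *height = 1;
}
```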


Links
-----
PDF can contain hyperlinks to another location in the file, a location
in another PDF file, or a URL.

I initially started with the following API but then changed my mind. See
the Tagged PDF section for the new API.

The following function creates a link on the current page. In PDF, links
are defined by one or more rectangles (more than one would be used
when a link is split across two lines) defining the region that can be
clicked on. Normally the application would set the rectangle to the
extents of the link text.

typedef enum _cairo_link_flags {
    CAIRO_LINK_FLAG_APPEARANCE_DEFAULT = 0,
    CAIRO_LINK_FLAG_APPEARANCE_NONE = 1,
    CAIRO_LINK_FLAG_APPEARANCE_RECTANGLE = 2,
    CAIRO_LINK_FLAG_APPEARANCE_UNDERLINE = 3,
    CAIRO_LINK_FLAG_URI = 4,
} cairo_link_flags_t;

void
cairo_create_link (cairo_t *cr,
                   int num_rectangles,
                   cairo_rectangle_t *rectangles,
                   const char *dest_name,
                   cairo_link_flags_t flags);

If the appearance is not _NONE, use the current color and line style to
draw the box/underline.

For internal links we need a way to associate destination names with
locations in the document. The following function creates a destination
to the position x,y on the current page.

typedef enum _cairo_destination_flags {
    /* can optimize away name or the destination if unused */
    CAIRO_DESTINATION_FLAG_INTERNAL = 1,
} cairo_destination_flags_t;

void
cairo_create_destination (cairo_t *cr,
                          const char *dest_name,
                          double x, double y,
                          cairo_destination_flags_t flags);

Bookmarks
---------
A PDF file can contain bookmarks (also called the document outline),
which are a hierarchical set of links into the document. Using the
cairo_create_destination() function it is easy to create a document
outline with one API function.

typedef enum _cairo_pdf_bookmark_flags {
    CAIRO_BOOKMARK_FLAG_BOLD = 1,
    CAIRO_BOOKMARK_FLAG_ITALIC = 2,
} cairo_pdf_bookmark_flags_t;

#define CAIRO_PDF_BOOKMARK_ROOT 0

int
cairo_pdf_surface_add_bookmark (int parent_id,
                                const char *utf8,
                                const char *dest_name,
                                cairo_pdf_bookmark_flags_t flags);

This function adds a bookmark with the name utf8 that links to
dest_name. It returns a bookmark id. The parent_id is the id of the
bookmark directly above this one in the hierarchy. Set it to
CAIRO_PDF_BOOKMARK_ROOT for a top level bookmark.
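
To illustrate how the returned ids and parent_id build the hierarchy,
here is a small stand-alone model of the bookkeeping. It is not cairo's
implementation: outline_t, outline_add() and outline_depth() are
hypothetical names, and the fixed-size table is an assumption made to
keep the sketch short.

```c
#include <stddef.h>

#define BOOKMARK_ROOT 0         /* mirrors CAIRO_PDF_BOOKMARK_ROOT */
#define MAX_BOOKMARKS 64

/* Hypothetical stand-in for the surface's outline table. Index 0 is the
 * root; real bookmarks get ids 1..count. Strings are not copied. */
typedef struct {
    int parent[MAX_BOOKMARKS + 1];
    const char *title[MAX_BOOKMARKS + 1];
    const char *dest[MAX_BOOKMARKS + 1];
    int count;
} outline_t;

/* Adds a bookmark under parent_id and returns its id (>= 1),
 * or -1 if parent_id is unknown or the table is full. */
static int
outline_add (outline_t *o, int parent_id,
             const char *title, const char *dest)
{
    if (parent_id < 0 || parent_id > o->count || o->count >= MAX_BOOKMARKS)
        return -1;

    int id = ++o->count;
    o->parent[id] = parent_id;
    o->title[id] = title;
    o->dest[id] = dest;
    return id;
}

/* Nesting depth of a bookmark: 1 for top level entries, -1 if id is
 * not a valid bookmark id. */
static int
outline_depth (const outline_t *o, int id)
{
    int depth = 0;

    if (id < 1 || id > o->count)
        return -1;

    while (id != BOOKMARK_ROOT) {
        id = o->parent[id];
        depth++;
    }
    return depth;
}
```

A chapter would be added with BOOKMARK_ROOT as its parent, and each of
its sections with the id returned for the chapter.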


Tagged PDF
----------
A tagged PDF contains additional data that defines the logical structure
of the page content. The logical structure includes information such as
headings, paragraphs, tables, and figures. Tagged PDF is intended to be
used for things like extraction of text and graphics into other
applications, reflowing of text and graphics to fit a different page
size, searching and indexing, and accessibility support.

Cairo is already using one of the tagged PDF features, ActualText, to
support the cairo_show_text_glyphs() function.

The following API can be used for tagging the drawing operations
enclosed by the cairo_tag_begin() and cairo_tag_end() functions with the
specified tag. Tags can be nested.

void
cairo_tag_begin (cairo_t *cr, const char *tag_name);

void
cairo_tag_end (cairo_t *cr, const char *tag_name);

The tag names are defined in PDF32000 section 14.8 [1]. Examples of tag
names include:

"P": paragraph
"H1" - "H6": headings
"Table": table
"TR", "TH", "TD", "THead", "TBody", "TFoot": table elements
"Link": hyperlink

PDF32000 also defines an extensive range of attributes that can be
included with each tag. I have omitted attributes from the API to keep it
simple and because the tag name alone should be sufficient for the
intended usage.
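
Since tags can be nested and cairo_tag_end() repeats the tag name, one
natural reading is that each end call must name the innermost open tag.
The following sketch models that check with a simple stack. It is an
assumption about the intended semantics, not cairo's implementation,
and tag_stack_t, tag_stack_begin() and tag_stack_end() are hypothetical
names.

```c
#include <string.h>

#define MAX_TAG_DEPTH 32

/* Hypothetical bookkeeping for open tags. Tag name pointers are stored
 * as-is, so they must outlive the matching end call. */
typedef struct {
    const char *open[MAX_TAG_DEPTH];
    int depth;
} tag_stack_t;

/* Records a newly opened tag. Returns -1 if nesting is too deep. */
static int
tag_stack_begin (tag_stack_t *s, const char *tag_name)
{
    if (s->depth >= MAX_TAG_DEPTH)
        return -1;
    s->open[s->depth++] = tag_name;
    return 0;
}

/* Closes the innermost tag. Returns -1 if no tag is open or if
 * tag_name does not match the innermost open tag. */
static int
tag_stack_end (tag_stack_t *s, const char *tag_name)
{
    if (s->depth == 0 || strcmp (s->open[s->depth - 1], tag_name) != 0)
        return -1;
    s->depth--;
    return 0;
}
```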


New Link API
------------
The SVG backend also supports hyperlinks. SVG links are defined using
the 'a' element, e.g.

  <a xlink:href="http://www.w3.org">
    <ellipse cx="2.5" cy="1.5" rx="2" ry="1"
             fill="red" />
  </a>

Instead of requiring the application to provide a rectangle, and then
having cairo figure out what text is inside the rectangle, we can use
the tagged API to define the link text.

#define CAIRO_TAG_LINK "Link"

Then the application can wrap the link text drawing operations and the
call to cairo_create_link() (with num rectangles = 0) with
cairo_tag_begin(CAIRO_TAG_LINK) and cairo_tag_end(CAIRO_TAG_LINK).

It then occurred to me that we could drop the
cairo_create_link()/cairo_create_destination() API and extend the
tagging API to also create links.


#define CAIRO_TAG_LINK "Link"
/* cairo prefix because it is not a standard PDF tag */
#define CAIRO_TAG_DEST "cairo.dest"

void
cairo_tag_begin (cairo_t *cr,
                 const char *tag_name,
                 const char *attributes);

void
cairo_tag_end (cairo_t *cr, const char *tag_name);

For example:

Create a destination at position 100,20 on the current page.

  cairo_tag_begin (cr, CAIRO_TAG_DEST, "pos=\"100 20\"");

If the position is not specified, it defaults to the top left of the
extents of the drawing operations enclosed by this tag. If there are no
drawing operations within the tag, the default position is the top left
of the page.

Create URL link:

  cairo_tag_begin (cr, CAIRO_TAG_LINK,
                   "href=\"http://cairographics.org/\" rect=\"0 0 100 20\" "
                   "appearance=\"underline\"");

If the rectangle is not specified, it defaults to the extents of the
drawing operations enclosed by this link tag.

Create an internal link:

  cairo_tag_begin (cr, CAIRO_TAG_LINK,
                   "ref=\"section3\" appearance=\"none\"");
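
Because the attributes are passed as a single string, applications will
usually build it at run time. A minimal sketch, using the attribute
names from the examples above: format_link_attributes() is a
hypothetical helper, it assumes the default "C" locale for number
formatting, and it does not escape quote characters inside the URL.

```c
#include <stdio.h>

/* Hypothetical helper: writes href="<url>" rect="<x> <y> <w> <h>" into
 * buf. Returns 0 on success, -1 if buf is too small. */
static int
format_link_attributes (char *buf, size_t len, const char *url,
                        double x, double y, double w, double h)
{
    int n = snprintf (buf, len, "href=\"%s\" rect=\"%g %g %g %g\"",
                      url, x, y, w, h);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The resulting buffer would be passed as the attributes argument of
cairo_tag_begin() with CAIRO_TAG_LINK.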


[1]
http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf

