[cairo] PDF API for links and metadata

Behdad Esfahbod behdad at cs.toronto.edu
Mon Jun 6 01:11:35 UTC 2016


Thanks Adrian.  That sounds generally right to me.

On Sun, Jun 5, 2016 at 6:09 AM, Adrian Johnson <ajohnson at redneon.com> wrote:
> I have previously indicated that I intend to add support for PDF hyperlinks
> for 1.16. PDF supports a large range of non-drawing features.
> Based on the various PDF files I have seen over the last few years, the
> majority of these features are never used. There are only a small number
> of interactive and document interchange features that are regularly used
> and would be reasonably easy to support in cairo with a minimal amount
> of extra API.
>
> These features are:
> - metadata
> - page labels
> - thumbnails
> - links
> - bookmarks
> - tagged pdf
>
> The following outlines the API that I am planning to add to support
> these features.
>
> Metadata
> --------
> PDF can contain document metadata that can be displayed by PDF viewers.
>
> The following API can be used to set the metadata.
>
> typedef enum _cairo_pdf_metadata {
>     CAIRO_PDF_METADATA_TITLE,
>     CAIRO_PDF_METADATA_AUTHOR,
>     CAIRO_PDF_METADATA_SUBJECT,
>     CAIRO_PDF_METADATA_KEYWORDS,
>     CAIRO_PDF_METADATA_CREATOR,
>     CAIRO_PDF_METADATA_CREATE_DATE,
>     CAIRO_PDF_METADATA_MOD_DATE,
> } cairo_pdf_metadata_t;
>
> void
> cairo_pdf_surface_set_metadata (cairo_pdf_metadata_t metadata,
>                                 const char *utf8);
>
> Setting utf8 to NULL removes any metadata previously set. The
> _CREATE_DATE defaults to the current date and time. Date strings need to
> be in a particular format: D:YYYYMMDDHHmmSSOHH'mm, e.g. D:199812231952-08'00.
> Since most applications will use the "current time" default, I do not
> see the need for a date-specific API for setting the time.
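
To make the date format above concrete, here is a minimal sketch of a formatter an application might use before calling the proposed cairo_pdf_surface_set_metadata(). The helper name format_pdf_date and its parameters are hypothetical, not part of cairo. It always writes the seconds field, whereas the example in the proposal omits it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of cairo: format a local date and time
 * plus a UTC offset in minutes (positive = east of UTC) as a PDF date
 * string of the form D:YYYYMMDDHHmmSSOHH'mm.
 * Returns the string length, or -1 if a field is out of range or the
 * buffer is too small. */
int
format_pdf_date (char *buf, size_t size,
                 int year, int month, int day,
                 int hour, int minute, int second,
                 int utc_offset_minutes)
{
    char sign = utc_offset_minutes < 0 ? '-' : '+';
    int abs_offset = abs (utc_offset_minutes);
    int n;

    assert (buf != NULL);
    if (year < 0 || year > 9999 || month < 1 || month > 12 ||
        day < 1 || day > 31 || hour < 0 || hour > 23 ||
        minute < 0 || minute > 59 || second < 0 || second > 59 ||
        abs_offset >= 24 * 60)
        return -1;

    n = snprintf (buf, size, "D:%04d%02d%02d%02d%02d%02d%c%02d'%02d",
                  year, month, day, hour, minute, second,
                  sign, abs_offset / 60, abs_offset % 60);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

The resulting string would be passed as the utf8 argument together with CAIRO_PDF_METADATA_CREATE_DATE or CAIRO_PDF_METADATA_MOD_DATE.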
>
>
> Page Labels
> -----------
> A PDF file may optionally define page labels that appear in the viewer
> instead of the page index number. For example the document may use roman
> numerals for the front matter and start the first chapter at page "1".
>
> The following function sets the page label for the current page. Setting
> utf8 to NULL removes any page label previously set.
>
> void
> cairo_pdf_surface_set_page_label (cairo_surface_t *surface,
>                                   const char *utf8);
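
As an illustration of the front-matter case mentioned above, the following sketch produces lower-case roman numeral labels. The helper name format_roman_label is hypothetical and not part of cairo; it only builds the string an application could pass as the utf8 argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of cairo: write n (1..3999) into buf as
 * lower-case roman numerals. Returns 0 on success, or -1 if n is out of
 * range or buf is too small. */
int
format_roman_label (char *buf, size_t size, int n)
{
    static const struct { int value; const char *digits; } table[] = {
        { 1000, "m" }, { 900, "cm" }, { 500, "d" }, { 400, "cd" },
        { 100, "c" },  { 90, "xc" },  { 50, "l" },  { 40, "xl" },
        { 10, "x" },   { 9, "ix" },   { 5, "v" },   { 4, "iv" },
        { 1, "i" }
    };
    size_t used = 0;
    size_t i;

    assert (buf != NULL);
    if (n < 1 || n > 3999 || size == 0)
        return -1;

    /* Greedily subtract the largest value that still fits. */
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t len = strlen (table[i].digits);
        while (n >= table[i].value) {
            if (used + len >= size)   /* keep room for the terminator */
                return -1;
            memcpy (buf + used, table[i].digits, len);
            used += len;
            n -= table[i].value;
        }
    }
    buf[used] = '\0';
    return 0;
}
```

An application would call this once per front-matter page and switch to plain decimal labels when the first chapter starts.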
>
>
> Thumbnails
> ----------
> PDF can store thumbnail images of the pages that can be displayed by the
> viewer.
>
> This function specifies the thumbnail size for the current page, and all
> subsequent pages until the next invocation of this function.
>
> void
> cairo_pdf_surface_set_thumbnail_size (int width, int height);
>
> Setting width and height to (0, 0) disables thumbnails. The default is
> (0, 0).
>
>
> Links
> -----
> PDF can contain hyperlinks to another location in the file, a location
> in another PDF file, or a URL.
>
> I initially started with the following API but then changed my mind. See
> the Tagged PDF section for the new API.
>
> The following function creates a link on the current page. In PDF, links
> are defined by one or more rectangles (more than one would be used
> when a link is split across two lines) defining the region that can be
> clicked on. Normally the application would set the rectangle to the
> extents of the link text.
>
> typedef enum _cairo_link_flags {
>     CAIRO_LINK_FLAG_APPEARANCE_DEFAULT = 0,
>     CAIRO_LINK_FLAG_APPEARANCE_NONE = 1,
>     CAIRO_LINK_FLAG_APPEARANCE_RECTANGLE = 2,
>     CAIRO_LINK_FLAG_APPEARANCE_UNDERLINE = 3,
>     CAIRO_LINK_FLAG_URI = 4,
> } cairo_link_flags_t;
>
> void
> cairo_create_link (cairo_t *cr,
>                    int num_rectangles,
>                    cairo_rectangle_t *rectangles,
>                    const char *dest_name,
>                    cairo_link_flags_t flags);
>
> If the appearance is not _NONE, use the current color and line style to
> draw the box/underline.
>
> For internal links we need a way to associate destination names with
> locations in the document. The following function creates a destination
> to the position x,y on the current page.
>
> typedef enum _cairo_destination_flags {
>     /* can optimize away name or the destination if unused */
>     CAIRO_DESTINATION_FLAG_INTERNAL = 1,
> } cairo_destination_flags_t;
>
> void
> cairo_create_destination (cairo_t *cr,
>                           const char *dest_name,
>                           double x, double y,
>                           cairo_destination_flags_t flags);
>
> Bookmarks
> ---------
> A PDF file can contain bookmarks (also called the document outline),
> which are a hierarchical set of links into the document. Using the
> cairo_create_destination() function, it is easy to create a document
> outline with one API function.
>
> typedef enum _cairo_pdf_bookmark_flags {
>     CAIRO_BOOKMARK_FLAG_BOLD = 1,
>     CAIRO_BOOKMARK_FLAG_ITALIC = 2,
> } cairo_pdf_bookmark_flags_t;
>
> #define CAIRO_PDF_BOOKMARK_ROOT 0
>
> int
> cairo_pdf_surface_add_bookmark (int parent_id,
>                                 const char *utf8,
>                                 const char *dest_name,
>                                 cairo_pdf_bookmark_flags_t flags);
>
> This function adds a bookmark with the name utf8 that links to
> dest_name. It returns a bookmark id. The parent_id is the id of the
> bookmark directly above this one in the hierarchy. Set it to
> CAIRO_PDF_BOOKMARK_ROOT for a top-level bookmark.
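
The parent_id scheme can be sketched with a small stand-in. Everything below (outline_t, outline_add_bookmark, outline_depth, and ids that start at 1) is hypothetical and only models how returned ids could be fed back in as parents; it is not cairo's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define BOOKMARK_ROOT 0     /* mirrors the proposed CAIRO_PDF_BOOKMARK_ROOT */
#define MAX_BOOKMARKS 64

typedef struct {
    int parent_id;
    const char *title;
    const char *dest_name;
} bookmark_t;

typedef struct {
    bookmark_t items[MAX_BOOKMARKS];
    int count;
} outline_t;

/* Stand-in for the proposed function. Assumption: ids start at 1 so that
 * 0 can mean "root". Returns the new id, or -1 if the parent is unknown
 * or the table is full. */
int
outline_add_bookmark (outline_t *outline, int parent_id,
                      const char *title, const char *dest_name)
{
    assert (outline != NULL);
    if (parent_id < BOOKMARK_ROOT || parent_id > outline->count)
        return -1;
    if (outline->count >= MAX_BOOKMARKS)
        return -1;

    outline->items[outline->count].parent_id = parent_id;
    outline->items[outline->count].title = title;
    outline->items[outline->count].dest_name = dest_name;
    outline->count++;
    return outline->count;
}

/* Number of ancestors between a bookmark and the root, or -1 for an
 * unknown id. A parent always has a smaller id, so the walk terminates. */
int
outline_depth (const outline_t *outline, int id)
{
    int depth = 0;

    assert (outline != NULL);
    if (id < 1 || id > outline->count)
        return -1;
    while (outline->items[id - 1].parent_id != BOOKMARK_ROOT) {
        id = outline->items[id - 1].parent_id;
        depth++;
    }
    return depth;
}
```

A chapter would be added with BOOKMARK_ROOT as its parent, and each of its sections with the id returned for the chapter.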
>
>
> Tagged PDF
> ----------
> A tagged PDF contains additional data that defines the logical structure
> of the page content. The logical structure includes information such as
> headings, paragraphs, tables, and figures. Tagged PDF is intended to be
> used for things like extraction of text and graphics into other
> applications, reflowing of text and graphics to fit a different page
> size, searching and indexing, and accessibility support.
>
> Cairo is already using one of the tagged PDF features, ActualText, to
> support the cairo_show_text_glyphs() function.
>
> The following API can be used for tagging the drawing operations
> enclosed by the cairo_tag_begin() and cairo_tag_end() functions with the
> specified tag. Tags can be nested.
>
> void
> cairo_tag_begin (cairo_t *cr, const char *tag_name);
>
> void
> cairo_tag_end (cairo_t *cr, const char *tag_name);
>
> The tag names are defined in PDF32000 section 14.8 [1]. Examples of tag
> names include:
>
> "P": paragraph
> "H1" - "H6": headings
> "Table": table
> "TR", "TH", "TD", "THead", "TBody", "TFoot": table elements
> "Link": hyperlink
>
> PDF32000 also defines an extensive range of attributes that can be
> included with each tag. I have omitted attributes from the API to keep it
> simple and because the tag name alone should be sufficient for the
> intended usage.
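
Since tags can be nested and cairo_tag_end() repeats the tag name, begin/end pairs presumably have to match in last-in, first-out order. The sketch below checks that with a small stack. The names tag_stack_t, tag_stack_begin and tag_stack_end are hypothetical, and this is only one way such a check could work, not a description of cairo's behaviour.

```c
#include <assert.h>
#include <string.h>

#define MAX_TAG_DEPTH 32

typedef struct {
    const char *names[MAX_TAG_DEPTH];
    int depth;
} tag_stack_t;

/* Record an opened tag. Returns 0, or -1 if nesting is too deep. */
int
tag_stack_begin (tag_stack_t *stack, const char *tag_name)
{
    assert (stack != NULL && tag_name != NULL);
    if (stack->depth >= MAX_TAG_DEPTH)
        return -1;
    stack->names[stack->depth++] = tag_name;
    return 0;
}

/* Close the innermost tag. Returns 0 if tag_name matches the most
 * recently opened tag, or -1 on a mismatch or when nothing is open. */
int
tag_stack_end (tag_stack_t *stack, const char *tag_name)
{
    assert (stack != NULL && tag_name != NULL);
    if (stack->depth == 0)
        return -1;
    if (strcmp (stack->names[stack->depth - 1], tag_name) != 0)
        return -1;
    stack->depth--;
    return 0;
}
```

With this check, closing "Table" while a "TR" is still open is reported as an error and leaves the stack unchanged.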
>
>
> New Link API
> ------------
> The SVG backend also supports hyperlinks. SVG links are defined using
> the 'a' element, e.g.:
>
>   <a xlink:href="http://www.w3.org">
>     <ellipse cx="2.5" cy="1.5" rx="2" ry="1"
>              fill="red" />
>   </a>
>
> Instead of requiring the application to provide a rectangle, leaving
> cairo to figure out what text is inside it, we can use the tagging API
> to define the link text.
>
> #define CAIRO_TAG_LINK "Link"
>
> Then the application can wrap the link text drawing operations and the
> call to cairo_create_link() (with num_rectangles = 0) with
> cairo_tag_begin(CAIRO_TAG_LINK) and cairo_tag_end(CAIRO_TAG_LINK).
>
> It then occurred to me that we could drop the
> cairo_create_link()/cairo_create_destination() API and extend the
> tagging API to also create links.
>
>
> #define CAIRO_TAG_LINK "Link"
> /* cairo prefix because it is not a standard PDF tag */
> #define CAIRO_TAG_DEST "cairo.dest"
>
> void
> cairo_tag_begin (cairo_t *cr,
>                  const char *tag_name,
>                  const char *attributes);
>
> void
> cairo_tag_end (cairo_t *cr, const char *tag_name);
>
> For example:
>
> Create a destination at position 100,20 on the current page.
>
>   cairo_tag_begin (cr, CAIRO_TAG_DEST, "pos=\"100 20\"");
>
> If the position is not specified, it defaults to the top left of the
> extents of the drawing operations enclosed by this tag. If there are no
> drawing operations within the tag, the default position is the top left
> of the page.
>
> Create URL link:
>
>   cairo_tag_begin (cr, CAIRO_TAG_LINK,
>                    "href=\"http://cairographics.org/\" rect=\"0 0 100 20\" "
>                    "appearance=\"underline\"");
>
> If the rectangle is not specified, it defaults to the extents of the
> drawing operations enclosed by this link tag.
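
Building the attribute string by hand is error-prone because of the nested quotes, so an application might wrap it in a helper. The sketch below is hypothetical: format_link_attributes is not part of cairo, the reading of rect as "x y width height" is an assumption, and URIs containing a double quote are rejected because the proposal does not define an escaping rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of cairo: build a link attribute string
 * of the form  href="URI" rect="X Y W H".
 * Returns the string length, or -1 if the URI contains a double quote
 * or the buffer is too small. */
int
format_link_attributes (char *buf, size_t size, const char *uri,
                        double x, double y, double width, double height)
{
    int n;

    assert (buf != NULL && uri != NULL);
    if (strchr (uri, '"') != NULL)
        return -1;

    n = snprintf (buf, size, "href=\"%s\" rect=\"%g %g %g %g\"",
                  uri, x, y, width, height);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

The result would be passed as the attributes argument of the proposed three-argument cairo_tag_begin() with CAIRO_TAG_LINK.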
>
> Create an internal link:
>
>   cairo_tag_begin (cr, CAIRO_TAG_LINK,
>                    "ref=\"section3\" appearance=\"none\"");
>
>
> [1]
> http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf



-- 
behdad
http://behdad.org/

