[cairo] New PDF backend snapshot

Wed Dec 22 11:51:07 PST 2004

Carl Worth wrote:
> On Tue, 21 Dec 2004 17:51:26 -0500, Kristian Høgsberg wrote:
> 
>>backend specific "meta data" for the underlying resource.  The only 
>>difference with the PDF backend is that it doesn't rely on an external 
>>library.
> 
> That's the key difference there. I'd like to reduce the amount of
> setup-work needed before a user can begin using cairo as much as
> possible. When an external library is involved, there's not much I can
> do about that in cairo of course, (though higher-level libraries could
> still help out).
> 
>>It is certainly possible to remove cairo_pdf_document_t from 
>>the public API and manipulate document properties through the surface 
>>object.  An alternative route would be to split part of the backend into 
>>its own library and namespace like glitz.
> 
> I would suggest the former. A separate library could make sense if it
> were designed to be a general-purpose PDF creation library. I can
> think of a lot of things that would belong there that don't fit well
> in cairo. But I don't know of any such PDF library that already
> exists, and I didn't get the idea that you were interested in creating
> one.

As long as I'm just testing the backend on the cairo snippets, a simple, 
easy to use API is fine.  But if you want to create a "real" PDF 
document, you'll want to set the author and creator, you'll want 
bookmarks (table of contents).  I just want to make sure that there is a 
way we can add these things later when it becomes more clear what kind 
of meta data we want to support.

Oh, and I'm not sure a standalone PDF library makes sense here - the API 
would look very much like cairo's plus a number of PDF specific meta 
data functions and at that point, why not just use cairo?

But other than that, sure, I'll push the cairo_pdf_document_t back into 
the PDF backend and change the public API so it matches the PS API.

> And without cairo_pdf_document_t, we could still switch to some
> external PDF library later if something does appear.
> 
> 
>>>>+void
>>>>+cairo_set_target_pdf (cairo_t  *cr,
>>>>+                     cairo_pdf_document_t *document);
>>
>>I understand that I'm breaking the rule here, I'm just not sure how you 
>>would do it otherwise.
> 
> 
> The convention I want is that the following template should work for
> all backends:
> 
> 	cairo_surface_t *surface;
> 	surface = cairo_foo_surface_create (/* foo-specific args*/);
> 	cairo_set_target_surface (cr, surface);
> 	cairo_surface_destroy (surface);
> 
> And, for all backends, there is a convenience function that achieves
> the same, but without making the user manage a surface object:
> 
> 	cairo_set_target_foo (/* foo-specific args */);
> 
> 
>>Of course, dropping cairo_pdf_document_t from 
>>the API would require both to take the same arguments as 
>>cairo_pdf_document_create(), which would solve the problem.
> 
> 
> Yes.
> 
> 
>>                                                            But if you 
>>did that, cairo_pdf_surface_create() could only be used for creating a 
>>surface to be used with cairo_set_target_surface() - you couldn't 
>>composite it into another PDF surface since it would belong to a 
>>different document.
> 
> That would be broken. Perhaps what you are calling
> cairo_pdf_document_t should just be renamed cairo_pdf_surface_t?

Ugh... that's what I first did, and the split into document and surface 
was done primarily to support compositing of one PDF surface onto 
another.  I'm generating PDF as the user draws.  Compositing PDF 
surfaces is done by referencing a stream of drawing operators from 
elsewhere in the file.  Referencing a stream of drawing operators from 
another file is not possible with the current design.

If this operation must be supported, I'll need to queue up all the 
drawing in memory for a surface and only write the PDF drawing operators 
to file when cairo_show_page() is called.  In a way, much like the 
current PS backend uses an image surface.  I'm not saying I don't like 
the idea, it's just a major rewrite, plus it will use more memory.  In 
fact, I think something like this will be necessary for the "real" PS 
backend, since you need to do a global analysis on the final stream of 
drawing commands to figure out where to use bitmaps.
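
To make the queueing idea concrete, here is a rough sketch of what I
have in mind.  None of this is in the patch: sketch_pdf_surface_t,
sketch_surface_queue() and sketch_surface_show_page() are made-up names,
and the operators are kept as plain text purely for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical surface that queues PDF operators in memory. */
typedef struct {
    char  *ops;       /* queued operator text, NUL-terminated once non-empty */
    size_t length;    /* bytes queued, not counting the NUL */
    size_t capacity;  /* bytes allocated for ops */
} sketch_pdf_surface_t;

/* Append one operator string to the queue.  Returns 1 on success,
 * 0 if memory could not be allocated. */
static int
sketch_surface_queue (sketch_pdf_surface_t *surface, const char *op)
{
    size_t n = strlen (op);

    if (surface->length + n + 1 > surface->capacity) {
        size_t new_capacity = surface->capacity ? surface->capacity : 64;
        char *new_ops;

        while (new_capacity < surface->length + n + 1)
            new_capacity *= 2;
        new_ops = realloc (surface->ops, new_capacity);
        if (new_ops == NULL)
            return 0;
        surface->ops = new_ops;
        surface->capacity = new_capacity;
    }
    memcpy (surface->ops + surface->length, op, n + 1);
    surface->length += n;
    return 1;
}

/* Write everything queued so far and empty the queue; this is the
 * only point where output reaches the file.  Returns bytes written. */
static size_t
sketch_surface_show_page (sketch_pdf_surface_t *surface, FILE *out)
{
    size_t written;

    if (surface->length == 0)
        return 0;
    written = fwrite (surface->ops, 1, surface->length, out);
    surface->length = 0;
    return written;
}
```

Since nothing is written before show_page, a surface queued this way
could in principle still be composited into any document, at the cost of
holding the whole page in memory.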

> Perhaps not the ideal name, but that's the name imposed by the API
> convention. Then it could (internally) contain a cairo_pdf_page_t?
> 
> Or perhaps the two structures should be merged? That would certainly
> prevent their separation inadvertently breaking the behavior of
> cairo_pdf_surface in user-visible ways.
> 
> 
>>less).  The downside is that it's one of the more advanced features of 
>>PDF 1.4 so it will be interesting to see what level of support the 
>>various viewers implement.
> 
> 
> It seems rather convenient that a cairo-based PDF viewer is beginning
> to come together... But portability of the output is something to keep
> in mind.

I'm going to go through with the soft mask approach and try to get it 
working.  I'm not quite ready to rule out the set_clip_trapezoids() 
approach yet, though, since that will allow us to implement much more of 
cairo's functionality without using soft masks.  We'll see how the 
various viewers handle the soft mask based PDF files.

>>I think there is also value in separating out boring container code from 
>>the interesting cairo code.
> 
> 
> Agreed.
> 
> 
>>                            The code is easier to follow when you don't 
>>have to parse array growing logic intertwined with e.g. bounding box 
>>computations.
> 
> 
> I think the implementation does this fairly well already. All
> "grow_by" functions are called only from their corresponding "add"
> functions. For example:
> 
> 	_cairo_polygon_add_edge
> 	_cairo_spline_add_point
> 	_cairo_traps_add_trap
> 
> There are a few cases of realloc in the code without these layers of
> functions and datatypes. But I don't think any of them constitute
> confusing or intertwined logic. And that approach is likely the best
> where there is only local manipulation of a local array,
> (eg. _utf8_to_ucs4).
> 
> 
>>The cairo_array_index() was meant to give direct array access, for example:
>>
>>	num_elements = cairo_array_num_elements (array);
> 
> 
> Why not just array->num_elements here?

Sure, I guess since it's only an internal API, accessor functions are 
overkill.

>>	traps = cairo_array_index (array, index, num_elements);
> 
> 
> I don't follow. Why pass num_elements in here? array already contains
> that.

My idea was that you declare how many elements you intend to access, so 
the index function could assert() that index + num_elements is <= 
array->num_elements.  So for a 10-element array, the following would 
fail:

	traps = cairo_array_index (array, 5, 10);

but
	traps = cairo_array_index (array, 2, 2);

would work and traps would be guaranteed to be a valid pointer to 
elements 2 and 3 as traps[0] and traps[1].
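
Roughly, the checked index function could look like the sketch below.
The struct layout and the sketch_ names are only assumptions for
illustration; the patch itself does not have the num_elements argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic array; field names follow this discussion,
 * not necessarily the patch. */
typedef struct {
    int   num_elements;  /* elements currently stored */
    int   element_size;  /* size of one element in bytes */
    char *data;          /* untyped element storage */
} sketch_array_t;

/* Return a pointer to element 'index' after asserting that the
 * caller's declared range [index, index + num_elements) is in bounds. */
static void *
sketch_array_index (sketch_array_t *array, int index, int num_elements)
{
    assert (index >= 0 && num_elements >= 0);
    assert (index + num_elements <= array->num_elements);

    return array->data + (size_t) index * array->element_size;
}
```

With a 10-element array, sketch_array_index (array, 5, 10) trips the
second assert, while sketch_array_index (array, 2, 2) returns a pointer
to element 2.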

>>	for (i = 0; i < num_elements; i++)
>>		traps[i].top = 100;
> 
> 
> I see. If that's the intended usage, then the index argument to
> cairo_array_index would always be zero. So the above could be
> simplified to:
> 
> 	traps = cairo_array_pointer (array); /* or similar? */
> 	for (i = 0; i < array->num_elements; i++)
> 		traps[i].top = 100;
> 
> That's basically just using a function call in place of a cast, but
> maybe that would be worthwhile.

Well, this really is the only way the arrays are accessed in cairo, so 
maybe the cairo_array_index() function is overkill.  In this case, yes, 
it's only a wrapper around a cast, and my comment about private API 
above applies.

>>(the num_elements argument to cairo_array_index() isn't in the patch, 
>>but I'm thinking it should be there so the function can assert() that 
>>the elements the caller wants to access are within bounds).
> 
> 
>>                                                            As for type 
>>safety, that's a classic tradeoff with containers
> 
> 
> Yes, and it's a tradeoff that I consider carefully.

I think it's a tradeoff well worth it.  I believe Keith mentioned that 
the existing cairo arrays should be changed to double their size instead 
of just growing by one.  This has to be done in 3 places now.  Also, it's 
a tradeoff that every Glib (and Gtk+ and GNOME) application makes when 
it uses the Glib containers, as do Java applications using the Java 
container classes.
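
To illustrate the point about doing it in one place: with a shared
container, the doubling policy lives in a single append function.  This
is only a sketch under my own assumptions; grow_array_t and its
functions are invented names, not code from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic growable array. */
typedef struct {
    int   num_elements;  /* elements currently stored */
    int   size;          /* allocated capacity, in elements */
    int   element_size;  /* size of one element in bytes */
    char *data;
} grow_array_t;

static void
grow_array_init (grow_array_t *array, int element_size)
{
    array->num_elements = 0;
    array->size = 0;
    array->element_size = element_size;
    array->data = NULL;
}

/* Copy one element onto the end, doubling the capacity when full.
 * Returns 1 on success, 0 on allocation failure. */
static int
grow_array_append (grow_array_t *array, const void *element)
{
    if (array->num_elements == array->size) {
        int new_size = array->size ? array->size * 2 : 1;
        char *new_data = realloc (array->data,
                                  (size_t) new_size * array->element_size);
        if (new_data == NULL)
            return 0;
        array->data = new_data;
        array->size = new_size;
    }
    memcpy (array->data + (size_t) array->num_elements * array->element_size,
            element, (size_t) array->element_size);
    array->num_elements++;
    return 1;
}
```

Changing the growth policy then means touching grow_array_append() only,
rather than each of the per-type grow_by functions.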

>>                                                  and we're already 
>>passing void pointers to surfaces to the backends.
> 
> 
> Certainly one case of void pointers in the code doesn't mean we give
> up type-safety anywhere. The virtual backend functions have the
> advantage of being explicitly enumerated in a table, and their
> implementations immediately cast back to the appropriate type, making
> it easy to verify the type-soundness of any backend implementation.

I'm not suggesting that we give up type safety everywhere, and I 
appreciate that cairo has a fairly type-safe API, both publicly and 
internally.  I'm just saying it's one of those things you need to be 
pragmatic about when programming C.

> Notice that whenever a backend implementation also needs to directly
> call an interface function, I've tried to always call a type-safe
> function for which the interface function is just a trivial wrapper,
> (eg. _cairo_image_abstract_surface_set_matrix ->
> _cairo_image_surface_set_matrix).
> 
> I would even prefer to be able to restrict calls of the virtual
> functions to calls through the table, but I don't know an easy
> mechanism for that.
> 
> So, coming back to the array issue. It's not clear to me that we
> really need a new cairo_array, but I would prefer something that
> didn't use "void *" in the usage pattern. An approach with consistent
> or contained structures might allow more of the ugly stuff, (void *,
> casts, etc.) to be contained in cairo_array.c. The header files would
> also have to be verified and kept consistent, but that seems easier to
> do than for all function calls.

Are you thinking of defining a set of cairo_array_{spline,traps,...} 
structs where only the data pointer type varies?  Or something like this:

	struct cairo_array {
		int num_elements;
		int size;
		int element_size;
	};

	struct cairo_void_array {
		cairo_array_t base;
		void *data;
	};

	struct cairo_trapezoid_array {
		cairo_array_t base;
		cairo_trapezoid_t *data;
	};

I'm not sure I think that's cleaner.
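
For comparison, this is roughly how I imagine the typed variant being
used.  It is only a sketch: the sketch_ names and the two-field
trapezoid are stand-ins I made up, and the generic code takes and
returns the data pointer so that the only "void *" lives in one helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for cairo_trapezoid_t, reduced to two fields. */
typedef struct {
    double top;
    double bottom;
} sketch_trapezoid_t;

/* Bookkeeping shared by every typed array. */
typedef struct {
    int num_elements;
    int size;          /* allocated capacity, in elements */
    int element_size;
} sketch_array_base_t;

typedef struct {
    sketch_array_base_t  base;
    sketch_trapezoid_t  *data;  /* typed, so callers need no casts */
} sketch_trapezoid_array_t;

/* Generic helper: double the capacity of 'data' and return the new
 * block, or NULL on failure (the old block stays valid). */
static void *
sketch_array_grow (sketch_array_base_t *base, void *data)
{
    int new_size = base->size ? base->size * 2 : 1;
    void *new_data = realloc (data, (size_t) new_size * base->element_size);

    if (new_data != NULL)
        base->size = new_size;
    return new_data;
}

/* Type-safe add function layered on the generic helper. */
static int
sketch_trapezoid_array_add (sketch_trapezoid_array_t *array,
                            const sketch_trapezoid_t *trap)
{
    if (array->base.num_elements == array->base.size) {
        sketch_trapezoid_t *new_data =
            sketch_array_grow (&array->base, array->data);
        if (new_data == NULL)
            return 0;
        array->data = new_data;
    }
    array->data[array->base.num_elements++] = *trap;
    return 1;
}
```

Callers get array->data[i].top without a cast, but every element type
still needs its own struct and add function, which is the part I'm not
convinced is a win.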

> My, how I sometimes wish we had a decent language...

It's not a problem that many languages solve though, and even fewer 
solve it nicely.

cheers,
Kristian