[cairo] how to propose a change?
behdad at behdad.org
Thu Aug 30 15:44:22 PDT 2007
On Fri, 2007-08-24 at 19:18 -0400, ionous wrote:
> thanks for the warm welcome and all the info.
> okay. here's my proposal. sorry in advance for the extreme length.
Thanks for writing this down with so much level of detail. All the time
and energy you put into this is highly appreciated.
> only read if you're interested in cairo arcana--
> i'll try to whip up a patch over the next few days,
> then let the code speak for itself.
> the general goal of my proposal is to unify path memory usage across
> compilers, decrease potentially wasted space, and in, what i suspect
> are the most frequent use cases, increase the number of paths cairo
> can store in the same space it uses now.
> the general concept is to change the dual lists of ops and points into
> a single unified command buffer stream. the following goes into
> details about how it works now, how the command stream would work, and
> how -- in most cases -- the command stream would save memory. there's
> a couple implementation questions towards the end.
> how it works now:
> cairo's paths ( cairo_path_fixed_t ) store their path operations and
> path points in chains of fixed size buffers ( cairo_path_buf_t ) on
> all platforms each buffer holds 512 bytes. there are always the same
> number of points and ops in each structure, however the size of the
> ops vary per compiler, and thus the number of items each structure can
> hold varies. under visual c 6.1 ops take 4 bytes, while via gcc each
> op is 1 bytes. the former leads to 2 set of 38 items ( 38 points, 38
> ops ), the latter 2 sets of 51 items.
> with the existing algorithm whenever a new path operation would
> overflow either the number of available points or ops, a new buffer
> gets allocated. there is not a one to one mapping between the number
> of ops and the number of points used in a given path. each operation
> takes different numbers of points: move to and lines to add 1 point, a
> curve to adds 3, the close path op adds no points.
> the worst real world cases, memory usage wise, are curves. a new
> buffer is needed under visual c after 12 curve segments; under gcc:
> for a buffer full of curves, visaul c produces 2 wasted points and 26
> (38-12) wasted ops; gcc no wasted points, but 34 (51-17) wasted ops.
> that's a total of 112 (2*4 + 26*4) and 34 (34*1) wasted bytes
> respectively; 22% and 7% of path storage memory goes unused in the
> full curve case. not terrible, especially under gcc, but i think
> there's a simple way to use almost all the allocated space for path
Ok, the win32 case was clearly wasteful. I went ahead and committed a
fix to always use a byte for the operation.
> command buffers
> since paths and ops are both integer types, it would be easy to store
> a single command stream of uniform type combing the list of ops and
> points. each op would take 1 command stream element, while each point
> would take 2 elements ( 1 for x, 1 for y ). the entire command stream
> would have 116 elements available in the space devoted formerly to the
> separate point and op lists.
> if every operation that is currently added to the list is directly
> followed by the points that use it, a curve segment would have the
> stream: [ "CURVE", curve-point-1,cp2,cp3, "CURVE",c2p1,c2p2,c2p3 ]
> each curve would take 7 (1+3*2) elements ( 28 bytes ) and cairo could
> store 16 curves with no wasted space. for visual c: that's good ( 16
> vs. 12 ), for gcc not so good ( 16 vs. 17 ). Lines of course are even
> worse for gcc -- reducing it to 38 points from 51 points.
So, this is not going to save us anything.
> command buffers + counts
> to get towards the storage savings that cairo has under gcc, the next
> step would split the op element into two parts: the operation key and
> a count tracking the number of times an operation has been used
> without interruption. in the repeating curve segment case the command
> stream would appear something like [ ("CURVE", X), c1p1,c1p2,c1p3,
> c2p1,c2p2,c2p3, ... ] where X is the number of curve segments in the
> current buffer. in this case: 38 curve segments could fit. if the
> next path segment were also a curve there would be just 4 bytes ( 1 op
> + 38*3 points = 115 elements used; 116 total elements-115 used els=1
> wasted el; 1el*4bytes/el=4 ) of wasted space. [ the bookkeeping needed
> -- an index to the current operation would be kept at the path, not
> buffer, level ]
> that's great: 38 curves vs 12(or 17) - a win for all compilers.
And this is in my opinion trading simplicity for a little hardly
beneficial memory usage improvement. Note that path memory is a very
transitional kind of memory. Each cairo context at each time doesn't
have more than one path attached to it, and a cairo context itself is a
transitional object too.
> two new commands
> the downside appears again however in situations where there are lots
> of changes in the segment type -- ex. "MOVE", "LINE", "MOVE", "LINE",
> "MOVE", "LINE" --this scheme could only fit 19 (116/(2*(1+2))) such
> paired segments, while currently under gcc can fit 25 (51/2).
> the next and final step, at the cost of making some changes outside of
> the buffer code itself, would be to introduce two new commands
> LINE_SEGMENT and CURVE_SEGMENT. ( this makes the lists very
> 3d-accelerator like with their line strip / line list primitives ).
> the segment commands would represent the "MOVE", "LINE" pair as a
> single operation with two points., rather than 2 operations with 2
> points. in the current buffer space cairo could store: 28
> ((116-1) /(2*4)) line list segments per buffer.
> this isn't quite as straight forward as the other changes as there
> would probably have to be some work to handle detecting line strip
> startup cases ( move-line-line-... ) cleanly ( similar to the existing
> move / line startups ). a nice thing about this change though is that
> separate strips would wind up shedding their separate initial move
> operation; the move designated solely by the start of a new strip
> that said thrashing between strips and lists will still eat up memory
> slightly faster than it used to.
> overall gains
> i think in the majority of cases these changes as a whole will save
> memory for all compilers, but in between fully mixed and fully chained
> paths, there's a lot of wiggle room. while i suspect most use cases
> are composed of chains of similar primitives - i don't know that for
> there's some smaller possibility it might make acceleration
> compatibility easier in some sense -- given the tighter mapping to
> gl/d3d style primitive types -- though i don't really know how/where
> that would actually play out.
> how should chaining work?
> one open question is how to chain together series of same path
> operations that span multiple buffers. the two main options are
> either store the complete chain length even if it crosses buffers, or
> to store only the per buffer length, always terminating a chain at the
> end of a buffer. the option still tops out at 65535 operations, and
> so some sort of chain continuation code would be needed in either
> case. given that the second option likely makes more sense. that
> said there could be some advantages in knowing the size of chains up
> front.... not completely sure but easy to tweak later on.
This slips over to over-engineering I would say. Specially if it needs
internal API change.
> are reversals a dead code path?
> while going over the necessary changes again this morning i see that
> the buffer's prev pointer allows _cairo_path_fixed_interpret() to
> traverse the op and point list both forwards and backwards.
> while the reversal traversal actually appears unused in current code,
> the command stream would require a minor addition to support it. the
> operator elements would further split into: command / repeat count /
> prev op index with no more than 255 elements stored in a single
> while generally i'm a firm believer in removing unused code paths, i'd
> definitely want to hear from people with more cairo experience than
> myself. if reversal is unneeded that would nicely free up space for
> another command stream element ^o^
> if i'm wrong or insane on any front just let me know,
I like the fact that you put so much effort into it, but would rather
see some part of code get your attention that is a real performance
(time/memory) bottleneck, showable in some kind of real-world scenario.
As for the path storage, I'm pretty happy with how simple the current
code is and given that the bad win32 performance is gone now, I don't
think we need to change it for now.
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759
More information about the cairo