[cairo] gallium surface still maintained ?

Petr Kobalíček kobalicek.petr at gmail.com
Fri Aug 5 20:40:22 UTC 2016


An ARM port is planned - I already have a basic overview of the ARM32 and
ARM64 instruction sets (and their differences) and I have also started some
work on ArmAssembler (asmjit). But it will not happen before the X86 version
is production ready.

Just to give you some overview - Blend2D's X86 backend is currently around
8000 lines of C++ code. It generates optimized pipelines for all supported
combinations (fetch-op, blend-op, rasterizer-op), in several variants
selected at runtime (1, 4, 8, or 16 pixels per loop iteration). I expect it
to grow in the future (especially if I target AVX512, which has some
innovative concepts). I would expect an initial ARM port to be around the
same size. I think that's not bad considering how much architecture-specific
code there is in pixman, for example.


On Fri, Aug 5, 2016 at 6:51 PM, Guillermo Rodriguez <
guillerodriguez.dev at gmail.com> wrote:

> Hello,
>
> I definitely share your view.
>
> Blend2D looks very interesting. I hope there will be an ARM port in the
> future; from what I have seen the JIT engine is currently targeting x86
> architectures only.
>
> Best regards,
>
> Guillermo
>
> 2016-08-05 16:49 GMT+02:00 Petr Kobalíček <kobalicek.petr at gmail.com>:
>
>> I'm reading the discussion and I would like to contribute.
>>
>> I'm the author of Blend2D (http://blend2d.com) and I'm just finalizing an
>> evaluation version. From my own experience, I think that CPU rendering is
>> feasible and can be really fast. The problem is that libraries are not
>> optimized to use the CPU well.
>>
>> If you check out the pipelines of open-source 2D libraries then you will
>> see basically the same thing - in many cases pixels are just copied from
>> one place to another many times before they are written to the destination
>> buffer. Another thing is the dispatching mechanism - in many cases these
>> libraries call tens of functions (sometimes even allocate dynamic memory)
>> before pixels start changing - and this happens every time you call some
>> drawing function that is not "fillRect".
>>
>> I think the most critical cases are UI and vector-art rendering, because
>> these generally perform many drawing calls that render tiny things.
>>
>> I have my own benchmarking suite that compares performance of Blend2D,
>> Cairo, and Qt. I can announce on this list when I release the beta version
>> so Cairo devs and users can see the real difference between "optimized for
>> CPU" and "supports CPU".
>>
>> Cheers,
>> Petr
>>
>>
>>
>> On Fri, Aug 5, 2016 at 11:38 AM, Guillermo Rodriguez <
>> guillerodriguez.dev at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> 2016-08-05 11:00 GMT+02:00 Enrico Weigelt, metux IT consult <
>>> enrico.weigelt at gr13.net>:
>>>
>>>> On 05.08.2016 10:35, Enrico Weigelt, metux IT consult wrote:
>>>>
>>>> <snip>
>>>>
>>>
>>>> Oh, could you check whether your driver sets DRM_CAP_DUMB_PREFER_SHADOW.
>>>>
>>>> https://lists.freedesktop.org/archives/dri-devel/2016-August/114970.html
>>>>
>>>> On my box (w/ an i915) the driver sets this flag, so I'll have to assume
>>>> that writing individual bytes is going to be slow, and a shadow buffer
>>>> should be used, which then is copied over in bursts.
>>>>
>>>
>>> The driver I am using does not set that flag.
>>>
>>> Anyway my application does all compositing on a back (shadow) buffer,
>>> and then blits dirty regions to the DRM buffer.
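
A minimal sketch of the back-buffer approach described above, not the
poster's actual code. It assumes both buffers are linear byte arrays with
4 bytes per pixel and a known pitch (bytes per row); the function name
blit_rect and the (x, y, w, h) rectangle layout are hypothetical, and a
real DRM dumb buffer would be a mapped region with its own pitch.

```python
def blit_rect(back, front, back_pitch, front_pitch, rect, bytes_per_pixel=4):
    """Copy one dirty rectangle from the back buffer to the front buffer.

    back, front : bytearray-like linear pixel buffers (assumed layout)
    *_pitch     : bytes per row of each buffer
    rect        : (x, y, w, h) in pixels, assumed to lie inside both buffers
    """
    x, y, w, h = rect
    row_bytes = w * bytes_per_pixel
    for row in range(y, y + h):
        src = row * back_pitch + x * bytes_per_pixel
        dst = row * front_pitch + x * bytes_per_pixel
        # One contiguous copy per row, rather than per-pixel writes.
        front[dst:dst + row_bytes] = back[src:src + row_bytes]
```

Copying whole rows of the rectangle keeps writes to the front buffer in
bursts, which is the point of compositing on a shadow buffer first.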
>>>
>>>
>>>>
>>>> Now the interesting question: how to achieve that?
>>>> Is there some easy way to trace which pixels/regions in an image surface
>>>> have been touched?
>>>>
>>>
>>> I keep track of dirty regions which are merged together using a naive
>>> algorithm: The resulting dirty region is just a rectangle that encloses all
>>> dirty rectangles. That is then blitted to the screen in each update cycle.
>>>
>>> This merging algorithm is obviously inefficient in some cases. For
>>> example let's say you have two small dirty regions in opposite corners of
>>> the screen; the merged dirty region will be large, and blitting that will
>>> be less efficient than blitting the two original dirty regions. You could
>>> optimize this by applying some heuristics in order to decide when to merge
>>> and when not to merge, but I haven't found the need to do that yet.
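
The naive merge and one possible heuristic can be sketched as follows.
This is only an illustration under stated assumptions: rectangles are
(x, y, w, h) tuples, the names bounding_rect, area and merge_dirty are
hypothetical, and the max_waste threshold is an arbitrary choice rather
than anything used in the application described above.

```python
def bounding_rect(rects):
    """Naive merge: the smallest rectangle enclosing all rects (non-empty list)."""
    x0 = min(x for x, y, w, h in rects)
    y0 = min(y for x, y, w, h in rects)
    x1 = max(x + w for x, y, w, h in rects)
    y1 = max(y + h for x, y, w, h in rects)
    return (x0, y0, x1 - x0, y1 - y0)


def area(rect):
    return rect[2] * rect[3]


def merge_dirty(rects, max_waste=1.5):
    """Heuristic merge (assumed rule): fold a rectangle into an existing one
    only if their bounding box is at most max_waste times their summed area.

    This is a single greedy pass; it does not re-check whether already
    merged rectangles have become mergeable with each other.
    """
    merged = []
    for r in rects:
        for i, m in enumerate(merged):
            box = bounding_rect([m, r])
            if area(box) <= max_waste * (area(m) + area(r)):
                merged[i] = box
                break
        else:
            # No cheap merge found: keep this rectangle separate.
            merged.append(r)
    return merged
```

With two 10x10 rectangles in opposite corners of a 1920x1080 screen,
bounding_rect covers the whole screen, while merge_dirty keeps the two
small rectangles separate because the bounding box would be mostly
untouched pixels.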
>>>
>>> Guillermo
>>>
>>>
>>> --
>>> cairo mailing list
>>> cairo at cairographics.org
>>> https://lists.cairographics.org/mailman/listinfo/cairo
>>>
>>
>>
>