[cairo] Glyph utf8 string in svg output for OCR/OMR tasks

Kwon-Young Choi kwon-young.choi at hotmail.fr
Tue May 3 09:35:32 UTC 2022


Hello,

I've just posted a new merge request at
https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/318 [1].

My interest is mainly in OCR/OMR (Optical Music Recognition) tasks, where the goal is
to recover digital documents from images and PDFs.
My main goal is to use cairo to extract every drawing and glyph from a PDF so that I can
train models for symbol classification/detection and semantic reconstruction.
The main piece of information currently missing from cairo's SVG output is the UTF-8
string of each glyph.

My merge request aims to add this information with as few modifications as possible,
both in the code and in the SVG output.

I hope this kind of use case will interest some people and help me get this feature merged.

Let me know if there is anything I should do to improve my contribution.

Best regards,

Kwon-Young Choi

--------
[1] https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/318