[cairo] Glyph utf8 string in svg output for OCR/OMR tasks
Kwon-Young Choi
kwon-young.choi at hotmail.fr
Tue May 3 09:35:32 UTC 2022
Hello,
I've just posted a new merge request at
https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/318 [1].
My interest is mainly in OCR/OMR (Optical Music Recognition) tasks, where the goal is
to recover digital documents from images and PDFs.
My main goal is to use cairo to extract every drawing and glyph from PDFs so that I can
train models for symbol classification/detection and semantic reconstruction.
The main piece of information currently missing from cairo's SVG output is the UTF-8
string of each glyph.
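As a rough illustration of the gap (a minimal sketch, not taken from the merge request):
the Python snippet below lists glyph placements from an SVG file. The sample SVG is
hand-written and only assumes that glyphs appear as <use> elements referencing <symbol>
ids that start with "glyph"; actual cairo output may differ between versions. The helper
name glyph_placements is hypothetical. Positions and symbol ids can be recovered this
way, but nothing says which text each glyph stands for.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hand-written sample imitating cairo-style SVG output (an assumption,
# not real output from any cairo version).
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs>
    <symbol id="glyph0-1" overflow="visible"><path d="M 0 0 L 1 1"/></symbol>
    <symbol id="glyph0-2" overflow="visible"><path d="M 0 0 L 2 2"/></symbol>
  </defs>
  <use xlink:href="#glyph0-1" x="10" y="20"/>
  <use xlink:href="#glyph0-2" x="16.5" y="20"/>
</svg>"""


def glyph_placements(svg_text):
    """Return (symbol_id, x, y) for every <use> that references a glyph symbol."""
    root = ET.fromstring(svg_text)
    placements = []
    for use in root.iter(SVG_NS + "use"):
        href = use.get(XLINK_HREF, "")
        if href.startswith("#glyph"):
            x = float(use.get("x", "0"))
            y = float(use.get("y", "0"))
            placements.append((href[1:], x, y))
    return placements


if __name__ == "__main__":
    for symbol_id, x, y in glyph_placements(SAMPLE_SVG):
        # Only the symbol id and position are available, not the UTF-8 text.
        print(symbol_id, x, y)
```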
My merge request aims to add this information with as few modifications as possible,
both in the code and in the SVG output.
I hope this kind of use case will interest some people and help get this feature merged.
Let me know if there is anything I should do to improve my contribution.
Best regards,
Kwon-Young Choi
--------
[1] https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/318