From kwon-young.choi at hotmail.fr Tue May 3 09:35:32 2022
From: kwon-young.choi at hotmail.fr (Kwon-Young Choi)
Date: Tue, 03 May 2022 11:35:32 +0200
Subject: [cairo] Glyph utf8 string in svg output for OCR/OMR tasks
Message-ID:

Hello,

I've just posted a new merge request at
https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/318 [1].

My interest is mainly in OCR/OMR (Optical Music Recognition) tasks, where
the goal is to recover digital documents from images and PDFs.
My main goal is to use cairo to extract every drawing and glyph from a PDF
so that I can train models for symbol classification/detection and semantic
reconstruction.

The main piece of information currently missing from cairo's SVG output is
the UTF-8 string of each glyph.
My merge request aims to add this information with as few modifications as
possible, both in the code and in the SVG output.

I hope this kind of use case will interest some people and help me get this
feature merged.
Let me know if there is anything I should do to improve my contribution.

Best regards,
Kwon-Young Choi

--------
[1] https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/318