qt6-bb10


Giuseppe D'Angelo a794c5e287	2021-04-16 20:31:39 +02:00

Unicode: fix the extended grapheme cluster algorithm

UAX #29 in Unicode 11 changed the EGC algorithm to its current form. Although Qt has upgraded the Unicode tables all the way up to Unicode 13, the algorithm has never been adapted; in other words, it has been working by chance for years. Luckily, MOST of the cases were dealt with correctly, but emoji handling actually manages to break it.

This commit:

* Adds parsing of emoji-data.txt into the unicode table generator. That is necessary to extract the Extended_Pictographic property, which is used by the EGC algorithm.
* Regenerates the tables.
* Removes some obsolete grapheme cluster break properties, and adds the ones introduced in the meantime.
* Rewrites the EGC algorithm according to Unicode 13. This is done by greatly simplifying the lookup table. Some rules (GB11, GB12, GB13) can't be done by the table alone, so some hand-rolled code is necessary in that case.
* Thanks to these fixes, the complete upstream GraphemeBreakTest now passes. Remove the "edited" version that ignored some rows (because they were failing).

Change-Id: Iaa07cb2e6d0ab9deac28397f46d9af189d2edf8b
Pick-to: 6.1 6.0 5.15
Fixes: QTBUG-92822
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
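
To illustrate why rules such as GB12 and GB13 need hand-rolled code, here is a minimal Python sketch (not Qt's implementation; the function names are hypothetical). A pairwise lookup table only sees the classes of the two adjacent code points, which are identical (Regional_Indicator, Regional_Indicator) whether or not a break is allowed. The decision depends on how many regional indicators precede the break position, so the code has to look back and count them.

```python
def is_regional_indicator(cp: int) -> bool:
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= cp <= 0x1F1FF


def ri_break_allowed(code_points: list[int], pos: int) -> bool:
    """Apply only GB12/GB13 to the boundary before code_points[pos].

    Hypothetical helper: returns False when the boundary falls inside a
    regional-indicator pair, True otherwise. All other UAX #29 rules are
    deliberately ignored in this sketch.
    """
    if pos <= 0 or pos >= len(code_points):
        return True  # start/end of text are not covered by GB12/GB13
    if not (is_regional_indicator(code_points[pos - 1])
            and is_regional_indicator(code_points[pos])):
        return True  # GB12/GB13 only concern RI followed by RI

    # Count the run of consecutive regional indicators ending at pos - 1.
    run = 0
    i = pos - 1
    while i >= 0 and is_regional_indicator(code_points[i]):
        run += 1
        i -= 1

    # An odd run means the previous RI is the first half of a pair,
    # so the pair must not be split. An even run means all previous
    # RIs are already paired and a break is allowed.
    return run % 2 == 0


# Two flags in a row: U+1F1E9 U+1F1EA (DE) followed by U+1F1EB U+1F1F7 (FR).
flags = [0x1F1E9, 0x1F1EA, 0x1F1EB, 0x1F1F7]
boundaries = [ri_break_allowed(flags, p) for p in range(1, len(flags))]
```

In this example only the middle boundary (between the two flags) permits a break, even though all three boundaries look the same to a table indexed by adjacent classes.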
GraphemeBreakTest.txt	Unicode: fix the extended grapheme cluster algorithm	2021-04-16 20:31:39 +02:00
LineBreakTest.txt	Update UCD to Revision 26	2020-03-14 11:26:59 +01:00
LineBreakTest.txt.full	Update UCD to Revision 26	2020-03-14 11:26:59 +01:00
ReadMe.full.txt	Update UCD to Revision 26	2020-03-14 11:26:59 +01:00
SentenceBreakTest.txt	Update UCD to Revision 26	2020-03-14 11:26:59 +01:00
WordBreakTest.html	Update UCD to Revision 26	2020-03-14 11:26:59 +01:00
WordBreakTest.txt	Update UCD to Revision 26	2020-03-14 11:26:59 +01:00
WordBreakTest.txt.full	Update UCD data to Unicode 12.1.0's Revision 24	2019-10-30 17:38:02 +01:00

ReadMe.full.txt

Temporary kludge at UCD Revision 24--26 until code can be fixed up.

57+53 of the tests defined by the UCD data are commented out here.
The raw upstream files are provided as *.txt.full where this was needed.
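
As a rough illustration of how such a test file can be consumed, the Python sketch below parses one line in the upstream break-test format, where `÷` marks an allowed break, `×` marks a forbidden one, and everything after `#` is a comment. The function name is hypothetical, and the sketch assumes that "commented out" means the row was prefixed with `#`, so such rows yield no test case.

```python
def parse_break_test_line(line: str):
    """Parse one break-test row into (code_points, breaks).

    Returns None for blank lines and for rows that are entirely a
    comment (assumed here to be how tests are disabled).
    breaks[i] is True when a break is expected before code_points[i];
    the final entry describes the boundary at end of text.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    tokens = data.split()
    marks = tokens[0::2]    # boundary markers: ÷ or ×
    hex_cps = tokens[1::2]  # code points in hexadecimal

    if len(marks) != len(hex_cps) + 1:
        raise ValueError(f"malformed test line: {line!r}")
    if any(m not in ("÷", "×") for m in marks):
        raise ValueError(f"unexpected boundary marker in: {line!r}")

    code_points = [int(h, 16) for h in hex_cps]
    breaks = [m == "÷" for m in marks]
    return code_points, breaks


active = parse_break_test_line("÷ 0020 × 0308 ÷ 0020 ÷\t#  ÷ [0.2] SPACE (Other)")
disabled = parse_break_test_line("# ÷ 0020 × 0308 ÷ 0020 ÷")
```

With this shape, a test runner can compare the `breaks` list against the boundaries reported by the implementation under test, and disabled rows simply drop out of the run.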