Commit Graph

139 Commits (5ea248155654b58fcb52ef326dc4d94de83d0409)

Author SHA1 Message Date
Edward Welbourne 020a224a79 Break out timezone data from cldr2qtimezone.py
This separates the large slabs of data (and their documentation) from
the code that mixes them with CLDR-derived data and generates the data
we actually use. In the process, put the shorter table before the
longer one, to make it less likely that folk shall fail to notice it's
even there at all.

Change-Id: I8457741911657dac0dad53c2e65b977821bb4e71
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
2024-05-06 20:27:41 +02:00
Edward Welbourne 5f8dc8ea5f Purge an almost-redundant duplicate datetime format conversion
The QLocale XML reader was passing datetime formats through a format
conversion despite the data being converted at the point where we read
it from CLDR. It turns out this was needed because the long date and
time formats in our hard-coded data for the C Locale object used CLDR
format strings, unlike all other Locale objects. Fix those two formats
in the C locale and remove the redundant processing step. This, in
turn, enables the parser to include the date and time formats in its
general handling of most fields that it reads.

This does not result in any change to the generated data QLocale uses
(although it does change the intermediate QLocale XML file).

Task-number: QTBUG-115158
Change-Id: Iaf9da206158043dda2e9e5a3790f009b100e46b4
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-04-30 18:30:15 +02:00
Edward Welbourne f83206229e Apply a common style to the main()s of locale database programs
Include documentation in both, using common phrasing. Take sys.argv as
a parameter, along with sys.stdout and sys.stderr, so that we can
invoke them from python when importing into a python session to debug
or test. Supply the script name to the argument parser as prog, so it
can correctly report it and forward the rest of argv to parse_args().
Remove comments anticipating one of the several calendars we don't yet
support; the existing entries suffice to make clear what shall be
needed when we get round to adding more.

Change-Id: I2cebd385679e3c84d4ccf899e60091ac823ad10d
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-04-26 07:36:16 +02:00
Edward Welbourne 065548e7b4 Modernise testlocales/ program and make it compile
After several years unused, it had bit-rotted to the point of not
compiling and failing an assertion. It also appears to have always had
a bad free() on exit, due to passing the address of a static object to
a function that took ownership and later deleted it.

Change-Id: I91856258c3fedf820bf151b5d205d257876a8e13
Reviewed-by: Jason McDonald <macadder1@gmail.com>
2024-04-26 07:36:16 +02:00
Edward Welbourne 1d48bf34db Automate updating of list of locales for testlocales
This old test program has bitrotted due to not being autogenerated as
part of CLDR updates. Amend qlocalexml2cpp.py to regenerate it and do
such an update. It was still using Qt5's QLocale enum numeric values,
many of which have changed in Qt6. Actually fixing the code so that it
compiles and runs can wait for a later commit.

Inspired by a patch supplied by Kizito Birabwa.

Task-number: QTBUG-124200
Change-Id: I33811313976a4860aad6d7b5b88a40c5b111a4fe
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-04-26 05:36:16 +00:00
Edward Welbourne e08ca2c9c8 Fix spacing inconsistencies brought to light by flake8
It has many grumbles about spacing, but at least this code is
currently consistent about its departure from PEP8's spacing rules
(and closer to Qt's) for the present. We can review whether to do a
drastic spacing revolution later.

Change-Id: Ife4e8a5b02b63434bd9c7ac7ba4cbc11b6311f9f
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-04-23 20:51:19 +02:00
Edward Welbourne cf0ebc9ad3 Fix typo in doc comment for QLocaleXmlWriter.close()
Change-Id: I128ed5e0ebd01a7ed1f3a3049d2b63f1df042562
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-04-22 18:56:20 +02:00
Edward Welbourne f2a2379de8 Use dict comprehensions more in cldr.py and qlocalexml.py
They're a bit more readable than calling dict on a generator.

Change-Id: I3177e31b1f617b80d1cf5d5f83df7036fc0c4c01
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-04-22 18:56:20 +02:00
Edward Welbourne d935a89d25 Tweak the message for variants
Although the code does not, in fact, know about them, it's more
pertinent to say that they're unsupported than to say that the variant
in question is unknown.

Change-Id: I411d792dc91f2d7af58a4b7919c952a005b3417e
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-04-22 17:22:12 +02:00
Edward Welbourne dd56558ecd Improve fidelity of approximation to CLDR zone representations
I neglected to update the CLDR dateconverter code when I expanded the
range of forms we support for display of a timezone. Even that
expanded range doesn't cover all the cases CLDR does, but we can at
least approximate each of CLDR's options by the closest we do support.
Make matching changes to how the Darwin backend for the system locale
maps its ICU-derived formats to ours.

This in practice changes all locales previously using t (abbreviation)
as zone format to use tttt (IANA ID) instead. Test data updated to
match.

[ChangeLog][QtCore][QLocale] Date-time formats now more faithfully
follow the CLDR data in handling timezones. In most cases this means
the IANA ID is used in place of the abbreviation.

Change-Id: I0276843085839ba9a7855a78922cffe285174643
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-04-22 11:58:25 +02:00
Edward Welbourne b87db4296c Correct handling of 'u' in CLDR date format strings
It explicitly excludes having a two-digit special case like 'yy'.
Correct that in qlocale_mac.mm, add support in dateconverter.py
No current locale actually uses the 'u' format, so this makes no
change to data.

Change-Id: I16dfed2d3a7d2054b4b86f9a246bff297df9fc0a
Reviewed-by: Dennis Oberst <dennis.oberst@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-04-19 13:39:41 +02:00
Edward Welbourne 42e4e1816a Fix handling of am/pm indicators in mapping from CLDR to Qt formats
Both qlocale_mac.mm and dateconverter.py were mapping the CLDR am/pm
indicator, 'a', to the Qt format token 'AP', forcing the indicator to
uppercase. The LDML spec [0] says:

    May be upper or lowercase depending on the locale and other
    options.

[0] https://www.unicode.org/reports/tr35/tr35-68/tr35-dates.html#Date_Field_Symbol_Table
We don't support the "other options" mentioned, but we can at least
(since 6.3) preserve the the locale-appropriate case, instead of
forcing upper-case. As such, this change is a follow-up to
commit 4641ff0f6a

Changes locale data, as expected, to use "Ap" in place of "AP" in
various formats in the time_format_data[] array.

[ChangeLog][QtCore][QLocale] Where CLDR specifies an am/pm indicator,
the case of the CLDR-supplied indicator is used, where previously
QLocale forced it to upper-case.

Change-Id: Iee7d55e6f3c78372659668b9798c8e24a1fa8982
Reviewed-by: Konstantin Ritt <ritt.ks@gmail.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-04-19 13:39:41 +02:00
Edward Welbourne d778d17b9f Cope with CLDR's "day period" format specifiers
The LDML spec includes a 'b' pattern character which is like the 'a'
pattern, for AM and PM, but would rather use noon and midnight
indicators for those specific times. We don't support those and using
am/pm will be right enough of the time to be better than simply
discarding this option, if it ever gets used (which it currently
isn't), so treat as an alias for 'a'. No locale in CLDR currently uses
this.

CLDR also has a 'B' specifiers for "flexible day periods", including
things like "at night" and "in the day". At present only zh_Hant uses
'B'. As a result, this change only affects zh_Hant's formats for time
and datetime, which only zh_Hant_TW uses - zh_Hant_HK overrides them
to use am/pm markers and zh_Hant_MO inherits that from
zh_Hant_HK. Based on this and user feed-back, I've opted to treat 'B'
as another synonym of 'a'.

This removes an entry from the time_format_data[] table (it happened
to occupy one whole twelve-character row), causing many other locales'
offsets into that table to be shifted by 12. Only zh_Hant_TW has an
actual change to which entry in the table it uses.
Added a test-case.

[ChangeLog][QtCore][QLocale] CLDR's 'B' (flexible day period, e.g. "at
night" &c.) field, not currently supported, is now handled as a
synonym for the AM/PM field 'a', instead of leaving the B as literal
text. Only affects zh_TW at present.

Fixes: QTBUG-123872
Change-Id: I6ba008c0a048190bf7af8c7df7629a885b05804f
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-04-19 13:39:40 +02:00
Edward Welbourne ea806fa3f1 Rewrite CLDR-ingestion's date-time format conversion
Rework the somewhat ad-hoc handling of format blocks. Instead of
converting one character at a time, then coming back to map contiguous
chunks of various lengths to Qt's best match, use the first
non-separator character to select a function that looks ahead to see
what to consume with it. Quoted text can be handled the same way, with
a look-ahead. This potentially allows for more flexible parsing in
future.

In the process, matching qlocale_mac.mm, treat all unquoted letters as
reserved. The LDML spec says:

  Currently, A..Z and a..z are reserved for use as pattern characters
  (unless they are quoted, see next item).

and its description of literal text explcitly says these reserved
characters are not to be understood as literals. Document the letters
we do know about as unsupported pattern characters, but don't do
anything specific to handle them. This transiently changes zh_TW's
"Bh" hour fields to plain "h" but an imminent commit will change that
again and there is no other change to data, so the locale data is not
regenerated in this commit, to save churn.

This makes the parsing front-end function more straightforward and
makes it easier to document the quirks of the different format letters
and the impedance mismatches between CLDR's and Qt's. In the process,
recognize C, like j and J, as special magic to ignore and harmonize
with what qlocale_mac.cpp's macToQtFormat() does, where it's right and
dateconverter.py differed. Document the need to stay in sync with this
last.

Task-number: QTBUG-123872
Change-Id: I490d395b37751c9b8d6f3ee5ed4edbc0d405db5b
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-04-19 13:39:40 +02:00
Edward Welbourne 43cb4136a9 Move LocaleScanner's INHERIT check from find upstream to __find
When digesting CLDR v44.1's github form, some data (e.g. pt_BR's
language endonym) were None that had perfectly sensible values in the
zip-file form.  Letting __find() yield INHERIT entries lead to find()
sometimes returning None, where __find() should have tried harder or
raised an Error.

This further amends commit bcdd51cfae
(after commit 0f770b0b34 isolated its
magic value).

Pick-to: 6.7
Task-number: QTBUG-115158
Change-Id: I1af92a687cd50b8fd026c25f068c804a3516ef95
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-04-19 13:39:40 +02:00
Edward Welbourne a643a956d4 Rework enumdata.py's comments
Turn the large comment at the start into a doc-string and add some
more details to it. Fix the Ivory Coast comment's indent and a typo in
it.

Change-Id: I36b4e5094d3c3d5c5b91809424b424bcac5daafa
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
2024-03-18 14:59:26 +01:00
Edward Welbourne 693bf76306 Minor tidy-up of CldrAccess.__enumMap: revise comment, modernize code
A comment dated from when variables misleadingly named language_list,
script_list and country_list actually held mappings not lists; they've
been renamed to s/list/map/ a while back, so rephrase.

Use a dict-comprehension rather than the somewhat lisp-ier invocation
of the dict constructor on an iterator over pairs.

Change-Id: Ibcb97122434122dbb1dcb0f621aae02b25a4e1fa
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-02-13 15:58:43 +01:00
Edward Welbourne b0f4bd7f23 Move QTimeZone's CLDR-derived data into a namespace
Introduce namespace QtTimeZoneCldr instead of having a Q prefix on
each class name used for the data.

Change-Id: Icb22a91340b67f9cc93173b77374a70f69f81bbe
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
2024-02-08 15:40:04 +01:00
Edward Welbourne a736e613fc Document LocaleScanner's constructor
I needed to know in order to make recent changes.
Save the need to work it out again next time.

Change-Id: Ibc606cbe2e6af16e6820fd753a643331a03cdfb3
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2024-02-08 15:40:04 +01:00
Edward Welbourne 063026cc50 Update QLocale and calendar data to CLDR v44.1
(This turns out to be identical to v44, for our purposes.)

The CLDR license has been revised at v44 to "UNICODE LICENSE V3",
which is now included (as LICENSES/UNICODE-3.0.txt) in addition to the
old license (still in use, presumably, by UCD - at least until its
next update). Some new QLocale::Language entries are needed. There is
no change to the time-zone data.

Some tests needed changes:
* Various Arabic locales now use U+0623 (Arabic letter aleph with
  hamza above) in exponent separator, replacing plain U+0627 (Arabic
  letter aleph); it is still followed by U+0633 (Arabic letter seen).
* Where likely sub-tags used to fill in world, 001, as territory for a
  language, they now (e.g. for Prussian and Yiddish) give specific
  countries.
* Tamil locales now have something of a mix of inherited and localized
  forms for AM/PM, which looks a lot like a mistake in CLDR.
* New likely sub-tag rules fix ctor(und_US) and ctor(und_GB), which
  previously failed.

[ChangeLog][Third-Party Code] Updated QLocale's data extracted from
the Unicode Common Locale Data Repository (CLDR) to v44.1. The license
changed to Unicode License V3.

Pick-to: 6.7 6.6 6.5
Fixes: QTBUG-121485
Task-number: QTBUG-121325
Change-Id: Ide1a68016129526d7a5aa3fc67f1a674858696bc
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2024-02-02 08:26:03 +01:00
Edward Welbourne 4b2eb26bf9 Fix ordering of Windows timezones
The list is meant to be sorted in increasing order, requiring
"<anything> (Mexico)" to appear after "<anything>" but in two out of
four cases such pairs were in the wrong order. China sorts after
Chatham Island and lexical sorting of numbers doesn't match sorting by
numeric value.

Assert the expected ordering. (The more important check needs a
QBAV::compare(), which isn't constexpr, so we can't static_assert.)
Later commits shall use binary chop exploiting this ordering. The
assertion failed without the rest of this change.

Also improve the comments describing the data tables these searches
check and the types of their entries. Some were inaccurate, others
merely unclear. Likewise, comment the sorting expectations in the
python code that generates the tables.

Change-Id: I640a3cca8f820c5fd5939a2fe5feb96b04407335
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2024-02-01 21:50:50 +01:00
Edward Welbourne 0f770b0b34 Move special-case LDML value to a module global
Giving it a symbolic name is clearer (and saves me the need to
duplicate the comment when I add some more references to it).
This amends commit bcdd51cfae

Task-number: QTBUG-115158
Change-Id: I7577e0cde783fcda840009c7aea46934964c6e4c
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-01-29 15:14:43 +01:00
Edward Welbourne c8b70f4e51 ldml.LocaleScanner.__find(): only Error if no matches found
The existing caller returns early on finding a match, so never ran off
the end of the iteration unless there were no matches. I'll soon be
adding a new caller that wants to iterate all matches, so will run off
the end even when there are some. So only raise the Error if we found
nothing.

Task-number: QTBUG-115158
Change-Id: I1cae4674eb5e83c433554c15ecc4441b756f20eb
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-01-29 15:14:43 +01:00
Edward Welbourne 82f8afe6ba Package DOM attributes for Node objects
The Supplement type did the needed mapping (using nodeValue when the
value wasn't a string) and it turns out to be useful to do the same
for the DOM object packaged by a Node, too. Pull out into a helper
function, use dict-comprehension and expose as a method of Node.

Change-Id: Ice6737a54a33372b45cf42152e3fdbf5f2da7ba4
Reviewed-by: Cristian Maureira-Fredes <cristian.maureira-fredes@qt.io>
2024-01-29 15:14:43 +01:00
Edward Welbourne bcdd51cfae Prepare to support taking CLDR data from its github upstream
We've previously used the zip-file form, but that's not been published
for CLDR v44.1 - the advice on the list was to use github
instead. That, however, has ↑↑↑ as a special value for fields, meaning
to inherit from a prent locale. So special-case that value. I have
verified that v44 from the zip file produces identical results to v44
from github, with this minor fix. As it happens v44.1 also produces
identical results.

Pick-to: 6.7 6.5
Change-Id: I6eb0aedda7556753cdc83bb9d76652fbb68dc669
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2024-01-19 15:38:25 +01:00
Edward Welbourne e45d05dfc0 Convert UTC offset table look-ups to binary chop
The table was almost sorted by offset - its UTC entry, with offset 0,
was at the front rather than first among the offset 0 entries. The
lookups in it were being done as if the IDs were in space-joined lists
(as for the IANA IDs in the Windows table), splitting on space,
despite the fact that it had separate entries for different IDs at the
same offset (this only arose for offset 0). So actually massage the
input table in python to combine IDs with the same offset using space,
placing UTC first among the offset 0 entries, and ensure the C++ table
is sorted. Regenerated the CLDR data tables using the updated script.

In the process, fix an off-by-one error in the iteration over
space-joined IDs, where the search only advanced to the space, rather
than to just after it. That wasn't a problem before, but now would be.

Change-Id: Ib49c27bac269b557166fa10738c3e396d58456c0
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2023-11-03 18:27:13 +01:00
Edward Welbourne 2636258b29 Add note on -00:00 as offset suffix and UTC-00:00 as zone
The former is overtly forbidden by ISO 8601 but should be recognized
due to RFC 3339's use of it. The "Serialising Extended Data About
Times and Events Working Group" (sedate WG) has established how to
resolve this, so document that conclusion and note the consequent
inadvisability of using UTC-00:00 as a zone ID (although it should
still be accepted).

Change-Id: Ib9fbbe6765117bfa9a84e726d0e75f7397a4c813
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2023-10-27 20:06:06 +02:00
Christian Ehrlicher 4e8b54eb81 Preparations to deprecate QItemDelegate
QItemDelegate was superseded since Qt4 by QStyledItemDelegate but it
took until Qt6.7 to remove the last occurrences in qtbase.
 - remove unused includes / replace with qabstractitemdelegate.h
 - replace references in the documentation with QStyledItemDelegate
 - adjust the examples and tests to use QStyledItemDelegate

Pick-to: 6.5 6.6
Change-Id: I246755004ce2d01192a726ca0972106c237df0cc
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
2023-10-05 21:08:45 +02:00
Edward Welbourne 2ec8acf611 Change enumdata.py names so comments read more naturally
Now that the "and" is only seen in enumdata.py and comments, we can
s/And/and/ in all the various territory names that used it.

Change-Id: Ic376d5904b6f5ab54931f96230c1dd5b7f357b8d
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-09 17:53:45 +02:00
Edward Welbourne 1ae24f8b50 Use CLDR's names in QLocale::*ToName() for language, script, territory
Various comments need to continue using the enumdata.py names, as they
associate data with particular enum members, but we can now correctly
use the en.xml versions of their names when we report them, rather
than the enum-friendly names we use in the code. Since this now means
the data may stray outside plain ASCII - it'll be UTF-8-encoded - this
implies replacing the QLatin1StringView()s of the code that formerly
read this data with QString::fromUtf8().

Fixes: QTBUG-94460
Change-Id: Id3b08875a46af58c0555c3e303b0e15a19441509
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2023-08-09 17:53:42 +02:00
Edward Welbourne afd7d68244 Revise enumdata.py's names to more closely match CLDR's
We could already use dashes in some, rather than spaces, and now no
longer need to capitalize each word. This changes the *_name_list[]
entries for affected languages to more closely match what CLDR gives
as their names. It also amends various comments. Added tests for the
QLocale::*ToString() functions to cover the entries changed.

Task-number: QTBUG-94460
Change-Id: I0163795cb282881f15a97be00a5311c1936c3a09
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2023-08-09 17:53:36 +02:00
Edward Welbourne 743ceb7cc2 Move enum-name-munging from LocaleHeaderWriter to QLocaleXmlReader
The former needed the latter's .dupes to do the job, so can now just
take a method as a tool to do the job instead, letting .dupes become
private. In the process refine the munging to free enumdata.py from
having to capitalize each word in its names. This will, in due course,
let us use more natural forms in various comments. This causes no
change to generted data.

Update enumdata.py's introduction doc, mainly to reflect this but also
fixing the out-of-date names (old *_list have long been *_map) and
adding some details to other paragraphs.

Task-number: QTBUG-94460
Change-Id: If195b2e94a53a495fc4f1f216bed07a910439fa7
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-09 17:53:26 +02:00
Edward Welbourne e212b3633c Break clashing-names test function out of CldrAccess.__checkEnum()
Moving it makes it easier to document what it's up to and why, while
leaving __checkEnum() easier to read; and I'm going to need it
elsewhere anyway. This makes no difference to generated data.

Task-number: QTBUG-94460
Change-Id: I684375bc926d5d54928fbf5b5e08978528aef487
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-09 17:53:20 +02:00
Edward Welbourne a0c5a7c483 Wrap CLDR-derived time-zone ID tables to within a 100-column margin
Ugly display in code-review bugged me ...

Change-Id: If9243907973cad49f1a5c8d78ff23c14fab2d4b4
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-07 19:51:09 +02:00
Edward Welbourne 40b063cd74 Tweak lookup of en.xml names for languages, scripts and territories
Prefer stand-alone versions of the names when available. This saves
the need for a Han-specific kludge in the check for discrepancies
between our enum names and the en.xml names. Causes no change to
generated locale data.

Pick-to: 6.6 6.5
Change-Id: I162f3107d6ffc1f8b893b206e0b78b61cf7254f6
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-03 19:16:27 +02:00
Edward Welbourne c5ef6e1e68 Wrap QLocale's CLDR-derived string data tables to a 100-column margin
The string data tables have a mix of digit-length tokens, up to
four-digit hex; we can fit 12 of those per line within our margins.
Leave the one-row-per-locale tables as they are, though, despite long
lines.

Change-Id: I655fddecf24133c26d16187b7a5a8fbc25553e07
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-03 19:16:27 +02:00
Edward Welbourne 69a0cec4d0 Canonicalize space in lists of IANA time-zones
In the Windows zone-ID code, we tokenize() a text extracted from CLDR
data. However, a leading or trailing space (or a repeated internal
space) would then give an empty "IANA ID" for us to match, causing the
empty ID to be mapped to the Windows ID for the entry with the
superfluous space. This was uncovered by an entry with a trailing
space in CLDR v43's data.

Canonicalize spacing in the IANA ID lists extracted from CLDR so as to
ensure this doesn't happen. (We could pass Qt::SkipEmptyParts to the
tokenize() call, but fixing the issue when generating the data is
cheaper and more robust than fixing it at run-time every time it's
consulted.)

Task-number: QTBUG-111550
Change-Id: Ib3883419558d6574141e9ab0bc929ade2d73e020
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-02 09:38:14 +02:00
Edward Welbourne 5db5d3e4b1 Add new languages and a script for CLDR v43
Also add a comment to check the locales new additions enable do have
substantial data. Some of those added in the past are more or less
stubs, for all that they're officially present.

Task-number: QTBUG-111550
Change-Id: I04d46ee96303ecec56c056a0deff6a9457b863e9
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Mate Barany <mate.barany@qt.io>
2023-08-01 15:36:31 +02:00
Edward Welbourne 8a762c6f0f Cope with CLDR data conflict decimal == group
The digit-grouping and fractional-part separators need to be distinct
for parsing to be able to distinguish between two thousand and one vs
two and one thousandth. Thakfully ldml.py asserted this, so caught the
glitch in CLDR v43's data where mn_Mong_MN over-rode mn's decimal, but
not group, and thereby clashed with group. Fortunately the over-ride
is marked as draft="contributed" so we can back out of the collision
and limit the selection to draft="approved" values (but only when
there *is* such a conflict, as plenty of locales have (compatible)
draft data), thereby ignoring the conflicting contribution.

Brought to the attention of cldr-users at:
https://groups.google.com/a/unicode.org/g/cldr-users/c/6kW9kC6fz3g
hopefully that'll lead to a saner resolution at v44.

Task-number: QTBUG-111550
Change-Id: I1332486e60481cb4494446c0c87d89d74bd317d4
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-01 15:36:20 +02:00
Edward Welbourne 615047e98f Ignore parentLocales nodes with component="..." attributes
From CLDR v43, "The parentLocale elements now have an optional
component attribute, with a value of segmentations or
collations. These should be used for inheritance for those respective
elements." Since we aren't extracting collation or segmentation data
for the present, omit these elements from the scan for parentLocale
information.

Task-number: QTBUG-111550
Change-Id: I42871929f539c1852471812801953f2fc8be0e8a
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-01 15:36:06 +02:00
Edward Welbourne 37c5a9f20b Fix typos in QLocaleXmlWriter
The script and territory to exclude from reports about unused ones
were swapped, so we excluded a territory from the script list (which
didn't contain it anyway) and vice versa.

TheTest for whether to report used the non-existend .territories
attribute by mistake for .__territories

Change-Id: I29e9d9f8f34883d7c3a5ac15470d9e7a0366e3db
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-01 15:35:57 +02:00
Edward Welbourne 4e23da9086 Correct enumdata.py's new-at-v42 entries to match generated data
Amends commit 9a8b9473d5 - apparently
the enumdata.py entries were tidied up after the data had been
generated, leading to them being inconsistent (and I missed that in
review). That, in turn, meant the next update would have changed the
public API enum members, backwards-incompatibly; so make enumdata.py
consistent with the released public API. We'll be tidying the order up
at Qt 7 in any case.

Task-number: QTBUG-110333
Change-Id: I3eed2924ce8b69deb552e923d9b0dc142c5f3a65
Reviewed-by: Konrad Kujawa <konrad.kujawa@qt.io>
Reviewed-by: Mate Barany <mate.barany@qt.io>
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-01 15:35:48 +02:00
Mate Barany 3dcd6b7ec9 Pack languageCodeList tighter
Pack some of the arrays that contain locale data more tightly. The
AlphaCode struct is a char[4] but always holds only [a-z]{,3} which
could be fit into 16 bits, halving the size of an AlphaCode struct.

With the new constructor the initialization of the AlphaCode struct
also changes - modify qlocalexml2cpp.py to reflect this change and
regenerate the languageCodeList.

Fixes: QTBUG-105050
Change-Id: I2b1e93ab7cc3f2d667bf67b45769b74a15211931
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2023-03-15 19:48:30 +01:00
Edward Welbourne 277f3345f2 Record a recent discovery: Suzhou isn't hanidec
Revise a comment in ldml.py about Suzhou "digits", since it turns out
they aren't the same as hanidec, which is far from contiguous.

Change-Id: Ia3947dbc5a927772026e55fe197c8ebce2540da2
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-02-28 20:14:18 +01:00
Mate Barany 9a8b9473d5 Update CLDR to v42
New languages (and one local for each) added with v42
- Haryanvi
- Moksha
- Northern Frisian
- Obolo
- Pijin
- Rajasthani
- Toki Pona

It also appears that Canada has changed its date format. Modify the
relevant test case to reflect this change.

Task-number: QTBUG-110333
Pick-to: 6.5
Change-Id: Ia8975c2866cd54c9e565543d05bacd52f4987909
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2023-02-07 19:04:11 +01:00
Edward Welbourne 0de7e7a066 LocaleDB: Make passing qtbase root dir on command-lines optional
We can easily enough obtain the root of the present source tree using
the value of __file__, so might as well do so.

Change-Id: If14773ac1127278b6018a090c0b376437b9c6eec
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2022-11-01 19:14:14 +02:00
Edward Welbourne 914857be08 Fix trivial typo in cldr.py doc-string
Change-Id: I24b039f9256adb3dc7808cd04bd621226b1a5ed5
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2022-09-14 17:46:39 +02:00
Yuhang Zhao 42fe677584 Core: make CLDR data constexpr
Task-number: QTBUG-100485
Pick-to: 6.3 6.2
Change-Id: Ib8c5160ca0994662a5fdc2293dc734c1bdcac4f2
Reviewed-by: Marc Mutz <marc.mutz@qt.io>
2022-05-27 09:33:28 +08:00
Lucie Gérard 05fc3aef53 Use SPDX license identifiers
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.

Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
2022-05-16 16:37:38 +02:00
Ievgenii Meshcheriakov 4f53c703e4 QLocale: Extend support for language codes
This commit extends functionality for QLocale::codeToLanguage()
and QLocale::languageToCode() by adding an additional argument
that allows selection of the ISO 639 code-set to consider for
those operations.

The following ISO 639 codes are supported:
    * Part 1
    * Part 2 bibliographic
    * Part 2 terminological
    * Part 3

As a result of this change the codeToLanguage() overload without
the additional argument now returns a Language value if it matches
any know code. Previously a valid language was returned only if
the function argument matched the first code defined for that
language from the above list.

[ChangeLog][QtCore][QLocale] Added overloads for codeToLanguage()
and languageToCode() that support specifying which ISO 639 codes
to consider.

Fixes: QTBUG-98129
Change-Id: I4da8a89e2e68a673cf63a621359cded609873fa2
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-12-09 03:45:08 +01:00