qt6-bb10/util/locale_database
Edward Welbourne 06f77ab19c Correct handling of World in mapping MS's zone IDs to IANA ones
The AnyTerritory entries in the zoneDataTable are derived from
territory="ZZ" entries in the upstream CLDR data; the World ones from
territory="001". The latter give the default IANA ID for each MS ID;
the former give an (often legacy) IANA ID for the MS ID that is not
based on geography. Some of these are being removed at CLDR v46.

The documentation said the ZZ entries have "no known territorial
association", hinting that there may be some (unknown) territorial
association; however, CLDR's inclusion of them is as entries with a
known non-territorial association, so revise the phrasing to reflect
this.

Also document that windowsIdToDefaultIanaId() returns empty when
there is no territory-specific value, and callers can use the
territory-neutral call to get a suitable value in that case. (They
may, however, wish to distinguish this case, to treat it differently,
so I decided not to just return that in place of empty in any case.)

The upstream CLDR tables do have entries for territory 001, so we
should report these if asked for World as territory. Amend the
available zone ID lookup and mapping from MS to IANA functions that
take a territory to duly handle World via the default-data that was
derived from 001 data in CLDR, instead of from the territory-varying
table, from which those were effectively filtered out when generating
the two tables. Update docs to mention this handling of World, for
contrast with that of AnyTerritory.
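
The distinction between World and AnyTerritory described above can be sketched in Python (the language of the generator scripts, not the C++ of QTimeZone). Everything here is hypothetical: the table names, the function names and the sample rows are illustrative stand-ins loosely modelled on CLDR's data, not Qt's actual tables or API. Treating the first ID in a territory's list as its default is likewise an assumption of this sketch.

```python
from typing import Optional

# Hypothetical miniature of the two generated tables; the rows are
# illustrative sample data, not copied from Qt or CLDR.
WORLD = "001"          # CLDR's code for the World
ANY_TERRITORY = "ZZ"   # CLDR's code for a non-territorial association

# One IANA ID per MS ID, derived from territory="001" entries.
DEFAULT_IANA = {
    "Pacific Standard Time": "America/Los_Angeles",
}

# Space-joined IANA IDs per (MS ID, territory); the 001 rows are not
# stored here, as they were filtered out into DEFAULT_IANA.
ZONE_DATA = {
    ("Pacific Standard Time", "CA"): "America/Vancouver",
    ("Pacific Standard Time", "US"): "America/Los_Angeles",
    ("Pacific Standard Time", ANY_TERRITORY): "PST8PDT",
}


def default_iana_id(ms_id: str, territory: Optional[str] = None) -> str:
    """Return the default IANA ID for ms_id, or "" if none is known."""
    if territory is None or territory == WORLD:
        # World is answered from the defaults derived from 001 data.
        return DEFAULT_IANA.get(ms_id, "")
    ids = ZONE_DATA.get((ms_id, territory), "")
    # Assumption: the first ID listed for a territory is its default.
    return ids.split()[0] if ids else ""


def iana_id_with_fallback(ms_id: str, territory: str) -> str:
    """Caller-side fallback to the territory-neutral default."""
    return default_iana_id(ms_id, territory) or default_iana_id(ms_id)
```

A caller that needs to tell "no territory-specific value" apart from a real match would check for the empty result from default_iana_id() itself rather than use the fallback helper.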

In the process remove a spurious split-on-space from the MS to default
IANA lookup, asserting there is no space (in a field now stored in the
table for single IANA ID entries, instead of the one for space-joined
lists of them in which it used to be stored, before I noticed it's
always only one ID). There is a matching assertion in the cldr.py code
that extracts the data. Added an assertion to this last, that each
default IANA ID given by CLDR's MS data does in fact also appear as
one of the IANA IDs for at least one territory (potentially ZZ), and
comment in C++ code on why this means we don't need to scan the
windowsDataTable in a few places, where it would just produce
duplicate entries.
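
A minimal sketch of the kind of consistency check described above, written in Python. It is not the code in cldr.py: the function name, the data layout and the sample rows are assumptions made for illustration, as is the choice to look only at territories listed for the same MS ID.

```python
def check_defaults(defaults, by_territory):
    """Assert each default is a single ID listed for some territory.

    defaults maps MS ID -> default IANA ID (hypothetical layout);
    by_territory maps (MS ID, territory) -> space-joined IANA IDs.
    """
    for ms_id, iana in defaults.items():
        # The default must be exactly one ID, so no split is needed.
        assert " " not in iana, f"{ms_id}: {iana!r} is not a single ID"
        # Gather every ID listed for this MS ID, in any territory
        # (including ZZ).
        listed = set()
        for (wid, _territory), ids in by_territory.items():
            if wid == ms_id:
                listed.update(ids.split())
        assert iana in listed, f"{ms_id}: {iana!r} not in any territory"


# Illustrative sample data only.
SAMPLE_DEFAULTS = {"UTC": "Etc/UTC"}
SAMPLE_BY_TERRITORY = {("UTC", "ZZ"): "Etc/UTC Etc/GMT"}
check_defaults(SAMPLE_DEFAULTS, SAMPLE_BY_TERRITORY)
```

When this invariant holds, a scan of the per-territory lists already yields every default ID, which is why a second scan of the defaults would only add duplicates.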

On picking to 6.8, removed the timezone_locale addition, only relevant
on 6.9 and later.

[ChangeLog][QtCore][QTimeZone] Corrected handling of QLocale::World
and clarified in docs how QLocale::AnyTerritory is handled when
QTimeZone selects zones by territory.

Task-number: QTBUG-130877
Change-Id: I861c777c68b0cb73a194138fe23fbff839df49e6
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
(cherry picked from commit e23dc7c420297fb62db9834a17c59bbf5992dad7)
Reviewed-by: Mate Barany <mate.barany@qt.io>
2024-12-04 11:50:36 +01:00
testlocales | Include relevant Unicode Inc. copyright line in generated data files | 2024-08-31 08:56:42 +00:00
README | Add a note to README about encoding errors on windows | 2024-09-18 09:50:58 +00:00
cldr.py | Correct handling of World in mapping MS's zone IDs to IANA ones | 2024-12-04 11:50:36 +01:00
cldr2qlocalexml.py | Add type annotations to CldrReader | 2024-11-13 15:08:05 +01:00
dateconverter.py | Improve fidelity of approximation to CLDR zone representations | 2024-04-22 11:58:25 +02:00
enumdata.py | Update CLDR to v45, adding language Kuvi | 2024-07-17 12:57:30 +02:00
formattags.txt
iso639_3.py | Use SPDX license identifiers | 2022-05-16 16:37:38 +02:00
ldml.py | Add type annotations to LocaleScanner | 2024-10-28 10:02:22 +00:00
localetools.py | Add type annotations to CldrAccess | 2024-11-11 12:57:11 +00:00
qlocalexml.py | Add type annotations to Spacer | 2024-12-03 08:19:33 +00:00
qlocalexml.rnc | Integrate timezone data into the CLDR-via-QLocaleXml pipeline | 2024-06-02 15:25:27 +02:00
qlocalexml2cpp.py | Make static constexpr data tables inline in corelib/t*/q*_data_p.h | 2024-11-13 15:04:00 +01:00
zonedata.py | Simplify UTC offset ID data by computing the offsets | 2024-06-02 15:25:13 +02:00

README

locale_database is used to generate qlocale data from CLDR.

CLDR is the Common Locale Data Repository, a database for localized
data (such as date formats, country names, etc.). It is provided by
the Unicode Consortium.

See cldr2qlocalexml.py for how to run it and qlocalexml2cpp.py for how
to update the locale data tables (principally text/qlocale_data_p.h,
time/q*calendar_data_p.h and time/qtimezone*_data_p.h under
src/corelib/). See enumdata.py and zonedata.py for when and how to
update the data they provide. You will need to pass --no-verify or -n
to git commit for these changes.

NOTE: on Windows it is advisable to set the environment variable
PYTHONUTF8 to 1 before running the scripts to avoid encoding errors.
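
As a rough illustration of the note above, a script can check whether Python's UTF-8 Mode is active before it reads or writes any data. This is a sketch only, not part of the locale_database scripts; the helper name and the warning text are made up for the example.

```python
import sys


def utf8_mode_enabled() -> bool:
    """True if Python's UTF-8 Mode is on (PYTHONUTF8=1 or -X utf8)."""
    return sys.flags.utf8_mode == 1


if __name__ == "__main__":
    # Hypothetical guard: warn on Windows when UTF-8 Mode is off.
    if sys.platform == "win32" and not utf8_mode_enabled():
        print("warning: set PYTHONUTF8=1 to avoid encoding errors",
              file=sys.stderr)
```

Passing encoding="utf-8" explicitly to each open() call is another way to avoid depending on the platform's default encoding.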