25 Commits (dbff2edaa169cf33ce78266fd23d3502dadf4fbd)

Author SHA1 Message Date
Mårten Nordheim dbff2edaa1 Update UCD to Unicode 16.0.0
Unicode 16.0.0 added some new scripts.

There were a few changes to the line break algorithm; most notably,
there are more rules that require more context than before.
While not major, this needed some shuffling and additions in our
implementation to match the new rules.

IDNA test data now disallows the trailing dot/empty root label.
Technically this can be toggled off by an option that controls a few
things, but we don't have options. The test-data format also changed a
little: "" is used to mean the empty string, while a blank segment
means null/no string. Update the parser to read this.
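
A minimal sketch of the field convention described above, in Python
rather than the actual test parser. The function names and the sample
line are hypothetical, and it assumes fields are ';'-separated with an
optional trailing '#' comment:

```python
from typing import List, Optional


def parse_field(raw: str) -> Optional[str]:
    """Decode one ';'-separated segment of a test-data line."""
    text = raw.strip()
    if text == '""':
        return ""    # explicit empty string
    if not text:
        return None  # blank segment: null/no string
    return text


def parse_line(line: str) -> List[Optional[str]]:
    """Split a data line into fields, dropping any trailing '#' comment."""
    content = line.split("#", 1)[0]
    return [parse_field(part) for part in content.split(";")]


# Hypothetical sample line, not taken from the real test data.
print(parse_line('a.b; ""; ; [X4_2] # sample'))
```

Keeping None distinct from "" lets the caller tell "no string given"
apart from "the expected result is the empty string".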

Changes in this cherry-pick:
  - Reran tool to resolve conflicts due to
    emoji-data not being extracted in this branch

[ChangeLog][Third-Party Code] Updated the Unicode Character Database to
UCD revision 34/Unicode 16.

Fixes: QTBUG-132902
Task-number: QTBUG-132851
Pick-to: 6.5
Change-Id: I4569703659f6fd0f20943110a03301c1cf8cc1ed
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
(cherry picked from commit 85899ff181984a1310cd1ad10cdb0824f1ca5118)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
(cherry picked from commit 5985c90d37a096f35b68546f916bec29a218e112)
2025-02-17 14:39:31 +01:00
Lucie Gérard 9fbf346c59 Update license rule to Unicode-3.0
Also remove the now unused license and update the qt_attribution.json

[ChangeLog][Third-Party Code] UCD-generated data files now come under Unicode-3.0

Change-Id: I133b1f20643e29a412053eb08ae4c250d07c561e
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
(cherry picked from commit d0bf0660b17af2545c7566329e4bad621c369fee)
2025-02-17 14:39:31 +01:00
Mate Barany 7bb97b13c5 Update CLDR to v46.1
[ChangeLog][Third-Party Code] Updated CLDR data, used by QLocale, to
v46.1.

Task-number: QTBUG-132851
Change-Id: Id08d9337e11234d0ca428c7e435808be1b044f7c
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
(cherry picked from commit 918566aeddbbf56f8539c44bcd45223d2fbab996)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
(cherry picked from commit eac8f36080881f7b8f41a0e1ed811da6e9f08a2c)
2025-01-24 11:41:40 +00:00
Mate Barany 5fbe185931 Update CLDR to v46
New languages added with v46
- Kara-Kalpak
- Swampy Cree

Several new Chinese-language locales have been added, including one
using Latin script. This invalidated some prior QLocale tests, which
have been adjusted to fit.

Some obsolete time-zone identifiers are now treated as deprecated
aliases. These have lost their AnyTerritory association, implying
changes to QTimeZone tests.

Many redundant likely sub-tag rules for unspecified language have been
dropped, in favor of simpler rules.

[ChangeLog][Third-Party Code] Updated CLDR data, used by QLocale, to
v46.

Task-number: QTBUG-130877
Change-Id: I92cf210422c7759dd829a7ca2f845d20e263d25b
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
(cherry picked from commit e316276b76b9c3768ca4e19a04d03308ef21fe12)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
(cherry picked from commit 9413c19cc1f394bc39a9f46d7d12a71fb42c8d1a)
2025-01-14 11:15:42 +01:00
Alexandru Croitor e2ba5d9053 CMake: Add PURL and CPE info to 3rd party attribution files
The change adds CPE and PURL keys to all qt_attribution.json files in
the repo.

If no sensible CPE or PURL exists, a "Comment" field is added
with the text "no relevant CPE or PURL found". If only one of them
does not exist, it is written as such in the Comment field.

This allows filtering for files that haven't had the information added
yet vs those that were looked up but no relevant information was
found.
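
The filtering this enables could be sketched as follows. This is an
illustration only: it assumes the keys are spelled "CPE", "PURL" and
"Comment", and the function name and state labels are hypothetical.

```python
def lookup_state(entry: dict) -> str:
    """Classify one attribution entry by its CPE/PURL lookup status."""
    if entry.get("CPE") or entry.get("PURL"):
        return "identified"
    comment = entry.get("Comment") or ""
    if "no relevant CPE or PURL found" in comment:
        return "looked up, nothing found"
    return "not yet looked up"


# Hypothetical entries, reduced to the fields that matter here.
entries = [
    {"Id": "a", "PURL": "pkg:generic/a@1.0"},
    {"Id": "b", "Comment": "no relevant CPE or PURL found"},
    {"Id": "c"},
]
for entry in entries:
    print(entry["Id"], lookup_state(entry))
```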

For sources that are not hosted on github, a generic PURL is used with
a download_url fragment pointing either to the exact location where
the sources can be downloaded, or to the homepage of the project.
The generic package name was chosen based on the 'Id' key of the
attribution entry where it was present, and is not authoritative.

For PURL github packages, the 'git tag' name was specified in the
'version' part of the PURL, rather than the 'version number', because
SBOM processing tooling handles that better than the version number.
For example, for the freetype package, we specify the tag name
'VER-2-13-3' rather than the version number '2.13.3'.
We might revisit this in the future.
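
A rough sketch of how the two PURL shapes could be assembled. The
helper names are hypothetical, the github owner/repository and the
example.org URL are placeholders, and it assumes download_url is
carried as a percent-encoded PURL qualifier:

```python
from urllib.parse import quote


def github_purl(owner: str, repo: str, tag: str) -> str:
    """PURL for a github-hosted source, using the git tag as version."""
    return f"pkg:github/{owner}/{repo}@{quote(tag, safe='')}"


def generic_purl(name: str, version: str, download_url: str) -> str:
    """Generic PURL that records where the sources can be fetched."""
    return (
        f"pkg:generic/{name}@{quote(version, safe='')}"
        f"?download_url={quote(download_url, safe='')}"
    )


# The owner and repository names here are assumptions for illustration.
print(github_purl("freetype", "freetype", "VER-2-13-3"))
print(generic_purl("example", "1.0", "https://example.org/src.tar.gz"))
```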

[ChangeLog][Third-Party Code] Added PURL and CPE information to the
attribution files of 3rd party sources.

Task-number: QTBUG-122899
Task-number: QTBUG-129602
Change-Id: Iad126242cafc3ea0b678c5c36b26f857039b1dbd
Reviewed-by: Alexey Edelev <alexey.edelev@qt.io>
(cherry picked from commit 36dca3c04f759449f74008a3e79021a179b0f35e)
2024-11-15 17:41:43 +01:00
Edward Welbourne 6a0f00ac4e Update CLDR to v45, adding language Kuvi
This was in fact present in v44, but we overlooked it somehow. The new
version also fixes some inconsistencies in the data, that I reported
against v44.1; in particular, Tamil no longer claims to override the
root AM/PM markers (probably because it uses 24-hour time so doesn't
need them).

Add the test-file under util to the list of files containing generated
content.

Conflict at 6.8 resolved by regenerating the data; this only changed
the date of generation, not the data. Then hand-edited the date to
match the picked upstream commit, to avoid future conflicts.

[ChangeLog][Third-Party Code] Updated CLDR data, used by QLocale, to
v45.

Task-number: QTBUG-126060
Pick-to: 6.7 6.5 6.2
Change-Id: I81a5bcca49519b55091fc541de6b73b606661bb4
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
(cherry picked from commit f79548e268a496698d77d0e78365334d0e507212)
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2024-07-17 12:57:30 +02:00
Edward Welbourne d5e40b5e58 Revise UCD-generated data files' SPDX headers
The existing data comes under Unicode-DFS-2016 but future updates
shall come under Unicode-3.0, so update the existing headers with the
former and the generator script with the latter. Leave a note in the
attribution file about this transitional state and how to resolve it.

Replaced UNICODE_LICENSE.txt from src/corelib/text/ with
LICENSES/Unicode-DFS-2016.txt, as fetched using reuse download.
Git doesn't show this as a rename, but the change only adds some
irrelevant lines about where on the Unicode website the upstream files
(to which we do not apply this license) come from and changes some
spacing.

Pick-to: 6.7 6.5
Fixes: QTBUG-121653
Change-Id: I50c9f4badc77a9aa402af946561aff58ae9e3e7a
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
2024-04-22 15:22:12 +00:00
Kai Köhne 39c4c868a4 Use canonical capitalization of Unicode-3.0 SPDX tag
The SPDX database lists the license as 'Unicode-3.0', and 'Unicode
License v3'. Now, the SPDX standard actually says that

   License identifiers (including license exception identifiers) used
   in SPDX documents or source code files should be matched in a case-
   insensitive manner.

But the website at https://spdx.org/licenses/ doesn't treat it this way,
so the link we generate out of the identifier actually gives a 404. So
it's just easier to use the 'original' capitalization.
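
As an illustration of the problem: a tool that accepts identifiers
case-insensitively can still map them to the listed capitalization
before building a link. The table, the function names and the URL
pattern below are assumptions for this sketch, not the actual
generator code.

```python
# Canonical spellings as listed by SPDX; only the two relevant here.
CANONICAL_IDS = ("Unicode-3.0", "Unicode-DFS-2016")


def canonical_spdx_id(identifier: str) -> str:
    """Return the listed capitalization, matching case-insensitively."""
    by_folded = {known.casefold(): known for known in CANONICAL_IDS}
    # Unknown identifiers are passed through unchanged.
    return by_folded.get(identifier.casefold(), identifier)


def license_url(identifier: str) -> str:
    """Build a link from the canonical form (URL pattern is assumed)."""
    return f"https://spdx.org/licenses/{canonical_spdx_id(identifier)}.html"


print(license_url("UNICODE-3.0"))
```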

Amends 063026cc50

Pick-to: 6.5 6.6 6.7
Change-Id: I826077a914721b7b9499ad62c08fdf20be94e88d
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2024-03-13 14:43:10 +00:00
Edward Welbourne 063026cc50 Update QLocale and calendar data to CLDR v44.1
(This turns out to be identical to v44, for our purposes.)

The CLDR license has been revised at v44 to "UNICODE LICENSE V3",
which is now included (as LICENSES/UNICODE-3.0.txt) in addition to the
old license (still in use, presumably, by UCD - at least until its
next update). Some new QLocale::Language entries are needed. There is
no change to the time-zone data.

Some tests needed changes:
* Various Arabic locales now use U+0623 (Arabic letter aleph with
  hamza above) in exponent separator, replacing plain U+0627 (Arabic
  letter aleph); it is still followed by U+0633 (Arabic letter seen).
* Where likely sub-tags used to fill in world, 001, as territory for a
  language, they now (e.g. for Prussian and Yiddish) give specific
  countries.
* Tamil locales now have something of a mix of inherited and localized
  forms for AM/PM, which looks a lot like a mistake in CLDR.
* New likely sub-tag rules fix ctor(und_US) and ctor(und_GB), which
  previously failed.

[ChangeLog][Third-Party Code] Updated QLocale's data extracted from
the Unicode Common Locale Data Repository (CLDR) to v44.1. The license
changed to Unicode License V3.

Pick-to: 6.7 6.6 6.5
Fixes: QTBUG-121485
Task-number: QTBUG-121325
Change-Id: Ide1a68016129526d7a5aa3fc67f1a674858696bc
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2024-02-02 08:26:03 +01:00
Edward Welbourne 196a1acffc Update CLDR version in qt_attribution.json
Amends commit 9237908327 - I neglected
to update the attribution. The license hasn't changed.

Pick-to: 6.6 6.5
Change-Id: Ie1e281bf08ac31506e152fc0fa17c8fae6b7ac98
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
2023-08-22 16:26:02 +02:00
Edward Welbourne 970841235b Split multi-file Files entries in qt_attribution.json as lists
This is now the official format for Files, when there's more than one,
rather than using space-joined lists.
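
A sketch of the conversion, assuming the old entries hold a single
space-joined string under the "Files" key and that no path contains a
space. The function name and the sample paths are hypothetical.

```python
def split_files_entry(entry: dict) -> dict:
    """Turn a space-joined multi-file "Files" string into a list."""
    files = entry.get("Files")
    if isinstance(files, str):
        paths = files.split()
        if len(paths) > 1:
            # Return a copy so the original entry is left untouched.
            return {**entry, "Files": paths}
    return entry


# Hypothetical entry with two made-up paths.
old = {"Id": "example", "Files": "a_data_p.h b_data_p.h"}
print(split_files_entry(old))
```

Single-file entries and entries that are already lists pass through
unchanged.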

Pick-to: 6.5
Change-Id: I4a6247fff0ece8ece2944178af38894fd5a2e1e2
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
2023-04-20 14:17:26 +01:00
Edward Welbourne ce8839e056 Deploy Comment fields in qt_attribution.json files
Replace the old abuse of other fields as comments, to be overwritten
by a later setting to a proper value, with actual Comment fields, now
that we have them.

Added a new comment to the valgrind files to say where they come from
in the upstream.

Pick-to: 6.5
Change-Id: I2edcfa2949fa9e59f3f67d3e578d8e5009854cf6
Reviewed-by: Joerg Bornemann <joerg.bornemann@qt.io>
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
2023-04-20 14:17:26 +01:00
Edward Welbourne 1bf1aec790 Update the list of CLDR-based files
The corelib/text/qt_attribution.json didn't mention the
time/q*calendar_data_p.h files which are also generated from CLDR.

Pick-to: 6.5 6.4 6.2 5.15
Change-Id: I768555d4623204245006897c45af58635761bfa1
Reviewed-by: Kai Köhne <kai.koehne@qt.io>
2023-04-20 14:17:26 +01:00
Mate Barany 9a8b9473d5 Update CLDR to v42
New languages (and one locale for each) added with v42
- Haryanvi
- Moksha
- Northern Frisian
- Obolo
- Pijin
- Rajasthani
- Toki Pona

It also appears that Canada has changed its date format. Modify the
relevant test case to reflect this change.

Task-number: QTBUG-110333
Pick-to: 6.5
Change-Id: Ia8975c2866cd54c9e565543d05bacd52f4987909
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2023-02-07 19:04:11 +01:00
Kai Köhne cbd5bc0b58 Doc: Fix paths for Files property in qt_attribution.json files
qtattributionsscanner expects file paths to be separated by a space.

Pick-to: 6.2 6.4
Change-Id: I4c9dfea0f086fc9631cb06f40e2d3cab0a32ca4e
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
2022-12-08 15:14:17 +01:00
Ievgenii Meshcheriakov c4e550703c Update UCD to Revision 30
This corresponds to Unicode version 15.0.0.

Added the following scripts:

    * Kawi
    * Nag Mundari

Full support of these scripts requires harfbuzz version 5.2.0,
which adds support for Unicode 15.0:

    https://github.com/harfbuzz/harfbuzz/releases/tag/5.2.0

Fixes: QTBUG-106810
Change-Id: Ib06c526e49b0f01ef9f21123bcf875c6b19f2601
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2022-10-11 14:10:59 +00:00
Ievgenii Meshcheriakov 6b739f836b Fix CLDR version in qt_attribution.json
CLDR was updated to version 41 in 59860685a1
but this file was not updated.

Task-number: QTBUG-103663
Change-Id: I163a4a3f6ce16d611c013656fa569be01880e72c
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
2022-05-23 20:28:50 +00:00
Ievgenii Meshcheriakov 96a03533f9 Update CLDR-derived data to newly-released v40
Update tst_qlocale to take into account "narrow" day representation
change for Russian locales. This version of CLDR changes narrow forms
to one letter. Previously those forms were identical to short forms
(two letter). The new representation is consistent with other languages
and so does not appear to be a bug.

Fixes: QTBUG-94358
Pick-to: 6.2
Change-Id: I9724c281a250685da8232e5c05c9c375a8c79253
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-11-10 00:36:12 +01:00
Ievgenii Meshcheriakov 826fc8c9bd Update UCD to Revision 28
This corresponds to Unicode version 14.0.0.

Added the following scripts:

    * CyproMinoan
    * OldUyghur
    * Tangsa
    * Toto
    * Vithkuqi

Full support of these scripts requires harfbuzz version 3.0.0,
which adds support for Unicode 14.0:

    https://github.com/harfbuzz/harfbuzz/releases/tag/3.0.0

With this release, 10 test cases in tst_qurluts46 were fixed; one
additional test case is failing in tst_qtextboundaryfinder and
is commented out. In total 62 line break test cases and 44 word
break test cases are failing.

A comment in src/corelib/text/qt_attribution.json was updated to
include the URL of the page containing UCD version number.

Fixes: QTBUG-94359
Change-Id: Iefc9ff13f3df279f91cbdb1246d56f75b20ecb35
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-10-18 16:45:10 +00:00
Mårten Nordheim 20a31b1a39 Update CLDR qt_attribution.json
We updated to v39 in 6235893d54

Task-number: QTBUG-94410
Pick-to: 6.2 6.1 5.15
Change-Id: I73d539d677c9066dc5ceb6b4fc65fb544f39ac7f
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
2021-06-14 11:28:38 +02:00
Edward Welbourne 246ba8ca61 Update CLDR to v38
Fresh on the heels of our update to v37, they've released a new version.
No new languages to complicate life, fortunately.

Updated license (year range) and attribution. One test also needed an
update: Catalan's long time format now parenthesizes the zone.

Task-number: QTBUG-87925
Pick-to: 5.15
Change-Id: I54fb9b7f084b5cd019c983c1e3862dc03865a272
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
2020-11-08 13:01:29 +01:00
Edward Welbourne 54f8be6cc0 Update UCD to Revision 26
Include WordBreakTest.html, since a test uses sample strings from it,
albeit without actually reading the file.

Had to comment out more of the new tests, as at Revision 24, pending
an update to harfbuzz and the text boundary detection code.

Task-number: QTBUG-79631
Task-number: QTBUG-79418
Task-number: QTBUG-82747
Change-Id: I0082294b09d67ffdc6a9b5c15acf77ad3b86f65f
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2020-03-14 11:26:59 +01:00
Edward Welbourne c3eb521a0f Update UCD data to Unicode 12.1.0's Revision 24
Had to teach the update program to accept category Lm as for
Joining_Transparent, for the sake of a new ArabicShaping.txt entry.
Added three new Unicode versions, several new scripts and a new
word-break class.

Updated UCD's test data for tst_QTextBoundaryFinder.  This left 57
tests failing; I have commented out the data rows for those tests,
pending someone with more knowledge addressing this.

Task-number: QTBUG-79631
Task-number: QTBUG-79418
Change-Id: Ic33d3b3551195d47a84d98e84020f57a68f0b201
Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
2019-10-30 17:38:02 +01:00
Edward Welbourne 43f64b4dc8 Update CLDR to v36
Released on October 4th.
Adds Windows names for two time zones, Qyzylorda and Volgograd.
Added languages Chickasaw (cic), Muscogee (mus) and Silesian (szl).

Norwegian number formatting has flipped back to using colon rather
than dot as time separator; it's flipped back and forth over the last
several CLDR releases.  The dot form is present as a variant, the
colon form was long given as the normal pattern, then went away; but
now it's back as a contributed draft and that's what we pick up.

The MS-Win time-zone ID script was iterating a dict, causing random
reshuffling when new entries are added. Fixed that by doing the
critical iteration in sorted order.
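
The idea, sketched with a hypothetical helper (not the actual script):
iterating the keys in sorted order makes the output independent of the
order in which entries were added to the dict.

```python
def emit_rows(windows_ids: dict) -> list:
    """Emit one row per entry, always in sorted key order."""
    return [f"{key} -> {windows_ids[key]}" for key in sorted(windows_ids)]


# The same two entries, inserted in different orders.
first = {
    "Volgograd Standard Time": "Europe/Volgograd",
    "Qyzylorda Standard Time": "Asia/Qyzylorda",
}
second = {
    "Qyzylorda Standard Time": "Asia/Qyzylorda",
    "Volgograd Standard Time": "Europe/Volgograd",
}
print(emit_rows(first) == emit_rows(second))
```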

Omitted locales ccp_BD and ccp_IN due to QTBUG-69324.

Task-number: QTBUG-79418
Change-Id: I43869ee1810ecc1fe876523947ddcbcddf4e550a
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
2019-10-25 11:44:48 +02:00
Edward Welbourne a9aa206b7b Move text-related code out of corelib/tools/ to corelib/text/
This includes byte array, string, char, unicode, locale, collation and
regular expressions.

Change-Id: I8b125fa52c8c513eb57a0f1298b91910e5a0d786
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
2019-07-10 17:05:30 +02:00