isHexDigit, isOctalDigit, isAsciiDigit, isAsciiLower, isAsciiUpper,
isAsciiLetterOrNumber.
This de-duplicates some code throughout.
Rename two local lambdas that were called "isAsciiLetterOrNumber" so that
they don't conflict with the function in QtMiscUtils.
Change-Id: I5b631f95b9f109136d19515f7e20b8e2fbca3d43
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
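For the ASCII classification helpers listed at the top of that commit,
a minimal sketch of what such locale-independent predicates look like.
The namespace asciisketch and the choice of char32_t parameters are
assumptions for illustration; the actual QtMiscUtils declarations may
differ.

```cpp
// Hypothetical sketch of US-ASCII-only character predicates. They never
// consult the C locale, so results are identical everywhere.
namespace asciisketch {

constexpr bool isAsciiDigit(char32_t c) noexcept { return c >= U'0' && c <= U'9'; }
constexpr bool isAsciiLower(char32_t c) noexcept { return c >= U'a' && c <= U'z'; }
constexpr bool isAsciiUpper(char32_t c) noexcept { return c >= U'A' && c <= U'Z'; }
constexpr bool isOctalDigit(char32_t c) noexcept { return c >= U'0' && c <= U'7'; }

constexpr bool isHexDigit(char32_t c) noexcept
{
    return isAsciiDigit(c) || (c >= U'a' && c <= U'f') || (c >= U'A' && c <= U'F');
}

constexpr bool isAsciiLetterOrNumber(char32_t c) noexcept
{
    return isAsciiDigit(c) || isAsciiLower(c) || isAsciiUpper(c);
}

} // namespace asciisketch
```

Because the helpers are constexpr and header-only, callers that used to
carry their own local lambdas can share one definition, which is where
the de-duplication comes from.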
With the methods that use helpers from qstring.cpp defined in the
latter.
Change-Id: I11d6b0bfb95efe34e56d33d2ecbfe8f4423a9e6c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Also add optimizations for more string comparisons and add tests and
benchmarks.
[ChangeLog][QtCore][QString] Added UTF-8 case-insensitive comparisons.
Fixes: QTBUG-100235
Change-Id: I7c0809c6d80c00e9a5d0e8ac3ebb045cf7004a30
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This is a semantic patch using ClangTidyTransformator as in
qtbase/df9d882d41b741fef7c5beeddb0abe9d904443d8, but extended to
handle typedefs and accesses through pointers, too:
    const std::string o = "object";

    auto hasTypeIgnoringPointer = [](auto type) { return anyOf(hasType(type), hasType(pointsTo(type))); };

    auto derivedFromAnyOfClasses = [&](ArrayRef<StringRef> classes) {
        auto exprOfDeclaredType = [&](auto decl) {
            return expr(hasTypeIgnoringPointer(hasUnqualifiedDesugaredType(recordType(hasDeclaration(decl))))).bind(o);
        };
        return exprOfDeclaredType(cxxRecordDecl(isSameOrDerivedFrom(hasAnyName(classes))));
    };

    auto renameMethod = [&] (ArrayRef<StringRef> classes,
                             StringRef from, StringRef to) {
        return makeRule(cxxMemberCallExpr(on(derivedFromAnyOfClasses(classes)),
                            callee(cxxMethodDecl(hasName(from), parameterCountIs(0)))),
                        changeTo(cat(access(o, cat(to)), "()")),
                        cat("use '", to, "' instead of '", from, "'"));
    };

    renameMethod(<classes>, "count", "size");
    renameMethod(<classes>, "length", "size");
except that the on() matcher has been replaced by one that doesn't
ignoreParens().
a.k.a. qt-port-to-std-compatible-api V5 with config Scope: 'Container'.
Added two NOLINTNEXTLINEs in tst_qbitarray and tst_qcontiguouscache,
to avoid porting calls that explicitly test count().
Change-Id: Icfb8808c2ff4a30187e9935a51cad26987451c22
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
With the introduction of QAnyStringView, overloading based on UTF-8
and Latin-1 is becoming more common. Often, the two overloads can
share the processing backend, because we're only interested in the
US-ASCII subset of each.
But if they can't, we need a faster way to convert L1 into UTF-8 than
going via UTF-16. This is where the new private API comes in.
Eventually, we should have the converse operation, too, to complete
the set of direct conversions between the possible three
QAnyStringView encodings L1/U8/U16, but this direction is easier to
code (there are no error cases) and more immediately useful, so
provide L1->U8 alone for now.
Change-Id: I3f7e1a9c89979d0eb604cb9e42dedf3d514fca2c
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
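The direct L1->U8 conversion has no error cases because every Latin-1
byte is the Unicode code point of the same value. A minimal sketch of
the technique (latin1ToUtf8 is a hypothetical name, not the new private
Qt API, and it uses std::string instead of Qt containers):

```cpp
#include <string>
#include <string_view>

// Convert Latin-1 (ISO-8859-1) bytes straight to UTF-8, with no UTF-16
// intermediate. US-ASCII bytes are copied; U+0080..U+00FF become two bytes.
std::string latin1ToUtf8(std::string_view latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2); // worst case: every byte is >= 0x80
    for (char ch : latin1) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            // Two-byte UTF-8 form: 110000xx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

For example, latin1ToUtf8("caf\xE9") yields "caf\xC3\xA9". The output
buffer never needs more than twice the input size, so it can be sized
once up front.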
MSVC doesn't like 0x80 being passed as a char, causing a warning:
qstringconverter.cpp(196): warning C4309: 'argument': truncation of constant value
Pick-to: 6.2 6.4
Change-Id: I07ec23f3cb174fb197c3fffd17215b6f83476ebf
Reviewed-by: Lars Knoll <lars@knoll.priv.no>
There are still problems with platform-specific APIs that are 32-bit
only (cf. QTBUG-105105), but this patch finishes the port of the
cross-platform parts of QStringConverter.
None of these changes have a user-visible effect. They just avoid the
Code Smell that int has become since Qt 6.0.
Pick-to: 6.4
Task-number: QTBUG-103531
Change-Id: I267e2e1268a18c130892fa2fd80d1b5dabb3d9b9
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Int variables are a code smell these days, so make the narrowing
conversion (from ptrdiff_t to int) explicit and add a comment.
Pick-to: 6.4 6.3 6.2
Task-number: QTBUG-105105
Change-Id: Ia4e14f1cc132ca36d15e9684bfcb4605d7b9251f
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
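Making the ptrdiff_t-to-int narrowing explicit could look roughly like
the sketch below. platformApiLength() and callPlatformApi() are
hypothetical stand-ins for a 32-bit-only platform call; Qt code would use
qsizetype and Q_ASSERT rather than std::ptrdiff_t and assert().

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical platform API that only accepts 32-bit lengths.
int platformApiLength(const char *, int len) { return len; }

int callPlatformApi(const char *data, std::ptrdiff_t size)
{
    // Narrowing is intentional: the platform API takes an int. This sketch
    // only asserts the precondition; it does not split oversized input.
    assert(size >= 0 && size <= INT_MAX);
    const int len = static_cast<int>(size);
    return platformApiLength(data, len);
}
```

The static_cast plus comment documents that the truncation risk is
known, instead of leaving an implicit conversion for readers (and
compilers) to flag.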
GCC 13 warns:
qstringconverter_p.h:29:6: warning: identifier ‘char8_t’ is a keyword in C++20 [-Wc++20-compat]
29 | enum char8_t : uchar {};
Fix by calling the replacement qchar8_t (and making it a typedef to
char8_t when the latter is available).
Pick-to: 6.4 6.3 6.2
Change-Id: If59a9d55667bf1f5245e3a34189687995b000daa
Reviewed-by: Ville Voutilainen <ville.voutilainen@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
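A sketch of the renamed stand-in, assuming the standard feature-test
macro __cpp_char8_t is used to detect char8_t support (the exact
condition in qstringconverter_p.h may differ, and uchar is spelled
unsigned char here to stay self-contained):

```cpp
// Use the real char8_t when the compiler provides it; otherwise fall back
// to a distinct one-byte enum type. Neither branch declares an identifier
// named char8_t, so GCC's -Wc++20-compat warning no longer triggers.
#ifdef __cpp_char8_t
using qchar8_t = char8_t;
#else
enum qchar8_t : unsigned char {};
#endif

static_assert(sizeof(qchar8_t) == 1, "qchar8_t must be one byte wide");
```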
We already check for __SSE2__, which gets undefined when __SSE2__ is not
set.
Moreover, we want to use the intrinsics without a runtime check there,
so checking for __SSE2__ is the correct thing to do.
Change-Id: I7f8610e2927650b439c3697585234b843e345e4c
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
value() can potentially throw an exception. We know that it doesn't in
this case, but the compiler doesn't know. And our code checker doesn't
know either and generates lots of false positives. Also, without the
exception propagation code the resulting binary is probably smaller.
Coverity-Id: 386110
Coverity-Id: 384314
Coverity-Id: 383835
Coverity-Id: 383784
Pick-to: 6.4
Change-Id: Icdacf8e003fd3a6ac8fd260ed335239a59de3295
Reviewed-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
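Assuming value() here refers to std::optional::value(), the pattern is:
once the optional has been checked, operator* returns the same result,
but unlike value() it never throws std::bad_optional_access (it is
undefined behaviour on an empty optional instead). widthOrDefault() is
a hypothetical example.

```cpp
#include <optional>

int widthOrDefault(std::optional<int> width)
{
    if (!width)
        return 0;
    // return width.value(); // compiler and analysers must assume a throw path
    return *width;           // no throw path: has_value() was checked above
}
```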
Now that QStringConverter can handle non-UTF encodings through ICU,
add a way to get a decoder for arbitrary HTML content.
As opposed to QStringConverter::encodingForHtml(), this method will
try to create a valid string decoder for non-Unicode codecs as well.
Pick-to: 6.4
Change-Id: I343584da1b114396c744f482d9b433c9cedcc511
Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This adds support for additional codecs to QStringConverter when ICU is
available.
We store the converter in the state (d[0]), and its canonical name in
d[1]. We need the name there, as in the clear function we close the
UConverter, and set the pointer to null. Consequently, the actual
conversion functions might need to re-open the converter again. The
advantage of this approach is that clear is used in the destructor of
State, and with this approach we properly clean up the state.
There is, however, a disadvantage: the clear function was so far also
used for resetting the state in QStringConverter::resetState().
Discarding the whole UConverter for that is rather costly. For that
reason, we modify resetState to call a new function, State::reset. For
existing converters, it behaves the same as clear; for the ICU-based
converter, we call the more efficient ucnv_reset. Code compiled against
Qt 6.4 can benefit from this more efficient version; code compiled
against older Qt versions will continue to work, as the conversion
functions can just recreate the converter from the name.
We can distinguish between ICU and non-ICU converters by checking if the
UsesIcu flag is set.
QStringConverter::name is changed to return the name stored in d[1]. The
interface of the ICU converter has a dummy name, so code compiled against
Qt < 6.4 that calls the old name function still gets something back,
namely a message asking the user to recompile.
The function is moved out of line, as we need to check for the private
ICU feature, and want to avoid having that check in the public header.
As the QStringConverter ctor taking a name now can allocate memory, it
can no longer be noexcept. Removing the noexceptness is safe, as it was
only added after Qt 6.3.
Note that we cannot extend the API consuming or returning Encoding, as
we use Encoding values to index into an array of converter interfaces in
inline API.
Further API to support getting an ICU converter for HTML will be added
in a future commit.
Currently, the code depending on ICU is enabled at compile time if ICU
is found. However, in the future it could be moved into a plugin to
avoid a hard dependency on ICU in Core.
[ChangeLog][Corelib][Text] QStringConverter and API using it now
support more text codecs if Qt is compiled with ICU support.
Fixes: QTBUG-103375
Change-Id: I7afb92fc68ef994179ebc7a3aa73beebb1386204
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
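The clear()/reset() split described in that commit can be sketched
without ICU. FakeConverter, openConverter() and ConverterState are
hypothetical stand-ins (not the ICU API and not Qt's State layout): the
state keeps both the live converter and its name, so clear() may
destroy the converter and a later conversion recreates it lazily, while
reset() only rewinds the conversion state.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for an ICU UConverter handle.
struct FakeConverter {
    std::string name;
    int position = 0; // conversion state that reset() clears
};

static int opens = 0; // how often a converter had to be (re)created

FakeConverter *openConverter(const std::string &name) { ++opens; return new FakeConverter{name}; }

struct ConverterState {
    FakeConverter *converter = nullptr; // plays the role of d[0]
    std::string name;                   // plays the role of d[1]

    explicit ConverterState(std::string n) : name(std::move(n)) {}
    ~ConverterState() { clear(); }
    ConverterState(const ConverterState &) = delete;
    ConverterState &operator=(const ConverterState &) = delete;

    // Cheap: keep the converter, drop its state (cf. ucnv_reset()).
    void reset() { if (converter) converter->position = 0; }

    // Full cleanup: destroy the converter; the name survives.
    void clear() { delete converter; converter = nullptr; }

    // Conversion entry point: re-open from the stored name if needed.
    FakeConverter &get()
    {
        if (!converter)
            converter = openConverter(name);
        return *converter;
    }
};
```

Keeping the name next to the handle is what lets old inline code that
only ever calls clear() keep working: the next conversion simply pays
for one extra open.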
The QLocal8Bit implementation assumes that there's at most one
continuation byte -- that is, that all codecs are either Single or
Double Byte Character Sets (SBCS or DBCS). It appears to be the case for
all Windows default codepages, except for CP_UTF8, which is an opt-in
anyway.
Instead of fixing our codec, let's just use the optimized UTF-8
implementation.
[ChangeLog][Windows] Fixed support for using Qt applications with UTF-8
as the system codepage or by enabling that in the application's
manifest.
Discussed-on: https://lists.qt-project.org/pipermail/interest/2022-May/038241.html
Pick-to: 6.2 6.3
Change-Id: I77c8221eb2824c369feffffd16f0912550a98049
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Replace the current license disclaimer in files by
a SPDX-License-Identifier.
Files that have to be modified by hand are modified.
License files are organized under LICENSES directory.
Task-number: QTBUG-67283
Change-Id: Id880c92784c40f3bbde861c0d93f58151c18b9f1
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Reviewed-by: Jörg Bornemann <joerg.bornemann@qt.io>
- Replaced QLatin1String with QLatin1StringView in QString/QLatin1String
APIs and docs (except for QLatin1String class declaration and ctor
names).
- Made the docs look like QLatin1StringView is "The Real Thing".
[ChangeLog][QtCore] Made QLatin1StringView the recommended name for
referring to a Latin-1 string view (instead of QLatin1String).
Task-number: QTBUG-98434
Change-Id: I6d9a85cc956c6da0c910ad7d23be7956e4bd94ac
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
The interesting part is what cannot be noexcept:
- nameForEncoding() and ctor from Encoding: because they don't handle
all valid values of type Encoding, so have a narrow contract
- encodingForHtml(): because it allocates memory (→ QTBUG-101046)
Change-Id: I30cdc19a32537be047e43955e3337e4d6ccc363f
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
The State state data member had non-noexcept move-SMFs, which were
inherited by the move-SMFs of QStringConverter, QStringEncoder and
QStringDecoder.
To fix, because it is called in the move-assignment operator, we need
to mark State::clear() as noexcept, and, since that can perform an
indirect call through clearFn, require the clearFn to be noexcept,
too.
The only users of clearFn were in Qt5Compat; a separate fix should
have been merged there by the time this lands.
Change-Id: Ibe8147970886526b6a479960050e108607b63874
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
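Since C++17, noexcept is part of a function type, so a function-pointer
type can itself demand a non-throwing callback. A minimal sketch of the
chain described above, with MovableState as a hypothetical simplified
State: a noexcept ClearFn makes clear() noexcept, which in turn lets the
move-assignment operator be noexcept.

```cpp
#include <type_traits>
#include <utility>

struct MovableState {
    using ClearFn = void (*)(MovableState *) noexcept;

    void *data = nullptr;
    ClearFn clearFn = nullptr;

    MovableState() = default;
    MovableState(MovableState &&other) noexcept
        : data(std::exchange(other.data, nullptr)),
          clearFn(std::exchange(other.clearFn, nullptr)) {}

    MovableState &operator=(MovableState &&other) noexcept
    {
        clear(); // must not throw, or this operator could not be noexcept
        data = std::exchange(other.data, nullptr);
        clearFn = std::exchange(other.clearFn, nullptr);
        return *this;
    }

    ~MovableState() { clear(); }

    // Only possible because ClearFn is required to be noexcept.
    void clear() noexcept
    {
        if (clearFn)
            clearFn(this);
        data = nullptr;
        clearFn = nullptr;
    }
};

static_assert(std::is_nothrow_move_constructible_v<MovableState>);
static_assert(std::is_nothrow_move_assignable_v<MovableState>);
```

Any class holding such a member then inherits noexcept move operations
automatically, which is how QStringEncoder and QStringDecoder benefit.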
The existing name lookup code used C's toupper() function for
case-insensitive comparison. However, that function's result depends
on the current locale.
Since the matcher is supposed to match the likes of "iso-8859-1" and
"latin-1", matching may fail in locales, such as Turkish, where
toupper(i) is İ (or i, if the former isn't representable in the current
charset), but toupper(I) remains I, causing a False Negative.
To fix, use the US-ASCII-only QtMiscUtils::toAsciiLower() function,
which has the added advantage that it's inline.
Pick-to: 6.3 6.2
Change-Id: I70613c0167d84e3dc3d282c61c716b5dd0b3e6bb
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
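A locale-independent lowering only maps 'A'..'Z', so "ISO-8859-1" and
"iso-8859-1" compare equal in every locale, Turkish included. The
sketch below is simplified and hypothetical (asciiLower and
asciiNameEquals are illustrative names, and it does plain
case-insensitive equality without any other normalisation the real
name matcher may apply):

```cpp
#include <cstddef>
#include <string_view>

// Only A-Z are changed; the result never depends on setlocale().
constexpr char asciiLower(char c) noexcept
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

constexpr bool asciiNameEquals(std::string_view a, std::string_view b) noexcept
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (asciiLower(a[i]) != asciiLower(b[i]))
            return false;
    }
    return true;
}
```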
It's relied on implicitly, which is no longer valid in dev and may
accidentally break in other branches.
Pick-to: 6.3 6.2
Change-Id: I2272b6914e883e20d0989a1762eb1a5c1aef4e0e
Reviewed-by: Fabian Kosmale <fabian.kosmale@qt.io>
The Boyer-Moore tables can be calculated at compile-time, and the
needles are long enough to make skipping worthwhile, even for small
haystacks.
Pick-to: 6.3
Change-Id: I3237812490367ed0491eb8d1667c6da67f38c517
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
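To illustrate computing the skip table at compile time, here is a
sketch of a Boyer-Moore-Horspool matcher (a simplified relative of full
Boyer-Moore, swapped in for brevity; StaticMatcher and
makeStaticMatcher are hypothetical names, not Qt's implementation):

```cpp
#include <array>
#include <cstddef>
#include <string_view>

template <std::size_t N>
struct StaticMatcher {
    std::array<char, N> needle{};
    std::array<std::size_t, 256> skip{};

    constexpr explicit StaticMatcher(const char (&str)[N + 1])
    {
        for (std::size_t i = 0; i < N; ++i)
            needle[i] = str[i];
        for (auto &s : skip)
            s = N; // bytes absent from the needle allow a full-length jump
        for (std::size_t i = 0; i + 1 < N; ++i)
            skip[static_cast<unsigned char>(needle[i])] = N - 1 - i;
    }

    // Index of the first match, or std::string_view::npos.
    constexpr std::size_t indexIn(std::string_view hay) const
    {
        std::size_t pos = 0;
        while (pos + N <= hay.size()) {
            std::size_t j = N;
            while (j > 0 && hay[pos + j - 1] == needle[j - 1])
                --j;
            if (j == 0)
                return pos;
            pos += skip[static_cast<unsigned char>(hay[pos + N - 1])];
        }
        return std::string_view::npos;
    }
};

// N excludes the terminating '\0' of the literal.
template <std::size_t M>
constexpr auto makeStaticMatcher(const char (&str)[M]) { return StaticMatcher<M - 1>(str); }
```

With a needle such as "charset" the table costs nothing at run time,
and mismatches can advance by up to seven bytes at once, which is why
skipping pays off even on short haystacks.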
I can't test on Windows, so I skipped the platform-specific code.
Pick-to: 6.3 6.2
Change-Id: Id13d4abc447ddd5d17fb67b670b83207877456f6
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Found by clang 13:
qstringconverter.cpp:1039:15: warning: variable 'length' set but not used [-Wunused-but-set-variable]
Pick-to: 6.2
Change-Id: Iea05060bc2c046928536fffd16adf46d4934c37c
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Looks like it was copied from QUtf16::convertFromUnicode(), but for the
UTF-32 case that is not correct. UTF-16 to UTF-32 conversions can change
the length of the string due to surrogates.
There are unit tests for creating and parsing UTF-32 headers, and
for detecting content as UTF-32, but there aren't any for UTF-32
conversions. I don't have time to add a full test for that.
Fixes: QTBUG-97122
Pick-to: 6.2 6.1
Change-Id: Ic17a33f599b844d8ab5dfffd16ab2c4cb6b0547d
Reviewed-by: Ievgenii Meshcheriakov <ievgenii.meshcheriakov@qt.io>
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
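Why the UTF-16 length cannot simply be reused for UTF-32: a surrogate
pair is two UTF-16 code units but one UTF-32 code unit. A minimal
decoding sketch (utf16ToUtf32 is a hypothetical helper; replacing lone
surrogates with U+FFFD is an assumed policy for the example):

```cpp
#include <cstddef>
#include <string>
#include <string_view>

std::u32string utf16ToUtf32(std::u16string_view in)
{
    std::u32string out;
    out.reserve(in.size()); // an upper bound, not the exact result length
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t c = in[i];
        const bool isHigh = c >= 0xD800 && c <= 0xDBFF;
        if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            const char16_t low = in[++i]; // consume both halves of the pair
            out.push_back(static_cast<char32_t>(
                0x10000 + ((char32_t(c) - 0xD800) << 10) + (char32_t(low) - 0xDC00)));
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            out.push_back(U'\uFFFD'); // unpaired surrogate
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

For u"a\U0001F600" the input has 3 code units but the result has only 2,
so an output length copied from the input would be wrong.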
Use standard char16_t and char32_t types instead of ushort and uint.
Remove members of QUtf8BaseTraits that use those integer types.
Change-Id: I77b1a9106244835c813336a50417f6bbdfada288
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
This adds emulation for two NEON instructions that ARMv7 lacks.
This increases conversion speed by around 50% in my simple tests.
Change-Id: I4f52d353184e9a8d88089de60e17bd5670637c0c
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
And check that the result fits.
Change-Id: Iaee1085315559bdffea9400b94b29869621ab7ff
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Try to get rid of APIs that use raw 'const {char, QChar} *, length'
pairs. Instead, use QByteArrayView or QStringView.
As QStringConverter is a new class, simply change the API to what we'd like
to have. Also adjust hidden API in QStringBuilder and friends.
Change-Id: I897d47f63a7b965f5574a1e51da64147f9e981f6
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
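The shape of that API change, sketched with std::string_view standing
in for QByteArrayView (countNonAscii is a hypothetical function, not
part of QStringConverter): one view parameter replaces a pointer/length
pair that callers could get out of sync.

```cpp
#include <cstddef>
#include <string_view>

// Before: std::size_t countNonAscii(const char *data, std::size_t len);
// After: the view carries pointer and length together.
std::size_t countNonAscii(std::string_view bytes) noexcept
{
    std::size_t n = 0;
    for (char c : bytes)
        n += static_cast<unsigned char>(c) >= 0x80;
    return n;
}
```

Callers can still pass a literal, a std::string, or an explicit
(pointer, length) view, so nothing is lost by dropping the raw overload.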
Otherwise we hit an #error statement in the MSVC standard library.
Change-Id: Ib029edf0be8513a80f2640fd9ca75541615a0448
Reviewed-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Avoids using compiler builtins, and can in future replace them.
Change-Id: I3f0afe7d28b6ba05bcd1c1132b44a8db7b182d8a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This allows us to skip the surrogate pair decoding too, since it can't
match anyway.
Change-Id: Ied637aece2a7427b8a2dfffd16118183e5d76794
Reviewed-by: Lars Knoll <lars.knoll@qt.io>
Properly use the new QStringConverter API and not an internal
qFromUtfEncoded method that was buggy after the changes.
Take the opportunity to clean up and remove qFromUtfEncoded, as
QClipboard was its only user.
Fixes: QTBUG-85417
Change-Id: I8540d12056bf3f448c1f628ce0bd0ad462a6447d
Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
The functional style interface is nice, but does feel alien in some
contexts, so better also have explicit encode and decode methods.
Change-Id: Ic07ced15f65cdb3a7f1cf044041e341d2ef87f79
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
Change-Id: I8e29846db77581953d90c818060950744cb9f521
Reviewed-by: Leena Miettinen <riitta-leena.miettinen@qt.io>
Reviewed-by: Oliver Wolff <oliver.wolff@qt.io>
Macros and the await helper function from qfunctions_winrt(_p).h are
needed in other Qt modules which use UWP APIs on desktop windows.
Task-number: QTBUG-84434
Change-Id: Ice09c11436ad151c17bdccd2c7defadd08c13925
Reviewed-by: Tor Arne Vestbø <tor.arne.vestbo@qt.io>
Move pre- and post-condition handling out of the main loop
to make that loop as fast as possible.
Remove special handling of a corner case when the input length
is zero, where the UTF-8 decoder behaved differently from all
other decoders.
Change-Id: I94992767ea15405b38f7953adadaa6ff98b20b6f
Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>