While retaining the old sorting order, this allows us to simplify the
ifdef'ery and produces much better code.
With Clang, an equality check in C++20 mode is

    vmovdqu (%rdi), %xmm0
    vpxor   (%rsi), %xmm0, %xmm0
    vptest  %xmm0, %xmm0
    sete    %al

GCC generates four 64-bit loads instead of using vectors:

    movbeq  (%rdi), %rax
    movbeq  8(%rdi), %rdx
    movbeq  (%rsi), %r8
    movbeq  8(%rsi), %rcx
    movq    %rdx, %r10
    movq    %rax, %r11
    movq    %r8, %rdx
    movq    %rcx, %rax
    xorq    %r10, %rax
    xorq    %r11, %rdx
    orq     %rdx, %rax
    sete    %al

(the four MOVs in the middle don't seem necessary)
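
For illustration only, the shape of a 128-bit equality check that leads to
code like the above can be sketched as follows. This is a minimal sketch
under stated assumptions, not the code in this change: Bytes16 and
equalBytes16 are hypothetical names, and the real class has a different
layout and API.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte identifier; stands in for the real type.
struct Bytes16 { unsigned char bytes[16]; };

// Equality does not depend on byte order, so the two 64-bit halves
// can be loaded as-is, XORed pairwise, and ORed together.
inline bool equalBytes16(const Bytes16 &a, const Bytes16 &b) noexcept
{
    std::uint64_t a0, a1, b0, b1;
    std::memcpy(&a0, a.bytes, 8);
    std::memcpy(&a1, a.bytes + 8, 8);
    std::memcpy(&b0, b.bytes, 8);
    std::memcpy(&b1, b.bytes + 8, 8);
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Whether a compiler turns this into one vector compare or several scalar
loads depends on the compiler and the flags used.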

For the sorting case, the compilers need to generate extra code
because of the check on the variant, something I'm scheduling for
removal in Qt 7.0. For the long-term sorting code, both GCC and Clang
generate four 64-bit load-and-swap-endianness instructions, but Clang
for some reason also keeps the 128-bit vector code (I'm guessing it's a
minor optimization bug that will be corrected in due time).

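
As a rough sketch of why sorting needs the load-and-swap-endianness
instructions: ordering 16 bytes lexicographically is the same as comparing
two big-endian 64-bit integers, high half first. The names below (Key16,
loadBigEndian64, compareKey16) are hypothetical, and the variant check
mentioned above is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical 16-byte key; stands in for the real type.
struct Key16 { unsigned char bytes[16]; };

// Portable big-endian load; on x86-64 compilers may lower this to a
// single load-and-swap instruction, but that is not guaranteed.
inline std::uint64_t loadBigEndian64(const unsigned char *p) noexcept
{
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | p[i];
    return v;
}

// Returns a negative value, zero, or a positive value, like memcmp().
inline int compareKey16(const Key16 &a, const Key16 &b) noexcept
{
    const std::uint64_t ahi = loadBigEndian64(a.bytes);
    const std::uint64_t bhi = loadBigEndian64(b.bytes);
    if (ahi != bhi)
        return ahi < bhi ? -1 : 1;
    const std::uint64_t alo = loadBigEndian64(a.bytes + 8);
    const std::uint64_t blo = loadBigEndian64(b.bytes + 8);
    if (alo != blo)
        return alo < blo ? -1 : 1;
    return 0;
}
```

The result matches a bytewise comparison of the 16 bytes, which is how
such a scheme can retain an existing byte-order-based sorting order.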
Change-Id: I46feca3a447244a8ba19fffd17dceacc8e528c3e
Reviewed-by: Ivan Solovev <ivan.solovev@qt.io>
(cherry picked from commit 15f753ca5a60b5273d243f528978e25c28a9b56d)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>