PR fortran/114024
gcc/fortran/ChangeLog:
* trans-stmt.cc (gfc_trans_allocate): When a source expression has
substring references, part-refs, or %re/%im inquiries, wrap the
entity in parentheses to force evaluation of the expression.
gcc/testsuite/ChangeLog:
* gfortran.dg/allocate_with_source_27.f90: New test.
* gfortran.dg/allocate_with_source_28.f90: New test.
Co-Authored-By: Harald Anlauf <anlauf@gmx.de>
For a vec_init (_a, _a, _a, _a) with _a of mode DImode we try to
construct a "superword" of two "_a"s. This only works for modes < Pmode
when we can "shift and or" both halves into one Pmode register.
This patch disallows the optimization for inner_mode == Pmode and emits
a simple broadcast in such a case.
gcc/ChangeLog:
PR target/114028
* config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p):
Return false if inner mode is already Pmode.
(rvv_builder::is_all_same_sequence): New function.
(expand_vec_init): Emit broadcast if sequence is all same.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr114028.c: New test.
When targetm.cxx.cdtor_returns_this () (aka on arm32 TARGET_AAPCS_BASED)
constructor is supposed to return this pointer, but when we cp_fold such
a call, we don't take that into account and just INIT_EXPR the object,
so we can later ICE during gimplification, because the expression doesn't
have the right type.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR c++/113083
* cp-gimplify.cc (cp_fold): For targetm.cxx.cdtor_returns_this ()
wrap r into a COMPOUND_EXPR and return folded CALL_EXPR_ARG (x, 0).
* g++.dg/cpp0x/constexpr-113083.C: New test.
early-ra already had code to do regrename-style "broadening"
of the allocation, to promote scheduling freedom. However,
the pass divides the function into allocation regions
and this broadening only worked within a single region.
This meant that if a basic block contained one subblock
of FPR use, followed by a point at which no FPRs were live,
followed by another subblock of FPR use, the two subblocks
would tend to reuse the same registers. This in turn meant
that it wasn't possible to form LDP/STP pairs between them.
The failure to form LDPs and STPs in the testcase was a
regression from GCC 13.
The patch adds a simple heuristic to prefer less recently
used registers in the event of a tie.
gcc/
PR target/113613
* config/aarch64/aarch64-early-ra.cc
(early_ra::m_current_region): New member variable.
(early_ra::m_fpr_recency): Likewise.
(early_ra::start_new_region): Bump m_current_region.
(early_ra::allocate_colors): Prefer less recently used registers
in the event of a tie. Add a comment to explain why we prefer(ed)
higher-numbered registers.
(early_ra::find_oldest_color): Prefer less recently used registers
here too.
(early_ra::finalize_allocation): Update recency information for
allocated registers.
(early_ra::process_blocks): Initialize m_current_region and
m_fpr_recency.
gcc/testsuite/
PR target/113613
* gcc.target/aarch64/pr113613.c: New test.
Most code in early-ra used is_chain_candidate to check whether we
should chain two allocnos. This included both tests that matter
for correctness and tests for certain heuristics.
Once that test passes for one pair of allocnos, we test whether
it's safe to chain the containing groups (which might contain
multiple allocnos for x2, x3 and x4 modes). This test used an
inline test for correctness only, deliberately skipping the
heuristics. However, this instance of the test was missing
some handling of equivalent allocnos.
This patch fixes things by making is_chain_candidate take a
strictness parameter: correctness only, or correctness + heuristics.
It then makes the group-chaining test use the correctness version
rather than trying to replicate it inline.
gcc/
PR target/113295
* config/aarch64/aarch64-early-ra.cc
(early_ra::test_strictness): New enum.
(early_ra::is_chain_candidate): Add a strictness parameter to
control whether only correctness matters, or whether both correctness
and heuristics should be used. Handle multiple levels of equivalence.
(early_ra::find_related_start): Update call accordingly.
(early_ra::strided_polarity_pref): Likewise.
(early_ra::form_chains): Likewise.
(early_ra::try_to_chain_allocnos): Use is_chain_candidate in
correctness mode rather than trying to inline the test.
gcc/testsuite/
PR target/113295
* gcc.target/aarch64/pr113295-2.c: New test.
416.gamess showed up two wrong-code bugs in early-ra. This patch
fixes the first of them. It was difficult to reduce the source code
to something that would meaningfully show the situation, so the
testcase uses a direct RTL sequence instead.
In the sequence:
(a) register <2> is set more than once
(b) register <2> is copied to a temporary (<4>)
(c) register <2> is the destination of an FCSEL between <4> and
another value (<5>)
(d) <4> and <2> are equivalent for <4>'s live range
(e) <5>'s and <2>'s live ranges do not intersect, and there is
a pseudo-copy between <5> and <2>
On its own, (d) implies that <4> can be treated as equivalent to <2>.
And on its own, (e) implies that <5> can share <2>'s register. But
<4>'s and <5>'s live ranges conflict, meaning that they cannot both
share the register together. A bit of missing bookkeeping meant that
the mechanism for detecting this didn't fire. We therefore ended up
with an FCSEL in which both inputs were the same register.
gcc/
PR target/113295
* config/aarch64/aarch64-early-ra.cc
(early_ra::find_related_start): Account for definitions by shared
registers when testing for a single register definition.
(early_ra::accumulate_defs): New function.
(early_ra::record_copy): If A shares B's register, fold A's
definition information into B's. Fold A's use information into B's.
gcc/testsuite/
PR target/113295
* gcc.dg/rtl/aarch64/pr113295-1.c: New test.
If assembler and linker supports
add %reg1, name@gottpoff(%rip), %reg2
with R_X86_64_CODE_6_GOTTPOFF, we can generate it instead of
mov name@gottpoff(%rip), %reg2
add %reg1, %reg2
gcc/
* configure.ac (HAVE_AS_R_X86_64_CODE_6_GOTTPOFF): Defined as 1
if R_X86_64_CODE_6_GOTTPOFF is supported.
* config.in: Regenerated.
* configure: Likewise.
* config/i386/predicates.md (apx_ndd_add_memory_operand): Allow
UNSPEC_GOTNTPOFF if R_X86_64_CODE_6_GOTTPOFF is supported.
gcc/testsuite/
* gcc.target/i386/apx-ndd-tls-1b.c: New test.
* lib/target-supports.exp
(check_effective_target_code_6_gottpoff_reloc): New.
The expand pattern for reciprocal division was enabled for all math
optimization modes, but the patterns it was generating were not
enabled unless -funsafe-math-optimizations were enabled, this leads to
an ICE when the pattern we generate cannot be recognized.
Fixed by only enabling vector division when doing unsafe math.
gcc:
PR target/108120
* config/arm/neon.md (div<VCVTF:mode>3): Rename from div<mode>3.
Gate with ARM_HAVE_NEON_<MODE>_ARITH.
gcc/testsuite:
PR target/108120
* gcc.target/arm/neon-recip-div-1.c: New file.
The following testcase ICEs, because the REDUCE_BIT_FIELD macro uses
the target variable implicitly:
#define REDUCE_BIT_FIELD(expr) (reduce_bit_field \
? reduce_to_bit_field_precision ((expr), \
target, \
type) \
: (expr))
and so when the code below reuses the target variable, documented to be
The value may be stored in TARGET if TARGET is nonzero.
TARGET is just a suggestion; callers must assume that
the rtx returned may not be the same as TARGET.
for something unrelated (the value that should be returned), this misbehaves
(in the testcase target is set to a CONST_INT, which has VOIDmode and
reduce_to_bit_field_precision assert checking doesn't like that).
Needed to say that
If TARGET is CONST0_RTX, it means that the value will be ignored.
but in expand_expr_real_2 does at the start:
ignore = (target == const0_rtx
|| ((CONVERT_EXPR_CODE_P (code)
|| code == COND_EXPR || code == VIEW_CONVERT_EXPR)
&& TREE_CODE (type) == VOID_TYPE));
/* We should be called only if we need the result. */
gcc_assert (!ignore);
- so such target is mainly meant for calls and the like in other routines.
Certainly doesn't expect that target changes from not being ignored
initially to ignore later on and other CONST_INT results as well as anything
which is not an object into which anything can be stored.
So, the following patch fixes that by using a more appripriate temporary
for the result, which other code is using.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/114054
* expr.cc (expand_expr_real_2) <case MULT_EXPR>: Use
temp variable instead of target parameter for result.
* gcc.dg/bitint-92.c: New test.
The following testcases show 2 bugs in the .{ADD,SUB}_OVERFLOW lowering,
both related to storing of the REALPART_EXPR part of the result.
On the first testcase prec is 255, prec_limbs is 4 and for the second limb
in the loop the store of the REALPART_EXPR of .USUBC (_30) is stored through:
if (_27 <= 3)
goto <bb 12>; [80.00%]
else
goto <bb 15>; [20.00%]
<bb 12> [local count: 1073741824]:
if (_27 < 3)
goto <bb 14>; [80.00%]
else
goto <bb 13>; [20.00%]
<bb 13> [local count: 1073741824]:
bitint.3[_27] = _30;
goto <bb 15>; [100.00%]
<bb 14> [local count: 858993464]:
MEM[(unsigned long *)&bitint.3 + 24B] = _30;
<bb 15> [local count: 1073741824]:
The first check is right, as prec_limbs is 4, we don't want to store
bitint.3[4] or above at all, those limbs are just computed for the overflow
checking and nothing else, so _27 > 4 leads to no store.
But the other condition is exact opposite of what should be done, if
the current index of the second limb (_27) is < 3, then it should
bitint.3[_27] = _30;
and if it is == 3, it should
MEM[(unsigned long *)&bitint.3 + 24B] = _30;
and (especially important for the targets which would bitinfo.extended = 1)
should actually in this case zero extend it from the 63 bits to 64, that is
the handling of the partial limb. The if_then_if_then_else helper if
there are 2 conditions sets m_gsi to be at the start of the
edge_true_false->dest bb, i.e. when the first condition is true and second
false, and that is where we store the SSA_NAME indexed limb store, so the
condition needs to be reversed.
The following patch does that and adds the cast as well, the usual
assumption that already handle_operand has the partial limb type doesn't
have to be true here, because the source operand could have much larger
precision than the REALPART_EXPR of the lhs.
2024-02-23 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114040
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
Use EQ_EXPR rather than LT_EXPR for g2 condition and change its
probability from likely to unlikely. When handling the true true
store, first cast to limb_access_type and then to l's type.
* gcc.dg/torture/bitint-60.c: New test.
* gcc.dg/torture/bitint-61.c: New test.
The gold linker has never been ported to LoongArch (and it seems
unlikely to be ported in the future as the new architectures are
focusing on lld and/or mold for fast linkers).
ChangeLog:
* configure.ac (ENABLE_GOLD): Remove loongarch*-*-* from target
list.
* configure: Regenerate.
The following deprecates ia64*-*-* for GCC 14. Since we plan to
force LRA for GCC 15 and the target only has slim chances of getting
updated this notifies people in advance. Given both Linux and
glibc have axed the target further development is also made difficult.
There is no listed maintainer for ia64 either.
PR target/90785
gcc/
* config.gcc: Add ia64*-*-* to the list of obsoleted targets.
contrib/
* config-list.mk (LIST): --enable-obsolete for ia64*-*-*.
gcc.dg/vect/vect-bic-bitmask-12.c and gcc.dg/vect/vect-bic-bitmask-23.c
currently FAIL on 32 and 64-bit Solaris/SPARC
FAIL: gcc.dg/vect/vect-bic-bitmask-12.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-12.c scan-tree-dump dce7 "<=\\\\s*.+{ 255,.+}"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c -flto -ffat-lto-objects scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
FAIL: gcc.dg/vect/vect-bic-bitmask-23.c scan-tree-dump dce7 "<=\\\\s*.+{ 255, 15, 1, 65535 }"
although they should be skipped since
commit 5f07095d22
Author: Tamar Christina <tamar.christina@arm.com>
Date: Tue Mar 8 11:32:59 2022 +0000
vect: disable bitmask tests on sparc
The problem is that dg-skip-if must come after dg-do, although this
isn't currently documented unfortunately.
Fixed by reordering the directives.
Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
2024-02-22 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.dg/vect/vect-bic-bitmask-12.c: Move dg-skip-if down.
* gcc.dg/vect/vect-bic-bitmask-23.c: Likewise.
gcc.dg/plugin/crash-test-write-though-null-sarif.c FAILs on Solaris:
FAIL: gcc.dg/plugin/crash-test-write-though-null-sarif.c -fplugin=./crash_test_plugin.so scan-sarif-file "text": "Segmentation fault
Comparing the sarif files between Linux and Solaris reveals
- "message": {"text": "Segmentation fault"},
+ "message": {"text": "Segmentation Fault"},
This patch allows for both forms.
Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.
2024-02-22 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* gcc.dg/plugin/crash-test-write-though-null-sarif.c
(scan-sarif-file): Allow for "Segmentation Fault", too.
This builds for me, and I frequently have python-is-python3 type
packages installed so I think I've been implicitly testing it for a
while. Looks like Kito's tested similar configurations, and the
bugzilla indicates we should be moving over.
gcc/ChangeLog:
PR other/109668
* config/riscv/arch-canonicalize: Move to python3
* config/riscv/multilib-generator: Likewise
This came up recently as Edwin was looking through the test suite. A
few of us were talking about this during the patchwork meeting and were
surprised. Looks like this is the desired behavior, so let's at least
document it.
gcc/ChangeLog:
* doc/invoke.texi: Document -mcpu.
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
In binutils 2.40 and earlier versions, only a warning will be reported
when a relocation immediate value is out of bounds. As a result,
the value of the macro HAVE_AS_COND_BRANCH_RELAXATION will also be
defined as 1 when the assembler does not support conditional branch
relaxation. Therefore, add the compilation option "--fatal-warnings"
to avoid this problem.
gcc/ChangeLog:
* configure: Regenerate.
* configure.ac: Add parameter "--fatal-warnings" to assemble
when checking whether the assemble support conditional branch
relaxation.
This fixes
error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter [-fpermissive]
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wmismatched-new-delete-8.C: Use __SIZE_TYPE__.
We aren't able to parse __has_attribute (vendor::attr) (and __has_c_attribute
and __has_cpp_attribute) in strict C < C23 modes. While in -std=gnu* modes
or in -std=c23 there is CPP_SCOPE token, in -std=c* (except for -std=c23)
there are is just a pair of CPP_COLON tokens.
The c-lex.cc hunk adds support for that.
That leads to a question if we should return 1 or 0 from
__has_attribute (gnu::unused) or not, because while
[[gnu::unused]] is parsed fine in -std=gnu*/-std=c23 modes (sure, with
pedwarn for < C23), we do not parse it at all in -std=c* (except for
-std=c23), we only parse [[__extension__ gnu::unused]] there. While
the __extension__ in there helps to avoid the pedwarn, I think it is
better to be consistent between GNU and strict C < C23 modes and
parse [[gnu::unused]] too; on the other side, I think parsing
[[__extension__ gnu : : unused]] is too weird and undesirable.
So, the following patch adds a flag during preprocessing at the point
where we normally create CPP_SCOPE tokens out of 2 consecutive colons
on the first CPP_COLON to mark the consecutive case (as we are tight
on the bits, I've reused the PURE_ZERO flag, which is used just by the
C++ FE and only ever set (both C and C++) on CPP_NUMBER tokens, this
new flag has the same value and is only ever used on CPP_COLON tokens)
and instead of checking loose_scope_p argument (i.e. whether it is
[[__extension__ ...]] or not), it just parses CPP_SCOPE or CPP_COLON
with CLONE_SCOPE flag followed by another CPP_COLON the same.
The latter will never appear in >= C23 or -std=gnu* modes, though
guarding its use say with flag_iso && !flag_isoc23 && doesn't really
work because the __extension__ case temporarily clears flag_iso flag.
This makes the -std=c11 etc. behavior more similar to -std=gnu11 or
-std=c23, the only difference I'm aware of are the
#define JOIN2(A, B) A##B
[[vendor JOIN2(:,:) attr]]
[[__extension__ vendor JOIN2(:,:) attr]]
cases, which are accepted in the latter modes, but results in error
in -std=c11; but the error is during preprocessing that :: doesn't
form a valid preprocessing token, which is true, so just don't do that if
you try to have __STRICT_ANSI__ && __STDC_VERSION__ <= 201710L
compatibility.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR c/114007
gcc/
* doc/extend.texi: (__extension__): Remove comments about scope
tokens vs. two colons.
gcc/c-family/
* c-lex.cc (c_common_has_attribute): Parse 2 CPP_COLONs with
the first one with COLON_SCOPE flag the same as CPP_SCOPE.
gcc/c/
* c-parser.cc (c_parser_std_attribute): Remove loose_scope_p argument.
Instead of checking it, parse 2 CPP_COLONs with the first one with
COLON_SCOPE flag the same as CPP_SCOPE.
(c_parser_std_attribute_list): Remove loose_scope_p argument, don't
pass it to c_parser_std_attribute.
(c_parser_std_attribute_specifier): Adjust c_parser_std_attribute_list
caller.
gcc/testsuite/
* gcc.dg/c23-attr-syntax-6.c: Adjust testcase for :: being valid
even in -std=c11 even without __extension__ and : : etc. not being
valid anymore even with __extension__.
* gcc.dg/c23-attr-syntax-7.c: Likewise.
* gcc.dg/c23-attr-syntax-8.c: New test.
libcpp/
* include/cpplib.h (COLON_SCOPE): Define to PURE_ZERO.
* lex.cc (_cpp_lex_direct): When lexing CPP_COLON with another
colon after it, if !CPP_OPTION (pfile, scope) set COLON_SCOPE
flag on the first CPP_COLON token.
This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
the code expected newc.u.s_binary.left would be valid.
So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function paramaters
(DEMANGLE_COMPONENT_FUNCTION_PARAM) and template paramaters (DEMANGLE_COMPONENT_TEMPLATE_PARAM).
Note the code in the demangler does this when it sets DEMANGLE_COMPONENT_UNNAMED_TYPE:
ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
ret->u.s_number.number = num;
Committed as obvious after bootstrap/test on x86_64-linux-gnu
PR tree-optimization/109804
gcc/ChangeLog:
* gimple-ssa-warn-access.cc (new_delete_mismatch_p): Handle
DEMANGLE_COMPONENT_UNNAMED_TYPE.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wmismatched-new-delete-8.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
In the vget_set_lane_1.c test the following entries now generate a zip1 instead of an INS
BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
This is because the non-Q variant for indices 0 and 1 are just shuffling values.
There is no perf difference between INS SIMD to SIMD and ZIP on Arm uArches but
preferring the INS alternative has a drawback on all uArches as ZIP being a three
operand instruction can be used to tie the result to the return register whereas
INS would require an fmov.
As such just update the test file for now.
gcc/testsuite/ChangeLog:
PR target/112375
* gcc.target/aarch64/vget_set_lane_1.c: Update test output.
The fix marks a constant created during the default BY clause of the
FOR loop as internal. The type checker will always return true if
checking against an internal const.
gcc/m2/ChangeLog:
PR modula2/114055
* gm2-compiler/M2Check.mod (Import): IsConstLitInternal and
IsConstLit.
(isInternal): New procedure function.
(doCheck): Test for isInternal in either operand and early
return true.
* gm2-compiler/M2Quads.mod (PushOne): Rewrite with extra
parameter internal.
(BuildPseudoBy): Add TRUE parameter to PushOne call.
(BuildIncProcedure): Add FALSE parameter to PushOne call.
(BuildDecProcedure): Add FALSE parameter to PushOne call.
* gm2-compiler/M2Range.mod (ForLoopBeginTypeCompatible):
Uncomment code and tidy up error string.
* gm2-compiler/SymbolTable.def (PutConstLitInternal):
New procedure.
(IsConstLitInternal): New procedure function.
* gm2-compiler/SymbolTable.mod (PutConstLitInternal):
New procedure.
(IsConstLitInternal): New procedure function.
(SymConstLit): New field IsInternal.
(CreateConstLit): Initialize IsInternal to FALSE.
gcc/testsuite/ChangeLog:
PR modula2/114055
* gm2/pim/fail/forloopby.mod: New test.
* gm2/pim/pass/forloopby2.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
The following adds another omission to the assert verifying we're
not running into spurious off == -1.
PR tree-optimization/114048
* tree-ssa-sccvn.cc (copy_reference_ops_from_ref): MEM_REF
can also produce -1 off.
* gcc.dg/torture/pr114048.c: New testcase.
When we classify a conditional reduction chain as CONST_COND_REDUCTION
we fail to verify all involved conditionals have the same constant.
That's a quite unlikely situation so the following simply disables
such classification when there's more than one reduction statement.
PR tree-optimization/114027
* tree-vect-loop.cc (vecctorizable_reduction): Use optimized
condition reduction classification only for single-element
chains.
* gcc.dg/vect/pr114027.c: New testcase.
The profile_count::dump (char *, struct function * = NULL) const;
method has a single caller, the
profile_count::dump (FILE *f, struct function *fun) const;
method and for that going through a temporary buffer is just slower
and opens doors for buffer overflows, which is exactly why this P1
was filed.
The buffer size is 64 bytes, the previous maximum
"%" PRId64 " (%s)"
would print up to 61 bytes in there (19 bytes for arbitrary uint64_t:61
bitfield printed as signed, "estimated locally, globally 0 adjusted"
i.e. 38 bytes longest %s and 4 other characters).
Now, after the r14-2389 changes, it can be
19 + 38 plus 11 other characters + %.4f, which is worst case
309 chars before decimal point, decimal point and 4 digits after it,
so total 382 bytes.
So, either we could bump the buffer[64] to buffer[400], or the following
patch just drops the indirection through buffer and prints it directly to
stream. After all, having APIs which fill in some buffer without passing
down the size of the buffer is just asking for buffer overflows over time.
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR ipa/111960
* profile-count.h (profile_count::dump): Remove overload with
char * first argument.
* profile-count.cc (profile_count::dump): Change overload with char *
first argument which uses sprintf into the overfload with FILE *
first argument and use fprintf instead. Remove overload which wrapped
it.
The following testcase ICEs, because can_test_argument_range
returns true for BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}
among many other builtins, but get_no_error_domain doesn't handle
those.
float32x_type_node when supported in GCC always has DFmode, so that
case is easy (and call-cdce assumes that SFmode is IEEE float and DFmode
is IEEE double). So *F32X is simply handled by adding those cases
next to *F64.
float64x_type_node when supported in GCC by definition has a mode
with larger precision and exponent range than DFmode, so it can be XFmode,
TFmode or KFmode. I went through all the l/f128 suffixed builtins and
verified that the float128x_type_node no error domain range is actually
identical to the Intel extended long double no error domain range; it isn't
that surprising, both IEEE quad and Intel/Motorola extended have the same
exponent range [-16381, 16384] (well, Motorola -16382 probably because of
different behavior for denormals, but that has nothing to do with
get_no_error_domain which is about large inputs overflowing into +-Inf
or triggering NaN, denormals could in theory do something solely for sqrt
and even that is fine). In theory some target could have different larger
type, so for *F64X the code verifies that
REAL_MODE_FORMAT (TYPE_MODE (float64x_type_node))->emax == 16384
and if so, uses the *F128 domains, otherwise falls back to the non-suffixed
ones (aka *F64), that is certainly the conservative minimum.
While at it, the patch also changes the *L suffixed cases to do pretty much
the same, the comment said that the function just assumes for *L
the *F64 ranges, but that is unnecessarily conservative.
All we currently have for long double is:
1) IEEE quad (emax 16384, *F128 ranges)
2) XFmode Intel/Motorola extended (emax 16384, same as *F128 ranges)
3) IBM extended (double double, emax 1024, the extra precision doesn't
really help and the domains are the same as for *F64)
4) same as double (*F64 again)
So, the patch uses also for *L
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
checks and either tail recurses into the *F128 case for that or to
non-suffixed (aka *F64) case otherwise.
BUILT_IN_*F128X not handled because no target has those and it doesn't
seem something is on the horizon and who knows what would be used for that.
Thus, all we get this wrong for are probably VAX floats or something
similar, no intent from me to look at that, that is preexisting issue.
BTW, I'm surprised we don't have BUILT_IN_EXP10F{16,32,64,128,32X,64X,128X}
builtins, seems glibc has those (sure, I think except *16 and *128x).
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/113993
* tree-call-cdce.cc (get_no_error_domain): Handle
BUILT_IN_{COSH,SINH,EXP{,M1,2}}{F32X,F64X}. Handle
BUILT_IN_{COSH,SINH,EXP{,M1,2}}L for
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
the as the F128 suffixed cases, otherwise as non-suffixed ones.
Handle BUILT_IN_{EXP,POW}10L for
REAL_MODE_FORMAT (TYPE_MODE (long_double_type_node))->emax == 16384
as (-inf, 4932).
* gcc.dg/tree-ssa/pr113993.c: New test.
Currently, bitint_large_huge::lower_mul_overflow uses cnt 1 only if
startlimb == endlimb and in that case doesn't use a loop and handles
everything in a special if:
unsigned cnt;
bool use_loop = false;
if (startlimb == endlimb)
cnt = 1;
else if (startlimb + 1 == endlimb)
cnt = 2;
else if ((end % limb_prec) == 0)
{
cnt = 2;
use_loop = true;
}
else
{
cnt = 3;
use_loop = startlimb + 2 < endlimb;
}
if (cnt == 1)
{
...
}
else
The loop handling for the loop exit condition wants to compare if the
incremented index is equal to endlimb, but that is correct only if
end is not divisible by limb_prec and there will be a straight line
check after the loop as well for the most significant limb. The code
used endlimb + (cnt == 1) for that, but cnt == 1 is never true here,
because cnt is either 2 or 3, so the right check is (cnt == 2).
2024-02-22 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/114038
* gimple-lower-bitint.cc (bitint_large_huge::lower_mul_overflow): Fix
loop exit condition if end is divisible by limb_prec.
* gcc.dg/torture/bitint-59.c: New test.
The problem is that, there are these lines in mips.opt.urls:
; skipping UrlSuffix for 'mabi=' due to finding no URLs
; skipping UrlSuffix for 'mno-flush-func' due to finding no URLs
; skipping UrlSuffix for 'mexplicit-relocs' due to finding no URLs
These lines is not fixed by this patch due to that we don't
document these options:
; skipping UrlSuffix for 'mlra' due to finding no URLs
; skipping UrlSuffix for 'mdebug' due to finding no URLs
; skipping UrlSuffix for 'meb' due to finding no URLs
; skipping UrlSuffix for 'mel' due to finding no URLs
gcc
* doc/invoke.texi(MIPS Options): Fix skipping UrlSuffix
problem of mabi=, mno-flush-func, mexplicit-relocs;
add missing leading - of mbranch-cost option.
* config/mips/mips.opt.urls: Regenerate.
Upgrade the version of RVV intrinsic from 0.11 to 0.12.
PR target/114017
gcc/ChangeLog:
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Upgrade
the version to 0.12.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-__riscv_v_intrinsic.c: Update the
version to 0.12.
* gcc.target/riscv/rvv/base/pr114017-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
The constraints "i" and "s" can be used with a symbol that binds
externally, e.g.
```
namespace ns { extern int var, a[4]; }
void foo() {
asm(".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "s"(&ns::var));
asm(".reloc ., BFD_RELOC_NONE, %0" :: "s"(&ns::a[3]));
}
```
gcc/testsuite/ChangeLog:
* gcc.target/riscv/asm-raw-symbol.c: New test.
Enables assert that every typed instruction is associated with a
dfa reservation
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_variable_issue): Enable assert
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
The testcase pr113742.c is failing for 32 bit targets due to the following cc1
error:
cc1: error: ABI requries '-march=rv64'
Specify '-march=rv64gc' with '-mtune=sifive-p600-series'
PR target/113742
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr113742.c: change mcpu to mtune and add march
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
BPF programs are not typically linked, which means we cannot fall back
on library calls to implement __builtin_{memmove,memcpy} and should
always expand them inline if possible.
GCC already successfully expands these builtins inline in many cases,
but failed to do so for a few for simple cases involving overlapping
memmove in the kernel BPF selftests and was instead emitting a libcall.
This patch implements a simple inline expansion of memcpy and memmove in
the BPF backend in a verifier-friendly way, with the caveat that the
size must be an integer constant, which is also required by clang.
gcc/
* config/bpf/bpf-protos.h (bpf_expand_cpymem): New.
* config/bpf/bpf.cc: (emit_move_loop, bpf_expand_cpymem): New.
* config/bpf/bpf.md: (cpymemdi, movmemdi): New define_expands.
gcc/testsuite/
* gcc.target/bpf/memcpy-1.c: New test.
* gcc.target/bpf/memmove-1.c: New test.
* gcc.target/bpf/memmove-2.c: New test.
If a for loop contains an incompatible type expression between the
designator and the second expression then the location
used when generating the error message is set to token 0.
The bug is fixed by extending the range checking
InitForLoopBeginRangeCheck. The range checking is processed after
all types, constants have been resolved (and converted into gcc
trees). The range check will check for assignment compatibility
between des and expr1, expression compatibility between des and expr2.
Separate token positions for des, exp1, expr2 and by are stored in the
Range record and used to create virtual tokens if they are on the same
source line.
gcc/m2/ChangeLog:
PR modula2/114026
* gm2-compiler/M2GenGCC.mod (Import): Remove DisplayQuadruples.
Remove DisplayQuadList.
(MixTypesBinary): Replace check with overflowCheck.
New variable typeChecking.
Use GenQuadOTypetok to retrieve typeChecking.
Use typeChecking to suppress error message.
* gm2-compiler/M2LexBuf.def (MakeVirtual2Tok): New procedure
function.
* gm2-compiler/M2LexBuf.mod (MakeVirtualTok): Improve comment.
(MakeVirtual2Tok): New procedure function.
* gm2-compiler/M2Quads.def (GetQuadOTypetok): New procedure.
* gm2-compiler/M2Quads.mod (QuadFrame): New field CheckType.
(PutQuadO): Rewrite using PutQuadOType.
(PutQuadOType): New procedure.
(GetQuadOTypetok): New procedure.
(BuildPseudoBy): Rewrite.
(BuildForToByDo): Remove type checking.
Add parameters e2, e2tok, BySym, bytok to
InitForLoopBeginRange.
Push the RangeId.
(BuildEndFor): Pop the RangeId.
Use GenQuadOTypetok to generate AddOp without type checking.
Call PutRangeForIncrement with the RangeId and IncQuad.
(GenQuadOtok): Rewrite using GenQuadOTypetok.
(GenQuadOTypetok): New procedure.
* gm2-compiler/M2Range.def (InitForLoopBeginRangeCheck):
Rename d as des, e as expr.
Add expr1, expr1tok, expr2, expr2tok, byconst, byconsttok
parameters.
(PutRangeForIncrement): New procedure.
* gm2-compiler/M2Range.mod (Import): MakeVirtual2Tok.
(Range): Add expr2, byconst, destok, exprtok, expr2tok,
incrementquad.
(InitRange): Initialize expr2 to NulSym.
Initialize byconst to NulSym.
Initialize tokenNo, destok, exprtok, expr2tok, byconst to
UnknownTokenNo.
Initialize incrementquad to 0.
(PutRangeForIncrement): New procedure.
(PutRangeDesExpr2): New procedure.
(InitForLoopBeginRangeCheck): Rewrite.
(ForLoopBeginTypeCompatible): New procedure function.
(CodeForLoopBegin): Call ForLoopBeginTypeCompatible and
only code the for loop assignment if all the type checks
succeed.
gcc/testsuite/ChangeLog:
PR modula2/114026
* gm2/extensions/run/pass/callingc10.mod: New test.
* gm2/extensions/run/pass/callingc11.mod: New test.
* gm2/extensions/run/pass/callingc9.mod: New test.
* gm2/extensions/run/pass/strconst.def: New test.
* gm2/pim/fail/forloop.mod: New test.
* gm2/pim/pass/forloop2.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
In PR 113476 we have discovered that ipcp_param_lattices is no longer
a POD and should be destructed. In a follow-up discussion it
transpired that their initialization done by memsetting their backing
memory to zero is also invalid because now any write there before
construction can be considered dead. Plus that having them in an
array is a little bit old-school and does not get the extra checking
offered by vector along with automatic construction and destruction
when necessary.
So this patch converts the array to a vector. That however means that
ipcp_param_lattices cannot be just a forward declared type but must be
known to all code that deals with ipa_node_params and thus to all code
that includes ipa-prop.h. Therefore I have moved ipcp_param_lattices
and the type it depends on to a new header ipa-cp.h which now
ipa-prop.h depends on. Because we have the (IMHO not a very wise)
rule that headers don't include what they need themselves, I had to
add inclusions of ipa-cp.h and sreal.h (on which it depends) to very
many files, which made the patch rather ugly.
gcc/lto/ChangeLog:
2024-02-16 Martin Jambor <mjambor@suse.cz>
PR ipa/113476
* lto-common.cc: Include sreal.h and ipa-cp.h.
* lto-partition.cc: Include ipa-cp.h, move inclusion of sreal higher.
* lto.cc: Include sreal.h and ipa-cp.h.
gcc/ChangeLog:
2024-02-16 Martin Jambor <mjambor@suse.cz>
PR ipa/113476
* ipa-prop.h (ipa_node_params): Convert lattices to a vector, adjust
initializers in the contructor.
(ipa_node_params::~ipa_node_params): Release lattices as a vector.
* ipa-cp.h: New file.
* ipa-cp.cc: Include sreal.h and ipa-cp.h.
(ipcp_value_source): Move to ipa-cp.h.
(ipcp_value_base): Likewise.
(ipcp_value): Likewise.
(ipcp_lattice): Likewise.
(ipcp_agg_lattice): Likewise.
(ipcp_bits_lattice): Likewise.
(ipcp_vr_lattice): Likewise.
(ipcp_param_lattices): Likewise.
(ipa_get_parm_lattices): Remove assert latticess is non-NULL.
(ipa_value_from_jfunc): Adjust a check for empty lattices.
(ipa_context_from_jfunc): Likewise.
(ipa_agg_value_from_jfunc): Likewise.
(merge_agg_lats_step): Do not memset new aggregate lattices to zero.
(ipcp_propagate_stage): Allocate lattices in a vector as opposed to
just in contiguous memory.
(ipcp_store_vr_results): Adjust a check for empty lattices.
* auto-profile.cc: Include sreal.h and ipa-cp.h.
* cgraph.cc: Likewise.
* cgraphclones.cc: Likewise.
* cgraphunit.cc: Likewise.
* config/aarch64/aarch64.cc: Likewise.
* config/i386/i386-builtins.cc: Likewise.
* config/i386/i386-expand.cc: Likewise.
* config/i386/i386-features.cc: Likewise.
* config/i386/i386-options.cc: Likewise.
* config/i386/i386.cc: Likewise.
* config/rs6000/rs6000.cc: Likewise.
* config/s390/s390.cc: Likewise.
* gengtype.cc (open_base_files): Added sreal.h and ipa-cp.h to the
files to be included in gtype-desc.cc.
* gimple-range-fold.cc: Include sreal.h and ipa-cp.h.
* ipa-devirt.cc: Likewise.
* ipa-fnsummary.cc: Likewise.
* ipa-icf.cc: Likewise.
* ipa-inline-analysis.cc: Likewise.
* ipa-inline-transform.cc: Likewise.
* ipa-inline.cc: Include ipa-cp.h, move inclusion of sreal.h higher.
* ipa-modref.cc: Include sreal.h and ipa-cp.h.
* ipa-param-manipulation.cc: Likewise.
* ipa-predicate.cc: Likewise.
* ipa-profile.cc: Likewise.
* ipa-prop.cc: Likewise.
(ipa_node_params_t::duplicate): Assert new lattices remain empty
instead of setting them to NULL.
* ipa-pure-const.cc: Include sreal.h and ipa-cp.h.
* ipa-split.cc: Likewise.
* ipa-sra.cc: Likewise.
* ipa-strub.cc: Likewise.
* ipa-utils.cc: Likewise.
* ipa.cc: Likewise.
* toplev.cc: Likewise.
* tree-ssa-ccp.cc: Likewise.
* tree-ssa-sccvn.cc: Likewise.
* tree-vrp.cc: Likewise.