Find a file
Richard Biener 85c39a3cf1 AVX512 fully masked vectorization
This implemens fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).

AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).

Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.

One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.

size   scalar     128     256     512    512e    512f
    1    9.42   11.32    9.35   11.17   15.13   16.89
    2    5.72    6.53    6.66    6.66    7.62    8.56
    3    4.49    5.10    5.10    5.74    5.08    5.73
    4    4.10    4.33    4.29    5.21    3.79    4.25
    6    3.78    3.85    3.86    4.76    2.54    2.85
    8    3.64    1.89    3.76    4.50    1.92    2.16
   12    3.56    2.21    3.75    4.26    1.26    1.42
   16    3.36    0.83    1.06    4.16    0.95    1.07
   20    3.39    1.42    1.33    4.07    0.75    0.85
   24    3.23    0.66    1.72    4.22    0.62    0.70
   28    3.18    1.09    2.04    4.20    0.54    0.61
   32    3.16    0.47    0.41    0.41    0.47    0.53
   34    3.16    0.67    0.61    0.56    0.44    0.50
   38    3.19    0.95    0.95    0.82    0.40    0.45
   42    3.09    0.58    1.21    1.13    0.36    0.40

'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on the AVX512 masked epilog code is clearly
the winner, the fully masked variant is clearly worse and
it's size benefit is also tiny.

This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.

Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.

Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.

I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.

The vect_prepare_for_masked_peels hunk might run into issues with
SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
looked odd.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
the testsuite with --param vect-partial-vector-usage=2 with and
without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
and one latent wrong-code (PR110237).

	* tree-vectorizer.h (enum vect_partial_vector_style): New.
	(_loop_vec_info::partial_vector_style): Likewise.
	(LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise.
	(rgroup_controls::compare_type): Add.
	(vec_loop_masks): Change from a typedef to auto_vec<>
	to a structure.
	* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
	Adjust.  Convert niters_skip to compare_type.
	(vect_set_loop_condition_partial_vectors_avx512): New function
	implementing the AVX512 partial vector codegen.
	(vect_set_loop_condition): Dispatch to the correct
	vect_set_loop_condition_partial_vectors_* function based on
	LOOP_VINFO_PARTIAL_VECTORS_STYLE.
	(vect_prepare_for_masked_peels): Compute LOOP_VINFO_MASK_SKIP_NITERS
	in the original niter type.
	* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
	partial_vector_style.
	(can_produce_all_loop_masks_p): Adjust.
	(vect_verify_full_masking): Produce the rgroup_controls vector
	here.  Set LOOP_VINFO_PARTIAL_VECTORS_STYLE on success.
	(vect_verify_full_masking_avx512): New function implementing
	verification of AVX512 style masking.
	(vect_verify_loop_lens): Set LOOP_VINFO_PARTIAL_VECTORS_STYLE.
	(vect_analyze_loop_2): Also try AVX512 style masking.
	Adjust condition.
	(vect_estimate_min_profitable_iters): Implement AVX512 style
	mask producing cost.
	(vect_record_loop_mask): Do not build the rgroup_controls
	vector here but record masks in a hash-set.
	(vect_get_loop_mask): Implement AVX512 style mask query,
	complementing the existing while_ult style.
2023-06-19 09:28:46 +02:00
c++tools Daily bump. 2023-06-16 00:17:18 +00:00
config
contrib Daily bump. 2023-06-18 00:16:32 +00:00
fixincludes Daily bump. 2023-06-16 00:17:18 +00:00
gcc AVX512 fully masked vectorization 2023-06-19 09:28:46 +02:00
gnattools Daily bump. 2023-04-26 00:17:46 +00:00
gotools
include Daily bump. 2023-06-13 00:17:29 +00:00
INSTALL
intl Daily bump. 2023-06-16 00:17:18 +00:00
libada
libatomic Daily bump. 2023-06-11 00:17:37 +00:00
libbacktrace
libcc1 Daily bump. 2023-05-19 00:17:43 +00:00
libcody Daily bump. 2023-06-16 00:17:18 +00:00
libcpp Daily bump. 2023-06-16 00:17:18 +00:00
libdecnumber Daily bump. 2023-06-16 00:17:18 +00:00
libffi Daily bump. 2023-05-07 00:16:40 +00:00
libgcc Daily bump. 2023-06-08 00:17:20 +00:00
libgfortran Daily bump. 2023-06-12 00:16:56 +00:00
libgm2 Daily bump. 2023-06-03 00:16:48 +00:00
libgo libgo/testsuite: add benchmarks and examples to list 2023-06-16 12:29:04 -07:00
libgomp OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping [PR110270] 2023-06-19 09:08:51 +02:00
libiberty Daily bump. 2023-06-16 00:17:18 +00:00
libitm Daily bump. 2023-06-03 00:16:48 +00:00
libobjc Daily bump. 2023-05-23 00:17:11 +00:00
libphobos
libquadmath
libsanitizer Daily bump. 2023-05-01 00:16:44 +00:00
libssp
libstdc++-v3 Daily bump. 2023-06-17 00:17:17 +00:00
libvtv
lto-plugin Daily bump. 2023-05-12 00:18:12 +00:00
maintainer-scripts Daily bump. 2023-04-21 00:17:31 +00:00
zlib Daily bump. 2023-06-17 00:17:17 +00:00
.dir-locals.el
.gitattributes
.gitignore
ABOUT-NLS
ar-lib
ChangeLog Daily bump. 2023-06-17 00:17:17 +00:00
ChangeLog.jit
ChangeLog.tree-ssa
compile
config-ml.in
config.guess
config.rpath
config.sub
configure configure: Implement --enable-host-pie 2023-06-15 16:51:27 -04:00
configure.ac configure: Implement --enable-host-pie 2023-06-15 16:51:27 -04:00
COPYING
COPYING.LIB
COPYING.RUNTIME
COPYING3
COPYING3.LIB
depcomp
install-sh
libtool-ldflags
libtool.m4
ltgcc.m4
ltmain.sh
ltoptions.m4
ltsugar.m4
ltversion.m4
lt~obsolete.m4
MAINTAINERS MAINTAINERS: move Matthew Fortune to Write After Approval 2023-06-16 16:09:20 +08:00
Makefile.def
Makefile.in Disable warnings as errors for STAGEautofeedback. 2023-05-17 22:43:49 -07:00
Makefile.tpl Disable warnings as errors for STAGEautofeedback. 2023-05-17 22:43:49 -07:00
missing
mkdep
mkinstalldirs
move-if-change
multilib.am
README
symlink-tree
test-driver
ylwrap

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.