rs6000.c (analyze_swaps commentary): Add discussion of permutes and why we don't handle them.
2014-10-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com> * config/rs6000/rs6000.c (analyze_swaps commentary): Add discussion of permutes and why we don't handle them. From-SVN: r215951
parent 63b9f71bb3
commit cec5d8be55
2 changed files with 52 additions and 0 deletions
@@ -1,3 +1,8 @@
2014-10-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (analyze_swaps commentary): Add
	discussion of permutes and why we don't handle them.

2014-10-06 Eric Botcazou <ebotcazou@adacore.com>

	* config/sparc/predicates.md (int_register_operand): Delete.

@@ -33431,6 +33431,53 @@ emit_fusion_gpr_load (rtx target, rtx mem)
   than deleting a swap, we convert the load/store into a permuting
   load/store (which effectively removes the swap). */

/* Notes on Permutes

   We do not currently handle computations that contain permutes. There
   is a general transformation that can be performed correctly, but it
   may introduce more expensive code than it replaces. To handle these
   would require a cost model to determine when to perform the optimization.
   This commentary records how this could be done if desired.

   The most general permute is something like this (example for V16QI):

   (vec_select:V16QI (vec_concat:V32QI (op1:V16QI) (op2:V16QI))
                     (parallel [(const_int a0) (const_int a1)
                                 ...
                                (const_int a14) (const_int a15)]))

   where a0,...,a15 are in [0,31] and select elements from op1 and op2
   to produce the result.

   Regardless of mode, we can convert the PARALLEL to a mask of 16
   byte-element selectors. Let's call this M, with M[i] representing
   the ith byte-element selector value. Then if we swap doublewords
   throughout the computation, we can get correct behavior by replacing
   M with M' as follows:

           { M[i+8]+8 : i < 8,  M[i+8] in [0,7] U [16,23]
   M'[i] = { M[i+8]-8 : i < 8,  M[i+8] in [8,15] U [24,31]
           { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
           { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]

   This seems promising at first, since we are just replacing one mask
   with another. But certain masks are preferable to others. If M
   is a mask that matches a vmrghh pattern, for example, M' certainly
   will not. Instead of a single vmrghh, we would generate a load of
   M' and a vperm. So we would need to know how many xxswapd's we can
   remove as a result of this transformation to determine if it's
   profitable; and preferably the logic would need to be aware of all
   the special preferable masks.

   Another form of permute is an UNSPEC_VPERM, in which the mask is
   already in a register. In some cases, this mask may be a constant
   that we can discover with ud-chains, in which case the above
   transformation is ok. However, the common usage here is for the
   mask to be produced by an UNSPEC_LVSL, in which case the mask
   cannot be known at compile time. In such a case we would have to
   generate several instructions to compute M' as above at run time,
   and a cost model is needed again. */

/* This is based on the union-find logic in web.c. web_entry_base is
   defined in df.h. */
class swap_web_entry : public web_entry_base
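
The M to M' rule described in the new commentary can be sketched in a few lines of C. This is only an illustration and is not code from rs6000.c: the names `swap_adjust_mask` and `is_vmrghh_mask` are hypothetical, and the vmrghh byte pattern assumes big-endian element numbering. In every case of the piecewise definition, adding or subtracting 8 toggles bit 3 of the selector value, and using i+8 or i-8 toggles bit 3 of the index, so the four cases appear to collapse to `M'[i] = M[i ^ 8] ^ 8`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of rs6000.c): compute the adjusted
   mask M' from a 16-entry byte-selector mask M, following the
   piecewise rule in the "Notes on Permutes" commentary.  Each
   selector must lie in [0,31].  */
static void
swap_adjust_mask (const uint8_t m[16], uint8_t m_prime[16])
{
  for (int i = 0; i < 16; i++)
    {
      /* Read the selector from the other doubleword (index bit 3
         flipped) and redirect it to the other doubleword of the
         same source operand (value bit 3 flipped).  */
      uint8_t sel = m[i ^ 8];
      assert (sel < 32);
      m_prime[i] = sel ^ 8;
    }
}

/* Hypothetical helper: return 1 if the mask equals the byte pattern
   of vmrghh, assuming big-endian element numbering.  */
static int
is_vmrghh_mask (const uint8_t m[16])
{
  static const uint8_t vmrghh[16] =
    { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 };
  return memcmp (m, vmrghh, sizeof vmrghh) == 0;
}
```

Applying `swap_adjust_mask` to the vmrghh pattern above gives { 12, 13, 28, 29, 14, 15, 30, 31, 8, 9, 24, 25, 10, 11, 26, 27 }. That is no longer a vmrghh mask, which is the situation the commentary describes: the single merge instruction would have to become a mask load plus a vperm. Applying the function twice returns the original mask, as expected for a doubleword swap.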