re PR target/36079 (cld instruction is not emitted anymore.)

PR target/36079
	* configure.ac: Handle --enable-cld.
	* configure: Regenerated.
	* config.gcc: Add USE_IX86_CLD to tm_defines for x86 targets.
	* config/i386/i386.h (struct machine_function): Add needs_cld field.
	(ix86_current_function_needs_cld): New define.
	* config/i386/i386.md (UNSPECV_CLD): New unspec volatile constant.
	(cld): New insn pattern.
	(strmov_singleop, rep_mov, strset_singleop, rep_stos, cmpstrnqi_nz_1,
	cmpstrnqi_1, strlenqi_1): Set ix86_current_function_needs_cld flag.
	* config/i386/i386.opt (mcld): New option.
	* config/i386/i386.c (ix86_expand_prologue): Emit cld insn if
	TARGET_CLD and ix86_current_function_needs_cld.
	(override_options): Use -mcld by default for 32-bit code if
	USE_IX86_CLD.

	* doc/install.texi (Options specification): Document --enable-cld.
	* doc/invoke.texi (Machine Dependent Options)
        [i386 and x86-64 Options]: Add -mcld option.
        (Intel 386 and AMD x86-64 Options): Document -mcld option.

From-SVN: r135792
Uros Bizjak 2008-05-23 09:53:16 +02:00
parent 71995c2c69
commit 922e3e33b2
10 changed files with 122 additions and 34 deletions
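
The two conditions described in the commit message can be modeled in plain C. This is a minimal sketch, not the GCC source: `apply_cld_default`, `prologue_emits_cld`, and the value chosen for `MASK_CLD` are hypothetical stand-ins, and the flag words are passed as parameters instead of being read from compiler globals. It assumes that `target_flags_explicit` has the bit set whenever the user wrote either `-mcld` or `-mno-cld`, which is why `MASK_CLD & ~target_flags_explicit` applies the configure-time default only when the option was not given explicitly.

```c
/* Hypothetical stand-in for the option mask; the bit value is arbitrary. */
#define MASK_CLD 0x1u

/* Model of the override_options hunk: when the compiler was configured
   with --enable-cld (use_ix86_cld != 0) and the target is 32-bit, turn
   MASK_CLD on unless the user set or cleared it explicitly.  */
static unsigned apply_cld_default(unsigned target_flags,
                                  unsigned target_flags_explicit,
                                  int target_64bit, int use_ix86_cld)
{
    if (use_ix86_cld && !target_64bit)
        target_flags |= MASK_CLD & ~target_flags_explicit;
    return target_flags;
}

/* Model of the ix86_expand_prologue hunk: cld is emitted only when the
   option is active AND the function expanded a string operation.  */
static int prologue_emits_cld(unsigned target_flags, int needs_cld)
{
    return (target_flags & MASK_CLD) != 0 && needs_cld != 0;
}
```

Under these assumptions, `apply_cld_default(0, MASK_CLD, 0, 1)` models a user passing `-mno-cld` to a compiler configured with `--enable-cld`: the explicit bit masks out the default, so the flag stays clear.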

@ -1,3 +1,27 @@
2008-05-23 Uros Bizjak <ubizjak@gmail.com>
Jakub Jelinek <jakub@redhat.com>
PR target/36079
* configure.ac: Handle --enable-cld.
* configure: Regenerated.
* config.gcc: Add USE_IX86_CLD to tm_defines for x86 targets.
* config/i386/i386.h (struct machine_function): Add needs_cld field.
(ix86_current_function_needs_cld): New define.
* config/i386/i386.md (UNSPEC_CLD): New unspec volatile constant.
(cld): New isns pattern.
(strmov_singleop, rep_mov, strset_singleop, rep_stos, cmpstrnqi_nz_1,
cmpstrnqi_1, strlenqi_1): Set ix86_current_function_needs_cld flag.
* config/i386/i386.opt (mcld): New option.
* config/i386/i386.c (ix86_expand_prologue): Emit cld insn if
TARGET_CLD and ix86_current_function_needs_cld.
(override_options): Use -mcld by default for 32-bit code if
USE_IX86_CLD.
* doc/install.texi (Options specification): Document --enable-cld.
* doc/invoke.texi (Machine Dependent Options)
[i386 and x86-64 Options]: Add -mcld option.
(Intel 386 and AMD x86-64 Options): Document -mcld option.
2008-05-23 Kai Tietz <kai.tietz@onevison.com>
* config/i386/i386.c (return_in_memory_32): Add ATTRIBUTE_UNUSED.
(return_in_memory_64): Likewise.
@ -58,9 +82,8 @@
(vector_alignment_reachable_p): Likewise.
* tree-vect-transform.c (vectorizable_load): Likewise.
* tree-vectorizer.c (vect_supportable_dr_alignment): Likewise.
* tree-vectorizer.c (get_vectype_for_scalar_type): Pass mode of
scalar_type to UNITS_PER_SIMD_WORD.
(get_vectype_for_scalar_type): Pass mode of scalar_type
to UNITS_PER_SIMD_WORD.
* config/arm/arm.h (UNITS_PER_SIMD_WORD): Updated.
* config/i386/i386.h (UNITS_PER_SIMD_WORD): Likewise.
@ -206,27 +229,21 @@
2008-05-20 David Daney <ddaney@avtrex.com>
* config/mips/mips.md (UNSPEC_SYNC_NEW_OP_12,
UNSPEC_SYNC_OLD_OP_12,
UNSPEC_SYNC_EXCHANGE_12): New define_constants.
(UNSPEC_SYNC_EXCHANGE, UNSPEC_MEMORY_BARRIER,
UNSPEC_SET_GOT_VERSION,
UNSPEC_SYNC_OLD_OP_12, UNSPEC_SYNC_EXCHANGE_12): New define_constants.
(UNSPEC_SYNC_EXCHANGE, UNSPEC_MEMORY_BARRIER, UNSPEC_SET_GOT_VERSION,
UNSPEC_UPDATE_GOT_VERSION): Renumber.
(optab, insn): Add 'plus' and 'minus' to define_code_attr.
(atomic_hiqi_op): New define_code_iterator.
(sync_compare_and_swap<mode>): Call
mips_expand_atomic_qihi instead of
(sync_compare_and_swap<mode>): Call mips_expand_atomic_qihi instead of
mips_expand_compare_and_swap_12.
(compare_and_swap_12): Use MIPS_COMPARE_AND_SWAP_12 instead of
MIPS_COMPARE_AND_SWAP_12_0. Pass argument to
MIPS_COMPARE_AND_SWAP_12.
MIPS_COMPARE_AND_SWAP_12_0. Pass argument to MIPS_COMPARE_AND_SWAP_12.
(sync_<optab><mode>, sync_old_<optab><mode>,
sync_new_<optab><mode>, sync_nand<mode>, sync_old_nand<mode>,
sync_new_nand<mode>): New define_expands for HI and QI mode
operands.
sync_new_nand<mode>): New define_expands for HI and QI mode operands.
(sync_<optab>_12, sync_old_<optab>_12, sync_new_<optab>_12,
sync_nand_12, sync_old_nand_12, sync_new_nand_12): New insns.
(sync_lock_test_and_set<mode>): New define_expand for HI and QI
modes.
(sync_lock_test_and_set<mode>): New define_expand for HI and QI modes.
(test_and_set_12): New insn.
(sync_old_add<mode>, sync_new_add<mode>, sync_old_<optab><mode>,
sync_new_<optab><mode>, sync_old_nand<mode>,
@ -284,10 +301,12 @@
2008-05-20 Jan Sjodin <jan.sjodin@amd.com>
Sebastian Pop <sebastian.pop@amd.com>
* tree-loop-linear.c (gather_interchange_stats): Look in the access matrix,
and never look at the tree representation of the memory accesses.
* tree-loop-linear.c (gather_interchange_stats): Look in the access
matrix, and never look at the tree representation of the memory
accesses.
(linear_transform_loops): Computes parameters and access matrices.
* tree-data-ref.c (compute_data_dependences_for_loop): Returns false when fails.
* tree-data-ref.c (compute_data_dependences_for_loop): Returns false
when fails.
(access_matrix_get_index_for_parameter): New.
* tree-data-ref.h (struct access_matrix): New.
(AM_LOOP_NEST_NUM, AM_NB_INDUCTION_VARS, AM_PARAMETERS, AM_MATRIX,
@ -333,15 +352,15 @@
PR tree-optimization/36206
* tree-chrec.h (chrec_fold_op): New.
* tree-data-ref.c (initialize_matrix_A): Traverse NOP_EXPR, PLUS_EXPR, and
other trees.
* tree-data-ref.c (initialize_matrix_A): Traverse NOP_EXPR, PLUS_EXPR,
and other trees.
2008-05-20 Nathan Sidwell <nathan@codesourcery.com>
* c-incpath.c (INO_T_EQ): Do not define on non-inode systems.
(DIRS_EQ): New.
(remove_duplicates): Do not set inode on non-inode systems. Use
DIRS_EQ.
(remove_duplicates): Do not set inode on non-inode systems.
Use DIRS_EQ.
2008-05-20 Sandra Loosemore <sandra@codesourcery.com>
@ -349,8 +368,7 @@
2008-05-20 Richard Guenther <rguenther@suse.de>
* tree-ssa-reassoc.c (fini_reassoc): Use the statistics
infrastructure.
* tree-ssa-reassoc.c (fini_reassoc): Use the statistics infrastructure.
* tree-ssa-sccvn.c (process_scc): Likewise.
* tree-ssa-sink.c (execute_sink_code): Likewise.
* tree-ssa-threadupdate.c (thread_through_all_blocks): Likewise.

@ -397,8 +397,16 @@ then
fi
case ${target} in
i[34567]86-*-*)
if test $enable_cld = yes; then
tm_defines="${tm_defines} USE_IX86_CLD=1"
fi
;;
x86_64-*-*)
tm_file="i386/biarch64.h ${tm_file}"
if test $enable_cld = yes; then
tm_defines="${tm_defines} USE_IX86_CLD=1"
fi
;;
esac

@ -2764,6 +2764,12 @@ override_options (void)
can be optimized to ap = __builtin_next_arg (0). */
if (!TARGET_64BIT || TARGET_64BIT_MS_ABI)
targetm.expand_builtin_va_start = NULL;
#ifdef USE_IX86_CLD
/* Use -mcld by default for 32-bit code if configured with --enable-cld. */
if (!TARGET_64BIT)
target_flags |= MASK_CLD & ~target_flags_explicit;
#endif
}
/* Return true if this goes in large data/bss. */
@ -6597,6 +6603,10 @@ ix86_expand_prologue (void)
emit_insn (gen_prologue_use (pic_offset_table_rtx));
emit_insn (gen_blockage ());
}
/* Emit cld instruction if stringops are used in the function. */
if (TARGET_CLD && ix86_current_function_needs_cld)
emit_insn (gen_cld ());
}
/* Emit code to restore saved registers using MOV insns. First register

@ -2432,8 +2432,9 @@ struct machine_function GTY(())
int save_varrargs_registers;
int accesses_prev_frame;
int optimize_mode_switching[MAX_386_ENTITIES];
/* Set by ix86_compute_frame_layout and used by prologue/epilogue expander to
determine the style used. */
int needs_cld;
/* Set by ix86_compute_frame_layout and used by prologue/epilogue
expander to determine the style used. */
int use_fast_prologue_epilogue;
/* Number of saved registers USE_FAST_PROLOGUE_EPILOGUE has been computed
for. */
@ -2453,6 +2454,7 @@ struct machine_function GTY(())
#define ix86_stack_locals (cfun->machine->stack_locals)
#define ix86_save_varrargs_registers (cfun->machine->save_varrargs_registers)
#define ix86_optimize_mode_switching (cfun->machine->optimize_mode_switching)
#define ix86_current_function_needs_cld (cfun->machine->needs_cld)
#define ix86_tls_descriptor_calls_expanded_in_cfun \
(cfun->machine->tls_descriptor_call_expanded_p)
/* Since tls_descriptor_call_expanded is not cleared, even if all TLS

@ -213,6 +213,7 @@
(UNSPECV_XCHG 12)
(UNSPECV_LOCK 13)
(UNSPECV_PROLOGUE_USE 14)
(UNSPECV_CLD 15)
])
;; Constants to represent pcomtrue/pcomfalse variants
@ -18374,6 +18375,14 @@
;; Block operation instructions
(define_insn "cld"
[(unspec_volatile [(const_int 0)] UNSPECV_CLD)]
""
"cld"
[(set_attr "length" "1")
(set_attr "length_immediate" "0")
(set_attr "modrm" "0")])
(define_expand "movmemsi"
[(use (match_operand:BLK 0 "memory_operand" ""))
(use (match_operand:BLK 1 "memory_operand" ""))
@ -18446,7 +18455,7 @@
(set (match_operand 2 "register_operand" "")
(match_operand 5 "" ""))])]
"TARGET_SINGLE_STRINGOP || optimize_size"
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*strmovdi_rex_1"
[(set (mem:DI (match_operand:DI 2 "register_operand" "0"))
@ -18563,7 +18572,7 @@
(match_operand 3 "memory_operand" ""))
(use (match_dup 4))])]
""
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*rep_movdi_rex64"
[(set (match_operand:DI 2 "register_operand" "=c") (const_int 0))
@ -18723,7 +18732,7 @@
(set (match_operand 0 "register_operand" "")
(match_operand 3 "" ""))])]
"TARGET_SINGLE_STRINGOP || optimize_size"
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*strsetdi_rex_1"
[(set (mem:DI (match_operand:DI 1 "register_operand" "0"))
@ -18817,7 +18826,7 @@
(use (match_operand 3 "register_operand" ""))
(use (match_dup 1))])]
""
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*rep_stosdi_rex64"
[(set (match_operand:DI 1 "register_operand" "=c") (const_int 0))
@ -18993,7 +19002,7 @@
(clobber (match_operand 1 "register_operand" ""))
(clobber (match_dup 2))])]
""
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*cmpstrnqi_nz_1"
[(set (reg:CC FLAGS_REG)
@ -19040,7 +19049,7 @@
(clobber (match_operand 1 "register_operand" ""))
(clobber (match_dup 2))])]
""
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*cmpstrnqi_1"
[(set (reg:CC FLAGS_REG)
@ -19109,7 +19118,7 @@
(clobber (match_operand 1 "register_operand" ""))
(clobber (reg:CC FLAGS_REG))])]
""
"")
"ix86_current_function_needs_cld = 1;")
(define_insn "*strlenqi_1"
[(set (match_operand:SI 0 "register_operand" "=&c")

@ -250,6 +250,10 @@ Support SSE5 built-in functions and code generation
;; Instruction support
mcld
Target Report Mask(CLD)
Generate cld instruction in the function prologue.
mabm
Target Report RejectNegative Var(x86_abm)
Support code generation of Advanced Bit Manipulation (ABM) instructions.

gcc/configure vendored

@ -1046,6 +1046,7 @@ Optional Features:
--enable-sjlj-exceptions
arrange to use setjmp/longjmp exception handling
--enable-secureplt enable -msecure-plt by default for PowerPC
--enable-cld enable -mcld by default for 32bit x86
--disable-win32-registry
disable lookup of installation paths in the
Registry on Windows hosts
@ -13709,6 +13710,14 @@ if test "${enable_secureplt+set}" = set; then
fi;
# Check whether --enable-cld or --disable-cld was given.
if test "${enable_cld+set}" = set; then
enableval="$enable_cld"
else
enable_cld=no
fi;
# Windows32 Registry support for specifying GCC installation paths.
# Check whether --enable-win32-registry or --disable-win32-registry was given.
if test "${enable_win32_registry+set}" = set; then

@ -1528,6 +1528,10 @@ AC_ARG_ENABLE(secureplt,
[ --enable-secureplt enable -msecure-plt by default for PowerPC],
[], [])
AC_ARG_ENABLE(cld,
[ --enable-cld enable -mcld by default for 32bit x86], [],
[enable_cld=no])
# Windows32 Registry support for specifying GCC installation paths.
AC_ARG_ENABLE(win32-registry,
[ --disable-win32-registry

@ -1214,6 +1214,16 @@ Using the GNU Compiler Collection (GCC)},
See ``RS/6000 and PowerPC Options'' in the main manual
@end ifhtml
@item --enable-cld
This option enables @option{-mcld} by default for 32-bit x86 targets.
@ifnothtml
@xref{i386 and x86-64 Options,, i386 and x86-64 Options, gcc,
Using the GNU Compiler Collection (GCC)},
@end ifnothtml
@ifhtml
See ``i386 and x86-64 Options'' in the main manual
@end ifhtml
@item --enable-win32-registry
@itemx --enable-win32-registry=@var{key}
@itemx --disable-win32-registry

@ -553,7 +553,7 @@ Objective-C and Objective-C++ Dialects}.
-masm=@var{dialect} -mno-fancy-math-387 @gol
-mno-fp-ret-in-387 -msoft-float @gol
-mno-wide-multiply -mrtd -malign-double @gol
-mpreferred-stack-boundary=@var{num} -mcx16 -msahf -mrecip @gol
-mpreferred-stack-boundary=@var{num} -mcld -mcx16 -msahf -mrecip @gol
-mmmx -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -msse4 @gol
-maes -mpclmul @gol
-msse4a -m3dnow -mpopcnt -mabm -msse5 @gol
@ -10814,6 +10814,20 @@ supported architecture, using the appropriate flags. In particular,
the file containing the CPU detection code should be compiled without
these options.
@item -mcld
@opindex mcld
This option instructs GCC to emit a @code{cld} instruction in the prologue
of functions that use string instructions. String instructions depend on
the DF flag to select between autoincrement or autodecrement mode. While the
ABI specifies the DF flag to be cleared on function entry, some operating
systems violate this specification by not clearing the DF flag in their
exception dispatchers. The exception handler can be invoked with the DF flag
set which leads to wrong direction mode, when string instructions are used.
This option can be enabled by default on 32-bit x86 targets by configuring
GCC with the @option{--enable-cld} configure option. Generation of @code{cld}
instructions can be suppressed with the @option{-mno-cld} compiler option
in this case.
@item -mcx16
@opindex mcx16
This option will enable GCC to use CMPXCHG16B instruction in generated code.
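
The `-mcld` text above says that string instructions depend on the DF flag to choose between autoincrement and autodecrement. The following is a small software model of that behavior for a `rep movsb`-style copy. It is a sketch for illustration only: `model_rep_movsb` is a hypothetical helper, not part of GCC, and it stands in for the hardware instruction by taking the flag as an ordinary parameter.

```c
#include <stddef.h>

/* Software model of a `rep movsb`-style copy (hypothetical helper).
   With df == 0 both pointers step forward by one byte per iteration;
   with df != 0 they step backward, so the copy walks toward lower
   addresses starting from the same initial pointers.  */
static void model_rep_movsb(unsigned char *dst, const unsigned char *src,
                            size_t count, int df)
{
    ptrdiff_t step = df ? -1 : 1;

    while (count--) {
        *dst = *src;
        dst += step;
        src += step;
    }
}
```

Starting both pointers at index 3 of an 8-byte buffer with a count of 3, the model touches indices 3, 4, 5 when `df` is 0 and indices 3, 2, 1 when `df` is nonzero. Code generated on the assumption that DF is clear would therefore read and write the wrong bytes if a handler were entered with DF set, which is the situation the prologue `cld` guards against.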