re PR tree-optimization/24659 (Conversions are not vectorized)

	PR tree-optimization/24659
	* optabs.h (enum optab_index): Add OTI_vec_unpacks_float_hi,
	OTI_vec_unpacks_float_lo, OTI_vec_unpacku_float_hi,
	OTI_vec_unpacku_float_lo, OTI_vec_pack_sfix_trunc and
	OTI_vec_pack_ufix_trunc.
	(vec_unpacks_float_hi_optab): Define new macro.
	(vec_unpacks_float_lo_optab): Ditto.
	(vec_unpacku_float_hi_optab): Ditto.
	(vec_unpacku_float_lo_optab): Ditto.
	(vec_pack_sfix_trunc_optab): Ditto.
	(vec_pack_ufix_trunc_optab): Ditto.
	* genopinit.c (optabs): Implement vec_unpack[s|u]_float_[hi|lo]_optab
	and vec_pack_[s|u]fix_trunc_optab using
	vec_unpack[s|u]_float_[hi|lo]_* and vec_pack_[s|u]fix_trunc_* patterns.
	* tree-vectorizer.c (supportable_widening_operation): Handle
	FLOAT_EXPR and CONVERT_EXPR.  Update comment.
	(supportable_narrowing_operation): New function.
	* tree-vectorizer.h (supportable_narrowing_operation): Prototype.
	* tree-vect-transform.c (vectorizable_conversion): Handle
	(nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
	(vect_gen_widened_results_half): Move before vectorizable_conversion.
	(vectorizable_type_demotion): Call supportable_narrowing_operation()
	to check for target support.
	* optabs.c (optab_for_tree_code): Return vec_unpack[s|u]_float_hi_optab
	for VEC_UNPACK_FLOAT_HI_EXPR, vec_unpack[s|u]_float_lo_optab
	for VEC_UNPACK_FLOAT_LO_EXPR and vec_pack_[u|s]fix_trunc_optab
	for VEC_PACK_FIX_TRUNC_EXPR.
	(expand_binop): Special case mode of the result for
	vec_pack_[u|s]fix_trunc_optab.
	(init_optabs): Initialize vec_unpack[s|u]_float_[hi|lo]_optab and
	vec_pack_[u|s]fix_trunc_optab.

	* tree.def (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR,
	VEC_PACK_FIX_TRUNC_EXPR): New tree codes.
	* tree-pretty-print.c (dump_generic_node): Handle
	VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR and
	VEC_PACK_FIX_TRUNC_EXPR.
	(op_prio): Ditto.
	* expr.c (expand_expr_real_1): Ditto.
	* tree-inline.c (estimate_num_insns_1): Ditto.
	* tree-vect-generic.c (expand_vector_operations_1): Ditto.

	* config/i386/sse.md (vec_unpacks_float_hi_v8hi): New expander.
	(vec_unpacks_float_lo_v8hi): Ditto.
	(vec_unpacku_float_hi_v8hi): Ditto.
	(vec_unpacku_float_lo_v8hi): Ditto.
	(vec_unpacks_float_hi_v4si): Ditto.
	(vec_unpacks_float_lo_v4si): Ditto.
	(vec_pack_sfix_trunc_v2df): Ditto.

	* doc/c-tree.texi (Expression trees) [VEC_UNPACK_FLOAT_HI_EXPR]:
	Document.
	[VEC_UNPACK_FLOAT_LO_EXPR]: Ditto.
	[VEC_PACK_FIX_TRUNC_EXPR]: Ditto.
	* doc/md.texi (Standard Names) [vec_pack_sfix_trunc]: Document.
	[vec_pack_ufix_trunc]: Ditto.
	[vec_unpacks_float_hi]: Ditto.
	[vec_unpacks_float_lo]: Ditto.
	[vec_unpacku_float_hi]: Ditto.
	[vec_unpacku_float_lo]: Ditto.

testsuite/ChangeLog:

	PR tree-optimization/24659
	* gcc.dg/vect/vect-floatint-conversion-2.c: New test.
	* gcc.dg/vect/vect-intfloat-conversion-1.c: Require vect_float,
	not vect_int target.
	* gcc.dg/vect/vect-intfloat-conversion-2.c: Require vect_float,
	not vect_int target.  Loop is vectorized for vect_intfloat_cvt
	targets.
	* gcc.dg/vect/vect-intfloat-conversion-3.c: New test.
	* gcc.dg/vect/vect-intfloat-conversion-4a.c: New test.
	* gcc.dg/vect/vect-intfloat-conversion-4b.c: New test.

From-SVN: r124784
2007-05-17 08:31:05 +02:00 · 2007-05-17 08:31:05 +02:00 · d9987fb407
commit d9987fb407
parent f59d2a7c86
22 changed files with 791 additions and 150 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@ -1,3 +1,66 @@
+2007-05-17  Uros Bizjak  <ubizjak@gmail.com>
+
+	PR tree-optimization/24659
+	* optabs.h (enum optab_index): Add OTI_vec_unpacks_float_hi,
+	OTI_vec_unpacks_float_lo, OTI_vec_unpacku_float_hi,
+	OTI_vec_unpacku_float_lo, OTI_vec_pack_sfix_trunc and
+	OTI_vec_pack_ufix_trunc.
+	(vec_unpacks_float_hi_optab): Define new macro.
+	(vec_unpacks_float_lo_optab): Ditto.
+	(vec_unpacku_float_hi_optab): Ditto.
+	(vec_unpacku_float_lo_optab): Ditto.
+	(vec_pack_sfix_trunc_optab): Ditto.
+	(vec_pack_ufix_trunc_optab): Ditto.
+	* genopinit.c (optabs): Implement vec_unpack[s|u]_float_[hi|lo]_optab
+	and vec_pack_[s|u]fix_trunc_optab using
+	vec_unpack[s|u]_float_[hi|lo]_* and vec_pack_[s|u]fix_trunc_* patterns.
+	* tree-vectorizer.c (supportable_widening_operation): Handle
+	FLOAT_EXPR and CONVERT_EXPR.  Update comment.
+	(supportable_narrowing_operation): New function.
+	* tree-vectorizer.h (supportable_narrowing_operation): Prototype.
+	* tree-vect-transform.c (vectorizable_conversion): Handle
+	(nunits_in == nunits_out / 2) and (nunits_out == nunits_in / 2) cases.
+	(vect_gen_widened_results_half): Move before vectorizable_conversion.
+	(vectorizable_type_demotion): Call supportable_narrowing_operation()
+	to check for target support.
+	* optabs.c (optab_for_tree_code): Return vec_unpack[s|u]_float_hi_optab
+	for VEC_UNPACK_FLOAT_HI_EXPR, vec_unpack[s|u]_float_lo_optab
+	for VEC_UNPACK_FLOAT_LO_EXPR and vec_pack_[u|s]fix_trunc_optab
+	for VEC_PACK_FIX_TRUNC_EXPR.
+	(expand_binop): Special case mode of the result for
+	vec_pack_[u|s]fix_trunc_optab.
+	(init_optabs): Initialize vec_unpack[s|u]_float_[hi|lo]_optab and
+	vec_pack_[u|s]fix_trunc_optab.
+
+	* tree.def (VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR,
+	VEC_PACK_FIX_TRUNC_EXPR): New tree codes.
+	* tree-pretty-print.c (dump_generic_node): Handle
+	VEC_UNPACK_FLOAT_HI_EXPR, VEC_UNPACK_FLOAT_LO_EXPR and
+	VEC_PACK_FIX_TRUNC_EXPR.
+	(op_prio): Ditto.
+	* expr.c (expand_expr_real_1): Ditto.
+	* tree-inline.c (estimate_num_insns_1): Ditto.
+	* tree-vect-generic.c (expand_vector_operations_1): Ditto.
+
+	* config/i386/sse.md (vec_unpacks_float_hi_v8hi): New expander.
+	(vec_unpacks_float_lo_v8hi): Ditto.
+	(vec_unpacku_float_hi_v8hi): Ditto.
+	(vec_unpacku_float_lo_v8hi): Ditto.
+	(vec_unpacks_float_hi_v4si): Ditto.
+	(vec_unpacks_float_lo_v4si): Ditto.
+	(vec_pack_sfix_trunc_v2df): Ditto.
+
+	* doc/c-tree.texi (Expression trees) [VEC_UNPACK_FLOAT_HI_EXPR]:
+	Document.
+	[VEC_UNPACK_FLOAT_LO_EXPR]: Ditto.
+	[VEC_PACK_FIX_TRUNC_EXPR]: Ditto.
+	* doc/md.texi (Standard Names) [vec_pack_sfix_trunc]: Document.
+	[vec_pack_ufix_trunc]: Ditto.
+	[vec_unpacks_float_hi]: Ditto.
+	[vec_unpacks_float_lo]: Ditto.
+	[vec_unpacku_float_hi]: Ditto.
+	[vec_unpacku_float_lo]: Ditto.
+
 2007-05-16  Uros Bizjak  <ubizjak@gmail.com>

 	* soft-fp/README: Update for new files.
@ -46,14 +109,15 @@

 2007-05-16  Paolo Bonzini  <bonzini@gnu.org>

-        * config/i386/i386.c (legitimize_tls_address): Mark __tls_get_addr
-        calls as pure.
+	* config/i386/i386.c (legitimize_tls_address): Mark __tls_get_addr
+	calls as pure.

 2007-05-16  Eric Christopher  <echristo@apple.com>

 	* config/rs6000/rs6000.c (rs6000_emit_prologue): Move altivec register
-        saving after stack push. Set sp_offset whenever we push.
-        (rs6000_emit_epilogue): Move altivec register restore before stack push.
+	saving after stack push. Set sp_offset whenever we push.
+	(rs6000_emit_epilogue): Move altivec register restore before
+	stack push.

 2007-05-16  Richard Sandiford  <richard@codesourcery.com>

@ -496,7 +560,7 @@
 	dumps.

 2007-05-08  Sandra Loosemore  <sandra@codesourcery.com>
-            Nigel Stephens  <nigel@mips.com>
+	    Nigel Stephens  <nigel@mips.com>

 	* config/mips/mips.h (MAX_FPRS_PER_FMT): Renamed from FP_INC.
 	Update comments and all uses.
@ -563,7 +627,7 @@
 	* configure: Regenerate.
 	* config.in: Regenerate.

-2007-05-07   Naveen.H.S  <naveen.hs@kpitcummins.com>
+2007-05-07  Naveen.H.S  <naveen.hs@kpitcummins.com>

 	* config/m32c/muldiv.md (mulhisi3_c): Limit the mode of the 2nd
 	operand to HI mode.
@ -1062,7 +1126,7 @@
 	PR middle-end/22156
 	Temporarily revert:
 	2007-04-06  Andreas Tobler  <a.tobler@schweiz.org>
-        * tree-sra.c (sra_build_elt_assignment): Initialize min/maxshift.
+	* tree-sra.c (sra_build_elt_assignment): Initialize min/maxshift.
 	2007-04-05  Alexandre Oliva  <aoliva@redhat.com>
 	* tree-sra.c (try_instantiate_multiple_fields): Needlessly
 	initialize align to silence bogus warning.
@ -1274,17 +1338,17 @@
 	PR tree-optimization/30965
 	PR tree-optimization/30978
 	* Makefile.in (tree-ssa-forwprop.o): Depend on $(FLAGS_H).
-        * tree-ssa-forwprop.c (forward_propagate_into_cond_1): Remove.
-        (find_equivalent_equality_comparison): Likewise.
-        (simplify_cond): Likewise.
-        (get_prop_source_stmt): New helper.
-        (get_prop_dest_stmt): Likewise.
+	* tree-ssa-forwprop.c (forward_propagate_into_cond_1): Remove.
+	(find_equivalent_equality_comparison): Likewise.
+	(simplify_cond): Likewise.
+	(get_prop_source_stmt): New helper.
+	(get_prop_dest_stmt): Likewise.
 	(can_propagate_from): Likewise.
 	(remove_prop_source_from_use): Likewise.
-        (combine_cond_expr_cond): Likewise.
-        (forward_propagate_comparison): New function.
-        (forward_propagate_into_cond): Rewrite to use fold for
-        tree combining.
+	(combine_cond_expr_cond): Likewise.
+	(forward_propagate_comparison): New function.
+	(forward_propagate_into_cond): Rewrite to use fold for
+	tree combining.
 	(tree_ssa_forward_propagate_single_use_vars): Call
 	forward_propagate_comparison to propagate comparisons.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@ -2205,6 +2205,80 @@
 	    (parallel [(const_int 0) (const_int 1)]))))]
  "TARGET_SSE2")

+(define_expand "vec_unpacks_float_hi_v8hi"
+  [(match_operand:V4SF 0 "register_operand" "")
+   (match_operand:V8HI 1 "register_operand" "")]
+  "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_vec_unpacks_hi_v8hi (tmp, operands[1]));
+  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "vec_unpacks_float_lo_v8hi"
+  [(match_operand:V4SF 0 "register_operand" "")
+   (match_operand:V8HI 1 "register_operand" "")]
+  "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_vec_unpacks_lo_v8hi (tmp, operands[1]));
+  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "vec_unpacku_float_hi_v8hi"
+  [(match_operand:V4SF 0 "register_operand" "")
+   (match_operand:V8HI 1 "register_operand" "")]
+  "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_vec_unpacku_hi_v8hi (tmp, operands[1]));
+  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "vec_unpacku_float_lo_v8hi"
+  [(match_operand:V4SF 0 "register_operand" "")
+   (match_operand:V8HI 1 "register_operand" "")]
+  "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_vec_unpacku_lo_v8hi (tmp, operands[1]));
+  emit_insn (gen_sse2_cvtdq2ps (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "vec_unpacks_float_hi_v4si"
+  [(set (match_dup 2)
+	(vec_select:V4SI
+	  (match_operand:V4SI 1 "nonimmediate_operand" "")
+	  (parallel [(const_int 2)
+		     (const_int 3)
+		     (const_int 2)
+		     (const_int 3)])))
+   (set (match_operand:V2DF 0 "register_operand" "")
+	(float:V2DF
+	  (vec_select:V2SI
+	    (match_dup 2)
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE2"
+{
+  operands[2] = gen_reg_rtx (V4SImode);
+})
+
+(define_expand "vec_unpacks_float_lo_v4si"
+  [(set (match_operand:V2DF 0 "register_operand" "")
+	(float:V2DF
+	  (vec_select:V2SI
+	    (match_operand:V4SI 1 "nonimmediate_operand" "")
+	    (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE2")
+
 (define_expand "vec_pack_trunc_v2df"
  [(match_operand:V4SF 0 "register_operand" "")
   (match_operand:V2DF 1 "nonimmediate_operand" "")
@ -2222,6 +2296,25 @@
  DONE;
 })

+(define_expand "vec_pack_sfix_trunc_v2df"
+  [(match_operand:V4SI 0 "register_operand" "")
+   (match_operand:V2DF 1 "nonimmediate_operand" "")
+   (match_operand:V2DF 2 "nonimmediate_operand" "")]
+  "TARGET_SSE2"
+{
+  rtx r1, r2;
+
+  r1 = gen_reg_rtx (V4SImode);
+  r2 = gen_reg_rtx (V4SImode);
+
+  emit_insn (gen_sse2_cvttpd2dq (r1, operands[1]));
+  emit_insn (gen_sse2_cvttpd2dq (r2, operands[2]));
+  emit_insn (gen_sse2_punpcklqdq (gen_lowpart (V2DImode, operands[0]),
+				  gen_lowpart (V2DImode, r1),
+				  gen_lowpart (V2DImode, r2)));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel double-precision floating point element swizzling
@ -3525,7 +3618,7 @@
  "TARGET_SSE2"
 {
  rtx op1, op2, h1, l1, h2, l2, h3, l3;
-                                                                                
+
  op1 = gen_lowpart (V16QImode, operands[1]);
  op2 = gen_lowpart (V16QImode, operands[2]);
  h1 = gen_reg_rtx (V16QImode);
@ -3534,7 +3627,7 @@
  l2 = gen_reg_rtx (V16QImode);
  h3 = gen_reg_rtx (V16QImode);
  l3 = gen_reg_rtx (V16QImode);
-                                                                                
+
  emit_insn (gen_vec_interleave_highv16qi (h1, op1, op2));
  emit_insn (gen_vec_interleave_lowv16qi (l1, op1, op2));
  emit_insn (gen_vec_interleave_highv16qi (h2, l1, h1));
@ -3544,7 +3637,7 @@
  emit_insn (gen_vec_interleave_lowv16qi (operands[0], l3, h3));
  DONE;
 })
-                                                                                
+
 ;; Reduce:
 ;;      op1 = abcdefgh
 ;;      op2 = ijklmnop
@ -3560,14 +3653,14 @@
  "TARGET_SSE2"
 {
  rtx op1, op2, h1, l1, h2, l2;
-                                                                                
+
  op1 = gen_lowpart (V8HImode, operands[1]);
  op2 = gen_lowpart (V8HImode, operands[2]);
  h1 = gen_reg_rtx (V8HImode);
  l1 = gen_reg_rtx (V8HImode);
  h2 = gen_reg_rtx (V8HImode);
  l2 = gen_reg_rtx (V8HImode);
-                                                                                
+
  emit_insn (gen_vec_interleave_highv8hi (h1, op1, op2));
  emit_insn (gen_vec_interleave_lowv8hi (l1, op1, op2));
  emit_insn (gen_vec_interleave_highv8hi (h2, l1, h1));
@ -3575,7 +3668,7 @@
  emit_insn (gen_vec_interleave_lowv8hi (operands[0], l2, h2));
  DONE;
 })
-                                                                                
+
 ;; Reduce:
 ;;     op1 = abcd
 ;;     op2 = efgh
@ -3589,12 +3682,12 @@
  "TARGET_SSE2"
 {
  rtx op1, op2, h1, l1;
-                                                                                
+
  op1 = gen_lowpart (V4SImode, operands[1]);
  op2 = gen_lowpart (V4SImode, operands[2]);
  h1 = gen_reg_rtx (V4SImode);
  l1 = gen_reg_rtx (V4SImode);
-                                                                                
+
  emit_insn (gen_vec_interleave_highv4si (h1, op1, op2));
  emit_insn (gen_vec_interleave_lowv4si (l1, op1, op2));
  emit_insn (gen_vec_interleave_lowv4si (operands[0], l1, h1));
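
For readers more at home with intrinsics than with RTL, the new `vec_pack_sfix_trunc_v2df` and `vec_unpacks_float_{hi,lo}_v4si` expanders in this file correspond roughly to the SSE2 sequences sketched below. This is an illustration, not code from the patch: the helper names `pack_sfix_trunc_v2df` and `unpacks_float_v4si` are made up here, and the scalar fallback is only there so the snippet also builds on hosts without SSE2.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif

/* Hypothetical helper mirroring vec_pack_sfix_trunc_v2df: truncate two
   pairs of doubles toward zero and concatenate the four int32 results.  */
static void
pack_sfix_trunc_v2df (int32_t out[4], const double a[2], const double b[2])
{
  assert (out && a && b);
#ifdef __SSE2__
  __m128i r1 = _mm_cvttpd_epi32 (_mm_loadu_pd (a));	/* { a0, a1, 0, 0 } */
  __m128i r2 = _mm_cvttpd_epi32 (_mm_loadu_pd (b));	/* { b0, b1, 0, 0 } */
  /* punpcklqdq: low 64 bits of r1 followed by low 64 bits of r2.  */
  _mm_storeu_si128 ((__m128i *) out, _mm_unpacklo_epi64 (r1, r2));
#else
  /* Scalar fallback so the sketch builds without SSE2.  */
  out[0] = (int32_t) a[0];
  out[1] = (int32_t) a[1];
  out[2] = (int32_t) b[0];
  out[3] = (int32_t) b[1];
#endif
}

/* Hypothetical helper mirroring vec_unpacks_float_{lo,hi}_v4si: convert
   elements 0..1 (lo) or 2..3 (hi) of a four-int vector to doubles.  */
static void
unpacks_float_v4si (double out[2], const int32_t in[4], int high)
{
  assert (out && in);
#ifdef __SSE2__
  __m128i v = _mm_loadu_si128 ((const __m128i *) in);
  if (high)
    /* pshufd with selector { 2, 3, 2, 3 }, as in the hi expander.  */
    v = _mm_shuffle_epi32 (v, _MM_SHUFFLE (3, 2, 3, 2));
  /* cvtdq2pd converts the two low int32 elements to double.  */
  _mm_storeu_pd (out, _mm_cvtepi32_pd (v));
#else
  /* Scalar fallback so the sketch builds without SSE2.  */
  out[0] = (double) in[high ? 2 : 0];
  out[1] = (double) in[high ? 3 : 1];
#endif
}
```

Both `cvttpd2dq` results carry their two integers in the low 64 bits, which is why a single `punpcklqdq` is enough to merge them into one V4SI value.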
--- a/gcc/doc/c-tree.texi
+++ b/gcc/doc/c-tree.texi
@ -1983,8 +1983,11 @@ This macro returns the attributes on the type @var{type}.
@tindex VEC_WIDEN_MULT_LO_EXPR
@tindex VEC_UNPACK_HI_EXPR
@tindex VEC_UNPACK_LO_EXPR
+@tindex VEC_UNPACK_FLOAT_HI_EXPR
+@tindex VEC_UNPACK_FLOAT_LO_EXPR
@tindex VEC_PACK_TRUNC_EXPR
@tindex VEC_PACK_SAT_EXPR
+@tindex VEC_PACK_FIX_TRUNC_EXPR
@tindex VEC_EXTRACT_EVEN_EXPR 
@tindex VEC_EXTRACT_ODD_EXPR
@tindex VEC_INTERLEAVE_HIGH_EXPR
@ -2846,6 +2849,17 @@ high @code{N/2} elements of the vector are extracted and widened (promoted).
 In the case of @code{VEC_UNPACK_LO_EXPR} the low @code{N/2} elements of the
 vector are extracted and widened (promoted).

+@item VEC_UNPACK_FLOAT_HI_EXPR
+@item VEC_UNPACK_FLOAT_LO_EXPR
+These nodes represent unpacking of the high and low parts of the input vector,
+where the values are converted from fixed point to floating point.  The
+single operand is a vector that contains @code{N} elements of the same
+integral type.  The result is a vector that contains half as many elements
+of a floating point type whose size is twice as wide.  In the case of
+@code{VEC_UNPACK_FLOAT_HI_EXPR} the high @code{N/2} elements of the vector are
+extracted, converted and widened.  In the case of @code{VEC_UNPACK_FLOAT_LO_EXPR}
+the low @code{N/2} elements of the vector are extracted, converted and widened.
+
@item VEC_PACK_TRUNC_EXPR
 This node represents packing of truncated elements of the two input vectors
 into the output vector.  Input operands are vectors that contain the same
@ -2862,6 +2876,15 @@ vector that contains twice as many elements of an integral type whose size
 is half as wide.  The elements of the two vectors are demoted and merged
 (concatenated) to form the output vector.

+@item VEC_PACK_FIX_TRUNC_EXPR
+This node represents packing of elements of the two input vectors into the
+output vector, where the values are converted from floating point
+to fixed point.  Input operands are vectors that contain the same number
+of elements of a floating point type.  The result is a vector that contains
+twice as many elements of an integral type whose size is half as wide.  The
+elements of the two vectors are merged (concatenated) to form the output
+vector.
+
@item VEC_EXTRACT_EVEN_EXPR
@item VEC_EXTRACT_ODD_EXPR
 These nodes represent extracting of the even/odd elements of the two input 
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@ -3607,6 +3607,14 @@ Operand 0 is the resulting vector in which the elements of the two input
 vectors are concatenated after narrowing them down using signed/unsigned
 saturating arithmetic.

+@cindex @code{vec_pack_sfix_trunc_@var{m}} instruction pattern
+@cindex @code{vec_pack_ufix_trunc_@var{m}} instruction pattern
+@item @samp{vec_pack_sfix_trunc_@var{m}}, @samp{vec_pack_ufix_trunc_@var{m}}
+Narrow, convert to signed/unsigned integral type and merge the elements
+of two vectors.  Operands 1 and 2 are vectors of the same mode having N
+floating point elements of size S.  Operand 0 is the resulting vector
+in which 2*N elements of size S/2 are concatenated.
+
@cindex @code{vec_unpacks_hi_@var{m}} instruction pattern
@cindex @code{vec_unpacks_lo_@var{m}} instruction pattern
@item @samp{vec_unpacks_hi_@var{m}}, @samp{vec_unpacks_lo_@var{m}}
@ -3624,11 +3632,24 @@ integral elements.  The input vector (operand 1) has N elements of size S.
 Widen (promote) the high/low elements of the vector using zero extension and
 place the resulting N/2 values of size 2*S in the output vector (operand 0).

+@cindex @code{vec_unpacks_float_hi_@var{m}} instruction pattern
+@cindex @code{vec_unpacks_float_lo_@var{m}} instruction pattern
+@cindex @code{vec_unpacku_float_hi_@var{m}} instruction pattern
+@cindex @code{vec_unpacku_float_lo_@var{m}} instruction pattern
+@item @samp{vec_unpacks_float_hi_@var{m}}, @samp{vec_unpacks_float_lo_@var{m}}
+@itemx @samp{vec_unpacku_float_hi_@var{m}}, @samp{vec_unpacku_float_lo_@var{m}}
+Extract, convert to floating point type and widen the high/low part of a
+vector of signed/unsigned integral elements.  The input vector (operand 1)
+has N elements of size S.  Convert the high/low elements of the vector using
+floating point conversion and place the resulting N/2 values of size 2*S in
+the output vector (operand 0).
+
@cindex @code{vec_widen_umult_hi_@var{m}} instruction pattern
@cindex @code{vec_widen_umult_lo__@var{m}} instruction pattern
@cindex @code{vec_widen_smult_hi_@var{m}} instruction pattern
@cindex @code{vec_widen_smult_lo_@var{m}} instruction pattern
-@item @samp{vec_widen_umult_hi_@var{m}}, @samp{vec_widen_umult_lo_@var{m}}, @samp{vec_widen_smult_hi_@var{m}}, @samp{vec_widen_smult_lo_@var{m}}
+@item @samp{vec_widen_umult_hi_@var{m}}, @samp{vec_widen_umult_lo_@var{m}}
+@itemx @samp{vec_widen_smult_hi_@var{m}}, @samp{vec_widen_smult_lo_@var{m}}
 Signed/Unsigned widening multiplication.  The two inputs (operands 1 and 2)
 are vectors with N signed/unsigned elements of size S.  Multiply the high/low
 elements of the two vectors, and put the N/2 products of size 2*S in the
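
The element bookkeeping of the `vec_unpack[s|u]_float_[hi|lo]` patterns documented in this file (N inputs of size S, N/2 outputs of size 2*S) can be modelled in plain C. The sketch below is a scalar reference model only, not what any target emits. The function names are hypothetical, "low" is assumed to mean the lower-indexed half (as in the i386 expanders), and `float` is assumed to be twice as wide as `short`.

```c
#include <assert.h>
#include <stddef.h>

/* Reference model of vec_unpacks_float_{hi,lo}: convert one half of an
   N-element vector of signed shorts to N/2 floats.  */
static void
unpacks_float_half (float *out, const short *in, size_t n, int high)
{
  assert (n % 2 == 0);
  const short *src = high ? in + n / 2 : in;

  for (size_t i = 0; i < n / 2; i++)
    out[i] = (float) src[i];	/* signed int -> float conversion */
}

/* Reference model of vec_unpacku_float_{hi,lo}: same shape, but the
   input elements are unsigned, so 65533 stays 65533.0f, not -3.0f.  */
static void
unpacku_float_half (float *out, const unsigned short *in, size_t n, int high)
{
  assert (n % 2 == 0);
  const unsigned short *src = high ? in + n / 2 : in;

  for (size_t i = 0; i < n / 2; i++)
    out[i] = (float) src[i];	/* unsigned int -> float conversion */
}
```

Keeping the signed and unsigned variants separate mirrors the way the optab is chosen from the signedness of the input operand.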
--- a/gcc/expr.c
+++ b/gcc/expr.c
@ -9001,6 +9001,21 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode,
 	return temp;
      }

+    case VEC_UNPACK_FLOAT_HI_EXPR:
+    case VEC_UNPACK_FLOAT_LO_EXPR:
+      {
+	op0 = expand_normal (TREE_OPERAND (exp, 0));
+	/* The signedness is determined from the input operand.  */
+	this_optab = optab_for_tree_code (code,
+					  TREE_TYPE (TREE_OPERAND (exp, 0)));
+	temp = expand_widen_pattern_expr
+	  (exp, op0, NULL_RTX, NULL_RTX,
+	   target, TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (exp, 0))));
+
+	gcc_assert (temp);
+	return temp;
+      }
+
    case VEC_WIDEN_MULT_HI_EXPR:
    case VEC_WIDEN_MULT_LO_EXPR:
      {
@ -9016,6 +9031,7 @@ expand_expr_real_1 (tree exp, rtx target, enum machine_mode tmode,

    case VEC_PACK_TRUNC_EXPR:
    case VEC_PACK_SAT_EXPR:
+    case VEC_PACK_FIX_TRUNC_EXPR:
      {
 	mode = TYPE_MODE (TREE_TYPE (TREE_OPERAND (exp, 0)));
 	goto binop;
--- a/gcc/genopinit.c
+++ b/gcc/genopinit.c
@ -233,9 +233,15 @@ static const char * const optabs[] =
  "vec_unpacks_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacks_lo_$a$)",
  "vec_unpacku_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_hi_$a$)",
  "vec_unpacku_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_lo_$a$)",
+  "vec_unpacks_float_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacks_float_hi_$a$)",
+  "vec_unpacks_float_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacks_float_lo_$a$)",
+  "vec_unpacku_float_hi_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_float_hi_$a$)",
+  "vec_unpacku_float_lo_optab->handlers[$A].insn_code = CODE_FOR_$(vec_unpacku_float_lo_$a$)",
  "vec_pack_trunc_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_trunc_$a$)",
  "vec_pack_ssat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_ssat_$a$)",
-  "vec_pack_usat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_usat_$a$)"
+  "vec_pack_usat_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_usat_$a$)",
+  "vec_pack_sfix_trunc_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_sfix_trunc_$a$)",
+  "vec_pack_ufix_trunc_optab->handlers[$A].insn_code = CODE_FOR_$(vec_pack_ufix_trunc_$a$)"
 };

 static void gen_insn (rtx);
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@ -340,12 +340,26 @@ optab_for_tree_code (enum tree_code code, tree type)
      return TYPE_UNSIGNED (type) ? 
 	vec_unpacku_lo_optab : vec_unpacks_lo_optab;

+    case VEC_UNPACK_FLOAT_HI_EXPR:
+      /* The signedness is determined from the input operand.  */
+      return TYPE_UNSIGNED (type) ?
+	vec_unpacku_float_hi_optab : vec_unpacks_float_hi_optab;
+
+    case VEC_UNPACK_FLOAT_LO_EXPR:
+      /* The signedness is determined from the input operand.  */
+      return TYPE_UNSIGNED (type) ?
+	vec_unpacku_float_lo_optab : vec_unpacks_float_lo_optab;
+
    case VEC_PACK_TRUNC_EXPR:
      return vec_pack_trunc_optab;

    case VEC_PACK_SAT_EXPR:
      return TYPE_UNSIGNED (type) ? vec_pack_usat_optab : vec_pack_ssat_optab;

+    case VEC_PACK_FIX_TRUNC_EXPR:
+      return TYPE_UNSIGNED (type) ?
+	vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
+
    default:
      break;
    }
@ -1375,7 +1389,9 @@ expand_binop (enum machine_mode mode, optab binoptab, rtx op0, rtx op1,

      if (binoptab == vec_pack_trunc_optab 
 	  || binoptab == vec_pack_usat_optab
-          || binoptab == vec_pack_ssat_optab)
+	  || binoptab == vec_pack_ssat_optab
+	  || binoptab == vec_pack_ufix_trunc_optab
+	  || binoptab == vec_pack_sfix_trunc_optab)
 	{
 	  /* The mode of the result is different then the mode of the
 	     arguments.  */
@ -5565,9 +5581,15 @@ init_optabs (void)
  vec_unpacks_lo_optab = init_optab (UNKNOWN);
  vec_unpacku_hi_optab = init_optab (UNKNOWN);
  vec_unpacku_lo_optab = init_optab (UNKNOWN);
+  vec_unpacks_float_hi_optab = init_optab (UNKNOWN);
+  vec_unpacks_float_lo_optab = init_optab (UNKNOWN);
+  vec_unpacku_float_hi_optab = init_optab (UNKNOWN);
+  vec_unpacku_float_lo_optab = init_optab (UNKNOWN);
  vec_pack_trunc_optab = init_optab (UNKNOWN);
  vec_pack_usat_optab = init_optab (UNKNOWN);
  vec_pack_ssat_optab = init_optab (UNKNOWN);
+  vec_pack_ufix_trunc_optab = init_optab (UNKNOWN);
+  vec_pack_sfix_trunc_optab = init_optab (UNKNOWN);

  powi_optab = init_optab (UNKNOWN);

--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@ -298,11 +298,24 @@ enum optab_index
     elements.  */
  OTI_vec_unpacku_hi,
  OTI_vec_unpacku_lo,
+
+  /* Extract, convert to floating point and widen the high/low part of
+     a vector of signed or unsigned integer elements.  */
+  OTI_vec_unpacks_float_hi,
+  OTI_vec_unpacks_float_lo,
+  OTI_vec_unpacku_float_hi,
+  OTI_vec_unpacku_float_lo,
+
  /* Narrow (demote) and merge the elements of two vectors.  */
  OTI_vec_pack_trunc,
  OTI_vec_pack_usat,
  OTI_vec_pack_ssat,

+  /* Convert to signed/unsigned integer, narrow and merge elements
+     of two vectors of floating point elements.  */
+  OTI_vec_pack_sfix_trunc,
+  OTI_vec_pack_ufix_trunc,
+
  /* Perform a raise to the power of integer.  */
  OTI_powi,

@ -446,9 +459,15 @@ extern GTY(()) optab optab_table[OTI_MAX];
 #define vec_unpacks_lo_optab (optab_table[OTI_vec_unpacks_lo])
 #define vec_unpacku_hi_optab (optab_table[OTI_vec_unpacku_hi])
 #define vec_unpacku_lo_optab (optab_table[OTI_vec_unpacku_lo])
+#define vec_unpacks_float_hi_optab (optab_table[OTI_vec_unpacks_float_hi])
+#define vec_unpacks_float_lo_optab (optab_table[OTI_vec_unpacks_float_lo])
+#define vec_unpacku_float_hi_optab (optab_table[OTI_vec_unpacku_float_hi])
+#define vec_unpacku_float_lo_optab (optab_table[OTI_vec_unpacku_float_lo])
 #define vec_pack_trunc_optab (optab_table[OTI_vec_pack_trunc])
 #define vec_pack_ssat_optab (optab_table[OTI_vec_pack_ssat])
 #define vec_pack_usat_optab (optab_table[OTI_vec_pack_usat])
+#define vec_pack_sfix_trunc_optab (optab_table[OTI_vec_pack_sfix_trunc])
+#define vec_pack_ufix_trunc_optab (optab_table[OTI_vec_pack_ufix_trunc])

 #define powi_optab (optab_table[OTI_powi])

--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@ -1,3 +1,16 @@
+2007-05-17  Uros Bizjak  <ubizjak@gmail.com>
+
+	PR tree-optimization/24659
+	* gcc.dg/vect/vect-floatint-conversion-2.c: New test.
+	* gcc.dg/vect/vect-intfloat-conversion-1.c: Require vect_float,
+	not vect_int target.
+	* gcc.dg/vect/vect-intfloat-conversion-2.c: Require vect_float,
+	not vect_int target.  Loop is vectorized for vect_intfloat_cvt
+	targets.
+	* gcc.dg/vect/vect-intfloat-conversion-3.c: New test.
+	* gcc.dg/vect/vect-intfloat-conversion-4a.c: New test.
+	* gcc.dg/vect/vect-intfloat-conversion-4b.c: New test.
+
 2007-05-16  Uros Bizjak  <ubizjak@gmail.com>

 	* gcc.dg/torture/fp-int-convert-float128.c: Do not xfail for i?86-*-*
@ -746,7 +759,7 @@
 	* g++.dg/expr/bitfield8.C: New test.

 2007-04-17  Joseph Myers  <joseph@codesourcery.com>
-            Richard Sandiford  <richard@codesourcery.com>
+	    Richard Sandiford  <richard@codesourcery.com>

 	* lib/target-supports.exp (check_profiling_available): Return 0
 	for uClibc with -p or -pg.
--- a/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-floatint-conversion-2.c
@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_double } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int
+main1 ()
+{
+  int i;
+  double db[N] = {0.4,3.5,6.6,9.4,12.5,15.6,18.4,21.5,24.6,27.4,30.5,33.6,36.4,39.5,42.6,45.4,0.5,3.6,6.4,9.5,12.6,15.4,18.5,21.6,24.4,27.5,30.6,33.4,36.5,39.6,42.4,45.5};
+  int ia[N];
+
+  /* double -> int */
+  for (i = 0; i < N; i++)
+    {
+      ia[i] = (int) db[i];
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (ia[i] != (int) db[i])
+	abort ();
+    }
+
+  return 0;
+}
+
+int
+main (void)
+{
+  check_vect ();
+
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_floatint_cvt } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
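
As a rough picture of what the vectorizer is expected to do with the `double -> int` loop in this test, the sketch below strip-mines it by hand for the `nunits_in == nunits_out / 2` case named in the ChangeLog. It is a scalar illustration under assumed vector sizes (two doubles per input vector, four ints per output vector, as with SSE2); the function name and the `NUNITS_*` macros are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NUNITS_IN 2	/* doubles per input vector (V2DF), assumed */
#define NUNITS_OUT 4	/* ints per output vector (V4SI), assumed */

/* Hand-written model of the narrowing conversion loop.  */
static void
convert_double_to_int (int *ia, const double *db, size_t n)
{
  size_t i = 0;

  assert (NUNITS_OUT == 2 * NUNITS_IN);

  /* Vector body: each iteration consumes two input vectors and
     produces one output vector (the VEC_PACK_FIX_TRUNC_EXPR shape).  */
  for (; i + NUNITS_OUT <= n; i += NUNITS_OUT)
    {
      const double *v1 = db + i;		/* first input vector */
      const double *v2 = db + i + NUNITS_IN;	/* second input vector */

      for (size_t j = 0; j < NUNITS_IN; j++)
	{
	  ia[i + j] = (int) v1[j];
	  ia[i + NUNITS_IN + j] = (int) v2[j];
	}
    }

  /* Scalar epilogue for any leftover elements.  */
  for (; i < n; i++)
    ia[i] = (int) db[i];
}
```

With `N` equal to 32 as in the test, the body runs eight times and the epilogue is never entered.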
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-1.c
@ -1,4 +1,4 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_float } */

 #include <stdarg.h>
 #include "tree-vect.h"
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-2.c
@ -1,4 +1,4 @@
-/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_float } */

 #include <stdarg.h>
 #include "tree-vect.h"
@ -36,5 +36,5 @@ int main (void)
  return main1 ();
 }

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target powerpc*-*-* i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-3.c
@ -0,0 +1,38 @@
+/* { dg-require-effective-target vect_double } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int main1 ()
+{
+  int i;
+  int ib[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  double da[N];
+
+  /* int -> double */
+  for (i = 0; i < N; i++)
+    {
+      da[i] = (double) ib[i];	
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (da[i] != (double) ib[i]) 
+        abort (); 
+    }   
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4a.c
@ -0,0 +1,38 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int main1 ()
+{
+  int i;
+  short sb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,-3,-6,-9,-12,-15,-18,-21,-24,-27,-30,-33,-36,-39,-42,-45};
+  float fa[N];
+
+  /* short -> float */
+  for (i = 0; i < N; i++)
+    {
+      fa[i] = (float) sb[i];	
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (fa[i] != (float) sb[i]) 
+        abort (); 
+    }   
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-intfloat-conversion-4b.c
@ -0,0 +1,38 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 32
+
+int main1 ()
+{
+  int i;
+  unsigned short usb[N] = {0,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,0,65533,65530,65527,65524,65521,65518,65515,65512,65509,65506,65503,65500,65497,65494,65491};
+  float fa[N];
+
+  /* unsigned short -> float */
+  for (i = 0; i < N; i++)
+    {
+      fa[i] = (float) usb[i];	
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (fa[i] != (float) usb[i]) 
+        abort (); 
+    }   
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_intfloat_cvt } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@ -2148,8 +2148,11 @@ estimate_num_insns_1 (tree *tp, int *walk_subtrees, void *data)
    case VEC_WIDEN_MULT_LO_EXPR:
    case VEC_UNPACK_HI_EXPR:
    case VEC_UNPACK_LO_EXPR:
+    case VEC_UNPACK_FLOAT_HI_EXPR:
+    case VEC_UNPACK_FLOAT_LO_EXPR:
    case VEC_PACK_TRUNC_EXPR:
    case VEC_PACK_SAT_EXPR:
+    case VEC_PACK_FIX_TRUNC_EXPR:

    case WIDEN_MULT_EXPR:

--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@ -1943,6 +1943,18 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
      pp_string (buffer, " > ");
      break;

+    case VEC_UNPACK_FLOAT_HI_EXPR:
+      pp_string (buffer, " VEC_UNPACK_FLOAT_HI_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case VEC_UNPACK_FLOAT_LO_EXPR:
+      pp_string (buffer, " VEC_UNPACK_FLOAT_LO_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
    case VEC_PACK_TRUNC_EXPR:
      pp_string (buffer, " VEC_PACK_TRUNC_EXPR < ");
      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
@@ -1950,7 +1962,7 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
      pp_string (buffer, " > ");
      break;
-                                                                                
+
    case VEC_PACK_SAT_EXPR:
      pp_string (buffer, " VEC_PACK_SAT_EXPR < ");
      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
@@ -1958,7 +1970,15 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags,
      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
      pp_string (buffer, " > ");
      break;
-                                                                                
+
+    case VEC_PACK_FIX_TRUNC_EXPR:
+      pp_string (buffer, " VEC_PACK_FIX_TRUNC_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
    case BLOCK:
      {
 	tree t;
@@ -2352,6 +2372,8 @@ op_prio (tree op)
    case VEC_RSHIFT_EXPR:
    case VEC_UNPACK_HI_EXPR:
    case VEC_UNPACK_LO_EXPR:
+    case VEC_UNPACK_FLOAT_HI_EXPR:
+    case VEC_UNPACK_FLOAT_LO_EXPR:
    case VEC_PACK_TRUNC_EXPR:
    case VEC_PACK_SAT_EXPR:
      return 16;
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -421,8 +421,11 @@ expand_vector_operations_1 (block_stmt_iterator *bsi)
      || code == VEC_WIDEN_MULT_LO_EXPR
      || code == VEC_UNPACK_HI_EXPR
      || code == VEC_UNPACK_LO_EXPR
+      || code == VEC_UNPACK_FLOAT_HI_EXPR
+      || code == VEC_UNPACK_FLOAT_LO_EXPR
      || code == VEC_PACK_TRUNC_EXPR
-      || code == VEC_PACK_SAT_EXPR)
+      || code == VEC_PACK_SAT_EXPR
+      || code == VEC_PACK_FIX_TRUNC_EXPR)
    type = TREE_TYPE (TREE_OPERAND (rhs, 0));

  /* Optabs will try converting a negation into a subtraction, so
--- a/gcc/tree-vect-transform.c
+++ b/gcc/tree-vect-transform.c
@@ -210,7 +210,7 @@ vect_create_addr_base_for_vector_ref (tree stmt,
   accessed in the loop by STMT, along with the def-use update chain to 
   appropriately advance the pointer through the loop iterations. Also set
   aliasing information for the pointer.  This vector pointer is used by the
-   callers to this function to create a memory reference expression for vector 
+   callers to this function to create a memory reference expression for vector
   load/store access.

   Input:
@@ -1931,6 +1931,64 @@ vectorizable_call (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
 }


+/* Function vect_gen_widened_results_half
+
+   Create a vector stmt whose code, type, number of arguments, and result
+   variable are CODE, VECTYPE, OP_TYPE, and VEC_DEST, and its arguments are 
+   VEC_OPRND0 and VEC_OPRND1. The new vector stmt is to be inserted at BSI.
+   In the case that CODE is a CALL_EXPR, this means that a call to DECL
+   needs to be created (DECL is a function-decl of a target-builtin).
+   STMT is the original scalar stmt that we are vectorizing.  */
+
+static tree
+vect_gen_widened_results_half (enum tree_code code, tree vectype, tree decl,
+                               tree vec_oprnd0, tree vec_oprnd1, int op_type,
+                               tree vec_dest, block_stmt_iterator *bsi,
+			       tree stmt)
+{ 
+  tree expr; 
+  tree new_stmt; 
+  tree new_temp; 
+  tree sym; 
+  ssa_op_iter iter;
+ 
+  /* Generate half of the widened result:  */ 
+  if (code == CALL_EXPR) 
+    {  
+      /* Target specific support  */ 
+      if (op_type == binary_op)
+	expr = build_call_expr (decl, 2, vec_oprnd0, vec_oprnd1);
+      else
+	expr = build_call_expr (decl, 1, vec_oprnd0);
+    } 
+  else 
+    { 
+      /* Generic support */ 
+      gcc_assert (op_type == TREE_CODE_LENGTH (code)); 
+      if (op_type == binary_op) 
+        expr = build2 (code, vectype, vec_oprnd0, vec_oprnd1); 
+      else  
+        expr = build1 (code, vectype, vec_oprnd0); 
+    } 
+  new_stmt = build_gimple_modify_stmt (vec_dest, expr);
+  new_temp = make_ssa_name (vec_dest, new_stmt); 
+  GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp; 
+  vect_finish_stmt_generation (stmt, new_stmt, bsi); 
+
+  if (code == CALL_EXPR)
+    {
+      FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, SSA_OP_ALL_VIRTUALS)
+        {
+          if (TREE_CODE (sym) == SSA_NAME)
+            sym = SSA_NAME_VAR (sym);
+          mark_sym_for_renaming (sym);
+        }
+    }
+
+  return new_stmt;
+}
+
+
 /* Function vectorizable_conversion.

 Check if STMT performs a conversion operation, that can be vectorized. 
@@ -1946,21 +2004,24 @@ vectorizable_conversion (tree stmt, block_stmt_iterator * bsi,
  tree scalar_dest;
  tree operation;
  tree op0;
-  tree vec_oprnd0 = NULL_TREE;
+  tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  enum tree_code code;
+  enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
+  tree decl1 = NULL_TREE, decl2 = NULL_TREE;
  tree new_temp;
  tree def, def_stmt;
  enum vect_def_type dt0;
  tree new_stmt;
+  stmt_vec_info prev_stmt_info;
  int nunits_in;
  int nunits_out;
-  int ncopies, j;
  tree vectype_out, vectype_in;
+  int ncopies, j;
+  tree expr;
  tree rhs_type, lhs_type;
  tree builtin_decl;
-  stmt_vec_info prev_stmt_info;
+  enum { NARROW, NONE, WIDEN } modifier;

  /* Is STMT a vectorizable conversion?   */

@@ -1998,23 +2059,36 @@ vectorizable_conversion (tree stmt, block_stmt_iterator * bsi,
  scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
  lhs_type = TREE_TYPE (scalar_dest);
  vectype_out = get_vectype_for_scalar_type (lhs_type);
-  gcc_assert (STMT_VINFO_VECTYPE (stmt_info) == vectype_out);
  nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);

-  /* FORNOW: need to extend to support short<->float conversions as well.  */
-  if (nunits_out != nunits_in)
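+  /* FORNOW: only handle conversions where the output vector has the same
+     number of elements as the input vector, twice as many (NARROW), or
+     half as many (WIDEN).  */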
+  if (nunits_in == nunits_out / 2)
+    modifier = NARROW;
+  else if (nunits_out == nunits_in)
+    modifier = NONE;
+  else if (nunits_out == nunits_in / 2)
+    modifier = WIDEN;
+  else
    return false;

+  if (modifier == NONE)
+    gcc_assert (STMT_VINFO_VECTYPE (stmt_info) == vectype_out);
+
  /* Bail out if the types are both integral or non-integral */
  if ((INTEGRAL_TYPE_P (rhs_type) && INTEGRAL_TYPE_P (lhs_type))
      || (!INTEGRAL_TYPE_P (rhs_type) && !INTEGRAL_TYPE_P (lhs_type)))
    return false;

+  if (modifier == NARROW)
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
+
  /* Sanity check: make sure that at least one copy of the vectorized stmt
     needs to be generated.  */
-  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
  gcc_assert (ncopies >= 1);

+  /* Check the operands of the operation.  */
  if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
    {
      if (vect_print_dump_info (REPORT_DETAILS))
@@ -2023,21 +2097,31 @@ vectorizable_conversion (tree stmt, block_stmt_iterator * bsi,
    }

  /* Supportable by target?  */
-  if (!targetm.vectorize.builtin_conversion (code, vectype_in))
+  if ((modifier == NONE
+       && !targetm.vectorize.builtin_conversion (code, vectype_in))
+      || (modifier == WIDEN
+	  && !supportable_widening_operation (code, stmt, vectype_in,
+					      &decl1, &decl2,
+					      &code1, &code2))
+      || (modifier == NARROW
+	  && !supportable_narrowing_operation (code, stmt, vectype_in,
+					       &code1)))
    {
      if (vect_print_dump_info (REPORT_DETAILS))
        fprintf (vect_dump, "op not supported by target.");
      return false;
    }

+  if (modifier != NONE)
+    STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
+
  if (!vec_stmt)		/* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
      return true;
    }

-    /** Transform.  **/
-
+  /** Transform.  **/
  if (vect_print_dump_info (REPORT_DETAILS))
    fprintf (vect_dump, "transform conversion.");

@@ -2045,37 +2129,113 @@ vectorizable_conversion (tree stmt, block_stmt_iterator * bsi,
  vec_dest = vect_create_destination_var (scalar_dest, vectype_out);

  prev_stmt_info = NULL;
-  for (j = 0; j < ncopies; j++)
+  switch (modifier)
    {
-      tree sym;
-      ssa_op_iter iter;
+    case NONE:
+      for (j = 0; j < ncopies; j++)
+	{
+	  tree sym;
+	  ssa_op_iter iter;

-      if (j == 0)
-	vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
-      else
-	vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+	  if (j == 0)
+	    vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+	  else
+	    vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);

-      builtin_decl =
-	targetm.vectorize.builtin_conversion (code, vectype_in);
-      new_stmt = build_call_expr (builtin_decl, 1, vec_oprnd0);
+	  builtin_decl =
+	    targetm.vectorize.builtin_conversion (code, vectype_in);
+	  new_stmt = build_call_expr (builtin_decl, 1, vec_oprnd0);

-      /* Arguments are ready. create the new vector stmt.  */
-      new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
-      new_temp = make_ssa_name (vec_dest, new_stmt);
-      GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
-      vect_finish_stmt_generation (stmt, new_stmt, bsi);
-      FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, SSA_OP_ALL_VIRTUALS)
-        {
-          if (TREE_CODE (sym) == SSA_NAME)
-            sym = SSA_NAME_VAR (sym);
-          mark_sym_for_renaming (sym);
-        }
+	  /* Arguments are ready. Create the new vector stmt.  */
+	  new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
+	  new_temp = make_ssa_name (vec_dest, new_stmt);
+	  GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
+	  vect_finish_stmt_generation (stmt, new_stmt, bsi);
+	  FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, SSA_OP_ALL_VIRTUALS)
+	    {
+	      if (TREE_CODE (sym) == SSA_NAME)
+		sym = SSA_NAME_VAR (sym);
+	      mark_sym_for_renaming (sym);
+	    }

-      if (j == 0)
-	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
-      else
-	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
-      prev_stmt_info = vinfo_for_stmt (new_stmt);
+	  if (j == 0)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+      break;
+
+    case WIDEN:
+      /* In case the vectorization factor (VF) is bigger than the number
+	 of elements that we can fit in a vectype (nunits), we have to
+	 generate more than one vector stmt, i.e., we need to "unroll"
+	 the vector stmt by a factor VF/nunits.  */
+      for (j = 0; j < ncopies; j++)
+	{
+	  if (j == 0)
+	    vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+	  else
+	    vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+
+	  STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
+
+	  /* Generate first half of the widened result:  */
+	  new_stmt
+	    = vect_gen_widened_results_half (code1, vectype_out, decl1, 
+					     vec_oprnd0, vec_oprnd1,
+					     unary_op, vec_dest, bsi, stmt);
+	  if (j == 0)
+	    STMT_VINFO_VEC_STMT (stmt_info) = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+
+	  /* Generate second half of the widened result:  */
+	  new_stmt
+	    = vect_gen_widened_results_half (code2, vectype_out, decl2,
+					     vec_oprnd0, vec_oprnd1,
+					     unary_op, vec_dest, bsi, stmt);
+	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+      break;
+
+    case NARROW:
+      /* In case the vectorization factor (VF) is bigger than the number
+	 of elements that we can fit in a vectype (nunits), we have to
+	 generate more than one vector stmt, i.e., we need to "unroll"
+	 the vector stmt by a factor VF/nunits.  */
+      for (j = 0; j < ncopies; j++)
+	{
+	  /* Handle uses.  */
+	  if (j == 0)
+	    {
+	      vec_oprnd0 = vect_get_vec_def_for_operand (op0, stmt, NULL);
+	      vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+	    }
+	  else
+	    {
+	      vec_oprnd0 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd1);
+	      vec_oprnd1 = vect_get_vec_def_for_stmt_copy (dt0, vec_oprnd0);
+	    }
+
+	  /* Arguments are ready. Create the new vector stmt.  */
+	  expr = build2 (code1, vectype_out, vec_oprnd0, vec_oprnd1);
+	  new_stmt = build_gimple_modify_stmt (vec_dest, expr);
+	  new_temp = make_ssa_name (vec_dest, new_stmt);
+	  GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
+	  vect_finish_stmt_generation (stmt, new_stmt, bsi);
+
+	  if (j == 0)
+	    STMT_VINFO_VEC_STMT (stmt_info) = new_stmt;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+	  prev_stmt_info = vinfo_for_stmt (new_stmt);
+	}
+
+      *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
    }
  return true;
 }
@@ -2525,7 +2685,7 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)

 bool
 vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
-                             tree *vec_stmt)
+			    tree *vec_stmt)
 {
  tree vec_dest;
  tree scalar_dest;
@@ -2534,7 +2694,7 @@ vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
  tree vec_oprnd0=NULL, vec_oprnd1=NULL;
  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  enum tree_code code;
+  enum tree_code code, code1 = ERROR_MARK;
  tree new_temp;
  tree def, def_stmt;
  enum vect_def_type dt0;
@@ -2548,8 +2708,6 @@ vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
  tree expr;
  tree vectype_in;
  tree scalar_type;
-  optab optab;
-  enum machine_mode vec_mode;

  if (!STMT_VINFO_RELEVANT_P (stmt_info))
    return false;
@@ -2607,13 +2765,7 @@ vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
    }

  /* Supportable by target?  */
-  code = VEC_PACK_TRUNC_EXPR;
-  optab = optab_for_tree_code (code, vectype_in);
-  if (!optab)
-    return false;
-
-  vec_mode = TYPE_MODE (vectype_in);
-  if (optab->handlers[(int) vec_mode].insn_code == CODE_FOR_nothing)
+  if (!supportable_narrowing_operation (code, stmt, vectype_in, &code1))
    return false;

  STMT_VINFO_VECTYPE (stmt_info) = vectype_in;
@@ -2652,7 +2804,7 @@ vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
 	}

      /* Arguments are ready. Create the new vector stmt.  */
-      expr = build2 (code, vectype_out, vec_oprnd0, vec_oprnd1);
+      expr = build2 (code1, vectype_out, vec_oprnd0, vec_oprnd1);
      new_stmt = build_gimple_modify_stmt (vec_dest, expr);
      new_temp = make_ssa_name (vec_dest, new_stmt);
      GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
@@ -2671,64 +2823,6 @@ vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
 }


-/* Function vect_gen_widened_results_half
-
-   Create a vector stmt whose code, type, number of arguments, and result
-   variable are CODE, VECTYPE, OP_TYPE, and VEC_DEST, and its arguments are 
-   VEC_OPRND0 and VEC_OPRND1. The new vector stmt is to be inserted at BSI.
-   In the case that CODE is a CALL_EXPR, this means that a call to DECL
-   needs to be created (DECL is a function-decl of a target-builtin).
-   STMT is the original scalar stmt that we are vectorizing.  */
-
-static tree
-vect_gen_widened_results_half (enum tree_code code, tree vectype, tree decl,
-                               tree vec_oprnd0, tree vec_oprnd1, int op_type,
-                               tree vec_dest, block_stmt_iterator *bsi,
-			       tree stmt)
-{ 
-  tree expr; 
-  tree new_stmt; 
-  tree new_temp; 
-  tree sym; 
-  ssa_op_iter iter;
- 
-  /* Generate half of the widened result:  */ 
-  if (code == CALL_EXPR) 
-    {  
-      /* Target specific support  */ 
-      if (op_type == binary_op)
-	expr = build_call_expr (decl, 2, vec_oprnd0, vec_oprnd1);
-      else
-	expr = build_call_expr (decl, 1, vec_oprnd0);
-    } 
-  else 
-    { 
-      /* Generic support */ 
-      gcc_assert (op_type == TREE_CODE_LENGTH (code)); 
-      if (op_type == binary_op) 
-        expr = build2 (code, vectype, vec_oprnd0, vec_oprnd1); 
-      else  
-        expr = build1 (code, vectype, vec_oprnd0); 
-    } 
-  new_stmt = build_gimple_modify_stmt (vec_dest, expr);
-  new_temp = make_ssa_name (vec_dest, new_stmt); 
-  GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp; 
-  vect_finish_stmt_generation (stmt, new_stmt, bsi); 
-
-  if (code == CALL_EXPR)
-    {
-      FOR_EACH_SSA_TREE_OPERAND (sym, new_stmt, iter, SSA_OP_ALL_VIRTUALS)
-        {
-          if (TREE_CODE (sym) == SSA_NAME)
-            sym = SSA_NAME_VAR (sym);
-          mark_sym_for_renaming (sym);
-        }
-    }
-
-  return new_stmt;
-}
-
-
 /* Function vectorizable_type_promotion

   Check if STMT performs a binary or unary operation that involves
@@ -2785,7 +2879,8 @@ vectorizable_type_promotion (tree stmt, block_stmt_iterator *bsi,

  operation = GIMPLE_STMT_OPERAND (stmt, 1);
  code = TREE_CODE (operation);
-  if (code != NOP_EXPR && code != WIDEN_MULT_EXPR)
+  if (code != NOP_EXPR && code != CONVERT_EXPR
+      && code != WIDEN_MULT_EXPR)
    return false;

  op0 = TREE_OPERAND (operation, 0);
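The new modifier logic in vectorizable_conversion can be restated on plain integers. The sketch below is a standalone model under stated assumptions, not GCC code: classify_conversion, copies_needed and the CONV_* names are hypothetical, and vf stands for the loop's vectorization factor. It mirrors the comparisons of nunits_in and nunits_out and the choice of divisor for ncopies shown in the hunks above.

```c
#include <assert.h>

/* Hypothetical stand-in for the patch's local enum, plus an explicit
   value for the "return false" case.  */
enum conv_modifier { CONV_NARROW, CONV_NONE, CONV_WIDEN, CONV_UNSUPPORTED };

/* NUNITS_IN and NUNITS_OUT are the element counts of the input and
   output vector types.  More output elements per vector means narrower
   elements, hence NARROW.  */
static enum conv_modifier
classify_conversion (int nunits_in, int nunits_out)
{
  assert (nunits_in > 0 && nunits_out > 0);

  if (nunits_in == nunits_out / 2)
    return CONV_NARROW;
  else if (nunits_out == nunits_in)
    return CONV_NONE;
  else if (nunits_out == nunits_in / 2)
    return CONV_WIDEN;
  return CONV_UNSUPPORTED;
}

/* Number of vector stmt copies: divide the vectorization factor by the
   element count of the vector type that holds more elements.  */
static int
copies_needed (enum conv_modifier modifier, int vf,
               int nunits_in, int nunits_out)
{
  assert (modifier != CONV_UNSUPPORTED);
  return modifier == CONV_NARROW ? vf / nunits_out : vf / nunits_in;
}
```

For example, with 4 floats and 8 shorts per vector, float -> short is classified as NARROW and short -> float as WIDEN; with vf == 16 both need two copies.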
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1736,10 +1736,10 @@ vect_is_simple_use (tree operand, loop_vec_info loop_vinfo, tree *def_stmt,
   widening operation that is supported by the target platform in 
   vector form (i.e., when operating on arguments of type VECTYPE).
    
-   The two kinds of widening operations we currently support are
-   NOP and WIDEN_MULT. This function checks if these operations
-   are supported by the target platform either directly (via vector 
-   tree-codes), or via target builtins.
+   Widening operations we currently support are NOP (CONVERT), FLOAT
+   and WIDEN_MULT.  This function checks if these operations are supported
+   by the target platform either directly (via vector tree-codes), or via
+   target builtins.

   Output:
   - CODE1 and CODE2 are codes of vector operations to be used when 
@@ -1815,6 +1815,7 @@ supportable_widening_operation (enum tree_code code, tree stmt, tree vectype,
      break;

    case NOP_EXPR:
+    case CONVERT_EXPR:
      if (BYTES_BIG_ENDIAN)
        {
          c1 = VEC_UNPACK_HI_EXPR;
@@ -1827,6 +1828,19 @@ supportable_widening_operation (enum tree_code code, tree stmt, tree vectype,
        }
      break;

+    case FLOAT_EXPR:
+      if (BYTES_BIG_ENDIAN)
+        {
+          c1 = VEC_UNPACK_FLOAT_HI_EXPR;
+          c2 = VEC_UNPACK_FLOAT_LO_EXPR;
+        }
+      else
+        {
+          c2 = VEC_UNPACK_FLOAT_HI_EXPR;
+          c1 = VEC_UNPACK_FLOAT_LO_EXPR;
+        }
+      break;
+
    default:
      gcc_unreachable ();
    }
@@ -1851,6 +1865,63 @@ supportable_widening_operation (enum tree_code code, tree stmt, tree vectype,
 }


+/* Function supportable_narrowing_operation
+
+   Check whether an operation represented by the code CODE is a 
+   narrowing operation that is supported by the target platform in 
+   vector form (i.e., when operating on arguments of type VECTYPE).
+    
+   Narrowing operations we currently support are NOP (CONVERT) and
+   FIX_TRUNC. This function checks if these operations are supported by
+   the target platform directly via vector tree-codes.
+
+   Output:
+   - CODE1 is the code of a vector operation to be used when 
+   vectorizing the operation, if available.  */
+
+bool
+supportable_narrowing_operation (enum tree_code code,
+				 tree stmt, tree vectype,
+				 enum tree_code *code1)
+{
+  enum machine_mode vec_mode;
+  enum insn_code icode1;
+  optab optab1;
+  tree expr = GIMPLE_STMT_OPERAND (stmt, 1);
+  tree type = TREE_TYPE (expr);
+  tree narrow_vectype = get_vectype_for_scalar_type (type);
+  enum tree_code c1;
+
+  switch (code)
+    {
+    case NOP_EXPR:
+    case CONVERT_EXPR:
+      c1 = VEC_PACK_TRUNC_EXPR;
+      break;
+
+    case FIX_TRUNC_EXPR:
+      c1 = VEC_PACK_FIX_TRUNC_EXPR;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  *code1 = c1;
+  optab1 = optab_for_tree_code (c1, vectype);
+
+  if (!optab1)
+    return false;
+
+  vec_mode = TYPE_MODE (vectype);
+  if ((icode1 = optab1->handlers[(int) vec_mode].insn_code) == CODE_FOR_nothing
+      || insn_data[icode1].operand[0].mode != TYPE_MODE (narrow_vectype))
+    return false;
+
+  return true;
+}
+
+
 /* Function reduction_code_for_scalar_code

   Input:
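For FIX_TRUNC_EXPR, supportable_narrowing_operation selects VEC_PACK_FIX_TRUNC_EXPR, which takes two wide vectors and produces one narrow vector. A scalar model of that operation, assuming 4 x float inputs and an 8 x short result, might look as follows. model_pack_sfix_trunc is a hypothetical name, and the order in which a real target pattern places the two halves in the result is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

#define WIDE_UNITS 4                     /* elements per input vector */
#define NARROW_UNITS (2 * WIDE_UNITS)    /* elements in the packed result */

/* Convert one float to short, truncating toward zero as a C cast does.
   Out-of-range values are undefined in C, so they are rejected here.  */
static short
fix_trunc_to_short (float v)
{
  assert (v > -32769.0f && v < 32768.0f);
  return (short) v;
}

/* Scalar model of a signed pack-with-fix-trunc: convert the elements of
   A and B to short and merge them into OUT, A first, then B.  */
static void
model_pack_sfix_trunc (const float a[WIDE_UNITS], const float b[WIDE_UNITS],
                       short out[NARROW_UNITS])
{
  size_t i;

  for (i = 0; i < WIDE_UNITS; i++)
    {
      out[i] = fix_trunc_to_short (a[i]);
      out[WIDE_UNITS + i] = fix_trunc_to_short (b[i]);
    }
}
```

This is why the NARROW case in vectorizable_conversion fetches two operand vectors (vec_oprnd0 and vec_oprnd1) per generated statement: each packed result consumes two input vectors.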
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -398,6 +398,9 @@ extern enum dr_alignment_support vect_supportable_dr_alignment
 extern bool reduction_code_for_scalar_code (enum tree_code, enum tree_code *);
 extern bool supportable_widening_operation (enum tree_code, tree, tree,
  tree *, tree *, enum tree_code *, enum tree_code *);
+extern bool supportable_narrowing_operation (enum tree_code, tree, tree,
+					     enum tree_code *);
+
 /* Creation and deletion of loop and stmt info structs.  */
 extern loop_vec_info new_loop_vec_info (struct loop *loop);
 extern void destroy_loop_vec_info (loop_vec_info);
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1085,13 +1085,20 @@ DEFTREECODE (GIMPLE_MODIFY_STMT, "gimple_modify_stmt", tcc_gimple_stmt, 2)
 DEFTREECODE (VEC_WIDEN_MULT_HI_EXPR, "widen_mult_hi_expr", tcc_binary, 2)
 DEFTREECODE (VEC_WIDEN_MULT_LO_EXPR, "widen_mult_hi_expr", tcc_binary, 2)

-/* Unpack (extract and promote/widen) the high/low elements of the input vector
-   into the output vector. The input vector has twice as many elements
-   as the output vector, that are half the size of the elements
+/* Unpack (extract and promote/widen) the high/low elements of the input
+   vector into the output vector.  The input vector has twice as many
+   elements as the output vector, that are half the size of the elements
   of the output vector.  This is used to support type promotion. */
 DEFTREECODE (VEC_UNPACK_HI_EXPR, "vec_unpack_hi_expr", tcc_unary, 1)
 DEFTREECODE (VEC_UNPACK_LO_EXPR, "vec_unpack_lo_expr", tcc_unary, 1)

+/* Unpack (extract) the high/low elements of the input vector, convert
+   fixed point values to floating point and widen elements into the
+   output vector.  The input vector has twice as many elements as the output
+   vector, that are half the size of the elements of the output vector.  */
+DEFTREECODE (VEC_UNPACK_FLOAT_HI_EXPR, "vec_unpack_float_hi_expr", tcc_unary, 1)
+DEFTREECODE (VEC_UNPACK_FLOAT_LO_EXPR, "vec_unpack_float_lo_expr", tcc_unary, 1)
+
 /* Pack (demote/narrow and merge) the elements of the two input vectors
   into the output vector using truncation/saturation.
   The elements of the input vectors are twice the size of the elements of the
@@ -1099,6 +1106,12 @@ DEFTREECODE (VEC_UNPACK_LO_EXPR, "vec_unpack_lo_expr", tcc_unary, 1)
 DEFTREECODE (VEC_PACK_TRUNC_EXPR, "vec_pack_trunc_expr", tcc_binary, 2)
 DEFTREECODE (VEC_PACK_SAT_EXPR, "vec_pack_sat_expr", tcc_binary, 2)

+/* Convert floating point values of the two input vectors to integer
+   and pack (narrow and merge) the elements into the output vector. The
+   elements of the input vectors are twice the size of the elements of
+   the output vector.  */
+DEFTREECODE (VEC_PACK_FIX_TRUNC_EXPR, "vec_pack_fix_trunc_expr", tcc_binary, 2)
+
 /* Extract even/odd fields from vectors.  */
 DEFTREECODE (VEC_EXTRACT_EVEN_EXPR, "vec_extracteven_expr", tcc_binary, 2)
 DEFTREECODE (VEC_EXTRACT_ODD_EXPR, "vec_extractodd_expr", tcc_binary, 2)