[llvm-commits] [PATCH] Fix PR11334

Mon Jul 30 16:01:13 PDT 2012

Hi

Please find the revised patch attached. Changes include:

+ a minor fix to quit if a non-constant index is found in an
EXTRACT_VECTOR_ELT node
+ a rebase onto the current trunk

I chose to recover the scalarized FP_EXTEND afterwards, instead of
transforming FP_EXTEND directly into X86ISD::VFPEXT before type
legalization, so that the existing target-independent DAG optimizations
can run for as long as possible, until FP_EXTEND is scalarized into a
BUILD_VECTOR. Even though this makes the code a little more complicated
(the FP_EXTEND pattern has to be recognized from the BUILD_VECTOR and a
series of scalarized EXTRACT_VECTOR_ELTs), it doesn't break optimizations
such as (fp_round (fp_extend x)) -> x. It also handles extending v4f32
-> v4f64 better, as well as v3f32 -> v3f64.
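To make the recovery step concrete, here is a minimal C++ sketch of the matching logic. It assumes a simplified, hypothetical node model: `Lane`, `ExtractElt`, `matchScalarizedFPExtend` and `isIdentity` are illustrative names only, not the patch's code and not LLVM's SelectionDAG API. The sketch accepts a BUILD_VECTOR only if every operand is an FP_EXTEND of an EXTRACT_VECTOR_ELT with a constant index from one common source vector. It then reports whether the lanes are already in identity order.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for an EXTRACT_VECTOR_ELT node.
struct ExtractElt {
  int srcId;        // identifies the source vector being extracted from
  bool constIndex;  // true if the lane index is a compile-time constant
  int index;        // lane index (meaningful only when constIndex is true)
};

// One BUILD_VECTOR operand, expected to be (fp_extend (extract_vector_elt Src, Idx)).
struct Lane {
  bool isFPExtend;  // operand is an FP_EXTEND node
  ExtractElt elt;   // the FP_EXTEND's operand
};

struct Match {
  int srcId;                 // common source vector
  std::vector<int> indices;  // source lane feeding each result lane
};

// Returns the common source and per-lane indices, or std::nullopt if the
// BUILD_VECTOR does not look like a scalarized FP_EXTEND.
std::optional<Match> matchScalarizedFPExtend(const std::vector<Lane>& lanes) {
  if (lanes.empty()) return std::nullopt;
  Match m{lanes[0].elt.srcId, {}};
  for (const Lane& l : lanes) {
    if (!l.isFPExtend) return std::nullopt;
    if (!l.elt.constIndex) return std::nullopt;  // quit on a non-constant index
    if (l.elt.srcId != m.srcId) return std::nullopt;
    m.indices.push_back(l.elt.index);
  }
  return m;
}

// True if result lane i comes from source lane i, i.e. no shuffle is needed
// before the conversion.
bool isIdentity(const std::vector<int>& indices) {
  for (std::size_t i = 0; i < indices.size(); ++i)
    if (indices[i] != static_cast<int>(i)) return false;
  return true;
}
```

When `isIdentity` returns false, a shuffle would be needed ahead of the conversion, which is the non-identity case discussed in the quoted thread.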

Please commit if it looks OK.

Thanks
- Michael

On Wed, 2012-07-25 at 13:31 -0700, Michael Liao wrote:
> Typo
> 
> On Wed, 2012-07-25 at 13:29 -0700, Michael Liao wrote:
> > On Wed, 2012-07-25 at 13:24 -0700, Rotem, Nadav wrote:
> > > >>In fact, as I understand it, the real root cause is that ISD::FP_EXTEND (like several other nodes) requires the input and output vectors to have the same number of elements. 'v2f32' is not legal on x86, and there is no way to construct a legal FP_EXTEND from
> > > >>v2f32 to v2f64. This leads to the scalarization of FP_EXTEND during type legalization. The added optimization recovers it and reconstructs the extension using a target-specific node without that constraint.
> > > 
> > > Yes. But you don’t need to reconstruct the vector if you can handle it before it gets scalarized. All you have to do is transform the FP_EXTEND node into your own X86ISD node. 
> > > The inputs to your ISD nodes would be v4f32, and the output would be v2f64. 
> > 
> > The optimization is only targeted to optimize the case for FP_EXTEND. If
> 
> is not only targeted at optimizing this bug.
> 
> > user code constructs a similar pattern, it will be optimized as well.
> > Note the extra shuffle node in the patch: if the pattern includes a
> > non-identity shuffle (built from a series of extract-element nodes), it
> > can also be optimized into a shuffle followed by a conversion.
> > 
> > > 
> > > >> For <3 x float>, it will be legalized (widened) into v4f32. The included test verifies that.
> > > 
> > > Right. The question is how important it is to support this type, because if we handle FP_EXTEND before type legalization, it would be a bit more difficult to handle this type.
> > > 
> > 
> > <3 x float> is solved as a by-product. The main case is to fix <2 x
> > float>. If user code uses <2 x double> in most places and only a few
> > places need to convert from float, it's better to use <2 x float> for
> > those values.
> > 
> > Yours
> > - Michael
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
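Nadav's suggested node shape (v4f32 in, v2f64 out) and the "shuffle followed by a conversion" case can both be pictured with a small scalar model. The sketch below is an assumption-laden illustration, not the patch's code. `vfpextLow`, `shuffle` and `lowerExtendOfExtracts` are hypothetical names. The conversion is modeled on x86 CVTPS2PD, which widens the two low float lanes of a 128-bit vector to double. Undef shuffle lanes are modeled as 0.0f.

```cpp
#include <array>

// Hypothetical scalar model; not LLVM's API and not x86 intrinsics.
using V4F32 = std::array<float, 4>;
using V2F64 = std::array<double, 2>;

// Target conversion node: takes a v4f32 operand and produces v2f64 from its
// two low lanes (like CVTPS2PD); the upper two lanes are ignored.
V2F64 vfpextLow(const V4F32& src) {
  return {static_cast<double>(src[0]), static_cast<double>(src[1])};
}

// Vector shuffle: result lane i takes src[mask[i]]; a mask entry of -1 means
// an undef lane, modeled here as 0.0f.
V4F32 shuffle(const V4F32& src, const std::array<int, 4>& mask) {
  V4F32 out{};
  for (int i = 0; i < 4; ++i)
    out[i] = mask[i] < 0 ? 0.0f : src[mask[i]];
  return out;
}

// Lower a 2-lane extension whose lanes come from src[idx0] and src[idx1]:
// move the wanted lanes into the low positions, then convert.
V2F64 lowerExtendOfExtracts(const V4F32& src, int idx0, int idx1) {
  return vfpextLow(shuffle(src, {idx0, idx1, -1, -1}));
}
```

With `idx0 == 0` and `idx1 == 1` the shuffle is an identity and could be dropped, leaving just the conversion.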

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-fix-PR11334.patch
Type: text/x-patch
Size: 10391 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120730/f47b10be/attachment.bin>