[llvm-commits] [PATCH] Fix PR11334

Wed Jul 25 13:29:51 PDT 2012

On Wed, 2012-07-25 at 13:24 -0700, Rotem, Nadav wrote:
> >>In fact, the real root cause, from my understanding, is that ISD::FP_EXTEND (and others as well) has the constraint that the input and output vectors must have matching element counts. 'v2f32' is not legal on x86 and there is no way to construct a legal FP_EXTEND from
> >>v2f32 to v2f64. This leads to the scalarization of FP_EXTEND during type legalization. The added optimization recovers it and reconstructs the extension using a target-specific node without that constraint.
> 
> Yes. But you don’t need to reconstruct the vector, if you can handle it before it gets scalarized.  All you have to do is transform the FP_EXTEND node to your own X86ISD node. 
> The inputs to your ISD nodes would be v4f32, and the output would be v2f64. 

The optimization primarily targets the FP_EXTEND case, but if user code
constructs a similar pattern, it is optimized as well. Note the extra
shuffle node in the patch: if the pattern includes a non-identity
shuffle (built from a series of extract elements), it can also be
optimized into a shuffle followed by a conversion.
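
As a rough illustration of that equivalence, here is a scalar C++ model
(not code from the patch). All names are hypothetical, and it assumes the
target conversion extends the low two lanes of a 4-lane float vector,
which is what CVTPS2PD does on x86.

```cpp
#include <array>
#include <cstddef>

using V4F32 = std::array<float, 4>;   // stands in for v4f32
using V2F64 = std::array<double, 2>;  // stands in for v2f64

// Scalarized form: extract two lanes, extend each, rebuild a vector.
V2F64 extendScalarized(const V4F32 &v, std::size_t i0, std::size_t i1) {
  return {static_cast<double>(v[i0]), static_cast<double>(v[i1])};
}

// Hypothetical shuffle model: move lanes i0 and i1 into lanes 0 and 1.
// The upper two lanes are don't-care here and are simply zeroed.
V4F32 shuffleToLow(const V4F32 &v, std::size_t i0, std::size_t i1) {
  return {v[i0], v[i1], 0.0f, 0.0f};
}

// Model of the target conversion: extend only the low two lanes.
V2F64 extendLow2(const V4F32 &v) {
  return {static_cast<double>(v[0]), static_cast<double>(v[1])};
}

// Recovered form: one shuffle followed by one conversion.
V2F64 extendCombined(const V4F32 &v, std::size_t i0, std::size_t i1) {
  return extendLow2(shuffleToLow(v, i0, i1));
}
```

With i0 = 0 and i1 = 1 the shuffle is the identity and only the
conversion remains, which corresponds to the plain FP_EXTEND case.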

> 
> >> For <3 x float>, it will be legalized (widened) into v4f32. The included test verifies that.
> 
> Right, the question is, how important is it to support this type? Because if we handle FP_EXTEND before type legalization, then it would be a bit more difficult to handle this type.
> 

<3 x float> is solved as a by-product. The main case to fix is <2 x
float>. If user code uses <2 x double> in most places and only a few
places need to convert from float, it is better to use <2 x float> for
those values.

Yours
- Michael