[PATCH] D50791: [SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527)

Wed Aug 15 16:40:06 PDT 2018

hfinkel added a comment.

In https://reviews.llvm.org/D50791#1201524, @spatel wrote:

> In https://reviews.llvm.org/D50791#1201420, @craig.topper wrote:
>
> > Thinking forward to a future where we have SVML library support for v4f32 and a v8f32 sin/cos. What should we do for v5f32?
>
>
> Not sure. Over in:
>  https://bugs.llvm.org/show_bug.cgi?id=38528
>  ...we discussed transforming to veclib calls late in IR, so any special lib support would already be taken care of before we reach this point?

True. But it probably also makes sense to handle them in SDAG so that targets can also provide custom lowering.

> Another note about the current patch: I did try sinking the expansion check directly into WidenVecRes_Unary(N), so it would apply to all of the unary opcodes in the switch rather than just the FP libcall opcodes. But that results in a regression in test/CodeGen/X86/bswap-vector.ll because we scalarize something that currently gets turned into a shuffle.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2445
+    EVT VT = N->getValueType(0);
+    EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
+    if (TLI.isOperationExpand(N->getOpcode(), WideVecVT) &&
----------------
spatel wrote:
> efriedma wrote:
> > spatel wrote:
> > > efriedma wrote:
> > > > I think you need to call TLI.getRegisterType() rather than getTypeToTransformTo()?  Consider what happens to, for example, `<5 x f32>` on AArch64.
> > > I initially didn't step through that test since it showed no diffs, but here's what I see in the debug output for that case:
> > > VT = v5f32
> > > WideVecVT = v8f32 (is it expected that we ask to widen to the next pow-of-2 size?)
> > > 
> > > ```
> > > Legalizing node: t12: v5f32 = fsin t11
> > > Analyzing result type: v5f32
> > > Widen node result 0: t12: v5f32 = fsin t11
> > > 
> > > Creating new node: t36: f32 = fsin t2
> > > Creating new node: t37: f32 = fsin t4
> > > Creating new node: t38: f32 = fsin t6
> > > Creating new node: t39: f32 = fsin t8
> > > Creating new node: t40: f32 = fsin t10
> > > Creating new node: t41: v8f32 = BUILD_VECTOR t36, t37, t38, t39, t40, undef:f32, undef:f32, undef:f32
> > > 
> > > ```
> > > 
> > > I copied the getTypeToTransformTo() call from the implementation of WidenVecRes_Unary(), so that's what we use if we fall-through. Should there be a difference in the call based on what we're using that type for (UnrollVectorOp() vs. GetWidenedVector())? 
> > Thinking about it a bit more, you'd probably get the same result if you just didn't try to compute the vector type at all, and just check `TLI.isOperationExpand(N->getOpcode(), VT.getScalarType()`.  (If a vector operation is legal, the equivalent scalar operation is generally also legal.)
> We still have to compute the vector type to pass into UnrollVectorOp though? But you're correct that I don't see any tests diffs if we just ask if the scalar type is set to expand. Let me post the updated patch and see if that looks better.
Makes sense to me.

https://reviews.llvm.org/D50791