[PATCH] D25679: Do not assume that FP vector operands are never legalized by expanding

Mon Oct 17 16:22:24 PDT 2016

amehsan added inline comments.

================
Comment at: lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp:337-340
 SDValue DAGTypeLegalizer::ExpandOp_BITCAST(SDNode *N) {
   SDLoc dl(N);
-  if (N->getValueType(0).isVector()) {
+  if (N->getValueType(0).isVector() &&
+      N->getOperand(0).getValueType().isInteger()) {
----------------
nemanjai wrote:
> amehsan wrote:
> > amehsan wrote:
> > > So, for cases that your new condition is not satisfied, what change do we make in the generated code? (The question applies to other platforms as well). I believe there is no testcase in unit-tests for this, (otherwise it would have failed and would have been included in your changes). Still this may happen in benchmarks and real work loads. I think you may need to investigate how that code pattern changes and what is the potential performance impact of your change for that code pattern. 
> > I am asking because this function is called from places like DAGTypeLegalizer::ExpandFloatOperand. My question is the fact that you get a failure for floating point case, is applicable to all platforms or not?
> If you expand the context in the diff a bit, you'll see where the assert that we trip on is located.  Namely, line 321. If this function was ever called with a node whose result is a vector and first operand was a floating point type, that assert would have tripped. All this patch changes is that rather than ignoring the possibility that the operand is a non-integer type and then failing at the assert, we check for this and don't call IntegerToVector.
> 
> So given that this conditional block could not possibly succeed if the result is a vector and the operand is non-integer, I don't see how we can investigate the performance impact of this change. Namely, cases where we would enter this block before and we don't now are exactly those that would not result in a successful compilation before. So the performance difference between the code we produce now and the code we would produce without this patch cannot be studied.
> 
> And I think this makes sense because for example, we can easily expand something like:
> 
> ```
> tX; v2i64 = bitcast tY
>   tY: i128 = ...
> ```
> by splitting the i128 value and passing the two i64 values to a BUILD_VECTOR node.
> However, doing something similar to the equivalent FP node would require adding some machinery that I'm not sure would be worth the effort. Namely, handling this:
> 
> ```
> tX: v2f64 = bitcast tY
>   t81: ppcf128 = ...
> ```
> By splitting and using BUILD_VECTOR vs. through the stack is not something that is likely to come up frequently enough in real code that it would warrant the additional machinery. And of course, doing the same thing for f128 would be significantly more tricky (I don't know enough about IEEE binary-128 to know if it's even possible). So I believe that going through the stack here is the right thing to do.
> Finally, even if we find some code where this happens a lot for ppcf128, I am still not convinced that we would gain much by avoiding the stack since the actual operations on the type are actually done in software, so it is unlikely that this code is expecting particularly high performance. But that's just my opinion.
Yes, if the assert is inside this conditional this is fine. I did look at the code, but it was not quite obvious, or maybe I missed it.

Repository:
  rL LLVM

https://reviews.llvm.org/D25679