[PATCH] D46179: [X86] Lowering adds/addus/subs/subus intrinsics to native IR (LLVM part)

Thu May 10 13:28:23 PDT 2018

sroland added a comment.

In https://reviews.llvm.org/D46179#1093843, @craig.topper wrote:

> I'm also concerned that the more complex patterns are easier for other optimizations to simplify a little and break. The simpler things like pmin/pmax or pabs aren't so bad to not match when they get optimized a little.

I was wondering about that too.
Another concern I have is that I believe some of these sequences are very suboptimal. No one (at least when thinking about how to do it fast) would write them that way when emulating this manually.
The subus looks great (we're doing the same as fallback), it's just cmp/select/sub.
The addus is simply terrible.
We're doing (unsigned) min(a, b xor ~0) + b instead, which is only xor/cmp/select/add.
(Another (better) solution, and that is what is typically used to detect overflow for unsigned adds (apart from the select of course), is (unsigned)(a + b) < (unsigned)a ? ~0 : a + b - only add / cmp / select)
We're generally avoiding sext/trunc sequences (typically no good comes from doing that...), so for signed saturated sub/add we're using some more complex sequences (with a couple more cmp/select/sub (or add)). I'm not entirely sure whether that would actually result in better code if you really have to emulate it, but the sext/trunc can be a problem in itself (don't ask me how many shuffle instructions it would generate on sse2 only...), and of course the add/cmp/select really costs twice as many instructions because it ends up running on vectors twice as wide. These are really quite complex to emulate.
So maybe for the unsigned saturated add/sub, when using optimal patterns it wouldn't be too bad, without too much concern about not being able to recognize them again in the end. But not sure about the signed ones.
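
For reference, the patterns discussed above can be sketched in scalar C on 8-bit lanes. This is only an illustration under stated assumptions: the function names are made up here, the subus form (max(a, b) - b) is a guess at what "cmp/select/sub" refers to, and the widening version of the signed add is just the obvious sext/add/clamp/trunc formulation, not necessarily the exact sequence the patch emits.

```c
#include <stdint.h>

/* Hypothetical helper names; scalar stand-ins for one vector lane. */

/* Unsigned saturating subtract as cmp/select/sub (assumed form:
   max(a, b) - b, which is 0 whenever a <= b). */
static uint8_t subus_u8(uint8_t a, uint8_t b) {
    uint8_t m = (a > b) ? a : b;        /* cmp + select */
    return (uint8_t)(m - b);            /* sub */
}

/* Unsigned saturating add as xor/cmp/select/add:
   min(a, b xor ~0) + b. Since ~b == 255 - b, the sum cannot wrap. */
static uint8_t addus_min_u8(uint8_t a, uint8_t b) {
    uint8_t nb = (uint8_t)~b;           /* xor with all-ones */
    uint8_t m = (a < nb) ? a : nb;      /* cmp + select (unsigned min) */
    return (uint8_t)(m + b);            /* add */
}

/* Unsigned saturating add as add/cmp/select:
   the wrapped sum is below a exactly when the add overflowed. */
static uint8_t addus_cmp_u8(uint8_t a, uint8_t b) {
    uint8_t s = (uint8_t)(a + b);       /* add (wraps modulo 256) */
    return (s < a) ? UINT8_MAX : s;     /* cmp + select */
}

/* Signed saturating add via widening: sext, add, clamp, trunc.
   On vectors the middle steps run at twice the element width. */
static int8_t adds_widen_i8(int8_t a, int8_t b) {
    int16_t wide = (int16_t)((int16_t)a + (int16_t)b);  /* sext + add */
    if (wide > INT8_MAX) wide = INT8_MAX;               /* clamp high */
    if (wide < INT8_MIN) wide = INT8_MIN;               /* clamp low */
    return (int8_t)wide;                                /* trunc */
}
```

Both addus variants should agree for all inputs; the second one saves the xor, which is why it looks preferable when it has to be emulated.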

Repository:
  rL LLVM

https://reviews.llvm.org/D46179