[PATCH] D52177: [InstCombine] Fold ~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A

Fri Oct 12 05:42:54 PDT 2018

dmgreen added a comment.

Yeah, that looks similar to the IR I was looking at. The vectorised version on Skylake (https://godbolt.org/z/RBS2Os) has a lot of shuffling; perhaps that's deemed unprofitable on Goldmont?

I can agree that 8 registers are hard to deal with. Can you explain "promoting everything to 32-bits"? Do you mean essentially zexts/truncs around the whole max/max/xor/sub block? I gave that a try and the subs still seemed to be using bl. (It uses cmps rather than branches though, which looks better to my untrained eyes.)
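
For reference, the fold in the patch title (~A - Min/Max(~A, O) -> Max/Min(A, ~O) - A) can be sanity-checked exhaustively at i8. This is only a minimal Python sketch, not code from the patch: the helper names (not8, sub8, smin, smax, umin, umax, count_mismatches) are hypothetical, and i8 values are modelled as integers in [0, 255] with wrapping arithmetic.

```python
MASK = 0xFF  # assumption: model an i8 value as an integer in [0, 255]


def not8(x):
    """Bitwise NOT at 8 bits (xor with -1)."""
    return ~x & MASK


def sub8(x, y):
    """Wrapping 8-bit subtraction."""
    return (x - y) & MASK


def to_signed(x):
    """Reinterpret an 8-bit pattern as a two's-complement value."""
    return x - 256 if x & 0x80 else x


def umin(x, y):
    return min(x, y)


def umax(x, y):
    return max(x, y)


def smin(x, y):
    return x if to_signed(x) <= to_signed(y) else y


def smax(x, y):
    return x if to_signed(x) >= to_signed(y) else y


def count_mismatches():
    """Count (A, O) pairs where either side of the fold disagrees."""
    bad = 0
    for a in range(256):
        for o in range(256):
            na, no = not8(a), not8(o)
            for lo, hi in ((umin, umax), (smin, smax)):
                # ~A - min(~A, O)  ->  max(A, ~O) - A
                if sub8(na, lo(na, o)) != sub8(hi(a, no), a):
                    bad += 1
                # ~A - max(~A, O)  ->  min(A, ~O) - A
                if sub8(na, hi(na, o)) != sub8(lo(a, no), a):
                    bad += 1
    return bad
```

The check relies on NOT reversing both the signed and the unsigned ordering, so min(~A, O) is ~max(A, ~O), and the two NOTs cancel in the subtraction. It says nothing about codegen quality or register width, only that the rewrite preserves the value.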

Repository:
  rL LLVM

https://reviews.llvm.org/D52177