[PATCH] D74165: [x86] [DAGCombine] Prefer shifts of constant widths.

Justin Lebar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 7 09:04:10 PST 2020


jlebar marked an inline comment as done.
jlebar added a comment.

Hi, thank you for looking at this!

> Is there any canonicalization happening in InstCombine for this?

Yes, it canonicalizes to `shift x, (select ...)`, https://gcc.godbolt.org/z/AtSRrW.

> Vector tests? SSE shifts by uniform constants are a lot better than the alternatives.

Vector shift by a uniform value is better than shift by a vector of values, but I don't think that's what this transformation allows.  (I recall seeing something in the x86 lowering code for what I think you're describing.)  Do you see a CPU on which vector shift by an *immediate* is so much than vector shift by a register that we'd want to do two shifts by immediates instead of one by a register?  I'm not seeing that as I look through the Agner table for Skylake, but I may be missing something.

TODO: Once we resolve this, I need to handle and test vector shifts somehow, if only by excluding them from this transformation.

> Would it be better to make this more generic from the start? Some mechanism that allows targets to push selects up/down the DAG.

Eh, YAGNI?  I looked for other opportunities to apply this optimization (i.e. where it's worth "speculating" an instruction so that one of the operands gets to be an immediate) and didn't really find any.  Do you?

> "but not FHL or FSHR"? Why shouldn't funnels shifts be included?

Is there a target on which we would do this transformation?  If not I'd rather not write code that is never exercised (and it's more code because funnel shift takes three rather than two args).



================
Comment at: llvm/test/CodeGen/X86/select.ll:1123
+; MCU-NEXT:  .LBB20_1:
+; MCU-NEXT:    shll $3, %eax
 ; MCU-NEXT:    retl
----------------
RKSimon wrote:
> Regression
Wow.  What...is...this...architecture.

I guess the answer is, we don't do this optimization if the target doesn't have cmov?

I mean, I'll do it, but are we sure the complexity is worth it for this target?  I can't even find an Agner optimization table for Intel MCU (Quark?).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74165/new/

https://reviews.llvm.org/D74165





More information about the llvm-commits mailing list