[PATCH] D38313: [InstCombine] Introducing Aggressive Instruction Combine pass

Mon Nov 27 11:40:29 PST 2017

escha added a comment.

Two comments on the trunc part of this patch:

1. Thank you!!! As a GPU target maintainer, one of my main frustrations is how much LLVM *loves* to generate needlessly wide code when narrower types would do. We have mostly avoided this problem because our code is float-heavy, but as integer code becomes more important, I absolutely love any chance I can get to narrow 32-bit operations to 16-bit and save register space accordingly.

2. I'm worried about this because the DAG *loves* to eliminate "redundant" truncates and extensions, even when both are marked as free. I've accidentally triggered infinite loops many times when trying to trick the DAG into emitting code that keeps intermediate values small. An extreme example is something like this:

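  ; pseudo-asm
  ; R1 = *b + (*a & 15);
  ; R2 = *c + ((*a >> 16) & 15);
  load.32 R0, [a]
  load.32 R1, [b]
  load.32 R2, [c]
  shr.32 R0H, R0, 16    ; high half of *a, kept in the 16-bit half-register R0H
  and.16 R0L, R0L, 15   ; mask the low half of *a in 16 bits
  and.16 R0H, R0H, 15   ; mask the high half in 16 bits
  add.32 R1, R1, R0L    ; widen only at the final 32-bit adds
  add.32 R2, R2, R0H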
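For reference, the same computation as a rough C sketch. The function names are hypothetical, and the `uint16_t` locals only stand in for the R0L/R0H half-registers (a C compiler is free to widen them again). It shows that masking in 16 bits and widening only at the final adds gives the same results as the all-32-bit version:

```c
#include <assert.h>
#include <stdint.h>

/* Reference version: 32-bit intermediates throughout. The parentheses
   make the intended grouping explicit; without them, C would parse
   c + (a >> 16) & 15 as (c + (a >> 16)) & 15. */
static void wide(uint32_t a, uint32_t b, uint32_t c,
                 uint32_t *r1, uint32_t *r2) {
    *r1 = b + (a & 15u);
    *r2 = c + ((a >> 16) & 15u);
}

/* Narrowed version: split a into 16-bit halves (the R0L/R0H idea),
   mask each half in 16 bits, and widen only for the final 32-bit adds. */
static void narrow(uint32_t a, uint32_t b, uint32_t c,
                   uint32_t *r1, uint32_t *r2) {
    uint16_t lo = (uint16_t)a;         /* low half of a */
    uint16_t hi = (uint16_t)(a >> 16); /* high half of a */
    lo = (uint16_t)(lo & 15u);         /* 16-bit AND */
    hi = (uint16_t)(hi & 15u);         /* 16-bit AND */
    *r1 = b + (uint32_t)lo;            /* zero-extend, then 32-bit add */
    *r2 = c + (uint32_t)hi;
}

/* Returns 1 if both versions agree on the given inputs. */
static int narrow_matches_wide(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t w1, w2, n1, n2;
    wide(a, b, c, &w1, &w2);
    narrow(a, b, c, &n1, &n2);
    return w1 == n1 && w2 == n2;
}

/* Spot-checks a few inputs, including ones with high bits set and
   ones where the 32-bit add wraps around. */
static void check_samples(void) {
    static const uint32_t samples[] = {
        0u, 15u, 0x0010000Fu, 0xDEADBEEFu, 0xFFFFFFFFu
    };
    const unsigned n = sizeof samples / sizeof samples[0];
    for (unsigned i = 0; i < n; ++i)
        for (unsigned j = 0; j < n; ++j)
            assert(narrow_matches_wide(samples[i], samples[j],
                                       samples[n - 1 - j]));
}
```

With `a = 0xDEADBEEF` and `b = c = 0`, both versions give 15 and 13 (the low nibbles of `0xBEEF` and `0xDEAD`).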

The DAG will usually try to turn it into this instead, widening every intermediate back to 32 bits:

  load.32 R0, [a]
  load.32 R1, [b]
  load.32 R2, [c]
  shr.32 R3, R0, 16
  and.32 R0, R0, 15
  and.32 R3, R3, 15
  add.32 R1, R1, R0
  add.32 R2, R2, R3
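A rough C model of why the truncate/extend pair looks removable (hypothetical names; this models only the values, not the actual DAG combines): because the mask keeps just the low four bits, "truncate, AND in 16 bits, zero-extend" and "AND in 32 bits" compute the same value for every input. The register-width saving is not visible in the values themselves, which is what makes the narrow form easy to fold away.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow form: truncate to 16 bits, AND in 16 bits, zero-extend back. */
static uint32_t narrowed_and(uint32_t x) {
    uint16_t t = (uint16_t)x;    /* trunc i32 -> i16 */
    t = (uint16_t)(t & 15u);     /* 16-bit AND */
    return (uint32_t)t;          /* zext i16 -> i32 */
}

/* Wide form: the single 32-bit AND the DAG tends to produce. */
static uint32_t widened_and(uint32_t x) {
    return x & 15u;
}

/* The mask clears every bit above bit 3, so the two forms agree for all x.
   This checks 65536 evenly spaced values from 0 to 0xFFFFFFFF rather
   than the whole 32-bit range. */
static int forms_agree(void) {
    for (uint64_t x = 0; x <= 0xFFFFFFFFu; x += 0x10001u)
        if (narrowed_and((uint32_t)x) != widened_and((uint32_t)x))
            return 0;
    return 1;
}
```

For example, `assert(forms_agree());` passes, and both functions map `0xDEADBEEF` to 15.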

This is only a hypothetical example, but my past experiments in this area make me worry about this kind of change in general.

https://reviews.llvm.org/D38313