[PATCH] D54803: [x86] promote all multiply i8 by constant to i32

Thu Nov 29 13:32:47 PST 2018

craig.topper added a comment.

Intel Core CPUs from Sandy Bridge on always store bits 63:16 and bits 7:0 in the same physical register file entry. Only bits 15:8 of EAX/EBX/ECX/EDX can be separated, as the result of a write to AH/BH/CH/DH.

For most binary arithmetic operations, one of the input registers is also the output register, so it's easy to pass the upper bits through without modifying them. For example, "add %al, %bl" reads all 64 bits of %rax and %rbx (ignoring that %ah and %bh could have been written separately) and leaves bits 63:8 of %rbx unmodified.

Instructions that write only bits 7:0 or bits 15:8 of a register and don't also read part of the same register trigger a merge uop to be inserted. Examples are a load into %al or %ax. I believe a move immediate into %al or %ax doesn't have a separate merge uop. Its single uop just reads the whole destination register and merges the immediate into the lower bits. 16-bit popcnt/lzcnt/tzcnt also have a false dependency on the upper bits, so the single uop can do the merge. MOVSX/MOVZX from 8-bit to 16-bit are similar, so that the upper bits can be preserved.

If bits 15:8 have been separated and an instruction is issued that needs bits 15:8 and any of the other bits, then a merge is inserted to join them.
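To make the merge requirement concrete, here is a rough Python sketch of the architectural result of these writes. It models only the final register values, not uops or the physical register file, and the function names (write_low8, add8, write32) are made up for illustration. The point is that any 8-bit write has to combine the new byte with the old contents of the destination, which is where the dependency on the old register value comes from. The 32-bit case is included only as a contrast, since a 32-bit write on x86-64 zeroes bits 63:32 and so needs nothing from the old value.

```python
MASK64 = (1 << 64) - 1


def write_low8(old_reg: int, value: int) -> int:
    """Write an 8-bit value into bits 7:0 (e.g. %al); bits 63:8 come from old_reg."""
    return (old_reg & ~0xFF & MASK64) | (value & 0xFF)


def add8(dst_reg: int, src_reg: int) -> int:
    """Model "add %al, %bl" (AT&T order): low byte of dst += low byte of src."""
    total = (dst_reg + src_reg) & 0xFF  # only bits 7:0 of the sum are kept
    return write_low8(dst_reg, total)   # bits 63:8 of the destination pass through


def write32(old_reg: int, value: int) -> int:
    """Contrast: a 32-bit write zeroes bits 63:32, so old_reg is not needed."""
    return value & 0xFFFFFFFF


rbx = 0x1122334455667788
rax = 0xFFFFFFFFFFFFFFFF
print(hex(add8(rbx, rax)))        # upper 56 bits of rbx are unchanged
print(hex(write_low8(rbx, 0xAB))) # a byte load still depends on the old rbx
print(hex(write32(rbx, 0xDEADBEEF)))
```

In this model, add8 already takes the old destination as an input, which mirrors the "input register is also the output register" case. write_low8 takes old_reg even though a byte load does not logically read the register, and that extra input is what the merge has to supply.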

Repository:
  rL LLVM

https://reviews.llvm.org/D54803