[PATCH] D33583: [AMDGPU] Allow SDWA in instructions with immediates and SGPRs

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 25 23:02:32 PDT 2017


rampitec marked 2 inline comments as done.
rampitec added inline comments.


================
Comment at: lib/Target/AMDGPU/SIPeepholeSDWA.cpp:590
 
-  // 2. Are all operands - VGPRs
-  for (const MachineOperand &Operand : MI.explicit_operands()) {
-    if (!Operand.isReg() || !TRI->isVGPR(*MRI, Operand.getReg()))
+  // 2. Are all operands - VGPRs or can be changed to VGPRs
+  const MCInstrDesc &Desc = TII->get(MI.getOpcode());
----------------
rampitec wrote:
> SamWot wrote:
> > I think there should be some heuristic to check if this should be done. E.g. if we would fold only one SDWA operand in this instruction then creating this copy would only increase code size.
> Copy can be hoisted out of a loop. In this situation it is still profitable even if code has grown. I.e. there is a chance to improve, but if not it does not really hurt.
In fact, the case that inspired this change is exactly this situation. The transformation by itself does not bring a big improvement, just under 1%. But with MachineLICM added, which only hoists the immediate move out of the loop, it yields an 11% improvement. Of course, the case is compute-bound and the loop is small.
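The trade-off discussed above can be made concrete with a toy model. This is a hypothetical, simplified sketch and not the code in SIPeepholeSDWA.cpp: `OperandKind`, `canUseSDWA` and `dynamicCountAfter` are invented for illustration, and the cost model assumes every instruction costs exactly one unit. It shows the relaxed operand check (VGPRs, or SGPRs/immediates that can be copied into VGPRs) and why one added copy per folded instruction is break-even inside the loop but a win once the copy is hoisted into the preheader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified operand classification for illustration only;
// this is not LLVM's MachineOperand / SIRegisterInfo API.
enum class OperandKind { VGPR, SGPR, Immediate, Other };

// Sketch of step 2 after the patch: every explicit operand must already be a
// VGPR or be an SGPR/immediate that can be copied into a VGPR. Returns false
// if some operand cannot be handled; otherwise reports how many copies
// (moves into fresh VGPRs) the conversion would need.
bool canUseSDWA(const std::vector<OperandKind> &Ops, unsigned &NumCopies) {
  NumCopies = 0;
  for (OperandKind K : Ops) {
    switch (K) {
    case OperandKind::VGPR:
      break; // usable as-is
    case OperandKind::SGPR:
    case OperandKind::Immediate:
      ++NumCopies; // needs a copy into a VGPR first
      break;
    case OperandKind::Other:
      return false; // any other operand kind blocks the conversion
    }
  }
  return true;
}

// Toy dynamic instruction count for a single loop (assumption: every
// instruction costs 1 and the loop runs TripCount iterations).
//   BodyInsts   - instructions per iteration before the transformation
//   FoldedInsts - instructions removed from the body by SDWA folding
//   NumCopies   - VGPR copies the transformation introduces
//   Hoisted     - whether the copies are loop-invariant and hoisted into the
//                 preheader, as MachineLICM would do for an immediate move
uint64_t dynamicCountAfter(uint64_t TripCount, uint64_t BodyInsts,
                           uint64_t FoldedInsts, uint64_t NumCopies,
                           bool Hoisted) {
  assert(FoldedInsts <= BodyInsts && "cannot fold more than the body holds");
  uint64_t Body = BodyInsts - FoldedInsts;
  if (Hoisted)
    return TripCount * Body + NumCopies; // copies execute once
  return TripCount * (Body + NumCopies); // copies execute every iteration
}
```

For example, with a 10-instruction loop body running 1000 iterations (10000 dynamic instructions before the change), folding one instruction and adding one copy still gives 10000 if the copy stays in the loop, which matches the code-size concern, but 9001 once the copy is hoisted.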


Repository:
  rL LLVM

https://reviews.llvm.org/D33583
