[llvm] r214832 - MachineCombiner Pass for selecting faster instruction

Tue Aug 5 21:19:29 PDT 2014

Hi Kevin,

I assume you are using gcc as your build compiler? At least with gcc v4.9 there seems to be an issue with the register allocator that results in a stack overwrite in the target-independent part of the machine combiner:

MachineCombiner.cpp:
/// preservesResourceLen - True when the new instructions do not increase
/// resource length
...
>>> Allocate ptr to MBB on the stack
  ArrayRef<const MachineBasicBlock *> MBBarr(MBB);
  unsigned ResLenBeforeCombine = BlockTrace.getResourceLength(MBBarr);

>>> During the next few statements the address on the stack is overwritten.

  SmallVector<const MCSchedClassDesc *, 16> InsInstrsSC;
  SmallVector<const MCSchedClassDesc *, 16> DelInstrsSC;

  instr2instrSC(InsInstrs, InsInstrsSC);
  instr2instrSC(DelInstrs, DelInstrsSC);

  ArrayRef<const MCSchedClassDesc *> MSCInsArr = makeArrayRef(InsInstrsSC);
  ArrayRef<const MCSchedClassDesc *> MSCDelArr = makeArrayRef(DelInstrsSC);

>>> MBBarr contains garbage ptr to MBB
  unsigned ResLenAfterCombine =
      BlockTrace.getResourceLength(MBBarr, MSCInsArr, MSCDelArr);

Any ideas or suggestions on how to proceed from here? There are multiple ways to work around this in the code, e.g. avoiding ArrayRefs and using SmallVectors, but I would prefer to get help with root-causing the issue completely.
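
One possibility worth ruling out, offered as an assumption rather than a confirmed diagnosis: MBBarr in the excerpt is built with a one-element constructor. If that constructor takes its argument by const reference and stores its address, and MBB has type MachineBasicBlock * rather than const MachineBasicBlock *, then the pointer conversion can materialise a temporary (whether it does depends on the reference-binding rules the compiler implements). The view would then point at a stack slot that later locals are free to reuse. The sketch below uses hypothetical stand-in types (Block, OneEltRef), not the LLVM classes, and shows only the safe shape: keep a named variable of exactly the element type alive for as long as the view is used.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a basic block; not an LLVM type.
struct Block {
  int Number;
};

// Hypothetical one-element view. Like an ArrayRef-style class, its
// constructor stores the ADDRESS of the argument; it does not copy it.
template <typename T> class OneEltRef {
  const T *Data;
  std::size_t Length;

public:
  explicit OneEltRef(const T &OneElt) : Data(&OneElt), Length(1) {}
  const T *data() const { return Data; }
  std::size_t size() const { return Length; }
  const T &operator[](std::size_t I) const {
    assert(I < Length && "index out of range");
    return Data[I];
  }
};

// Risky shape, kept as a comment because it may be undefined behaviour:
//   Block *MBB = ...;
//   OneEltRef<const Block *> Arr(MBB); // element type differs from MBB's type
// If the Block* -> const Block* conversion creates a temporary, Arr refers
// to that temporary, which is gone once the declaration has been evaluated.

// Safe shape: a named local of exactly the element type outlives the view.
int blockNumberViaView(Block *MBB) {
  const Block *ConstMBB = MBB;            // named object, exact element type
  OneEltRef<const Block *> Arr(ConstMBB); // binds directly to ConstMBB
  assert(Arr.data() == &ConstMBB);        // the view aliases the named local
  return Arr[0]->Number;
}

// Returns true when the view really points at the caller-visible local.
bool viewAliasesNamedLocal() {
  Block B{7};
  const Block *P = &B;
  OneEltRef<const Block *> Arr(P);
  return Arr.data() == &P && Arr.size() == 1;
}
```

With a Block holding 42, blockNumberViaView returns 42 because the view and the named local share one address. If this hypothesis applied to the combiner, the equivalent change would be to introduce a const MachineBasicBlock * local before constructing MBBarr, but that remains to be verified against the gcc 4.9 build.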

Big thanks also to Justin for helping with access to Linux resources and his expertise in zooming in on this problem.

Cheers
Gerolf

On Aug 5, 2014, at 1:08 PM, Gerolf Hoflehner <ghoflehner at apple.com> wrote:

> Hi Kevin,
> 
> apologies for the multiple inconveniences. There is an issue with the machine model you are using that I don’t understand yet and that has so far stayed hidden in my local testing (not surprising, since I used a different model). The getResourceLength function is supposed to return the resource length of a basic block before any combining action starts.
> 
> I also realized that I had a mail filter in place hiding the build breakage news. :-(
> 
> Thanks
> Gerolf
> 
> 
> 
> On Aug 4, 2014, at 11:01 PM, Kevin Qin <kevinqindev at gmail.com> wrote:
> 
>> Hi Gerolf,
>> 
>> I reverted it again because it broke compilation of most benchmarks and internal tests: clang crashed with a segmentation fault or an assertion failure.
>> Here is some dump information:
>> 
>> 0  clang-3.6       0x000000000204502b llvm::sys::PrintStackTrace(_IO_FILE*) + 38
>> 1  clang-3.6       0x00000000020452a8
>> 2  clang-3.6       0x0000000002044c4d
>> 3  libpthread.so.0 0x00002b25774e6340
>> 4  clang-3.6       0x0000000001031e28
>> 5  clang-3.6       0x0000000001a6c28a llvm::MachineTraceMetrics::Trace::getResourceLength(llvm::ArrayRef<llvm::MachineBasicBlock const*>, llvm::ArrayRef<llvm::MCSchedClassDesc const*>, llvm::ArrayRef<llvm::MCSchedClassDesc const*>) const + 256
>> 6  clang-3.6       0x00000000019fa6ae
>> 7  clang-3.6       0x00000000019facd9
>> 8  clang-3.6       0x00000000019fb295
>> 9  clang-3.6       0x0000000001a12c7d llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 95
>> 10 clang-3.6       0x0000000001cc89ca llvm::FPPassManager::runOnFunction(llvm::Function&) + 290
>> 11 clang-3.6       0x0000000001cc8b3a llvm::FPPassManager::runOnModule(llvm::Module&) + 84
>> 12 clang-3.6       0x0000000001cc8e58
>> 13 clang-3.6       0x0000000001cc94fc llvm::legacy::PassManagerImpl::run(llvm::Module&) + 244
>> 14 clang-3.6       0x0000000001cc971b llvm::legacy::PassManager::run(llvm::Module&) + 39
>> 15 clang-3.6       0x0000000002558021
>> 16 clang-3.6       0x00000000025580f0 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::raw_ostream*) + 127
>> 17 clang-3.6       0x000000000254244f
>> 18 clang-3.6       0x0000000002ea9b48 clang::ParseAST(clang::Sema&, bool, bool) + 776
>> 19 clang-3.6       0x000000000221fedc clang::ASTFrontendAction::ExecuteAction() + 322
>> 20 clang-3.6       0x0000000002544572 clang::CodeGenAction::ExecuteAction() + 1370
>> 21 clang-3.6       0x000000000221fa11 clang::FrontendAction::Execute() + 139
>> 22 clang-3.6       0x00000000021eef47 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 721
>> 23 clang-3.6       0x000000000231ec65 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 993
>> 24 clang-3.6       0x000000000100e05c cc1_main(char const**, char const**, char const*, void*) + 722
>> 25 clang-3.6       0x000000000100775a main + 769
>> 26 libc.so.6       0x00002b257814eec5 __libc_start_main + 245
>> 27 clang-3.6       0x0000000001004c59
>> Stack dump:
>> 0.	Program arguments: /home/kevin/llvm_trunk/build/bin/clang-3.6 -cc1 -triple arm64--linux-gnueabi -emit-obj -disable-free -main-file-name compress.c -mrelocation-model static -mdisable-fp-elim -menable-no-infs -menable-no-nans -menable-unsafe-fp-math -ffp-contract=fast -ffast-math -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu cortex-a57 -target-feature +neon -target-feature +crc -target-feature +crypto -target-abi aapcs -dwarf-column-info -coverage-file /home/kevin/Bench/spec2006/benchspec/CPU2006/401.bzip2/build/build_base_llvm-high-opt.0001/compress.o -resource-dir /home/kevin/llvm_trunk/build/bin/../lib/clang/3.6.0 -D SPEC_CPU -D NDEBUG -D SPEC_CPU_LP64 -isysroot /home/kevin/gcc-linaro-aarch64/aarch64-linux-gnu/libc -internal-isystem /home/kevin/gcc-linaro-aarch64/aarch64-linux-gnu/libc/usr/local/include -internal-isystem /home/kevin/llvm_trunk/build/bin/../lib/clang/3.6.0/include -internal-externc-isystem /home/kevin/gcc-linaro-aarch64/aarch64-linux-gnu/libc/include -internal-externc-isystem /home/kevin/gcc-linaro-aarch64/aarch64-linux-gnu/libc/usr/include -O3 -fdebug-compilation-dir /home/kevin/Bench/spec2006/benchspec/CPU2006/401.bzip2/build/build_base_llvm-high-opt.0001 -ferror-limit 19 -fmessage-length 0 -mstackrealign -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o compress.o -x c compress.c 
>> 1.	<eof> parser at end of file
>> 2.	Code generation
>> 3.	Running pass 'Function Pass Manager' on module 'compress.c'.
>> 4.	Running pass 'Machine InstCombiner' on function '@BZ2_compressBlock'
>> clang-3.6: error: unable to execute command: Segmentation fault (core dumped)
>> clang-3.6: error: clang frontend command failed due to signal (use -v to see invocation)
>> 
>> Or
>> 
>> clang-3.6: /home/llvm-test/slave/pre-commit/build/include/llvm/ADT/SmallVector.h:145: const T& llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >::operator[](unsigned int) const [with T = llvm::MachineTraceMetrics::FixedBlockInfo; <template-parameter-1-2> = void; llvm::SmallVectorTemplateCommon<T, <template-parameter-1-2> >::const_reference = const llvm::MachineTraceMetrics::FixedBlockInfo&]: Assertion `begin() + idx < end()' failed.
>> 
>> These failures should be easy to reproduce by compiling LNT, SPEC2000 or SPEC2006 on x64 Linux.
>> 
>> Regards,
>> Kevin
>> 
>> 
>> 2014-08-05 9:16 GMT+08:00 Gerolf Hoflehner <ghoflehner at apple.com>:
>> Author: ghoflehner
>> Date: Mon Aug  4 20:16:13 2014
>> New Revision: 214832
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=214832&view=rev
>> Log:
>> MachineCombiner Pass for selecting faster instruction
>>  sequence on AArch64
>> 
>> Re-commit of r214669 without changes to test cases
>> LLVM::CodeGen/AArch64/arm64-neon-mul-div.ll and
>> LLVM:: CodeGen/AArch64/dp-3source.ll
>> This resolves the reported compfails of the original commit.
>> 
>> 
>> Added:
>>     llvm/trunk/lib/Target/AArch64/AArch64MachineCombinerPattern.h
>>     llvm/trunk/test/CodeGen/AArch64/madd-lohi.ll
>> Modified:
>>     llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td
>>     llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp
>>     llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h
>>     llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp
>>     llvm/trunk/test/CodeGen/AArch64/mul-lohi.ll
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td?rev=214832&r1=214831&r2=214832&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td Mon Aug  4 20:16:13 2014
>> @@ -1351,14 +1351,15 @@ class BaseMulAccum<bit isSub, bits<3> op
>>  }
>> 
>>  multiclass MulAccum<bit isSub, string asm, SDNode AccNode> {
>> +  // MADD/MSUB generation is decided by MachineCombiner.cpp
>>    def Wrrr : BaseMulAccum<isSub, 0b000, GPR32, GPR32, asm,
>> -      [(set GPR32:$Rd, (AccNode GPR32:$Ra, (mul GPR32:$Rn, GPR32:$Rm)))]>,
>> +      [/*(set GPR32:$Rd, (AccNode GPR32:$Ra, (mul GPR32:$Rn, GPR32:$Rm)))*/]>,
>>        Sched<[WriteIM32, ReadIM, ReadIM, ReadIMA]> {
>>      let Inst{31} = 0;
>>    }
>> 
>>    def Xrrr : BaseMulAccum<isSub, 0b000, GPR64, GPR64, asm,
>> -      [(set GPR64:$Rd, (AccNode GPR64:$Ra, (mul GPR64:$Rn, GPR64:$Rm)))]>,
>> +      [/*(set GPR64:$Rd, (AccNode GPR64:$Ra, (mul GPR64:$Rn, GPR64:$Rm)))*/]>,
>>        Sched<[WriteIM64, ReadIM, ReadIM, ReadIMA]> {
>>      let Inst{31} = 1;
>>    }
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp?rev=214832&r1=214831&r2=214832&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.cpp Mon Aug  4 20:16:13 2014
>> @@ -14,6 +14,7 @@
>>  #include "AArch64InstrInfo.h"
>>  #include "AArch64Subtarget.h"
>>  #include "MCTargetDesc/AArch64AddressingModes.h"
>> +#include "AArch64MachineCombinerPattern.h"
>>  #include "llvm/CodeGen/MachineFrameInfo.h"
>>  #include "llvm/CodeGen/MachineInstrBuilder.h"
>>  #include "llvm/CodeGen/MachineMemOperand.h"
>> @@ -697,17 +698,12 @@ static bool UpdateOperandRegClass(Machin
>>    return true;
>>  }
>> 
>> -/// optimizeCompareInstr - Convert the instruction supplying the argument to the
>> -/// comparison into one that sets the zero bit in the flags register.
>> -bool AArch64InstrInfo::optimizeCompareInstr(
>> -    MachineInstr *CmpInstr, unsigned SrcReg, unsigned SrcReg2, int CmpMask,
>> -    int CmpValue, const MachineRegisterInfo *MRI) const {
>> -
>> -  // Replace SUBSWrr with SUBWrr if NZCV is not used.
>> -  int Cmp_NZCV = CmpInstr->findRegisterDefOperandIdx(AArch64::NZCV, true);
>> -  if (Cmp_NZCV != -1) {
>> +/// convertFlagSettingOpcode - return opcode that does not
>> +/// set flags when possible. The caller is responsible to do
>> +/// the actual substitution and legality checking.
>> +static unsigned convertFlagSettingOpcode(MachineInstr *MI) {
>>      unsigned NewOpc;
>> -    switch (CmpInstr->getOpcode()) {
>> +    switch (MI->getOpcode()) {
>>      default:
>>        return false;
>>      case AArch64::ADDSWrr:      NewOpc = AArch64::ADDWrr; break;
>> @@ -727,7 +723,22 @@ bool AArch64InstrInfo::optimizeCompareIn
>>      case AArch64::SUBSXrs:      NewOpc = AArch64::SUBXrs; break;
>>      case AArch64::SUBSXrx:      NewOpc = AArch64::SUBXrx; break;
>>      }
>> +    return NewOpc;
>> +}
>> 
>> +/// optimizeCompareInstr - Convert the instruction supplying the argument to the
>> +/// comparison into one that sets the zero bit in the flags register.
>> +bool AArch64InstrInfo::optimizeCompareInstr(
>> +    MachineInstr *CmpInstr, unsigned SrcReg, unsigned SrcReg2, int CmpMask,
>> +    int CmpValue, const MachineRegisterInfo *MRI) const {
>> +
>> +  // Replace SUBSWrr with SUBWrr if NZCV is not used.
>> +  int Cmp_NZCV = CmpInstr->findRegisterDefOperandIdx(AArch64::NZCV, true);
>> +  if (Cmp_NZCV != -1) {
>> +    unsigned Opc = CmpInstr->getOpcode();
>> +    unsigned NewOpc = convertFlagSettingOpcode(CmpInstr);
>> +    if (NewOpc == Opc)
>> +      return false;
>>      const MCInstrDesc &MCID = get(NewOpc);
>>      CmpInstr->setDesc(MCID);
>>      CmpInstr->RemoveOperand(Cmp_NZCV);
>> @@ -2185,3 +2196,448 @@ void AArch64InstrInfo::getNoopForMachoTa
>>    NopInst.setOpcode(AArch64::HINT);
>>    NopInst.addOperand(MCOperand::CreateImm(0));
>>  }
>> +/// useMachineCombiner - return true when a target supports MachineCombiner
>> +bool AArch64InstrInfo::useMachineCombiner(void) const {
>> +  // AArch64 supports the combiner
>> +  return true;
>> +}
>> +//
>> +// True when Opc sets flag
>> +static bool isCombineInstrSettingFlag(unsigned Opc) {
>> +  switch (Opc) {
>> +  case AArch64::ADDSWrr:
>> +  case AArch64::ADDSWri:
>> +  case AArch64::ADDSXrr:
>> +  case AArch64::ADDSXri:
>> +  case AArch64::SUBSWrr:
>> +  case AArch64::SUBSXrr:
>> +  // Note: MSUB Wd,Wn,Wm,Wi -> Wd = Wi - WnxWm, not Wd=WnxWm - Wi.
>> +  case AArch64::SUBSWri:
>> +  case AArch64::SUBSXri:
>> +    return true;
>> +  default:
>> +    break;
>> +  }
>> +  return false;
>> +}
>> +//
>> +// 32b Opcodes that can be combined with a MUL
>> +static bool isCombineInstrCandidate32(unsigned Opc) {
>> +  switch (Opc) {
>> +  case AArch64::ADDWrr:
>> +  case AArch64::ADDWri:
>> +  case AArch64::SUBWrr:
>> +  case AArch64::ADDSWrr:
>> +  case AArch64::ADDSWri:
>> +  case AArch64::SUBSWrr:
>> +  // Note: MSUB Wd,Wn,Wm,Wi -> Wd = Wi - WnxWm, not Wd=WnxWm - Wi.
>> +  case AArch64::SUBWri:
>> +  case AArch64::SUBSWri:
>> +    return true;
>> +  default:
>> +    break;
>> +  }
>> +  return false;
>> +}
>> +//
>> +// 64b Opcodes that can be combined with a MUL
>> +static bool isCombineInstrCandidate64(unsigned Opc) {
>> +  switch (Opc) {
>> +  case AArch64::ADDXrr:
>> +  case AArch64::ADDXri:
>> +  case AArch64::SUBXrr:
>> +  case AArch64::ADDSXrr:
>> +  case AArch64::ADDSXri:
>> +  case AArch64::SUBSXrr:
>> +  // Note: MSUB Wd,Wn,Wm,Wi -> Wd = Wi - WnxWm, not Wd=WnxWm - Wi.
>> +  case AArch64::SUBXri:
>> +  case AArch64::SUBSXri:
>> +    return true;
>> +  default:
>> +    break;
>> +  }
>> +  return false;
>> +}
>> +//
>> +// Opcodes that can be combined with a MUL
>> +static bool isCombineInstrCandidate(unsigned Opc) {
>> +  return (isCombineInstrCandidate32(Opc) || isCombineInstrCandidate64(Opc));
>> +}
>> +
>> +static bool canCombineWithMUL(MachineBasicBlock &MBB, MachineOperand &MO,
>> +                              unsigned MulOpc, unsigned ZeroReg) {
>> +  MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
>> +  MachineInstr *MI = nullptr;
>> +  // We need a virtual register definition.
>> +  if (MO.isReg() && TargetRegisterInfo::isVirtualRegister(MO.getReg()))
>> +    MI = MRI.getUniqueVRegDef(MO.getReg());
>> +  // And it needs to be in the trace (otherwise, it won't have a depth).
>> +  if (!MI || MI->getParent() != &MBB || (unsigned)MI->getOpcode() != MulOpc)
>> +    return false;
>> +
>> +  assert(MI->getNumOperands() >= 4 && MI->getOperand(0).isReg() &&
>> +         MI->getOperand(1).isReg() && MI->getOperand(2).isReg() &&
>> +         MI->getOperand(3).isReg() && "MAdd/MSub must have a least 4 regs");
>> +
>> +  // The third input reg must be zero.
>> +  if (MI->getOperand(3).getReg() != ZeroReg)
>> +    return false;
>> +
>> +  // Must only used by the user we combine with.
>> +  if (!MRI.hasOneNonDBGUse(MI->getOperand(0).getReg()))
>> +    return false;
>> +
>> +  return true;
>> +}
>> +
>> +/// hasPattern - return true when there is potentially a faster code sequence
>> +/// for an instruction chain ending in \p Root. All potential patterns are
>> +/// listed
>> +/// in the \p Pattern vector. Pattern should be sorted in priority order since
>> +/// the pattern evaluator stops checking as soon as it finds a faster sequence.
>> +
>> +bool AArch64InstrInfo::hasPattern(
>> +    MachineInstr &Root,
>> +    SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &Pattern) const {
>> +  unsigned Opc = Root.getOpcode();
>> +  MachineBasicBlock &MBB = *Root.getParent();
>> +  bool Found = false;
>> +
>> +  if (!isCombineInstrCandidate(Opc))
>> +    return 0;
>> +  if (isCombineInstrSettingFlag(Opc)) {
>> +    int Cmp_NZCV = Root.findRegisterDefOperandIdx(AArch64::NZCV, true);
>> +    // When NZCV is live bail out.
>> +    if (Cmp_NZCV == -1)
>> +      return 0;
>> +    unsigned NewOpc = convertFlagSettingOpcode(&Root);
>> +    // When opcode can't change bail out.
>> +    // CHECKME: do we miss any cases for opcode conversion?
>> +    if (NewOpc == Opc)
>> +      return 0;
>> +    Opc = NewOpc;
>> +  }
>> +
>> +  switch (Opc) {
>> +  default:
>> +    break;
>> +  case AArch64::ADDWrr:
>> +    assert(Root.getOperand(1).isReg() && Root.getOperand(2).isReg() &&
>> +           "ADDWrr does not have register operands");
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
>> +                          AArch64::WZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULADDW_OP1);
>> +      Found = true;
>> +    }
>> +    if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDWrrr,
>> +                          AArch64::WZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULADDW_OP2);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::ADDXrr:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
>> +                          AArch64::XZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULADDX_OP1);
>> +      Found = true;
>> +    }
>> +    if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDXrrr,
>> +                          AArch64::XZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULADDX_OP2);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::SUBWrr:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
>> +                          AArch64::WZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULSUBW_OP1);
>> +      Found = true;
>> +    }
>> +    if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDWrrr,
>> +                          AArch64::WZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULSUBW_OP2);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::SUBXrr:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
>> +                          AArch64::XZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULSUBX_OP1);
>> +      Found = true;
>> +    }
>> +    if (canCombineWithMUL(MBB, Root.getOperand(2), AArch64::MADDXrrr,
>> +                          AArch64::XZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULSUBX_OP2);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::ADDWri:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
>> +                          AArch64::WZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULADDWI_OP1);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::ADDXri:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
>> +                          AArch64::XZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULADDXI_OP1);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::SUBWri:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDWrrr,
>> +                          AArch64::WZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULSUBWI_OP1);
>> +      Found = true;
>> +    }
>> +    break;
>> +  case AArch64::SUBXri:
>> +    if (canCombineWithMUL(MBB, Root.getOperand(1), AArch64::MADDXrrr,
>> +                          AArch64::XZR)) {
>> +      Pattern.push_back(MachineCombinerPattern::MC_MULSUBXI_OP1);
>> +      Found = true;
>> +    }
>> +    break;
>> +  }
>> +  return Found;
>> +}
>> +
>> +/// genMadd - Generate madd instruction and combine mul and add.
>> +/// Example:
>> +///  MUL I=A,B,0
>> +///  ADD R,I,C
>> +///  ==> MADD R,A,B,C
>> +/// \param Root is the ADD instruction
>> +/// \param [out] InsInstr is a vector of machine instructions and will
>> +/// contain the generated madd instruction
>> +/// \param IdxMulOpd is index of operand in Root that is the result of
>> +/// the MUL. In the example above IdxMulOpd is 1.
>> +/// \param MaddOpc the opcode fo the madd instruction
>> +static MachineInstr *genMadd(MachineFunction &MF, MachineRegisterInfo &MRI,
>> +                             const TargetInstrInfo *TII, MachineInstr &Root,
>> +                             SmallVectorImpl<MachineInstr *> &InsInstrs,
>> +                             unsigned IdxMulOpd, unsigned MaddOpc) {
>> +  assert(IdxMulOpd == 1 || IdxMulOpd == 2);
>> +
>> +  unsigned IdxOtherOpd = IdxMulOpd == 1 ? 2 : 1;
>> +  MachineInstr *MUL = MRI.getUniqueVRegDef(Root.getOperand(IdxMulOpd).getReg());
>> +  MachineOperand R = Root.getOperand(0);
>> +  MachineOperand A = MUL->getOperand(1);
>> +  MachineOperand B = MUL->getOperand(2);
>> +  MachineOperand C = Root.getOperand(IdxOtherOpd);
>> +  MachineInstrBuilder MIB = BuildMI(MF, Root.getDebugLoc(), TII->get(MaddOpc))
>> +                                .addOperand(R)
>> +                                .addOperand(A)
>> +                                .addOperand(B)
>> +                                .addOperand(C);
>> +  // Insert the MADD
>> +  InsInstrs.push_back(MIB);
>> +  return MUL;
>> +}
>> +
>> +/// genMaddR - Generate madd instruction and combine mul and add using
>> +/// an extra virtual register
>> +/// Example - an ADD intermediate needs to be stored in a register:
>> +///   MUL I=A,B,0
>> +///   ADD R,I,Imm
>> +///   ==> ORR  V, ZR, Imm
>> +///   ==> MADD R,A,B,V
>> +/// \param Root is the ADD instruction
>> +/// \param [out] InsInstr is a vector of machine instructions and will
>> +/// contain the generated madd instruction
>> +/// \param IdxMulOpd is index of operand in Root that is the result of
>> +/// the MUL. In the example above IdxMulOpd is 1.
>> +/// \param MaddOpc the opcode fo the madd instruction
>> +/// \param VR is a virtual register that holds the value of an ADD operand
>> +/// (V in the example above).
>> +static MachineInstr *genMaddR(MachineFunction &MF, MachineRegisterInfo &MRI,
>> +                              const TargetInstrInfo *TII, MachineInstr &Root,
>> +                              SmallVectorImpl<MachineInstr *> &InsInstrs,
>> +                              unsigned IdxMulOpd, unsigned MaddOpc,
>> +                              unsigned VR) {
>> +  assert(IdxMulOpd == 1 || IdxMulOpd == 2);
>> +
>> +  MachineInstr *MUL = MRI.getUniqueVRegDef(Root.getOperand(IdxMulOpd).getReg());
>> +  MachineOperand R = Root.getOperand(0);
>> +  MachineOperand A = MUL->getOperand(1);
>> +  MachineOperand B = MUL->getOperand(2);
>> +  MachineInstrBuilder MIB = BuildMI(MF, Root.getDebugLoc(), TII->get(MaddOpc))
>> +                                .addOperand(R)
>> +                                .addOperand(A)
>> +                                .addOperand(B)
>> +                                .addReg(VR);
>> +  // Insert the MADD
>> +  InsInstrs.push_back(MIB);
>> +  return MUL;
>> +}
>> +/// genAlternativeCodeSequence - when hasPattern() finds a pattern
>> +/// this function generates the instructions that could replace the
>> +/// original code sequence
>> +void AArch64InstrInfo::genAlternativeCodeSequence(
>> +    MachineInstr &Root, MachineCombinerPattern::MC_PATTERN Pattern,
>> +    SmallVectorImpl<MachineInstr *> &InsInstrs,
>> +    SmallVectorImpl<MachineInstr *> &DelInstrs,
>> +    DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
>> +  MachineBasicBlock &MBB = *Root.getParent();
>> +  MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
>> +  MachineFunction &MF = *MBB.getParent();
>> +  const TargetInstrInfo *TII = MF.getTarget().getSubtargetImpl()->getInstrInfo();
>> +
>> +  MachineInstr *MUL;
>> +  unsigned Opc;
>> +  switch (Pattern) {
>> +  default:
>> +    // signal error.
>> +    break;
>> +  case MachineCombinerPattern::MC_MULADDW_OP1:
>> +  case MachineCombinerPattern::MC_MULADDX_OP1:
>> +    // MUL I=A,B,0
>> +    // ADD R,I,C
>> +    // ==> MADD R,A,B,C
>> +    // --- Create(MADD);
>> +    Opc = Pattern == MachineCombinerPattern::MC_MULADDW_OP1 ? AArch64::MADDWrrr
>> +                                                            : AArch64::MADDXrrr;
>> +    MUL = genMadd(MF, MRI, TII, Root, InsInstrs, 1, Opc);
>> +    break;
>> +  case MachineCombinerPattern::MC_MULADDW_OP2:
>> +  case MachineCombinerPattern::MC_MULADDX_OP2:
>> +    // MUL I=A,B,0
>> +    // ADD R,C,I
>> +    // ==> MADD R,A,B,C
>> +    // --- Create(MADD);
>> +    Opc = Pattern == MachineCombinerPattern::MC_MULADDW_OP2 ? AArch64::MADDWrrr
>> +                                                            : AArch64::MADDXrrr;
>> +    MUL = genMadd(MF, MRI, TII, Root, InsInstrs, 2, Opc);
>> +    break;
>> +  case MachineCombinerPattern::MC_MULADDWI_OP1:
>> +  case MachineCombinerPattern::MC_MULADDXI_OP1:
>> +    // MUL I=A,B,0
>> +    // ADD R,I,Imm
>> +    // ==> ORR  V, ZR, Imm
>> +    // ==> MADD R,A,B,V
>> +    // --- Create(MADD);
>> +    {
>> +      const TargetRegisterClass *RC =
>> +          MRI.getRegClass(Root.getOperand(1).getReg());
>> +      unsigned NewVR = MRI.createVirtualRegister(RC);
>> +      unsigned BitSize, OrrOpc, ZeroReg;
>> +      if (Pattern == MachineCombinerPattern::MC_MULADDWI_OP1) {
>> +        BitSize = 32;
>> +        OrrOpc = AArch64::ORRWri;
>> +        ZeroReg = AArch64::WZR;
>> +        Opc = AArch64::MADDWrrr;
>> +      } else {
>> +        OrrOpc = AArch64::ORRXri;
>> +        BitSize = 64;
>> +        ZeroReg = AArch64::XZR;
>> +        Opc = AArch64::MADDXrrr;
>> +      }
>> +      uint64_t Imm = Root.getOperand(2).getImm();
>> +
>> +      if (Root.getOperand(3).isImm()) {
>> +        unsigned val = Root.getOperand(3).getImm();
>> +        Imm = Imm << val;
>> +      }
>> +      uint64_t UImm = Imm << (64 - BitSize) >> (64 - BitSize);
>> +      uint64_t Encoding;
>> +
>> +      if (AArch64_AM::processLogicalImmediate(UImm, BitSize, Encoding)) {
>> +        MachineInstrBuilder MIB1 =
>> +            BuildMI(MF, Root.getDebugLoc(), TII->get(OrrOpc))
>> +                .addOperand(MachineOperand::CreateReg(NewVR, RegState::Define))
>> +                .addReg(ZeroReg)
>> +                .addImm(Encoding);
>> +        InsInstrs.push_back(MIB1);
>> +        InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
>> +        MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR);
>> +      }
>> +    }
>> +    break;
>> +  case MachineCombinerPattern::MC_MULSUBW_OP1:
>> +  case MachineCombinerPattern::MC_MULSUBX_OP1: {
>> +    // MUL I=A,B,0
>> +    // SUB R,I, C
>> +    // ==> SUB  V, 0, C
>> +    // ==> MADD R,A,B,V // = -C + A*B
>> +    // --- Create(MADD);
>> +    const TargetRegisterClass *RC =
>> +        MRI.getRegClass(Root.getOperand(1).getReg());
>> +    unsigned NewVR = MRI.createVirtualRegister(RC);
>> +    unsigned SubOpc, ZeroReg;
>> +    if (Pattern == MachineCombinerPattern::MC_MULSUBW_OP1) {
>> +      SubOpc = AArch64::SUBWrr;
>> +      ZeroReg = AArch64::WZR;
>> +      Opc = AArch64::MADDWrrr;
>> +    } else {
>> +      SubOpc = AArch64::SUBXrr;
>> +      ZeroReg = AArch64::XZR;
>> +      Opc = AArch64::MADDXrrr;
>> +    }
>> +    // SUB NewVR, 0, C
>> +    MachineInstrBuilder MIB1 =
>> +        BuildMI(MF, Root.getDebugLoc(), TII->get(SubOpc))
>> +            .addOperand(MachineOperand::CreateReg(NewVR, RegState::Define))
>> +            .addReg(ZeroReg)
>> +            .addOperand(Root.getOperand(2));
>> +    InsInstrs.push_back(MIB1);
>> +    InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
>> +    MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR);
>> +  } break;
>> +  case MachineCombinerPattern::MC_MULSUBW_OP2:
>> +  case MachineCombinerPattern::MC_MULSUBX_OP2:
>> +    // MUL I=A,B,0
>> +    // SUB R,C,I
>> +    // ==> MSUB R,A,B,C (computes C - A*B)
>> +    // --- Create(MSUB);
>> +    Opc = Pattern == MachineCombinerPattern::MC_MULSUBW_OP2 ? AArch64::MSUBWrrr
>> +                                                            : AArch64::MSUBXrrr;
>> +    MUL = genMadd(MF, MRI, TII, Root, InsInstrs, 2, Opc);
>> +    break;
>> +  case MachineCombinerPattern::MC_MULSUBWI_OP1:
>> +  case MachineCombinerPattern::MC_MULSUBXI_OP1: {
>> +    // MUL I=A,B,0
>> +    // SUB R,I, Imm
>> +    // ==> ORR  V, ZR, -Imm
>> +    // ==> MADD R,A,B,V // = -Imm + A*B
>> +    // --- Create(MADD);
>> +    const TargetRegisterClass *RC =
>> +        MRI.getRegClass(Root.getOperand(1).getReg());
>> +    unsigned NewVR = MRI.createVirtualRegister(RC);
>> +    unsigned BitSize, OrrOpc, ZeroReg;
>> +    if (Pattern == MachineCombinerPattern::MC_MULSUBWI_OP1) {
>> +      BitSize = 32;
>> +      OrrOpc = AArch64::ORRWri;
>> +      ZeroReg = AArch64::WZR;
>> +      Opc = AArch64::MADDWrrr;
>> +    } else {
>> +      OrrOpc = AArch64::ORRXri;
>> +      BitSize = 64;
>> +      ZeroReg = AArch64::XZR;
>> +      Opc = AArch64::MADDXrrr;
>> +    }
>> +    int Imm = Root.getOperand(2).getImm();
>> +    if (Root.getOperand(3).isImm()) {
>> +      unsigned val = Root.getOperand(3).getImm();
>> +      Imm = Imm << val;
>> +    }
>> +    uint64_t UImm = -Imm << (64 - BitSize) >> (64 - BitSize);
>> +    uint64_t Encoding;
>> +    if (AArch64_AM::processLogicalImmediate(UImm, BitSize, Encoding)) {
>> +      MachineInstrBuilder MIB1 =
>> +          BuildMI(MF, Root.getDebugLoc(), TII->get(OrrOpc))
>> +              .addOperand(MachineOperand::CreateReg(NewVR, RegState::Define))
>> +              .addReg(ZeroReg)
>> +              .addImm(Encoding);
>> +      InsInstrs.push_back(MIB1);
>> +      InstrIdxForVirtReg.insert(std::make_pair(NewVR, 0));
>> +      MUL = genMaddR(MF, MRI, TII, Root, InsInstrs, 1, Opc, NewVR);
>> +    }
>> +  } break;
>> +  }
>> +  // Record MUL and ADD/SUB for deletion
>> +  DelInstrs.push_back(MUL);
>> +  DelInstrs.push_back(&Root);
>> +
>> +  return;
>> +}
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h?rev=214832&r1=214831&r2=214832&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.h Mon Aug  4 20:16:13 2014
>> @@ -17,6 +17,7 @@
>>  #include "AArch64.h"
>>  #include "AArch64RegisterInfo.h"
>>  #include "llvm/Target/TargetInstrInfo.h"
>> +#include "llvm/CodeGen/MachineCombinerPattern.h"
>> 
>>  #define GET_INSTRINFO_HEADER
>>  #include "AArch64GenInstrInfo.inc"
>> @@ -156,9 +157,26 @@ public:
>>    bool optimizeCompareInstr(MachineInstr *CmpInstr, unsigned SrcReg,
>>                              unsigned SrcReg2, int CmpMask, int CmpValue,
>>                              const MachineRegisterInfo *MRI) const override;
>> +  /// hasPattern - return true when there is potentially a faster code sequence
>> +  /// for an instruction chain ending in <Root>. All potential patterns are
>> +  /// listed
>> +  /// in the <Pattern> array.
>> +  virtual bool hasPattern(
>> +      MachineInstr &Root,
>> +      SmallVectorImpl<MachineCombinerPattern::MC_PATTERN> &Pattern) const;
>> +
>> +  /// genAlternativeCodeSequence - when hasPattern() finds a pattern
>> +  /// this function generates the instructions that could replace the
>> +  /// original code sequence
>> +  virtual void genAlternativeCodeSequence(
>> +      MachineInstr &Root, MachineCombinerPattern::MC_PATTERN P,
>> +      SmallVectorImpl<MachineInstr *> &InsInstrs,
>> +      SmallVectorImpl<MachineInstr *> &DelInstrs,
>> +      DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const;
>> +  /// useMachineCombiner - AArch64 supports MachineCombiner
>> +  virtual bool useMachineCombiner(void) const;
>> 
>>    bool expandPostRAPseudo(MachineBasicBlock::iterator MI) const override;
>> -
>>  private:
>>    void instantiateCondBranch(MachineBasicBlock &MBB, DebugLoc DL,
>>                               MachineBasicBlock *TBB,
>> 
>> Added: llvm/trunk/lib/Target/AArch64/AArch64MachineCombinerPattern.h
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64MachineCombinerPattern.h?rev=214832&view=auto
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64MachineCombinerPattern.h (added)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64MachineCombinerPattern.h Mon Aug  4 20:16:13 2014
>> @@ -0,0 +1,42 @@
>> +//===- AArch64MachineCombinerPattern.h                                    -===//
>> +//===- AArch64 instruction pattern supported by combiner                  -===//
>> +//
>> +//                     The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +//
>> +// This file defines instruction pattern supported by combiner
>> +//
>> +//===----------------------------------------------------------------------===//
>> +
>> +#ifndef LLVM_TARGET_AArch64MACHINECOMBINERPATTERN_H
>> +#define LLVM_TARGET_AArch64MACHINECOMBINERPATTERN_H
>> +
>> +namespace llvm {
>> +
>> +/// Enumeration of instruction pattern supported by machine combiner
>> +///
>> +///
>> +namespace MachineCombinerPattern {
>> +enum MC_PATTERN : int {
>> +  MC_NONE = 0,
>> +  MC_MULADDW_OP1 = 1,
>> +  MC_MULADDW_OP2 = 2,
>> +  MC_MULSUBW_OP1 = 3,
>> +  MC_MULSUBW_OP2 = 4,
>> +  MC_MULADDWI_OP1 = 5,
>> +  MC_MULSUBWI_OP1 = 6,
>> +  MC_MULADDX_OP1 = 7,
>> +  MC_MULADDX_OP2 = 8,
>> +  MC_MULSUBX_OP1 = 9,
>> +  MC_MULSUBX_OP2 = 10,
>> +  MC_MULADDXI_OP1 = 11,
>> +  MC_MULSUBXI_OP1 = 12
>> +};
>> +} // end namespace MachineCombinerPattern
>> +} // end namespace llvm
>> +
>> +#endif
>> 
>> Modified: llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp?rev=214832&r1=214831&r2=214832&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp (original)
>> +++ llvm/trunk/lib/Target/AArch64/AArch64TargetMachine.cpp Mon Aug  4 20:16:13 2014
>> @@ -24,6 +24,10 @@ static cl::opt<bool>
>>  EnableCCMP("aarch64-ccmp", cl::desc("Enable the CCMP formation pass"),
>>             cl::init(true), cl::Hidden);
>> 
>> +static cl::opt<bool> EnableMCR("aarch64-mcr",
>> +                               cl::desc("Enable the machine combiner pass"),
>> +                               cl::init(true), cl::Hidden);
>> +
>>  static cl::opt<bool>
>>  EnableStPairSuppress("aarch64-stp-suppress", cl::desc("Suppress STP for AArch64"),
>>                       cl::init(true), cl::Hidden);
>> @@ -174,6 +178,8 @@ bool AArch64PassConfig::addInstSelector(
>>  bool AArch64PassConfig::addILPOpts() {
>>    if (EnableCCMP)
>>      addPass(createAArch64ConditionalCompares());
>> +  if (EnableMCR)
>> +    addPass(&MachineCombinerID);
>>    addPass(&EarlyIfConverterID);
>>    if (EnableStPairSuppress)
>>      addPass(createAArch64StorePairSuppressPass());
>> 
>> Added: llvm/trunk/test/CodeGen/AArch64/madd-lohi.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/madd-lohi.ll?rev=214832&view=auto
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/AArch64/madd-lohi.ll (added)
>> +++ llvm/trunk/test/CodeGen/AArch64/madd-lohi.ll Mon Aug  4 20:16:13 2014
>> @@ -0,0 +1,19 @@
>> +; RUN: llc -mtriple=arm64-apple-ios7.0 %s -o - | FileCheck %s
>> +; RUN: llc -mtriple=aarch64_be-linux-gnu %s -o - | FileCheck --check-prefix=CHECK-BE %s
>> +
>> +define i128 @test_128bitmul(i128 %lhs, i128 %rhs) {
>> +; CHECK-LABEL: test_128bitmul:
>> +; CHECK-DAG: umulh [[CARRY:x[0-9]+]], x0, x2
>> +; CHECK-DAG: madd [[PART1:x[0-9]+]], x0, x3, [[CARRY]]
>> +; CHECK: madd x1, x1, x2, [[PART1]]
>> +; CHECK: mul x0, x0, x2
>> +
>> +; CHECK-BE-LABEL: test_128bitmul:
>> +; CHECK-BE-DAG: umulh [[CARRY:x[0-9]+]], x1, x3
>> +; CHECK-BE-DAG: madd [[PART1:x[0-9]+]], x1, x2, [[CARRY]]
>> +; CHECK-BE: madd x0, x0, x3, [[PART1]]
>> +; CHECK-BE: mul x1, x1, x3
>> +
>> +  %prod = mul i128 %lhs, %rhs
>> +  ret i128 %prod
>> +}
>> 
>> Modified: llvm/trunk/test/CodeGen/AArch64/mul-lohi.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/mul-lohi.ll?rev=214832&r1=214831&r2=214832&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/AArch64/mul-lohi.ll (original)
>> +++ llvm/trunk/test/CodeGen/AArch64/mul-lohi.ll Mon Aug  4 20:16:13 2014
>> @@ -1,17 +1,16 @@
>> -; RUN: llc -mtriple=arm64-apple-ios7.0 %s -o - | FileCheck %s
>> -; RUN: llc -mtriple=aarch64_be-linux-gnu %s -o - | FileCheck --check-prefix=CHECK-BE %s
>> -
>> +; RUN: llc -mtriple=arm64-apple-ios7.0 -mcpu=cyclone %s -o - | FileCheck %s
>> +; RUN: llc -mtriple=aarch64_be-linux-gnu -mcpu=cyclone %s -o - | FileCheck --check-prefix=CHECK-BE %s
>>  define i128 @test_128bitmul(i128 %lhs, i128 %rhs) {
>>  ; CHECK-LABEL: test_128bitmul:
>> +; CHECK-DAG: mul [[PART1:x[0-9]+]], x0, x3
>>  ; CHECK-DAG: umulh [[CARRY:x[0-9]+]], x0, x2
>> -; CHECK-DAG: madd [[PART1:x[0-9]+]], x0, x3, [[CARRY]]
>> -; CHECK: madd x1, x1, x2, [[PART1]]
>> +; CHECK: mul [[PART2:x[0-9]+]], x1, x2
>>  ; CHECK: mul x0, x0, x2
>> 
>>  ; CHECK-BE-LABEL: test_128bitmul:
>> +; CHECK-BE-DAG: mul [[PART1:x[0-9]+]], x1, x2
>>  ; CHECK-BE-DAG: umulh [[CARRY:x[0-9]+]], x1, x3
>> -; CHECK-BE-DAG: madd [[PART1:x[0-9]+]], x1, x2, [[CARRY]]
>> -; CHECK-BE: madd x0, x0, x3, [[PART1]]
>> +; CHECK-BE: mul [[PART2:x[0-9]+]], x0, x3
>>  ; CHECK-BE: mul x1, x1, x3
>> 
>>    %prod = mul i128 %lhs, %rhs
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> 
>> 
>> 
>> -- 
>> Best Regards,
>> 
>> Kevin Qin
> 
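
For anyone skimming the quoted patch, here is a minimal sketch of the idea behind hasPattern(): given a root instruction, report which multiply-add patterns could apply. It is not the code from r214832. ToyInstr, Opcode, ToyPattern, findDef and toyHasPattern are hypothetical names made up for illustration, and the sketch ignores checks a real target would need (for example, whether the multiply result has other uses).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class Opcode { Mul, Add, Other };

// Loosely modeled on the MC_MULADD*_OP1 / _OP2 naming in the patch.
enum ToyPattern { TOY_NONE = 0, TOY_MULADD_OP1 = 1, TOY_MULADD_OP2 = 2 };

struct ToyInstr {
  Opcode Op;
  int Def;  // virtual register defined by this instruction
  int Src1; // first source register
  int Src2; // second source register
};

// Return the index of the instruction in Block that defines Reg, or -1.
int findDef(const std::vector<ToyInstr> &Block, int Reg) {
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (Block[I].Def == Reg)
      return static_cast<int>(I);
  return -1;
}

// Toy analogue of hasPattern(): when the root is an add and one of its
// operands is produced by a multiply, record the matching pattern.
// Returns true when at least one pattern was recorded.
bool toyHasPattern(const std::vector<ToyInstr> &Block, std::size_t RootIdx,
                   std::vector<ToyPattern> &Patterns) {
  const ToyInstr &Root = Block[RootIdx];
  if (Root.Op != Opcode::Add)
    return false;

  int D1 = findDef(Block, Root.Src1);
  if (D1 >= 0 && Block[D1].Op == Opcode::Mul)
    Patterns.push_back(TOY_MULADD_OP1); // multiply feeds operand 1

  int D2 = findDef(Block, Root.Src2);
  if (D2 >= 0 && Block[D2].Op == Opcode::Mul)
    Patterns.push_back(TOY_MULADD_OP2); // multiply feeds operand 2

  return !Patterns.empty();
}
```

In this toy version, a block holding a multiply that defines register 2 followed by an add that reads registers 2 and 3 reports only TOY_MULADD_OP1. Per the comments in the quoted patch, the combiner would then ask genAlternativeCodeSequence() for the replacement instructions. The mul-lohi.ll change suggests the replacement is not always taken, since the -mcpu=cyclone run expects separate mul instructions rather than madd.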
