[llvm] r196261 - Enhance the fix of PR17631

Dmitry Babokin babokin at gmail.com
Wed Dec 4 12:39:48 PST 2013


Bill,

To be precise r196261+r196391 need to back ported.

Thanks in advance!

Dmitry.


On Wed, Dec 4, 2013 at 4:10 PM, Dmitry Babokin <babokin at gmail.com> wrote:

> Bill,
>
> Could you please consider merging this patch to 3.4? The patch seems to be
> save and at the same time it's quite important, as can cause a lot of pain
> for users on Window who are compiling for AVX target. If the bug triggers,
> half of AVX lane is occasionally zeroed, which is nightmare to debug in
> numerical code, for example.
>
> Dmitry.
>
>
> On Tue, Dec 3, 2013 at 1:17 PM, Michael Liao <michael.liao at intel.com>wrote:
>
>> Author: hliao
>> Date: Tue Dec  3 03:17:32 2013
>> New Revision: 196261
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=196261&view=rev
>> Log:
>> Enhance the fix of PR17631
>>
>> - The fix to PR17631 fixes part of the cases where 'vzeroupper' should
>>   not be issued before 'call' insn. There're other cases where helper
>>   calls will be inserted not limited to epilog. These helper calls do
>>   not follow the standard calling convention and won't clobber any YMM
>>   registers. (So far, all call conventions will clobber any or part of
>>   YMM registers.)
>>   This patch enhances the previous fix to cover more cases 'vzerosupper'
>> should
>>   not be inserted by checking if that function call won't clobber any YMM
>>   registers and skipping it if so.
>>
>>
>> Modified:
>>     llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>>     llvm/trunk/test/CodeGen/X86/pr17631.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp?rev=196261&r1=196260&r2=196261&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp Tue Dec  3 03:17:32 2013
>> @@ -148,6 +148,25 @@ static bool hasYmmReg(MachineInstr *MI)
>>    return false;
>>  }
>>
>> +/// clobbersAnyYmmReg() - Check if any YMM register will be clobbered by
>> this
>> +/// instruction.
>> +static bool clobbersAnyYmmReg(MachineInstr *MI) {
>> +  for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
>> +    const MachineOperand &MO = MI->getOperand(i);
>> +    if (!MO.isRegMask())
>> +      continue;
>> +    for (unsigned reg = X86::YMM0; reg < X86::YMM31; ++reg) {
>> +      if (MO.clobbersPhysReg(reg))
>> +        return true;
>> +    }
>> +    for (unsigned reg = X86::ZMM0; reg < X86::ZMM31; ++reg) {
>> +      if (MO.clobbersPhysReg(reg))
>> +        return true;
>> +    }
>> +  }
>> +  return false;
>> +}
>> +
>>  /// runOnMachineFunction - Loop over all of the basic blocks, inserting
>>  /// vzero upper instructions before function calls.
>>  bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
>> @@ -234,14 +253,6 @@ bool VZeroUpperInserter::processBasicBlo
>>      DebugLoc dl = I->getDebugLoc();
>>      MachineInstr *MI = I;
>>
>> -    // Don't need to check instructions added in prolog.
>> -    // In prolog, special function calls may be added for specific
>> targets
>> -    // (e.g. on Windows, a prolog helper '_chkstk' is called when the
>> local
>> -    // variables exceed 4K bytes on stack.) These helpers won't use/def
>> YMM/XMM
>> -    // registers.
>> -    if (MI->getFlag(MachineInstr::FrameSetup))
>> -      continue;
>> -
>>      bool isControlFlow = MI->isCall() || MI->isReturn();
>>
>>      // Shortcut: don't need to check regular instructions in dirty state.
>> @@ -260,6 +271,14 @@ bool VZeroUpperInserter::processBasicBlo
>>      if (!isControlFlow)
>>        continue;
>>
>> +    // If the call won't clobber any YMM register, skip it as well. It
>> usually
>> +    // happens on helper function calls (such as '_chkstk', '_ftol2')
>> where
>> +    // standard calling convention is not used (RegMask is not used to
>> mark
>> +    // register clobbered and register usage (def/imp-def/use) is
>> well-dfined
>> +    // and explicitly specified.
>> +    if (MI->isCall() && !clobbersAnyYmmReg(MI))
>> +      continue;
>> +
>>      BBHasCall = true;
>>
>>      // The VZEROUPPER instruction resets the upper 128 bits of all Intel
>> AVX
>>
>> Modified: llvm/trunk/test/CodeGen/X86/pr17631.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr17631.ll?rev=196261&r1=196260&r2=196261&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/pr17631.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/pr17631.ll Tue Dec  3 03:17:32 2013
>> @@ -1,16 +1,16 @@
>>  ; RUN: llc < %s -mcpu=core-avx-i -mtriple=i386-pc-win32 | FileCheck %s
>> -
>> +
>>  %struct_type = type { [64 x <8 x float>], <8 x float> }
>> -
>> +
>>  ; Function Attrs: nounwind readnone
>>  declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>)
>> -
>> +
>>  ; Function Attrs: nounwind
>>  define i32 @equal(<8 x i32> %A) {
>>  allocas:
>>    %first_alloc  = alloca [64 x <8 x i32>]
>>    %second_alloc = alloca %struct_type
>> -
>> +
>>    %A1 = bitcast <8 x i32> %A to <8 x float>
>>    %A2 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %A1)
>>    ret i32 %A2
>> @@ -20,3 +20,15 @@ allocas:
>>  ; CHECK-NOT: vzeroupper
>>  ; CHECK: _chkstk
>>  ; CHECK: ret
>> +
>> +define <8 x float> @foo(<8 x float> %y, i64* %p, double %x) {
>> +  %i = fptoui double %x to i64
>> +  store i64 %i, i64* %p
>> +  %ret = fadd <8 x float> %y, %y
>> +  ret <8 x float> %ret
>> +}
>> +
>> +; CHECK: foo
>> +; CHECK-NOT: vzeroupper
>> +; CHECK: _ftol2
>> +; CHECK: ret
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131205/2513efbd/attachment.html>


More information about the llvm-commits mailing list