[llvm] r196261 - Enhance the fix of PR17631

Dmitry Babokin babokin at gmail.com
Wed Dec 4 04:10:02 PST 2013


Bill,

Could you please consider merging this patch to 3.4? The patch seems to be
save and at the same time it's quite important, as can cause a lot of pain
for users on Window who are compiling for AVX target. If the bug triggers,
half of AVX lane is occasionally zeroed, which is nightmare to debug in
numerical code, for example.

Dmitry.


On Tue, Dec 3, 2013 at 1:17 PM, Michael Liao <michael.liao at intel.com> wrote:

> Author: hliao
> Date: Tue Dec  3 03:17:32 2013
> New Revision: 196261
>
> URL: http://llvm.org/viewvc/llvm-project?rev=196261&view=rev
> Log:
> Enhance the fix of PR17631
>
> - The fix to PR17631 fixes part of the cases where 'vzeroupper' should
>   not be issued before 'call' insn. There're other cases where helper
>   calls will be inserted not limited to epilog. These helper calls do
>   not follow the standard calling convention and won't clobber any YMM
>   registers. (So far, all call conventions will clobber any or part of
>   YMM registers.)
>   This patch enhances the previous fix to cover more cases 'vzerosupper'
> should
>   not be inserted by checking if that function call won't clobber any YMM
>   registers and skipping it if so.
>
>
> Modified:
>     llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>     llvm/trunk/test/CodeGen/X86/pr17631.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp?rev=196261&r1=196260&r2=196261&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp Tue Dec  3 03:17:32 2013
> @@ -148,6 +148,25 @@ static bool hasYmmReg(MachineInstr *MI)
>    return false;
>  }
>
> +/// clobbersAnyYmmReg() - Check if any YMM register will be clobbered by
> this
> +/// instruction.
> +static bool clobbersAnyYmmReg(MachineInstr *MI) {
> +  for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
> +    const MachineOperand &MO = MI->getOperand(i);
> +    if (!MO.isRegMask())
> +      continue;
> +    for (unsigned reg = X86::YMM0; reg < X86::YMM31; ++reg) {
> +      if (MO.clobbersPhysReg(reg))
> +        return true;
> +    }
> +    for (unsigned reg = X86::ZMM0; reg < X86::ZMM31; ++reg) {
> +      if (MO.clobbersPhysReg(reg))
> +        return true;
> +    }
> +  }
> +  return false;
> +}
> +
>  /// runOnMachineFunction - Loop over all of the basic blocks, inserting
>  /// vzero upper instructions before function calls.
>  bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
> @@ -234,14 +253,6 @@ bool VZeroUpperInserter::processBasicBlo
>      DebugLoc dl = I->getDebugLoc();
>      MachineInstr *MI = I;
>
> -    // Don't need to check instructions added in prolog.
> -    // In prolog, special function calls may be added for specific targets
> -    // (e.g. on Windows, a prolog helper '_chkstk' is called when the
> local
> -    // variables exceed 4K bytes on stack.) These helpers won't use/def
> YMM/XMM
> -    // registers.
> -    if (MI->getFlag(MachineInstr::FrameSetup))
> -      continue;
> -
>      bool isControlFlow = MI->isCall() || MI->isReturn();
>
>      // Shortcut: don't need to check regular instructions in dirty state.
> @@ -260,6 +271,14 @@ bool VZeroUpperInserter::processBasicBlo
>      if (!isControlFlow)
>        continue;
>
> +    // If the call won't clobber any YMM register, skip it as well. It
> usually
> +    // happens on helper function calls (such as '_chkstk', '_ftol2')
> where
> +    // standard calling convention is not used (RegMask is not used to
> mark
> +    // register clobbered and register usage (def/imp-def/use) is
> well-dfined
> +    // and explicitly specified.
> +    if (MI->isCall() && !clobbersAnyYmmReg(MI))
> +      continue;
> +
>      BBHasCall = true;
>
>      // The VZEROUPPER instruction resets the upper 128 bits of all Intel
> AVX
>
> Modified: llvm/trunk/test/CodeGen/X86/pr17631.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr17631.ll?rev=196261&r1=196260&r2=196261&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/pr17631.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/pr17631.ll Tue Dec  3 03:17:32 2013
> @@ -1,16 +1,16 @@
>  ; RUN: llc < %s -mcpu=core-avx-i -mtriple=i386-pc-win32 | FileCheck %s
> -
> +
>  %struct_type = type { [64 x <8 x float>], <8 x float> }
> -
> +
>  ; Function Attrs: nounwind readnone
>  declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>)
> -
> +
>  ; Function Attrs: nounwind
>  define i32 @equal(<8 x i32> %A) {
>  allocas:
>    %first_alloc  = alloca [64 x <8 x i32>]
>    %second_alloc = alloca %struct_type
> -
> +
>    %A1 = bitcast <8 x i32> %A to <8 x float>
>    %A2 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %A1)
>    ret i32 %A2
> @@ -20,3 +20,15 @@ allocas:
>  ; CHECK-NOT: vzeroupper
>  ; CHECK: _chkstk
>  ; CHECK: ret
> +
> +define <8 x float> @foo(<8 x float> %y, i64* %p, double %x) {
> +  %i = fptoui double %x to i64
> +  store i64 %i, i64* %p
> +  %ret = fadd <8 x float> %y, %y
> +  ret <8 x float> %ret
> +}
> +
> +; CHECK: foo
> +; CHECK-NOT: vzeroupper
> +; CHECK: _ftol2
> +; CHECK: ret
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131204/2240bd67/attachment.html>


More information about the llvm-commits mailing list