[llvm] r196261 - Enhance the fix of PR17631
Dmitry Babokin
babokin at gmail.com
Wed Dec 4 12:39:48 PST 2013
Bill,
To be precise r196261+r196391 need to back ported.
Thanks in advance!
Dmitry.
On Wed, Dec 4, 2013 at 4:10 PM, Dmitry Babokin <babokin at gmail.com> wrote:
> Bill,
>
> Could you please consider merging this patch to 3.4? The patch seems to be
> save and at the same time it's quite important, as can cause a lot of pain
> for users on Window who are compiling for AVX target. If the bug triggers,
> half of AVX lane is occasionally zeroed, which is nightmare to debug in
> numerical code, for example.
>
> Dmitry.
>
>
> On Tue, Dec 3, 2013 at 1:17 PM, Michael Liao <michael.liao at intel.com>wrote:
>
>> Author: hliao
>> Date: Tue Dec 3 03:17:32 2013
>> New Revision: 196261
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=196261&view=rev
>> Log:
>> Enhance the fix of PR17631
>>
>> - The fix to PR17631 fixes part of the cases where 'vzeroupper' should
>> not be issued before 'call' insn. There're other cases where helper
>> calls will be inserted not limited to epilog. These helper calls do
>> not follow the standard calling convention and won't clobber any YMM
>> registers. (So far, all call conventions will clobber any or part of
>> YMM registers.)
>> This patch enhances the previous fix to cover more cases 'vzerosupper'
>> should
>> not be inserted by checking if that function call won't clobber any YMM
>> registers and skipping it if so.
>>
>>
>> Modified:
>> llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>> llvm/trunk/test/CodeGen/X86/pr17631.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp?rev=196261&r1=196260&r2=196261&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp Tue Dec 3 03:17:32 2013
>> @@ -148,6 +148,25 @@ static bool hasYmmReg(MachineInstr *MI)
>> return false;
>> }
>>
>> +/// clobbersAnyYmmReg() - Check if any YMM register will be clobbered by
>> this
>> +/// instruction.
>> +static bool clobbersAnyYmmReg(MachineInstr *MI) {
>> + for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
>> + const MachineOperand &MO = MI->getOperand(i);
>> + if (!MO.isRegMask())
>> + continue;
>> + for (unsigned reg = X86::YMM0; reg < X86::YMM31; ++reg) {
>> + if (MO.clobbersPhysReg(reg))
>> + return true;
>> + }
>> + for (unsigned reg = X86::ZMM0; reg < X86::ZMM31; ++reg) {
>> + if (MO.clobbersPhysReg(reg))
>> + return true;
>> + }
>> + }
>> + return false;
>> +}
>> +
>> /// runOnMachineFunction - Loop over all of the basic blocks, inserting
>> /// vzero upper instructions before function calls.
>> bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
>> @@ -234,14 +253,6 @@ bool VZeroUpperInserter::processBasicBlo
>> DebugLoc dl = I->getDebugLoc();
>> MachineInstr *MI = I;
>>
>> - // Don't need to check instructions added in prolog.
>> - // In prolog, special function calls may be added for specific
>> targets
>> - // (e.g. on Windows, a prolog helper '_chkstk' is called when the
>> local
>> - // variables exceed 4K bytes on stack.) These helpers won't use/def
>> YMM/XMM
>> - // registers.
>> - if (MI->getFlag(MachineInstr::FrameSetup))
>> - continue;
>> -
>> bool isControlFlow = MI->isCall() || MI->isReturn();
>>
>> // Shortcut: don't need to check regular instructions in dirty state.
>> @@ -260,6 +271,14 @@ bool VZeroUpperInserter::processBasicBlo
>> if (!isControlFlow)
>> continue;
>>
>> + // If the call won't clobber any YMM register, skip it as well. It
>> usually
>> + // happens on helper function calls (such as '_chkstk', '_ftol2')
>> where
>> + // standard calling convention is not used (RegMask is not used to
>> mark
>> + // register clobbered and register usage (def/imp-def/use) is
>> well-dfined
>> + // and explicitly specified.
>> + if (MI->isCall() && !clobbersAnyYmmReg(MI))
>> + continue;
>> +
>> BBHasCall = true;
>>
>> // The VZEROUPPER instruction resets the upper 128 bits of all Intel
>> AVX
>>
>> Modified: llvm/trunk/test/CodeGen/X86/pr17631.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr17631.ll?rev=196261&r1=196260&r2=196261&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/pr17631.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/pr17631.ll Tue Dec 3 03:17:32 2013
>> @@ -1,16 +1,16 @@
>> ; RUN: llc < %s -mcpu=core-avx-i -mtriple=i386-pc-win32 | FileCheck %s
>> -
>> +
>> %struct_type = type { [64 x <8 x float>], <8 x float> }
>> -
>> +
>> ; Function Attrs: nounwind readnone
>> declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>)
>> -
>> +
>> ; Function Attrs: nounwind
>> define i32 @equal(<8 x i32> %A) {
>> allocas:
>> %first_alloc = alloca [64 x <8 x i32>]
>> %second_alloc = alloca %struct_type
>> -
>> +
>> %A1 = bitcast <8 x i32> %A to <8 x float>
>> %A2 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %A1)
>> ret i32 %A2
>> @@ -20,3 +20,15 @@ allocas:
>> ; CHECK-NOT: vzeroupper
>> ; CHECK: _chkstk
>> ; CHECK: ret
>> +
>> +define <8 x float> @foo(<8 x float> %y, i64* %p, double %x) {
>> + %i = fptoui double %x to i64
>> + store i64 %i, i64* %p
>> + %ret = fadd <8 x float> %y, %y
>> + ret <8 x float> %ret
>> +}
>> +
>> +; CHECK: foo
>> +; CHECK-NOT: vzeroupper
>> +; CHECK: _ftol2
>> +; CHECK: ret
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131205/2513efbd/attachment.html>
More information about the llvm-commits
mailing list