[llvm] r347917 - [DAGCombiner] narrow truncated binops

David Jones via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 5 17:47:34 PST 2018


On Wed, Dec 5, 2018 at 5:40 PM Sanjay Patel <spatel at rotateright.com> wrote:

> Hi David,
>
> Thanks for reporting the problem. I don’t have any guesses as to how this
> could cause OOM. Cc’ing some people that might.
>

+a few more, thanks! :-)


> If the problem is only showing up on PPC, a quick hack would be to add a
> TLI hook and disable the transform for PPC. I’d prefer that to a full
> revert (especially since I’ve already enhanced this code to handle some
> vectors, and another enhancement is in progress)...unless we have a test
> case that shows it’s not a PPC-specific bug?
>
>
Agreed on avoiding a revert... this looks like a pretty hard-fought change,
based on https://bugs.llvm.org/show_bug.cgi?id=32023.

I do believe this is PPC-specific (although, as is often the case, your
change may simply be tickling a bug somewhere else).



> Another option is to add a flag that controls the transform and default it
> to disabled until we have a reproducer.
> I can add the TLI hook or the flag tomorrow if that sounds reasonable.
>

Sure, that's fine. I might find some more useful clues by then, though, so
I'll keep you in the loop.


> On Wed, Dec 5, 2018 at 6:16 PM David Jones <dlj at google.com> wrote:
>
>> Hi Sanjay,
>>
>> I've seen some instances of Clang going OOM, and it appears to cleanly
>> bisect to this revision. The failures occur while building Clang
>> itself (FrontendAction, Sema, and a couple of others) targeting PPC with
>> ASAN (which uses -O1).
>>
>> Unfortunately, since the reproduction requires running to OOM,
>> reproducing takes longer than I would like. I haven't been able to get a
>> useful, reduced example, but I am trying to get one (manually and via
>> creduce). I'm also trying to reproduce with a Clang built with a few
>> different sanitizers (although these are also a bit finicky, so no luck yet
>> there, either.)
>>
>> I'm not familiar with this area of LLVM, so the best fix I can offer is a
>> revert.
>>
>> Could you advise on a fix? (If you don't have a timely fix in mind,
>> please let me know if you would like me to revert the revision for you.)
>>
>> Thanks,
>> David Jones
>>
>> On Thu, Nov 29, 2018 at 1:01 PM Sanjay Patel via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: spatel
>>> Date: Thu Nov 29 12:58:26 2018
>>> New Revision: 347917
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=347917&view=rev
>>> Log:
>>> [DAGCombiner] narrow truncated binops
>>>
>>> The motivating case for this is shown in:
>>> https://bugs.llvm.org/show_bug.cgi?id=32023
>>> and the corresponding rot16.ll regression tests.
>>>
>>> Because x86 scalar shift amounts are i8 values, we can end up with
>>> trunc-binop-trunc sequences that don't get folded in IR.
>>>
>>> As the TODO comments suggest, there will be regressions if we extend this
>>> (for x86, we mostly seem to be missing LEA opportunities, but there are
>>> likely vector folds missing too). I think those should be considered
>>> existing bugs because this is the same transform that we do as an IR
>>> canonicalization in instcombine. We just need more tests to make those
>>> visible independent of this patch.
>>>
>>> Differential Revision: https://reviews.llvm.org/D54640
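
The commit log above rests on the fact that, for sub, mul, and, or and xor, the low bits of the result depend only on the low bits of the operands, so truncating before or after the operation gives the same narrow value. The following is a minimal standalone sketch of that property in plain C++, not LLVM code; the names `Op`, `applyWide`, `opThenTrunc`, `truncThenOp` and `narrowingPreservesResult` are hypothetical and chosen only for illustration. It models a 32-bit to 8-bit truncate with unsigned wraparound arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// The five opcodes handled by the patch (hypothetical enum, not ISD opcodes).
enum class Op { Sub, Mul, And, Or, Xor };

// Perform the operation in 32 bits with unsigned wraparound.
uint32_t applyWide(Op op, uint32_t a, uint32_t b) {
  switch (op) {
  case Op::Sub: return a - b;
  case Op::Mul: return a * b;
  case Op::And: return a & b;
  case Op::Or:  return a | b;
  case Op::Xor: return a ^ b;
  }
  return 0;
}

// Original form: trunc(binop(a, b)).
uint8_t opThenTrunc(Op op, uint32_t a, uint32_t b) {
  return static_cast<uint8_t>(applyWide(op, a, b));
}

// Narrowed form: binop(trunc(a), trunc(b)), evaluated modulo 2^8.
uint8_t truncThenOp(Op op, uint32_t a, uint32_t b) {
  uint8_t na = static_cast<uint8_t>(a);
  uint8_t nb = static_cast<uint8_t>(b);
  // The narrow operands are widened again only to reuse applyWide; the final
  // cast keeps the low 8 bits, which is what an 8-bit operation would produce.
  return static_cast<uint8_t>(applyWide(op, na, nb));
}

// True when both forms agree for the given operands.
bool narrowingPreservesResult(Op op, uint32_t x, uint32_t c) {
  return opThenTrunc(op, x, c) == truncThenOp(op, x, c);
}
```

For example, `narrowingPreservesResult(Op::Sub, 16u, 0x105u)` holds because both forms reduce to 16 - 5 modulo 256. The same argument does not carry over to operations whose low result bits depend on high input bits, such as right shifts or division, which are not among the opcodes in the patch.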
>>>
>>> Modified:
>>>     llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>     llvm/trunk/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll
>>>     llvm/trunk/test/CodeGen/X86/2008-09-11-CoalescerBug2.ll
>>>     llvm/trunk/test/CodeGen/X86/2010-08-04-MaskedSignedCompare.ll
>>>     llvm/trunk/test/CodeGen/X86/add-sub-nsw-nuw.ll
>>>     llvm/trunk/test/CodeGen/X86/bool-math.ll
>>>     llvm/trunk/test/CodeGen/X86/clear-lowbits.ll
>>>     llvm/trunk/test/CodeGen/X86/cmov.ll
>>>     llvm/trunk/test/CodeGen/X86/extract-bits.ll
>>>     llvm/trunk/test/CodeGen/X86/extract-lowbits.ll
>>>     llvm/trunk/test/CodeGen/X86/fshl.ll
>>>     llvm/trunk/test/CodeGen/X86/fshr.ll
>>>     llvm/trunk/test/CodeGen/X86/funnel-shift-rot.ll
>>>     llvm/trunk/test/CodeGen/X86/funnel-shift.ll
>>>     llvm/trunk/test/CodeGen/X86/pr32284.ll
>>>     llvm/trunk/test/CodeGen/X86/pr37879.ll
>>>     llvm/trunk/test/CodeGen/X86/rot16.ll
>>>     llvm/trunk/test/CodeGen/X86/rotate.ll
>>>     llvm/trunk/test/CodeGen/X86/rotate4.ll
>>>     llvm/trunk/test/CodeGen/X86/schedule-x86-64-shld.ll
>>>     llvm/trunk/test/CodeGen/X86/scheduler-backtracking.ll
>>>     llvm/trunk/test/CodeGen/X86/test-shrink.ll
>>>     llvm/trunk/test/CodeGen/X86/xchg-nofold.ll
>>>
>>> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
>>> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Thu Nov 29 12:58:26 2018
>>> @@ -9722,6 +9722,28 @@ SDValue DAGCombiner::visitTRUNCATE(SDNod
>>>    if (SDValue NewVSel = matchVSelectOpSizesWithSetCC(N))
>>>      return NewVSel;
>>>
>>> +  // Narrow a suitable binary operation with a constant operand by moving it
>>> +  // ahead of the truncate. This is limited to pre-legalization because targets
>>> +  // may prefer a wider type during later combines and invert this transform.
>>> +  switch (N0.getOpcode()) {
>>> +  // TODO: Add case for ADD - that will likely require a change in logic here
>>> +  // or target-specific changes to avoid regressions.
>>> +  case ISD::SUB:
>>> +  case ISD::MUL:
>>> +  case ISD::AND:
>>> +  case ISD::OR:
>>> +  case ISD::XOR:
>>> +    // TODO: This should allow vector constants/types too.
>>> +    if (!LegalOperations && N0.hasOneUse() &&
>>> +        (isa<ConstantSDNode>(N0.getOperand(0)) ||
>>> +         isa<ConstantSDNode>(N0.getOperand(1)))) {
>>> +      SDLoc DL(N);
>>> +      SDValue NarrowL = DAG.getNode(ISD::TRUNCATE, DL, VT, N0.getOperand(0));
>>> +      SDValue NarrowR = DAG.getNode(ISD::TRUNCATE, DL, VT, N0.getOperand(1));
>>> +      return DAG.getNode(N0.getOpcode(), DL, VT, NarrowL, NarrowR);
>>> +    }
>>> +  }
>>> +
>>>    return SDValue();
>>>  }
>>>
>>>
>>> Modified: llvm/trunk/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll (original)
>>> +++ llvm/trunk/test/CodeGen/AMDGPU/cgp-bitfield-extract.ll Thu Nov 29 12:58:26 2018
>>> @@ -125,11 +125,11 @@ ret:
>>>  ; GCN: s_cbranch_scc1
>>>
>>>  ; SI: s_bfe_u32 s{{[0-9]+}}, s{{[0-9]+}}, 0x80004
>>> -; VI: s_and_b32 s{{[0-9]+}}, [[BFE]], 0xff
>>> +; VI: v_mov_b32_e32 v{{[0-9]+}}, 0xff
>>>
>>>  ; GCN: BB2_2:
>>>  ; SI: s_bfe_u32 s{{[0-9]+}}, s{{[0-9]+}}, 0x70004
>>> -; VI: s_and_b32 s{{[0-9]+}}, [[BFE]], 0x7f
>>> +; VI: v_mov_b32_e32 v{{[0-9]+}}, 0x7f
>>>
>>>  ; GCN: BB2_3:
>>>  ; GCN: buffer_store_short
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/2008-09-11-CoalescerBug2.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2008-09-11-CoalescerBug2.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/2008-09-11-CoalescerBug2.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/2008-09-11-CoalescerBug2.ll Thu Nov 29 12:58:26 2018
>>> @@ -17,7 +17,7 @@ define i32 @func_44(i16 signext %p_46) n
>>>  ; SOURCE-SCHED-NEXT:    setg %cl
>>>  ; SOURCE-SCHED-NEXT:    movb g_73, %dl
>>>  ; SOURCE-SCHED-NEXT:    xorl %eax, %eax
>>> -; SOURCE-SCHED-NEXT:    subl {{[0-9]+}}(%esp), %eax
>>> +; SOURCE-SCHED-NEXT:    subb {{[0-9]+}}(%esp), %al
>>>  ; SOURCE-SCHED-NEXT:    testb %dl, %dl
>>>  ; SOURCE-SCHED-NEXT:    jne .LBB0_2
>>>  ; SOURCE-SCHED-NEXT:  # %bb.1: # %bb11
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/2010-08-04-MaskedSignedCompare.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/2010-08-04-MaskedSignedCompare.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/2010-08-04-MaskedSignedCompare.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/2010-08-04-MaskedSignedCompare.ll Thu Nov 29 12:58:26 2018
>>> @@ -11,7 +11,7 @@ define i32 @main() nounwind {
>>>  ; CHECK:       # %bb.0: # %entry
>>>  ; CHECK-NEXT:    xorl %eax, %eax
>>>  ; CHECK-NEXT:    cmpq {{.*}}(%rip), %rax
>>> -; CHECK-NEXT:    sbbl %eax, %eax
>>> +; CHECK-NEXT:    sbbb %al, %al
>>>  ; CHECK-NEXT:    testb $-106, %al
>>>  ; CHECK-NEXT:    jle .LBB0_1
>>>  ; CHECK-NEXT:  # %bb.2: # %if.then
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/add-sub-nsw-nuw.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/add-sub-nsw-nuw.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/add-sub-nsw-nuw.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/add-sub-nsw-nuw.ll Thu Nov 29 12:58:26 2018
>>> @@ -9,7 +9,7 @@ define i8 @PR30841(i64 %argc) {
>>>  ; CHECK-LABEL: PR30841:
>>>  ; CHECK:       ## %bb.0: ## %entry
>>>  ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; CHECK-NEXT:    negl %eax
>>> +; CHECK-NEXT:    negb %al
>>>  ; CHECK-NEXT:    ## kill: def $al killed $al killed $eax
>>>  ; CHECK-NEXT:    retl
>>>  entry:
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/bool-math.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/bool-math.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/bool-math.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/bool-math.ll Thu Nov 29 12:58:26 2018
>>> @@ -33,7 +33,7 @@ define i8 @sub_zext_cmp_mask_narrower_re
>>>  ; CHECK-LABEL: sub_zext_cmp_mask_narrower_result:
>>>  ; CHECK:       # %bb.0:
>>>  ; CHECK-NEXT:    movl %edi, %eax
>>> -; CHECK-NEXT:    andl $1, %eax
>>> +; CHECK-NEXT:    andb $1, %al
>>>  ; CHECK-NEXT:    orb $46, %al
>>>  ; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>>>  ; CHECK-NEXT:    retq
>>> @@ -77,7 +77,7 @@ define i8 @add_zext_cmp_mask_narrower_re
>>>  ; CHECK-LABEL: add_zext_cmp_mask_narrower_result:
>>>  ; CHECK:       # %bb.0:
>>>  ; CHECK-NEXT:    movl %edi, %eax
>>> -; CHECK-NEXT:    andl $1, %eax
>>> +; CHECK-NEXT:    andb $1, %al
>>>  ; CHECK-NEXT:    xorb $43, %al
>>>  ; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>>>  ; CHECK-NEXT:    retq
>>> @@ -159,7 +159,7 @@ define i8 @low_bit_select_constants_bigg
>>>  ; CHECK-LABEL: low_bit_select_constants_bigger_true_narrower_result:
>>>  ; CHECK:       # %bb.0:
>>>  ; CHECK-NEXT:    movl %edi, %eax
>>> -; CHECK-NEXT:    andl $1, %eax
>>> +; CHECK-NEXT:    andb $1, %al
>>>  ; CHECK-NEXT:    xorb $41, %al
>>>  ; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>>>  ; CHECK-NEXT:    retq
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/clear-lowbits.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/clear-lowbits.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/clear-lowbits.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/clear-lowbits.ll Thu Nov 29 12:58:26 2018
>>> @@ -866,10 +866,9 @@ define i16 @clear_lowbits16_ic0(i16 %val
>>>  ; X86-NOBMI2-LABEL: clear_lowbits16_ic0:
>>>  ; X86-NOBMI2:       # %bb.0:
>>>  ; X86-NOBMI2-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
>>> -; X86-NOBMI2-NEXT:    movw $16, %cx
>>> -; X86-NOBMI2-NEXT:    subw {{[0-9]+}}(%esp), %cx
>>> +; X86-NOBMI2-NEXT:    movb $16, %cl
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    shrl %cl, %eax
>>> -; X86-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $cx
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>>  ; X86-NOBMI2-NEXT:    retl
>>> @@ -877,8 +876,8 @@ define i16 @clear_lowbits16_ic0(i16 %val
>>>  ; X86-BMI2-LABEL: clear_lowbits16_ic0:
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI2-NEXT:    movw $16, %cx
>>> -; X86-BMI2-NEXT:    subw {{[0-9]+}}(%esp), %cx
>>> +; X86-BMI2-NEXT:    movb $16, %cl
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>> @@ -887,10 +886,9 @@ define i16 @clear_lowbits16_ic0(i16 %val
>>>  ; X64-NOBMI2-LABEL: clear_lowbits16_ic0:
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movzwl %di, %eax
>>> -; X64-NOBMI2-NEXT:    movl $16, %ecx
>>> -; X64-NOBMI2-NEXT:    subl %esi, %ecx
>>> +; X64-NOBMI2-NEXT:    movb $16, %cl
>>> +; X64-NOBMI2-NEXT:    subb %sil, %cl
>>>  ; X64-NOBMI2-NEXT:    shrl %cl, %eax
>>> -; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI2-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>>  ; X64-NOBMI2-NEXT:    retq
>>> @@ -898,8 +896,8 @@ define i16 @clear_lowbits16_ic0(i16 %val
>>>  ; X64-BMI2-LABEL: clear_lowbits16_ic0:
>>>  ; X64-BMI2:       # %bb.0:
>>>  ; X64-BMI2-NEXT:    movzwl %di, %eax
>>> -; X64-BMI2-NEXT:    movl $16, %ecx
>>> -; X64-BMI2-NEXT:    subl %esi, %ecx
>>> +; X64-BMI2-NEXT:    movb $16, %cl
>>> +; X64-BMI2-NEXT:    subb %sil, %cl
>>>  ; X64-BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X64-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X64-BMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>> @@ -962,10 +960,9 @@ define i16 @clear_lowbits16_ic2_load(i16
>>>  ; X86-NOBMI2:       # %bb.0:
>>>  ; X86-NOBMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI2-NEXT:    movzwl (%eax), %eax
>>> -; X86-NOBMI2-NEXT:    movw $16, %cx
>>> -; X86-NOBMI2-NEXT:    subw {{[0-9]+}}(%esp), %cx
>>> +; X86-NOBMI2-NEXT:    movb $16, %cl
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    shrl %cl, %eax
>>> -; X86-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $cx
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>>  ; X86-NOBMI2-NEXT:    retl
>>> @@ -974,8 +971,8 @@ define i16 @clear_lowbits16_ic2_load(i16
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI2-NEXT:    movzwl (%eax), %eax
>>> -; X86-BMI2-NEXT:    movw $16, %cx
>>> -; X86-BMI2-NEXT:    subw {{[0-9]+}}(%esp), %cx
>>> +; X86-BMI2-NEXT:    movb $16, %cl
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>> @@ -984,10 +981,9 @@ define i16 @clear_lowbits16_ic2_load(i16
>>>  ; X64-NOBMI2-LABEL: clear_lowbits16_ic2_load:
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movzwl (%rdi), %eax
>>> -; X64-NOBMI2-NEXT:    movl $16, %ecx
>>> -; X64-NOBMI2-NEXT:    subl %esi, %ecx
>>> +; X64-NOBMI2-NEXT:    movb $16, %cl
>>> +; X64-NOBMI2-NEXT:    subb %sil, %cl
>>>  ; X64-NOBMI2-NEXT:    shrl %cl, %eax
>>> -; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI2-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>>  ; X64-NOBMI2-NEXT:    retq
>>> @@ -995,8 +991,8 @@ define i16 @clear_lowbits16_ic2_load(i16
>>>  ; X64-BMI2-LABEL: clear_lowbits16_ic2_load:
>>>  ; X64-BMI2:       # %bb.0:
>>>  ; X64-BMI2-NEXT:    movzwl (%rdi), %eax
>>> -; X64-BMI2-NEXT:    movl $16, %ecx
>>> -; X64-BMI2-NEXT:    subl %esi, %ecx
>>> +; X64-BMI2-NEXT:    movb $16, %cl
>>> +; X64-BMI2-NEXT:    subb %sil, %cl
>>>  ; X64-BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X64-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X64-BMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>> @@ -1062,10 +1058,9 @@ define i16 @clear_lowbits16_ic4_commutat
>>>  ; X86-NOBMI2-LABEL: clear_lowbits16_ic4_commutative:
>>>  ; X86-NOBMI2:       # %bb.0:
>>>  ; X86-NOBMI2-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
>>> -; X86-NOBMI2-NEXT:    movw $16, %cx
>>> -; X86-NOBMI2-NEXT:    subw {{[0-9]+}}(%esp), %cx
>>> +; X86-NOBMI2-NEXT:    movb $16, %cl
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    shrl %cl, %eax
>>> -; X86-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $cx
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>>  ; X86-NOBMI2-NEXT:    retl
>>> @@ -1073,8 +1068,8 @@ define i16 @clear_lowbits16_ic4_commutat
>>>  ; X86-BMI2-LABEL: clear_lowbits16_ic4_commutative:
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI2-NEXT:    movw $16, %cx
>>> -; X86-BMI2-NEXT:    subw {{[0-9]+}}(%esp), %cx
>>> +; X86-BMI2-NEXT:    movb $16, %cl
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>> @@ -1083,10 +1078,9 @@ define i16 @clear_lowbits16_ic4_commutat
>>>  ; X64-NOBMI2-LABEL: clear_lowbits16_ic4_commutative:
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movzwl %di, %eax
>>> -; X64-NOBMI2-NEXT:    movl $16, %ecx
>>> -; X64-NOBMI2-NEXT:    subl %esi, %ecx
>>> +; X64-NOBMI2-NEXT:    movb $16, %cl
>>> +; X64-NOBMI2-NEXT:    subb %sil, %cl
>>>  ; X64-NOBMI2-NEXT:    shrl %cl, %eax
>>> -; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI2-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>>  ; X64-NOBMI2-NEXT:    retq
>>> @@ -1094,8 +1088,8 @@ define i16 @clear_lowbits16_ic4_commutat
>>>  ; X64-BMI2-LABEL: clear_lowbits16_ic4_commutative:
>>>  ; X64-BMI2:       # %bb.0:
>>>  ; X64-BMI2-NEXT:    movzwl %di, %eax
>>> -; X64-BMI2-NEXT:    movl $16, %ecx
>>> -; X64-BMI2-NEXT:    subl %esi, %ecx
>>> +; X64-BMI2-NEXT:    movb $16, %cl
>>> +; X64-BMI2-NEXT:    subb %sil, %cl
>>>  ; X64-BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X64-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X64-BMI2-NEXT:    # kill: def $ax killed $ax killed $eax
>>> @@ -1113,7 +1107,7 @@ define i32 @clear_lowbits32_ic0(i32 %val
>>>  ; X86-NOBMI2:       # %bb.0:
>>>  ; X86-NOBMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI2-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    shrl %cl, %eax
>>>  ; X86-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1122,7 +1116,7 @@ define i32 @clear_lowbits32_ic0(i32 %val
>>>  ; X86-BMI2-LABEL: clear_lowbits32_ic0:
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    xorl %eax, %eax
>>> -; X86-BMI2-NEXT:    subl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI2-NEXT:    shrxl %eax, {{[0-9]+}}(%esp), %ecx
>>>  ; X86-BMI2-NEXT:    shlxl %eax, %ecx, %eax
>>>  ; X86-BMI2-NEXT:    retl
>>> @@ -1131,7 +1125,7 @@ define i32 @clear_lowbits32_ic0(i32 %val
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI2-NEXT:    movl %edi, %eax
>>> -; X64-NOBMI2-NEXT:    negl %ecx
>>> +; X64-NOBMI2-NEXT:    negb %cl
>>>  ; X64-NOBMI2-NEXT:    shrl %cl, %eax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1139,7 +1133,7 @@ define i32 @clear_lowbits32_ic0(i32 %val
>>>  ;
>>>  ; X64-BMI2-LABEL: clear_lowbits32_ic0:
>>>  ; X64-BMI2:       # %bb.0:
>>> -; X64-BMI2-NEXT:    negl %esi
>>> +; X64-BMI2-NEXT:    negb %sil
>>>  ; X64-BMI2-NEXT:    shrxl %esi, %edi, %eax
>>>  ; X64-BMI2-NEXT:    shlxl %esi, %eax, %eax
>>>  ; X64-BMI2-NEXT:    retq
>>> @@ -1197,7 +1191,7 @@ define i32 @clear_lowbits32_ic2_load(i32
>>>  ; X86-NOBMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI2-NEXT:    movl (%eax), %eax
>>>  ; X86-NOBMI2-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    shrl %cl, %eax
>>>  ; X86-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1207,7 +1201,7 @@ define i32 @clear_lowbits32_ic2_load(i32
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI2-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    shrxl %ecx, (%eax), %eax
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %eax, %eax
>>>  ; X86-BMI2-NEXT:    retl
>>> @@ -1216,7 +1210,7 @@ define i32 @clear_lowbits32_ic2_load(i32
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI2-NEXT:    movl (%rdi), %eax
>>> -; X64-NOBMI2-NEXT:    negl %ecx
>>> +; X64-NOBMI2-NEXT:    negb %cl
>>>  ; X64-NOBMI2-NEXT:    shrl %cl, %eax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1224,7 +1218,7 @@ define i32 @clear_lowbits32_ic2_load(i32
>>>  ;
>>>  ; X64-BMI2-LABEL: clear_lowbits32_ic2_load:
>>>  ; X64-BMI2:       # %bb.0:
>>> -; X64-BMI2-NEXT:    negl %esi
>>> +; X64-BMI2-NEXT:    negb %sil
>>>  ; X64-BMI2-NEXT:    shrxl %esi, (%rdi), %eax
>>>  ; X64-BMI2-NEXT:    shlxl %esi, %eax, %eax
>>>  ; X64-BMI2-NEXT:    retq
>>> @@ -1285,7 +1279,7 @@ define i32 @clear_lowbits32_ic4_commutat
>>>  ; X86-NOBMI2:       # %bb.0:
>>>  ; X86-NOBMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI2-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    shrl %cl, %eax
>>>  ; X86-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1294,7 +1288,7 @@ define i32 @clear_lowbits32_ic4_commutat
>>>  ; X86-BMI2-LABEL: clear_lowbits32_ic4_commutative:
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    xorl %eax, %eax
>>> -; X86-BMI2-NEXT:    subl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI2-NEXT:    shrxl %eax, {{[0-9]+}}(%esp), %ecx
>>>  ; X86-BMI2-NEXT:    shlxl %eax, %ecx, %eax
>>>  ; X86-BMI2-NEXT:    retl
>>> @@ -1303,7 +1297,7 @@ define i32 @clear_lowbits32_ic4_commutat
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI2-NEXT:    movl %edi, %eax
>>> -; X64-NOBMI2-NEXT:    negl %ecx
>>> +; X64-NOBMI2-NEXT:    negb %cl
>>>  ; X64-NOBMI2-NEXT:    shrl %cl, %eax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1311,7 +1305,7 @@ define i32 @clear_lowbits32_ic4_commutat
>>>  ;
>>>  ; X64-BMI2-LABEL: clear_lowbits32_ic4_commutative:
>>>  ; X64-BMI2:       # %bb.0:
>>> -; X64-BMI2-NEXT:    negl %esi
>>> +; X64-BMI2-NEXT:    negb %sil
>>>  ; X64-BMI2-NEXT:    shrxl %esi, %edi, %eax
>>>  ; X64-BMI2-NEXT:    shlxl %esi, %eax, %eax
>>>  ; X64-BMI2-NEXT:    retq
>>> @@ -1326,8 +1320,8 @@ define i32 @clear_lowbits32_ic4_commutat
>>>  define i64 @clear_lowbits64_ic0(i64 %val, i64 %numlowbits) nounwind {
>>>  ; X86-NOBMI2-LABEL: clear_lowbits64_ic0:
>>>  ; X86-NOBMI2:       # %bb.0:
>>> -; X86-NOBMI2-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI2-NEXT:    movb $64, %cl
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    movl $-1, %edx
>>>  ; X86-NOBMI2-NEXT:    movl $-1, %eax
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1344,8 +1338,8 @@ define i64 @clear_lowbits64_ic0(i64 %val
>>>  ;
>>>  ; X86-BMI2-LABEL: clear_lowbits64_ic0:
>>>  ; X86-BMI2:       # %bb.0:
>>> -; X86-BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    movl $-1, %edx
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %edx, %eax
>>>  ; X86-BMI2-NEXT:    shldl %cl, %edx, %edx
>>> @@ -1363,7 +1357,7 @@ define i64 @clear_lowbits64_ic0(i64 %val
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI2-NEXT:    movq %rdi, %rax
>>> -; X64-NOBMI2-NEXT:    negl %ecx
>>> +; X64-NOBMI2-NEXT:    negb %cl
>>>  ; X64-NOBMI2-NEXT:    shrq %cl, %rax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI2-NEXT:    shlq %cl, %rax
>>> @@ -1371,7 +1365,7 @@ define i64 @clear_lowbits64_ic0(i64 %val
>>>  ;
>>>  ; X64-BMI2-LABEL: clear_lowbits64_ic0:
>>>  ; X64-BMI2:       # %bb.0:
>>> -; X64-BMI2-NEXT:    negl %esi
>>> +; X64-BMI2-NEXT:    negb %sil
>>>  ; X64-BMI2-NEXT:    shrxq %rsi, %rdi, %rax
>>>  ; X64-BMI2-NEXT:    shlxq %rsi, %rax, %rax
>>>  ; X64-BMI2-NEXT:    retq
>>> @@ -1446,8 +1440,8 @@ define i64 @clear_lowbits64_ic2_load(i64
>>>  ; X86-NOBMI2:       # %bb.0:
>>>  ; X86-NOBMI2-NEXT:    pushl %esi
>>>  ; X86-NOBMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> -; X86-NOBMI2-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI2-NEXT:    movb $64, %cl
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    movl $-1, %edx
>>>  ; X86-NOBMI2-NEXT:    movl $-1, %eax
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1467,8 +1461,8 @@ define i64 @clear_lowbits64_ic2_load(i64
>>>  ; X86-BMI2:       # %bb.0:
>>>  ; X86-BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> -; X86-BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    movl $-1, %edx
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %edx, %eax
>>>  ; X86-BMI2-NEXT:    shldl %cl, %edx, %edx
>>> @@ -1487,7 +1481,7 @@ define i64 @clear_lowbits64_ic2_load(i64
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI2-NEXT:    movq (%rdi), %rax
>>> -; X64-NOBMI2-NEXT:    negl %ecx
>>> +; X64-NOBMI2-NEXT:    negb %cl
>>>  ; X64-NOBMI2-NEXT:    shrq %cl, %rax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI2-NEXT:    shlq %cl, %rax
>>> @@ -1495,7 +1489,7 @@ define i64 @clear_lowbits64_ic2_load(i64
>>>  ;
>>>  ; X64-BMI2-LABEL: clear_lowbits64_ic2_load:
>>>  ; X64-BMI2:       # %bb.0:
>>> -; X64-BMI2-NEXT:    negl %esi
>>> +; X64-BMI2-NEXT:    negb %sil
>>>  ; X64-BMI2-NEXT:    shrxq %rsi, (%rdi), %rax
>>>  ; X64-BMI2-NEXT:    shlxq %rsi, %rax, %rax
>>>  ; X64-BMI2-NEXT:    retq
>>> @@ -1576,8 +1570,8 @@ define i64 @clear_lowbits64_ic3_load_ind
>>>  define i64 @clear_lowbits64_ic4_commutative(i64 %val, i64 %numlowbits) nounwind {
>>>  ; X86-NOBMI2-LABEL: clear_lowbits64_ic4_commutative:
>>>  ; X86-NOBMI2:       # %bb.0:
>>> -; X86-NOBMI2-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI2-NEXT:    movb $64, %cl
>>> +; X86-NOBMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI2-NEXT:    movl $-1, %edx
>>>  ; X86-NOBMI2-NEXT:    movl $-1, %eax
>>>  ; X86-NOBMI2-NEXT:    shll %cl, %eax
>>> @@ -1594,8 +1588,8 @@ define i64 @clear_lowbits64_ic4_commutat
>>>  ;
>>>  ; X86-BMI2-LABEL: clear_lowbits64_ic4_commutative:
>>>  ; X86-BMI2:       # %bb.0:
>>> -; X86-BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI2-NEXT:    movl $-1, %edx
>>>  ; X86-BMI2-NEXT:    shlxl %ecx, %edx, %eax
>>>  ; X86-BMI2-NEXT:    shldl %cl, %edx, %edx
>>> @@ -1613,7 +1607,7 @@ define i64 @clear_lowbits64_ic4_commutat
>>>  ; X64-NOBMI2:       # %bb.0:
>>>  ; X64-NOBMI2-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI2-NEXT:    movq %rdi, %rax
>>> -; X64-NOBMI2-NEXT:    negl %ecx
>>> +; X64-NOBMI2-NEXT:    negb %cl
>>>  ; X64-NOBMI2-NEXT:    shrq %cl, %rax
>>>  ; X64-NOBMI2-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI2-NEXT:    shlq %cl, %rax
>>> @@ -1621,7 +1615,7 @@ define i64 @clear_lowbits64_ic4_commutat
>>>  ;
>>>  ; X64-BMI2-LABEL: clear_lowbits64_ic4_commutative:
>>>  ; X64-BMI2:       # %bb.0:
>>> -; X64-BMI2-NEXT:    negl %esi
>>> +; X64-BMI2-NEXT:    negb %sil
>>>  ; X64-BMI2-NEXT:    shrxq %rsi, %rdi, %rax
>>>  ; X64-BMI2-NEXT:    shlxq %rsi, %rax, %rax
>>>  ; X64-BMI2-NEXT:    retq
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/cmov.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/cmov.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/cmov.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/cmov.ll Thu Nov 29 12:58:26 2018
>>> @@ -81,7 +81,7 @@ define i1 @test4() nounwind {
>>>  ; CHECK-NEXT:    movsbl {{.*}}(%rip), %edx
>>>  ; CHECK-NEXT:    movzbl %dl, %ecx
>>>  ; CHECK-NEXT:    shrl $7, %ecx
>>> -; CHECK-NEXT:    xorl $1, %ecx
>>> +; CHECK-NEXT:    xorb $1, %cl
>>>  ; CHECK-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; CHECK-NEXT:    sarl %cl, %edx
>>>  ; CHECK-NEXT:    movb {{.*}}(%rip), %al
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/extract-bits.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/extract-bits.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/extract-bits.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/extract-bits.ll Thu Nov 29 12:58:26 2018
>>> @@ -2983,7 +2983,7 @@ define i32 @bextr32_c0(i32 %val, i32 %nu
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %edi
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edi
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -3005,7 +3005,7 @@ define i32 @bextr32_c0(i32 %val, i32 %nu
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %edi
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %esi
>>> @@ -3020,22 +3020,22 @@ define i32 @bextr32_c0(i32 %val, i32 %nu
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bextr32_c0:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    pushl %edi
>>> +; X86-BMI1BMI2-NEXT:    pushl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %eax
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %bl
>>>  ; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %al
>>> -; X86-BMI1BMI2-NEXT:    shrxl %eax, {{[0-9]+}}(%esp), %edi
>>> -; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X86-BMI1BMI2-NEXT:    negl %eax
>>> +; X86-BMI1BMI2-NEXT:    shrxl %eax, {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X86-BMI1BMI2-NEXT:    negb %al
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>> -; X86-BMI1BMI2-NEXT:    bzhil %esi, %edi, %eax
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ebx, %esi, %eax
>>>  ; X86-BMI1BMI2-NEXT:    addl $4, %esp
>>>  ; X86-BMI1BMI2-NEXT:    popl %esi
>>> -; X86-BMI1BMI2-NEXT:    popl %edi
>>> +; X86-BMI1BMI2-NEXT:    popl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>>  ; X64-NOBMI-LABEL: bextr32_c0:
>>> @@ -3047,7 +3047,7 @@ define i32 @bextr32_c0(i32 %val, i32 %nu
>>>  ; X64-NOBMI-NEXT:    movl %edi, %ebx
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebx
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebp
>>> @@ -3069,7 +3069,7 @@ define i32 @bextr32_c0(i32 %val, i32 %nu
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edi, %ebx
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> @@ -3089,8 +3089,8 @@ define i32 @bextr32_c0(i32 %val, i32 %nu
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rax
>>>  ; X64-BMI1BMI2-NEXT:    movl %edx, %ebx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %esi, %edi, %ebp
>>> -; X64-BMI1BMI2-NEXT:    movl %edx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -3254,7 +3254,7 @@ define i32 @bextr32_c2_load(i32* %w, i32
>>>  ; X86-NOBMI-NEXT:    movl (%eax), %edi
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edi
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -3277,7 +3277,7 @@ define i32 @bextr32_c2_load(i32* %w, i32
>>>  ; X86-BMI1NOTBM-NEXT:    movl (%eax), %edi
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %esi
>>> @@ -3292,23 +3292,23 @@ define i32 @bextr32_c2_load(i32* %w, i32
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bextr32_c2_load:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    pushl %edi
>>> +; X86-BMI1BMI2-NEXT:    pushl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %eax
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %bl
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>> -; X86-BMI1BMI2-NEXT:    shrxl %ecx, (%eax), %edi
>>> -; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X86-BMI1BMI2-NEXT:    negl %eax
>>> +; X86-BMI1BMI2-NEXT:    shrxl %ecx, (%eax), %esi
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X86-BMI1BMI2-NEXT:    negb %al
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>> -; X86-BMI1BMI2-NEXT:    bzhil %esi, %edi, %eax
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ebx, %esi, %eax
>>>  ; X86-BMI1BMI2-NEXT:    addl $4, %esp
>>>  ; X86-BMI1BMI2-NEXT:    popl %esi
>>> -; X86-BMI1BMI2-NEXT:    popl %edi
>>> +; X86-BMI1BMI2-NEXT:    popl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>>  ; X64-NOBMI-LABEL: bextr32_c2_load:
>>> @@ -3320,7 +3320,7 @@ define i32 @bextr32_c2_load(i32* %w, i32
>>>  ; X64-NOBMI-NEXT:    movl (%rdi), %ebp
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebp
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebx
>>> @@ -3342,7 +3342,7 @@ define i32 @bextr32_c2_load(i32* %w, i32
>>>  ; X64-BMI1NOTBM-NEXT:    movl (%rdi), %ebp
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> @@ -3362,8 +3362,8 @@ define i32 @bextr32_c2_load(i32* %w, i32
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rax
>>>  ; X64-BMI1BMI2-NEXT:    movl %edx, %ebx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %esi, (%rdi), %ebp
>>> -; X64-BMI1BMI2-NEXT:    movl %edx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -3531,7 +3531,7 @@ define i32 @bextr32_c4_commutative(i32 %
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %edi
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edi
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -3553,7 +3553,7 @@ define i32 @bextr32_c4_commutative(i32 %
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %edi
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %esi
>>> @@ -3568,22 +3568,22 @@ define i32 @bextr32_c4_commutative(i32 %
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bextr32_c4_commutative:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    pushl %edi
>>> +; X86-BMI1BMI2-NEXT:    pushl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %eax
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %bl
>>>  ; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %al
>>> -; X86-BMI1BMI2-NEXT:    shrxl %eax, {{[0-9]+}}(%esp), %edi
>>> -; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X86-BMI1BMI2-NEXT:    negl %eax
>>> +; X86-BMI1BMI2-NEXT:    shrxl %eax, {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X86-BMI1BMI2-NEXT:    negb %al
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>> -; X86-BMI1BMI2-NEXT:    bzhil %esi, %edi, %eax
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ebx, %esi, %eax
>>>  ; X86-BMI1BMI2-NEXT:    addl $4, %esp
>>>  ; X86-BMI1BMI2-NEXT:    popl %esi
>>> -; X86-BMI1BMI2-NEXT:    popl %edi
>>> +; X86-BMI1BMI2-NEXT:    popl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>>  ; X64-NOBMI-LABEL: bextr32_c4_commutative:
>>> @@ -3595,7 +3595,7 @@ define i32 @bextr32_c4_commutative(i32 %
>>>  ; X64-NOBMI-NEXT:    movl %edi, %ebx
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebx
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebp
>>> @@ -3617,7 +3617,7 @@ define i32 @bextr32_c4_commutative(i32 %
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edi, %ebx
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> @@ -3637,8 +3637,8 @@ define i32 @bextr32_c4_commutative(i32 %
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rax
>>>  ; X64-BMI1BMI2-NEXT:    movl %edx, %ebx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %esi, %edi, %ebp
>>> -; X64-BMI1BMI2-NEXT:    movl %edx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -3667,7 +3667,7 @@ define i32 @bextr32_c5_skipextrauses(i32
>>>  ; X86-NOBMI-NEXT:    movl %ebx, %ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edi
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -3694,7 +3694,7 @@ define i32 @bextr32_c5_skipextrauses(i32
>>>  ; X86-BMI1NOTBM-NEXT:    movl %ebx, %ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %esi
>>> @@ -3716,16 +3716,16 @@ define i32 @bextr32_c5_skipextrauses(i32
>>>  ; X86-BMI1BMI2-NEXT:    pushl %edi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    subl $16, %esp
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %bl
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %edi
>>> -; X86-BMI1BMI2-NEXT:    shrxl %edi, {{[0-9]+}}(%esp), %ebx
>>> -; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X86-BMI1BMI2-NEXT:    negl %eax
>>> +; X86-BMI1BMI2-NEXT:    shrxl %edi, {{[0-9]+}}(%esp), %esi
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X86-BMI1BMI2-NEXT:    negb %al
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>> -; X86-BMI1BMI2-NEXT:    bzhil %esi, %ebx, %esi
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ebx, %esi, %esi
>>>  ; X86-BMI1BMI2-NEXT:    movl %edi, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>>  ; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> @@ -3744,7 +3744,7 @@ define i32 @bextr32_c5_skipextrauses(i32
>>>  ; X64-NOBMI-NEXT:    movl %edi, %ebp
>>>  ; X64-NOBMI-NEXT:    movl %r14d, %ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebp
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebx
>>> @@ -3768,7 +3768,7 @@ define i32 @bextr32_c5_skipextrauses(i32
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edi, %ebp
>>>  ; X64-BMI1NOTBM-NEXT:    movl %r14d, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> @@ -3791,8 +3791,8 @@ define i32 @bextr32_c5_skipextrauses(i32
>>>  ; X64-BMI1BMI2-NEXT:    movl %edx, %ebx
>>>  ; X64-BMI1BMI2-NEXT:    movl %esi, %ebp
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %esi, %edi, %r14d
>>> -; X64-BMI1BMI2-NEXT:    movl %edx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -3835,8 +3835,8 @@ define i64 @bextr64_c0(i64 %val, i64 %nu
>>>  ; X86-NOBMI-NEXT:    movl %edi, %esi
>>>  ; X86-NOBMI-NEXT:    xorl %edi, %edi
>>>  ; X86-NOBMI-NEXT:  .LBB32_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %ebx
>>> @@ -3882,8 +3882,8 @@ define i64 @bextr64_c0(i64 %val, i64 %nu
>>>  ; X86-BMI1NOTBM-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB32_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> @@ -3928,8 +3928,8 @@ define i64 @bextr64_c0(i64 %val, i64 %nu
>>>  ; X86-BMI1BMI2-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1BMI2-NEXT:  .LBB32_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %ebx, %ebp
>>>  ; X86-BMI1BMI2-NEXT:    shrdl %cl, %ebx, %ebx
>>> @@ -3964,7 +3964,7 @@ define i64 @bextr64_c0(i64 %val, i64 %nu
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %r14
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %r14
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rbx
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> @@ -3986,7 +3986,7 @@ define i64 @bextr64_c0(i64 %val, i64 %nu
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rdi, %r14
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %r14
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rbx
>>> @@ -4007,7 +4007,7 @@ define i64 @bextr64_c0(i64 %val, i64 %nu
>>>  ; X64-BMI1BMI2-NEXT:    movq %rdx, %rbx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rsi, %rdi, %r14
>>>  ; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rcx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rax, %rcx, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>> @@ -4257,8 +4257,8 @@ define i64 @bextr64_c2_load(i64* %w, i64
>>>  ; X86-NOBMI-NEXT:    movl %edi, %esi
>>>  ; X86-NOBMI-NEXT:    xorl %edi, %edi
>>>  ; X86-NOBMI-NEXT:  .LBB34_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %ebx
>>> @@ -4305,8 +4305,8 @@ define i64 @bextr64_c2_load(i64* %w, i64
>>>  ; X86-BMI1NOTBM-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB34_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> @@ -4352,8 +4352,8 @@ define i64 @bextr64_c2_load(i64* %w, i64
>>>  ; X86-BMI1BMI2-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1BMI2-NEXT:  .LBB34_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %ebx, %ebp
>>>  ; X86-BMI1BMI2-NEXT:    shrdl %cl, %ebx, %ebx
>>> @@ -4388,7 +4388,7 @@ define i64 @bextr64_c2_load(i64* %w, i64
>>>  ; X64-NOBMI-NEXT:    movq (%rdi), %r14
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %r14
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rbx
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> @@ -4410,7 +4410,7 @@ define i64 @bextr64_c2_load(i64* %w, i64
>>>  ; X64-BMI1NOTBM-NEXT:    movq (%rdi), %r14
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %r14
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rbx
>>> @@ -4431,7 +4431,7 @@ define i64 @bextr64_c2_load(i64* %w, i64
>>>  ; X64-BMI1BMI2-NEXT:    movq %rdx, %rbx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rsi, (%rdi), %r14
>>>  ; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rcx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rax, %rcx, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>> @@ -4685,8 +4685,8 @@ define i64 @bextr64_c4_commutative(i64 %
>>>  ; X86-NOBMI-NEXT:    movl %edi, %esi
>>>  ; X86-NOBMI-NEXT:    xorl %edi, %edi
>>>  ; X86-NOBMI-NEXT:  .LBB36_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %ebx
>>> @@ -4732,8 +4732,8 @@ define i64 @bextr64_c4_commutative(i64 %
>>>  ; X86-BMI1NOTBM-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB36_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> @@ -4778,8 +4778,8 @@ define i64 @bextr64_c4_commutative(i64 %
>>>  ; X86-BMI1BMI2-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1BMI2-NEXT:  .LBB36_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %ebx, %ebp
>>>  ; X86-BMI1BMI2-NEXT:    shrdl %cl, %ebx, %ebx
>>> @@ -4814,7 +4814,7 @@ define i64 @bextr64_c4_commutative(i64 %
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %r14
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %r14
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rbx
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> @@ -4836,7 +4836,7 @@ define i64 @bextr64_c4_commutative(i64 %
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rdi, %r14
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %r14
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rbx
>>> @@ -4857,7 +4857,7 @@ define i64 @bextr64_c4_commutative(i64 %
>>>  ; X64-BMI1BMI2-NEXT:    movq %rdx, %rbx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rsi, %rdi, %r14
>>>  ; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rcx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rax, %rcx, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>> @@ -4894,8 +4894,8 @@ define i64 @bextr64_c5_skipextrauses(i64
>>>  ; X86-NOBMI-NEXT:    movl %edi, %esi
>>>  ; X86-NOBMI-NEXT:    xorl %edi, %edi
>>>  ; X86-NOBMI-NEXT:  .LBB37_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %ebp
>>> @@ -4946,8 +4946,8 @@ define i64 @bextr64_c5_skipextrauses(i64
>>>  ; X86-BMI1NOTBM-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB37_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> @@ -4997,8 +4997,8 @@ define i64 @bextr64_c5_skipextrauses(i64
>>>  ; X86-BMI1BMI2-NEXT:    movl %edi, %esi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edi, %edi
>>>  ; X86-BMI1BMI2-NEXT:  .LBB37_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ebp
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %ebp, %ebx
>>>  ; X86-BMI1BMI2-NEXT:    shrdl %cl, %ebp, %ebp
>>> @@ -5038,7 +5038,7 @@ define i64 @bextr64_c5_skipextrauses(i64
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %r15
>>>  ; X64-NOBMI-NEXT:    movl %r14d, %ecx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %r15
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rbx
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> @@ -5062,7 +5062,7 @@ define i64 @bextr64_c5_skipextrauses(i64
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rdi, %r15
>>>  ; X64-BMI1NOTBM-NEXT:    movl %r14d, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %r15
>>> -; X64-BMI1NOTBM-NEXT:    negl %edx
>>> +; X64-BMI1NOTBM-NEXT:    negb %dl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edx, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rbx
>>> @@ -5086,7 +5086,7 @@ define i64 @bextr64_c5_skipextrauses(i64
>>>  ; X64-BMI1BMI2-NEXT:    movq %rsi, %r14
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rsi, %rdi, %r15
>>>  ; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rcx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rax, %rcx, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>> @@ -5118,7 +5118,7 @@ define i32 @bextr32_d0(i32 %val, i32 %nu
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %eax
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -5126,16 +5126,16 @@ define i32 @bextr32_d0(i32 %val, i32 %nu
>>>  ;
>>>  ; X86-BMI1NOTBM-LABEL: bextr32_d0:
>>>  ; X86-BMI1NOTBM:       # %bb.0:
>>> -; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1NOTBM-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI1NOTBM-NEXT:    shll $8, %eax
>>> -; X86-BMI1NOTBM-NEXT:    orl %ecx, %eax
>>> -; X86-BMI1NOTBM-NEXT:    bextrl %eax, {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1NOTBM-NEXT:    movzbl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    orl %eax, %ecx
>>> +; X86-BMI1NOTBM-NEXT:    bextrl %ecx, {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    retl
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bextr32_d0:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, {{[0-9]+}}(%esp), %ecx
>>>  ; X86-BMI1BMI2-NEXT:    bzhil %eax, %ecx, %eax
>>> @@ -5147,7 +5147,7 @@ define i32 @bextr32_d0(i32 %val, i32 %nu
>>>  ; X64-NOBMI-NEXT:    movl %edi, %eax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -5245,7 +5245,7 @@ define i32 @bextr32_d2_load(i32* %w, i32
>>>  ; X86-NOBMI-NEXT:    movl (%eax), %eax
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %eax
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -5254,16 +5254,16 @@ define i32 @bextr32_d2_load(i32* %w, i32
>>>  ; X86-BMI1NOTBM-LABEL: bextr32_d2_load:
>>>  ; X86-BMI1NOTBM:       # %bb.0:
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>> -; X86-BMI1NOTBM-NEXT:    movzbl {{[0-9]+}}(%esp), %edx
>>> +; X86-BMI1NOTBM-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    shll $8, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    orl %edx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    bextrl %ecx, (%eax), %eax
>>> +; X86-BMI1NOTBM-NEXT:    movzbl {{[0-9]+}}(%esp), %edx
>>> +; X86-BMI1NOTBM-NEXT:    orl %ecx, %edx
>>> +; X86-BMI1NOTBM-NEXT:    bextrl %edx, (%eax), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    retl
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bextr32_d2_load:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>>  ; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %dl
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %edx, (%ecx), %ecx
>>> @@ -5276,7 +5276,7 @@ define i32 @bextr32_d2_load(i32* %w, i32
>>>  ; X64-NOBMI-NEXT:    movl (%rdi), %eax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -5381,7 +5381,7 @@ define i32 @bextr32_d5_skipextrauses(i32
>>>  ; X86-NOBMI-NEXT:    movl %eax, %ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shll %cl, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -5396,7 +5396,7 @@ define i32 @bextr32_d5_skipextrauses(i32
>>>  ; X86-BMI1NOTBM:       # %bb.0:
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>>  ; X86-BMI1NOTBM-NEXT:    subl $8, %esp
>>> -; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    shll $8, %ecx
>>>  ; X86-BMI1NOTBM-NEXT:    movzbl %al, %edx
>>> @@ -5413,7 +5413,7 @@ define i32 @bextr32_d5_skipextrauses(i32
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    subl $8, %esp
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, {{[0-9]+}}(%esp), %edx
>>>  ; X86-BMI1BMI2-NEXT:    bzhil %eax, %edx, %esi
>>> @@ -5430,7 +5430,7 @@ define i32 @bextr32_d5_skipextrauses(i32
>>>  ; X64-NOBMI-NEXT:    movl %edi, %ebx
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebx
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shll %cl, %ebx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebx
>>> @@ -5492,8 +5492,8 @@ define i64 @bextr64_d0(i64 %val, i64 %nu
>>>  ; X86-NOBMI-NEXT:    movl %eax, %edi
>>>  ; X86-NOBMI-NEXT:    xorl %eax, %eax
>>>  ; X86-NOBMI-NEXT:  .LBB43_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shldl %cl, %edi, %eax
>>>  ; X86-NOBMI-NEXT:    shll %cl, %edi
>>>  ; X86-NOBMI-NEXT:    testb $32, %cl
>>> @@ -5540,8 +5540,8 @@ define i64 @bextr64_d0(i64 %val, i64 %nu
>>>  ; X86-BMI1NOTBM-NEXT:    movl %eax, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %eax, %eax
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB43_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    shldl %cl, %edi, %eax
>>>  ; X86-BMI1NOTBM-NEXT:    shll %cl, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    testb $32, %cl
>>> @@ -5586,8 +5586,8 @@ define i64 @bextr64_d0(i64 %val, i64 %nu
>>>  ; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>>  ; X86-BMI1BMI2-NEXT:    xorl %esi, %esi
>>>  ; X86-BMI1BMI2-NEXT:  .LBB43_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    shldl %cl, %eax, %esi
>>>  ; X86-BMI1BMI2-NEXT:    shlxl %ecx, %eax, %edi
>>>  ; X86-BMI1BMI2-NEXT:    testb $32, %cl
>>> @@ -5617,7 +5617,7 @@ define i64 @bextr64_d0(i64 %val, i64 %nu
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %rax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rax
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shlq %cl, %rax
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rax
>>> @@ -5838,8 +5838,8 @@ define i64 @bextr64_d2_load(i64* %w, i64
>>>  ; X86-NOBMI-NEXT:    movl %eax, %edi
>>>  ; X86-NOBMI-NEXT:    xorl %eax, %eax
>>>  ; X86-NOBMI-NEXT:  .LBB45_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shldl %cl, %edi, %eax
>>>  ; X86-NOBMI-NEXT:    shll %cl, %edi
>>>  ; X86-NOBMI-NEXT:    testb $32, %cl
>>> @@ -5887,8 +5887,8 @@ define i64 @bextr64_d2_load(i64* %w, i64
>>>  ; X86-BMI1NOTBM-NEXT:    movl %eax, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %eax, %eax
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB45_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    shldl %cl, %edi, %eax
>>>  ; X86-BMI1NOTBM-NEXT:    shll %cl, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    testb $32, %cl
>>> @@ -5934,8 +5934,8 @@ define i64 @bextr64_d2_load(i64* %w, i64
>>>  ; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>>  ; X86-BMI1BMI2-NEXT:    xorl %esi, %esi
>>>  ; X86-BMI1BMI2-NEXT:  .LBB45_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    shldl %cl, %eax, %esi
>>>  ; X86-BMI1BMI2-NEXT:    shlxl %ecx, %eax, %edi
>>>  ; X86-BMI1BMI2-NEXT:    testb $32, %cl
>>> @@ -5965,7 +5965,7 @@ define i64 @bextr64_d2_load(i64* %w, i64
>>>  ; X64-NOBMI-NEXT:    movq (%rdi), %rax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rax
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shlq %cl, %rax
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rax
>>> @@ -6193,8 +6193,8 @@ define i64 @bextr64_d5_skipextrauses(i64
>>>  ; X86-NOBMI-NEXT:    movl %esi, %ebx
>>>  ; X86-NOBMI-NEXT:    xorl %esi, %esi
>>>  ; X86-NOBMI-NEXT:  .LBB47_2:
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shldl %cl, %ebx, %esi
>>>  ; X86-NOBMI-NEXT:    shll %cl, %ebx
>>>  ; X86-NOBMI-NEXT:    testb $32, %cl
>>> @@ -6254,8 +6254,8 @@ define i64 @bextr64_d5_skipextrauses(i64
>>>  ; X86-BMI1NOTBM-NEXT:    movl %esi, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %esi, %esi
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB47_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    shldl %cl, %ebx, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    shll %cl, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    testb $32, %cl
>>> @@ -6312,8 +6312,8 @@ define i64 @bextr64_d5_skipextrauses(i64
>>>  ; X86-BMI1BMI2-NEXT:    movl %edx, %edi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edx, %edx
>>>  ; X86-BMI1BMI2-NEXT:  .LBB47_2:
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    shldl %cl, %edi, %edx
>>>  ; X86-BMI1BMI2-NEXT:    shlxl %ecx, %edi, %ebx
>>>  ; X86-BMI1BMI2-NEXT:    testb $32, %cl
>>> @@ -6352,7 +6352,7 @@ define i64 @bextr64_d5_skipextrauses(i64
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %rbx
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> -; X64-NOBMI-NEXT:    negl %edx
>>> +; X64-NOBMI-NEXT:    negb %dl
>>>  ; X64-NOBMI-NEXT:    movl %edx, %ecx
>>>  ; X64-NOBMI-NEXT:    shlq %cl, %rbx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/extract-lowbits.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/extract-lowbits.ll?rev=347917&r1=347916&r2=347917&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/extract-lowbits.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/extract-lowbits.ll Thu Nov 29 12:58:26 2018
>>> @@ -1436,7 +1436,7 @@ define i32 @bzhi32_c0(i32 %val, i32 %num
>>>  ; X86-NOBMI-NEXT:    pushl %esi
>>>  ; X86-NOBMI-NEXT:    subl $8, %esp
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -1453,7 +1453,7 @@ define i32 @bzhi32_c0(i32 %val, i32 %num
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>>  ; X86-BMI1NOTBM-NEXT:    subl $8, %esp
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %esi
>>> @@ -1467,18 +1467,18 @@ define i32 @bzhi32_c0(i32 %val, i32 %num
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bzhi32_c0:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    pushl %esi
>>> +; X86-BMI1BMI2-NEXT:    pushl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    subl $8, %esp
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> -; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X86-BMI1BMI2-NEXT:    negl %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %bl
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X86-BMI1BMI2-NEXT:    negb %al
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>> -; X86-BMI1BMI2-NEXT:    bzhil %esi, {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ebx, {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1BMI2-NEXT:    addl $8, %esp
>>> -; X86-BMI1BMI2-NEXT:    popl %esi
>>> +; X86-BMI1BMI2-NEXT:    popl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>>  ; X64-NOBMI-LABEL: bzhi32_c0:
>>> @@ -1488,7 +1488,7 @@ define i32 @bzhi32_c0(i32 %val, i32 %num
>>>  ; X64-NOBMI-NEXT:    pushq %rax
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI-NEXT:    movl %edi, %ebx
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebp
>>> @@ -1508,7 +1508,7 @@ define i32 @bzhi32_c0(i32 %val, i32 %num
>>>  ; X64-BMI1NOTBM-NEXT:    pushq %rax
>>>  ; X64-BMI1NOTBM-NEXT:    movl %esi, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edi, %ebx
>>> -; X64-BMI1NOTBM-NEXT:    negl %ecx
>>> +; X64-BMI1NOTBM-NEXT:    negb %cl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> @@ -1528,8 +1528,8 @@ define i32 @bzhi32_c0(i32 %val, i32 %num
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rax
>>>  ; X64-BMI1BMI2-NEXT:    movl %esi, %ebx
>>>  ; X64-BMI1BMI2-NEXT:    movl %edi, %ebp
>>> -; X64-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -1668,7 +1668,7 @@ define i32 @bzhi32_c2_load(i32* %w, i32
>>>  ; X86-NOBMI-NEXT:    subl $8, %esp
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %edx
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edx
>>> @@ -1687,7 +1687,7 @@ define i32 @bzhi32_c2_load(i32* %w, i32
>>>  ; X86-BMI1NOTBM-NEXT:    subl $8, %esp
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %edx
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edx
>>> @@ -1705,9 +1705,10 @@ define i32 @bzhi32_c2_load(i32* %w, i32
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    subl $8, %esp
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    bzhil %ecx, (%eax), %esi
>>> -; X86-BMI1BMI2-NEXT:    negl %ecx
>>> +; X86-BMI1BMI2-NEXT:    # kill: def $cl killed $cl killed $ecx def $ecx
>>> +; X86-BMI1BMI2-NEXT:    negb %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %eax
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %eax, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>> @@ -1721,7 +1722,7 @@ define i32 @bzhi32_c2_load(i32* %w, i32
>>>  ; X64-NOBMI:       # %bb.0:
>>>  ; X64-NOBMI-NEXT:    pushq %rbx
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %eax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -1737,7 +1738,7 @@ define i32 @bzhi32_c2_load(i32* %w, i32
>>>  ; X64-BMI1NOTBM:       # %bb.0:
>>>  ; X64-BMI1NOTBM-NEXT:    pushq %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %esi, %ecx
>>> -; X64-BMI1NOTBM-NEXT:    negl %ecx
>>> +; X64-BMI1NOTBM-NEXT:    negb %cl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %eax
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %eax
>>> @@ -1753,7 +1754,8 @@ define i32 @bzhi32_c2_load(i32* %w, i32
>>>  ; X64-BMI1BMI2:       # %bb.0:
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rbx
>>>  ; X64-BMI1BMI2-NEXT:    bzhil %esi, (%rdi), %ebx
>>> -; X64-BMI1BMI2-NEXT:    negl %esi
>>> +; X64-BMI1BMI2-NEXT:    # kill: def $sil killed $sil killed $esi def $esi
>>> +; X64-BMI1BMI2-NEXT:    negb %sil
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %eax
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %esi, %eax, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -1884,7 +1886,7 @@ define i32 @bzhi32_c4_commutative(i32 %v
>>>  ; X86-NOBMI-NEXT:    pushl %esi
>>>  ; X86-NOBMI-NEXT:    subl $8, %esp
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %esi
>>> @@ -1901,7 +1903,7 @@ define i32 @bzhi32_c4_commutative(i32 %v
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>>  ; X86-BMI1NOTBM-NEXT:    subl $8, %esp
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ecx, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %esi
>>> @@ -1915,18 +1917,18 @@ define i32 @bzhi32_c4_commutative(i32 %v
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bzhi32_c4_commutative:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    pushl %esi
>>> +; X86-BMI1BMI2-NEXT:    pushl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    subl $8, %esp
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> -; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X86-BMI1BMI2-NEXT:    negl %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %bl
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X86-BMI1BMI2-NEXT:    negb %al
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl %eax, (%esp)
>>>  ; X86-BMI1BMI2-NEXT:    calll use32
>>> -; X86-BMI1BMI2-NEXT:    bzhil %esi, {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ebx, {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1BMI2-NEXT:    addl $8, %esp
>>> -; X86-BMI1BMI2-NEXT:    popl %esi
>>> +; X86-BMI1BMI2-NEXT:    popl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>>  ; X64-NOBMI-LABEL: bzhi32_c4_commutative:
>>> @@ -1936,7 +1938,7 @@ define i32 @bzhi32_c4_commutative(i32 %v
>>>  ; X64-NOBMI-NEXT:    pushq %rax
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI-NEXT:    movl %edi, %ebx
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    movl $-1, %ebp
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %ebp
>>> @@ -1956,7 +1958,7 @@ define i32 @bzhi32_c4_commutative(i32 %v
>>>  ; X64-BMI1NOTBM-NEXT:    pushq %rax
>>>  ; X64-BMI1NOTBM-NEXT:    movl %esi, %ecx
>>>  ; X64-BMI1NOTBM-NEXT:    movl %edi, %ebx
>>> -; X64-BMI1NOTBM-NEXT:    negl %ecx
>>> +; X64-BMI1NOTBM-NEXT:    negb %cl
>>>  ; X64-BMI1NOTBM-NEXT:    movl $-1, %ebp
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-BMI1NOTBM-NEXT:    shrl %cl, %ebp
>>> @@ -1976,8 +1978,8 @@ define i32 @bzhi32_c4_commutative(i32 %v
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rax
>>>  ; X64-BMI1BMI2-NEXT:    movl %esi, %ebx
>>>  ; X64-BMI1BMI2-NEXT:    movl %edi, %ebp
>>> -; X64-BMI1BMI2-NEXT:    movl %esi, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movl $-1, %ecx
>>>  ; X64-BMI1BMI2-NEXT:    shrxl %eax, %ecx, %edi
>>>  ; X64-BMI1BMI2-NEXT:    callq use32
>>> @@ -2003,8 +2005,8 @@ define i64 @bzhi64_c0(i64 %val, i64 %num
>>>  ; X86-NOBMI-NEXT:    pushl %edi
>>>  ; X86-NOBMI-NEXT:    pushl %esi
>>>  ; X86-NOBMI-NEXT:    pushl %eax
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    movl $-1, %edi
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edi
>>> @@ -2034,8 +2036,8 @@ define i64 @bzhi64_c0(i64 %val, i64 %num
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %edi
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edi
>>> @@ -2065,8 +2067,8 @@ define i64 @bzhi64_c0(i64 %val, i64 %num
>>>  ; X86-BMI1BMI2-NEXT:    pushl %edi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %eax
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %esi, %edi
>>>  ; X86-BMI1BMI2-NEXT:    shrdl %cl, %esi, %esi
>>> @@ -2097,7 +2099,7 @@ define i64 @bzhi64_c0(i64 %val, i64 %num
>>>  ; X64-NOBMI-NEXT:    pushq %rax
>>>  ; X64-NOBMI-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %r14
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rbx
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> @@ -2117,7 +2119,7 @@ define i64 @bzhi64_c0(i64 %val, i64 %num
>>>  ; X64-BMI1NOTBM-NEXT:    pushq %rax
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rsi, %rcx
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rdi, %r14
>>> -; X64-BMI1NOTBM-NEXT:    negl %ecx
>>> +; X64-BMI1NOTBM-NEXT:    negb %cl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rbx
>>> @@ -2138,7 +2140,7 @@ define i64 @bzhi64_c0(i64 %val, i64 %num
>>>  ; X64-BMI1BMI2-NEXT:    movq %rsi, %rbx
>>>  ; X64-BMI1BMI2-NEXT:    movq %rdi, %r14
>>>  ; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rcx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rax, %rcx, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>>
>>> @@ -2318,26 +2320,26 @@ define i64 @bzhi64_c2_load(i64* %w, i64
>>>  ; X86-NOBMI-NEXT:    pushl %ebx
>>>  ; X86-NOBMI-NEXT:    pushl %edi
>>>  ; X86-NOBMI-NEXT:    pushl %esi
>>> -; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> -; X86-NOBMI-NEXT:    movl $-1, %edx
>>> +; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %edx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>> +; X86-NOBMI-NEXT:    movl $-1, %eax
>>>  ; X86-NOBMI-NEXT:    movl $-1, %ebx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %ebx
>>> -; X86-NOBMI-NEXT:    shrdl %cl, %edx, %edx
>>> +; X86-NOBMI-NEXT:    shrdl %cl, %eax, %eax
>>>  ; X86-NOBMI-NEXT:    testb $32, %cl
>>>  ; X86-NOBMI-NEXT:    je .LBB27_2
>>>  ; X86-NOBMI-NEXT:  # %bb.1:
>>> -; X86-NOBMI-NEXT:    movl %ebx, %edx
>>> +; X86-NOBMI-NEXT:    movl %ebx, %eax
>>>  ; X86-NOBMI-NEXT:    xorl %ebx, %ebx
>>>  ; X86-NOBMI-NEXT:  .LBB27_2:
>>> -; X86-NOBMI-NEXT:    movl (%eax), %esi
>>> -; X86-NOBMI-NEXT:    andl %edx, %esi
>>> -; X86-NOBMI-NEXT:    movl 4(%eax), %edi
>>> +; X86-NOBMI-NEXT:    movl (%edx), %esi
>>> +; X86-NOBMI-NEXT:    andl %eax, %esi
>>> +; X86-NOBMI-NEXT:    movl 4(%edx), %edi
>>>  ; X86-NOBMI-NEXT:    andl %ebx, %edi
>>>  ; X86-NOBMI-NEXT:    subl $8, %esp
>>>  ; X86-NOBMI-NEXT:    pushl %ebx
>>> -; X86-NOBMI-NEXT:    pushl %edx
>>> +; X86-NOBMI-NEXT:    pushl %eax
>>>  ; X86-NOBMI-NEXT:    calll use64
>>>  ; X86-NOBMI-NEXT:    addl $16, %esp
>>>  ; X86-NOBMI-NEXT:    movl %esi, %eax
>>> @@ -2352,26 +2354,26 @@ define i64 @bzhi64_c2_load(i64* %w, i64
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %edi
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>> -; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> -; X86-BMI1NOTBM-NEXT:    movl $-1, %edx
>>> +; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %edx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>> +; X86-BMI1NOTBM-NEXT:    movl $-1, %eax
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %ebx
>>> -; X86-BMI1NOTBM-NEXT:    shrdl %cl, %edx, %edx
>>> +; X86-BMI1NOTBM-NEXT:    shrdl %cl, %eax, %eax
>>>  ; X86-BMI1NOTBM-NEXT:    testb $32, %cl
>>>  ; X86-BMI1NOTBM-NEXT:    je .LBB27_2
>>>  ; X86-BMI1NOTBM-NEXT:  # %bb.1:
>>> -; X86-BMI1NOTBM-NEXT:    movl %ebx, %edx
>>> +; X86-BMI1NOTBM-NEXT:    movl %ebx, %eax
>>>  ; X86-BMI1NOTBM-NEXT:    xorl %ebx, %ebx
>>>  ; X86-BMI1NOTBM-NEXT:  .LBB27_2:
>>> -; X86-BMI1NOTBM-NEXT:    movl (%eax), %esi
>>> -; X86-BMI1NOTBM-NEXT:    andl %edx, %esi
>>> -; X86-BMI1NOTBM-NEXT:    movl 4(%eax), %edi
>>> +; X86-BMI1NOTBM-NEXT:    movl (%edx), %esi
>>> +; X86-BMI1NOTBM-NEXT:    andl %eax, %esi
>>> +; X86-BMI1NOTBM-NEXT:    movl 4(%edx), %edi
>>>  ; X86-BMI1NOTBM-NEXT:    andl %ebx, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    subl $8, %esp
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %ebx
>>> -; X86-BMI1NOTBM-NEXT:    pushl %edx
>>> +; X86-BMI1NOTBM-NEXT:    pushl %eax
>>>  ; X86-BMI1NOTBM-NEXT:    calll use64
>>>  ; X86-BMI1NOTBM-NEXT:    addl $16, %esp
>>>  ; X86-BMI1NOTBM-NEXT:    movl %esi, %eax
>>> @@ -2386,25 +2388,25 @@ define i64 @bzhi64_c2_load(i64* %w, i64
>>>  ; X86-BMI1BMI2-NEXT:    pushl %ebx
>>>  ; X86-BMI1BMI2-NEXT:    pushl %edi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> -; X86-BMI1BMI2-NEXT:    movl $-1, %edx
>>> -; X86-BMI1BMI2-NEXT:    shrxl %ecx, %edx, %ebx
>>> -; X86-BMI1BMI2-NEXT:    shrdl %cl, %edx, %edx
>>> +; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %edx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>> +; X86-BMI1BMI2-NEXT:    movl $-1, %eax
>>> +; X86-BMI1BMI2-NEXT:    shrxl %ecx, %eax, %ebx
>>> +; X86-BMI1BMI2-NEXT:    shrdl %cl, %eax, %eax
>>>  ; X86-BMI1BMI2-NEXT:    testb $32, %cl
>>>  ; X86-BMI1BMI2-NEXT:    je .LBB27_2
>>>  ; X86-BMI1BMI2-NEXT:  # %bb.1:
>>> -; X86-BMI1BMI2-NEXT:    movl %ebx, %edx
>>> +; X86-BMI1BMI2-NEXT:    movl %ebx, %eax
>>>  ; X86-BMI1BMI2-NEXT:    xorl %ebx, %ebx
>>>  ; X86-BMI1BMI2-NEXT:  .LBB27_2:
>>> -; X86-BMI1BMI2-NEXT:    movl (%eax), %esi
>>> -; X86-BMI1BMI2-NEXT:    andl %edx, %esi
>>> -; X86-BMI1BMI2-NEXT:    movl 4(%eax), %edi
>>> +; X86-BMI1BMI2-NEXT:    movl (%edx), %esi
>>> +; X86-BMI1BMI2-NEXT:    andl %eax, %esi
>>> +; X86-BMI1BMI2-NEXT:    movl 4(%edx), %edi
>>>  ; X86-BMI1BMI2-NEXT:    andl %ebx, %edi
>>>  ; X86-BMI1BMI2-NEXT:    subl $8, %esp
>>>  ; X86-BMI1BMI2-NEXT:    pushl %ebx
>>> -; X86-BMI1BMI2-NEXT:    pushl %edx
>>> +; X86-BMI1BMI2-NEXT:    pushl %eax
>>>  ; X86-BMI1BMI2-NEXT:    calll use64
>>>  ; X86-BMI1BMI2-NEXT:    addl $16, %esp
>>>  ; X86-BMI1BMI2-NEXT:    movl %esi, %eax
>>> @@ -2418,7 +2420,7 @@ define i64 @bzhi64_c2_load(i64* %w, i64
>>>  ; X64-NOBMI:       # %bb.0:
>>>  ; X64-NOBMI-NEXT:    pushq %rbx
>>>  ; X64-NOBMI-NEXT:    movq %rsi, %rcx
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rax
>>> @@ -2434,7 +2436,7 @@ define i64 @bzhi64_c2_load(i64* %w, i64
>>>  ; X64-BMI1NOTBM:       # %bb.0:
>>>  ; X64-BMI1NOTBM-NEXT:    pushq %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rsi, %rcx
>>> -; X64-BMI1NOTBM-NEXT:    negl %ecx
>>> +; X64-BMI1NOTBM-NEXT:    negb %cl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rax
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rax
>>> @@ -2450,8 +2452,8 @@ define i64 @bzhi64_c2_load(i64* %w, i64
>>>  ; X64-BMI1BMI2:       # %bb.0:
>>>  ; X64-BMI1BMI2-NEXT:    pushq %rbx
>>>  ; X64-BMI1BMI2-NEXT:    bzhiq %rsi, (%rdi), %rbx
>>> -; X64-BMI1BMI2-NEXT:    # kill: def $esi killed $esi killed $rsi def $rsi
>>> -; X64-BMI1BMI2-NEXT:    negl %esi
>>> +; X64-BMI1BMI2-NEXT:    # kill: def $sil killed $sil killed $rsi def $rsi
>>> +; X64-BMI1BMI2-NEXT:    negb %sil
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rax
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rsi, %rax, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>> @@ -2628,8 +2630,8 @@ define i64 @bzhi64_c4_commutative(i64 %v
>>>  ; X86-NOBMI-NEXT:    pushl %edi
>>>  ; X86-NOBMI-NEXT:    pushl %esi
>>>  ; X86-NOBMI-NEXT:    pushl %eax
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl $-1, %esi
>>>  ; X86-NOBMI-NEXT:    movl $-1, %edi
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %edi
>>> @@ -2659,8 +2661,8 @@ define i64 @bzhi64_c4_commutative(i64 %v
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %edi
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    movl $-1, %edi
>>>  ; X86-BMI1NOTBM-NEXT:    shrl %cl, %edi
>>> @@ -2690,8 +2692,8 @@ define i64 @bzhi64_c4_commutative(i64 %v
>>>  ; X86-BMI1BMI2-NEXT:    pushl %edi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    pushl %eax
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    movl $-1, %esi
>>>  ; X86-BMI1BMI2-NEXT:    shrxl %ecx, %esi, %edi
>>>  ; X86-BMI1BMI2-NEXT:    shrdl %cl, %esi, %esi
>>> @@ -2722,7 +2724,7 @@ define i64 @bzhi64_c4_commutative(i64 %v
>>>  ; X64-NOBMI-NEXT:    pushq %rax
>>>  ; X64-NOBMI-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %r14
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    movq $-1, %rbx
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rbx
>>> @@ -2742,7 +2744,7 @@ define i64 @bzhi64_c4_commutative(i64 %v
>>>  ; X64-BMI1NOTBM-NEXT:    pushq %rax
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rsi, %rcx
>>>  ; X64-BMI1NOTBM-NEXT:    movq %rdi, %r14
>>> -; X64-BMI1NOTBM-NEXT:    negl %ecx
>>> +; X64-BMI1NOTBM-NEXT:    negb %cl
>>>  ; X64-BMI1NOTBM-NEXT:    movq $-1, %rbx
>>>  ; X64-BMI1NOTBM-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-BMI1NOTBM-NEXT:    shrq %cl, %rbx
>>> @@ -2763,7 +2765,7 @@ define i64 @bzhi64_c4_commutative(i64 %v
>>>  ; X64-BMI1BMI2-NEXT:    movq %rsi, %rbx
>>>  ; X64-BMI1BMI2-NEXT:    movq %rdi, %r14
>>>  ; X64-BMI1BMI2-NEXT:    movl %ebx, %eax
>>> -; X64-BMI1BMI2-NEXT:    negl %eax
>>> +; X64-BMI1BMI2-NEXT:    negb %al
>>>  ; X64-BMI1BMI2-NEXT:    movq $-1, %rcx
>>>  ; X64-BMI1BMI2-NEXT:    shrxq %rax, %rcx, %rdi
>>>  ; X64-BMI1BMI2-NEXT:    callq use64
>>> @@ -2788,7 +2790,7 @@ define i32 @bzhi32_d0(i32 %val, i32 %num
>>>  ; X86-NOBMI:       # %bb.0:
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -2796,14 +2798,14 @@ define i32 @bzhi32_d0(i32 %val, i32 %num
>>>  ;
>>>  ; X86-BMI1NOTBM-LABEL: bzhi32_d0:
>>>  ; X86-BMI1NOTBM:       # %bb.0:
>>> -; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1NOTBM-NEXT:    movb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI1NOTBM-NEXT:    shll $8, %eax
>>>  ; X86-BMI1NOTBM-NEXT:    bextrl %eax, {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    retl
>>>  ;
>>>  ; X86-BMI1BMI2-LABEL: bzhi32_d0:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %al
>>>  ; X86-BMI1BMI2-NEXT:    bzhil %eax, {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>> @@ -2811,7 +2813,7 @@ define i32 @bzhi32_d0(i32 %val, i32 %num
>>>  ; X64-NOBMI:       # %bb.0:
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI-NEXT:    movl %edi, %eax
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -2890,7 +2892,7 @@ define i32 @bzhi32_d2_load(i32* %w, i32
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI-NEXT:    movl (%eax), %eax
>>>  ; X86-NOBMI-NEXT:    xorl %ecx, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X86-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X86-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -2899,7 +2901,7 @@ define i32 @bzhi32_d2_load(i32* %w, i32
>>>  ; X86-BMI1NOTBM-LABEL: bzhi32_d2_load:
>>>  ; X86-BMI1NOTBM:       # %bb.0:
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    shll $8, %ecx
>>>  ; X86-BMI1NOTBM-NEXT:    bextrl %ecx, (%eax), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    retl
>>> @@ -2907,15 +2909,15 @@ define i32 @bzhi32_d2_load(i32* %w, i32
>>>  ; X86-BMI1BMI2-LABEL: bzhi32_d2_load:
>>>  ; X86-BMI1BMI2:       # %bb.0:
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %ecx
>>> -; X86-BMI1BMI2-NEXT:    bzhil %eax, (%ecx), %eax
>>> +; X86-BMI1BMI2-NEXT:    movb {{[0-9]+}}(%esp), %cl
>>> +; X86-BMI1BMI2-NEXT:    bzhil %ecx, (%eax), %eax
>>>  ; X86-BMI1BMI2-NEXT:    retl
>>>  ;
>>>  ; X64-NOBMI-LABEL: bzhi32_d2_load:
>>>  ; X64-NOBMI:       # %bb.0:
>>>  ; X64-NOBMI-NEXT:    movl %esi, %ecx
>>>  ; X64-NOBMI-NEXT:    movl (%rdi), %eax
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    shll %cl, %eax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $ecx
>>>  ; X64-NOBMI-NEXT:    shrl %cl, %eax
>>> @@ -3003,8 +3005,8 @@ define i64 @bzhi64_d0(i64 %val, i64 %num
>>>  ; X86-NOBMI-NEXT:    pushl %esi
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %edx
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl %edx, %esi
>>>  ; X86-NOBMI-NEXT:    shll %cl, %esi
>>>  ; X86-NOBMI-NEXT:    shldl %cl, %edx, %eax
>>> @@ -3042,8 +3044,8 @@ define i64 @bzhi64_d0(i64 %val, i64 %num
>>>  ; X86-BMI1NOTBM-NEXT:    pushl %esi
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %edx
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl %edx, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    shll %cl, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    shldl %cl, %edx, %eax
>>> @@ -3080,8 +3082,8 @@ define i64 @bzhi64_d0(i64 %val, i64 %num
>>>  ; X86-BMI1BMI2-NEXT:    pushl %esi
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %esi
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    shldl %cl, %eax, %esi
>>>  ; X86-BMI1BMI2-NEXT:    shlxl %ecx, %eax, %edi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edx, %edx
>>> @@ -3110,7 +3112,7 @@ define i64 @bzhi64_d0(i64 %val, i64 %num
>>>  ; X64-NOBMI:       # %bb.0:
>>>  ; X64-NOBMI-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI-NEXT:    movq %rdi, %rax
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    shlq %cl, %rax
>>>  ; X64-NOBMI-NEXT:    # kill: def $cl killed $cl killed $rcx
>>>  ; X64-NOBMI-NEXT:    shrq %cl, %rax
>>> @@ -3281,8 +3283,8 @@ define i64 @bzhi64_d2_load(i64* %w, i64
>>>  ; X86-NOBMI-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-NOBMI-NEXT:    movl (%eax), %edx
>>>  ; X86-NOBMI-NEXT:    movl 4(%eax), %eax
>>> -; X86-NOBMI-NEXT:    movl $64, %ecx
>>> -; X86-NOBMI-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-NOBMI-NEXT:    movb $64, %cl
>>> +; X86-NOBMI-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-NOBMI-NEXT:    movl %edx, %esi
>>>  ; X86-NOBMI-NEXT:    shll %cl, %esi
>>>  ; X86-NOBMI-NEXT:    shldl %cl, %edx, %eax
>>> @@ -3321,8 +3323,8 @@ define i64 @bzhi64_d2_load(i64* %w, i64
>>>  ; X86-BMI1NOTBM-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1NOTBM-NEXT:    movl (%eax), %edx
>>>  ; X86-BMI1NOTBM-NEXT:    movl 4(%eax), %eax
>>> -; X86-BMI1NOTBM-NEXT:    movl $64, %ecx
>>> -; X86-BMI1NOTBM-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1NOTBM-NEXT:    movb $64, %cl
>>> +; X86-BMI1NOTBM-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1NOTBM-NEXT:    movl %edx, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    shll %cl, %esi
>>>  ; X86-BMI1NOTBM-NEXT:    shldl %cl, %edx, %eax
>>> @@ -3360,8 +3362,8 @@ define i64 @bzhi64_d2_load(i64* %w, i64
>>>  ; X86-BMI1BMI2-NEXT:    movl {{[0-9]+}}(%esp), %eax
>>>  ; X86-BMI1BMI2-NEXT:    movl (%eax), %edx
>>>  ; X86-BMI1BMI2-NEXT:    movl 4(%eax), %esi
>>> -; X86-BMI1BMI2-NEXT:    movl $64, %ecx
>>> -; X86-BMI1BMI2-NEXT:    subl {{[0-9]+}}(%esp), %ecx
>>> +; X86-BMI1BMI2-NEXT:    movb $64, %cl
>>> +; X86-BMI1BMI2-NEXT:    subb {{[0-9]+}}(%esp), %cl
>>>  ; X86-BMI1BMI2-NEXT:    shldl %cl, %edx, %esi
>>>  ; X86-BMI1BMI2-NEXT:    shlxl %ecx, %edx, %edi
>>>  ; X86-BMI1BMI2-NEXT:    xorl %edx, %edx
>>> @@ -3390,7 +3392,7 @@ define i64 @bzhi64_d2_load(i64* %w, i64
>>>  ; X64-NOBMI:       # %bb.0:
>>>  ; X64-NOBMI-NEXT:    movq %rsi, %rcx
>>>  ; X64-NOBMI-NEXT:    movq (%rdi), %rax
>>> -; X64-NOBMI-NEXT:    negl %ecx
>>> +; X64-NOBMI-NEXT:    negb %cl
>>>  ; X64-NOBMI-NEXT:    shlq %cl, %rax
>>
>>