[llvm-dev] Liveness of AL, AH and AX in x86 backend

Tue May 24 13:59:47 PDT 2016

Thanks Kevin.  This isn't exactly what I'm looking for, though.  Here ECX 
is explicitly defined, and CL/CH are only used.  I was interested in 
the opposite situation, where the sub-registers are defined separately 
and then the super-register is used as a whole.
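
For illustration, that situation could look like this at the C level. This is only a sketch: `bump_halves` is a made-up name, and nothing guarantees that a given compiler version will actually assign the two bytes to AL and AH rather than to full 32-bit registers.

```c
#include <stdint.h>

/*
 * Hypothetical test case (name and shape are illustrative only).
 * Each byte is loaded and modified on its own, then the two are
 * combined and used as a single 16-bit value.
 */
uint16_t bump_halves(const uint8_t *p) {
    uint8_t lo = (uint8_t)(p[0] + 1);  /* candidate for a low sub-register such as AL */
    uint8_t hi = (uint8_t)(p[1] + 1);  /* candidate for a high sub-register such as AH */

    /* Use of the combined value, which corresponds to reading AX as a whole. */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

Incrementing each half separately is meant to discourage the compiler from folding the two byte loads into one 16-bit load, but that is an assumption, not an observed result.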

Hopefully the sub-register liveness tracking is what I need, so the 
questions about x86 may become moot.
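
As a rough picture of what sub-register liveness tracking means here, the toy model below keeps one bit per independently tracked part of AX. The lane names and helper functions are invented for this sketch and are not LLVM's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lane masks: one bit per independently tracked part of AX. */
enum {
    LANE_AL = 1u << 0,
    LANE_AH = 1u << 1,
    LANE_AX = LANE_AL | LANE_AH  /* AX covers both lanes */
};

/* Records which lanes of the register currently hold a defined value. */
typedef struct {
    uint8_t defined_lanes;
} reg_state_t;

/* A definition of a (sub-)register marks all the lanes it covers. */
static void def_lanes(reg_state_t *r, uint8_t lanes) {
    r->defined_lanes |= lanes;
}

/* A use is covered only if every lane it reads has been defined. */
static bool use_is_covered(const reg_state_t *r, uint8_t lanes) {
    return (r->defined_lanes & lanes) == lanes;
}
```

In this model, defining AL alone does not cover a use of AX; defining AL and then AH does, with no separate definition of AX itself.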

-Krzysztof

On 5/24/2016 3:25 PM, Smith, Kevin B wrote:
> Here's some of the generated code from the current community head for bzip2.c from spec 256.bzip2, with these options:
>
> clang -m32 -S -O2 bzip2.c
>
> .LBB14_4:                               # %bsW.exit24
>         subl    %eax, %ebx
>         addl    $8, %eax
>         movl    %ebx, %ecx
>         movl    %eax, bsLive
>         shll    %cl, %edi
>         movl    %ebp, %ecx
>         orl     %esi, %edi
>         movzbl  %ch, %esi
>         cmpl    $8, %eax
>         movl    %edi, bsBuff
>         jl      .LBB14_6
>
> As you can see, it uses both cl and ch for different values in this basic block.  This occurs in the generated code for the routine bsPutUInt32.
>
> Kevin Smith
>
>> -----Original Message-----
>> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
>> Sent: Tuesday, May 24, 2016 1:03 PM
>> To: Krzysztof Parzyszek <kparzysz at codeaurora.org>
>> Cc: mats petersson <mats at planetcatfish.com>; Smith, Kevin B
>> <kevin.b.smith at intel.com>; llvm-dev at lists.llvm.org
>> Subject: Re: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>
>> Hi,
>>
>> Could you use "MIR" to forge the example you're looking for?
>>
>> --
>> Mehdi
>>
>>
>>> On May 24, 2016, at 10:10 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Then let me shift focus from performance to size.  With either optsize or
>>> minsize, the output is still the same.
>>>
>>> As per the subject, I'm not really interested in the quality of the final
>>> code, but in the way that the x86 target deals with the structural
>>> relationship between these registers.  Specifically, I'd like to see if it
>>> would generate implicit defs/uses for AX on defs/uses of AH/AL.  I looked
>>> in the X86 sources and I didn't find code that would make me certain, but
>>> I'm not too familiar with that backend.  Having a testcase to work with
>>> would make it a lot easier for me.
>>>
>>> -Krzysztof
>>>
>>>
>>> On 5/24/2016 12:03 PM, mats petersson wrote:
>>>> On several variants of x86 processors, mixing `ah`, `al` and `ax` as
>>>> source/destination in the same dependency chain incurs a penalty,
>>>> so for THOSE processors, there is a benefit to NOT using `al`
>>>> and `ah` to hold parts of `ax`. I believe this is because
>>>> the processor doesn't ACTUALLY see these as parts of a bigger
>>>> register internally, and will execute two independent dependency chains
>>>> UNTIL you start using `ax` as one register. At that point, the processor
>>>> has to make sure both dependency chains for `al` and `ah` have
>>>> completed, and that the merged value is available in `ax`. If the
>>>> code uses `cl` and `al` instead, this sort of problem is avoided.
>>>>
>>>> <<Quote from Intel Optimisation guide, page 3-44
>>>> http://www.intel.co.uk/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
>>>>
>>>> A partial register stall happens when an instruction refers to a
>>>> register, portions of which were previously modified by other
>>>> instructions. For example, partial register stalls occurs with a read
>>>> to AX while previous instructions stored AL and AH, or a read to EAX
>>>> while previous instruction modified AX.
>>>> The delay of a partial register stall is small in processors based on
>>>> Intel Core and NetBurst microarchitectures, and in Pentium M processor
>>>> (with CPUID signature family 6, model 13), Intel Core Solo, and Intel
>>>> Core Duo processors. Pentium M processors (CPUID signature with
>>>> family 6, model 9) and the P6 family incur a large penalty.
>>>> <<End quote>>
>>>>
>>>> So for compact code, yes, it's probably an advantage. For SOME
>>>> processors in the x86 range, not so good for performance.
>>>>
>>>> Whether LLVM has the information as to WHICH processor models have such
>>>> penalties (or better yet, can determine the amount of time lost for this
>>>> sort of operation), I'm not sure. It's obviously something that CAN be
>>>> programmed into a compiler, it's just a matter of understanding the
>>>> effort vs. reward factor for this particular type of optimisation,
>>>> compared to other things that could be done to improve the quality of
>>>> the code generated.
>>>>
>>>> --
>>>> Mats
>>>>
>>>> On 24 May 2016 at 17:09, Smith, Kevin B via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>    Try using x86 mode rather than Intel64 mode.  I have definitely
>>>>    gotten it to use both ah and al in 32 bit x86 code generation.
>>>>    In particular, I have seen that in loops for both the spec2000 and
>>>>    spec2006 versions of bzip.  It can happen, but only rarely.
>>>>
>>>>    Kevin Smith
>>>>
>>>>    >-----Original Message-----
>>>>    >From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>>>>    >Krzysztof Parzyszek via llvm-dev
>>>>    >Sent: Tuesday, May 24, 2016 8:04 AM
>>>>    >To: LLVM Dev <llvm-dev at lists.llvm.org>
>>>>    >Subject: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>>>    >
>>>>    >I'm trying to see how the x86 backend deals with the relationship
>>>>    >between AL, AH and AX, but I can't get it to generate any code that
>>>>    >would expose an interesting scenario.
>>>>    >
>>>>    >For example, I wrote this piece:
>>>>    >
>>>>    >typedef struct {
>>>>    >   char x, y;
>>>>    >} struct_t;
>>>>    >
>>>>    >struct_t z;
>>>>    >
>>>>    >struct_t foo(char *p) {
>>>>    >   struct_t s;
>>>>    >   s.x = *p++;
>>>>    >   s.y = *p;
>>>>    >   z = s;
>>>>    >   s.x++;
>>>>    >   return s;
>>>>    >}
>>>>    >
>>>>    >But the output at -O2 is
>>>>    >
>>>>    >foo:                                    # @foo
>>>>    >         .cfi_startproc
>>>>    ># BB#0:                                 # %entry
>>>>    >         movb    (%rdi), %al
>>>>    >         movzbl  1(%rdi), %ecx
>>>>    >         movb    %al, z(%rip)
>>>>    >         movb    %cl, z+1(%rip)
>>>>    >         incb    %al
>>>>    >         shll    $8, %ecx
>>>>    >         movzbl  %al, %eax
>>>>    >         orl     %ecx, %eax
>>>>    >         retq
>>>>    >
>>>>    >
>>>>    >I was hoping it would do something along the lines of
>>>>    >
>>>>    >   movb (%rdi), %al
>>>>    >   movb 1(%rdi), %ah
>>>>    >   movw %ax, z(%rip)
>>>>    >   incb %al
>>>>    >   retq
>>>>    >
>>>>    >
>>>>    >Why is the x86 backend not generating this code?  Does it know that
>>>>    >AH:AL = AX?
>>>>    >
>>>>    >-Krzysztof
>>>>    >
>>>>    >
>>>>    >
>>>>    >--
>>>>    >Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>    >hosted by The Linux Foundation
>>>>    >_______________________________________________
>>>>    >LLVM Developers mailing list
>>>>    >llvm-dev at lists.llvm.org
>>>>    >http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>>
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation