[llvm-dev] Liveness of AL, AH and AX in x86 backend

Tue May 24 14:17:59 PDT 2016

Sorry, I misunderstood what you were looking for.  I have never seen the x86 backend create separate defs of
ah and al and then use ax as the combined value.  I tried to come up with such cases and never found any.  I suspect
that the current code generator can never produce such code.

Kevin

>-----Original Message-----
>From: Krzysztof Parzyszek [mailto:kparzysz at codeaurora.org]
>Sent: Tuesday, May 24, 2016 2:00 PM
>To: Smith, Kevin B <kevin.b.smith at intel.com>; 'mehdi.amini at apple.com'
><mehdi.amini at apple.com>
>Cc: mats petersson <mats at planetcatfish.com>; llvm-dev at lists.llvm.org
>Subject: Re: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>
>Thanks Kevin.  This isn't exactly what I'm looking for, though.  The ECX
>is explicitly defined here and CL/CH are only used.  I was interested in
>the opposite situation---where the sub-registers are defined separately
>and then the super-register is used as a whole.
>
>Hopefully the sub-register liveness tracking is what I need, so the
>questions about x86 may become moot.
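
One way to picture sub-register liveness tracking is a backward scan over lane masks. The following is a minimal C sketch under stated assumptions: the lane constants, `inst_t` and `compute_liveness` are hypothetical names made up for illustration, not LLVM's actual data structures, and only a single register (AX, split into AL and AH) is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lane masks for the 16-bit register AX. */
enum { LANE_AL = 0x1, LANE_AH = 0x2, LANE_AX = LANE_AL | LANE_AH };

/* One instruction: which lanes of AX it defines and which it reads. */
typedef struct {
    unsigned defs;
    unsigned uses;
} inst_t;

/* Backward scan over straight-line code:
 *   live_before = (live_after & ~defs) | uses
 * Stores the lanes live just before each instruction in live_in[] and
 * returns the lanes live on entry to the sequence. */
static unsigned compute_liveness(const inst_t *code, size_t n,
                                 unsigned live_out, unsigned *live_in) {
    unsigned live = live_out;
    for (size_t i = n; i-- > 0;) {
        live = (live & ~code[i].defs) | code[i].uses;
        live_in[i] = live;
    }
    return live;
}
```

For the sequence "def AL, def AH, use AX" with nothing live out, this reports that only AL is live between the two defs and that both lanes are live just before the use. That is the case described above, where the sub-registers are defined separately and the super-register is then used as a whole.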
>
>-Krzysztof
>
>
>
>On 5/24/2016 3:25 PM, Smith, Kevin B wrote:
>> Here's some of the generated code from the current community head for
>> bzip2.c from spec 256.bzip2, with these options:
>>
>> clang -m32 -S   -O2      bzip2.c
>>
>> .LBB14_4:                               # %bsW.exit24
>>         subl    %eax, %ebx
>>         addl    $8, %eax
>>         movl    %ebx, %ecx
>>         movl    %eax, bsLive
>>         shll    %cl, %edi
>>         movl    %ebp, %ecx
>>         orl     %esi, %edi
>>         movzbl  %ch, %esi
>>         cmpl    $8, %eax
>>         movl    %edi, bsBuff
>>         jl      .LBB14_6
>>
>> As you can see, it is using both cl and ch for different values in this
>> basic block.  This occurs in the generated code for the routine bsPutUInt32.
>>
>> Kevin Smith
>>
>>> -----Original Message-----
>>> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
>>> Sent: Tuesday, May 24, 2016 1:03 PM
>>> To: Krzysztof Parzyszek <kparzysz at codeaurora.org>
>>> Cc: mats petersson <mats at planetcatfish.com>; Smith, Kevin B
>>> <kevin.b.smith at intel.com>; llvm-dev at lists.llvm.org
>>> Subject: Re: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>>
>>> Hi,
>>>
>>> Could you use "MIR" to forge the example you're looking for?
>>>
>>> --
>>> Mehdi
>>>
>>>
>>>> On May 24, 2016, at 10:10 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Then let me shift focus from performance to size.  With either optsize or
>>>> minsize, the output is still the same.
>>>>
>>>> As per the subject, I'm not really interested in the quality of the final
>>>> code, but in the way that the x86 target deals with the structural
>>>> relationship between these registers.  Specifically, I'd like to see if it
>>>> would generate implicit defs/uses for AX on defs/uses of AH/AL.  I looked in
>>>> the X86 sources and I didn't find code that would make me certain, but I'm
>>>> not too familiar with that backend.  Having a testcase to work with would
>>>> make it a lot easier for me.
>>>>
>>>> -Krzysztof
>>>>
>>>>
>>>> On 5/24/2016 12:03 PM, mats petersson wrote:
>>>>> On several variants of x86 processors, mixing `ah`, `al` and `ax` as
>>>>> source/destination in the same dependency chain incurs a penalty, so
>>>>> for THOSE processors there is a benefit to NOT using `al` and `ah` to
>>>>> hold parts of `ax`. I believe this is because the processor doesn't
>>>>> ACTUALLY see these as parts of a bigger register internally, and will
>>>>> execute two independent dependency chains UNTIL you start using `ax`
>>>>> as one register. At that point, the processor has to make sure that
>>>>> both dependency chains, for `al` and for `ah`, have completed and that
>>>>> the merged value is available in `ax`. If the code uses `cl` and `al`
>>>>> instead, this sort of problem is avoided.
>>>>>
>>>>> <<Quote from Intel Optimisation guide, page 3-44>>
>>>>> http://www.intel.co.uk/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
>>>>>
>>>>> A partial register stall happens when an instruction refers to a
>>>>> register, portions of which were previously modified by other
>>>>> instructions. For example, partial register stalls occurs with a read
>>>>> to AX while previous instructions stored AL and AH, or a read to EAX
>>>>> while previous instruction modified AX. The delay of a partial
>>>>> register stall is small in processors based on Intel Core and NetBurst
>>>>> microarchitectures, and in Pentium M processor (with CPUID signature
>>>>> family 6, model 13), Intel Core Solo, and Intel Core Duo processors.
>>>>> Pentium M processors (CPUID signature with family 6, model 9) and the
>>>>> P6 family incur a large penalty.
>>>>> <<End quote>>
>>>>>
>>>>> So for compact code, yes, it's probably an advantage. For SOME
>>>>> processors in the x86 range, not so good for performance.
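
The rule in the quoted passage can be sketched as a toy model in C. This is a simplification under stated assumptions, not a description of any real microarchitecture: it splits EAX into three hypothetical lanes (AL, AH, upper 16 bits), remembers which instruction last wrote each lane, and flags a read whose lanes were last written by different instructions. All names (`tracker_t`, `tracker_write`, `tracker_read_stalls`) are made up for illustration, and the model says nothing about the size of the penalty.

```c
#include <assert.h>

/* Hypothetical lanes of EAX: AL, AH, and the upper 16 bits. */
enum {
    L_AL = 0x1, L_AH = 0x2, L_HI16 = 0x4,
    L_AX = L_AL | L_AH, L_EAX = L_AX | L_HI16
};
#define NUM_LANES 3

/* Simplified tracker: remembers which instruction last wrote each lane. */
typedef struct {
    int writer[NUM_LANES]; /* id of the last instruction to write the lane */
    int next_id;           /* id handed to the next writing instruction    */
} tracker_t;

/* Start with all lanes written by the same (imaginary) instruction 0. */
static void tracker_init(tracker_t *t) {
    for (int i = 0; i < NUM_LANES; ++i) t->writer[i] = 0;
    t->next_id = 1;
}

/* Record one instruction that writes the given lanes. */
static void tracker_write(tracker_t *t, unsigned lanes) {
    int id = t->next_id++;
    for (int i = 0; i < NUM_LANES; ++i)
        if (lanes & (1u << i)) t->writer[i] = id;
}

/* Returns 1 if reading `lanes` would combine the results of different
 * writes (the "partial register" case in this toy model), else 0. */
static int tracker_read_stalls(const tracker_t *t, unsigned lanes) {
    int first = -1; /* writer of the first lane seen; ids are never negative */
    for (int i = 0; i < NUM_LANES; ++i) {
        if (!(lanes & (1u << i))) continue;
        if (first < 0) first = t->writer[i];
        else if (t->writer[i] != first) return 1;
    }
    return 0;
}
```

Under this model, writing AL and AH with separate instructions and then reading AX is flagged, as is writing AX and then reading EAX. Reading back exactly what a single instruction wrote is not flagged.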
>>>>>
>>>>> Whether LLVM has the information as to WHICH processor models have such
>>>>> penalties (or better yet, can determine the amount of time lost for this
>>>>> sort of operation), I'm not sure. It's obviously something that CAN be
>>>>> programmed into a compiler, it's just a matter of understanding the
>>>>> effort vs. reward factor for this particular type of optimisation,
>>>>> compared to other things that could be done to improve the quality of
>>>>> the code generated.
>>>>>
>>>>> --
>>>>> Mats
>>>>>
>>>>> On 24 May 2016 at 17:09, Smith, Kevin B via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>    Try using x86 mode rather than Intel64 mode.  I have definitely
>>>>>    gotten it to use both ah and al in 32 bit x86 code generation.
>>>>>    In particular, I have seen that in loops for both the spec2000 and
>>>>>    spec2006 versions of bzip.  It can happen, but only rarely.
>>>>>
>>>>>    Kevin Smith
>>>>>
>>>>>    >-----Original Message-----
>>>>>    >From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>>>>>    >Krzysztof Parzyszek via llvm-dev
>>>>>    >Sent: Tuesday, May 24, 2016 8:04 AM
>>>>>    >To: LLVM Dev <llvm-dev at lists.llvm.org>
>>>>>    >Subject: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>>>>    >
>>>>>    >I'm trying to see how the x86 backend deals with the relationship
>>>>>    >between AL, AH and AX, but I can't get it to generate any code that
>>>>>    >would expose an interesting scenario.
>>>>>    >
>>>>>    >For example, I wrote this piece:
>>>>>    >
>>>>>    >typedef struct {
>>>>>    >   char x, y;
>>>>>    >} struct_t;
>>>>>    >
>>>>>    >struct_t z;
>>>>>    >
>>>>>    >struct_t foo(char *p) {
>>>>>    >   struct_t s;
>>>>>    >   s.x = *p++;
>>>>>    >   s.y = *p;
>>>>>    >   z = s;
>>>>>    >   s.x++;
>>>>>    >   return s;
>>>>>    >}
>>>>>    >
>>>>>    >But the output at -O2 is
>>>>>    >
>>>>>    >foo:                                    # @foo
>>>>>    >         .cfi_startproc
>>>>>    ># BB#0:                                 # %entry
>>>>>    >         movb    (%rdi), %al
>>>>>    >         movzbl  1(%rdi), %ecx
>>>>>    >         movb    %al, z(%rip)
>>>>>    >         movb    %cl, z+1(%rip)
>>>>>    >         incb    %al
>>>>>    >         shll    $8, %ecx
>>>>>    >         movzbl  %al, %eax
>>>>>    >         orl     %ecx, %eax
>>>>>    >         retq
>>>>>    >
>>>>>    >
>>>>>    >I was hoping it would do something along the lines of
>>>>>    >
>>>>>    >   movb (%rdi), %al
>>>>>    >   movb 1(%rdi), %ah
>>>>>    >   movw %ax, z(%rip)
>>>>>    >   incb %al
>>>>>    >   retq
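
As a rough check that the two sequences above compute the same return value, here is a minimal C sketch that models AX as a pair of bytes. The names `reg_ax_t`, `read_ax`, `foo_shift_or`, `foo_subreg` and `sequences_agree` are hypothetical, and the stores to `z` are left out because they do not affect the value returned in AX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of AX: two independently writable bytes. */
typedef struct {
    uint8_t al; /* bits 0..7 of AX  */
    uint8_t ah; /* bits 8..15 of AX */
} reg_ax_t;

/* Read the combined 16-bit value, i.e. AH:AL viewed as AX. */
static uint16_t read_ax(reg_ax_t r) {
    return (uint16_t)(((uint16_t)r.ah << 8) | r.al);
}

/* Mirrors the emitted sequence: zero-extend, shift, OR. */
static uint16_t foo_shift_or(const uint8_t *p) {
    uint32_t eax = p[0];          /* movb   (%rdi), %al            */
    uint32_t ecx = p[1];          /* movzbl 1(%rdi), %ecx          */
    eax = (uint8_t)(eax + 1);     /* incb %al ; movzbl %al, %eax   */
    ecx <<= 8;                    /* shll   $8, %ecx               */
    return (uint16_t)(eax | ecx); /* orl    %ecx, %eax             */
}

/* Mirrors the hoped-for sequence: define AL and AH, then use AX whole. */
static uint16_t foo_subreg(const uint8_t *p) {
    reg_ax_t r;
    r.al = p[0];                  /* movb (%rdi), %al  */
    r.ah = p[1];                  /* movb 1(%rdi), %ah */
    r.al = (uint8_t)(r.al + 1);   /* incb %al          */
    return read_ax(r);            /* AX holds the returned struct */
}

/* Exhaustively compare both models over all input byte pairs. */
static int sequences_agree(void) {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            uint8_t buf[2] = {(uint8_t)x, (uint8_t)y};
            if (foo_shift_or(buf) != foo_subreg(buf)) return 0;
        }
    }
    return 1;
}
```

The model only captures the values involved; it says nothing about how the backend represents the defs of AL and AH or the use of AX.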
>>>>>    >
>>>>>    >
>>>>>    >Why is the x86 backend not getting this code?  Does it know that
>>>>>    >AH:AL = AX?
>>>>>    >
>>>>>    >-Krzysztof
>>>>>    >
>>>>>    >
>>>>>    >
>>>>>    >--
>>>>>    >Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>>    >hosted by The Linux Foundation
>>>>>    >_______________________________________________
>>>>>    >LLVM Developers mailing list
>>>>>    >llvm-dev at lists.llvm.org
>>>>>    >http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>>
>>
>
>