[llvm-dev] Liveness of AL, AH and AX in x86 backend
Krzysztof Parzyszek via llvm-dev
llvm-dev at lists.llvm.org
Tue May 24 13:59:47 PDT 2016
Thanks Kevin. This isn't exactly what I'm looking for, though. ECX is
explicitly defined here, and CL/CH are only used. I was interested in
the opposite situation, where the sub-registers are defined separately
and then the super-register is used as a whole.
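To make the two situations concrete, here is a minimal C sketch. It only
models the aliasing between a 16-bit register and its two 8-bit halves; it
is not taken from the x86 backend, and the type reg16_t and the helpers
def_lo, def_hi, def_whole, use_hi and use_whole are hypothetical names
introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit register (think AX) whose halves
   can be written independently (lo ~ AL, hi ~ AH). */
typedef struct {
    uint8_t lo;
    uint8_t hi;
} reg16_t;

/* Define only the low half; the high half is left untouched. */
static void def_lo(reg16_t *r, uint8_t v) { r->lo = v; }

/* Define only the high half; the low half is left untouched. */
static void def_hi(reg16_t *r, uint8_t v) { r->hi = v; }

/* Define the whole register at once. */
static void def_whole(reg16_t *r, uint16_t v) {
    r->lo = (uint8_t)(v & 0xff);
    r->hi = (uint8_t)(v >> 8);
}

/* Use only the high half. */
static uint8_t use_hi(const reg16_t *r) { return r->hi; }

/* Use the register as a whole: the value is hi:lo. */
static uint16_t use_whole(const reg16_t *r) {
    return (uint16_t)((r->hi << 8) | r->lo);
}
```

In this model, the quoted block corresponds to def_whole followed by
use_hi, while the case of interest is def_lo and def_hi followed by
use_whole.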
Hopefully the sub-register liveness tracking is what I need, so the
questions about x86 may become moot.
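As a rough illustration of what tracking liveness per sub-register could
look like, here is a small C sketch of a backward liveness scan over lane
masks. It is a sketch under stated assumptions, not LLVM's implementation:
the lane constants, the inst_t type and compute_liveness are hypothetical,
and each instruction is reduced to the lanes of AX that it writes and reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lane masks: one bit per independently tracked part of AX. */
enum {
    LANE_AL = 1u << 0,
    LANE_AH = 1u << 1,
    LANE_AX = LANE_AL | LANE_AH
};

/* One instruction, reduced to the lanes of AX it defines and uses. */
typedef struct {
    unsigned defs;
    unsigned uses;
} inst_t;

/* Backward scan over a straight-line sequence.
   For each instruction: live_before = (live_after & ~defs) | uses.
   live_before[i] receives the lanes live just before instruction i;
   the return value is the set of lanes live on entry. */
static unsigned compute_liveness(const inst_t *code, size_t n,
                                 unsigned live_out, unsigned *live_before) {
    unsigned live = live_out;
    for (size_t i = n; i-- > 0;) {
        live = (live & ~code[i].defs) | code[i].uses;
        live_before[i] = live;
    }
    return live;
}
```

For the sequence "define AL, define AH, use AX" with nothing live
afterwards, this scan reports only the AL lane as live between the two
definitions and no lane live on entry, so no implicit def or use of the
whole register is needed to describe it.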
-Krzysztof
On 5/24/2016 3:25 PM, Smith, Kevin B wrote:
> Here's some of the generated code from the current community head for bzip2.c from spec 256.bzip2, with these options:
>
> clang -m32 -S -O2 bzip2.c
>
> .LBB14_4: # %bsW.exit24
> subl %eax, %ebx
> addl $8, %eax
> movl %ebx, %ecx
> movl %eax, bsLive
> shll %cl, %edi
> movl %ebp, %ecx
> orl %esi, %edi
> movzbl %ch, %esi
> cmpl $8, %eax
> movl %edi, bsBuff
> jl .LBB14_6
>
> As you can see, it is using both cl and ch for different values in this basic block. This occurs in the generated code for the routine bsPutUInt32.
>
> Kevin Smith
>
>> -----Original Message-----
>> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
>> Sent: Tuesday, May 24, 2016 1:03 PM
>> To: Krzysztof Parzyszek <kparzysz at codeaurora.org>
>> Cc: mats petersson <mats at planetcatfish.com>; Smith, Kevin B
>> <kevin.b.smith at intel.com>; llvm-dev at lists.llvm.org
>> Subject: Re: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>
>> Hi,
>>
>> Could you use "MIR" to forge the example you're looking for?
>>
>> --
>> Mehdi
>>
>>
>>> On May 24, 2016, at 10:10 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>
>>> Then let me shift focus from performance to size. With either optsize or
>>> minsize, the output is still the same.
>>>
>>> As per the subject, I'm not really interested in the quality of the final code,
>>> but in the way that the x86 target deals with the structural relationship
>>> between these registers. Specifically, I'd like to see if it would generate
>>> implicit defs/uses for AX on defs/uses of AH/AL. I looked in the X86
>>> sources and I didn't find code that would make me certain, but I'm not too
>>> familiar with that backend. Having a testcase to work with would make it a lot
>>> easier for me.
>>>
>>> -Krzysztof
>>>
>>>
>>> On 5/24/2016 12:03 PM, mats petersson wrote:
>>>> On several variants of x86 processors, mixing `ah`, `al` and `ax` as
>>>> source/destination in the same dependency chain will incur some
>>>> penalties, so for THOSE processors, there is a benefit to NOT using `al`
>>>> and `ah` to reflect parts of `ax` - I believe this is caused by the fact
>>>> that the processor doesn't ACTUALLY see these as parts of a bigger
>>>> register internally, and will execute two independent dependency chains,
>>>> UNTIL you start using `ax` as one register. At this point, the processor
>>>> has to make sure both dependency chains for `al` and `ah` have
>>>> completed, and that the merged value is available in `ax`. If the
>>>> code uses `cl` and `al` instead, this sort of problem is avoided.
>>>>
>>>> <<Quote from Intel Optimisation guide, page 3-44
>>>> http://www.intel.co.uk/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf
>>>>
>>>> A partial register stall happens when an instruction refers to a
>>>> register, portions of which were previously modified by other
>>>> instructions. For example, partial register stalls occurs with a read
>>>> to AX while previous instructions stored AL and AH, or a read to EAX
>>>> while previous instruction modified AX. The delay of a partial
>>>> register stall is small in processors based on Intel Core and
>>>> NetBurst microarchitectures, and in Pentium M processor (with CPUID
>>>> signature family 6, model 13), Intel Core Solo, and Intel Core Duo
>>>> processors. Pentium M processors (CPUID signature with family 6,
>>>> model 9) and the P6 family incur a large penalty.
>>>> <<End quote>>
>>>>
>>>> So for compact code, yes, it's probably an advantage. For SOME
>>>> processors in the x86 range, not so good for performance.
>>>>
>>>> Whether LLVM has the information as to WHICH processor models have such
>>>> penalties (or better yet, can determine the amount of time lost for this
>>>> sort of operation), I'm not sure. It's obviously something that CAN be
>>>> programmed into a compiler, it's just a matter of understanding the
>>>> effort vs. reward factor for this particular type of optimisation,
>>>> compared to other things that could be done to improve the quality of
>>>> the code generated.
>>>>
>>>> --
>>>> Mats
>>>>
>>>> On 24 May 2016 at 17:09, Smith, Kevin B via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Try using x86 mode rather than Intel64 mode. I have definitely
>>>> gotten it to use both ah and al in 32 bit x86 code generation.
>>>> In particular, I have seen that in loops for both the spec2000 and
>>>> spec2006 versions of bzip. It can happen, but only rarely.
>>>>
>>>> Kevin Smith
>>>>
>>>> >-----Original Message-----
>>>> >From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of
>>>> >Krzysztof Parzyszek via llvm-dev
>>>> >Sent: Tuesday, May 24, 2016 8:04 AM
>>>> >To: LLVM Dev <llvm-dev at lists.llvm.org>
>>>> >Subject: [llvm-dev] Liveness of AL, AH and AX in x86 backend
>>>> >
>>>> >I'm trying to see how the x86 backend deals with the relationship
>>>> >between AL, AH and AX, but I can't get it to generate any code that
>>>> >would expose an interesting scenario.
>>>> >
>>>> >For example, I wrote this piece:
>>>> >
>>>> >typedef struct {
>>>> > char x, y;
>>>> >} struct_t;
>>>> >
>>>> >struct_t z;
>>>> >
>>>> >struct_t foo(char *p) {
>>>> > struct_t s;
>>>> > s.x = *p++;
>>>> > s.y = *p;
>>>> > z = s;
>>>> > s.x++;
>>>> > return s;
>>>> >}
>>>> >
>>>> >But the output at -O2 is
>>>> >
>>>> >foo: # @foo
>>>> > .cfi_startproc
>>>> ># BB#0: # %entry
>>>> > movb (%rdi), %al
>>>> > movzbl 1(%rdi), %ecx
>>>> > movb %al, z(%rip)
>>>> > movb %cl, z+1(%rip)
>>>> > incb %al
>>>> > shll $8, %ecx
>>>> > movzbl %al, %eax
>>>> > orl %ecx, %eax
>>>> > retq
>>>> >
>>>> >
>>>> >I was hoping it would do something along the lines of
>>>> >
>>>> > movb (%rdi), %al
>>>> > movb 1(%rdi), %ah
>>>> > movw %ax, z(%rip)
>>>> > incb %al
>>>> > retq
>>>> >
>>>> >
>>>> >Why is the x86 backend not getting this code? Does it know that
>>>> >AH:AL = AX?
>>>> >
>>>> >-Krzysztof
>>>> >
>>>> >
>>>> >
>>>> >--
>>>> >Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>> >hosted by The Linux Foundation
>>>> >_______________________________________________
>>>> >LLVM Developers mailing list
>>>> >llvm-dev at lists.llvm.org
>>>> >http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation