[llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
Kosov Pavel via llvm-dev
llvm-dev at lists.llvm.org
Mon Nov 25 22:51:32 PST 2019
Thank you!
I took a look at this method (ARMBaseInstrInfo::optimizeCompareInstr) and how it is used.
So, if I understood correctly, I need to add a new method to TargetInstrInfo (similar to optimizeCompareInstr, e.g. optimizeAddInstr) and implement it in AArch64InstrInfo.
This method should be able to transform code like this:
%47:gpr64 = ANDXrr %46:gpr64, %32:gpr64
%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common
%49:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv
to this form:
%47:gpr64 = ANDSXrr %46:gpr64, %32:gpr64, implicit-def $nzcv
%48:gpr64common = ORRXrr killed %47:gpr64, %28:gpr64common
Is everything correct?
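The transform asked about above can be sketched on a toy instruction list. This is only an illustration in plain C, not LLVM code: the types instr_t and opcode_t, the NO_REG marker for an unused result, and the function fold_ands_into_and are all hypothetical names invented for this sketch. It assumes SSA-style virtual registers within a single basic block (so source operands are never redefined) and assumes the result of the later ANDS is unused.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker: the instruction's result is unused (think XZR). */
#define NO_REG (-1)

/* Toy opcode set; OP_CSEL stands in for any instruction that reads flags. */
typedef enum { OP_AND, OP_ANDS, OP_ORR, OP_CSEL } opcode_t;

typedef struct {
    opcode_t op;
    int dst;          /* virtual register number, or NO_REG */
    int src1, src2;   /* virtual register numbers */
    bool erased;      /* set when the peephole deletes the instruction */
} instr_t;

static bool writes_flags(const instr_t *i) { return i->op == OP_ANDS; }
static bool reads_flags(const instr_t *i)  { return i->op == OP_CSEL; }

/* AND is commutative, so accept the operands in either order. */
static bool same_operands(const instr_t *a, const instr_t *b) {
    return (a->src1 == b->src1 && a->src2 == b->src2) ||
           (a->src1 == b->src2 && a->src2 == b->src1);
}

/* For each AND, look ahead for an ANDS with the same operands and an unused
 * result. If no instruction in between reads or writes the flags, turn the
 * AND into ANDS and erase the later instruction. Returns the number of
 * erased instructions. */
static size_t fold_ands_into_and(instr_t *code, size_t n) {
    size_t folded = 0;
    for (size_t i = 0; i < n; ++i) {
        if (code[i].erased || code[i].op != OP_AND)
            continue;
        for (size_t j = i + 1; j < n; ++j) {
            if (code[j].erased)
                continue;
            if (code[j].op == OP_ANDS && code[j].dst == NO_REG &&
                same_operands(&code[i], &code[j])) {
                code[i].op = OP_ANDS;   /* the AND now also sets the flags */
                code[j].erased = true;  /* the flag-only ANDS is redundant */
                ++folded;
                break;
            }
            /* Any other flag reader or writer makes the fold unsafe. */
            if (reads_flags(&code[j]) || writes_flags(&code[j]))
                break;
        }
    }
    return folded;
}
```

A real implementation would work on MachineInstr objects and would also have to handle flags that are live across basic blocks; the sketch only captures the local legality check.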
From: Eli Friedman [mailto:efriedma at quicinc.com]
Sent: Friday, November 22, 2019 9:53 PM
To: Kosov Pavel <kosov.pavel at huawei.com>; LLVM Dev <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
You probably want to do this some time before register allocation, so you don't have to worry about physical register definitions.
Maybe take a look at what ARM does in ARMBaseInstrInfo::optimizeCompareInstr?
-Eli
From: Kosov Pavel <kosov.pavel at huawei.com>
Sent: Friday, November 22, 2019 3:09 AM
To: Eli Friedman <efriedma at quicinc.com>; LLVM Dev <llvm-dev at lists.llvm.org>
Subject: [EXT] RE: [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
Ok, thank you, I will implement it then.
As far as I can see, this optimization should be done in AArch64LoadStoreOptimizer. Is that right?
From: Eli Friedman [mailto:efriedma at quicinc.com]
Sent: Thursday, November 21, 2019 11:55 PM
To: Kosov Pavel <kosov.pavel at huawei.com>; LLVM Dev <llvm-dev at lists.llvm.org>
Subject: RE: [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
That transform is legal; it's a missed optimization.
-Eli
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Kosov Pavel via llvm-dev
Sent: Thursday, November 21, 2019 2:00 AM
To: llvm-dev at lists.llvm.org
Subject: [EXT] [llvm-dev] [ARM] Peephole optimization ( instructions tst + add )
Hello!
I noticed that in some cases clang generates a sequence of AND + TST instructions, for example:
AND x3, x2, x1
TST x2, x1
I think these instructions should be merged into one:
ANDS x3, x2, x1
(because TST <Xn>, <Xm> is an alias for ANDS XZR, <Xn>, <Xm>; see https://static.docs.arm.com/ddi0596/a/DDI_0596_ARM_a64_instruction_set_architecture.pdf)
Is this a missed optimization, or could such a merge have some negative effect?
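The alias relationship can be illustrated with a small C model of the flag behaviour. This is a sketch under assumptions, not taken from the original message: nzcv_t, logical_flags, ands and tst are hypothetical helper names, a NULL destination pointer stands in for XZR, and only the 64-bit register form is modelled. For ANDS, N is the top bit of the result, Z is set when the result is zero, and C and V are cleared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the AArch64 NZCV condition flags. */
typedef struct {
    bool n, z, c, v;
} nzcv_t;

/* Flags for a 64-bit logical result: N = bit 63, Z = (result == 0),
 * C and V are always cleared. */
static nzcv_t logical_flags(uint64_t result) {
    nzcv_t f = { (result >> 63) != 0, result == 0, false, false };
    return f;
}

/* ANDS Xd, Xn, Xm: store the result (unless xd is NULL, which models
 * writing to XZR) and return the resulting flags. */
static nzcv_t ands(uint64_t *xd, uint64_t xn, uint64_t xm) {
    uint64_t result = xn & xm;
    if (xd != NULL)
        *xd = result;
    return logical_flags(result);
}

/* TST Xn, Xm is ANDS XZR, Xn, Xm: same flags, result discarded. */
static nzcv_t tst(uint64_t xn, uint64_t xm) {
    return ands(NULL, xn, xm);
}
```

In this model a single ands call yields both the value that AND would have written to x3 and the flags that TST would have set, which is the basis for the proposed merge.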
Best regards
Pavel
PS: Code sample (it could probably be reduced significantly):
(clang -target aarch64 sample.c -S -O2 -o sample.S)
=========================================================================
#define NULL ((void*)0)

typedef struct {
    unsigned long *res_in;
    unsigned long *proc;
} fd_set_bits;

fd_set_bits *gv_fds;
int g_max_i;
int LOOP_ITERS_COUNT;
unsigned DEF_MASK;

__attribute__((noinline)) int do_test(const int max_iters_count,
                                      const unsigned long in,
                                      const unsigned long out,
                                      const unsigned long ex,
                                      const unsigned long bit_init_val,
                                      const unsigned long mask) {
    int retval = 0;
    for (int k = 0; k < max_iters_count; k++)
    {
        fd_set_bits *fds = gv_fds;
        for (int j = 0; j < LOOP_ITERS_COUNT; ++j)
        {
            if (in) {
                retval++;
                fds->proc = NULL;
            }
            if (mask & DEF_MASK) {
                fds->proc = NULL;
            }
        }
    }
    return retval;
}
=========================================================================