[PATCH] D94015: [SCEV] Replace cttz loop by call to cttz intrinsic.
Francis laniel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jan 4 09:17:34 PST 2021
eiffel created this revision.
eiffel added a reviewer: reames.
Herald added a subscriber: hiraditya.
eiffel requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
Hi.
First, I hope you and your relatives are doing well.
I wrote a patch that fixes bug 17128 <https://bugs.llvm.org/show_bug.cgi?id=17128>.
The goal of this patch is to replace snippets such as:
int cttz(unsigned long x){
    unsigned long i = 0;
    while(i < 64 && (((x >> i) & 0x1) == 0))
        i++;
    return i;
}
with calls to the LLVM `cttz` intrinsic, which can then be lowered to the corresponding assembly instruction if the architecture has one.
In my case, the intrinsic was lowered to the `bsfq` instruction.
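As a rough illustration of the intended equivalence (not code from the patch), the loop can be compared at the C level with the GCC/Clang builtin `__builtin_ctzl`. The names `ULONG_BITS`, `cttz_loop` and `cttz_builtin` are hypothetical helpers introduced only for this sketch. It assumes a GCC-compatible compiler, and it replaces the constant 64 with the actual width of `unsigned long` so that it also behaves on targets where `long` is 32 bits. The builtin is undefined for 0, so that case is guarded to match the loop, which returns the bit width when no bit is set.

```c
#include <limits.h>

/* Width of unsigned long in bits (64 on LP64 targets, as the loop assumes).
   Hypothetical helper macro, not part of the patch. */
#define ULONG_BITS ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Loop form from the message, with 64 replaced by ULONG_BITS. */
unsigned cttz_loop(unsigned long x)
{
    unsigned i = 0;
    /* The shift is only evaluated while i < ULONG_BITS (short-circuit). */
    while (i < ULONG_BITS && ((x >> i) & 0x1UL) == 0)
        i++;
    return i;
}

/* Builtin form. __builtin_ctzl(0) is undefined, so zero is handled
   explicitly and yields the bit width, like the loop above. */
unsigned cttz_builtin(unsigned long x)
{
    return x ? (unsigned)__builtin_ctzl(x) : ULONG_BITS;
}
```

Both functions should agree for every input, including 0. The zero case is the one a transformation of this kind has to get right, since the loop is well defined there while a count-trailing-zeros operation may not be.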
To check that the patch works, I wrote a cttz.ll test and ran the check-llvm-unit and check-llvm targets.
The latter reported one failure, in Bindings/Go/go.test:
# llvm.org/llvm/bindings/go/llvm.test
/usr/lib/go-1.11/pkg/tool/linux_amd64/link: running /usr/bin/c++ failed: exit status 1
ld.lld: error: unknown --compress-debug-sections value: zlib-gnu
collect2: error: ld returned 1 exit status
I do not think this problem is related to my patch but rather to my configuration.
I also ran a quick benchmark of my modifications.
First, I measured the compilation time of the following program by compiling it 100 times with `clang -O3 -S -emit-llvm`, using `time` as the measuring tool:
#include <stdlib.h>

int cttz(unsigned long x){
    unsigned long i = 0;
    while(i < 64 && (((x >> i) & 0x1) == 0))
        i++;
    return i;
}

int main(void){
    int bits_field;
    int first_set;
    bits_field = rand();
    first_set = cttz(bits_field);
    return first_set;
}
The results are as follows (in seconds):
|               | 100 compilations | mean for 1 compilation |
| without patch | 20.54            | 0.21                   |
| with patch    | 16.19            | 0.17                   |
So, the patch reduces compilation time by approximately 21% (from 20.54 s to 16.19 s for 100 compilations).
However, I am not really sure about this result, as I first thought that the modification would make compilation slower.
Maybe it is faster because the loop is removed and therefore no longer has to be optimized.
Then, I measured the performance of the generated code by running the above program 10000 times, again using `time` as the measuring tool. The results are as follows (in milliseconds):
|               | 10000 runs | mean for 1 run |
| without patch | 8730       | 0.873          |
| with patch    | 8220       | 0.822          |
So, the patch reduces code execution time by around 6%.
If you see any way to improve the patch, or any mistake I made, feel free to share.
Best regards.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D94015
Files:
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
llvm/test/Transforms/LoopIdiom/cttz.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D94015.314386.patch
Type: text/x-patch
Size: 9507 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210104/ee40a8c0/attachment.bin>