[PATCH] D94015: [SCEV] Replace cttz loop by call to cttz intrinsic.

Mon Jan 4 09:17:34 PST 2021

eiffel created this revision.
eiffel added a reviewer: reames.
Herald added a subscriber: hiraditya.
eiffel requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

Hi.

First, I hope you and your relatives are doing well.

I wrote a patch that fixes bug 17128 <https://bugs.llvm.org/show_bug.cgi?id=17128>.
The goal of this patch is to replace snippets such as:

  int cttz(unsigned long x){
  	unsigned long i = 0;
  	while(i < 64 && (((x >> i) & 0x1) == 0))
  		i++;
  	return i;
  }

with a call to the LLVM `cttz` intrinsic, which can then be lowered to the corresponding assembly instruction if the target architecture has one.
In my case, the intrinsic was lowered to the `bfsq` instruction.
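
For illustration only, here is a minimal C sketch of the equivalence the transformation relies on. It is not taken from the patch. The names `cttz_loop` and `cttz_builtin` are hypothetical, and it assumes a 64-bit `unsigned long` and a compiler that provides the GCC/Clang builtin `__builtin_ctzl`. The loop returns 64 when `x` is zero, while `__builtin_ctzl(0)` is undefined, so the sketch guards that case explicitly. The LLVM `cttz` intrinsic takes a flag that makes a zero input well defined and returns the bit width in that case.

```c
/* Loop form, same logic as the snippet above (hypothetical name). */
static int cttz_loop(unsigned long x) {
    unsigned long i = 0;
    /* Stop at the first set bit, or after 64 bits if x is zero. */
    while (i < 64 && (((x >> i) & 0x1) == 0))
        i++;
    return (int)i;
}

/* Assumed equivalent using the compiler builtin (hypothetical name).
 * Assumes unsigned long is 64 bits wide. */
static int cttz_builtin(unsigned long x) {
    /* __builtin_ctzl is undefined for 0, so handle it like the loop does. */
    if (x == 0)
        return 64;
    return __builtin_ctzl(x);
}
```

Under these assumptions, both functions return the same value for every input, which is why the loop can be replaced.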

To check that the patch works, I wrote a test, cttz.ll, and I ran the check-llvm-unit and check-llvm targets.
The second gave me one failure for Bindings/Go/go.test:

  # llvm.org/llvm/bindings/go/llvm.test
  /usr/lib/go-1.11/pkg/tool/linux_amd64/link: running /usr/bin/c++ failed: exit status 1
  ld.lld: error: unknown --compress-debug-sections value: zlib-gnu
  collect2: error: ld returned 1 exit status

I do not think this problem is related to my patch but rather to my configuration.

I also quickly benchmarked my modifications.
First, I measured the compilation time of the following program by compiling it 100 times with `clang -O3 -S -emit-llvm`, using `time` as the measuring tool:

  #include <stdlib.h>

  int cttz(unsigned long x){
  	unsigned long i = 0;
  	while(i < 64 && (((x >> i) & 0x1) == 0))
  		i++;
  	return i;
  }

  int main(void){
  	int bits_field;
  	int first_set;

  	bits_field = rand();

  	first_set = cttz(bits_field);

  	return first_set;
  }

The results are as follows (in seconds):

|               | 100 compilations | mean for 1 compilation |
| without patch | 20.54            | 0.21                   |
| with patch    | 16.19            | 0.17                   |

So, the patch reduces compilation time by approximately 21%.
However, I am not really sure about this result, as I first thought that the modification would make compilation slower.
Maybe it is quicker because the loop is removed and therefore no longer has to be optimized.

Then, I measured the performance of the generated code by running the above program 10000 times, using `time` as the measuring tool. The results are as follows (in milliseconds):

|               | 10000 runs | mean for 1 run |
| without patch | 8730       | 0.873          |
| with patch    | 8220       | 0.822          |

So, the patch reduces code execution time by around 6%.

If you see any way to improve the patch or any mistake I made, feel free to share.

Best regards.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D94015

Files:
  llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
  llvm/test/Transforms/LoopIdiom/cttz.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D94015.314386.patch
Type: text/x-patch
Size: 9507 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210104/ee40a8c0/attachment.bin>