[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions

Roman Lebedev via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 14 05:31:27 PDT 2019

On Wed, Aug 14, 2019 at 3:13 PM paolo via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi all,

> I'm currently working on the implementation for LLVM of the RISCV Bit
> Manipulation ISA extension described by Clifford Wolf in the following
> presentation:
> https://content.riscv.org/wp-content/uploads/2019/06/17.10-b_wolf.pdf
> and the following document:
> https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.90.pdf

> The aim is to provide the intrinsic functions to the user in order to
> implement code that is more optimal for bit manipulations on RISCV
> targets, but also to provide automatic optimization by lowering simple
> code patterns into optimized bit manipulation assembly,
>     %neg = xor i32 %1, -1                    ----->      andn t0, t0, t1
>     %and = and i32 %0, %neg
> just in case the user wants such optimization but is not aware of all
> the bits that can be optimized.
> I'm dealing with the fact that it is pretty hard to select some patterns
> of DAG nodes in order to replace them with an optimal machine equivalent
> machine instruction.
> Take for intsance the count leading zeros operation:
>     uint32_t clz (uint32_t x)
>     {
>         for (int count = 0; count < 32; count++ ) {
>             if ((x << count) < 0)
>                 return count;
>         }
>         return 32;
>     }
> It needs a loop to be performed and that makes it difficult to be
> lowered because it goes through several basic blocks, and different
> optimizations can easily compromise the pattern recognition.
You only want to lower LLVM IR @llvm.cttz intrinsic in *that* case.

> What I'm wondering is, is there already any place in LLVM where complex
> patterns like this are already lowered into single instructions? (e.g.:
> clz, that is used quite often). Maybe at a higher level?
That depends.
If there's LLVM intrinsic for it, then any normal optimization pass could do it.
In cttz's case it's mainly done in LoopIdiom pass.

> Another point of view that I've been suggested and that I'd like to
> discuss is: does it make sense to implement such lowering for operations
> that normally a user wouldn't implement from scratch when an intrinsic
> function is already available for that?
Again, i'd say this is too broad of a question.
If there is LLVM IR intrinsic, then you only need to lower it,
and optionally ensure that middle-end passes form it from appropriate IR.

If there isn't one, then yes, you'd want to match all the beautiful wilderness
of the possible patterns that combine into that instruction.

While it's really tempting to just add IR intrinsic for everything,
please do note that a new intrinsic is completely opaque to the rest of LLVM.
It does not magically get peep-hole folds, so those would need to be added,
especially if you intend to form said intrinsic within the middle-end from IR.

This may change some day when these peep-hole folds are auto-inferred,
but that is not so nowadays. Really looking forward to that.

> Many thanks.
> Paolo Savini

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

More information about the llvm-dev mailing list