[llvm-dev] [RFC][RISCV] Selection of complex codegen patterns into RISCV bit manipulation instructions

Wed Aug 28 10:08:29 PDT 2019

Hi Roman,

following from a similar discussion that started on Phabricator:

https://reviews.llvm.org/D66479

I'd like to restate my answer and slightly change the scope of the
question.

Regardless of the user interface that the bit manipulation patch
provides (e.g. front-end intrinsics, C code...), the C or LLVM IR
implementation of some of the instructions from the bit manipulation
proposal cannot (as far as I know) be pattern-matched directly to
RISCVISD nodes, because it spans multiple basic blocks and SelectionDAG
pattern matching works on one basic block at a time.

For this reason we need to implement idiom recognition in the middle end
and, if the pattern matches, emit an LLVM intrinsic, as LLVM does for
ctlz and cttz. Then we would be able to select such instructions.
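
To make the multi-block point concrete, here is a minimal sketch in C of
a count-trailing-zeros loop next to the intrinsic form. The function
names are hypothetical, and whether a given LLVM version actually
recognizes this exact loop shape as cttz is an assumption I have not
checked; the point is only that the loop form cannot be covered by a
single-block pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (the name is illustrative, not from the patch).
 * Loop form: in LLVM IR this becomes a guard block, a loop body and an
 * exit block, so no single-block DAG pattern covers the whole thing. */
static unsigned ctz32_loop(uint32_t x) {
    unsigned n = 0;
    if (x == 0)
        return 32;              /* no set bit at all */
    while ((x & 1u) == 0) {     /* shift until the lowest set bit reaches bit 0 */
        x >>= 1;
        ++n;
    }
    return n;
}

/* Intrinsic form: a single call that the back end can select directly.
 * __builtin_ctz is undefined for 0, hence the explicit guard. */
static unsigned ctz32_builtin(uint32_t x) {
    return x == 0 ? 32u : (unsigned)__builtin_ctz(x);
}
```

Both functions return the same value for every 32-bit input; idiom
recognition is what would let the first be compiled like the second.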

On 15/08/2019 11:20, Roman Lebedev wrote:
> On Thu, Aug 15, 2019 at 12:41 PM paolo <paolo.savini at embecosm.com> wrote:
>> Hi Roman,
>>> That depends.
>>> If there's LLVM intrinsic for it, then any normal optimization pass could do it.
>>> In cttz's case it's mainly done in LoopIdiom pass.
>> Oh yes. Thank you!
>>
>> Unfortunately several of the instructions of the bit manipulation
>> extension don't seem to have an intrinsic already in LLVM.
>>
>> That will require to add some passes to the middle end.
>>
>>> Again, i'd say this is too broad of a question.
>>> If there is LLVM IR intrinsic, then you only need to lower it,
>>> and optionally ensure that middle-end passes form it from appropriate IR.
>>>
>>> If there isn't one, then yes, you'd want to match all the beautiful wilderness
>>> of the possible patterns that combine into that instruction.
>>>
>>> While it's really tempting to just add IR intrinsic for everything,
>>> please do note that a new intrinsic is completely opaque to the rest of LLVM.
>>> It does not magically get peep-hole folds, so those would need to be added,
>>> especially if you intend to form said intrinsic within the middle-end from IR.
>>>
>>> This may change some day when these peep-hole folds are auto-inferred,
>>> but that is not so nowadays. Really looking forward to that.
>>>
>> It would be definitely interesting.
>>
>> Anyway adding such complex instructions to the middle end seems material
>> for another patch. Unless things change in the meantime.
>>
>> For now we can provide a lower level optimization of smaller bit
>> manipulation patterns.
>>
>> But I'll definitely look into adding those passes as they would provide
>> much more optimization.
> I'm not sure what you mean by "more passes" in the reply.
> If there is no matching instruction/intrinsic, then i'm not sure how a
> pass would help.
>
> *Please* do note my comment about adding new instructions/intrinsics.
> While it's not an immovable obstacle, it by no means should be treated lightly.
> If you want to add new LLVM IR instruction/intrinsic, with intention of actually
> producing it from other instructions in middle-end (as opposed to just lowering
> it from compiler front-end, or not producing it in middle-end),
> you must also consider how said new IR instruction/intrinsic will affect
> all other optimization passes, and *that* cost *is* high.

I see. But we might need to do it anyway, with caution. As you already
pointed out earlier, if we add an LLVM intrinsic that is lowered directly
from a front-end intrinsic, it would be opaque to middle-end
optimizations. An advantage is that it would be lowered "safely"
(with no interference from optimization passes) into the
corresponding asm, but that also means that any LLVM optimization
that could produce even better code (faster or smaller than the
expected bit manipulation asm) wouldn't be possible.

All that being said, in case a user wants to be sure that
specific bit manipulation asm instructions are selected without
interference (think for instance of critical programs like C
implementations of block ciphers, for which both performance and security
are crucial), what would you think of providing inline asm behind the
interface functions (e.g. _rv32_andn):

uint32_t _rv32_andn(uint32_t a, uint32_t b) {
    uint32_t res;
    __asm__ ("andn %0, %1, %2" : "=r"(res) : "r"(a), "r"(b));
    return res;
}

as opposed to providing a chain of front-end intrinsics that are lowered
to LLVM intrinsics and then to asm?

uint32_t _rv32_andn(uint32_t a, uint32_t b) {
    return __builtin_andn(a, b);
}

Nothing would change from the user's perspective, but I guess that would
imply a difference in LLVM for ... maintainability?
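
For comparison, a rough sketch of a wrapper that keeps the inline asm
where the instruction is available and otherwise falls back to plain C.
The name rv32_andn_sketch is hypothetical, and guarding on __riscv_zbb
is an assumption about the toolchain's feature-test macros. The fallback
stays fully visible to the optimizer (it can be constant-folded or
combined with neighbouring operations), while the asm statement is
opaque to it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper; the name and the feature-test macro below are
 * assumptions made for illustration only. */
static inline uint32_t rv32_andn_sketch(uint32_t a, uint32_t b) {
#if defined(__riscv) && defined(__riscv_zbb)
    /* Opaque to the optimizer: always emits the andn instruction. */
    uint32_t res;
    __asm__ ("andn %0, %1, %2" : "=r"(res) : "r"(a), "r"(b));
    return res;
#else
    /* Transparent to the optimizer: this is the and(x, not(y)) shape. */
    return a & ~b;
#endif
}
```

Either path computes a & ~b, so callers see the same result; only what
the compiler is allowed to do with it differs.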

> E.g. if you add 'andn', you then need to find every fold that would look for
> and(not(y), x) or and(x, not(y)) and teach it about 'andn'.
> Things will be more fun with more complex patterns :)
>
>> Many thanks.
>>
>> Paolo
> Roman

Paolo