[llvm-dev] sum elements in the vector

suyog sarda via llvm-dev llvm-dev at lists.llvm.org
Mon May 30 21:37:08 PDT 2016

> Thanks for the reply. Do you know if it is possible to add a new intrinsic
> without actually modifying core code (ISDOpcodes.h is an example of core
> code)? I'd like to add this intrinsic with as little code change as
> possible.

There were attempts to identify this pattern (and similar others) at the
SelectionDAG level and emit vector code customized for a backend.
See example - https://llvm.org/bugs/show_bug.cgi?id=20035 and thread

As noted in review, this made the code very specific to a given target, and
it was suggested to handle these patterns at the IR level.
At the IR level, an attempt was made to catch this pattern in the Loop/SLP (unrolled
straight line code) vectorizer and use TTI cost info to vectorize the
This was again clumsy.

As James and Shahid pointed out, intrinsics are the best shot to vectorize
this pattern to make it more generic.
This generally follows the steps listed by Shahid:

1.       Define an intrinsic, xyz(),  for the required instruction and
corresponding SDNode

2.       Generate the “call xyz()” IR based on the matched pattern

3.       Map “call xyz()” IR to corresponding SDNode in

4.       Provide default expansion of the xyz() intrinsic

5.       Legalize type and/or operation

6.       Provide Lowering of intrinsic/SDNode to generate your target
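
To make step 4 (default expansion) more concrete, here is a minimal
standalone sketch in plain C++, not LLVM API code. It assumes a hypothetical
horizontal-add intrinsic over unsigned 32-bit lanes and models its generic
expansion as log2(N) "shuffle + add" steps, the fallback a target without a
native instruction could use. The function names are made up for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Scalar reference: what the hypothetical intrinsic is defined to compute
// (wrapping 32-bit sum of all lanes).
std::uint32_t horizontal_add_reference(const std::vector<std::uint32_t>& lanes) {
    return std::accumulate(lanes.begin(), lanes.end(), std::uint32_t{0});
}

// Toy model of a default expansion: at each step the upper half of the
// live lanes is added onto the lower half, halving the live width.
// Assumption: the lane count is a power of two, as for typical vector types.
std::uint32_t horizontal_add_expanded(std::vector<std::uint32_t> lanes) {
    const std::size_t n = lanes.size();
    assert(n != 0 && (n & (n - 1)) == 0 && "lane count must be a power of two");
    for (std::size_t half = n / 2; half >= 1; half /= 2) {
        for (std::size_t i = 0; i < half; ++i) {
            lanes[i] += lanes[i + half];  // models "shuffle high half down, then add"
        }
    }
    return lanes[0];  // the result ends up in lane 0
}
```

Both functions should agree on any power-of-two input. A target with a
native reduction would replace the expanded form during lowering (step 6).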

When a pattern is identified and marked with intrinsic at IR level, a
corresponding SDNode will be generated when converting IR to SelectionDAG.
This SDNode can then be legalized/expanded/lowered for a specific target when
lowering to target machine code. For this intrinsic-specific SDNode,
IMO you will have to add the entries in ISDOpcodes.h. I don't see any harm
or big change in adding them.

As you know, the pattern listed in the above discussions is a commonly
occurring pattern and hence needs to be identified at the IR level via intrinsics.
Every target has its own way to handle these patterns - as far as I know,
X86 will have a single instruction, PSAD, while AArch64 will handle it in two
This variance can be handled at DAGLowering and target code generation
phase (and this will be highly acceptable to land in trunk code
since it solves the issue in clean and generic way).
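
For reference, the pattern can be written down in plain C++. This is only an
illustrative sketch: it assumes the pattern is a sum of absolute differences
over unsigned 8-bit elements (the operation PSAD performs), and the function
names are hypothetical. The "split" form shows one way a target without a
fused instruction might decompose the work. It is not a claim about what any
particular backend emits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fused form: sum of |a[i] - b[i]|, the value a single SAD-style
// instruction would produce for the whole vector.
std::uint32_t sad_scalar(const std::vector<std::uint8_t>& a,
                         const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += static_cast<std::uint32_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
    return sum;
}

// Split form, phase 1: element-wise absolute difference.
std::vector<std::uint8_t> abs_diff_lanes(const std::vector<std::uint8_t>& a,
                                         const std::vector<std::uint8_t>& b) {
    assert(a.size() == b.size());
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
    return out;
}

// Split form, phase 2: widen each lane and reduce across the vector.
std::uint32_t sad_split(const std::vector<std::uint8_t>& a,
                        const std::vector<std::uint8_t>& b) {
    std::uint32_t sum = 0;
    for (std::uint8_t d : abs_diff_lanes(a, b)) {
        sum += d;
    }
    return sum;
}
```

Both forms compute the same value, which is why a single intrinsic at the IR
level can stand for either lowering.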

Not sure if D10867 and D11678 were reverted later, but I think these can
serve your purpose as an example of how to add an intrinsic and generate code.
