[PATCH] D26103: Add tips for generic IR vs architecture specific code.

Thu Dec 1 15:07:04 PST 2016

asbirlea added a comment.

I tend to agree with that, which is why I suggested this could go into a separate page. The generic documentation would point to each target-specific subpage.
This is part of the feedback I was hoping for; there aren't currently any such pages, so as a draft I dropped the content in here, but I believe it would be better on its own.

================
Comment at: docs/Frontend/PerformanceTips.rst:126
+Whenever possible, the IR generated should be generic IR, instead of architecture
+specific IR (i.e. intrinsics).
+If LLVM cannot lower the generic code to the desired intrinsic, start a discussion
----------------
reames wrote:
> What is the take away from this piece of advice?  
> 
> small edits: 
> e.g. intrinsics or inline asm.
> define "generic IR" or use alternate phrase
I tried to explain more in the last comment.

Is it better to replace "generic IR" with "architecture independent (generic) IR"?
All suggestions to make the doc clearer are more than welcome.

================
Comment at: docs/Frontend/PerformanceTips.rst:130
+for the missing lowering opportunity.
+A few known patterns that lead to lowering to intrinsics are listed below.
+
----------------
reames wrote:
> From this sentence, I'm not sure what to expect.  Are these patterns where generic IR *does* work, or does not work?
Patterns where generic IR *does* work. This draft certainly does not cover everything, I'd expect it to be expanded. I found no other documentation (other than the comments in ISEL code) that would help a frontend writer find these.

================
Comment at: docs/Frontend/PerformanceTips.rst:132
+
+The *interleaved access pass* performs the following lowerings (tests can be found in CodeGen/ARM/arm-interleaved-accesses.ll and CodeGen/AArch64/aarch64-interleaved-accesses.ll):
+
----------------
reames wrote:
> Why should this pass get special treatment in target neutral documentation?  We don't talk about ISEL here for instance.
Agreed - separate page for each architecture where we //do// talk about ISEL?

================
Comment at: docs/Frontend/PerformanceTips.rst:143
+
+         Is lowered to: 
+                ::
----------------
reames wrote:
> Lowered by whom, and why does a frontend author care?
In this example, by ISEL. The purpose is to have the frontend authors not generate custom intrinsics when generating generic IR should give the same asm in the end.
The example I've dealt with is Halide, which has special code generation for ARM/AArch64 in some particular cases. These paths used to generate intrinsics (some still do) for cases where the lowering would not produce the right asm instruction. The interleaved access pass is a case where there's no reason for intrinsics to be generated.
Changing their code generation to use the right patterns makes the resulting IR architecture independent, gets the same performance on the ARM targets and at least the same on x86, and is easier to maintain.

The high-level idea I'm trying to convey is: if LLVM's lowering passes can get the same performance, try to rely on those and generate architecture independent IR; if not, use target specific IR (intrinsics, inline asm), but please let the LLVM community know about it, since it may be something that should be addressed.
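
As a rough illustration of what "the right pattern" means for the interleaved access case, here is a small C++ model of the semantics, not the IR itself. My understanding is that the pass looks for one wide vector load followed by shufflevectors with strided masks (<0, 2, 4, ...> and <1, 3, 5, ...> for a factor of 2). The function name `add_even_odd` and the element type are hypothetical, chosen only for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a factor-2 interleaved access written generically:
// one "wide load" of 2*N elements, two strided "shuffles", then an add.
template <std::size_t N>
std::array<std::uint8_t, N> add_even_odd(
    const std::array<std::uint8_t, 2 * N>& wide) {
    std::array<std::uint8_t, N> even{};
    std::array<std::uint8_t, N> odd{};
    std::array<std::uint8_t, N> out{};

    for (std::size_t i = 0; i < N; ++i) {
        even[i] = wide[2 * i];      // corresponds to mask <0, 2, 4, ...>
        odd[i] = wide[2 * i + 1];   // corresponds to mask <1, 3, 5, ...>
    }
    for (std::size_t i = 0; i < N; ++i) {
        // Wraps modulo 256, like an add on 8-bit vector lanes.
        out[i] = static_cast<std::uint8_t>(even[i] + odd[i]);
    }
    return out;
}
```

Nothing in this shape is tied to ARM or AArch64, which is the point: a frontend that emits the equivalent wide load and strided shuffles leaves the choice of instruction to the backend instead of hard-coding an intrinsic.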

================
Comment at: docs/Frontend/PerformanceTips.rst:205
+
+LLVM does not promise to be performance aware, so the above patterns, while generic IR,  are still recommended for the particular platforms.
+For more suggestions of architecture specific patterns, please send
----------------
reames wrote:
> This sentence does not parse for me.
It was meant as a sort of disclaimer; if you could suggest how to make this clearer, that would be great.

The idea is that such patterns are lowered to a particular asm instruction on this arch, known to be effective there. It may not give the best performance on another architecture.

The aim is to give the frontend authors the info on existing patterns, encourage them to generate generic IR whenever possible, while still testing if they get the expected performance on *other* archs. Then get their feedback when they do see such performance regressions, or when lowering could do a better job.

https://reviews.llvm.org/D26103