[PATCH] D47691: [NVPTX] Ignore target-cpu and -features for inlining

Hal Finkel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 6 05:40:08 PDT 2018


hfinkel added inline comments.


================
Comment at: lib/Target/NVPTX/NVPTXTargetTransformInfo.h:66
+  // attributes that were added to newer versions of LLVM/Clang: There are
+  // no incompatible functions in PTX, ptxas will throw errors in such cases.
+  bool areInlineCompatible(const Function *Caller,
----------------
tra wrote:
> Hahnfeld wrote:
> > tra wrote:
> > > hfinkel wrote:
> > > > "We can ignore potential problems in inlining because the assembler will generate errors later if we do the wrong thing" is probably not what we mean. Obviously the point of this function is to avoid inlining when that might cause a problem later (either in tools, such as the assembler, or at runtime). 
> > > > 
> > > > My understanding, from the ongoing discussions, is that PTX is backward compatible, and ptxas will generate code for the underlying target for all PTX uniformly during any given compilation (and thus, regardless of what the attributes say, the generated machine code will use features from the most-recent specified target). This might not be completely true (i.e., ptxas might still generate code in light of different legalization decisions), but it might be true enough to be the desired behavior.
> > > We're talking about two issues here:
> > > 
> > > * what should be done with functions that have target-cpu or target-feature that's not compatible with the current compilation mode?
> > > 
> > > Right now the assumption is that the IR we're given is valid (as in - compilable) for the given compilation mode. At the very least, I can successfully compile functions with target-cpu=sm_70 with llc -mcpu=sm_20.
> > > 
> > > If that's not the case, results may vary from success, to partial success (suboptimal code due to some instructions being unavailable), to a failure during compilation to PTX (unavailable intrinsics) or, finally, a failure during compilation of PTX->SASS (e.g. an invalid instruction in inline assembly). I'm not aware of a good way to deal with incompatible IR in LLVM, and that's not NVPTX-specific.
> > > 
> > > * What inlining restrictions should we impose on the functions we deemed acceptable for compilation?
> > > If the assumption above is true, then we have no reason (in NVPTX) to prevent inlining based on these attributes, because every one of these functions is compilable, which in turn means that it's executable on the target GPU. The goal of preventing inlining here was to avoid executing incompatible code.
> > > 
> > > Bottom line -- the situation is far from perfect, but IMO the patch does a sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.
> > (To be honest I don't understand the point of `areInlineCompatible` at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...)
> My understanding is that it's needed to support function multiversioning in clang which will only work if functions are not inlined:
> https://clang.llvm.org/docs/AttributeReference.html#target-gnu-target  https://gcc.gnu.org/wiki/FunctionMultiVersioning
> 
> (To be honest I don't understand the point of areInlineCompatible at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...)

Because you might have:

  foo.c:
  void foo() {
    // some basic implementation
  }

  foo_avx.c, compiled with -mavx
  void foo_avx() {
    // do things with some AVX intrinsics
  }

  main.c:
  ...
  if (cpu_has_avx())
    foo_avx();
  else
    foo();

Now compile with LTO enabled. You need to prevent the inlining of foo_avx into its caller in main, or else you'll lose the AVX codegen (which is an optimization problem and could, moreover, lead to a compilation failure if intrinsics are used that won't work when the target feature is not enabled).
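
To make the choices concrete, here is a minimal, self-contained sketch of three possible inline-compatibility policies. It is a toy model for discussion, not LLVM's actual code: `FnAttrs` and the three function names are hypothetical, and the feature strings are only illustrative.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical stand-in for the "target-cpu" and "target-features"
// attributes attached to an IR function (not an LLVM type).
struct FnAttrs {
  std::string targetCpu;
  std::set<std::string> targetFeatures;
};

// Exact-match policy: allow inlining only when both functions carry
// identical target-cpu and target-features attributes.
bool inlineCompatibleExact(const FnAttrs &caller, const FnAttrs &callee) {
  return caller.targetCpu == callee.targetCpu &&
         caller.targetFeatures == callee.targetFeatures;
}

// Subset policy: allow inlining only when every feature the callee
// relies on is also enabled in the caller. std::set iterates in sorted
// order, which is what std::includes requires.
bool inlineCompatibleSubset(const FnAttrs &caller, const FnAttrs &callee) {
  return std::includes(caller.targetFeatures.begin(),
                       caller.targetFeatures.end(),
                       callee.targetFeatures.begin(),
                       callee.targetFeatures.end());
}

// Ignore-attributes policy: always allow inlining, on the assumption
// that every function in the module is compilable for the current
// target. This is the behavior the patch proposes for NVPTX.
bool inlineCompatibleIgnoreAttrs(const FnAttrs &, const FnAttrs &) {
  return true;
}
```

With a caller that has only `sse2` and a callee that has `sse2` and `avx`, the exact-match and subset policies both refuse to inline, which is what the foo_avx example needs; the ignore-attributes policy would inline it. Whether that last policy is safe for PTX is the question below.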

Is there a sense in which the same is true for PTX? Are there intrinsics that depend on target features?


Repository:
  rL LLVM

https://reviews.llvm.org/D47691




