[PATCH] D127901: [LinkerWrapper] Add PTX output to CUDA fatbinary in LTO-mode

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 16 14:40:32 PDT 2022


tra added a comment.

Playing devil's advocate, I've got to ask -- do we even want to support JIT?

JIT brings more trouble than benefits.

- substantial start-up time on nontrivial apps. The last time I launched a TensorFlow app that needed its kernels JIT'ed, it took about half an hour before JIT compilation finished.
- substantial increase in executable size. Statically linked TensorFlow apps are already pushing the limits of executables built with the small memory model (-mcmodel=small is the default for clang and gcc, AFAICT).
- it is very easy to compile for the wrong GPU and not notice it, because JIT will keep the app running by recompiling the embedded PTX (see the sketch after this list).
- it makes executables and tests non-hermetic -- the code that actually runs on the GPU (and thus the behavior) depends on the particular driver version the app uses at runtime.
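To illustrate the wrong-GPU point: without PTX in the fatbinary the mistake at least fails loudly at the first kernel launch, whereas with PTX the driver silently JIT-compiles it and nothing flags the mismatch. A minimal sketch (hypothetical kernel, not code from this patch) of what the loud failure looks like:

  // Build for, say, sm_70 and run on an sm_80 machine. With no embedded PTX
  // the launch typically fails with cudaErrorNoKernelImageForDevice; with PTX
  // embedded, the driver JIT-compiles it and the mismatch goes unnoticed.
  #include <cstdio>
  #include <cuda_runtime.h>

  __global__ void kernel() {}

  int main() {
    kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
      std::printf("launch failed: %s\n", cudaGetErrorString(err));
      return 1;
    }
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
  }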

Benefits: It *may* allow us to run a miscompiled/outdated CUDA app. Whether it's actually a benefit is questionable. To me it looks like a way to paper over a problem.

We (Google) have run into all of the above and ended up disabling PTX JIT'ing altogether.

That said, we do embed PTX by default at the moment, so this patch does not really change the status quo. I'm not opposed to it, as long as we can disable PTX embedding if we need or want to.
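For reference, regular clang CUDA compilation already lets users control PTX embedding per architecture with --cuda-include-ptx=/--no-cuda-include-ptx=; a sketch of the kind of off switch meant above (the architecture and file names are just illustrative, and whether the new linker-wrapper/LTO path should honor the same flags is a separate question):

  # Emit SASS only for sm_80; do not embed PTX in the fatbinary.
  clang++ -x cuda app.cu --cuda-gpu-arch=sm_80 --no-cuda-include-ptx=all -o app -lcudart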


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127901/new/

https://reviews.llvm.org/D127901
