[PATCH] D108961: [RISCV] MC relaxation for out-of-range conditional branch.

Mon Jan 30 08:26:18 PST 2023

reames added a comment.

As @craig.topper mentioned above, I stumbled into this issue last week when debugging an LTO failure.  The exact issue I hit was because BranchRelaxation doesn't account for the padding required in RISCV's alignment directive.  (Which is different from every other target.)  We could, of course, patch the particular issue in BranchRelaxation, but I want to make an argument for reversing course on this patch as well.  If we'd landed this patch, my hard to debug LTO correctness issue would have been a minor code quality issue at worst, and I'd not have lost a week of my life.  :)

First, we are currently incompatible with the GNU toolchain.  LLVM's assembler will not successfully assemble programs which 'as' will.  This shows up both in hand written assembly, but also, in theory at the moment, in output from gcc.

  target:
  	bne a0, a1, target
  .rep 1024
  	nop
  .endr
  	bne a0, a1, target

Personally, I consider this a very strong reason to accept this patch (or some variant thereof).  To the point where I almost hesitate to bring up other lines of arguments because I'm worried they'll distract from the core point of cross toolchain compatibility.  We *need* cross toolchain compatibility; any exceptions to this need to be *very* strongly justified, and I don't consider any of the arguments made on this thread to date to come anywhere close to that bar.

Second, I don't believe the correctness invariant relied on from BranchRelaxation is a good idea.  At the moment, we *must* get all instruction sizes used in branch relaxation to be upper bounds on the actual bytes generated.  We've now had multiple examples where this was not upheld in edge cases, and I don't see anything in the code which particularly enforces said invariant.  I've posted a couple patches to add a few relevant asserts here and there, but this is a cross cutting invariant with broad implications.  If we can remove it it, that removes a major correctness risk.

There was an argument made on thread previously that this patch adds rarely executed behavior which is likely to bit rot.  I think this argument gets things completely backwards.  The fundamental case requiring relaxation has to be implemented somewhere, and the assembler implementation is actually the easier to test of the alternatives.  The assembler change has dramatically less "blast radius" than the compiler BranchRelaxation approach.

Now, I do need to acknowledge that there's related branch relaxation case which the assembler *can't* easily handle.  Specifically, when we cross out of +/- 2MB range, we need a scratch register to materialize the long branch.  To my knowledge, GCC doesn't handle this case at all.

Third, I think we actually have potentially significant missed optimizations here.  BranchRelaxation is inherently conservative.  As one example, @craig.topper tells me that every branch is modeled as 4 bytes, even if the branch target would allow the branch to be compressible.  As such, for branch dense code (e.g. loop nests) we can end up relaxation branches which don't need relaxed.  I haven't (yet) seen a case compelling enough on it's own to address, but in theory, such cases are certainly possible.  Since we have a correctness need to change schemes here anyways, we might as well fix the optimization problem at the same time.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108961/new/

https://reviews.llvm.org/D108961