[llvm-dev] FW: RFC: Atomic LL/SC loops in LLVM revisited

Wed Jun 20 07:44:17 PDT 2018

On 20 June 2018 at 15:10, Simon Dardis via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> (Resending, missed llvm-dev somehow.)
>>
>> On 15 June 2018 at 12:28, Tim Northover via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> > On Thu, 14 Jun 2018 at 13:45, Alex Bradbury <asb at lowrisc.org> wrote:
>> >> I don't like to see the compiler generate code that's obviously
>> >> dumber than what a human would write, but in this case do we really
>> >> think there would be any sort of measurable impact on performance?
>> >
>> > It's certainly going to be marginal, but then so is the benefit of
>> > late expansion.
>>
>> I feel differently: to my mind there's a lot to gain by late expansion
>> and relatively little to lose. By expanding only the LL/SC loop late
>> and potentially making use of 'asm goto' support in the future
>> (as James suggests) there should be minimal/no codegen impact. Output
>> that is guaranteed to meet the platform requirements for forward
>> progress with a codegen method that is resilient to the introduction
>> of new in-tree or out-of-tree passes is a huge win to me vs the status
>> quo. Although the current approach appears to work ok in practice we
>> know expansion in IR for LL/SC is fundamentally on shaky ground - at
>> least for archs other than Hexagon.
>
> This is a concern for MIPS for both correctness and quality of the
> generated code, and the maintenance burden regarding the number of
> different expansions required if they are to be expanded at the MC
> layer.

Hi Simon. Although expanding at the MC layer has the nice property of
being as close to just emitting inline ASM as possible, that's
actually not what I'm advocating. If you look at e.g.
https://reviews.llvm.org/D47882 you'll see that as much expansion as
possible takes place in IR, while the LL/SC loop is expanded in a
very late-stage MachineFunctionPass. For the reasons you stated, it
sounds like the Mips backend wouldn't want to perform the last-stage
expansion quite as late as I'm doing in RISC-V, which is a choice each
backend is free to make.
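
To make the split concrete, here is a minimal C++ sketch of a part-word
atomic add. It is an illustration only, not code from D47882, and it
assumes the division of labour is roughly: shift/mask arithmetic done
early (the part that can live in IR), and only the retry loop left for
late expansion. The names prepare_masked_operands and masked_atomic_add
are hypothetical, byte lanes are numbered little-endian, and
compare_exchange_weak stands in for the LL/SC pair, which portable C++
cannot express directly.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical bundle of values that can be computed ahead of the loop.
struct MaskedOperands {
    unsigned shift;             // bit offset of the byte lane in the word
    std::uint32_t mask;         // selects the byte lane
    std::uint32_t shifted_val;  // operand moved into the byte lane
};

// "Early" part: plain arithmetic with no atomicity requirements.
// Assumes little-endian lane numbering; byte_index is reduced to 0..3.
inline MaskedOperands prepare_masked_operands(unsigned byte_index,
                                              std::uint8_t val) {
    unsigned shift = (byte_index & 3u) * 8u;
    return {shift, std::uint32_t{0xFF} << shift, std::uint32_t{val} << shift};
}

// "Late" part: the retry loop. On an LL/SC target this loop is what has
// to stay small and free of intervening loads, stores, and spills.
// Returns the previous value of the selected byte.
inline std::uint8_t masked_atomic_add(std::atomic<std::uint32_t>& word,
                                      unsigned byte_index,
                                      std::uint8_t val) {
    MaskedOperands ops = prepare_masked_operands(byte_index, val);
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    std::uint32_t new_word;
    do {
        // Add within the lane; the mask discards any carry out of it.
        std::uint32_t sum = (old_word + ops.shifted_val) & ops.mask;
        // Keep the neighbouring bytes exactly as they were observed.
        new_word = (old_word & ~ops.mask) | sum;
        // On failure, compare_exchange_weak reloads old_word for the retry.
    } while (!word.compare_exchange_weak(old_word, new_word,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed));
    return static_cast<std::uint8_t>((old_word & ops.mask) >> ops.shift);
}
```

Everything in prepare_masked_operands is ordinary code that target
independent passes can optimise freely; only the body of the do/while
needs to survive untouched until the final expansion.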

Surely the strategy described is no worse than the scheme in D31287 in
terms of delay slot filling etc, and possibly has minor advantages by
allowing a little more to be expanded at the IR level and thus sharing
a little more code with other backends. Or am I misunderstanding?

Best,

Alex