[llvm-dev] RFC: On non 8-bit bytes and the target for it

Tue Nov 5 16:35:30 PST 2019

On Mon, Nov 4, 2019 at 4:07 PM David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On Mon, Nov 4, 2019 at 4:00 PM James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
>> Hi,
>>
>> To throw my hat into the ring, we also maintain a downstream target that
>> has a subset of this problem - it has non-8-bit-addressable memory. This
>> means that %uglygeps don't work, pointers can't be casted to i8* and expect
>> to be byte-addressed (conversions to memcpy/memset that use i8* aren't
>> welcome either), and GEP lowering code is slightly different.
>>
>> We maintain this currently as a pile of known technical debt. We have a
>> CGP pass to decompose GEPs and custom-expand them taking our word size into
>> account, but everything before that is just waiting for InstCombine to
>> break something. AliasAnalysis also makes offsetting assumptions that are
>> totally bogus because our pointers are word-addressed and therefore so are
>> pointer offsets.
>>
>> We'd be really keen to help here. We're keen to upstream anything we
>> possibly can (and have been, over the past few months). We've have several
>> discussions about how best to approach upstream with this, and the sticking
>> point has always been lack of a testing target. It's always felt to me that
>> the idea of addressable granule should be a fairly reasonable DataLayout
>> addition; We can test DataLayout changes purely via opt without requiring a
>> target that uses them. Lowering to instructions was always the testing
>> sticking point.
>>
>> We'd be keen to help out what the community decides to do here. I
>> personally feel it's reasonable that:
>>   - LangRef/DataLayout is updated with semantically coherent changes.
>>   - The midend optimizer is updated by someone who cares about those
>> changes and tests are added that use the new DataLayout.
>>   - Developers that don't care about those changes maintain a best-effort
>> approach, which is exactly what we do right now; there are features that
>> are tested but are still esoteric enough that I might reasonably break them
>> without realising (OperandBundles come to mind), so I don't think there's
>> any change in mindset here.
>>   - Developers that care perform downstream testing and provide review
>> feedback / revert if really bad / fixes. Again, this is how LLVM works
>> right now - I'd guess that >80% of our real-world test coverage comes from
>> downstream users deploying ToT LLVM rather than the upstream LIT tests /
>> builders.
>>
>
> This last bit is the bit I'd worry about if the bug were only visible in
> an out-of-tree target. If it's visible with a DataLayout change that can be
> tested upstream, then I'd be OK with an upstream patch being reverted given
> a test case with custom DataLayout showing the failure.
>
> But if the failure is for, say, non-8-bit-bytes that are only actually
> exercised downstream, I'm less OK with patches being reverted upstream no
> matter the severity of the breakage to such targets downstream.
>

Suppose that the failure is build breakage for the downstream target. This
could cause the entirety of the downstream LLVM/Clang build to break in a
way that isn’t easy to untangle or fix. This could prevent an entire
organization from integrating LLVM for an extended period of time. So it
isn’t necessarily “no matter the severity of the breakage for such
*targets* downstream” but could be the entire organization unless a
tactical downstream revert of the offending change can be made.

— Sean Silva

> - Dave
>
>
>
>>
>> Cheers,
>>
>> James
>>
>> On Mon, 4 Nov 2019 at 12:16, David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Sat, Nov 2, 2019 at 12:45 AM Jorg Brown via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On Fri, Nov 1, 2019 at 8:40 AM Adrian Prantl via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> > On Nov 1, 2019, at 3:41 AM, Dmitriy Borisenkov via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>> > It seems that there are two possible solutions on how to move
>>>>> forward with non 8 bits byte:
>>>>> >
>>>>> > 1. Commit changes without tests. Chris Lattner, Mikael Holmen,
>>>>> Jeroen Dobbelaere, Jesper Antonsson support this idea.
>>>>> > James Y Knight says that at least magic numbers should be removed
>>>>> "at least where it arguably helps code clarity". This might be not exactly
>>>>> the scope of the changes discussed, but it's probably worth do discuss code
>>>>> clarity having concrete patches.
>>>>> > GCC (according to James Y Knight) has the same practice meaning
>>>>> non-8 bits byte is supported but there are no tests in upstream and we have
>>>>> downstream contributors who will fix the bugs if they appear in the LLVM
>>>>> core.
>>>>> > David Chisnall raised a question about what to count as a byte
>>>>> (which defines the scope of the changes) and we suggest to use all 5
>>>>> criteria he granted:
>>>>> > > - The smallest unit that can be loaded / stored at a time.
>>>>> > > - The smallest unit that can be addressed with a raw pointer in a
>>>>> specific address space.
>>>>> > > - The largest unit whose encoding is opaque to anything above the
>>>>> ISA.
>>>>> > > - The type used to represent `char` in C.
>>>>> > > - The type that has a size that all other types are a multiple of.
>>>>> > But if DSPs are less restrictive about byte, some of the criteria
>>>>> could be removed.
>>>>> >
>>>>> > 2. Use an iconic target. PDP10 was suggested as a candidate. This
>>>>> opinion found support from Tim Northover, Joerg Sonenberger, Mehdi AMINI,
>>>>> Philip Reames. It's not clear though does this opinion oppose upstreaming
>>>>> non-8-bits byte without tests or just a dummy and TVM targets options.
>>>>> >
>>>>> > So if there is no strong opposition to the solution 1 from the
>>>>> people supporting an iconic target option, we could probably move to the
>>>>> patches.
>>>>>
>>>>> I'm in camp (2). Any changes that are not tested are an invitation to
>>>>> upstream developers to "simplify" the code, not knowing that those changes
>>>>> are important. Anyone who commits untested changes to LLVM will inevitably
>>>>> face an uphill battle against benevolent NFC refactorings that break these
>>>>> changes because the expectation of how the code is supposed to behave is
>>>>> not codified in a test. In the short term option (1) sounds more appealing
>>>>> because they can start right away, but I'm going to predict that it will be
>>>>> more expensive for the downstream maintainers of non 8-bit targets in the
>>>>> long term.
>>>>>
>>>>
>>>> I've worked on multiple codebases where an option existed in order to
>>>> satisfy an extremely small userbase, with little or no testing,
>>>>
>>>
>>> In those situations, were the core developers responsible for those
>>> features/users? Yeah, if I needed to support a certain observable feature
>>> of clang continuing to work, I'd want tests (I'm pretty serious about
>>> testing, FWIW).
>>>
>>>
>>>> and as such, I'm adamantly opposed to repeating it.
>>>>
>>>
>>>>
>>> In addition to what Adrian said, where the weird option exists but is
>>>> constantly being incidentally broken, I've seen the opposite problem:
>>>> people become afraid to refactor a section of code because it might break
>>>> the weird option.  You might say "if there aren't any tests, people should
>>>> feel free to refactor the code; their only responsibility is to make sure
>>>> the tests will still work", but honestly, I've seen the opposite: without
>>>> tests, it's just presumed that touching certain parts of code is likely to
>>>> break things, and so after a time, an aura of untouchability starts to
>>>> surround regions of the code.  And then, the more time goes by, the more
>>>> that code becomes unfamiliar to everyone (because no one is actively
>>>> maintaining it).  In the long run, the cost of an unmaintained option may
>>>> be far more than the cost of a maintained one.
>>>>
>>>
>>>
>>> I'm not actually opposed to this situation - LLVM as a project is pretty
>>> happy about making big structural changes to the codebase & holding the
>>> test coverage and downstream users accountable for ensuring quality. We
>>> rarely avoid changes due to risk of breakage & as a community push back a
>>> fair bit on reviewers suggesting we should - if someone can't demonstrate
>>> the breakage in an upstream test (or has a pretty good track record of true
>>> positives that may take some time to investigate - and thus it might be
>>> better in the short term to revert while waiting for that evidence to be
>>> provided) the changes tend to go in and stay in.
>>>
>>> Yeah, I think some parts of the code may become complicated enough to
>>> warrant separate testing - but most of the code that might move to
>>> constants for byte width, iterate over bits to that byte width, etc, will
>>> be tested on one value & might have bugs on other values that will be found
>>> (or not) downstream - best effort and all. But in cases where the code to
>>> handle novel byte widths becomes more complicated - some abstraction and
>>> unit testing would seem quite appropriate.
>>>
>>>
>>>
>>>> In short: please don't commit changes without tests.  Even if the test
>>>> is nothing but making sure this works:
>>>>
>>>> int main(int argc, char *argv[]) {
>>>>   return argv[argc - 1][0];
>>>> }
>>>>
>>>> That at least would give some freedom from the guilt of breaking
>>>> something important.
>>>>
>>>
>>> It's hard to make sure that works in a meaningful sense in this context
>>> - without a non-8-bit-byte target in upstream LLVM, which is the point of
>>> contention/discussion. It's unclear if there's a suitable target/community
>>> to provide/maintain such a target upstream. I don't think there's a
>>> "cheap"/stub/trivial target we could create that would provide what you're
>>> suggesting without bitrotting quickly and being removed (more quickly than,
>>> I think, the sort of patches to support but not provide, non-8-bit-byte
>>> targets).
>>>
>>> Though it's hard to guess without seeing the sort of patches that'd be
>>> needed.
>>>
>>> - Dave
>>>
>>>
>>>>
>>>> -- Jorg
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191105/34c850f3/attachment.html>