[llvm-dev] RFC: On non 8-bit bytes and the target for it

Mon Nov 4 16:07:14 PST 2019

On Mon, Nov 4, 2019 at 4:00 PM James Molloy <james at jamesmolloy.co.uk> wrote:

> Hi,
>
> To throw my hat into the ring, we also maintain a downstream target that
> has a subset of this problem - it has non-8-bit-addressable memory. This
> means that %uglygeps don't work, pointers can't be casted to i8* and expect
> to be byte-addressed (conversions to memcpy/memset that use i8* aren't
> welcome either), and GEP lowering code is slightly different.
>
> We maintain this currently as a pile of known technical debt. We have a
> CGP pass to decompose GEPs and custom-expand them taking our word size into
> account, but everything before that is just waiting for InstCombine to
> break something. AliasAnalysis also makes offsetting assumptions that are
> totally bogus because our pointers are word-addressed and therefore so are
> pointer offsets.
>
> We'd be really keen to help here. We're keen to upstream anything we
> possibly can (and have been, over the past few months). We've have several
> discussions about how best to approach upstream with this, and the sticking
> point has always been lack of a testing target. It's always felt to me that
> the idea of addressable granule should be a fairly reasonable DataLayout
> addition; We can test DataLayout changes purely via opt without requiring a
> target that uses them. Lowering to instructions was always the testing
> sticking point.
>
> We'd be keen to help out what the community decides to do here. I
> personally feel it's reasonable that:
>   - LangRef/DataLayout is updated with semantically coherent changes.
>   - The midend optimizer is updated by someone who cares about those
> changes and tests are added that use the new DataLayout.
>   - Developers that don't care about those changes maintain a best-effort
> approach, which is exactly what we do right now; there are features that
> are tested but are still esoteric enough that I might reasonably break them
> without realising (OperandBundles come to mind), so I don't think there's
> any change in mindset here.
>   - Developers that care perform downstream testing and provide review
> feedback / revert if really bad / fixes. Again, this is how LLVM works
> right now - I'd guess that >80% of our real-world test coverage comes from
> downstream users deploying ToT LLVM rather than the upstream LIT tests /
> builders.
>

This last bit is the bit I'd worry about if the bug were only visible in an
out-of-tree target. If it's visible with a DataLayout change that can be
tested upstream, then I'd be OK with an upstream patch being reverted given
a test case with custom DataLayout showing the failure.

But if the failure is for, say, non-8-bit-bytes that are only actually
exercised downstream, I'm less OK with patches being reverted upstream no
matter the severity of the breakage to such targets downstream.

- Dave

>
> Cheers,
>
> James
>
> On Mon, 4 Nov 2019 at 12:16, David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Sat, Nov 2, 2019 at 12:45 AM Jorg Brown via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Fri, Nov 1, 2019 at 8:40 AM Adrian Prantl via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> > On Nov 1, 2019, at 3:41 AM, Dmitriy Borisenkov via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>> > It seems that there are two possible solutions on how to move forward
>>>> with non 8 bits byte:
>>>> >
>>>> > 1. Commit changes without tests. Chris Lattner, Mikael Holmen, Jeroen
>>>> Dobbelaere, Jesper Antonsson support this idea.
>>>> > James Y Knight says that at least magic numbers should be removed "at
>>>> least where it arguably helps code clarity". This might be not exactly the
>>>> scope of the changes discussed, but it's probably worth do discuss code
>>>> clarity having concrete patches.
>>>> > GCC (according to James Y Knight) has the same practice meaning non-8
>>>> bits byte is supported but there are no tests in upstream and we have
>>>> downstream contributors who will fix the bugs if they appear in the LLVM
>>>> core.
>>>> > David Chisnall raised a question about what to count as a byte (which
>>>> defines the scope of the changes) and we suggest to use all 5 criteria he
>>>> granted:
>>>> > > - The smallest unit that can be loaded / stored at a time.
>>>> > > - The smallest unit that can be addressed with a raw pointer in a
>>>> specific address space.
>>>> > > - The largest unit whose encoding is opaque to anything above the
>>>> ISA.
>>>> > > - The type used to represent `char` in C.
>>>> > > - The type that has a size that all other types are a multiple of.
>>>> > But if DSPs are less restrictive about byte, some of the criteria
>>>> could be removed.
>>>> >
>>>> > 2. Use an iconic target. PDP10 was suggested as a candidate. This
>>>> opinion found support from Tim Northover, Joerg Sonenberger, Mehdi AMINI,
>>>> Philip Reames. It's not clear though does this opinion oppose upstreaming
>>>> non-8-bits byte without tests or just a dummy and TVM targets options.
>>>> >
>>>> > So if there is no strong opposition to the solution 1 from the people
>>>> supporting an iconic target option, we could probably move to the patches.
>>>>
>>>> I'm in camp (2). Any changes that are not tested are an invitation to
>>>> upstream developers to "simplify" the code, not knowing that those changes
>>>> are important. Anyone who commits untested changes to LLVM will inevitably
>>>> face an uphill battle against benevolent NFC refactorings that break these
>>>> changes because the expectation of how the code is supposed to behave is
>>>> not codified in a test. In the short term option (1) sounds more appealing
>>>> because they can start right away, but I'm going to predict that it will be
>>>> more expensive for the downstream maintainers of non 8-bit targets in the
>>>> long term.
>>>>
>>>
>>> I've worked on multiple codebases where an option existed in order to
>>> satisfy an extremely small userbase, with little or no testing,
>>>
>>
>> In those situations, were the core developers responsible for those
>> features/users? Yeah, if I needed to support a certain observable feature
>> of clang continuing to work, I'd want tests (I'm pretty serious about
>> testing, FWIW).
>>
>>
>>> and as such, I'm adamantly opposed to repeating it.
>>>
>>
>>>
>> In addition to what Adrian said, where the weird option exists but is
>>> constantly being incidentally broken, I've seen the opposite problem:
>>> people become afraid to refactor a section of code because it might break
>>> the weird option.  You might say "if there aren't any tests, people should
>>> feel free to refactor the code; their only responsibility is to make sure
>>> the tests will still work", but honestly, I've seen the opposite: without
>>> tests, it's just presumed that touching certain parts of code is likely to
>>> break things, and so after a time, an aura of untouchability starts to
>>> surround regions of the code.  And then, the more time goes by, the more
>>> that code becomes unfamiliar to everyone (because no one is actively
>>> maintaining it).  In the long run, the cost of an unmaintained option may
>>> be far more than the cost of a maintained one.
>>>
>>
>>
>> I'm not actually opposed to this situation - LLVM as a project is pretty
>> happy about making big structural changes to the codebase & holding the
>> test coverage and downstream users accountable for ensuring quality. We
>> rarely avoid changes due to risk of breakage & as a community push back a
>> fair bit on reviewers suggesting we should - if someone can't demonstrate
>> the breakage in an upstream test (or has a pretty good track record of true
>> positives that may take some time to investigate - and thus it might be
>> better in the short term to revert while waiting for that evidence to be
>> provided) the changes tend to go in and stay in.
>>
>> Yeah, I think some parts of the code may become complicated enough to
>> warrant separate testing - but most of the code that might move to
>> constants for byte width, iterate over bits to that byte width, etc, will
>> be tested on one value & might have bugs on other values that will be found
>> (or not) downstream - best effort and all. But in cases where the code to
>> handle novel byte widths becomes more complicated - some abstraction and
>> unit testing would seem quite appropriate.
>>
>>
>>
>>> In short: please don't commit changes without tests.  Even if the test
>>> is nothing but making sure this works:
>>>
>>> int main(int argc, char *argv[]) {
>>>   return argv[argc - 1][0];
>>> }
>>>
>>> That at least would give some freedom from the guilt of breaking
>>> something important.
>>>
>>
>> It's hard to make sure that works in a meaningful sense in this context -
>> without a non-8-bit-byte target in upstream LLVM, which is the point of
>> contention/discussion. It's unclear if there's a suitable target/community
>> to provide/maintain such a target upstream. I don't think there's a
>> "cheap"/stub/trivial target we could create that would provide what you're
>> suggesting without bitrotting quickly and being removed (more quickly than,
>> I think, the sort of patches to support but not provide, non-8-bit-byte
>> targets).
>>
>> Though it's hard to guess without seeing the sort of patches that'd be
>> needed.
>>
>> - Dave
>>
>>
>>>
>>> -- Jorg
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191104/0d86ed01/attachment-0001.html>