[llvm-dev] RFC: On non 8-bit bytes and the target for it

Tue Nov 5 17:02:14 PST 2019

On Tue, Nov 5, 2019 at 4:35 PM Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Mon, Nov 4, 2019 at 4:07 PM David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Mon, Nov 4, 2019 at 4:00 PM James Molloy <james at jamesmolloy.co.uk>
>> wrote:
>>
>>> Hi,
>>>
>>> To throw my hat into the ring, we also maintain a downstream target that
>>> has a subset of this problem - it has non-8-bit-addressable memory. This
>>> means that %uglygeps don't work, pointers can't be casted to i8* and expect
>>> to be byte-addressed (conversions to memcpy/memset that use i8* aren't
>>> welcome either), and GEP lowering code is slightly different.
>>>
>>> We maintain this currently as a pile of known technical debt. We have a
>>> CGP pass to decompose GEPs and custom-expand them taking our word size into
>>> account, but everything before that is just waiting for InstCombine to
>>> break something. AliasAnalysis also makes offsetting assumptions that are
>>> totally bogus because our pointers are word-addressed and therefore so are
>>> pointer offsets.
>>>
>>> We'd be really keen to help here. We're keen to upstream anything we
>>> possibly can (and have been, over the past few months). We've have several
>>> discussions about how best to approach upstream with this, and the sticking
>>> point has always been lack of a testing target. It's always felt to me that
>>> the idea of addressable granule should be a fairly reasonable DataLayout
>>> addition; We can test DataLayout changes purely via opt without requiring a
>>> target that uses them. Lowering to instructions was always the testing
>>> sticking point.
>>>
>>> We'd be keen to help out what the community decides to do here. I
>>> personally feel it's reasonable that:
>>>   - LangRef/DataLayout is updated with semantically coherent changes.
>>>   - The midend optimizer is updated by someone who cares about those
>>> changes and tests are added that use the new DataLayout.
>>>   - Developers that don't care about those changes maintain a
>>> best-effort approach, which is exactly what we do right now; there are
>>> features that are tested but are still esoteric enough that I might
>>> reasonably break them without realising (OperandBundles come to mind), so I
>>> don't think there's any change in mindset here.
>>>   - Developers that care perform downstream testing and provide review
>>> feedback / revert if really bad / fixes. Again, this is how LLVM works
>>> right now - I'd guess that >80% of our real-world test coverage comes from
>>> downstream users deploying ToT LLVM rather than the upstream LIT tests /
>>> builders.
>>>
>>
>> This last bit is the bit I'd worry about if the bug were only visible in
>> an out-of-tree target. If it's visible with a DataLayout change that can be
>> tested upstream, then I'd be OK with an upstream patch being reverted given
>> a test case with custom DataLayout showing the failure.
>>
>> But if the failure is for, say, non-8-bit-bytes that are only actually
>> exercised downstream, I'm less OK with patches being reverted upstream no
>> matter the severity of the breakage to such targets downstream.
>>
>
> Suppose that the failure is build breakage for the downstream target. This
> could cause the entirety of the downstream LLVM/Clang build to break in a
> way that isn’t easy to untangle or fix. This could prevent an entire
> organization from integrating LLVM for an extended period of time. So it
> isn’t necessarily “no matter the severity of the breakage for such
> *targets* downstream” but could be the entire organization unless a
> tactical downstream revert of the offending change can be made.
>

That's already the world they live in today having out of tree targets and
especially ones with non-8-bit-bytes. I don't think upstream can sign up
for any stronger support here in that situation.

>
> — Sean Silva
>
>
>> - Dave
>>
>>
>>
>>>
>>> Cheers,
>>>
>>> James
>>>
>>> On Mon, 4 Nov 2019 at 12:16, David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Nov 2, 2019 at 12:45 AM Jorg Brown via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> On Fri, Nov 1, 2019 at 8:40 AM Adrian Prantl via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> > On Nov 1, 2019, at 3:41 AM, Dmitriy Borisenkov via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>> > It seems that there are two possible solutions on how to move
>>>>>> forward with non 8 bits byte:
>>>>>> >
>>>>>> > 1. Commit changes without tests. Chris Lattner, Mikael Holmen,
>>>>>> Jeroen Dobbelaere, Jesper Antonsson support this idea.
>>>>>> > James Y Knight says that at least magic numbers should be removed
>>>>>> "at least where it arguably helps code clarity". This might be not exactly
>>>>>> the scope of the changes discussed, but it's probably worth do discuss code
>>>>>> clarity having concrete patches.
>>>>>> > GCC (according to James Y Knight) has the same practice meaning
>>>>>> non-8 bits byte is supported but there are no tests in upstream and we have
>>>>>> downstream contributors who will fix the bugs if they appear in the LLVM
>>>>>> core.
>>>>>> > David Chisnall raised a question about what to count as a byte
>>>>>> (which defines the scope of the changes) and we suggest to use all 5
>>>>>> criteria he granted:
>>>>>> > > - The smallest unit that can be loaded / stored at a time.
>>>>>> > > - The smallest unit that can be addressed with a raw pointer in a
>>>>>> specific address space.
>>>>>> > > - The largest unit whose encoding is opaque to anything above the
>>>>>> ISA.
>>>>>> > > - The type used to represent `char` in C.
>>>>>> > > - The type that has a size that all other types are a multiple of.
>>>>>> > But if DSPs are less restrictive about byte, some of the criteria
>>>>>> could be removed.
>>>>>> >
>>>>>> > 2. Use an iconic target. PDP10 was suggested as a candidate. This
>>>>>> opinion found support from Tim Northover, Joerg Sonenberger, Mehdi AMINI,
>>>>>> Philip Reames. It's not clear though does this opinion oppose upstreaming
>>>>>> non-8-bits byte without tests or just a dummy and TVM targets options.
>>>>>> >
>>>>>> > So if there is no strong opposition to the solution 1 from the
>>>>>> people supporting an iconic target option, we could probably move to the
>>>>>> patches.
>>>>>>
>>>>>> I'm in camp (2). Any changes that are not tested are an invitation to
>>>>>> upstream developers to "simplify" the code, not knowing that those changes
>>>>>> are important. Anyone who commits untested changes to LLVM will inevitably
>>>>>> face an uphill battle against benevolent NFC refactorings that break these
>>>>>> changes because the expectation of how the code is supposed to behave is
>>>>>> not codified in a test. In the short term option (1) sounds more appealing
>>>>>> because they can start right away, but I'm going to predict that it will be
>>>>>> more expensive for the downstream maintainers of non 8-bit targets in the
>>>>>> long term.
>>>>>>
>>>>>
>>>>> I've worked on multiple codebases where an option existed in order to
>>>>> satisfy an extremely small userbase, with little or no testing,
>>>>>
>>>>
>>>> In those situations, were the core developers responsible for those
>>>> features/users? Yeah, if I needed to support a certain observable feature
>>>> of clang continuing to work, I'd want tests (I'm pretty serious about
>>>> testing, FWIW).
>>>>
>>>>
>>>>> and as such, I'm adamantly opposed to repeating it.
>>>>>
>>>>
>>>>>
>>>> In addition to what Adrian said, where the weird option exists but is
>>>>> constantly being incidentally broken, I've seen the opposite problem:
>>>>> people become afraid to refactor a section of code because it might break
>>>>> the weird option.  You might say "if there aren't any tests, people should
>>>>> feel free to refactor the code; their only responsibility is to make sure
>>>>> the tests will still work", but honestly, I've seen the opposite: without
>>>>> tests, it's just presumed that touching certain parts of code is likely to
>>>>> break things, and so after a time, an aura of untouchability starts to
>>>>> surround regions of the code.  And then, the more time goes by, the more
>>>>> that code becomes unfamiliar to everyone (because no one is actively
>>>>> maintaining it).  In the long run, the cost of an unmaintained option may
>>>>> be far more than the cost of a maintained one.
>>>>>
>>>>
>>>>
>>>> I'm not actually opposed to this situation - LLVM as a project is
>>>> pretty happy about making big structural changes to the codebase & holding
>>>> the test coverage and downstream users accountable for ensuring quality. We
>>>> rarely avoid changes due to risk of breakage & as a community push back a
>>>> fair bit on reviewers suggesting we should - if someone can't demonstrate
>>>> the breakage in an upstream test (or has a pretty good track record of true
>>>> positives that may take some time to investigate - and thus it might be
>>>> better in the short term to revert while waiting for that evidence to be
>>>> provided) the changes tend to go in and stay in.
>>>>
>>>> Yeah, I think some parts of the code may become complicated enough to
>>>> warrant separate testing - but most of the code that might move to
>>>> constants for byte width, iterate over bits to that byte width, etc, will
>>>> be tested on one value & might have bugs on other values that will be found
>>>> (or not) downstream - best effort and all. But in cases where the code to
>>>> handle novel byte widths becomes more complicated - some abstraction and
>>>> unit testing would seem quite appropriate.
>>>>
>>>>
>>>>
>>>>> In short: please don't commit changes without tests.  Even if the test
>>>>> is nothing but making sure this works:
>>>>>
>>>>> int main(int argc, char *argv[]) {
>>>>>   return argv[argc - 1][0];
>>>>> }
>>>>>
>>>>> That at least would give some freedom from the guilt of breaking
>>>>> something important.
>>>>>
>>>>
>>>> It's hard to make sure that works in a meaningful sense in this context
>>>> - without a non-8-bit-byte target in upstream LLVM, which is the point of
>>>> contention/discussion. It's unclear if there's a suitable target/community
>>>> to provide/maintain such a target upstream. I don't think there's a
>>>> "cheap"/stub/trivial target we could create that would provide what you're
>>>> suggesting without bitrotting quickly and being removed (more quickly than,
>>>> I think, the sort of patches to support but not provide, non-8-bit-byte
>>>> targets).
>>>>
>>>> Though it's hard to guess without seeing the sort of patches that'd be
>>>> needed.
>>>>
>>>> - Dave
>>>>
>>>>
>>>>>
>>>>> -- Jorg
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191105/0079b78b/attachment.html>