[llvm-dev] RFC: On non 8-bit bytes and the target for it

Tue Nov 5 17:20:36 PST 2019

> That's already the world they live in today having out of tree targets
and especially ones with non-8-bit-bytes. I don't think upstream can sign
up for any stronger support here in that situation.

I absolutely agree.

On Tue, 5 Nov 2019 at 17:02, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Tue, Nov 5, 2019 at 4:35 PM Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Mon, Nov 4, 2019 at 4:07 PM David Blaikie via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>>
>>> On Mon, Nov 4, 2019 at 4:00 PM James Molloy <james at jamesmolloy.co.uk>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> To throw my hat into the ring, we also maintain a downstream target
>>>> that has a subset of this problem - it has non-8-bit-addressable memory.
>>>> This means that %uglygeps don't work, pointers can't be casted to i8* and
>>>> expect to be byte-addressed (conversions to memcpy/memset that use i8*
>>>> aren't welcome either), and GEP lowering code is slightly different.
>>>>
>>>> We maintain this currently as a pile of known technical debt. We have a
>>>> CGP pass to decompose GEPs and custom-expand them taking our word size into
>>>> account, but everything before that is just waiting for InstCombine to
>>>> break something. AliasAnalysis also makes offsetting assumptions that are
>>>> totally bogus because our pointers are word-addressed and therefore so are
>>>> pointer offsets.
>>>>
>>>> We'd be really keen to help here. We're keen to upstream anything we
>>>> possibly can (and have been, over the past few months). We've have several
>>>> discussions about how best to approach upstream with this, and the sticking
>>>> point has always been lack of a testing target. It's always felt to me that
>>>> the idea of addressable granule should be a fairly reasonable DataLayout
>>>> addition; We can test DataLayout changes purely via opt without requiring a
>>>> target that uses them. Lowering to instructions was always the testing
>>>> sticking point.
>>>>
>>>> We'd be keen to help out what the community decides to do here. I
>>>> personally feel it's reasonable that:
>>>>   - LangRef/DataLayout is updated with semantically coherent changes.
>>>>   - The midend optimizer is updated by someone who cares about those
>>>> changes and tests are added that use the new DataLayout.
>>>>   - Developers that don't care about those changes maintain a
>>>> best-effort approach, which is exactly what we do right now; there are
>>>> features that are tested but are still esoteric enough that I might
>>>> reasonably break them without realising (OperandBundles come to mind), so I
>>>> don't think there's any change in mindset here.
>>>>   - Developers that care perform downstream testing and provide review
>>>> feedback / revert if really bad / fixes. Again, this is how LLVM works
>>>> right now - I'd guess that >80% of our real-world test coverage comes from
>>>> downstream users deploying ToT LLVM rather than the upstream LIT tests /
>>>> builders.
>>>>
>>>
>>> This last bit is the bit I'd worry about if the bug were only visible in
>>> an out-of-tree target. If it's visible with a DataLayout change that can be
>>> tested upstream, then I'd be OK with an upstream patch being reverted given
>>> a test case with custom DataLayout showing the failure.
>>>
>>> But if the failure is for, say, non-8-bit-bytes that are only actually
>>> exercised downstream, I'm less OK with patches being reverted upstream no
>>> matter the severity of the breakage to such targets downstream.
>>>
>>
>> Suppose that the failure is build breakage for the downstream target.
>> This could cause the entirety of the downstream LLVM/Clang build to break
>> in a way that isn’t easy to untangle or fix. This could prevent an entire
>> organization from integrating LLVM for an extended period of time. So it
>> isn’t necessarily “no matter the severity of the breakage for such
>> *targets* downstream” but could be the entire organization unless a
>> tactical downstream revert of the offending change can be made.
>>
>
> That's already the world they live in today having out of tree targets and
> especially ones with non-8-bit-bytes. I don't think upstream can sign up
> for any stronger support here in that situation.
>
>
>>
>> — Sean Silva
>>
>>
>>> - Dave
>>>
>>>
>>>
>>>>
>>>> Cheers,
>>>>
>>>> James
>>>>
>>>> On Mon, 4 Nov 2019 at 12:16, David Blaikie via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Sat, Nov 2, 2019 at 12:45 AM Jorg Brown via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> On Fri, Nov 1, 2019 at 8:40 AM Adrian Prantl via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> > On Nov 1, 2019, at 3:41 AM, Dmitriy Borisenkov via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>> > It seems that there are two possible solutions on how to move
>>>>>>> forward with non 8 bits byte:
>>>>>>> >
>>>>>>> > 1. Commit changes without tests. Chris Lattner, Mikael Holmen,
>>>>>>> Jeroen Dobbelaere, Jesper Antonsson support this idea.
>>>>>>> > James Y Knight says that at least magic numbers should be removed
>>>>>>> "at least where it arguably helps code clarity". This might be not exactly
>>>>>>> the scope of the changes discussed, but it's probably worth do discuss code
>>>>>>> clarity having concrete patches.
>>>>>>> > GCC (according to James Y Knight) has the same practice meaning
>>>>>>> non-8 bits byte is supported but there are no tests in upstream and we have
>>>>>>> downstream contributors who will fix the bugs if they appear in the LLVM
>>>>>>> core.
>>>>>>> > David Chisnall raised a question about what to count as a byte
>>>>>>> (which defines the scope of the changes) and we suggest to use all 5
>>>>>>> criteria he granted:
>>>>>>> > > - The smallest unit that can be loaded / stored at a time.
>>>>>>> > > - The smallest unit that can be addressed with a raw pointer in
>>>>>>> a specific address space.
>>>>>>> > > - The largest unit whose encoding is opaque to anything above
>>>>>>> the ISA.
>>>>>>> > > - The type used to represent `char` in C.
>>>>>>> > > - The type that has a size that all other types are a multiple
>>>>>>> of.
>>>>>>> > But if DSPs are less restrictive about byte, some of the criteria
>>>>>>> could be removed.
>>>>>>> >
>>>>>>> > 2. Use an iconic target. PDP10 was suggested as a candidate. This
>>>>>>> opinion found support from Tim Northover, Joerg Sonenberger, Mehdi AMINI,
>>>>>>> Philip Reames. It's not clear though does this opinion oppose upstreaming
>>>>>>> non-8-bits byte without tests or just a dummy and TVM targets options.
>>>>>>> >
>>>>>>> > So if there is no strong opposition to the solution 1 from the
>>>>>>> people supporting an iconic target option, we could probably move to the
>>>>>>> patches.
>>>>>>>
>>>>>>> I'm in camp (2). Any changes that are not tested are an invitation
>>>>>>> to upstream developers to "simplify" the code, not knowing that those
>>>>>>> changes are important. Anyone who commits untested changes to LLVM will
>>>>>>> inevitably face an uphill battle against benevolent NFC refactorings that
>>>>>>> break these changes because the expectation of how the code is supposed to
>>>>>>> behave is not codified in a test. In the short term option (1) sounds more
>>>>>>> appealing because they can start right away, but I'm going to predict that
>>>>>>> it will be more expensive for the downstream maintainers of non 8-bit
>>>>>>> targets in the long term.
>>>>>>>
>>>>>>
>>>>>> I've worked on multiple codebases where an option existed in order to
>>>>>> satisfy an extremely small userbase, with little or no testing,
>>>>>>
>>>>>
>>>>> In those situations, were the core developers responsible for those
>>>>> features/users? Yeah, if I needed to support a certain observable feature
>>>>> of clang continuing to work, I'd want tests (I'm pretty serious about
>>>>> testing, FWIW).
>>>>>
>>>>>
>>>>>> and as such, I'm adamantly opposed to repeating it.
>>>>>>
>>>>>
>>>>>>
>>>>> In addition to what Adrian said, where the weird option exists but is
>>>>>> constantly being incidentally broken, I've seen the opposite problem:
>>>>>> people become afraid to refactor a section of code because it might break
>>>>>> the weird option.  You might say "if there aren't any tests, people should
>>>>>> feel free to refactor the code; their only responsibility is to make sure
>>>>>> the tests will still work", but honestly, I've seen the opposite: without
>>>>>> tests, it's just presumed that touching certain parts of code is likely to
>>>>>> break things, and so after a time, an aura of untouchability starts to
>>>>>> surround regions of the code.  And then, the more time goes by, the more
>>>>>> that code becomes unfamiliar to everyone (because no one is actively
>>>>>> maintaining it).  In the long run, the cost of an unmaintained option may
>>>>>> be far more than the cost of a maintained one.
>>>>>>
>>>>>
>>>>>
>>>>> I'm not actually opposed to this situation - LLVM as a project is
>>>>> pretty happy about making big structural changes to the codebase & holding
>>>>> the test coverage and downstream users accountable for ensuring quality. We
>>>>> rarely avoid changes due to risk of breakage & as a community push back a
>>>>> fair bit on reviewers suggesting we should - if someone can't demonstrate
>>>>> the breakage in an upstream test (or has a pretty good track record of true
>>>>> positives that may take some time to investigate - and thus it might be
>>>>> better in the short term to revert while waiting for that evidence to be
>>>>> provided) the changes tend to go in and stay in.
>>>>>
>>>>> Yeah, I think some parts of the code may become complicated enough to
>>>>> warrant separate testing - but most of the code that might move to
>>>>> constants for byte width, iterate over bits to that byte width, etc, will
>>>>> be tested on one value & might have bugs on other values that will be found
>>>>> (or not) downstream - best effort and all. But in cases where the code to
>>>>> handle novel byte widths becomes more complicated - some abstraction and
>>>>> unit testing would seem quite appropriate.
>>>>>
>>>>>
>>>>>
>>>>>> In short: please don't commit changes without tests.  Even if the
>>>>>> test is nothing but making sure this works:
>>>>>>
>>>>>> int main(int argc, char *argv[]) {
>>>>>>   return argv[argc - 1][0];
>>>>>> }
>>>>>>
>>>>>> That at least would give some freedom from the guilt of breaking
>>>>>> something important.
>>>>>>
>>>>>
>>>>> It's hard to make sure that works in a meaningful sense in this
>>>>> context - without a non-8-bit-byte target in upstream LLVM, which is the
>>>>> point of contention/discussion. It's unclear if there's a suitable
>>>>> target/community to provide/maintain such a target upstream. I don't think
>>>>> there's a "cheap"/stub/trivial target we could create that would provide
>>>>> what you're suggesting without bitrotting quickly and being removed (more
>>>>> quickly than, I think, the sort of patches to support but not provide,
>>>>> non-8-bit-byte targets).
>>>>>
>>>>> Though it's hard to guess without seeing the sort of patches that'd be
>>>>> needed.
>>>>>
>>>>> - Dave
>>>>>
>>>>>
>>>>>>
>>>>>> -- Jorg
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191105/195a33ba/attachment.html>