[llvm-dev] RFC: On non 8-bit bytes and the target for it

Mon Nov 4 16:00:06 PST 2019

Hi,

To throw my hat into the ring, we also maintain a downstream target that
has a subset of this problem - it has non-8-bit-addressable memory. This
means that %uglygeps don't work, pointers can't be cast to i8* and expected
to be byte-addressed (conversions to memcpy/memset that use i8* aren't
welcome either), and GEP lowering code is slightly different.
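
To illustrate the GEP point, here is a minimal sketch in C. The helper
names are made up for this example and are not LLVM APIs, and it assumes
that offsets are tracked in bits and that a pointer can only name whole
addressable units. An offset that does not land on a unit boundary (which
is what an i8*-style GEP can produce) simply has no pointer representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not an LLVM API: convert an offset expressed in bits
 * into a count of addressable units. Returns false when the offset does not
 * land on a unit boundary, i.e. no pointer value can represent it. */
static bool offset_bits_to_units(uint64_t offset_bits, unsigned unit_bits,
                                 uint64_t *units_out)
{
    assert(unit_bits != 0);
    if (offset_bits % unit_bits != 0)
        return false;
    *units_out = offset_bits / unit_bits;
    return true;
}

/* Offset, in addressable units, of a field that sits field_bits into
 * element `index` of an array whose elements are elem_bits wide. */
static bool gep_offset_units(uint64_t index, uint64_t elem_bits,
                             uint64_t field_bits, unsigned unit_bits,
                             uint64_t *units_out)
{
    return offset_bits_to_units(index * elem_bits + field_bits, unit_bits,
                                units_out);
}
```

With unit_bits = 8 this reduces to ordinary byte offsets. With unit_bits =
32 the same element and field give a different pointer increment, and an
offset of a single 8-bit byte is rejected.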

We maintain this currently as a pile of known technical debt. We have a CGP
pass to decompose GEPs and custom-expand them taking our word size into
account, but everything before that is just waiting for InstCombine to
break something. AliasAnalysis also makes offsetting assumptions that are
totally bogus because our pointers are word-addressed and therefore so are
pointer offsets.
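
The offsetting problem can be sketched the same way. This is a
hypothetical illustration in C, not how AliasAnalysis is implemented: it
assumes each access is described by an offset in addressable units plus a
size in bits, and checks whether the two half-open ranges intersect. The
answer changes with the unit width, which is why byte-based reasoning
gives bogus results for word-addressed pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of addressable units that an access of
 * size_bits occupies, rounded up to whole units. */
static uint64_t access_units(uint64_t size_bits, unsigned unit_bits)
{
    assert(unit_bits != 0);
    return (size_bits + unit_bits - 1) / unit_bits;
}

/* True if the ranges [off, off + units) of the two accesses intersect.
 * Offsets are in addressable units, sizes are in bits. */
static bool accesses_overlap(uint64_t off_a, uint64_t size_bits_a,
                             uint64_t off_b, uint64_t size_bits_b,
                             unsigned unit_bits)
{
    uint64_t end_a = off_a + access_units(size_bits_a, unit_bits);
    uint64_t end_b = off_b + access_units(size_bits_b, unit_bits);
    return off_a < end_b && off_b < end_a;
}
```

Two 32-bit accesses at offsets 0 and 1 overlap when the unit is an 8-bit
byte, but are disjoint when the unit is a 32-bit word.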

We'd be really keen to help here. We're keen to upstream anything we
possibly can (and have been, over the past few months). We've had several
discussions about how best to approach upstream with this, and the sticking
point has always been the lack of a testing target. It's always felt to me
that the idea of an addressable granule should be a fairly reasonable
DataLayout addition; we can test DataLayout changes purely via opt without
requiring a target that uses them. Lowering to instructions was always the
testing sticking point.

We'd be keen to help out with whatever the community decides to do here. I
personally feel it's reasonable that:
  - LangRef/DataLayout is updated with semantically coherent changes.
  - The midend optimizer is updated by someone who cares about those
changes and tests are added that use the new DataLayout.
  - Developers that don't care about those changes maintain a best-effort
approach, which is exactly what we do right now; there are features that
are tested but are still esoteric enough that I might reasonably break them
without realising (OperandBundles come to mind), so I don't think there's
any change in mindset here.
  - Developers that care perform downstream testing and provide review
feedback / revert if really bad / fixes. Again, this is how LLVM works
right now - I'd guess that >80% of our real-world test coverage comes from
downstream users deploying ToT LLVM rather than the upstream LIT tests /
builders.

Cheers,

James

On Mon, 4 Nov 2019 at 12:16, David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
>
> On Sat, Nov 2, 2019 at 12:45 AM Jorg Brown via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Fri, Nov 1, 2019 at 8:40 AM Adrian Prantl via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> > On Nov 1, 2019, at 3:41 AM, Dmitriy Borisenkov via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> > It seems that there are two possible solutions on how to move forward
>>> with non 8 bits byte:
>>> >
>>> > 1. Commit changes without tests. Chris Lattner, Mikael Holmen, Jeroen
>>> Dobbelaere, Jesper Antonsson support this idea.
>>> > James Y Knight says that at least magic numbers should be removed "at
>>> least where it arguably helps code clarity". This might not be exactly the
>>> scope of the changes discussed, but it's probably worth discussing code
>>> clarity once there are concrete patches.
>>> > GCC (according to James Y Knight) has the same practice meaning non-8
>>> bits byte is supported but there are no tests in upstream and we have
>>> downstream contributors who will fix the bugs if they appear in the LLVM
>>> core.
>>> > David Chisnall raised a question about what to count as a byte (which
>>> defines the scope of the changes) and we suggest using all 5 criteria he
>>> listed:
>>> > > - The smallest unit that can be loaded / stored at a time.
>>> > > - The smallest unit that can be addressed with a raw pointer in a
>>> specific address space.
>>> > > - The largest unit whose encoding is opaque to anything above the
>>> ISA.
>>> > > - The type used to represent `char` in C.
>>> > > - The type that has a size that all other types are a multiple of.
>>> > But if DSPs are less restrictive about byte, some of the criteria
>>> could be removed.
>>> >
>>> > 2. Use an iconic target. PDP10 was suggested as a candidate. This
>>> opinion found support from Tim Northover, Joerg Sonenberger, Mehdi AMINI,
>>> Philip Reames. It's not clear, though, whether this opinion opposes
>>> upstreaming non-8-bit bytes without tests, or only the dummy and TVM
>>> target options.
>>> >
>>> > So if there is no strong opposition to the solution 1 from the people
>>> supporting an iconic target option, we could probably move to the patches.
>>>
>>> I'm in camp (2). Any changes that are not tested are an invitation to
>>> upstream developers to "simplify" the code, not knowing that those changes
>>> are important. Anyone who commits untested changes to LLVM will inevitably
>>> face an uphill battle against benevolent NFC refactorings that break these
>>> changes because the expectation of how the code is supposed to behave is
>>> not codified in a test. In the short term option (1) sounds more appealing
>>> because they can start right away, but I'm going to predict that it will be
>>> more expensive for the downstream maintainers of non 8-bit targets in the
>>> long term.
>>>
>>
>> I've worked on multiple codebases where an option existed in order to
>> satisfy an extremely small userbase, with little or no testing,
>>
>
> In those situations, were the core developers responsible for those
> features/users? Yeah, if I needed to support a certain observable feature
> of clang continuing to work, I'd want tests (I'm pretty serious about
> testing, FWIW).
>
>
>> and as such, I'm adamantly opposed to repeating it.
>>
>
>>
>> In addition to what Adrian said, where the weird option exists but is
>> constantly being incidentally broken, I've seen the opposite problem:
>> people become afraid to refactor a section of code because it might break
>> the weird option.  You might say "if there aren't any tests, people should
>> feel free to refactor the code; their only responsibility is to make sure
>> the tests will still work", but honestly, I've seen the opposite: without
>> tests, it's just presumed that touching certain parts of code is likely to
>> break things, and so after a time, an aura of untouchability starts to
>> surround regions of the code.  And then, the more time goes by, the more
>> that code becomes unfamiliar to everyone (because no one is actively
>> maintaining it).  In the long run, the cost of an unmaintained option may
>> be far more than the cost of a maintained one.
>>
>
>
> I'm not actually opposed to this situation - LLVM as a project is pretty
> happy about making big structural changes to the codebase & holding the
> test coverage and downstream users accountable for ensuring quality. We
> rarely avoid changes due to risk of breakage & as a community push back a
> fair bit on reviewers suggesting we should - if someone can't demonstrate
> the breakage in an upstream test (or has a pretty good track record of true
> positives that may take some time to investigate - and thus it might be
> better in the short term to revert while waiting for that evidence to be
> provided) the changes tend to go in and stay in.
>
> Yeah, I think some parts of the code may become complicated enough to
> warrant separate testing - but most of the code that might move to
> constants for byte width, iterate over bits to that byte width, etc, will
> be tested on one value & might have bugs on other values that will be found
> (or not) downstream - best effort and all. But in cases where the code to
> handle novel byte widths becomes more complicated - some abstraction and
> unit testing would seem quite appropriate.
>
>
>
>> In short: please don't commit changes without tests.  Even if the test is
>> nothing but making sure this works:
>>
>> int main(int argc, char *argv[]) {
>>   return argv[argc - 1][0];
>> }
>>
>> That at least would give some freedom from the guilt of breaking
>> something important.
>>
>
> It's hard to make sure that works in a meaningful sense in this context -
> without a non-8-bit-byte target in upstream LLVM, which is the point of
> contention/discussion. It's unclear if there's a suitable target/community
> to provide/maintain such a target upstream. I don't think there's a
> "cheap"/stub/trivial target we could create that would provide what you're
> suggesting without bitrotting quickly and being removed (more quickly than,
> I think, the sort of patches to support but not provide, non-8-bit-byte
> targets).
>
> Though it's hard to guess without seeing the sort of patches that'd be
> needed.
>
> - Dave
>
>
>>
>> -- Jorg
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>