<div dir="ltr"><div dir="ltr">On Fri, Nov 1, 2019 at 8:43 AM Robinson, Paul via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Did somebody say "PDP-10"?  😊<br>

<br>

> David Chisnall raised a question about what to count as a byte<br>
> (which defines the scope of the changes), and we suggest using<br>
> all 5 criteria he listed:<br>

> > - The smallest unit that can be loaded / stored at a time.<br>

> > - The smallest unit that can be addressed with a raw pointer<br>

>     in a specific address space.<br>

> > - The largest unit whose encoding is opaque to anything above<br>

>     the ISA.<br>

> > - The type used to represent `char` in C.<br>

> > - The type that has a size that all other types are a multiple<br>

>     of.<br>

> But if DSPs are less restrictive about what a byte is, some of the<br>
> criteria could be removed.<br>

><br>

> 2. Use an iconic target. The PDP-10 was suggested as a candidate. This<br>
> opinion found support from Tim Northover, Joerg Sonnenberger, Mehdi<br>
> AMINI, and Philip Reames. It's not clear, though, whether this opinion<br>
> opposes upstreaming non-8-bit bytes without tests, or just the dummy<br>
> and TVM target options.<br>

<br>

Note that for the PDP-10, the 5 criteria do not all pick out the same unit.<br>

It is a word-addressed machine (36-bit words) but the ISA has<br>

instructions to handle 18-bit halfwords, and also defines a <br>

"byte pointer" to allow load/store of arbitrary-size bytes within <br>

a word.  Byte pointers allow any size byte that fits in a word <br>

(from 1 bit to 36 bits).  So what we have is:<br>

<br>

> - The smallest unit that can be loaded / stored at a time.<br>

<br>

This is 1 bit, from the ISA's perspective, using byte pointers.<br>

Obviously caches and such would be word-based, but that's not<br>

the point of this criterion.<br>

<br>

> - The smallest unit that can be addressed with a raw pointer<br>

>     in a specific address space.<br>

<br>

On the PDP-10, the naïve interpretation of "raw pointer" would be<br>

a simple memory address, so this is a 36-bit word.  (Halfword <br>

access uses different instructions to move the upper or lower <br>

halfwords; it's not encoded in the address.)<br>

Note that `char *` is not a "raw pointer" in this sense; it is<br>

a byte pointer.<br>

<br>

> - The largest unit whose encoding is opaque to anything above<br>

>     the ISA.<br>

<br>

I am not sure what this actually means.  I could interpret it<br>
as a double-word floating-point value, but I doubt that was what was<br>
intended.<br>

<br>

> - The type used to represent `char` in C.<br>

<br>

tl;dr: 7-bit byte.<br>

<br>

C is hard to map to the PDP-10. DEC did not provide a C compiler,<br>

although I was aware of a third-party C compiler; it used 7-bit <br>

ASCII for `char` which was the most typical character size on <br>

that machine.  (Sixbit was also used frequently, if you didn't<br>

need lowercase or many special characters, e.g. for filenames.<br>

8-bit ASCII was uncommon, unless you were forced into doing<br>

data transfers to those newfangled PDP-11 and VAX things.)<br>

This means that `char *` and `int *` had different formats, the<br>

former being a byte pointer and the latter being an address;<br>

casting was not free.<br>

<br>

> - The type that has a size that all other types are a multiple<br>

>     of.<br>

<br>

Discounting 'char' and strings, I'd have to say this would be<br>

the 36-bit word, i.e. 'int'.<br></blockquote><div><br></div><div>Fascinating.</div><div><br></div><div>So, a 36-bit word could contain 6 Sixbit characters, 5 7-bit ASCII characters, or 4 8-bit ASCII characters for communicating with later DEC machines?</div><div><br></div><div>I was going to ask what the compiler does when it sees "Hello World"... but since DEC didn't provide a C compiler, I suppose there can't be an answer to that...</div><div><br></div><div>I would say that it's critical for memcpy to work well enough that it copies all the bits, which to me means that the size of a "word" has to be a multiple of whatever 'char' is.  That rules out both 8-bit chars and 7-bit chars, since neither 8 nor 7 divides 36.  I would say your only choices are:<br></div><div><br></div><div>1 bit</div><div>6 bits</div><div>9 bits</div><div>36 bits</div><div><br></div><div>2, 3, 4, 12, and 18 also evenly divide 36, but I don't see any compelling reason to want them.</div><div><br></div><div>A 9-bit char would have some use if 8-bit characters were only packed 4-to-a-word.</div><div><br></div><div>A 1-bit char would be awesome because then you might end up with the only architecture in the world where vector&lt;bool&gt; wasn't an abomination.  I guess then that "Hello World" might have a size of 77 chars (11 characters at 7 bits each; 84 counting the NUL), assuming that the compiler treated 7-bit as the preferred encoding.</div><div><br></div><div>...</div><div><br></div><div>Thanks for the lesson.  I have a very dim recollection of programming a PDP in college... apparently blissfully unaware of word sizes... which makes me think it was probably a PDP-11/70.</div><div><br></div><div>-- Jorg</div></div></div>