[LLVMdev] Fwd: Segment Register Use

Wed Jul 25 09:14:34 PDT 2007

fucking hell, listserv...

---------- Forwarded message ----------
From: "Wilfred L. Guerin" <wilfredguerin at gmail.com>
Date: Wed, 25 Jul 2007 10:54:46 -0500
Subject: Re: [LLVMdev] Segment Register Use
To: Holger Schurig <hs4233 at mail.mn-solutions.de>

I was very much expecting this style of response ;)

I believe the following characteristics and class of example should
demonstrate the concerns in all models, as I know significant research
in massive data tables has been undertaken for decades...

First:

In favouring linear processing paths, it is more common to use a
computational sequencing method and never use a push/pop stack under
any circumstance (4:1 opt)

Partitioning of seg registers is good for security but irrelevant in
this example: both the code space and data space use identical
management models.

Methods:

A sequential lookup table (ternary tree, reference table, "nibbler")
is used for memory lookup, using conventional infinite-bitlength
identifiers with any partition size and ternary size...

A table has id1 and 256 blocks (mem ptr or similar ID); the value of
the ternary is n (of 8 bits) and the block size is static per table.
Whether the id-based lookup is for data or for memory locations, the
process is identical:

shift to the ternary scope, AND the ternary with 0x0scope, mul (shl)
by the block size, get the offset value (typically confirm the need to
recurse further), then repeat with the next ternary until done.
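
A minimal sketch in C of the lookup loop described above, under stated
assumptions: the identifier is a fixed 32-bit value (the text allows any
length), it is consumed 8 bits at a time from the most significant end,
and a block whose `next` link is NULL ends the recursion. The names
`struct block`, `struct table` and `lookup` are hypothetical and only
illustrate the shift / AND / multiply-by-block-size / offset steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DIGIT_BITS = 8, SLOTS = 1 << DIGIT_BITS }; /* 256 blocks per table */

struct table;

/* Hypothetical block header stored at base + digit * block_size. */
struct block {
    const struct table *next; /* non-NULL: recurse into a deeper table */
    void *target;             /* otherwise: final data or code position */
};

struct table {
    size_t block_size;   /* static per table; assumed suitably aligned */
    unsigned char *base; /* SLOTS blocks of block_size bytes each */
};

/* Resolve a 32-bit identifier, most significant 8 bits first. */
static void *lookup(const struct table *t, uint32_t id)
{
    for (int shift = 32 - DIGIT_BITS; shift >= 0 && t != NULL;
         shift -= DIGIT_BITS) {
        assert(t->block_size >= sizeof(struct block));
        size_t digit = (id >> shift) & (SLOTS - 1);      /* shift, then AND */
        const struct block *b = (const struct block *)
            (t->base + digit * t->block_size);           /* mul, get offset */
        if (b->next == NULL)
            return b->target; /* no further recursion needed */
        t = b->next;          /* repeat with the next 8 bits */
    }
    return NULL; /* identifier exhausted without reaching a final block */
}
```

The returned pointer plays the role of the final memory position, which
the caller would then use as a base pointer for data or as a code offset.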

The final memory position is commonly either assigned as the base
pointer for the requested data structure or code offset.

Code is compiled for local memory values (static) relative to CS, and
for known memory locations for the data in the modular process (DS).

.....

Here the model becomes critical:

In complex vector comparators, high-dimension math, and dense data
sets (with potentially distributed media storage being loaded), a
limited processor has few options (ia for now).

To compare characteristics of 2 or more "vectors" and write the result
to a new data container, it needs obvious access to these locations.

The ia flat memory model makes this easy, and a well-partitioned cs
sector (isolated like buffers) gives the max available speed.

The static-seeded tables are recursed, and the result location is set
in the offset register accordingly.

Let's assume the code does something stupid like basic math, or
propagates the larger value from the sources.
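
As a concrete instance of that "stupid" case, here is a minimal sketch
in C, assuming two source vectors and one result container of 32-bit
integers. The three pointers stand in for the base pointers that the
table lookup would have set up; `propagate_max` is a hypothetical name,
and portable C leaves the choice of segment or base register to the
compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the larger of a[i] and b[i] to out[i] for each of n elements. */
static void propagate_max(const int32_t *a, const int32_t *b,
                          int32_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] > b[i]) ? a[i] : b[i]; /* one shared offset i */
}
```

The same index `i` addresses all three locations, which is the property
the rest of this mail relies on.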

Using DS is obvious, but also due to inlined control methods and
predictable flow routing and modeling, the stack offset register as
well as a few others are available.

Depending on processor, the computational registers can use these
segments the way they were designed.

ia32 requires one operand of a compare to be in a comp register; others
can handle an offset-offset compare in one cycle.

compreg = ds+numofblock*width
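
The expression above is the usual base-plus-scaled-index address
calculation. A minimal sketch in C, assuming a plain byte pointer stands
in for the ds base (portable C has no notion of segment registers) and
using the hypothetical name `block_addr`:

```c
#include <assert.h>
#include <stddef.h>

/* Byte address of block number `block` in a table starting at `base`
   whose blocks are all `width` bytes wide. */
static const unsigned char *block_addr(const unsigned char *base,
                                       size_t block, size_t width)
{
    assert(width > 0);
    return base + block * width;
}
```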

When you are using TWO stack pointers, ds, and whatever else is
available, you end up with all the data needed for the vector
comparison within one flop, and an offset within the vector with one
rol function on the control register.

most accumulators can handle this easily.

...

There are many names for this, but it differs from a common vector
math inline because of processing method.

I forget the common textbook term.

...

The named variable convention in llvm is desirable because it allows
the selection of which base pointer to use for each table reference
based on what computations go there. One of the ia32 is grand for
writing to*, but can't be used to move mem to reg.

Obviously this is an intentional design.

Optimizer should handle this based on machine type and a model of
hardware capabilities.

(again back to HDL and the ancient ISDL specs for modeling.. but
avoiding the new UML except at higher layers)

so.

Use a table that resolves to the various data blocks, set the base
pointers accordingly, then process.

Morphic code, and code copied to a new cs for runtime (load to chip),
can have its own buffers; stacks are idiotic. (this directs
post-operation routing and allows distributed processing when using
the extra table with ids)

In short, trying to do such a thing for complex data sets, especially
when the result set or value dictates where to go next, using the
linear sequence lookup to a register as defined in most current
compilers, simply can't work.

Doing so costs (assuming 2 usable registers for data)
(lookupflops*numdatablocks)^2 for math processing.

Even linearized, you need a test for eol using stack and ... well
stacks break with recursion.

I'm sure there are documents describing this type and other similar
models that function transparently on any hardware host.

Obviously preparing large data sets in this manner is required to
optimize loading of limited memory space processors like the GPU
boards when using conventional CPU-dependent controllers. (a good
example is ia64 with nvidia quad gpu)

Hopefully someone can give a link to a simple overview of this issue,
but I'm sure the concept is easy enough to understand.

-Wilfred L. Guerin
WilfredGuerin at gmail.com

On 7/25/07, Holger Schurig <hs4233 at mail.mn-solutions.de> wrote:
> > if your LLVM still depends on either Generic or some of the
> > RTL models they use in various processor definitions, I
> > express concern for optimization and compilation.
>
> Thank you for your concerns.
>
> However, LLVM is in the first place an environment to write
> compilers. As an example for llvm, it can use the GCC frontend
> for compilation. "clang" is another compiler, one that can compile
> programs without any call to any gcc part.
>
> To learn more about code-generation for x86 targets inside LLVM
> (e.g. without the help of GCC), look at those files:
>
> http://llvm.org/svn/llvm-project/llvm/trunk/lib/Target/X86
>
>
> Also, can you provide an example of the same program, once
> compiled with the use of CS/DS and once without this?  I mean:
> show us the assembly code. How far apart is the performance of
> those two test programs?  For which ABI can you compile with
> using CS/DS?  AFAIK a Linux environment disables this.
>
> Somehow your mail reminds me of the times when I compiled under
> MSDOS and had several memory models to select from,
> e.g. "tiny", "medium", "large", "flat" ...
>