[llvm-commits] Struct-handling patches

Wed Oct 1 00:41:05 PDT 2008

> I think this is fine. It's a tricky area, and I expect it'll be  something
> we'll re-evaluate over time, but I think it's reasonable for now.
Good.

> Instead of isa + cast you can use
>    if (const ArrayType *ATy = dyn_cast<ArrayType>(*I)) {
> 
> Also, the word "Otherwise" in the comment no longer makes sense,
> given the new structure of the code.
I initially had this code structured differently, and apparently haven't taken
the time to remove this issues after restructuring. Will do.

> I agree with Duncan that this doesn't sound safe in the case of
> structs with holes.
Yeah, I hadn't thought of that. However, doesn't this problem arise in case of
a memcpy of a {double}. I'm completely unsure of how this works, but if double
is the same as fp80, ie a 10 byte value, and the struct, and thus a memcpy of
it, gets a size of 16 bytes (due to alignment?), then replacing sucy a memcpy
with load + store will skip the 6-byte hole at the end of the struct, right?
It might very well be that any or all of the assumptions in the above are
wrong, though.

> > This last patch is not completely useful by itself yet, since  scalarrepl
> > doesn't know how to handle struct loads and stores yet :-) I'll have  a
> > look at that tomorrow.
In fact, it turns out that scalarrepl does handle struct loads and stores
already (at least in the testcase I was devising to show it didn't).

> Would it make sense to teach scalarrepl how to handle llvm.memcpy? The  code
> above in instcombine only looks at objects that are 8 bytes or smaller,
> which is somewhat arbitrary when it comes to structs. We don't want to
> translate large struct copies to single loads and stores, but it seems there
> will be a lot of interesting cases with structs slightly larger  than 8
> bytes.
scalarrepl already knows how to handle memcpy. However, my problems arose
because before scalarrepl looks at the code, instcombine had already replaced
the memcpy with a load+store of i32 or i64, which scalarrepl didn't like.

As an example, when we start out with the following. I didn't actually test
this code, I'm just writing down what I remember happening.

	%A = alloca { i16, i16 }
	%B = alloca { i16, i16 }
	...
	%Ap = bitcast {i16, i16}* %A to i8*
	%Bp = bitcast {i16, i16}* %B to i8*
	call @llvm.memcpy(i8* %Ap, i8* %Bp, i32 4, i32 1)

Instcombine changes this to:

	%A = alloca { i16, i16 }
	%B = alloca { i16, i16 }
	...
	%Ap = bitcast {i16, i16}* %A to i32*
	%Bp = bitcast {i16, i16}* %B to i32*
	%V = load i32* %Bp
	store i32 %V, i32* %Ap

Scalarrepl no longer knows how to handle this, and leaves the alloca's intact.
Scalarrepl might be able to change this, perhaps like this:

	%A = alloca { i16, i16 }
	...
	%Ap = bitcast {i16, i16}* %A to i32*
	%V = load i32* %Bp
	%Vs = bitcast i32 %V to {i16, i16}
	%B0 = extractvalue {i16, i16} %Vs, i32 0
	%B1 = extractvalue {i16, i16} %Vs, i32 1

Here, the %B alloca is replaced, but scalarrepl still keeps the i32 intact by
bitcasting from it (or to it in the reverse case). If %A is also replaced in
the above, the resulting bitcasts, insertvalue and extractvalue instructions
should vanish and the code should become pretty.

Alternatively, scalarrepl could look two steps further and also include the
load and bitcast in the replace, resulting in:

	%A = alloca { i16, i16 }
	...
	%A0 = getelementptr {i16, i16} %A, i32 0, i32 0
	%A1 = getelementptr {i16, i16} %A, i32 0, i32 1
	%B0 = load i16* %A0
	%B1 = load i16* %A1

The latter approach has the advantage of removing the i32 stuff straight away,
but might be less flexible in situations that are slightly different from the
above example.

In both of the above options, the holes in a struct might not be relevant
anymore. Once scalarrepl replaces a struct, the holes disappear, and only the
elements remain. Then there appears to be no need to preserve the
hole-copying.

However, the exception to this seems to be when a struct is both loaded from
and stored to (using a type like i32, which also reads or writes holes. This
also goes for doing both a memcpy in and out of a struct, so I suspect that
scalarrepl has some way of handling this properly (or might it be bugged now?)

> Actually, it may some day make sense to use single loads and stores
> instead of memcpy for all structs, but before we can do that we'd need
> at least need to teach codegen how to effectively turn them back into
> memcpy in most cases. Offhand I don't know what other changes would
> be needed to avoid major pessimizations.
I think the main problem that exists is that the frontend now emits memcpy's
and thus requires holes in structs to be copied, even when this isn't even
required. Having the frontend emit something with completely correct semantics
probably makes things a lot easier in the backend (however, as I suggested
here [1] it could be even better to have some way to express "copy this
struct, and you may or may not copy holes"). Also, this discussion is become
more and more alike to the struct-copy thread in cfe-dev [1].

Gr.

Matthijs

[1]: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-September/002920.html