[LLVMdev] Casting between address spaces and address space semantics

Tue Sep 16 15:26:51 PDT 2008

Hi Matthijs,

On Sep 15, 2008, at 5:13 AM, Matthijs Kooijman wrote:

> Hi Mon Ping,
>
>> If I remember correctly, I was also not fond of passing another
>> TargetAddrSpace reference to the TargetData object.  I was hoping  
>> that we
>> could encode the information as a target description string like we  
>> do for
>> ABI information.   I just don't want to end up with too many  
>> objects that
>> describe the machine. One can argue that we shouldn't pollute the
>> TargetData since it describes the ABI  alignment, size, and object
>> layoutbut I feel that this data fits naturally there.  If you and  
>> other
>> people feel it is cleaner with a separate pass, I'm fine with it.
>
> Perhaps encoding it in TargetData makes sense. However, I was  
> avoiding this
> for now, since Chris commented a while back that he wanted to have  
> it in
> TargetData "only if absolutely required".
>
> However, thinking of this a bit more I do see your point about  
> TargetData.
> Another interesting advantage is that it would be a lot easier to  
> make things
> consistent between clang and LLVM, by simply using the TargetData  
> string
> that gets embedded in the module.
>
> So, I guess embedding this info in TargetData makes more sense. How  
> would this
> look like? I would think of something like:
>
> as1:<2:=3:>4:!3
>
> This would mean address space 1 is a subset of 2, equivalent to 3, a  
> superset
> of 4 and disjoint with 3. A number of these could be present in a  
> TargetData
> string, to fully describe the situation. Any relations not described  
> mean
> disjoint. Relations can also be implicitely defined, ie, as1:<2- 
> as2:<3 also
> implies as1:<3.
>
> I'm not sure if the > should be present, since that's always  
> reversible. Also,
> ! is probably not so nice for disjoint, any other suggestions?
>

I'm thinking it is sufficient to just have subset and equivalence. If  
superset makes the description more concise, it might be worth to put  
in.  Since the default is disjoint, I don't think we need it.  BTW, in  
the example, I assume you are just showing syntax as the system would  
flag an error if 1 is equivalent to 3 and disjoint to 3.

> As to implementing this, I'm thinking of a equivalency table (for  
> every
> address space, store the equivalent address space with the lowest  
> id) and for
> each of those lowest id spaces in each equivalency group store a set  
> of
> address spaces that are a subset. This set should be complete, so  
> when the
> string says A > B > C, the set should store both B and C as subsets  
> of A.
>
> This allows for resolving the relation between two address spaces by  
> two
> lookups in the equivalency table and (one or) two lookups in the  
> subset table.
> No results in the subset table means the relation is disjoint, then.

I think this is a reasonable way of going here.

>
>
> Any comments on this? Chris, would this be acceptable?

>> I want to treat my next point with some delicacy as I don't want to  
>> start a
>> religious war.  I just want to get clarification from the community  
>> on the
>> use of multiple inheritance for the case of Phases like
>> AllDisjointAddrspaces.  From what I can gather, the use of multiple
>> inheritance is to separate the interface (TargetAddrSpace) to  
>> access data
>> from the interface of the phase (ImmutablePhase).  In this case,  
>> will we
>> ever create a concrete class from TargetAddrSpace that doesn't also  
>> derive
>> from ImmutablePass? If not, I don't think is worth using multiple
>> inheritance in this case.
> I think you are right here, changing the inheritance in this way  
> also works
> fine.
>
>>>> For the case of a GetElementPointer, we are replacing a bitcast  
>>>> to a
>>>> pointer, getelem with a getelem bitcast.  The assumption is the  
>>>> latter
>>>> bitcast will hopefully go away when we iterate through those uses.
>>> Uh? Is this a comment about what the current code or my patch  
>>> does, or what
>>> it should do? I don't understand what you mean here.
>> My comment was more on what I thought the patch did and I wanted to  
>> confirm
>> that it will cleanup newly generated bit cast that are created.
> In that case, yes, the newly generated bitcasts should be  
> iteratively cleaned
> up whenever possible.
>
>>> True, anyone actually using address space should make sure that  
>>> this info
>>> is correct anyway. So, no need for an unknown default?
>> That is my feeling.
> Ok.
>
>> No, you got  my point even though my example is not a good one. If  
>> the
>> address calculation was using a variable, I don't think we can fold  
>> it into
>> the GEP and we might lose this information.
> Ie, a variable that is stored to, you mean? In that case, the  
> address space is
> probably propagated until the store instruction. Perhaps it can even  
> be
> propagated through the store instruction, so it stores to a  
> bitcasted pointer
> (ie, bitcast i32 addrspace(2)* * to i32 addrspace(1) * *). I suspect  
> other
> parts of instcombine might handle this from here and change the  
> variable's
> type to i32 addrspace(1), if possible. I guess this is something for  
> later on.
>

I think I meant that the address space information needs to be  
propagated through any temporary that is used to hold the value that  
might be using a different address spaces.  In general, this isn't  
difficult but one just have to be aware of it when manipulating  
pointers.

>> The point I was trying to make is that the information needs to be
>> propagated through any address calculation when possible.
> Yes, but that shouldn't be too hard to add to the existing code.
>

I agree,

Cheers,
   -- Mon Ping