[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav nadav.rotem at intel.com
Wed Nov 30 13:16:29 PST 2011


Jose, 

The scenario you described is probably the most important and most common case. Supporting GEPs with a scalar base pointer and multiple indices can indeed help IR-level optimizations detect these patterns and replace them with intrinsics. But even without a single scalar base pointer, optimizations can detect that the base pointer is broadcast from a scalar. Having said that, I am still not sure how to add codegen support for AVX2 scatter/gather of base + 32-bit indices. The problem is that the GEP would return a vector of pointers, which would need to be converted back to the 'base+index' form. I think that replacing the GEP/LOAD sequence with an intrinsic is probably the best choice.
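
To make the 'base+index' recovery concrete, here is a minimal scalar sketch in C. It is not LLVM code: the helper name `to_base_index`, the fixed lane count, and the assumption that every pointer points into the same array as `base` are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* assumed lane count: 8 x 32-bit elements */

/* Hypothetical helper: try to rewrite each ptrs[i] as base + idx[i],
 * with idx[i] counted in elements. Assumes every pointer points into the
 * same array as base (pointer subtraction is only defined in that case).
 * Returns false when an offset does not fit a signed 32-bit index. */
static bool to_base_index(const float *const ptrs[LANES], const float *base,
                          int32_t idx[LANES])
{
    assert(base != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        ptrdiff_t off = ptrs[i] - base;  /* offset in elements, not bytes */
        if (off < INT32_MIN || off > INT32_MAX)
            return false;
        idx[i] = (int32_t)off;
    }
    return true;
}
```

When the conversion succeeds, the eight 64-bit pointers collapse to one scalar base plus eight 32-bit indices, which is the operand shape a dword-indexed gather expects.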

Nadav 


-----Original Message-----
From: Jose Fonseca [mailto:jfonseca at vmware.com] 
Sent: Wednesday, November 30, 2011 18:00
To: Rotem, Nadav
Cc: LLVM Developers Mailing List; David A. Greene
Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

Yes, indeed I can always fall back to intrinsics.

But still, I believe that the case I described is, in essence, quite commonplace, so it should be a first-class citizen in the LLVM IR. AVX2 is the target ISA I'm thinking of too, BTW.

Let's forget 3D, and imagine something as trivial as a vectorized i32 => float table lookup. I'd expect that the IR would look something like:

  ; Look-up table with precomputed values (declared via its first element, so @lut has type float*)
  @lut = external global float

  define <8 x float> @foo(<8 x i32> %indices) {
    %pointer = getelementptr float* @lut, <8 x i32> %indices
    %values = load <8 x float*> %pointer
    ret <8 x float> %values;
  }

And the final AVX2 code I'd expect would consist of a single VGATHERDPS, in both 64-bit and 32-bit addressing modes:

foo:
  VPCMPEQB   ymm1, ymm1, ymm1                        ; generate all ones
  VGATHERDPS ymm0, DWORD PTR [ymm0 * 4 + lut], ymm1
  RET
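
For reference, what that single gather is expected to compute can be written as a scalar loop. This is only a sketch in C under stated assumptions: the function name `gather_f32_ref` and the fixed lane count are made up for illustration, and the mask is taken to be all ones, as in the listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 };  /* one 256-bit register holds 8 x 32-bit lanes */

/* Hypothetical scalar reference for a gather with an all-ones mask:
 * out[i] = base[indices[i]], i.e. address = base + indices[i] * sizeof(float). */
static void gather_f32_ref(float out[LANES], const float *base,
                           const int32_t indices[LANES])
{
    assert(base != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        /* Each 32-bit index is widened (sign-extended) to pointer width
         * before scaling, so no vector of 64-bit pointers is ever stored. */
        out[i] = base[(ptrdiff_t)indices[i]];
    }
}
```

The point of the loop is that the only per-lane state is the 32-bit index; the 64-bit base stays scalar throughout.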

Jose

----- Original Message -----
> Hi Jose,
> 
> The proposed IR change neither helps nor hinders the use case you
> mentioned. The case of a base + vector-index should be easily
> addressed by an intrinsic. The pointer-vector proposal is meant to
> support full scatter/gather instructions (such as the AVX2 gather
> instructions).
> 
> Nadav
> 
> 
> -----Original Message-----
> From: Jose Fonseca [mailto:jfonseca at vmware.com]
> Sent: Tuesday, November 29, 2011 22:25
> To: Rotem, Nadav; David A. Greene
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and
> Vector-GEP
> 
> ----- Original Message -----
> > "Rotem, Nadav" <nadav.rotem at intel.com> writes:
> > 
> > > David,
> > >
> > > Thanks for the support! I sent a detailed email with the overall
> > > plan. But just to reiterate, the GEP would look like this:
> > >
> > > 	%PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
> > >
> > > Where the index of the GEP is a vector of indices. I am not against
> > > having multiple indices. I just want to start with a basic set of
> > > features.
> > 
> > Ah, I see.  I actually think multiple indices as in multiple vectors
> > of indices to the GEP above would be pretty rare.
> 
> Nadav, David,
> 
> I'd like to understand a bit better the final role of these pointer
> vector types on 64-bit architectures, where the pointers are often
> bigger than the elements stored/fetched (e.g., 32-bit floats/ints).
> 
> Will 64-bit backends be forced to actually operate on 64-bit pointer
> vectors all the time? Or will they be able to retain operations on
> base + 32-bit offsets as such?
> 
> In particular, an important use case for 3D software rendering is to
> be able to gather <4 x i32> values from an i32* scalar base pointer
> in a 64-bit address space, indexed by <N x i32> offsets. [1]  It is
> also important that the intermediate <N x i32*> pointer vector is
> never actually materialized, as it wouldn't fit in the hardware SIMD
> registers and would therefore require two gather operations.
> 
> It would be nice to see how this use case would look in the proposed
> IR, and get assurance that backends will be able to emit efficient
> code (i.e., a single gather instruction) from that IR.
> 
> Jose
> 
> [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-June/040825.html
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> 
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
> 