[cfe-commits] [PATCH] Optimize vec3 loads/stores

Mon Jul 23 13:14:07 PDT 2012

On Jul 18, 2012, at 6:51 PM, John McCall wrote:

> On Jul 18, 2012, at 5:37 PM, Tanya Lattner wrote:
>> On Jul 18, 2012, at 5:08 AM, Benyei, Guy wrote:
>>> Hi Tanya,
>>> Looks good and usefull, but I'm not sure if it should be clang's decision if storing and loading vec4s is better than vec3.
>> 
>> The idea was to have Clang generate code that the optimizers would be more likely to do something useful and smart with. I understand the concern, but I'm not sure where the best place for this would be then?
> 
> Hmm.  The IR size of a <3 x blah> is basically the size of a <4 x blah> anyway;  arguably the backend already has all the information it needs for this.  Dan, what do you think?
> 
> One objection to doing this in the frontend is that it's not clear to me that this is a transformation we should be doing if <4 x blah> isn't actually legal for the target.  But I'm amenable to the idea that this belongs here.
> 

I do not think its Clangs job to care about this as we already have this problem for other vector sizes and its target lowering's job to fix it. 

> I'm also a little uncomfortable with this patch because it's so special-cased to 3.  I understand that that might be all that OpenCL really cares about, but it seems silly to add this code that doesn't also kick in for, say, <7 x i16> or whatever.  It really shouldn't be difficult to generalize.

While it could be generalized, I am only 100% confident in the codegen for vec3 as I know for sure that it improves the code quality that is ultimately generated. This is also throughly tested by our OpenCL compiler so I am confident we are not breaking anything and we are improving performance. From my perspective, there is no vec7 in OpenCL, so there is no strong need for it to be generalized. It seems to make sense that it would also improve code quality for vec7. So while I could make the change, I don't have real world apps to test it. I think this can easily be left as an extension for later.

> 
> Also, this optimization assumes that sizeof(element) is a power of 2, because the only thing that the AST guarantees is that the size of the vector type is a power of 2, and nextPow2(3 x E) == nextPow2(4 x E) is only true in general if E is a power of 2.  Is that reasonable?  Is that checked somewhere?
> 

I don't think this is officially checked anywhere but is a general rule.

-Tanya

> Two minor notes.
> 
> +    const llvm::VectorType *VTy =
> +    dyn_cast<llvm::VectorType>(EltTy);
> 
> How can this fail?  If it can, that's interesting;  if it can't, it would be better if the "is this a vector operation" checks used the IR type instead of the AST type.
> 
> +      llvm::PointerType *DstPtr = cast<llvm::PointerType>(Addr->getType());
> +      if (DstPtr->getElementType() != SrcTy) {
> 
> This can't fail in the case we care about, so you can just merge
> this into the if-block above.
> 
> John.