[PATCH] D30810: Preserve vec3 type.

Tue Mar 14 11:49:21 PDT 2017

jaykang10 added a comment.

In https://reviews.llvm.org/D30810#700476, @Anastasia wrote:

> In https://reviews.llvm.org/D30810#699827, @jaykang10 wrote:
>
> > In https://reviews.llvm.org/D30810#699695, @bruno wrote:
> >
> > > Hi JinGu,
> > >
> > > I just read the discussion on cfe-dev, some comments:
> > >
> > > > My colleague and I are implementing a transformation pass between LLVM IR and another IR and we want to keep the 3-component vector types in our target IR
> > >
> > > Why can't you go back through the shuffle to find out that it comes from a vec3 -> vec4 transformation? Adding a flag here just for that seems very odd.
> >
> >
> > Hi Bruno,
> >
> > Thanks for your comment. We have a pass to undo the vec4 to vec3. I wondered why clang generates the vec4 for vec3 load/store. As you can see the comment on clang's code, they are generated for better performance. I had a questions at this point. clang should consider vector load/store aligned by 4 regardless of target??? llvm's codegen could handle vec3 according to targets' vector load/store with their alignment. I agree the flag looks odd but I was concerned some llvm's targets depend on the vec4 so I suggested to add the flag. If I missed something, please let me know.
>
>
> I think doing this transformation in a frontend was not the best choice because we are losing the source information too early. But some targets can offer a better support for vec3 natively than converting to vec4. Regarding the option, I am wondering whether adding this as a part of TargetInfo would be a better solution. Although, I don't think any currently available upstream targets support this.

In https://reviews.llvm.org/D30810#700476, @Anastasia wrote:

> In https://reviews.llvm.org/D30810#699827, @jaykang10 wrote:
>
> > In https://reviews.llvm.org/D30810#699695, @bruno wrote:
> >
> > > Hi JinGu,
> > >
> > > I just read the discussion on cfe-dev, some comments:
> > >
> > > > My colleague and I are implementing a transformation pass between LLVM IR and another IR and we want to keep the 3-component vector types in our target IR
> > >
> > > Why can't you go back through the shuffle to find out that it comes from a vec3 -> vec4 transformation? Adding a flag here just for that seems very odd.
> >
> >
> > Hi Bruno,
> >
> > Thanks for your comment. We have a pass to undo the vec4 to vec3. I wondered why clang generates the vec4 for vec3 load/store. As you can see the comment on clang's code, they are generated for better performance. I had a questions at this point. clang should consider vector load/store aligned by 4 regardless of target??? llvm's codegen could handle vec3 according to targets' vector load/store with their alignment. I agree the flag looks odd but I was concerned some llvm's targets depend on the vec4 so I suggested to add the flag. If I missed something, please let me know.
>
>
> I think doing this transformation in a frontend was not the best choice because we are losing the source information too early. But some targets can offer a better support for vec3 natively than converting to vec4. Regarding the option, I am wondering whether adding this as a part of TargetInfo would be a better solution. Although, I don't think any currently available upstream targets support this.

I have a fundamental question for vec3 load/store again. I have checked how llvm's codegen handles the vec3 and vec4 load/store. Assuming that target's action is widen for vector type unsupported. If they have same alignment, the llvm's codegen generates one vector load for both of them. But it handles store differently  between vec3 and vec4  on DAGTypeLegalizer because vec4 store writes place over vec3 type. For example,

source code

  typedef float float3 __attribute__((ext_vector_type(3)));

  void foo(float3 *a, float3 *b) {
    *a = *b; 
  }

LLVM IR for vec4 (it is original clang's output)

  define void @foo(<3 x float>* nocapture %a, <3 x float>* nocapture readonly %b) {
  entry:
    %castToVec4 = bitcast <3 x float>* %b to <4 x float>*
    %loadVec4 = load <4 x float>, <4 x float>* %castToVec4, align 16
    %storetmp = bitcast <3 x float>* %a to <4 x float>*
    store <4 x float> %loadVec4, <4 x float>* %storetmp, align 16
    ret void
  }

LLVM IR for vec3

  define void @foo(<3 x float>* nocapture %a, <3 x float>* nocapture readonly %b) {
  entry:
    %0 = load <3 x float>, <3 x float>* %b, align 16
    store <3 x float> %0, <3 x float>* %a, align 16
    ret void
  }

As you can see above IR for vec4, "store <4 x float> %loadVec4, <4 x float>* %storetmp, align 16" writes place beyond float3. Is the behavior correct??? I can not find which spec or document mentions it... The vec3 store are lowered to more instructions such as stores and extractelements but the behavior is correct. If the vec3 --> vec4 store is not correct, we could add vec3 as default in TargetInfo. Additionally,  if the vec3 --> vec4 store is correct, we could modify llvm codegen's widen vector store for vec3 to generate one vector store like vec4.  If I missed something, please let me know.

https://reviews.llvm.org/D30810