[PATCH] [AArch64] Fix a bug generating incorrect instruction when building small vector.

Kevin Qin kevinqindev at gmail.com
Mon Jul 21 20:18:12 PDT 2014


Hi Tim,

Thanks for your review. I also hate this implementation, which is so complex and may introduce some potential bug just like what this patch is going to fix. I hope this patch could bring peace in this area some time, otherwise we should consider to refactor the whole function. Some inline comments below.


Regards,
Kevin

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4173
@@ +4172,3 @@
+      // original v4i8 at last.
+      ResMultiplier = SmallestEltTy.getSizeInBits() / SrcEltTy.getSizeInBits();
+      SmallestEltTy = SrcEltTy;
----------------
Tim Northover wrote:
> I don't think this is correct. Imagine an output vector of v4i32, produced from SourceVecs with type v8i16 and v16i8. This situation should give ResMultiplier = 4, but it would end up as 2.
Yes, you are right. ResMultiplier should multiply itself with other part.

================
Comment at: lib/Target/AArch64/AArch64ISelLowering.cpp:4269-4270
@@ -4248,2 +4268,4 @@
     }
+    for (int j = 0; j != ResMultiplier - 1; ++j)
+      Mask.push_back(-1);
   }
----------------
Tim Northover wrote:
> I don't think this is generic enough. Again in the situation with two mismatched sources (so take a v4i32 VT being produced from a v4i16 and a v8i8 for example).
> 
> Anything coming from the v4i16 needs to set *two* elements of Mask properly (and then 2 undefs).
> 
> I'm not sure, but a more thorough restructuring of this loop might make the result clearer too. Don't be afraid to move things about if you see anything.
For your example, v4i32 will bitcast to v16i8, which only 0, 8, 16, 24 are valid lane number, and all other lanes should be undef. The valid lane number is decided by ResMultiplier only, and independent to sources. So I think this code can cover all scenarios including two sources are mismatched(The mismatching of  two sources will reflect on their OffsetMultipliers, and finally change the Mask number). But I can make a change for code style.

http://reviews.llvm.org/D4385






More information about the llvm-commits mailing list