[PATCH] D11518: [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level
llvm-dev at redking.me.uk
Sun Jul 26 14:43:26 PDT 2015
RKSimon created this revision.
RKSimon added reviewers: chandlerc, qcolombet, andreadb.
RKSimon added a subscriber: llvm-commits.
RKSimon set the repository for this revision to rL LLVM.
The XformToShuffleWithZero method currently checks AND masks at the per-lane level for all-one and all-zero constants and attempts to convert them to legal shuffle clear masks.
This patch generalises XformToShuffleWithZero, splitting and checking the sub-lanes of the constants down to the byte level to see if any legal shuffle clear masks are possible. This allows a lot of masks (often from legalization or truncation) to be folded into existing shuffle patterns and removes a lot of constant mask loading.
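As a rough illustration of the sub-lane splitting described above, here is a minimal standalone sketch. It is not the code from the patch: buildClearMask, its parameters, the index convention (index i keeps sub-lane i of the source, index NumSub + i takes sub-lane i of an all-zero vector) and the little-endian sub-lane order are all assumptions made for the example, and the legality check of the resulting shuffle is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the LLVM implementation.
// Lanes holds the constant AND mask, one value per lane, each LaneBits wide.
// Tries the full lane width first, then halves the element width down to
// 8 bits. Succeeds at the first width where every sub-lane is all-ones or
// all-zeros. A real combine would also check that the shuffle is legal.
bool buildClearMask(const std::vector<uint64_t> &Lanes, unsigned LaneBits,
                    unsigned &SubBitsOut, std::vector<int> &MaskOut) {
  // Assumed precondition: lane width is a power of two between 8 and 64.
  assert(LaneBits >= 8 && LaneBits <= 64 &&
         (LaneBits & (LaneBits - 1)) == 0);

  for (unsigned SubBits = LaneBits; SubBits >= 8; SubBits /= 2) {
    const unsigned Split = LaneBits / SubBits;
    const uint64_t AllOnes =
        SubBits == 64 ? ~0ULL : ((1ULL << SubBits) - 1);
    const int NumSub = static_cast<int>(Lanes.size() * Split);

    std::vector<int> Mask;
    bool Ok = true;
    for (std::size_t L = 0; L < Lanes.size() && Ok; ++L) {
      for (unsigned S = 0; S < Split; ++S) {
        // Sub-lane 0 is assumed to be the low bits of the lane.
        const uint64_t Bits = (Lanes[L] >> (S * SubBits)) & AllOnes;
        const int Idx = static_cast<int>(L * Split + S);
        if (Bits == AllOnes) {
          Mask.push_back(Idx);           // keep the source sub-lane
        } else if (Bits == 0) {
          Mask.push_back(NumSub + Idx);  // take from the zero vector
        } else {
          Ok = false;                    // mixed bits: try a finer split
          break;
        }
      }
    }
    if (Ok) {
      SubBitsOut = SubBits;
      MaskOut = Mask;
      return true;
    }
  }
  return false;  // no clear mask exists even at byte granularity
}
```

For instance, a two-lane 32-bit mask of 0x0000FFFF, 0xFFFFFFFF fails the per-lane check but succeeds once the lanes are split into 16-bit halves, which is the kind of case the per-lane check alone would miss.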
The patch involves a number of additional minor tweaks to improve codegen. I can commit these separately or generate patches for extra review if any of you wish:
- XformToShuffleWithZero is only attempted if constant folding has failed.
- A lot more X86 byte vector blends are now generated. I've added a stage to the VPBLENDVB lowering in lowerVectorShuffleAsBlend to attempt to lower back to an AND mask using lowerVectorShuffleAsBitMask if possible. VPAND is a lot faster than VPBLENDVB.
- X86 v8i16 shuffle lowering now attempts to use lowerVectorShuffleAsBitMask before resorting to VPSHUFB (matches v16i8 shuffle lowering). VPAND is a lot faster than VPSHUFB.
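To make the blend-to-AND idea in the last two items concrete, here is a small standalone sketch of the reverse direction: recognising a byte shuffle that only keeps or zeroes elements and turning it into an AND mask. It does not reproduce lowerVectorShuffleAsBitMask. The name shuffleToByteAndMask and the index convention (Indices[i] == i keeps byte i, any index >= N reads from a zero vector) are assumptions for illustration, and undef indices are simply rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the X86 lowering code.
// Returns true and fills AndMask when the shuffle never moves data between
// positions, so it can be expressed as a single AND with a constant.
bool shuffleToByteAndMask(const std::vector<int> &Indices,
                          std::vector<uint8_t> &AndMask) {
  const int N = static_cast<int>(Indices.size());
  AndMask.assign(Indices.size(), 0x00);
  for (int I = 0; I < N; ++I) {
    if (Indices[I] >= N)
      continue;        // element comes from the zero vector: mask stays 0x00
    if (Indices[I] != I)
      return false;    // element is moved (or undef): not a plain AND
    AndMask[I] = 0xFF; // element is kept in place
  }
  return true;
}
```

A shuffle such as {0, 5, 2, 7} on four bytes would map to the mask FF 00 FF 00, while {1, 0, 2, 3} would be rejected because it permutes data.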
There are still a few examples of poor shuffle lowering exposed by this patch that we can clean up in future patches (e.g. x86 legalized v8i8 zero extension uses PMOVZX+AND+AND instead of AND+PMOVZX).