[PATCH] D140665: [SelectionDAG][RISCV][X86][AArch64][AMDGPU][PowerPC] Improve SimplifyDemandedBits for SHL with NUW/NSW flags.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 27 10:07:37 PST 2022
craig.topper added a comment.
In D140665#4017570 <https://reviews.llvm.org/D140665#4017570>, @lebedev.ri wrote:
> In D140665#4017550 <https://reviews.llvm.org/D140665#4017550>, @spatel wrote:
>
>> It's a bug if we are both propagating flags and not accounting for them - we've definitely caught cases like that in IR.
>>
>> I agree that it seems like the better choice would be to ignore/drop flags (the alternative is that flags could penalize optimization). But it's also possible that changing it could cause missed folds because we lost information by dropping flags. We probably just need to experiment and see what falls out from it.
>
> I'm not sure what this is trying to fix originally,
> but i think the test changes clearly show that we should be going in the opposite direction.
I'll try to fill in the original problem. See the test in aext-to-zext.ll
We have initially have this DAG
Initial selection DAG: %bb.0 'read:entry'
SelectionDAG has 19 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t3: i64 = Constant<0>
t11: i16 = Constant<8>
t8: i64 = add nuw t2, Constant:i64<1>
t9: i8,ch = load<(load (s8) from %ir.arrayidx1)> t0, t8, undef:i64
t10: i16 = zero_extend t9
t13: i16 = shl nuw t10, Constant:i64<8>
t5: i8,ch = load<(load (s8) from %ir.adr)> t0, t2, undef:i64
t6: i16 = zero_extend t5
t14: i16 = or t13, t6
t15: i64 = zero_extend t14
t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15
t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
Part way through the first DAGCombine, t10 zero_extend is replaced by an any_extend because the extended bits aren't demanded. The users are a shl by 8 and a zero_extend that reads bits 63:16 of the shl.
The extends get combined with loads to leave this DAG
Optimized lowered selection DAG: %bb.0 'read:entry'
SelectionDAG has 15 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t8: i64 = add nuw t2, Constant:i64<1>
t20: i16,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
t13: i16 = shl nuw t20, Constant:i64<8>
t21: i16,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
t14: i16 = or t13, t21
t15: i64 = zero_extend t14
t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15
t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
We type legalize and optimize to this
SelectionDAG has 16 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t8: i64 = add nuw t2, Constant:i64<1>
t22: i64,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
t23: i64 = shl nuw t22, Constant:i64<8>
t24: i64,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
t25: i64 = or t23, t24
t27: i64 = and t25, Constant:i64<65535>
t17: ch,glue = CopyToReg t0, Register:i64 $x10, t27
t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
The t27 'and' would be unnecessary if we had two zextloads instead of a zextload and an aextload.
This patch happens to prevent the original zero_extend from becoming an any_extend. Which solves the problem, but I don't think it's robust.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140665/new/
https://reviews.llvm.org/D140665
More information about the llvm-commits
mailing list