[all-commits] [llvm/llvm-project] f0dd12: [x86] use zero-extending load of a byte outside of...

Sanjay Patel via All-commits all-commits at lists.llvm.org
Tue Jul 19 18:36:07 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: f0dd12ec5c0169ba5b4363b62d59511181cf954a
      https://github.com/llvm/llvm-project/commit/f0dd12ec5c0169ba5b4363b62d59511181cf954a
  Author: Sanjay Patel <spatel at rotateright.com>
  Date:   2022-07-19 (Tue, 19 Jul 2022)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupBWInsts.cpp
    M llvm/test/CodeGen/X86/2006-01-19-ISelFoldingBug.ll
    M llvm/test/CodeGen/X86/2006-05-08-InstrSched.ll
    M llvm/test/CodeGen/X86/2006-11-17-IllegalMove.ll
    M llvm/test/CodeGen/X86/2007-08-09-IllegalX86-64Asm.ll
    M llvm/test/CodeGen/X86/2008-04-17-CoalescerBug.ll
    M llvm/test/CodeGen/X86/2008-04-24-MemCpyBug.ll
    M llvm/test/CodeGen/X86/2008-09-11-CoalescerBug2.ll
    M llvm/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
    M llvm/test/CodeGen/X86/8bit_cmov_of_trunc_promotion.ll
    M llvm/test/CodeGen/X86/GlobalISel/callingconv.ll
    M llvm/test/CodeGen/X86/GlobalISel/memop-scalar-x32.ll
    M llvm/test/CodeGen/X86/GlobalISel/memop-scalar.ll
    M llvm/test/CodeGen/X86/PR40322.ll
    M llvm/test/CodeGen/X86/abs.ll
    M llvm/test/CodeGen/X86/add-sub-bool.ll
    M llvm/test/CodeGen/X86/and-load-fold.ll
    M llvm/test/CodeGen/X86/and-sink.ll
    M llvm/test/CodeGen/X86/and-with-overflow.ll
    M llvm/test/CodeGen/X86/arg-copy-elide.ll
    M llvm/test/CodeGen/X86/atom-cmpb.ll
    M llvm/test/CodeGen/X86/atomic-idempotent.ll
    M llvm/test/CodeGen/X86/atomic-mi.ll
    M llvm/test/CodeGen/X86/atomic-monotonic.ll
    M llvm/test/CodeGen/X86/atomic-unordered.ll
    M llvm/test/CodeGen/X86/avoid-sfb-overlaps.ll
    M llvm/test/CodeGen/X86/avoid-sfb.ll
    M llvm/test/CodeGen/X86/avx512-calling-conv.ll
    M llvm/test/CodeGen/X86/avx512-ext.ll
    M llvm/test/CodeGen/X86/avx512-extract-subvector-load-store.ll
    M llvm/test/CodeGen/X86/avx512-insert-extract.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-canonical.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics.ll
    M llvm/test/CodeGen/X86/avx512-load-store.ll
    M llvm/test/CodeGen/X86/avx512-load-trunc-store-i1.ll
    M llvm/test/CodeGen/X86/avx512-mask-op.ll
    M llvm/test/CodeGen/X86/avx512-select.ll
    M llvm/test/CodeGen/X86/avx512bf16-vl-intrinsics.ll
    M llvm/test/CodeGen/X86/avx512bw-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512bwvl-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/avx512ifma-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512ifmavl-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512vbmi2-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512vbmi2vl-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/bitcast-vector-bool.ll
    M llvm/test/CodeGen/X86/bitreverse.ll
    M llvm/test/CodeGen/X86/bmi.ll
    M llvm/test/CodeGen/X86/bool-math.ll
    M llvm/test/CodeGen/X86/bool-vector.ll
    M llvm/test/CodeGen/X86/brcond.ll
    M llvm/test/CodeGen/X86/bt.ll
    M llvm/test/CodeGen/X86/btc_bts_btr.ll
    M llvm/test/CodeGen/X86/byval5.ll
    M llvm/test/CodeGen/X86/callbr-asm-instr-scheduling.ll
    M llvm/test/CodeGen/X86/clear-highbits.ll
    M llvm/test/CodeGen/X86/clear-lowbits.ll
    M llvm/test/CodeGen/X86/clz.ll
    M llvm/test/CodeGen/X86/cmov.ll
    M llvm/test/CodeGen/X86/cmovcmov.ll
    M llvm/test/CodeGen/X86/combine-andintoload.ll
    M llvm/test/CodeGen/X86/combine-bswap.ll
    M llvm/test/CodeGen/X86/const-shift-of-constmasked.ll
    M llvm/test/CodeGen/X86/copy-eflags.ll
    M llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
    M llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
    M llvm/test/CodeGen/X86/divide-by-constant.ll
    M llvm/test/CodeGen/X86/divrem8_ext.ll
    M llvm/test/CodeGen/X86/emutls.ll
    M llvm/test/CodeGen/X86/extract-bits.ll
    M llvm/test/CodeGen/X86/extract-insert.ll
    M llvm/test/CodeGen/X86/extract-lowbits.ll
    M llvm/test/CodeGen/X86/extractelement-index.ll
    M llvm/test/CodeGen/X86/fast-isel-call-bool.ll
    M llvm/test/CodeGen/X86/fast-isel-i1.ll
    M llvm/test/CodeGen/X86/fast-isel-sext-zext.ll
    M llvm/test/CodeGen/X86/fixup-bw-copy.ll
    M llvm/test/CodeGen/X86/fixup-bw-inst.ll
    M llvm/test/CodeGen/X86/fold-and-shift-x86_64.ll
    M llvm/test/CodeGen/X86/fold-and-shift.ll
    M llvm/test/CodeGen/X86/fp-intrinsics.ll
    M llvm/test/CodeGen/X86/fp-strict-scalar-fptoint.ll
    M llvm/test/CodeGen/X86/fp-strict-scalar-inttofp-fp16.ll
    M llvm/test/CodeGen/X86/fp-strict-scalar-inttofp.ll
    M llvm/test/CodeGen/X86/fp80-strict-scalar.ll
    M llvm/test/CodeGen/X86/fptosi-sat-scalar.ll
    M llvm/test/CodeGen/X86/fptoui-sat-scalar.ll
    M llvm/test/CodeGen/X86/fshl.ll
    M llvm/test/CodeGen/X86/fshr.ll
    M llvm/test/CodeGen/X86/funnel-shift-rot.ll
    M llvm/test/CodeGen/X86/funnel-shift.ll
    M llvm/test/CodeGen/X86/gpr-to-mask.ll
    M llvm/test/CodeGen/X86/h-register-addressing-32.ll
    M llvm/test/CodeGen/X86/h-register-addressing-64.ll
    M llvm/test/CodeGen/X86/hoist-and-by-const-from-lshr-in-eqcmp-zero.ll
    M llvm/test/CodeGen/X86/hoist-and-by-const-from-shl-in-eqcmp-zero.ll
    M llvm/test/CodeGen/X86/iabs.ll
    M llvm/test/CodeGen/X86/inc-of-add.ll
    M llvm/test/CodeGen/X86/insertelement-var-index.ll
    M llvm/test/CodeGen/X86/isel-sink2.ll
    M llvm/test/CodeGen/X86/legalize-shift-64.ll
    M llvm/test/CodeGen/X86/lifetime-alias.ll
    M llvm/test/CodeGen/X86/load-local-v3i1.ll
    M llvm/test/CodeGen/X86/load-local-v4i5.ll
    M llvm/test/CodeGen/X86/load-scalar-as-vector.ll
    M llvm/test/CodeGen/X86/masked_gather_scatter.ll
    M llvm/test/CodeGen/X86/memcmp-more-load-pairs-x32.ll
    M llvm/test/CodeGen/X86/memcmp-more-load-pairs.ll
    M llvm/test/CodeGen/X86/memcmp-x32.ll
    M llvm/test/CodeGen/X86/memcmp.ll
    M llvm/test/CodeGen/X86/memcpy.ll
    M llvm/test/CodeGen/X86/merge-consecutive-loads-128.ll
    M llvm/test/CodeGen/X86/merge-store-partially-alias-loads.ll
    M llvm/test/CodeGen/X86/midpoint-int.ll
    M llvm/test/CodeGen/X86/misched_phys_reg_assign_order.ll
    M llvm/test/CodeGen/X86/movmsk-cmp.ll
    M llvm/test/CodeGen/X86/musttail-varargs.ll
    M llvm/test/CodeGen/X86/neg-abs.ll
    M llvm/test/CodeGen/X86/negate-i1.ll
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/or-with-overflow.ll
    M llvm/test/CodeGen/X86/packed_struct.ll
    M llvm/test/CodeGen/X86/peephole-na-phys-copy-folding.ll
    M llvm/test/CodeGen/X86/popcnt.ll
    M llvm/test/CodeGen/X86/pr12360.ll
    M llvm/test/CodeGen/X86/pr15267.ll
    M llvm/test/CodeGen/X86/pr20011.ll
    M llvm/test/CodeGen/X86/pr22473.ll
    M llvm/test/CodeGen/X86/pr28824.ll
    M llvm/test/CodeGen/X86/pr32345.ll
    M llvm/test/CodeGen/X86/pr34292.ll
    M llvm/test/CodeGen/X86/pr34381.ll
    M llvm/test/CodeGen/X86/pr35765.ll
    M llvm/test/CodeGen/X86/pr38539.ll
    M llvm/test/CodeGen/X86/pr38743.ll
    M llvm/test/CodeGen/X86/pr38795.ll
    M llvm/test/CodeGen/X86/pr39926.ll
    M llvm/test/CodeGen/X86/pr46527.ll
    M llvm/test/CodeGen/X86/pr5145.ll
    M llvm/test/CodeGen/X86/reduce-trunc-shl.ll
    M llvm/test/CodeGen/X86/rot16.ll
    M llvm/test/CodeGen/X86/rot32.ll
    M llvm/test/CodeGen/X86/rotate.ll
    M llvm/test/CodeGen/X86/rotate4.ll
    M llvm/test/CodeGen/X86/sadd_sat.ll
    M llvm/test/CodeGen/X86/sadd_sat_plus.ll
    M llvm/test/CodeGen/X86/sadd_sat_vec.ll
    M llvm/test/CodeGen/X86/sdiv_fix.ll
    M llvm/test/CodeGen/X86/sdiv_fix_sat.ll
    M llvm/test/CodeGen/X86/select.ll
    M llvm/test/CodeGen/X86/setcc-combine.ll
    M llvm/test/CodeGen/X86/setcc.ll
    M llvm/test/CodeGen/X86/sext-trunc.ll
    M llvm/test/CodeGen/X86/shift-amount-mod.ll
    M llvm/test/CodeGen/X86/shift-and.ll
    M llvm/test/CodeGen/X86/shift-bmi2.ll
    M llvm/test/CodeGen/X86/shift-by-signext.ll
    M llvm/test/CodeGen/X86/shift-coalesce.ll
    M llvm/test/CodeGen/X86/shift-combine.ll
    M llvm/test/CodeGen/X86/shift-double.ll
    M llvm/test/CodeGen/X86/shift-i128.ll
    M llvm/test/CodeGen/X86/shift-mask.ll
    M llvm/test/CodeGen/X86/smul_fix.ll
    M llvm/test/CodeGen/X86/smul_fix_sat.ll
    M llvm/test/CodeGen/X86/srem-seteq-illegal-types.ll
    M llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll
    M llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/sshl_sat.ll
    M llvm/test/CodeGen/X86/sshl_sat_vec.ll
    M llvm/test/CodeGen/X86/ssub_sat.ll
    M llvm/test/CodeGen/X86/ssub_sat_plus.ll
    M llvm/test/CodeGen/X86/ssub_sat_vec.ll
    M llvm/test/CodeGen/X86/store-narrow.ll
    M llvm/test/CodeGen/X86/sttni.ll
    M llvm/test/CodeGen/X86/sub-of-not.ll
    M llvm/test/CodeGen/X86/swifterror.ll
    M llvm/test/CodeGen/X86/tail-opts.ll
    M llvm/test/CodeGen/X86/tls.ll
    M llvm/test/CodeGen/X86/trunc-to-bool.ll
    M llvm/test/CodeGen/X86/uadd_sat.ll
    M llvm/test/CodeGen/X86/uadd_sat_plus.ll
    M llvm/test/CodeGen/X86/uadd_sat_vec.ll
    M llvm/test/CodeGen/X86/udiv_fix.ll
    M llvm/test/CodeGen/X86/udiv_fix_sat.ll
    M llvm/test/CodeGen/X86/umul_fix.ll
    M llvm/test/CodeGen/X86/umul_fix_sat.ll
    M llvm/test/CodeGen/X86/umulo-128-legalisation-lowering.ll
    M llvm/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
    M llvm/test/CodeGen/X86/urem-power-of-two.ll
    M llvm/test/CodeGen/X86/urem-seteq-illegal-types.ll
    M llvm/test/CodeGen/X86/ushl_sat.ll
    M llvm/test/CodeGen/X86/ushl_sat_vec.ll
    M llvm/test/CodeGen/X86/usub_sat.ll
    M llvm/test/CodeGen/X86/usub_sat_plus.ll
    M llvm/test/CodeGen/X86/usub_sat_vec.ll
    M llvm/test/CodeGen/X86/vec_setcc.ll
    M llvm/test/CodeGen/X86/vector-sext.ll
    M llvm/test/CodeGen/X86/volatile-memstores-nooverlapping-load-stores.ll
    M llvm/test/CodeGen/X86/xchg-nofold.ll
    M llvm/test/CodeGen/X86/xmulo.ll
    M llvm/test/CodeGen/X86/xor-icmp.ll
    M llvm/test/CodeGen/X86/xor-lea.ll
    M llvm/test/CodeGen/X86/xor-with-overflow.ll
    M llvm/test/CodeGen/X86/xor.ll
    M llvm/test/CodeGen/X86/zext-logicop-shift-load.ll
    M llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/asm-show-inst.ll.expected
    M llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/basic.ll.expected

  Log Message:
  -----------
  [x86] use zero-extending load of a byte outside of loops too (2nd try)

The first attempt missed changing test files for tools
(update_llc_test_checks.py).

Original commit message:

This implements the main suggested change from issue #56498.
Using the shorter (non-extending) instruction with only
-Oz ("minsize") rather than -Os ("optsize") is left as a
possible follow-up.

As noted in the bug report, the zero-extending load may have
shorter latency/better throughput across a wide range of x86
micro-arches, and it avoids a potential false dependency.
The cost is an extra instruction byte.
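
To make the trade-off above concrete, here is a minimal sketch that models the two
loads at the register level. It is an illustration only, not code from LLVM: the
names `load_byte_merge`, `load_byte_zext`, `MOVB_ENCODING` and `MOVZBL_ENCODING` are
hypothetical, and the choice of `%rdi` as the address register is an assumption made
for the example. The encodings are the standard x86-64 ones for those two
instructions. Whether a particular core actually stalls on the merge depends on the
micro-arch, which is why the message says "potential" false dependency.

```python
# Illustrative model only; all names here are hypothetical, not LLVM code.

MASK64 = (1 << 64) - 1

# x86-64 encodings for loading one byte from the address in %rdi (assumed register).
MOVB_ENCODING = bytes([0x8A, 0x07])          # movb   (%rdi), %al
MOVZBL_ENCODING = bytes([0x0F, 0xB6, 0x07])  # movzbl (%rdi), %eax


def load_byte_merge(old_rax: int, value: int) -> int:
    """Model movb: only the low 8 bits are written, the upper 56 bits of
    %rax are kept, so the result depends on the previous register contents."""
    return ((old_rax & ~0xFF) | (value & 0xFF)) & MASK64


def load_byte_zext(old_rax: int, value: int) -> int:
    """Model movzbl: the zero-extended byte replaces the whole register;
    old_rax is ignored, so there is no dependency on the old value."""
    return value & 0xFF


if __name__ == "__main__":
    old = 0x1122334455667788
    print(hex(load_byte_merge(old, 0xAB)))             # old upper bits survive
    print(hex(load_byte_zext(old, 0xAB)))              # old value is gone
    print(len(MOVZBL_ENCODING) - len(MOVB_ENCODING))   # size cost in bytes
```

The merge in `load_byte_merge` is what can create the false dependency: the load
result cannot be formed without the old register value, even when the program never
reads the upper bits. The zero-extending form pays one extra opcode byte to avoid it.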

This could cause performance gains and losses from secondary
effects, but I don't think it is possible to account for those in
advance, and they will likely also depend on the exact micro-arch.
This does bring LLVM x86 codegen more in line with existing
gcc codegen, so if problems are exposed they are more likely
to occur for both compilers.

Differential Revision: https://reviews.llvm.org/D129775



