[PATCH] D134982: [X86] Add support for "light" AVX

Wed Dec 28 15:36:02 PST 2022

TokarIP added inline comments.

================
Comment at: llvm/lib/Target/X86/X86.td:1290
+                                     TuningInsertVZEROUPPER,
+                                     TuningAllowLight256Bit];
   list<SubtargetFeature> ZN2AdditionalFeatures = [FeatureCLWB,
----------------
lebedev.ri wrote:
> TokarIP wrote:
> > RKSimon wrote:
> > > I'm not certain Ryzen needs this - even on znver1 with double pumping of 256-bit ops.
> > I'm not sure I understand this comment. Do you mean that since Ryzen doesn't have any frequency problems, we don't care about prefer-vector-width=128 behavior? This is mostly here a) for completeness (since 256-bit ops don't seem to hurt on Ryzen, we do prefer 256-bit loads/stores) and b) for cases where users want znver tuning but still want good performance on Intel, so they pass prefer-vector-width=128.
> I agree with @RKSimon here. I'm not really sure why anyone would want to
> use non-full vector width on Ryzens, so I don't think we support it there.
FWIW, -mtune=znver3 plus -mprefer-vector-width=128 often gives the best results for a mixed (skylake+rome) server fleet.

================
Comment at: llvm/test/CodeGen/X86/vector-width-store-merge.ll:70

-attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "frame-pointer"="none" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "prefer-vector-width"="128" "stack-protector-buffer-size"="8" "target-cpu"="skylake-avx512" "target-features"="+adx,+aes,+avx,+avx2,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl,+bmi,+bmi2,+clflushopt,+clwb,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+pku,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "frame-pointer"="none" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "prefer-vector-width"="128" "stack-protector-buffer-size"="8" "target-cpu"="sandybridge" "target-features"="+adx,+aes,+avx,+avx2,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl,+bmi,+bmi2,+clflushopt,+clwb,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+pclmul,+pku,+popcnt,+prfchw,+rdrnd,+rdseed,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #1 = { argmemonly nounwind }
----------------
pengfei wrote:
> TokarIP wrote:
> > pengfei wrote:
> > > This patch changes the behavior the test expected, though it should cause no correctness issue for 256-bit operations.
> > > We should update the test to show it rather than hide it.
> > > Note that there will be a correctness issue or a build error if it is forced to generate 512-bit instructions.
> > We want to test 2 behaviors:
> > 1) prefer-vector-width=128 without TuningAllowLight256Bit should generate 128-bit loads/stores - this test
> > 2) prefer-vector-width=128 with TuningAllowLight256Bit should generate 256-bit loads/stores - memcpy-light-avx.ll
> > 
> > Updating this test to check the 256-bit case means that we would still need an extra test for behavior #1. I'd rather keep the number of tests smaller with the same coverage.
> You can add another RUN to test the behaviors in `llvm/test/CodeGen/X86/memcpy-light-avx.ll` if you like, e.g.,
> `; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s --check-prefix=NO-256`
> 
> We don't have a method to disable it on new targets, so there aren't 2 behaviors here.
Thanks for the suggestion! Now we have 2 runs: one on a CPU without this tuning and one on a CPU with it.
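
For readers following the thread, the two behaviors under test can be sketched as a small standalone model. This is only an illustration of the rule as described in the comments above, not the code in the patch: `TuningModel` and `memOpWidthBits` are hypothetical names, and the handling of targets without AVX or with a wider preferred width is an assumption.

```cpp
// Hypothetical model of the behavior discussed in this review.
// Nothing here is LLVM API; the struct and function are illustrative only.
struct TuningModel {
  bool HasAVX;                // target supports 256-bit AVX loads/stores
  bool AllowLight256Bit;      // stands in for TuningAllowLight256Bit
  unsigned PreferVectorWidth; // stands in for "prefer-vector-width"
};

// Width in bits that a memcpy-style load/store would use under this model.
constexpr unsigned memOpWidthBits(const TuningModel &T) {
  if (!T.HasAVX)
    return 128; // assumption: no 256-bit ops without AVX
  if (T.PreferVectorWidth >= 256)
    return 256; // assumption: the user already allows 256-bit vectors
  // prefer-vector-width=128: only "light" 256-bit ops are allowed,
  // and only when the tuning flag is set.
  return T.AllowLight256Bit ? 256 : 128;
}

// Behavior 1: width 128 without the tuning keeps 128-bit loads/stores.
static_assert(memOpWidthBits({true, false, 128}) == 128, "behavior 1");
// Behavior 2: width 128 with the tuning uses 256-bit loads/stores.
static_assert(memOpWidthBits({true, true, 128}) == 256, "behavior 2");
```

The two `static_assert` lines correspond to the two RUN configurations: a CPU without the tuning and a CPU with it.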

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134982/new/

https://reviews.llvm.org/D134982