[PATCH] Loop Vectorizer doesn't use %zmm registers on targets supporting AVX512.

Mon Mar 17 08:39:15 PDT 2014

I meant the vector cost model in X86TargetTransformInfo. But you are right: without access to prerelease hardware you can’t run the test suite.

That said, you could still test the vectorizer cost model by comparing the estimated cost for auto-generated micro kernels (an operation x type matrix; for example, test/Analysis/CostModel/X86/sitofp.ll) against the generated assembly code, making sure that the cost model roughly agrees with the generated code.
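
As a rough sketch of that comparison, assuming you have already collected the estimated cost and the instruction count of the generated assembly for each micro kernel: the struct, the function name and the tolerance parameter below are hypothetical, not part of LLVM, and counting instructions is only a crude proxy for the real cost.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one micro kernel; not an LLVM data structure.
struct KernelSample {
  std::string name;        // e.g. the operation and type being tested
  unsigned estimatedCost;  // cost reported by the cost model
  unsigned instrCount;     // instructions counted in the generated assembly
};

// Returns the names of the kernels whose estimate and instruction count
// differ by more than `tolerance` (a factor: 2.0 means "off by more than 2x").
std::vector<std::string>
findDisagreements(const std::vector<KernelSample> &samples, double tolerance) {
  std::vector<std::string> flagged;
  for (const KernelSample &s : samples) {
    // Treat a zero on either side as 1 to avoid dividing by zero.
    double est = s.estimatedCost ? s.estimatedCost : 1;
    double cnt = s.instrCount ? s.instrCount : 1;
    double ratio = est > cnt ? est / cnt : cnt / est;
    if (ratio > tolerance)
      flagged.push_back(s.name);
  }
  return flagged;
}
```

Any kernel this flags is a candidate for a dedicated entry in the target cost model.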

Sometimes legalization of a vectorized operation will scalarize the operation into a very expensive code sequence. The fallback cost model (basic target transform info) does not always catch that. In such cases you want the cost model (X86TargetTransformInfo) to return a high cost; otherwise the vectorized code you generate is a lot worse than the scalar variant.

If you add a new register width, it is likely you will run into such cases. As a consequence, by enabling AVX512 you may generate worse code in some cases than if you had just generated code for AVX2.
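
To illustrate why an underestimated cost matters, here is a minimal sketch of a width choice based on cost per lane. It is a simplification under stated assumptions, not the vectorizer's actual code: the struct and function names are hypothetical, and any numbers you feed it are made up for illustration.

```cpp
#include <vector>

// Hypothetical cost entry for one candidate vectorization factor (VF).
struct WidthCost {
  unsigned vf;        // lanes per vector iteration; 1 means scalar
  unsigned loopCost;  // estimated cost of one loop iteration at that width
};

// Picks the VF with the lowest estimated cost per lane. Ties keep the
// earlier entry. Assumes every vf is nonzero; returns 1 for an empty list.
unsigned pickBestVF(const std::vector<WidthCost> &candidates) {
  unsigned bestVF = 1;
  double bestPerLane = -1.0;  // negative means "nothing seen yet"
  for (const WidthCost &c : candidates) {
    double perLane = static_cast<double>(c.loopCost) / c.vf;
    if (bestPerLane < 0 || perLane < bestPerLane) {
      bestPerLane = perLane;
      bestVF = c.vf;
    }
  }
  return bestVF;
}
```

With made-up costs of 4 for scalar, 8 for VF 8 and 80 for a scalarized VF 16, this picks VF 8. If the model wrongly reports 8 for VF 16 as well, it picks VF 16 and you end up with the expensive scalarized sequence.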

Thanks,
Arnold

On Mar 16, 2014, at 3:37 AM, zinovy.nis at gmail.com wrote:

> I have no real avx512 architecture at hand.
> But I believe that using wider vectors is better than using shorter ones in this case.
> BTW, the KNL cost model in LLVM was taken from the Haswell cost model (see TODOs in x86.td), so it can't be very accurate for KNL.
> 
> -----Original Message----- From: Arnold Schwaighofer
> Sent: Friday, March 14, 2014 8:08 PM
> To: elena.demikhovsky at intel.com ; rob.khasanov at gmail.com ; avolkov.intel at gmail.com ; nrotem at apple.com ; zinovy.nis at gmail.com
> Cc: llvm-commits at cs.uiuc.edu ; Andrea_DiBiagio at sn.scee.net ; aschwaighofer at apple.com
> Subject: Re: [PATCH] Loop Vectorizer doesn't use %zmm registers on targets supporting AVX512.
> 
> 
> Curious, did you test whether the cost model works reasonably well across the test-suite with this turned on? Are there any/many major regressions with this patch applied vs without on an avx512 architecture?
> 
> 
> Thanks,
> Arnold
> 
> 
> ================
> Comment at: test/CodeGen/X86/avx512-vectorizer.ll:8
> @@ +7,3 @@
> +
> +define void @foo(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, float* noalias nocapture %c, i32 %n) #0 {
> +entry:
> ----------------
> You could remove the nocapture and readonly attributes. They are not needed for this test case.
> 
> I am also surprised that llvm accepts dangling function attributes ("#0") but it does :).
> 
> 
> http://llvm-reviews.chandlerc.com/D3078