[PATCH] D41096: [X86] Initial support for prefer-vector-width function attribute
    Hal Finkel via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Wed Dec 13 10:34:02 PST 2017
    
    
  
hfinkel added a comment.
In https://reviews.llvm.org/D41096#953284, @craig.topper wrote:
> The dependency isn't really an llvm dependency. You're absolutely right that if it were just that, it would be fixable by changing our instruction patterns. The problem here is about how the features and dependencies are defined in our instruction manuals. avx512f means "avx512 foundation", but only includes 512-bit instructions. The later features imply the foundation is present.
>
> A user may expect that if they use "-march=skylake-avx512" that they can use 512-bit intrinsics in x86intrin.h. But at the same time we should generate good performing code for other users who specify -march=skylake-avx512 and didn't write any vector code in their source. Based on the data we've seen so far, the best way to do that is to disable 512-bit instructions.
Understood. However, we can separate this within the implementation. Specifically, Clang does not need to tag all generated functions with the same target attribute. On the LLVM side, we can separate these, and then use that finer-grained control.
> So what I'm looking for is a solution that optimizes for the likely case that the user code doesn't contain 512-bit vector intrinsics and tells the legalizer to use 256-bit registers only. But if the user code does contain explicit 512-bit intrinsics, we still allow that code to compile and ideally generate what the user expected. I don't want existing 512-bit intrinsic code to suddenly stop compiling with -march=skylake-avx512.
I think that we're on the same page. In addition to intrinsics, I'm also worried about OpenMP SIMD loops/functions (or other explicit vectorization). We may want those to also give 512-bit vectors by default (presumably, such specifically-tagged loops are likely to run long enough to amortize the clock-rate effects).
In https://reviews.llvm.org/D41096#954037, @craig.topper wrote:
> getHostCPUName/getHostCPUFeatures is called by the clang driver long before any code is parsed. How would it know if the code contained any 512-bit instructions?
I recommend that we do this in Clang during CodeGen. We should add a special callback that will allow TargetInfo to examine the AST and adjust the target (or target features) on a per-function basis. Any function using AVX-512 intrinsics on 512-bit vectors, explicit vector types, or OpenMP SIMD (unless a small simdlen clause is present) will stay as-is, and everything else will be modified to turn off 512-bit vectors.
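To make the per-function rule concrete, here is a rough sketch of the decision only. It is not Clang code: FunctionSummary, hasSmallSimdlen, keeps512BitVectors and preferredVectorWidth are hypothetical names invented for illustration, and treating "small simdlen" as "implies fewer than 512 bits" is my assumption, since the threshold is not pinned down above. The 256-bit fallback is the width mentioned in the quoted comment.

```cpp
// Hypothetical summary of what a per-function AST walk might record.
// None of these names exist in Clang; they are for illustration only.
struct FunctionSummary {
  bool uses512BitIntrinsics;    // AVX-512 intrinsics operating on 512-bit vectors
  bool usesExplicitVectorTypes; // explicit vector types in the source
  bool hasOpenMPSimd;           // OpenMP SIMD loop or function
  unsigned simdlenBits;         // width implied by simdlen; 0 if no clause
};

// Assumption: a simdlen counts as "small" when it implies fewer than 512 bits.
constexpr bool hasSmallSimdlen(const FunctionSummary &F) {
  return F.simdlenBits != 0 && F.simdlenBits < 512;
}

// Functions with explicit vector code stay as-is (512-bit vectors allowed).
constexpr bool keeps512BitVectors(const FunctionSummary &F) {
  return F.uses512BitIntrinsics || F.usesExplicitVectorTypes ||
         (F.hasOpenMPSimd && !hasSmallSimdlen(F));
}

// Everything else is limited to 256-bit vectors.
constexpr unsigned preferredVectorWidth(const FunctionSummary &F) {
  return keeps512BitVectors(F) ? 512u : 256u;
}

// Compile-time checks of the rule.
static_assert(preferredVectorWidth(FunctionSummary{false, false, false, 0}) == 256u,
              "code with no explicit vectorization is limited to 256 bits");
static_assert(preferredVectorWidth(FunctionSummary{true, false, false, 0}) == 512u,
              "512-bit intrinsics keep 512-bit vectors");
```

The point of keeping this as a pure function of a per-function summary is that the driver never has to know what the source contains; the decision is made once the function body has been seen.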
The real question is whether to block inlining on mismatch here. I don't think that we should (and we'll need to enable 512-bit vectors in the caller). The problem is that people write C++ wrappers around vector intrinsics, and we need the compiler to remove the abstraction layer. Generating that code poorly will be a significant problem. This will have the unfortunate "action at a distance" effects we discussed earlier (because having some 512-bit vectors in some function, even after inlining, will suddenly enable it elsewhere in the function), but I don't see any good way to prevent that in undesirable cases without causing significant problems elsewhere.
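The inlining behavior I have in mind can be sketched as follows. Again, this is an illustration under assumptions, not what the patch implements: FnAttrs, canInline and mergeOnInline are hypothetical names, and the single preferVectorWidth field stands in for whatever per-function attribute ends up carrying the width.

```cpp
// Hypothetical stand-in for the per-function vector-width attribute.
struct FnAttrs {
  unsigned preferVectorWidth; // e.g. 256 or 512
};

// Under this proposal a width mismatch never blocks inlining.
constexpr bool canInline(FnAttrs /*Caller*/, FnAttrs /*Callee*/) {
  return true;
}

// After inlining, the caller takes the wider of the two widths, so a
// 512-bit callee enables 512-bit vectors throughout the caller.
constexpr FnAttrs mergeOnInline(FnAttrs Caller, FnAttrs Callee) {
  return FnAttrs{Caller.preferVectorWidth > Callee.preferVectorWidth
                     ? Caller.preferVectorWidth
                     : Callee.preferVectorWidth};
}

// A 256-bit caller that inlines a 512-bit intrinsic wrapper becomes 512-bit.
static_assert(mergeOnInline(FnAttrs{256}, FnAttrs{512}).preferVectorWidth == 512u,
              "caller is widened to match the inlined callee");
```

The widening in mergeOnInline is exactly the "action at a distance" effect: one inlined 512-bit wrapper changes the width for the rest of the caller.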
https://reviews.llvm.org/D41096
    
    