[llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.
Roman Lebedev via llvm-dev
llvm-dev at lists.llvm.org
Tue Apr 20 00:27:14 PDT 2021
On Tue, Apr 20, 2021 at 5:30 AM Luo, Yuanke <yuanke.luo at intel.com> wrote:
>
> I collected the feedback/requirement from Intel customer as below.
>
> Our software runs in an embedded environment and is processing buffers which are unaligned. Sometimes this misalignment is simply because the buffer allocation is beyond the immediate control of our software but it can also be because we are processing blocks of data which are not multiples of the vector size (e.g., 6, 12 or 24). We can’t just fix our buffers to make them aligned. Our code is complicated and we support multiple instruction sets operating using the same algorithms by using templated code. For example:
>
> template<typename DVEC_TYPE>
> void doSomething(DVEC_TYPE* data)
> {
>     // Trivial example – reality would be something much more substantial, possibly with loops or other function calls.
>     *data += 1.0f;
> }
>
> Note that we use dvec to help us abstract the ISA, but other similar header-only vector overloading libraries also exist.
>
> We would then instantiate our function above multiple times for each ISA or data type we care about:
>
> template void doSomething<float>(float* data);       // Scalar type useful for debugging algorithm and doing basic testing
> template void doSomething<F32vec8>(F32vec8* data);   // Different AVX widths
> template void doSomething<F32vec16>(F32vec16* data);
> template void doSomething<I32vec16>(I32vec16* data); // Different element type
>
> The functions are sufficiently large that we don’t want to have to write a different version for each ISA. We know that the incoming data may be misaligned and that accessing it directly is UB, so we could modify our code to explicitly handle misalignment. Something like:
>
> template<typename DVEC_TYPE>
> void doSomething(DVEC_TYPE* data)
> {
>     DVEC_TYPE t;
>     loadu(t, data);
>     t += 1.0f;
>     storeu(data, t);
> }
>
> The code has become more verbose and less readable (and less maintainable, debuggable, etc.). It also no longer works with plain scalar types, which don’t have loadu/storeu defined, unless we start defining overloaded helper functions. Also, if `data` pointed at an array, we’d have to throw some pointer arithmetic into the mix rather than just using plain `data[IDX]` syntax. We can certainly write code that copes with the misalignment explicitly, but it just ends up becoming messy.
How about:
https://godbolt.org/z/vsj9raaqM
> Or, we could leverage the hardware to manage this misalignment for us by letting the compiler emit the movups instruction instead of movaps.
I guess people are intentionally ignoring all mentions that the code
will *still* be miscompiled in other ways.
That's sad.
> Until now we have only been using the Intel Compiler, so we have written our code to use ICC’s unaligned operations and hardware support to make our code cleaner. We are looking at porting our code to LLVM, but LLVM is making this harder than it needs to be.
>
> Thanks
>
> Yuanke
Roman
> From: paul.robinson at sony.com <paul.robinson at sony.com>
> Sent: Tuesday, April 20, 2021 4:42 AM
> To: jyknight at google.com
> Cc: Luo, Yuanke <yuanke.luo at intel.com>; lebedev.ri at gmail.com; Liu, Chen3 <chen3.liu at intel.com>; llvm-dev at lists.llvm.org; Maslov, Sergey V <sergey.v.maslov at intel.com>; Towner, Daniel <daniel.towner at intel.com>
> Subject: RE: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.
>
> We might still not be fully understanding one another, because this:
>
> so that you can compile code with under-aligned objects, and have it work as the author expected it to
>
> sounds like you’re expecting us to recompile the client code that creates the under-aligned objects. That is literally not possible. If you do understand that part, great; it’s just not obvious to me from how you’re phrasing things.
>
> I (still) don’t know what Intel is facing. For Sony’s problem, we would be much more likely to try to do something specific to the APIs that are being abused, rather than something draconian like eliminating alignment requirements for everyone. But of course we have a solution that works for us, so there’s that much more inertia to overcome.
>
> --paulr
>
> From: James Y Knight <jyknight at google.com>
> Sent: Monday, April 19, 2021 2:30 PM
> To: Robinson, Paul <paul.robinson at sony.com>
> Cc: Luo, Yuanke <yuanke.luo at intel.com>; Roman Lebedev <lebedev.ri at gmail.com>; Liu, Chen3 <chen3.liu at intel.com>; llvm-dev <llvm-dev at lists.llvm.org>; Maslov, Sergey V <sergey.v.maslov at intel.com>; daniel.towner at intel.com
> Subject: Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.
>
> I understand your goal is to find and fix bugs in software that is
> still under development and CAN be fixed. I fully endorse that
> goal. However, that is not the situation that Sony has, and likely
> not what Intel has. Your proposal will NOT solve our problem.
>
> No, that's not it at all! I'm afraid you've totally misunderstood my concern.
>
> My goal is that if we add a compiler feature to address this problem -- so that you can compile code with under-aligned objects, and have it work as the author expected it to -- that the feature reliably addresses the problem, and makes such code no longer exhibit Undefined Behavior. The proposed backend change does not accomplish that, but we can implement a feature which will.
>
> As Reid said, -fmax-type-align=N appears to be almost that feature, and something like this little patch (along with a documentation update) may be all that's needed (but this is totally untested).
>
> diff --git clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.cpp
> index b23d995683bf..3aef166a690e 100644
> --- clang/lib/CodeGen/CodeGenModule.cpp
> +++ clang/lib/CodeGen/CodeGenModule.cpp
> @@ -6280,8 +6280,7 @@ CharUnits CodeGenModule::getNaturalTypeAlignment(QualType T,
> // Cap to the global maximum type alignment unless the alignment
> // was somehow explicit on the type.
> if (unsigned MaxAlign = getLangOpts().MaxTypeAlign) {
> - if (Alignment.getQuantity() > MaxAlign &&
> - !getContext().isAlignmentRequired(T))
> + if (Alignment.getQuantity() > MaxAlign)
> Alignment = CharUnits::fromQuantity(MaxAlign);
> }
> return Alignment;
>
>