[llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

Luo, Yuanke via llvm-dev llvm-dev at lists.llvm.org
Mon Apr 19 19:30:00 PDT 2021


I collected the following feedback/requirements from an Intel customer.

Our software runs in an embedded environment and processes buffers that are unaligned. Sometimes the misalignment arises simply because buffer allocation is beyond the immediate control of our software, but it can also arise because we process blocks of data whose sizes are not multiples of the vector size (e.g., 6, 12, or 24). We can’t just fix our buffers to make them aligned. Our code is complicated, and we support multiple instruction sets with the same algorithms by using templated code. For example:
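(Editor's illustration.) The arithmetic behind the block-size point can be made concrete with a small sketch. The helper names are hypothetical. It assumes `float` elements packed back to back from a suitably aligned base pointer. With 6-float blocks (24 bytes each), only every fourth block starts on a 32-byte AVX boundary. With 24-float blocks (96 bytes), every block is 32-byte aligned, but only every other one is 64-byte aligned.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of block k when blocks of `blockFloats`
// floats are packed contiguously starting at an aligned base address.
constexpr std::size_t blockOffsetBytes(std::size_t k, std::size_t blockFloats) {
  return k * blockFloats * sizeof(float);
}

// Hypothetical helper: true if block k starts on an `alignBytes` boundary,
// assuming the base address itself is `alignBytes`-aligned.
constexpr bool blockIsAligned(std::size_t k, std::size_t blockFloats,
                              std::size_t alignBytes) {
  return blockOffsetBytes(k, blockFloats) % alignBytes == 0;
}
```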

template<typename DVEC_TYPE>
void doSomething(DVEC_TYPE* data)
{
  // Trivial example – reality would be something much more substantial, possibly with loops or other function calls.
  *data += 1.0f;
}

Note that we use dvec to help us abstract the ISA, but other similar header-only vector overloading libraries also exist.

We would then instantiate the function above once for each ISA or data type we care about:

template void doSomething<float>(float* data); // Scalar type useful for debugging algorithm and doing basic testing
template void doSomething<F32vec8>(F32vec8* data); // Different AVX widths
template void doSomething<F32vec16>(F32vec16* data);
template void doSomething<I32vec16>(I32vec16* data); // Different element type

The functions are large enough that we don’t want to write a separate version for each ISA. We know that the incoming data may be misaligned and that accessing it directly is UB, so we could modify our code to handle misalignment explicitly. Something like:

template<typename DVEC_TYPE>
void doSomething(DVEC_TYPE* data)
{
  DVEC_TYPE t;
  loadu(t, data);
  t += 1.0f;
  storeu(data, t);
}

The code has become more verbose and less readable (and less maintainable, debuggable, etc.). It also no longer works with plain scalar types, which don’t have loadu/storeu defined, unless we start defining overloaded helper functions. And if 'data' pointed at an array, we’d have to add pointer arithmetic to the mix rather than use plain 'data[IDX]' syntax. We can certainly write code that copes with the misalignment explicitly, but it ends up messy. Alternatively, we could let the hardware manage the misalignment for us by having the compiler emit the movups instruction instead of movaps.
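(Editor's illustration.) For comparison, here is a sketch of the kind of overloaded helpers mentioned above, written as memcpy-based templates in standard C++. The customer did not propose this, and neither did anyone in the thread. The names `load_unaligned`, `store_unaligned`, and `doSomethingUnaligned` are hypothetical, and so is the `Vec8f` stand-in for a dvec type such as F32vec8. Because `memcpy` makes no alignment assumption, the same template also works for plain `float`. Optimizing compilers typically lower such fixed-size copies to unaligned moves.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a dvec-style 8 x float vector (e.g. F32vec8).
// alignas(32) mirrors the 32-byte alignment such AVX wrappers carry.
struct alignas(32) Vec8f {
  float v[8];
  Vec8f& operator+=(float s) {
    for (float& x : v) x += s;
    return *this;
  }
};

// Load a T from memory that need not satisfy alignof(T).
template <typename T>
T load_unaligned(const void* p) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  assert(p != nullptr);
  T t;
  std::memcpy(&t, p, sizeof(T));
  return t;
}

// Store a T to memory that need not satisfy alignof(T).
template <typename T>
void store_unaligned(void* p, const T& t) {
  static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
  std::memcpy(p, &t, sizeof(T));
}

// The trivial kernel from above, rewritten for possibly misaligned data.
// `data` is taken as void* so the code never dereferences a misaligned T*.
template <typename T>
void doSomethingUnaligned(void* data) {
  T t = load_unaligned<T>(data);
  t += 1.0f;
  store_unaligned(data, t);
}
```

This still gives up the plain `*data` / `data[IDX]` syntax the customer wants to keep, which is the trade-off described above.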

Until now we have used only the Intel Compiler, so our code relies on ICC’s unaligned operations and on hardware support for misaligned accesses to keep it clean. We are looking at porting our code to LLVM, but LLVM is making this harder than it needs to be.

Thanks
Yuanke

From: paul.robinson at sony.com <paul.robinson at sony.com>
Sent: Tuesday, April 20, 2021 4:42 AM
To: jyknight at google.com
Cc: Luo, Yuanke <yuanke.luo at intel.com>; lebedev.ri at gmail.com; Liu, Chen3 <chen3.liu at intel.com>; llvm-dev at lists.llvm.org; Maslov, Sergey V <sergey.v.maslov at intel.com>; Towner, Daniel <daniel.towner at intel.com>
Subject: RE: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.

We may still not fully understand one another, because this:
> so that you can compile code with under-aligned objects, and have it work as the author expected it to
sounds like you’re expecting us to recompile the client code that creates the under-aligned objects. That is literally not possible. If you do understand that part, great; it’s just not obvious to me from how you’re phrasing things.

I (still) don’t know what Intel is facing.  For Sony’s problem, we would be much more likely to try to do something specific to the APIs that are being abused, rather than something draconian like eliminating alignment requirements for everyone.  But of course we have a solution that works for us, so there’s that much more inertia to overcome.
--paulr

From: James Y Knight <jyknight at google.com>
Sent: Monday, April 19, 2021 2:30 PM
To: Robinson, Paul <paul.robinson at sony.com>
Cc: Luo, Yuanke <yuanke.luo at intel.com>; Roman Lebedev <lebedev.ri at gmail.com>; Liu, Chen3 <chen3.liu at intel.com>; llvm-dev <llvm-dev at lists.llvm.org>; Maslov, Sergey V <sergey.v.maslov at intel.com>; daniel.towner at intel.com
Subject: Re: [llvm-dev] [RFC] [X86] Emit unaligned vector moves on avx machine with option control.


> I understand your goal is to find and fix bugs in software that is
> still under development and CAN be fixed.  I fully endorse that
> goal.  However, that is not the situation that Sony has, and likely
> not what Intel has.  Your proposal will NOT solve our problem.

No, that's not it at all! I'm afraid you've totally misunderstood my concern.

My goal is that, if we add a compiler feature to address this problem (so that you can compile code with under-aligned objects and have it work as the author expected), the feature reliably addresses the problem and makes such code no longer exhibit Undefined Behavior. The proposed backend change does not accomplish that, but we can implement a feature that will.

As Reid said, -fmax-type-align=N appears to be almost that feature, and something like this little patch (along with a documentation update) may be all that's needed (but this is totally untested).

diff --git clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.cpp
index b23d995683bf..3aef166a690e 100644
--- clang/lib/CodeGen/CodeGenModule.cpp
+++ clang/lib/CodeGen/CodeGenModule.cpp
@@ -6280,8 +6280,7 @@ CharUnits CodeGenModule::getNaturalTypeAlignment(QualType T,
   // Cap to the global maximum type alignment unless the alignment
   // was somehow explicit on the type.
   if (unsigned MaxAlign = getLangOpts().MaxTypeAlign) {
-    if (Alignment.getQuantity() > MaxAlign &&
-        !getContext().isAlignmentRequired(T))
+    if (Alignment.getQuantity() > MaxAlign)
       Alignment = CharUnits::fromQuantity(MaxAlign);
   }
   return Alignment;
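(Editor's illustration.) The effect of the patch can be modeled in isolation. The sketch below is not Clang code. The function names are hypothetical, alignments are plain byte counts rather than `CharUnits`, and `explicitAlign` stands in for `getContext().isAlignmentRequired(T)`. A `maxAlign` of 0 corresponds to `-fmax-type-align` not being given. With `-fmax-type-align=16`, a 32-byte type whose alignment is explicit keeps 32 before the patch but is capped to 16 after it.

```cpp
// Hypothetical model of the capping in getNaturalTypeAlignment, pre-patch:
// cap only when the type's alignment was not explicitly required.
unsigned cappedAlignBefore(unsigned align, unsigned maxAlign, bool explicitAlign) {
  if (maxAlign != 0 && align > maxAlign && !explicitAlign)
    return maxAlign;
  return align;
}

// Post-patch: cap regardless of whether the alignment was explicit.
unsigned cappedAlignAfter(unsigned align, unsigned maxAlign, bool /*explicitAlign*/) {
  if (maxAlign != 0 && align > maxAlign)
    return maxAlign;
  return align;
}
```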
