[llvm-dev] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types

Tue May 26 16:17:29 PDT 2020

When writing my initial email, I forgot another option which Eli
pointed out: don't shrink the store's size. That would be acceptable
for our purposes. If it's something that needs further consideration,
perhaps we could disable it via a flag (not an official "-m..." flag,
but "-mllvm -disable-store-shortening" or whatever)?

-bw

On Tue, May 26, 2020 at 3:54 PM Eli Friedman <efriedma at quicinc.com> wrote:
>
> By default, clang emits all bitfield load/store operations using the width of the entire sequence of bitfield members.  If you look at the LLVM IR for your testcase, all the bitfield operations are i16.  (For thread safety, the C/C++ standards treat a sequence of bitfield members as a single "field".)
>
> If you look at the assembly, though, an "andb $-2, (%rdi)" slips in.  This is specific to the x86 backend: it's narrowing the store to save a couple bytes in the encoding, and a potential decoding stall due to a 2-byte immediate.  Maybe we shouldn't do that, or we should guard it with a better heuristic.
>
> -Eli
>
> -----Original Message-----
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Bill Wendling via cfe-dev
> Sent: Tuesday, May 26, 2020 3:29 PM
> To: cfe-dev at lists.llvm.org Developers <cfe-dev at lists.llvm.org>; llvm-dev <llvm-dev at lists.llvm.org>
> Subject: [EXT] [cfe-dev] [RFC] Loading Bitfields with Smallest Needed Types
>
> We're running into an interesting issue with the Linux kernel, and
> wanted advice on how to proceed.
>
> Example of what we're talking about: https://godbolt.org/z/ABGySq
>
> The issue is that when working with a bitfield a load may happen
> quickly after a store. For instance:
>
> struct napi_gro_cb {
>     void *frag;
>     unsigned int frag_len;
>     u16 flush;
>     u16 flush_id;
>     u16 count;
>     u16 gro_remcsum_start;
>     unsigned long age;
>     u16 proto;
>     u8 same_flow : 1;
>     u8 encap_mark : 1;
>     u8 csum_valid : 1;
>     u8 csum_cnt : 3;
>     u8 free : 2;
>     u8 is_ipv6 : 1;
>     u8 is_fou : 1;
>     u8 is_atomic : 1;
>     u8 recursive_counter : 4;
>     __wsum csum;
>     struct sk_buff *last;
> };
>
> void dev_gro_receive(struct sk_buff *skb)
> {
> ...
>         same_flow = NAPI_GRO_CB(skb)->same_flow;
> ...
> }
>
> Right before the "same_flow = ... ->same_flow;" statement is executed,
> a store is made to the bitfield at the end of a called function:
>
>     NAPI_GRO_CB(skb)->same_flow = 1;
>
> The store is a byte:
>
>     orb    $0x1,0x4a(%rbx)
>
> while the read is a word:
>
>     movzwl 0x4a(%r12),%r15d
>
> The problem is that between the store and the load the value hasn't
> been retired / placed in the cache. One would expect store-to-load
> forwarding to kick in, but on x86 that doesn't happen because x86
> requires the store to be of equal or greater size than the load. So
> instead the load takes the slow path, causing unacceptable slowdowns.
>
> GCC gets around this by using the smallest load for a bitfield. It
> seems to use a byte for everything, at least in our examples. From the
> comments, this is intentional, because according to the comments
> (which are never wrong) C++0x doesn't allow one to touch bits outside
> of the bitfield. (I'm not a language lawyer, but take this to mean
> that gcc is trying to minimize which bits it's accessing by using byte
> stores and loads whenever possible.)
>
> The question I have is what should we do to fix this issue? Once we
> get to LLVM IR, the information saying that we're accessing a bitfield
> is gone. We have a few options:
>
> * We can glean this information from how the loaded value is used and
> fix this during DAG combine, but it seems a bit brittle.
>
> * We could correct the size of the load during the front-end's code
> generation. This benefits from using all of LLVM's passes on the code.
>
> * We could perform the transformation in another place, possible in MIR or MC.
>
> What do people think?
>
> -bw
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev