[PATCH] D30416: [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently

Wei Mi via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 11 07:26:39 PDT 2017


wmi added inline comments.


================
Comment at: lib/CodeGen/MemAccessShrinking.cpp:967-997
+/// When the input instruction \p IN is and(Val, Cst) or trunc, it indicates
+/// only a portion of its input value has been used. We will walk through the
+/// Def-Use chain, track the range of value which will be used, remember the
+/// operations contributing to the used value range, and skip operations which
+/// changes value range that is not to be used, until a load is found.
+///
+/// So we start from and or trunc operations, then try to find a sequence of
----------------
wmi wrote:
> chandlerc wrote:
> > wmi wrote:
> > > chandlerc wrote:
> > > > Rather than re-implementing all of this logic, can you re-use the existing demanded bits facilities in LLVM?
> > > > 
> > > > For example, I think you can use the `DemandedBits` analysis, walk all loads in the function, and then narrow them based on the demanded bits it has computed. Because of how `DemandedBits` works, it is both efficient and very powerful. It can handle many more patterns.
> > > > 
> > > > Thinking about this, I suspect you'll want to do two passes essentially. First, narrow all the *stores* that you can. This will likely be iterative. Once that finishes, it seems like you'll be able to then do a single walk over the loads with a fresh `DemandedBits` analysis and narrow all of those left. You'll probably want to narrow the stores first because that may make bits stop being demanded. But I don't see any way for the reverse to be true, so there should be a good sequencing.
> > > > 
> > > > To make the analysis invalidation stuff easier, you may actually need this to actually be two passes so that the store pass can invalidate the `DemandedBits` analysis, and the load pass can recompute it fresh.
> > > > 
> > > > Does that make sense?
> > > > 
> > > > If so, I would suggest getting just the store shrinking in this patch, and add the load shrinking in a follow-up patch. I'm happy for them to be implemented in a single file as they are very similar and its good for people to realize they likely want *both* passes.
> > > I considered demanded bits facilities before, but I found it can only simplify the code a little bit. Finding the demanded bits inside of the load is only a small part of the work. Most of the complexity of the code comes from figuring out which ops in the sequence on the Def-Use Chain change the demanded bits. Like if we see shifts, we may clear some demanded bits in less significant position to zeros because we shift right then shift left. Because we change the demanded bits, we must include the shifts into the shrinked code sequence. Like if we see Or(And(Or(And(...)) pattern,  we want to know that the bits changed by Or(And()) are different bits of the demanded bits, only when that is true, we can omit the Or(And(...)) pattern in the final shrinked code sequence. Another reason is, demanded bits analysis may not be very cheap. As for memory shrinking, few pattern like and/trunc is very common to be useful for the shrinking so we actually don't need very general demanded bits analysis for every instruction. 
> > I don't really understand the argument here...
> > 
> > I would expect the demanded bits facilities to already handle things like shifts changing which bits are demanded, does it not? If not, it seems like we should extend that general facility rather than building an isolated system over here.
> > 
> > Regarding the cost, I'm much more worried about the algorithmic cost of this and the fact that it seems relatively sensitive to things that we don't reliably canonicalize (using trunc or and instructions to remove demanded bits).
> > 
> > Generally speaking, working *backwards* from the and or trunc is going to be much more expensive than working *forwards*.
> > 
> > But even if it the existing demanded bits is too expensive, we still shouldn't develop a new cheap one locally here. We should either add a parameter to decrease its cost or add a new general purpose "fast" demanded bits and then use that here.
> > I would expect the demanded bits facilities to already handle things like shifts changing which bits are demanded, does it not?
> 
> Yes, demanded bits facilities can adjust which bits are demanded if there is shift. However, which bits are the demanded bits is not the only thing I need to know. I still need to know whether an operation in the Def-Use Chain effectively change the value of the bits demanded. If the operation changes the value of demanded bits, the operation should be shrinked together with the load. If it only changes the value of bits other than the demanded bits, the operation can be omitted. That is actually what most of the pattern matching is doing in load shrinking, and that part of work I think cannot be fullfilled by demanded bits analysis.   
Even only for demanded bits, working forwards is not that straightforward because a wide load for a field group will be shared by multiple field accesses, and demanded bits analysis will sometimes show that all the bits of the load will be used. And it is possible that on the upper side of the Def-Use Chain, the width of the demanded bits are larger because of a node may have multiple uses, on the lower side of the Def-Use Chain, the width of the demanded bits are smaller. Working forward, we have to do some search on the expr tree rooted at a load stmt. Working backwards will let us know the demanded bits we want to use from the beginning. 


Repository:
  rL LLVM

https://reviews.llvm.org/D30416





More information about the llvm-commits mailing list