[PATCH] D23384: Use a byte array as an internal buffer of BitVector.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 11 13:39:43 PDT 2016


This patch stores bits in little-endian order regardless of host endianness
(in other words, it uses a byte array as the internal storage instead of a
word array), so that the internal buffer can be read and written directly.
We could define a new class for this, but is this change actually bad for
the existing BitVector? It seems to me that this patch is an improvement
even without the new features. What potential problems are you thinking of?
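
To illustrate the indexing (a rough sketch only; "ByteBackedBits" is just a
stand-in name, not the code in the patch):

  #include <cstdint>
  #include <vector>

  // Bit I lives in byte I/8 at bit position I%8, regardless of host
  // endianness, so the raw buffer can be copied in or out as-is.
  struct ByteBackedBits {
    std::vector<uint8_t> Bytes;

    explicit ByteBackedBits(size_t NumBits) : Bytes((NumBits + 7) / 8) {}

    void set(size_t I) { Bytes[I / 8] |= uint8_t(1) << (I % 8); }
    bool test(size_t I) const { return (Bytes[I / 8] >> (I % 8)) & 1; }

    const uint8_t *data() const { return Bytes.data(); }
  };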

On Thu, Aug 11, 2016 at 8:42 AM, Daniel Berlin via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> Right, but my question is: why completely change the existing BitVector
> type to try to accomplish this, instead of making one for PDB, which has
> a very different set of use cases and priorities?
> For BitVector, priority 1 is fast bit set/reset and set operations.
> There is no priority #2.
>
> For PDB, it seems "being able to sanely read and write a bitvector to
> disk" is priority 1, and set-operation performance is number two.
>
> At the point where the priorities are basically diametrically opposed,
> *and* the patches so far have shown that trying to do both creates a bit
> of a mess, I'd create a new type.
>
> On Wed, Aug 10, 2016 at 9:48 PM, Zachary Turner <zturner at google.com>
> wrote:
>
>> Actually, memcpy is exactly what we want to be able to do. The PDB's
>> bits are in this order:
>>
>> 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8
>>
>> But the BitVector's bits aren't, so a memcpy won't work.
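>>
>> Concretely (a rough sketch, not BitVector's real internals; it just
>> shows that the byte you get from a word-based buffer depends on host
>> endianness):
>>
>>   #include <cstdint>
>>   #include <cstdio>
>>   #include <cstring>
>>
>>   int main() {
>>     // Word-based storage: "bit 8" is bit 8 of the first 64-bit word.
>>     uint64_t Word = uint64_t(1) << 8;
>>     uint8_t Bytes[8];
>>     std::memcpy(Bytes, &Word, sizeof(Word));
>>     // Little-endian host: Bytes[1] == 0x01, which happens to match the
>>     // PDB layout above. Big-endian host: the set bit lands in Bytes[6],
>>     // so memcpy'ing the word array does not give the on-disk order.
>>     for (int I = 0; I < 8; ++I)
>>       std::printf("%02x ", Bytes[I]);
>>     std::printf("\n");
>>   }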
>>
>> Unless I've missed something
>> On Wed, Aug 10, 2016 at 9:41 PM Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>
>>> FWIW: speaking as a guy who has written the bitvector and sparse
>>> bitvector implementations for a number of compilers, three generic
>>> data structure libraries, and other random apps, not only would I never
>>> expect to access the underlying bytes of a bitvector, but until this
>>> thread, nobody has ever asked to :)
>>>
>>> In fact, they often asked for the opposite: they wanted the element
>>> type to be as large a datatype as possible internally, with the ops
>>> auto-vectorized (GCC, for example, does this: #define SBITMAP_ELT_TYPE
>>> unsigned HOST_WIDEST_FAST_INT, to get the fastest datatype possible).
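>>>
>>> The word-at-a-time layout is what makes the set operations cheap. Very
>>> roughly (a sketch, not any particular implementation):
>>>
>>>   #include <cstddef>
>>>   #include <cstdint>
>>>
>>>   // Union of two bitmaps stored as arrays of word-sized elements:
>>>   // one OR per 64 bits, in a loop the compiler can vectorize.
>>>   void bitmapUnionInPlace(uint64_t *Dst, const uint64_t *Src,
>>>                           size_t NumWords) {
>>>     for (size_t I = 0; I != NumWords; ++I)
>>>       Dst[I] |= Src[I];
>>>   }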
>>>
>>> Is the PDB stuff so performance-sensitive that you can't just create
>>> an OnDiskBitVector and pay for the copy from a BitVector into one?
>>>
>>> (Or, alternatively, use that type exclusively in the cases where you
>>> need to put it on disk.)
>>>
>>> You also gain the advantage that you could specialize it so that, for
>>> example, it's mmap-able and thus only the parts you touch are actually
>>> read from disk.
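>>>
>>> Something along these lines, say (a very rough sketch; OnDiskBitVector
>>> doesn't exist today, and the conversion below goes bit-by-bit rather
>>> than poking at BitVector's internals):
>>>
>>>   #include "llvm/ADT/BitVector.h"
>>>   #include <cstdint>
>>>   #include <vector>
>>>
>>>   // Fixed serialization layout: bit I is bit I%8 of byte I/8.
>>>   class OnDiskBitVector {
>>>     std::vector<uint8_t> Bytes;
>>>
>>>   public:
>>>     explicit OnDiskBitVector(const llvm::BitVector &BV)
>>>         : Bytes((BV.size() + 7) / 8) {
>>>       for (unsigned I = 0, E = BV.size(); I != E; ++I)
>>>         if (BV.test(I))
>>>           Bytes[I / 8] |= uint8_t(1) << (I % 8);
>>>     }
>>>
>>>     const uint8_t *data() const { return Bytes.data(); }
>>>     size_t getSizeInBytes() const { return Bytes.size(); }
>>>   };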
>>>
>>> --Dan
>>>
>>>
>>> On Wed, Aug 10, 2016 at 9:32 PM, Zachary Turner via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> It's worth profiling, but I would imagine performance to be the same
>>>> or faster. Conceptually the only difference is storing longs
>>>> internally versus bytes internally, and the latter allows fewer
>>>> bitwise operations on average to extract or set a given bit.
>>>> Regardless, since this is a performance-sensitive data structure, we
>>>> should profile.
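>>>>
>>>> For example, even a quick-and-dirty timing loop over the hot
>>>> operations would tell us something (a sketch only; a real comparison
>>>> should use a proper benchmark harness):
>>>>
>>>>   #include "llvm/ADT/BitVector.h"
>>>>   #include <chrono>
>>>>   #include <cstdio>
>>>>
>>>>   int main() {
>>>>     llvm::BitVector BV(1 << 20);
>>>>     auto Start = std::chrono::steady_clock::now();
>>>>     // Hammer set/test/reset, the operations CodeGen cares about.
>>>>     for (int Round = 0; Round < 100; ++Round)
>>>>       for (unsigned I = 0, E = BV.size(); I != E; ++I) {
>>>>         BV.set(I);
>>>>         if (BV.test(I))
>>>>           BV.reset(I);
>>>>       }
>>>>     auto End = std::chrono::steady_clock::now();
>>>>     auto US = std::chrono::duration_cast<std::chrono::microseconds>(
>>>>                   End - Start)
>>>>                   .count();
>>>>     std::printf("%lld us\n", (long long)US);
>>>>   }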
>>>>
>>>> Design-wise, I think this actually makes it more generic, not less.
>>>> If you have a bit vector, it's reasonable to expect that you can
>>>> access the underlying bytes, but you can't when the internal
>>>> representation is a sequence of longs.
>>>>
>>>>
>>>> On Wed, Aug 10, 2016 at 9:25 PM Madhur Amilkanthwar <
>>>> madhur13490 at gmail.com> wrote:
>>>>
>>>>> I agree with David. You mentioned the reasons behind the design
>>>>> choices and the things this patch would allow, but what about
>>>>> performance?
>>>>>
>>>>> On Thu, Aug 11, 2016 at 8:36 AM, David Majnemer via llvm-commits <
>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>
>>>>>> majnemer added a subscriber: majnemer.
>>>>>> majnemer added a comment.
>>>>>>
>>>>>> Have you actually measured this to be a major improvement? Are
>>>>>> there so many bits that the old way is a major bottleneck?
>>>>>> BitVector is used throughout CodeGen and the optimizer; I'm not
>>>>>> entirely convinced it makes sense to modify this generic data
>>>>>> structure so drastically for PDBs...
>>>>>>
>>>>>
>>>>>>
>>>>>> https://reviews.llvm.org/D23384
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Disclaimer: Views, concerns, thoughts, questions, and ideas expressed
>>>>> in this mail are my own, and my employer has no stake in them.
>>>>> Thank You.
>>>>> Madhur D. Amilkanthwar
>>>>>
>>>>>
>>>>
>>>>
>>>
>
>
>