[PATCH] D23384: Use a byte array as an internal buffer of BitVector.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 11 13:45:43 PDT 2016


On Thu, Aug 11, 2016 at 1:39 PM, Rui Ueyama <ruiu at google.com> wrote:

> This patch stores bits in little-endian order regardless of host
> endianness (or, in other words, uses a byte array as the internal storage
> instead of a word array), so that you can easily get and set the internal
> buffer. We could define a new class for doing this, but is this change bad
> for the existing BitVector?
>
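>
> For illustration only, roughly the idea (a minimal sketch, not the actual
> patch; the names here are made up):
>
>   #include <cstddef>
>   #include <cstdint>
>   #include <vector>
>
>   // Bit N lives in Bytes[N / 8] at LSB-first position N % 8, so the
>   // in-memory layout is identical on big- and little-endian hosts.
>   struct ByteBackedBits {
>     std::vector<uint8_t> Bytes;
>
>     explicit ByteBackedBits(size_t NumBits) : Bytes((NumBits + 7) / 8) {}
>     void set(size_t N) { Bytes[N / 8] |= uint8_t(1u << (N % 8)); }
>     bool test(size_t N) const { return (Bytes[N / 8] >> (N % 8)) & 1; }
>   };
>
> Because the storage is plain bytes, the buffer can be copied in and out
> as-is, which is the "easily get/set the internal buffer" part.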

Let's see!
Have you benchmarked it?



> It seems to me that this patch is an improvement even without these new
> features. What are the potential problems you are thinking of?
>

Aliasing, etc.



>
> On Thu, Aug 11, 2016 at 8:42 AM, Daniel Berlin via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Right, but my question is "why completely change the existing bitvector
>> type to try to accomplish this instead of making one for PDB, which has a
>> very different set of use cases and priorities?"
>> For BitVector, priority 1 is fast bit set/reset and set operations. There
>> is no priority #2.
>>
>> For PDB, it seems "being able to sanely read and write the bitvector to
>> disk" is priority 1, and set-operation performance is number two.
>>
>> Given that the priorities are basically diametrically opposed, *and* the
>> patches so far have shown that trying to do both creates a bit of a mess,
>> I'd create a new type.
>>
>> On Wed, Aug 10, 2016 at 9:48 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> Actually, a memcpy is exactly what we'd want to do. The PDB's bits are in
>>> this order:
>>>
>>> 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8
>>>
>>> But the BitVector's aren't, so a memcpy won't work.
>>>
>>> Unless I've missed something
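>>>
>>> Concretely, a sketch of the mapping I mean (not code from the patch):
>>>
>>>   #include <cstddef>
>>>   #include <cstdint>
>>>
>>>   // PDB-style numbering: bit N lives in byte N / 8, at LSB-first
>>>   // position N % 8 within that byte.
>>>   bool testPdbBit(const uint8_t *Stream, size_t N) {
>>>     return (Stream[N / 8] >> (N % 8)) & 1;
>>>   }
>>>
>>> A word-backed BitVector keeps bit N at position N % 64 inside a
>>> (typically 64-bit) word, so its raw bytes only line up with the layout
>>> above when the host stores those words little-endian.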
>>> On Wed, Aug 10, 2016 at 9:41 PM Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>
>>>> FWIW: Speaking as a guy who wrote the bitvector and sparse bitvector
>>>> implementations for a number of compilers, three generic data structure
>>>> libraries, and other random apps, not only would I never expect to access
>>>> the underlying bytes of a bitvector, but until this thread I'd never seen
>>>> anyone ask to :)
>>>>
>>>> In fact, users often asked for the opposite: they wanted the storage
>>>> specialized to as large a datatype as possible internally, with the ops
>>>> auto-vectorized (gcc, for example, does this: #define SBITMAP_ELT_TYPE
>>>> unsigned HOST_WIDEST_FAST_INT to get the fastest datatype possible for it).
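>>>>
>>>> (To give a sense of why wide elements get asked for -- a sketch, not
>>>> gcc's or LLVM's actual code -- whole-set operations walk one element per
>>>> iteration, so 64-bit words handle 8 bytes' worth of bits at a time and
>>>> vectorize nicely:)
>>>>
>>>>   #include <cstddef>
>>>>   #include <cstdint>
>>>>
>>>>   // In-place union and population count over word-sized storage.
>>>>   void unionInPlace(uint64_t *Dst, const uint64_t *Src, size_t NumWords) {
>>>>     for (size_t I = 0; I != NumWords; ++I)
>>>>       Dst[I] |= Src[I];
>>>>   }
>>>>
>>>>   size_t countOnes(const uint64_t *Words, size_t NumWords) {
>>>>     size_t N = 0;
>>>>     for (size_t I = 0; I != NumWords; ++I)
>>>>       N += __builtin_popcountll(Words[I]); // GCC/Clang builtin
>>>>     return N;
>>>>   }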
>>>>
>>>> Is the PDB stuff so performance-sensitive that you can't just create an
>>>> OnDiskBitVector and pay for the memcpy to copy a BitVector into one?
>>>>
>>>> (Or, alternatively, use that type exclusively in the cases you need to
>>>> put it on disk)
>>>>
>>>> You also gain the advantage that you could specialize it so, for
>>>> example, it's mmapable and thus only the parts you touch are actually read
>>>> from disk.
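>>>>
>>>> Roughly the shape I have in mind (a sketch only; "OnDiskBitVector"
>>>> doesn't exist in tree, and the names here are made up):
>>>>
>>>>   #include "llvm/ADT/ArrayRef.h"
>>>>   #include "llvm/ADT/BitVector.h"
>>>>   #include <cstdint>
>>>>   #include <vector>
>>>>
>>>>   // A byte buffer in a fixed on-disk bit order, built from a BitVector.
>>>>   class OnDiskBitVector {
>>>>     std::vector<uint8_t> Bytes;
>>>>
>>>>   public:
>>>>     explicit OnDiskBitVector(const llvm::BitVector &BV)
>>>>         : Bytes((BV.size() + 7) / 8) {
>>>>       for (unsigned I = 0, E = BV.size(); I != E; ++I)
>>>>         if (BV.test(I))
>>>>           Bytes[I / 8] |= uint8_t(1u << (I % 8));
>>>>     }
>>>>     llvm::ArrayRef<uint8_t> bytes() const { return Bytes; }
>>>>   };
>>>>
>>>> That's the "pay for the copy" option; the mmap-able variant would read
>>>> the bytes in place instead of building them.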
>>>>
>>>> --Dan
>>>>
>>>>
>>>> On Wed, Aug 10, 2016 at 9:32 PM, Zachary Turner via llvm-commits <
>>>> llvm-commits at lists.llvm.org> wrote:
>>>>
>>>>> It's worth profiling, but I would imagine performance to be the same or
>>>>> faster. Conceptually the only difference is storing longs internally
>>>>> versus bytes internally, and the latter allows fewer bitwise operations
>>>>> on average to extract or set a given bit. Regardless, since this is a
>>>>> performance-sensitive data structure, we should profile.
>>>>>
>>>>> Design-wise, I think this actually makes it more generic, not less. If
>>>>> you have a bit vector, it's reasonable to expect that you can access the
>>>>> underlying bytes, but you can't when the internal representation is a
>>>>> sequence of longs.
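>>>>>
>>>>> What I mean concretely (the accessor below is hypothetical, not part of
>>>>> today's BitVector API):
>>>>>
>>>>>   #include "llvm/ADT/ArrayRef.h"
>>>>>   #include "llvm/Support/raw_ostream.h"
>>>>>   #include <cstdint>
>>>>>
>>>>>   // With byte storage, serialization is just handing the buffer to the
>>>>>   // writer, e.g. via a hypothetical BV.getBytes() accessor.
>>>>>   void writeBits(llvm::raw_ostream &OS, llvm::ArrayRef<uint8_t> Bytes) {
>>>>>     OS.write(reinterpret_cast<const char *>(Bytes.data()), Bytes.size());
>>>>>   }
>>>>>
>>>>> With a uintptr_t array you can still reach the bytes, but what they mean
>>>>> depends on the host's word size and endianness.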
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2016 at 9:25 PM Madhur Amilkanthwar <
>>>>> madhur13490 at gmail.com> wrote:
>>>>>
>>>>>> I agree with David. You mentioned the reasons behind the design choices
>>>>>> and the things this patch would enable, but what about performance?
>>>>>>
>>>>>> On Thu, Aug 11, 2016 at 8:36 AM, David Majnemer via llvm-commits <
>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>
>>>>>>> majnemer added a subscriber: majnemer.
>>>>>>> majnemer added a comment.
>>>>>>>
>>>>>>> Have you actually measured this to be a major improvement? Are there
>>>>>>> so many bits that the old way is a major bottleneck?
>>>>>>> BitVector is used by CodeGen and the optimizer; I'm not entirely
>>>>>>> convinced it makes sense to modify this generic data structure so
>>>>>>> drastically for PDBs...
>>>>>>>
>>>>>>
>>>>>>>
>>>>>>> https://reviews.llvm.org/D23384
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thank You.
>>>>>> Madhur D. Amilkanthwar
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>
>