[PATCH] D23384: Use a byte array as an internal buffer of BitVector.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 11 13:50:42 PDT 2016


On Thu, Aug 11, 2016 at 1:45 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> On Thu, Aug 11, 2016 at 1:39 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> This patch is to store bits in the little-endian order regardless of host
>> endianness (or, in other words, use a byte array as the internal storage
>> instead of a word array), so that you can easily set/get internal buffers.
>> We could define a new class for doing this, but is this bad for the
>> existing BitVector?
>>
>
> Let's see!
> Have you benchmarked it?
>

Nope, will do. Do we have a benchmark suite or something? How do you
usually measure compiler performance?


>
>
>> It seems to me that this patch is an improvement even without these new
>> features. What are the potential problems you are thinking?
>>
>
> Aliasing, etc.
>
>
>
>>
>> On Thu, Aug 11, 2016 at 8:42 AM, Daniel Berlin via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Right, but my question is: why completely change the existing bitvector
>>> type to try to accomplish this, instead of making a new one for PDB, which
>>> has a very different set of use cases and priorities?
>>> For BitVector, priority 1 is fast bit set/reset and set operations.
>>> There is no priority #2.
>>>
>>> For PDB, it seems "being able to sanely read and write a bitvector to
>>> disk" is priority 1, and set operation performance is number two.
>>>
>>> Given that the priorities are basically diametrically opposed, *and* the
>>> patches so far have shown that trying to do both creates a bit of a mess,
>>> I'd create a new type.
>>>
>>> On Wed, Aug 10, 2016 at 9:48 PM, Zachary Turner <zturner at google.com>
>>> wrote:
>>>
>>>> Actually, memcpy is exactly what we want to do. The PDB's bits are in
>>>> this order:
>>>>
>>>> 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8
>>>>
>>>> But the bitvectors aren't, so a memcpy won't work.
>>>>
>>>> Unless I've missed something
>>>> On Wed, Aug 10, 2016 at 9:41 PM Daniel Berlin <dberlin at dberlin.org>
>>>> wrote:
>>>>
>>>>> FWIW: Speaking as a guy who wrote the bitvector and sparse bitvector
>>>>> implementations for a number of compilers, three generic data structure
>>>>> libraries, and other random apps, not only would I never expect to access
>>>>> the underlying bytes in a bitvector, but until this thread, I'd never
>>>>> seen anyone ask to :)
>>>>>
>>>>> In fact, they often asked for the opposite, and wanted them
>>>>> specialized to be as large a datatype as possible internally and then have
>>>>> the ops auto-vectorized (gcc, for example, does this: #define
>>>>> SBITMAP_ELT_TYPE unsigned HOST_WIDEST_FAST_INT to get the fastest datatype
>>>>> possible for it).
>>>>>
>>>>> Is the PDB stuff so performance sensitive that you can't just create
>>>>> OnDiskBitVector and pay for the memcpy to copy a BitVector into one?
>>>>>
>>>>> (Or, alternatively, use that type exclusively in the cases you need to
>>>>> put it on disk)
>>>>>
>>>>> You also gain the advantage that you could specialize it so, for
>>>>> example, it's mmapable and thus only the parts you touch are actually read
>>>>> from disk.
>>>>>
>>>>> --Dan
>>>>>
>>>>>
>>>>> On Wed, Aug 10, 2016 at 9:32 PM, Zachary Turner via llvm-commits <
>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>
>>>>>> It's worth profiling, but I would imagine performance to be the same
>>>>>> or faster. Conceptually the only difference is storing longs internally
>>>>>> versus bytes internally, and the latter allows fewer bitwise operations
>>>>>> on average to extract or set a given bit. Regardless, since this is a
>>>>>> performance-sensitive data structure, we should profile.
>>>>>>
>>>>>> Design-wise, I think this actually makes it more generic, not less.
>>>>>> If you have a bit vector, it's reasonable to expect that you can access
>>>>>> the underlying bytes, but you can't when the internal representation is
>>>>>> a sequence of longs.
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 10, 2016 at 9:25 PM Madhur Amilkanthwar <
>>>>>> madhur13490 at gmail.com> wrote:
>>>>>>
>>>>>>> I agree with David. You mentioned the reasons behind the design
>>>>>>> choices and the things this patch would enable, but what about
>>>>>>> performance?
>>>>>>>
>>>>>>> On Thu, Aug 11, 2016 at 8:36 AM, David Majnemer via llvm-commits <
>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> majnemer added a subscriber: majnemer.
>>>>>>>> majnemer added a comment.
>>>>>>>>
>>>>>>>> Have you actually measured this to be a major improvement? Are
>>>>>>>> there so many bits that the old way is a major bottleneck?
>>>>>>>> BitVector is used by CodeGen and the optimizer; I'm not entirely
>>>>>>>> convinced it makes sense to modify this generic data structure so
>>>>>>>> drastically for PDBs...
>>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> https://reviews.llvm.org/D23384
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Disclaimer: Views, concerns, thoughts, questions, ideas expressed
>>>>>>> in this mail are of my own and my employer has no take in it. *
>>>>>>> Thank You.
>>>>>>> Madhur D. Amilkanthwar
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>>
>>
>

