[PATCH] Implemented llvm::sys::locale::columnWidth and isPrint for the case of generic UTF8-capable terminal.

Thu Aug 8 11:38:57 PDT 2013

================
Comment at: llvm/trunk/include/llvm/Support/UnicodeCharRanges.h:22
@@ +21,3 @@
+
+struct UnicodeCharRange {
+  uint32_t Lower;
----------------
Jordan Rose wrote:
> Manuel Klimek wrote:
> > I think it makes sense to put a comment here on the closedness of the ranges.
> > Defining a "is-in" on the class would also basically be self-documenting.
> > 
> I like the idea of using a method, but the actual implementation of the search wouldn't benefit from it. Your other comment may make this unnecessary, though.
True, the is-in is actually never needed... I'd probably still benefit from the comment, just to resolve the questions while reading the code early on.

================
Comment at: llvm/trunk/include/llvm/Support/UnicodeCharRanges.h:30
@@ +29,3 @@
+/// [min, max], and if the ranges themselves are ordered and non-overlapping.
+static inline bool isValidCharSet(UnicodeCharSet CharSet) {
+#ifndef NDEBUG
----------------
Jordan Rose wrote:
> Manuel Klimek wrote:
> > To me this whole function looks like it would be better put into a unit test.
> I got this feedback when I implemented it the first time; the reason I was worried about that is because I anticipating someone adding a new set and forgetting to add a test. This way it's forced just by calling `isCharInSet`.
Yea, I can see the motivation.

Personally, if this is really a concern, I'd perhaps do something like this (of course with proper names...):
  static const UnicodeCharRange[] CharSet1= { ... };
  ...
  static const ArrayRef<UnicodeCharRange> CharSets[] = {
    CharSet1, CharSet2, ...
  };
  // Comment that explains that those are indices into CharSets, etc
  enum CharSetSelector {
    ... = 0, ... = 1
  }
  bool isInCharSet(uint32_t C, CharSetSelector CS) {
    return std::binary_search(CharSets[CS].begin(), CharSets[CS].end(), C);
  }

Now we can just test CharSets in a unit test, and everybody will add a new CharSet there, just because it's convenient (if you really want to be safe, one could add a class with visiblity around the CharSets).

================
Comment at: llvm/trunk/include/llvm/Support/UnicodeCharRanges.h:78
@@ +77,3 @@
+LLVM_READONLY static inline bool isCharInSet(uint32_t C,
+                                             UnicodeCharSet CharSet) {
+  assert(isValidCharSet(CharSet));
----------------
Jordan Rose wrote:
> Manuel Klimek wrote:
> > If we define operator<(uint32_t, UnicodeCharRange) and
> > operator<(UnicodeCharRange, uint32_t), this becomes
> > std::binary_search(CharSet.begin(), CharSet.end(), C);
> > I pretty strongly vote against adding an implementation of binary search.
> Interesting. Feels weird to me, but I guess there's actually a reasonable interpretation for both of those even outside this context.
I think so, and I also think that they conform to what the standard requires (at least C++11) for the use here. I'll happily admit that it also feels weird to me, though, but not weirder than seeing a hand-rolled binary-search ;)

http://llvm-reviews.chandlerc.com/D1253

COMMIT
  http://llvm-reviews.chandlerc.com/rL187837