[llvm-commits] Triple normalization bug

Renato Golin renato.golin at arm.com
Thu Jan 27 07:17:35 PST 2011


Hi there,

So, only today I could get back to the normalization bug, let me
explain what is going on and why I fixed that way. I've reverted my
OS/Env changes, leaving only the Environment additions (gnu, eabi) and
letting normalize do its job.


=== Context

Triple.cpp, std::string Triple::normalize(StringRef Str); Triple =
"x86_64-gnu-linux";


=== The problem

First, the Components vector is created to hold 4 elements (but not
filled with 4 elements). The string tokenizer fills it up with three
elements: (x86_64, gnu and linux). The fourth element is not pushed
back (so it doesn't exist).

  SmallVector<StringRef, 4> Components;
  for (size_t First = 0, Last = 0; Last != StringRef::npos; First = Last + 1) {
    Last = Str.find('-', First);
    Components.push_back(Str.slice(First, Last));
  }

The elements are parsed and only x86_64 and linux makes sense, so
Found array contains { true, false, true, false }; Note, it has 4
*allocated* elements.

  bool Found[4];
  Found[0] = Arch != UnknownArch;
  Found[1] = Vendor != UnknownVendor;
  Found[2] = OS != UnknownOS;
  Found[3] = Environment != UnknownEnvironment;

The normalization passes through all Pos [0 -> 3] (positions that
weren't found) and internally iterate through all Idx [0 -> 2] (tokens
split).

  for (unsigned Pos = 0; Pos != array_lengthof(Found); ++Pos) {
    if (Found[Pos])
      continue; // Already in the canonical position.

    for (unsigned Idx = 0; Idx != Components.size(); ++Idx) {
      // Do not reparse any components that already matched.
      if (Idx < array_lengthof(Found) && Found[Idx])
        continue;

Because of the continue on both places, the only two times it enters
the inner loop is when Pos = 1, Idx = 1 (Vendor, "gnu") and Pos = 3,
Idx = 1 (Environment, "gnu"). On the first case, Vendor is still
invalid, but in the second case the normalization detected a match (as
it should). And here's where the bug shows its face:

Pos = 3, Idx = 1:

      } else if (Pos > Idx) {
        do {
          StringRef CurrentComponent(""); // The empty component.

i = [1 -> 2], Components.size() = 3

          for (unsigned i = Idx; i < Components.size(); ++i) {
            // Skip over any fixed components.

The first time it passes here, CurrentComponent swaps with "gnu", so
Vendor is now empty string. On the second pass, however:

lengthof(Found) is 4, while the vector size is 3. The next not-found
item is on Found[4] (index 3), which is one pass the Components' size:

            while (i < array_lengthof(Found) && Found[i]) ++i; /* THIS
IS WRONG */

Here, i = 3;

            std::swap(CurrentComponent, Components[i]);

Bang! Components[3] hasn't been allocated, segfault.


=== The fix

To avoid major changes and adding too much code/data for a single
special case, I've added the extra (not so nice) addition to
Components vector the last element. However, there should be a better
way.

If you want to place the element at the end, you MUST have it
initialized with empty string so "swap" can work OR avoid using swap
when there is no such element at that index (probably the best
solution).


=== Why it works with "linux"?

Because that movement is done by two distinct loops. The "do { }" and
and "for (i) { }". The triple "linux" is moved up by the "do" loop
because there were no other found tokens (so the for loop is always
executed once), which works and triples that have a partial match
(such as x86_64-gnu-linux) go into the "for" loop and pass the end of
the vector.

As I said, this was a latent bug that didn't show up because there
weren't enough tests for it (not enough combinations of triples).


=== The OS/Env hack

As Duncan said, it was redundant and wrong, the normalization
should've done that automatically, and it actually does. I fell in the
error of trying that firs to solve the segfault problem and not
removing it after I found the real problem.


=== What I'll do

I have removed already the OS/Env hack locally and am tweaking the
swap to not break under those situations and will add some random
tests in TripleTests.cpp with the corner cases I can think of. Feel
free to add more, even unrealistic ones, in case some other crazy guys
come with weird triples... ;)



-- 
cheers,
--renato



More information about the llvm-commits mailing list