[LLVMbugs] [Bug 13602] basic_filebuf's internal buffer is shrinking when using with some codecvt.

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Fri Aug 17 13:52:48 PDT 2012


http://llvm.org/bugs/show_bug.cgi?id=13602

Hyeon-Bin Jeong <tuhertz at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |

--- Comment #2 from Hyeon-Bin Jeong <tuhertz at gmail.com> 2012-08-17 15:52:48 CDT ---
I think I'm doing exactly what you describe and what the standard intends. (Am
I missing something?)
I'm trying to write a codecvt that converts the external char type (UTF-8) to
the internal char32_t type (UTF-32).

UTF-8 uses 1 to 4 bytes per character, so it's an N:1 conversion. When
underflow() is called, it fills the 4096-byte external buffer with a char
(UTF-8) sequence from the FILE object, and then converts it into char32_t
(UTF-32) characters in the internal buffer by calling in().

__r = __cv_->in(__st_, __extbuf_, __extbufend_, __extbufnext_,
                this->eback() + __unget_sz,
                this->egptr(), __inext);

But it produces only about 1300 char32_t characters when converting Asian
text, because most Asian-language characters are 3 bytes wide in UTF-8.
So __inext advances only a third of the way through the internal buffer.
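
To make the ratio concrete, here is a minimal sketch. Both helpers are my own
hypothetical names (not libc++ code), and the counter assumes well-formed UTF-8
and looks only at lead bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how many complete UTF-8 sequences fit in the
// first `nbytes` bytes of `s`. Assumes `s` is well-formed UTF-8.
std::size_t count_complete_code_points(const std::string& s, std::size_t nbytes) {
    assert(nbytes <= s.size());
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < nbytes) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > nbytes) break;  // trailing partial sequence, not converted yet
        i += len;
        ++count;
    }
    return count;
}

// Hypothetical helper: builds `n` copies of U+AC00, which is 3 bytes in UTF-8.
std::string make_three_byte_text(std::size_t n) {
    std::string s;
    for (std::size_t k = 0; k < n; ++k) s += "\xEA\xB0\x80";
    return s;
}
```

With 4096 bytes of 3-byte characters, count_complete_code_points() returns 1365
(one byte of a partial sequence is left over), while 4096 bytes of ASCII give
4096.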

The problem happens when it calls setg() a few lines below.

this->setg(this->eback(), this->eback() + __unget_sz, __inext);

This line sets the end of the (internal) buffer (i.e. __einp_) to __inext. So
after this line, egptr() returns a position one third of the way from __intbuf_
to __intbuf_ + __ibs_.
The next time underflow() is called, it calculates the read size __nmemb from
egptr() - eback(), so it loads only 33% of the external buffer! As a result, the
buffer size keeps shrinking on each underflow() call until it is 1 byte.
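
The shrinking can be sketched with a toy model. This is not libc++ code:
shrinking_windows() is a hypothetical function, and it ignores __unget_sz and
any leftover bytes carried between refills. It only assumes that each refill
reads as many bytes as the current egptr() - eback() and converts them at a
fixed number of bytes per character.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: `window` plays the role of egptr() - eback(). Each refill reads
// `window` bytes and produces window / bytes_per_char characters, and that
// count becomes the next window. Returns the window after each refill.
std::vector<std::size_t> shrinking_windows(std::size_t capacity,
                                           std::size_t bytes_per_char) {
    assert(capacity > 0 && bytes_per_char > 0);
    std::vector<std::size_t> windows;
    std::size_t window = capacity;
    while (window > 1) {
        window /= bytes_per_char;
        windows.push_back(window);
        if (bytes_per_char == 1) break;  // 1:1 conversion: the window never shrinks
    }
    return windows;
}
```

For a capacity of 4096 and 3 bytes per character, the model gives 1365, 455,
151, 50, 16, 5, 1, so the window collapses to a single character after seven
refills.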

Here is the sample code from the standard document; it uses intern_buf+ISIZE as
the buffer end, not egptr().

  char   extern_buf[XSIZE];
  char*  extern_end;
  charT  intern_buf[ISIZE];
  charT* intern_end;
  codecvt_base::result r =
    a_codecvt.in(state, extern_buf, extern_buf+XSIZE, extern_end,
                 intern_buf, intern_buf+ISIZE, intern_end);
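
To show why the output limit matters, here is a small stand-in for in(). The
function utf8_in() is hypothetical (it is not the codecvt API, it has no state
argument, and it assumes well-formed UTF-8). It stops when the input runs out,
when only a partial sequence remains, or when the output range is full.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for codecvt::in(): decodes UTF-8 from [from, from_end)
// into [to, to_end). Sets from_next to the first unconsumed byte and returns
// the number of char32_t values written.
std::size_t utf8_in(const char* from, const char* from_end, const char*& from_next,
                    char32_t* to, char32_t* to_end) {
    assert(from <= from_end && to <= to_end);
    const char* p = from;
    char32_t* out = to;
    while (p < from_end && out < to_end) {
        unsigned char lead = static_cast<unsigned char>(*p);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (static_cast<std::size_t>(from_end - p) < len) break;  // partial sequence
        // Payload bits of the lead byte: 7, 5, 4 or 3 bits for len 1..4.
        char32_t cp = len == 1 ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(p[k]) & 0x3F);
        *out++ = cp;
        p += len;
    }
    from_next = p;
    return static_cast<std::size_t>(out - to);
}
```

If to_end is always intern_buf + ISIZE, each call can fill the whole internal
buffer. If to_end is the end of the previously converted data, each call is
capped by the previous result, which is the behaviour described above.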

One more thing: I think seekpos() should restore the state from its position
argument. seekoff() saves the current state into the position it returns, so
seekpos() needs to restore the state from it.
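
A toy sketch of the round trip I mean. ToyPos and ToyFilebuf are hypothetical
types, not libc++ code: ToyPos stands in for std::fpos<mbstate_t> (which pairs
an offset with a conversion state and exposes it through state()), and an int
stands in for mbstate_t.

```cpp
#include <cassert>

// Toy stand-in for fpos<mbstate_t>: an offset plus the conversion state there.
struct ToyPos {
    long offset;
    int state;
};

// Toy stand-in for basic_filebuf, reduced to the two fields that matter here.
struct ToyFilebuf {
    long offset = 0;
    int state = 0;  // conversion (shift) state of the codecvt at `offset`

    // Like seekoff(0, cur): report the current offset together with the state.
    ToyPos tell() const { return ToyPos{offset, state}; }

    // Like seekpos(): move to the saved offset and restore the saved state.
    void seekpos(const ToyPos& pos) {
        assert(pos.offset >= 0);
        offset = pos.offset;
        state = pos.state;
    }
};
```

Saving a position with tell(), reading further, and then calling seekpos() on
the saved position should leave both the offset and the state as they were when
the position was taken.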
