From: Samuel L. Bayer
Date: Mon May 14 10:09:44 2007
Subject: Re: Problems with Perl Asian encodings?

Samuel L. Bayer wrote:
> Has anyone else done such a comparison of GNU recode and Perl Encode?
> I'd very much prefer to move to Perl, not simply for efficiency but
> because, unlike GNU recode, it appears to be actively maintained;
> however, the error rate is just too high, especially considering that
> the GNU recode output looks clean, and our users have not complained
> about it.

Hi again all -

Last week, I sent out a query about Asian encodings and Perl Encode vs. GNU recode. Martin Thurn graciously helped me debug this problem, and I can now summarize as follows, quoting Martin:

"In the sample data you sent, in the original GB2312, right after the word "diode", there is an octal \244 and octal \112. Octal \244 = decimal 164 which is not a legal first-byte in GB2312. Recode apparently dropped the \244 and left the \112 as-is, a capital J. Encode apparently converted the \244 to a default UTF-8 "unknown character" and left the \112 as-is, a capital J."

So the outcome is that GNU recode has a mode which drops these illegal first bytes, while Perl Encode substitutes a replacement character for them. The question, then: is the same dropping behavior possible in Perl Encode? The documentation for some of the FB_ constants is tempting, but pretty opaque. Again, I'm using Perl 5.8.7, with the version of Encode that comes with that distribution.

Thanks so much in advance -

Sam Bayer
The MITRE Corporation
sam@mitre.org
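
One possible way to get recode-like dropping out of Encode is sketched below. It relies on the documented behavior of Encode::FB_QUIET: decoding stops at the first byte it cannot handle, returns what was converted so far, and overwrites the source with the unprocessed remainder. The helper name decode_dropping_bad_bytes is hypothetical (it is not part of Encode), 'euc-cn' is assumed to be the right Encode name for the GB2312 data in question, and the sketch has not been checked against the Encode shipped with Perl 5.8.7.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): decode $octets from $enc_name,
# silently dropping any byte the decoder cannot process.
sub decode_dropping_bad_bytes {
    my ($enc_name, $octets) = @_;    # $octets is a copy; the caller's data is untouched
    my $enc = Encode::find_encoding($enc_name)
        or die "Unknown encoding: $enc_name";

    my $text = '';
    while (length $octets) {
        # With FB_QUIET, decode() returns the part it managed to convert and
        # overwrites $octets with the unprocessed remainder.
        $text .= $enc->decode($octets, Encode::FB_QUIET());

        # Anything left over starts at the offending byte: discard it and resume.
        substr($octets, 0, 1, '') if length $octets;
    }
    return $text;
}

# The byte pair from the sample data: octal \244 followed by \112 (a capital J).
my $clean = decode_dropping_bad_bytes('euc-cn', "diode\244\112");
print "$clean\n";
```

Dropping one byte at a time is a deliberate choice here: it keeps the byte that follows the bad one, which is what recode appears to have done with the \112. Note that silently discarding input hides data corruption, so it may be worth counting or logging the dropped bytes.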

Group: perl.i18n at server nntp.perl.org