From:Ciaran Hamilton Date:Thu May 17 03:26:41 2007
Subject:Re: Problems with Perl Asian encodings?
Hi,

Samuel L. Bayer wrote:
> So the outcome was that there's a mode in GNU recode which will drop 
> these illegal first bytes. So the question is: is the same thing 
> possible in Perl Encode? The documentation for some of the FB_ variables 
> is tempting, but pretty opaque.

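Yes, the way to do it is with Encode::FB_QUIET. With that flag, decode() 
returns whatever it managed to decode before hitting a bad byte and 
overwrites its input with the unprocessed remainder, so you can drop the 
offending byte and carry on. If $text holds the raw bytes you want to 
decode, then this should do the trick: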

-----
use Encode;

my $textcopy = $text;   # decode() with FB_QUIET consumes $text, so keep the original
my $encoding = "gb2312";

# Returns the part decoded so far; $text is left holding the unprocessed rest.
my $decoded = decode($encoding, $text, Encode::FB_QUIET);

while ($text ne "") {   # loop while we've still got bad bytes to deal with
   ### my $badbyte = substr($text, 0, 1);   # the invalid byte
   ### printf STDERR "Invalid byte \\x%02X in input - dropping.\n", ord($badbyte);
   $text = substr($text, 1);   # skip over the bad byte
   $decoded .= decode($encoding, $text, Encode::FB_QUIET);
}

print "Output: $decoded\n";
-----

The code as given silently drops every bad byte and prints no 
warnings; if you want warnings, uncomment the lines marked with ###. It 
depends what you want your code to do. :D
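
If you'd rather have this as a reusable routine, here is a minimal sketch of the same loop wrapped in a sub. The name decode_dropping_bad_bytes and the sample input are made up for illustration; they aren't part of Encode. It relies on the same assumption as the code above, namely that decode() with FB_QUIET leaves the undecoded remainder in its input variable.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode $bytes from $encoding,
# dropping any byte the decoder stops on. Returns the decoded string and
# the number of bytes that were dropped.
sub decode_dropping_bad_bytes {
    my ($encoding, $bytes) = @_;   # $bytes is a copy; the caller's data is untouched

    # First pass: decode as far as possible; $bytes keeps the remainder.
    my $decoded = decode($encoding, $bytes, Encode::FB_QUIET);
    my $dropped = 0;

    while (length $bytes) {
        substr($bytes, 0, 1, "");  # remove the byte the decoder stopped on
        $dropped++;
        $decoded .= decode($encoding, $bytes, Encode::FB_QUIET);
    }
    return ($decoded, $dropped);
}

# Sample input, assumed for illustration: ASCII with one stray \xFF byte.
my ($out, $n) = decode_dropping_bad_bytes("gb2312", "abc\xFFdef");

binmode(STDOUT, ":encoding(UTF-8)");   # avoid "Wide character" warnings on output
print "Output: $out (dropped $n byte(s))\n";
```

Working on a copy inside the sub means the caller's variable isn't emptied, which is the job $textcopy does in the version above.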

Hope this helps!

  - Ciaran.
Posted in group perl.i18n at server nntp.perl.org