From: Ciaran Hamilton
Date: Thu May 17 03:26:41 2007
Subject: Re: Problems with Perl Asian encodings?
Hi,

Samuel L. Bayer wrote:
> So the outcome was that there's a mode in GNU recode which will drop
> these illegal first bytes. So the question is: is the same thing
> possible in Perl Encode? The documentation for some of the FB_ variables
> is tempting, but pretty opaque.

Yes, the way to do it is by using Encode::FB_QUIET. With that flag, decode() stops at the first byte it can't handle, returns whatever it has decoded so far, and leaves the unprocessed remainder in the variable you passed in. Basically, here's how you would do it... if $text holds the raw bytes you want to decode, then this should do the trick:

-----
use Encode;

my $textcopy = $text;  # decode() with FB_QUIET eats $text, so keep a copy of the original
my $encoding = "gb2312";
my $decoded = decode($encoding, $text, Encode::FB_QUIET);
while ($text ne "") {
    # this loops while we've still got bad characters to deal with.
    ### my $badbyte = substr($text, 0, 1);  # $badbyte now contains the invalid byte.
    ### printf STDERR "Invalid character \\x%02X in input - dropping.\n", ord($badbyte);
    $text = substr($text, 1);  # skip over the bad character
    $decoded .= decode($encoding, $text, Encode::FB_QUIET);
}
binmode(STDOUT, ":encoding(UTF-8)");  # so the decoded characters are written out as UTF-8
print "Output: $decoded\n";
-----

The code as given will ignore every bad character and print no warnings; if you want warnings, uncomment the lines marked with ###. It depends what you want your code to do. :D

Hope this helps!

- Ciaran.
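For anyone who wants to try this on a known-bad string, the same loop can be wrapped in a small subroutine. This is only a sketch: the helper name decode_dropping_bad_bytes and the sample input are made up for illustration, and it assumes that "gb2312" resolves to Encode's euc-cn table, where 0xFF is not a valid byte.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode $bytes from $encoding,
# silently dropping every byte that Encode refuses to decode.
# Returns the decoded string and the number of bytes dropped.
sub decode_dropping_bad_bytes {
    my ($encoding, $bytes) = @_;   # $bytes is a private copy, safe to modify
    my $dropped = 0;

    # With FB_QUIET, decode() returns what it managed to decode and leaves
    # the unprocessed remainder (starting at the bad byte) in $bytes.
    my $decoded = decode($encoding, $bytes, Encode::FB_QUIET());
    while (length $bytes) {
        substr($bytes, 0, 1, "");  # drop the offending byte
        $dropped++;
        $decoded .= decode($encoding, $bytes, Encode::FB_QUIET());
    }
    return ($decoded, $dropped);
}

# Sample input (assumed for illustration): two valid GB2312 characters
# with a stray 0xFF byte between them.
my $input = "\xD6\xD0\xFF\xCE\xC4";
my ($decoded, $dropped) = decode_dropping_bad_bytes("gb2312", $input);

binmode(STDOUT, ":encoding(UTF-8)");
print "Output: $decoded (dropped $dropped byte(s))\n";
```

Returning the count alongside the text lets the caller decide whether a damaged input deserves a warning, without printing from inside the loop.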
Group: perl.i18n at server nntp.perl.org