From: Mike Rylander
Date: Mon May  5 19:52:18 2008
Subject: Re: Stripping out Unicode combining characters (diacritics)
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <doran@uta.edu> wrote:
[snip]
>
>  I'm pulling my hair out on this... so any help would be appreciated.  If there's any other info I can provide, let me know.
>

You'll want to transform the text to NFD format (nominally, base
characters plus combining marks) instead of NFC (precombined
characters) using Unicode::Normalize:

 use Unicode::Normalize;

 # Decompose first, then delete every combining mark (Unicode category M).
 my $text = NFD($original);
 $text =~ s/\pM+//g;
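
For reference, here is a self-contained sketch of the same approach that can be run as a script. The helper name strip_marks and the sample strings are made up for illustration; only NFD comes from Unicode::Normalize. It assumes the input is already a decoded character string rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# strip_marks is a hypothetical helper name, not part of Unicode::Normalize.
sub strip_marks {
    my ($original) = @_;
    my $text = NFD($original);   # U+00E9 becomes "e" followed by U+0301
    $text =~ s/\pM+//g;          # delete all combining marks (category M)
    return $text;
}

# Two spellings of the same word: precomposed (NFC) and already decomposed.
my $precomposed = "caf\N{U+00E9}";     # LATIN SMALL LETTER E WITH ACUTE
my $decomposed  = "cafe\N{U+0301}";    # "e" + COMBINING ACUTE ACCENT

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks($precomposed), "\n"; # prints "cafe"
print strip_marks($decomposed),  "\n"; # prints "cafe"
```

Note that this only removes marks that NFD actually splits off. Letters with no canonical decomposition, such as U+00F8 (o with stroke), pass through unchanged and would need a separate mapping if you want them folded too.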

Hope that helps.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: miker@esilibrary.com
 | web: http://www.esilibrary.com
Group: perl.i18n at server nntp.perl.org