From: Ken Williams
Date: Mon Jan  8 21:38:10 2007
Subject: Re: text categorization with SVM and NaiveBayes
On Jan 8, 2007, at 10:51 AM, Tom Fawcett wrote:

> Just to add a note here: Ken is correct -- both NB and SVMs are  
> known to be rather poor at providing accurate probabilities.  Their  
> scores tend to be too extreme.  Producing good probabilities from  
> these scores is called calibrating the classifier, and it's more  
> complex than just taking a root of the score.  There are several  
> methods for calibrating scores.  The good news is that there's an  
> effective one called isotonic regression (or Pool Adjacent  
> Violators) which is pretty easy and fast.  The bad news is that  
> there's no plug-in (i.e., CPAN-ready) Perl implementation of it (I've  
> got a simple implementation which I should convert and contribute  
> someday).
>
> If you want to read about classifier calibration, google one of  
> these titles:
>
> "Transforming classifier scores into accurate multiclass  
> probability estimates"
> by Bianca Zadrozny and Charles Elkan
>
> "Predicting Good Probabilities With Supervised Learning"
> by A. Niculescu-Mizil and R. Caruana


Cool, thanks for the references.  It might be nice to add some such  
scheme to Algorithm::NaiveBayes (and friends), so that the user has a  
choice of several normalization schemes, including "none".  If I get  
a surplus of tuits I'll add it, or if you feel like contributing your  
stuff that would be great too.

  -Ken
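
For readers who want to see what the isotonic regression (Pool Adjacent Violators) calibration mentioned in the quoted message looks like, here is a minimal sketch in Python. It is not Tom's implementation and not part of Algorithm::NaiveBayes or any CPAN module; the function names pav_fit and pav_predict are made up for illustration. It assumes binary labels (0 or 1) and a held-out set of classifier scores, and it treats the fitted result as a simple step function.

```python
import bisect
from collections import defaultdict


def pav_fit(scores, labels):
    """Fit a non-decreasing step function from scores to probabilities
    using Pool Adjacent Violators. Returns a list of
    (lowest_score, highest_score, probability) blocks in score order.
    Hypothetical helper, written for illustration only."""
    # Group tied scores so each distinct score is one weighted point.
    totals = defaultdict(lambda: [0.0, 0])
    for s, y in zip(scores, labels):
        totals[s][0] += y
        totals[s][1] += 1

    blocks = []  # each block: [lo, hi, sum_of_labels, count]
    for s in sorted(totals):
        tot, n = totals[s]
        blocks.append([s, s, tot, n])
        # Pool the last two blocks while their means are out of order
        # (equal means are pooled too, which keeps the steps distinct).
        while (len(blocks) > 1 and
               blocks[-2][2] / blocks[-2][3] >= blocks[-1][2] / blocks[-1][3]):
            lo, hi, t, c = blocks.pop()
            blocks[-1][1] = hi
            blocks[-1][2] += t
            blocks[-1][3] += c
    return [(lo, hi, tot / n) for lo, hi, tot, n in blocks]


def pav_predict(blocks, score):
    """Look up a calibrated probability: use the last block whose lowest
    score is <= score, or the first block if score is below all of them."""
    lows = [b[0] for b in blocks]
    i = bisect.bisect_right(lows, score) - 1
    return blocks[max(i, 0)][2]
```

To use it, fit on scores and true labels from data the classifier was not trained on, then pass each new raw score through pav_predict. Interpolating between blocks instead of using a step function is another reasonable choice; the step function is just the simplest one to show.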

Navigate in group perl.ai at server nntp.perl.org