From ace0a18653977785b0f2ed0b7214b22e28ad5f11 Mon Sep 17 00:00:00 2001
From: Mahdi Dibaiee
Date: Sun, 21 Aug 2016 01:21:42 +0430
Subject: [PATCH] chore(README): explain how the top 10 method increases
 accuracy and F measure

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 90a5af9..8e5dbdc 100644
--- a/README.md
+++ b/README.md
@@ -43,7 +43,7 @@ stack exec example-xor
 # using Porter stemming, stopword elimination and a few custom techniques.
 # The dataset is imbalanced which causes the classifier to be biased towards some classes (earn, acq, ...)
 # to workaround the imbalanced dataset problem, there is a --top-ten option which classifies only top 10 popular
-# classes, with evenly split datasets (100 for each)
+# classes, with evenly split datasets (100 for each). This increases the F-Measure significantly and improves accuracy by ~10%.
 # N-Grams don't seem to help us much here (or maybe my implementation is wrong!), using bigrams increases
 # accuracy, while decreasing F-Measure slightly.
 stack exec example-naivebayes-doc-classifier -- --verbose