chore(README): explain how the top 10 method increases accuracy and F measure
commit ace0a18653
parent 7d0ce29ba8
@@ -43,7 +43,7 @@ stack exec example-xor
 # using Porter stemming, stopword elimination and a few custom techniques.
 # The dataset is imbalanced which causes the classifier to be biased towards some classes (earn, acq, ...)
 # to workaround the imbalanced dataset problem, there is a --top-ten option which classifies only top 10 popular
-# classes, with evenly split datasets (100 for each)
+# classes, with evenly split datasets (100 for each), this increases F Measure significantly, along with ~10% of improved accuracy
 # N-Grams don't seem to help us much here (or maybe my implementation is wrong!), using bigrams increases
 # accuracy, while decreasing F-Measure slightly.
 stack exec example-naivebayes-doc-classifier -- --verbose
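For reference, a toy sketch of the stopword-elimination step the README describes. The real Porter stemmer and stopword list live in the example's source; the list and the `preprocess` name below are illustrative only.

```haskell
import qualified Data.Set as Set
import Data.Char (isAlpha, toLower)

-- Illustrative stopword list; the example uses its own, larger list.
stopwords :: Set.Set String
stopwords = Set.fromList ["the", "a", "an", "of", "to", "and", "in", "is"]

-- Lowercase the text, keep only alphabetic tokens, drop stopwords.
-- Porter stemming would be applied after this, via a stemming library.
preprocess :: String -> [String]
preprocess = filter (`Set.notMember` stopwords)
           . words
           . map (\c -> if isAlpha c then toLower c else ' ')
```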
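A minimal sketch of the --top-ten balancing idea mentioned in the changed line: keep only the 10 most frequent classes and cap each at 100 documents, so no single class dominates the class priors. The names here (`Label`, `Doc`, `topTenSplit`) are hypothetical, not this repo's actual API.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

type Label = String
type Doc   = String

-- Count how many documents each label has.
labelCounts :: [(Label, Doc)] -> Map.Map Label Int
labelCounts = Map.fromListWith (+) . map (\(l, _) -> (l, 1))

-- Keep the 10 most popular labels, at most 100 documents each,
-- so every remaining class contributes an evenly sized sample.
topTenSplit :: [(Label, Doc)] -> [(Label, Doc)]
topTenSplit docs = concatMap perLabel topLabels
  where
    topLabels = map fst
              . take 10
              . sortBy (comparing (Down . snd))
              . Map.toList
              $ labelCounts docs
    perLabel l = take 100 (filter ((== l) . fst) docs)
```

Capping every class at the same size is what makes the per-class F measure recover: with the raw Reuters-style distribution, the classifier's priors are skewed towards earn and acq.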
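And a sketch of bigram extraction, as one plausible reading of the N-gram experiment the README mentions; it assumes tokens are already stemmed and stopword-filtered, and joining adjacent tokens with "_" is a common convention, not necessarily what this implementation does.

```haskell
-- Pair each token with its successor to form bigram features.
bigrams :: [String] -> [String]
bigrams ts = zipWith (\a b -> a ++ "_" ++ b) ts (drop 1 ts)

-- e.g. bigrams ["oil", "price", "rises"] == ["oil_price", "price_rises"]
```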