sibe
====
A simple Machine Learning library.

## Simple neural network
```haskell
import Numeric.Sibe
-- `vector`/`toList` come from hmatrix and `def` from data-default-class;
-- these imports are needed if Numeric.Sibe does not re-export them
import Numeric.LinearAlgebra (vector, toList)
import Data.Default.Class (def)

main :: IO ()
main = do
  let a = (sigmoid, sigmoid') -- activation function and its derivative
      -- random network, seed 0, values between -1 and 1,
      -- two inputs, two nodes in hidden layer and a single output
      rnetwork = randomNetwork 0 (-1, 1) 2 [(2, a)] (1, a)

      -- inputs and labels (XOR)
      inputs = [vector [0, 1], vector [1, 0], vector [1, 1], vector [0, 0]]
      labels = [vector [1], vector [1], vector [0], vector [0]]

      -- define the session which includes parameters
      session = def { network = rnetwork
                    , learningRate = 0.5
                    , epochs = 1000
                    , training = zip inputs labels
                    , test = zip inputs labels
                    , drawChart = True
                    , chartName = "nn.png" -- draws chart of loss over time
                    } :: Session

      initialCost = crossEntropy session

  -- run gradient descent
  -- you can also use `sgd`, see the notmnist example
  newsession <- run gd session

  let results = map (`forward` newsession) inputs
      rounded = map (map round . toList) results

      cost = crossEntropy newsession

  putStrLn $ "- initial cost (cross-entropy): " ++ show initialCost
  putStrLn $ "- actual result: " ++ show results
  putStrLn $ "- rounded result: " ++ show rounded
  putStrLn $ "- cost (cross-entropy): " ++ show cost
```
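
For readers who want to see what an activation pair and a cross-entropy cost look like in isolation, here is a minimal standalone sketch. It is not sibe's implementation: it uses plain lists instead of hmatrix vectors, and the names `sigmoidS`, `sigmoidS'` and `binaryCrossEntropy` are made up for this illustration.

```haskell
-- Standalone sketch, not sibe's code: hypothetical names, plain Doubles and lists.

-- Logistic sigmoid, squashes any real number into (0, 1).
sigmoidS :: Double -> Double
sigmoidS x = 1 / (1 + exp (negate x))

-- Derivative of the sigmoid, expressed through the sigmoid itself.
sigmoidS' :: Double -> Double
sigmoidS' x = s * (1 - s)
  where s = sigmoidS x

-- Binary cross-entropy averaged over (label, prediction) pairs.
-- Predictions are clamped away from 0 and 1 so that `log` stays finite.
-- Assumes a non-empty list.
binaryCrossEntropy :: [(Double, Double)] -> Double
binaryCrossEntropy pairs =
  negate (sum (map term pairs)) / fromIntegral (length pairs)
  where
    eps = 1e-12
    clamp p = max eps (min (1 - eps) p)
    term (y, p) = y * log (clamp p) + (1 - y) * log (1 - clamp p)
```

In GHCi, `binaryCrossEntropy [(1, 0.5)]` evaluates to `log 2`, and the cost shrinks as predictions move towards their labels, which is the trend the loss chart is meant to show.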


## Examples
```bash
# neural network examples
stack exec example-xor
stack exec example-424
# notMNIST dataset, achieves ~87.5% accuracy after 9 epochs
stack exec example-notmnist

# Naive Bayes document classifier on the Reuters dataset,
# using Porter stemming, stopword elimination and a few custom techniques.
# The dataset is imbalanced, which biases the classifier towards some classes (earn, acq, ...).
# To work around the imbalance, the --top-ten option classifies only the ten most popular
# classes, with evenly split datasets (100 for each class). This increases F-measure
# significantly and improves accuracy by ~10%.
# N-grams don't seem to help much here (or maybe my implementation is wrong!): using bigrams
# increases accuracy while slightly decreasing F-measure.
stack exec example-naivebayes-doc-classifier -- --verbose
stack exec example-naivebayes-doc-classifier -- --verbose --top-ten
```
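
The comments above note that accuracy and F-measure can move in different directions on an imbalanced dataset. The sketch below shows why, under the assumption that F-measure means the per-class F1 score averaged over classes (macro F1); the README does not say which variant the example reports, and all names here are hypothetical.

```haskell
-- Standalone sketch with hypothetical names; not taken from the sibe examples.

-- True positives, false positives and false negatives for one class.
data Counts = Counts { tp :: Int, fp :: Int, fn :: Int }

-- Count outcomes for class `c` from (actual, predicted) label pairs.
countsFor :: Eq a => a -> [(a, a)] -> Counts
countsFor c pairs = Counts
  { tp = count (\(a, p) -> a == c && p == c)
  , fp = count (\(a, p) -> a /= c && p == c)
  , fn = count (\(a, p) -> a == c && p /= c)
  }
  where count f = length (filter f pairs)

-- Division that yields 0 instead of NaN when the denominator is 0.
ratio :: Int -> Int -> Double
ratio _ 0 = 0
ratio n d = fromIntegral n / fromIntegral d

precision, recall, f1 :: Counts -> Double
precision c = ratio (tp c) (tp c + fp c)
recall c = ratio (tp c) (tp c + fn c)
f1 c
  | p + r == 0 = 0
  | otherwise = 2 * p * r / (p + r)
  where p = precision c
        r = recall c

-- Fraction of pairs where the prediction matches the actual label.
accuracy :: Eq a => [(a, a)] -> Double
accuracy pairs = ratio (length (filter (uncurry (==)) pairs)) (length pairs)

-- Unweighted mean of per-class F1 (assumes a non-empty class list).
macroF1 :: Eq a => [a] -> [(a, a)] -> Double
macroF1 classes pairs =
  sum [f1 (countsFor c pairs) | c <- classes] / fromIntegral (length classes)
```

With nine `earn` documents and one `acq` document, a classifier that always answers `earn` reaches an accuracy of 0.9, but its macro F1 stays below 0.5 because the F1 for `acq` is 0.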

### notMNIST

notMNIST dataset, sigmoid hidden layer, cross-entropy loss, learning rate decay and sgd ([`notmnist.hs`](https://github.com/mdibaiee/sibe/blob/master/examples/notmnist.hs)):
![notMNIST](https://github.com/mdibaiee/sibe/blob/master/notmnist.png?raw=true)

notMNIST dataset, relu hidden layer, cross-entropy loss, learning rate decay and sgd ([`notmnist.hs`](https://github.com/mdibaiee/sibe/blob/master/examples/notmnist.hs)):
![notMNIST](https://github.com/mdibaiee/sibe/blob/master/notmnist-relu.png?raw=true)
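
As a rough illustration of two terms used in the captions above, here is a standalone sketch of a ReLU activation pair and a learning rate decay schedule. The exponential schedule is an assumption made for this example: the README does not state which schedule `notmnist.hs` uses, and the names `reluS`, `reluS'` and `decayedRate` are hypothetical.

```haskell
-- Standalone sketch with hypothetical names; not sibe's implementation.

-- ReLU: passes positive inputs through, clips negative ones to 0.
reluS :: Double -> Double
reluS = max 0

-- Derivative of ReLU, taken as 0 at x = 0 by convention here.
reluS' :: Double -> Double
reluS' x = if x > 0 then 1 else 0

-- Exponential decay (assumed schedule): the initial rate is multiplied
-- by `decay` once per completed epoch. Expects a non-negative epoch.
decayedRate :: Double -> Double -> Int -> Double
decayedRate initial decay epoch = initial * decay ^ epoch
```

For example, `map (decayedRate 0.5 0.5) [0, 1, 2]` gives `[0.5, 0.25, 0.125]`.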

### Word2Vec

word2vec on a very small sample text:

```
the king loves the queen
the queen loves the king,
the dwarf hates the king
the queen hates the dwarf
the dwarf poisons the king
the dwarf poisons the queen
the man loves the woman
the woman loves the man,
the thief hates the man
the woman hates the thief
the thief robs the man
the thief robs the woman
```

The computed vectors are transformed to two dimensions using PCA:

`king` and `queen` are related to `man` and `woman`, `love` and `hate` are close to each other,
and `dwarf` and `thief` are related to `poisons` and `robs`. Also, `dwarf` is close to `queen` and `king`, while
`thief` is closer to `man` and `woman`. `the` doesn't relate to anything.
![word2vec results](https://raw.githubusercontent.com/mdibaiee/sibe/master/w2v.png)

_You can reproduce this result using these parameters:_
```haskell
let session = def { learningRate = 0.1
                  , batchSize = 1
                  , epochs = 10000
                  , debug = True
                  } :: Session
    w2v = def { docs = ds
              , dimensions = 30
              , method = SkipGram
              , window = 2
              , w2vDrawChart = True
              , w2vChartName = "w2v.png"
              } :: Word2Vec
```
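
To give an idea of what `method = SkipGram` with `window = 2` means for the sample text, here is a standalone sketch that lists the (center, context) word pairs a skip-gram model is typically trained on. It assumes a symmetric window and is not taken from sibe's source; `skipGramPairs` is a hypothetical name.

```haskell
-- Standalone sketch with a hypothetical name; not sibe's implementation.

-- All (center, context) pairs for one tokenised sentence, where the context
-- word lies at most `w` positions to the left or right of the center word.
skipGramPairs :: Int -> [a] -> [(a, a)]
skipGramPairs w ws =
  [ (center, context)
  | (i, center) <- indexed
  , (j, context) <- indexed
  , i /= j
  , abs (i - j) <= w
  ]
  where indexed = zip [0 :: Int ..] ws
```

For instance, `skipGramPairs 2 (words "the king loves the queen")` produces 14 pairs, and every word except the first and last has at least three context words.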

The sample text above is a very small development dataset; the implementation still has to be tested on larger datasets.