sibe
====
A simple Machine Learning library.

## Simple neural network
```haskell
module Main where

-- Module names below are assumed from sibe's dependencies (hmatrix, data-default-class).
import Sibe                                    -- sibe's top-level module
import Numeric.LinearAlgebra (vector, toList)  -- hmatrix vectors
import Data.Default.Class (def)                -- default Session values

main :: IO ()
main = do
    let a = (sigmoid, sigmoid') -- activation function and its derivative
        -- random network: seed 0, initial weights between -1 and 1,
        -- two inputs, a hidden layer of two nodes and a single output
        rnetwork = randomNetwork 0 (-1, 1) 2 [(2, a)] (1, a)

        -- XOR inputs and their labels
        inputs = [vector [0, 1], vector [1, 0], vector [1, 1], vector [0, 0]]
        labels = [vector [1], vector [1], vector [0], vector [0]]

        -- define the session, which holds the network and training parameters
        session = def { network = rnetwork
                      , learningRate = 0.5
                      , epochs = 1000
                      , training = zip inputs labels
                      , test = zip inputs labels
                      } :: Session

        initialCost = crossEntropy session

    -- run gradient descent
    -- you can also use `sgd`, see the notMNIST example
    newsession <- run gd session

    let results = map (`forward` newsession) inputs
        rounded = map (map round . toList) results :: [[Integer]]

        cost = crossEntropy newsession

    putStrLn $ "- initial cost (cross-entropy): " ++ show initialCost
    putStrLn $ "- actual result: " ++ show results
    putStrLn $ "- rounded result: " ++ show rounded
    putStrLn $ "- cost (cross-entropy): " ++ show cost
```
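The activation is given as a pair: the function and its derivative, which backpropagation needs. To show what such a pair and a cross-entropy cost compute, here is a standalone sketch on plain lists. These definitions are illustrative assumptions, not sibe's own code (sibe works on hmatrix vectors and its `crossEntropy` takes a `Session`); `binaryCrossEntropy` is a hypothetical name.

```haskell
-- Standalone illustration on plain lists; not sibe's implementation.
module Main where

-- | Logistic sigmoid: maps any real number into (0, 1).
sigmoid :: Double -> Double
sigmoid x = 1 / (1 + exp (negate x))

-- | Derivative of the sigmoid with respect to its input,
-- written in terms of the sigmoid itself: s(x) * (1 - s(x)).
sigmoid' :: Double -> Double
sigmoid' x = let s = sigmoid x in s * (1 - s)

-- | Hypothetical helper: mean binary cross-entropy of predictions
-- against 0/1 targets. Assumes non-empty lists of equal length;
-- predictions are clamped away from 0 and 1 so 'log' never sees 0.
binaryCrossEntropy :: [Double] -> [Double] -> Double
binaryCrossEntropy targets preds =
  negate (sum (zipWith term targets preds)) / fromIntegral (length targets)
  where
    eps = 1e-12 :: Double
    term t p =
      let p' = min (1 - eps) (max eps p)
      in t * log p' + (1 - t) * log (1 - p')

main :: IO ()
main = do
  print (sigmoid 0)   -- 0.5
  print (sigmoid' 0)  -- 0.25
  -- near-perfect XOR predictions give a cost close to 0 ...
  print (binaryCrossEntropy [1, 1, 0, 0] [0.99, 0.99, 0.01, 0.01])
  -- ... while always answering 0.5 costs log 2 (about 0.693)
  print (binaryCrossEntropy [1, 1, 0, 0] [0.5, 0.5, 0.5, 0.5])
```

This is why the cost printed after training should drop well below its initial value once the network has learned XOR.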


## Examples
```bash
# neural network examples
stack exec example-xor
stack exec example-424
# notMNIST dataset, achieves ~87.5% accuracy after 9 epochs (2 minutes)
stack exec example-notmnist

# Naive Bayes document classifier on the Reuters dataset,
# using Porter stemming, stopword elimination and a few custom techniques.
# The dataset is imbalanced, which biases the classifier towards some classes (earn, acq, ...).
# To work around this, the --top-ten option classifies only the 10 most popular
# classes, with evenly split datasets (100 documents each); this increases F-Measure
# significantly and improves accuracy by ~10%.
# N-grams don't seem to help much here (or maybe my implementation is wrong!): using bigrams
# increases accuracy but slightly decreases F-Measure.
stack exec example-naivebayes-doc-classifier -- --verbose
stack exec example-naivebayes-doc-classifier -- --verbose --top-ten
```
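The `--top-ten` option keeps the ten most popular classes with 100 documents each. A minimal sketch of that kind of balancing over (document, class) pairs follows; `topKBalanced` is a hypothetical helper, and the example program may select documents within a class differently.

```haskell
module Main where

import Data.Function (on)
import Data.List (groupBy, sortOn)
import Data.Ord (Down (..))

-- | Hypothetical helper: keep only the @k@ most frequent classes and
-- take at most @n@ documents from each, so every kept class is
-- (roughly) equally represented.
topKBalanced :: Ord c => Int -> Int -> [(d, c)] -> [(d, c)]
topKBalanced k n docs = concatMap (take n) (take k largestFirst)
  where
    -- group documents by class; sortOn is stable, so the original
    -- order of documents within a class is kept
    byClass = groupBy ((==) `on` snd) (sortOn snd docs)
    -- classes ordered by descending document count
    largestFirst = sortOn (Down . length) byClass

main :: IO ()
main = do
  let docs = [ (i, "earn") | i <- [1 .. 5 :: Int] ]
          ++ [ (i, "acq") | i <- [6 .. 8] ]
          ++ [ (9, "grain") ]
  -- keep the 2 largest classes, 2 documents each
  print (topKBalanced 2 2 docs)
  -- [(1,"earn"),(2,"earn"),(6,"acq"),(7,"acq")]
```

With real data, `topKBalanced 10 100` would mirror the ten-classes, 100-documents setting described above.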

### notMNIST

notMNIST dataset, cross-entropy loss, learning rate decay and stochastic gradient descent (`sgd`) ([`notmnist.hs`](https://github.com/mdibaiee/sibe/blob/master/examples/notmnist.hs)):
![notMNIST](https://github.com/mdibaiee/sibe/blob/master/notmnist.png?raw=true)
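Learning rate decay shrinks the step size as training progresses: large steps early, finer adjustments later. The sketch below assumes a simple exponential schedule with a hypothetical `decayedRate` function; the schedule actually used in `notmnist.hs` may differ.

```haskell
module Main where

-- | Hypothetical exponential decay schedule (an assumption, not
-- necessarily the one notmnist.hs uses): the learning rate after
-- @epoch@ epochs (epoch >= 0) is @lr0 * rate ^ epoch@.
decayedRate :: Double -> Double -> Int -> Double
decayedRate lr0 rate epoch = lr0 * rate ^ epoch

main :: IO ()
main =
  -- learning rates for the first 5 epochs, starting at 0.5
  -- and shrinking by 10% after every epoch
  mapM_ (print . decayedRate 0.5 0.9) [0 .. 4]
```

A decay factor close to 1 keeps training moving for many epochs; a small factor shrinks the steps so quickly that training can stall before reaching a good minimum.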