interpretation

This commit is contained in: parent 7d6a23625b, commit 42bff3bbc7

@@ -38,5 +38,8 @@ Notes
It seems that training past a certain point reduces performance; learning rate decay might help with that.
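A minimal sketch of what learning rate decay could look like here, assuming a simple per-iteration exponential schedule (the function and parameter names are illustrative, not taken from this repository):

```python
def decayed_lr(base_lr, iteration, decay_rate=0.99):
    """Shrink the step size exponentially as training progresses."""
    return base_lr * (decay_rate ** iteration)

# Later iterations take much smaller steps, so a model that has already
# found a good region is less likely to be pulled away by one large update.
early = decayed_lr(0.1, 10)
late = decayed_lr(0.1, 300)
```

The idea is that `late` is far smaller than `early`, damping the updates exactly when the model is near its best performance.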
My interpretation is that after finding a local maximum for accumulated reward and being able to receive high rewards, the updates become quite large and pull the model too far to either side, so the model enters a state of oscillation.
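The oscillation effect can be illustrated on a toy objective (this is not the repository's model, just a one-dimensional sketch): gradient ascent on f(w) = -(w - 2)^2, whose maximum is at w = 2. With a small step size the iterates settle at the maximum; with too large a step they overshoot it on every update and swing farther away each time.

```python
def step(w, lr):
    grad = -2.0 * (w - 2.0)   # derivative of f(w) = -(w - 2)^2
    return w + lr * grad      # gradient ascent update

def run(lr, w=0.0, n=10):
    for _ in range(n):
        w = step(w, lr)
    return w

# run(0.1) approaches 2.0; run(1.2) overshoots and oscillates away from it
```

With lr = 1.2 each step maps the error (w - 2) to -1.4 * (w - 2): the sign flips every iteration and the magnitude grows, which is the oscillation described above.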
To try it yourself, there is a `long.npy` file: rename it to `load.npy` (back up `load.npy` before doing so) and run `demo.py`; you will see the bird failing more often than not. `long.npy` was trained for 100 more iterations than `load.npy`.
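The steps above can be run from the repository root as follows (the existence guards are mine, added so the commands are safe to re-run; the file names come from the notes):

```shell
# Back up the original weights, then swap in the longer-trained ones.
if [ -f load.npy ]; then cp load.npy load.npy.bak; fi
if [ -f long.npy ]; then mv long.npy load.npy; fi
# then run the demo:
#   python demo.py
```

To restore the original behaviour afterwards, copy `load.npy.bak` back to `load.npy`.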