interpretation

This commit is contained in:
Mahdi Dibaiee 2017-04-03 11:58:23 +04:30
parent 7d6a23625b
commit 42bff3bbc7


@@ -38,5 +38,8 @@ Notes
-----
It seems that training past a certain point reduces performance; learning rate decay might help with that.
My interpretation is that after the model finds a local maximum for accumulated reward and starts receiving high rewards,
the updates become quite large and pull the model too far to either side, so it enters a state of oscillation.
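The idea can be sketched with a simple decay schedule: shrinking the step size over iterations keeps late updates from overshooting the local maximum. The inverse-time schedule and the hyperparameter values below are illustrative assumptions, not the values used in this repository:

```python
# Hypothetical sketch of learning-rate decay (not this repo's actual code).
def decayed_lr(base_lr, iteration, decay_rate=0.01):
    # Inverse-time decay: lr_t = base_lr / (1 + decay_rate * t).
    # Early on the step size is near base_lr; it shrinks as t grows,
    # so updates near a local maximum stay small instead of oscillating.
    return base_lr / (1.0 + decay_rate * iteration)

def update(weights, gradient, iteration, base_lr=0.1):
    # Gradient ascent on accumulated reward with a decaying step size.
    return weights + decayed_lr(base_lr, iteration) * gradient
```

With `decay_rate=0.01`, the step size halves after 100 iterations, which is roughly the scale of extra training that degraded `long.npy` below.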
To try it yourself, there is a `long.npy` file: rename it to `load.npy` (back up `load.npy` first) and run `demo.py`;
you will see the bird failing more often than not. `long.npy` was trained for 100 more iterations than `load.npy`.