diff --git a/README.md b/README.md
index 6e7b560..3ea6b25 100644
--- a/README.md
+++ b/README.md
@@ -38,5 +38,37 @@ Notes
 -----
 It seems that training past a certain point reduces performance; learning rate decay might help with that.
+My interpretation is that once the model finds a local maximum for accumulated reward and starts receiving high rewards,
+the updates become quite large and pull the weights too far from side to side, so the model enters a state of oscillation (see the decay sketch at the end of these notes).
+
 To try it yourself, there is a `long.npy` file: rename it to `load.npy` (back up `load.npy` before doing so) and run `demo.py`;
 you will see the bird failing more often than not.
 `long.npy` was trained for 100 more iterations than `load.npy`.
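+
+A minimal sketch of the file swap, assuming you run it from the repo root (the
+backup name `load.backup.npy` is an arbitrary choice):
+
+```python
+import shutil
+
+# Keep a copy of the current weights before overwriting them (backup name is arbitrary).
+shutil.copy("load.npy", "load.backup.npy")
+# Swap in the longer-trained weights; demo.py reads from load.npy.
+shutil.copy("long.npy", "load.npy")
+```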
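+
+And a minimal sketch of the learning rate decay mentioned above (`base_lr`,
+`decay`, `lr_at`, and `update` are illustrative names, not part of this repo):
+
+```python
+base_lr = 1e-2   # assumed initial learning rate
+decay = 0.99     # assumed multiplicative decay per iteration
+
+def lr_at(iteration):
+    # Exponential schedule: the step size shrinks every iteration,
+    # so late updates cannot pull the weights far past the optimum.
+    return base_lr * decay ** iteration
+
+def update(weights, gradient, iteration):
+    # Gradient ascent on accumulated reward with a decayed step size.
+    return weights + lr_at(iteration) * gradient
+```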