interpretation

This commit is contained in: parent 7d6a23625b, commit 42bff3bbc7

@@ -38,5 +38,8 @@ Notes
It seems that training past a certain point reduces performance; learning rate decay might help with that.
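A minimal sketch of what learning rate decay could look like here, assuming a simple per-iteration exponential schedule (the function and parameter names are illustrative, not taken from this repository):

```python
def decayed_lr(base_lr, iteration, decay_rate=0.99):
    """Shrink the step size exponentially as training progresses."""
    return base_lr * (decay_rate ** iteration)

# Later iterations take much smaller steps, so a model that has already
# found a good region is less likely to be pulled away by one large update.
early = decayed_lr(0.1, 10)
late = decayed_lr(0.1, 300)
```

The idea is that `late` is far smaller than `early`, damping the updates exactly when the model is near its best performance.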
My interpretation is that after finding a local maximum for accumulated reward and being able to receive high rewards, the updates become quite large and pull the model too far to either side, so the model enters a state of oscillation.
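The oscillation effect can be illustrated on a toy objective (this is not the repository's model, just a one-dimensional sketch): gradient ascent on f(w) = -(w - 2)^2, whose maximum is at w = 2. With a small step size the iterates settle at the maximum; with too large a step they overshoot it on every update and swing farther away each time.

```python
def step(w, lr):
    grad = -2.0 * (w - 2.0)   # derivative of f(w) = -(w - 2)^2
    return w + lr * grad      # gradient ascent update

def run(lr, w=0.0, n=10):
    for _ in range(n):
        w = step(w, lr)
    return w

# run(0.1) approaches 2.0; run(1.2) overshoots and oscillates away from it
```

With lr = 1.2 each step maps the error (w - 2) to -1.4 * (w - 2): the sign flips every iteration and the magnitude grows, which is the oscillation described above.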
To try it yourself, there is a `long.npy` file: rename it to `load.npy` (back up `load.npy` before doing so) and run `demo.py`; you will see the bird failing more often than not. `long.npy` was trained for 100 more iterations than `load.npy`.
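The steps above can be run from the repository root as follows (the existence guards are mine, added so the commands are safe to re-run; the file names come from the notes):

```shell
# Back up the original weights, then swap in the longer-trained ones.
if [ -f load.npy ]; then cp load.npy load.npy.bak; fi
if [ -f long.npy ]; then mv long.npy load.npy; fi
# then run the demo:
#   python demo.py
```

To restore the original behaviour afterwards, copy `load.npy.bak` back to `load.npy`.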