From 42bff3bbc70e13e40527c820a1c4d912e6b21edf Mon Sep 17 00:00:00 2001
From: Mahdi Dibaiee
Date: Mon, 3 Apr 2017 11:58:23 +0430
Subject: [PATCH] interpretation

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index 6e7b560..3ea6b25 100644
--- a/README.md
+++ b/README.md
@@ -38,5 +38,8 @@ Notes
 -----
 It seems training past a maximum point reduces performance, learning rate decay might help with that.
 
+My interpretation is that after finding a local maximum of accumulated reward, once the model is already receiving high rewards,
+the updates become quite large and pull the parameters too far to either side, so the model enters a state of oscillation.
+
 To try it yourself, there is a `long.npy` file, rename it to `load.npy` (backup `load.npy` before doing so) and run `demo.py`, you will see the bird failing more often than not.
 `long.py` was trained for 100 more iterations than `load.npy`.
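
The learning-rate decay mentioned in the note is not part of this patch. As a rough sketch of the idea only, decaying the step size makes late updates smaller, so the parameters stop swinging back and forth around the reward maximum. All names and values below (`initial_lr`, `decay_rate`, the random placeholder update) are assumptions for illustration, not the repository's actual training loop:

```python
import numpy as np

# Hypothetical sketch of learning-rate decay; not the repository's training code.
initial_lr = 0.1       # assumed starting learning rate
decay_rate = 0.99      # assumed per-iteration multiplicative decay
theta = np.zeros(10)   # stand-in for the parameters stored in load.npy

for iteration in range(200):
    lr = initial_lr * decay_rate ** iteration  # step size shrinks every iteration
    # Placeholder for whatever update direction the training loop computes
    # (e.g. an evolution-strategies gradient estimate).
    update = np.random.randn(*theta.shape)
    theta += lr * update  # smaller late steps should reduce the oscillation
```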