In this work, the task is to learn a Tetris controller that performs as well as possible. Our baseline algorithm applies the noisy cross-entropy method to policy search: candidate policies are sampled stochastically around the previously selected elite set, with added noise to prevent premature convergence and getting stuck in a local optimum. Policies are represented as linearly weighted combinations of features.
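The noisy cross-entropy loop described above can be sketched as follows. This is a minimal illustration, not the exact implementation: `evaluate_policy` is a hypothetical stand-in for running the Tetris controller with a given weight vector (here replaced by a toy objective), and the population size, elite fraction, and noise schedule are assumed values.

```python
import numpy as np

def evaluate_policy(weights):
    # Placeholder for simulating Tetris games and returning a score
    # (e.g., lines cleared); a toy quadratic objective for illustration.
    return -np.sum((weights - 1.0) ** 2)

def noisy_cem(n_features, n_iters=50, pop_size=100, elite_frac=0.1, noise=4.0):
    mu = np.zeros(n_features)           # mean of the sampling distribution
    sigma = np.full(n_features, 100.0)  # per-feature variance
    n_elite = int(pop_size * elite_frac)
    for t in range(n_iters):
        # Sample candidate weight vectors around the current elite-based mean.
        samples = mu + np.sqrt(sigma) * np.random.randn(pop_size, n_features)
        scores = np.array([evaluate_policy(w) for w in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        mu = elites.mean(axis=0)
        # Add decreasing noise to the variance so the search does not
        # collapse too quickly into a local optimum.
        sigma = elites.var(axis=0) + max(noise - t / 10.0, 0.0)
    return mu
```

The decreasing noise term added to the variance is the key difference from plain cross-entropy: it keeps exploration alive in early iterations while still allowing convergence later.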
We investigated how learning evolved over the training iterations, and improved the algorithm's training efficiency and convergence by also learning the feature-set selection, since including more features did not always guarantee better performance. Keeping the feature set at its most effective size was therefore made part of training. A video demonstrating the controller's capability is shown in:
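One way to cast feature-set selection in the same cross-entropy framework is to sample binary inclusion masks and update a per-feature inclusion probability from the elite masks. This is a hedged sketch under assumed parameters, not the paper's exact procedure: `score` is a hypothetical evaluation of a feature subset (e.g., the best controller performance achievable with it), replaced here by a toy target.

```python
import numpy as np

def score(mask):
    # Placeholder: reward masks matching a toy set of "useful" features.
    target = np.array([1.0, 1.0, 0.0, 0.0, 1.0])
    return -np.sum(np.abs(mask - target))

def cem_feature_selection(n_features=5, n_iters=30, pop_size=50, elite_frac=0.2):
    p = np.full(n_features, 0.5)  # inclusion probability per feature
    n_elite = int(pop_size * elite_frac)
    for _ in range(n_iters):
        # Sample binary feature-inclusion masks from the current probabilities.
        masks = (np.random.rand(pop_size, n_features) < p).astype(float)
        scores = np.array([score(m) for m in masks])
        elites = masks[np.argsort(scores)[-n_elite:]]
        # Smoothed update keeps probabilities from saturating too quickly.
        p = 0.7 * elites.mean(axis=0) + 0.3 * p
    return p > 0.5
```

Features whose inclusion probability stays above 0.5 after convergence are kept, which naturally drives the feature set toward its most effective size.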
The convergence of the feature-selection algorithm is shown in: