Tetris Policy Improvement

In this work, the task is to learn a Tetris controller that performs as well as possible. The baseline algorithm applies the noisy cross-entropy method to policy search: at each iteration it stochastically generates candidate policies from a distribution fit to the previously selected elite set, with extra noise added to the variance to prevent premature convergence and getting stuck in a local optimum. Policies are represented as linearly weighted combinations of board features.
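As a rough illustration, here is a minimal sketch of that loop in Python. The `evaluate_policy(w)` callback, the population size, elite fraction, and constant noise term are illustrative assumptions rather than the exact settings used in this work; `evaluate_policy` is assumed to return the average number of lines cleared by the linear policy with weight vector `w`.

```python
import numpy as np

def noisy_cross_entropy(evaluate_policy, n_features, n_iters=50,
                        pop_size=100, elite_frac=0.1, noise=4.0):
    """Noisy cross-entropy method for a linear, feature-weighted policy.

    evaluate_policy(w) is assumed to return the average number of lines
    cleared by the policy with weight vector w (hypothetical callback).
    """
    mean = np.zeros(n_features)
    var = np.full(n_features, 100.0)
    n_elite = int(pop_size * elite_frac)

    for _ in range(n_iters):
        # Sample candidate weight vectors from the current Gaussian.
        samples = np.random.randn(pop_size, n_features) * np.sqrt(var) + mean
        scores = np.array([evaluate_policy(w) for w in samples])

        # Keep the elite set (highest-scoring policies).
        elite = samples[np.argsort(scores)[-n_elite:]]

        # Refit the distribution; the added noise keeps the variance from
        # collapsing too quickly into a local optimum.
        mean = elite.mean(axis=0)
        var = elite.var(axis=0) + noise

    return mean
```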

This work investigated how learning evolved over the training iterations, and we improved the algorithm's training efficiency and convergence by learning the feature set selection as well. Including more features did not always guarantee better performance, so keeping the feature set at its most efficient size was also made part of training (a sketch of one possible formulation is given at the end of this post). A video demonstrating the controller's capability is shown below:

The convergence of the feature selection algorithm is shown below:

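The post does not spell out how feature selection is folded into the optimization, so the following is only one plausible sketch: each feature's inclusion is treated as a Bernoulli variable whose probability is refit from the elite set alongside the Gaussian over weights, with a small size penalty to keep the feature set compact. The `evaluate_policy(w)` callback, the `size_penalty`, and the other hyperparameters are again illustrative assumptions, not the author's exact formulation.

```python
import numpy as np

def cem_with_feature_selection(evaluate_policy, n_features, n_iters=50,
                               pop_size=100, elite_frac=0.1, noise=4.0,
                               size_penalty=0.01):
    """Cross-entropy search over weights plus a Bernoulli inclusion mask.

    evaluate_policy(w) is the same hypothetical callback as above; the
    size penalty discourages unnecessarily large feature sets.
    """
    mean = np.zeros(n_features)
    var = np.full(n_features, 100.0)
    p_include = np.full(n_features, 0.5)      # per-feature inclusion probability
    n_elite = int(pop_size * elite_frac)

    for _ in range(n_iters):
        masks = np.random.rand(pop_size, n_features) < p_include
        weights = np.random.randn(pop_size, n_features) * np.sqrt(var) + mean

        # Score each candidate with unused features zeroed out, and penalise
        # larger feature sets so the set stays at an efficient size.
        scores = np.array([evaluate_policy(w * m) - size_penalty * m.sum()
                           for w, m in zip(weights, masks)])

        elite_idx = np.argsort(scores)[-n_elite:]
        mean = weights[elite_idx].mean(axis=0)
        var = weights[elite_idx].var(axis=0) + noise
        p_include = masks[elite_idx].mean(axis=0)   # refit Bernoulli parameters

    return mean, p_include > 0.5
```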
