Demonstration of plain Q-learning on a 5x5 gridworld

This script demonstrates the direct use of the MARL toolbox to run learning experiments, examine learning statistics, and replay learned behaviour. The experiment involves two Q-learning agents on a 5x5 gridworld and is explained in detail in Chapter 4 of the documentation.
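
For readers without the toolbox at hand, the following is a minimal, self-contained Python sketch of the kind of experiment described above: two independent tabular Q-learning agents on a 5x5 grid, a per-trial step count as a learning statistic, and a greedy replay of the learned behaviour. It does not use or reproduce the MARL toolbox API. The goal cell, start cells, reward values, hyperparameters and all function names (`step`, `run_trial`, `train`, `greedy_path`) are assumptions made for illustration, and collisions between the agents are ignored, which may differ from the setup in Chapter 4.

```python
import random

SIZE = 5                                       # 5x5 grid, as in the description
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL = (4, 4)                                  # assumed shared goal cell
ALPHA, GAMMA, EPSILON = 0.5, 0.95, 0.1         # assumed hyperparameters


def step(pos, action):
    """Move inside the grid; bumping into a wall leaves the position unchanged."""
    r = min(max(pos[0] + action[0], 0), SIZE - 1)
    c = min(max(pos[1] + action[1], 0), SIZE - 1)
    new_pos = (r, c)
    reward = 10.0 if new_pos == GOAL else -1.0  # assumed reward scheme
    return new_pos, reward


def new_q_table():
    """One row of action values per grid cell, initialised to zero."""
    return {(r, c): [0.0] * len(ACTIONS)
            for r in range(SIZE) for c in range(SIZE)}


def choose_action(q, pos, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[pos])
    return rng.choice([a for a, v in enumerate(q[pos]) if v == best])


def run_trial(q_tables, starts, rng, max_steps=200):
    """Run one trial; return the number of steps until both agents are done."""
    positions = list(starts)
    done = [p == GOAL for p in positions]
    steps = 0
    while not all(done) and steps < max_steps:
        for i, q in enumerate(q_tables):
            if done[i]:
                continue
            a = choose_action(q, positions[i], rng)
            nxt, reward = step(positions[i], ACTIONS[a])
            # Q-learning update; the goal is terminal, so no bootstrap there.
            target = reward if nxt == GOAL else reward + GAMMA * max(q[nxt])
            q[positions[i]][a] += ALPHA * (target - q[positions[i]][a])
            positions[i] = nxt
            done[i] = nxt == GOAL
        steps += 1
    return steps


def train(trials=300, seed=0):
    """Train two independent learners; return their Q-tables and step counts."""
    rng = random.Random(seed)
    q_tables = [new_q_table(), new_q_table()]
    starts = [(0, 0), (0, 4)]                  # assumed start cells
    history = [run_trial(q_tables, starts, rng) for _ in range(trials)]
    return q_tables, history


def greedy_path(q, start, max_steps=50):
    """Replay the learned behaviour by following the greedy policy."""
    pos, path = start, [start]
    while pos != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda k: q[pos][k])
        pos, _ = step(pos, ACTIONS[a])
        path.append(pos)
    return path


if __name__ == "__main__":
    q_tables, history = train()
    print("steps in first trial:", history[0])
    print("steps in last trial:", history[-1])
    print("replay of agent 0:", greedy_path(q_tables[0], (0, 0)))
```

In this sketch each agent keeps its own Q-table over its own position only, so the two learners do not interact. The `history` list plays the role of the learning statistics, and `greedy_path` stands in for the replay step.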