
4. Running and processing batch experiments

In addition to the learning and replaying mechanisms, the toolbox provides two functions for running and processing data from large batch experiments: runexp and processexp. An experiment is interpreted here as a set of repetitions of a learning process, where each repetition starts from the same initial configuration. The goal of this repeated execution is to average out the effect of the random elements (such as random exploration) in the agents' algorithms.
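
The idea behind the averaging can be illustrated outside the toolbox. The sketch below is plain Matlab, not toolbox code; the per-run outcome and its distribution are invented purely for illustration:

% Conceptual illustration only (not toolbox code). A per-run outcome,
% such as the number of trials needed to converge, varies across runs
% because of randomness in the learning algorithm; averaging over many
% repetitions yields a stable estimate. The numbers are invented.
nruns = 30;
trialsToConverge = 40 + round(10 * randn(1, nruns));  % one noisy outcome per run
avgTrials = mean(trialsToConverge);                   % the batch-averaged estimate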

The interfaces of runexp and processexp are fairly complex, and are described in detail in the help text of these functions. Here, we give only a brief description of the features they offer, and provide an example demonstrating their use.

The code for this example can be found in the batchdemo script.

Jump to section:

  1. Running experiments
  2. Processing experiment data

Running experiments

Experiments can be run using the runexp function. The signature of this function is:

runexp(expconfigs, datafile, datadir, nruns, silent);

A sequence of experiments can be run at once. The experiment configuration data is represented as a cell array, with each cell corresponding to one experiment. For each experiment, the following parameters can be set (all of them appear in the example below):

  * worldtype, worldargs: the type of the world and the arguments used to construct it;
  * lp: the learning parameters, such as the number of trials and the maximum number of iterations per trial;
  * alp: the agent learning parameters, such as the learning rate alpha and the discount factor gamma;
  * agentargs: the construction arguments of each agent;
  * options: an options structure, including the number of runs, display settings, and a label identifying the experiment.

The results of running the sequence are saved to datadir/datafile, and can later be processed using the complementary function processexp. While running, the function outputs progress information at the Matlab console.

The nruns argument sets the default number of runs for experiments that do not set this parameter via their options. The silent argument, if 'on', suppresses all graphical and text output, regardless of the experiment option values.
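
For instance, a call of the following form runs a batch with a default of 30 runs per experiment and with all output silenced (a sketch only: expc must already hold the experiment configurations, built as in the example below, and 'demo_batch' is an illustrative file name):

% Sketch: expc is a cell array of experiment configurations, built as in
% the full example below; 'demo_batch' is an illustrative file name.
% Experiments whose options do not set 'nruns' are run 30 times; the
% final 'on' silences all graphical and text output.
runexp(expc, 'demo_batch', pwd, 30, 'on');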

Henceforth, we use the term "batch" to refer to a sequence of experiments as described above. As an example, we design a batch that compares the performance of basic Q-learning and a version of Q-learning that uses a Q-table indexed on the complete world state. The comparison is done by running each algorithm 30 times on the gridworld presented in Figure 1 (the same gridworld as in the previous sections).

Figure 1. Experiment gridworld

The code for setting up and running the batch is given below. Note that all graphical output is suppressed via the option settings; this considerably speeds up execution. Be aware, however, that the batch will still take some time to complete.

NRUNS = 30;

expc = {struct(...
    ... % world arguments
    'worldtype', 'gridworld', ...
    'worldargs', {{[], 'Square', [5 5], [3 1; 3 5]}}, ...
    ... % learning parameters and agent learning parameters;
    ... % note that learning is not stopped upon convergence
    'lp', struct('trials', 80, 'maxiter', 300, 'convgoal', -1, 'avgtrials', 30), ...
    'alp', struct('alpha', .3, 'gamma', .95, 'lambda', .5, 'epsilon', .3), ...
    ... % agent arguments
    'agentargs', {{...
        {[1 1], [5 5], 5, 'plainq', 'greedyact', 'randomexplore'}, ...
        {[5 1], [1 5], 5, 'plainq', 'greedyact', 'randomexplore'} }}, ...
    ... % options structure
    'options', struct('nruns', NRUNS, 'show', 'off', 'convplot', 'off', 'plotpause', 15, 'label', 'Plain Q') ...
), struct(...
    ... % for the second experiment, anything that is not specified
    ... % remains the same as configured for the first experiment
    'lp', struct('trials', 100, 'maxiter', 400, 'convgoal', -1, 'avgtrials', 30), ...
    'agentargs', {{...
        {[1 1], [5 5], 5, 'fullstate_plainq', 'fullstate_greedyact', 'randomexplore'}, ...
        {[5 1], [1 5], 5, 'fullstate_plainq', 'fullstate_greedyact', 'randomexplore'} }}, ...
    'options', struct('nruns', NRUNS, 'show', 'off', 'convplot', 'off', 'plotpause', 15, 'label', 'Complete-state Q') ...
)};

fname = 'q_fullstateq';
% the data are saved in the current directory
runexp(expc, fname, pwd, NRUNS, 'off');

Processing experiment data

The data saved by runexp can be processed using processexp. This function has the signature:

processexp(datafile, mode, plotcount, plotfields);

The type of information generated by processing is specified by the mode parameter; the complete list of modes is given in the help text of processexp. Two modes are demonstrated below: 'replay', which replays the learned behaviour of the agents, and 'plot', which plots compared convergence statistics for the experiments in the batch.

The 'plot' mode can be further customized via the plotcount and plotfields arguments; the batch plot can also be split in manual mode. While processing the experiments, processexp outputs progress information at the Matlab console.
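
For instance, a customized plot call might take the following form (a sketch only: the exact semantics of plotcount and plotfields, and the valid field names, are given in the help text of processexp; 'avgiter' is a hypothetical field name):

% Sketch only: customize the batch plot via the plotcount and plotfields
% arguments. 'avgiter' is a hypothetical statistics field name; see
% "help processexp" for the actual field names and argument semantics.
processexp(fname, 'plot', 2, {'avgiter'});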

For example, the following command replays the behaviour of one set of Q-learners and one set of full-state Q-learners:

>> processexp(fname, 'replay');

To plot compared convergence statistics, we use:

>> processexp(fname, 'plot');

The resulting figure should look similar to Figure 2.

Figure 2. Batch convergence results
