
episodic_learn

PURPOSE

Implements the learning control mechanism in an episodic environment

SYNOPSIS

function [world, agents, stats] = episodic_learn(world, agents, learnparam)

DESCRIPTION

Implements the learning control mechanism in an episodic environment
  [WORLD, AGENTS, STATS] = EPISODIC_LEARN(WORLD, AGENTS, LEARNPARAM)
  Performs reinforcement learning over a given number of trials, or until
  convergence. A trial is complete when all agents have finished their
  tasks, or when the iteration count reaches a specified upper limit.
  Learning statistics are maintained throughout.
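
  As a rough sketch, the control flow looks like the following (world_reset,
  world_finished and world_step are hypothetical helpers, and the signatures
  assumed for agent_initlearn, agent_act and agent_learn are illustrative
  guesses, not the toolbox's actual interfaces):

    % Sketch of the episodic learning loop; helper names and signatures
    % below are assumptions for illustration only.
    for i = 1:length(agents)
        agents{i} = agent_initlearn(agents{i}, world);     % assumed signature
    end
    for trial = 1:learnparam.trials
        world = world_reset(world);       % hypothetical: restore initial state
        iter = 0;
        while ~world_finished(world) && (iter < learnparam.maxiter)
            iter = iter + 1;
            for i = 1:length(agents)
                [agents{i}, a] = agent_act(agents{i}, world);  % assumed signature
                [world, r] = world_step(world, i, a);          % hypothetical helper
                agents{i} = agent_learn(agents{i}, world, r);  % assumed signature
            end
        end
        stats.iter(trial) = iter;         % record how long the trial took
    end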

  Parameters:
   WORLD       - the world within which the learning takes place.
   AGENTS      - the (possibly heterogeneous) cell array of agents.
                Must be the same array with which the world was created.
   LEARNPARAM  - the learning parameters. A structure containing the
               following fields, all optional (a usage sketch follows
               the Returns list below).
               'trials' - the maximum number of trials to run. Default 100.
               'maxiter' - the maximum number of iterations to run within
                   a single trial. Default 1000.
               'convgoal' - the convergence goal: the target standard
                   deviation in the number of iterations taken to complete
                   a trial. Default -1, meaning the maximum number of
                   trials will always run.
               'avgtrials' - over how many trials to average when checking for 
                   convergence. Default 10.
               'showconv'  - whether to plot the convergence progress "live".
                   Default 0 (do not show).
               'closeconv' - whether to close the convergence progress
                   figure when learning finishes.
  Returns:
   WORLD       - the possibly altered world.
   AGENTS      - the (savvier) agents.
   STATS       - the learning statistics.
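
  A minimal usage sketch (the field values below are illustrative, and the
  world and agents are assumed to have been created beforehand with the
  toolbox's constructors):

    % Illustrative parameter values only.
    learnparam = struct('trials', 200, 'maxiter', 500, ...
                        'convgoal', 0.5, 'avgtrials', 10, 'showconv', 1);
    [world, agents, stats] = episodic_learn(world, agents, learnparam);
    fprintf('Finished after %d trials.\n', stats.trials);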

  If a convergence goal is specified, learning stops once the agents
  complete 'avgtrials' successive trials with less than 'convgoal'
  standard deviation in the iterations taken to complete them. This
  only makes sense in deterministic environments with a deterministic
  initial state of the world at the beginning of each trial.
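
  In code, the convergence test amounts to something like the following
  sketch (assuming the per-trial iteration counts are accumulated in
  stats.iter, and that trial indexes the trial just completed):

    % Stop once the last 'avgtrials' trial lengths vary by less than
    % 'convgoal' (standard deviation); a sketch, not the actual code.
    if (learnparam.convgoal >= 0) && (trial >= learnparam.avgtrials)
        recent = stats.iter(trial - learnparam.avgtrials + 1 : trial);
        if std(recent) < learnparam.convgoal
            converged = 1;    % terminate the trial loop early
        end
    end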

  The learning statistics are returned in a structure with the following
  fields (an inspection example follows the list):
   'trials'      - how many trials were run until convergence (or until
               the trial limit was reached)
   'iter'        - how many iterations each trial took to complete;
               vector running over trials
   'elapsedtime' - how much CPU time learning took
   'worldstats'  - specific statistics maintained by the world; cell vector
               running over trials (if the world supports this)
   'syntheticworldstats'
                 - specific statistics maintained by the world for the
               entire learning process (if the world supports this)
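
  For example, the learning curve can be inspected after the run (the
  plotting code below is illustrative):

    % Plot the per-trial iteration counts to visualize convergence.
    plot(1:stats.trials, stats.iter(1:stats.trials));
    xlabel('Trial'); ylabel('Iterations to complete');
    title(sprintf('Learning took %.1f s of CPU time', stats.elapsedtime));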


  See also learn, agent_initlearn, agent_act, agent_learn
