Implements the learning control mechanism in an episodic environment
  [WORLD, AGENTS, STATS] = EPISODIC_LEARN(WORLD, AGENTS, LEARNPARAM)
  Performs reinforcement learning over a given number of trials, or until
  convergence. A trial is complete when all agents have finished their tasks
  or when the iteration count reaches a specified upper limit. Learning
  statistics are maintained.

  Parameters:
    WORLD       - the world within which the learning takes place.
    AGENTS      - the (possibly heterogeneous) cell array of agents. Must be
                  the same agents with which the world was created.
    LEARNPARAM  - the learning parameters. Contains the following fields,
                  all optional:
      'trials'    - how many trials to run at most. Default 100.
      'maxiter'   - how many iterations to run in a trial, at most.
                    Default 1000.
      'convgoal'  - the convergence goal: the target mean variation in the
                    number of steps taken for each trial. Default -1, which
                    means the maximum number of trials will always run.
      'avgtrials' - over how many trials to average when checking for
                    convergence. Default 10.
      'showconv'  - whether to plot the convergence progress "live".
                    Default 0 (do not show).
      'closeconv' - whether to close the convergence progress figure upon
                    finishing learning.

  Returns:
    WORLD       - the possibly altered world.
    AGENTS      - the (savvier) agents.
    STATS       - the learning statistics.

  If a convergence goal is specified, learning stops when the agents
  complete 'avgtrials' successive trials with a standard deviation of less
  than 'convgoal' in the iterations taken to complete them. This only makes
  sense in deterministic environments with a deterministic initial state of
  the world at the beginning of each trial.
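
  The stopping rule described above can be sketched as follows. This is an
  illustration only, written in Python rather than in the toolbox's own
  language, and it is not the function's actual code. The name has_converged
  is hypothetical, and the use of the sample standard deviation is an
  assumption; the help text only says "stdev".

```python
from statistics import stdev


def has_converged(iters, avgtrials=10, convgoal=-1):
    """Hypothetical helper: True once the last `avgtrials` trials vary
    by less than `convgoal` in their iteration counts."""
    # A negative goal (the documented default of -1) disables the check,
    # so learning would always run for the maximum number of trials.
    if convgoal < 0:
        return False
    # Not enough completed trials yet to fill the averaging window.
    if avgtrials < 1 or len(iters) < avgtrials:
        return False
    window = iters[-avgtrials:]
    # Assumption: sample standard deviation; a one-trial window has no spread.
    spread = stdev(window) if len(window) > 1 else 0.0
    return spread < convgoal


# Iterations per trial for an imaginary run that settles at 18 steps.
iters = [40, 31, 25, 20, 18, 18, 18, 18]
print(has_converged(iters, avgtrials=4, convgoal=0.5))  # last 4 trials identical
print(has_converged(iters, avgtrials=5, convgoal=0.5))  # window still includes 20
```

  In a stochastic environment the iteration counts keep fluctuating from
  trial to trial, so a check of this kind may never pass, which is why the
  criterion is only meaningful in the deterministic setting.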

  The learning statistics are returned in a structure with the following
  fields:
    'trials'              - how many trials were run until convergence
                            (or until the limit was reached).
    'iter'                - how many iterations each trial took to complete;
                            vector running over trials.
    'elapsedtime'         - how much CPU time learning took.
    'worldstats'          - specific statistics maintained by the world;
                            cell vector running over trials (if the world
                            supports this).
    'syntheticworldstats' - specific statistics maintained by the world for
                            the entire learning process (if the world
                            supports this).

  See also learn, agent_initlearn, agent_act, agent_learn