
explore

PURPOSE

Template. Implements exploration behaviour

SYNOPSIS

function [a, action] = explore(a, action, state, actions, rewards, params)

DESCRIPTION

Template. Implements exploration behaviour
  [A, ACTION] = EXPLORE(A, ACTION, STATE, ACTIONS, REWARDS, PARAMS) 
  Implements exploration behaviour for an agent.

  This function is a template and performs no operations. Its input and
  output arguments must conform to the specifications below.

  Parameters:
   A           - the agent
   ACTION      - the action chosen by the agent prior to calling the
               exploration behaviour. A column vector, or a single
               number if the agent is single-output.
   STATE       - the agent's view over the world state. A column vector.
   ACTIONS     - the agent's view over the last executed joint action. A
               column vector. If no action has yet been executed, the
               vector is filled with NaNs.
   REWARDS     - the agent's view over the last received joint reward. A
               column vector. If no feedback has yet been received, the
               vector is filled with NaNs.
   PARAMS      - a structure containing extra information on the basis
               of which the agent may reason about how to act. In
               episodic environments, it must contain a boolean field
               "newtrial" signaling whether a new trial has just begun.

  The exploration function must support two special modes, activated via
  a field named "mode" of the [params] argument. These modes are:
   "init"      - initialization. Must save any initial exploration
           configuration.
   "reset"     - reset the exploration to its initial configuration. This
           mode must also support a partial reset: given a weight between
           0 and 1, the agent should exhibit a bias towards exploration
           equal to weight * initial bias. The weight is supplied in the
           "weight" field of [params]; if not supplied, it is assumed
           equal to 1.

  In these modes, no action selection is required and the state, actions,
  and rewards arguments should be disregarded.

  Returns:
   A           - the possibly updated agent
   ACTION      - the chosen action, possibly changed as a result of the
               exploration behaviour

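  As an illustration, an epsilon-greedy instance of this template might
  look as follows. This is only a sketch of one way to satisfy the
  specification above; the agent fields "epsilon", "initepsilon", and
  "actioncount", and the params field "epsilon", are assumptions made for
  this example and are not prescribed by the template.

```matlab
function [a, action] = explore(a, action, state, actions, rewards, params)
% Epsilon-greedy exploration (sketch). With probability a.epsilon, the
% greedy action is replaced by a uniformly random action.

if isfield(params, 'mode')
    switch params.mode
        case 'init'
            % save the initial exploration configuration
            a.initepsilon = params.epsilon;     % assumed field
            a.epsilon = a.initepsilon;
            return;
        case 'reset'
            % partial reset: scale the initial bias by the given weight
            if isfield(params, 'weight'), w = params.weight;
            else w = 1; end
            a.epsilon = w * a.initepsilon;
            return;
    end
end

% normal operation: with probability epsilon, pick a random action;
% a.actioncount (assumed field) holds the number of discrete actions
% available per output
if rand < a.epsilon
    action = ceil(rand(size(action)) .* a.actioncount);
end
```

  Under this sketch, the agent framework would first call the function
  once with params.mode = 'init', and thereafter on every step without a
  "mode" field, letting it either keep or overwrite the greedy action.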

  See also act, agent_act

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:
Generated on Wed 04-Aug-2010 16:55:08 by m2html © 2005