
wolfq

PURPOSE

Implements the Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC) algorithm

SYNOPSIS

function a = wolfq(a, state, action, reward, params)

DESCRIPTION

Implements the Win-or-Learn-Fast Policy Hill-Climbing (WoLF-PHC) algorithm
  A = WOLFQ(A, STATE, ACTION, REWARD, PARAMS)
 Implements the Win-or-Learn-Fast Policy Hill-Climbing algorithm [1].
 While the originally presented algorithm assumes that the agents learn on
 the whole, monolithic state of the world, this version also supports
 learning on the basis of the single agent's own state (flag
 "usefullstate" in the learning parameters).
 Here, agents use only their own state.
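
 A minimal MATLAB sketch of the WoLF-PHC update step follows. The names
 (Q, pol, avgpol, C, s, sp, act, delta_w, delta_l) are illustrative only
 and are not the actual fields of the agent structure A; the toolbox
 implementation may differ in its bookkeeping.

   % Q      - state-action value table, nS x nA
   % pol    - current mixed policy,     nS x nA (rows sum to 1)
   % avgpol - average policy,           nS x nA
   % C      - state visit counts,       nS x 1
   % s, sp  - previous and current state indices; act - action taken in s

   % 1) Q-learning update towards the observed reward and next state
   Q(s, act) = (1 - alpha) * Q(s, act) + ...
               alpha * (reward + gamma * max(Q(sp, :)));

   % 2) Update the average policy using the state visit count
   C(s) = C(s) + 1;
   avgpol(s, :) = avgpol(s, :) + (pol(s, :) - avgpol(s, :)) / C(s);

   % 3) "Winning" if the current policy scores at least as well against Q
   %    as the average policy; step slowly when winning, fast when losing
   if pol(s, :) * Q(s, :)' >= avgpol(s, :) * Q(s, :)'
       delta = delta_w;                       % deltaw
   else
       delta = delta_l;                       % deltal = deltaratio * deltaw
   end

   % 4) Hill-climb: move probability mass towards the greedy action
   [~, astar] = max(Q(s, :));
   nA = size(Q, 2);
   step = min(pol(s, :), delta / (nA - 1));   % decrease, never below zero
   step(astar) = 0;
   pol(s, :) = pol(s, :) - step;
   pol(s, astar) = pol(s, astar) + sum(step); % mass moved to greedy action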

  Values in the agent learning parameters (an example structure is sketched
  after this list):
   gamma           - the discount factor
   alpha           - the learning rate
   alphadecay      - the decay ratio for the learning rate
   deltaw          - the initial policy learning rate when winning (for the
                   simple delta decay schedule)
   deltadecay      - the decay ratio for the policy learning rate
   deltaratio      - how much larger the policy learning rate is when losing
                   than when winning; deltal = deltaratio * deltaw
   lambda          - eligibility trace decay rate
   usefullstate    - whether to use the full state in learning
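
 For illustration, a parameter structure following the list above might be
 filled in as below. The concrete values are arbitrary examples, and how
 exactly the fields are attached to the agent (directly on PARAMS, as
 assumed here, or on a sub-structure) is presumably handled by wolfq_init.

   params.gamma        = 0.95;       % discount factor
   params.alpha        = 0.25;       % learning rate
   params.alphadecay   = 0.999;      % single value: plain multiplicative decay
   params.deltaw       = 0.01;       % initial policy step when winning
   params.deltadecay   = [1 5000];   % two values [a b]: Pk = 1 / (a + k/b)
   params.deltaratio   = 2;          % deltal = 2 * deltaw when losing
   params.lambda       = 0.5;        % eligibility trace decay rate
   params.usefullstate = 0;          % learn on the agent's own state only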

 Any decay ratio may be either a single value, in which case a simple
 decay schedule based on plain multiplication is assumed, or a vector of
 two values [a b], in which case Pk = 1 / (a + k/b), where Pk is the value
 of the parameter at the k-th iteration.
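
 As an illustration, the value of such a parameter after k iterations can
 be computed for either form as follows (dec is the decay setting and p0
 the initial value; the names are illustrative only):

   if isscalar(dec)
       pk = p0 * dec^k;               % plain multiplication at each iteration
   else
       pk = 1 / (dec(1) + k/dec(2));  % Pk = 1 / (a + k/b); p0 is unused here,
                                      % since the schedule fixes P0 = 1/a
   end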

 Supports discrete states and actions, with 1 action variable per agent.

  References:
  [1] Bowling, M. and Veloso, M. (2002). Multiagent learning using a
      variable learning rate. Artificial Intelligence, 136(2):215-250.

 See also agent_control, wolfq_init
