Home > marl > agent > learnfuns > minimaxq2.m

minimaxq2

PURPOSE ^

Implements an alternate version of the minimax Q-learning algorithm

SYNOPSIS ^

function a = minimaxq2(a, state, action, reward, params)

DESCRIPTION ^

Implements an alternate version of the minimax Q-learning algorithm
  A = MINIMAXQ2(A, STATE, ACTION, REWARD, PARAMS)
  Implements a variant of minimax Q-learning that learns faster (per sample)
  than Littman's original formulation. It discards the auxiliary table of
  state values used by Littman, instead computing the next-state value at
  each step by solving a linear program (LP) similar to the one solved for
  computing the policy at the current state. Note that this makes each
  iteration slower to run, but fewer learning iterations are needed for the
  same performance. However, the theoretical convergence guarantees of the
  original algorithm are lost in this version.

  Required values on the agent learning parameters:
   alpha           - the learning rate
   gamma           - the discount factor
   lambda          - the eligibility trace decay rate
   epsilon         - the exploration probability
  Required values on the extra parameters:
   newtrial        - (episodic environments only) whether a new trial is
                     beginning


  References:
  [1] Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum
      games. In Proceedings of the Eighteenth International Conference on
      Machine Learning (ICML-01), pages 322-328, Williams College,
      Williamstown, Massachusetts, USA.

  See also: agent_learn, minimaxq2_init

CROSS-REFERENCE INFORMATION ^

This function calls:
This function is called by:
Generated on Wed 04-Aug-2010 16:55:08 by m2html © 2005