asfq

PURPOSE

Implements the adaptive state focus Q-learning algorithm

SYNOPSIS

function a = asfq(a, state, action, reward, params)

DESCRIPTION

Implements the adaptive state focus Q-learning algorithm
  A = ASFQ(A, STATE, ACTION, REWARD, PARAMS)
  Implements adaptive state focus Q-learning (ASF-Q) [1]. This algorithm
  works in two modes (a rough sketch follows the list below):
   - SINGLE, plain Q-learning. However, the evolution of Q-values is
       monitored and, when lack of convergence is detected, the algorithm
       switches to mode MULTI.
   - MULTI, which is full-state Q-learning. In this mode, the states of
       all the agents are considered in maintaining the Q-table.
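  As a rough illustration of the two modes (not the toolbox's internal
  code), the sketch below contrasts the Q-update over the agent's own
  state space with the update over the joint state space. All variable
  names, table sizes and values are assumptions made for the example;
  eligibility traces (see lambda below) are omitted.

    % Hypothetical illustration of the two modes (assumed names and sizes;
    % not the toolbox's internal code; eligibility traces are omitted).
    mode = 'single';                            % becomes 'multi' after a switch
    alpha = 0.1; gamma = 0.95;                  % example learning parameters
    reward = 1; action = 2;                     % example transition data
    switch mode
        case 'single'
            Q = zeros(10, 3);                   % (own states) x (own actions)
            s = 4; snext = 5;                   % own state before/after the step
        case 'multi'
            Q = zeros(10 * 10, 3);              % (joint states) x (own actions)
            s = 34; snext = 45;                 % joint state index before/after
    end
    % Plain Q-learning update over whichever state space is active
    Q(s, action) = Q(s, action) + ...
        alpha * (reward + gamma * max(Q(snext, :)) - Q(s, action));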

  Required values on the agent learning parameters:
   alpha           - the learning rate
   gamma           - the discount factor
   lambda          - the eligibility trace decay rate
   epsilon         - the exploration probability
   window          - the analysis window length
   stops           - the number of analysis stops per window
   zeromargin      - the zero margin, determining when a value used for
       testing convergence is considered positive, negative or zero
   expresetweight  - how the exploration behaviour changes upon switching
       modes

  Required values on the extra parameters:
   newtrial        - only in episodic environments; whether a new trial is
                   beginning
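  A hypothetical way to fill in these parameters is sketched below. The
  field names follow the lists above; the example values and the exact
  structures they are stored on are assumptions for illustration.

    % Example values only; the field names follow the lists above, while
    % the values and the structure layout are assumptions for illustration.
    lp.alpha          = 0.3;      % learning rate
    lp.gamma          = 0.95;     % discount factor
    lp.lambda         = 0.5;      % eligibility trace decay rate
    lp.epsilon        = 0.1;      % exploration probability
    lp.window         = 1000;     % analysis window length (iterations)
    lp.stops          = 10;       % analysis stops per window
    lp.zeromargin     = 1e-3;     % zero margin for the convergence tests
    lp.expresetweight = 0.5;      % exploration change upon switching modes

    params.newtrial   = true;     % extra parameter, episodic environments only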

  The convergence monitoring works as follows: the absolute mean
  difference between Q-values at successive steps is maintained over a
  sliding window of [window] iterations. The sliding average of this
  difference is computed at [stops] points per window. After one and a
  half windows have fully passed, each stop point triggers an analysis:
  the mean value of the derivative of the sliding average is computed; if
  this value is not negative while, at the same time, the mean value of
  the sliding average is positive, the algorithm considers itself in a
  situation of convergence failure. Upon this, it extends the Q-table with
  the state space of the other agents (switches to mode MULTI). All the
  sign tests above are performed within a robustness margin [zeromargin].
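  One possible shape of this test is sketched below. The function and
  variable names (check_convergence, qdiff, slavg) are hypothetical; the
  actual bookkeeping inside asfq may differ.

    function mode = check_convergence(qdiff, k, lp, mode)
    % Sketch of the convergence test described above (assumed names; not the
    % toolbox's code). QDIFF(i) is the absolute mean difference between the
    % Q-tables at steps i-1 and i; K is the current iteration; LP holds the
    % learning parameters listed earlier.
    persistent slavg                              % sliding averages at stop points
    steplen = round(lp.window / lp.stops);        % iterations between stop points
    if mod(k, steplen) == 0                       % at every stop point...
        slavg(end+1) = mean(qdiff(max(1, k-lp.window+1):k));  % ...window average
    end
    if k >= 1.5*lp.window && mod(k, steplen) == 0 && numel(slavg) >= 2
        notdecreasing = mean(diff(slavg)) > -lp.zeromargin;   % derivative not negative
        stillpositive = mean(slavg) > lp.zeromargin;          % average still positive
        if notdecreasing && stillpositive
            mode = 'multi';                       % convergence failure: go to MULTI
        end
    end
    end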

  Supports discrete states and actions, with 1 action variable per agent.

  This learning function should be coupled with the dedicated action
  function asfq_greedyact().
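  A hedged sketch of such a coupling in a simple learning loop follows.
  Only asfq's synopsis is documented above; the signatures of asfq_init
  and asfq_greedyact, the environment interface (env_reset, env_step) and
  the loop variables are assumptions here.

    % Hypothetical learning loop coupling asfq with asfq_greedyact;
    % env_reset/env_step stand in for an arbitrary environment interface.
    a = asfq_init(a, lp);                          % assumed initializer signature
    params.newtrial = true;                        % first step of the first trial
    state = env_reset();
    for k = 1:maxsteps
        action = asfq_greedyact(a, state);         % epsilon-greedy action choice
        [state, reward] = env_step(state, action); % observe next state and reward
        a = asfq(a, state, action, reward, params);% learn from this transition
        params.newtrial = false;
    end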


  [1] Busoniu, L., De Schutter, B. and Babuska, R. (2005). Multiagent
      reinforcement learning with adaptive state focus. In Proceedings 17th
      Belgian-Dutch Conference on Artificial Intelligence (BNAIC-05), pages
      35-42, Brussels, Belgium.

  See also agent_learn, asfq_init, asfq_greedyact

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by: