Implements the adaptive state focus Q-learning algorithm
  A = ASFQ(A, STATE, ACTION, REWARD, PARAMS)
  Implements adaptive state focus Q-learning (ASF-Q) [1]. This algorithm
  works in two modes:
    - SINGLE, which is plain Q-learning. The evolution of the Q-values is
      monitored, however, and when a lack of convergence is detected the
      algorithm switches to mode MULTI.
    - MULTI, which is full-state Q-learning. In this mode, the states of
      all the agents are considered in maintaining the Q-table.

  Required values in the agent learning parameters:
    alpha           - the learning rate
    gamma           - the discount factor
    lambda          - the eligibility trace decay rate
    epsilon         - the exploration probability
    window          - the analysis window length
    stops           - the number of analysis stops per window
    zeromargin      - the zero margin, determining when a value used for
                      testing convergence is considered positive,
                      negative or zero
    expresetweight  - how the exploration behaviour changes upon
                      switching modes

  Required values in the extra parameters:
    newtrial        - only in episodic environments; whether a new trial
                      is beginning

  Convergence is monitored as follows. The mean absolute difference
  between the Q-values at successive steps is maintained within a sliding
  [window] of iterations. The sliding average of this difference is
  computed at [stops] points per window. After one and a half windows
  have fully passed, each stop point triggers an analysis. The mean value
  of the derivative of the sliding average is computed; if this value is
  not negative while the mean value of the sliding average is positive,
  the algorithm considers itself to be in a situation of convergence
  failure. It then extends the Q-table with the state space of the other
  agents (switches to mode MULTI). All the comparisons above are made
  within a robustness margin [zeromargin].

  Supports discrete states and actions, with 1 action variable per agent.

  This learning function should be coupled with the dedicated action
  function asfq_greedyact().
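  The convergence test described above can be sketched in Python as
  follows. This is an illustrative sketch only, not the toolbox's MATLAB
  code: the class ConvergenceMonitor, its update method, and the
  requirement that window be a multiple of stops are assumptions made
  here, and the way zeromargin enters each comparison is one possible
  reading of the description.

```python
from collections import deque


class ConvergenceMonitor:
    """Hypothetical helper that flags convergence failure of Q-values.

    All names here are assumptions for illustration; only the parameters
    window, stops and zeromargin come from the help text.
    """

    def __init__(self, window, stops, zeromargin):
        if stops < 2 or window % stops != 0:
            raise ValueError("need stops >= 2 and window divisible by stops")
        self.window = window
        self.stop_every = window // stops      # iterations between stop points
        self.zeromargin = zeromargin
        self.diffs = deque(maxlen=window)      # recent mean |Q change| per step
        self.averages = deque(maxlen=stops)    # sliding averages at stop points
        self.step = 0

    def update(self, q_diff):
        """Record one step's mean absolute Q-value change.

        Returns True when the analysis indicates convergence failure,
        i.e. when a switch from SINGLE to MULTI would be triggered.
        """
        self.step += 1
        self.diffs.append(abs(q_diff))

        # Only stop points record a sliding average.
        if self.step % self.stop_every != 0:
            return False
        self.averages.append(sum(self.diffs) / len(self.diffs))

        # No analysis until one and a half windows have passed.
        if self.step < 1.5 * self.window:
            return False

        avgs = list(self.averages)
        # Finite-difference "derivative" of the sliding average.
        derivs = [b - a for a, b in zip(avgs, avgs[1:])]
        mean_deriv = sum(derivs) / len(derivs)
        mean_avg = sum(avgs) / len(avgs)

        not_decreasing = mean_deriv >= -self.zeromargin   # "not negative"
        still_changing = mean_avg > self.zeromargin       # "positive"
        return not_decreasing and still_changing


if __name__ == "__main__":
    monitor = ConvergenceMonitor(window=10, stops=5, zeromargin=1e-3)
    for t in range(1, 21):
        # A Q-table that keeps changing by a constant amount never settles.
        if monitor.update(1.0):
            print("convergence failure detected at step", t)
            break
```

  In this sketch the caller would compute q_diff as the mean absolute
  change of the Q-table after each learning update and switch to mode
  MULTI the first time update returns True.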
  [1] Busoniu, L., De Schutter, B. and Babuska, R. (2005). Multiagent
      reinforcement learning with adaptive state focus. In Proceedings
      17th Belgian-Dutch Conference on Artificial Intelligence
      (BNAIC-05), pages 35-42, Brussels, Belgium.

  See also agent_learn, asfq_init, asfq_greedyact