TEAMQ Implements the team Q-learning algorithm.
  A = TEAMQ(A, STATE, ACTION, REWARD, PARAMS)

Implements team Q-learning (also known as friend-Q). This is a
straightforward extension of single-agent Q-learning to the multiagent
case: each agent learns a Q-table on the basis of the full world state
and the joint action.

Required values on the agent learning parameters:
  alpha   - the learning rate
  gamma   - the discount factor
  lambda  - the eligibility trace decay rate
  epsilon - the exploration probability

Required values on the extra parameters:
  newtrial - only in episodic environments; whether a new trial is beginning

Supports discrete states and actions, with 1 action variable per agent.
Can be coupled with an action function that uses a Q-table indexed on the
full world state and the joint action, such as fullstatejoint_greedyact().

References:
[1] Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum games.
    In Proceedings of the Eighteenth International Conference on Machine
    Learning (ICML-01), pages 322-328, Williams College, Williamstown,
    Massachusetts, USA.

See also agent_learn, teamq_init
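
The friend-Q update itself can be illustrated with the minimal MATLAB sketch
below. The field names (A.Q, A.E, A.alpha, A.gamma, A.lambda) and the
assumption that the state and joint action arrive as linear indices are
hypothetical; the toolbox's actual agent structure and indexing may differ.

  function A = teamq_update_sketch(A, state, jointaction, reward, nextstate)
  % Sketch of one friend-Q learning step on a discrete Q-table.
  % Every agent assumes the team plays the jointly optimal ("friendly")
  % action in the next state, so the bootstrap target is the max over
  % all joint actions.
  maxQnext = max(A.Q(nextstate, :));
  % Temporal-difference error for the (state, joint action) pair.
  delta = reward + A.gamma * maxQnext - A.Q(state, jointaction);
  % Accumulating eligibility traces, decayed by gamma*lambda each step.
  A.E = A.gamma * A.lambda * A.E;
  A.E(state, jointaction) = A.E(state, jointaction) + 1;
  % Update every Q-table entry in proportion to its eligibility.
  A.Q = A.Q + A.alpha * delta * A.E;
  end

Because all agents observe the same full state, joint action, and reward,
each agent's Q-table converges to the same values, so greedy action
selection (e.g. via fullstatejoint_greedyact()) yields coordinated behavior.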