PLAINQ Implements the plain Q-learning algorithm.
  A = PLAINQ(A, STATE, ACTIONS, REWARDS, PARAMS)

  Implements Q-learning as described in [1], augmented with an
  eligibility trace. Expects the Q table to be initialized. Uses flat Q
  and eligibility tables for fast access.

  Required fields on the agent learning parameters:
    alpha   - the learning rate
    gamma   - the discount factor
    lambda  - the eligibility trace decay rate
    epsilon - the exploration probability

  Required fields on the extra parameters:
    newtrial - only in episodic environments; whether a new trial is
               beginning

  Can be coupled with an action function that uses a Q table indexed on
  agent state and action, such as greedyact(). Supports discrete states
  and actions, with one action variable per agent.

  References:
    [1] Watkins, C. J. C. H. and Dayan, P. (1992). Technical note:
        Q-learning. Machine Learning, 8:279-292.

  See also learn, plainq_init.
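  As an illustration of the update this help text describes, below is a
  minimal Q(lambda) sketch in MATLAB. It is not the plainq()
  implementation itself: the function name qlambda_update, the flat
  NSTATES-by-NACTIONS table layout, and the struct field access
  (p.alpha, p.gamma, p.lambda) are assumptions made for the sketch.

    function [Q, e] = qlambda_update(Q, e, s, a, r, snext, p)
    %QLAMBDA_UPDATE One Q(lambda) step on flat Q and trace tables.
    %   Sketch only, not plainq() itself. Q and e are NSTATES-by-NACTIONS
    %   matrices; s, a, snext are integer indices; r is the scalar reward;
    %   p carries the alpha, gamma, lambda parameters listed above.
    qmax  = max(Q(snext, :));              % greedy value at the next state
    delta = r + p.gamma * qmax - Q(s, a);  % one-step TD error
    e(s, a) = e(s, a) + 1;                 % accumulating eligibility trace
    Q = Q + p.alpha * delta * e;           % credit all recently visited pairs
    e = p.gamma * p.lambda * e;            % decay traces (Watkins' variant
                                           % would also zero e after a
                                           % non-greedy, exploratory action)
    end

  A hypothetical call, with made-up sizes and indices:

    p = struct('alpha', 0.1, 'gamma', 0.98, 'lambda', 0.8);
    Q = zeros(10, 4);  e = zeros(10, 4);   % 10 states, 4 actions
    [Q, e] = qlambda_update(Q, e, 3, 2, 1.0, 7, p);

  In an episodic environment, the newtrial flag described above would
  presumably correspond to resetting the trace table e to zeros at the
  start of each trial.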