Implements non-exploitive random exploration.

 [A, ACTION] = RANDOMEXPLORE(A, ACTION, STATE, ACTIONS, REWARDS, PARAMS)

Implements non-exploitive random exploration with probability epsilon. This is the most basic (yet the most commonly used) exploration strategy. It can be combined with any learning/policy function pair, provided the required parameters listed below are supplied to the learning initialization function. Supports discrete actions, with one action variable only.

Required fields on the agent learning parameters:
  epsilon      - the exploration probability
  epsilondecay - the decay ratio of the exploration probability.
                 Required in non-episodic environments. If specified,
                 epsilon decays by this ratio at every iteration;
                 otherwise, in episodic environments, it decays with
                 the trial count.

Required fields on the extra parameters:
  newtrial     - (in episodic environments) whether a new trial is
                 beginning

See also act, explore
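For illustration, here is a minimal Python sketch of the behaviour described above: with probability epsilon, the greedy action is replaced by a uniformly random choice, and epsilon is then decayed by a fixed ratio (the non-episodic, per-iteration case). The function name and signature are hypothetical and only mirror the help text; they are not the actual implementation.

```python
import random

def random_explore(action, actions, epsilon, epsilondecay=1.0):
    """One step of non-exploitive random exploration (sketch).

    With probability epsilon, the greedy `action` is replaced by a
    uniformly random member of the discrete action set `actions`
    (a single action variable, as in the help text above); epsilon
    is then decayed by the ratio `epsilondecay`.
    """
    if random.random() < epsilon:
        action = random.choice(actions)
    return action, epsilon * epsilondecay
```

In the episodic case described above, epsilon would instead be decayed once per trial (when the `newtrial` flag is set) rather than at every iteration.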