Implements stochastic action selection for a policy indexed on the agent's state
  [A, ACTION] = STOCHACT(A, STATE, ACTIONS, REWARDS, PARAMS)
  Implements stochastic action selection. An action is chosen according to
  the agent's current stochastic policy. The policy entries corresponding
  to a given state must form a valid probability distribution over the
  (discrete) actions, i.e. they must be nonnegative and sum to 1.

  Supports discrete states and actions, with a single action variable only.

  The policy must be stored in field 'PI' of the agent, as a flat vector
  representing a matrix with dimensions
  agent-action-space-size X agent-state-space-size. The dimensions of this
  matrix must also be cached in field 'sizes.pi' of the agent.

  See also agent_act
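The storage layout and sampling step described above can be sketched as follows. This is a minimal illustration, not the actual body of STOCHACT: the agent struct, its values, the 1-based state index, the column-major flattening, and the inverse-CDF sampling method are all assumptions. Only the field names 'PI' and 'sizes.pi' come from the help text.

```matlab
% Hypothetical agent; only the field names 'PI' and 'sizes.pi' are taken
% from the help text. All values below are made up for illustration.
nActions = 3;
nStates  = 2;
P = [0.2 0.0;                      % column s = action distribution in state s
     0.5 1.0;
     0.3 0.0];
a.sizes.pi = [nActions, nStates];  % cached matrix dimensions
a.PI = P(:);                       % flat vector (column-major order, assumed)

state = 2;                         % assumed: 1-based discrete state index

% Recover the matrix view and extract the distribution for the current state.
Pmat = reshape(a.PI, a.sizes.pi);
p = Pmat(:, state);

% Check that the entries for this state form a valid probability distribution.
assert(all(p >= 0) && abs(sum(p) - 1) < 1e-9, 'invalid policy for this state');

% Inverse-CDF sampling: choose the first action whose cumulative
% probability reaches a uniform random draw.
c = cumsum(p);
c(end) = 1;                        % guard against round-off in the last bin
action = find(c >= rand(), 1, 'first');
```

In this example, state 2 puts all probability mass on action 2, so the sampled action is always 2. For state 1 the result would vary between actions 1, 2, and 3 with probabilities 0.2, 0.5, and 0.3.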