Initializes the ASF-Q learning algorithm
   A = ASFQ_INIT(A, INFO)
Creates the structures required for the ASF-Q learning algorithm to run.
Initializes the Q table and an eligibility trace as flat vectors. (A sketch
of this initialization, using assumed field names, is given below.)

Optional values on the agent learning parameters (see asfq() for the list of
mandatory values):
   window          - the analysis window length, default 256. This is the
      number of iterations over which the differences of Q-values between
      successive iterations are maintained. For exact analysis stops, make
      sure this is divisible by 2 and by [stops] below.
   stops           - the number of analysis stops per window, default 16. A
      sliding mean of the differences is maintained; this mean is subsampled
      [stops] times per window. The process begins after half a window has
      elapsed, so analysis (which requires a full window of data) can begin
      after one and a half windows.
   zeromargin      - the zero margin, determining when a value used for
      testing convergence is considered positive, negative, or zero. Positive
      real between 0 and 1, default 0.05 (5%). A value is considered positive
      if it is above [zeromargin]*SignalAmplitude, negative if it is below
      -[zeromargin]*SignalAmplitude, and zero otherwise.
   expresetweight  - how the exploration behaviour changes upon switching
      modes, default []. Either a number between 0 and 1, giving the
      exploration reset weight (see agent()), or the empty matrix, meaning
      switches have no effect on the exploration behaviour.
   reverttobestq   - if true, the algorithm maintains a cache of the Q table
      that has behaved best so far (i.e., accumulated the largest reward) and
      reverts to this Q table upon expansion. This only works in episodic
      environments, where "best behaviour so far" has a clear meaning. The
      default is 0 (don't revert).

Values on the info parameters:
   statespacesize  - the state space size

See also asfq, agent_initlearn
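
For illustration only, a minimal sketch of how such an initializer might look,
assuming the agent structure A keeps its learning parameters in a field named
learnparam and that INFO also carries an action count nactions (only
statespacesize is documented); every identifier not documented above is
hypothetical, not the actual asfq_init implementation.

   function a = asfq_init_sketch(a, info)
   % Hypothetical sketch of the initialization described above.
   % a.learnparam, a.Q, a.E, a.qdiff, info.nactions, ... are assumed names.

   % fill in defaults for the optional learning parameters
   if isfield(a, 'learnparam'), p = a.learnparam; else p = struct(); end
   if ~isfield(p, 'window'),         p.window = 256;        end
   if ~isfield(p, 'stops'),          p.stops = 16;          end
   if ~isfield(p, 'zeromargin'),     p.zeromargin = 0.05;   end
   if ~isfield(p, 'expresetweight'), p.expresetweight = []; end
   if ~isfield(p, 'reverttobestq'),  p.reverttobestq = 0;   end
   a.learnparam = p;

   % Q table and eligibility trace as flat vectors over (state, action) pairs.
   % Only statespacesize is documented; info.nactions is an assumed field.
   n   = info.statespacesize * info.nactions;
   a.Q = zeros(1, n);       % flat Q table
   a.E = zeros(1, n);       % flat eligibility trace

   % buffer for the differences of Q-values between successive iterations,
   % later subsampled [stops] times per window by the convergence analysis
   a.qdiff     = zeros(1, p.window);
   a.stopcount = 0;

   % cache of the best-performing Q table, used only when reverttobestq is set
   if p.reverttobestq
       a.bestQ      = a.Q;
       a.bestreward = -Inf;
   end

The sketch only shows the default handling and the flat-vector allocation; the
actual function may store these structures under different field names.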