The BOXES algorithm was originally developed by Michie and Chambers (1968) and was one of the first examples of reinforcement learning. The version described here is a modification of the original algorithm that is simpler and requires fewer trials to learn to balance a pole-and-cart system.
You can find the source code for a Java applet and a brief description of the BOXES algorithm here.
Publications
Law, J. K. C. (1992). Adaptive Rule-based Control. Master of Cognitive Science Thesis, School of Computer Science and Engineering, University of New South Wales.
Michie, D. and Chambers, R. A. (1968). Boxes: An Experiment in Adaptive Control. In E. Dale and D. Michie (Eds.), Machine Intelligence 2. Edinburgh: Oliver and Boyd.
Sammut, C. A. (1994). Recent Progress with BOXES. In K. Furukawa, D. Michie and S. Muggleton (Eds.), Machine Intelligence 13. Oxford: The Clarendon Press, OUP, pp. 363-384.