COSYNE


Cosyne 2008 Workshops


March 3-4, 2008

Snowbird, Utah


Speaker Name

Razvan Florian, Center for Cognitive and Neural Studies, Cluj, Romania

Talk Title

Relating reinforcement learning and STDP

Talk Abstract

It has been shown analytically that reinforcement learning (RL) can be implemented in stochastic spiking neural networks by a plasticity mechanism similar to spike-timing-dependent plasticity (STDP), modulated by a global reward signal (Florian, 2005, 2007; de Queiroz et al., 2006; Baras and Meir, 2007). Under this learning rule, plasticity results associatively from pre-post pairs of spikes, but there is no counterpart to the post-pre associative plasticity encountered in typical forms of STDP unless homeostasis is explicitly considered (Pfister et al., 2006; Florian, 2007). Modulating typical STDP with the reward signal has also been shown to lead to RL, both in simulations (Soula et al., 2004, 2005; Florian, 2005, 2007; Izhikevich, 2007; Henry et al., 2007; Farries and Fairhall, 2007) and analytically, under certain conditions (Legenstein et al., 2008). STDP has also been found to be subject to neuromodulation in the brain, but the modulation of the typical form of STDP results from different mechanisms for depression and potentiation (Seol et al., 2007). From all this we can conclude that post-pre associative plasticity is sufficient but not necessary for implementing RL in spiking neural networks. We present alternatives to this type of plasticity for RL in such networks.
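As a rough illustration of the kind of rule discussed in the abstract, the sketch below updates a single synapse with STDP gated by a global reward signal. It is a generic, simplified construction and not the specific rule of any cited paper: the exponential traces, the eligibility trace, the function name modulated_stdp, the use_post_pre switch and all parameter values are assumptions made for this example. Setting use_post_pre=False drops the post-pre term, leaving only pre-post associative plasticity.

```python
import math

def modulated_stdp(pre_spikes, post_spikes, rewards, *, use_post_pre=True,
                   a_plus=1.0, a_minus=-1.0, tau_plus=20.0, tau_minus=20.0,
                   tau_z=25.0, gamma=0.01, dt=1.0, w0=0.0):
    """Weight trajectory of one synapse under reward-modulated STDP.

    Hypothetical sketch: all names and default values are illustrative.
    pre_spikes, post_spikes: sequences of 0/1 per time step.
    rewards: global reward signal per time step.
    """
    decay_plus = math.exp(-dt / tau_plus)
    decay_minus = math.exp(-dt / tau_minus)
    decay_z = math.exp(-dt / tau_z)
    p_plus = p_minus = z = 0.0  # presynaptic trace, postsynaptic trace, eligibility
    w = w0
    trajectory = []
    for pre, post, r in zip(pre_spikes, post_spikes, rewards):
        # Decay both spike traces, then register this step's spikes.
        p_plus = p_plus * decay_plus + (a_plus if pre else 0.0)
        p_minus = p_minus * decay_minus + (a_minus if post else 0.0)
        # Pre-post term: a postsynaptic spike reads the presynaptic trace.
        xi = p_plus if post else 0.0
        # Optional post-pre term: a presynaptic spike reads the postsynaptic trace.
        if use_post_pre and pre:
            xi += p_minus
        # The eligibility trace accumulates the STDP term; reward gates the update.
        z = z * decay_z + xi
        w += gamma * r * z
        trajectory.append(w)
    return trajectory

# Example: one presynaptic spike followed 5 steps later by a postsynaptic spike,
# with a constant positive reward.
steps = 50
pre = [1 if t == 0 else 0 for t in range(steps)]
post = [1 if t == 5 else 0 for t in range(steps)]
reward = [1.0] * steps
final_weight = modulated_stdp(pre, post, reward)[-1]
```

In this sketch the sign of the weight change follows the sign of the reward for pre-post pairs, while post-pre pairs contribute only when use_post_pre is enabled.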

References

Baras, D. and Meir, R. (2007), ‘Reinforcement learning, spike-time-dependent plasticity and the BCM rule’, Neural Computation 19, 2245–2279. http://eprints.pascal-network.org/archive/00002561/01/RL-STDP_Final.pdf

de Queiroz, M. S., de Berredo, R. C. and de Pádua Braga, A. (2006), ‘Reinforcement learning of a simple control task using the spike response model’, Neurocomputing 70(1–3), 14–20. http://www.cpdee.ufmg.br/~apbraga/journals/spiking-neuroc.pdf

Farries, M. A. and Fairhall, A. L. (2007), ‘Reinforcement learning with modulated spike timing-dependent synaptic plasticity’, Journal of Neurophysiology 98, 3648–3665. http://dx.doi.org/10.1152/jn.00364.2007

Florian, R. V. (2005), A reinforcement learning algorithm for spiking neural networks, in D. Zaharie, D. Petcu, V. Negru, T. Jebelean, G. Ciobanu, A. Cicortas, A. Abraham and M. Paprzycki, eds, ‘Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2005)’, IEEE Computer Society, Los Alamitos, CA, pp. 299–306. http://www.coneural.org/florian/papers/05_RL_for_spiking_NNs.php

Florian, R. V. (2007), ‘Reinforcement learning through modulation of spike-timing dependent plasticity’, Neural Computation 19(6), 1468–1502. http://www.coneural.org/florian/papers/2007_florian_modulated_STDP.php

Henry, F., Daucé, E. and Soula, H. (2007), ‘Temporal pattern identification using spike-timing dependent plasticity’, Neurocomputing 70, 2009–2016. http://dx.doi.org/10.1016/j.neucom.2006.10.082

Izhikevich, E. M. (2007), ‘Solving the distal reward problem through linkage of STDP and dopamine signaling’, Cerebral Cortex 17(10), 2443–2452. http://vesicle.nsi.edu/users/izhikevich/publications/dastdp.pdf

Legenstein, R., Pecevski, D. and Maass, W. (2008), Theoretical analysis of learning with reward-modulated spike-timing-dependent plasticity, in J. Platt, D. Koller, Y. Singer and S. Roweis, eds, ‘Advances in Neural Information Processing Systems 20’, MIT Press, Cambridge, MA. http://books.nips.cc/papers/files/nips20/NIPS2007_0643.pdf

Pfister, J.-P., Toyoizumi, T., Barber, D. and Gerstner, W. (2006), ‘Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning’, Neural Computation 18(6), 1318–1348. http://diwww.epfl.ch/~gerstner/PUBLICATIONS/Pfister06.pdf

Seol, G. H., Ziburkus, J., Huang, S., Song, L., Kim, I. T., Takamiya, K., Huganir, R. L., Lee, H.-K. and Kirkwood, A. (2007), ‘Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity’, Neuron 55, 919–929. http://dx.doi.org/10.1016/j.neuron.2007.08.013

Soula, H., Alwan, A. and Beslon, G. (2004), Obstacle avoidance learning in a spiking neural network, in ‘Last Minute Results of Simulation of Adaptive Behavior’, Los Angeles, CA. http://www.koredump.org/hed/abstract_sab2004.pdf

Soula, H., Alwan, A. and Beslon, G. (2005), Learning at the edge of chaos: Temporal coupling of spiking neuron controller of autonomous robotic, in ‘Proceedings of AAAI Spring Symposia on Developmental Robotics’, AAAI Press, Menlo Park, CA, USA. http://koredump.org/hed/soula_aaai05.pdf


This page was last modified 16:08, 12 February 2008.

