Dopamine is a critical modulator of both learning and motivation. This presents a problem: how can target cells know whether increased dopamine is a signal to learn, or to move? It is often presumed that motivation involves slow (“tonic”) dopamine changes, while fast (“phasic”) dopamine fluctuations convey reward prediction errors for learning. Yet recent studies have shown that dopamine conveys motivational value, and promotes movement, even on sub-second timescales. Here I describe an alternative account of how dopamine regulates ongoing behavior. Dopamine release related to motivation is rapidly and locally sculpted by receptors on dopamine terminals, independently from dopamine cell firing. Target neurons abruptly switch between learning and performance modes, with striatal cholinergic interneurons providing one candidate switch mechanism. The behavioral impact of dopamine varies by subregion, but in each case dopamine provides a dynamic estimate of whether it is worth expending a limited internal resource, such as energy, attention, or time.
Is dopamine a signal for learning, for motivation, or both?
Our understanding of dopamine has changed in the past, and is changing once again. One critical distinction is between dopamine effects on current behavior (performance), and dopamine effects on future behavior (learning). Both are real and important, but at various times one has been in favor and the other has not.
When (in the ‘70s) it became possible to perform selective, complete lesions of dopamine pathways, the obvious behavioral consequence was a severe reduction in movement1. This fit with the akinetic effects of dopamine loss in humans, produced by advanced Parkinson’s disease, toxic drugs, or encephalitis2. Yet neither rat nor human cases display a fundamental inability to move. Dopamine-lesioned rats swim in cold water3, and akinetic patients may get up and run if a fire alarm sounds (“paradoxical kinesia”). Nor is there a basic deficit in appreciating rewards: dopamine-lesioned rats will consume food placed in their mouths, and show signs of enjoying it4. Rather, they will not choose to exert effort to actively obtain rewards. These and many other results established a fundamental link between dopamine and motivation5. Even the movement slowing observed in less-severe cases of Parkinson’s disease can be considered a motivational deficit, reflecting implicit decisions that it is not worth expending the energy required for faster movements6.
Then (in the ‘80s) came pioneering recordings of dopamine neurons in behaving monkeys (in midbrain areas that project to forebrain: ventral tegmental area, VTA / substantia nigra pars compacta, SNc). Among observed firing patterns were brief bursts of activity to stimuli that triggered immediate movements. This “phasic” dopamine firing was initially interpreted as supporting “behavioral activation”7 and “motivational arousal”8 – in other words, as invigorating the animal’s current behavior.
A radical shift occurred in the ‘90s, with the reinterpretation of phasic dopamine bursts as encoding reward prediction errors (RPEs9). This was based upon a key observation: dopamine cells respond to unexpected stimuli associated with future reward, but often stop responding if these stimuli become expected10. The RPE idea originated in earlier learning theories, and especially in the then-developing computer science field of reinforcement learning11. The point of an RPE signal is to update values (estimates of future rewards). These values are used later, to help make choices that maximize reward. Since dopamine cell firing resembled RPEs, and RPEs are used for learning, it became natural to emphasize the role of dopamine in learning. Later optogenetic manipulations confirmed the dopaminergic identity of RPE-coding cells12,13 and showed they indeed modulate learning14,15.
The idea that dopamine provides a learning signal fits beautifully with the literature that dopamine modulates synaptic plasticity in the striatum, the primary forebrain target of dopamine. For example, the triple coincidence of glutamate stimulation of a striatal dendritic spine, postsynaptic depolarization, and dopamine release causes the spine to grow16. Dopaminergic modulation of long-term learning mechanisms helps explain the persistent behavioral effects of addictive drugs, which share the property of enhancing striatal dopamine release17. Even the profound akinesia with dopamine loss can be partly accounted for by such learning mechanisms18. Lack of dopamine may be treated as a constantly negative RPE that progressively updates the values of actions towards zero. Similar progressive, extinction-like effects on behavior can be produced by dopamine antagonists19,20.
Yet the idea that dopamine is critically involved in ongoing motivation has never gone away – on the contrary, it is widely taken for granted by behavioral neuroscientists. This is appropriate given the strong evidence that dopamine functions in motivation/movement/invigoration are dissociable from learning15,20–23. Less widely appreciated is the challenge of reconciling this motivational role with the theory that dopamine provides an RPE learning signal.
Motivation “looks forward”: it uses predictions of future reward (values) to appropriately energize current behavior. By contrast, learning “looks backwards” at states and actions in the recent past, and updates their values. These are complementary phases of a cycle: the updated values may be used in subsequent decision-making if those states are re-encountered, then updated again, and so forth. But which phase of the cycle is dopamine involved with – using values to make decisions (performance), or updating values (learning)?
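As a concrete (and entirely illustrative) sketch of this cycle, the toy temporal-difference loop below updates values by "looking backwards", while a separate read-out "looks forward" to energize behavior. The state names, learning rate, and discount factor are invented for the example, not drawn from any experiment.

```python
# Toy temporal-difference (TD) learning/performance cycle.
# All states and parameters are illustrative, not from any experiment.
values = {"cue": 0.0, "lever": 0.0, "reward_port": 0.0}
alpha, gamma = 0.1, 0.9  # learning rate, per-step temporal discount

def td_update(prev_state, state, reward):
    """Learning 'looks backwards': update the value of the state just left."""
    rpe = reward + gamma * values[state] - values[prev_state]
    values[prev_state] += alpha * rpe
    return rpe

def vigor(state):
    """Performance 'looks forward': higher current value -> more energized behavior."""
    return values[state]

# Repeated trials: cue -> lever press -> reward port (reward delivered at the end).
for _ in range(500):
    td_update("cue", "lever", reward=0.0)
    td_update("lever", "reward_port", reward=1.0)

# Value has propagated backwards: lever ~ 1.0, cue ~ gamma * 1.0
print(vigor("cue"), vigor("lever"))
```

The same `values` dictionary serves both phases: `td_update` modifies it after each transition (learning), while `vigor` merely reads it to scale ongoing behavior (performance), which is exactly the distinction at issue.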
In some circumstances it is straightforward to imagine dopamine playing both roles simultaneously24. Unexpected, reward-predictive cues are the archetypical events for evoking dopamine cell firing and release, and such cues typically both invigorate behavior and evoke learning (Fig. 1). In this particular situation both reward prediction, and reward prediction errors, increase simultaneously – but this is not always the case. As just one example, people and other animals are often motivated to work for rewards even when little or nothing surprising occurs. They may work harder and harder as they get closer and closer to reward (value increases as rewards draw near). The point is that learning and motivation are conceptually, computationally, and behaviorally distinct – and yet dopamine seems to do both.
Below I critically assess current ideas about how dopamine is able to achieve both learning and motivational functions. I propose an updated model, based on three key facts: 1) dopamine release from terminals does not arise simply from dopamine cell firing, but can also be locally controlled; 2) dopamine affects both synaptic plasticity and excitability of target cells, with distinct consequences for learning and performance respectively; 3) dopamine effects on plasticity can be switched on or off by nearby circuit elements. Together, these features may allow brain circuits to toggle between two distinct dopamine messages, for learning and motivation respectively.
Are there separate “phasic” and “tonic” dopamine signals, with different meanings?
It is often argued that the learning and motivational roles of dopamine occur on different time scales25. Dopamine cells fire continuously (“tonically”) at a few spikes per second, with occasional brief (“phasic”) bursts or pauses. Bursts, especially if artificially synchronized across dopamine cells, drive corresponding rapid increases in forebrain dopamine26 that are highly transient (sub-second duration27). The separate contribution of tonic dopamine cell firing to forebrain dopamine concentrations is less clear. Some evidence suggests this contribution is very small28. It may be sufficient to produce near-continuous stimulation of the higher-affinity D2 receptors, allowing the system to notice brief pauses in dopamine cell firing29 and use these pauses as negative prediction errors.
Microdialysis has been widely used to directly measure forebrain dopamine levels, albeit with low temporal resolution (typically averaging across many minutes). Such slow measurements of dopamine can be challenging to relate precisely to behavior. Nonetheless microdialysis of dopamine in the nucleus accumbens (NAc; ventral/medial striatum) shows positive correlations to locomotor activity30 and other indices of motivation5. This has been widely taken to mean that there are slow (“tonic”) changes in dopamine concentration, and that these slow changes convey a motivational signal. More specifically, computational models have proposed that tonic dopamine levels track the long-term average reward rate31 – a useful motivational variable for time allocation and foraging decisions. It is worth emphasizing that very few papers clearly define “tonic” dopamine levels – they usually just assume that dopamine concentration slowly changes over the multiple-minutes time scale of microdialysis.
Yet this “phasic dopamine=RPE/learning, tonic dopamine=motivation” view faces many problems. First, there is no direct evidence that tonic dopamine cell firing normally varies over slow time scales. Tonic firing rates do not change with changing motivation32,33. It has been argued that tonic dopamine levels change due to a changing proportion of active dopamine cells34,35. But across many studies in undrugged, unlesioned animals, dopamine cells have never been reported to switch between silent and active states.
Furthermore, the fact that microdialysis measures dopamine levels slowly does not mean that dopamine levels actually change slowly. We recently15 examined rat NAc dopamine in a probabilistic reward task, using both microdialysis and fast-scan cyclic voltammetry. We confirmed that mesolimbic dopamine, as measured by microdialysis, correlates with reward rate (rewards/min). However, even with an improved microdialysis temporal resolution (1 min), dopamine fluctuated as fast as we sampled it: we saw no evidence for an inherently-slow dopamine signal.
Using the still finer temporal resolution of voltammetry, we observed a close relationship between sub-second dopamine fluctuations and motivation. As rats performed the sequence of actions needed to achieve rewards, dopamine rose higher and higher, reaching a peak just as they obtained the reward (and dropping rapidly as they consumed it). We showed that dopamine correlated strongly with instantaneous state value, defined as the expected future reward discounted by the expected time needed to receive it. These rapid dopamine dynamics can also explain the microdialysis results, without invoking separate dopamine signals on different time scales. As animals experience more rewards, they increase their expectations of future rewards at each step in the trial sequence. Rather than a slowly-evolving average reward rate signal, the correlation between dopamine and reward rate is best explained as an average, over the prolonged microdialysis sample collection time, of these rapidly-evolving state values.
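This definition of state value can be written down directly. The sketch below assumes exponential per-step discounting (the discount function is a modeling choice, and the numbers are arbitrary); the point is simply that time-discounted value ramps upward as reward approaches.

```python
# Time-discounted state value: expected future reward, discounted by the
# expected time (number of steps) remaining until it is received.
# gamma and the reward size are arbitrary illustrative choices.
gamma = 0.8           # per-step discount factor
reward = 1.0          # expected reward at the end of the trial
steps_to_reward = 8   # steps remaining at trial start

values = [reward * gamma ** (steps_to_reward - t) for t in range(steps_to_reward + 1)]
# The value "ramp": lowest far from reward, equal to the reward at delivery.
print([round(v, 3) for v in values])
```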
This value interpretation of mesolimbic dopamine release is consistent with voltammetry results from other research groups, who have repeatedly found that dopamine release ramps up with increasing proximity to reward36–38 (Fig. 2). This motivational signal is not inherently “slow”, but rather can be observed across a continuous range of time scales. Although dopamine ramps can last several seconds when an approach behavior also lasts several seconds38, this reflects the time course of the behavior, rather than intrinsic dopamine dynamics. The relationship between mesolimbic dopamine release and fluctuating value is visible as fast as the recording technique permits, i.e. on a ~100 ms timescale with acute voltammetry electrodes15.
Fast dopamine fluctuations do not simply mirror motivation; they also immediately drive motivated behavior. Larger phasic responses of dopamine cells to trigger cues predict shorter reaction times on that very same trial39. Optogenetic stimulation of VTA dopamine cells makes rats more likely to begin work in our probabilistic reward task15, just as if they had a higher expectation of reward. Optogenetic stimulation of SNc dopamine neurons, or their axons in dorsal striatum, increases the probability of movement40,41. Critically, these behavioral effects are apparent within a couple hundred milliseconds of the onset of optogenetic stimulation. The ability of reward-predictive cues to boost motivation appears to be mediated by very rapid dopaminergic modulation of the excitability of NAc spiny neurons42. Since dopamine is changing quickly, and these dopamine changes affect motivation quickly, the motivational functions of dopamine are better described as fast (“phasic”), not slow (“tonic”).
Furthermore, invoking separate fast and slow time scales does not in itself solve the decoding problem faced by neurons with dopamine receptors. If dopamine signals learning, modulation of synaptic plasticity would seem an appropriate cellular response. But immediate effects on motivated behavior imply immediate effects on spiking – e.g. through rapid changes in excitability. Dopamine can have both of these postsynaptic effects (and more), so does a given dopamine concentration have a specific meaning? Or does this meaning need to be constructed – e.g. by comparing dopamine levels across time, or by using other coincident signals to determine which cellular machinery to engage? This possibility is discussed further below.
Does dopamine release convey the same information as dopamine cell firing?
The relationship between fast dopamine fluctuations and motivational value seems strange, given that dopamine cell firing instead resembles RPE. Furthermore, some studies have reported RPE signals in mesolimbic dopamine release43. It is important to note a challenge in interpreting some forms of neural data. Value signals and RPEs are correlated with each other – not surprisingly, as the RPE is usually defined as the change in value from one moment to the next (“temporal-difference” RPE). Because of this correlation it is critical to use experimental designs and analyses that distinguish value from RPE accounts. The problem is compounded when using a neural measure that relies on relative, rather than absolute, signal changes. Voltammetry analyses usually compare dopamine at some time point of interest to a “baseline” epoch earlier in each trial (to remove signal components that are non-dopamine-dependent, including electrode charging on each voltage sweep and drift over a timescale of minutes). But subtracting away a baseline can make a value signal resemble an RPE signal. This is what we observed in our own voltammetry data (Fig. 2e). Changes in reward expectation were reflected in changes in dopamine concentration early in each trial, and these changes are missed if one just assumes a constant baseline across trials15. Conclusions about dopamine release and RPE coding thus need to be viewed with caution. This data interpretation danger applies not only to voltammetry, but to any analysis that relies on relative changes – potentially including some fMRI and photometry44.
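This baseline-subtraction pitfall is easy to demonstrate with a toy simulation (the traces below are invented; only the logic matters). Two trials with different reward expectations differ throughout, including during the "baseline" epoch, but subtracting each trial's own baseline discards exactly that between-trial difference, leaving an RPE-like within-trial change.

```python
# Toy demonstration of the baseline-subtraction pitfall.
# Traces are invented; the discount factor and expectations are arbitrary.
gamma, steps = 0.8, 6

def value_trace(expectation):
    """Ramping state value toward a reward of expected size `expectation`."""
    return [expectation * gamma ** (steps - t) for t in range(steps + 1)]

low = value_trace(0.5)    # trial following poor recent rewards
high = value_trace(1.0)   # trial following rich recent rewards

# Absolute signals differ at every timepoint, including the early "baseline":
baseline_difference = high[0] - low[0]

# Per-trial baseline subtraction removes that between-trial information,
# leaving mainly within-trial change:
low_rel = [v - low[0] for v in low]
high_rel = [v - high[0] for v in high]
print(baseline_difference, low_rel[0], high_rel[0])
```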
Nonetheless, we still need to reconcile value-related dopamine release in NAc core with the consistent absence of value-related spiking by dopamine neurons13, even within the lateral VTA area that provides dopamine to NAc core45. One potential factor is that dopamine cells are usually recorded in head-restrained animals performing classical conditioning tasks, while dopamine release is typically measured in unrestrained animals actively moving through their environment. We proposed that mesolimbic dopamine might specifically indicate the value of “work”15 – that it reflects a requirement for devoting time and effort to obtain the reward. Consistent with this, dopamine increases with signals instructing movement, but not with signals instructing stillness, even when they indicate similar future reward46. If – as in many classical conditioning tasks – there is no benefit to active “work”, then dopaminergic changes indicating the value of work may be less apparent.
Even more important may be the fact that dopamine release can be locally controlled at the terminals themselves, and thus show spatio-temporal patterns independent of cell body spiking. For example, the basolateral amygdala (BLA) can influence NAc dopamine release even when VTA is inactivated47. Conversely, inactivating BLA reduces NAc dopamine release and corresponding motivated behavior, without apparently affecting VTA firing48. Dopamine terminals have receptors for a range of neurotransmitters, including glutamate, opioids, and acetylcholine. Nicotinic acetylcholine receptors allow striatal cholinergic interneurons (CINs) to rapidly control dopamine release49,50. Although it has long been noted that local control of dopamine release is potentially important7,51, it has not been included in computational accounts of dopamine function. I propose that dopamine release dynamics related to value coding arise largely through local control, even as dopamine cell firing provides important RPE-like signals for learning.
How can dopamine mean both learning and motivation without confusion?
In principle, a value signal is sufficient to convey RPE as well, since temporal-difference RPEs simply are rapid changes in value (Fig. 2b). For example, distinct intracellular pathways in target neurons might be differently sensitive to the absolute concentration of dopamine (representing value) versus fast relative changes in concentration (representing RPE). This scheme seems plausible, given the complex dopamine modulation of spiny neuron physiology52 and their sensitivity to temporal patterns of calcium concentration53. Yet this also seems somewhat redundant. If an RPE-like signal already exists in dopamine cell spiking, it ought to be possible to use it rather than re-deriving RPE from a value signal.
To appropriately use distinct RPE and value signals, dopamine-recipient circuits may actively switch how they interpret dopamine. There is intriguing evidence that acetylcholine may serve this switching role too. At the same time as dopamine cells fire bursts of spikes to unexpected cues, CINs show brief (~150 ms) pauses in firing, which do not scale with RPEs54. These CIN pauses can be driven by VTA GABAergic neurons55 as well as “surprise”-related cells in the intralaminar thalamus, and have been proposed to act as an associability signal promoting learning56. Morris and Bergman suggested54 that cholinergic pauses define temporal windows for striatal plasticity, during which dopamine can be used as a learning signal. Dopamine-dependent plasticity is continuously suppressed by mechanisms including muscarinic M4 receptors on direct-pathway striatal neurons57. Models of intracellular signaling suggest that during CIN pauses, the absence of M4 binding may act synergistically with phasic dopamine bursts to boost PKA activation58, thereby promoting synaptic change.
Striatal cholinergic cells are thus well-positioned to dynamically switch the meaning of a multiplexed dopaminergic message. During CIN pauses, relief of a muscarinic block over synaptic plasticity would allow dopamine to be used for learning. At other times release from dopamine terminals would be locally sculpted to affect ongoing behavioral performance. Currently, this suggestion is both speculative and incomplete. It has been proposed that CINs integrate information from many surrounding spiny neurons to extract useful network-level signals such as entropy59,60. But it is not at all clear that CIN activity dynamics can be used to generate dopamine value signals61, and also to gate dopamine learning signals.
Does dopamine mean the same thing throughout the forebrain?
As the RPE idea took hold, it was imagined that dopamine was a global signal, broadcasting an error message throughout striatal and frontal cortical targets. Schultz emphasized that monkey dopamine cells throughout VTA and SNc have very similar responses62. Studies of identified dopamine cells have also found quite homogeneous RPE-like responses in rodents, at least for lateral VTA neurons within classical conditioning contexts13. Yet dopamine cells are molecularly and physiologically diverse63–65 and there are now many reports that they show diverse firing patterns in behaving animals. These include phasic increases in firing to aversive events66 and trigger cues67 that fit poorly with the standard RPE account. Many dopamine cells show an initial short-latency response to sensory events that reflects surprise or “alerting” more than specific RPE coding68,69. This alerting aspect is more prominent in SNc69, where dopamine cells project more to “sensorimotor” dorsal/lateral striatum (DLS45,63). Subpopulations of SNc dopamine cells have also been reported to increase41 or decrease70 firing in conjunction with spontaneous movements, even without external cues.
Several groups used fiber photometry and the calcium indicator GCaMP to examine bulk activity of subpopulations of dopamine neurons71,72. Dopamine cells that project to the dorsal/medial striatum (DMS) showed transiently depressed activity to unexpected brief shocks, while those projecting to DLS showed increased activity71 – more consistent with an alerting response. Distinct dopaminergic responses in different forebrain subregions have also been observed using GCaMP to examine activity of dopamine axons and terminals40,72,73. Using two-photon imaging in head-restrained mice, Howe and Dombeck40 reported phasic dopamine activity related to spontaneous movements. This was predominantly seen in individual dopamine axons from SNc that terminated in dorsal striatum, while VTA dopamine axons in NAc responded more to reward delivery. Others also found reward-related dopaminergic activity in NAc, with DMS instead more linked to contralateral actions72 and the posterior tail of striatum responsive to aversive and novel stimuli74.
Direct measures of dopamine release also reveal heterogeneity between subregions30,75. With microdialysis we found dopamine to be correlated with value specifically in NAc core and ventral-medial frontal cortex, not in other medial parts of striatum (NAc shell, DMS) or frontal cortex. This is intriguing as it appears to map well to two “hotspots” of value coding consistently seen in human fMRI studies76,77. In particular the NAc BOLD signal, which has a close relationship to dopamine signaling78, increases with reward anticipation (value) – more than with RPE76.
Whether these spatial patterns of dopamine release arise from firing of distinct dopamine cell subpopulations, local control of dopamine release, or both, they challenge the idea of a global dopamine message. One might conclude that there are many different dopamine functions, with (for example) dopamine in dorsal striatum signaling “movement” and dopamine in ventral striatum signaling “reward”40. However, I favor another conceptual approach. Different striatal subregions get inputs from different cortical regions, and so will be processing different types of information. Yet each striatal subregion shares a common microcircuit architecture, including separate D1- versus D2- receptor bearing spiny neurons79, CINs, and so forth. Although it is common to refer to various striatal subregions (e.g. DLS, DMS, NAc core) as if they are discrete areas, there are no sharp anatomical boundaries between them (NAc shell is a bit more neurochemically distinct). Instead there are just gentle gradients in receptor density, interneuron proportions etc., which seem more like tweaks to the parameters of a shared computational algorithm. Given this common architecture, can we describe a common dopamine function, abstracted away from the specific information being handled by each subregion?
Striatal dopamine and the allocation of limited resources
I propose that a variety of disparate dopamine effects on ongoing behavior can be understood as modulation of resource allocation decisions. Specifically, dopamine provides estimates of how worthwhile it is to expend a limited internal resource, with the particular resource differing between striatal subregions. For “motor” striatum (~DLS) the resource is movement, which is limited because moving costs energy, and because many actions are incompatible with each other80. Increasing dopamine makes it more likely that an animal will decide it is worth expending energy to move, or move faster6,40,81. Note that a dopamine signal that encodes “movement is worthwhile” will produce correlations between dopamine and movement, even without dopamine encoding “movement” per se.
For “cognitive” striatum (~DMS) the resources are cognitive processes including attention (which is limited-capacity by definition82) and working memory83. Without dopamine, salient external cues that normally provoke orienting movements are neglected, as if considered less attention-worthy3. Furthermore, deliberately marshaling cognitive control processes is effortful (costly84). Dopamine – especially in DMS85 – plays a key role in deciding whether it is worth exerting this effort86,87. This can include whether to employ more cognitively-demanding, deliberative (“model-based”) decision strategies88.
For “motivational” striatum (~NAc) one key limited resource may be the animal’s time. Mesolimbic dopamine is not required when animals perform a simple, fixed action to rapidly obtain rewards89. But many forms of reward can only be obtained through prolonged work: extended sequences of unrewarded actions, as in foraging. Choosing to engage in work means that other beneficial ways of spending time must be foregone. High mesolimbic dopamine indicates that engaging in temporally-extended, effortful work is worthwhile, but as dopamine is lowered animals do not bother, and may instead just prepare to sleep90.
Within each cortico-striatal loop circuit dopamine’s contribution to ongoing behavior is thus both economic (concerned with resource allocation) and motivational (whether it is worthwhile to expend resources81). These circuits are not fully independent, but rather have a hierarchical, spiraling organization: more ventral portions of striatum influence dopamine cells that project to more dorsal portions5,91. In this way decisions to engage in work may also help invigorate the specific, briefer movements required. But overall, dopamine provides “activational” signals – increasing the probability that some decision is made – rather than “directional” signals specifying how resources should be spent5.
What is the computational role of dopamine as decisions are made?
One way of thinking about this activational role is in terms of decision-making “thresholds”. In certain mathematical models, decision processes build until they reach a threshold level, at which point the system becomes committed to an action92. Higher dopamine would be equivalent to a shorter distance-to-threshold, so that decisions are reached more rapidly. This idea is simplistic, yet makes quantitative predictions that have been confirmed: lowering thresholds for movement would cause a specific change in the shape of the reaction time distribution, just what is seen when amphetamine is infused into sensorimotor striatum20.
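The distance-to-threshold idea can be sketched as a simple accumulate-to-bound simulation (the drift, noise, and threshold values below are arbitrary; this is an illustration of the logic, not a fitted model). Treating higher dopamine as a lower threshold shifts the whole reaction time distribution toward faster responses.

```python
import random

def reaction_time(threshold, drift=0.1, noise=0.3, max_steps=10_000):
    """Accumulate noisy evidence until it crosses threshold; return the step count."""
    x, t = 0.0, 0
    while x < threshold and t < max_steps:
        x += drift + noise * random.gauss(0.0, 1.0)
        t += 1
    return t

random.seed(0)
# Assumption for illustration: higher dopamine = shorter distance-to-threshold.
rts_high_da = [reaction_time(threshold=5.0) for _ in range(2000)]
rts_low_da = [reaction_time(threshold=10.0) for _ in range(2000)]

def mean(xs):
    return sum(xs) / len(xs)

print(mean(rts_high_da), mean(rts_low_da))  # lower threshold -> faster on average
```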
Rather than fixed thresholds, behavioral and neural data may be better fit if thresholds decrease over time, as if decisions become increasingly urgent. Basal ganglia output has been proposed to provide a dynamically-evolving urgency signal, which invigorates selection mechanisms in cortex93. Urgency was also greater when future rewards were closer in time, making this concept similar to the value coding, activational role of dopamine.
Is such an activational role sufficient to describe the performance-modulating effects of striatal dopamine? This is related to the long-standing question of whether basal ganglia circuits directly select among learned actions80 or merely invigorate choices made elsewhere93,94. There are at least two ways in which dopamine can appear to have a more “directional” effect. The first is when dopamine acts within a brain subregion that processes inherently directional information. Basal ganglia circuits have an important, partly-lateralized role in orienting towards and approaching potential rewards. The primate caudate (~DMS) is involved in driving eye movements towards contralateral spatial fields95. A dopaminergic signal that something in contralateral space is worth orienting towards may account for the observed correlation between dopaminergic activity in DMS and contralateral movements72, as well as the rotational behavior produced by dopamine manipulations96. A second “directional” influence of dopamine is apparent when (bilateral) dopamine lesions bias rats towards low-effort / low-reward choices, rather than high-effort / high-reward alternatives97. This may reflect the fact that some decisions are more serial than parallel, with rats (and humans) evaluating options one-at-a-time98. In these decision contexts dopamine may still play a fundamentally activational role by conveying the value of the currently-considered option, which can then be accepted or not24.
Active animals make decisions at multiple levels, often at high rates. Beyond thinking about individual decisions, it may be helpful to consider an overall trajectory through a sequence of states (Fig. 1). By facilitating transitions from one state to the next, dopamine may accelerate the flow along learned trajectories99. This may relate to the important influence of dopamine over the timing of behavior44,100. One key frontier for future work is to gain a deeper understanding of how such dopamine effects on ongoing behavior arise mechanistically, by altering information processing within single cells, microcircuits and large-scale cortical-basal ganglia loops. Also, I have emphasized common computational roles of dopamine across a range of striatal targets, but largely neglected cortical targets, and it remains to be seen whether dopamine functions in both structures can be described within the same framework.
In summary, an adequate description of dopamine would explain how dopamine can signal both learning and motivation on the same fast time scales, without confusion. It would explain why dopamine release in key targets covaries with reward expectation even though dopamine cell firing does not. And it would provide a unified computational account of dopamine actions throughout striatum and elsewhere, which explains disparate behavioral effects on movement, cognition, and timing. Some specific ideas presented here are speculative, but are intended to invigorate renewed discussion, modeling, and incisive new experiments.
I thank the many colleagues who provided insightful comments on earlier text drafts, including Kent Berridge, Peter Dayan, Brian Knutson, Jeff Beeler, Peter Redgrave, John Lisman, Jesse Goldberg, and the anonymous Referees. I regret that space limitations precluded discussion of many important prior studies. Essential support was provided by the National Institute on Neurological Disorders and Stroke, the National Institute of Mental Health, and the National Institute on Drug Abuse.