Dissociable dopamine dynamics for learning and motivation (2019)

https://www.nature.com/articles/s41586-019-1235-y

Nature (2019)

Abstract

The dopamine projection from ventral tegmental area (VTA) to nucleus accumbens (NAc) is critical for motivation to work for rewards and reward-driven learning. How dopamine supports both functions is unclear. Dopamine cell spiking can encode prediction errors, which are vital learning signals in computational theories of adaptive behaviour. By contrast, dopamine release ramps up as animals approach rewards, mirroring reward expectation. This mismatch might reflect differences in behavioural tasks, slower changes in dopamine cell spiking or spike-independent modulation of dopamine release. Here we compare spiking of identified VTA dopamine cells with NAc dopamine release in the same decision-making task. Cues that indicate an upcoming reward increased both spiking and release. However, NAc core dopamine release also covaried with dynamically evolving reward expectations, without corresponding changes in VTA dopamine cell spiking. Our results suggest a fundamental difference in how dopamine release is regulated to achieve distinct functions: broadcast burst signals promote learning, whereas local control drives motivation.

Main

Dopamine is famously related to ‘reward’—but how exactly? One function involves learning from unexpected rewards. Brief increases in dopamine cell firing encode reward prediction errors (RPEs)^1,2,3—learning signals for optimizing future motivated behaviour. Dopamine manipulations can affect learning as if they are altering RPEs^4,5,6, but they also affect motivated behaviours immediately, as if dopamine signals reward expectation (value)⁵. Furthermore, NAc dopamine escalates during motivated approach, consistent with dopamine encoding value^7,8,9,10,11.

With few exceptions^2,12,13, midbrain dopamine firing has been examined during classical conditioning in head-fixed animals^3,14, unlike forebrain dopamine release. We therefore compared firing with release under the same conditions. We identified VTA dopamine neurons using optogenetic tagging^3,13. To measure NAc dopamine release, we used three independent methods—microdialysis, voltammetry and the optical sensor dLight¹⁵—with convergent results. Our primary conclusion is that although RPE-scaled VTA dopamine spike bursts provide abrupt changes in dopamine release appropriate for learning, separate NAc dopamine fluctuations associated with motivation arise independently from VTA dopamine cell firing.

Dopamine tracks motivation in key loci

We trained rats in an operant ‘bandit’ task⁵ (Fig. 1a, b). On each trial, illumination of a nose-poke port (‘Light-on’) prompted approach and entry (‘Centre-in’). After a variable hold period (0.5–1.5 s), white noise (‘Go cue’) led the rat to withdraw (‘Centre-out’) and poke an adjacent port (‘Side-in’). On rewarded trials, this Side-in event was accompanied by a food-hopper click that prompted the rat to approach a food port (‘Food-port-in’) to collect a sugar pellet. Leftward and rightward choices were each rewarded with independent probabilities, which occasionally changed without warning. When rats were more likely to receive rewards, they were more motivated to perform the task. This was apparent in their ‘latency’—the time between Light-on and Centre-in—which was sensitive to the outcome of the preceding few trials (Extended Data Fig. 1) and thereby scaled inversely with reward rate (Fig. 1b).

Fig. 1: Dopamine release covaries with reward rate specifically in NAc core and ventral prelimbic cortex.

We previously reported⁵ a correlation between NAc dopamine release and reward rate, consistent with the motivational role of mesolimbic dopamine¹⁶. Here, we first aimed to determine whether this relationship is observed throughout forebrain targets, consistent with ‘globally broadcast’ dopamine signalling¹⁷, or is restricted to specific subregions. We further hypothesized that these dopamine dynamics would differ between striatum and cortex, as these structures have distinct dopamine uptake–degradation kinetics¹⁸ and may use dopamine for distinct functions^19,20.

Using microdialysis with high performance liquid chromatography–mass spectrometry (HPLC–MS), we surveyed medial frontal cortex and striatum (Fig. 1c, Extended Data Fig. 1). We simultaneously assayed 21 neurotransmitters and metabolites with 1-min time resolution, and used regression to compare chemical time series with behavioural variables (Extended Data Fig. 2).

We replicated the correlation between reward rate and NAc dopamine—in contrast to other neurotransmitters (Fig. 1c, d). However, this relationship was localized to NAc core, and did not hold in the NAc shell or dorsal–medial striatum. Contrary to our hypothesis, we observed a similar spatial pattern in frontal cortex: dopamine release correlated with reward rate in ventral prelimbic cortex, but not in more dorsal or ventral subregions (Fig. 1c, e). Though unexpected, these twin ‘hotspots’ of value-related dopamine release have an intriguing parallel in human neuroimaging: blood oxygen level-dependent signal correlates with subjective value, specifically in NAc and ventral–medial prefrontal cortex²¹.

VTA firing is unrelated to motivation

We next addressed whether this motivation-related forebrain dopamine arises from variable firing of midbrain dopamine cells. The NAc core receives dopamine input from lateral portions of VTA (VTA-l)^6,22,23. In head-fixed mice, VTA-l dopamine neurons reportedly have uniform, RPE-like responses to conditioned stimuli³. To record VTA-l dopamine cells, we infected the VTA with adeno-associated virus (AAV) for Cre-dependent expression of channelrhodopsin (AAV-DIO-ChR2) in rats that express Cre recombinase under a tyrosine hydroxylase (TH) promoter (see Methods). Optrodes (Fig. 2a, b) recorded single-unit responses to brief blue-laser pulses (Fig. 2c, Extended Data Figs. 3, 4, Supplementary Fig. 1). We found 27 well-isolated VTA-l cells with reliable short-latency spikes, and identified them as dopamine neurons.

Fig. 2: Activity of identified VTA dopamine neurons does not change with reward rate.

All dopamine neurons were tonically active, with relatively low firing rates (mean 7.7 Hz, range 3.7–12.9 Hz; compared to all VTA-l neurons recorded together with dopamine cells, P < 0.001 one-tailed Mann–Whitney test). They also had longer-duration spike waveforms (P < 5 × 10⁻⁶, one-tailed Mann–Whitney test), although there were exceptions (Fig. 2d), which confirms that waveform duration is an insufficient marker of dopamine cells in vivo^3,24. A distinct cluster of VTA-l neurons (n = 38, from the same sessions) with brief waveforms and higher firing rates (>20 Hz; mean 41.3 Hz, range 20.1–97.1 Hz) included no tagged dopamine cells. We presume that these faster-firing cells are GABAergic and/or glutamatergic^3,25, and refer to them as ‘non-dopamine’ below.

We recorded the same dopamine cells across multiple behavioural tasks. VTA-l dopamine cells responded strongly to randomly timed food-hopper clicks, and progressively less strongly when these clicks were made more predictable by preceding cues (Extended Data Fig. 5). This is consistent with canonical RPE-like coding by dopamine cells in Pavlovian tasks^2,3,26.

On the basis of evidence from anaesthetized animals, it has previously been argued that altered dopamine levels measured with microdialysis arise from changes in the tonic firing rate of dopamine cells²⁷ and/or the proportion of active versus inactive dopamine neurons²⁸. However, in the bandit task, tonic dopamine cell firing in each block of trials was indifferent to reward rate (Fig. 2e, g). There was no significant change in the firing rates of individual dopamine cells, or those of any other VTA-l neurons, between higher- and lower-reward blocks (Fig. 2f, h; see also ref. ²⁹ for concordant results in head-fixed mice). There was also no overall change in the rate at which dopamine cells fire bursts of spikes (Fig. 2i). Furthermore, we did not observe any dopamine cells switching between active and inactive states. The proportion of time dopamine cells spent inactive (long inter-spike intervals) was very low, and did not change between higher- and lower-reward blocks (Fig. 2i).

The anatomy of the VTA–NAc dopamine projection has been intensively investigated^6,22,23, but—given this apparent functional mismatch between firing and release—we reconfirmed that we were recording from the correct portion of the VTA. Small injections of the retrograde tracer cholera toxin B (CTb) into NAc core resulted in dense labelling of TH⁺ neurons within the same VTA-l area as our optrode recordings (Extended Data Fig. 3). Within the approximate recording zone, 21% of TH⁺ cells were also CTb⁺, and this is likely to be an underestimate of the fraction of NAc core-projecting VTA-l dopamine cells, as our tracer injections did not completely fill the NAc core. Thus, our sample of n = 27 tagged VTA dopamine cells (plus many more untagged cells) almost certainly includes NAc core-projecting neurons. Finally, in an additional rat we recorded two tagged VTA-l dopamine cells after infusing AAV selectively into the NAc core (Extended Data Fig. 3). Both retrogradely infected cells had firing patterns that closely resembled the other tagged dopamine cells in all respects, including a lack of tonic firing changes with varying reward rate (Supplementary Fig. 1). We conclude that changes in tonic VTA-l dopamine cell firing are not responsible for motivation-related changes in forebrain dopamine release.

Tracking release on multiple timescales

Does NAc dopamine release track reward rate per se, as suggested in some theories³⁰, or is this correlation driven by dynamic fluctuations in dopamine release that are too fast to resolve with microdialysis? We argued for the latter possibility on the basis of voltammetry data⁵, but sought confirmation using an independent measure of dopamine release that can span different timescales. The dLight1 suite of genetically encoded optical dopamine indicators was engineered by inserting circularly permuted GFP into dopamine D1 receptors¹⁵. Binding of dopamine causes a highly specific increase in fluorescence (Fig. 3a). We infused AAV into NAc to express either dLight1.1 (four verified NAc placements from three rats) or the brighter variant dLight1.3b (six verified NAc placements from four rats) and monitored fluorescence by fibre photometry. We observed clear NAc dopamine responses to Pavlovian reward-predictive cues, similar to VTA dopamine cell firing (Extended Data Fig. 5).

Fig. 3: Bridging timescales of dopamine measurement.

For the bandit task, we first examined the dLight signal in 1-min bins (Fig. 3b) for comparison to microdialysis. We again saw a clear relationship between NAc dopamine release and reward rate, in both cross-correlation and analysis of block transitions (Fig. 3c, d). We next examined more closely how this relationship arises. Rather than slowly varying on a timescale of minutes, the dLight signal showed highly dynamic fluctuations within and between each trial (Fig. 3e). We compared these fluctuations to instantaneous state values and RPEs estimated from a reinforcement-learning model (a semi-Markov decision process⁵). As was previously reported using voltammetry⁵, moment-by-moment NAc dopamine showed a strong correlation with state values (Fig. 3f), visible as ramping up within trials when rewards were expected (Fig. 3e). We also saw transient increases with less-expected reward deliveries, consistent with RPE (examined below). In every dLight session, dopamine showed a stronger correlation with values than either RPEs or reward rate (Fig. 3h, Extended Data Fig. 6). Correlations with both state values and RPE were maximal with respect to the dLight signal ~0.3 s later, consistent with a brief lag caused by neural processing of cues and sensor-response time (Fig. 3g; with voltammetry, we reported a lag of 0.4–0.5 s)⁵.

Dopamine firing does not explain release

We next compared dopamine cell firing and release around bandit-task events. External stimuli at Light-on, Go cue and rewarded Side-in (food-hopper click) each evoked a rapid firing increase (Fig. 4a). These responses were observed in the great majority of dopamine cells (Fig. 4c), although the relative magnitude of responses to different cues varied from cell to cell (Supplementary Fig. 1). The NAc dLight signal also responded rapidly and reliably to each of these salient cues (Fig. 4b, c), consistent with burst firing of dopamine cells driving dopamine release.

Fig. 4: Phasic VTA dopamine firing does not account for NAc dopamine dynamics.

We also saw clear increases in NAc dopamine release as rats approached the start port (just before Centre-in) and the food port (just before Food-port-in). This fits well with the extensive voltammetry literature showing that motivated approach behaviours are accompanied by rapid increases in NAc core dopamine^5,7,8,9,10,11. However, the VTA-l dopamine cell population did not show a corresponding increase in firing at these times (Fig. 4a; see Extended Data Fig. 7 for additional comparisons, including to non-dopamine cells).

To better dissociate cue-evoked and approach-related dopamine activity, we separated trials by short (<1 s) and long (>2 s) latencies (Fig. 4d, e). Increases in dopamine cell firing were consistently locked to the cue onset at Light-on, preferentially for short-latency trials. All 25 dopamine cells with significant firing rate increases after Light-on were better aligned to Light-on than Centre-in (Fig. 4e). By contrast, increases in NAc dopamine release before Centre-in were distinct from cue-evoked dopamine release (Fig. 4d, e). dLight signals consistently increased before Centre-in on long-latency trials (ten out of ten sessions) and before Food-port-in (nine out of ten sessions), without corresponding increases in dopamine firing (Fig. 4f).

Finally, we considered how event-related dopamine signals depend on recent reward history. During the early part of each trial, dopamine cell firing was not dependent on reward rate (Fig. 5a), despite the influence of reward rate on motivation (Fig. 5b). Subsequently, the phasic response to the reward cue at Side-in was reliably stronger when the reward rate was lower (Fig. 5a), consistent with positive RPE encoding. When this reward cue was omitted, dopamine cells paused firing, though encoding of negative RPEs was much weaker or absent, whether examined at the population level (Fig. 5a, b) or as individual cells (Extended Data Fig. 8). It has previously been proposed that negative RPEs are encoded in the duration of dopamine pauses³¹, but this was observed in just 2 out of 29 individual neurons. Similar results were obtained if reward expectation was estimated in other ways, including trial-based reinforcement learning models (actor-critic and Q-learning) or simply by counting recent rewards (Extended Data Fig. 8).

Fig. 5: Reward history affects VTA dopamine cell firing and NAc dopamine release differently.

Dopamine release at Side-in also showed a clear, transient encoding of positive RPEs, but not of negative RPEs (Fig. 5c, d). This dLight response was slightly delayed and prolonged compared to firing, consistent with time taken for release and reuptake³², but remained a subsecond phenomenon. Unlike firing, however, dLight signals early in each trial were greater when recent trials had been rewarded (Fig. 5c), consistent with value coding. We observed this dependence on reward history even when the rat was not actively moving, but was maintaining a nose poke in the centre port while waiting for the Go cue (Fig. 5d). Overall, we conclude that NAc dopamine release reflects both cue-evoked responses and reward expectation, and that only the former can be well accounted for by VTA-l dopamine cell firing.

Discussion

VTA-l provides the predominant source of dopamine to the NAc core^6,22,23. VTA-l dopamine cells, including those that project to the NAc core, consistently display RPE-encoding bursts^3,12. VTA bursts are thought to be particularly important for driving NAc dopamine³², and indeed we found that cue-evoked VTA bursts were matched by NAc release. However, we additionally found value-related patterns of NAc dopamine release that were not generated by firing of VTA-l dopamine cells, either on long (tonic) or short (phasic) timescales. Other dopamine subpopulations may carry distinct signals^13,33,34, and we cannot rule out the possibility that firing of dopamine cell subpopulations not recorded from here produces value-related dopamine in NAc core. However, value-related firing has never been reported for any dopamine cells, across a wide range of studies. Our results suggest that NAc dopamine dynamics are controlled in different ways, at different times and for different functions, and that recording dopamine cells is important but not sufficient for understanding dopamine signals³⁵.

Release from dopamine terminals is potently influenced by local, non-spiking mechanisms^36,37,38,39,40. For example, NAc dopamine release is modulated by the basolateral amygdala even when VTA spiking is pharmacologically suppressed^41,42. It has been noted for decades that local control of dopamine release might achieve functions distinct from those of dopamine cell spiking^36,43, but this has not been incorporated into theoretical views of dopamine. Distinct striatal subregions contribute to different types of decisions, and may influence their own dopamine release according to need⁴⁴. It remains to be determined just how localized this control of dopamine release can be. One limitation shared by the three ways that we measured dopamine release is that they all sample on a spatial scale of at least 100 µm, whereas in vivo microscopy suggests that dopamine release may be heterogeneous at considerably smaller scales¹⁵.

Our results do not support the existence of any separate tonic dopamine signal that could mediate motivational effects of dopamine. Instead, dopamine shifts that appear slow if measured slowly (with microdialysis) resolve into rapid fluctuations if measured rapidly (with voltammetry or dLight). Furthermore, recordings of identified VTA dopamine cells by ourselves and others²⁹ provide strong evidence against the idea³⁰ that changes in tonic dopamine cell firing drive tonic changes in dopamine release. Although tonic firing can be altered by lesions or drug manipulations²⁸, we are not aware of sustained changes in firing rate in any behavioural task. Firing can ramp downwards on a timescale of about 1 s during anticipation of motivationally relevant events^45,46. However, this decline is the opposite of what would be required to boost dopamine release with reward expectation, and instead bears more resemblance to a sequence of transient negative prediction errors⁴⁷. Although sustained signals encoding ongoing reward rate could be computationally useful³⁰, dopamine instead provides rapidly fluctuating error and value signals. It remains possible that sustained signals are computed at a subsequent step, by intracellular signalling pathways downstream of dopamine receptors.

Many groups have observed ramping dopamine release as rats approach rewards^5,7,8,9,10,11, consistent with encoding escalating reward expectations. Some have argued that these dopamine ramps simply reflect RPEs, by supposing either that rats rapidly forget values⁴⁸ or that they have a warped set of state representations⁴⁹. This latter idea is not supported by our observation that ramping is rapidly modulated from trial to trial on the basis of updated reward expectations, becoming stronger within a short sequence of successive rewards while RPE-like responses to cues become weaker (Fig. 3e). More generally, any theory in which dopamine solely conveys RPEs (learning signals) cannot account for the very well-established connection between ongoing mesolimbic dopamine and motivation¹⁶. The NAc core is not needed for highly trained responses to conditioned stimuli, but is particularly important when deciding to perform time-consuming work to obtain rewards⁵⁰. NAc core dopamine appears to provide an essential dynamic signal of how worthwhile it is to allocate time and effort to work^5,44, even though this signal is not present in VTA dopamine cell firing.

Methods

Animals

All animal procedures were approved by the University of Michigan or University of California San Francisco Institutional Committees on Use and Care of Animals. Male rats (300–500 g, either wild-type Long-Evans or TH-Cre⁺ with a Long-Evans background⁵²) were maintained on a reverse 12:12 light:dark cycle and tested during the dark phase. Rats were mildly food deprived, receiving 15 g of standard laboratory rat chow daily in addition to food rewards earned during task performance. No sample size precalculation was performed. The investigators were not blinded to allocation during experiments and outcome assessment.

Behaviour

Pretraining and testing were performed in computer-controlled Med Associates operant chambers (25 cm × 30 cm at widest point) each with a five-hole nose-poke wall, as previously described⁵. Bandit-task sessions used the following parameters: block lengths were 35–45 trials, randomly selected for each block; hold period before Go cue was 500–1,500 ms (uniform distribution); left–right reward probabilities were 10, 50 and 90% (for electrophysiology, photometry, voltammetry and previously reported microdialysis rats⁵) or 20, 50 and 80% (newly reported microdialysis rats).

Current reward rate was estimated using a time-based leaky-integrator⁵³. Reward rate was incremented each time a reward was received, and decayed exponentially at a rate set by parameter τ (the time in s for the reward rate to decrease by ~63%, that is, 1−1/e). For all analyses, τ was selected on the basis of the rat’s behaviour, maximizing the (negative) correlation between reward rate and log(latency) in each session. The correlations between forebrain dopamine and reward rate were not highly sensitive to this choice of τ (Extended Data Fig. 1).
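The leaky-integrator estimate and the selection of τ described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the unit increment per reward, the function names (leaky_reward_rate, select_tau) and the idea of evaluating reward rate at each Light-on time are assumptions made for the example.

```python
import math

def leaky_reward_rate(reward_times, query_times, tau):
    """Leaky-integrator reward rate at each query time (all times in s).

    Each reward adds 1 (assumed increment); between events the estimate
    decays as exp(-dt / tau). query_times must be in ascending order.
    """
    events = sorted(reward_times)
    rates = []
    rate, last_t, i = 0.0, None, 0
    for t in query_times:
        # Fold in every reward delivered up to and including time t.
        while i < len(events) and events[i] <= t:
            if last_t is not None:
                rate *= math.exp(-(events[i] - last_t) / tau)
            rate += 1.0
            last_t = events[i]
            i += 1
        # Decay from the most recent reward to the query time.
        if last_t is None:
            rates.append(0.0)
        else:
            rates.append(rate * math.exp(-(t - last_t) / tau))
    return rates

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_tau(reward_times, light_on_times, latencies, candidate_taus):
    """Pick the tau giving the most negative correlation between
    reward rate (sampled at Light-on) and log(latency)."""
    log_latency = [math.log(v) for v in latencies]

    def corr(tau):
        rates = leaky_reward_rate(reward_times, light_on_times, tau)
        return pearson(rates, log_latency)

    return min(candidate_taus, key=corr)
```

With a single reward at t = 0 and τ = 10 s, the estimate at t = 10 s is 1/e of its initial value, that is, a decrease of about 63%, matching the definition of τ given above.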

To classify block transitions as ‘increasing’ or ‘decreasing’ in reward rate, we compared the average leaky-integrator reward rate in the last 5 min of a block to the average reward rate in the first 8 min of the subsequent block.

Rats used for electrophysiology and photometry also performed a Pavlovian approach task, in the same operant chamber with the houselight on throughout the session. Three auditory cues (2 kHz, 5 kHz and 9 kHz) were associated with different probabilities of food delivery (counterbalanced across rats). Cues were played as a train of tone pips (100 ms on, 50 ms off) for a total duration of 2.6 s followed by a delay period of 500 ms. Cues and unpredicted reward deliveries were delivered in pseudorandom order with a variable inter-trial interval (15–30 s, uniform distribution).

Microdialysis

Surgery

Rats were implanted bilaterally with guide cannulae (CMA, 830 9024) in cortex and striatum. One group (n = 8) received one guide cannula targeting prelimbic and infralimbic cortex (anteroposterior (AP) +3.2 mm, mediolateral (ML) 0.6 mm relative to bregma; and dorsoventral (DV) 1.4 mm below brain surface) and another targeting dorsomedial striatum and nucleus accumbens in the opposite hemisphere (AP +1.3, ML 1.9 and DV 3.4). Both implants were angled 5 degrees away from each other along the rostral–caudal plane. A second group (n = 4) received one guide cannula targeting anterior cingulate cortex (AP +1.6, ML 0.8 and DV 0.8) and another targeting accumbens core/shell in the opposite hemisphere (AP +1.6, ML 1.4 and DV 5.5 (n = 2) or AP +1.6, ML 1.9 and DV 5.7 (n = 2)). Implant sides were counterbalanced across rats. Animals were allowed to recover for one week before retraining.

Chemicals

Water, methanol and acetonitrile for mobile phases were Burdick & Jackson HPLC grade, purchased from VWR (Radnor). All other chemicals were purchased from Sigma Aldrich unless otherwise noted. Artificial cerebrospinal fluid (aCSF) comprised 145 mM NaCl, 2.68 mM KCl, 1.40 mM CaCl₂, 1.01 mM MgSO₄, 1.55 mM Na₂HPO₄ and 0.45 mM NaH₂PO₄, with pH adjusted to 7.4 using NaOH. Ascorbic acid (250 nM final concentration) was added to reduce oxidation of analytes.

Sample collection and HPLC-MS

On testing day, animals were placed in the operant chamber with the houselight on. Custom-made concentric polyacrylonitrile membrane microdialysis probes (1-mm dialysing AN69 membrane; Hospal) were inserted bilaterally into the guide cannulae and perfused continuously (Chemyx, Fusion 400) with aCSF at 2 µl/min for 90 min to allow equilibration. After 5-min baseline collection the houselight was extinguished, cueing the animal to bandit-task availability. Sample collection continued at 1-min intervals and samples were immediately derivatized⁵⁴ with 1.5 µl sodium carbonate, 100 mM; 1.5 µl benzoyl chloride (2% (v/v) benzoyl chloride in acetonitrile); and 1.5 µl isotopically labelled internal standard mixture diluted in 50% (v/v) acetonitrile containing 1% (v/v) sulfuric acid, and spiked with deuterated ACh and choline (C/D/N isotopes) to a final concentration of 20 nM. Sample series collection alternated between the two probes at 30-s intervals in each of 26 sessions, except for one session in which a broken membrane resulted in just one series (51 sample series total). Samples were analysed using Thermo Scientific UHPLC systems (Accela, or Vanquish Horizon interfaced to a Quantum Ultra triple quadrupole mass spectrometer fitted with a HESI II ESI probe), operating in multiple reaction monitoring. Five-microlitre samples were injected onto a Phenomenex core-shell biphenyl Kinetex HPLC column (2.1 mm × 100 mm). Mobile phase A was 10 mM ammonium formate with 0.15% formic acid, and mobile phase B was acetonitrile. The mobile phase was delivered as an elution gradient at 450 µl/min as follows: initial, 0% B; 0.01 min, 19% B; 1 min, 26% B; 1.5 min, 75% B; 2.5 min, 100% B; 3 min, 100% B; 3.1 min, 5% B; and 3.5 min, 5% B. Thermo Xcalibur QuanBrowser (Thermo Fisher Scientific) was used to automatically process and integrate peaks. Each of the >100,000 peaks was visually inspected individually to ensure proper integration.

Analysis

All neurochemical concentration data were smoothed with a three-point moving average (y′(t) = 0.25 × y(t − 1) + 0.5 × y(t) + 0.25 × y(t + 1), where t indexes successive samples) and z-score normalized within each session to facilitate between-session comparisons. For each target region, a cross-correlogram was generated for each session and the average of the sessions was plotted. One per cent confidence boundaries were generated for each subplot by shuffling one time series 100,000 times and generating a distribution of correlation coefficients for each session. Multiple regression models were generated using the regress function in MATLAB, with the neurochemical as the outcome variable and behavioural metrics as predictors. Regression coefficients were determined significant at three alpha levels (0.05, 0.0005 and 0.000005), after Bonferroni correction for multiple comparisons (alpha/(21 chemicals × 7 regions × 9 behavioural regressors)). For analysis of block transitions, data were binned into 3-min epochs, discarding the sample that included the transition time.
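As a rough illustration of the preprocessing and the multiple-comparison correction described above, the following sketch applies the (0.25, 0.5, 0.25) moving average, z-scores a series and computes the Bonferroni-corrected thresholds. The function names are hypothetical, and two details are assumptions not stated in the text: endpoints are left unsmoothed, and the population standard deviation is used for the z-score.

```python
import statistics

def smooth3(y):
    """Three-point weighted moving average with weights 0.25, 0.5, 0.25.

    Endpoints are copied unchanged (assumed; edge handling is not
    specified in the text).
    """
    out = list(y)
    for t in range(1, len(y) - 1):
        out[t] = 0.25 * y[t - 1] + 0.5 * y[t] + 0.25 * y[t + 1]
    return out

def zscore(y):
    """Z-score a series within one session (population s.d., assumed)."""
    mu = statistics.fmean(y)
    sd = statistics.pstdev(y)
    return [(v - mu) / sd for v in y]

def bonferroni_thresholds(alphas=(0.05, 0.0005, 0.000005),
                          n_chemicals=21, n_regions=7, n_regressors=9):
    """Per-test P thresholds: alpha divided by the number of comparisons."""
    n_tests = n_chemicals * n_regions * n_regressors  # 21 * 7 * 9 = 1323
    return [a / n_tests for a in alphas]
```

With 21 × 7 × 9 = 1,323 comparisons, the least stringent corrected threshold is 0.05/1,323, or roughly 3.8 × 10⁻⁵.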

Electrophysiology

Rats (n = 25) were implanted with custom-designed drivable optrodes, each consisting of 16 tetrodes (constructed from 12.5-µm nichrome wire, Sandvik) glued onto the side of a 200-µm optic fibre and extending up to 500 µm below the fibre tip. During the same surgery, we injected 1 µl AAV2/5-EF1a-DIO-ChR2(H134R)-EYFP into the lateral VTA (AP 5.6, ML 0.8, DV 7.5) or NAc core (AP 1.6, ML 1.6, DV 6.4). Wideband (1–9,000 Hz) brain signals were sampled (30,000 samples per s) using Intan digital headstages. Optrodes were lowered at least 80 µm at the end of each recording session. Individual units were isolated offline using a MATLAB implementation of MountainSort⁵⁵ followed by careful manual inspection.

Classification

To identify whether an isolated VTA-l unit was dopaminergic (TH⁺), we used the stimulus-associated latency test⁵⁶. In brief, at the end of each experimental session, we connected the optrode to a laser diode and delivered light pulse trains of different widths and frequencies. For a unit to be identified as light-responsive it needed to reach the significance level of P < 0.001 for 5-ms and 10-ms pulse trains. We also compared the light-evoked waveforms (within 10 ms of laser pulse onset) to session-wide averages; all light-evoked units had a Pearson correlation coefficient of >0.9. Dopamine neurons were successfully recorded from four rats with VTA-l AAV infusions (IM657, 1 unit; IM1002, 3 units; IM1003, 15 units; IM1037, 8 units) and one rat with NAc core AAV (IM-1078, 2 units). Peak width was defined as the full-width-at-half-maximum of the most prominent negative component of the aligned, averaged spike waveform. Non-tagged VTA neurons with session-wide firing rate >20 Hz and peak width <200 µs were classified as non-dopamine cells. To ensure that we were comparing dopamine and non-dopamine cells within the same subregions, we only analysed non-dopamine cells recorded during sessions with at least one optically tagged dopamine cell.
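The peak-width measurement and the non-dopamine criterion can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the waveform is assumed to be baseline-subtracted (so that half maximum is half the trough amplitude), crossings are found by linear interpolation, and the 30-kHz sampling rate is taken from the recording description.

```python
def peak_width_us(waveform, fs_hz=30000.0):
    """Full width at half maximum (in µs) of the largest negative deflection.

    Assumes a baseline-subtracted average waveform with a negative trough.
    """
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    half = waveform[trough] / 2.0

    def crossing(step):
        # Walk away from the trough while the trace stays below half
        # maximum, then interpolate linearly to the crossing point.
        i = trough
        while 0 <= i + step < len(waveform) and waveform[i + step] < half:
            i += step
        j = i + step
        if not 0 <= j < len(waveform):
            return float(i)  # trace never recovers within the window
        frac = (half - waveform[i]) / (waveform[j] - waveform[i])
        return i + step * frac

    width_samples = crossing(+1) - crossing(-1)
    return width_samples / fs_hz * 1e6

def classify_unit(tagged, firing_rate_hz, width_us):
    """Apply the criteria in the text: tagged units are dopamine cells;
    untagged units with rate >20 Hz and peak width <200 µs are non-dopamine."""
    if tagged:
        return "dopamine"
    if firing_rate_hz > 20.0 and width_us < 200.0:
        return "non-dopamine"
    return "unclassified"
```

Untagged units that meet neither criterion are labelled "unclassified" here purely for the example; the text does not name that group.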

Analysis

Spike bursts were detected by the conventional ‘80/160 template’ approach⁵⁷: each time an inter-spike-interval of 80 ms or less occurs, these and subsequent spikes are considered part of a burst until there is an interval of 160 ms or more. For comparison of ‘tonic’ firing to reward rate, dopamine spikes were counted in 1-min bins. To examine faster changes, spike density functions were constructed by convolving spike trains with a Gaussian kernel with variance 20 ms. To determine how quickly a neuron responded to a given cue, we used 40-ms bins (sliding in steps of 20 ms) and used a shuffle test (10,000 shuffles) for each time bin comparing the firing rate after cue onset to firing rate in the 250 ms immediately preceding the cue. The first bin at which the post-cue firing rate was significantly (P < 0.01, correcting for multiple comparisons) greater than baseline firing was considered the time to cue response.
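The 80/160 burst criterion lends itself to a compact illustration. The sketch below is one straightforward reading of the rule as stated (a burst starts at an inter-spike interval of 80 ms or less and ends at an interval of 160 ms or more); the function name and the choice to return each burst as a list of spike times are assumptions.

```python
def detect_bursts(spike_times, start_isi=0.080, end_isi=0.160):
    """Group sorted spike times (s) into bursts using the 80/160 rule.

    A burst begins when an inter-spike interval is <= start_isi and
    continues until an interval >= end_isi occurs.
    """
    bursts, current = [], None
    for prev, nxt in zip(spike_times, spike_times[1:]):
        isi = nxt - prev
        if current is None:
            if isi <= start_isi:
                current = [prev, nxt]  # both spikes open the burst
        elif isi >= end_isi:
            bursts.append(current)     # long gap closes the burst
            current = None
        else:
            current.append(nxt)        # still within the burst
    if current is not None:
        bursts.append(current)         # recording ended mid-burst
    return bursts
```

For example, a train with spikes at 0, 0.05 and 0.15 s followed by a 0.25-s gap yields a single three-spike burst, because the 100-ms interval is too short to terminate a burst that has already begun.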

Peak firing rate was calculated as the maximum (Gaussian-smoothed) firing rate of each trial in a 250-ms window after side-in for rewarded trials, and the valley was calculated as the minimum firing rate in a 2-s window, starting one second after side-in for unrewarded trials.

To calculate a ramp angle during approach behaviours, we smoothed mean firing rates with a 50-ms Gaussian kernel, detected the maximum/minimum of the resulting signal in a 0.5-s window before each event (centre-in or food-port-in) and measured the signed angle connecting the two extrema. To compare firing rates in ‘high’ and ‘low’ reward blocks, for each session we performed a median split of average leaky-integrator reward rate in each block.

Voltammetry and computational model

Fast-scan cyclic voltammetry results shown here reanalyse data previously presented in detail⁵. Within-trial estimates of state value and reward prediction errors were calculated using a semi-Markov decision process reinforcement learning model, exactly as previously described⁵.

Photometry

We used a viral approach to express the genetically encoded optical dopamine sensor dLight¹⁵. Under isoflurane anaesthesia, 1 µl of AAV9-CAG-dLight (1 × 10¹² viral genomes per ml; UC Davis vector core) was slowly (100 nl/min) injected (Nanoject III, Drummond) through a 30-µm glass micropipette in ventral striatum bilaterally (AP: 1.7 mm, ML: 1.7 mm, DV: −7.0 mm). During the same surgery optical fibres (400-µm core, 430-µm total diameter) attached to a metal ferrule (Doric) were inserted (target depth 200 µm higher than AAV) and cemented in place. Data were collected more than three weeks later, to allow for dLight expression.

For dLight excitation, blue (470 nm) and violet (405 nm; control) LEDs were sinusoidally modulated at distinct frequencies (211 Hz and 531 Hz, respectively⁵⁸). Both excitation and emission signals passed through minicube filters (Doric) and bulk fluorescence was measured with a femtowatt detector (Newport, Model 2151) sampling at 10 kHz. Demodulation produced separate 470 nm (dopamine) and 405 nm (control) signals, which were then rescaled to each other via a least-squares fit⁵⁸. Fractional fluorescence signal (dF/F) was then defined as (470 − 405_fit)/405_fit. For all analyses this signal was downsampled to 50 Hz and smoothed with a five-point median filter. For presentation of 470 nm and 405 nm signals separately, see Extended Data Fig. 7.
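The dF/F computation described above can be sketched in Python as below. This is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the least-squares rescaling is assumed to be a straight-line fit (slope and intercept) of the 405 nm control onto the 470 nm signal, and edge samples of the median filter are handled by repeating the end values. Demodulation and downsampling are not shown.

```python
import numpy as np

def fit_control_to_signal(sig470, sig405):
    """Rescale the 405 nm control to the 470 nm signal (linear least squares)."""
    sig470 = np.asarray(sig470, dtype=float)
    sig405 = np.asarray(sig405, dtype=float)
    slope, intercept = np.polyfit(sig405, sig470, deg=1)
    return slope * sig405 + intercept

def delta_f_over_f(sig470, sig405):
    """dF/F = (470 - 405_fit) / 405_fit."""
    fit405 = fit_control_to_signal(sig470, sig405)
    return (np.asarray(sig470, dtype=float) - fit405) / fit405

def median_filter_5(x):
    """Five-point running median; the ends are padded by repeating edge values."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, 2, mode="edge")
    # Row i holds the signal shifted by i samples, so each column is one 5-sample window.
    shifted = np.stack([padded[i:i + x.size] for i in range(5)])
    return np.median(shifted, axis=0)

# Hypothetical usage with already-demodulated, downsampled traces:
control = np.array([1.0, 1.1, 0.9, 1.0, 1.2, 1.0])
signal = 2.0 * control + np.array([0.0, 0.0, 0.3, 0.0, 0.0, 0.0])
dff = median_filter_5(delta_f_over_f(signal, control))
```

Fitting the control to the signal before subtraction means that fluctuations shared by both channels (for example, movement artefacts) are removed, leaving changes specific to the 470 nm channel.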

Data from an optic fibre placement were included in analyses if the fibre tip was in NAc, and the fluorescence response to at least one task cue had a z-score of >1. These criteria excluded one rat, and yielded three rats/four placements (IM1065-left, IM1066-bilateral, IM1089-right) for dLight1.1, and four rats/six placements (IM1088-bilateral, IM1105-right, IM1106-bilateral, IM1107-right) for dLight1.3b. Similar results were obtained for dLight1.1 and dLight1.3b (Extended Data Fig. 7), so data were combined.

To calculate a ramp angle during approach behaviours, we detected the maximum/minimum of the processed dF/F signal in a 0.5-s window before each event (centre-in or food-port-in) and measured the signed angle connecting the two extrema.

Affinity and molecular specificity of dLight1.3b

In vitro measurements were performed as previously described¹⁵. In brief, HEK293T (ATCC CRL#1573) cells were cultured and transfected with plasmids encoding dLight1.3b driven by a CMV promoter, and washed with HBSS (Life Technologies) supplemented with Ca²⁺ (4 mM) and Mg²⁺ (2 mM) before imaging. Imaging was performed using a 40× oil-based objective on an inverted Zeiss Observer LSN710 confocal microscope with 488 nm/513 nm (excitation/emission) wavelengths. To test the sensor’s fluorescence responses, neurotransmitters were directly applied to the bath during time-lapse imaging, in at least two independent experiments. Titrations of dopamine and noradrenaline were obtained by performing tenfold serial dilutions to achieve eight different concentrations. All other neurotransmitters were tested at three sequential concentrations (100 nM, 1 µM and 10 µM). All neurotransmitter concentrations were obtained by dilution from a 1 mM stock concentration in HBSS, prepared fresh. Raw fluorescence intensities from time-lapse imaging were quantified in Fiji; each ROI was manually drawn on the membrane of individual cells. Fluorescence fold change (ΔF/F) was calculated as (F_peak − F_basal)/F_basal, where F_peak is the averaged fluorescence intensity of four frames and F_basal is the averaged fluorescence intensity of four frames before addition of ligands. Graphs and statistical analyses were produced using GraphPad Prism 6. Data points were analysed with a one-site specific binding curve fit to obtain K_d values. In box-and-whisker plots, the box covers the 25% to 75% range and whiskers extend from minimum to maximum values.
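For readers unfamiliar with the one-site specific binding model, it has the form Y = B_max · X/(K_d + X), where X is ligand concentration and Y is the response (here ΔF/F). The Python sketch below shows one way such a fit could be done; it is a stand-in for the nonlinear regression in GraphPad Prism, not a reproduction of it. The function names, the grid of candidate K_d values and the synthetic data are all hypothetical.

```python
import numpy as np

def one_site_binding(conc, bmax, kd):
    """One-site specific binding: Y = Bmax * X / (Kd + X)."""
    conc = np.asarray(conc, dtype=float)
    return bmax * conc / (kd + conc)

def fit_one_site(conc, response, kd_grid):
    """Least-squares fit by scanning candidate Kd values.

    For a fixed Kd the model is linear in Bmax, so the best Bmax has a
    closed form; the Kd with the smallest squared error is returned.
    """
    conc = np.asarray(conc, dtype=float)
    response = np.asarray(response, dtype=float)
    best = None
    for kd in kd_grid:
        f = conc / (kd + conc)                       # fractional occupancy at this Kd
        bmax = np.dot(response, f) / np.dot(f, f)    # closed-form optimum for Bmax
        sse = np.sum((response - bmax * f) ** 2)
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]

# Synthetic titration: eight tenfold dilutions (in nM), true Kd = 100 nM, Bmax = 2.
conc_nm = 10.0 ** np.arange(-1, 7)
dff = one_site_binding(conc_nm, bmax=2.0, kd=100.0)
bmax_hat, kd_hat = fit_one_site(conc_nm, dff, np.logspace(-1, 6, 701))
```

K_d is the concentration at which the response reaches half of B_max, which is why a titration spanning several orders of magnitude around that value constrains the fit well.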

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

The AAV.Synapsin.dLight1.3b virus used in this study has been deposited with Addgene (no. 125560; http://www.addgene.org). All data will be available through the Collaborative Research in Computational Neuroscience data sharing website (https://doi.org/10.6080/K0VQ30V9).

Code availability

Custom MATLAB code is available on request from J.D.B.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
2. Pan, W. X., Schmidt, R., Wickens, J. R. & Hyland, B. I. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242 (2005).
3. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
4. Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).
5. Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
6. Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 (2018).
7. Phillips, P. E., Stuber, G. D., Heien, M. L., Wightman, R. M. & Carelli, R. M. Subsecond dopamine release promotes cocaine seeking. Nature 422, 614–618 (2003).
8. Roitman, M. F., Stuber, G. D., Phillips, P. E., Wightman, R. M. & Carelli, R. M. Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. 24, 1265–1271 (2004).
9. Wassum, K. M., Ostlund, S. B. & Maidment, N. T. Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task. Biol. Psychiatry 71, 846–854 (2012).
10. Howe, M. W., Tierney, P. L., Sandberg, S. G., Phillips, P. E. & Graybiel, A. M. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
11. Syed, E. C. et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat. Neurosci. 19, 34–36 (2016).
12. Morris, G., Nevet, A., Arkadir, D., Vaadia, E. & Bergman, H. Midbrain dopamine neurons encode decisions for future action. Nat. Neurosci. 9, 1057–1063 (2006).
13. da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
14. Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
15. Patriarchi, T. et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).
16. Salamone, J. D. & Correa, M. The mysterious motivational functions of mesolimbic dopamine. Neuron 76, 470–485 (2012).
17. Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
18. Garris, P. A. & Wightman, R. M. Different kinetics govern dopaminergic transmission in the amygdala, prefrontal cortex, and striatum: an in vivo voltammetric study. J. Neurosci. 14, 442–450 (1994).
19. Frank, M. J., Doll, B. B., Oas-Terpstra, J. & Moreno, F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068 (2009).
20. St Onge, J. R., Ahn, S., Phillips, A. G. & Floresco, S. B. Dynamic fluctuations in dopamine efflux in the prefrontal cortex and nucleus accumbens during risk-based decision making. J. Neurosci. 32, 16880–16891 (2012).
21. Bartra, O., McGuire, J. T. & Kable, J. W. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76, 412–427 (2013).
22. Ikemoto, S. Dopamine reward circuitry: two projection systems from the ventral midbrain to the nucleus accumbens-olfactory tubercle complex. Brain Res. Brain Res. Rev. 56, 27–78 (2007).
23. Breton, J. M. et al. Relative contributions and mapping of ventral tegmental area dopamine and GABA neurons by projection target in the rat. J. Comp. Neurol. (2018).
24. Ungless, M. A., Magill, P. J. & Bolam, J. P. Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303, 2040–2042 (2004).
25. Morales, M. & Margolis, E. B. Ventral tegmental area: cellular heterogeneity, connectivity and behaviour. Nat. Rev. Neurosci. 18, 73–85 (2017).
26. Morris, G., Arkadir, D., Nevet, A., Vaadia, E. & Bergman, H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143 (2004).
27. Floresco, S. B., West, A. R., Ash, B., Moore, H. & Grace, A. A. Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat. Neurosci. 6, 968–973 (2003).
28. Grace, A. A. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nat. Rev. Neurosci. 17, 524–532 (2016).
29. Cohen, J. Y., Amoroso, M. W. & Uchida, N. Serotonergic neurons signal reward and punishment on multiple timescales. eLife 4, e06346 (2015).
30. Niv, Y., Daw, N. & Dayan, P. How fast to work: response vigor, motivation and tonic dopamine. Adv. Neural Inf. Process. Syst. 18, 1019 (2006).
31. Bayer, H. M., Lau, B. & Glimcher, P. W. Statistics of midbrain dopamine neuron spike trains in the awake primate. J. Neurophysiol. 98, 1428–1439 (2007).
32. Chergui, K., Suaud-Chagny, M. F. & Gonon, F. Nonlinear relationship between impulse flow, dopamine release and dopamine elimination in the rat brain in vivo. Neuroscience 62, 641–645 (1994).
33. Parker, N. F. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 (2016).
34. Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).
35. Trulson, M. E. Simultaneous recording of substantia nigra neurons and voltammetric release of dopamine in the caudate of behaving cats. Brain Res. Bull. 15, 221–223 (1985).
36. Glowinski, J., Chéramy, A., Romo, R. & Barbeito, L. Presynaptic regulation of dopaminergic transmission in the striatum. Cell. Mol. Neurobiol. 8, 7–17 (1988).
37. Zhou, F. M., Liang, Y. & Dani, J. A. Endogenous nicotinic cholinergic activity regulates dopamine release in the striatum. Nat. Neurosci. 4, 1224–1229 (2001).
38. Threlfell, S. et al. Striatal dopamine release is triggered by synchronized activity in cholinergic interneurons. Neuron 75, 58–64 (2012).
39. Cachope, R. et al. Selective activation of cholinergic interneurons enhances accumbal phasic dopamine release: setting the tone for reward processing. Cell Reports 2, 33–41 (2012).
40. Sulzer, D., Cragg, S. J. & Rice, M. E. Striatal dopamine neurotransmission: regulation of release and uptake. Basal Ganglia 6, 123–148 (2016).
41. Floresco, S. B., Yang, C. R., Phillips, A. G. & Blaha, C. D. Basolateral amygdala stimulation evokes glutamate receptor-dependent dopamine efflux in the nucleus accumbens of the anaesthetized rat. Eur. J. Neurosci. 10, 1241–1251 (1998).
42. Jones, J. L. et al. Basolateral amygdala modulates terminal dopamine release in the nucleus accumbens and conditioned responding. Biol. Psychiatry 67, 737–744 (2010).
43. Schultz, W. Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J. Neurophysiol. 56, 1439–1461 (1986).
44. Berke, J. D. What does dopamine mean? Nat. Neurosci. 21, 787–793 (2018).
45. Bromberg-Martin, E. S., Matsumoto, M. & Hikosaka, O. Distinct tonic and phasic anticipatory activity in lateral habenula and dopamine neurons. Neuron 67, 144–155 (2010).
46. Pasquereau, B. & Turner, R. S. Dopamine neurons encode errors in predicting movement trigger occurrence. J. Neurophysiol. 113, 1110–1123 (2015).
47. Fiorillo, C. D., Newsome, W. T. & Schultz, W. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973 (2008).
48. Morita, K. & Kato, A. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front. Neural Circuits 8, 36 (2014).
49. Gershman, S. J. Dopamine ramps are a consequence of reward prediction errors. Neural Comput. 26, 467–471 (2014).
50. Nicola, S. M. The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. J. Neurosci. 30, 16585–16600 (2010).
51. Paxinos, G. & Watson, C. The Rat Brain in Stereotaxic Coordinates 5th edn (Elsevier Academic, 2005).
52. Witten, I. B. et al. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721–733 (2011).
53. Sugrue, L. P., Corrado, G. S. & Newsome, W. T. Matching behavior and the representation of value in the parietal cortex. Science 304, 1782–1787 (2004).
54. Wong, J. M. et al. Benzoyl chloride derivatization with liquid chromatography-mass spectrometry for targeted metabolomics of neurochemicals in biological samples. J. Chromatogr. A 1446, 78–90 (2016).
55. Chung, J. E. et al. A fully automated approach to spike sorting. Neuron 95, 1381–1394 (2017).
56. Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013).
57. Grace, A. A. & Bunney, B. S. The control of firing pattern in nigral dopamine neurons: burst firing. J. Neurosci. 4, 2877–2890 (1984).
58. Lerner, T. N. et al. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell 162, 635–647 (2015).

Acknowledgements

We thank P. Dayan, H. Fields, L. Frank, C. Donaghue and T. Faust for their comments on an early version of the manuscript, and V. Hetrick, R. Hashim and T. Davidson for technical assistance and advice. This work was supported by the National Institute on Drug Abuse, the National Institute of Mental Health, the National Institute of Neurological Disorders and Stroke, the University of Michigan, Ann Arbor, and the University of California, San Francisco.

Reviewer information

Nature thanks Margaret Rice and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

These authors contributed equally: Ali Mohebi, Jeffrey R. Pettibone

Affiliations

Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Ali Mohebi, Jeffrey R. Pettibone & Joshua D. Berke
Department of Neuroscience, Brown University, Providence, RI, USA
- Arif A. Hamid
Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Jenny-Marie T. Wong & Robert T. Kennedy
Neuroscience Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Leah T. Vinson & Joshua D. Berke
Department of Biochemistry and Molecular Medicine, School of Medicine, University of California, Davis, Davis, CA, USA
- Tommaso Patriarchi & Lin Tian
Weill Institute for Neurosciences and Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Joshua D. Berke

Contributions

A.M. performed and analysed the electrophysiology and photometry, and applied the computational model. J.R.P. performed and analysed the microdialysis with assistance from J.-M.T.W. and supervision by R.T.K. A.A.H. developed the behavioural task and initial photometry setup, and performed the voltammetry. L.T.V. performed retrograde tracing and analysis. T.P. and L.T. developed the dLight sensor and shared expertise. J.D.B. designed and supervised the study, and wrote the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Joshua D. Berke.