Author contributions: M.P.S. and F.C. designed research; M.P.S. and F.C. performed research; M.P.S. and F.C. analyzed data; M.P.S., R.M.W., and R.M.C. wrote the paper.
Mesolimbic dopamine (DA) is phasically released during appetitive behaviors, though there is substantive disagreement about the specific purpose of these DA signals. For example, prediction error (PE) models suggest a role in learning, while incentive salience (IS) models argue that the DA signal imbues stimuli with value and thereby stimulates motivated behavior. However, within the nucleus accumbens (NAc) patterns of DA release can differ strikingly between subregions, and as such, it is possible that these patterns differentially contribute to aspects of PE and IS. To assess this, we measured DA release in subregions of the NAc during a behavioral task that spatiotemporally separated sequential goal-directed stimuli. Electrochemical methods were used to measure subsecond NAc dopamine release in the core and shell during a well learned instrumental chain schedule in which rats were trained to press one lever (seeking; SL) to gain access to a second lever (taking; TL) linked with food delivery, and again during extinction. In the core, phasic DA release was greatest following initial SL presentation, but minimal for the subsequent TL and reward events. In contrast, phasic shell DA showed robust release at all task events. Signaling decreased between the beginning and end of sessions in the shell, but not core. During extinction, peak DA release in the core showed a graded decrease for the SL and pauses in release during omitted expected rewards, whereas shell DA release decreased predominantly during the TL. These release dynamics suggest parallel DA signals capable of supporting distinct theories of appetitive behavior.
SIGNIFICANCE STATEMENT Dopamine signaling in the brain is important for a variety of cognitive functions, such as learning and motivation. Typically, it is assumed that a single dopamine signal is sufficient to support these cognitive functions, though competing theories disagree on how dopamine contributes to reward-based behaviors. Here, we have found that real-time dopamine release within the nucleus accumbens (a primary target of midbrain dopamine neurons) strikingly varies between core and shell subregions. In the core, dopamine dynamics are consistent with learning-based theories (such as reward prediction error) whereas in the shell, dopamine is consistent with motivation-based theories (e.g., incentive salience). These findings demonstrate that dopamine plays multiple and complementary roles based on discrete circuits that help animals optimize rewarding behaviors.
- associative learning
- fast-scan cyclic voltammetry
- incentive salience
- reinforcement learning
- ventral tegmental area
Understanding the role of dopamine (DA) signaling in relation to learning, behavior and addiction is a central issue in behavioral neuroscience. Contemporary theories are consistent with the anatomical organization of the mesolimbic DA system, wherein a relatively small population of DAergic neurons in the ventral tegmental area (VTA) sends collaterals throughout the brain to broadly modulate circuits for learning and action. However, recent evidence suggests that DA signaling may be more heterogeneous than previously considered. For example, phasic DA release following reward-predictive cues scales with the anticipated subjective reward value in the nucleus accumbens (NAc) core, but not shell (Day et al., 2010; Sugam et al., 2012). In contrast, motivational shifts in hedonic processing of drug-predictive tastants are localized to phasic changes in DA release in the NAc shell, but not core (Wheeler et al., 2011). Further, we and others have shown that DA release during learned tasks encoded stimuli differently between core and shell (Aragona et al., 2009; Owesson-White et al., 2009; Badrinarayan et al., 2012; Cacciapaglia et al., 2012). Instead of a global DA signal, then, these findings suggest that DA may be differentially and discretely tuned to specific target regions to support plasticity within defined circuits related to learning, motivation, and action.
However, the precise functions of these heterogeneous DA signals are not well understood. One influential model has posited that DA provides a teaching signal to generate associative expectancies of future outcomes and whether those predictions are accurate [prediction error (PE)]. DA neurons display this type of encoding (Schultz et al., 1997; Schultz and Dickinson, 2000; Waelti et al., 2001; Tobler et al., 2003), and recent findings confirm that essentially all optogenetically identified DA neurons in the VTA show PE-type signaling (Cohen et al., 2012). In contrast, incentive salience (IS) models suggest that DA acts to endow stimuli with valued reinforcers, creating motivational drive for those outcomes (Berridge and Robinson, 1998; Robinson and Berridge, 2008; Zhang et al., 2009; Berridge, 2012). Although similar, PE and IS models make strongly divergent predictions for DA function with respect to its necessity in learning, motivation, and drug addiction (Redish, 2004; Tindell et al., 2009; Bromberg-Martin et al., 2010; Berridge, 2012).
In simple conditioning tasks, it is difficult to know what phasic DA release is encoding (i.e., is it predicting reward, or cue salience?). However, by spatiotemporally isolating predictive and salient stimuli within the same task, it is possible to parse specific features of learning and action to isolate components, such as initial prediction, consummatory behaviors, motivation, and even extinction. To address this, we used an instrumental chain schedule task where presses on one lever [seeking lever (SL)] granted access to presses on a second taking lever (TL), and presses on the TL resulted in food delivery. Further, using fast-scan cyclic voltammetry (FSCV) to measure real-time DA release patterns in either the NAc core or shell in well trained rats, we differentiated how task-selective features of DA encoding differed across NAc subregions. Finally, we examined how these signals dynamically shifted when aspects of motivation (hunger level) and prediction (extinction) were altered. We observed differential patterns of DA release in the core and shell that were highly consistent with PE and IS models, respectively, and generally support the idea of multiple mesolimbic DA signals that can support complementary but distinct aspects of goal-directed behavior.
Materials and Methods
Twelve male Sprague-Dawley rats weighing 280–330 g were used as subjects. Rats were individually housed with a 12 h light/dark cycle and lightly food restricted to no less than 90% free-feed weight (10–15 g of Purina laboratory chow each day, in addition to ∼2.7 g of sucrose consumed during daily sessions). Food restriction was in place for the duration of behavioral testing except during the postsurgery recovery period, when food was given ad libitum. All procedures were performed in accordance with the University of North Carolina at Chapel Hill Institutional Animal Care and Use Committee.
Behavioral training: chain schedule.
Testing chambers contained two retractable levers with a cue light above each lever and a food receptacle positioned equidistant between the levers, as previously described (Cacciapaglia et al., 2012). For each subject, one lever (e.g., left) was designated the TL and the other lever (e.g., right) the SL for the duration of all test sessions. The side of the TL and SL was counterbalanced across subjects.
Rats were first trained to obtain sucrose pellets (45 mg, Purina) from the food-cup receptacle. During a single pretraining session, 50 pellets were delivered randomly approximately once every 30 s. Rats were then trained to self-administer sucrose pellets during single daily sessions. To shape instrumental responding, animals were first trained to press the TL. Each trial during shaping began with the illumination of a light cue directly above the TL coupled with the extension of the TL into the test chamber [taking lever out (TLO)]. Each taking lever press [TLP; fixed ratio 1 (FR1)] within 15 s of extension resulted in the delivery of a single sucrose pellet (45 mg) into the receptacle, retraction of the TL, and termination of the cue light. If animals failed to press the TL within 15 s, the lever was retracted, the cue light extinguished, and the trial counted as an omission. Trials were separated by a variable intertrial interval with an average of 15 s (range: 5–25 s; shaping days 1 and 2), and then increased to an average of 45 s on shaping days 3–4 (range: 30–60 s).
After the establishment of stable responding on the TL (i.e., no more than 2 omission errors in a session) the chain schedule was introduced (Fig. 1 A), adapted from Olmstead et al. (2000). Trials during chain schedule sessions began with the extension of the SL and the simultaneous illumination of the cue light directly above it [seeking lever out (SLO)]. Each seeking lever press (SLP; FR1) resulted in the retraction of the SL and extinguishing of the cue, followed by presentation of the TLO (lever extension, cue light). As above, TL presses resulted in the retraction of the TL, extinguishing of the cue light, and the delivery of a sucrose pellet to the food cup. Trials were separated by a variable 45 s intertrial interval (range: 30–60 s), and each session consisted of 30 trials. For day 1 of the chain schedule, there was no delay between retraction of the SL and extension of the TL. On subsequent days, a variable interval (VI) of 3–5 s was introduced between retraction of the SL and extension of the TL. In addition, a VI of 1–3 s was introduced between the TL press and the delivery of the sucrose reinforcement. Variable delays were used during training (i.e., all sessions before FSCV recordings) to prevent rats from predictively timing the delivery of events. Rats were trained for 5 d on this chain schedule or until they showed stable performance of two consecutive sessions without an omission on either the SL or TL, after which they were surgically prepared for voltammetric recording.
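The trial structure described above can be sketched as a small event-timeline generator. This is only an illustration under stated assumptions, not the control software used in the study: the function name `chain_trial`, the uniform sampling of the variable delays, and the example press latencies are all hypothetical. The fixed 4 s and 2.5 s delays correspond to the values reported for the recording sessions.

```python
import random

def chain_trial(slp_latency, tlp_latency, rng, training=True):
    """Return (event, time in s from SLO) pairs for one completed trial.

    Training sessions: SL retraction -> TL extension delay drawn from 3-5 s,
    TL press -> reward delay drawn from 1-3 s (uniform draw is an assumption).
    Recording sessions: delays fixed at 4 s and 2.5 s.
    """
    sl_to_tl = rng.uniform(3.0, 5.0) if training else 4.0
    tl_to_reward = rng.uniform(1.0, 3.0) if training else 2.5

    t_slp = slp_latency              # seeking lever press
    t_tlo = t_slp + sl_to_tl         # taking lever extended, cue light on
    t_tlp = t_tlo + tlp_latency      # taking lever press
    t_reward = t_tlp + tl_to_reward  # sucrose pellet delivered
    return [("SLO", 0.0), ("SLP", t_slp), ("TLO", t_tlo),
            ("TLP", t_tlp), ("R", t_reward)]

# Example: one recording-session trial with hypothetical press latencies.
for name, t in chain_trial(0.8, 0.6, random.Random(0), training=False):
    print(f"{name}: {t:.1f} s")
```

Omitted trials and the intertrial interval are left out to keep the sketch short.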
Behavioral training: extinction.
After the last recording session, a subset of animals underwent extinction (core recordings: n = 3; shell recordings: n = 7). During extinction, SLO presentations indicated the beginning of a new trial. Trials were identical to those in the test session, where SL presses resulted in presentations of the TLO 4 s later, but presses on the TL were not reinforced. Extinction sessions continued until rats stopped responding on the SL for 10 consecutive trials (Fig. 1 B).
Previous studies (Schoenbaum et al., 2003; Saddoris et al., 2005) have shown that neural correlates of limbic activity are highly sensitive to changes in learning and motivational state, and thus here we used response latency markers to define blocks for each subject (Fig. 1 B). The first block was early extinction, in which response latency for an SLP response was similar to the rewarded session. Next, the first SLP response latency that was at least 2 SD longer than during the previously rewarded chain session marked the beginning of delay extinction. Finally, all trials that followed the first omitted response were in the late extinction block, and were grouped by whether the rat omitted an SLP response (late no press) or resumed responding (late press). All extinction behavior was compared with the immediately preceding reinforced chain schedule.
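The latency-based block definitions above can be expressed as a short labeling routine. The following is a minimal sketch, not the authors' analysis code. It assumes that "2 SD longer" means the mean reinforced latency plus two sample standard deviations, and that the first omitted trial itself is counted as the first late (no press) trial; the function name and the example latencies are hypothetical.

```python
from statistics import mean, stdev

def label_extinction_trials(reinforced_latencies, extinction_latencies):
    """Label each extinction trial by behaviorally defined block.

    reinforced_latencies: SLP latencies (s) from the preceding rewarded session.
    extinction_latencies: SLP latency (s) per extinction trial; None = omission.
    """
    # Assumed criterion: mean + 2 sample SDs of the reinforced-session latency.
    threshold = mean(reinforced_latencies) + 2 * stdev(reinforced_latencies)
    labels = []
    block = "early"
    for latency in extinction_latencies:
        if block != "late":
            if latency is None:
                block = "late"    # first omission starts the late block
            elif block == "early" and latency >= threshold:
                block = "delay"   # first delayed press starts the delay block
        if block == "late":
            labels.append("late no press" if latency is None else "late press")
        else:
            labels.append(block)
    return labels

# Hypothetical latencies (s) for illustration only.
print(label_extinction_trials([0.4, 0.5, 0.6],
                              [0.5, 0.45, 1.5, 2.0, None, 3.0, None]))
```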
After behavioral training, animals were surgically prepared for voltammetric recordings as previously described (Cacciapaglia et al., 2012). Briefly, rats were anesthetized with an intramuscular injection of a ketamine hydrochloride (100 mg/kg) and xylazine hydrochloride (20 mg/kg) mix. A guide cannula (Bioanalytical Systems) was implanted above either the NAc shell (+1.7 mm AP, +0.8 mm ML) or core (+1.3 mm AP, +1.3 mm ML) and a bipolar stimulating electrode (Plastics One) was placed in the VTA (−5.2 mm AP, +1.0 mm ML, and −7.8 mm DV). Another guide cannula for the reference Ag/AgCl electrode was placed in the contralateral hemisphere. Components were secured to the skull with screws and cranioplastic cement.
FSCV recording techniques used here were as described in detail previously (Cacciapaglia et al., 2012; Sugam et al., 2012). Briefly, after surgery, rats were allowed to recover to their presurgery body weight (at least 5 d of recovery). On the day of the experiment, a carbon-fiber microelectrode was lowered into the NAc shell or core with a locally constructed microdrive (Chemistry Department Electronic Facility, University of North Carolina, Chapel Hill, NC), after placing an Ag/AgCl reference electrode in the contralateral hemisphere. The carbon-fiber microelectrode was held at −0.4 V versus the Ag/AgCl reference electrode. A cyclic voltammogram was acquired every 100 ms by applying a triangular waveform that drove the potential to 1.3 V and back to −0.4 V. Before the start of each recording session, we obtained electrically evoked DA release events by driving the bipolar stimulating electrode in the VTA and recorded the resultant DA release in the NAc. If stimulation failed to elicit DA release, the electrode was lowered to a new location and the process was repeated. Once electrical stimulation successfully evoked DA release in the NAc, a training set of evoked DA release was created using a combination of stimulating frequencies (between 10 and 60 Hz) and number of biphasic pulses (from 4 to 25) from the bipolar VTA electrode. In a subset of recordings, an additional training set was created following the end of the behavioral session to ensure electrode stability over the session. In a subset of rats (n = 9), after recording a full session of 30 trials, the electrode was lowered another ∼300 μm until another release site was found, at which point another recording was taken for another session of 30 trials.
Analysis of the FSCV data (HDCV Analysis) used chemometric principal component analysis to extract changes in current due to DA using each subject’s electrically stimulated training set from the related recording session collected before testing, as previously described (Heien et al., 2005; Keithley et al., 2010). For each region (core and shell), an average DA concentration trace was aligned to each behavioral event and compared with the average DA concentration over a 5 s baseline immediately before SLO onset using a two-way mixed model ANOVA (factors: event, region) on subject averages.
For extinction sessions, DA traces were likewise aligned to the behavioral events. However, because many SLO presentations were not followed by any presses in extinction, two different analyses were used. For the first, DA traces were aligned to SLO and grouped by phase of extinction (see Fig. 6), and analyzed using a two-way repeated-measures ANOVA with extinction phase and stimulus event as factors. For the second analysis, only trials in which the rat pressed the TL were used and peak DA concentrations (i.e., maximum DA release within 300 ms following the event) were obtained for both TLP and reward, and likewise analyzed with a two-way repeated-measures ANOVA using extinction phase and task stimulus as factors. All post hoc pairwise comparisons were made using Tukey’s HSD, corrected for unequal N when appropriate. All statistical analyses were performed using Prism 4.0 for Windows (GraphPad Software) or Statistica for Windows (StatSoft).
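The peak and baseline measures used in these analyses can be illustrated with a short sketch. It assumes a DA concentration trace sampled every 100 ms (the voltammogram acquisition interval given above) and assumes that the 300 ms window includes the sample at the event itself; the function names and example values are hypothetical and this is not the HDCV or statistics-package implementation.

```python
def peak_after_event(trace, event_index, sample_interval_s=0.1, window_s=0.3):
    """Maximum concentration from the event sample through window_s later."""
    n = round(window_s / sample_interval_s)       # 3 samples at 100 ms
    window = trace[event_index:event_index + n + 1]
    if not window:
        raise ValueError("event_index is outside the trace")
    return max(window)

def baseline_mean(trace, slo_index, sample_interval_s=0.1, baseline_s=5.0):
    """Mean concentration over the baseline_s immediately before SLO onset."""
    n = round(baseline_s / sample_interval_s)     # 50 samples at 100 ms
    segment = trace[max(0, slo_index - n):slo_index]
    if not segment:
        raise ValueError("no samples available before SLO onset")
    return sum(segment) / len(segment)

# Hypothetical trace (nM): 5 s of flat baseline, then a transient after SLO.
trace = [0.0] * 50 + [10.0, 40.0, 90.0, 60.0, 120.0, 5.0]
print(baseline_mean(trace, 50), peak_after_event(trace, 50))
```

In this example the value of 120 at 400 ms after the event falls outside the 300 ms window and is therefore not counted as the peak.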
After each experiment, rats were deeply anesthetized with a ketamine (100 mg/kg)/xylazine (20 mg/kg) mixture (i.m.). A tungsten electrode housed in the same micromanipulator used during the experiment was lowered to the experimental recording site and a small electrolytic lesion was made (50–500 μA, 5 s) to mark the position of the electrode tip. Multiple lesions were made when multiple recordings had been done. Each brain was removed, fixed in 4% formaldehyde, and then frozen to −80°C before being sliced into 40 μm coronal sections with a cryostat. Sections were mounted on slides, viewed with bright-field microscopy, and digitally imaged (Fig. 1 C).
Comparison of stimulated and event-evoked DA release in core and shell.
It was possible that the differences seen between core and shell DA release for behavioral events were due not to differences in release dynamics, but rather to differences in the kinetics of DA clearance between the regions. For example, striatal DA release and uptake dynamics are slower in the shell than the core due to a lower density of the dopamine transporter in the shell (Jones et al., 1996; Budygin et al., 2002). As such, differences between core and shell at later events (e.g., TLO, reward) could be explained by the persistence of residual DA in the synaptic region in the shell.
To address this concern, we compared DA release and uptake patterns elicited by behavioral events (i.e., SLO) during the chain schedule task to DA elicited by a brief burst of electrical stimulation of VTA afferents. Originally, electrical stimulation (2 ms biphasic pulses) of VTA fibers was conducted across a wide array of stimulation frequencies (10–60 Hz) and pulse numbers (4–25 pulses) to obtain a full spectrum of release dynamics for purposes of building a chemometric training set. As such, a large number of electrical stimulations produced DA concentrations substantially greater (e.g., 2000 nM) than those seen in naturally occurring transients (typically 40–150 nM). At extremely large concentrations of DA release, it is possible that the DA transporter can become saturated, leading to slower clearance kinetics than would be seen in the normal range. To address this directly, we selected only electrical stimulation “trials” in which the peak DA release was <200 nM. Likewise, we selected only behavioral trials where the peak DA release aligned to the SLO was at least 100 nM. For each subject, all eligible trials were averaged for analysis. Using this metric, we obtained 23 electrical stimulations and 15 cue-evoked sessions in the core, and 14 electrical stimulations and 11 cue-evoked sessions in the shell.
Comparisons of cue- versus electrically evoked DA release were done using several metrics. First, peak DA was derived from the behavioral events at SLO (i.e., greatest DA concentration within 1.5 s of SLO onset), TLO (greatest DA concentration between 4 and 5.5 s following SLO onset), and reward (greatest DA concentration between 6.5 and 8 s following SLO onset) for cue events, and at the corresponding time points for electrically stimulated events (i.e., same time points, but following stimulation onset rather than SLO onset). Next, clearance dynamics were examined using previously published metrics (Yorgason et al., 2011). Specifically, we looked at latency to half-life (the time to decay to half of peak concentration) following peak, T20 (the time for 20% decay from peak), and T80 (the time for 80% decay from peak). These values were compared using a mixed-model ANOVA using region (core, shell) and stimulation type (SLO aligned, electrically stimulated) as between-subjects factors and either DA concentration at each event type (baseline, SLO, TLO, and reward) or decay measure (latencies to peak, T20, half-life, and T80, respectively) as repeated measures. Post hoc comparisons were done using Tukey’s HSD for unequal N.
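The decay measures defined above (latency to peak, T20, half-life, and T80) can be illustrated as follows. This is a simplified sketch rather than the published method of Yorgason et al. (2011): it assumes a baseline-subtracted trace that starts at event or stimulation onset, a 100 ms sample interval, and takes the first sample at or below each target level without interpolation. The function name and the example trace are hypothetical.

```python
def decay_latencies(trace, sample_interval_s=0.1):
    """Latency to peak and times (from peak) to decay by 20%, 50%, and 80%.

    trace: baseline-subtracted DA concentrations, first sample at event onset.
    Returns None for a measure if the trace never decays that far.
    """
    peak_index = max(range(len(trace)), key=trace.__getitem__)
    peak = trace[peak_index]

    def time_to_remaining(fraction_of_peak):
        target = peak * fraction_of_peak
        for i in range(peak_index, len(trace)):
            if trace[i] <= target:
                return (i - peak_index) * sample_interval_s
        return None

    return {
        "peak_latency": peak_index * sample_interval_s,
        "T20": time_to_remaining(0.8),        # 20% decay: 80% of peak remains
        "half_life": time_to_remaining(0.5),  # 50% of peak remains
        "T80": time_to_remaining(0.2),        # 80% decay: 20% of peak remains
    }

# Hypothetical transient (nM) sampled every 100 ms.
print(decay_latencies([0, 50, 100, 90, 70, 45, 30, 15, 5]))
```

A slower clearance profile would show up as larger half-life and T80 values for the same peak latency.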
Reinforced chain schedule behavior
Rats rapidly learned the chain schedule task. On the final presurgery session, rats on average completed 99.8% of the trials accurately. During those sessions, rats took on average 783 ± 253 ms to press following SLO, and 588 ± 298 ms to press after TLO, a difference that was nearly significant, t (17) = 1.77, p = 0.085. However, on postsurgical recording days, rats again made almost no omissions (99.5% of trials completed), but displayed significantly faster response latencies for TLP (444 ± 39 ms) than SLP (999 ± 64 ms), t (29) = 7.48, p < 0.0001. Importantly, on the days of recording, there were no differences in response latency for animals recorded in the shell versus core, t (29) = 0.78, p = 0.48. Thus, rats in both groups (core and shell) were equally competent to complete the chain schedule when the task was reinforced.
Differential DA release in NAc core and shell during reinforced chain schedule
Next, we used FSCV to obtain real-time DA recordings from either the NAc core (n = 14) or shell (n = 12) during performance on the well learned chain schedule (Fig. 1 C). In rats where multiple recordings were taken, the electrode tip was lowered at least 300 μm between sessions to ensure that the 100 μm carbon-fiber electrode tip was entirely in fresh tissue for each recording. Consistent with both PE and IS models, we found robust phasic DA release that began with the onset of the SL and cue light (SLO) in both core and shell. Examples of this signaling from representative recording sessions (averaged across 30 trials) are shown for the core (Fig. 2 A) and shell (Fig. 2 B), with color plots from an individual animal. DA signaling differed strikingly between subregions. Across all rats, in the core (Fig. 2 C–E, black traces), DA peaked rapidly at the onset of the most predictive cue (SLO onset) and then quickly declined to baseline by the time of the TLP. In contrast, in the shell (Fig. 2 C–E, gray traces), rapid increases in DA concentration were coincident with SLO presentation and remained elevated for other motivationally salient stimuli with discrete peaks at TLO and reward delivery before returning to baseline at the end of the trial.
Figure 1. A, Schematic of task design. During the chain schedule, one lever (SL) was extended into the test chamber at the same time as a cue light was illuminated above the lever (SLO). SLP extinguished the light and retracted the lever. After a delay, the other lever in the chamber (TL) was extended and the associated cue light illuminated (TLO). Following a press on the TL (TLP), rats received food reinforcement after a delay (R). B, Extinction behavior in animals with FSCV recordings in the core or shell. Behavior and analysis in extinction were grouped by block based on the rat’s behavior. Trials in the immediately preceding chain task were used to compare events in extinction. Early extinction was all trials until the first significantly delayed response on the SL, whereas delay extinction was all trials between the first delayed SLP and the first omitted press following SLO presentation. Within the late extinction block, distinctions were made between whether the subject made a press or omitted a response. C, Histology of electrode placements within the core (black circles) and medial shell (gray circles).
Figure 2. Performance of the chained schedule produced different DA release dynamics within the NAc. Dopamine release dynamics in the core (A) and shell (B) of the NAc aligned to the time of SL extension into the chamber (SLO). Color plots each show averages from a representative subject in the core and shell, respectively. Average time (▴) of the TL extension (TLO) and reward (R) and range (±2 SD) relative to SLO are shown at bottom. C–E, Across-subject mean DA release across all recordings in core (black) and shell (gray) relative to SLO (C), SLP and the extension of the TLO 4 s later (D), and TLP and the reward (R) food pellet delivered 2.5 s after press (E). Dashed line shows SEM of the average for each region. Bottom row (F–H) shows average peak DA release for each behavioral event. *p < 0.05 versus baseline; †p < 0.05 core versus shell.
We quantified these observations by averaging all recordings taken in either the core or shell, aligned to each of the behavioral events in the chain schedule (Fig. 2 C–H). In the FSCV recording sessions, the time between SLP and TLO was fixed (4 s) as was the time between TLP and reward delivery (2.5 s) to allow for better alignment of task stimuli for DA analysis. Thus, aligning to SLO, SLP/TLO, and TLP/reward allowed alignment of all behavioral markers and permitted analysis of peak DA release relative to these events.
A two-way ANOVA comparing peak DA concentrations (maximum DA concentration within 300 ms following each event) across region (core, shell) and event (baseline, SLO, SLP, TLO, TLP, reward) indicated that the shell released more DA overall than the core, F (1,24) = 13.63, p < 0.002. Importantly, a significant interaction of region × event, F (5,120) = 9.88, p < 0.0001, revealed that DA signaling in the core and shell differentially responded to the behavioral events (Fig. 2 F–H). Specifically, core DA release significantly increased at SLO, relative to baseline (p < 0.0001), and remained above baseline at the time of SLP (p < 0.0001), though significantly below that at SLO (p < 0.05). However, there were no differences in peak DA in the core compared with baseline for either TL event (TLO vs baseline, p = 0.59; TLP vs baseline, p = 1.0), and no difference from baseline at the time of reward receipt (p = 1.0).
In contrast, peak DA concentrations in the shell showed significant DA release for all the events. All events were associated with greater DA release than baseline (all comparisons vs BL, p < 0.0002), whereas none of the events were significantly different from each other (all pairwise SLO, SLP, TLO, TLP, and reward comparisons, p > 0.96).
Directly comparing core and shell, we found important differences in DA signaling between regions. Although there were no differences in DA release during either baseline (p = 1.0) or the SL events (SLO, p = 1.0; SLP, p = 0.22), shell DA was significantly elevated compared with core for both TL events (TLO, p < 0.01; TLP, p < 0.001) and reward (p < 0.0005; Fig. 2 F–H).
Figure 3. Changes in DA signaling between the beginning of the session (early; first 5 trials of the chain schedule) versus the end of the session (late; last 5 trials). A, Average DA concentrations in the NAc core from the averages of each subject’s first five trials (light blue) and last five trials (purple). B, In the core, within-subjects peak DA signaling was unchanged between the beginning of the session and end. C, Average DA concentrations in the NAc shell from the averages of each subject’s first five trials (red) and last five trials (orange). D, Shell DA showed a significant within-subjects decrease at both the SLP and TLP cues and reward (**p < 0.01), whereas the decrease at the SLO cue was nearly significant (#p = 0.073). Error bars show SE of the difference (early vs late).
Region-specific changes in DA release between start and end of session
Next, we compared DA release in the core and shell during the reinforced chain schedule task at the beginning of the session (first 5 trials) versus the end of the session (last 5 trials; Fig. 3). This was important to test to ensure that the electrode was stable throughout the session (i.e., that the electrode did not lose sensitivity over time), and also to assess whether DA tracks any subtle changes in motivational state (for example, due to any effects of decreased hunger after consuming the food) following presentation of the different stimuli.
In the core (Fig. 3 A), a two-way ANOVA indicated a significant main effect of event (BL, SLO, SLP, TLO, TLP, Rew; F (5,65) = 35.03, p < 0.0001), but no effect of session phase (early vs late; F (1,13) = 3.55, p = 0.08), or interaction between event × session phase (F (5,65) = 0.82, p = 0.54). Post hoc comparisons between early and late blocks indicated that peak DA release relative to the behavioral events in the core remained the same between the beginning and end of the session (Tukey: all early vs late pairwise comparisons for BL, SLO, SLP, TLO, TLP, and reward, p > 0.50; Fig. 3 B).
However, phasic DA release to task stimuli generally decreased over the session in the shell (Fig. 3 C,D), with significant main effects of event (F (5,55) = 13.52, p < 0.0001), session phase (F (1,11) = 6.95, p = 0.02), and an interaction between event and session phase (F (5,55) = 3.74, p = 0.006). As in the core, post hoc tests indicated no difference at BL, but significant decreases in peak DA release to the SLP, TLP, and reward (Tukey: all p < 0.0005), a trend toward significance at the SLO cue (p = 0.060), but no difference at the TLO cue (p = 0.36). Thus, changes in shell (but not core) were limited primarily to motivated actions and reward consumption, with differential (albeit modest) effects at cue onsets. These across-session shifts in DA release were not due to generalized changes in electrode sensitivity; instead, the stimulus- and shell-specific changes in DA release patterns suggest that they carry information about the altered significance of task stimuli across repeated trials. Given that animals had consumed at least 25 pellets on average by the end of each recording session (i.e., 1144 mg, or 7.6% of the weight of the rats’ daily food-restricted regimen), these findings suggest that increased ingestion of the food successfully reduced the motivated hunger state in the animal, which was manifest in changes in the shell but not the core over the course of the session.
Cue-evoked versus electrically stimulated DA dynamics in core and shell
One caveat to these findings may be that the core and shell have different DA clearance dynamics due to lower density of DA transporter in the shell compared with the core (Jones et al., 1996; Budygin et al., 2002). Thus, it is possible that shell DA seen at TL and reward events reflects residual DA released at the time of SLO that cannot be cleared from synaptic overflow as efficiently as in the core. To address this, we compared electrically stimulated DA release at the same electrode location as during the chain schedule recordings to see whether electrically stimulated dynamics matched cue-evoked dynamics in the core and shell (Fig. 4 A). We predicted that if the slower clearance kinetics in the shell were responsible for the differences in subsequent event signaling (e.g., TLO) between the core and shell, then electrically stimulated and cue-evoked release in their respective subregions should follow nearly identical patterns of release and clearance. In contrast, significant deviations from electrical stimulations would suggest that the DA release in that area is tracking task-related events in a manner that cannot be explained by synaptic clearance dynamics alone.
Figure 4. Comparison of electrically stimulated versus cue-evoked DA signaling in the NAc core and shell. A, Average concentration of DA aligned to either the SLO cue or electrical stimulation of VTA fiber onset. The timing of the onset of the TLO cue and reward was estimated on a range of response times for those outcomes following SLO (mean response time indicated by triangle; width indicates ±95% confidence interval). B, Comparison of average baseline (BL) DA concentrations to peak DA concentration within 1 s of SLO or electrical stimulation (Stim/SLO), and within the 95% confidence interval range for the times corresponding to the TLO or reward epochs. C, Latency to peak concentration following SLO or electrical stimulation (Peak Lat) and subsequent decay (clearance) following release in the core and shell. T20 and T80 are the times at which the signal has decayed 20% and 80% away from peak, respectively, whereas half-life is the latency following peak to reach half-peak concentration. *p < 0.0001, electrical stimulation/SLO versus baseline; †p < 0.0001, Shell: Cue greater [DA] than all other stimulation types; ‡p < 0.0001, Shell: Cue greater latency to decay from peak than all other stimulation types. Error bars show SE of the difference (cue vs electric).
Overall, we found that core and shell sharply differed in their relationship between cue-evoked and electrically evoked DA release (Fig. 4 B). Looking at peak concentrations at task events, there were significant differences in DA concentrations as an interaction of region (core/shell) × stimulation type (electrical versus cue) × event (BL, SLO, TLO, Rew; F (3,174) = 12.31, p < 0.0001). In the core, peak DA concentrations for cue-evoked and electrically stimulated traces were nearly identical; there were no statistical differences between these stimulation types at baseline, SLO, TLO, or reward epochs (Tukey: all p > 0.80). In contrast, the shell showed a different pattern of dynamics between electrically- and cue-evoked DA release patterns. Though there was no difference in concentration at baseline or SLO (Tukey: both p > 0.98), DA was significantly greater for the TLO and reward epochs in the cued trials compared with the electrical stimulations (both p < 0.0001).
Likewise, the rate of release and subsequent clearance from the synapse showed a similar pattern (Fig. 4 C). Looking at clearance rates as a function of decay from peak, there was a significant interaction between region × stimulation type × decay parameter (peak time, T20, half-life, T80; F (3,174) = 80.23, p < 0.00001). As above, core clearance and decay dynamics did not differ between cue-evoked and electrically evoked stimulation types. The latency to peak, T20, half-life, and T80 were all statistically similar regardless of stimulation type (electrical vs cue; all p > 0.95). In contrast, DA levels in the shell showed significantly delayed decay to baseline following SLO presentations relative to electrically stimulated trials. While latency to peak and T20 were similar between cue-evoked and electrically evoked stimulations (p > 0.98), latency to half-life (p < 0.0001) and T80 (p < 0.0001) were significantly delayed in the cued trials relative to the electrical stimulation. Collectively, these findings demonstrate that intrinsic differences in clearance kinetics in the shell and core are insufficient to explain differences in DA signaling during behavioral performance.
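As an illustration of how the decay parameters in Fig. 4 C can be read from a single trace, the following is a minimal Python sketch, not the analysis code used in this study. It assumes a baseline-subtracted concentration trace sampled at a fixed interval (`dt`, a hypothetical parameter; the example uses 0.1 s), measures decay toward a `baseline` of zero, and interpolates linearly between samples. The function name `decay_metrics` and the example values are illustrative only.

```python
from typing import Dict, Optional, Sequence


def decay_metrics(
    trace: Sequence[float], dt: float, baseline: float = 0.0
) -> Dict[str, Optional[float]]:
    """Return peak latency, T20, half-life, and T80 (all in seconds).

    Assumptions (not taken from the paper): the trace starts at event
    onset, samples are evenly spaced by dt, and decay is measured as a
    fraction of (peak - baseline).
    """
    peak_idx = max(range(len(trace)), key=lambda i: trace[i])
    peak_amp = trace[peak_idx] - baseline

    def time_to_decay(fraction: float) -> Optional[float]:
        # Time after the peak at which the signal has lost `fraction`
        # of its peak amplitude; None if that never happens.
        if peak_amp <= 0:
            return None
        target = baseline + (1.0 - fraction) * peak_amp
        for i in range(peak_idx + 1, len(trace)):
            if trace[i] <= target:
                prev, cur = trace[i - 1], trace[i]
                # Linear interpolation between samples i-1 and i.
                step = (prev - target) / (prev - cur)
                return (i - 1 + step - peak_idx) * dt
        return None

    return {
        "peak_latency": peak_idx * dt,
        "t20": time_to_decay(0.20),
        "half_life": time_to_decay(0.50),
        "t80": time_to_decay(0.80),
    }


# Hypothetical trace (arbitrary units) sampled every 0.1 s.
example = decay_metrics([0, 10, 8, 6, 4, 2, 0], dt=0.1)
```

Under these assumptions, a cue-evoked shell trace with a prolonged decay would return larger `half_life` and `t80` values than an electrically evoked trace with the same `peak_latency` and `t20`, which is the pattern described above.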
During sessions in which the food reward was omitted, rats displayed extinction behavior by progressively increasing their latency to press the different levers over the course of the session. We generated behaviorally defined phases based on these latency shifts relative to press latency on the SL and TL during the immediately preceding reinforced chain session. The early phase was defined as the trials where latencies were the same as during the reinforced session. When the rats pressed the lever significantly slower (i.e., >2 SD) than normal, this was termed the delay phase, which lasted from the first delayed response until the subject omitted a response. All trials after this first omission were termed late phase, and were subdivided based on whether the rat pressed (late press) or omitted a response (late no press).
First, we assessed the number of trials performed before the rats exhibited a latency shift from early to delay phase, as well as the number of trials to the first omitted trial (i.e., shift to late phase) for the SL and TL, respectively (Fig. 5 A). Rats slowed responding on the TL significantly before they did so for the SL (paired t test: t (7) = 2.49, p = 0.04), suggesting that the TL (perhaps by virtue of its immediate relationship with the reward) was more sensitive to reward omission than the SL. In contrast, the number of trials before making the first omission was nearly identical for the SL and TL (p > 0.9), perhaps indicating that responses were withheld only when the prediction of reward had accurately updated to zero at the onset of the trial. Consistent with this, we rarely found trials where animals performed a SLP but omitted a subsequent TLP response (only 4/140 total late-phase trials; 2.9%), suggesting that rats almost exclusively performed either the entire chain sequence or not at all. As such, omissions were likely more linked to information available at the SL than TL.
Extinction behavior in animals with FSCV recordings in the core or shell. A , The number of trials before rats first showed a significant increase in response latency to shift from the early to delay phase of extinction (left) and response omission (right) for the SL (light gray) and TL (dark gray). Rats showed a latency shift for the TL in significantly fewer trials than the SL, though the number of trials before a trial was omitted was the same between seeking and taking responses. *p < 0.05 SL versus TL. B , Latency to respond on the SL (left) and the TL (right) across phases of extinction. Response latency increased across blocks, and was significantly longer in the delay and late extinction blocks for the SL presses. Presses on the TL were reliably faster than those on the SL within each block. *p < 0.05, **p < 0.01 vs early Ext.
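The phase definitions above can be sketched in a few lines of Python. This is an illustrative reading of the criteria, not the code used in this study. It assumes that the ">2 SD" criterion means a latency greater than the reinforced-session mean plus 2 sample SDs, that an omitted response is coded as `None`, and that the first omitted trial is itself the first late-phase trial. The function name `label_extinction_phases` and the example latencies are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence


def label_extinction_phases(
    reinforced_latencies: Sequence[float],
    extinction_latencies: Sequence[Optional[float]],
    n_sd: float = 2.0,
) -> List[str]:
    """Label each extinction trial as early, delay, late press, or late no press."""
    # Assumed criterion: slower than mean + n_sd * SD of the reinforced session.
    threshold = mean(reinforced_latencies) + n_sd * stdev(reinforced_latencies)
    labels: List[str] = []
    phase = "early"
    for latency in extinction_latencies:
        if latency is None:
            # Any omission moves the session into the late phase for good.
            phase = "late"
        elif phase == "early" and latency > threshold:
            # First significantly slowed response starts the delay phase.
            phase = "delay"
        if phase == "late":
            labels.append("late press" if latency is not None else "late no press")
        else:
            labels.append(phase)
    return labels


# Hypothetical latencies in seconds; None marks an omitted response.
reinforced = [1.0, 1.2, 0.8, 1.1, 0.9]
extinction = [1.0, 1.1, 2.0, 1.2, None, 3.0, None]
phases = label_extinction_phases(reinforced, extinction)
```

Note that once the delay phase has begun, a later trial with a normal latency (1.2 s in the example) stays in the delay phase, because the phase is defined as running from the first delayed response to the first omission.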
Next, we examined average latencies for the SLP and TLP in each phase based on the above criteria. A two-way repeated-measures ANOVA comparing response latency on the different levers (SL, TL) during different phases of the task (chain, early extinction, delay extinction, late press) showed a significant main effect of lever (F (1,4) = 45.7, p = 0.003), which was due to significantly more rapid responses on the TL than the SL (Fig. 5 B), consistent with performance during typical reinforced sessions, and a significant main effect of extinction phase (F (3,12) = 14.5, p < 0.001). For the SL, SLP responses during early extinction were similar to those during the reinforced chain session (Tukey: p = 1.0), but significantly slowed by the delay (p = 0.02 vs early) and late press (p = 0.003 vs early) phases. However, press latencies on the SL were similar between the delay and late phases (p = 0.89). For TL presses, response latency shifts were more subtle, with the late phase being significantly slower than the early phase (p = 0.04). However, a linear contrast accounted for the greatest proportion of variance in the TL latency shift (F (1,4) = 11.08, p = 0.03; 86% of main effect variance), whereas for the SL, a contrast comparing the chain and early versus the delay and late phases accounted for the greatest proportion of the effect variance (F (1,4) = 15.42, p = 0.02; 97% of main effect variance).
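To make the "proportion of main effect variance" concrete, the sketch below shows one standard way such a share can be computed for a planned contrast. It assumes equal numbers of observations per phase, so that the sample size cancels out of the ratio of sums of squares. The function name `contrast_variance_share` and the phase means are hypothetical and are not data from this study.

```python
from typing import Sequence


def contrast_variance_share(means: Sequence[float], weights: Sequence[float]) -> float:
    """Fraction of between-phase variance captured by one contrast.

    Assumes equal n per phase, so n cancels out of SS_contrast / SS_effect.
    """
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")
    grand_mean = sum(means) / len(means)
    # Sum of squares for the main effect (per unit n).
    ss_effect = sum((m - grand_mean) ** 2 for m in means)
    if ss_effect == 0:
        raise ValueError("phase means are identical; no effect variance")
    # Sum of squares for the contrast (per unit n).
    psi = sum(w * m for w, m in zip(weights, means))
    ss_contrast = psi ** 2 / sum(w * w for w in weights)
    return ss_contrast / ss_effect


# Hypothetical latencies (s) for chain, early, delay, and late press phases.
phase_means = [1.0, 2.0, 3.0, 4.0]
linear_share = contrast_variance_share(phase_means, [-3, -1, 1, 3])
step_share = contrast_variance_share(phase_means, [-1, -1, 1, 1])
```

With these made-up means, the linear weights capture all of the phase variance, whereas the step contrast (chain and early versus delay and late) captures less. The analysis above reports the reverse ordering for the SL, where the step contrast fit best.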
Extinction: omitted outcome differentially augments core and shell DA signaling
Event-related DA signaling in the NAc shifted as the rat progressed through the behaviorally defined phases of extinction. The manner in which DA encoding was affected by extinction strikingly varied between core and shell (Fig. 6).
DA release in the core ( A – C ) and shell ( D – F ) during extinction. A , Alignment to the SLO in the core revealed a continuous decrease in core DA release to the cue over iterative extinction trials (blue lines) relative to rewarded chain sessions (black line). B , Core DA release to operant responses and reward during the reinforced chain schedule (black) and early extinction (blue) aligned to the TLP event. Gray bar shows range of maximum and minimum concentrations of DA during the baseline period. C , Peak DA relative to the SLP, TLP, and reward in the reinforced schedule and early extinction. D , Alignment to the SLO in the shell (red lines) revealed more discrete decreases in phasic DA release to the cue over iterative extinction trials relative to rewarded chain sessions (black line). E , DA signaling aligned to the TLP in the shell in early extinction (red) and the reinforced chain schedule (black). F , Peak DA in the shell was unchanged at SLP, but showed significant decreases at TLP and reward. *p < 0.05, **p < 0.01, chain versus early extinction; †p < 0.05, omission less than baseline.
We first examined DA signaling in the core during extinction. Relative to the SLO, DA significantly and linearly decreased across the different phases of extinction relative to the rewarded chain session (interaction: phase × cue; SLO vs baseline; F (4,157) = 33.19, p < 0.0001; Fig. 6 A). Post hoc pairwise comparisons showed that peak DA to the SLO rapidly decreased between the chain and early extinction phases (Tukey: p < 0.0001), and again between early extinction and late extinction (p < 0.0001). However, DA during the delay extinction was not different from that in the late press block (p = 0.64), and peak DA did not differ in the late phase based on whether the rat made a response or not (late press vs late no press, p = 0.99). Further, DA release during the SLO was significantly greater than baseline in the chain (p < 0.0001), early extinction (p < 0.0001), and delay extinction (p < 0.001) phases, but not in the late press or late no press phases (both p > 0.5). These pairwise findings supported a significant negative linear trend (F (1,157) = 94.77, p < 0.0001), which accounted for a majority (71%) of the effect variance.
Next, one hallmark of PE signals in the brain is the presence of negative prediction errors at the time of an omitted expected reward (Schultz et al., 1997). We anticipated that these signals would be strongest early in extinction when the subject had full expectation that the reward would be delivered. In the core (Fig. 6 B), a two-way ANOVA indicated a significant interaction of event × phase (chain vs early extinction; F (3,57) = 3.24, p = 0.029). Specifically, although peak DA release relative to the preceding SLP was significantly reduced in early extinction relative to the reinforced chain session (Tukey: p = 0.019), DA release to the TLP was unaffected (p = 0.41). Critically, core DA showed evidence of a negative prediction error during extinction (Fig. 6 B) such that DA release during the time of the expected but omitted reward was significantly lower than during the reinforced session (p = 0.003). Indeed, whereas peak DA release to the reward was no different from baseline during the reinforced Chain session (p = 0.99), it shifted to significantly less than baseline during reward omissions (p = 0.03). Thus, DA signals in the core during early extinction displayed dynamic shifts in release to predictive SL stimuli and actions, no change relative to a TL cue, and a negative prediction error to reward omission.
The shell showed a different pattern of DA release relative to the SLO cue (Fig. 6 D). Here, subject-averaged cue-evoked DA to the SLO dynamically changed across phases (interaction: phase × cue; F (2,24) = 7.95, p < 0.0005). Unlike the core, shell DA did not change between the chain phase and early extinction (p = 0.74). Instead, DA signaling to the SLO was significantly decreased during the delay extinction phase relative to both the reinforced chain phase (p = 0.041) and the early extinction phase (p = 0.02), coincident with the rats’ motivational shift in behavior (Fig. 5). DA signaling significantly decreased again between the delay phase and late phases (late press, p = 0.03; late no press, p = 0.004), but there was no difference in DA levels between the late phases (press versus no press, p = 0.43). As in the core, DA release during the SLO was significantly above baseline during the chain, early extinction, and delay extinction phases (Tukey: all p < 0.001), but neither of the late phases was significantly different from baseline. Thus, core DA release rapidly and continuously tracked changes in prediction for the most predictive cue, while patterns of DA release for the same cue in the shell instead tracked changes in motivational state between extinction phases.
Looking at pressing and reward signaling, shell DA release differed from the pattern in the core (Fig. 6 E,F). A two-way ANOVA examining DA on individual trials by stimulus type (BL, SLP, TLP, reward) and extinction phase (chain, early extinction) found a significant interaction between stimulus × extinction (F (3,108) = 11.5, p < 0.0001; Fig. 6 D). Unlike the core, there was no difference in peak shell DA release to the SLP early in extinction (p = 0.44). Instead, extinction induced a significant decrease in DA release both to the TLP (p = 0.01) and at the time of reward omission (p < 0.0001) relative to the matched times during the reinforced Chain schedule. During the rewarded Chain session, DA was significantly elevated above baseline (p < 0.0001), but during reward omission, DA was numerically greater than, but not statistically different from, baseline (p = 0.07). Thus, unlike the core, we found limited evidence for early extinction prediction errors, and instead a decrease in DA release relative to the TL (but not SL) press as well as the elimination of DA release at the reward seen during the reinforced schedule.
Phasic DA release patterns tracked stimuli that strikingly differed between NAc subregions in a manner consistent with contrasting theories of DA function. In a well learned chain schedule task, DA in the NAc core selectively peaked at the most predictive cue, and linearly tracked changes in prediction value and errors during extinction. In contrast, phasic DA release in the NAc shell tracked all salient stimuli when the task was rewarded, and both within session and during extinction displayed changes in signaling consistent with shifts in motivation. As such, we propose that these DA signals are simultaneously available to the animal during behavior, allowing both predictive and motivational information to guide learning and action.
Core DA release tracks prediction error
In the core, DA phasically increased at the time of SL cue presentation and declined to baseline for fully predicted later events (e.g., TL, reward), similar to previous findings (Roitman et al., 2004; Cacciapaglia et al., 2012). This pattern of activity is consistent with prediction error models, which state that maximally predictive cues should elicit the highest DA release (i.e., prediction), whereas accurately predicted events that follow should elicit minimal DA release (i.e., prediction error). Thus, as the TL and reward were predicted accurately by the SL, they generated little error at their delivery, and evoked little error-related DA release (Schultz et al., 1997; Schultz and Dickinson, 2000).
Our laboratory and others have shown that DA signals in the core are sensitive to differences in predicted value, and are modulated by subjective factors like risk preference and delays to reinforcement (Day et al., 2010; Gan et al., 2010; Sugam et al., 2012; Saddoris et al., 2013, 2015). For example, in rats performing a risky decision making task, core DA scaled with cues that predicted the rat’s preferred option, and rapidly dropped below baseline when expected rewards were omitted, indicative of a negative prediction error (Sugam et al., 2012). Likewise here, core DA tracked both the value of predicted outcomes, and dynamically shifted based on the updated predicted cue value during extinction. Indeed, DA release to the SLO was no different from baseline by the time the rat began omitting responses during extinction, regardless of whether or not a response was made, suggesting DA signaled the anticipated value of responding, rather than the motivation to press. Further, reward omissions early in extinction elicited robust pauses in DA release, consistent with negative prediction error signaling.
Shell DA tracks motivationally salient stimuli
DA release in the shell discretely tracked all salient stimuli (SLO, TLO, R). These patterns could not be explained by slower reuptake kinetics, and instead appear to reflect real-time encoding of contingent events (Pan et al., 2005) and acquired incentive salience (Berridge and Robinson, 1998; Berridge, 2012; Wassum et al., 2012). Thus, DA release events encoded both predictive cues and rewards in the shell, but only predictive cues in the core (Cacciapaglia et al., 2012).
We found evidence for this motivational component of shell DA signaling. First, DA signaling in the shell to stimuli was decreased between the beginning and end of sessions, which was not seen in the core. One explanation is that rats at the end of the session were simply more sated (by definition, they had eaten more food than at the onset of the session), and as such, cues predictive of the food reflected the diminished motivational state of the animal. In contrast, the cues still accurately predicted the delivery of the sucrose pellet, so the PE-type encoding in the core was relatively less affected by this motivational shift.
Second, during extinction, DA release in the shell to the SLO remained stable while the rat was performing the task at the same motivational level (as indicated by response latency and accuracy), but significantly decreased after the rats’ motivation declined (i.e., response latency increased) over the course of extinction. In contrast, we saw rapid decreases in phasic DA release during the TL stimuli. IS models predict that cues that reduce uncertainty should create greater motivation and incentive salience (Zhang et al., 2009; Smith et al., 2011), which here is biased toward the TL, as it is maximally predictive of imminent reward delivery. Indeed, intra-NAc shell infusions of amphetamine selectively potentiate the encoding of cues most proximal to the delivery of reward in a chained pavlovian task, but have less effect on the first cue in the sequence (Smith et al., 2011). Thus, DA encoding of the TL was particularly sensitive to the predicted loss of reward delivery in extinction. Surprisingly, reward omissions did not result in DA release below baseline, suggesting that shell DA was less likely to encode a negative prediction error than the core. Collectively, this pattern of signaling within the NAc shell is distinctly different from the core, and is suggestive of IS-type encoding.
In support, the NAc shell has been implicated in a variety of motivationally driven behaviors. For example, in salt appetite, a salty solution that is normally aversive can become rewarding if the animal is salt-deprived. In both cases, the predicted outcome (salt) is the same, but the motivation to obtain that outcome differs between the normal and salt-deprived animals (Tindell et al., 2009). NAc neural encoding for the salty solution is modulated in the shell based on the degree of salt motivation, while core neurons failed to display state-based differences (Loriaux et al., 2011). Similarly, intra-NAc shell microinfusions of amphetamine strongly potentiate the motivational vigor of lever pressing in the presence of a cue during pavlovian-to-instrumental transfer (PIT), as does chronic pretransfer experience with cocaine (Wyvell and Berridge, 2000; Saddoris et al., 2011; LeBlanc et al., 2013). Indeed, the experience with self-administered cocaine that potentiates PIT behavior also preferentially increases NAc shell neural encoding relative to the core (Saddoris et al., 2011).
This pattern of IS and PE in the shell and core appears to track both appetitive and aversive conditions. In pavlovian fear conditioning, phasic DA increases in the NAc shell for salient aversive cues, whereas core DA release saw decreases and pauses in release indicative of PE-type prediction of a negative outcome (Badrinarayan et al., 2012). Thus, even negative (but salient) events can be accounted for with an IS-type model within the shell, whereas core DA release remains strongly coupled to predictions of outcome value.
Complexity of PE and IS signaling in conditioning
One caveat is that in pavlovian conditioning, animals that preferentially interact with predictive cues (“sign trackers”) show enhanced DA release in the NAc core compared with those that immediately go to the food cup (“goal trackers”; Flagel et al., 2011). This increased sign-tracking is described as supporting IS, as the cue has become a salient stimulus capable of acting as a motivational “magnet” and accords with similar findings in the core (Aragona et al., 2009; Peciña and Berridge, 2013; Wassum et al., 2013; Ostlund et al., 2014). This appears to be at odds with our assignment of PE biased to the core and IS to the shell.
It is important to note that the respective roles of DA signals in the core and shell are likely complex. For example, few of the above studies have independently investigated the role of shell and core in these tasks, so increased DA in the core in sign-tracking animals may simply reflect a generalized increase in incentive DA signals in the mesolimbic pathway. Further, we are not advocating an absolute division; we found some DA release to the TLO in the core, and DA to the SLO in the shell persisted despite changes in motivation during extinction, suggesting that features of IS may be present in the core and PE in the shell (though at lower levels and/or less responsive to task dynamics). Rather, we suggest that core and shell represent a critical biasing toward PE- and IS-type encoding patterns, which is consistent with the more graded composition of striatal anatomy (Haber, 2014).
Implications for addiction
Model-based differences in core and shell DA signaling have important implications beyond natural reward learning. For example, although drugs of abuse are initially rewarding, over time, drug-associated stimuli can induce feelings of intense aversive craving, imposing a negative affective state that drives drug-seeking (Koob and Le Moal, 1997). Prolonged abstinence from drugs increases the impact of drug-associated stimuli through a process known as incubation of craving (Grimm et al., 2001; Hollander and Carelli, 2005; Pickens et al., 2011). The predicted outcome (drug) is unchanged in both the immediate and abstinent condition, but there is a profound increase in the motivation to resume drug taking in the abstinent subjects. This suggests a significant change in those stimuli’s incentive salience and would predict that abstinence-related changes should preferentially be seen in the shell. Relatedly, when drug-self-administering rats are presented with cocaine-predictive cues that induce an aversive motivational state, changes in DA signaling track the aversive state of the animal in the shell but not the core (Wheeler et al., 2011). Collectively, these findings support the view that core and shell DA contributions to learning and motivation are consistent across both natural and drug rewards.
- Received June 18, 2015.
- Revision received July 8, 2015.
- Accepted July 15, 2015.
This work was supported by National Institute on Drug Abuse Grants DA028156 and DA035322 to M.P.S., DA017318 and DA034021 to R.M.C., and DA010900 to R.M.W. We thank Dr Elizabeth West for comments on an earlier draft of this work.
The authors declare no competing financial interests.
- Correspondence should be addressed to Dr Michael Saddoris, Department of Psychology and Neuroscience, University of Colorado Boulder, Muenzinger, UCB 345, Boulder, CO 80309-0345.
- Copyright © 2015 the authors 0270-6474/15/3511572-11$15.00/0
- Badrinarayan A, Wescott SA, Vander Weele CM, Saunders BT, Couturier BE, Maren S, Aragona BJ (2012) Aversive stimuli differentially modulate real-time dopamine transmission dynamics within the nucleus accumbens core and shell. J Neurosci 32:15779–15790.
- Budygin EA, John CE, Mateo Y, Jones SR (2002) Lack of cocaine effect on dopamine clearance in the core and shell of the nucleus accumbens of dopamine transporter knock-out mice. J Neurosci 22:RC222.
- Heien ML, Khan AS, Ariansen JL, Cheer JF, Phillips PE, Wassum KM, Wightman RM (2005) Real-time measurement of dopamine fluctuations after cocaine in the brain of behaving rats. Proc Natl Acad Sci U S A 102:10023–10028.
- Koob GF, Le Moal M (1997) Drug abuse: hedonic homeostatic dysregulation. Science 278:52–58.
- Loriaux AL, Roitman JD, Roitman MF (2011) Nucleus accumbens shell, but not core, tracks motivational value of salt. J Neurophysiol 106:1537–1544.
- Pan WX, Schmidt R, Wickens JR, Hyland BI (2005) Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J Neurosci 25:6235–6242.
- Redish AD (2004) Addiction as a computational process gone awry. Science 306:1944–1947.
- Robinson TE, Berridge KC (2008) Review: the incentive sensitization theory of addiction: some current issues. Philos Trans R Soc Lond B Biol Sci 363:3137–3146.
- Roitman MF, Stuber GD, Phillips PE, Wightman RM, Carelli RM (2004) Dopamine operates as a subsecond modulator of food seeking. J Neurosci 24:1265–1271.
- Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599.
- Smith KS, Berridge KC, Aldridge JW (2011) Disentangling pleasure from incentive salience and learning signals in brain reward circuitry. Proc Natl Acad Sci U S A 108:E255–E264.
- Tindell AJ, Smith KS, Berridge KC, Aldridge JW (2009) Dynamic computation of incentive salience: “wanting” what was never “liked.” J Neurosci 29:12220–12228.
- Tobler PN, Dickinson A, Schultz W (2003) Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. J Neurosci 23:10402–10410.
- Wyvell CL, Berridge KC (2000) Intra-accumbens amphetamine increases the conditioned incentive salience of sucrose reward: enhancement of reward “wanting” without enhanced “liking” or response reinforcement. J Neurosci 20:8122–8130.