
What Is Operant Conditioning?

How Reinforcement and Punishment Modify Behavior

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Operant conditioning, sometimes referred to as  instrumental conditioning , is a method of learning that employs rewards and punishments for behavior. Through operant conditioning, an association is made between a behavior and a consequence (whether negative or positive) for that behavior.

For example, when a lab rat presses a lever while a green light is on, it receives a food pellet as a reward; when it presses the lever while a red light is on, it receives a mild electric shock. As a result, the rat learns to press the lever when the green light is on and to avoid pressing it when the red light is on.

But operant conditioning is not just something that takes place in experimental settings while training lab animals. It also plays a powerful role in everyday learning. Reinforcement and punishment take place in natural settings all the time, as well as in more structured settings such as classrooms or therapy sessions.

The History of Operant Conditioning

Operant conditioning was first described by behaviorist  B.F. Skinner , which is why you may occasionally hear it referred to as Skinnerian conditioning. As a behaviorist, Skinner believed that it was not really necessary to look at internal thoughts and motivations in order to explain behavior. Instead, he suggested, we should look only at the external, observable causes of human behavior.

Through the first part of the 20th century, behaviorism became a major force within psychology. The ideas of  John B. Watson  dominated this school of thought early on. Watson focused on the principles of  classical conditioning , once famously suggesting that he could take any person regardless of their background and train them to be anything he chose.

Early behaviorists focused their interests on associative learning. Skinner was more interested in how the  consequences  of people's actions influenced their behavior.

Skinner used the term  operant  to refer to any "active behavior that operates upon the environment to generate consequences." Skinner's theory explained how we acquire the range of learned behaviors we exhibit every day.

His theory was heavily influenced by the work of psychologist  Edward Thorndike , who had proposed what he called the  law of effect .   According to this principle, actions that are followed by desirable outcomes are more likely to be repeated while those followed by undesirable outcomes are less likely to be repeated.

Operant conditioning relies on a fairly simple premise: Actions that are followed by reinforcement will be strengthened and more likely to occur again in the future. If you tell a funny story in class and everybody laughs, you will probably be more likely to tell that story again in the future.

If you raise your hand to ask a question and your teacher praises your polite behavior, you will be more likely to raise your hand the next time you have a question or comment. Because the behavior was followed by reinforcement, or a desirable outcome, the preceding action is strengthened.

Conversely, actions that result in punishment or undesirable consequences will be weakened and less likely to occur again in the future. If you tell the same story again in another class but nobody laughs this time, you will be less likely to repeat the story again in the future. If you shout out an answer in class and your teacher scolds you, then you might be less likely to interrupt the class again.

Types of Behaviors

Skinner distinguished between two different types of behaviors:

  • Respondent behaviors are those that occur automatically and reflexively, such as pulling your hand back from a hot stove or jerking your leg when the doctor taps on your knee. You don't have to learn these behaviors. They simply occur automatically and involuntarily.
  • Operant behaviors , on the other hand, are those under our conscious control. Some may occur spontaneously and others purposely, but it is the consequences of these actions that then influence whether or not they occur again in the future. Our actions on the environment and the consequences of that action make up an important part of the  learning process .

While classical conditioning could account for respondent behaviors, Skinner realized that it could not account for a great deal of learning. Instead, Skinner suggested that operant conditioning held far greater importance.

Skinner invented different devices during his boyhood and he put these skills to work during his studies on operant conditioning. He created a device known as an operant conditioning chamber, often referred to today as a  Skinner box . The chamber could hold a small animal, such as a rat or pigeon. The box also contained a bar or key that the animal could press in order to receive a reward.

In order to track responses, Skinner also developed a device known as a cumulative recorder. The device recorded responses as an upward movement of a line so that response rates could be read by looking at the slope of the line.
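
To make the idea of a cumulative record concrete, here is a minimal sketch in Python (illustrative only; the function name and numbers are invented for this example). Each response adds one step to a running total, so a steep stretch of the record corresponds to a high response rate and a flat stretch to a pause.

```python
# Toy cumulative recorder: responses are timestamps (in seconds).
# The cumulative record is the running count of responses over time;
# its slope at any point reflects the response rate, much as on
# Skinner's paper-and-pen recorder.

def cumulative_record(response_times, session_length, step=10):
    """Return (time, cumulative_count) pairs sampled every `step` seconds."""
    record = []
    count = 0
    responses = sorted(response_times)
    i = 0
    for t in range(0, session_length + 1, step):
        while i < len(responses) and responses[i] <= t:
            count += 1
            i += 1
        record.append((t, count))
    return record

# Example: a burst of fast responding early in the session, then a pause.
times = [1, 2, 3, 5, 6, 8, 40, 41, 43, 80]
for t, c in cumulative_record(times, session_length=90):
    print(t, c)  # rises steeply early (high rate), stays flat later (low rate)
```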

Components of Operant Conditioning

There are several key concepts in operant conditioning. The type of reinforcement or punishment that is used can affect how the individual responds and how effective the conditioning is. Four types of consequences can be used to change behavior: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.

Reinforcement in Operant Conditioning

Reinforcement is any event that strengthens or increases the behavior it follows. There are two kinds of reinforcers. In both of these cases of reinforcement, the behavior increases.

  • Positive reinforcers  are favorable events or outcomes that are presented after the behavior. In positive reinforcement situations, a response or behavior is strengthened by the addition of praise or a direct reward. If you do a good job at work and your manager gives you a bonus, that bonus is a positive reinforcer.
  • Negative reinforcers involve the removal of an unfavorable event or outcome after the display of a behavior. In these situations, a response is strengthened by the removal of something considered unpleasant. For example, if your child starts to scream in the middle of a restaurant but stops once you hand them a treat, your action led to the removal of the unpleasant condition, negatively reinforcing your behavior (not your child's).

Punishment in Operant Conditioning

Punishment is an adverse event or outcome that causes a decrease in the behavior it follows. There are two kinds of punishment. In both of these cases, the behavior decreases.

  • Positive punishment , sometimes referred to as punishment by application, presents an unfavorable event or outcome in order to weaken the response it follows. Spanking for misbehavior is an example of punishment by application.
  • Negative punishment , also known as punishment by removal, occurs when a favorable event or outcome is removed after a behavior occurs. Taking away a child's video game following misbehavior is an example of negative punishment.

The five principles of operant conditioning are positive reinforcement, negative reinforcement, positive punishment, negative punishment, and extinction. Extinction occurs when a response is no longer reinforced or punished, which can lead to the fading and disappearance of the behavior.
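
These four consequence types boil down to a two-by-two table: is a stimulus added or removed, and does the behavior that precedes it become more or less frequent? The hypothetical helper below (a sketch written for illustration, not drawn from any cited source) simply encodes that table.

```python
def classify_consequence(stimulus_change, behavior_change):
    """Classify an operant consequence.

    stimulus_change: "added" or "removed"
    behavior_change: "increases" or "decreases"
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    return table[(stimulus_change, behavior_change)]

# A bonus added after good work that makes the work more likely: positive reinforcement.
print(classify_consequence("added", "increases"))
# A phone taken away after misbehavior that makes the misbehavior less likely: negative punishment.
print(classify_consequence("removed", "decreases"))
```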

Operant Conditioning Reinforcement Schedules

Reinforcement is not necessarily a straightforward process, and there are a number of factors that can influence how quickly and how well new things are learned. Skinner found that when and how often behaviors were reinforced played a role in the speed and strength of acquisition . In other words, the timing and frequency of reinforcement influenced how new behaviors were learned and how old behaviors were modified.

Skinner identified several different schedules of reinforcement that impact the operant conditioning process:

  • Continuous reinforcement  involves delivering a reinforcement every time a response occurs. Learning tends to occur relatively quickly, yet the response rate is quite low. Extinction also occurs very quickly once reinforcement is halted.
  • Fixed-ratio schedules are a type of partial reinforcement. Responses are reinforced only after a specific number of responses have occurred. This typically leads to a fairly steady response rate.
  • Fixed-interval schedules are another form of partial reinforcement. Reinforcement occurs only after a certain interval of time has elapsed. Response rates remain fairly steady and start to increase as the reinforcement time draws near, but slow immediately after the reinforcement has been delivered.
  • Variable-ratio schedules are also a type of partial reinforcement that involve reinforcing behavior after a varied number of responses. This leads to both a high response rate and slow extinction rates.
  • Variable-interval schedules  are the final form of partial reinforcement Skinner described. This schedule involves delivering reinforcement after a variable amount of time has elapsed. This also tends to lead to a fast response rate and slow extinction rate.
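
As a rough illustration of how these schedules differ, here is a small simulation sketch (hypothetical Python, with made-up parameter values; it is not taken from Skinner's work). Each schedule is just a rule that decides, response by response, whether a reinforcer is delivered.

```python
import random

random.seed(42)  # for a reproducible demo run

def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0
    def rule(time):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return rule

def variable_ratio(mean_n):
    """Reinforce after an unpredictable number of responses (mean_n on average)."""
    count = 0
    target = random.randint(1, 2 * mean_n - 1)
    def rule(time):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return rule

def fixed_interval(seconds):
    """Reinforce the first response made after `seconds` have elapsed."""
    last = 0
    def rule(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False
    return rule

# Simulate 20 lever presses, one per second, under each schedule.
for name, rule in [("FR-5", fixed_ratio(5)),
                   ("VR-5", variable_ratio(5)),
                   ("FI-5s", fixed_interval(5))]:
    reinforced = [t for t in range(1, 21) if rule(t)]
    print(name, "reinforced at presses:", reinforced)
```

In this sketch, continuous reinforcement would simply be a rule that always returns True, and extinction corresponds to switching every rule off and watching responding taper away.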

Examples of Operant Conditioning

We can find examples of operant conditioning at work all around us. Consider the case of children completing homework to earn a reward from a parent or teacher, or employees finishing projects to receive praise or promotions. More examples of operant conditioning in action include:

  • After performing in a community theater play, you receive applause from the audience. This acts as a positive reinforcer , inspiring you to try out for more performance roles.
  • You train your dog to fetch by offering him praise and a pat on the head whenever he performs the behavior correctly. This is another positive reinforcer .
  • A professor tells students that if they have perfect attendance all semester, then they do not have to take the final comprehensive exam. By removing an unpleasant stimulus (the final test), students are negatively reinforced to attend class regularly.
  • If you fail to hand in a project on time, your boss becomes angry and berates your performance in front of your co-workers. This acts as a positive punisher , making it less likely that you will finish projects late in the future.
  • A teen girl does not clean up her room as she was asked, so her parents take away her phone for the rest of the day. This is an example of a negative punishment in which a positive stimulus is taken away.

In some of these examples, the promise or possibility of rewards causes an increase in behavior. Operant conditioning can also be used to decrease a behavior via the removal of a desirable outcome or the application of a negative outcome.

For example, a child may be told they will lose recess privileges if they talk out of turn in class. This potential for punishment may lead to a decrease in disruptive behaviors.

A Word From Verywell

While behaviorism may have lost much of the dominance it held during the early part of the 20th century, operant conditioning remains an important and often used tool in the learning and behavior modification process. Sometimes natural consequences lead to changes in our behavior. In other instances, rewards and punishments may be consciously doled out in order to create a change.

Operant conditioning is something you may immediately recognize in your own life, whether it is in your approach to teaching your children good behavior or in training the family dog. Remember that any type of learning takes time. Consider the type of reinforcement or punishment that may work best for your unique situation and assess which type of reinforcement schedule might lead to the best results.



Operant Conditioning: What It Is, How It Works, and Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


Operant conditioning, or instrumental conditioning, is a theory of learning where behavior is influenced by its consequences. Behavior that is reinforced (rewarded) will likely be repeated, and behavior that is punished will occur less frequently.

By the 1920s, John B. Watson had left academic psychology, and other behaviorists were becoming influential, proposing new forms of learning other than classical conditioning. Perhaps the most important of these was Burrhus Frederic Skinner, who, for obvious reasons, is more commonly known as B.F. Skinner.

Skinner’s views were slightly less extreme than Watson’s (1913). Skinner believed that we do have such a thing as a mind, but that it is simply more productive to study observable behavior rather than internal mental events.

The work of Skinner was rooted in the view that classical conditioning was far too simplistic to be a complete explanation of complex human behavior. He believed that the best way to understand behavior is to look at the causes of an action and its consequences. He called this approach operant conditioning.


How It Works

Skinner is regarded as the father of Operant Conditioning, but his work was based on Thorndike’s (1898) Law of Effect . According to this principle, behavior that is followed by pleasant consequences is likely to be repeated, and behavior followed by unpleasant consequences is less likely to be repeated.

Skinner introduced a new term into the Law of Effect – Reinforcement. Behavior that is reinforced tends to be repeated (i.e., strengthened); behavior that is not reinforced tends to die out or be extinguished (i.e., weakened).

Skinner (1948) studied operant conditioning by conducting experiments using animals, which he placed in a "Skinner box" similar to Thorndike's puzzle box.


A Skinner box, also known as an operant conditioning chamber, is a device used to objectively record an animal’s behavior in a compressed time frame. An animal can be rewarded or punished for engaging in certain behaviors, such as lever pressing (for rats) or key pecking (for pigeons).

Skinner identified three types of responses, or operants, that can follow behavior.

  • Neutral operants : responses from the environment that neither increase nor decrease the probability of a behavior being repeated.
  • Reinforcers : Responses from the environment that increase the probability of a behavior being repeated. Reinforcers can be either positive or negative.
  • Punishers : Responses from the environment that decrease the likelihood of a behavior being repeated. Punishment weakens behavior.

We can all think of examples of how our own behavior has been affected by reinforcers and punishers. As a child, you probably tried out a number of behaviors and learned from their consequences.

For example, when you were younger, if you tried smoking at school, and the chief consequence was that you got in with the crowd you always wanted to hang out with, you would have been positively reinforced (i.e., rewarded) and would be likely to repeat the behavior.

If, however, the main consequence was that you were caught, caned, suspended from school, and your parents became involved, you would most certainly have been punished, and you would consequently be much less likely to smoke now.

Positive Reinforcement

Positive reinforcement is a term described by B. F. Skinner in his theory of operant conditioning. In positive reinforcement, a response or behavior is strengthened by rewards, leading to the repetition of desired behavior. The reward is a reinforcing stimulus.

Primary reinforcers are stimuli that are naturally reinforcing because they are not learned and directly satisfy a need, such as food or water.

Secondary reinforcers are stimuli that become reinforcing through their association with a primary reinforcer, such as money or school grades. They do not directly satisfy an innate need, but they may be the means of obtaining one. So a secondary reinforcer can be just as powerful a motivator as a primary reinforcer.

Skinner showed how positive reinforcement worked by placing a hungry rat in his Skinner box. The box contained a lever on the side, and as the rat moved about the box, it would accidentally knock the lever. As soon as it did so, a food pellet would drop into a container next to the lever.

The rats quickly learned to go straight to the lever after being put in the box a few times. The consequence of receiving food, if they pressed the lever, ensured that they would repeat the action again and again.

Positive reinforcement strengthens a behavior by providing a consequence an individual finds rewarding. For example, if your teacher gives you £5 each time you complete your homework (i.e., a reward), you will be more likely to repeat this behavior in the future, thus strengthening the behavior of completing your homework.

The Premack principle is a form of positive reinforcement in operant conditioning. It suggests using a preferred activity (high-probability behavior) as a reward for completing a less preferred one (low-probability behavior).

This method incentivizes the less desirable behavior by associating it with a desirable outcome, thus strengthening the less favored behavior.


Negative Reinforcement

Negative reinforcement is the termination of an unpleasant state following a response.

This is known as negative reinforcement because it is the removal of an adverse stimulus which is ‘rewarding’ to the animal or person. Negative reinforcement strengthens behavior because it stops or removes an unpleasant experience.

For example, if you do not complete your homework, you give your teacher £5. You will complete your homework to avoid paying £5, thus strengthening the behavior of completing your homework.

Skinner showed how negative reinforcement worked by placing a rat in his Skinner box and then subjecting it to an unpleasant electric current which caused it some discomfort. As the rat moved about the box it would accidentally knock the lever.

As soon as it did so, the electric current would be switched off. The rats quickly learned to go straight to the lever after being put in the box a few times. The consequence of escaping the electric current ensured that they would repeat the action again and again.

In fact, Skinner even taught the rats to avoid the electric current by turning on a light just before the electric current came on. The rats soon learned to press the lever when the light came on because they knew that this would stop the electric current from being switched on.

These two learned responses are known as Escape Learning and Avoidance Learning .

Punishment

Punishment is the opposite of reinforcement since it is designed to weaken or eliminate a response rather than increase it. It is an aversive event that decreases the behavior that it follows.

Like reinforcement, punishment can work either by directly applying an unpleasant stimulus like a shock after a response or by removing a potentially rewarding stimulus, for instance, deducting someone’s pocket money to punish undesirable behavior.

Note : It is not always easy to distinguish between punishment and negative reinforcement.

Positive punishment and negative punishment are two distinct methods used to decrease the likelihood of a specific behavior occurring again, but they involve different types of consequences:

Positive Punishment :

  • Positive punishment involves adding an aversive stimulus or something unpleasant immediately following a behavior to decrease the likelihood of that behavior happening in the future.
  • It aims to weaken the target behavior by associating it with an undesirable consequence.
  • Example : A child receives a scolding (an aversive stimulus) from their parent immediately after hitting their sibling. This is intended to decrease the likelihood of the child hitting their sibling again.

Negative Punishment :

  • Negative punishment involves removing a desirable stimulus or something rewarding immediately following a behavior to decrease the likelihood of that behavior happening in the future.
  • It aims to weaken the target behavior by taking away something the individual values or enjoys.
  • Example : A teenager loses their video game privileges (a desirable stimulus) for not completing their chores. This is intended to decrease the likelihood of the teenager neglecting their chores in the future.
There are many problems with using punishment, such as:
  • Punished behavior is not forgotten, it’s suppressed – behavior returns when punishment is no longer present.
  • Causes increased aggression – shows that aggression is a way to cope with problems.
  • Creates fear that can generalize to undesirable behaviors, e.g., fear of school.
  • Does not necessarily guide you toward desired behavior – reinforcement tells you what to do, and punishment only tells you what not to do.

Examples of Operant Conditioning

Positive Reinforcement : Suppose you are a coach and want your team to improve their passing accuracy in soccer. When the players execute accurate passes during training, you praise their technique. This positive feedback encourages them to repeat the correct passing behavior.

Negative Reinforcement : If you notice your team working together effectively and exhibiting excellent team spirit during a tough training session, you might end the training session earlier than planned, which the team perceives as a relief. They understand that teamwork leads to positive outcomes, reinforcing team behavior.

Negative Punishment : If an office worker continually arrives late, their manager might revoke the privilege of flexible working hours. This removal of a positive stimulus encourages the employee to be punctual.

Positive Reinforcement : Training a cat to use a litter box can be achieved by giving it a treat each time it uses it correctly. The cat will associate the behavior with the reward and will likely repeat it.

Negative Punishment : If teenagers stay out past their curfew, their parents might take away their gaming console for a week. This makes the teenager more likely to respect their curfew in the future to avoid losing something they value.

Ineffective Punishment : Your child refuses to finish their vegetables at dinner. You punish them by not allowing dessert, but the child still refuses to eat vegetables next time. The punishment seems ineffective.

Premack Principle Application : You could motivate your child to eat vegetables by offering an activity they love after they finish their meal. For instance, for every vegetable eaten, they get an extra five minutes of video game time. They value video game time, which might encourage them to eat vegetables.

Other Premack Principle Examples :

  • A student who dislikes history but loves art might earn extra time in the art studio for each history chapter reviewed.
  • For every 10 minutes a person spends on household chores, they can spend 5 minutes on a favorite hobby.
  • For each successful day of healthy eating, an individual allows themselves a small piece of dark chocolate at the end of the day.
  • A child can choose between taking out the trash or washing the dishes. Giving them the choice makes them more likely to complete the chore willingly.

Skinner’s Pigeon Experiment

B.F. Skinner conducted several experiments with pigeons to demonstrate the principles of operant conditioning.

One of the most famous of these experiments is often colloquially referred to as “ Superstition in the Pigeon .”

This experiment was conducted to explore the effects of non-contingent reinforcement on pigeons, leading to some fascinating observations that can be likened to human superstitions.

Non-contingent reinforcement (NCR) refers to a method in which rewards (or reinforcements) are delivered independently of the individual’s behavior. In other words, the reinforcement is given at set times or intervals, regardless of what the individual is doing.

The Experiment:

  • Pigeons were brought to a state of hunger, reduced to 75% of their well-fed weight.
  • They were placed in a cage with a food hopper that could be presented for five seconds at a time.
  • Instead of the food being given as a result of any specific action by the pigeon, it was presented at regular intervals, regardless of the pigeon’s behavior.

Observation:

  • Over time, Skinner observed that the pigeons began to associate whatever random action they were doing when food was delivered with the delivery of the food itself.
  • This led the pigeons to repeat these actions, believing (in anthropomorphic terms) that their behavior was causing the food to appear.
  • In most cases, pigeons developed different “superstitious” behaviors or rituals. For instance, one pigeon would turn counter-clockwise between food presentations, while another would thrust its head into a cage corner.
  • These behaviors did not appear until the food hopper was introduced and presented periodically.
  • These behaviors were not initially related to the food delivery but became linked in the pigeon’s mind due to the coincidental timing of the food dispensing.
  • The behaviors seemed to be associated with the environment, suggesting the pigeons were responding to certain aspects of their surroundings.
  • The rate of reinforcement (how often the food was presented) played a significant role. Shorter intervals between food presentations led to more rapid and defined conditioning.
  • Once a behavior was established, the interval between reinforcements could be increased without diminishing the behavior.
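
The mechanism described above (a reinforcer that happens to coincide with whatever the animal is doing, strengthening that action) can be sketched as a toy simulation. The code below is purely illustrative: the action names echo the behaviors mentioned above, but the numbers and the weighting scheme are invented, not Skinner's procedure or data.

```python
import random

random.seed(1)
actions = ["turn counter-clockwise", "thrust head into corner", "head bob", "wing flap"]
weights = {a: 1.0 for a in actions}   # all actions equally likely at first

def pick_action():
    """Choose an action with probability proportional to its current weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for action, w in weights.items():
        r -= w
        if r <= 0:
            return action
    return actions[-1]

# Food arrives every 15 "seconds" regardless of behavior (non-contingent reinforcement).
for t in range(1, 301):
    current = pick_action()
    if t % 15 == 0:
        weights[current] += 1.0       # whatever the bird happened to be doing gets strengthened

print(max(weights, key=weights.get))  # one arbitrary action usually ends up dominating
```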

Superstitious Behavior:

The pigeons began to act as if their behaviors had a direct effect on the presentation of food, even though there was no such connection. This is likened to human superstitions, where rituals are believed to change outcomes, even if they have no real effect.

For example, a card player might have rituals to change their luck, or a bowler might make gestures believing they can influence a ball already in motion.

Conclusion:

This experiment demonstrates that behaviors can be conditioned even without a direct cause-and-effect relationship. Just like humans, pigeons can develop “superstitious” behaviors based on coincidental occurrences.

This study not only sheds light on the intricacies of operant conditioning but also draws parallels between animal and human behaviors in the face of random reinforcements.

Schedules of Reinforcement

Imagine a rat in a “Skinner box.” In operant conditioning, if no food pellet is delivered immediately after the lever is pressed then after several attempts the rat stops pressing the lever (how long would someone continue to go to work if their employer stopped paying them?). The behavior has been extinguished.

Behaviorists discovered that different patterns (or schedules) of reinforcement had different effects on the speed of learning and extinction. Ferster and Skinner (1957) devised different ways of delivering reinforcement and found that this had effects on:

1. The Response Rate – The rate at which the rat pressed the lever (i.e., how hard the rat worked).

2. The Extinction Rate – The rate at which lever pressing dies out (i.e., how soon the rat gave up).

How Reinforcement Schedules Work

Skinner found that the type of reinforcement which produces the slowest rate of extinction (i.e., people will go on repeating the behavior for the longest time without reinforcement) is variable-ratio reinforcement. The type of reinforcement which has the quickest rate of extinction is continuous reinforcement.

(A) Continuous Reinforcement

An animal or human is positively reinforced every time a specific behavior occurs, e.g., every time the lever is pressed, a pellet is delivered. (Extinction is then measured by switching food delivery off.)

  • Response rate is SLOW
  • Extinction rate is FAST

(B) Fixed Ratio Reinforcement

Behavior is reinforced only after it occurs a specified number of times, e.g., one reinforcement is given after every fifth correct response. For example, a child receives a star for every five words spelled correctly.

  • Response rate is FAST
  • Extinction rate is MEDIUM

(C) Fixed Interval Reinforcement

One reinforcement is given after a fixed time interval, providing at least one correct response has been made. An example is being paid by the hour. Another example: a pellet is delivered every 15 minutes (or every half hour, hour, etc.), providing at least one lever press has been made.

  • Response rate is MEDIUM

(D) Variable Ratio Reinforcement

Behavior is reinforced after an unpredictable number of responses, as in gambling or fishing.

  • Extinction rate is SLOW (very hard to extinguish because of unpredictability)

(E) Variable Interval Reinforcement

Providing one correct response has been made, reinforcement is given after an unpredictable amount of time has passed, e.g., on average every 5 minutes. An example is a self-employed person being paid at unpredictable times.

  • Extinction rate is SLOW

Applications In Psychology

1. Behavior Modification Therapy

Behavior modification is a set of therapeutic techniques based on operant conditioning (Skinner, 1938, 1953). The main principle comprises changing environmental events that are related to a person’s behavior. For example, the reinforcement of desired behaviors and ignoring or punishing undesired ones.

This is not as simple as it sounds — always reinforcing desired behavior, for example, is basically bribery.

There are different types of positive reinforcement. Primary reinforcement is when a reward strengthens a behavior by itself. Secondary reinforcement is when something strengthens a behavior because it leads to a primary reinforcer.

Examples of behavior modification therapy include token economy and behavior shaping.

Token Economy

Token economy is a system in which targeted behaviors are reinforced with tokens (secondary reinforcers) and later exchanged for rewards (primary reinforcers).

Tokens can be in the form of fake money, buttons, poker chips, stickers, etc., while the rewards can range anywhere from snacks to privileges or activities. For example, teachers use a token economy at primary school by giving young children stickers to reward good behavior.

Token economy has been found to be very effective in managing psychiatric patients . However, the patients can become over-reliant on the tokens, making it difficult for them to adjust to society once they leave prison, hospital, etc.

Staff implementing a token economy program have a lot of power. It is important that staff do not favor or ignore certain individuals if the program is to work. Therefore, staff need to be trained to give tokens fairly and consistently even when there are shift changes such as in prisons or in a psychiatric hospital.
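
In programming terms, a token economy is mostly bookkeeping: tokens (secondary reinforcers) are credited for target behaviors and later exchanged for backup rewards (primary reinforcers). The class below is a hypothetical sketch of that bookkeeping, not a clinical protocol; the behaviors, prices, and names are invented for illustration.

```python
class TokenEconomy:
    """Minimal token economy: earn tokens for target behaviors, spend them on rewards."""

    def __init__(self, earn_rules, reward_prices):
        self.earn_rules = earn_rules        # behavior -> tokens earned
        self.reward_prices = reward_prices  # reward -> tokens required
        self.balance = 0

    def record_behavior(self, behavior):
        tokens = self.earn_rules.get(behavior, 0)
        self.balance += tokens
        return tokens

    def exchange(self, reward):
        price = self.reward_prices[reward]
        if self.balance < price:
            return False            # not enough tokens yet
        self.balance -= price
        return True

# Hypothetical classroom example.
economy = TokenEconomy(
    earn_rules={"completed homework": 2, "helped a classmate": 1},
    reward_prices={"extra recess": 5},
)
economy.record_behavior("completed homework")
economy.record_behavior("completed homework")
economy.record_behavior("helped a classmate")
print(economy.exchange("extra recess"))  # True: 5 tokens earned and exchanged for the reward
```

The design point the sketch highlights is consistency: tokens only work as reinforcers if they are credited by the same rules every time, which is exactly why staff training and fairness matter in real programs.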

Behavior Shaping

A further important contribution made by Skinner (1951) is the notion of behavior shaping through successive approximation.

Skinner argues that the principles of operant conditioning can be used to produce extremely complex behavior if rewards and punishments are delivered in such a way as to encourage an organism to move closer and closer to the desired behavior each time.

In shaping, the form of an existing response is gradually changed across successive trials towards a desired target behavior by rewarding exact segments of behavior.

To do this, the conditions (or contingencies) required to receive the reward should shift each time the organism moves a step closer to the desired behavior.

According to Skinner, most animal and human behavior (including language) can be explained as a product of this type of successive approximation.
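
Shaping can be pictured as a loop in which the criterion for reinforcement is tightened every time the learner meets the current criterion. The toy simulation below (an illustrative sketch with made-up numbers, not Skinner's procedure) reinforces responses that fall within a shrinking distance of a target value, so the learner's typical response drifts toward the target.

```python
import random

def shape(target, start_criterion, step, trials=200, seed=0):
    """Toy shaping loop: reinforce any response within `criterion` of `target`,
    then tighten the criterion so the next reinforced response must be closer."""
    rng = random.Random(seed)
    behavior = 0.0                 # the learner's current typical response
    criterion = start_criterion
    for _ in range(trials):
        response = behavior + rng.uniform(-1.0, 1.0)   # natural response variability
        if abs(response - target) <= criterion:
            behavior = response                        # reinforced responses are repeated
            criterion = max(step, criterion - step)    # demand a closer approximation next time
    return behavior

print(shape(target=10.0, start_criterion=12.0, step=0.5))  # typically ends close to 10.0
```

The key design choice is that the criterion only tightens after a reinforced response, so the learner is never asked to make a jump larger than its natural variability allows.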

2. Educational Applications

In the conventional learning situation, operant conditioning applies largely to issues of class and student management, rather than to learning content. It is very relevant to shaping skill performance.

A simple way to shape behavior is to provide feedback on learner performance, e.g., compliments, approval, encouragement, and affirmation.

A variable-ratio schedule produces the highest response rate for students learning a new task, whereby initial reinforcement (e.g., praise) occurs at frequent intervals, and as the performance improves, reinforcement occurs less frequently, until eventually only exceptional outcomes are reinforced.

For example, if a teacher wanted to encourage students to answer questions in class they should praise them for every attempt (regardless of whether their answer is correct). Gradually the teacher will only praise the students when their answer is correct, and over time only exceptional answers will be praised.

Unwanted behaviors, such as tardiness and dominating class discussion can be extinguished through being ignored by the teacher (rather than being reinforced by having attention drawn to them). This is not an easy task, as the teacher may appear insincere if he/she thinks too much about the way to behave.

Knowledge of success is also important as it motivates future learning. However, it is important to vary the type of reinforcement given so that the behavior is maintained.


Operant Conditioning vs. Classical Conditioning

Learning Type

While both types of conditioning involve learning, classical conditioning is passive (automatic response to stimuli), while operant conditioning is active (behavior is influenced by consequences).

  • Classical conditioning links an involuntary response with a stimulus. It happens passively on the part of the learner, without rewards or punishments. An example is a dog salivating at the sound of a bell associated with food.
  • Operant conditioning connects voluntary behavior with a consequence. Operant conditioning requires the learner to actively participate and perform some type of action to be rewarded or punished. It’s active, with the learner’s behavior influenced by rewards or punishments. An example is a dog sitting on command to get a treat.

Learning Process

Classical conditioning involves learning through associating stimuli resulting in involuntary responses, while operant conditioning focuses on learning through consequences, shaping voluntary behaviors.

In classical conditioning, a neutral stimulus is repeatedly paired with an unconditioned stimulus; over time, the person responds to the neutral stimulus as if it were the unconditioned stimulus, even when it is presented alone. The response is involuntary and automatic.

An example is a dog salivating (response) at the sound of a bell (neutral stimulus) after it has been repeatedly paired with food (unconditioned stimulus).

In operant conditioning, behavior followed by pleasant consequences (rewards) is more likely to be repeated, while behavior followed by unpleasant consequences (punishments) is less likely to be repeated.

For instance, if a child gets praised (pleasant consequence) for cleaning their room (behavior), they’re more likely to clean their room in the future.

Conversely, if they get scolded (unpleasant consequence) for not doing their homework, they’re more likely to complete it next time to avoid the scolding.

Timing of Stimulus & Response

The timing of the response relative to the stimulus differs between classical and operant conditioning:

Classical Conditioning (response after the stimulus) : In this form of conditioning, the response occurs after the stimulus. The behavior (response) is determined by what precedes it (stimulus). 

For example, in Pavlov’s classic experiment, the dogs started to salivate (response) after they heard the bell (stimulus) because they associated it with food.

Operant Conditioning (response before the consequence): In this form of conditioning, the behavior comes first, and the anticipated consequence influences whether it is repeated. It is a more active form of learning, where behaviors are reinforced or punished, thus influencing their likelihood of repetition.

For example, a child might behave well (behavior) in anticipation of a reward (consequence), or avoid a certain behavior to prevent a potential punishment.

Looking at Skinner's classic studies of pigeon and rat behavior, we can identify some of the major assumptions of the behaviorist approach.

  • Psychology should be seen as a science, to be studied in a scientific manner. Skinner's study of behavior in rats was conducted under carefully controlled laboratory conditions.
  • Behaviorism is primarily concerned with observable behavior, as opposed to internal events like thinking and emotion. Note that Skinner did not say that the rats learned to press a lever because they wanted food. He instead concentrated on describing the easily observed behavior that the rats acquired.
  • The major influence on human behavior is learning from our environment. In the Skinner study, because food followed a particular behavior, the rats learned to repeat that behavior, e.g., operant conditioning.
  • There is little difference between the learning that takes place in humans and that in other animals. Therefore, research (e.g., operant conditioning) can be carried out on animals (rats/pigeons) as well as on humans. Skinner proposed that the way humans learn behavior is much the same as the way the rats learned to press a lever.

So, if your layperson’s idea of psychology has always been of people in laboratories wearing white coats and watching hapless rats try to negotiate mazes in order to get to their dinner, then you are probably thinking of behavioral psychology.

Behaviorism and its offshoots tend to be among the most scientific of the psychological perspectives . The emphasis of behavioral psychology is on how we learn to behave in certain ways.

We are all constantly learning new behaviors and how to modify our existing behavior. Behavioral psychology is the psychological approach that focuses on how this learning takes place.

Critical Evaluation

Operant conditioning can be used to explain a wide variety of behaviors, from the process of learning, to addiction and language acquisition . It also has practical applications (such as token economy) which can be applied in classrooms, prisons and psychiatric hospitals.

Researchers have found innovative ways to apply operant conditioning principles to promote health and habit change in humans.

In a recent study, operant conditioning using virtual reality (VR) helped stroke patients use their weakened limb more often during rehabilitation. Patients shifted their weight in VR games by maneuvering a virtual object. When they increased weight on their weakened side, they received rewards like stars. This positive reinforcement conditioned greater paretic limb use (Kumar et al., 2019).

Another study utilized operant conditioning to assist smoking cessation. Participants earned vouchers exchangeable for goods and services for reducing smoking. This reward system reinforced decreasing cigarette use. Many participants achieved long-term abstinence (Dallery et al., 2017).

Through repeated reinforcement, operant conditioning can facilitate forming exercise and eating habits. A person trying to exercise more might earn TV time for every 10 minutes spent working out. An individual aiming to eat healthier may allow themselves a daily dark chocolate square for sticking to nutritious meals. Providing consistent rewards for desired actions can instill new habits (Michie et al., 2009).

Apps like Habitica apply operant conditioning by gamifying habit tracking. Users earn points and collect rewards in a fantasy game for completing real-life habits. This virtual reinforcement helps ingrain positive behaviors (Eckerstorfer et al., 2019).

Operant conditioning also shows promise for managing ADHD and OCD. Rewarding concentration and focus in ADHD children, for example, can strengthen their attention skills (Rosén et al., 2018). Similarly, reinforcing OCD patients for resisting compulsions may diminish obsessive behaviors (Twohig et al., 2018).

However, operant conditioning fails to take into account the role of inherited and cognitive factors in learning, and thus is an incomplete explanation of the learning process in humans and animals.

For example, Kohler (1924) found that primates often seem to solve problems in a flash of insight rather than by trial-and-error learning. Also, social learning theory (Bandura, 1977) suggests that humans can learn automatically through observation rather than through personal experience.

The use of animal research in operant conditioning studies also raises the issue of extrapolation. Some psychologists argue we cannot generalize from studies on animals to humans as their anatomy and physiology are different from humans, and they cannot think about their experiences and invoke reason, patience, memory or self-comfort.

Frequently Asked Questions

Who discovered operant conditioning?

Operant conditioning was discovered by B.F. Skinner, an American psychologist, in the mid-20th century. Skinner is often regarded as the father of operant conditioning, and his work extensively dealt with the mechanism of reward and punishment for behaviors, with the concept being that behaviors followed by positive outcomes are reinforced, while those followed by negative outcomes are discouraged.

How does operant conditioning differ from classical conditioning?

Operant conditioning differs from classical conditioning in that it focuses on how voluntary behavior is shaped and maintained by its consequences, such as rewards and punishments.

In operant conditioning, a behavior is strengthened or weakened based on the consequences that follow it. In contrast, classical conditioning involves the association of a neutral stimulus with a natural response, creating a new learned response.

While both types of conditioning involve learning and behavior modification, operant conditioning emphasizes the role of reinforcement and punishment in shaping voluntary behavior.

How does operant conditioning relate to social learning theory?

Operant conditioning is a core component of social learning theory , which emphasizes the importance of observational learning and modeling in acquiring and modifying behavior.

Social learning theory suggests that individuals can learn new behaviors by observing others and the consequences of their actions, which is similar to the reinforcement and punishment processes in operant conditioning.

By observing and imitating models, individuals can acquire new skills and behaviors and modify their own behavior based on the outcomes they observe in others.

Overall, both operant conditioning and social learning theory highlight the importance of environmental factors in shaping behavior and learning.

What are the downsides of operant conditioning?

The downsides of using operant conditioning on individuals include the potential for unintended negative consequences, particularly with the use of punishment. Punishment may lead to increased aggression or avoidance behaviors.

Additionally, some behaviors may be difficult to shape or modify using operant conditioning techniques, particularly when they are highly ingrained or tied to complex internal states.

Furthermore, individuals may resist changing their behaviors to meet the expectations of others, particularly if they perceive the demands or consequences of the reinforcement or punishment to be undesirable or unjust.

What is an application of B.F. Skinner's operant conditioning theory?

An application of B.F. Skinner’s operant conditioning theory is seen in education and classroom management. Teachers use positive reinforcement (rewards) to encourage good behavior and academic achievement, and negative reinforcement or punishment to discourage disruptive behavior.

For example, a student may earn extra recess time (positive reinforcement) for completing homework on time, or lose the privilege to use class computers (negative punishment) for misbehavior.

Further Reading

  • Ayllon, T., & Michael, J. (1959). The psychiatric nurse as a behavioral engineer. Journal of the Experimental Analysis of Behavior, 2(4), 323-334.
  • Bandura, A. (1977). Social learning theory . Englewood Cliffs, NJ: Prentice Hall.
  • Dallery, J., Meredith, S., & Glenn, I. M. (2017). A deposit contract method to deliver abstinence reinforcement for cigarette smoking. Journal of Applied Behavior Analysis, 50 (2), 234–248.
  • Eckerstorfer, L., Tanzer, N. K., Vogrincic-Haselbacher, C., Kedia, G., Brohmer, H., Dinslaken, I., & Corbasson, R. (2019). Key elements of mHealth interventions to successfully increase physical activity: Meta-regression. JMIR mHealth and uHealth, 7 (11), e12100.
  • Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement . New York: Appleton-Century-Crofts.
  • Kohler, W. (1924). The mentality of apes. London: Routledge & Kegan Paul.
  • Kumar, D., Sinha, N., Dutta, A., & Lahiri, U. (2019). Virtual reality-based balance training system augmented with operant conditioning paradigm.  Biomedical Engineering Online ,  18 (1), 1-23.
  • Michie, S., Abraham, C., Whittington, C., McAteer, J., & Gupta, S. (2009). Effective techniques in healthy eating and physical activity interventions: A meta-regression. Health Psychology, 28 (6), 690–701.
  • Rosén, E., Westerlund, J., Rolseth, V., Johnson R. M., Viken Fusen, A., Årmann, E., Ommundsen, R., Lunde, L.-K., Ulleberg, P., Daae Zachrisson, H., & Jahnsen, H. (2018). Effects of QbTest-guided ADHD treatment: A randomized controlled trial. European Child & Adolescent Psychiatry, 27 (4), 447–459.
  • Schunk, D. (2016).  Learning theories: An educational perspective . Pearson.
  • Skinner, B. F. (1938). The behavior of organisms: An experimental analysis . New York: Appleton-Century.
  • Skinner, B. F. (1948). "Superstition" in the pigeon. Journal of Experimental Psychology, 38, 168-172.
  • Skinner, B. F. (1951). How to teach animals . Freeman.
  • Skinner, B. F. (1953). Science and human behavior . Macmillan.
  • Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs: General and Applied, 2(4), i-109.
  • Twohig, M. P., Whittal, M. L., Cox, J. M., & Gunter, R. (2010). An initial investigation into the processes of change in ACT, CT, and ERP for OCD. International Journal of Behavioral Consultation and Therapy, 6 (2), 67–83.
  • Watson, J. B. (1913). Psychology as the behaviorist views it . Psychological Review, 20 , 158–177.


What Is Operant Conditioning? Definition and Examples




Operant conditioning occurs when an association is made between a particular behavior and a consequence for that behavior. This association is built upon the use of reinforcement and/or punishment to encourage or discourage behavior. Operant conditioning was first defined and studied by behavioral psychologist B.F. Skinner, who conducted several well-known operant conditioning experiments with animal subjects.

Key Takeaways: Operant Conditioning

  • Operant conditioning is the process of learning through reinforcement and punishment.
  • In operant conditioning, behaviors are strengthened or weakened based on the consequences of that behavior.
  • Operant conditioning was defined and studied by behavioral psychologist B.F. Skinner.

B.F. Skinner was a behaviorist , which means he believed that psychology should be limited to the study of observable behaviors. While other behaviorists, like John B. Watson, focused on classical conditioning, Skinner was more interested in the learning that happened through operant conditioning.

He observed that in classical conditioning responses tend to be triggered by innate reflexes that occur automatically. He called this kind of behavior respondent . He distinguished respondent behavior from operant behavior . Operant behavior was the term Skinner used to describe a behavior that is reinforced by the consequences that follow it. Those consequences play an important role in whether or not a behavior is performed again.

Skinner’s ideas were based on Edward Thorndike’s law of effect, which stated that behavior that elicits positive consequences will probably be repeated, while behavior that elicits negative consequences will probably not be repeated. Skinner introduced the concept of reinforcement into Thorndike’s ideas, specifying that behavior that is reinforced will probably be repeated (or strengthened).

To study operant conditioning, Skinner conducted experiments using a “Skinner Box,” a small box that had a lever at one end that would provide food or water when pressed. An animal, like a pigeon or rat, was placed in the box where it was free to move around. Eventually the animal would press the lever and be rewarded. Skinner found that this process resulted in the animal pressing the lever more frequently. Skinner would measure learning by tracking the rate of the animal’s responses when those responses were reinforced.

Reinforcement and Punishment

Through his experiments, Skinner identified the different kinds of reinforcement and punishment that encourage or discourage behavior.

Reinforcement

Reinforcement that closely follows a behavior will encourage and strengthen that behavior. There are two types of reinforcement:

  • Positive reinforcement occurs when a behavior results in a favorable outcome, e.g. a dog receiving a treat after obeying a command, or a student receiving a compliment from the teacher after behaving well in class. These techniques increase the likelihood that the individual will repeat the desired behavior in order to receive the reward again.
  • Negative reinforcement occurs when a behavior results in the removal of an unfavorable experience, e.g. an experimenter ceasing to give a monkey electric shocks when the monkey presses a certain lever. In this case, the lever-pressing behavior is reinforced because the monkey will want to remove the unfavorable electric shocks again.

In addition, Skinner identified two different kinds of reinforcers.

  • Primary reinforcers naturally reinforce behavior because they are innately desirable, e.g. food.
  • Conditioned reinforcers reinforce behavior not because they are innately desirable, but because we learn to associate them with primary reinforcers. For example, paper money is not innately desirable, but it can be used to acquire innately desirable goods, such as food and shelter.

Punishment

Punishment is the opposite of reinforcement. When punishment follows a behavior, it discourages and weakens that behavior. There are two kinds of punishment.

  • Positive punishment (or punishment by application) occurs when a behavior is followed by an unfavorable outcome, e.g. a parent spanking a child after the child uses a curse word.
  • Negative punishment (or punishment by removal) occurs when a behavior leads to the removal of something favorable, e.g. a parent who denies a child their weekly allowance because the child has misbehaved.

Although punishment is still widely used, Skinner and many other researchers found that punishment is not always effective. Punishment can suppress a behavior for a time, but the undesired behavior tends to come back in the long run. Punishment can also have unwanted side effects. For example, a child who is punished by a teacher may become uncertain and fearful because they don’t know exactly what to do to avoid future punishments.

Instead of punishment, Skinner and others suggested reinforcing desired behaviors and ignoring unwanted behaviors. Reinforcement tells an individual what behavior is desired, while punishment only tells the individual what behavior isn’t desired.

Behavior Shaping

Operant conditioning can lead to increasingly complex behaviors through shaping , also referred to as the “method of approximations.” Shaping happens in a step-by-step fashion as each part of a more intricate behavior is reinforced. Shaping starts by reinforcing the first part of the behavior. Once that piece of the behavior is mastered, reinforcement only happens when the second part of the behavior occurs. This pattern of reinforcement is continued until the entire behavior is mastered.

For example, when a child is taught to swim, she may initially be praised just for getting in the water. She is praised again when she learns to kick, and again when she learns specific arm strokes. Finally, she is praised for propelling herself through the water by performing a specific stroke and kicking at the same time. Through this process, an entire behavior has been shaped. 

Schedules of Reinforcement

In the real world, behavior is not constantly reinforced. Skinner found that the frequency of reinforcement can impact how quickly and how successfully one learns a new behavior. He specified several reinforcement schedules, each with different timing and frequencies.

  • Continuous reinforcement occurs when a reinforcer follows each and every performance of a given behavior. Learning happens rapidly with continuous reinforcement. However, if reinforcement is stopped, the behavior will quickly decline and ultimately stop altogether, which is referred to as extinction.
  • Fixed-ratio schedules reward behavior after a specified number of responses. For example, a child may get a star after every fifth chore they complete. On this schedule, the response rate slows right after the reward is delivered.
  • Variable-ratio schedules vary the number of behaviors required to get a reward. This schedule leads to a high rate of responses and is also hard to extinguish because its variability maintains the behavior. Slot machines use this kind of reinforcement schedule.
  • Fixed-interval schedules provide a reward after a specific amount of time passes. Getting paid by the hour is one example of this kind of reinforcement schedule. Much like the fixed-ratio schedule, the response rate increases as the reward approaches but slows down right after the reward is received.
  • Variable-interval schedules vary the amount of time between rewards. For example, a child who receives an allowance at various times during the week as long as they’ve exhibited some positive behaviors is on a variable-interval schedule. The child will continue to exhibit positive behavior in anticipation of eventually receiving their allowance.

Examples of Operant Conditioning

If you’ve ever trained a pet or taught a child, you have likely used operant conditioning in your own life. Operant conditioning is still frequently used in various real-world circumstances, including in the classroom and in therapeutic settings.

For example, a teacher might reinforce students doing their homework regularly by periodically giving pop quizzes that ask questions similar to recent homework assignments. Also, if a child throws a temper tantrum to get attention, the parent can ignore the behavior and then acknowledge the child again once the tantrum has ended.

Operant conditioning is also used in behavior modification , an approach to the treatment of numerous issues in adults and children, including phobias, anxiety, bedwetting, and many others. One way behavior modification can be implemented is through a token economy , in which desired behaviors are reinforced by tokens in the form of digital badges, buttons, chips, stickers, or other objects. Eventually these tokens can be exchanged for real rewards.
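Because a token economy is essentially bookkeeping, reinforcing desired behaviors with tokens and later exchanging them for backup rewards, it can be summarized in a few lines of code. The sketch below is a hypothetical illustration rather than a clinical protocol; the reward names and token prices are invented.

```python
class TokenEconomy:
    """Minimal bookkeeping for a token economy (illustrative only)."""

    def __init__(self, rewards):
        # rewards maps each backup reward to its price in tokens,
        # e.g., {"15 minutes of playtime": 3, "sticker sheet": 5}
        self.rewards = rewards
        self.balance = 0

    def reinforce(self, tokens=1):
        # Deliver tokens immediately after the desired behavior occurs.
        self.balance += tokens

    def exchange(self, reward):
        # Trade accumulated tokens for a backup reward if the balance allows.
        price = self.rewards[reward]
        if self.balance < price:
            return False
        self.balance -= price
        return True


economy = TokenEconomy({"15 minutes of playtime": 3, "sticker sheet": 5})
for _ in range(3):
    economy.reinforce()                            # three desired behaviors
print(economy.exchange("15 minutes of playtime"))  # True: tokens traded in
```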

While operant conditioning can explain many behaviors and is still widely used, there are several criticisms of the process. First, operant conditioning is accused of being an incomplete explanation for learning because it neglects the role of biological and cognitive elements.

In addition, operant conditioning relies on an authority figure to reinforce behavior and ignores the role of curiosity and an individual's ability to make their own discoveries. Critics also object to operant conditioning's emphasis on controlling and manipulating behavior, arguing that such control can lead to authoritarian practices. Skinner believed, however, that environments naturally control behavior and that people can choose to use that knowledge for good or ill.

Finally, because Skinner’s observations about operant conditioning relied on experiments with animals, he is criticized for extrapolating from his animal studies to make predictions about human behavior. Some psychologists believe this kind of generalization is flawed because humans and non-human animals are physically and cognitively different.



Operant Conditioning

OpenStaxCollege


Learning Objectives

By the end of this section, you will be able to:

  • Define operant conditioning
  • Explain the difference between reinforcement and punishment
  • Distinguish between reinforcement schedules

The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning. In operant conditioning, organisms learn to associate a behavior and its consequence. A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.

Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike . According to the law of effect , behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.

Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box.” A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.

[Figure: B. F. Skinner, and a rat in a Skinner box: a chamber with a speaker, lights, a lever, and a food dispenser.]

Watch this brief video clip to learn more about operant conditioning: Skinner is interviewed, and operant conditioning of pigeons is demonstrated.

In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.
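Because these four terms are so easily confused, it can help to see them as a two-by-two classification: whether a stimulus is added or removed, and whether the behavior increases or decreases. The tiny Python sketch below simply restates the definitions in this paragraph; it is an illustration, not part of the original text.

```python
def classify(stimulus, effect_on_behavior):
    # Map (added/removed stimulus, increased/decreased behavior) onto the
    # four operant conditioning terms defined above.
    kind = "reinforcement" if effect_on_behavior == "increases" else "punishment"
    sign = "positive" if stimulus == "added" else "negative"
    return f"{sign} {kind}"

print(classify("added", "increases"))    # positive reinforcement (e.g., a treat)
print(classify("removed", "increases"))  # negative reinforcement (e.g., the seatbelt beeping stops)
print(classify("added", "decreases"))    # positive punishment (e.g., a scolding)
print(classify("removed", "decreases"))  # negative punishment (e.g., a toy is taken away)
```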

REINFORCEMENT

The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement, a desirable stimulus is added to increase a behavior.

For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).

In negative reinforcement, an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.

Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment, you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment, you remove a pleasant stimulus to decrease a behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.

Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your four-year-old son, Brandon, hits his younger brother. You have Brandon write 100 times “I will not hit my brother” (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks of using physical punishment on children. First, punishment may teach fear. Brandon may come to fear the punishment itself, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, because you spank Brenda when you are angry with her for her misbehavior, she might start hitting her friends when they won’t share their toys.

While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward her for it.

In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps: first, any response that resembles the desired behavior is reinforced; then reinforcement is reserved for responses that come progressively closer to the target, while earlier, cruder approximations are no longer reinforced; finally, only the desired behavior itself is reinforced.

Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.

Here is a brief video of Skinner’s pigeons playing ping pong.

It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.
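Viewed as a procedure, shaping is a loop: reinforce any response that meets the current criterion, and once responding is reliable, tighten the criterion toward the target behavior. The toy simulation below is only a sketch of that idea; the learner model, the numbers, and the function name are assumptions made for illustration, not anything taken from the text.

```python
import random

def shape(target=10.0, step=1.0, trials_per_step=50):
    # Shaping as successive approximations (a toy model, not a protocol from
    # the text). The learner's typical response level drifts toward whatever
    # gets reinforced, and the criterion for reinforcement is raised in small
    # steps until it reaches the target behavior.
    level = 0.0                # learner's current typical response level
    criterion = step           # begin with an easy first approximation
    while criterion <= target:
        for _ in range(trials_per_step):
            response = level + random.uniform(-1.0, 1.5)
            if response >= criterion:               # meets the current approximation
                level += 0.25 * (response - level)  # reinforcement strengthens it
        criterion += step                           # now require a closer approximation
    return level

print(shape() >= 10.0)  # almost always True: responding has been shaped to the target
```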

PRIMARY AND SECONDARY REINFORCERS

Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.

What would be a good reinforcer for humans? For your son Jerome, it was the promise of a toy if he cleaned his room. What about a young soccer player, Joaquin? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer. Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.

A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.

Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic school children. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.

Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed. Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.

[Figure: a child placing stickers on a behavior chart hanging on the wall.]

Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand. For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.

There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.

[Figure: children climbing on playground equipment; a child in time-out sits alone at a table, looking at the playground.]

REINFORCEMENT SCHEDULES

Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).

Watch this video clip where veterinarian Dr. Sophia Yin shapes a dog’s behavior using the steps outlined above.

Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement, also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules. These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.

Now let’s combine these four terms. A fixed interval reinforcement schedule is when behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.

With a variable interval reinforcement schedule, the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.

With a fixed ratio reinforcement schedule, there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care whether the person really needs the prescription sunglasses; Carla just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimizing the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.

In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.

In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish.

[Figure: Cumulative number of responses over time for the four partial reinforcement schedules. The variable ratio and fixed ratio curves have the steepest slopes; the variable interval and fixed interval curves rise more gradually. The variable schedules produce steady responding, while the fixed schedules show a brief pause or drop in responding immediately after each reinforcement.]

Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).

Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction. Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.

It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.

[Figure: a row of electronic gaming machines.]

COGNITION AND LATENT LEARNING

Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman , had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.

In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map: a mental picture of the layout of the maze. After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning: learning that occurs but is not observable in behavior until there is a reason to demonstrate it.

[Figure: rats in a maze, with a starting point and food at the end.]

Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.

Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.

Watch this video to learn more about Carlson’s studies on cognitive maps and navigation in buildings.

Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior, depending on either a fixed or variable amount of time or number of responses between reinforcements.

Review Questions

________ is when you take away a pleasant stimulus to stop a behavior.

  • positive reinforcement
  • negative reinforcement
  • positive punishment
  • negative punishment

Which of the following is not an example of a primary reinforcer?

Rewarding successive approximations toward a target behavior is ________.

Slot machines reward gamblers with money according to which reinforcement schedule?

  • fixed ratio
  • variable ratio
  • fixed interval
  • variable interval

Critical Thinking Questions

What is a Skinner box and what is its purpose?

A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.

What is the difference between negative reinforcement and punishment?

In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior.)

What is shaping and how would you use shaping to teach a dog to roll over?

Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.

Personal Application Questions

Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.

Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?

Operant Conditioning Copyright © 2014 by OpenStaxCollege is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.

Operant Conditioning. By Jesse Dallery, Brantley Jarvis, and Allison Kurti. Last reviewed: 22 May 2017. Last modified: 23 August 2017. DOI: 10.1093/obo/9780199828340-0043

The study of operant conditioning represents a natural-science approach to understanding the causes of goal-directed behavior. Operant behavior produces changes in the physical or social environment, and these consequences influence whether such behavior occurs in the future. Thus, operant behavior is selected by its consequences. The basic unit of analysis in the operant framework is the operant, or operant class, which is a class of activities that produces the same consequence. For example, an operant such as joke telling is shaped and maintained by positive social consequences (e.g., laughter) or extinguished by negative social consequences (e.g., silence). Selection of operant behavior is analogous to the selection of biological traits via natural selection. The environment (physical, social, cultural) selects behavior via the processes of reinforcement and punishment. Stimuli present when these processes occur become occasioning or discriminative stimuli for particular operants. More complex forms of learning, such as conceptual and symbolic behavior, are also considered to be forms of operant behavior. Whether simple or complex, operant behavior is always included within a three-term contingency: discriminative stimulus, operant behavior, and reinforcing or punishing consequence. The three-term contingency is deceptively simple, as the probabilities of occurrence represented by each term can vary over time. In addition, the under-represented role of verbal behavior further enriches and complicates the picture of human behavior. From its inception, the operant analysis has also included private behavior such as thoughts, feelings, and other aspects of the “inside story.” The operant framework has led to a number of extensions and applications to human affairs, including the treatment of developmental disorders, interventions for psychopathology, teaching technologies for classrooms, strategies to improve behavior in business and occupational settings, and approaches to reduce substance use and abuse. Although less empirical in nature, the operant framework has also been extended to explanations of cultural behavior and future threats posed by consumerism, nuclear proliferation, and other human rights and social justice issues. Operant conditioning has a long history of being mischaracterized, and several responses to these claims have appeared in the literature.

A variety of textbooks cover operant principles and applications. The textbooks here focus on general principles. For those new to the field, Johnston 2014 and Baum 2005 provide a good introduction to some of the philosophical and conceptual background of behavior analysis and are less focused on empirical findings. Schneider 2012 is geared toward a general audience, and it is also recommended for those new to the field and to anyone interested in behavior. Pierce and Cheney 2017 and Malott and Shane 2014 are good introductions to the empirical work and to basic concepts in the field. Catania 2013 and Mazur 2013 are a bit more advanced. Iversen and Lattal 1991 and Madden 2012 are the most advanced texts and recommended for graduate students in behavior analysis or allied disciplines. Skinner 1953 is recommended to anyone interested in behavior analysis or behaviorism more generally. The book covers basic principles, issues related to understanding feelings, thoughts, the self, and social influences on behavior.

Baum, W. M. 2005. Understanding behaviorism: Behavior, culture, and evolution. 2d ed. Malden, MA: Blackwell.

A useful introduction to the conceptual roots of radical behaviorism. Includes sections on philosophy, the basic elements of an operant account (including verbal behavior), and extensions to relationships, government, and other cultural phenomena.

Catania, A. C. 2013. Learning. 5th ed. Cornwall-on-Hudson, NY: Sloan.

This is one of several common undergraduate textbooks. Covers all of the basic principles and is organized into two sections: one that discusses learning without words and a second that discusses learning with words (verbal behavior).

Iversen, I. H., and K. A. Lattal, eds. 1991. Techniques in the behavioral and neural sciences: Experimental analysis of behavior (Parts 1 and 2). Amsterdam: Elsevier.

This textbook is primarily geared toward graduate students in behavior analysis. The material is more complex than typical undergraduate texts.

Johnston, J. J. 2014. Radical behaviorism for ABA practitioners. New York: Sloan.

This text is an accessible introduction to the core conceptual foundations of radical behaviorism, with an emphasis on the practical consequences of these foundations in applied contexts.

Madden, G. J., ed. 2012. APA handbook of behavior analysis. 2 vols. Washington, DC: American Psychological Association.

This handbook is the most up-to-date and comprehensive account of behavior analysis. Volume 1 covers experimental and research methods in single-subject designs as well as major content areas in the experimental analysis of behavior. Volume 2 focuses entirely on translational research and areas of application.

Malott, R. W., and J. T. Shane. 2014. Principles of behavior. 7th ed. New York: Routledge.

This is another common text for undergraduates.

Mazur, J. E. 2013. Learning and behavior. 7th ed. Upper Saddle River, NJ: Pearson Prentice Hall.

This is another standard text for undergraduate pedagogy. The book provides excellent analyses of the current state of knowledge in the field of learning and provides an interesting overview of unresolved empirical questions.

Pierce, W. D., and C. D. Cheney. 2017. Behavior analysis and learning. 6th ed. New York: Routledge.

Yet another text for undergraduates. Covers all of the basic principles of behavior analysis, including sections on verbal behavior and selectionism at three levels of analysis (biological, behavioral, and cultural).

Schneider, S. M. 2012. The science of consequences: How they affect genes, change the brain, and impact our world. New York: Prometheus.

This is a trade book geared toward a general audience. As such, it is a highly readable starting place for anyone interested in operant behavior and its broad implications.

Skinner, B. F. 1953. Science and human behavior. New York: Macmillan.

This is one of Skinner’s major works. This book is more conceptual in nature, as the database for operant science was still in its infancy. Covers basic principles, including chapters on private behavior, motivation, thinking, the self, and culturally mediated sources of behavioral influence.




In negative reinforcement , an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.

Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment , you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment , you remove a pleasant stimulus to decrease behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.

Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your five-year-old son, Brandon, runs out into the street to chase a ball. You have Brandon write 100 times “I will not run into the street" (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks in using physical punishment on children. First, punishment may teach fear. Brandon may become fearful of the street, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, if you spank your child when you are angry with them for their misbehavior, they might start hitting their friends when they won’t share their toys.

While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward them for it.

In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping , we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The specific steps used in the process are the following:

  • Reinforce any response that resembles the desired behavior.
  • Then reinforce the response that more closely resembles the desired behavior. You will no longer reinforce the previously reinforced response.
  • Next, begin to reinforce the response that even more closely resembles the desired behavior.
  • Continue to reinforce closer and closer approximations of the desired behavior.
  • Finally, only reinforce the desired behavior.
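The logic of successive approximation can also be sketched in a few lines of code. The simulation below is a minimal, hypothetical illustration (the numeric "responses," tolerance values, and learning rate are invented for the example; they are not part of Skinner's procedure): only responses that fall within an ever-tighter tolerance of the target are reinforced, and each reinforcement nudges the organism's typical response toward whatever was just rewarded.

```python
import random

def shape(target=10.0, tolerances=(8.0, 4.0, 2.0, 1.0, 0.5), trials_per_step=200):
    """Toy shaping simulation: reinforce successively closer approximations."""
    typical_response = 0.0  # where the organism's behavior starts
    for tol in tolerances:  # each step demands a closer approximation
        for _ in range(trials_per_step):
            response = typical_response + random.gauss(0, 3)  # natural variability
            if abs(response - target) <= tol:  # close enough -> reinforce
                # reinforced responses pull typical behavior toward themselves
                typical_response += 0.2 * (response - typical_response)
        print(f"tolerance {tol:>4}: typical response is now {typical_response:.2f}")
    return typical_response

shape()
```

Each pass reinforces only responses closer to the target and stops reinforcing the earlier, looser approximations, mirroring the steps listed above.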

Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.

Watch this brief video of Skinner's pigeons playing ping pong to learn more.

It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.

Primary and Secondary Reinforcers

Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.

What would be a good reinforcer for humans? For your child cleaning the room, it was the promise of a toy. How about Sydney, the soccer player? If you gave Sydney a piece of candy every time Sydney scored a goal, you would be using a primary reinforcer . Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.

A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Sydney made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.

Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Adibsereshki and Abkenar (2014) found that use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of eighth-grade students. Similar studies show demonstrable gains in behavior and academic achievement for groups ranging from first grade to high school, and representing a wide array of abilities and disabilities. For example, during studies involving younger students, when children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
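Because a token economy is essentially bookkeeping, a short sketch can make the mechanics concrete. The class below is a hypothetical illustration only; the behavior names, token amounts, and exchange rate are invented for the example and are not taken from the studies cited above.

```python
class TokenEconomy:
    """Toy token-economy ledger: earn tokens for target behaviors,
    lose tokens for problem behaviors, and trade tokens for back-up rewards."""

    def __init__(self, minutes_per_token=2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def reinforce(self, behavior, tokens=1):
        """Deliver tokens contingent on an appropriate behavior."""
        self.tokens += tokens
        print(f"+{tokens} token(s) for '{behavior}' (total: {self.tokens})")

    def response_cost(self, behavior, tokens=1):
        """Remove tokens contingent on an inappropriate behavior."""
        self.tokens = max(0, self.tokens - tokens)
        print(f"-{tokens} token(s) for '{behavior}' (total: {self.tokens})")

    def exchange(self):
        """Trade accumulated tokens for minutes of playtime."""
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

chart = TokenEconomy()
chart.reinforce("quiet hands")
chart.reinforce("quiet hands")
chart.response_cost("hitting")
print(f"Earned {chart.exchange()} minutes of playtime")
```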

Everyday Connection

Behavior Modification in Children

Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed ( Figure 6.11 ). Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.

Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, they are removed from the desirable activity at hand ( Figure 6.12 ). For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.

There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.

Reinforcement Schedules

Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).

Watch this video clip of veterinarian Dr. Sophia Yin shaping a dog's behavior using the steps outlined above to learn more.

Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement , also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules ( Table 6.3 ). These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.

Now let’s combine these four terms. A fixed interval reinforcement schedule is when behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, they are expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Their doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and they receive a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.

With a variable interval reinforcement schedule , the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant are steady because he wants his crew to earn the bonus.

With a fixed ratio reinforcement schedule , there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care if the person really needs the prescription sunglasses, Carla just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.

In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.

In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time the doctor has approved, no medication is administered. They are on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish ( Figure 6.13 ).
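Since the four partial schedules differ only in what they count (time versus responses) and whether the requirement is fixed or variable, they can be written as small rules. The sketch below is an assumed, illustrative implementation; the particular numbers (every fifth response, roughly one reward per 60 simulated "minutes," and so on) are arbitrary, and loop counters stand in for real time.

```python
import random

def fixed_ratio(response_count, every=5):
    """Reinforce every `every`-th response (e.g., a commission per fifth sale)."""
    return response_count % every == 0

def variable_ratio(mean_responses=5):
    """Reinforce after an unpredictable number of responses (slot machine)."""
    return random.random() < 1 / mean_responses  # one reward per `mean_responses`, on average

class FixedInterval:
    """Reinforce the first response after a fixed amount of time has elapsed."""
    def __init__(self, interval=60):
        self.interval, self.last_reward = interval, 0
    def respond(self, now):
        if now - self.last_reward >= self.interval:
            self.last_reward = now
            return True
        return False

class VariableInterval:
    """Reinforce the first response after an unpredictable amount of time."""
    def __init__(self, mean_interval=60):
        self.mean = mean_interval
        self.last_reward, self.wait = 0, random.expovariate(1 / mean_interval)
    def respond(self, now):
        if now - self.last_reward >= self.wait:
            self.last_reward, self.wait = now, random.expovariate(1 / self.mean)
            return True
        return False

# Count rewards across 1,000 simulated responses, one per "minute".
fi, vi = FixedInterval(), VariableInterval()
print("fixed ratio:      ", sum(fixed_ratio(i) for i in range(1, 1001)))
print("variable ratio:   ", sum(variable_ratio() for _ in range(1000)))
print("fixed interval:   ", sum(fi.respond(minute) for minute in range(1000)))
print("variable interval:", sum(vi.respond(minute) for minute in range(1000)))
```

Note that the ratio rules pay off the responses themselves, so responding more earns more, while the interval rules pay off waiting; this lines up with the point above that ratio schedules tend to drive quantity of output.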

Connect the Concepts

Gambling and the Brain

Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron's money on a variable-ratio schedule” (p. 397).

Skinner uses gambling as an example of the power of the variable-ratio reinforcement schedule for maintaining behavior even during long periods without any reinforcement. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). It is indeed true that variable-ratio schedules keep behavior quite persistent—just imagine the frequency of a child’s tantrums if a parent gives in even once to the behavior. The occasional reward makes it almost impossible to stop the behavior.

Recent research in rats has failed to support Skinner’s idea that training on variable-ratio schedules alone causes pathological gambling (Laskowski et al., 2019). However, other research suggests that gambling does seem to work on the brain in the same way as most addictive drugs, and so there may be some combination of brain chemistry and reinforcement schedule that could lead to problem gambling ( Figure 6.14 ). Specifically, modern research shows the connection between gambling and the activation of the reward centers of the brain that use the neurotransmitter (brain chemical) dopamine (Murch & Clark, 2016). Interestingly, gamblers don’t even have to win to experience the “rush” of dopamine in the brain. “Near misses,” or almost winning but not actually winning, also have been shown to increase activity in the ventral striatum and other brain reward centers that use dopamine (Chase & Clark, 2010). These brain effects are almost identical to those produced by addictive drugs like cocaine and heroin (Murch & Clark, 2016). Based on the neuroscientific evidence showing these similarities, the DSM-5 now considers gambling an addiction, while earlier versions of the DSM classified gambling as an impulse control disorder.

In addition to dopamine, gambling also appears to involve other neurotransmitters, including norepinephrine and serotonin (Potenza, 2013). Norepinephrine is secreted when a person feels stress, arousal, or thrill. It may be that pathological gamblers use gambling to increase their levels of this neurotransmitter. Deficiencies in serotonin might also contribute to compulsive behavior, including a gambling addiction (Potenza, 2013).

It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.

Cognition and Latent Learning

Strict behaviorists like Watson and Skinner focused exclusively on studying behavior rather than cognition (such as thoughts and expectations). In fact, Skinner was such a staunch believer that cognition didn't matter that his ideas were considered radical behaviorism . Skinner considered the mind a "black box"—something completely unknowable—and, therefore, something not to be studied. However, another behaviorist, Edward C. Tolman, had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.

In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map : a mental picture of the layout of the maze ( Figure 6.15 ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning : learning that occurs but is not observable in behavior until there is a reason to demonstrate it.

Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.

This Place Is Like a Maze

Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.

Watch this video about Carlson's studies on cognitive maps and navigation in buildings to learn more.


Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute OpenStax.

Access for free at https://openstax.org/books/psychology-2e/pages/1-introduction
  • Authors: Rose M. Spielman, William J. Jenkins, Marilyn D. Lovett
  • Publisher/website: OpenStax
  • Book title: Psychology 2e
  • Publication date: Apr 22, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/psychology-2e/pages/1-introduction
  • Section URL: https://openstax.org/books/psychology-2e/pages/6-3-operant-conditioning

© Jan 6, 2024 OpenStax. Textbook content produced by OpenStax is licensed under a Creative Commons Attribution License . The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.


Encyclopedia of Autism Spectrum Disorders, pp. 2087–2088

Operant Conditioning

Dorrey Sproatt and Anahita Navab


Synonyms: Instrumental conditioning; Instrumental learning

A process of learning in which a behavior’s consequence affects the future occurrence of that behavior. B. F. Skinner ( 1953 ) derived the principles of operant conditioning from Thorndike’s “law of effect,” which suggests that a behavior producing a favorable or satisfying outcome is more likely to reoccur, while a behavior producing an unfavorable or discomforting outcome is more likely to decrease in frequency (Thorndike, 1911 ).

Skinner’s experimental work focused on the effects of different schedules on the rates of operant responses made by rats and pigeons (Skinner, 1953 ). His work revealed that the frequency of a behavior could be increased through reinforcement. Two types of reinforcement include positive reinforcement, the giving of a rewarding stimulus following a behavior, and negative reinforcement, the removal of an aversive stimulus following a behavior.

Similarly, the frequency of a behavior can be...


References and Readings

Hewett, F. M. (1965). Teaching speech to an autistic child through operant conditioning. The American Journal of Orthopsychiatry, 35 (5), 927–936. doi:10.1111/j.1939-0025.1965.tb00472.x.


Lovaas, O. I. (1987). Behavioral treatment and normal educational and intellectual functioning in young autistic children. Journal of Consulting and Clinical Psychology, 55 (1), 3–9.

Lovaas, O. I., Berberich, J. P., Perloff, B. F., & Schaeffer, B. (1966). Acquisition of imitative speech by schizophrenic children. Science, 151 (3711), 705–707. doi:10.1126/science.151.3711.705.


Skinner, B. F. (1953). Science and human behavior . New York: Macmillan.

Thorndike, E. L. (1911). Animal intelligence . New York: Macmillan.


Sproatt, D., Navab, A. (2013). Operant Conditioning. In: Volkmar, F.R. (eds) Encyclopedia of Autism Spectrum Disorders. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1698-3_127

Operant Conditioning: Definition, Basic Principles & Applications

Imagine training a dog to do tricks, teaching a child good manners, or even learning a new skill. What’s the secret to shaping behavior in all these scenarios? It’s called operant conditioning, a fascinating concept that wields the power to influence our actions, often without us even realizing it.

In this article, we’ll explore the ins and outs of this intriguing psychological phenomenon and how it impacts our daily lives.

Table of Contents

  • What Is Operant Conditioning?
  • Pioneers in the Field
  • Basic Principles of Operant Conditioning
  • ABCs of Operant Conditioning
  • Applications of Operant Conditioning
  • Criticisms and Controversies

Key Takeaways

  • Operant conditioning is a learning process based on behavioral consequences.
  • Reinforcement and punishment play key roles in shaping our actions.
  • Applications of operant conditioning include education, parenting, and therapy.

What Is Operant Conditioning?

Operant conditioning is a psychological theory of learning that focuses on how behavior is shaped through the consequences that follow it. In this type of learning, individuals learn to associate their actions with either positive or negative outcomes.

Positive outcomes, such as rewards or reinforcement, tend to strengthen or increase the likelihood of a behavior occurring again. On the other hand, negative outcomes, such as punishments or removal of rewards, tend to weaken or decrease the likelihood of a behavior being repeated.

Pioneers in the Field

Two prominent figures in the development of operant conditioning were B.F. Skinner and Edward Thorndike.

  • B.F. Skinner (1904-1990): Skinner, an American psychologist, made significant contributions to this field in the mid-20th century. He introduced the concept of the “Skinner box,” a controlled environment used to study animal behavior. Skinner’s research laid the foundation for understanding how consequences shape behavior. He demonstrated that by providing rewards or punishments, one could effectively modify an individual’s actions.
  • Edward Thorndike (1874-1949): Thorndike, another American psychologist, made early strides in operant conditioning. He formulated the “law of effect,” asserting that behaviors followed by satisfying outcomes are more likely to be repeated. His work with puzzle boxes and cats offered insights into the shaping of animal behavior through rewards.

Basic Principles of Operant Conditioning

Positive Reinforcement

In operant conditioning, positive reinforcement is about adding something desirable to increase the likelihood of a behavior being repeated. Think of it as a reward. For example, if you praise a child for doing their homework, they are more likely to continue doing it. Positive reinforcement strengthens the connection between the behavior and the reward.

Negative Reinforcement

Negative reinforcement involves removing or avoiding something unpleasant to increase the likelihood of a behavior happening again. It’s not about punishment; it’s about relief. A common example is taking pain medication to relieve a headache. This makes you more likely to take the medication when you have a headache in the future.

Punishment

Punishment, as a concept in operant conditioning, is about introducing something unpleasant to decrease the likelihood of a behavior. For instance, a teacher giving detention for disruptive behavior aims to discourage future disruptions. It’s important to note that punishment can have unintended side effects, so it should be used carefully.

Extinction

Extinction occurs when a previously reinforced behavior no longer receives reinforcement, causing the behavior to decrease or disappear over time. Imagine you stop responding to a child’s tantrums. Eventually, the child may stop throwing tantrums because the behavior is no longer effective in getting your attention.

ABCs of Operant Conditioning

In operant conditioning, there are three fundamental components that play crucial roles: Antecedent, Behavior, and Consequence. These elements are essential to understand how this psychological phenomenon operates.

  • Antecedent The “A” in the ABCs stands for Antecedent. It’s what happens before the behavior you’re interested in. Antecedents are the cues or triggers that signal to an individual that a certain behavior is expected or may result in a specific outcome. They can be environmental stimuli, words, or even an internal thought or feeling. For example, a teacher saying, “ Please take out your textbooks, ” is an antecedent that signals to students to start reading.
  • Behavior The “B” represents Behavior, which is the action or response that an individual performs in response to the antecedent. Behavior can be as simple as raising your hand in class, or it can be more complex, like solving a math problem. In operant conditioning, the focus is on how behaviors are influenced and modified.
  • Consequence The “C” stands for Consequence, which is what follows the behavior. Consequences can be positive or negative and play a significant role in operant conditioning. Positive consequences, like praise or rewards, often increase the likelihood of the behavior happening again. Negative consequences, like punishment or criticism, can decrease the likelihood of the behavior happening in the future.

These three elements, Antecedent, Behavior, and Consequence, work together in operant conditioning to shape and modify behaviors. By controlling and manipulating these components, psychologists and educators can influence and change how people respond to various situations and stimuli.
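Practitioners often capture these three elements as "ABC" observations. The snippet below is a minimal sketch using an invented data structure (it is not a standard clinical recording format): each observation stores an antecedent, a behavior, and a consequence, and a simple tally shows which consequences tend to follow which behaviors.

```python
from dataclasses import dataclass
from collections import Counter

@dataclass
class ABCRecord:
    antecedent: str   # what happened just before the behavior
    behavior: str     # the observable action
    consequence: str  # what followed the behavior

log = [
    ABCRecord("teacher asks a question", "raises hand", "praise"),
    ABCRecord("teacher asks a question", "shouts answer", "reprimand"),
    ABCRecord("teacher asks a question", "raises hand", "praise"),
]

# Tally which consequence tends to follow each behavior.
for (behavior, consequence), count in Counter(
        (r.behavior, r.consequence) for r in log).items():
    print(f"'{behavior}' was followed by '{consequence}' {count} time(s)")
```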

Applications of Operant Conditioning

In Education

Classroom Management

Operant conditioning is a valuable tool in classroom management. Teachers use it to shape the behavior of students.

When students exhibit desired behaviors, such as completing assignments on time or actively participating in class, they are rewarded with positive reinforcement. This can be in the form of praise, extra privileges, or small incentives.

On the other hand, undesirable behaviors can be discouraged through punishment, such as giving a student extra homework (adding an unpleasant consequence) or taking away certain privileges (removing a pleasant one). The key is to establish a clear connection between behavior and consequences to foster a positive learning environment.

Skill Acquisition

Operant conditioning also aids in skill acquisition. Teachers employ this method to teach new skills or reinforce existing ones. By breaking down complex tasks into smaller, manageable steps and providing rewards for achieving each step, students are motivated to acquire the desired skills.

This gradual process of skill acquisition, known as “shaping,” allows students to master skills and build competence over time.

In Parenting

Discipline Strategies

Parents often use operant conditioning to manage their children’s behavior. This involves setting clear expectations and consequences for their children’s actions.

For instance, parents might provide positive reinforcement, such as praise or small rewards, when their child follows rules or behaves well.

Conversely, undesirable behaviors may be discouraged through the removal of privileges or the use of a time-out, both forms of negative punishment. By consistently applying these principles, parents can guide their children toward better behavior and decision-making.

Shaping Desired Behaviors

Operant conditioning is also a potent tool for shaping desired behaviors in children. Parents can use this method to encourage habits like completing homework, helping with chores, or showing respect to others.

By providing immediate and consistent positive reinforcement for these behaviors, children are more likely to repeat them. Over time, these behaviors become ingrained and part of the child’s routine.

In Psychology and Therapy

Behavior Modification

Operant conditioning plays a significant role in behavior modification within the field of psychology. Therapists and psychologists use this approach to help individuals replace unwanted behaviors with more desirable ones.

By identifying the triggers and consequences of behaviors, they can design interventions that incorporate positive and negative reinforcement.

This helps individuals develop healthier habits and overcome issues such as substance addiction, overeating, or excessive anxiety.

Treating Phobias and Addictions

Operant conditioning can also be used to treat phobias and addictions. Exposure therapy, a common technique, involves systematically exposing individuals to the source of their fear or addiction.

When they face these triggers without negative consequences, the fear or craving is gradually reduced. This process harnesses operant conditioning principles to retrain the brain’s response to these stimuli, promoting recovery and symptom reduction.

Criticisms and Controversies

Ethical Concerns

When it comes to operant conditioning, some ethical concerns have been raised. This method involves the use of reinforcement and punishment to modify behavior, and it’s essential to consider the ethical implications of these practices.

Critics argue that operant conditioning can lead to the manipulation of individuals and their behavior, potentially infringing upon their autonomy and free will.

In the case of punishment, the use of aversive stimuli to deter certain behaviors has been criticized for its potential harm. For instance, using physical punishment or extreme measures to suppress behaviors may cause emotional and physical harm to the subjects involved, which can be considered unethical.

Furthermore, operant conditioning sometimes involves the use of extrinsic rewards, such as money or prizes, to motivate behavior. This has raised concerns about the potential effects on intrinsic motivation, as individuals may become overly reliant on external rewards, diminishing their genuine interest in the behavior itself.

The Limitations of Operant Conditioning

Operant conditioning, while effective in many situations, has its limitations. One notable limitation is its inability to explain all aspects of human behavior.

It primarily focuses on observable behaviors, ignoring the cognitive and emotional processes that influence behavior. Thus, it doesn’t provide a complete picture of human psychology.

Another limitation is the potential for over-simplification. Operant conditioning often oversimplifies the complexity of human behavior by reducing it to a stimulus-response model.

In reality, behavior is influenced by a wide range of factors, including genetics, emotions, and cultural influences, which operant conditioning may not adequately address.

Alternatives and Complementary Theories

Critics argue that operant conditioning should not be viewed in isolation but as part of a broader understanding of behavior. Alternative and complementary theories, such as classical conditioning, social learning theory, and cognitive psychology, provide a more comprehensive view of human behavior.

Classical conditioning, for instance, emphasizes the importance of associations between stimuli, while social learning theory emphasizes the role of observation and modeling in behavior acquisition.

Cognitive psychology delves into the mental processes that influence behavior, such as perception, memory, and problem-solving, which operant conditioning tends to overlook.


Brenda Calisaan

Brenda Calisaan is a psychology graduate who strongly desires to impact society positively. She aspires to spread awareness and knowledge about mental health, its importance, and its impact on individuals and society.

She also has a passion for working with children and hopes to dedicate her career to positively impacting their lives.

Outside of work, Brenda is an avid traveler and enjoys exploring new experiences. She is also a music enthusiast and loves to listen to a variety of genres. When she's not on the road or working, Brenda can often be found watching interesting YouTube videos, such as Ted-Ed content.

Operant Conditioning Theory (+ How to Apply It in Your Life)


How do you use your knowledge of operant conditioning principles to build, change, or break a habit? How do you use that knowledge to get your children to do what you ask them to do – the first time?

The study of behavior is fascinating and even more so when we can connect what is discovered about behavior with our lives outside of a lab setting.

Our goal is to do precisely that; but first, a historical recap is in order.


This Article Contains:

  • Our Protagonists: Pavlov, Thorndike, Watson, and Skinner
  • Operant Conditioning: A Definition
  • The Principles of Operant Conditioning
  • 10 Examples of Operant Conditioning
  • Operant Conditioning vs. Classical Conditioning
  • Operant Conditioning in Therapy
  • Applications in Everyday Life
  • A Look at Reinforcement Schedules
  • Useful Techniques for Practitioners
  • An Interesting Video
  • 5 Books on the Topic
  • A Take-Home Message

Our Protagonists: Pavlov, Thorndike, Watson, and Skinner

Like all great stories, we will begin with the action that got everything else going. A long time ago, Pavlov was trying to figure out the mysteries surrounding salivation in dogs. He hypothesized that dogs salivate in response to the presentation of food. What he discovered set the stage for what was first called Pavlovian conditioning and later, classical conditioning.

What does this have to do with operant conditioning? Other behavior scientists found Pavlov’s work interesting but criticized it because of its focus on reflexive learning. It did not answer questions about how the environment might shape behavior.

E. L. Thorndike was a psychologist with a keen interest in education and learning. His theory of learning, called connectionism , dominated the United States educational system. In a nutshell, he believed that learning was the result of associations between sensory experiences and neural responses (Schunk, 2016, p. 74). When these associations happened, a behavior resulted.

Thorndike also established that learning is the result of a trial-and-error process. This process takes time, but no conscious thought. He studied and developed our initial concepts of operant conditioning reinforcement and how various types influence learning.

Thorndike’s principles of learning include:

  • The Law of Exercise, which involves the Law of Use and the Law of Disuse. These explain how connections are strengthened or weakened based on their use/disuse.
  • The Law of Effect focuses on the consequences of behavior. Behavior that leads to a reward is learned, but behavior that leads to a perceived punishment is not learned.
  • The Law of Readiness is about preparedness. If an animal is ready to act and does so, then this is a reward, but if the animal is ready and unable to act, then this is a punishment.
  • Associative shifting occurs when a response to a particular stimulus is eventually made to a different one.
  • Identical elements affect the transfer of knowledge. The more similar the elements, the more likely the transfer because the responses are also very similar.

Later research did not support Thorndike’s Laws of Exercise and Effect, so he discarded them. Further study revealed that punishment does not necessarily weaken connections (Schunk, 2016, p. 77). The original response is not forgotten.

We all have experienced this at one time or another. You are speeding, get stopped, and receive a ticket. This suppresses your speeding behavior for a short time, but it does not prevent you from ever speeding again.

Later, John B. Watson, another behaviorist, emphasized a methodical, scientific approach to studying behavior and rejected any ideas about introspection. Behaviorists concern themselves with observable phenomena, so the study of inner thoughts and their supposed relationship to behavior was irrelevant.

The “Little Albert” experiment, immortalized in most psychology textbooks, involved conditioning a young boy to fear a white rat. Watson used classical conditioning to accomplish his goal. The boy’s fear of the white rat transferred to other animals with fur. From this, scientists reasoned that emotions could be conditioned (Stangor and Walinga, 2014).

In the 1930s, B. F. Skinner, who had become familiar with the work of these researchers and others, continued the exploration of how organisms learn. Skinner studied and developed the operant conditioning theory that is popular today.

After conducting several animal experiments, Skinner (1938) published his first book, The Behavior of Organisms . In the 1991 edition, he wrote a preface to the seventh printing, reaffirming his position regarding stimulus/response research and introspection:

“… there is no need to appeal to an inner apparatus, whether mental, physiological, or conceptual.”

From his perspective, observable behaviors from the interplay of a stimulus, response, reinforcers, and the deprivation associated with the reinforcer are the only elements that need to be studied to understand human behavior. He called these contingencies and said that they “ account for attending, remembering, learning, forgetting, generalizing, abstracting, and many other so-called cognitive processes .”

Skinner believed that determining the causes of behavior is the most important factor for understanding why an organism behaves in a particular way.

Schunk (2016, p. 88) notes that Skinner’s learning theories have been discredited by more current ones that consider higher order and more complex forms of learning. Operant conditioning theory does not do this, but it is still useful in many educational environments and the study of gamification.

Now that we have a solid understanding of why and how the leading behaviorists discovered and developed their ideas, we can focus our attention on how to use operant conditioning in our everyday lives. First, though, we need to define what we mean by “operant conditioning.”

Operant Conditioning: A Definition

The basic concept behind operant conditioning is that a stimulus (antecedent) leads to a behavior, which then leads to a consequence. This form of conditioning involves reinforcers, both positive and negative, as well as primary, secondary, and generalized.

  • Primary reinforcers are things like food, shelter, and water.
  • Secondary reinforcers are stimuli that get conditioned because of their association with a primary reinforcer.
  • Generalized reinforcers occur when a secondary reinforcer pairs with more than one primary reinforcer. For example, working for money can increase a person’s ability to buy a variety of things (TVs, cars, a house, etc.).

The behavior is the operant. The relationship between the discriminative stimulus, response, and reinforcer is what influences the likelihood of a behavior happening again in the future. A reinforcer is some kind of reward, or in the case of adverse outcomes, a punishment.


The Principles of Operant Conditioning

Reinforcement occurs when a response is strengthened. Reinforcers are situation specific. This means that something that might be reinforcing in one scenario might not be in another.

You might be triggered (reinforced) to go for a run when you see your running shoes near the front door. One day your running shoes end up in a different location, so you do not go for a run. Other shoes by the front door do not have the same effect as seeing your running shoes.

There are four types of consequences, divided into two groups. The first group acts to increase a desired behavior; this is known as positive or negative reinforcement.

The second group acts to decrease an unwanted behavior. This is called positive or negative punishment. It is important to understand that punishment, though it may be useful in the short term, does not stop the unwanted behavior long term or even permanently. Instead, it suppresses the unwanted behavior for an undetermined amount of time. Punishment does not teach a person how to behave appropriately.

Edwin Guthrie (as cited in Schunk, 2016) believed that to change a habit, which is what some negative behaviors become, a new association is needed. He asserted that there are three methods for altering negative behaviors:

  • Threshold – Introduce a weak stimulus and then increase it over time.
  • Fatigue – Repeat the unwanted response to the stimulus until the organism tires of making it.
  • Incompatible response – Pair the stimulus with something that prompts a response incompatible with the unwanted behavior.

Another key aspect of operant conditioning is the concept of extinction. When reinforcement does not happen, a behavior declines. If your partner sends you several text messages throughout the day, and you do not respond, eventually they might stop sending you text messages.

Likewise, if your child has a tantrum, and you ignore it, then your child might stop having tantrums. This differs from forgetting. When there are little to no opportunities to respond to stimuli, then conditioning can be forgotten.

Response generalization is an essential element of operant conditioning. It happens when a behavior learned in the presence of one stimulus is then performed in response to another, similar stimulus. For example, if you know how to drive one type of car, chances are you can drive another similar kind of car, mini-van, SUV, or truck.

Here’s another example offered by PsychCore.

10 Examples of Operant Conditioning

By now, you are probably thinking of your own examples of both classical and operant conditioning. Please feel free to share them in the comments. In case you need a few more, here are 10 to consider.

Imagine you want a child to sit quietly while you transition to a new task. When the child does it, you reinforce this by recognizing the child in some way. Many schools in the United States use tickets as the reinforcer. These tickets are used by the student or the class to get a future reward. Another reinforcer would be to say, “ I like how Sarah is sitting quietly. She’s ready to learn .” If you have ever been in a classroom with preschoolers through second-graders, you know this works like a charm. This is positive reinforcement.

An example of negative reinforcement would be the removal of something the students do not want. You see that students are volunteering answers during class. At the end of the lesson, you could say, “ Your participation during this lesson was great! No homework! ” Homework is typically something students would rather avoid (negative reinforcer). They learn that if they participate during class, then the teacher is less likely to assign homework.

Your child is misbehaving, so you give her extra chores to do (positive punishment – adding an unpleasant task).

You use a treat (positive reinforcer) to train your dog to do a trick. You tell your dog to sit. When he does, you give him a treat. Over time, the dog associates the treat with the behavior.

You are a bandleader. When you step in front of your group, they quiet down and put their instruments into the ready position. You are the stimulus eliciting a specific response. The consequence for the group members is approval from you.

Your child is not cleaning his room when told to do so. You decide to take away his favorite device (negative punishment – removal of a positive reinforcer). He begins cleaning. A few days later, you want him to clean his room, but he does not do it until you threaten to take away his device. He does not like your threat, so he cleans his room. This repeats itself over and over. You are tired of having to threaten him to get him to do his chores.

What can you do when punishment is not effective?

In the previous example, you could pair the less appealing activity (cleaning a room) with something more appealing (extra computer/device time). You might say, “For every ten minutes you spend cleaning up your room, you can have five extra minutes on your device.” This is known as the Premack Principle. To use this approach, you need to know what a person values, from most to least. Then, you use the most valued item to reinforce completion of the less valued tasks. Your child does not value cleaning his room, but he does value device time.
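
To make the exchange rate in that example concrete, here is a small, hypothetical helper (the function and its defaults are ours, matching the ten-minutes-of-cleaning-for-five-minutes-of-device-time offer above) that converts time spent on the low-value task into earned time on the high-value activity.

```python
def earned_reward_minutes(task_minutes, task_block=10, reward_block=5):
    """Convert minutes of a low-value task into minutes of a high-value reward.

    Defaults reflect the example above: every full 10 minutes of cleaning
    earns 5 minutes of device time.
    """
    full_blocks = task_minutes // task_block   # only completed blocks are rewarded
    return full_blocks * reward_block

print(earned_reward_minutes(25))  # 25 minutes of cleaning -> 10 minutes of device time
```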

Here are a few more examples using the Premack Principle:

A child who does not want to complete a math assignment but who loves reading could earn extra reading time, a trip to the library to choose a new book, or one-to-one reading time with you after they complete their math assignment.

For every X number of math problems the child completes, he can have X minutes using the iPad at the end of the day.

For every 10 minutes you exercise, you get to watch a favorite show for 10 minutes at the end of the day.

Your child chooses between putting their dirty dishes into the dishwasher, as requested, or cleaning their dishes by hand.

What are your examples of operant conditioning? When have you used the Premack Principle?

An easy way to think about classical conditioning is that it is reflexive. It is the behavior an organism automatically does. Pavlov paired a bell with a behavior a dog already does (salivation) when presented with food. After several trials, Pavlov conditioned dogs to salivate when the bell dinged.

Before this, the bell was a neutral stimulus. The dogs did not salivate when they heard it. In case you are unfamiliar with Pavlov’s research, this video explains his famous experiments.

Operant conditioning is all about the consequences of a behavior; a behavior changes in relation to the environment. If the environment dictates that a particular behavior will not be effective, then the organism changes the behavior. The organism does not need to have conscious awareness of this process for behavior change to take place.

As we already learned, reinforcers are critical in operant conditioning. Behaviors that lead to pleasant outcomes (consequences) get repeated, while those leading to adverse outcomes generally do not.

If you want to train your cat to come to you so that you can give it medicine or flea treatment, you can use operant conditioning.

For example, if your cat likes fatty things like oil, and you happen to enjoy eating popcorn, then you can condition your cat to jump onto a counter near the sink where you place a dirty measuring cup.

  • Step 1: Pour oil and kernels from a measuring cup into a pot.
  • Step 2: Allow the cat to lick the measuring cup.
  • Step 3: Place the cup into the sink.
  • Step 4: Do these same steps each time you make popcorn.

It will not take long for the cat to associate the sound of the “kernels in the pot” with “measuring cup in the sink,” which leads to their reward (oil). A cat can even associate the sound of the pot sliding across the stovetop with receiving their reward.

Once this behavior is trained, all you have to do is slide the pot across the stovetop or shake the bag of popcorn kernels. Your cat will jump up onto the counter, searching for their reward, and now you can administer the medicine or flea treatment without a problem.

Operant conditioning is useful in education and work environments, for people wanting to form or change a habit, and to train animals. Any environment where the desire is to modify or shape behavior is a good fit.

Stroke patients tend to place more weight on their non-paretic leg, which is typically a learned response. Sometimes, though, this is because the stroke damages one side of their brain.

The resulting damage causes the person to ignore or become “blind” to the paretic side of their body.

Kumar et al. (2019) designed the V2BaT system. It consists of the following:

  • VR-based task
  • Weight distribution and threshold estimator
  • Wii balance board–VR handshake
  • Heel lift detection
  • Performance evaluation
  • Task-switching modules

Using Wii balance boards to measure weight displacement, they conditioned participants to use their paretic leg by offering an in-game reward (stars and encouragement). The balance boards provided readings that told the researchers which leg was used most during weight-shifting activities.

They conducted several normal trials with multiple difficulty levels. Intermediate catch trials allowed them to analyze changes. When the first catch trial was compared to the final catch trial, there was a significant improvement.

Operant and classical conditioning are the basis of behavioral therapy. Each can be used to help people struggling with obsessive-compulsive disorder (OCD).

People with OCD experience “recurring thoughts, ideas, or sensations (obsessions) that make them feel driven to do something repetitively” (American Psychiatric Association, n.d.). Both types of conditioning also are used to treat other types of anxiety or phobias.

We are an amalgam of our habits. Some are automatic and reflexive, others are more purposeful, but in the end, they are all habits that can be manipulated. For the layperson struggling to change a habit or onboard a new one, operant conditioning can be helpful.

It is the basis for the habit loop made popular in Charles Duhigg’s (2014) book, The Power of Habit.

Habit Loop

The cue (trigger, antecedent) leads to a routine (behavior), and then a reward (consequence).
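
If you like to think in concrete terms, the loop can be treated as a tiny data structure, and the classic habit-change move (keep the cue and the reward, swap the routine) becomes a one-line operation. The class and field names below are illustrative only.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class HabitLoop:
    cue: str      # trigger / antecedent
    routine: str  # the behavior
    reward: str   # the consequence that reinforces the routine

afternoon_slump = HabitLoop(cue="3 p.m. energy dip",
                            routine="buy a candy bar",
                            reward="quick energy boost")

# Keep the same cue and reward, but swap in a new routine.
healthier = replace(afternoon_slump, routine="take a 10-minute walk")
print(healthier)
```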

We all know how challenging changing a habit can be. Still, when you understand the basic principles of operant conditioning, it becomes a matter of breaking the habit down into its parts. Our objective is to change the behavior even when the reward from the original behavior is incredibly attractive to us.

For instance, if you want to start an exercise habit, but you have been sedentary for several months, your motivation will only get you so far. This is one reason why this particular habit as a New Year’s resolution often fails. People are excited to get into the gym and shed a few pounds from the holiday season. Then, after about two weeks, their drive to do this is slowly overtaken by a dozen other things they could do with their time.

Using an operant conditioning approach, you can design for your new exercise habit. B. J. Fogg, a Stanford researcher, advocates starting with something so small it would seem ridiculous.

In his book Tiny Habits: The Small Changes that Change Everything, Fogg (2020) guides readers through the steps to making lasting changes. One of the key things to keep in mind is making the habit as easy as possible and more attractive. If it is a habit you want to break, then you make it harder to do and less appealing.

In our example, you might begin by deciding on one type of exercise you want to do. After that, choose the smallest action toward that exercise. If you want to do 100 pushups, you might start with one wall pushup, one pushup on your knees, or one military pushup. Anything that takes less than 30 seconds for you to accomplish would work.

When you finish, give yourself a mental high-five, a checkmark on a wall calendar, or in an app on your phone. The reward can be whatever you choose, but it is a critical piece of habit change.

Often, when you begin small, you will do more, but the important thing is that all you have to do is your minimum. If that is one pushup, great! You did it! If that is putting on your running shoes, awesome! Following this approach helps stop the mental gymnastics and guilt that often accompanies establishing an exercise habit.

This same methodology is useful for many different types of habits.

A word of caution: If you are dealing with addiction, then getting the help of a professional is something to consider. This does not preclude you from using this approach, but it could help you cope with any withdrawal symptoms you might have, depending on your particular addiction.

The timing of a reward matters, as does how quickly the organism responds and how quickly the behavior fades once rewards stop. How quickly the organism responds is called the response rate; how quickly the behavior disappears when reinforcement ends is the extinction rate.

Ferster and Skinner (as cited in Schunk, 2016) described five basic schedules of reinforcement, and each has a different effect on response rate and the rate of extinction. Schunk (2016) provided explanations for several, but the basic schedules of reinforcement are:

  • Continuous: Reward after each correct action
  • Fixed ratio: Every nth response is rewarded, and the n remains constant.
  • Fixed interval: The reward follows the first correct response after a fixed amount of time has passed (for example, every five minutes).
  • Variable ratio: Every nth response is reinforced, but the value varies around an average number n.
  • Variable interval: The time interval varies from instance to instance around some average value.

If you want a behavior to continue for the foreseeable future, then a variable ratio schedule is most effective. The unpredictability maintains interest, and behavior reinforced on this schedule is the slowest to extinguish. Examples of this are slot machines and fishing. Not knowing when a reward will come is usually enough to keep a person working for an undetermined amount of time.

Continuous reinforcement (rewarding) has the fastest extinction rate. Intuitively this makes sense when the subjects are human. We like novelty and tend to become accustomed to new things quickly. The same reward, given at the same time, for the same thing repeatedly is boring. We also will not work harder, only hard enough to get the reward.
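
The ratio-based schedules are easy to simulate, and doing so makes the differences obvious. The sketch below is illustrative only (the function names are ours): continuous reinforcement is just a fixed ratio of 1, and a variable ratio reinforces after a response count that varies around an average. Interval schedules would additionally need a clock rather than a response counter.

```python
import random

def fixed_ratio(n):
    """Reinforce every nth response; n = 1 is continuous reinforcement."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True        # reinforcer delivered
        return False
    return respond

def variable_ratio(mean_n):
    """Reinforce after a response count that varies around mean_n."""
    count, target = 0, random.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

vr5 = variable_ratio(5)
print(sum(vr5() for _ in range(100)))  # roughly 20 reinforcers in 100 responses
```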

Therapists, counselors, and teachers can all use operant conditioning to assist clients and students in managing their behaviors better. Here are a few suggestions:

  • Create a contract that establishes the client’s/student’s responsibilities and expected behaviors, and those of the practitioner.
  • Focus on reinforcement rather than punishment.
  • Gamify the process.

PsychCore put together a series of videos about operant conditioning, among other behaviorist topics. Here is one explaining some basics. Even though you have read this entire article, this video will help reinforce what you have learned. Different modalities are important for learning and retention.

If you are interested in learning more about classical conditioning, PsychCore also has a video titled, Respondent Conditioning . In it, the concept of extinction is briefly discussed.

Several textbooks covering both classical and operant conditioning are available, but if you are looking for practical suggestions and steps, then look no further than these five books.

1. Science and Human Behavior – B. F. Skinner

It is often assigned for coursework in applied behavior analysis, a field driven by behaviorist principles.

2. Atomic Habits: An Easy and Proven Way to Build Good Habits and Break Bad Ones – James Clear

James Clear started his habit formation journey experimenting with his own habits.

One interesting addition is his revised version of the habit loop to explicitly include “craving.” His version is cue > craving > response > reward. Clear’s advice to start small is similar to both Fogg’s and Maurer’s approach.

3. The Power of Habit: Why We Do What We Do in Life and Business – Charles Duhigg

Duhigg offers several examples of businesses that figured out how to leverage habits for success, and then he shares how the average person can do it too.

4. Tiny Habits: The Small Changes That Change Everything  – B. J. Fogg

The Stanford researcher works with businesses, large and small, as well as individuals.

You will learn about motivation, ability, and prompt (MAP) and how to use MAP to create lasting habits. His step-by-step guide is clear and concise, though it does take some initial planning.

5. One Small Step Can Change Your Life: The Kaizen Way – Robert Maurer

Maurer breaks down the basic fears people have and why we procrastinate. Then, he shares seven small steps to set us on a new path to forming good habits that last.

If you know of a great book we should add to this list, leave its name in the comment section.


Operant and classical conditioning are two ways animals and humans learn. If you want to train a simple stimulus/response, then the latter approach is most effective. If you’re going to build, change, or break a habit, then operant conditioning is the way to go.

Operant conditioning is especially useful in education and work environments, but if you understand the basic principles, you can use them to achieve your personal habit goals.

Reinforcements and reinforcement schedules are crucial to using operant conditioning successfully. Positive and negative punishment decreases unwanted behavior, but the effects are not long lasting and can cause harm. Positive and negative reinforcers increase the desired behavior and are usually the best approach.

How are you using operant conditioning to make lasting changes in your life?

We hope you enjoyed reading this article.

  • American Psychiatric Association. (n.d.). What is obsessive-compulsive disorder? Retrieved January 26, 2020, from https://www.psychiatry.org/patients-families/ocd/what-is-obsessive-compulsive-disorder
  • Clear, J. (2018). Atomic habits: An easy and proven way to build good habits and break bad ones. Avery.
  • Duhigg, C. (2014). The power of habit: Why we do what we do in life and business. Random House Trade Paperbacks.
  • Fogg, B. J. (2020). Tiny habits: The small changes that change everything. Houghton Mifflin Harcourt.
  • Kumar, D., Sinha, N., Dutta, A., & Lahiri, U. (2019). Virtual reality-based balance training system augmented with operant conditioning paradigm. BioMedical Engineering OnLine, 18, 90.
  • Maurer, R. (2014). One small step can change your life: The kaizen way. Workman.
  • PsychCore. (2018, September 9). We were asked about response generalization effects [Video]. YouTube. https://youtu.be/9U5xylxV0AE
  • PsychCore. (2016, October 28). Operant conditioning continued [Video]. YouTube. https://youtu.be/_JDalbCTpVc
  • Schunk, D. (2016). Learning theories: An educational perspective. Pearson.
  • Skinner, B. F. (1991). The behavior of organisms: An experimental analysis. Copley.
  • Skinner, B. F. (1953). Science and human behavior. Macmillan.
  • Stangor, C., & Walinga, J. (2014). Introduction to psychology (1st Canadian ed.). BC Campus OpenEd. Retrieved January 27, 2020, from https://opentextbc.ca/introductiontopsychology/


Operant Conditioning

Operant behavior is behavior “controlled” by its consequences. In practice, operant conditioning is the study of reversible behavior maintained by reinforcement schedules. We review empirical studies and theoretical approaches to two large classes of operant behavior: interval timing and choice. We discuss cognitive versus behavioral approaches to timing, the “gap” experiment and its implications, proportional timing and Weber's law, temporal dynamics and linear waiting, and the problem of simple chain-interval schedules. We review the long history of research on operant choice: the matching law, its extensions and problems, concurrent chain schedules, and self-control. We point out how linear waiting may be involved in timing, choice, and reinforcement schedules generally. There are prospects for a unified approach to all these areas.

INTRODUCTION

The term operant conditioning 1 was coined by B. F. Skinner in 1937 in the context of reflex physiology, to differentiate what he was interested in—behavior that affects the environment—from the reflex-related subject matter of the Pavlovians. The term was novel, but its referent was not entirely new. Operant behavior , though defined by Skinner as behavior “controlled by its consequences” is in practice little different from what had previously been termed “instrumental learning” and what most people would call habit. Any well-trained “operant” is in effect a habit. What was truly new was Skinner's method of automated training with intermittent reinforcement and the subject matter of reinforcement schedules to which it led. Skinner and his colleagues and students discovered in the ensuing decades a completely unsuspected range of powerful and orderly schedule effects that provided new tools for understanding learning processes and new phenomena to challenge theory.

A reinforcement schedule is any procedure that delivers a reinforcer to an organism according to some well-defined rule. The usual reinforcer is food for a hungry rat or pigeon; the usual schedule is one that delivers the reinforcer for a switch closure caused by a peck or lever press. Reinforcement schedules have also been used with human subjects, and the results are broadly similar to the results with animals. However, for ethical and practical reasons, relatively weak reinforcers must be used—and the range of behavioral strategies people can adopt is of course greater than in the case of animals. This review is restricted to work with animals.

Two types of reinforcement schedule have excited the most interest. Most popular are time-based schedules such as fixed and variable interval, in which the reinforcer is delivered after a fixed or variable time period after a time marker (usually the preceding reinforcer). Ratio schedules require a fixed or variable number of responses before a reinforcer is delivered.

Trial-by-trial versions of all these free-operant procedures exist. For example, a version of the fixed-interval schedule specifically adapted to the study of interval timing is the peak-interval procedure, which adds to the fixed interval an intertrial interval (ITI) preceding each trial and a percentage of extra-long “empty” trials in which no food is given.

For theoretical reasons, Skinner believed that operant behavior ought to involve a response that can easily be repeated, such as pressing a lever, for rats, or pecking an illuminated disk (key) for pigeons. The rate of such behavior was thought to be important as a measure of response strength ( Skinner 1938 , 1966 , 1986 ; Killeen & Hall 2001 ). The current status of this assumption is one of the topics of this review. True or not, the emphasis on response rate has resulted in a dearth of experimental work by operant conditioners on nonrecurrent behavior such as movement in space.

Operant conditioning differs from other kinds of learning research in one important respect. The focus has been almost exclusively on what is called reversible behavior, that is, behavior in which the steady-state pattern under a given schedule is stable, meaning that in a sequence of conditions, XAXBXC…, where each condition is maintained for enough days that the pattern of behavior is locally stable, behavior under schedule X shows a pattern after one or two repetitions of X that is always the same. For example, the first time an animal is exposed to a fixed-interval schedule, after several daily sessions most animals show a “scalloped” pattern of responding (call it pattern A): a pause after each food delivery—also called wait time or latency —followed by responding at an accelerated rate until the next food delivery. However, some animals show negligible wait time and a steady rate (pattern B). If all are now trained on some other procedure—a variable-interval schedule, for example—and then after several sessions are returned to the fixed-interval schedule, almost all the animals will revert to pattern A. Thus, pattern A is the stable pattern. Pattern B, which may persist under unchanging conditions but does not recur after one or more intervening conditions, is sometimes termed metastable ( Staddon 1965 ). The vast majority of published studies in operant conditioning are on behavior that is stable in this sense.

Although the theoretical issue is not a difficult one, there has been some confusion about what the idea of stability (reversibility) in behavior means. It should be obvious that the animal that shows pattern A after the second exposure to procedure X is not the same animal as when it showed pattern A on the first exposure. Its experimental history is different after the second exposure than after the first. If the animal has any kind of memory, therefore, its internal state 2 following the second exposure is likely to be different than after the first exposure, even though the observed behavior is the same. The behavior is reversible; the organism's internal state in general is not. The problems involved in studying nonreversible phenomena in individual organisms have been spelled out elsewhere (e.g., Staddon 2001a , Ch. 1); this review is mainly concerned with the reversible aspects of behavior.

Once the microscope was invented, microorganisms became a new field of investigation. Once automated operant conditioning was invented, reinforcement schedules became an independent subject of inquiry. In addition to being of great interest in their own right, schedules have also been used to study topics defined in more abstract ways such as timing and choice. These two areas constitute the majority of experimental papers in operant conditioning with animal subjects during the past two decades. Great progress has been made in understanding free-operant choice behavior and interval timing. Yet several theories of choice still compete for consensus, and much the same is true of interval timing. In this review we attempt to summarize the current state of knowledge in these two areas, to suggest how common principles may apply in both, and to show how these principles may also apply to reinforcement schedule behavior considered as a topic in its own right.

INTERVAL TIMING

Interval timing is defined in several ways. The simplest is to define it as covariation between a dependent measure such as wait time and an independent measure such as interreinforcement interval (on fixed interval) or trial time-to-reinforcement (on the peak procedure). When interreinforcement interval is doubled, then after a learning period wait time also approximately doubles ( proportional timing ). This is an example of what is sometimes called a time production procedure: The organism produces an approximation to the to-be-timed interval. There are also explicit time discrimination procedures in which on each trial the subject is exposed to a stimulus and is then required to respond differentially depending on its absolute ( Church & Deluty 1977 , Stubbs 1968 ) or even relative ( Fetterman et al. 1989 ) duration. For example, in temporal bisection , the subject (e.g., a rat) experiences either a 10-s or a 2-s stimulus, L or S . After the stimulus goes off, the subject is confronted with two choices. If the stimulus was L , a press on the left lever yields food; if S , a right press gives food; errors produce a brief time-out. Once the animal has learned, stimuli of intermediate duration are presented in lieu of S and L on test trials. The question is, how will the subject distribute its responses? In particular, at what intermediate duration will it be indifferent between the two choices? [Answer: typically in the vicinity of the geometric mean, i.e., √(L·S) ≈ 4.47 s for durations of 2 and 10 s.]

Wait time is a latency; hence (it might be objected) it may vary on time-production procedures like fixed interval because of factors other than timing—such as degree of hunger (food deprivation). Using a time-discrimination procedure avoids this problem. It can also be mitigated by using the peak procedure and looking at performance during “empty” trials. “Filled” trials terminate with food reinforcement after (say) T s. “Empty” trials, typically 3 T s long, contain no food and end with the onset of the ITI. During empty trials the animal therefore learns to wait, then respond, then stop (more or less) until the end of the trial ( Catania 1970 ). The mean of the distribution of response rates averaged over empty trials ( peak time ) is then perhaps a better measure of timing than wait time because motivational variables are assumed to affect only the height and spread of the response-rate distribution, not its mean. This assumption is only partially true ( Grace & Nevin 2000 , MacEwen & Killeen 1991 , Plowright et al. 2000 ).

There is still some debate about the actual pattern of behavior on the peak procedure in each individual trial. Is it just wait, respond at a constant rate, then wait again? Or is there some residual responding after the “stop” [yes, usually (e.g., Church et al. 1991 )]? Is the response rate between start and stop really constant or are there two or more identifiable rates ( Cheng & Westwood 1993 , Meck et al. 1984 )? Nevertheless, the method is still widely used, particularly by researchers in the cognitive/psychophysical tradition. The idea behind this approach is that interval timing is akin to sensory processes such as the perception of sound intensity (loudness) or luminance (brightness). As there is an ear for hearing and an eye for seeing, so (it is assumed) there must be a (real, physiological) clock for timing. Treisman (1963) proposed the idea of an internal pacemaker-driven clock in the context of human psychophysics. Gibbon (1977) further developed the approach and applied it to animal interval-timing experiments.

WEBER'S LAW, PROPORTIONAL TIMING AND TIMESCALE INVARIANCE

The major similarity between acknowledged sensory processes, such as brightness perception, and interval timing is Weber's law . Peak time on the peak procedure is not only proportional to time-to-food ( T ), its coefficient of variation (standard deviation divided by mean) is approximately constant, a result similar to Weber's law obeyed by most sensory dimensions. This property has been called scalar timing ( Gibbon 1977 ). Most recently, Gallistel & Gibbon (2000) have proposed a grand principle of timescale invariance , the idea that the frequency distribution of any given temporal measure (the idea is assumed to apply generally, though in fact most experimental tests have used peak time) scales with the to-be-timed-interval. Thus, given the normalized peak-time distribution for T =60 s, say; if the x -axis is divided by 2, it will match the distribution for T = 30 s. In other words, the frequency distribution for the temporal dependent variable, normalized on both axes, is asserted to be invariant.
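
A short numerical sketch makes the idealized claim concrete. If peak times have a mean equal to the target interval T (proportional timing) and a standard deviation that is a constant fraction of T (Weber's law), then dividing each distribution by its own T makes them superimpose. The coefficient of variation and the intervals below are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
cv = 0.25                                   # assumed constant coefficient of variation

for T in (30, 60, 120):                     # to-be-timed intervals, in seconds
    peaks = rng.normal(loc=T, scale=cv * T, size=10_000)
    normalized = peaks / T                  # divide the time axis by T
    print(T, round(normalized.mean(), 3), round(normalized.std(), 3))
# Every line prints a mean near 1.0 and a spread near 0.25:
# the normalized peak-time distributions superimpose.
```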

Timescale invariance is in effect a combination of Weber's law and proportional timing. Like those principles, it is only approximately true. There are three kinds of evidence that limit its generality. The simplest is the steady-state pattern of responding (key-pecking or lever-pressing) observed on fixed-interval reinforcement schedules. This pattern should be the same at all fixed-interval values, but it is not. Gallistel & Gibbon wrote, “When responding on such a schedule, animals pause after each reinforcement and then resume responding after some interval has elapsed. It was generally supposed that the animals' rate of responding accelerated throughout the remainder of the interval leading up to reinforcement. In fact, however, conditioned responding in this paradigm … is a two-state variable (slow, sporadic pecking vs. rapid, steady pecking), with one transition per interreinforcement interval ( Schneider 1969 )” (p. 293).

This conclusion over-generalizes Schneider's result. Reacting to reports of “break-and-run” fixed-interval performance under some conditions, Schneider sought to characterize this feature more objectively than the simple inspection of cumulative records. He found a way to identify the point of maximum acceleration in the fixed-interval “scallop” by using an iterative technique analogous to attaching an elastic band to the beginning of an interval and the end point of the cumulative record, then pushing a pin, representing the break point, against the middle of the band until the two resulting straight-line segments best fit the cumulative record (there are other ways to achieve the same result that do not fix the end points of the two line-segments). The postreinforcement time ( x -coordinate) of the pin then gives the break point for that interval. Schneider showed that the break point is an orderly dependent measure: Break point is roughly 0.67 of interval duration, with standard deviation proportional to the mean (the Weber-law or scalar property).
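
Schneider's measure can be approximated without the elastic-band construction: search over candidate break points, fit a separate straight line to the cumulative record on each side, and keep the split that minimizes the total squared error. The code below is a simplified sketch of that idea (a grid search rather than Schneider's iterative procedure), with invented toy data.

```python
import numpy as np

def break_point(times, cum_responses):
    """Return the time at which a two-segment linear fit to a cumulative
    record has the smallest total squared error."""
    best_t, best_err = None, np.inf
    for i in range(2, len(times) - 2):               # at least two points per segment
        err = 0.0
        for seg in (slice(None, i), slice(i, None)):
            t, y = times[seg], cum_responses[seg]
            coeffs = np.polyfit(t, y, 1)             # straight-line fit
            err += float(np.sum((np.polyval(coeffs, t) - y) ** 2))
        if err < best_err:
            best_t, best_err = times[i], err
    return best_t

# Toy "scallop": no responding for the first 40 s, then a steady rate.
t = np.arange(0.0, 60.0)
y = np.where(t < 40, 0.0, (t - 40) * 2.0)
print(break_point(t, y))                             # close to 40
```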

This finding is by no means the same as the idea that the fixed-interval scallop is “a two-state variable” ( Hanson & Killeen 1981 ). Schneider showed that a two-state model is an adequate approximation; he did not show that it is the best or truest approximation. A three- or four-line approximation (i.e., two or more pins) might well have fit significantly better than the two-line version. To show that the process is two-state, Schneider would have had to show that adding additional segments produced negligibly better fit to the data.

The frequent assertion that the fixed-interval scallop is always an artifact of averaging flies in the face of raw cumulative-record data: the many nonaveraged individual fixed-interval cumulative records in Ferster & Skinner (1957 , e.g., pp. 159, 160, 162) show clear curvature, particularly at longer fixed-interval values (> ∼2 min). The issue for timescale invariance, therefore, is whether the shape, or relative frequency of different-shaped records, is the same at different absolute intervals.

The evidence is that there is more, and more frequent, curvature at longer intervals. Schneider's data show this effect. In Schneider's Figure 3, for example, the time to shift from low to high rate is clearly longer at longer intervals than shorter ones. On fixed-interval schedules, apparently, absolute duration does affect the pattern of responding. (A possible reason for this dependence of the scallop on fixed-interval value is described in Staddon 2001a , p. 317. The basic idea is that greater curvature at longer fixed-interval values follows from two things: a linear increase in response probability across the interval, combined with a nonlinear, negatively accelerated, relation between overall response rate and reinforcement rate.) If there is a reliable difference in the shape, or distribution of shapes, of cumulative records at long and short fixed-interval values, the timescale-invariance principle is violated.

A second dataset that does not agree with timescale invariance is an extensive set of studies on the peak procedure by Zeiler & Powell (1994 ; see also Hanson & Killeen 1981) , who looked explicitly at the effect of interval duration on various measures of interval timing. They conclude, “Quantitative properties of temporal control depended on whether the aspect of behavior considered was initial pause duration, the point of maximum acceleration in responding [break point], the point of maximum deceleration, the point at which responding stopped, or several different statistical derivations of a point of maximum responding … . Existing theory does not explain why Weber's law [the scalar property] so rarely fit the results …” (p. 1; see also Lowe et al. 1979 , Wearden 1985 for other exceptions to proportionality between temporal measures of behavior and interval duration). Like Schneider (1969) and Hanson & Killeen (1981) , Zeiler & Powell found that the break point measure was proportional to interval duration, with scalar variance (constant coefficient of variation), and thus consistent with timescale invariance, but no other measure fit the rule.

Moreover, the fit of the breakpoint measure is problematic because it is not a direct measure of behavior but is itself the result of a statistical fitting procedure. It is possible, therefore, that the fit of breakpoint to timescale invariance owes as much to the statistical method used to arrive at it as to the intrinsic properties of temporal control. Even if this caveat turns out to be false, the fact that every other measure studied by Zeiler & Powell failed to conform to timescale invariance surely rules it out as a general principle of interval timing.

The third and most direct test of the timescale invariance idea is an extensive series of time-discrimination experiments carried out by Dreyfus et al. (1988) and Stubbs et al. (1994) . The usual procedure in these experiments was for pigeons to peck a center response key to produce a red light of one duration that is followed immediately by a green light of another duration. When the green center-key light goes off, two yellow side-keys light up. The animals are reinforced with food for pecking the left side-key if the red light was longer, the right side-key if the green light was longer.

The experimental question is, how does discrimination accuracy depend on relative and absolute duration of the two stimuli? Timescale invariance predicts that accuracy depends only on the ratio of red and green durations: For example, accuracy should be the same following the sequence red:10, green:20 as the sequence red:30, green:60, but it is not. Pigeons are better able to discriminate between the two short durations than the two long ones, even though their ratio is the same. Dreyfus et al. and Stubbs et al. present a plethora of quantitative data of the same sort, all showing that time discrimination depends on absolute as well as relative duration.

Timescale invariance is empirically indistinguishable from Weber's law as it applies to time, combined with the idea of proportional timing: The mean of a temporal dependent variable is proportional to the temporal independent variable. But Weber's law and proportional timing are dissociable—it is possible to have proportional timing without conforming to Weber's law and vice versa (cf. Hanson & Killeen 1981 , Zeiler & Powell 1994 ), and in any case both are only approximately true. Timescale invariance therefore does not qualify as a principle in its own right.

Cognitive and Behavioral Approaches to Timing

The cognitive approach to timing dates from the late 1970s. It emphasizes the psychophysical properties of the timing process and the use of temporal dependent variables as measures of (for example) drug effects and the effects of physiological interventions. It de-emphasizes proximal environmental causes. Yet when timing (then called temporal control; see Zeiler 1977 for an early review) was first discovered by operant conditioners (Pavlov had studied essentially the same phenomenon— delay conditioning —many years earlier), the focus was on the time marker , the stimulus that triggered the temporally correlated behavior. (That is one virtue of the term control : It emphasizes the fact that interval timing behavior is usually not free-running. It must be cued by some aspect of the environment.) On so-called spaced-responding schedules, for example, the response is the time marker: The subject must learn to space its responses more than T s apart to get food. On fixed-interval schedules the time marker is reinforcer delivery; on the peak procedure it is the stimulus events associated with trial onset. This dependence on a time marker is especially obvious on time-production procedures, but on time-discrimination procedures the subject's choice behavior must also be under the control of stimuli associated with the onset and offset of the sample duration.

Not all stimuli are equally effective as time markers. For example, an early study by Staddon & Innis (1966a ; see also 1969) showed that if, on alternate fixed intervals, 50% of reinforcers (F) are omitted and replaced by a neutral stimulus (N) of the same duration, wait time following N is much shorter than after F (the reinforcement-omission effect ). Moreover, this difference persists indefinitely. Despite the fact that F and N have the same temporal relationship to the reinforcer, F is much more effective as a time marker than N. No exactly comparable experiment has been done using the peak procedure, partly because the time marker there involves ITI offset/trial onset rather than the reinforcer delivery, so that there is no simple manipulation equivalent to reinforcement omission.

These effects do not depend on the type of behavior controlled by the time marker. On fixed-interval schedules the time marker is in effect inhibitory: Responding is suppressed during the wait time and then occurs at an accelerating rate. Other experiments ( Staddon 1970 , 1972 ), however, showed that given the appropriate schedule, the time marker can control a burst of responding (rather than a wait) of a duration proportional to the schedule parameters ( temporal go–no-go schedules) and later experiments have shown that the place of responding can be controlled by time since trial onset in the so-called tri-peak procedure ( Matell & Meck 1999 ).

A theoretical review ( Staddon 1974 ) concluded, “Temporal control by a given time marker depends on the properties of recall and attention, that is, on the same variables that affect attention to compound stimuli and recall in memory experiments such as delayed matching-to-sample.” By far the most important variable seems to be “the value of the time-marker stimulus—Stimuli of high value … are more salient …” (p. 389), although the full range of properties that determine time-marker effectiveness is yet to be explored.

Reinforcement omission experiments are transfer tests , that is, tests to identify the effective stimulus. They pinpoint the stimulus property controlling interval timing—the effective time marker—by selectively eliminating candidate properties. For example, in a definitive experiment, Kello (1972) showed that on fixed interval the wait time is longest following standard reinforcer delivery (food hopper activated with food, hopper light on, house light off, etc.). Omission of any of those elements caused the wait time to decrease, a result consistent with the hypothesis that reinforcer delivery acquires inhibitory temporal control over the wait time. The only thing that makes this situation different from the usual generalization experiment is that the effects of reinforcement omission are relatively permanent. In the usual generalization experiment, delivery of the reinforcer according to the same schedule in the presence of both the training stimulus and the test stimuli would soon lead all to be responded to in the same way. Not so with temporal control: As we just saw, even though N and F events have the same temporal relationship to the next food delivery, animals never learn to respond similarly after both. The only exception is when the fixed-interval is relatively short, on the order of 20 s or less ( Starr & Staddon 1974 ). Under these conditions pigeons are able to use a brief neutral stimulus as a time marker on fixed interval.

The Gap Experiment

The closest equivalent to fixed-interval reinforcement–omission using the peak procedure is the so-called gap experiment ( Roberts 1981 ). In the standard gap paradigm the sequence of stimuli in a training trial (no gap stimulus) consists of three successive stimuli: the intertrial interval stimulus (ITI), the fixed-duration trial stimulus (S), and food reinforcement (F), which ends each training trial. The sequence is thus ITI, S, F, ITI. Training trials are typically interspersed with empty probe trials that last longer than reinforced trials but end with an ITI only and no reinforcement. The stimulus sequence on such trials is ITI, S, ITI, but the S is two or three times longer than on training trials. After performance has stabilized, gap trials are introduced into some or all of the probe trials. On gap trials the ITI stimulus reappears for a while in the middle of the trial stimulus. The sequence on gap trials is therefore ITI, S, ITI, S, ITI. Gap trials do not end in reinforcement.

What is the effective time marker (i.e., the stimulus that exerts temporal control) in such an experiment? ITI offset/trial onset is the best temporal predictor of reinforcement: Its time to food is shorter and less variable than any other experimental event. Most but not all ITIs follow reinforcement, and the ITI itself is often variable in duration and relatively long. So reinforcer delivery is a poor temporal predictor. The time marker therefore has something to do with the transition between ITI and trial onset, between ITI and S. Gap trials also involve presentation of the ITI stimulus, albeit with a different duration and within-trial location than the usual ITI, but the similarities to a regular trial are obvious. The gap experiment is therefore a sort of generalization (of temporal control) experiment. Buhusi & Meck (2000) presented gap stimuli more or less similar to the ITI stimulus during probe trials and found results resembling generalization decrement, in agreement with this analysis.

However, the gap procedure was not originally thought of as a generalization test, nor is it particularly well designed for that purpose. The gap procedure arose directly from the cognitive idea that interval timing behavior is driven by an internal clock ( Church 1978 ). From this point of view it is perfectly natural to inquire about the conditions under which the clock can be started or stopped. If the to-be-timed interval is interrupted—a gap—will the clock restart when the trial stimulus returns (reset)? Will it continue running during the gap and afterwards? Or will it stop and then restart (stop)?

“Reset” corresponds to the maximum rightward shift (from trial onset) of the response-rate peak from its usual position t s after trial onset to t + G E , where G E is the offset time (end) of the gap stimulus. Conversely, no effect (clock keeps running) leaves the peak unchanged at t , and “stop and restart” is an intermediate result, a peak shift to G E − G B + t , where G B is the time of onset (beginning) of the gap stimulus.
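
A small helper makes the arithmetic of the three predictions explicit for given values of t, G_B, and G_E (the argument names below are ours).

```python
def predicted_peak_times(t, gap_onset, gap_offset):
    """Predicted peak time (s from trial onset) under the three clock hypotheses.

    t: usual peak time on trials without a gap
    gap_onset, gap_offset: start (G_B) and end (G_E) of the gap stimulus
    """
    return {
        "run (no effect)": t,                            # clock keeps running through the gap
        "stop and restart": gap_offset - gap_onset + t,  # clock pauses for the gap duration
        "reset": gap_offset + t,                         # clock restarts at gap offset
    }

print(predicted_peak_times(t=30, gap_onset=10, gap_offset=15))
# {'run (no effect)': 30, 'stop and restart': 35, 'reset': 45}
```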

Both gap duration and placement within a trial have been varied. The results that have been obtained so far are rather complex (cf. Buhusi & Meck 2000 , Cabeza de Vaca et al. 1994 , Matell & Meck 1999 ). In general, the longer the gap and the later it appears in the trial, the greater the rightward peak shift. All these effects can be interpreted in clock terms, but the clock view provides no real explanation for them, because it does not specify which one will occur under a given set of conditions. The results of gap experiments can be understood in a qualitative way in terms of the similarity of the gap presentation to events associated with trial onset; the more similar, the closer the effect will be to reset, i.e., the onset of a new trial. Another resemblance between gap results and the results of reinforcement-omission experiments is that the effects of the gap are also permanent: Behavior on later trials usually does not differ from behavior on the first few ( Roberts 1981 ). These effects have been successfully simulated quantitatively by a neural network timing model ( Hopson 1999 , 2002 ) that includes the assumption that the effects of time-marker presentation decay with time ( Cabeza de Vaca et al. 1994 ).

The original temporal control studies were strictly empirical but tacitly accepted something like the psychophysical view of timing. Time was assumed to be a sensory modality like any other, so the experimental task was simply to explore the different kinds of effect, excitatory, inhibitory, discriminatory, that could come under temporal control. The psychophysical view was formalized by Gibbon (1977) in the context of animal studies, and this led to a static information-processing model, scalar expectancy theory (SET: Gibbon & Church 1984 , Meck 1983 , Roberts 1983 ), which comprised a pacemaker-driven clock, working and reference memories, a comparator, and various thresholds. A later dynamic version added memory for individual trials (see Gallistel 1990 for a review). This approach led to a long series of experimental studies exploring the clocklike properties of interval timing (see Gallistel & Gibbon 2000 , Staddon & Higa 1999 for reviews), but none of these studies attempted to test the assumptions of the SET approach in a direct way.

SET was for many years the dominant theoretical approach to interval timing. In recent years, however, its limitations, of parsimony and predictive range, have become apparent and there are now a number of competitors such as the behavioral theory of timing ( Killeen & Fetterman 1988 , MacEwen & Killeen 1991 , Machado 1997 ), spectral timing theory ( Grossberg & Schmajuk 1989 ), neural network models ( Church & Broadbent 1990 , Hopson 1999 , Dragoi et al. 2002 ), and the habituation-based multiple time scale theory (MTS: Staddon & Higa 1999 , Staddon et al. 2002 ). There is as yet no consensus on the best theory.

Temporal Dynamics: Linear Waiting

A separate series of experiments in the temporal-control tradition, beginning in the late 1980s, studied the real-time dynamics of interval timing (e.g., Higa et al. 1991 , Lejeune et al. 1997 , Wynne & Staddon 1988 ; see Staddon 2001a for a review). These experiments have led to a simple empirical principle that may have wide application. Most of these experiments used the simplest possible timing schedule, a response-initiated delay (RID) schedule. In this schedule the animal (e.g., a pigeon) can respond at any time, t , after food. The response changes the key color and food is delivered after a further T s. Time t is under the control of the animal; time T is determined by the experimenter. These experiments have shown that wait time on these and similar schedules (such as fixed interval) is strongly determined by the duration of the previous interfood interval (IFI). For example, wait time will track a cyclic sequence of IFIs, intercalated at a random point in a sequence of fixed ( t + T =constant) intervals, with a lag of one interval; a single short IFI is followed by a short wait time in the next interval (the effect of a single long interval is smaller), and so on (see Staddon et al. 2002 for a review and other examples of temporal tracking). To a first approximation, these results are consistent with a linear relation between wait time in IFI N + 1 and the duration of IFI N :

t_{N+1} = a·I_N + b        (Equation 1)

where I is the IFI, a is a constant less than one, and b is usually negligible. This relation has been termed linear waiting ( Wynne & Staddon 1988 ). The principle is an approximation: an expanded model, incorporating the multiple time scale theory, allows the principle to account for the slower effects of increases as opposed to decreases in IFI (see Staddon et al. 2002 ).
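
A few lines of arithmetic show the one-interval lag that Equation 1 implies when the interfood interval alternates; the parameter values below are arbitrary placeholders.

```python
a, b = 0.3, 0.0                     # linear-waiting parameters: a < 1, b negligible
ifis = [60, 30, 60, 30, 60]         # a cyclic sequence of interfood intervals, in seconds

for n, ifi in enumerate(ifis, start=1):
    next_wait = a * ifi + b         # Equation 1: wait time in interval N+1 depends on IFI N
    print(f"IFI {n} = {ifi} s -> predicted wait at the start of interval {n + 1}: {next_wait:.0f} s")
# Short IFIs are followed, with a lag of one interval, by short wait times.
```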

Most importantly for this discussion, the linear waiting principle appears to be obligatory. That is, organisms seem to follow the linear waiting rule even if they delay or even prevent reinforcer delivery by doing so. The simplest example is the RID schedule itself. Wynne & Staddon (1988) showed that it makes no difference whether the experimenter holds delay time T constant or the sum of t + T constant ( t + T = K ): Equation 1 holds in both cases, even though the optimal (reinforcement-rate-maximizing) strategy in the first case is for the animal to set t equal to zero, whereas in the second case reinforcement rate is maximized so long as t < K . Using a version of RID in which T in interval N + 1 depended on the value of t in the preceding interval, Wynne & Staddon also demonstrated two kinds of instability predicted by linear waiting.

The fact that linear waiting is obligatory allows us to look for its effects on schedules other than the simple RID schedule. The most obvious application is to ratio schedules. The time to emit a fixed number of responses is approximately constant; hence the delay to food after the first response in each interval is also approximately constant on fixed ratio (FR), as on fixed- T RID ( Powell 1968 ). Thus, the optimal strategy on FR, as on fixed- T RID, is to respond immediately after food. However, in both cases animals wait before responding and, as one might expect based on the assumption of a roughly constant interresponse time on all ratio schedules, the duration of the wait on FR is proportional to the ratio requirement ( Powell 1968 ), although longer than on a comparable chain-type schedule with the same interreinforcement time ( Crossman et al. 1974 ). The phenomenon of ratio strain —the appearance of long pauses and even extinction on high ratio schedules ( Ferster & Skinner 1957 )—may also have something to do with obligatory linear waiting.

Chain Schedules

A chain schedule is one in which a stimulus change, rather than primary reinforcement, is scheduled. Thus, a chain fixed-interval–fixed-interval schedule is one in which, for example, food reinforcement is followed by the onset of a red key light in the presence of which, after a fixed interval, a response produces a change to green. In the presence of green, food delivery is scheduled according to another fixed interval. RID schedules resemble two-link chain schedules. The first link is time t , before the animal responds; the second link is time T , after a response. We may expect, therefore, that waiting time in the first link of a two-link schedule will depend on the duration of the second link. We describe two results consistent with this conjecture and then discuss some exceptions.

Davison (1974) studied a two-link chain fixed-interval–fixed-interval schedule. Each cycle of the schedule began with a red key. Responding was reinforced, on fixed-interval I 1 s, by a change in key color from red to white. In the presence of white, food reinforcement was delivered according to fixed-interval I 2 s, followed by reappearance of the red key. Davison varied I 1 and I 2 and collected steady-state rate, pause, and link-duration data. He reported that when programmed second-link duration was long in relation to the first-link duration, pause in the first link sometimes exceeded the programmed link duration. The linear waiting predictions for this procedure can therefore be most easily derived for those conditions where the second link is held constant and the first link duration is varied (because under these conditions, the first-link pause was always less than the programmed first-link duration). The prediction for the terminal link is

t_2 = a·I_2        (Equation 2)

where a is the proportionality constant, I 2 is the duration of the terminal-link fixed-interval, and t 2 is the pause in the terminal link. Because I 2 is constant in this phase, t 2 is also constant. The pause in the initial link is given by

t_1 = a·(I_1 + I_2) = a·I_1 + a·I_2        (Equation 3)

where I 1 is the duration of the first link. Because I 2 is constant, Equation 3 is a straight line with slope a and positive y-intercept aI 2 .
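
Equations 2 and 3 say that pause in each link is simply a times that link's time to reinforcement, so pause plotted against TTR should fall on one line through the origin for both links. A brief numerical check, with an arbitrary value of a and invented link durations:

```python
a = 0.4                                                  # assumed proportionality constant

conditions = [(30, 60), (60, 60), (120, 60), (60, 30)]   # (I1, I2) link durations, in seconds
for i1, i2 in conditions:
    t2 = a * i2                  # Equation 2: terminal-link pause; its TTR is I2
    t1 = a * (i1 + i2)           # Equation 3: initial-link pause; its TTR is I1 + I2
    print(f"I1={i1:>3}, I2={i2:>3}:  t1/TTR={t1 / (i1 + i2):.2f}  t2/TTR={t2 / i2:.2f}")
# Both ratios equal a (0.40) in every condition, i.e., all points lie on the
# same straight line through the origin with slope a.
```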

Linear waiting theory can be tested with Davison's data by plotting, for every condition, t 1 and t 2 versus time-to-reinforcement (TTR); that is, plot pause in each link against TTR for that link in every condition. Linear waiting makes a straightforward prediction: All the data points for both links should lie on the same straight line through the origin (assuming that b → 0). We show this plot in Figure 1 . There is some variability, because the data points are individual subjects, not averages, but points from first and second links fit the same line, and the deviations do not seem to be systematic.

Figure 1. Steady-state pause duration plotted against actual time to reinforcement in the first and second links of a two-link chain schedule. Each data point is from a single pigeon in one experimental condition (three data points from an incomplete condition are omitted). (From Davison 1974 , Table 1)

A study by Innis et al. (1993) provides a dynamic test of the linear waiting hypothesis as applied to chain schedules. Innis et al. studied two-link chain schedules with one link of fixed duration and the other varying from reinforcer to reinforcer according to a triangular cycle. The dependent measure was pause in each link. Their Figure 3, for example, shows the programmed and actual values of the second link of the constant-cycle procedure (i.e., the first link was a constant 20 s; the second link varied from 5 to 35 s according to the triangular cycle) as well as the average pause, which clearly tracks the change in second-link duration with a lag of one interval. They found similar results for the reverse procedure, cycle-constant , in which the first link varied cyclically and the second link was constant. The tracking was a little better in the first procedure than in the second, but in both cases first-link pause was determined primarily by TTR.

There are some data suggesting that linear waiting is not the only factor that determines responding on simple chain schedules. In the four conditions of Davison's experiment in which the programmed durations of the first and second links added to a constant (120 s)—which implies a constant first-link pause according to linear waiting—pause in the first link covaried with first-link duration, although the data are noisy.

The alternative to the linear waiting account of responding on chain schedules is an account in terms of conditioned reinforcement (also called secondary reinforcement)—the idea that a stimulus paired with a primary reinforcer acquires some independent reinforcing power. This idea is also the organizing principle behind most theories of free-operant choice. There are some data that seem to imply a response-strengthening effect quite apart from the linear waiting effect, but they do not always hold up under closer inspection. Catania et al. (1980) reported that “higher rates of pecking were maintained by pigeons in the middle component of three-component chained fixed-interval schedules than in that component of the corresponding multiple schedule (two extinction components followed by a fixed-interval component)” (p. 213), but the effect was surprisingly small, given that no responding at all was required in the first two components. Moreover, results of a more critical control condition, chain versus tandem (rather than multiple) schedule, were the opposite: Rate was generally higher in the middle tandem component than in the second link of the chain. (A tandem schedule is one with the same response contingencies as a chain but with the same stimulus present throughout.)

Royalty et al. (1987) introduced a delay into the peck–stimulus-change contingency of a three-link variable-interval chain schedule and found large decreases in response rate [wait time (WT) was not reported] in both first and second links. They concluded that “because the effect of delaying stimulus change was comparable to the effect of delaying primary reinforcement in a simple variable-interval schedule … the results provide strong evidence for the concept of conditioned reinforcement” (p. 41). The implications of the Royalty et al. data for linear waiting are unclear, however. (a) The linear waiting hypothesis does not deal with the assignment-of-credit problem, that is, the selection of the appropriate response by the schedule. Linear waiting makes predictions about response timing—when the operant response occurs—but not about which response will occur. Response-reinforcer contiguity may be essential for the selection of the operant response in each chain link (as it clearly is during “shaping”), and diminishing contiguity may reduce response rate, but contiguity may play little or no role in the timing of the response. The idea of conditioned reinforcement may well apply to the first function but not to the second. (b) Moreover, Royalty et al. did not report obtained time-to-reinforcement data; the effect of the imposed delay may therefore have been via an increase in component duration rather than directly on response rate.

Williams & Royalty (1990) explicitly compared conditioned reinforcement and time to reinforcement as explanations for chain schedule performance in three-link chains and concluded “that time to reinforcement itself accounts for little if any variance in initial-link responding” (p. 381); timing itself was not measured. These data, however, are from chain schedules with both variable-interval and fixed-interval links, rather than fixed-interval only, and they concern response rate rather than pause measures. In a later paper Williams qualified this claim: “The effects of stimuli in a chain schedule are due partly to the time to food correlated with the stimuli and partly to the time to the next conditioned reinforcer in the sequence” (1997, p. 145).

The conclusion seems to be that linear waiting plays a relatively major, and conditioned reinforcement (however defined) a relatively minor, role in the determination of response timing on chain fixed-interval schedules. Linear waiting also provides the best available account of a striking, unsolved problem with chain schedules: the fact that in chains with several links, pigeon subjects may respond at a low level or even quit completely in early links (Catania 1979, Gollub 1977). On fixed-interval chain schedules with five or more links, responding in the early links begins to extinguish and the overall reinforcement rate falls well below the maximum possible—even if the programmed interreinforcement interval is relatively short (e.g., 6 × 15 = 90 s). If the same stimulus is present in all links (tandem schedule), or if the six different stimuli are presented in random order (scrambled-stimuli chains), performance is maintained in all links and the overall reinforcement rate is close to the maximum possible (i.e., the interfood interval approaches 6I, where I is the link duration). Other studies have reported very weak responding in early components of a simple chain fixed-interval schedule (e.g., Catania et al. 1980, Davison 1974, Williams 1994; review in Kelleher & Gollub 1962). Chains with as few as three fixed-interval 60-s links (Kelleher & Fry 1962) occasionally produce extreme pausing in the first link. No formal theory of the kind that has proliferated to explain behavior on concurrent chain schedules (discussed below) has been offered to account for these strange results, even though they have been well known for many years.

The informal suggestion is that the low or zero response rates maintained by early components of a multi-link chain are a consequence of the same discrimination process that leads to extinction in the absence of primary reinforcement. Conversely, the stimulus at the end of the chain that is actually paired with primary reinforcement is assumed to be a conditioned reinforcer; stimuli in the middle sustain responding because they lead to production of a conditioned reinforcer ( Catania et al. 1980 , Kelleher & Gollub 1962 ). Pairing also explains why behavior is maintained on tandem and scrambled-stimuli chains ( Kelleher & Fry 1962 ). In both cases the stimuli early in the chain are either invariably (tandem) or occasionally (scrambled-stimulus) paired with primary reinforcement.

There are problems with the conditioned-reinforcement approach, however. It can explain responding in link two of a three-link chain but not in link one, which should be an extinction stimulus. The explanatory problem gets worse when more links are added. There is no well-defined principle to tell us when a stimulus changes from being a conditioned reinforcer, to a stimulus in whose presence responding is maintained by a conditioned reinforcer, to an extinction stimulus. What determines the stimulus property? Is it stimulus number, stimulus duration, or the durations of stimuli later in the chain? Perhaps there is some balance between contrast/extinction, which depresses responding in early links, and conditioned reinforcement, which is supposed to (but sometimes does not) elevate responding in later links. No well-defined compound theory has been offered, even though there are several quantitative theories for multiple-schedule contrast (e.g., Herrnstein 1970, Nevin 1974, Staddon 1982; see review in Williams 1988). There are also data that cast doubt even on the idea that late-link stimuli have a rate-enhancing effect. In the Catania et al. (1980) study, for example, four of five pigeons responded faster in the middle link of a three-link tandem schedule than in the comparable chain.

The lack of formal theories for performance on simple chains is matched by a dearth of data. Some pause data are presented in the study by Davison (1974) on pigeons in a two-link fixed-interval chain. The paper attempted to fit Herrnstein's (1970) matching law between response rates and link duration. The match was poor: The pigeons' rates fell more than predicted when the terminal links (contiguous with primary reinforcement) of the chain were long, but Davison did find that “the terminal link schedule clearly changes the pause in the initial link, longer terminal-link intervals giving longer initial-link pauses” (1974, p. 326). Davison's abstract concludes, “Data on pauses during the interval schedules showed that, in most conditions, the pause duration was a linear function of the interval length, and greater in the initial link than in the terminal link” (p. 323). In short, the pause (time-to-first-response) data were more lawful than response-rate data.

Linear waiting provides a simple explanation for excessive pausing on multi-link chain fixed-interval schedules. Suppose the chief function of the link stimuli on chain schedules is simply to signal changing times to primary reinforcement (see footnote 4). Thus, in a three-link fixed-interval chain, with link duration I, the TTR signaled by the end of reinforcement (or by the onset of the first link) is 3I. The onset of the next link signals a TTR of 2I, and the terminal, third, link signals a TTR of I. The assumptions of linear waiting as applied to this situation are that pausing (time to first response) in each link is determined entirely by TTR and that the wait time in interval N+1 is a linear function of the TTR in the preceding interval.

To see the implications of this process, consider again a three-link chain schedule with I = 1 (arbitrary time units). The performance to be expected depends entirely on the value of the proportionality constant, a, that sets the fraction of time-to-primary-reinforcement that the animal waits (for simplicity we can neglect b; the logic of the argument is unaffected). All is well so long as a is less than one-third. If a is exactly 0.333, then for unit link duration the pause in the third link is 0.33, in the second link 0.67, and in the first link 1.0. However, if a is larger, for instance 0.5, the three pauses become 0.5, 1.0, and 1.5; that is, the pause in the first link is now longer than the programmed interval, which means the TTR in the first link will be longer than 3 time units the next time around, so the pause will increase further, and so on until the process stabilizes (which it always does: First-link pause never goes to ∞).
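This stabilization process can be followed with a small fixed-point computation. The sketch below implements the assumptions just stated (b = 0, unit links, pause in each link equal to a times the experienced time from that link's onset to food, and a link lasting the larger of its pause or its programmed duration); it is an illustrative reconstruction of the argument, not code from the original studies.

```python
# Fixed-point sketch of the linear-waiting account of multi-link chain FI
# schedules (b = 0, unit link durations). The pause in each link is a times
# the experienced time from that link's onset to food, and a link lasts
# max(pause, programmed duration).
def chain_pauses(n_links, a, iterations=200):
    pauses = [0.0] * n_links
    for _ in range(iterations):
        durations = [max(p, 1.0) for p in pauses]          # experienced link durations
        ttr = [sum(durations[j:]) for j in range(n_links)]  # TTR from each link's onset
        pauses = [a * t for t in ttr]                        # linear waiting, b = 0
    return pauses

for n, a in [(3, 0.5), (5, 0.5), (5, 0.67)]:
    p = chain_pauses(n, a)
    ifi = sum(max(x, 1.0) for x in p)
    print(f"{n}-link chain, a={a}: pauses = {[round(x, 2) for x in p]}, IFI = {round(ifi, 1)}")
# For the five-link chain, a = 0.5 gives IFI near 16 and a = 0.67 gives a
# first-link pause near 56 and IFI near 84, consistent with the values quoted below.
```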

The steady-state wait times in each link predicted for a five-link chain, with unit-duration links, for two values of a are shown in Figure 2 . In both cases wait times in the early links are very much longer than the programmed link duration. Clearly, this process has the potential to produce very large pauses in the early links of multilink-chain fixed-interval schedules and so may account for the data Catania (1979) and others have reported.

Figure 2. Wait time (pause, time to first response) in each equal-duration link of a five-link chain schedule (as a multiple of the programmed link duration) as predicted by the linear-waiting hypothesis. The two curves are for two values of parameter a in Equation 1 (b = 0). Note the very long pauses predicted in early links—almost two orders of magnitude greater than the programmed interval in the first link for a = 0.67. (From Mazur 2001)

Gollub in his dissertation research (1958) noticed the additivity of this sequential pausing. Kelleher & Gollub (1962) in their subsequent review wrote, “No two pauses in [simple fixed interval] can both postpone food-delivery; however, pauses in different components of [a] five-component chain will postpone food-delivery additively” (p. 566). However, this additivity was only one of a number of processes suggested to account for the long pauses in early chain fixed-interval links, and its quantitative implications were never explored.

Note that the linear waiting hypothesis also accounts for the relative stability of tandem schedules and chain schedules with scrambled components. In the tandem schedule, reinforcement constitutes the only available time marker. Given that responding after the pause continues at a relatively high rate until the next time marker, Equation 1 (with b assumed negligible) and a little algebra show that the steady-state postreinforcement pause for a tandem schedule with unit links will be

t = a(N − 1)/(1 − a),

where N is the number of links and a is the pause fraction. In the absence of any time markers, pauses in links after the first are necessarily short, so the experienced link duration equals the programmed duration. Thus, the total interfood-reinforcement interval will be t + N − 1 (t ≥ 1): the pause in the first link (which will be longer than the programmed link duration for N > 1/a) plus the programmed durations of the succeeding links. For the case of a = 0.67 and unit link duration, which yielded a steady-state interfood interval (IFI) of 84 for the five-link chain schedule, the tandem yields 12. For a = 0.5, the two values are approximately 16 and 8.
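A two-line check confirms the arithmetic for the five-link tandem case (illustrative code; unit links and negligible b are assumed, as in the text):

```python
# Tandem-schedule steady state under linear waiting with unit links and b ~ 0:
# the postreinforcement pause t satisfies t = a * (t + N - 1), i.e.
# t = a*(N - 1)/(1 - a), and the interfood interval is t + N - 1.
def tandem_ifi(n_links, a):
    t = a * (n_links - 1) / (1 - a)   # steady-state pause after food
    return t, t + (n_links - 1)

for a in (0.5, 0.67):
    t, ifi = tandem_ifi(5, a)
    print(f"a={a}: pause = {t:.1f}, IFI = {ifi:.1f}")
# a = 0.5 gives IFI near 8; a = 0.67 gives IFI near 12, the values quoted above.
```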

The long waits in early links shown in Figure 2 depend critically on the value of a . If, as experience suggests (there has been no formal study), a tends to increase slowly with training, we might expect the long pausing in initial links to take some time to develop, which apparently it does ( Gollub 1958 ).

On the scrambled-stimuli chain each stimulus occasionally ends in reinforcement, so each signals a time-to-reinforcement (TTR; see footnote 5) of I, and pause in each link should be less than the link duration—yielding a total IFI of approximately N, i.e., 5 for the example in the figure. These predictions yield the order IFI in the chain > tandem > scrambled, but parametric data are not available for precise comparison. We do not know whether an N-link scrambled schedule typically stabilizes at a shorter IFI than the comparable tandem schedule, for example. Nor do we know whether steady-state pause in successive links of a multilink chain falls off in the exponential fashion shown in Figure 2.

In the final section we explore the implications of linear waiting for studies of free-operant choice behavior.

Although we can devote only limited space to it, choice is one of the major research topics in operant conditioning (see Mazur 2001 , p. 96 for recent statistics). Choice is not something that can be directly observed. The subject does this or that and, in consequence, is said to choose. The term has unfortunate overtones of conscious deliberation and weighing of alternatives for which the behavior itself—response A or response B—provides no direct evidence. One result has been the assumption that the proper framework for all so-called choice studies is in terms of response strength and the value of the choice alternatives. Another is the assumption that procedures that are very different are nevertheless studying the same thing.

For example, in a classic series of experiments, Kahneman & Tversky (e.g., 1979) asked a number of human subjects to make a single choice of the following sort: between $400 for sure and a 50% chance of $1000. Most went for the sure thing, even though the expected value of the gamble is higher ($500 vs. $400). This is termed risk aversion, and the same term has been applied to free-operant “choice” experiments. In one such experiment an animal subject must choose repeatedly between a response leading to a fixed amount of food and one leading equiprobably to either a large or a small amount with the same average value. Here the animals tend to be either indifferent or risk averse, preferring the fixed alternative (Staddon & Innis 1966b, Bateson & Kacelnik 1995, Kacelnik & Bateson 1996).

In a second example, pigeons respond repeatedly on two keys associated with equal variable-interval schedules. A successful response on the left key, for example, is reinforced by a change in the color of the pecked key (the other key light goes off). In the presence of this second stimulus, food is delivered according to a fixed-interval schedule (fixed-interval X). The first stimulus, which is usually the same on both keys, is termed the initial link; the second stimulus is the terminal link. Pecks on the right key lead in the same way to food reinforcement on variable-interval X. (This is termed a concurrent-chain schedule.) In this case subjects overwhelmingly prefer the initial-link choice leading to the variable-interval terminal link; that is, they are apparently risk seeking rather than risk averse (Killeen 1968).

The fact that these three experiments (Kahneman & Tversky and the two free-operant studies) all produce different results is sometimes thought to pose a serious research problem, but, we contend, the problem is only in the use of the term choice for all three. The procedures (not to mention the subjects) are in fact very different, and in operant conditioning the devil is very much in the details. Apparently trivial procedural differences can sometimes lead to wildly different behavioral outcomes. Use of the term choice as if it denoted a unitary subject matter is therefore highly misleading. We also question the idea that the results of choice experiments are always best explained in terms of response strength and stimulus value.

Concurrent Schedules

Bearing these caveats in mind, let's look briefly at the extensive history of free-operant choice research. In Herrnstein's seminal experiment (1961; see Davison & McCarthy 1988, Williams 1988 for reviews; for collected papers see Rachlin & Laibson 1997) hungry pigeons pecked at two side-by-side response keys, one associated with variable-interval v1 s and the other with variable-interval v2 s (concurrent variable-interval–variable-interval schedule). After several experimental sessions and a range of v1 and v2 values chosen so that the overall programmed reinforcement rate was constant (1/v1 + 1/v2 = constant), the result was matching between steady-state relative response rates and relative obtained reinforcement rates:

x/(x + y) = R(x)/[R(x) + R(y)],   (Equation 5)

where x and y are the response rates on the two alternatives and R(x) and R(y) are the rates of obtained reinforcement for them. This relation has become known as Herrnstein's matching law. Although the obtained reinforcement rates are dependent on the response rates that produce them, the matching relation is not forced, because x and y can vary over quite a wide range without much effect on R(x) and R(y).

Because of the negative feedback relation intrinsic to variable-interval schedules (the less you respond, the higher the probability of payoff), the matching law on concurrent variable-interval–variable-interval is consistent with reinforcement maximization ( Staddon & Motheral 1978 ), although the maximum of the function relating overall payoff, R ( x ) + R ( y ), to relative responding, x /( x + y ), is pretty flat. However, little else on these schedules fits the maximization idea. As noted above, even responding on simple fixed- T response-initiated delay (RID) schedules violates maximization. Matching is also highly overdetermined, in the sense that almost any learning rule consistent with the law of effect—an increase in reinforcement probability causes an increase in response probability—will yield either simple matching ( Equation 5 ) or its power-law generalization ( Baum 1974 , Hinson & Staddon 1983 , Lander & Irwin 1968 , Staddon 1968 ). Matching by itself therefore reveals relatively little about the dynamic processes operating in the responding subject (but see Davison & Baum 2000 ). Despite this limitation, the strikingly regular functional relations characteristic of free-operant choice studies have attracted a great deal of experimental and theoretical attention.

Herrnstein (1970) proposed that Equation 5 can be derived from the function relating steady-state response rate, x, and reinforcement rate, R(x), to each response key considered separately. This function is negatively accelerated and well approximated by a hyperbola:

x = kR(x)/[R(x) + R0],   (Equation 6)

where k is a constant and R0 represents the effects of all other reinforcers in the situation. The denominator and parameter k cancel in the ratio x/y, yielding Equation 5 for the choice situation.
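The cancellation can be verified numerically. In the usual way the argument is set up, the "other reinforcement" term for each key in the choice situation includes the reinforcement obtained on the alternative key, so both responses share the same denominator. The constants in the sketch below are arbitrary illustrations, not fitted values.

```python
# Numeric check that Herrnstein's hyperbola yields matching in the choice case.
# In the two-key situation the denominator for each key includes reinforcement
# from the alternative key, so the denominators (and k) cancel in the ratio.
k, R0 = 100.0, 10.0            # illustrative constants (responses/hr, rf/hr)

def rates(Rx, Ry):
    denom = Rx + Ry + R0       # common denominator for both keys
    return k * Rx / denom, k * Ry / denom

for Rx, Ry in [(40, 20), (54, 6), (30, 30)]:
    x, y = rates(Rx, Ry)
    print(f"R(x)/(R(x)+R(y)) = {Rx / (Rx + Ry):.2f}   x/(x+y) = {x / (x + y):.2f}")
# Relative response rate equals relative obtained reinforcement rate:
# matching (Equation 5), whatever the values of k and R0.
```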

There are numerous empirical details that are not accounted for by this formulation: systematic deviations from matching [undermatching and overmatching (Baum 1974)] as a function of different types of variable-interval schedules, dependence of simple matching on use of a changeover delay, extensions to concurrent-chain schedules, and so on. For example, if animals are pretrained with two alternatives presented separately, so that they do not learn to switch between them, when given the opportunity to respond to both, they fixate on the richer one rather than matching [extreme overmatching (Donahoe & Palmer 1994, pp. 112–113; Gallistel & Gibbon 2000, pp. 321–322)]. (Fixation—extreme overmatching—is, trivially, matching, of course, but if only fixation were observed, the idea of matching would never have arisen. Matching implies partial, not exclusive, preference.) Conversely, in the absence of a changeover delay, pigeons will often just alternate between two unequal variable-interval choices [extreme undermatching (Shull & Pliskoff 1967)]. In short, matching requires exactly the right amount of switching. Nevertheless, Herrnstein's idea of deriving behavior in choice experiments from the laws that govern responding to the choice alternatives in isolation is clearly worth pursuing.

In any event, Herrnstein's approach—molar data, predominantly variable-interval schedules, rate measures—set the basic pattern for subsequent operant choice research. It fits the basic presuppositions of the field: that choice is about response strength , that response strength is equivalent to response probability, and that response rate is a valid proxy for probability (e.g., Skinner 1938 , 1966 , 1986 ; Killeen & Hall 2001 ). (For typical studies in this tradition see, e.g., Fantino 1981 ; Grace 1994 ; Herrnstein 1961 , 1964 , 1970 ; Rachlin et al. 1976 ; see also Shimp 1969 , 2001 .)

We can also look at concurrent schedules in terms of linear waiting. Although published evidence is skimpy, recent unpublished data (Cerutti & Staddon 2002) show that even on variable-interval schedules (which necessarily always contain a few very short interfood intervals), postfood wait time and changeover time covary with mean interfood time. It has also long been known that Equation 6 can be derived from two time-based assumptions: that the number of responses emitted is proportional to the number of reinforcers received multiplied by the available time and that available time is limited by the time taken up by each response (Staddon 1977, Equations 23–25). Moreover, if we define mean interresponse time as the reciprocal of mean response rate, x (see footnote 6), and mean interfood interval as the reciprocal of obtained reinforcement rate, R(x), then linear waiting yields

1/x = a[1/R(x)] + b,

where a and b are linear waiting constants. Rearranging yields

x = (1/b)R(x)/[R(x) + a/b],

where 1/b = k and a/b = R0 in Equation 6. Both these derivations of the hyperbola in Equation 6 from a linear relation in the time domain imply a correlation between parameters k and R0 in Equation 6 under parametric experimental variation of parameter b by (for example) varying response effort or, possibly, hunger motivation. Such covariation has been occasionally but not universally reported (Dallery et al. 2000, Heyman & Monaghan 1987, McDowell & Dallery 1999).
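The rearrangement is easy to check numerically. The sketch below uses arbitrary illustrative values of a and b and shows that the time-domain linear relation and the hyperbola with k = 1/b and R0 = a/b give identical response rates.

```python
# Check that the linear-waiting relation in the time domain, 1/x = a/R(x) + b,
# rearranges into Herrnstein's hyperbola with k = 1/b and R0 = a/b.
a, b = 2.0, 0.01               # illustrative linear-waiting constants

def rate_from_linear_waiting(R):
    return 1.0 / (a / R + b)   # invert the mean interresponse time

def rate_from_hyperbola(R):
    k, R0 = 1.0 / b, a / b
    return k * R / (R + R0)

for R in (10, 50, 200, 1000):  # obtained reinforcement rates
    print(R, round(rate_from_linear_waiting(R), 2), round(rate_from_hyperbola(R), 2))
# The two columns agree exactly: the hyperbola is the linear relation rewritten.
```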

Concurrent-Chain Schedules

Organisms can be trained to choose between sources of primary reinforcement (concurrent schedules) or between stimuli that signal the occurrence of primary reinforcement ( conditioned reinforcement : concurrent chain schedules). Many experimental and theoretical papers on conditioned reinforcement in pigeons and rats have been published since the early 1960s using some version of the concurrent chains procedure of Autor (1960 , 1969) . These studies have demonstrated a number of functional relations between rate measures and have led to several closely related theoretical proposals such as a version of the matching law, incentive theory, delay-reduction theory, and hyperbolic value-addition (e.g., Fantino 1969a , b ; Grace 1994 ; Herrnstein 1964 ; Killeen 1982 ; Killeen & Fantino 1990 ; Mazur 1997 , 2001 ; Williams 1988 , 1994 , 1997 ). Nevertheless, there is as yet no theoretical consensus on how best to describe choice between sources of conditioned reinforcement, and no one has proposed an integrated theoretical account of simple chain and concurrent chain schedules.

Molar response rate does not capture the essential feature of behavior on fixed-interval schedules: the systematic pattern of rate-change in each interfood interval, the “scallop.” Hence, the emphasis on molar response rate as a dependent variable has meant that work on concurrent schedules has emphasized variable or random intervals over fixed intervals. We lack any theoretical account of concurrent fixed-interval–fixed-interval and fixed-interval–variable-interval schedules. However, a recent study by Shull et al. (2001 ; see also Shull 1979) suggests that response rate may not capture what is going on even on simple variable-interval schedules, where the time to initiate bouts of relatively fixed-rate responding seems to be a more sensitive dependent measure than overall response rate. More attention to the role of temporal variables in choice is called for.

We conclude with a brief account of how linear waiting may be involved in several well-established phenomena of concurrent-chain schedules: preference for variable-interval versus fixed-interval terminal links, effect of initial-link duration, and finally, so-called self-control experiments.

Preference for Variable-Interval Versus Fixed-Interval Terminal Links

On concurrent-chain schedules with equal variable-interval initial links, animals show a strong preference for the initial link leading to a variable-interval terminal link over the terminal-link alternative with an equal arithmetic-mean fixed interval. This result is usually interpreted as a manifestation of nonarithmetic (e.g., harmonic) reinforcement-rate averaging (Killeen 1968), but it can also be interpreted as linear waiting. Minimum TTR is necessarily much less on the variable-interval than on the fixed-interval side, because some variable intervals are short. If wait time is determined by minimum TTR—hence shorter wait times on the variable-interval side—and ratios of wait times and overall response rates are (inversely) correlated (Cerutti & Staddon 2002), the result will be an apparent bias in favor of the variable-interval choice.

Effect of Initial-Link Duration

Preference for a given pair of terminal-link schedules depends on initial-link duration. For example, pigeons may approximately match initial-link relative response rates to terminal-link relative reinforcement rates when the initial links are 60 s and the terminal links range from 15 to 45 s (Herrnstein 1964), but they will undermatch when the initial-link schedule is increased to, for example, 180 s. This effect is what led to Fantino's delay-reduction modification of Herrnstein's matching law (see Fantino et al. 1993 for a review). However, the same qualitative prediction follows from linear waiting: Increasing initial-link duration reduces the proportional TTR difference between the two choices. Hence the ratio of WTs or of initial-link response rates for the two choices should also approach unity, which is undermatching. Several other well-studied theories of concurrent choice, such as delay reduction and hyperbolic value addition, also explain these results.

Self-Control

The prototypical self-control experiment has a subject choosing between two outcomes: not-so-good cookie now or a good cookie after some delay (Rachlin & Green 1972; see Logue 1988 for a review; Mischel et al. 1989 reviewed human studies). Typically, the subject chooses the immediate, small reward, but if both delays are increased by the same amount, D, the subject will learn to choose the larger reward, provided D is long enough. Why? The standard answer is derived from Herrnstein's matching analysis (Herrnstein 1981) and is called hyperbolic discounting (see Mazur 2001 for a review and Ainslie 1992 and Rachlin 2000 for longer accounts). The idea is that the expected value of each reward is inversely related to the time at which it is expected according to a hyperbolic function:

Vi = Ai/(1 + kDi),   (Equation 8)

where Ai is the undiscounted value of the reward, Di is the delay until reward is received, i denotes the large or small reward, and k is a fitted constant.

Now suppose we set DL and DS to values such that the animal shows a preference for the shorter, sooner reward. This would be the case (k = 1) if AL = 6, AS = 2, DL = 6 s, and DS = 1 s: VL = 0.86 and VS = 1—preference for the small, less-delayed reward. If 10 s is added to both delays, so that DL = 16 s and DS = 11 s, the values are VL = 0.35 and VS = 0.17—preference for the larger reward. Thus, Equation 8 predicts that added delay—sometimes awkwardly termed pre-commitment—should enhance self-control, which it does.
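The reversal can be reproduced directly from Equation 8 with the values just given (this is only a check of the worked example, using k = 1):

```python
# Check of the hyperbolic-discounting reversal with the values used in the text:
# V = A / (1 + k*D), with k = 1, A_L = 6, A_S = 2.
def V(A, D, k=1.0):
    return A / (1.0 + k * D)

for added in (0, 10):                       # add the same delay to both options
    VL, VS = V(6, 6 + added), V(2, 1 + added)
    choice = "large-later" if VL > VS else "small-sooner"
    print(f"+{added:>2}s:  V_L={VL:.2f}  V_S={VS:.2f}  ->  {choice}")
# +0 s gives 0.86 vs 1.00 (small-sooner); +10 s gives 0.35 vs 0.17 (large-later).
```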

The most dramatic prediction from this analysis was made and confirmed by Mazur (1987, 2001) in an experiment that used an adjusting-delay procedure (also termed titration). “A response on the center key started each trial, and then a pigeon chose either a standard alternative (by pecking the red key) or an adjusting alternative (by pecking the green key) … the standard alternative delivered 2 s of access to grain after a 10-s delay, and the adjusting alternative delivered 6 s of access to grain after an adjusting delay” (2001, p. 97). The adjusting delay increased (on the next trial) when it was chosen and decreased when the standard alternative was chosen. (See Mazur 2001 for other procedural details.) The relevant independent variable is TTR. The discounted value of each choice is given by Equation 8. When the subject is indifferent (i.e., does not discriminate between the two choices), VL = VS. Equating Equation 8 for the large and small choices yields

DL = (AL/AS)DS + (AL − AS)/(kAS),   (Equation 9)

that is, an indifference curve that is a linear function relating DL and DS, with slope AL/AS > 1 and a positive intercept. The data (Mazur 1987; 2001, Figure 2) are consistent with this prediction, but the intercept is small.

It is also possible to look at this situation in terms of linear waiting. One assumption is necessary: that the waiting fraction, a, in Equation 1 is smaller when the upcoming reinforcer is large than when it is small (Powell 1969 and Perone & Courtney 1992 showed this for fixed-ratio schedules; Howerton & Meltzer 1983, for fixed-interval). Given this assumption, the linear waiting analysis is even simpler than hyperbolic discounting. The idea is that the subject will appear to be indifferent when the wait times to the two alternatives are equal. According to linear waiting, the wait time for the small alternative is given by

tS = aS·DS + bS,

where bS is a small positive intercept and aS > aL. Equating the wait times for small and large alternatives (tL = aL·DL + bL) yields

DL = (aS/aL)DS + (bS − bL)/aL,   (Equation 11)

which is also a linear function with slope > 1 and a small positive intercept.
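Because both indifference functions are straight lines with slope greater than 1 and a positive intercept, a short calculation with illustrative parameter values makes the equivalence concrete (the specific numbers below are assumptions chosen for demonstration, not fitted values):

```python
# Both accounts predict a linear indifference function D_L = slope*D_S + intercept.
# Hyperbolic discounting (Equation 9): slope = A_L/A_S, intercept = (A_L - A_S)/(k*A_S).
# Linear waiting (Equation 11):        slope = a_S/a_L, intercept = (b_S - b_L)/a_L.
# Parameter values below are illustrative only.
A_L, A_S, k = 6.0, 2.0, 1.0
a_L, a_S, b_L, b_S = 0.20, 0.50, 0.5, 1.0

disc_slope, disc_int = A_L / A_S, (A_L - A_S) / (k * A_S)
lw_slope, lw_int = a_S / a_L, (b_S - b_L) / a_L

for name, slope, intercept in [("discounting", disc_slope, disc_int),
                               ("linear waiting", lw_slope, lw_int)]:
    print(f"{name:>15}: D_L = {slope:.2f} * D_S + {intercept:.2f}")
# Both are straight lines with slope > 1 and a positive intercept, which is why
# titration data alone cannot distinguish the two models.
```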

Equations 9 and 11 are identical in form. Thus, the linear waiting and hyperbolic discounting models are almost indistinguishable in terms of these data. However, the linear waiting approach has three potential advantages: Parameters a and b can be independently measured by making appropriate measurements in a control study that retains the reinforcement-delay properties of the self-control experiments without the choice contingency; the linear waiting approach lacks the fitted parameter k in Equation 9 ; and linear waiting also applies to a wide range of time-production experiments not covered by the hyperbolic discounting approach.

Temporal control may be involved in unsuspected ways in a wide variety of operant conditioning procedures. A renewed emphasis on the causal factors operating in reinforcement schedules may help to unify research that has hitherto been defined in terms of more abstract topics like timing and choice.

ACKNOWLEDGMENTS

We thank Catalin Buhusi and Jim Mazur for comments on an earlier version and the NIMH for research support over many years.

1 The first and only previous Annual Review contribution on this topic was as part of a 1965 article, “Learning, Operant Conditioning and Verbal Learning” by Blough & Millward. Since then there have been (by our estimate) seven articles on learning or learning theory in animals, six on the neurobiology of learning, and three on human learning and memory, but this is the first full Annual Review article on operant conditioning. We therefore include rather more old citations than is customary (for more on the history and philosophy of Skinnerian behaviorism, both pro and con, see Baum 1994 , Rachlin 1991 , Sidman 1960 , Staddon 2001b , and Zuriff 1985 ).

2 By “internal” we mean not “physiological” but “hidden.” The idea is simply that the organism's future behavior depends on variables not all of which are revealed in its current behavior (cf. Staddon 2001b , Ch. 7).

3 When there is no response-produced stimulus change, this procedure is also called a conjunctive fixed-ratio fixed-time schedule ( Shull 1970 ).

4 This idea surfaced very early in the history of research on equal-link chain fixed-interval schedules, but because of the presumed importance of conditioned reinforcement, it was the time to reinforcement from link stimulus offset, rather than onset, that was thought to be important. Thus, Gollub (1977), echoing his 1958 Ph.D. dissertation in the subsequent Kelleher & Gollub (1962) review, wrote, “In chained schedules with more than two components … the extent to which responding is sustained in the initial components … depends on the time that elapses from the end of the components to food reinforcement” (p. 291).

5 Interpreted as time to the first reinforcement opportunity.

6 It is not, of course: The reciprocal of the mean IRT is the harmonic mean rate. In practice, “mean response rate” usually means the arithmetic mean, but note that the harmonic mean rate usually works better for choice data than the arithmetic mean (cf. Killeen 1968).

LITERATURE CITED

  • Ainslie G. Picoeconomics: The Strategic Interaction of Successive Motivational States Within the Person. Harvard Univ. Press; Cambridge, MA: 1992. [ Google Scholar ]
  • Autor SM. The strength of conditioned reinforcers as a function of frequency and probability of reinforcement. Harvard Univ.; Cambridge, MA: 1960. PhD thesis. [ Google Scholar ]
  • Autor SM. The strength of conditioned reinforcers and a function of frequency and probability of reinforcement. In: Hendry DP, editor. Conditioned Reinforcement. Dorsey; Homewood, IL: 1969. pp. 127–62. [ Google Scholar ]
  • Bateson M, Kacelnik A. Preferences for fixed and variable food sources: variability in amount and delay. J. Exp. Anal. Behav. 1995; 63 :313–29. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Baum WM. On two types of deviation from the matching law: bias and undermatching. J. Exp. Anal. Behav. 1974; 22 :231–42. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Baum WM. Understanding Behaviorism: Science, Behavior and Culture. HarperCollins; New York: 1994. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Blough DS, Millward RB. Learning: operant conditioning and verbal learning. Annu. Rev. Psychol. 1965; 17 :63–94. [ PubMed ] [ Google Scholar ]
  • Buhusi CV, Meck WH. Timing for the absence of the stimulus: the gap paradigm reversed. J. Exp. Psychol.: Anim. Behav. Process. 2000; 26 :305–22. [ PubMed ] [ Google Scholar ]
  • Cabeza de Vaca S, Brown BL, Hemmes NS. Internal clock and memory processes in animal timing. J. Exp. Psychol.: Anim. Behav. Process. 1994; 20 :184–98. [ PubMed ] [ Google Scholar ]
  • Catania AC. Reinforcement schedules and psychophysical judgments: a study of some temporal properties of behavior. In: Schoenfeld WN, editor. The Theory of Reinforcement Schedules. Appleton-Century-Crofts; New York: 1970. pp. 1–42. [ Google Scholar ]
  • Catania AC. Learning. Prentice-Hall; Englewood Cliffs, NJ: 1979. [ Google Scholar ]
  • Catania AC, Yohalem R, Silverman PJ. Contingency and stimulus change in chained schedules of reinforcement. J. Exp. Anal. Behav. 1980; 5 :167–73. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Cerutti DT, Staddon JER. The temporal dynamics of choice: concurrent and concurrent-chain interval schedules. 2002 [ Google Scholar ]
  • Cheng K, Westwood R. Analysis of single trials in pigeons' timing performance. J. Exp. Psychol.: Anim. Behav. Process. 1993; 19 :56–67. [ Google Scholar ]
  • Church RM. The internal clock. In: Hulse SH, Fowler H, Honig WK, editors. Cognitive Processes in Animal Behavior. Erlbaum; Hillsdale, NJ: 1978. pp. 277–310. [ Google Scholar ]
  • Church RM, Broadbent HA. Alternative representations of time, number and rate. Cognition. 1990; 37 :55–81. [ PubMed ] [ Google Scholar ]
  • Church RM, Deluty MZ. Bisection of temporal intervals. J. Exp. Psychol.: Anim. Behav. Process. 1977; 3 :216–28. [ PubMed ] [ Google Scholar ]
  • Church RM, Miller KD, Meck WH. Symmetrical and asymmetrical sources of variance in temporal generalization. Anim. Learn. Behav. 1991; 19 :135–55. [ Google Scholar ]
  • Crossman EK, Heaps RS, Nunes DL, Alferink LA. The effects of number of responses on pause length with temporal variables controlled. J. Exp. Anal. Behav. 1974; 22 :115–20. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Dallery J, McDowell JJ, Lancaster JS. Falsification of matching theory's account of single-alternative responding: Herrnstein's K varies with sucrose concentration. J. Exp. Anal. Behav. 2000; 73 :23–43. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Davison M. A functional analysis of chained fixed-interval schedule performance. J. Exp. Anal. Behav. 1974; 21 :323–30. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Davison M, Baum W. Choice in a variable environment: Every reinforcer counts. J. Exp. Anal. Behav. 2000; 74 :1–24. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Davison M, McCarthy D. The Matching Law: A Research Review. Erlbaum; Hillsdale, NJ: 1988. [ Google Scholar ]
  • Donahoe JW, Palmer DC. Learning and Complex Behavior. Allyn & Bacon; Boston: 1994. [ Google Scholar ]
  • Dragoi V, Staddon JER, Palmer RG, Buhusi VC. Interval timing as an emergent learning property. Psychol. Rev. 2002 In press. [ PubMed ] [ Google Scholar ]
  • Dreyfus LR, Fetterman JG, Smith LD, Stubbs DA. Discrimination of temporal relations by pigeons. J. Exp. Psychol.: Anim. Behav. Process. 1988; 14 :349–67. [ PubMed ] [ Google Scholar ]
  • Fantino E. Choice and rate of reinforcement. J. Exp. Anal. Behav. 1969a; 12 :723–30. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Fantino E. Conditioned reinforcement, choice, and the psychological distance to reward. In: Hendry DP, editor. Conditioned Reinforcement. Dorsey; Homewood, IL: 1969b. pp. 163–91. [ Google Scholar ]
  • Fantino E. Contiguity, response strength, and the delay-reduction hypothesis. In: Harzem P, Zeiler M, editors. Advances in Analysis of Behavior: Predictability, Correlation, and Contiguity. Vol. 2. Wiley; Chichester, UK: 1981. pp. 169–201. [ Google Scholar ]
  • Fantino E, Preston RA, Dunn R. Delay reduction: current status. J. Exp. Anal. Behav. 1993; 60 :159–69. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ferster CB, Skinner BF. Schedules of Reinforcement. Appleton-Century-Crofts; New York: 1957. [ Google Scholar ]
  • Fetterman JG, Dreyfus LR, Stubbs DA. Discrimination of duration ratios. J. Exp. Psychol.: Anim. Behav. Process. 1989; 15 :253–63. [ PubMed ] [ Google Scholar ]
  • Gallistel CR. The Organization of Learning. MIT/Bradford; Cambridge, MA: 1990. [ Google Scholar ]
  • Gallistel CR, Gibbon J. Time, rate, and conditioning. Psychol. Rev. 2000; 107 :289–344. [ PubMed ] [ Google Scholar ]
  • Gibbon J. Scalar expectancy theory and Weber's law in animal timing. Psychol. Rev. 1977; 84 :279–325. [ Google Scholar ]
  • Gibbon J, Church RM. Sources of variance in an information processing theory of timing. In: Roitblat HL, Bever TG, Terrace HS, editors. Animal Cognition. Erlbaum; Hillsdale, NJ: 1984. [ Google Scholar ]
  • Gollub LR. The chaining of fixed-interval schedules. 1958 [ Google Scholar ]
  • Gollub L. Conditioned reinforcement: schedule effects. 1977:288–312. See Honig & Staddon 1977. [ Google Scholar ]
  • Grace RC. A contextual choice model of concurrent-chains choice. J. Exp. Anal. Behav. 1994; 61 :113–29. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Grace RC, Nevin JA. Response strength and temporal control in fixed-interval schedules. Anim. Learn. Behav. 2000; 28 :313–31. [ Google Scholar ]
  • Grossberg S, Schmajuk NA. Neural dyamics of adaptive timing and temporal discrimination during associative learning. Neural. Netw. 1989; 2 :79–102. [ Google Scholar ]
  • Hanson SJ, Killeen PR. Measurement and modeling of behavior under fixed-interval schedules of reinforcement. J. Exp. Psychol.: Anim. Behav. Process. 1981; 7 :129–39. [ Google Scholar ]
  • Herrnstein RJ. Relative and absolute strength of response as a function of frequency of reinforcement. J. Exp. Anal. Behav. 1961; 4 :267–72. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Herrnstein RJ. Secondary reinforcement and rate of primary reinforcement. J. Exp. Anal. Behav. 1964; 7 :27–36. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Herrnstein RJ. On the law of effect. J. Exp. Anal. Behav. 1970; 13 :243–66. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Herrnstein RJ. Self control as response strength. In: Bradshaw CM, Lowe CP, Szabadi F, editors. Recent Developments in the Quantification of Steady-State Operant Behavior. Elsevier/North-Holland; Amsterdam: 1981. pp. 3–20. [ Google Scholar ]
  • Heyman GM, Monaghan MM. Effects of changes in response requirements and deprivation on the parameters of the matching law equation: new data and review. J. Exp. Psychol.: Anim. Behav. Process. 1987; 13 :384–94. [ Google Scholar ]
  • Higa JJ, Wynne CDL, Staddon JER. Dynamics of time discrimination. J. Exp. Psychol.: Anim. Behav. Process. 1991; 17 :281–91. [ PubMed ] [ Google Scholar ]
  • Hinson JM, Staddon JER. Matching, maximizing and hill climbing. J. Exp. Anal. Behav. 1983; 40 :321–31. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Honig WK, Staddon JER, editors. Handbook of Operant Behavior. Prentice-Hall; Englewood Cliffs, NJ: 1977. [ Google Scholar ]
  • Hopson JW. Gap timing and the spectral timing model. Behav. Process. 1999; 45 :23–31. [ PubMed ] [ Google Scholar ]
  • Hopson JW. Timing without a clock: learning models as interval timing models. Duke Univ.; Durham, NC: 2002. PhD thesis. [ Google Scholar ]
  • Howerton L, Meltzer D. Pigeons' FI behavior following signaled reinforcement duration. Bull. Psychon. Soc. 1983; 21 :161–63. [ Google Scholar ]
  • Innis NK, Mitchell S, Staddon JER. Temporal control on interval schedules: What determines the postreinforcement pause? J. Exp. Anal. Behav. 1993; 60 :293–311. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kacelnik A, Bateson M. Risky theories—the effects of variance on foraging decisions. Am. Zool. 1996; 36 :402–34. [ Google Scholar ]
  • Kahneman D, Tversky A. Prospect theory: an analysis of decision under risk. Econometrika. 1979; 47 :263–91. [ Google Scholar ]
  • Kelleher RT, Fry WT. Stimulus functions in chained and fixed-interval schedules. J. Exp. Anal. Behav. 1962; 5 :167–73. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kelleher RT, Gollub LR. A review of positive conditioned reinforcement. J. Exp. Anal. Behav. 1962; 5 :541–97. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kello JE. The reinforcement-omission effect on fixed-interval schedules: frustration or inhibition? Learn. Motiv. 1972; 3 :138–47. [ Google Scholar ]
  • Killeen PR. On the measurement of reinforcement frequency in the study of preference. J. Exp. Anal. Behav. 1968; 11 :263–69. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Killeen PR. Incentive theory: II. Models for choice. J. Exp. Anal. Behav. 1982; 38 :217–32. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Killeen PR, Fantino E. Unification of models for choice between delayed reinforcers. J. Exp. Anal. Behav. 1990; 53 :189–200. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Killeen PR, Fetterman JG. A behavioral theory of timing. Psychol. Rev. 1988; 95 :274–95. [ PubMed ] [ Google Scholar ]
  • Killeen PR, Hall SS. The principal components of response strength. J. Exp. Anal. Behav. 2001; 75 :111–34. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lander DG, Irwin RJ. Multiple schedules: effects of the distribution of reinforcements between components on the distribution of responses between components. J. Exp. Anal. Behav. 1968; 11 :517–24. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Lejeune H, Ferrara A, Simons F, Wearden JH. Adjusting to changes in the time of reinforcement: peak-interval transitions in rats. J. Exp. Psychol.: Anim. Behav. Process. 1997; 23 :211–321. [ PubMed ] [ Google Scholar ]
  • Logue AW. Research on self-control: an integrating framework. Behav. Brain Sci. 1988; 11 :665–709. [ Google Scholar ]
  • Lowe CF, Harzem P, Spencer PT. Temporal control of behavior and the power law. J. Exp. Anal. Behav. 1979; 31 :333–43. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • MacEwen D, Killeen P. The effects of rate and amount on the speed of the pacemaker in pigeons' timing behavior. Anim. Learn. Behav. 1991; 19 :164–70. [ Google Scholar ]
  • Machado A. Learning the temporal dynamics of behavior. Psychol. Rev. 1997; 104 :241–65. [ PubMed ] [ Google Scholar ]
  • Matell MS, Meck WH. Reinforcement-induced within-trial resetting of an internal clock. Behav. Process. 1999; 45 :159–71. [ PubMed ] [ Google Scholar ]
  • Mazur JE. An adjusting procedure for studying delayed reinforcement. In: Commons ML, Mazur JE, Nevin JA, Rachlin H, editors. Quantitative Analyses of Behavior. The Effects of Delay and Intervening Events on Reinforcement Value. Vol. 5. Erlbaum; Mahwah, NJ: 1987. pp. 55–73. [ Google Scholar ]
  • Mazur JE. Choice, delay, probability, and conditioned reinforcement. Anim. Learn. Behav. 1997; 25 :131–47. [ Google Scholar ]
  • Mazur JE. Hyperbolic value addition and general models of animal choice. Psychol. Rev. 2001; 108 :96–112. [ PubMed ] [ Google Scholar ]
  • McDowell JJ, Dallery J. Falsification of matching theory: changes in the asymptote of Herrnstein's hyperbola as a function of water deprivation. J. Exp. Anal. Behav. 1999; 72 :251–68. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Meck WH. Selective adjustment of the speed of an internal clock and memory processes. J. Exp. Psychol.: Anim. Behav. Process. 1983; 9 :171–201. [ PubMed ] [ Google Scholar ]
  • Meck WH, Komeily-Zadeh FN, Church RM. Two-step acquisition: modification of an internal clock's criterion. J. Exp. Psychol.: Anim. Behav. Process. 1984; 10 :297–306. [ PubMed ] [ Google Scholar ]
  • Mischel W, Shoda Y, Rodriguez M. Delay of gratification for children. Science. 1989; 244 :933–38. [ PubMed ] [ Google Scholar ]
  • Nevin JA. Response strength in multiple schedules. J. Exp. Anal. Behav. 1974; 21 :389–408. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Perone M, Courtney K. Fixed-ratio pausing: joint effects of past reinforcer magnitude and stimuli correlated with upcoming magnitude. J. Exp. Anal. Behav. 1992; 57 :33–46. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Plowright CMS, Church D, Behnke P, Silverman A. Time estimation by pigeons on a fixed interval: the effect of pre-feeding. Behav. Process. 2000; 52 :43–48. [ PubMed ] [ Google Scholar ]
  • Powell RW. The effect of small sequential changes in fixed-ratio size upon the post-reinforcement pause. J. Exp. Anal. Behav. 1968; 11 :589–93. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Powell RW. The effect of reinforcement magnitude upon responding under fixed-ratio schedules. J. Exp. Anal. Behav. 1969; 12 :605–8. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rachlin H. Introduction to Modern Behaviorism. Freeman; New York: 1991. [ Google Scholar ]
  • Rachlin H. The Science of Self-Control. Harvard Univ. Press; Cambridge, MA: 2000. [ Google Scholar ]
  • Rachlin H, Green L. Commitment, choice and self-control. J. Exp. Anal. Behav. 1972; 17 :15–22. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Rachlin H, Green L, Kagel JH, Battalio RC. Economic demand theory and psychological studies of choice. In: Bower GH, editor. The Psychology of Learning and Motivation. Vol. 10. Academic; New York: 1976. pp. 129–54. [ Google Scholar ]
  • Rachlin H, Laibson DI, editors. The Matching Law: Papers in Psychology and Economics. Harvard Univ. Press; Cambridge, MA: 1997. [ Google Scholar ]
  • Roberts S. Isolation of an internal clock. J. Exp. Psychol.: Anim. Behav. Process. 1981; 7 :242–68. [ PubMed ] [ Google Scholar ]
  • Roberts S. Properties and function of an internal clock. In: Melgren R, editor. Animal Cognition and Behavior. North-Holland; Amsterdam: 1983. pp. 345–97. [ Google Scholar ]
  • Royalty P, Williams B, Fantino E. Effects of delayed reinforcement in chain schedules. J. Exp. Anal. Behav. 1987; 47 :41–56. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Schneider BA. A two-state analysis of fixed-interval responding in pigeons. J. Exp. Anal. Behav. 1969; 12 :677–87. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Shimp CP. The concurrent reinforcement of two interresponse times: the relative frequency of an interresponse time equals its relative harmonic length. J. Exp. Anal. Behav. 1969; 1 :403–11. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Shimp CP. Behavior as a social construction. Behav. Process. 2001; 54 :11–32. [ PubMed ] [ Google Scholar ]
  • Shull RL. The response-reinforcement dependency in fixed-interval schedules of reinforcement. J. Exp. Anal. Behav. 1970; 14 :55–60. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Shull RL, Harzem P. The postreinforcement pause: some implications for the correlational law of effect. In: Zeiler MD, editor. Reinforcement and the Organization of Behavior. Academic; New York: 1979. pp. 193–221. [ Google Scholar ]
  • Shull RL, Gaynor ST, Grimes JA. Response rate viewed as engagement bouts: effects of relative reinforcement and schedule type. J. Exp. Anal. Behav. 2001; 75 :247–74. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Shull RL, Pliskoff SS. Changeover delay and concurrent schedules: some effects on relative performance measures. J. Exp. Anal. Behav. 1967; 10 :517–27. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Sidman M. Tactics of Scientific Research: Evaluating Experimental Data in Psychology. Basic Books; New York: 1960. [ Google Scholar ]
  • Skinner BF. Two types of conditioned reflex: a reply to Konorski and Miller. J. Gen. Psychol. 1937; 16 :272–79. [ Google Scholar ]
  • Skinner BF. The Behavior of Organisms. Appleton-Century; New York: 1938. [ Google Scholar ]
  • Skinner BF. Operant behavior. In: Honig WK, editor. Operant Behavior: Areas of Research and Application. Appleton-Century-Crofts; New York: 1966. pp. 12–32. [ Google Scholar ]
  • Skinner BF. Some thoughts about the future. J. Exp. Anal. Behav. 1986; 45 :229–35. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER. Some properties of spaced responding in pigeons. J. Exp. Anal. Behav. 1965; 8 :19–27. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER. Spaced responding and choice: a preliminary analysis. J. Exp. Anal. Behav. 1968; 11 :669–82. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER. Temporal effects of reinforcement: a negative “frustration” effect. Learn. Motiv. 1970; 1 :227–47. [ Google Scholar ]
  • Staddon JER. Reinforcement omission on temporal go-no-go schedules. J. Exp. Anal. Behav. 1972; 18 :223–29. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER. Temporal control, attention and memory. Psychol. Rev. 1974; 81 :375–91. [ Google Scholar ]
  • Staddon JER. On Herrnstein's equation and related forms. J. Exp. Anal. Behav. 1977; 28 :163–70. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER. Behavioral competition, contrast, and matching. In: Commons ML, Herrnstein RJ, Rachlin H, editors. Quantitative Analyses of Behavior. Quantitative Analyses of Operant Behavior: Matching and Maximizing Accounts. Vol. 2. Ballinger; Cambridge, MA: 1982. pp. 243–61. [ Google Scholar ]
  • Staddon JER. Adaptive Dynamics: The Theoretical Analysis of Behavior. MIT/Bradford; Cambridge, MA: 2001a. p. 423. [ Google Scholar ]
  • Staddon JER. The New Behaviorism: Mind, Mechanism and Society. Psychol. Press; Philadelphia: 2001b. p. 211. [ Google Scholar ]
  • Staddon JER, Chelaru IM, Higa JJ. A tuned-trace theory of interval-timing dynamics. J. Exp. Anal. Behav. 2002; 77 :105–24. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER, Higa JJ. Time and memory: towards a pacemaker-free theory of interval timing. J. Exp. Anal. Behav. 1999; 71 :215–51. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER, Innis NK. An effect analogous to “frustration” on interval reinforcement schedules. Psychon. Sci. 1966a; 4 :287–88. [ Google Scholar ]
  • Staddon JER, Innis NK. Preference for fixed vs. variable amounts of reward. Psychon. Sci. 1966b; 4 :193–94. [ Google Scholar ]
  • Staddon JER, Innis NK. Reinforcement omission on fixed-interval schedules. J. Exp. Anal. Behav. 1969; 12 :689–700. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Staddon JER, Motheral S. On matching and maximizing in operant choice experiments. Psychol. Rev. 1978; 85 :436–44. [ Google Scholar ]
  • Starr B, Staddon JER. Temporal control on fixed-interval schedules: signal properties of reinforcement and blackout. J. Exp. Anal. Behav. 1974; 22 :535–45. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Stubbs A. The discrimination of stimulus duration by pigeons. J. Exp. Anal. Behav. 1968; 11 :223–38. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Stubbs DA, Dreyfus LR, Fetterman JG, Boynton DM, Locklin N, Smith LD. Duration comparison: relative stimulus differences, stimulus age and stimulus predictiveness. J. Exp. Anal. Behav. 1994; 62 :15–32. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Treisman M. Temporal discrimination and the indifference interval: implications for a model of the “internal clock.” Psychol. Monogr. 1963; 77 (756) [ PubMed ] [ Google Scholar ]
  • Wearden JH. The power law and Weber's law in fixed-interval post-reinforcement pausing. Q. J. Exp. Psychol. B. 1985; 37 :191–211. [ Google Scholar ]
  • Williams BA. Reinforcement, choice, and response strength. In: Atkinson RC, Herrnstein RJ, Lindzey G, Luce RD, editors. Stevens' Handbook of Experimental Psychology. 2nd Wiley; New York: 1988. pp. 167–244. [ Google Scholar ]
  • Williams BA. Conditioned reinforcement: neglected or outmoded explanatory construct? Psychon. Bull. Rev. 1994; 1 :457–75. [ PubMed ] [ Google Scholar ]
  • Williams BA. Conditioned reinforcement dynamics in three-link chained schedules. J. Exp. Anal. Behav. 1997; 67 :145–59. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Williams BA, Royalty P. Conditioned reinforcement versus time to primary reinforcement in chain schedules. J. Exp. Anal. Behav. 1990; 53 :381–93. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Wynne CDL, Staddon JER. Typical delay determines waiting time on periodic-food schedules: static and dynamic tests. J. Exp. Anal. Behav. 1988; 50 :197–210. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Zeiler MD. Schedules of reinforcement: the controlling variables. 1977:201–32. See Honig & Staddon 1977. [ Google Scholar ]
  • Zeiler MD, Powell DG. Temporal control in fixed-interval schedules. J. Exp. Anal. Behav. 1994; 61 :1–9. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Zuriff G. Behaviorism: A Conceptual Reconstruction. Columbia Univ. Press; New York: 1985. [ Google Scholar ]


34 Operant Conditioning

Learning Objectives

By the end of this section, you will be able to:

  • Define operant conditioning
  • Explain the difference between reinforcement and punishment
  • Distinguish between reinforcement schedules

The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning . In operant conditioning, organisms learn to associate a behavior with its consequence. A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.

Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike . According to the law of effect , behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.

Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box.” A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.

A photograph shows B.F. Skinner. An illustration shows a rat in a Skinner box: a chamber with a speaker, lights, a lever, and a food dispenser.

In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.

Reinforcement

The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement , a desirable stimulus is added to increase a behavior.

For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid two dollars each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).

In negative reinforcement , an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.

Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment , you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment , you remove an aversive stimulus to decrease behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.

Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your four-year-old son, Brandon, hit his younger brother. You have Brandon write one hundred times “I will not hit my brother” (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks of using physical punishment on children. First, punishment may teach fear. Brandon may become fearful of the street, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, because you spank Brenda when you are angry with her for her misbehavior, she might start hitting her friends when they won’t share their toys.

While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward her for it.

In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping , we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The specific steps used in the process are the following:

Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.

Here is a brief video of Skinner’s pigeons playing ping pong: BF Skinner Foundation – Pigeon Ping Pong Clip .

It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.

Test Your Understanding

Primary and secondary reinforcers.

Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.

What would be a good reinforcer for humans? For your daughter Sydney, it was the promise of a toy if she cleaned her room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer . Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.

A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.

Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that the use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic schoolchildren. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.

Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed. Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.

A photograph shows a child placing stickers on a chart hanging on the wall.

Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand. For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.

There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.

Photograph A shows several children climbing on playground equipment. Photograph B shows a child sitting alone at a table looking at the playground.

Reinforcement Schedules

Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).

Watch this video clip where veterinarian Dr. Sophia Yin shapes a dog’s behavior using the steps outlined above: Free Shaping with an Australian CattleDog | drsophiayin.com .

Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement , also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules. These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.

Now let’s combine these four terms. A fixed interval reinforcement schedule is when behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.

With a variable interval reinforcement schedule , the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.

With a fixed ratio reinforcement schedule , there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care if the person really needs the prescription sunglasses, Carla just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity-based, can lead to a higher quality of output.

In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.

In operant conditioning, the extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish.

A graph has an x-axis labeled “Time” and a y-axis labeled “Cumulative number of responses.” Two lines labeled “Variable Ratio” and “Fixed Ratio” have similar, steep slopes. The variable ratio line remains straight and is marked in random points where reinforcement occurs. The fixed ratio line has consistently spaced marks indicating where reinforcement has occurred, but after each reinforcement, there is a small drop in the line before it resumes its overall slope. Two lines labeled “Variable Interval” and “Fixed Interval” have similar slopes at roughly a 45-degree angle. The variable interval line remains straight and is marked in random points where reinforcement occurs. The fixed interval line has consistently spaced marks indicating where reinforcement has occurred, but after each reinforcement, there is a drop in the line.

Review Questions

Critical thinking questions.

A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.

In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior.)

Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.

Personal Application Questions

Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior depending on either a set or variable period of time.

Operant Conditioning Copyright © 2022 by LOUIS: The Louisiana Library Network is licensed under a Creative Commons Attribution 4.0 International License , except where otherwise noted.

Share This Book

34 Operant Conditioning


Learning Objectives

By the end of this section, you will be able to:

  • Define operant conditioning
  • Explain the difference between reinforcement and punishment
  • Distinguish between reinforcement schedules

The previous section of this chapter focused on the type of associative learning known as classical conditioning. Remember that in classical conditioning, something in the environment triggers a reflex automatically, and researchers train the organism to react to a different stimulus. Now we turn to the second type of associative learning, operant conditioning. In operant conditioning, organisms learn to associate a behavior with its consequence. A pleasant consequence makes that behavior more likely to be repeated in the future. For example, Spirit, a dolphin at the National Aquarium in Baltimore, does a flip in the air when her trainer blows a whistle. The consequence is that she gets a fish.

Psychologist B. F. Skinner saw that classical conditioning is limited to existing behaviors that are reflexively elicited, and it doesn’t account for new behaviors such as riding a bike. He proposed a theory about how such behaviors come about. Skinner believed that behavior is motivated by the consequences we receive for the behavior: the reinforcements and punishments. His idea that learning is the result of consequences is based on the law of effect, which was first proposed by psychologist Edward Thorndike . According to the law of effect , behaviors that are followed by consequences that are satisfying to the organism are more likely to be repeated, and behaviors that are followed by unpleasant consequences are less likely to be repeated (Thorndike, 1911). Essentially, if an organism does something that brings about a desired result, the organism is more likely to do it again. If an organism does something that does not bring about a desired result, the organism is less likely to do it again. An example of the law of effect is in employment. One of the reasons (and often the main reason) we show up for work is because we get paid to do so. If we stop getting paid, we will likely stop showing up—even if we love our job.

Working with Thorndike’s law of effect as his foundation, Skinner began conducting scientific experiments on animals (mainly rats and pigeons) to determine how organisms learn through operant conditioning (Skinner, 1938). He placed these animals inside an operant conditioning chamber, which has come to be known as a “Skinner box.” A Skinner box contains a lever (for rats) or disk (for pigeons) that the animal can press or peck for a food reward via the dispenser. Speakers and lights can be associated with certain behaviors. A recorder counts the number of responses made by the animal.

A photograph shows B.F. Skinner. An illustration shows a rat in a Skinner box: a chamber with a speaker, lights, a lever, and a food dispenser.

Watch this brief video clip to learn more about operant conditioning: Skinner is interviewed, and operant conditioning of pigeons is demonstrated.

In discussing operant conditioning, we use several everyday words—positive, negative, reinforcement, and punishment—in a specialized manner. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and punishment can also be positive or negative. All reinforcers (positive or negative) increase the likelihood of a behavioral response. All punishers (positive or negative) decrease the likelihood of a behavioral response. Now let’s combine these four terms: positive reinforcement, negative reinforcement, positive punishment, and negative punishment.
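
Because these four terms are easy to mix up, the two distinctions can be captured in a few lines of code. The following is a minimal, hypothetical Python sketch (the function name and arguments are illustrative, not part of the chapter) that labels a consequence by whether a stimulus is added or removed and whether the goal is to increase or decrease the behavior.

```python
# A minimal, hypothetical sketch of the four operant-conditioning terms.
# Two questions classify any consequence: is a stimulus added or removed,
# and is the goal to increase or decrease the behavior?

def classify_consequence(stimulus_change: str, behavior_effect: str) -> str:
    """stimulus_change: 'added' or 'removed'; behavior_effect: 'increase' or 'decrease'."""
    sign = "positive" if stimulus_change == "added" else "negative"
    mechanism = "reinforcement" if behavior_effect == "increase" else "punishment"
    return f"{sign} {mechanism}"

# Examples drawn from the chapter:
print(classify_consequence("added", "increase"))    # positive reinforcement (a toy for cleaning the room)
print(classify_consequence("removed", "increase"))  # negative reinforcement (the seatbelt beeping stops)
print(classify_consequence("added", "decrease"))    # positive punishment (scolding for texting in class)
print(classify_consequence("removed", "decrease"))  # negative punishment (taking away a favorite toy)
```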

REINFORCEMENT

The most effective way to teach a person or animal a new behavior is with positive reinforcement. In positive reinforcement , a desirable stimulus is added to increase a behavior.

For example, you tell your five-year-old son, Jerome, that if he cleans his room, he will get a toy. Jerome quickly cleans his room because he wants a new art set. Let’s pause for a moment. Some people might say, “Why should I reward my child for doing what is expected?” But in fact we are constantly and consistently rewarded in our lives. Our paychecks are rewards, as are high grades and acceptance into our preferred school. Being praised for doing a good job and for passing a driver’s test is also a reward. Positive reinforcement as a learning tool is extremely effective. It has been found that one of the most effective ways to increase achievement in school districts with below-average reading scores was to pay the children to read. Specifically, second-grade students in Dallas were paid $2 each time they read a book and passed a short quiz about the book. The result was a significant increase in reading comprehension (Fryer, 2010). What do you think about this program? If Skinner were alive today, he would probably think this was a great idea. He was a strong proponent of using operant conditioning principles to influence students’ behavior at school. In fact, in addition to the Skinner box, he also invented what he called a teaching machine that was designed to reward small steps in learning (Skinner, 1961)—an early forerunner of computer-assisted learning. His teaching machine tested students’ knowledge as they worked through various school subjects. If students answered questions correctly, they received immediate positive reinforcement and could continue; if they answered incorrectly, they did not receive any reinforcement. The idea was that students would spend additional time studying the material to increase their chance of being reinforced the next time (Skinner, 1961).

In negative reinforcement , an undesirable stimulus is removed to increase a behavior. For example, car manufacturers use the principles of negative reinforcement in their seatbelt systems, which go “beep, beep, beep” until you fasten your seatbelt. The annoying sound stops when you exhibit the desired behavior, increasing the likelihood that you will buckle up in the future. Negative reinforcement is also used frequently in horse training. Riders apply pressure—by pulling the reins or squeezing their legs—and then remove the pressure when the horse performs the desired behavior, such as turning or speeding up. The pressure is the negative stimulus that the horse wants to remove.

Many people confuse negative reinforcement with punishment in operant conditioning, but they are two very different mechanisms. Remember that reinforcement, even when it is negative, always increases a behavior. In contrast, punishment always decreases a behavior. In positive punishment , you add an undesirable stimulus to decrease a behavior. An example of positive punishment is scolding a student to get the student to stop texting in class. In this case, a stimulus (the reprimand) is added in order to decrease the behavior (texting in class). In negative punishment , you remove an aversive stimulus to decrease behavior. For example, when a child misbehaves, a parent can take away a favorite toy. In this case, a stimulus (the toy) is removed in order to decrease the behavior.

Punishment, especially when it is immediate, is one way to decrease undesirable behavior. For example, imagine your four-year-old son, Brandon, hit his younger brother. You have Brandon write 100 times “I will not hit my brother” (positive punishment). Chances are he won’t repeat this behavior. While strategies like this are common today, in the past children were often subject to physical punishment, such as spanking. It’s important to be aware of some of the drawbacks of using physical punishment on children. First, punishment may teach fear. Brandon may stop hitting his brother, but he also may become fearful of the person who delivered the punishment—you, his parent. Similarly, children who are punished by teachers may come to fear the teacher and try to avoid school (Gershoff et al., 2010). Consequently, most schools in the United States have banned corporal punishment. Second, punishment may cause children to become more aggressive and prone to antisocial behavior and delinquency (Gershoff, 2002). They see their parents resort to spanking when they become angry and frustrated, so, in turn, they may act out this same behavior when they become angry and frustrated. For example, because you spank Brenda when you are angry with her for her misbehavior, she might start hitting her friends when they won’t share their toys.

While positive punishment can be effective in some cases, Skinner suggested that the use of punishment should be weighed against the possible negative effects. Today’s psychologists and parenting experts favor reinforcement over punishment—they recommend that you catch your child doing something good and reward her for it.

In his operant conditioning experiments, Skinner often used an approach called shaping. Instead of rewarding only the target behavior, in shaping, we reward successive approximations of a target behavior. Why is shaping needed? Remember that in order for reinforcement to work, the organism must first display the behavior. Shaping is needed because it is extremely unlikely that an organism will display anything but the simplest of behaviors spontaneously. In shaping, behaviors are broken down into many small, achievable steps. The specific steps used in the process are the following:

  • Reinforce any response that resembles the desired behavior.
  • Then reinforce the response that more closely resembles the desired behavior, and stop reinforcing the previously reinforced response.
  • Next, begin to reinforce the response that even more closely resembles the desired behavior.
  • Continue to reinforce closer and closer approximations of the desired behavior.
  • Finally, only reinforce the desired behavior.
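
To see successive approximation as a procedure, here is a minimal, hypothetical Python sketch: the “behavior” is reduced to a single number, each criterion in the list is a stricter approximation of the target, and reinforcement is delivered only when the current criterion is met. The model, names, and numbers are illustrative assumptions, not Skinner’s actual procedure.

```python
import random

# A minimal, hypothetical sketch of shaping as successive approximations.
# "skill" is a crude stand-in for how close the learner's typical response
# is to the target behavior; responses vary around that level, and only
# responses meeting the current criterion are reinforced.

def shape(criteria):
    skill = 0.0
    for criterion in criteria:                             # each criterion is a closer approximation
        trials = 0
        while skill < criterion:
            trials += 1
            response = skill + random.uniform(-0.1, 0.2)   # behavior varies around the current level
            if response >= criterion:                      # reinforce only at this step's criterion
                skill = response                           # the reinforced response becomes typical
        print(f"criterion {criterion:.2f} met after {trials} trials")
    return skill

# Criteria step up in small, achievable increments toward the target (0.90):
shape([0.15, 0.30, 0.45, 0.60, 0.75, 0.90])
```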

Shaping is often used in teaching a complex behavior or chain of behaviors. Skinner used shaping to teach pigeons not only such relatively simple behaviors as pecking a disk in a Skinner box, but also many unusual and entertaining behaviors, such as turning in circles, walking in figure eights, and even playing ping pong; the technique is commonly used by animal trainers today. An important part of shaping is stimulus discrimination. Recall Pavlov’s dogs—he trained them to respond to the tone of a bell, and not to similar tones or sounds. This discrimination is also important in operant conditioning and in shaping behavior.

Here is a brief video of Skinner’s pigeons playing ping pong: BF Skinner Foundation – Pigeon Ping Pong Clip.

It’s easy to see how shaping is effective in teaching behaviors to animals, but how does shaping work with humans? Let’s consider parents whose goal is to have their child learn to clean his room. They use shaping to help him master steps toward the goal. Instead of performing the entire task, they set up these steps and reinforce each step. First, he cleans up one toy. Second, he cleans up five toys. Third, he chooses whether to pick up ten toys or put his books and clothes away. Fourth, he cleans up everything except two toys. Finally, he cleans his entire room.

PRIMARY AND SECONDARY REINFORCERS

Rewards such as stickers, praise, money, toys, and more can be used to reinforce learning. Let’s go back to Skinner’s rats again. How did the rats learn to press the lever in the Skinner box? They were rewarded with food each time they pressed the lever. For animals, food would be an obvious reinforcer.

What would be a good reinforcer for humans? For your daughter Sydney, it was the promise of a toy if she cleaned her room. How about Joaquin, the soccer player? If you gave Joaquin a piece of candy every time he made a goal, you would be using a primary reinforcer. Primary reinforcers are reinforcers that have innate reinforcing qualities. These kinds of reinforcers are not learned. Water, food, sleep, shelter, sex, and touch, among others, are primary reinforcers. Pleasure is also a primary reinforcer. Organisms do not lose their drive for these things. For most people, jumping in a cool lake on a very hot day would be reinforcing and the cool lake would be innately reinforcing—the water would cool the person off (a physical need), as well as provide pleasure.

A secondary reinforcer has no inherent value and only has reinforcing qualities when linked with a primary reinforcer. Praise, linked to affection, is one example of a secondary reinforcer, as when you called out “Great shot!” every time Joaquin made a goal. Another example, money, is only worth something when you can use it to buy other things—either things that satisfy basic needs (food, water, shelter—all primary reinforcers) or other secondary reinforcers. If you were on a remote island in the middle of the Pacific Ocean and you had stacks of money, the money would not be useful if you could not spend it. What about the stickers on the behavior chart? They also are secondary reinforcers.

Sometimes, instead of stickers on a sticker chart, a token is used. Tokens, which are also secondary reinforcers, can then be traded in for rewards and prizes. Entire behavior management systems, known as token economies, are built around the use of these kinds of token reinforcers. Token economies have been found to be very effective at modifying behavior in a variety of settings such as schools, prisons, and mental hospitals. For example, a study by Cangi and Daly (2013) found that the use of a token economy increased appropriate social behaviors and reduced inappropriate behaviors in a group of autistic schoolchildren. Autistic children tend to exhibit disruptive behaviors such as pinching and hitting. When the children in the study exhibited appropriate behavior (not hitting or pinching), they received a “quiet hands” token. When they hit or pinched, they lost a token. The children could then exchange specified amounts of tokens for minutes of playtime.
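
As a rough illustration of the bookkeeping behind such a system, here is a minimal, hypothetical Python sketch of a token economy like the one described above: a token is earned for appropriate behavior, lost for hitting or pinching, and exchanged for minutes of playtime. The class name and exchange rate are illustrative assumptions, not details from the Cangi and Daly study.

```python
# A minimal, hypothetical sketch of a token economy: "quiet hands" earns a
# token, hitting or pinching costs one, and tokens are later exchanged for
# minutes of playtime (the backup reinforcer).

class TokenEconomy:
    def __init__(self, minutes_per_token: int = 2):
        self.tokens = 0
        self.minutes_per_token = minutes_per_token

    def record(self, behavior: str) -> None:
        if behavior == "quiet hands":            # appropriate behavior earns a token
            self.tokens += 1
        elif behavior in ("hit", "pinch"):       # inappropriate behavior costs a token
            self.tokens = max(0, self.tokens - 1)

    def exchange(self) -> int:
        """Trade all accumulated tokens for minutes of playtime."""
        minutes = self.tokens * self.minutes_per_token
        self.tokens = 0
        return minutes

chart = TokenEconomy()
for behavior in ["quiet hands", "quiet hands", "hit", "quiet hands"]:
    chart.record(behavior)
print(chart.exchange(), "minutes of playtime")   # 2 tokens remain -> 4 minutes
```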

Parents and teachers often use behavior modification to change a child’s behavior. Behavior modification uses the principles of operant conditioning to accomplish behavior change so that undesirable behaviors are switched for more socially acceptable ones. Some teachers and parents create a sticker chart, in which several behaviors are listed. Sticker charts are a form of token economies, as described in the text. Each time children perform the behavior, they get a sticker, and after a certain number of stickers, they get a prize, or reinforcer. The goal is to increase acceptable behaviors and decrease misbehavior. Remember, it is best to reinforce desired behaviors, rather than to use punishment. In the classroom, the teacher can reinforce a wide range of behaviors, from students raising their hands, to walking quietly in the hall, to turning in their homework. At home, parents might create a behavior chart that rewards children for things such as putting away toys, brushing their teeth, and helping with dinner. In order for behavior modification to be effective, the reinforcement needs to be connected with the behavior; the reinforcement must matter to the child and be done consistently.

A photograph shows a child placing stickers on a chart hanging on the wall.

Time-out is another popular technique used in behavior modification with children. It operates on the principle of negative punishment. When a child demonstrates an undesirable behavior, she is removed from the desirable activity at hand. For example, say that Sophia and her brother Mario are playing with building blocks. Sophia throws some blocks at her brother, so you give her a warning that she will go to time-out if she does it again. A few minutes later, she throws more blocks at Mario. You remove Sophia from the room for a few minutes. When she comes back, she doesn’t throw blocks.

There are several important points that you should know if you plan to implement time-out as a behavior modification technique. First, make sure the child is being removed from a desirable activity and placed in a less desirable location. If the activity is something undesirable for the child, this technique will backfire because it is more enjoyable for the child to be removed from the activity. Second, the length of the time-out is important. The general rule of thumb is one minute for each year of the child’s age. Sophia is five; therefore, she sits in a time-out for five minutes. Setting a timer helps children know how long they have to sit in time-out. Finally, as a caregiver, keep several guidelines in mind over the course of a time-out: remain calm when directing your child to time-out; ignore your child during time-out (because caregiver attention may reinforce misbehavior); and give the child a hug or a kind word when time-out is over.
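
The rule of thumb for time-out length is simple arithmetic; a tiny, hypothetical Python helper makes it explicit (the function name is illustrative, not from the text).

```python
# A tiny sketch of the rule of thumb stated above: roughly one minute of
# time-out per year of the child's age.

def timeout_minutes(age_years: int) -> int:
    return age_years  # one minute per year of age

print(timeout_minutes(5))  # Sophia, who is five, sits in time-out for 5 minutes
```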

Photograph A shows several children climbing on playground equipment. Photograph B shows a child sitting alone at a table looking at the playground.

REINFORCEMENT SCHEDULES

Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out. When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement . This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the chapter. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).

Watch this video clip where veterinarian Dr. Sophia Yin shapes a dog’s behavior using the steps outlined above: Free Shaping with an Australian CattleDog | drsophiayin.com.

Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement, also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules. These schedules are described as either fixed or variable, and as either interval or ratio. Fixed refers to the number of responses between reinforcements, or the amount of time between reinforcements, which is set and unchanging. Variable refers to the number of responses or amount of time between reinforcements, which varies or changes. Interval means the schedule is based on the time between reinforcements, and ratio means the schedule is based on the number of responses between reinforcements.

Now let’s combine these four terms. A fixed interval reinforcement schedule is when behavior is rewarded after a set amount of time. For example, June undergoes major surgery in a hospital. During recovery, she is expected to experience pain and will require prescription medications for pain relief. June is given an IV drip with a patient-controlled painkiller. Her doctor sets a limit: one dose per hour. June pushes a button when pain becomes difficult, and she receives a dose of medication. Since the reward (pain relief) only occurs on a fixed interval, there is no point in exhibiting the behavior when it will not be rewarded.

With a variable interval reinforcement schedule, the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus.

With a fixed ratio reinforcement schedule , there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells a pair of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care if the person really needs the prescription sunglasses, Carla just wants her bonus. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. This distinction in the quality of performance can help determine which reinforcement method is most appropriate for a particular situation. Fixed ratios are better suited to optimize the quantity of output, whereas a fixed interval, in which the reward is not quantity based, can lead to a higher quality of output.

In a variable ratio reinforcement schedule , the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
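
To make the four partial reinforcement schedules concrete, here is a minimal, hypothetical Python sketch in which each schedule answers one question after every response: is this response reinforced? Ratio schedules count responses, while interval schedules watch the clock. The class names and parameter values are illustrative assumptions, not drawn from the chapter.

```python
import random

# A minimal, hypothetical sketch of the four partial reinforcement schedules.
# Ratio schedules reinforce after a number of responses; interval schedules
# reinforce the first response after some amount of time has elapsed.

class FixedRatio:
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, now):
        self.count += 1
        if self.count >= self.n:              # reinforce every n-th response
            self.count = 0
            return True
        return False

class VariableRatio:
    def __init__(self, mean_n):
        self.mean_n = mean_n
    def respond(self, now):
        return random.random() < 1 / self.mean_n   # on average every mean_n responses

class FixedInterval:
    def __init__(self, interval):
        self.interval, self.available_at = interval, interval
    def respond(self, now):
        if now >= self.available_at:          # first response after the interval has elapsed
            self.available_at = now + self.interval
            return True
        return False

class VariableInterval:
    def __init__(self, mean_interval):
        self.mean_interval = mean_interval
        self.available_at = random.expovariate(1 / mean_interval)
    def respond(self, now):
        if now >= self.available_at:          # the required wait varies unpredictably
            self.available_at = now + random.expovariate(1 / self.mean_interval)
            return True
        return False

# Responding once per second for 100 seconds under each schedule:
for schedule in (FixedRatio(5), VariableRatio(5), FixedInterval(10), VariableInterval(10)):
    earned = sum(schedule.respond(now=t) for t in range(100))
    print(type(schedule).__name__, earned)
```

In this toy run, the interval schedules cap how often reinforcement can occur no matter how quickly responses come, whereas the ratio schedules pay off in proportion to responding, which is consistent with the chapter’s point that ratio schedules favor a higher quantity of output.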

In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. For example, if June presses the button for the pain relief medication before the allotted time her doctor has approved, no medication is administered. She is on a fixed interval reinforcement schedule (dosed hourly), so extinction occurs quickly when reinforcement doesn’t come at the expected time. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish.

A graph has an x-axis labeled “Time” and a y-axis labeled “Cumulative number of responses.” Two lines labeled “Variable Ratio” and “Fixed Ratio” have similar, steep slopes. The variable ratio line remains straight and is marked in random points where reinforcement occurs. The fixed ratio line has consistently spaced marks indicating where reinforcement has occurred, but after each reinforcement, there is a small drop in the line before it resumes its overall slope. Two lines labeled “Variable Interval” and “Fixed Interval” have similar slopes at roughly a 45-degree angle. The variable interval line remains straight and is marked in random points where reinforcement occurs. The fixed interval line has consistently spaced marks indicating where reinforcement has occurred, but after each reinforcement, there is a drop in the line.

Skinner (1953) stated, “If the gambling establishment cannot persuade a patron to turn over money with no return, it may achieve the same effect by returning part of the patron’s money on a variable-ratio schedule” (p. 397).

Skinner uses gambling as an example of the power and effectiveness of conditioning behavior based on a variable ratio reinforcement schedule. In fact, Skinner was so confident in his knowledge of gambling addiction that he even claimed he could turn a pigeon into a pathological gambler (“Skinner’s Utopia,” 1971). Beyond the power of variable ratio reinforcement, gambling seems to work on the brain in the same way as some addictive drugs. The Illinois Institute for Addiction Recovery (n.d.) reports evidence suggesting that pathological gambling is an addiction similar to a chemical addiction. Specifically, gambling may activate the reward centers of the brain, much like cocaine does. Research has shown that some pathological gamblers have lower levels of the neurotransmitter (brain chemical) known as norepinephrine than do normal gamblers (Roy et al., 1988). According to a study conducted by Alec Roy and colleagues, norepinephrine is secreted when a person feels stress, arousal, or thrill; pathological gamblers use gambling to increase their levels of this neurotransmitter. Another researcher, neuroscientist Hans Breiter, has done extensive research on gambling and its effects on the brain. Breiter (as cited in Franzen, 2001) reports that “Monetary reward in a gambling-like experiment produces brain activation very similar to that observed in a cocaine addict receiving an infusion of cocaine” (para. 1). Deficiencies in serotonin (another neurotransmitter) might also contribute to compulsive behavior, including a gambling addiction.

It may be that pathological gamblers’ brains are different than those of other people, and perhaps this difference may somehow have led to their gambling addiction, as these studies seem to suggest. However, it is very difficult to ascertain the cause because it is impossible to conduct a true experiment (it would be unethical to try to turn randomly assigned participants into problem gamblers). Therefore, it may be that causation actually moves in the opposite direction—perhaps the act of gambling somehow changes neurotransmitter levels in some gamblers’ brains. It also is possible that some overlooked factor, or confounding variable, played a role in both the gambling addiction and the differences in brain chemistry.

A photograph shows four digital gaming machines.

COGNITION AND LATENT LEARNING

Although strict behaviorists such as Skinner and Watson refused to believe that cognition (such as thoughts and expectations) plays a role in learning, another behaviorist, Edward C. Tolman , had a different opinion. Tolman’s experiments with rats demonstrated that organisms can learn even if they do not receive immediate reinforcement (Tolman & Honzik, 1930; Tolman, Ritchie, & Kalish, 1946). This finding was in conflict with the prevailing idea at the time that reinforcement must be immediate in order for learning to occur, thus suggesting a cognitive aspect to learning.

In the experiments, Tolman placed hungry rats in a maze with no reward for finding their way through it. He also studied a comparison group that was rewarded with food at the end of the maze. As the unreinforced rats explored the maze, they developed a cognitive map : a mental picture of the layout of the maze ( [link] ). After 10 sessions in the maze without reinforcement, food was placed in a goal box at the end of the maze. As soon as the rats became aware of the food, they were able to find their way through the maze quickly, just as quickly as the comparison group, which had been rewarded with food all along. This is known as latent learning : learning that occurs but is not observable in behavior until there is a reason to demonstrate it.
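
One way to picture a cognitive map is as a graph built during unrewarded exploration and consulted only once there is a reason to use it. The following is a minimal, hypothetical Python sketch (the maze layout and names are invented for illustration, not Tolman’s apparatus): the map is assembled with no reinforcement, and a route to the goal can be produced immediately when food finally appears.

```python
from collections import deque

# A minimal, hypothetical sketch of a cognitive map as a graph. During
# unrewarded exploration the rat records which locations connect; once food
# appears, the stored map already supports finding a route to the goal.

maze_edges = [("start", "A"), ("A", "B"), ("B", "goal"), ("A", "C"), ("C", "D")]

cognitive_map = {}
for a, b in maze_edges:                      # latent learning: the map is built with no reward
    cognitive_map.setdefault(a, set()).add(b)
    cognitive_map.setdefault(b, set()).add(a)

def route(start, goal):
    """Breadth-first search over the stored map, used only once there is a reason to."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in cognitive_map.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

# Food is placed in the goal box: the previously "silent" learning is now demonstrated.
print(route("start", "goal"))   # ['start', 'A', 'B', 'goal']
```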

An illustration shows three rats in a maze, with a starting point and food at the end.

Latent learning also occurs in humans. Children may learn by watching the actions of their parents but only demonstrate it at a later date, when the learned material is needed. For example, suppose that Ravi’s dad drives him to school every day. In this way, Ravi learns the route from his house to his school, but he’s never driven there himself, so he has not had a chance to demonstrate that he’s learned the way. One morning Ravi’s dad has to leave early for a meeting, so he can’t drive Ravi to school. Instead, Ravi follows the same route on his bike that his dad would have taken in the car. This demonstrates latent learning. Ravi had learned the route to school, but had no need to demonstrate this knowledge earlier.

Have you ever gotten lost in a building and couldn’t find your way back out? While that can be frustrating, you’re not alone. At one time or another we’ve all gotten lost in places like a museum, hospital, or university library. Whenever we go someplace new, we build a mental representation—or cognitive map—of the location, as Tolman’s rats built a cognitive map of their maze. However, some buildings are confusing because they include many areas that look alike or have short lines of sight. Because of this, it’s often difficult to predict what’s around a corner or decide whether to turn left or right to get out of a building. Psychologist Laura Carlson (2010) suggests that what we place in our cognitive map can impact our success in navigating through the environment. She suggests that paying attention to specific features upon entering a building, such as a picture on the wall, a fountain, a statue, or an escalator, adds information to our cognitive map that can be used later to help find our way out of the building.

Watch this video to learn more about Carlson’s studies on cognitive maps and navigation in buildings.

Operant conditioning is based on the work of B. F. Skinner. Operant conditioning is a form of learning in which the motivation for a behavior happens after the behavior is demonstrated. An animal or a human receives a consequence after performing a specific behavior. The consequence is either a reinforcer or a punisher. All reinforcement (positive or negative) increases the likelihood of a behavioral response. All punishment (positive or negative) decreases the likelihood of a behavioral response. Several types of reinforcement schedules are used to reward behavior depending on either a set or variable period of time.

Review Questions

________ is when you take away a pleasant stimulus to stop a behavior.

  • positive reinforcement
  • negative reinforcement
  • positive punishment
  • negative punishment

Which of the following is not an example of a primary reinforcer?

Rewarding successive approximations toward a target behavior is ________.

Slot machines reward gamblers with money according to which reinforcement schedule?

  • fixed ratio
  • variable ratio
  • fixed interval
  • variable interval

Critical Thinking Questions

What is a Skinner box and what is its purpose?

A Skinner box is an operant conditioning chamber used to train animals such as rats and pigeons to perform certain behaviors, like pressing a lever. When the animals perform the desired behavior, they receive a reward: food or water.

What is the difference between negative reinforcement and punishment?

In negative reinforcement you are taking away an undesirable stimulus in order to increase the frequency of a certain behavior (e.g., buckling your seat belt stops the annoying beeping sound in your car and increases the likelihood that you will wear your seatbelt). Punishment is designed to reduce a behavior (e.g., you scold your child for running into the street in order to decrease the unsafe behavior.)

What is shaping and how would you use shaping to teach a dog to roll over?

Shaping is an operant conditioning method in which you reward closer and closer approximations of the desired behavior. If you want to teach your dog to roll over, you might reward him first when he sits, then when he lies down, and then when he lies down and rolls onto his back. Finally, you would reward him only when he completes the entire sequence: lying down, rolling onto his back, and then continuing to roll over to his other side.
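
As a rough, purely hypothetical illustration of rewarding successive approximations, the sketch below models the training criterion moving step by step toward the full roll-over. The stage names and the `shape` helper are made up for this example and are not a standard training protocol; the point is only that each reward criterion replaces the previous, easier one.

```python
# A toy model of shaping: only the current approximation of "roll over"
# is rewarded, and the criterion tightens after each success.
stages = [
    "sits",
    "lies down",
    "lies down and rolls onto back",
    "rolls all the way over",
]

def shape(attempts):
    """Reward attempts that match the current stage; advance the criterion on success."""
    stage = 0
    for attempt in attempts:
        if stage < len(stages) and attempt == stages[stage]:
            print(f"Reward!    {attempt}")
            stage += 1  # earlier approximations are no longer rewarded
        else:
            print(f"No reward: {attempt}")
    return stage == len(stages)

attempts = ["sits", "sits", "lies down", "lies down and rolls onto back", "rolls all the way over"]
print("Fully shaped:", shape(attempts))
```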

Personal Application Questions

Explain the difference between negative reinforcement and punishment, and provide several examples of each based on your own experiences.

Think of a behavior that you have that you would like to change. How could you use behavior modification, specifically positive reinforcement, to change your behavior? What is your positive reinforcer?

Operant Conditioning Strategies: Positive Reinforcement Essay

Operant conditioning is a strategy used to discourage undesirable behaviors and encourage desirable ones through punishments and rewards. According to Skinner, a behaviorist, behavior is explained not by internal thoughts and motivations but by its observable consequences; the environment under which an individual operates can therefore be arranged to generate specific consequences (Hartley, 2001).

For instance, a boy may be promised a reward for completing his homework; the prospect of the reward increases the likelihood that he will complete it.

Operant conditioning can also reduce undesirable behavior: in the case of punishment, a person is discouraged from repeating an unwanted action. For instance, a student may be denied certain privileges for making noise during class time; the punishment is meant to decrease that undesirable behavior.

Operant conditioning relies on several key concepts, one of which is punishment: the presentation of an adverse consequence in order to discourage an undesirable behavior. Punishment is categorized into positive and negative punishment; positive punishment presents an unfavorable event to reduce an undesirable behavior, while negative punishment removes a pleasant event or privilege in order to reduce that behavior (Olson & Hergenhahn, 2009).

The concept of reinforcement, by contrast, creates an environment that encourages desirable behavior. Reinforcement is classified into positive and negative reinforcement; positive reinforcement presents a favorable event after the behavior occurs.

In this case, a desirable behavior is encouraged by directly praising or rewarding the individual (Olson & Hergenhahn, 2009). Negative reinforcement works by removing an unfavorable event when an individual displays a desirable behavior; the behavior is encouraged because an unpleasant condition is taken away.

The use of positive reinforcement is more effective than the use of negative reinforcement. People naturally enjoy being praised and rewarded, so they are easily encouraged to do good when a reward or praise follows (Wills, 2005). Positive reinforcement also improves one's attitude toward desirable behavior: an individual comes to associate desirable behavior with good outcomes and will keep doing good things even without the promise of rewards or praise (Olson & Hergenhahn, 2009).

Additionally, positive reinforcement, especially praise, improves one's self-confidence; a person feels good when praised, which strengthens his or her commitment to desirable behavior. Such individuals will therefore always be willing to do good things.

This means that the change in behavior applies not only to the specific behavior being encouraged but also to other desirable behaviors. Positive reinforcement also builds better relations between the person correcting and the person being corrected; the corrector is perceived as someone who means well, and subsequent corrections are welcomed (Wills, 2005).

Here is an example of operant conditioning: a three-year-old boy plays the whole day without rest, which makes him restless during his sleep at night; the boy likes chocolate ice cream. The boy's mother wants him to sleep for two hours after lunch so that he will sleep well at night.

I recommend that the mother use positive reinforcement: she promises the boy a chocolate ice cream after he sleeps for two hours in the afternoon each day, and she keeps the same schedule for two weeks. The promise excites the boy, and he will make himself sleep in the afternoon in order to enjoy a chocolate ice cream every day.

After two weeks, the boy will not only be used to sleeping for two hours after lunch but will also learn that his mother means well, and he will enjoy his sleep at night; as a result, he will develop a good attitude toward his mother's subsequent instructions.

Hartley, K. (2001). Learning strategies and hypermedia instruction. Journal of Educational Multimedia and Hypermedia, 10(3): 167-182.

Olson, M., & Hergenhahn, B. (2009). An introduction to theories of learning. Upper Saddle River: Prentice Hall.

Wills, A. (2005). New directions in human associative learning. New Jersey: Lawrence Erlbaum Associates.
