‘Learning theory’ terminology: classical and operant conditioning

Understanding how animals learn is key to interpreting animal behaviour. We tend to think of learning as something that happens when we deliberately train animals (e.g. in teaching dogs to ‘sit’ or ‘come’). But actually learning happens all the time – everything that a dog or cat experiences throughout their life will impact to some extent on subsequent behaviour.

There is a lot of talk about ‘learning theory’ in animal training and behaviour. It is one of the central facets of most of the courses you will see in this area, a common topic in discussions and forums, and the subject of hot debate when applied to methods of training dogs. But it is a topic with very confusing terminology, so my aim here is to disentangle the meaning of some of the key terms that commonly lead to misunderstandings.

I think misunderstanding of terminology occurs because there is a lack of appreciation of the historical context. Terms commonly used today in animal training were first described back in the early 1900s. Around that period, the founders of ‘behaviourism’, such as Watson and Thorndike, were doing controlled experiments to record what happened to behaviours with different interventions. At that time there was very little knowledge about what happened inside the brain, and the animal was essentially a ‘black box’: things happened to it, and responses came out. The terminology coined at the time, therefore, did not imply anything about the animal itself – its perception of the events, how it felt, or why it responded as it did. Nor did it imply any judgement as to what was done to the animal. The terms just described what happened and the behaviour seen.

The problem has been that the terms derived from this ‘black box’ approach have been retained, and are sometimes used to describe much more than they were originally meant to. Considerable developments in cognition and neuroscience have enabled us to appreciate how the processes of learning actually occur in the brain – and it is difficult not to read some of this knowledge into the original ‘learning theory’ terms when using them.

Associative learning

Associative learning is the process whereby things that occur close together become associated. Associative learning is divided into two types: classical (or Pavlovian) conditioning and operant (or instrumental) conditioning.

Classical conditioning is an association between an important event and one which reliably predicts it. It’s called Pavlovian conditioning because it was first described by the Russian physiologist Ivan Pavlov, who noticed that dogs in his study on saliva would start to anticipate food (and produce saliva) on hearing the researcher go into the food preparation area. He tested this by ringing a bell before feeding them – and after a few presentations they started to salivate on hearing the bell. Clearly dogs don’t normally go around salivating when they hear bells – the response was due to them learning that the bell was a reliable indicator of the imminent arrival of food. This type of learning is clearly a huge evolutionary advantage – identifying events which indicate the approach of a predator gives an animal time to get away. Equally, reacting to early indicators of food means getting to the resource first.

Operant conditioning is the association between the action of an animal and its consequence. If a dog sits, and gets a treat, he or she will make an association between the action and the consequence. If the consequence is perceived by the animal as good (e.g. the treat) then the behaviour is more likely to occur again, the next time the animal is in the same situation. If the consequence is perceived by the animal as bad (e.g. a cat jumps on the work surface and a pan falls over making a loud clang), then the behaviour is less likely to occur again in the same context. This is again eminently sensible and clearly an advantage for survival – if an animal can learn to avoid repeating actions with bad outcomes and repeat those with good, it’s more likely to make its way successfully in the world.

Reinforcement and punishment

It is the terminology used in operant conditioning that can cause confusion. Based on the ‘black box’ interpretation of learning theory, if the chance of a behaviour occurring increases as a result of its consequence, it is known as ‘reinforcement’. If the chance of the behaviour occurring decreases, it is known as ‘punishment’. Hence, in learning theory terms, if you do something to a dog which results in an increase of a particular behaviour, it is reinforcement. If you do something which leads to a decrease in that behaviour, it is called punishment. No matter what you did!

‘Punishment’ is particularly confusing, because in general use it has the implication of doing something very unpleasant to a person or animal. But when used in the context of learning theory it’s just a decrease in behaviour.

More confusion arises with the further addition of descriptors. The early behaviourists also split the categories of ‘reinforcement’ and ‘punishment’ into those which occurred when something was added (termed ‘positive’) and those which occurred when something was removed (termed ‘negative’). So, if you add something and the behaviour increases, it is positive reinforcement. If you take something away and the behaviour increases, it is negative reinforcement. Similarly, negative punishment is a decrease in behaviour when something is removed, and positive punishment is a decrease in behaviour when something is added (Table 1). And that is all it means. Remember this is ‘black box’ stuff – there is no implication about what is done nor how the animal feels – just a description of whether something is added or removed, and whether the behaviour increases or decreases.

Table 1: The Four Categories of Operant Conditioning

                     Behaviour increases       Behaviour decreases
Something added      Positive reinforcement    Positive punishment
Something removed    Negative reinforcement    Negative punishment
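
For readers who find code easier to follow than tables, the four categories can be sketched as a small lookup. This is only an illustration of Table 1: the names Stimulus, BehaviourChange and classify are made up for this example and do not come from any library or from the learning theory literature.

```python
from enum import Enum


class Stimulus(Enum):
    """Was something added to, or removed from, the situation? (hypothetical name)"""
    ADDED = "added"
    REMOVED = "removed"


class BehaviourChange(Enum):
    """Did the behaviour become more or less likely? (hypothetical name)"""
    INCREASES = "increases"
    DECREASES = "decreases"


def classify(stimulus: Stimulus, change: BehaviourChange) -> str:
    """Return the 'black box' label for one cell of Table 1.

    Only two facts go in: whether something was added or removed, and
    whether the behaviour increased or decreased. Nothing about what was
    done, or how the animal felt, is needed to pick the label.
    """
    sign = "positive" if stimulus is Stimulus.ADDED else "negative"
    kind = "reinforcement" if change is BehaviourChange.INCREASES else "punishment"
    return f"{sign} {kind}"


if __name__ == "__main__":
    # Print all four cells of the table.
    for stimulus in Stimulus:
        for change in BehaviourChange:
            print(f"Something {stimulus.value}, behaviour {change.value}: "
                  f"{classify(stimulus, change)}")
```

Notice that the function has no input for ‘pleasant’ or ‘unpleasant’. That is the point of the original terminology: the label depends only on the two observable facts.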

A few examples might help with this, and highlight some of the misunderstandings. Let’s say you are leading a horse in from a field with a halter and it pulls forward on the lead rope. If you hang on to the halter, putting tension on the rope, the halter will tighten on the nose and around the head. To avoid this, the horse slows down and walks next to you again, and the halter loosens off. By putting tension on the rope you added something (positive) which decreased a behaviour (pulling) – so the pulling behaviour was positively punished. BUT, if you think about this a bit more – you have also done some negative reinforcement here. By releasing the pressure on the halter as the horse comes level again, you have negatively reinforced not pulling (or walking next to you).

Positive reinforcement and negative punishment are similarly linked. So, for example, you might train your dog to sit down when you come into the house by giving him attention when his bottom hits the floor. This is positive reinforcement – adding something which results in an increase in the behaviour. But removing the attention when the dog is wriggling about instead of sitting is negative punishment – you are removing attention to reduce the wriggling behaviour. The term used will therefore depend on which behaviour you are talking about.
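
The way the label changes with the behaviour in focus can also be sketched in a few lines of code. This is a toy illustration only: the label function and the observations list are invented for this example, and the entries simply restate the horse and dog examples above.

```python
def label(stimulus_change: str, behaviour_trend: str) -> str:
    """Label one observation (hypothetical helper).

    stimulus_change: 'added' or 'removed'
    behaviour_trend: 'increases' or 'decreases'
    """
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behaviour_trend]
    return f"{sign} {kind}"


# Each training situation is described twice, once per behaviour in focus:
# (situation, behaviour in focus, stimulus change, behaviour trend)
observations = [
    ("horse on a halter", "pulling forward", "added", "decreases"),
    ("horse on a halter", "walking alongside", "removed", "increases"),
    ("dog greeting", "sitting", "added", "increases"),
    ("dog greeting", "wriggling about", "removed", "decreases"),
]

for situation, behaviour, stimulus_change, trend in observations:
    print(f"{situation}: {behaviour} is subject to {label(stimulus_change, trend)}")
```

Each situation appears twice with a different label, because the label belongs to the behaviour being described and not to the training session as a whole.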

A dog being trained to sit down using positive reinforcement (with ‘standing’ or ‘jumping about’ being negatively punished)

So, if you read that a trainer only uses positive reinforcement, you should realise by now that this is actually impossible! You can’t positively reinforce some behaviours without negatively punishing others. It’s very likely that such trainers mean that they have a philosophy of rewarding desired behaviours rather than using aversive methods for undesired ones. That’s great – but the use of the term ‘reinforcement’ is confusing, and it’s best to check what they actually mean. Similarly, some trainers might say things like ‘we never use punishment, only negative reinforcement’. Your alarm bells should be ringing again here, because in order to negatively reinforce one behaviour, you must inevitably positively punish another. For example, trainers who use a lunge line to move a horse away are both negatively reinforcing moving away and positively punishing standing still.

Hopefully this blog has helped a bit with the basic terminology used in learning theory. ‘Punishment’ and ‘reinforcement’ have very specific and limited meanings. Misunderstandings occur where further meaning is attached to these terms, for example by implying what the animal might feel, or by leading people into thinking something is ‘positive’ for animals when it may not actually be so. In my opinion, the use of these terms is somewhat outdated, and it would probably be better to move on to less ambiguous terminology. But as people still use them, it’s important to know what they do, and particularly what they don’t, mean.

In future blogs, I’ll come back to other terms used in learning, and discuss a bit more about the factors that influence what, how and when animals learn in the real world. I’ll also look at the controversial topic of using reward- or aversion-based training methods in companion animals.