The prisoner's dilemma is a type of non-zero-sum game. In this game theory problem, as in many others, it is assumed that each individual player is trying to maximise his own advantage, without concern for the well-being of the other player. The Nash equilibrium of the game does not lead to a jointly optimal solution: in equilibrium, each prisoner chooses to defect even though the joint payoff of the players would be higher if both cooperated. Unfortunately (for the prisoners), each player has an individual incentive to cheat even after promising to cooperate. This is the heart of the dilemma.

In the iterated prisoner's dilemma, where the game is played repeatedly, cooperation may arise as an equilibrium outcome. Because the game is repeated, each player has an opportunity to punish the other for previous non-cooperative play. Thus, the incentive to cheat may be overcome by the threat of punishment, leading to a superior, cooperative outcome.

1 The classical prisoner's dilemma

The classical prisoner's dilemma (PD) is as follows:

Two suspects, you and another person, are arrested by the police. The police have insufficient evidence for a conviction and, having separated the two of you, visit each of you and offer the same deal: if you confess and your accomplice remains silent, he gets the full 10-year sentence and you go free. If he confesses and you remain silent, you get the full 10-year sentence and he goes free. If you both stay silent, all they can do is give you both 6 months for a minor charge. If you both confess, you each get 6 years.

It can be summarized thus:

             | You Deny                          | You Confess
He Denies    | Both serve six months             | He serves ten years; you go free
He Confesses | He goes free; you serve ten years | Both serve six years

Let's assume both prisoners are completely selfish and their only goal is to minimize their own jail terms. As a prisoner you have two options: to cooperate with your accomplice and stay quiet, or to betray your accomplice and confess. The outcome of each choice depends on the choice of your accomplice; unfortunately, however, you don't know the choice of your accomplice. Even if you were able to talk to him, you couldn't be sure whether to trust him.

If you expect your accomplice to cooperate and stay quiet, the optimal choice for you is to confess, as this means you go free immediately while your accomplice sits in jail for 10 years. If you expect your accomplice to confess, your best choice is to confess as well, since you are then at least spared the full 10-year sentence and serve 6 years instead, as does your accomplice. If, however, you both decide to cooperate and stay quiet, you would both be out in 6 months.

Confessing is a dominant strategy for both players. No matter what the other player's choice is, you can always reduce your sentence by confessing. Unfortunately for the prisoners, this leads to a poor outcome where both confess and both get heavy jail sentences. This is the core of the dilemma.
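The dominance argument can be checked mechanically. The following is a minimal sketch in Python that encodes the sentences from the scenario above; the names `YOUR_SENTENCE`, `best_response` and `is_dominant` are illustrative assumptions, not part of any standard library.

```python
SILENT, CONFESS = "silent", "confess"

# (your_choice, his_choice) -> your sentence in years, taken from the scenario above.
YOUR_SENTENCE = {
    (SILENT, SILENT): 0.5,
    (SILENT, CONFESS): 10.0,
    (CONFESS, SILENT): 0.0,
    (CONFESS, CONFESS): 6.0,
}


def best_response(his_choice):
    """Return the choice that minimises your sentence, given his choice."""
    return min((SILENT, CONFESS), key=lambda mine: YOUR_SENTENCE[(mine, his_choice)])


def is_dominant(choice):
    """True if `choice` gives a strictly shorter sentence whatever he does."""
    other = SILENT if choice == CONFESS else CONFESS
    return all(
        YOUR_SENTENCE[(choice, his)] < YOUR_SENTENCE[(other, his)]
        for his in (SILENT, CONFESS)
    )


for his in (SILENT, CONFESS):
    print(f"If he chooses {his}, your best response is {best_response(his)}")
print("Confessing is dominant:", is_dominant(CONFESS))
```

Because the game is symmetric, the same check applies to your accomplice with the roles swapped.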

Reasoned from the perspective of the group (of two prisoners), the best outcome would be for both prisoners to cooperate with each other, as this would reduce the total jail time served by the group to one year. Any other combination of choices would be worse for the two prisoners considered together. However, by each following his own selfish interest, the two prisoners each receive a lengthy sentence.
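The group totals can be tabulated directly. This is a small sketch under the same sentences as the scenario above; the dictionary layout and the name `total_years` are assumptions made for illustration.

```python
from itertools import product

SILENT, CONFESS = "silent", "confess"

# (your_choice, his_choice) -> (your_years, his_years), from the scenario above.
SENTENCES = {
    (SILENT, SILENT): (0.5, 0.5),
    (SILENT, CONFESS): (10.0, 0.0),
    (CONFESS, SILENT): (0.0, 10.0),
    (CONFESS, CONFESS): (6.0, 6.0),
}


def total_years(outcome):
    """Combined jail time of both prisoners for one outcome."""
    return sum(SENTENCES[outcome])


totals = {
    outcome: total_years(outcome)
    for outcome in product((SILENT, CONFESS), repeat=2)
}
best_for_group = min(totals, key=totals.get)

# List outcomes from best to worst for the pair.
for outcome, years in sorted(totals.items(), key=lambda item: item[1]):
    print(outcome, years)
print("Best for the group:", best_for_group)
```

Mutual silence costs the pair one year in total, while mutual confession, the outcome selfish play leads to, costs twelve.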

If you had an opportunity to punish the other player for confessing, then a cooperative outcome could be sustained. The iterated form of this game (discussed below) presents an opportunity for such punishment. In that game, if your accomplice cheats by confessing this time, you can punish him by cheating next time yourself. Thus, the iterated game builds in an opportunity for punishment absent in the classic one-period game.
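How punishment changes the incentive can be illustrated with a toy simulation. The sketch below makes several assumptions that are not in the text: the same sentences are applied afresh in every round, the game lasts a fixed 10 rounds, and the two strategies (`punisher`, which copies the other player's previous move, and `always_confess`) are hypothetical examples chosen for illustration.

```python
SILENT, CONFESS = "silent", "confess"

# (my_choice, his_choice) -> my sentence in years for a single round.
YEARS = {
    (SILENT, SILENT): 0.5,
    (SILENT, CONFESS): 10.0,
    (CONFESS, SILENT): 0.0,
    (CONFESS, CONFESS): 6.0,
}


def punisher(opponent_history):
    """Stay silent first, then repeat whatever the opponent did last round."""
    return opponent_history[-1] if opponent_history else SILENT


def always_confess(opponent_history):
    """Cheat in every round, regardless of history."""
    return CONFESS


def play(strategy_a, strategy_b, rounds):
    """Return the total years accumulated by each player over `rounds` rounds."""
    moves_a, moves_b = [], []
    years_a = years_b = 0.0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's past moves
        b = strategy_b(moves_a)
        years_a += YEARS[(a, b)]
        years_b += YEARS[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return years_a, years_b


print("punisher vs punisher:", play(punisher, punisher, 10))
print("always_confess vs punisher:", play(always_confess, punisher, 10))
```

Against a punishing opponent, the cheater gains only in the first round and is then held to mutual confession, ending with 54 years instead of the 5 he would have accumulated by staying silent throughout.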

2 A similar but different game

The cognitive scientist Douglas Hofstadter (see References, below) once suggested that people often find problems such as the PD easier to understand when they are illustrated in the form of a simple game, or trade-off. One of several examples he used was two people meeting and exchanging closed bags, with the understanding that one of them contains money, and the other contains an item being bought. Either player can choose to honor the deal by putting into his bag what he agreed, or he can defect by handing over an empty bag. In this exchange game, as in the one-shot PD, defection is always the best course for each player individually.

3 The PD payoff matrix

In the same article, Hofstadter also observed that the PD payoff matrix can, in fact, be written in a variety of ways, as long as it conforms to the following principle:

T > R > P > S

where T is the temptation to defect (i.e., what you get when you defect and the other player cooperates); R is the reward for mutual cooperation; P is the punishment for mutual defection; and S is the sucker's payoff (i.e., what you get when you cooperate and the other player defects).

(It is also usually the case that (T + S)/2 < R, and this is required in the iterated case.)

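The ordering T > R > P > S, then, ensures that, whatever the precise numbers in each part of the payoff matrix, it is always 'better' for each player to defect regardless of what the other does: T > R means defecting pays more when the other player cooperates, and P > S means it pays more when the other player defects.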
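The two conditions can be expressed as simple checks. The sketch below applies them to the jail terms from section 1, converted to payoffs by negating the years served so that a larger number is better; that conversion and the function names are assumptions made for illustration.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the payoff ordering T > R > P > S."""
    return T > R > P > S


def satisfies_iterated_condition(T, R, S):
    """Check (T + S)/2 < R: taking turns exploiting each other must not beat mutual cooperation."""
    return (T + S) / 2 < R


# Payoffs as negated years in jail (an assumption of this sketch).
T = 0.0    # you confess, he stays silent: you go free
R = -0.5   # both stay silent: six months each
P = -6.0   # both confess: six years each
S = -10.0  # you stay silent, he confesses: ten years

print("Ordering holds:", is_prisoners_dilemma(T, R, P, S))
print("Iterated condition holds:", satisfies_iterated_condition(T, R, S))
```

The jail-term version of the game satisfies both conditions, so it is a prisoner's dilemma in the sense of the principle above.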

Following this principle, and simplifying the PD to the above 'bag switching' scenario (or an Axelrod-type two-player game, see below), we get the following 'canonical' PD payoff matrix, that is, the one normally shown in literature on the subject. Each cell lists the row player's payoff first and the column player's payoff second:

          | Cooperate | Defect
Cooperate | 3, 3      | 0, 5
Defect    | 5, 0      | 1, 1
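As a rough check of the equilibrium claim, the canonical matrix above can be searched for cells from which neither player gains by changing only his own move. This is a minimal sketch; reading each cell as (row player's payoff, column player's payoff) and the names `PAYOFF` and `is_nash` are assumptions of the example.

```python
from itertools import product

C, D = "cooperate", "defect"

# PAYOFF[(row_move, col_move)] = (row player's payoff, column player's payoff)
PAYOFF = {
    (C, C): (3, 3), (C, D): (0, 5),
    (D, C): (5, 0), (D, D): (1, 1),
}


def is_nash(row, col):
    """True if neither player can do better by unilaterally switching moves."""
    row_pay, col_pay = PAYOFF[(row, col)]
    row_ok = all(PAYOFF[(alt, col)][0] <= row_pay for alt in (C, D))
    col_ok = all(PAYOFF[(row, alt)][1] <= col_pay for alt in (C, D))
    return row_ok and col_ok


equilibria = [cell for cell in product((C, D), repeat=2) if is_nash(*cell)]
print("Nash equilibria:", equilibria)
```

The only cell that survives is mutual defection, even though mutual cooperation pays each player 3 rather than 1.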

In "win-win" terminology the table would look like this:

          | Cooperate             | Defect
Cooperate | win / win             | lose much / win much
Defect    | win much / lose much  | lose / lose



