- Hey Charles. How you been?
- I've been doing just fine. How've you been?
- Good. It's been a while. I understand you went to Africa.
- I went to Africa. The entire continent, or at least parts of it.
- How's Africa?
- It's fine. It's, warm.
- The entire continent?
- [LAUGH] The entire continent. Or at least Kenya and Nigeria.
- Particularly in Nigeria. Which was
- So, do you want to do a shout
out to any of, any of your friends from there?
- Yeah, I'd like to say hi to Chairman and I'd like to say hi
to Prof. Oh, and Good Times. I would like to say hi to Good Times. He
was awesome. Okay, so, Michael, I am back now so that we can talk about the last little section that we're going to do in this class, and it's game theory.
- That sounds cool. What does that have to do with machine learning?
- So that's an interesting question, because, in fact, game theory comes from a tradition way outside of machine learning and AI. But you'll see in a moment why it is I care about game theory, and really this is a very natural extension to all the stuff we've been talking about with reinforcement learning.
- Interesting. Are we talking about games like Monopoly?
- In some sense, we are. Because all of life is like
Monopoly. So in fact, what is game theory? Maybe that's what we can
do. We can just try to define game theory, and maybe it'll
be clear why it is we're worried about this at all. Seem fair?
What is Game Theory
Alright Michael, so there's lots of definitions
of game theory that we could use.
One that I like in particular is
that game theory is the mathematics of conflict.
- Hm, [CROSSTALK] that's interesting.
- I think it's kind of interesting. Or generally it's the
mathematics of conflicts of interest when trying to make optimal choices.
- because I feel like a lot
of people have their own conflicts with mathematics.
- I think everyone but mathematicians have
their conflicts with mathematics. I think that's fair.
- I see.
- But can you see how worrying about the mathematics of conflict might be a sort of natural next thing to think about after you've learned a lot about reinforcement learning? I guess the next bullet kind of suggests a trend. So, we've been talking about decision making, and it's almost always in the context of a single agent that lives in a world and is trying to maximize reward. But that's kind of a lonely way to think about things, so what if there are other agents in the world with you?
- Right and of course evidence suggests that there are in
fact other agents in the world with you. And what we've been
doing with reinforcement learning which, you know, has worked out very well
for us, is we've been mostly pretending that those other agents are
just a part of the environment. Somehow all the stuff that
the other agents do is hidden inside of the transition model. But
truthfully it probably makes sense if you want to make optimal decisions
to try to take into account explicitly the desires and the goals
of all the other agents in the world with you. Does that seem fair?
- Right. So that's what game theory helps us to do, and then at the very end I think we'll be able to tie what we're going to learn directly back into the reinforcement learning that we've done, and even into the Bellman equation.
- Oh, okay, nice.
- Yeah, so that is going to work out pretty well, but we have to get there first, and there's a lot of stuff that we have to do to get there. But right now, what I want you to think about is this notion that we're going to move from the reinforcement learning world of single agents to a game theory world of multiple agents, and tie it all back together. A general note that I think is worthwhile is that game theory sort of comes out of economics. And in fact, if you think about there being millions and millions of agents, in some sense that's economics, right? Economics is kind of the math, and the science, and the art of thinking about what happens when there are lots and lots of people with their own, possibly conflicting, goals trying to work together to accomplish something. And so what game theory does is give us mathematical tools to think about that.
- I feel like I feel like other fields
would care about some of these things too, like sociology.
- And what about, I could kind
of imagine biology caring about these things, too.
- Even biology? I like the idea of biology. Why would biology care about this?
- Well, I guess the way you described it
in terms of lots of individual agents that are interacting.
Like, you know, creatures that live and reproduce. I
feel like they, they have some of those same issues.
- Sure. So certainly biology at the level of entities, at the level of mammals or the level of insects, you might be able to think about it that way. But perhaps even at the level of genes and at the level of cells, little viruses and bacteria, you could possibly think about it that way.
- because they're in conflict too, I guess.
- Yeah. Now, there's probably this notion of intention. It's not entirely clear what that means here, and I think implicit in what we're doing is this notion of intention and explicit goals, as opposed to ones that are kind of built into your genes, but I think that's a perfectly reasonable way of thinking about it. I think the lesson from this discussion, though, is that what game theory captures for us, or what we would like for it to capture for us, is ways of thinking about what happens when you're not the only thing with intention in the world, and how you incorporate the goals of other people who might not have your best interests at heart, or might have your best interests at heart. How do you make that work? And so if you think about that problem, then I think it makes sense that it's been increasingly a part of AI over the years, and in some ways machine learning has started to think of it as being a mainstream part of what we do.
- Hence why it's worth talking about today. Okay. Sounds good.
- Okay. Let's, let's try to make this
concrete with a very simple sort of example.
A Simple Game
Okay, Michael, so here's a nice little concrete example to think about this. Let's pretend that we're no longer in a world of a single agent, like we've been thinking about with reinforcement learning, but we've gone full-blown generality to two agents, [LAUGH] okay? And let's call those agents A and B, and they're going to be in a very simple game where A gets to make a choice, and then B gets to make a choice, and then A might be able to make a choice. So this tree that I've drawn over on the right is going to capture the dynamics of this specific game that I'm imagining. So, these little nodes here, these circles, represent states, and we can think about those states in the same way that we talked about states in reinforcement learning in the past. The edges between these nodes represent actions that one could take. So this should look familiar; this is just basically a game tree, like anyone who's taken an AI course might have seen. Okay?
- I guess so.
It doesn't look like a very interesting game.
- But I guess it's a sort of abstract example.
- Yes. It's a very simple game just so
that we can get a handle on some basic concepts.
So, in particular, if you look at the details of this game, you start out in state one, okay? And A gets to make a choice between two actions, going left or going right. If A goes right, she ends up in state three. If she goes left, she ends up in state two. Regardless, B gets to make a choice. From state three, B can choose to go right, and really that's all that can happen. And what happens if B goes right from state three is that a value of plus two is assigned to A, okay? All of these numbers at the bottom, at the leaves here, are going to be values, or rewards if you want to think about them that way, that are assigned to player A. And, in fact, for the purposes of this game, it's going to be the case that B always gets the opposite of what A gets. So, if A gets plus two, then B gets minus two; if A gets plus four, then B gets minus four; if A gets minus one, B gets plus one. Does that make sense?
- Yeah, though could you write it down so that I won't forget?
- Okay, that's fine. So, by the way, this is a very specific type of game here, and it has a name, which I want to get right. This is a two-player, zero-sum, finite, deterministic game of perfect information.
So as a, as a title or description of, of this
kind of game, does this make sense to you? Do you
think you know what they all mean, what all those words mean?
- So, two players because it's a and b,
zero-sum. Because you said the leaves are a's rewards and
b's reward is the negation so if you add
the two rewards together you're always going to get zero.
- [LAUGH] Ok.
- That's not exactly right, actually. So zero-sum really just means that the sum of the rewards is always a constant.
- And that constant needs to be zero.
- It doesn't need to be zero.
- So if it added up to eleven, that would still be zero sum?
- If it added up to eleven everywhere. Yes.
- Huh, okay, interesting choice of terminology. Finite, I don't know, everything seems to be finite here. There's no infinite number of choices or states or depth.
- Deterministic. Well, again, thinking about it in an MDP-ish kind of way, there are no sort of stochastic transitions in this particular picture.
- Right. So if I'm in state two and I go right, I always end up in state four, period.
- Game, I guess, is because it's more than one player?
- Of perfect information. That doesn't quite sound like the same terminology that we used in the MDP setting. But I'm wondering if that's like, I know what state I'm in when I'm making a decision. So it's like an MDP as opposed to a POMDP.
- Well, that's exactly right. It's that you know what state you're in. Yeah, that's exactly what it means. It's like being in an MDP versus a POMDP. That's a great analogy.
- Cool. And does it matter that it's a tree like this? Because when we were looking at MDPs, we had more complex structures of graphs and things.
- Well, you can think of this as unrolling the MDP, if you want.
- So then those states are sort of
time stamped and history stamped. For, yeah, for
the purposes of this discussion, yes. And that's
a perfectly reasonable way of thinking about it.
- But, okay.
- But in general, we're going to be thinking
about game trees, but actually, we're going through
all of this for nothing, because we're going to
discover pretty soon that none of this matters.
But give me a couple of slides to get there, okay?
- [LAUGH] Sure.
- Okay. So this is about the simplest, or at least the least complicated, game that you can think about: a two-player, zero-sum, finite, deterministic game of perfect information. You know, basically, I can look at this tree, I know everything I need to know, and I can make decisions about what action I might want to take in order to maximize my reward. Okay?
- All right. Now, in MDPs, of course, we had this notion of policies, right? You remember what a policy was, Michael?
- Mapping from states to actions.
- So in the game theory world, we have something very similar to policies. We call them strategies. So, all a strategy is, is a mapping of [LAUGH] all possible states to actions. So for example, here's a strategy that A might have: when in state one, go left, and when in state four, also go left.
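Written as code, the example strategy is just a small mapping from decision states to actions, much like an MDP policy. A minimal sketch (the variable name is illustrative; the state numbers follow the lecture's game tree):

```python
# The example strategy for A: an explicit mapping from every state where
# A might move to an action, just like a policy in an MDP.
strategy_A = {1: "L", 4: "L"}  # go left in state 1 and in state 4

print(strategy_A[1])  # the action A takes at the root state
```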
- Seems like a terrible strategy.
- Does it?
- Well, just, if nothing else, just in state 4.
- Sure, but it's a strategy, right?
- Okay, but it's just, it's a strategy for one of the players.
- Right, exactly, each player has a strategy. And that makes sense, right? Before, when we talked about a policy that mapped states to actions, there was only ever one player, only ever one agent, and so we didn't have to worry about what other strategies there were. Here, when we talk about a strategy, it's always with respect to one of the players of the game. Okay, so, question. I've just given you one strategy, which is what A does in all the states A could potentially end up in. How many other strategies are there for A?
- For A? Okay, that sounds like a quiz.
- That does sound like a quiz. Let's make it a quiz.
A Simple Quiz
- Okay, Michael, so here's the quiz. Because you made me make it a quiz, I decided to make it even harder. So I want you to tell me how many different strategies there are for A and how many different strategies there are for B.
- And you are talking about just deterministic mappings only, right?
- Well, we are in a two player
zero sum, finite, deterministic game of perfect information.
- So, [LAUGH].
- So yes, I mean deterministic.
- Okay, alright. Because, you know, it could be that it might be helpful to be stochastic.
- Oh, that's true, that's true. So in fact, that's a good point. I was going to mention that later, but I'll mention it now. What I just wrote down here is actually not just a strategy; it's something called a pure strategy.
- Hm, it's always about purity with you.
- It is, purity of chocolate and bacon mainly. But, yes. So, these are
simple pure strategies. Okay, so how many pure strategies are there for A and B.
- Okay, then let's go.
A Simple Quiz
Okay, Michael, you ready?
- Alright, what's the answer?
- So I was thinking about A before you threw in the B. So let me,
let me not think about B yet. I'll just think about A. So you had said
in state one, it can go either left or right. And in state four, it can
go either left or right. So boy, that
sounds a lot like two times two equals four.
- Yes that's exactly right. But generally speaking, so actually walk
me through that again Michael. How did you get two times two?
- So, well I, I had a little choice about how
to think about it. One is that in some sense, if you
go right from one, then you don't really have to make another choice.
- But if you go left, then you have this other choice
to make of either left or right. So it's, you if, if
you're just writing it down as a mapping from state to action,
you've got two choices at state one, and two choices at state four.
- And so that is two times two, right?
You can make, independently choose each of those.
- Right, that's exactly right. So in fact, it's important there that even though, if I had gone right at state one, I would never have to make another choice, because I can't reach state four. In order to be a strategy, you have to basically say what you would do in all states where you might end up.
- Okay, that's fair.
- Okay, alright so what about B, using that incredible reasoning?
- [LAUGH] So yeah, so B seems a little trickier, because if player A sends us down to the left, then we have a choice of three. If player A sends us down to the right, we have a choice of one, which is really no choice at all.
- You can have any color car you want as long as it's black.
- Yeah, like my Tesla.
- I was thinking the Model T, but maybe T stands for Tesla.
- T does stand for Tesla.
- So, by a definition counting only, you know, sort of reachable strategies, it would be one answer. But if you're defining it the way you're defining it, it's going to be three times one, or three.
- Yeah. And that's exactly right. Good Michael.
So, let's actually write down what those strategies are.
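The counting argument above can be sketched in a few lines of Python. The dictionaries of choices and the helper function are assumptions of this sketch; the state numbers and action labels follow the lecture's example:

```python
# Enumerate the pure strategies counted above. A pure strategy must name
# an action for every state where the player could move, even states the
# strategy itself makes unreachable.
from itertools import product

A_CHOICES = {1: ["L", "R"], 4: ["L", "R"]}   # A moves at states 1 and 4
B_CHOICES = {2: ["L", "M", "R"], 3: ["R"]}   # B moves at states 2 and 3

def pure_strategies(choices):
    """All mappings from a player's decision states to actions."""
    states = sorted(choices)
    return [dict(zip(states, combo))
            for combo in product(*(choices[s] for s in states))]

print(len(pure_strategies(A_CHOICES)))  # 2 * 2 = 4 strategies for A
print(len(pure_strategies(B_CHOICES)))  # 3 * 1 = 3 strategies for B
```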
Another Simple Quiz
Okay, Michael, so I have written out, based upon your rather impressive way of thinking about it, all the possible strategies for A and all the possible strategies for B. A can go left-left, left-right, right-left, or right-right, for states one and four respectively. And B can go left-right, middle-right, or right-right.
- Right. In particular, in state three we can only go R. Which makes me a?
- A pirate.
- Oh. That makes more sense.
- It does make more sense. Okay. So, now, why did I write it this way, Michael? Well, I wrote it this way because if I write it this way, with all the strategies that B has up here and all the strategies that A has over here, I actually form a matrix. And I can put in each of the cells of the matrix the value of taking a particular strategy from A and a particular strategy from B. Does that make sense?
- Yeah, that's very clever.
- Yes, it is very clever. I'm very happy that I came
up with it entirely on my own. Okay, so let's start filling
in these numbers. Or, instead of filling in the numbers ourselves, we
could ask the students to do it by making it a quiz.
- Nice, I see.
- Shall we do that?
- Doesn't seem very hard.
- Okay, so let's make certain everyone here understands exactly what we're asking you to do. We're saying that if, for example, A takes its first strategy, go left in state one and left in state four, and B takes its first strategy, which is go left in state two and right in state three, what is the value for A that will result? So, let's actually do the first one as an example and ask everyone to do the rest of them. Does that seem fair?
- Yeah, that's what I was going to suggest too.
- Okay, good. So let's see. If A chooses to go left in state one, since A goes first, we'll end up in state two, right? And then, in this first strategy, B goes left in state two, which means we will end up going down this path, and the value there is plus seven. So seven is in fact the value of this game with respect to A. Now, we know, because this is a two-player, zero-sum, finite, deterministic game of perfect information, that if A gets a value of seven, B gets a value of minus seven. So we could write minus seven here, or seven comma minus seven here. But since we know that it's equal and opposite, let's just write down the value for A, okay? Just for compactness' sake. Does that seem fair to you?
- Okay, cool. So with that
in mind, let's see if we can fill out
the rest of this table. Think you can do it?
- I think so.
- Okay. Then.
- Can you give me a place to write everything?
- It will magically appear. Go.
Another Simple Quiz
Alright, Michael, tell me the answer. Let's go across.
- Alright. Across. So, the first row, it's all left, left, but the second left doesn't matter because that's in state four. Oh, it might matter. I take it back. So left, left, you did, was seven. Left, and then player B goes middle.
- That would be three, so then the payoff is three.
- Mm hm.
- The first player does left, left and the second player does right, right; then we're going to go left, right, left again, for a minus one.
- Mm hm. Oh wait, hang on. Left, right, left. Yes, yeah. The next row should be exactly the same as this one, except for in that third case. So it should be seven, three. And the difference there: the R in four only matters in this case where we've taken the, sorry, the right action from state two. And so that should give us a plus four, so four.
- Correct so far.
- All right, halfway there. So, all right, now we've got something different: now player A is going to the right, so now actually player A's choice in state four never matters, so the next two rows are going to be identical to each other. And, actually, [LAUGH] all of these numbers are going to be identical to each other: player A goes right, player B has to go right, there's no choice. So we're going to get twos no matter what now, so it's going to be two, two, two, and then the second row will repeat that: two, two, two.
- Okay. So there you go, Michael, you are correct. You can feel very good about yourself. Very good, very good. So you're right, Michael, we do have this nice little matrix and we have these numbers, and you know what's really interesting about this? That we can figure out what the payoffs are for B without having to do any additional work.
- That's true.
- But it's even more interesting than that, which is: now that I've written down this matrix, nothing else matters. This whole tree, all these rules.
- None of it matters. It's irrelevant.
- Wait, wait, but. Okay.
- Because everything about the game is captured here in this matrix. This is actually called the matrix form of the game, and it comprises everything you need to know about it. It doesn't matter what it means to go left or right, or what it means to go middle, or whatever. The point is, by following this strategy and this strategy, you end up with seven for A. This strategy and this strategy, you end up with two for A, period. And how you got there does not matter.
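The reduction just described, playing every pair of pure strategies through the tree and recording A's payoff, can be sketched in Python. The nested-dict tree encoding and the function names are assumptions of this sketch; the payoffs match the worked example:

```python
# Reduce the game tree to its matrix form: for each pair of pure
# strategies, follow the tree to a leaf and record the value for A.
# B gets the negation, since the game is zero-sum.
GAME_TREE = {
    "s1": ("A", {"L": "s2", "R": "s3"}),
    "s2": ("B", {"L": +7, "M": +3, "R": "s4"}),
    "s3": ("B", {"R": +2}),
    "s4": ("A", {"L": -1, "R": +4}),
}

def play(strat_a, strat_b, state="s1"):
    """Follow both strategies from the root; return the leaf value for A."""
    player, moves = GAME_TREE[state]
    nxt = moves[(strat_a if player == "A" else strat_b)[state]]
    return play(strat_a, strat_b, nxt) if nxt in GAME_TREE else nxt

# A's four pure strategies (states s1 and s4); B's three (states s2 and s3).
A_STRATS = [{"s1": a, "s4": b} for a in "LR" for b in "LR"]
B_STRATS = [{"s2": c, "s3": "R"} for c in "LMR"]

MATRIX = [[play(a, b) for b in B_STRATS] for a in A_STRATS]
for row in MATRIX:
    print(row)  # rows: [7, 3, -1], [7, 3, 4], [2, 2, 2], [2, 2, 2]
```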
- But it does creep me out a little bit that what you're saying is that all that matters is the matrix.
- You know
we should write a movie about that.
- I think it's been done and it's scary.
- Well, not the first one.
- It was kind of scary. Like, people are trapped in this big box, and there's slurpiness.
- But they're happy.
- But they don't know they're happy.
- What do you mean, they don't know they're happy?
- What it means to be, Michael, Michael, Michael.
- Speaking of which now how do we, how do
we figure out. What do we do with this now?
- So that's the big question right?
So, what was the whole point of doing reinforcement learning? The whole point
of doing reinforcement learning was to
optimize your long term expected reward right?
- And so you pick the policy that would get you to the best place possible. So here's a question I have for you. Given that this is everything we need to know, this little matrix of values, these are all the policies A could choose from. We're calling them strategies here, but, you know, strategies, policies. There are four A can choose from, and there are three B can choose from. What do you think A and B will actually do?
- Okay, so, you know? A wants the rewards to be high.
So, seven is the highest, so A should choose the upper left corner.
- But A can't choose the upper left corner.
- A can only choose a strategy. B gets to choose some strategy.
- I see. So A is choosing the row and then B
gets to choose the column.
- So then B should choose that also because that's what A wants.
- No, B wants to maximize what B gets. And remember, B always gets the opposite of A, because it's a two-player, zero-sum, finite, deterministic game of perfect information.
- So, so you're saying that if we, if A chooses the first row.
- Say. Mm-hm.
- Which is the left, left strategy, and B now has a choice between three values, and will choose the one that is worst for A, which would be the minus one.
- Okay then, so then what if A chooses the second row?
- Okay if A chooses the second row?
- So now, again, B is not going to let A have that seven, which is kind of sad, but B still can't make it too bad for A, so B would choose the middle-right strategy.
- And then that would be a three for A, a minus three for B. But wait, hang on. So that was done thinking of it as A making the choice first.
- That doesn't seem fair.
- Okay, well then do it the other way.
- Alright, so if B chooses first, then B could choose
the far right column because that's where the minus 1 is.
- That's where it's going to be happiest. See, B seems kind of mean. It's like it's happiest when others are suffering. Well, but A is also happiest when others are suffering.
- I see. So, if B chooses that column
then A wants to choose the second row and get the four, so
B could choose the middle column, which then A would choose the two.
- A would choose what?
- One of the twos. One of the bottom two rows.
- No he wouldn't.
- Oh, right. A is trying to maximize, so A would choose
one of the top two rows. Oh, yeah, that makes more sense.
- Then A would choose a three, which is better for B than having chosen the far right. And B could choose the far left column, but then A would choose one of the sevens, and B would be unhappy with that. So we'd end up with B choosing the middle column and A choosing either the top or the second row. Mhm.
- So that's kind of the same answer we got the other way.
- That's exactly right.
- And is that, was that just luck, or
did you make this example to make that happen?
- No, I didn't make this example to make that happen. In fact, the process you went through is exactly the process you would expect to go through, and you will always end up with the same answer, for a two-player, zero-sum game, you know, deterministic, finite, all those other words.
- The process you just went through, is exactly the
process you would expect to go through. So let's see
if we can, if we can be a little bit
more, you know, explicit about the process you went through.
So, in particular, the way I heard you write it down was that
A must consider the sort of worst case counter strategy by B right?
- I see, because when A chooses the row, then B was going to
make things bad for A along that row, so that's the counter strategy you mean.
- Right, and in fact, when you try to do it the other way, with B, well, B has to do the same thing. B has to consider the worst-case counter as well. And in this particular case, the way we set it up, where the values are for A, A is always therefore trying to maximize, and B is always trying to minimize A, which works out to be the same thing as maximizing itself. Does that make sense?
- Yeah! I mean, other than the fact that, you know, I named my first child Max, I really wanted to name my second child Min.
- [LAUGH] That actually would have been pretty cool. Why didn't you do that?
- Because I'm married to someone with more sense than I have.
- Yeah, I understand, I completely understand. Okay, so, A is trying to maximize, B is trying to minimize, and they both have to consider the worst case that the other will do. And what that's going to do is force them through exactly the same process you went through. Each just figures: I'm going to make my choice so that, when my opponent makes the counter choice, the worst-case choice, I will end up as best as I can. So A is going to basically try to find the maximum minimum, and B is trying to find the minimum maximum.
- In fact that strategy has a name, or that way of
thinking about it has a name. What do you think it's called?
- The minimum maximum.
- Yes, or minimax.
- Which is a movie production company, I think.
- No, that was Miramax. Do you recall where you have seen minimax before, Michael? In some other class that you once taught, or once took, years and years ago?
- Mmm. No.
- Minimax was exactly the algorithm that we used for game search in intro to AI.
- Oh, which was a game tree, which is just what we
started with in this case. Even though we turned it into a matrix.
- Exactly. So in the end, you know, this matrix induces a game tree, if you want to think about it that way, or the game tree induces a matrix, if you want to think about it that way. And the strategy in sort of basic AI search, and the strategy in game theory, is minimax, when you're in a two-player, zero-sum game of perfect information.
- So is there a way to do alpha beta pruning on this?
- All alpha-beta pruning does is give you a more efficient way of finding the answer.
- I see but it's the same answer no matter how you set it up.
- Right. Cool.
- That's right. Okay, so this is pretty cool. So, we have set up a kind of game where we have multiple agents, or at least two agents in this case, who have different strategies. And we actually sort of know that, in a world where it's a zero-sum game, and you know the other person is trying to minimize what you get and maximize what they get, the minimax strategy will actually give you an answer. In this case, by the way, we say that the value of this game for A is three. If A does the rational thing, and B does the rational thing, that is, each trying to maximize their own value, you will end up in this situation. That's kind of cool, don't you think?
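The rational back-and-forth above amounts to computing maximin and minimax directly on the matrix form of the game. A minimal sketch, with the matrix values taken from the worked example:

```python
# Rows are A's four pure strategies, columns are B's three; each entry
# is A's payoff (B gets the negation, since the game is zero-sum).
MATRIX = [
    [7, 3, -1],
    [7, 3,  4],
    [2, 2,  2],
    [2, 2,  2],
]

# A picks the row whose worst case (minimum over B's columns) is largest.
maximin = max(min(row) for row in MATRIX)

# B picks the column whose best case for A (maximum over the rows) is smallest.
minimax = min(max(row[j] for row in MATRIX) for j in range(len(MATRIX[0])))

print(maximin, minimax)  # both are 3: the value of the game for A
```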
- Very cool. I feel like there should be a theorem.
- There is in fact a theorem. I'm going to write it down.
Okay, Michael, so here's this theorem that you so desperately wanted. You ready?
- I'm going to read it to you, because you can't read. In a two-player, zero-sum, deterministic game of perfect information, minimax equals maximin.
- Alright you told us what minimax was,
but you didn't tell us what maximin was.
- Well, maximin is like minimax, except the other way around. So A is trying to, [LAUGH] you know, one side is trying to minimize the maximum, and the other side is trying to maximize the minimum.
- It's exactly what we described before, just depends upon whether you're
looking at it from a's point of view or b's point of view.
- Oh, I see, like, which, do you choose
a column first or do you choose a row first?
- Exactly. So whether you go A first followed by B, or B first followed by A, you're going to end up with the same result. And, more importantly, or at least as important, there always exists an optimal pure strategy for each player. In other words, you can solve these games, and you know what the answer is once you write down the matrix. You just do minimax, or you do maximin, and you end up with the proper answer, and now you know what the optimal players would do. Now, there is a subtlety here, which I got at a little bit in the previous slide when I talked about rational agents. What we're sort of assuming in everything that we discuss here is that people are always trying to maximize their rewards, okay? So we defined the reinforcement learning problem that way: my goal is to find a policy that maximizes my long-term expected reward. You know, I'm trying to get the best reward that I can. And what you're assuming here is that everyone else is doing the same thing, and they're assuming that everyone else is doing the same thing.
- Does that make sense?
- It does, though I'm a little bit stuck on this word optimal at the moment.
- Right. Well, that's what I'm trying to get at, actually. Optimal here really means I'm maximizing the reward that I can get, and I'm assuming everyone else is doing the same thing. And, furthermore, I'm assuming that they're assuming that everyone else is doing the same thing.
- So, I guess I'm wondering whether this theorem is vacuous, in the sense that we are defining optimal to be minimax.
- That is what we're defining optimal to be, I think. So that's a good question. I think I would unroll the vacuousness one level by saying this: optimal here basically has to be optimal with respect to what? And the "with respect to what" here is the underlying assumption that everyone is trying to maximize their rewards, and that everyone knows this. So, in a world where you have perfect information, and it's zero-sum, the strategies of minimax and maximin give you the same answer. And, furthermore, there is always some place where the best column and the best row cross, and that is always going to be the solution to that particular game. Now, if we weren't in a two-player, zero-sum, deterministic game of perfect information, that might not be the case. But in a case where we're in this sort of simplest version, where everyone is being rational, that is, optimal, that is, trying to maximize their own reward, and assuming everyone else is maximizing their own reward, this is the right thing to do. Now, I've got a little weasel word here as well, which we're going to get to in a moment, which is this notion not just of a strategy, but of a pure strategy. There's a reason why we have the notion of pure strategies, because in the end, as things get more complicated, we're going to have to do it with impure strategies.
- Okay, but are you with me so far?
- I think so, yeah.
- So basically all that stuff we did in AI with game trees and search
is kind of what you would expect people to do if they knew everything. [LAUGH]
- So, so, I feel like I could prove this theorem in the case of trees,
because you just kind of
propagate values from leaves up to the root.
- And, it, it doesn't matter. There is no notion of who goes first or
who goes second. So there's just going to be
one answer, to that process. It's not obvious
to me how to show it once you've turned the tree into a matrix, that
that matrix, I guess because it captures the
same information, it ought to be the case that
this is still true, but like, I'd have to kind of sit down and think it through.
- No, and it's, so, so, to help you
think it through, I guess what I would suggest is
if you have the matrix, you can create a
tree that's consistent with it. Because every row and every
column represents a strategy. You don't know what
that strategy is, but you can, since it's
a finite matrix. You can construct a tree
that is consistent with that matrix. In fact, possibly
an infinite number of them, I'm not sure,
but you can certainly construct at least one that
is consistent with it. And then once you
have the tree you just do what you said.
- Alright, good. So we've got this
fundamental result and now what we're going to
do is we're going to try to be a bit more interesting. But it
is important to go through this because now we've
got some basic vocabulary and some basic building blocks okay?
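As an aside, the minimax and maximin reasoning described above can be sketched directly on a payoff matrix. Here is a minimal Python sketch, not part of the lecture, using a hypothetical payoff matrix with a saddle point:

```python
# Payoff matrix from A's point of view (A picks a row, B picks a column).
# A maximizes, B minimizes. These values are a hypothetical example.
payoff = [
    [3, 5],
    [2, 4],
]

def maximin(matrix):
    """A's best guaranteed payoff: maximize over rows the worst-case entry."""
    return max(min(row) for row in matrix)

def minimax(matrix):
    """B's best guaranteed bound: minimize over columns the best-case entry."""
    return min(max(col) for col in zip(*matrix))

print(maximin(payoff), minimax(payoff))  # prints "3 3": the two agree here
```

In a two-player, zero-sum game of perfect information, von Neumann's result says these two quantities coincide, which is why either computation gives the value of the game.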
So here's another game tree. We have, again, two
players, A and B, and A gets to make
a choice, first, to go left or right. And
B will find herself in a situation, perhaps, where
she gets to choose to go left or right
as well. I've drawn these little square boxes, though,
to represent chance. So what this means is that
if a goes left, you end up in a chance
box where you flip a coin and 50% of the time you end up over here, and 50% of
the time you end up over here. Along a similar
vein, if a goes right and then b goes left,
then you end up in another chance node, and 80%
of the time you end up here, and 20% of
the time you end up here. Alternatively, if b goes
right, you always end up here. Does that make sense?
- I think so. Uh-huh.
- So what we've done is we've gone from
a two player zero-sum deterministic game of perfect information
to a two player zero-sum non-deterministic game of
perfect information. Okay, so we've relaxed; we've gone at
least one level up in complexity. Okay, so do
you understand this tree? Do you understand this setup?
- Yeah, I think so. I mean here
the stochasticness is happening essentially at the leaves.
- What does that mean?
- That there's no choices to make for either player after this
randomness happens. But that's not what you mean in general, right?
- No, that's right. There could be,
after here, you end up in a different state
where you can then make more choices. But because
I don't have enough room, maybe because I want to
make it simple, it just sort of ends after that.
- But yes, this tree could keep going
on and there could be choices, random chance
nodes happening everywhere. There could have been a
chance node that happened at the very beginning, even.
- I feel like I could work out the value of this game.
- Oh, well, how would you go about working out the value of the game?
- I would
probably have it as a quiz.
- Okay, but what would the quiz look like?
- It would say, what's the value of this game?
- Okay. Do you think that just having that be the quiz is the best
way for someone to learn or do you think maybe it can be done in stages?
- Oh, I don't know. Do you have other questions that you want to ask?
- Well, you know, how would you go about
determining the value of the game? I think the first
thing that you would do, at least if you were
patterning after what we just did, is you would try
to write down a matrix.
- So, I'm going to write down a matrix. So
that our students will get a chance to, to work out
what that matrix looks like and then from there figure
out the value of the game. Are you okay with that?
- Okay. So if we were going to do
that, of course, the first thing we'd have to
do is figure out
what the strategies are. So, what are A's strategies?
- I would have called them left and right but it, but they're not labeled.
- Yeah well let's
do that we can call them left and right. A can
go left or A can go right. What about B's strategies?
- B can go left or right.
- And B can go left or right.
Game Tree Quiz
- Here's your first quiz question. In a world
where A can go left or right and B can
go left or right. What are the values that you
would put in the cells of this matrix? Again, this
is zero sum, so these values are from A's
point of view, and implicitly we know what the
values are for B. So, we're just looking for the
values from A. Okay? Do you understand the quiz, Michael?
- Okay, so go.
Game Tree Quiz
All right Michael, you know the answer?
- Okay, what's the answer?
- Do you want to know the value of the whole game?
- No. I want to know the matrix.
- [LAUGH] Okay. So right. So, if A goes
left, it doesn't matter what B does. And at that
point there's a chance node, and it's 50/50, so negative eight.
- That's right. How'd you get negative eight?
- Because half of 4 is 2 and half
of -20 is -10. So -8.
- Right, so you just took the expectation of where you would end up.
- Okay. What next?
- And it doesn't matter what B does, so
then negative eight is also in the upper right corner of the matrix.
- Fair enough.
- The next easy one to do is if both go right. A goes
right and B goes right, then we get the three. It's just sitting there.
- And the last thing requires me to do
some multiplication. So
- -5 times 0.8 is like -4.
- 10 times 0.2 is like 2, so -2.
- That is correct Michael.
- So now we have the matrix. Now, what did
we just notice here? Remember what I said about matrices before?
- It has all the information that you need.
- Right, so in fact none of this matters.
Who cares how we got here?
- wait, but, but, oh.
- [LAUGH] I love erasing stuff!
- So here's the thing, we cannot reconstruct that tree from this matrix.
- We can't reconstruct, well, actually we
could reconstruct that tree from the matrix.
- No, we can't.
- Yeah, we, of course we could.
- No, we can't.
- Yes, we can, because there's an infinite number of
trees we could do, and one of them is that one.
- I see.
- We just don't happen to know that it's the right one.
- I don't think that's really what people mean when
they say I could reconstruct this. Like I could reconstruct
the crime at this crime scene, it could be any of
a million things. Like no, we can't construct that specific tree.
- But you know what it doesn't matter,
because the only thing that matters is the matrix.
- And you will notice, Michael, that you did all that multiplication,
and you multiplied 0.8 times 5, and after a couple of seconds of
thinking, which was probably edited out, but I know how long
it took you to do, you came up with the right answer. That's great.
But notice that the number you came up with doesn't say anything about
expected values and what the probabilities are. It doesn't even matter that it's
nondeterministic, because once you have these
numbers, these expected values, that's what you're
going to end up with. So who cares what the original tree was? Who cares
if you can reconstruct it? Who cares what the rules of the games
are? All you know is I have a choice of two strategies. We
call them left and right here, but we could have called them one
and two, or Q and Z. It doesn't matter for the purpose of solving
the game. So what is the solution for the game, by the way?
- Oh, A is trying to maximize right?
- So A would never, ever, ever, ever want to go left.
- So A's going to go right, and then B is trying to
minimize. So B is going to go left and we get the -2.
- I believe that is correct. And it
makes sense the other way as well, right? That
B would not ever choose to go right because
then A would choose to go right as well.
- So you'll still end up here.
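The whole calculation from this quiz, the expected values at the chance nodes and the minimax solution of the resulting matrix, can be sketched in a few lines of Python. This is an illustrative sketch using the leaf values and probabilities from the tree above:

```python
# Expected values at the chance nodes (leaf payoffs from the tree above).
left = 0.5 * 4 + 0.5 * (-20)        # A goes left: -8, whatever B does
right_left = 0.8 * (-5) + 0.2 * 10  # A right, B left: -2
right_right = 3                     # A right, B right: deterministic 3

# Payoff matrix from A's point of view (rows: A left/right, cols: B left/right).
matrix = [
    [left, left],
    [right_left, right_right],
]

# A maximizes the worst-case row; B minimizes the best-case column.
maximin = max(min(row) for row in matrix)
minimax = min(max(col) for col in zip(*matrix))
print(maximin, minimax)  # prints "-2.0 -2.0": the value of the game is -2
```

Once the expectations are folded into the matrix, the tree (and its nondeterminism) can be thrown away, which is exactly the point being made here.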
- So, here's in fact another theorem for you Michael. So, it turns out the
theorem that we wrote down before is
still true in the case of non-deterministic games.
- Oh, cool.
- Of perfect information, right? By the
way, do you know whose theorem this is?
- Charles' theorem?
- No, although that would be cool.
- Von Neumann's theorem?
- Yes, this is von Neumann's theorem.
- Oh really?
- Yeah. Did you just guess?
- Von Neumann, he's responsible for everything.
- Do you know, well, are you going to remind everyone who von Neumann is?
- He's my uncle?
- No, you're right. You're right. Von Neumann,
so we talk about von Neumann architectures in computer
science, that the basic design of a microprocessor is
still following the ideas that he worked out.
- Right, the von Neumann Machine. So there you go. So, this is pretty
good, right? So, what we did, so what did we learn so far? What we
learned so far is, the only thing that matters is the matrix. And once you
have the matrix, you use minimax. At least if you're in a
world of two player zero sum games of perfect information, we can
just write down this matrix, throw everything else away, and use minimax
or maximin and figure out what the value of the
game is, which is the same thing as saying we know
what the policy is. The kind of joint policy, in a world
where everyone is being rational and trying to maximize their rewards and
assumes everyone else is doing the same. That's pretty cool, I think.
- Awesome, so though, you know, I feel like
I could make a matrix that wouldn't have this
property, this maximin equals minimax, and
that we couldn't make a tree out of it.
- You probably could, but I'm going to guess that
it's going to require that it's no longer zero sum.
- Nooo, I'm going to say no to that.
- Well, you know what, I, actually now that,
I think I understand what you mean and the answer
is yes you could. And in fact, that's what we're
going to do next when we relax the next little bit.
- So, would you like
to do that?
- Sure, though I'm afraid if we get too relaxed, we might just fall asleep.
- In fact, why don't we do that, why don't we take a nap and then come back?
The joy of being on video: asynchronous napping.
- Done. Okay, I'll see you in a second.
Okay Michael, so here is a, another little game for us
to play. And, what I want you to notice about this
game before I describe it to you in detail is that
we have relaxed yet another one of the constraints. So, we
started out playing two player, zero sum, deterministic games of perfect
information. There's also a finite in there somewhere. From now on
we're just going to assume everything's finite, because why not? And then,
what we did last time, just a few seconds ago, is we
relax the deterministic part, so we have two player,
zero sum, non-deterministic games of perfect information, and now
we are going to relax the requirement for perfect information.
So now we're going to look at two player, zero-sum,
possibly non-deterministic games of hidden information. And this is
really important Michael because, this last little bit of
relaxation, going from perfect information to hidden information is
going to be sort of a quantum leap into the
difficult problems of game theory. So this is where
it actually starts to get interesting so we have been
building a foundation so far and now we are
going to get to the interesting and complicated stuff, okay?
- Wow, one of the things that I learned
just now is that the opposite of perfect is hidden.
- I always thought the opposite of perfect is imperfect but okay but hidden.
I guess if I do not have a perfect face I should hide my face.
- That's in fact the etymology
of the phrase.
- Alright, I understand now.
- Yeah, cool. It's in Wikipedia. Here we go, let me describe
the game to you, are you ready? This is a version of
mini poker where there are a set of cards, but they have
no numbers or faces on them; they are just red or black, okay?
- And red is bad, for our hero, Player
A, and black is good for our hero, Player A.
- I see.
- Wait. Why's it have to be red?
- I don't know, man. You know, red. You know how it is. Okay.
So, here are the rules. They're all written down on the screen, but let me
walk through them with you. So A is dealt a card, magically. It will be red
or black, and the probability of it being red or black is 50% each. Okay?
- Right. So we have a uniform prior over the
color.
- Now, remember red is bad for A and black is good
for A. So, it's going to turn out, without loss of generality, that if A gets
a black card, A's definitely going to hold onto
the card. Okay? Now A gets this card.
B, player B, does not get to see the card. A can choose to either resign or to
hold. If A resigns given a red card, then he loses 20 cents. Okay?
- Okay. Wait. So A's dealt red. A may resign if,
but only if, red.
- And then A loses 20 cents.
- Okay. Alright, okay.
- Okay, so this is a betting game. It's not strange; it makes perfect
sense. It's sort of a metaphor for life. Now A can choose to hold instead,
hold the card. Thus requiring B to do something. So if A holds the card B can
either resign or can demand to see the card. Now if B resigns
then A gets 10 cents. Regardless of the color of the card. Okay?
- That make sense?
- Okay. Now if B chooses to see the card, in fact demands to
see the card. Then if the card is red, then A loses 40 cents. But
if the card is black, then A gets 30 cents. And since we're betting, this
means that whatever A wins, B loses and vice versa. That makes it zero sum.
- Got it.
- Okay, does this all make sense?
- Yeah, I
don't know if I can hold all those different numbers in
my head. But I, but the basic pattern of it is,
that as you say Red is, red is bad, black is
good. If A gets a bad card, A can essentially either fold
- Right, resign. Or kind of bluff, like, hey, I've
got this great card. And then if
B believes that and
just folds, then
A wins that. But if B says, no, I think
maybe you're bluffing, and calls him, then
everybody's rewards are more extreme, I guess.
- Exactly. So it's just a version of poker. A weird version of poker.
A simple version of poker. But a version of poker, nonetheless. There's a
minor little detail here which isn't that
important, but you know, notice it's written
that A may resign if it's red. Basically, A will never resign
on a black card, because it just doesn't
make any sense. And so there's really
just no point in writing it out. Okay.
Because black is always good; sort of, nothing bad
can ever happen to A if A gets a black card. So there's really sort of no
point in writing anything out. But that's just
a minor detail. Regardless, these are the rules. Okay?
- You got it? Sure.
- I'm going to re-draw this as a game tree which might make
it a little easier to keep all the rules in your head.
Okay, so Michael, here's the tree version of what I just
said. So remember I draw squares as chance nodes. And so,
chance takes a, chance. And half the time we end up
in a state, where I have, where A has a red card.
And half the time I end up in a state where
A has, or we end up in a state, where A has
a black card. Okay? If A has the black card. Then
B gets to choose whether to hold or not, so let me
add a little bit of color for you since you wanted
some color. This is the place where A gets to make a
decision. Okay. Yep. And then this is the place where B
gets to make a decision. Got it. Okay? So, A is going
to either be in a red state or a black state.
Only A knows this. B does not know this, does not know
what state he is in. And so let's say A is
in a black state. Well, A can only hold in that case.
In part because it makes no sense to resign. And
then B gets to decide whether to resign, and therefore
A gets ten cents or to see, in which case
A gets thirty cents. By contrast, when we're in the
red state A can either hold or resign. If A
resigns he loses 20 cents. If A holds then B
gets to decide whether to resign and which case A
gets 10 cents, or to see the card in which case
A loses 40 cents and B of course gains 40 cents. So this
is just a tree version of what I just wrote. Does that make sense?
- Okay, yeah I can see how this kind of captures the flow of information but
I feel like there might be a missing
constraint to it. So [INAUDIBLE] I don't know if
there's actually a constraint in this thing but
let me just point out something. And that
is that B has no idea which of
these two states he's in, and it's hidden information.
- And because B doesn't know what
state he's in, he doesn't know whether to resign or see. In particular, if B knew
that he was in this leftmost state, then he would always say see.
If B knew he was always in the rightmost state, then he would always
resign. But he doesn't know which one, so it's not entirely clear what to do.
- Neat! Where did you get this game? This game is awesome!
- I got this game like I got all
of the examples that we're using today, from Andrew Moore.
- He's clever.
- He is clever. And
I have shamelessly stolen all of his examples
and materials for this. But he said it was
okay, by putting up slides in the world and
saying, feel free to steal all of the stuff.
- You want to write his name so that people can credit him?
- Yeah, I'm going to write it at the very end. When we talk about.
- What we've learned.
- Okay. So here's a question for you Michael. We know that I wrote down
a bunch of words which describe the game, and then I made it a tree,
because I could do that. And it makes it nice and easy
to see what's going on. But now, as with
everything else we've done, we want to make a little matrix. And so
we want to figure out what the value is for different strategies for A
and B. So I'm going to assert, and I think it's pretty
easy to see, I hope, that A basically has only two strategies. Either
A is the type of person that resigns when a card is
red, or A is the type of person who holds when a card
is red. Agreed?
- Interesting, okay. Yeah. Right. So
it really is a conditional policy, right?
It's basically: if red, hold or resign; if black, hold or resign. But your
point is that black isn't really a choice; red is a choice, and
there's only two choices. Okay, I can see that as the two strategies, sure.
- Right and of course this does say,
this is, this does kind of say when you're
in the black case you know you're going
to hold. So you know what's going to happen. Ultimately
B can either be the kind of person who resigns
whenever A holds, or chooses to see whenever A holds. Right?
- I see. So, like in the previous trees, B would have four different
strategies: resign or see in the left state, resign or see in the right state. Here
there's this kind of extra connection, this
sort of a quantum entanglement between these
two states that it, that makes them have to be the same. So there really
is just those two choices.
- That's exactly right, although, I probably wouldn't have used
the phrase quantum entanglement. But yes, there's an entanglement, certainly.
because we can't tell which state we're in. You're either
going to resign or you're going to see. And you
just don't know what else to do. Okay so A's
a resigner or holder. B's a resigner or a seer.
So the question is what numbers go in this matrix?
And we're going to figure that out by having a quiz.
- I kind of saw that coming.
- Mm-hm, do you think you can figure this out?
- Yeah, yeah. Sure. Sure. Yeah, I think so.
Yeah, see, see, see. Yeah, I can do that.
- Okay, cool. So, let's go then. Go.
Okay Michael, what's the answer? Let's start with resigner resigner.
- Alright, resigner resigner. Resigner resigner diner.
Alright, so resigner versus resigner. So, resigner:
if A is a resigner, that means whenever A gets a red card, A resigns. Yes.
- And that would be a negative 20.
- But that's not going to be the answer, is it? Because A
doesn't always get a red card. A
sometimes gets a black card. If A gets a black
card, [CROSSTALK] then, in resigner resigner, that means B is
going to resign and a plus 10 will happen.
- So those are the two possibilities and they're equally likely, so it's a
minus 10 divided by 2, which is minus 5.
- Right. You were correct, sir.
- Whew. Alright.
- Which one do you want to do next?
- Resigner seer.
- So again, oh, so the, yeah, good. This is a good choice, because
now it's the same argument as before, except for when we end up in
that far right node. And that means minus 20 in half the cases, plus 30
in half the cases, which is a plus 10 divided by 2, or plus 5.
- That's exactly right. Well done Michael.
- Okay which one next?
- So, holder resigner.
That means when A gets a card, A is going to hold the card.
- And that's true, red or black.
- And then, then, it's going to be B's turn, and B is
going to, oh, we're doing holder resigner. So, it's going to resign.
- So, oh, well, interestingly, I think that takes us
to those two leaves, both of which are plus ten.
- Because, why does that make sense?
Because B, oh, 'cause B doesn't get any,
no. Right, right, right, because it's independent of the
card. You actually said that when you explained the rules.
- So it's the average of plus 10 and plus 10, which ought to be plus 10.
- It is in fact plus 10. Well done. Okay, what about holder seer?
- Whew, alright, so that's a case where A holds so we go down those
branches and we end up, we always end up in one of the blue circled states.
- Yep. And
B sees half the time that leads to minus 40, half the time that leads
to plus 30. So that's minus 10 divided by 2, which is minus 5 again?
- Yeah, that's exactly right. So that's
correct, Michael. And that's pretty cool, isn't it?
- Yeah, I really like this game. I didn't think you could
have anything that had sort of poker essence and be this tiny.
- So yeah, so we live in this really nice
little structure here. So I have a question for you.
You ready for the question?
- I'm trying to guess what it is, but sure.
- Okay, here's my question. What is the value of this game?
- I was thinking that you might ask that. So,
can I, can I step through it, is that okay?
- Yeah, sure, go ahead.
- So A's choosing the first row or
the second row. So if A chooses the first row, and then
B chooses the column, then if it's the first row, then B
is going to choose the first column. So A's going to get minus 5.
- The same story's going to go through on the bottom row. If A chooses
the bottom row, then B's going to choose
the seer position, which gets the minus 5.
- Right. So from this, it seems that the value of the
game is minus 5. But now let's do the same thing on the B column.
So if B resigns, now A gets to choose resigner or holder, and
it gets a plus 10. And if B is a seer, then A chooses
between resigner and holder and gets plus 5.
- So then from this perspective, the value
of the game is plus 5. So, so here's
a case where it better not be that
we could take a perfect information game and put
it into a matrix and get this out,
because this is something that can't, like, it doesn't,
it doesn't fit your theorem, right? We can't get
the value of it by doing minimax or maximin.
- Exactly, the problem here is that once
you move to hidden information, minimax is not
necessarily, and in this case definitely is
not, equal to maximin. So, von Neumann.
- As we all see.
- Yeah, what has, what has he ever
done for us, anyway? His theorems and his computer
architecture that rules the world. Anyway, so we
seem to have a problem here. And the problem
is that once we go to this hidden
information, as I promised, complexity enters in, Michael, and
now we can't do something very simple with
the matrix, find a pure strategy that's going to work.
It really is the case that A's strategy depends upon what B will do, and
B's strategy depends upon what A will do.
And if you don't already know what that's
going to be, you don't actually have a value for the game. And in some sense
you can get every single one of these
values. So, there's got to be some [INAUDIBLE].
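The breakdown being described is easy to check numerically. Here is a small Python sketch, not part of the lecture, of the mini-poker payoff matrix from the quiz (rows: A as resigner or holder; columns: B as resigner or seer), showing that maximin and minimax no longer agree:

```python
# Mini-poker payoff matrix from A's point of view.
# Rows: A is a resigner / a holder. Columns: B is a resigner / a seer.
matrix = [
    [-5, 5],
    [10, -5],
]

# A's guaranteed payoff with a pure strategy: best worst-case row.
maximin = max(min(row) for row in matrix)

# B's guaranteed bound: the column with the smallest best case for A.
minimax = min(max(col) for col in zip(*matrix))

print(maximin, minimax)  # prints "-5 5": no saddle point, no pure-strategy value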
- I feel like, that's sort of what you'd
expect in a game like this. Right. So, if, because
of the way that it is. If you know that
I'm always a resigner. That I'm always going to, what?
[LAUGH] Oh, that when ever I have a
red card, I'm going to resign, then. You know
that if I don't resign, I have a black card, so you know that you should resign.
- Yeah, so it's, it's, it's one of these things where if I
am really consistent and never bluff, say, then you can take advantage of
me, and vice versa. Like, if you always respond the same way to
when I act a certain way, then I can manipulate that. So it's kind
of like this game of this sort of mind game like I want you to
not know that I know the thing that you don't know that I don't know.
- Right but the problem is that I know what you
know and I know that you know what I know. And that
I know that you know that I know that you know
that I know that you know that you know what I know.
- Did you really think that I didn't know that?
- [LAUGH] And so you end up in this terrible situation. But. There
was a key word that you used there, Michael, and it was consistent.
And everything we've talked about so far, pure strategies, is
exactly the same thing as talking about consistency. So, the way
we're going to get around this is, we're going to stop
being consistent, or at least, consistent in the same way. Okay?
- And the cheat here is that we are now going
to introduce, instead of pure strategies... let's see, the opposite of pure?
- Mixed, that's exactly right.
- No, so, rather than sticking with pure
strategies, we're going to start using mixed strategies.
So what's the difference between a pure strategy and
a mixed strategy, Michael? Well, it's, it's simply this.
A mixed strategy simply means some distribution
over strategies. So in the case of two strategies,
like we have here, where you can either, A
can be either a resigner or a holder, we're
going to simply say that the mixed strategy for A
is some value for P. Which is the probability
of choosing to be a holder. So do you, you see
what's going on here? So the only difference between a mixed
strategy and a pure strategy. Is that for a mixed strategy,
you choose some probability over all the different strategies that you
might choose. So you decide that, you know, going into this
I'm going to flip a coin. And half the time I'm going to
be a resigner, and half the time I'm going to be a
holder. Say. Or 30% of the time I'll be a resigner.
And 70% of the time I'll be a holder. Okay?
- Whereas with pure strategies, you always chose
one or the other. So technically, it's the case
that a pure strategy's also a mixed strategy where
all the probability mass is on a single strategy.
- Makes sense.
- So in this case we're going to, in
fact, choose P to represent the probability for
A of choosing to be a holder rather than a resigner. And so P can be
any value between zero and one. You with me on that?
- Yeah, that's neat.
- Okay, good. To make certain you understand this I'm going to give you
a little quiz. Which I have up here on the screen. You ready?
- Oh, I see it. [LAUGH] It's like those, those very square boxes.
- I didn't even realize what, what, what this could be about.
- It certainly couldn't be a quiz, because Charles has never drawn a straight
box in his life.
- I drew those, Michael. It took me 17 hours.
- Oh man, you are, you're committed to this and I appreciate that.
- I am committed to this. Okay, so,
given that we have a mixed strategy, and we
have a probability P of A being a holder,
here's my question for you. In a world where
B is a resigner, okay? B is always
going to choose to resign. What is A's expected profit?
To make it easy for you, I copied the
matrix over here in the upper right hand corner.
- Wait, wait, wait. If B
is always a resigner.
- B is always a resigner.
- Then, what's, and what is A?
- A is going to choose to be a holder with probability P.
- Oh, so you want this to be a function of P.
- If it's, if it's, okay, sure, maybe [LAUGH]. It could, well, I mean, yes.
It's a function of P, it just might be a constant function that ignores P.
- It could be, in principle. Okay, now, after you figure
that out, I want you to decide, in a world where B
is the seer, B always chooses to see the card. What would A's expected profit
be, in a world where A will choose to be a holder with probability p. Okay?
- Got it?
- And that's going to be, oh yeah, that could be different
because it's a different strategy, though, you know, seering.
- Like, if you're a resigner, you're a resigner, but
if you're a seer, then what you are doing is seering.
- That would be an anagram of resigner.
- How do you see
- I don't know. They're just there. I just,
the words just kind of mix themselves up. Alright, anyway,
I'm, I think I am ready to do the
function. I'm, I'm ready to stop looking at the letters.
- Okay? And go!
What's the answer? If B is the resigner,
we don't really care about the other column anymore.
- Then what's going to happen is A is mixing between resigning and holding.
- And probability P is the probability of being a holder,
so whenever that event happens it gets ten, and
whenever the opposite of it happens it gets minus five,
so I want to say ten P plus one minus P times minus five.
- Is that your...
- Let me try that again.
- Is that your final answer?
- Ten p. Well I could simplify it.
- Okay. Well then say it to me again. I'll write it
out here so it's easy for you to see. Okay and that is.
- My answer?
- That's fair. Do you want to simplify it? You don't have to.
- Sure, let me simplify it. So it's a minus one-minus-P
times five and a ten P, so that's fifteen P minus one.
- Minus what?
- Minus five.
- There you go. Thank you.
- That's correct. We would obviously accept either
answer or any combination of those letters. That, [LAUGH]
- No, I think I might have to do this quiz over actually.
- No, not that one. I'm talking about either fifteen P minus
five, or ten P minus one-minus-P times five, or ten
P plus one-minus-P times minus five, or any other combination
that we can get Pushkar to actually bother to, you know...
- Implement or check.
- Okay. Well, that was pretty good. And this,
of course, is exactly the expected profit. As you put it, P
times the value when A is a holder, plus one
minus P times the value when A chooses to be a resigner. And so it's
just the weighted average between those two values. So, let me
just double check that. So, if P is 0, that means A
never holds, it means A always resigns and it gets -5, so
that's right. And if P is 1, that means A always holds, so
it should get a +10. And 15 minus 5 is 10. So, boom!
it works. You used math there, very good. Okay, what about B?
- B. So the same story, except on the seer side.
- So yeah, I might need that space again.
So 5, oh I see, right, minus 5 times p.
- Plus 5 times 1 minus p.
- Mm-hm. If we simplify that we get.
- There's a minus 5 and another minus 5. So we get minus 10 p.
- Mm-hm. Plus 1.
- Plus what?
- Plus 5.
- There you go. See, you learned. Okay, you want to check it?
- Yeah. Oh, that's a good idea. So again, if P is 1, then that
means you're always a holder, which should be the minus 5, and if we put in a 1
there, we get 5 minus 10 is minus 5. And if P is 0, that
means that we're always a resigner, and we should get a 5 for that. And yeah,
so we zero out the negative 10 and we get the 5.
- Now, it's not clear to me why we're playing this game. Oh, it is clear to
me why we're playing this game, because we
want to figure out something about strategies that are mixed.
- So this is how well a mixed strategy does, but
not against another mixed strategy.
This is against two deterministic strategies.
- But is it, Michael? But is it? So
- I'm going to, oh, okay.
- I'm going to notice something here, which is that you,
as you astutely pointed out earlier, we have two functions of P, or two equations of P, and by the way, do you know what they are? They're lines.
- Sure, because it's just a, it's linear in P, so that's what linear means.
- Right, so what would happen you, do you
think if we were to actually draw these lines?
- I think we'd have two lines.
- Yes, and what would those two lines
look like? Let's take a look, shall we?
Okay, so what I've done here, Michael, is I have drawn both of these lines. So here's the line 15p minus 5. This is the case when B is a resigner. This is my best attempt at drawing it to scale. You start out at minus 5, as you point out, and you end up with plus 10. And this is the line where B is a seer. That's minus 10p plus 5, so you start out at plus 5, and you end at minus 5. Okay?
- So, what do you notice about these two lines?
- They make an x? So they intersect.
- They do intersect. Can you tell where they intersect?
- How would I solve that?
- You'd have a quiz. Okay Michael, so here's a quiz. Where
do the two lines intersect? Think you know how to figure it out?
- Okay. Then go.
- Okay Michael, what's the answer?
- I don't know, but I would, here's how I'd get it. I'd set the equations
of the two lines equal and find the p where the, where those values are equal.
- Oh, okay. So here let's do that.
- And I feel like that's like 25p on one side and 10 on the other.
- So like 10 over 25, which is, like, 2 over 5.
- Yes, which is also 0.4. So 0.4, that's exactly right, and it's,
you should do it exactly the way you said. Set
the two equations equal to one another, do simple algebra, and.
- So what would have happened if p ended up being, like, not a probability?
- Then they wouldn't have crossed inside.
- Good point.
So by the way, here's a question for you. We can make it a quiz but I'm
not going to make it a quiz. What is the value of the game at p equals 0.4?
- So I just plug it into those equations, and so negative 10 times
- Mm-hm. So the value here is in fact $0.01. And if you put it
in the other equation you get the
same thing. Yes, right, because they're equal there.
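Setting the two lines equal, as in the quiz, is one line of algebra; a quick sketch:

```python
# 15p - 5 = -10p + 5  =>  25p = 10  =>  p = 0.4
p_star = 10 / 25
value = 15 * p_star - 5   # same number you get from -10 * p_star + 5
```

Both lines give the same value at the crossing, which is the plus 1 (one cent) just computed.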
And this is actually the answer to the entire kit and caboodle. If.
- Sorry, where were the kittens?
- If A chooses a mixed strategy, where with probability 0.4 he
chooses to be a holder. Notice that it doesn't matter whether B
is a resigner or B is a seer. You will end up
here and there will be a value of plus 1 penny to A.
- On average.
- The expected value is.
- plus 1.
- Now you might ask yourself, well what if B
decides to be a smarty pants and also do a mixed
strategy? What do you think would happen in that case?
- So of course it changes something, but it doesn't change the value of the game, because, if B is a resigner, A is getting one on average. If B is a seer, then A is getting one on average, and any average, any convex average, of one and one is going to give us one.
- That's exactly right.
So in fact, if you think about it, if B tries to do a mixed strategy between these two lines, then at every single point the average is going to have to be somewhere in between the two lines, no matter how you weight that average. So this space of possible payoffs is like a bow tie. If B decided to pick some values such that it's in this region of the space, it's going to end up somewhere: here, that's fine, or here, that's fine, or here, that's fine. More importantly, right here it's going to have to be between the two lines, where those two lines cross. So, no matter what mixed strategy B chooses, on average we will end up here. So the value of this game, the expected value of this game, is plus 1.
- So you say that the strategy for A is to choose .4.
- But why? Like, why is that, of all the different values? What [INAUDIBLE], just because the lines intersect? I don't understand what makes that a good idea for A. Like, it's not like it gets any additional payoff for having things intersect.
- Well, if A is going to choose a mixed strategy, and B is going to choose a mixed strategy, this is the mixed strategy for A that guarantees an expected value of plus 1. Meanwhile, let's imagine this is the way you're going to set
it up. So we've been kind of talking in general: A's going to choose a strategy, B's going to choose a strategy, and they're going to go. And because we know everyone's rational, you already know what strategy B's going to pick, and B already knows what strategy A's going to pick, right, because we made this assumption that everyone is trying to maximize their own utility, their own reward. So in a mixed strategy, if you know you're going to go for a mixed strategy, you have exactly the same situation. B can figure out, well, what is it that A should choose, and A can figure out what is it that B should choose. So notice that in this particular version of the game, the way it's set up, even if A announced beforehand to B: listen, I am going to choose to be a holder with probability 0.4. It doesn't matter what B chooses; the expected value is this plus 1. Right?
- But imagine if A said, I'm going to choose to be a holder with probability 1. Well then, what should B do? B should choose to be a seer.
- It's exactly the situation we were in before.
- Okay, but here's the thing I'm having trouble wrapping my head around. So, it's not special that it's an intersection. What's special, maybe, is that, okay, taking what you said before, that A is going to announce a strategy: for anything that A can announce, B is going to presumably do what's best for B, which is to just minimize, right?
- So if you look at that triangle at the bottom.
- If you think about that as those lines, that upside-down V shape, yeah, exactly that thing. If you
think about that as being the payoffs, the
payoff function for A, as a function of
the probability P that it chooses to hold.
- Then we've chosen the maximum.
- Right, we're trying to find the, the probability for which A gets
the highest expected value. And it's at that, it's at that peak there.
- Hm-hm. But notice something, Michael. For any case
where the two lines cross, you're going to have a function of this form.
- Well, let's be sure of that, because that's,
I wasn't seeing, I was thinking that that was not
true. because basically in this case, we have a, a
line of positive slope and a line of negative slope.
- Right? So, what if we, we can have an intersection between two lines of negative slope? Huh. So
again, by doing the same exercise you were doing before,
where you draw the, you sort of take the minimum
of the two lines at all points.
- You'll end up with this. Where do you pick it then?
- The far left, p equals 0.
- But not the intersection in particular.
- No, that's true.
- Okay, alright, I just, wasn't quite
getting that. But so, the intersection is special,
because in some cases that is where the max is. I guess the max can only be in those three places, right: it could be far left, far right, or the intersection. There's no other way to have a maximum of lines.
- Right. And by the way, there's another case like this where
they never actually cross.
- Or like this where they never actually cross. Got it.
- Right, but in either case, right, it's always going to
be those three points. It's going to be the extreme or
where they cross. If they happen to never cross, then
that point goes away and you just pick the maximum. So
what you could, so, in fact, if you wanted to
think of an algorithm to choose the right thing to
do for A, you basically plot the lines. You take
the min at every point and you find the maximum point.
- Oh, I see, so you could just kind of discretize it, almost.
- Yeah, and you're done. Seem reasonable?
- Well, discretizing seems problematic, but, like, I get that we're trying to find the probability so that we maximize. Oh wait, we maximize the minimum of the two other things. So it's like maximin again. Yeah.
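A minimal sketch of that discretized procedure, take the min of the lines at every point and then maximize, assuming the two payoff lines from this example:

```python
def maximin_over_p(lines, steps=10000):
    # At each candidate p, B picks the line that is worst for A (the min);
    # A then picks the p that makes that worst case as good as possible.
    best_p, best_value = 0.0, float("-inf")
    for i in range(steps + 1):
        p = i / steps
        worst = min(line(p) for line in lines)
        if worst > best_value:
            best_p, best_value = p, worst
    return best_p, best_value

lines = [lambda p: 15 * p - 5,    # B is a resigner
         lambda p: -10 * p + 5]   # B is a seer
```

Running `maximin_over_p(lines)` lands on p = 0.4 with value 1, matching the intersection found earlier.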
- Except here, so, in fact, it is exactly min
and max or max and min, except, oh, I guess, if
you're thinking from A's point of view. It's min and max
or max and min but in this case, there's this other
parameter, which is the probability. Which is how
you determine how you're doing the min and max.
- Got it, right. In this bigger space, it's min
and max. I see, yeah, yeah, yeah, that makes sense. What
about so, why is A the one who needs to be
random? Why isn't B the one who needs to be random?
- It's, you end up with the same, you end up in
the same place. Because again, remember,
you're, you're doing this kind of optimal
notion, where underneath all of that is this belief that both people are
going to be rational. So the only reason to do this, to take
the, you know, maximum minimum, is if you believe
that B is always going to try to minimize what
you do. But B's in exactly the same situation.
Right, from B's point of view, That's what's happening.
- Huh. It's the same equations?
- Yeah, it just it gets turned around. That's
actually an interesting exercise. Maybe that's the next homework assignment.
- So, did that all make sense to you?
- [LAUGH] [CROSSTALK] It looks a little
green and scribbley, but yeah. Sure. That's cool.
- That's how you know
when you're done, Michael. When it's all green and scribbley.
- [LAUGH] Does this generalize to more than two
options? It seems like it's going to get messy fast.
- Yes, it does. It generalizes to more than two options. Effectively
now, you're just doing a max over n things, instead of two things.
- And you have to search for,
there's possibly way more intersections to worry about.
- Yeah, but all you care about is the minimum.
Think of it as you're always drawing the minimum function.
Okay, Michael, so it feels like we've relaxed almost everything we could relax except for one thing. We're now going to look at two-player, non-zero-sum, possibly non-deterministic games of hidden information.
- And that's going to turn out to be messy, but
get us to the place where I've secretly been trying to
get us all along. So, let me describe a game
for you. Very carefully, and let's see where it leads us.
Okay, here's the game, you've got two people. These two people are criminals.
- Oh no! Are they smooth criminals?
- One of them is.
- [LAUGH] Oh, you're terrible. Okay, so
we have two people. They're criminals. Okay,
and unfortunately they've both been captured by
the cops. I have actually no idea how
to draw that, but lets just try to draw that like this. The cops come along, and
capture them both because they are suspected in a particular robbery. Okay?
- And they take them and put them both in jail.
So those are jail bars.
- Okay? But they don't just put them
in jails. They put them in two separate jails.
- Oh no.
- Okay. And one cop goes to one criminal and says
listen, here's the deal. We know you did it. Okay? We know you did.
And your friend over there, he's currently singing, singing like a bird. And he,
is going to pin it all on you. So this is your last chance to give us a little
help and admit that you two did it. Or, that the other guy did it.
You admit that the other guy did it, you say it was his fault, then we'll cut
you a deal. Now to make it worse, the
cop tells him that there's another cop over there.
- Who's talking to the smooth criminal, and
is offering him the same deal.
- And whoever, goes first, whoever pins it
on the other guy first, gets to walk. Okay?
- So, if I can get the curly-head guy to defect, okay?
- Then, I am going to let him walk and he will spend zero months in jail.
- Okay. On
the other hand, if the other guy defects, then he's going to
get to spend zero time in jail. And since we've got all
that we need. We got a confession. To get this guy to
go to jail. He's going to go to jail for nine months. Okay?
- Negative nine months?
- Yeah. Oh, that's a cost in months. I see. Uh-huh. [CROSSTALK]
- Yeah that's a cost in months. Okay? So
if, curly-head guy defects before the other guy does,
then zero. He walks. He pays no cost, other than
the cost he's already paid for a life of crime.
- Now if he refuses to drop a dime on the other
guy, but the smooth criminal decides to defect he's going to lose nine months.
That make sense? And the other guy's going to walk. So he's
getting the same deal, they're both getting the same deal. You got it?
- All right, now. It's a little more complicated
than that, all right? But I, I hope that you're
getting all, keeping all of this in your head. It's
a little more complicated than that. There's actually four choices
here, not just two. It's not just a case of. I
defect or the other guy defects. I defect, the other
guy doesn't. I don't defect, the other guy does. Okay? You
could both refuse to drop a dime on the other.
You could cooperate, or you could both rat out the other.
- At the same exact moment?
- At the same exact moment.
- So, there's a big thick wall here. Curly guy doesn't know what
smooth guy is doing, and smooth guy doesn't know what curly guy is doing.
- So the question is, if you have a choice between either defecting or.
- Which means blaming the other guy? Which means
blaming the other guy you defecting from your friendship.
- Oh. Or you can cooperate, that is be true to your
friendship. I can, we both people can defect, both people can cooperate,
one person can defect and the other person
cooperate, so there's actually four different options there.
- I fee, I feel a matrix coming on.
- Yeah, I think there's a matrix
coming on. So let's draw the matrix, because
I think that makes it easier to see. But you have the background here, right?
- I think so, yeah.
- The key piece here is that each person has
a choice of either defecting from their friendship or trying
to cooperate. Keeping their mouth shut, and, they don't know
what the other person is doing. So, what are all
the costs here? What are the worst case things?
Let's just draw it all out as a matrix.
So let's just call the smooth guy A, because that's what we've
been doing all along. And he can either choose to cooperate, or
he can choose to defect. Now by the way, I'm saying cooperate
and defect here because they're terms of art, not because these are the
words I would have chosen. Okay? B can do the same thing.
B can choose to cooperate or B can choose to defect. Now
I've set up the game in a particular way here. The, the,
the cops have set up the game in a particular way here. If
B defects but A cooperates, then B gets to walk and A has to pay the price,
nine months in jail. So, I'm going to put into
this cell minus 9, 0. So, this means this
is the value of this set of strategies for A, and this is the value for B. So
the first number. In the pair is what A
gets and the second number's what B gets. Okay?
- Got it.
- You with me?
- Okay, now notice we have to do this now because it's
no longer zero sum. It's not going to always add up to a constant.
- Oh. Well, it is, so far it's a constant negative 9.
- That's right, oh and in fact it looks this
way as well because I have a symmetric deal here.
So if A defects and B cooperates, A gets to
walk and B goes to jail for nine months. Okay?
- So right now, you're right, it's looking like a zero sum game,
but it isn't. Because what happens if both A and B
drop a dime on each other? Well, they both confessed. So it's
a little good, it's a little bit better than having one
of them, only one of them, confess. The deal that the DA
is willing to give them is that both of them will
spend six months in jail. Now these are just, these are just,
the, the numbers that are a part of the game. This is,
I'm not computing this from anywhere. This is just the way the,
the cops have set this up. And I'm going to, now what's
an interesting question here is what happens if both A and
B keep their mouth shut? They both choose to cooperate? It
turns out that it's not a perfect world because they were
caught with a bunch of guns they didn't have permits for.
So if neither one of them admits to robbing the bank,
they're still going to do a little bit of time. But in
this case, it's a small weapon's charge and so each only spends
a month in jail.
- I see, so now it's definitely not zero sum.
- Because we have a negative 2 there, a
negative 12 there, a negative 9 and a negative 9.
- So, you got the game, you understand it?
- Yeah, I think so.
- It's very simple. Now, just looking at
this, what's the best possible outcome for the duo?
- So if they cooperate with each other, then there's you know,
one month later, they're back on the streets, back to their criminal activities.
- Or they're reformed. You never know.
- Sure. Back to their choice of
whether to have criminal activities. The, the mutual
defection one, that, where they both defect,
it's either, well, 12 months or six months,
depending on how you think about it, but they don't do nearly as well. And
then the defect and cooperate, it seems like
there's a lot of incarceration that will happen.
- So, it feels like the best for the,
for the two of them is to mutual cooperate. Cooperate,
- Yeah. And that makes sense. This is sort
of what you want to happen. Both of them
keep their mouth shut. They do a little bit
of time, but on average, they do pretty well. Right?
- So my question to you is, is that going to happen?
- Sure, why wouldn't it happen?
- Well, you tell me. If I know
that you're going to cooperate. Let's say that I'm
A, okay? And you're B, and I know
that you're going to cooperate. What should I do?
- You should cooperate.
- Should I? You've
chosen, you've chosen this column. All right?
- I see. Well, if you think of it that way then it's not a joint choice,
but it's actually an individual choosing. Then you're
better off defecting because then you get off scot-free.
- Then, but I go to jail for nine
months. I could have a whole baby in that time.
- Usually takes closer to ten but sure. But
you go to to jail for nine months, I don't.
- So if you are just that cold.
I guess because you're a criminal.
- It doesn't matter. Remember, the matrix is everything. The
value of doing this to me is 0 versus minus 1.
- Alright, that's, that's cold, man.
- I agree, but it's what the numbers tell
you. What if it wasn't criminal? What if it was
just, you know, the amount of money that I
was going to win in a, in a mini poker game?
- Yeah, but it's a different kind of mini poker game,
because we're both, because. [LAUGH] It's like you beat me. But,
by beating me that way, you like, kill, nearly kill me.
- So? Are you saying that when?
- What do you mean so?
- Are you saying you always let me win whenever we play poker?
- Okay then. Because you're cold, is that what you're saying?
- No. Wait, what?
- Exactly, right. So the point, Michael, is that if I know
you're going to cooperate, you're going to choose
this column, then I should defect.
- Okay, alright, so, fine. So now I'm going to jail for nine months.
- Mm-hm, if you choose to cooperate.
- Aha. If, if is good. So what you're
saying is, I could drop a dime on you. [LAUGH]
- You could. So, you could choose to defect, and if
I knew you were going to choose to defect, what would I do?
- Well, you already, you already showed
your colors, man. You're, you're defecting, so
I'm just switching, I'm just saving myself three months by, by ratting you out.
- Yeah, so we would end up here. By the way, you know that
this game is symmetric, right?
- no. You defected first. [LAUGH]
- No, here's the thing. Since the only thing I care about is maximizing my
own reward, my own value. I'm going to do the thing that makes sense for me to
do in this case which is if you cooperate, defect. If you defect, defect. But
you would do the same thing because your
whole goal here is to maximize your value.
- Yeah, well you don't know me like that.
- Yes I do because it's everything that's here
in the matrix. This is the wonderful thing about
game theory. All the stuff that you're concerned about is all
inside the matrix. Remember the rules of the game don't matter. There's
no prisoners here. There are no criminals. There's just, I get a
dollar, I lose a dollar, or I lose zero dollars. I lose
$9 or I lose $6. Or, for that matter, cents. I
lose one penny or I lose no pennies. I lose nine pennies
or I lose six pennies. It doesn't matter. This is the value
of these particular strategies. And I'm
going to want to defect if you cooperate.
You'll notice if you defect, I'm also going to want to defect.
In fact, if you look closely at this matrix, you
should notice something, Michael. From A's point of view, when
does it make sense for me to cooperate versus defect?
- You mean if you have a, some
kind of probabilistic policy over cooperate and defect?
- No, no, not even that. Just in general, is there ever a
time as A that I would rather cooperate than defect or vice versa?
- Oh, I see. So if I know you're going to
cooperate, I should defect to, to save myself a month. But if I know you're
going to defect, I should defect to save myself
three months. Either way, I'm coming out ahead.
- Right. So in fact, let's take a look at this.
If I look at the value for me as A, between cooperating and defecting
in this first column. It's minus one versus zero, right? Which number is bigger?
- [LAUGH], zero.
- If I look at it in this column,
it's minus 9 or minus 6. Which number's bigger?
- Negative six.
- Right. In both cases, defecting is better
than cooperating. So in fact this choice, this strategy,
dominates the other. In other words it is always better for me to defect than
cooperate. So I will never cooperate, ever.
Because it's always better for me to defect.
- So fine. So I'll never cooperate ever. That'll show you.
- By exactly the same argument, minus
one versus zero, minus nine versus minus
six. Defecting is always better. B will
never cooperate. What's the only thing that's left?
- Pain. This is called the Prisoner's Dilemma.
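A small sketch of that dominance argument, with the matrix encoded as it was drawn on the board (0 = cooperate, 1 = defect; each cell holds A's months, then B's):

```python
# payoff[(a, b)] = (A's months in jail, B's months in jail)
# strategies: 0 = cooperate, 1 = defect
payoff = {
    (0, 0): (-1, -1), (0, 1): (-9, 0),
    (1, 0): (0, -9),  (1, 1): (-6, -6),
}

def row_strictly_dominates(winner, loser):
    # A's row `winner` beats row `loser` no matter which column B picks.
    return all(payoff[winner, b][0] > payoff[loser, b][0] for b in (0, 1))
```

Here `row_strictly_dominates(1, 0)` is True (0 beats minus 1, and minus 6 beats minus 9), which is exactly why A will never cooperate; by the symmetry of the matrix, the same holds for B.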
- Huh. You said it was simple, but it seems kind of evil.
- I didn't say it wasn't evil. Those
things aren't opposite of one another. It's not like
perfect versus hidden. Or pure versus mixed. It's
not easy versus evil [LAUGH]. Evil's often actually easier.
- It's easy and evil.
- So we're in a little depressing place
here, Michael. You claimed in the beginning, and
I agreed with you, that this is sort of the best option,
you want everyone to cooperate, because that's sort of what's best for the
group. But because defecting dominates cooperating both for A and B you're going
to end up here. At least if you reason about it that way.
- Yeah, I see that.
- Hence prisoner's dilemma. The only way
to break this is if somehow we could
communicate and collude with one another. So
that we could guarantee that we would both
choose to cooperate at the same time.
- So that wall? What about that wall? What if
you put a like a Skype connection or a Google hangout?
- Well, if you were forced to say what you were going to say at the same
time, or somehow be able to punish someone
maybe for, you know not doing the right thing,
then you might be able to make a
different decision. But for this very simple version where
I'm going to do it once, even if I could hear what you had to say, if one
of us, whichever one of us went first, the second
one would always be able to take advantage of that.
- I do find this kind of depressing.
- Yes, it's a dilemma. It's a true dilemma. So, this brings us to a more
general sort of strategy. This whole notion of
strict dominance, works in this case. But, you could
imagine very complicated large matrices where it may
not work. But it turns out there's a
generalization of this notion of dominance. That works
remarkably well. And that's what people tend to use
to try to solve these kinds of games, to find out what the
true value of a game is. So let me describe that for you, okay?
A Beautiful Equilibrium
Alright Michael. I'm going to define a new concept
for you. It is called the NASH EQUILIBRIUM.
- Okay, here's the set up. You have n players. So we move beyond simply
two players. You have n players. And
each player has strategies that it can choose from. So here I'm referring to them as S1: these are all the strategies for player one. S2 are all the strategies for player two, up to Sn, all the strategies for player n. Got it?
- Yep.
- Okay, so each of these are sets. So I'm going to say that the particular strategies S1 star, S2 star, S3 star, and so on and so forth up to Sn star, that is a strategy that each of the n players has chosen.
- Is in a Nash equilibrium if and only if, for each one of those strategies chosen by the n players, it is the strategy that maximizes the utility for that particular player, given all the other strategies that were chosen. Now I actually find that difficult to kind
of work through, so let me try to say
it a different way. Given that you have a set of strategies S1 star, S2 star, S3 star, up to Sn star, we know that they
are in Nash equilibrium, if and only if. If you randomly
chose one of the players, and gave them the chance to
switch their strategy, they would have no reason to do it.
- Interesting, okay.
- So does that make sense?
- Right. So an equilibrium right, just in general, the word, the word sort
of makes sense, right, an equilibrium is
at a place where everything is balanced.
And in some sense, there's no reason
for anything to move because they're in balance.
So we say that a set of strategies is a Nash equilibrium if no one person
has any reason to change their strategy
in a world where everyone else's strategy remains
the same. Then you're, then you're in equilibrium,
and in particular you're in a Nash equilibrium.
- Nash equilibrium. Okay good.
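That "no one has a reason to switch" test translates almost directly into code. Here's a sketch for an n-player game stored as a table from strategy profiles to utility tuples; the table layout is my own choice, not anything from the lecture:

```python
def is_nash(payoffs, profile):
    # A profile is a Nash equilibrium if no single player can improve
    # their own utility by unilaterally switching strategies.
    n = len(profile)
    for player in range(n):
        options = {key[player] for key in payoffs}  # strategies in the table
        for alt in options:
            deviation = profile[:player] + (alt,) + profile[player + 1:]
            if payoffs[deviation][player] > payoffs[profile][player]:
                return False  # this player would rather switch
    return True

# The prisoner's dilemma from earlier (0 = cooperate, 1 = defect).
pd = {(0, 0): (-1, -1), (0, 1): (-9, 0), (1, 0): (0, -9), (1, 1): (-6, -6)}
```

On that matrix, `is_nash(pd, (1, 1))` is True while `is_nash(pd, (0, 0))` is False, since either prisoner would rather defect.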
- Do you know who Nash is?
- Ogden Nash was a poet.
- Not that Nash.
- I think there was a TV series with a Nash in it.
- Yes there was.
- But you're thinking of John Nash.
- That's right, the Nobel Prize-winning person who was featured in the movie A Beautiful Mind.
- Yep, that's exactly right. And it
was, in fact, a Beautiful Equilibrium. Okay,
so, there's the notion of a Nash Equilibrium. It was invented by John Nash, if you believe the movie, in order to pick up women. Which, you know, I've never seen this movie.
- Oh really? Oh, I watched it and it was, it was
surprising how unhelpful it was in explaining what a Nash Equilibrium was.
- [LAUGH] I'm not surprised. I hope this is helpful, though. So,
it really is a kind of difficult concept to completely wrap your, your
mind around, but what it really boils down to is, listen, if
we all picked a bunch of strategies and we knew that one other
person, one of us, we don't know who beforehand, but we know that one of us would have the opportunity to change their strategy after they see what everyone else's strategy is. We'd be in a Nash equilibrium only
if that person, whoever that person is, has no reason to change their strategy.
- Gotcha. So, so specifically you've made a distinction
between strategies that were pure and strategies that were mixed.
- And I guess I don't see in this case which kind we're talking about.
- Right. So,
in this particular case, we're talking about. Pure
strategies; however, exactly this wording works with mixed strategies.
- Oh, I see. So you could have
a pure Nash equilibrium or a mixed Nash equilibrium.
- Right. And so, you could, each one of these, instead
of choosing a particular strategy and saying they're a Nash equilibrium, you
could talk about a probability distribution over each of these sets of strategies and say those are a Nash equilibrium if no one would want
to change their probability distribution.
- Got it.
- So this works for both pure and
mixed. Alright, so you think you understand this?
- [LAUGH] sure.
- Good, we will test that.
A Beautiful Quiz
- Okay, Michael. So you think you know everything, so here's a
quiz. So, here are two matrices, matrix prisoner's dilemma and matrix bunch of
numbers that look vaguely symmetric but
aren't quite. So here's what I'm going to
ask you to do. In each of these cases, find the Nash equilibrium.
- Rut-roh. So do, you told me what it was but you
didn't tell me how to find them so that seems kind of rude.
- It was implicit. It's intuitively
obvious even to the most casual observer.
- Oh, well then, okay. So, and just to remind me now, each row
and column is a choice for one of the players or the other player.
- And the first number in the pair is the row player,
or A's pay off and the second number is B's pay off.
- And we might need probability distributions, or we might not, depending on what it takes to be a Nash equilibrium.
- okay, I don't [LAUGH] I'm not, it's not like I see the answer yet but it,
but I either will be able to find it,
or I won't. At least I understand the question.
- Okay. And by the way, just to make it
easier for you there are pure Nash equilibria here, okay?
- Alright, so you ready?
- Just circle the one or underline it or whatever it is Pushkar does to make it so that you can answer this quiz. You ready?
A Beautiful Quiz
Alright Michael, you got the answer?
- Yea, I'm ready to try and figure it out.
- Alright let's go. Let's try the first one.
- I feel like the first one's going to go rather well. So we need a pair of strategies such that no player is motivated to switch.
- And you told us they were pure strategies, so we actually have a nice
algorithm for doing this, which is we could
just check one of them with the definition.
- But, but in the case of prisoner's dilemma, I think
a natural place to start would be that minus 6, minus 6.
- So let's say that A chooses the second row and B
chooses the second column, let's see if that's a Nash Equilibrium. So
both players need to be happy. So if, would A be happier
switching? If A switched, it would be getting minus 9, which is worse.
- So A is happy, where A is. And if B
switches, B would be getting minus 9, so boom, Nash Equilibrium.
- Now, I didn't verify that the other ones weren't Nash equilibria, but you didn't say find all of the Nash equilibria. You just said find, well, you did say find the. So you kind of implied that there's just one.
- Right. That's true, but you don't have to check the others, because we already went through an exercise where we knew the answer: the second row is always better than the first. Right? In particular, this row strictly dominates this row. Which implies that if I picked anything on this row, I would rather move to the other row. And you can see it: minus 1 versus 0, 0 is better; minus 9 versus minus 6, minus 6 is better. So this row would be strictly dominated, and I'd never pick this row anyway. And the same argument holds for B, for this column. So, you'll notice that by getting rid of the things that are strictly dominated, the only thing we're left with is this.
And, it turns out, in fact, to be a Nash Equilibrium. So, this is correct.
- So, you just told me how to do my job, which makes me a little sad.
- Well, in this instance.
- Because I already had the answer to this. But maybe, maybe
you were signalling to me how I might attack this next problem. Maybe.
- So, A is choosing a row and gets the first number. So, is there a row that dominates all the other rows? It doesn't seem that way.
- But maybe, maybe, oh I could start with B, maybe B, there's a
column that dominates all the other columns.
But no, it looks like it's totally symmetrical.
- Yep, so strictly dominated doesn't necessarily
help here, and by necessarily I mean doesn't.
- So that was kind of mean. Thank you.
- All right, so oh, but we have something else we could do.
- So, the largest number that anybody can
get is 6.
- And there's a play where both of them can get the 6.
- So there's no way they're going to want
to switch away from that, because everyone's getting their
kind of maximum reward. So A
bottom row, B right column, gets us a Nash equilibrium.
- And that is in fact correct, and you can
see it, because from here it would
always be worse for me, and it would always be worse
for me. So these are in fact the Nash equilibria
for these two problems.
- Cool. They seemed easier than I was expecting.
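The deviation check we just walked through by hand can be automated. Here is a minimal sketch in Python (the matrices and the helper name `pure_nash` are my own, not from the lecture) that tests every cell of the prisoner's dilemma matrix for a profitable unilateral deviation:

```python
# Brute-force pure-strategy Nash equilibrium check: a cell is an
# equilibrium if neither player can do strictly better by a unilateral
# deviation. Row/column 0 = cooperate, 1 = defect (lecture's numbers).
A = [[-1, -9],   # A's payoff at (row, col)
     [ 0, -6]]
B = [[-1,  0],   # B's payoff at (row, col)
     [-9, -6]]

def pure_nash(A, B):
    """Return every (row, col) where both players are happy to stay."""
    eqs = []
    for r in range(len(A)):
        for c in range(len(A[0])):
            a_happy = all(A[r][c] >= A[r2][c] for r2 in range(len(A)))
            b_happy = all(B[r][c] >= B[r][c2] for c2 in range(len(A[0])))
            if a_happy and b_happy:
                eqs.append((r, c))
    return eqs

print(pure_nash(A, B))  # → [(1, 1)]: defect/defect is the only equilibrium
```

Switching away from that cell drops a player from minus 6 to minus 9, which is exactly the "boom, Nash equilibrium" check from the dialogue.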
A Beautiful Equilibrium Two
So here are three, fundamental theorems that come out of
all of this work on Nash equilibria. They're all on
the screen in front of you. I'll read them for
you in case you can't read my handwriting or if
there's some typo you discover that forces me to erase
some words and then write some new ones. Okay, are
we ready? Alright, the first one is, in the n-player pure strategy game,
if elimination of all strictly dominated strategies,
eliminates all but one combination of strategies, then that combination
is in fact the unique Nash equilibrium. So that's, that's what
happened in prisoner's dilemma, we got rid of all but one
option and that option had to be the unique Nash equilibrium.
- You say eliminate all of them. Is it possible that things
that we couldn't eliminate in one round, we could eliminate in the next round?
- Yeah it's possible, so in fact, this is done in an
iterated fashion. You get rid of whatever you can eliminate and pretend
they were never there. Because the truth is, no
one would ever choose them. And then with what you
have left, you just do it again. So it
is possible. Although not in any examples that we did.
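That iterated procedure can be sketched directly. This is my own illustration (the matrices are the prisoner's dilemma from earlier; the function name is hypothetical): keep deleting any strictly dominated row or column, pretending it was never there, until nothing changes:

```python
# Iterated elimination of strictly dominated pure strategies. A row is
# deleted if some other surviving row is strictly better against every
# surviving column (and symmetrically for columns).
def iterated_elimination(A, B):
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # Drop any row strictly dominated by another surviving row.
        for r in rows[:]:
            if any(all(A[r2][c] > A[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        # Drop any column strictly dominated by another surviving column.
        for c in cols[:]:
            if any(all(B[r][c2] > B[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# Prisoner's dilemma: defect strictly dominates cooperate for both.
A = [[-1, -9], [0, -6]]
B = [[-1, 0], [-9, -6]]
print(iterated_elimination(A, B))  # → ([1], [1]): only defect/defect survives
```

If exactly one cell survives, the first theorem above says it is the unique Nash equilibrium.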
- The second one is, and both of these I think are kind
of obvious, they sort of make sense anyway, at least to me: that any Nash
equilibrium will survive the iterated elimination of
strictly dominated strategies. In other words, if
you get rid of things that are
strictly dominated, you will not accidentally get rid
of Nash equilibria in the process. And that
makes sense, because if they're strictly dominated, then
if you ever end up there, you would
want to leave. And therefore it can't be a Nash.
- And therefore can't be a Nash equilibrium, that's
right. So, those two things sort of make sense I
think. The last one is true but isn't, at least
not to me, painfully obvious, and that is:
- If n is finite, that is you have a finite number of players, and for each
of the set of strategies, that set of strategies is
also finite, in other words you're still in a finite game,
then there exists at least one Nash equilibrium, which might
involve mixed strategies.
- So you're saying there's always a Nash equilibrium?
- Yes, possibly mixed, for any finite game. So, when you said I
relaxed everything, I didn't actually relax the
requirement that it be a finite game.
- Fair enough. But you stopped writing
finite, so it's sort of the same thing.
- Yeah, I agree. Because why would you play an infinite game?
- I don't know. Maybe you got a lot of time on your hands.
- Mm. That would be like an infinite jest. [LAUGH]
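To see why the "might involve mixed strategies" caveat in that last theorem is needed, here is a small sketch using matching pennies, a 2x2 zero-sum game that is not from the lecture and has no pure-strategy equilibrium at all. Solving the indifference conditions recovers the 50/50 mix that the existence theorem promises:

```python
# Mixed-strategy equilibrium of a 2x2 zero-sum game via the indifference
# condition: each player mixes so the opponent is indifferent between
# their two options. Matching pennies is an illustrative example.
def mixed_2x2(R):
    """R = row player's payoffs; column player's are -R (zero-sum).
    Returns (p, q): probability row plays action 0, column plays action 0."""
    (a, b), (c, d) = R
    # Column is indifferent when row plays action 0 with probability p:
    #   p*a + (1-p)*c = p*b + (1-p)*d  (negated on both sides for column)
    p = (d - c) / (a - b - c + d)
    # Row is indifferent when column plays action 0 with probability q:
    #   q*a + (1-q)*b = q*c + (1-q)*d
    q = (d - b) / (a - b - c + d)
    return p, q

pennies = [[1, -1], [-1, 1]]   # no pure Nash equilibrium exists here
print(mixed_2x2(pennies))      # → (0.5, 0.5)
```

No pure cell of matching pennies survives a deviation check, so the guaranteed equilibrium has to be the mixed one.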
- Okay. So those are the main results here. So, what did we learn?
What we've learned is that with interesting
games, we end up in these weird situations
where we have to figure out how to solve the game, and one really nice
concept is this notion of an equilibrium, and
in particular the notion of a Nash equilibrium.
- And there you go.
The Two Step
Let me just wrap up by doing a couple quick things. Earlier
on when we were talking, you made a kind of offhand comment about,
or maybe I made an offhand comment, about the fact that because we're
doing this little prisoner's dilemma thing, we have this sort of wall here,
we're going to end up in this little bad place. Right? With
the -6, -6. And there was this kind of notion that what if
we could hear each other, would that make a difference? And the answer
was, no not really, because who ever goes first is going to lose.
And if you try to get people to go at the
same time, well in some sense, that's the same thing as having
a wall and you can't hear each other if you have
to go at exactly the same time. Right, does that make sense?
- I didn't quite understand. Why does the person who goes first lose?
- Because if I find out whether you're going to cooperate or
defect, then I will just then do the thing that makes sense.
So if you do anything but defect, I will defect, and
then I will win and you will lose. So whoever goes first
runs the risk of the other person screwing them over. So.
- I see. Unless, unless that person defects.
- Right, but if that person defects, the other person's going to defect.
- Right, so, but you don't lose any more than you would have.
- That's true, and, but you still end up in this kind of
unfortunate place. So, one question you might
have there is, yeah, but that's just
because you can only do this once. But what if you could do this
twice? So, let's imagine we're going to be in this situation today and then we're
going to be in this situation tomorrow because, you
know, you keep leading me astray. So, whatever happens today
I can use as information, for what I might
do tomorrow. So, if I decide to go ahead and
cooperate this time. And you defect, and I know
you're the kind of person who defects, and so when
we're in this situation again, I am definitely going to
defect. So maybe in that situation, it's in your best
interest to go ahead and cooperate
because, then we can keep cooperating together,
and in the end it sort of works out. So maybe this notion
of not just playing prisoner's dilemma
once, but playing prisoner's dilemma twice or
three times, or four times, or five times, might come to a different result.
- Yeah, that seems plausible. I mean, I could
imagine saying to you through this channel in the
wall, something that says, you know, cooperate with me
or, yeah, right, or I will, I will stop cooperating
with you. And so, you get more reward if you, if you cooperate with me.
- Right. So, my question to you is,
what does happen exactly in this setting, where
I have two versions, two consecutive games of
prisoner's dilemma to play. What am I going to do?
- Ok, well now I mean, we could turn this into a
game, right? I mean, that's what you were teaching us how to do.
- So you, so now the game has four,
well, I don't know. So one way to do it is
to say A has a choice to make on the first
step, and [COUGH] that could be cooperate or defect, and then
the second step could be cooperate or defect, so there's four possibilities.
- B has the same four possibilities. But maybe we can actually
have A be responsive to B. So A has two things to
do on the first step and then two things to do on
the second step for each thing that B did on the first
step, for a total of eight combinations.
- Right. You could do that. [CROSSTALK]
- So if we made an eight by eight matrix, we should be able to solve it.
- Right. So, that's pretty easy to do.
You just kind of draw an eight by eight
matrix in, you know, that many dimensions. And
it, you know, you end up right here.
- I wasn't thinking about that many dimensions. I
just thought, like the normal kind of matrix. But, okay.
- Oh, I see, I see, I see.
- I mean it's eight by eight, so that's kind
of a pain. Right. So, it's 64 cells and, I guess,
I guess we don't want to fill that out.
- Yeah, but you know, we could fill that out. But I'm going to help
you out here by pointing out: you know what? It's not going to make a difference.
- So, here. Let's see if I can walk
you through why it's not going to make a difference.
Let's imagine I have 20 of these games. Okay?
- So I'm going to play these games 20 times in
a row. Now what you've been doing is you've been
going forward. You've said, well, if I can get us
the right thing or if I can basically threaten you and
say. If you screw me over this time, I'll screw
you over from now on. Then maybe I can get you
to do the right thing and vice versa, since it's
a symmetric game. And then we'll end up doing the right
thing all along. And that makes a lot of sense, right?
- If you're going forward. But what
if you're going backward, Michael? Let's imagine
we are doing [CROSSTALK]. What is amazing
to me is that is exactly backwards what
you just said. So, let's imagine we are doing these 20 dot dot dot dot,
and now I'm on the 20th game, okay? What's going to happen on the 20th game?
- Well, I mean, I guess I could have built up
trust in you from the previous games and
think that you're going to cooperate with me, because we've
been cooperating together. And then that would be
the perfect time to drop a dime on you.
- Exactly. So, if this is the 20th game and we're
in this situation, remember whatever value we've been adding up along
the way, that's sunk cost. Right. The
only thing that's left is the final game. And the final game looks like this,
which means we're going to end up here.
But guess what, Michael? The final one is
determined. So since we already know the outcome
of the final game, the next one that
we can look at is the 19th game. Well, we already
know what the outcome of the final game
is. So this is effectively the last game.
- And what is the outcome of that
going to be? It's going to be this, and backwards,
backwards, backwards, backwards, proof by induction, because
the only proofs computer scientists know
how to do are proofs by induction. It turns out that we will always defect.
- For what it's worth, those are the only proofs worth doing. But, okay.
- That's true. It's a fine point. Proof by induction and
perhaps by conduction. [LAUGH] So, there you go Michael, even if I
can play multiple, multiple games to try to build up trust,
the truth is the Nash equilibrium, if I filled out that eight
by eight matrix you wanted me to fill out. I would still
end up in the same place where I would defect both times.
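That backward argument can be mirrored in code. The sketch below is my own (the names and structure are hypothetical): walking from the last of the 20 games back to the first, the continuation value gets added equally to every cell of the stage game, so it never changes which cell is the equilibrium, and defect/defect falls out of every round:

```python
# Backward induction over n repeated prisoner's dilemmas. Because the
# continuation value shifts every cell of the stage game by the same
# constant, the stage Nash equilibrium (defect, defect) is played in
# every round.
A = [[-1, -9], [0, -6]]
B = [[-1, 0], [-9, -6]]

def stage_nash(A, B):
    """First cell with no profitable unilateral deviation."""
    for r in range(2):
        for c in range(2):
            if (all(A[r][c] >= A[r2][c] for r2 in range(2)) and
                    all(B[r][c] >= B[r][c2] for c2 in range(2))):
                return r, c

def solve_repeated(n):
    va = vb = 0.0            # continuation values after this round
    plays = []
    for _ in range(n):       # rounds n, n-1, ..., 1, going backward
        shifted_a = [[x + va for x in row] for row in A]
        shifted_b = [[x + vb for x in row] for row in B]
        r, c = stage_nash(shifted_a, shifted_b)
        va, vb = A[r][c] + va, B[r][c] + vb
        plays.append((r, c))
    return plays, (va, vb)

plays, totals = solve_repeated(20)
print(set(plays), totals)    # → {(1, 1)} (-120.0, -120.0)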
- Certainly if, yeah okay, so if you're going to be a jerk then, then I
have to be a jerk, but yeah, I
guess that's right. That's, that seems pretty horrible.
- It is. And by the way this is not
just something I'm making up. This is another theorem that
comes out of Nash equilibrium: that if you have an
n-repeated game, then the solution is the n-repeated Nash
equilibrium. So whatever the Nash equilibrium is
for the single game, the one-shot version
of the game, gets repeated as the Nash equilibrium for every round of that game.
- Okay, wait a, hang on, hang on. So I. Okay, I buy that.
- But couldn't you, well so, in a game that
has more than one Nash equilibrium, couldn't we, like, alternate?
- We could, and in fact you'll notice that in so far,
I have not talked about the case where you have multiple Nash equilibria.
Because, in fact, that's its own problem. If I have one
Nash equilibrium, then we know what's going to happen. If I have two
Nash equilibria, which one do we choose? Now, what's important to know
here, is that if we have two Nash Equilibria then that means
that if you're in any one of them, you won't leave
it. So it's not like you'll move from one Nash Equilibrium to
another, sort of in the general case. Because the fact that it's
an equilibrium means that if everyone else is fixed except one person,
that person won't choose to move. But. I haven't said anything about how you
would choose among multiple Nash equilibria. And
that's beyond the scope of this class.
- But it is an active research area, and in fact
some of my own students have done some work in
thinking about what it would mean to choose, and actually
the answer always boils down to: let's not worry about that.
- [LAUGH] All right. But I'm still kind of disturbed
by this, right? So it seems like it's sort of saying that if we
knew the world was going to end tomorrow, then we might as well be greedy
bastards. And, in fact, if we know that the world is going to end at
any particular time, which of course it
will, we might as well be greedy bastards.
- That's right. So, be greedy bastards.
- Oh, no!
- Or at least, or at least be you know, Nash bastards.
- [LAUGH] Well, the Nash is kind of greedy, right? It's like I'm always
taking an action that's always best for me.
- Yes, you are.
- Every single thing we've talked about today has always
assumed that you're always doing what's best for you. The fact
that it might also be best for someone else is
neither here nor there, you're always doing what's best for you.
- Right, I get that, but it sounds like this argument is saying
that we might as well be like that all the time, and never,
never self-sacrifice even a little
bit for greater advantage, even to yourself.
- Well, that's not true. It's just
all hidden in the utilities. Once you've gone through
all the what I'm going to do today, what
I'm going to do tomorrow, you add it all up,
and whatever the right strategy is to take, that's
the strategy that you take, because it's already accounted
for all the self-sacrifice you might do that then
leads to later good deeds or later good outcomes.
So the wonderful thing about all of this, is that this
matrix is everything you need to know. It captures the future, past
and everything else, so everything you need to know is right
here. And it's all been reduced to a bunch of pairs of
numbers. But you did say something kind of interesting though, Michael,
which I think is worth pointing out, which is that everything you
say, at least in this sort of iterated version,
requires knowing when the world is going to end. So an interesting question
to ask would be, what would happen if I knew the world
was going to end but I didn't know when. Would that change my behavior?
- It, it doesn't seem like that
should make any difference because it's still
going to end. If not today then tomorrow. Or if not tomorrow then the next day.
- True. But I bet you it does make a difference.
- I'm willing to go and think about that.
- Okay, why don't you go think about it and let me know.
- Alright. I'll, I'll tell you about it in the next lecture then.
- Okay, I like that. Excellent. Okay, cool. So,
let's move on.
- [LAUGH] Okay, fair enough.
What Have We Learned
All right, Michael. So, I think that brings us
to the end of what I wanted to talk about
anyway. So, can you help me remember what it is
that we've learned today? In particular, what you've learned today?
- Sure. So, I guess the first thing I learned is
that game theory can make you depressed. And, in fact, in
particular, that my friend Charles, given the opportunity, would totally drop
a dime on me just to save a month of incarceration.
- Wait, no,
no, no, I don't think you summarized that
correctly. Game theory is not depressed. It's depressing.
- Oh, yeah. That's a good point, that's a good point.
- And Michael is not cruel. He is the victim of cruelty.
- I don't think so. Because you want to know what the secret here is,
Michael? You've got a little matrix of
numbers. Those numbers capture what's going on.
The truth, Michael, is that if we were in prisoner's dilemma, I would cooperate
with you, because my utility is not simply the number of months that I
would spend in jail, but also the number of months you would spend in jail.
- Oo. Interesting.
- So the best way to
beat prisoner's dilemma is to change the numbers.
- [LAUGH] I see. It's like the Kobayashi Maru of game theory.
- Exactly. So there's an interesting question for you
right there, Michael. If I had prisoner's dilemma here, here,
let's write it out so you can remember. If I
had prisoner's dilemma here, we already know that we're going to
end up here, because that's what the numbers tell
us to do. But what we'd have to do
is change the game. So how would you change
the game in prisoner's dilemma?
- I see. So, if
we're thinking about it in particular in terms of,
I care about how long you spend in jail,
maybe not as much as I care about how
much I spend in jail. Like, maybe half as much.
- Then the payments shift, right?
- So, now we have, like, minus 1.5,
minus 1.5 in the upper left.
- Mm-hm. Minus 9 comma minus 4.5; minus 4.5 comma minus 9; and minus 9, minus 9.
- So, yeah.
- So now that, that bottom right becomes a lot
less attractive if we actually care about the other person.
- Well, that's, okay, I'm less depressed now.
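We can check that altered game mechanically. In this sketch (my own; `alpha = 0.5` encodes the "half as much" from the conversation), each player's utility becomes their own jail time plus half the other's, and enumerating pure equilibria shows cooperate/cooperate is now an equilibrium, while defect/defect survives only as a weak one, since deviating from it no longer loses anything:

```python
# Change the game: each player's utility is their own months plus
# alpha times the other player's months, then enumerate the pure
# Nash equilibria of the transformed game.
A = [[-1, -9], [0, -6]]
B = [[-1, 0], [-9, -6]]
alpha = 0.5   # "I care about your jail time half as much as mine"

A2 = [[A[r][c] + alpha * B[r][c] for c in range(2)] for r in range(2)]
B2 = [[B[r][c] + alpha * A[r][c] for c in range(2)] for r in range(2)]

def pure_nash(A, B):
    return [(r, c) for r in range(2) for c in range(2)
            if all(A[r][c] >= A[r2][c] for r2 in range(2))
            and all(B[r][c] >= B[r][c2] for c2 in range(2))]

print(A2)                 # → [[-1.5, -9.0], [-4.5, -9.0]]: matches the talk
print(pure_nash(A2, B2))  # → [(0, 0), (1, 1)]: cooperation is now stable
```

The bottom-right cell really does become a lot less attractive: it drops from minus 6 to minus 9 once you care about the other person.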
- Except of course, that requires that you feel that way internally and that
I feel that way internally. There's another
way that you could change the game here.
Which is: what happens to snitches in jail?
- They are rewarded.
- No. No they're not.
- They're punished.
- So, if you're part of the criminal fraternity and you don't like
prisoner's dilemma, then what you have to do is create a system where the
people who snitch get punished. So it's not just the months that they spend
in jail, it's everything else that's going to
happen to them if they drop a dime.
- So you're saying that minus 6, minus 6 ends up being worse?
- No, what I'm.
- No, the minus, wait, no, wait, what? Yeah. Oh, the
zero ends up getting, oh, I see, the zero ends up being
worse. Because even though you're not in jail, you're going to get,
I don't know, somehow thwarted or punished for your past behaviors.
- That's right. So that's
what you have to do, and that works not just with criminals but in
the real world. Whenever you're in this
sort of situation, like a prisoner's dilemma,
you can change the game by changing
everyone's utilities. Like, for example, hiring
police officers, or hiring members of
the mob to take care of everything.
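The snitch-punishment idea is a tiny mechanism-design problem, and it can be sketched as a search (my own illustration, with hypothetical names): subtract an extra penalty `p` from any player who defects, and find the smallest `p` that makes mutual cooperation a Nash equilibrium. With the lecture's numbers, one extra month is enough:

```python
# Mechanism design sketch: punish defection by p extra months and find
# the smallest p that makes mutual cooperation a Nash equilibrium of
# the prisoner's dilemma.
A = [[-1, -9], [0, -6]]
B = [[-1, 0], [-9, -6]]

def cooperate_is_nash(p):
    # Subtract p from a player's payoff in any cell where they defect
    # (row/column 1 = defect, i.e. snitch).
    A2 = [[A[r][c] - (p if r == 1 else 0) for c in range(2)] for r in range(2)]
    B2 = [[B[r][c] - (p if c == 1 else 0) for c in range(2)] for r in range(2)]
    # Neither player should gain by unilaterally defecting from (0, 0).
    return A2[0][0] >= A2[1][0] and B2[0][0] >= B2[0][1]

threshold = next(p for p in range(10) if cooperate_is_nash(p))
print(threshold)   # → 1: one extra month for snitching flips the game
```

Designing the punishment so the incentives point where you want them is exactly the "inverse game theory" Michael describes next.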
- I see. So it almost seems like what
you're talking about is a kind of inverse game
theory, where if there's a particular behavior that I
want to see, how do I set up the payments and
rewards so that that behavior is encouraged?
- Right, and by the way that has
a name, and it's called mechanism design.
- Mechanism design?
- I'm not sure I understand either of those words. [LAUGH]
- Well, that's where you're trying to set up the set of incentives,
the mechanisms that you're using to pay people. You're trying to design them in
such a way to get particular behavior. This is what a lot of
economics is all about. This is what a lot of government is all about.
Tax breaks for example, for mortgage interest, encourages you
to buy a home, rather than rent a home.
- I see, by changing the payoff structure.
- Oh that's neat.
- And so that's what we learned today. At least right now. [LAUGH]
- Okay. Alright. So, let's see. So, just to try
to rattle off some of the other things. The whole
notion of game theory. We talked about, especially, the idea
that you can think about a game as a tree, or
you could represent it as a matrix.
And, I believe you said, repeatedly, the matrix
has everything. Is that how you said it? Or the matrix is all you need.
Let's see. We talked about minimax and maximin.
- We, we relaxed a bunch of constraint on games.
- So we, we looked at both perfect and hidden information.
- We looked at both zero sum and non zero sum.
- We learned a lot today. We looked at deterministic
and [UNKNOWN], I would want to say, but you called it non-deterministic.
And, assuming that we can get rid of the first two
bullet items that look like maybe they were jokes, I would suggest
saying things like: we talked about what strategies are and that they
come in different flavors. We talked about the evil prisoner's dilemma game.
- Mm-hm. What else? You gotta give me more, otherwise it's [INAUDIBLE].
- Oh, more, good point. Andrew Moore gave lots of really good examples that
- Michael may be cruel, but Andrew Moore is awesome.
- He's more cool than me. Andrew Moore is very
cool. All of the examples that we've used today, or almost
all of the examples we used today, actually come from
Andrew Moore's slides. Andrew Moore is a professor at Carnegie Mellon,
or at least he was before he went off to
Google, and is a really smart guy who cares very much
about machine learning and game theory, and produced a bunch
of slides that, it turns out, lots and lots of people
use in their own courses. And his examples were so good for game theory that I
decided to co-opt them, with his permission of
course. He tells everyone that they may use them.
And, in fact, we have pointers to the
slides in the resources links and folders for all
of you to look at. And I recommend
that you do. Did we learn anything else, Michael?
- The only other thing that I would want to
mention is Nash, which is a concept that is nashtastic.
- It is nashtastic.
There are, by the way, I should mention briefly, other kinds
of equilibrium concepts; they're beyond the scope of this class. But
there's a whole lot more to game theory, as you might
imagine. And sometimes, in these situations where
you can't do what you want to do, you end up in
these prisoner's dilemma situations. Other kinds of equilibria can get you
out of it. And I'm going to argue, without explaining why,
that the way that they get around this is by introducing other
ways of doing various kinds of communication. And, in fact, I claim they're
a particular part of mechanism design. But that's a topic for another day.
- Okay. Fair enough.
- Okay. Did we learn anything else, Michael?
- I don't know. That's what I was thinking about. I mean, that seems
like a lot to absorb. And the other thing is that repeated games, even
the prisoner's dilemma, kind of unravel if you know when they end. And I
was going to look into what happens if you don't know when they're going to end.
- Okay. So, I guess,
that will be something that we will learn. Next time.
- Right, what we, what will we have learned?
- Yes. Future past tense.
- Alright, Michael, well, I think that's about it.
At least my brain is full. So I will talk
to you next time, and you get to lead what
I believe is the last full lesson of the course.
- Oh, exciting.
- It is exciting. Alright, well, bye
Michael. You have fun. I'll see you next time.
- Alright. Bye Charles.