cs212 »

Back to course page | CS212 Unit 1 | CS212 Unit 2 | CS212 Unit 3 | CS212 Unit 4 | CS212 Unit 5 | Print this page

CS212 Homework 5


Improving Optimal

hw5-1 video/ | hw5-1 question code/

This homework is about improving on the optimal strategy function.

Now, it may seem like that's impossible. Well, how can we be better than optimal? And in fact, we can't in the sense that we can't come up with a function that produces better results, but we can come up with a function thats faster.

Now one thing that bothered me about the optimal function is a big part of it was defining this probability of winning, Pwin(state), from a given state, and if we give it some random state like say Pwin((0,7,13,19)). It turns out that the value of this state, whatever that is, is exactly equal to the state where its the other players turn to play. So:

Pwin((0,7,13,19)) == Pwin((1,7,13,19))

On the left player 0 goes first, and on the right, player 1 goes first, but the probability of winning for an optimal function is symmetric because we are assuming that the opponent is also playing optimally. So if 0 goes first or 1 goes first, that probability of winning is going to be exactly the same. But the way we defined Pwin() it computes both of these and stores both of these and that is wasteful.

So in this exercise, we are going to get rid of that waste.

How wasteful is it? Well, I am going to do a timedcall(Pwin, (0,0,0,0)) for the Pwin function and pass it the starting state. So the first time you pass it the starting state it has to do the complete computation to get all the way to the end. So thats going to be a lot of work.

And it turns out that it takes about 3/4 of a second to do that work when the goal is 40, and the probability that's returned is 54% so there is a slight advantage for going first.

timedcall(Pwin, (0,0,0,0))
(0.763976000000013, 0.5465745271091017)

That tells us about the time, how about the space? Well I can look at the cache that the memo function has computed. It stored it at Pwin.cache. I can compute that length as follows:


And it works out to 86 thousand entries. So I should be able to cut this time down and I should be able to cut this size of the cache roughly in half by working more efficiently and having a better version of Pwin.

And we are going to do that by defining a function Pwin2 as follows:

def Pwin2(state):
   """The utility of a state; here just the probability that an optimal player
   whose turn it is to move can win from the current state."""
   _, me, you, pending = state
   return Pwin3(me, you, pending)

def Pwin3(me, you, pending):
   ## your code here

Pwin2 which is going to have the same signature as Pwin. It performs the same function only its going to do it a little bit better. The way it works is it breaks out this state, throws away the player to play because we know its symmetric and we don't need that, and then it calls Pwin3; which we name because it takes 3 arguments. The only ones that matter — my score, your score, and the pending score. Amd thats what I want you to do.

Go ahead and write the code for Pwin3 that will efficiently do this computation.

Now make sure here that you don't get stuck in an infinite loop. Take care when the value of pending is 0. That's a special case to deal with. And remember that the probability that I win from a position is always 1 minus the probability that my opponenet wins. You may need that in your definition

Doubling Pigs

hw5-2 video/ | hw5-2 question code/

This exercise involves doubling in the games of Pig.

What is doubling?

Well, if you are familiar with the game of backgammon, which is a gambling game with dice, doubling is a way to end the game early. The idea is, when you start out, each game is worth one point. But in the course of the game if one player is winning he may propose to double the stakes of the game so that it is worth 2 points, that is, say the winner wins 2 dollars from the loser rather than 1 dollar or 2 euros or whatever currency you're playing with.

When the player makes this move of doubling then the opponent has a choice — he must immediately either accept or decline.

If the opponent accepts, then we play the game and it's worth two points.

If the opponent declines, then the proposer of the double immediately wins 1 point and the game is over.

Now in backgammon doubling can go on infinitely from 1 to 2 to 4 to 8 to 16 and so on, for Pig, we are going to allow only one round of doubling. So the game is going to start off as being worth 1 point and can go up to 2 points and no more than that.

So we are going to define a whole new game, the top level function will be called play_pig_d ('d' for double) and all our functions will end in that '_d'.

A state is this game will be just like before with one new value for the doubling amount.

  • (p, me, you, pending, double)

So a state will have the player to play (p), the score for me (me), the score for you (you), the pending score and an extra field for the double.

And the double can be 1 or 2 — and we say we will not allow it to go higher that — or the string 'double' and it takes that value at only one point during the game when the first play proposes the double and the second player has the choice of either 'accept' or 'decline'. That player will see 'double' in the state field and then will have to reply.

So what I want you to do is write these two functions: pig_actions_d which takes a state and returns all the legal actions from that state, and here is a description of what they can be, and I want you to define some strategy function strategy_d that gives a state in the doubling pig game and returns a good action.

And what do I mean by 'good'?

Well it's got to be better than this strategy function: hold at 20, which holds at 20, always accepts when offered a double, never doubles, and holds at 20 or rolls otherwise.

You should define a strategy that is better than that. If you want, you can try to define the optimal strategy, but you're not required to.

And I've defined for you the main function play_pig_d. It's similar to what we had before except the result that is returned is two values now — both the winning strategy and the value of the game (which will be either 1 or 2).

And the other difference is that we've gotten rid of the roll and hold functions and now replaced it with just one do function which takes an action and a state and passes along the die roll and then does that action. So that's really the definition of how the game works where the action can be either rolling, holding, doubling, declining, and accepting.

Then dierolls are as before, we defined the clueless function which just takes a random choice from among the legal actions — you definitely ought to be able to beat that — but you should also be able to beat the hold 20 strategy.