cs373 ยป

Contents

Welcome to homework assignment #4 in CS373. To remind you, we covered A-star and dynamic programming in class. Let's start with an A-star question. We learned in class that we can use heuristics, and a heuristic is a admissible if the heuristic value is no larger than the actual cost it takes to get to the goal. In a maze, we know that the number of steps in the maze is a good heuristic, because obstacles will make the path only longer, not shorter. Consider a maze of size 3 x 3 and say there's a single goal state, and the cost of moving is 1. Is the function shown here admissible or not? Please check one of the two radio buttons.

The answer is "yes." In this part of the state space, we are obviously entering values that are too small, but the heuristic is admissible if h(x) is lower than the cost-to-goal. It would have been inadmissible if the values over here were much larger than the values we want them to be.

Here's another heuristic function for a 3 x 3 maze. Assume again the cost is 1, and it's our sole goal state in the corner over here. Tell me is this admissible? Please check one of the two buttons here.

The answer is "no," because these numbers over here are too large. Why is this bad? You can see that in a maze like this you would expand node over here before we touch any of the nodes over here. If those are an optimal path, then we might miss them and pick what might end up being a suboptimal path even though in this specific case all the paths seem to be equally good.

Let me ask a more general question now. What may happen if h, the heuristic, is not admissible? A-star finds the optimal path always. A-star may find a suboptimal path in some cases. A-star may fail to find a path even if one exists. Or none of the above. In answering this question, it is important to say this is a statement that always applies, and these statements suggest they just apply sometimes under certain circumstances.

There was only one correct answer, which is this one over here. The inadmissibility of the heuristic might lead down an exploration of the state space that leads to the goal on a suboptimal path. Let me demonstrate this to you. Here is a world. Suppose our start step is here. Our goal is here. The heuristic is super large for those states over here and smaller around here. The nodes will be expanded following the zeros around here and around here, and the first time the goal is set occurs when we reach the point over here. We never discover the shortcut straight from S to G because of the inadmissible heuristic. That renders point 2 correct, and as a result A-star will not always find the optimal path as I have just proven. Now, it turns out A-star will find the path if one exists. It might find a suboptimal path, but eventually it expands all the nodes that can be reached. If the goal state is among them, it'll succeed to find a path. Therefore this answer here was wrong. Of course, none of the above is also a false answer.

## Diagonal Motion

I now what to quiz you about dynamic programming. Consider the following 3 x 3 world with a goal state in the corner over here. Say the value of the goal state is defined to be zero. The cost of motion is 1. Quickly fill in the values for all the other cells over here, assuming you can either move straight or diagonally. In both cases the cost of motion is 1.

## Diagonal Motion Solution

The answer is 1 over here, 2 here, 3 here, 4 here. This one also is resolves to 2 and 3.

## Stochastic Motion Solution

Here is my solution. As I go through all different actions a, as before, I now create a new inner loop of going through different action outcomes. This lists is (-1, 0, 1), and I set the actual outcome to the adjacent action in the action list. You might remember the action list is a list of different outcomes. By incrementing it by 1 or decrementing it by 1, I can pick a slightly different action in that list. Of course, I have to do the modulo 4 on the right side. Then the limitation is similar to before. I project the outcome into new coordinates--x2 and y2. Now I need to assign the probability with this outcome where if they modify a 0, we take the success probability. If it's not 0, we take 1 minus that divided by 2, because there are 2 possible undesired outcomes. Then the test proceeds by checking whether this is a legal grid cell, it's inside the grid, and the grid value is 0. Then like before, I add the value of the grid cell by now multiplying by the probability of that specific action outcome. Otherwise, I do the same for the collision cost. Finally, I take my cumulative value of v2, which I initialized with the cost of motion. You can't see this right here, but it's filled up. I update my value function just like before. You can see the quote over here. This is what you should have programmed. The key difference to our example in class is the inner loop over here where I go over different possible action outcomes, compute the actual action outcome, and then do the probabilistic addition of these outcomes rather than just studying one outcome.