Exam testing your knowledge
We summarize all of the content in the course thus far.
Welcome back. We're about to start the final lesson in this course.
This time we're mostly going to focus on review and some practice problems to get you ready for the exam or at least to get you in the mood for the exam.
But there may also be just a little bit of fun.
You may get the chance to hear someone who is not Wes Weimer talking. I know, I know. Someone with real voice inflection? Be still, your beating heart.
But before we get going with practice problems, let us make The List, a high-level summary of everything you've learned thus far.
We started off by introducing the concept of a language as a set of strings.
Regular expressions, finite state machines, formal grammars--these all denote or accept or correspond to sets of strings.
One of the first tools we introduced was regular expressions, which are just a concise notation for specifying some sets of strings.
Those sets of strings are called regular languages.
An incredible surprise move: regular expressions denote regular languages.
And we learned a bunch of regular expressions:
r+, r*, r1|r2, [a-z], r?
and we ended up using these to specify tokens.
More on that in just a bit.
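As a refresher, here's a minimal sketch of how those operators behave in Python's re module; the specific patterns and haystacks are illustrative assumptions, not examples from the course:

```python
import re

# r"a+"    : one or more a's
# r"ab*"   : an a followed by zero or more b's
# r"a|b"   : an a or a b
# r"[a-z]" : any single lowercase letter
# r"ab?"   : an a followed by an optional b
print(re.findall(r"a+", "caaat"))        # ['aaa']
print(re.findall(r"ab*", "a abbb"))      # ['a', 'abbb']
print(re.findall(r"[a-z]+", "Go 12 go")) # ['o', 'go']
```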
Then we learned about finite state machines, which are a cool way to draw regular expressions and also, it turns out, a way that we implement them under the hood.
That's how Python actually does it.
Here I've shown a finite state machine for r"ab*".
It's possible to have ambiguity or epsilon transitions in a finite state machine.
That makes a finite state machine nondeterministic, because if you're trying to simulate it, you don't know exactly where to go at any given point.
It turns out, however, that is not a problem at all.
We can convert nondeterministic finite state machines down into deterministic finite state machines.
They may get a little bit bigger, but it will totally work.
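Here's a minimal sketch of simulating a deterministic finite state machine for r"ab*"; the state numbers and edge encoding are assumptions for illustration:

```python
# A deterministic FSM for r"ab*": state 1 is the start state, state 2 accepts.
edges = {(1, 'a'): 2, (2, 'b'): 2}
accepting = [2]

def fsmsim(string, current, edges, accepting):
    # consume the input one character at a time, following edges
    if string == "":
        return current in accepting
    letter = string[0]
    if (current, letter) in edges:
        return fsmsim(string[1:], edges[(current, letter)], edges, accepting)
    return False  # no outgoing edge on this letter: reject

print(fsmsim("abbb", 1, edges, accepting))  # True
print(fsmsim("ba", 1, edges, accepting))    # False
```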
Then we moved on to the more powerful context-free grammars, which are a concise notation for specifying some sets of strings.
Wait! I thought that's what regular expressions were.
Actually, they're both just concise notations for specifying possibly infinite sets of strings.
And your typical context-free grammar is just a set of rewrite rules with a nonterminal symbol on the left, an arrow, and then some terminals and nonterminals on the right.
Terminals are the same thing as tokens.
They're the actual input that we're trying to match.
There are some cool things that we can do with context-free grammars, like matching balanced parentheses, that we could not do-- we're certain we cannot do it, it is impossible to do correctly--with regular expressions.
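To make the rewrite-rule idea concrete, here's a sketch that encodes the balanced-parentheses grammar as Python data and enumerates the strings it can derive within a bounded number of expansions; the dictionary encoding is an assumption, not the course's representation:

```python
# Grammar: S -> ( S )   or   S -> (empty)
grammar = {'S': [['(', 'S', ')'], []]}

def derivable(symbol, depth):
    # all strings derivable from symbol using at most `depth` expansions
    if symbol not in grammar:   # a terminal stands for itself
        return {symbol}
    if depth == 0:
        return set()
    result = set()
    for production in grammar[symbol]:
        strings = {''}
        for s in production:
            subs = derivable(s, depth - 1)
            strings = {p + q for p in strings for q in subs}
        result |= strings
    return result

print(sorted(derivable('S', 3)))  # ['', '()', '(())']
```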
We often want to check whether a string is in the language of a context-free grammar or matches that context-free grammar, can be derived or generated by that context-free grammar--these are all the same question--and one way to do that was memoization, which for many years I always wanted to call "memorization", but it's just not.
It's also called dynamic programming, which sounds really exciting, but in practice basically builds charts where we write down previously computed results so that we don't have to compute them again.
This is called being lazy, and it's a phenomenal virtue when you're writing programs.
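Here's a tiny sketch of memoization in that lazy spirit, using Fibonacci as an illustrative stand-in for a parsing chart (the choice of Fibonacci is my assumption, not an example from this lesson):

```python
chart = {}  # previously computed results, keyed by the subproblem

def fib(n):
    if n in chart:              # be lazy: look it up instead of recomputing
        return chart[n]
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    chart[n] = result           # write it down so we never compute it again
    return result

print(fib(30))  # 832040, with only about 30 chart entries instead of
                # exponentially many recursive calls
```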
We can combine context-free grammars and, potentially, memoization together to get parsing, which is the process of determining if a list of tokens is in the language of a context-free grammar.
If so, we produce a parse tree.
Where did we get that list of tokens, you ask?
The process of lexing breaks a big string, like a web page, up into a list of tokens or important words.
The tokens are specified using regular expressions, which means that a lexer is implemented using finite state machines.
We do lexing first and then parsing.
I have written them out of order to shake things up.
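A lexer in miniature might look like the following sketch; these token definitions are illustrative assumptions, not the course's full HTML and JavaScript token set:

```python
import re

# Illustrative token specification: words, numbers, and a little punctuation.
token_pattern = r"[a-zA-Z]+|[0-9]+|[<>=/]"

def lex(text):
    # break one big string up into a list of tokens (important words)
    return re.findall(token_pattern, text)

print(lex("width = 42"))  # ['width', '=', '42']
```

A real lexer would also record each token's kind and position, but the core move is the same: regular expressions chop the string into a token list that the parser consumes.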
Once we have our parse tree, we're getting closer and closer to the meaning of a program.
One aspect of program semantics or program meanings is the notion of types--that we can classify objects or values like 1, 2, and 3 into groups and say, "Those are all numbers".
So a type is just a set of values and some associated operations that you can apply.
So the values might be things like all numbers, all strings, or all lists, and the operations might be things like + - / or length.
I can apply length to a string or a list but not a number.
I can add numbers, strings, and lists, but it means something different every time.
I can divide numbers, but I can't really divide strings or lists, at least not using this division operator.
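Those claims are easy to check directly in Python; this is a minimal sketch of the same observations:

```python
# length applies to strings and lists, but not to numbers
print(len("hello"), len([1, 2, 3]))   # 5 3

# + is defined for numbers, strings, and lists, with a different meaning each time
print(1 + 2)          # addition: 3
print("ab" + "cd")    # string concatenation: 'abcd'
print([1] + [2, 3])   # list concatenation: [1, 2, 3]

# division works for numbers but is a type error for strings
try:
    "hello" / 5
except TypeError:
    print("dividing a string is a type error")
```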
Types are our first step along the road to meaning, and in computer science we formally call that semantics.
By the way, if you've been wondering the whole time, semantics is a tricky word that essentially always ends in an S, even when we're using it in sort of a singular fashion.
Semantics of a program: its meaning, what does it compute.
A program may have type errors, like if you try to divide a string by an integer, or it may have any number of other exceptions.
But in the general case, it produces a value.
This means that we have computed something. That was the result. That's the meaning of our program, just like a sentence in English or French or Cantonese might have a meaning.
Once we have a grip on semantics, we can introduce optimization, where we replace one program with another or, conceptually, one part of a program with another as long as the whole thing has the same semantics.
This is the critical rule of optimization.
You can't change the meaning of the program.
If you can't change the meaning, what can you change?
Typically, the new code you've brought in uses fewer resources-- less time, less memory, consumes less power-- and we've seen a bunch of examples of these.
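One classic example is rewriting x * 1 to just x. Here's a sketch of that rewrite over expressions represented as tuples; the tuple encoding is an assumption for illustration:

```python
# Expressions as tuples: ('times', left, right), ('int', n), ('var', name)
def optimize(tree):
    if tree[0] == 'times':
        left = optimize(tree[1])
        right = optimize(tree[2])
        if right == ('int', 1):   # x * 1  ==>  x : same meaning, less work
            return left
        if left == ('int', 1):    # 1 * x  ==>  x
            return right
        return ('times', left, right)
    return tree

exp = ('times', ('var', 'x'), ('int', 1))
print(optimize(exp))  # ('var', 'x')
```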
After optimizing, or not--you never have to optimize--we can move on to interpretation.
This is the fun part.
We recursively walk over the parse tree, and the meaning of a program, the final result, the picture we should display for a web page, the result of a computation in a financial program, is computed from the meanings of its subexpressions.
So if I'm in a state or environment where a maps to 5, I can compute the meaning of this abstract syntax tree expression.
Well, we have times and plus. I'll go down here and figure out what a is. a is 5, 3 is 3. I multiply them together and I get 15. Over here, 1 and 2. I add them together and I get 3.
The whole thing I get 18.
Walk down the tree on both sides, and only as I'm coming back up do I compute the values.
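That recursive walk can be sketched in a few lines; the tuple representation of the tree is an assumption for illustration:

```python
env = {'a': 5}  # the current state: a maps to 5

def eval_exp(tree):
    # the meaning of an expression is computed from the meanings of its children
    if isinstance(tree, int):
        return tree
    if isinstance(tree, str):
        return env[tree]          # look up a variable in the environment
    op, left, right = tree
    l, r = eval_exp(left), eval_exp(right)
    return l * r if op == 'times' else l + r

# (a * 3) + (1 + 2) with a = 5:  15 + 3 = 18
tree = ('plus', ('times', 'a', 3), ('plus', 1, 2))
print(eval_exp(tree))  # 18
```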
Typically, to perform interpretation we have to track state, like the values of variables which may change, in environments.
And environments are often chained together, especially as you make function calls.
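Here's a minimal sketch of chained environments; the (parent, bindings) pairing and the particular variable values are assumptions for illustration:

```python
# Each environment is (parent, bindings); the global environment has no parent.
global_env = (None, {'a': 1, 'b': 2, 'c': 3})
f_env = (global_env, {'b': 20})   # a function call introduces a new environment
g_env = (f_env, {'c': 30})        # nested calls chain environments together

def env_lookup(env, name):
    parent, bindings = env
    if name in bindings:
        return bindings[name]     # found it locally
    if parent is None:
        return None               # not found anywhere in the chain
    return env_lookup(parent, name)

print(env_lookup(g_env, 'a'), env_lookup(g_env, 'b'), env_lookup(g_env, 'c'))
# 1 20 30 : lookups walk up the chain until they find a binding
```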
Finally, we put all of that together to build our web browser.
We followed a particular architecture.
You could imagine doing another one, but this is the one we used for this class.
We got the string from a bunch of calls to document.write().
Wow! And then we're done.
That was a highlighted summary of a lot of the key concepts in the course, especially if you're taking notes at home.
And now we're going to do a bunch of review questions. Many of them should seem very easy. That's intentional.
We just want to go over the material and get it fresh in your mind, and the expectation is that you'll do harder studying on your own. Mastering the material involves personal responsibility.
You're going to have to put extra time into it at the end.
All right. First quiz: regular expressions.
I have written a Python format regular expression here at the top of the slide, and what I'd like you to do is indicate which of the strings listed here match it exactly.
Multiple multiple choice. List all that are correct.
All right, next we have a bit of a trick question--I'm importing the regular expression library and then I'm calling "re.findall" on the regular expression r"[a-z][a-z]+", and the haystack string "a quick brown".
And that's going to return a list, and in these blanks, I want you to write the elements of that list.
What are the strings inside "a quick brown" that are matched by this regular expression?
All right! This quiz is on finite state machines.
I have drawn a super cool one here at the top and then down here at the bottom, I have four candidate regular expressions.
What I'd like you to do is note each regular expression that has exactly the same language that accepts exactly the same strings as this finite state machine.
That is, I want you to work backwards from this finite state machine and find each regular expression that matches it.
This time I've written a regular expression at the top in Python syntax, and I've drawn four finite state machines.
I'd like you to check each finite state machine that accepts exactly the same language as this regular expression.
All right. Our penultimate question on regular expressions.
Which of the following sentences are true? Check all that apply.
This is the Super Hard Quiz of Doom and our last question on finite state machines and regular languages and regular expressions.
This requires quite a bit of abstract thinking so don't feel bad if this takes a number of tries.
These questions are always, sometimes, and never.
I've written out two claims, and for each one, I want you to mark this box if the claim is always true, this box if it's sometimes true and sometimes false, and this box if it's never true.
So here's the first claim.
Suppose I have any finite state machine A. You draw it out for me. Can I then make a new finite state machine that accepts strings of the following form?
For any string that's in yours, I then get another string that's in yours and I accept their concatenation when I glue them together.
This is a little hard to get a feeling for but let's imagine my finite state machine A accepted ba ba ba, that sort of thing. Then I want to make a new finite state machine that accepts ba ba, ba ba, ba ba, doublings of the first finite state machine.
Down here, I have a very similar claim, but with one small change.
If I have a finite state machine B, can I make a new finite state machine that accepts strings of this form--x concatenated with x where x was a string accepted by B?
So if you give me a finite state machine that only recognizes hello, can I make a new finite state machine that recognizes hello, hello?
I don't know why you'd say that. Anyway, give this some thought. This may take some extra time.
All right. This time I have written out a fragment of HTML.
<font color = "red"> Richard Stallman launched the <a href = "gnu.org">GNU project</a> for software freedom.</font>
What I would like you to do is check each box that corresponds to a word that will be printed on the screen when we look at it in the web browser.
So, if you think the word "Richard" will appear on the screen, check this box.
If you think the word "red" will appear on the screen, check this box. Try your hand.
I was wondering if you might tell us a bit about changes to the language over time or what's coming down the pike.
As a result, the platform grows, the language grows, and the needs of the programmer grow with it.
A big one is that as your programs start to get bigger, you have needs around organization of your programs that don't actually arise when your programs are smaller.
You may have seen as you're working with Python that Python has a module system. It has the ability to define separate pieces of code that you put into modules, and then you can mix and match those modules and put together larger programs from smaller components.
There are various idioms that people use within the language to sort of simulate a module system, but they don't get any support from the language directly.
The web has some particular needs of its own.
But it'll make it much easier for people to create reusable individual chunks of code that they can share with each other.
One of the tasks in this course was to specify tokens like numbers in quoted strings.
Here I'm giving you a slightly different kind of number--submit via the interpreter. Define a variable regexp that matches numbers with one or more leading digits (like 1, 2, 3) and an optional dot followed by zero or more digits.
Here's the dot and then here it has one digit, here it has zero digits after it.
# Crafting Token Specifications
# Define a variable regexp that matches numbers with 1 or more leading digits
# and an optional . followed by 0 or more digits.
regexp = ""

import re
tests = [("123", True), ("1.2", True), ("1.", True),
         (".5", False), (".5.6", False), ("1..2", False)]
for r, ans in tests:
    print (re.findall(regexp, r) == [r]) == ans
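If you want to check your work afterwards, here is one regexp (among many) that passes this harness; treat it as a sketch of a possible answer, not the only one. The harness is rewritten below in Python 3 print syntax:

```python
import re

# One candidate: one or more digits, an optional dot, then zero or more digits.
regexp = r"[0-9]+\.?[0-9]*"

tests = [("123", True), ("1.2", True), ("1.", True),
         (".5", False), (".5.6", False), ("1..2", False)]
for s, ans in tests:
    # each line should print True: the regexp matches s exactly iff ans says so
    print((re.findall(regexp, s) == [s]) == ans)
```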
Now you are going to play the role of a learner.
I've given you three positive examples and three negative examples, and what I'd like you to do is craft a regular expression that accepts the three positive examples but rejects the three negative examples.
Quite a bit of room for creativity here. Go try it out.
# Learning Regular Expressions
# Define a variable regexp that matches the left 3 strings but not the right 3.
#   Yes    No
#   aaa    aabbb
#   abb    aaccc
#   acc    bc
regexp = ""

import re
tests = [("aaa", True), ("abb", True), ("acc", True),
         ("aabbb", False), ("aaccc", False), ("bc", False)]
for r, ans in tests:
    print (re.findall(regexp, r) == [r]) == ans
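Again, to check your work afterwards, here is one regexp that passes this harness; it's only one of many valid answers (the harness below is in Python 3 print syntax):

```python
import re

# One candidate: an 'a' followed by a doubled letter from {a, b, c}.
# (?: ... ) is a non-capturing group, so findall returns whole matches.
regexp = r"a(?:aa|bb|cc)"

tests = [("aaa", True), ("abb", True), ("acc", True),
         ("aabbb", False), ("aaccc", False), ("bc", False)]
for s, ans in tests:
    # each line should print True
    print((re.findall(regexp, s) == [s]) == ans)
```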
Here I've written down four claims.
Or equivalently, I could use a context-free grammar to determine if a program declares every variable before using it.
All right! Now let's practice generating strings from a context-free grammar, or equivalently checking to see if the string is in a context-free grammar.
Over here, I've written five strings and I'd like you to check each one that's in the language of this context-free grammar.
Dave Herman talks about security and languages at Mozilla. One key concept is the trusted computing base, which we want to be as small as possible. Languages like C++ admit buffer overflows, a source of security exploits. The halting problem prevents us from perfectly detecting malicious code or webpages statically. Computer security is a critical area in computer science.
Let's take a break and talk a bit about computer security-- a topic we really haven't touched on much in this unit so far.
Security can sometimes be defined as computing in the presence of an adversary--someone else using a computer or a network or resources who means you harm, or hopes to exploit or take advantage of resources that you've put out there.
Then as soon as you've directed your browser to that page, it would stall, and you'd be denied service. You wouldn't be able to use your computer.
In practice, browsers often put some sort of timeout on script execution.
I'll run your script for 3 or 4 seconds, and after that, if it hasn't finished, I'd better stop. That's one of the reasons why optimization is so important, but it also has this security angle. If we make any mistakes when we're designing our web browser, then outside agents might break into our computers and gain access to sensitive documents--our tax returns, our personal email, or whatnot. We want to make sure that doesn't happen.
Let's listen to someone who knows quite a bit more about security than I do talk about how this plays out in the real world with web browsers.
The web can be a great place but it can also be a dangerous place. Nothing stops users from posting malicious code or websites or writing scripts that will try to take advantage of your computer or your browser.
There might be security holes or exploits, they're sometimes called, that allow this. I was wondering if you might talk a bit about security at the language or implementation level from the perspective of Mozilla.
[Dave Herman] Absolutely. Security is a prevalent concern at Mozilla.
When I was growing up, the idea of getting a piece of software was that you actually drove to a store and you looked at a shelf and you pulled a box off the shelf and you paid somebody and you took it home, and that piece of software was sold to you by a company that you trusted, by a third party that you trusted, so you had some sense that this piece of software that I'm about to put on my computer is something that I can believe is going to act on my behalf, not against me.
The web doesn't work that way at all.
The way web applications work is you can go to any website anywhere in the world, and somebody you've never met and never seen and whom nobody can vouch for is going to run some code that they wrote on your computer.
So that changes the game.
That means that in order to build a serious platform where programs can do important things on your behalf, we need to make sure that they can't also do important things against you.
The more people put valuable parts of their lives onto the web like their bank account, for example, the more we have important assets to defend.
One of the concepts people talk about a lot in security is the notion of the trusted computing base. When you download some code from some third party that you don't know, if we're being kind of maximally pessimistic we say, well, I don't trust this code completely. I'd like it to do something for me but I don't know for sure that it's not malicious. However, that code is running inside of a web browser like Firefox, and I do trust the code that was written by Mozilla. I do trust Firefox.
In order to be able to say as much as we can about the security of a system, we'd like for the parts of the system that we need to trust the most to be as small as possible, so we talk about shrinking the trusted computing base--the goal of having the smallest possible amount of software where, if it goes wrong, disaster can ensue, like somebody stealing your credit card information or your bank account.
All modern web browsers are implemented in C++. Now, C++ is a very powerful language. There are a lot of things that you can do with it. But it's also not a particularly safe language. It's a language where if you make one wrong move, if you make one little programming mistake, you can actually leave parts of your system open to malicious programs.
For example, if you write a program in C++ and you have an array of data and you read past the end of that array, in most programming languages you'll get a runtime error that says, "Oops. You tried to access beyond the end of the array." C++ doesn't give you that protection. What C++ does is just keep reading whatever happens to be in memory at the end of that array.
Well, whatever happens to be in memory could actually be some part of the browser that has internal information like a password. It could also be some other program running on the system, and there are a lot of people out there who work on finding ways to exploit these kinds of bugs to use them to take control of your computer or to get access to your private data.
The project of building a web browser that people can trust and building a web browser that operates on user's behalf is also one of building software that they can trust.
In order to build software that they can trust, it needs to be built on top of programming technologies that we know we can work with effectively to build software that's not unsafe.
Unfortunately, it turns out that there's a lot of good theoretical computer science that shows us that that is a provably impossible problem.
I don't know if you've discussed the halting problem.
We may have.
It turns out that if I were able to prove that any particular piece of code was not malicious I would also be able to solve the halting problem, and we already know that that's an unsolvable problem in computer science.
Looking at this maybe from a more positive side, that means that I'll always have a job.
The law of compiler writer employability.
Here's a new quiz. The Great Grammar Fix-Up.
It's a puzzle in which I've written a partial context-free grammar on the left with five holes, and over here on the right, I have five strings that I want to be in the language of the grammar.
What I'd like you to do is fill in each hole with a single word--like S, a, b, c, or d, a single lowercase letter, that sort of thing--so that this grammar accepts these five strings.
Now, the real challenge here is this partial structure that I've provided.
If I covered this up and said you could write anything you want, it'd be pretty easy, but instead, you have to use this context that I've already given you and that's the challenge.
The hint is that the naming is sometimes handy, but sometimes, it leads you astray. Be careful.
All right. So, it's another one of these puzzle-like questions.
This time, I've written out three context-free grammars, and then down here at the bottom, there are five strings that I want to be in the language of the grammar.
Check all that apply. Which of these grammars can produce these five strings?
Here once again, we're presenting familiar material in an unfamiliar way.
This time, I'm going to give you the parse tree, and I want you to tell me what the grammar would have to be.
Over here on the left, I have a parse tree for the string (int + int), and over here on the right,
I have a grammar with three holes.
I'd like you to fill in each blank with a single symbol and a word to make this grammar fit with this parse tree.
Back to quizzes involving the interpreter.
I'd like you to write me a short Python program that for all numbers x between 0 and 99 prints x cubed, that is x * x * x, but only if x itself is even and x cubed is less than 20.
For example, 2 is even, and 2 times 2 is 4 times 2 again is 8, that's less than 20, so I would definitely want to print out 8 among potentially other numbers.
And the hint is use a list comprehension. This can be done very concisely.
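If you want to compare answers afterwards, one concise sketch along the hinted lines is the following (one possible answer, not the only one):

```python
# cubes of even x in [0, 99] whose cube is under 20
print([x * x * x for x in range(100) if x % 2 == 0 and x * x * x < 20])
# [0, 8]
```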
All right. Parsing States. Over here on the left I've drawn a grammar.
S → aSb
S → ε
S → c
Oh, it's just like balanced parentheses, but with central content.
And our input is "a c b" but we've only seen the "a" and the "c" so far.
I'd like to know which parsing states are in chart 2.
Remember, chart 0 is "I haven't seen anything." Chart 1 is, "I've seen just the "a"." Chart 2 is, "I've seen the "a" and the "c", but I haven't yet seen the "b"."
Check all that apply.
Another quiz about parsing charts, although the hint for this one is to think about the closure.
I've written a grammar on the left. It's like the E → E + E, E → E - E grammar, but this time we're using multiplication and division instead, for no adequately explored reason.
We're currently looking at a state in chart 2.
It looks like this: E goes to E times--here's where we are--and then another E, coming from chart state 0.
What I'd like you to do is indicate which of these other states would have to be included in chart 2.
That is, let's say we bring in the closure from this parse state. What else are we going to bring in?
All right, this review quiz is about precedence and associativity.
Over here on the left, I've written out a grammar for lists and list selections in Python, which is kind of cool.
An expression can be a list of integers separated by commas.
You can have as many integers as you want, and let's imagine there are some other rules where int goes to 1, 2, 3, and so on, and we have tokens for that.
You can take the length of the list, or if you're given a list, you can select off all but the last element.
If you have the list [1, 2, 3], then you just get the list [1, 2].
Now I tell you as part of this problem that "len" associates to the "right" and it has low precedence, and selecting off all but the last element associates to the left and has high precedence.
Just like we may have told you before that multiplication has high precedence, and addition has lower precedence.
Then the input here is: I want to compute the length of [1, 2, 3] with the last element selected off. And then down here, I've parenthesized that input two different ways.
What I want you to do is check the box that corresponds to what our parse tree looks like.
If we were to put this associativity and precedence declaration into our parser with this grammar, which one of these worlds would we end up in?
Here I've written four claims.
Mark each one that's true.
Here's a fairly direct quiz to test your knowledge of environment lookups when we have nested, or chained, environments.
So we have a global environment that maps a to 1, b to 2, c to 3, and three other environments.
You are in the last one.
And down here I have five questions.
Fill in the blanks. Tell me what each one is.
Let's say I'd asked you to write the following procedure:
returns the index of the last instance of needle in haystack, or -1 if it's not present.
For example, if we're trying to find the last occurrence, we're going to get 6 instead of the earlier 4.
Some of you may already be familiar with this procedure. You may have seen it in the previous class, but it doesn't matter--I'm going to write it for you.
I've written findlast for you, but actually I've written a version that has a bug.
This works pretty much correctly.
What we're going to do is work our way backwards, but we're going to use two variables--one to store where we currently are and one to remember the last time we found something successfully.
Let's imagine we're looking in the haystack for a. We're going to find this a first--great--and then we're going to go over and find this a--great--and then we're going to get -1, so we want to return the position of this last a.
We're going to find the last one by repeatedly searching forward until we get -1 and then conceptually we'll back up one.
Remember we get -1 because the find procedure in Python returns -1 if something isn't found.
Forever, I'm going to keep finding the needle in the haystack--find this a--great!
If that didn't work, I return; otherwise, I look again until it doesn't work.
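The on-screen code isn't reproduced in this transcript, but it has roughly the following shape. This is a reconstruction with the fix already applied; the lecture's buggy version returns pos at the end instead of last_pos, and the variable names are assumptions:

```python
def findlast(haystack, needle):
    pos = -1
    last_pos = -1
    while True:
        # repeatedly search forward from just past the previous hit
        pos = haystack.find(needle, pos + 1)
        if pos == -1:
            return last_pos   # the buggy version returns pos (always -1 here)
        last_pos = pos        # remember the most recent successful find

print(findlast("haystack", "a"))  # 5: the position of the last 'a'
print(findlast("haystack", "z"))  # -1: not present
```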
Down here at the bottom, I have four possible inputs,
findlast(haystack, "a"), findlast(haystack, "z"), findlast(haystack, ""), and findlast("", needle).
Only some of these will actually show off the bug.
The bug is that I am returning pos instead of last_pos.
What I'd like you to do is check each box corresponding to an input that shows the bug.
Remember what it means to show the bug--on the buggy code you get one answer but on the correct code and in our minds, you get another.
To spice up the quiz, we introduce the function reduce (also known as fold).
Now let's get a little more practice with debugging.
Suppose someone asks you to write a sudoku checker.
Sudoku is a common pencil and paper game, also often seen on the computer, in which you fill in grids with numbers according to certain rules.
If you haven't seen Sudoku before, it doesn't matter. We're not going to do full Sudoku. We're going to do a very simple subset of it.
People would normally write out these almost Tic-Tac-Toe-like boards.
We will represent that as a list of lists.
Our first row is the list [1, 2, 3], our next row is the list [4, 5, 6], and our last row is [7, 8, 9].
What I want to do is just check that horizontally all the numbers are different.
This is good, because 1 is different from 2, which is different from 3. This is good: 4, 5, and 6 are all not equal. This is good: 7, 8, and 9 are all not equal.
But this makes me super sad, because I've reused 1.
This is our horizontal Sudoku checker. I'm going to write one with a bug, and you're going to help me fix it.
Here I've written out our horizontal Sudoku checker, but I'm going to do it probably a little differently than you might have in the past.
I'm going to use a more complicated algorithm to make it a little harder to debug to give you a feel for how this might go in the real world.
I've written a helper procedure called count. Given a number and a row (a row is just a list), it returns the number of times that number appears--whoa, that's ambiguous!
If, for example, you're asking for 5, it occurs once in [1, 2, 3, 4, 5], but its count is two in [1, 2, 3, 4, 5, 5].
Reduce and map and list comprehension are aspects of functional programming used by companies like Google to make their search engines faster and more parallelizable.
What this does is starting with the number zero, I'm going to go over every element in the row, and I'm going to call the current element this, and I'm going to call what I have so far the accumulator.
If the current element is equal to the number that I want, I'll return 1+ the accumulator.
Otherwise, I'll just return the accumulator.
Let me walk you through it on checking the count of 5 in 1, 2, 3, 4, 5. The number that we want is 5.
The current element starts out at 1, so I'm going to leave the accumulator alone.
Next, the current element is 2. That's not 5, so I leave the current accumulator alone.
Next, the current element is 3. That's still not 5, so I leave the accumulator alone.
Next, the current element is 4. That's still not 5, so I leave the accumulator alone.
Then, the current element is 5--whoa! Happy day in jubilation.
I turn the accumulator into 1 + the accumulator, so just 1.
Then in this particular list there is actually another 5 right after that.
This is 5 again, so I'll have 1 + 1, which becomes 2.
I'm walking down this list. How many 5s have I seen? 2, great.
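That walkthrough corresponds to the following sketch; note that in Python 3, reduce has to be imported from functools (the lecture's Python 2 had it as a built-in):

```python
from functools import reduce  # Python 3: reduce lives in functools

def count(number, row):
    # start the accumulator at 0; for each element, add one if it equals number
    return reduce(lambda acc, this: acc + 1 if this == number else acc, row, 0)

print(count(5, [1, 2, 3, 4, 5]))     # 1
print(count(5, [1, 2, 3, 4, 5, 5]))  # 2
```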
You could have written this with a simple for loop without using reduce, but often when you have to debug a program it's after you've been away for a while, and it looks a bit unfamiliar.
I'm intentionally giving you code that's not your usual style.
I assert that there are no bugs in count. Count works correctly.
However, there is a bug in my horizontal checker. Here's what we're going to do.
We're going to figure out the number of rows in the board and store that in the variable size.
For every row in 0 to size -1, I'm going to use functional programming again.
Here's what I'm going to do to figure out if a number is unique or not.
I'm going to use map to convert every element of the row into its count in that row.
Let me show you how that would play out.
Suppose our row is [5, 6, 6, 7].
After the map it's going to be [1, 2, 2, 1], because I've mapped every element of this list to its count in that same list.
We've used map before to map strings to their lengths or map x to x^2, but you can use map to do more complicated calculations just like I'm doing here.
I'm going to convert every one of these numbers--5, 6, and 7--to its count and then check and see if the count is less than or equal to 1.
This will become True, False, False, True.
What I really want is for every element in this list to be True.
That would correspond to each element having a count of at most 1.
There is a Python procedure called all that given a list returns true if every element of that list is true.
I'm using that to check that every element in my row had a count of 1 or less.
I'm doing it all in one or two lines.
This is pretty slick, but it's also pretty complicated.
It's very concise code, but it might be harder to read, especially if you're not familiar with all or map or making functions in Python. We've had less experience with them in this class. This problem is intentionally difficult.
So what I want to do is go over every row in the board and check to make sure that for every number in that row, its count is less than or equal to 1, and then that that's true for all the numbers on the board.
If you were writing horizontal checker yourself, you might use a number of nested for loops, maybe even three nested for loops.
I'm not writing as many for loops, because things like map and all and count do them for me.
Anyway, if it's not the case that everything is true, then we return False.
If I make it all the way through here, then we return True. I've even made some test cases.
Down here I've got a good board and a bad board.
Over here--1, 1, 1, 1, 1, 1--we've got a lot of repetition there. That fails our checker.
Unfortunately, my code has a bug. Here's a board that shows it off.
I've already made the test case for you.
[1, 2, 3]--that's looking good. Horizontally everything is unique.
[4, 4, 4]--that's looking bad. This is not unique. We should be returning false.
Then [7, 8, 9]--that's okay.
We return true. We think that this board checks out, but it doesn't.
What I would like you to do is define and submit via the interpreter your own version of horiz_checker that's very, very similar to mine, but just changes one or two words to fix it.
Fix the bug, submit a new version of horiz_checker.
It should still use all and map and lambda and all that good stuff.
# Fix the Bug
def count(number, row):
    return reduce(lambda acc, this: (1 + acc) if this == number else acc, row, 0)

def horiz_checker(board):
    size = len(board)
    for row in range(size):
        if not (all(map(lambda x: count(x, board[row]) <= 1, board))):
            return False
    return True

board_good = [[1,2,3], [4,5,6], [7,8,9]]
board_bad = [[1,1,1], [4,5,6], [7,8,9]]
board_bug = [[1,2,3], [4,4,4], [7,8,9]]

print horiz_checker(board_good)
print horiz_checker(board_bad)
print horiz_checker(board_bug)
All right. For our last big review question, let's delve into optimization--making programs faster.
Remember that we can only apply an optimization-- that is, we can only replace one subtree of a program with another-- if we don't change the semantics, the meaning, the answer of the program.
Here's another way to look at the challenge of optimization.
What we really want to know for optimization is: given two functions F and G, do they both compute the same answer for all inputs?
This may seem a little abstract, but let me firm it up with a concrete example.
It might be that F is the original program, and G is the candidate optimized program.
If they compute the same answer on all inputs a, then we can replace F with G wherever it occurs.
We can imagine framing this as some hypothetical procedure--optimization_ok.
You just pass it in F and G and if F(a) = G(a) for all a, then you return True.
Otherwise you return False.
Now, this is clearly sketchy code. We sometimes call this pseudocode, because it wouldn't actually work in Python, but you can imagine what would go there.
Somehow we're going to try this out--maybe we'll look at their source code.
We apply these rules that we did previously.
Maybe we test it out on a bunch of inputs.
Somehow we come to the conclusion that F(a) is always equal to G(a).
If so, we return True, and the optimization is okay.
Otherwise, the optimization is not safe, so we don't apply it.
Just to give one more concrete example,
we expect optimization_ok of F and G to return True, but optimization_ok of X and Y down here--where X is a/a and Y is 1--should return False, because, remember, if a equals 0 then X raises a division-by-zero exception while Y is just 1, and those aren't the same thing.
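The a/a example can be tried out with a testing-based sketch. To be clear, this is my own illustration and only an approximation--testing on sample inputs can show that two functions differ, but it can never prove they agree on all inputs:

```python
def optimization_ok_approx(f, g, sample_inputs):
    """Approximation of optimization_ok: compare f and g on sample inputs,
    treating a raised exception as part of the 'answer'."""
    for a in sample_inputs:
        try:
            fa = ("value", f(a))
        except Exception as e:
            fa = ("error", type(e))
        try:
            ga = ("value", g(a))
        except Exception as e:
            ga = ("error", type(e))
        if fa != ga:
            return False
    return True   # "probably ok" on these samples -- not a proof!

f = lambda a: a / a   # original expression
g = lambda a: 1       # candidate "optimization"
```

Testing with 0 among the samples exposes the division-by-zero difference; testing without it misses the bug, which is exactly why testing alone can't make this decision safely.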
All right. So here is the actual quiz.
I've written four statements, four claims, that refer to that optimization_ok phrasing or procedure from before.
It's kind of like a specification.
Here they are.
They involve phrases like "for all optimizations in all cases" and "for all possible programs"--and one of them is optimization_ok of Betty Friedan and George F. Will. Check all that are true.
We might replace subtrees like x * 1 with just x if we believed that that preserved semantics. I was wondering if you could just give me a hint, without telling me any trade secrets, about higher-level optimizations that are important.
What's the next step after replacing subtrees?
We have a very adaptive sort of online type-inference algorithm. You can study the code and try to infer types, and if some code gets loaded or invalidated, it can change its mind--or it can conserve what is good and throw away what was invalidated.
That's very important for performance.
Thank you for taking the time to tell us about it.
Sure. My pleasure.
'Congratulations on completing a difficult course on programming languages!'
It is time to declare moral victory.
While there is much more to learn in computer science and the world, this about wraps it up for this class. You should celebrate.
Here you can see happy people and clearly they're next to--let's call that a cake.
At least one thing we can conclude is that while you've learned quite a bit about computer science, my sketching skills have not improved at all.
This course was not easy. This course was not easy, and I fully acknowledge that.
In fact, some of the material that it covers is similar to material covered in 400-level classes at universities like Cornell or Berkeley or the University of Virginia.
There's a lot of deep stuff going on in here.
You should really congratulate yourself on making it through alive.
A legitimate question is what comes next?
Well, there's likely to be a final for this course and perhaps some wrap-up surveys, but if you'll indulge me for just a moment here at the end of the class, let me make some recommendations for what you might do with your copious spare time.
I personally recommend that you study--if you have the time--
Free will--do we have it? So unclear. Logic, the philosophy of science, and what it is like to be a bat.
Wait, what was I saying earlier about statistics? Ah, never mind.
Let me just conclude by saying that it has been an honest pleasure leading you during your discovery of this aspect of computer science.
It's rare that I can do something like teaching this class that will leave such an impact on the world.
Thank you for giving me this opportunity.
We see that this regular expression starts with an unmodified "a".
So we can rule out immediately any of these strings at the bottom that start with a "b".
They're not going to match it. The first letter has to be an "a".
So we have an "a," and then remember that (?: is Python syntax loosely just meaning an open parenthesis--a non-capturing group, closed by the usual close parenthesis.
So we've got an "a" followed by zero or more copies of "a?b*".
Well, I did say this was a bit of a trick.
If you look carefully at this regular expression, it requires words that are at least two letters long: one a-through-z followed by one or more a-through-z--at least two.
So, we actually don't match a. We match quick and brown and then there's nothing in here.
We just return a two-element list.
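Here's that example rendered in Python (my own code; the pattern is one a-through-z followed by one or more a-through-z):

```python
import re

# The lone "a" is only one letter, so it's too short to match.
words = re.findall(r"[a-z][a-z]+", "a quick brown")
print(words)   # ['quick', 'brown']
```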
We have questions like this, because I want you to be confident in your own answer and not necessarily trust the structure of the problem that's given to you.
This particular phrase a quick brown comes from the sentence, "A quick brown fox jumps over the lazy dog," used commonly because it typically involves all of the letters in the English alphabet giving you a chance to see what each one looks like if you're testing out a font or what not.
Well, for a finite state machine and a regular expression to have the same language, they have to have exactly the same set of strings that they accept.
Let's go try some of those out.
After that point we have as many c's as we want, but possibly zero.
That's c*, so yeah. This matches number 3.
None of the choices match the regular expression exactly. We'd want balanced nesting, except using a's and b's instead of open and close parentheses. There is no such regular expression. We need a context-free grammar.
Well, let's try to reason these out since they're extra complicated.
Suppose you give me your finite state machine A.
Here I've drawn all these dots to mean it could be bigger in the middle.
I don't get to know what it is. This is going to have to work for any finite state machine.
But I do know that it has a start state and an end state--a final state.
Here's what I'm going to do. I'm going to draw another copy of A on my paper.
Then I'm going to go back to the first one and where it was a final state,
I'm going to make it not a final state, so I change this part.
Then I'm going to add an epsilon transition here, and I'm going to erase this other start state arrow.
Now I have a new finite state machine that accepts strings of the following form-- there has to be some part x here that would've been accepted by A.
Then there's some possibly different part y here that would be accepted by A again, so in general I accept x epsilon y or just xy.
In fact, I can always do this.
If you give me any finite state machine, I can always make a twice as big finite state machine that accepts one string from you followed by one independent, separate string from you.
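Here is the copy-and-epsilon-link trick made concrete. The NFA representation below--(states, edges, start, finals), with None as the epsilon label--is my own sketch, not course code:

```python
def eps_close(edges, S):
    """Follow epsilon (None-labeled) edges until no new states appear."""
    S = set(S)
    changed = True
    while changed:
        changed = False
        for (a, c, b) in edges:
            if c is None and a in S and b not in S:
                S.add(b)
                changed = True
    return S

def accepts(nfa, string):
    """Simulate the NFA on a string, tracking the set of live states."""
    states, edges, start, finals = nfa
    current = eps_close(edges, {start})
    for ch in string:
        step = {b for (a, c, b) in edges if c == ch and a in current}
        current = eps_close(edges, step)
    return any(f in current for f in finals)

def concat_with_self(nfa):
    """Accepts xy where the machine accepts x and, independently, y."""
    states, edges, start, finals = nfa
    off = max(states) + 1                             # rename the second copy
    edges2 = [(a + off, c, b + off) for (a, c, b) in edges]
    eps = [(f, None, start + off) for f in finals]    # old finals -> copy's start
    return (states + [s + off for s in states],
            edges + edges2 + eps,
            start,
            [f + off for f in finals])

# The machine whose language is just {"b"}:
b_machine = ([0, 1], [(0, "b", 1)], 0, [1])
doubled = concat_with_self(b_machine)
```

The doubled machine accepts "bb" but no longer accepts "b"--exactly the construction drawn on the paper.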
However, for the second one it's going to be a little more complicated.
It turns out that the answer is only sometimes.
Let me give you an example for yes and an example for no.
Let's say the finite state machine you give me for big B just has one string in it--lowercase b. I'll do the same trickery as before, and now I have a new finite state machine that accepts bb.
Since there is only one string in the language, I've now doubled it. This works fine.
I can do it at least once, but now here comes the evil part.
This upper example--this blue example--of B totally worked.
Now I'm going give you an example for B that does not.
Here, B is a*x.
This finite state machine accepts the regular expression a*x.
Any number of a's, possibly zero, but then you need an x.
Now, note the difference between this sentence and its earlier copy.
This one requires exactly the same string, doubled.
Let's imagine that I do complete this finite state machine construction-- the same sort of thing I've done before.
Now I'm going to write out some of the strings that would be in a*x twice.
Well, I could have no a's, so then I have no a's followed by an x.
Or I could have one a--that's looking okay so far.
Or I could have two a's, or I could have three a's.
This pattern should be looking ominous.
Or in general, if I were able to double this machine so that it accepted the same string both times rather than a new string each time, I would be recognizing (a^n)x(a^n)x.
For the same reason that a^n b^n can't be recognized--it's not regular--neither can this.
Here we have one positive example and one negative example, so it only holds sometimes.
This was super tricky. Do not feel bad if you didn't see this the first time.
Theoreticians would say that this quiz is about closure properties of regular languages.
Everything in the tag doesn't actually appear on the screen-- no font, no color, no red, yes Richard, yes Stallman launched the, none of this, none of this, none of this.
You might see this link somewhere else in your browser, but it's not rendered as part of the webpage. GNU yes, project nope, yeah, yeah, yeah, no, no, no.
Nothing inside the tags gets printed on the screen.
I need one or more leading digits, and then there's a sort of a compound optional part, so I'll use parenthesis.
We may have a dot and to really mean the dot, I'm going to use the backslash to escape the dot because remember it has a special meaning in regular expressions, any character.
And then I can have zero or more digits afterwards.
And then that whole grouping is optional.
regexp = r"[0-9]+(?:\.[0-9]*)?"
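Trying this regular expression out (the test string is my own, and I'm using Python 3's print function rather than the course's Python 2 print statement):

```python
import re

regexp = r"[0-9]+(?:\.[0-9]*)?"
# "2." is allowed: the dot may be followed by zero digits.
matches = re.findall(regexp, "pi is 3.14, e is 2., and x is 42")
print(matches)   # ['3.14', '2.', '42']
```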
Well, there are a number of ways to go about this-- actually, an infinite number of ways to go about this.
An entirely legitimate strategy but one that might feel almost like cheating would be just to put these together.
You could submit the Python code that corresponds to either this or this or this.
Now you're guaranteed that it accepts the three strings on the left and nothing else.
Now, you'd have to be careful about how you group things or whatnot, so that the or and the parentheses would bind correctly, but that's one way to do it.
Let's go see if we can make a more natural feeling regular expression that actually does it though.
It looks like we definitely have to start with an a.
Now I have my choice of what I want to say the difference is between these other strings-- abb and here we've got aabbb.
In some sense one of the key differences is the number of b's.
Here's one regular expression that captures all the ones on the left and none of the ones on the right.
It's an a*--so that gets this first one, and then after you're done with the a's, you may optionally have either bb or cc. Great.
That gets all the Yes but none of the No.
This one is ruled out because it's not bb or cc with a's in front of it.
But there are a large number of ways you could've done this.
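We can sanity-check the a*, optionally-bb-or-cc answer in Python. The sample strings here are my own reconstruction of the quiz, not the originals:

```python
import re

pattern = r"a*(?:bb|cc)?"   # some a's, then optionally bb or cc
results = [bool(re.fullmatch(pattern, s))
           for s in ["a", "aabb", "acc", "abc"]]
print(results)   # [True, True, True, False]
```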
In fact, there is an area of study called learning theory that tries to make machines find patterns the same way that we can with our brain. There's a notion that you've really learned something when you've abstracted it to a small rule.
The reason we're less happy with this answer up here is because it's over-fitting the data in some sense. It's just recording the yes instances. It's not really generalizing the pattern.
We say that more learning is happening when we generalize a smaller pattern than just copying all of the available input, but that was by no means required for this problem.
No. This one produces an infinite number of utterances.
S → (S)
I can keep looping around rule one as many times as I want.
And in fact, this is one of the glories of context-free grammars--that they can often accept an infinite number of strings, thus giving you room for creativity based on just a finite grammar.
Yes. It looks like this. Just like the balanced parentheses one but with "ab" instead.
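The grammar S → aSb | ε can be turned directly into a recursive recognizer, one clause per rule. This sketch is mine, not course code:

```python
def match_S(s):
    # Rule: S -> epsilon
    if s == "":
        return True
    # Rule: S -> a S b -- peel one "a" off the front and one "b" off the end
    return len(s) >= 2 and s[0] == "a" and s[-1] == "b" and match_S(s[1:-1])
```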
Here, more powerful or more expressive means that there are some languages--remember, a language is a set of strings--that context-free grammars can recognize that regular expressions cannot.
This is one of them: a^n b^n can't be captured by regular expressions, but it can be done by context-free grammars. Therefore, they are more powerful.
That's a context-sensitive property. We won't be able to check it with context-free grammars.
We could only notice that later when we are doing interpreting or in a statically typed language if we were type-checking things to make sure we are using integers and strings correctly.
We might also check to see which variables were in scope. But in general, it's very hard to tell if a variable has already been used. That depends on the previous program. That previous program is the context. Context-free grammars don't get any context.
S → eE → ei
so this string ei is definitely in the language of our grammar.
S → eE → eE + E → ei + i
I could also have picked rule 2 followed by rule 3, and then eventually this will boil down to ei plus i. This string ei+i is also in the language of the grammar.
and that requires that I bring in this e, and there's no e in this string so that's not looking good.
I can get that by looping around the first rule twice and then ei. Yeah, this string is in the language of the grammar.
We'll note that these a's and b's follows the same balanced parentheses pattern we've been doing a few times.
The real question is whether ie+ can be generated from S. We can get the e and we can get the i, but if we use the plus rule, it has to have an E on both sides, so we really need another i in here. We don't have it, so this string is not in the language of the grammar.
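If I've reconstructed this grammar correctly from the derivations above (S → eE, E → E + E | i), its whole language is e followed by i, +i, +i, and so on--which we can render as a regular expression and check:

```python
import re

def in_grammar_language(s):
    # e, then i, then any number of "+i" -- my rendering of the grammar
    return re.fullmatch(r"ei(?:\+i)*", s) is not None
```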
Alright. We notice that all of these strings start with a lowercase a, and our start symbol is capital A.
Why don't we have capital A goes to lowercase a just to get rid of this lowercase a at the beginning. So, we don't have to worry as much about it.
Now, I noticed that there can be some number of b's: bb, bb, one b, so one or more b's here, and then sometimes, we've got c, and sometimes, we've got c d at the end.
So, I'm going to make B C D be a list of one or more b's, and one way to do that is like this:
B C D goes to b, B C D. So here in the middle, after the a, we could either have no b's or we could loop as many times as we want and pick up more b's, and now after that, it looks like we either need c, c d, or nothing.
Well, we have c. We got nothing, so we could also have c d.
Note that there are multiple solutions to this problem.
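One way to render the language being described--an a, then one or more b's, then optionally c or cd--is as a regular expression and a quick membership check. This reconstruction is mine; other grammars and renderings work too:

```python
import re

def in_lang(s):
    # a, then one or more b's, then optionally c or cd
    return re.fullmatch(r"ab+(?:cd?)?", s) is not None
```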
Just to shake things up, let's get started on the right.
Here our start symbol is S and it goes to PQR and there actually aren't that many choices.
If we think about it, there are two strings in this grammar--PQR and PQ nothing.
So this doesn't look like a good bet because it doesn't produce PQQ, or P, or PR.
This is not a match. All right. What do we have over here?
Well, every string is going to have to start with lowercase p and then Q,
looks like I can have 0 or more Qs and then eventually we end with an R.
This looks pretty good because it matches the first string and the string down here, but it doesn't get PQQ so I don't like this one.
How about this grammar in the middle that uses A, B, and C instead of Senatus Populusque Romanus--in the name of the senate and the people of Rome?
Well, we have two choices for our start symbol--that definitely complicates things--but both of them start with a P. Well, that's convenient, since all five of our strings start with a P.
Let's just go through the strings one at a time. Can I get P alone?
Yes. A goes to PB. B goes to nothing--so I can get that one. How about PQQ?
A goes to PB. B goes to Q and then Q again and then nothing. Yeah, I can get that one.
How about PR? A goes to PBC. B goes to nothing. C goes to R.
This is looking super promising. PQR--we use the first rule-PBC.
We'll gather up one Q and then we'll have an R. Great! PQQR. We use the first rule--PBC.
We'll go around this loop twice to get two Qs and then end with an R.
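As reconstructed from these derivations, the middle grammar (A → pB | pBC, B → qB | ε, C → r) generates exactly p, then any number of q's, then an optional r--so we can check all five strings at once:

```python
import re

def in_middle_grammar(s):
    # p, then zero or more q's, then an optional r
    return re.fullmatch(r"pq*r?", s) is not None

print(all(in_middle_grammar(s) for s in ["p", "pqq", "pr", "pqr", "pqqr"]))
# True
```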
Now, let's just take a look at each of the steps one-by-one.
This first step "E" can be rewritten by (E) corresponds to rule two.
We must have used rule two to do that. Then here, I just changed the "E" to a "T"--that's rule one.
But then down here, I somehow changed this "T" to an "E + E".
We only have two rules to play with, and only one of them starts with a "T".
So, I must have used rule four to do this, and this tells me exactly what it has to be: "T" goes to "E + E", so it must've been "E + E". And down here, the only thing I've changed is "E" goes to "int".
That's conveniently one of my grammar rules--it must have been rule three, "E" goes to "int".
Remember, every step in the parse tree corresponds to taking some part of the current input and using one of the rewrite rules.
Here is a candidate answer.
print [x*x*x for x in range(100) if (x % 2 == 0) and (x*x*x) < 20]
# output: [0, 8]
Remember list comprehensions start and end with the square brackets that mean I'm making a list.
I want to print out x cubed for every x in 0 to 99, but we can get that with range 100.
But I only want to print it out if x is even, and to do that, I will compute the remainder when x is divided by 2; if that remainder is 0, then x was even. And I also want x cubed to be less than 20.
In fact, there are only two of these 0 cubed and 2 cubed.
As soon as we get up even to 4, 4 cubed is 64, which is already much higher than 20.
A very short list comprehended down from a very large one.
This is a state that will occur in our chart, but it occurs in chart state 0.
Now, these last two really test your memory of how we compute the parsing states, the charts, and how we keep track of the "from" information.
Remember, when one is computing the closure, you look for the red dot right in front of a nonterminal, like E.
Oh! We found one E → E*.E this is super convenient.
Then you go back to the grammar and look for every rule that starts with "E", and you put a red dot right at the beginning of it--so we're expecting "E" goes to dot this, "E" goes to dot that, and "E" goes to dot int--and you mark them as coming from the current state, state 2.
Higher precedence means it gets done first.
It's on the inside of the parentheses, so I really need this part with the colon, -1--that has to be inside all the parentheses, and not len.
And then associates to the left means that we typically want our parentheses to look like this-- they're grouped off to the left as much as possible.
So here we have 1, 2, 3--we remove one element, and then we remove one element from the rest of it, and then we compute the length.
Here we are in the Python interpreter, and I'm just going to show you that this subrange operator actually does associate to the left in Python.
Here I have more or less two versions of the same expression, where we take 1, 2, 3--chop off the last element--then chop off the last element again.
So our list is going to go down from three elements to two elements to one.
Here I have the original expression, and here I have the original expression with added parentheses, and we look down and see that they both get the same answer.
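In a Python 3 interpreter, that left association looks like this:

```python
lst = [1, 2, 3]
print(lst[:-1][:-1])       # chop the last element twice: [1]
print((lst[:-1])[:-1])     # explicit left grouping -- same answer
print(len(lst[:-1][:-1]))  # 1
```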
If I write a program--while True do x becomes x + 1--it never returns with a value, and it never raises an error.
This is also known as The Halting Problem.
Well, we're going to follow the same recursive procedure to get every answer, check and see if we have it, and if not, call our parents.
So, a--I don't know what it is. How about you? I don't know what it is. Oh, it's 99.
B--I don't know what it is. Do you? Oh, it's 88.
C--I don't know. Don't know. Don't know. Looks like it's 3.
D--looks like it's 77.
And this last one requires you to remember a bit about the corner case in our environment lookup.
I don't know what it is. Do you? No, do you?
No, do you? No, do you?
And my parent pointer is None, so we'll just return None. We don't have a value for this variable. We could also have raised some sort of exception.
Remember that we return None for variables without defined values.
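Here is a sketch of the lookup chain being described, with environments as (parent, bindings) pairs and the values from this quiz. The representation mirrors the course's interpreter style, but the exact code here is mine:

```python
def env_lookup(env, variable):
    if env is None:
        return None                          # ran off the top: no binding anywhere
    parent, bindings = env
    if variable in bindings:
        return bindings[variable]
    return env_lookup(parent, variable)      # "call our parent"

global_env = (None, {"a": 99, "b": 88, "c": 3})
child_env = (global_env, {"d": 77})
```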
Let's go run it and see.
On the buggy code we get -1, -1, -1, -1.
But if I fix the bug, then I get 5, -1, 8, -1.
That means these first two--
"haystack", "a" and "haystack", "empty"
show off the bug, and the last two do not.
So the correct answers were the first one and this other one on the left, but not the ones on the right.
You can also reason through the program qualitatively to get a handle on this.
We're only going to see this difference if last_pos is not equal to this_pos.
Remember, this was the bug, and this version is correct.
So in order to see the bug, this_pos will have to be different from last_pos, which means we will have to have found the string at least once.
We find the "a"--we find the other "a"--so this is a good example, and then in fact you have to know a bit about Python to know what it means to try to find the empty string in something.
But Python always says the empty string is present-- present at offset zero--it's going to be present at offset 1--at offset 2-- just like epsilon, it fits anywhere.
So there's more than one "a" in the string.
There's more than one empty string in "haystack".
So these are both good test inputs to show off this bug.
Since there's no "z" in "haystack", last_pos and this_pos are both -1, so we can't tell the difference.
And since there's no "needle" in the empty string, last_pos and this_pos are both -1, so we can't tell the difference.
Testing is not easy.
Reminder: In Python, string.find("") always returns 0. (The empty string is so small that it is present everywhere!)
Let's get started on this by looking at the test cases instead of the code.
Our buggy program worked fine when row 0 had the duplicates, but it didn't work well when row 1 had the duplicates.
So, probably we're not handling the row correctly.
Let's go up and see if that intuition pans out. In fact, it does.
We're looping over all of the rows in range size.
But if you look down in this code, although we introduced the variable row, we never really use it much. We just use it over here--once--and that's not using it all that much.
Remember, I want to check to see that all of the numbers in this row are unique within that row.
Conceptually, that's two references to row and I only see one over here.
The bug in this code is that we're mapping over the whole board instead of the current row. We should be checking the current row against itself, and that single-word fix causes us to get the answers we want.
# Fix the Bug
def count(number, row):
    return reduce(lambda acc, this: (1 + acc) if this == number else acc, row, 0)

def horiz_checker(board):
    size = len(board)
    for row in range(size):
        if not (all(map(lambda x: count(x, board[row]) <= 1, board[row]))):
            return False
    return True

board_good = [[1,2,3], [4,5,6], [7,8,9]]
board_bad = [[1,1,1], [4,5,6], [7,8,9]]
board_bug = [[1,2,3], [4,4,4], [7,8,9]]

print horiz_checker(board_good)
print horiz_checker(board_bad)
print horiz_checker(board_bug)
This was potentially a difficult quiz; it requires you to think outside the box of what I normally ask.
We can implement optimization_ok so that it returns a safe answer for optimization in all cases.
Remember how are we using optimization_ok?
We consider a bunch of optimizations, and if this says True, then we swap out parts of the program.
A safe answer is just to return False all the time--never do any optimization.
That'll make for slower programs, but it's safe because we'll always get the same answer.
We're never changing the meaning of the program.
Optimization is sometimes described as conservative because if you're not absolutely certain that an optimization is okay, just be conservative and don't do it, and you'll be fine. The program will be a little slower, but it'll always get the right answer.
We cannot implement an optimization_ok that works precisely in all cases.
It is undecidable like the Halting Problem.
This sadly is true.
I'm going to sketch an answer for this.
A formal proof that something is impossible, like the Halting Problem, is a little beyond the scope of most of this class.
But let me show you how it would go.
Suppose we could write optimization_ok in a way that was precisely correct in all cases.
I, the Great Wesini, shall solve the Halting Problem by telling you if an arbitrary program halts or not.
You want to know if a program P halts or not. I'm going to tell you how to do it.
Remember, we know this is impossible.
Here's how we do it.
Suppose you claim we've implemented optimization_ok.
It always gets exactly the right answer. I'm going to use it to build a Halting Problem solver.
You give me a program P and you want to know if it halts? I just make up a little program over here called "loops forever" or "loops."
This procedure, loops, sets x to zero and then, while True, increments x.
We've seen before that this program never terminates, never returns a value.
I can just check to see if your program ever halts by checking to see if it would be okay to replace it with loops forever.
If your program gets the same answer on all inputs as loops forever, then your program loops forever on all inputs. Otherwise, your program halts.
If optimization_ok could be written, then this Halting Problem decider could be written. But this cannot be.
We've seen before that there is no way to solve the Halting Problem.
It's equivalent to figuring out if "this sentence is false" is true or false. It's a contradiction in the real world.
So halts is impossible, but I could totally make halts if I had optimization_ok.
That means optimization_ok must be impossible--impossible to solve every time precisely.
We can solve it some of the time. We just can't solve it all of the time. Because if we could solve it all of the time, it'd be just like the Halting Problem.
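The Great Wesini's reduction can be written out as pseudocode. It cannot actually run--precisely because an exact optimization_ok cannot exist--but it shows the shape of the argument:

```python
# PSEUDOCODE -- assumes the impossible: an exact optimization_ok
def halts_on_some_input(p):
    def loops_forever(x):
        i = 0
        while True:      # we've seen this program never terminates
            i = i + 1
    # If p computes the same answer as loops_forever on every input,
    # then p never halts; otherwise, p halts on at least one input.
    return not optimization_ok(p, loops_forever)
```

Since no program can decide halting, no program can implement the exact optimization_ok either.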
Then finally, down here at the end: optimization_ok of A and B implies optimization_ok of B and A.
This is true because we're just checking equality.
If A == B, then B == A.
Now, the two I've picked--George F. Will, a Pulitzer Prize-winning conservative commentator, and Betty Friedan, who wrote The Feminine Mystique and led second-wave feminism in the United States--are unlikely to have exactly the same things to say.
These two are unlikely to be equal, but if they were then you could reverse it, and they would match up. You never know. Check them both out.