
CS253 Lesson 5


01 l Introduction

Okay, so welcome to Lesson 5. In this lecture we're going to be talking about how to make your web service--your web application--talk to other computers. So up until now we've had your programs basically outputting HTML that a browser will interpret for a user to see--to make something pretty. But your web application can also return things like XML and JSON that other computers can interpret so they can build websites or services that interact with your data. So we're going to be talking about those formats--XML and JSON--how to read them, how to interpret them and manipulate them, and that sort of thing. And then we're going to be adding a feature to ASCII Chan that is built on Google Maps, so we can see where our posts are coming from. And then, when we're all done, your homework will be basically making your blog output itself in JSON. So it should be fun. It's a little bit of a change of pace from the last two lectures. Let's jump right in.

02 l HTTP Clients

Until now, we've been working with this kind of standard picture of this little guy and his browser. Now, this used to be you, but now this is the user. You have upgraded to the other side of the picture where the servers are. So let's get these boxes. We've seen these 100 times. Okay, and let's add you over here. You are now the programmer. Congratulations, you are a web developer. We can talk about users and how much trouble they cause us. Now, in a normal web request, the user makes a request to the servers, and we respond with the response. No surprises there. Now, what we're going to be talking about today is when your servers start making requests to other servers. So this is our website. It runs on these boxes. Let's say we're going to hit somebody else--Twitter, for example. They have their own servers. Their servers are probably on fire, because it's Twitter. So we can have a web page that actually makes requests to Twitter. These are our computers talking to their computers. This happens all the time. They're still communicating over HTTP, and Twitter still responds as usual, but if we're writing some web program that, for example, does some data analysis on Twitter, the user might make a request to us. We might make a request to Twitter servers, they respond with what their response would be normally, and then we may manipulate that data and return it to the user. And this is actually a really common case. What I'd like to do now is actually explain how Hipmunk works a little bit, because we do a lot of this type of communication. Okay, let's change our picture a little bit to be a little bit more about Hipmunk, because I'd like to explain how our architecture works. So in this case, this is still-- we call users customers when they're actually paying-- and this is me--Steve--and this is Hipmunk servers. When a user does a flight search, what we do is we hit a bunch of our data providers where we actually get our flight data from. 
So the first thing we'll do is we'll take your flight search and we'll send it to ITA, we'll send it to an airline, and in some cases we'll even send it to Amtrak, if that's appropriate. Each of these guys are their own-- these are companies who have their own services that we work with. So ITA will run our flight search, and they will send us data back. The Airline, too, will run their own flight search on their own system and they'll send it back to us. And Amtrak will do its thing and send their data back to us. So then on our server, we have all this flight search data, represented by this blob here. We will manipulate all this data, collate it, make you nice results, and then we'll send back our HTML response. So what we're going to be working on in this lesson is how do we make our server speak to other servers when there's no browser involved. We're still using HTTP, but we are now communicating over other protocols. We saw some of this in Lesson 1, but we're going to be doing a lot of it in this lesson, because there's a lot of cool things you can do when you realize that you're not the only service on the internet.

03 l urllib

Okay, so what I'd like to do now is I'm going to go into Python, and we're going to play with a Python library for actually making an HTTP request so you can see how that works, and then prepare for some quizzes. So in Python we have a library called urllib2. There's also a urllib, and this is kind of the evolution of Python in front of you here. We're going to use urllib2, for the most part. urllib has a few handy functions of its own, and when we use those, I'll include those in the documents. But anyway, urllib2 has a function in it called urlopen, and we can give it a URL to download. So let's say I'm downloading google.com. Actually, I need to make sure I save this. I usually save it in a variable called p for page. Probably not even the right concept, but that's my habit. I always use p when I use urlopen. So if you run this, we're going to get this p object, which is, basically, a file object. In Python, what a file object is, is an object that has a read method. And you can call read on there to get the contents. So I'm going to store the contents in c and call read on p. Okay, we called urlopen on this URL, storing it in this variable p, and then we called read on the response and stored it in a variable called c. Now, if we were to evaluate c, we get a wall of text, which is what we expect. So this is actually Google's front page. If you remember, early in this class we basically accomplished the same thing using telnet or curl. You can also do the same thing in Python. So now we have this variable c that has this whole response in it, and we can manipulate it in our programs, which is what we're going to be doing a lot of. Let's take a peek at what we have on that p object. We can use the dir built-in function in Python to examine an object. So now we can see the methods and attributes on our p object, and we can see a couple of them that are probably interesting to us--headers, for one, and geturl is another.
getcode is probably the status code. This is generally how I work. When you don't know a library super well, you can use dir to kind of examine the object. So let's take a peek at a couple of these. We've also got a url attribute. Let's see what's in there. That's the URL we requested. No big surprise. We can look at the headers. So this is an HTTP message instance. Now, I happen to know that this behaves like a dictionary, and dictionaries have a function on them called items. If we were to run items on this in Python--this is what you can call on any dictionary, items, to view the keys and the values--it will actually print them, generally, nicely for you. We can see all of the headers we got back from Google. This acts like an actual dictionary, so we can say p.headers['content-type'], for example, and we can see the content type that we got back from Google. It's actually kind of interesting; we're getting an ISO charset, which is--I was expecting UTF-8, but, hey, you learn things every day. So in the future, especially for you Windows users who had trouble using telnet, you can just use urllib2 and get the same answer. What I'd like you to do now is play with this library a little bit in the form of a quiz.

import urllib2
import urllib

p = urllib2.urlopen("http://www.google.com")
c = p.read()                  # the response body, as a string
p.headers['content-type']     # one of the response headers
p.geturl()                    # the final URL, after any redirects

04 qs Using urllib

Okay, quiz: What server does www.example.com use? You're going to use the server header, which is in the response, and I would, of course, like you to use urllib2 and Python to answer this. You don't have to, but it will make the rest of this lecture a lot easier if you start figuring out how to use it now.

04 s Using urllib

Okay, and the answer is Apache/2.2.3. Now, if you put in BigIP, we accepted that, but I'm pretty sure you didn't use urllib2. Because when I was finding the answer to this, I also cheated, used curl, and got BigIP, which is actually what www.example.com uses, but urllib2 automatically follows redirects, and the server it redirects to uses Apache. So let me show you how I got the answer, and then we'll go ahead and talk about what happened with the redirect. Okay, so the way I found this answer is I used urllib2. I used the urlopen function to hit www.example.com. I stored that in a variable, and then I looked at its headers. I see that the server is Apache/2.2.3. And that's the correct answer. If we look at the url attribute on p, we see it's actually iana.org. Now, if you remember from Lesson 1, I asked you what the location header is when you hit www.example.com. And this was the answer, because example.com redirects to iana.org. Now, urllib2 automatically follows redirects, which can sometimes be confusing. And in this case, it automatically followed the redirect. It hit iana.org, whose server is Apache. But let's do this by hand. I'm going to use curl. This is the Lesson 1 example. We can see that the server is actually BigIP. So hopefully you used urllib2, but if you cheated like I did when I was writing this quiz and you saw BigIP, we'll also accept that. And it's important to know, when you're using these libraries, that a lot of these libraries--urllib2, and the default one included in Google App Engine, called URL Fetch--follow redirects automatically. And if you don't want to follow the redirects--say you are writing a grading script for your homeworks--you need to look up the docs. There's almost always an option to tell the library not to automatically follow redirects. So that's something to keep in mind when you get behavior you don't expect, as just happened to me. Okay, let's move on.

05 l XML

Let's move on a little bit. We now know how to make basic requests with urllib2. You guys are going to become very friendly with that module. What I'd like to talk about a little bit now is what we actually send over the wire between two computers. We could have our servers--in this case we'll use the Hipmunk example. We could have our servers make a request to Amtrak and receive HTML back from them. Then we can actually look into that HTML on Hipmunk's servers. That's actually what we do, but this is suboptimal. Let me show you why. You've written some HTML at this point. You know that it's somewhat complex. It's not very regular. And browsers are very forgiving. You can write HTML that looks something like this, where you have an opening <form> tag, and you have an opening <b> tag to make some text bold, and you can forget to put the closing </b> tag, put your closing </form> tag, and the browser will actually probably render it appropriately. At least some browsers will. If you were a computer trying to parse this, you're expecting every opening tag to have a closing tag. All of a sudden you can get lost. Depending on how complicated you want to make your parser, maybe you can recover from this like browsers do, or maybe not. But HTML is not an ideal language for computer-to-computer communication. As it turns out, with Amtrak we actually get their HTML, and I'm going to show you some of the heartache we have to go through to actually parse it. Remember I gave you some regular expressions during homework 2 to verify your quiz answers--to verify a username and an email. These are a bunch of regular expressions that we actually use at Hipmunk to parse Amtrak's HTML. As you can see, this is just a wall of text. This is extremely error-prone, and you can see we're actually looking for a div with class "availability." We're going to look for the span whose ID is "service_span." This is what a time looks like. This is really nutso. This is not the ideal way of doing things.
In a perfect world, we wouldn't have to hit Amtrak's webpage. We would instead use an API that speaks a language more appropriate for this task. Such a language, if language is the correct word, might be XML. XML was actually invented in the late '90s specifically for this purpose--to have a regular way of expressing data between computer systems. I can't claim to be the biggest fan of XML, but it is fairly easy to parse. In fact, you've seen a lot of it. So this is what some XML might look like. If you're thinking this looks an awful lot like HTML, you are correct. We have our first line, which is basically the document type. We have the same thing in HTML. Remember we've been using HTML5, so our doctype looks something like this. It's just the first line that says what format the rest of the document is. Now, the reason both HTML and XML have doctypes and this tag structure is because they actually share a common ancestor in SGML, which was invented in the '80s. Now, the main difference between XML and HTML is that in XML every tag has to have a closing tag. We've got an opening <results>, a closing </results>. The tag format is the same. We've still got our less-thans and our greater-thans and our slashes to indicate a closing tag. But we have no void tags in XML. Remember, in HTML we could have the <br> tag for putting in a line break, and we never had a closing </br> tag. That's because HTML doesn't require all tags to close. We have this notion of a void tag. The line break was an example of one of those. It's just an opening tag. XML has nothing like that. Now, if you want a tag that has no content in XML, you could do something like this. You could include a closing slash before your greater-than symbol. In fact, there is actually a doctype for HTML called "XHTML," which basically says my HTML document is actually going to be valid XML. Instead of doing void tags with no closing slash, you include the closing slash before the greater-than. You'll see that a lot in XML. The whole point of what I'm trying to say is that XML is very similar to HTML, but it's more rigorous. It's similar because they share the same ancestor. Now, I'm not going to spend a whole lot more time on the structure of XML, because we spent so much time on HTML already. Just keep in mind that it's similar to HTML, but a little bit more consistent.
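As a rough sketch, the kind of document he's describing might look like this. The tag names are invented for illustration--in XML you choose your own--and the first line is the XML declaration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<results>
  <item>some content</item>
  <item/>  <!-- an empty tag: the closing slash before the greater-than -->
</results>
```

Note that every tag closes: either with an explicit closing tag or with the self-closing slash form.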

06 q XML and HTML

Okay, quick quiz. Which of these are true statements? All HTML is XML? All XML is HTML? HTML can be expressed in XML? Or XML and HTML share a common lineage. Check all that are correct.

06 s XML and HTML

The first answer--all HTML is XML--that's not true. Despite the similarity between the two, HTML can have things in it that are not valid in XML. A good example was the void tag--the <br> tag--with no closer. All XML is HTML. This is also not true, but it is borderline acceptable. You could certainly--if you had an XML document that was full of all HTML tags, it would be very close to HTML and would actually probably render in a browser just fine. For example, if you included in your browser something like this--an opening and closing <br> tag--it would actually probably render just fine. Actually, that's a good question. I'd invite you to see if this renders one new line or two. I honestly don't know off the top of my head, and I'm not going to quiz you on it. Now, the next answer--HTML can be expressed in XML. That is true. When we use the doctype XHTML instead of just HTML, that says the following HTML document is actually going to be a valid XML document, and you can parse it as such. You don't have to look for broken tags. Your browser can do less work. It's not expecting the HTML to be somewhat sloppy, as HTML often is. If you say it's in XML, it'd better be in XML though, because the browser is not going to be quite as lenient on you. The final answer--XML and HTML share a common lineage--this is also true. Remember, their ancestor is SGML, which stands for Standard Generalized Markup Language. There are actually many other types of documents that descended from SGML, and HTML and XML are the two that will affect our lives in this course. Okay, let me just show you some doctypes in the browser now that we have a little bit more of a framework to understand these. If I were to go to Wikipedia, and I were to view the source of Wikipedia, I would see that the doctype is HTML. This means HTML5. I know it doesn't say 5, but trust me on this one. Doctype HTML means HTML5, which is the most modern version of HTML we have. Now, if I were to go to a particular Wikipedia page, the Wikipedia page for SGML, for example, and I were to look at the source of this page, we see that the doctype is actually XHTML. It's actually XHTML Transitional, which basically means the document is going to be in XML but will have some non-standard things in there. I'm not going to get into too much of how this affects things, but it does make some browsers behave differently. Why Wikipedia has two different doctypes on two different pages I cannot explain to you.
Although I can offer one guess, which is that the SGML page is probably generated dynamically, and the front page is probably a nearly static page, and they're served from two different servers or two different machines, which is fine. You see things like this all over the internet. But when you see XHTML, that means every tag in here should have a closing tag. You can see, for example, some of these header tags are these little void tags that have the closing slashes. You wouldn't see that in most HTML5 documents, although any browser would accept it, because it doesn't actually hurt anything.

07 l Parsing XML

The next thing we're going to learn about is how do we parse XML? Now, I'm not going to make you write an actual parser. I think there's actually probably a whole class at Udacity on learning how to do almost exactly that. What I'm going to show you how to do is use the built-in parser in Python. Python has a library called "minidom," and you can get it by saying something like this: from xml.dom import minidom. Now, one thing I would like to point out real quick here is when you're working with XML you'll often see this word "dom." What this stands for is "document object model." This basically refers to the internal representation of an XML document. In Python you would have an object that has a list of children, and each of these children is some sort of tag object, and a tag object may have a name and attributes and contents and that sort of thing. Any time you're dealing with XML programmatically, you'll see references to a dom, or if you're working in your browser, you'll see references to "the dom," which kind of refers to the document--the HTML--that you're manipulating programmatically. In this particular case we're going to use minidom. Why is it called minidom and not something else? Well, "mini" kind of implies that this is a smaller, lightweight version of a dom parser. Parsing XML is actually a really complicated thing, because you can get XML that is many, many gigabytes large sometimes. Parsing all of that text is nontrivial. But when you're only parsing a little bit of text, you can use this library minidom, which is basically simple and fast and will break if you throw lots and lots of gigabytes of text at it, but for our purposes will work just great. I'm not going to quiz you on this sort of stuff. Just kind of carry this with you. Dom refers to the computer representation of the XML, and minidom is a handy library for manipulating this stuff in Python. Now I will show you how to use it. Here we are in Python.
I'm going to give you a little demo of minidom before you start using it on your own. From xml.dom import minidom. Now we have our minidom. Minidom has a function on it called "parseString," which is a function for just parsing a string of XML. Let's go ahead and give that a whirl. I've typed up some example XML. We have an opening <mytag>, some text, an opening <children> tag--remember, these tag names I'm just making up. HTML has specific tags that you need to use. In XML you can have whatever arbitrary tags you want. It's up to the people reading and writing the XML to agree on the tag names. I created some items--item 1, item 2. I closed my </children> tag and I closed my </mytag>. Now, when I was typing this, I had a little typo here, and I'm kind of curious to see what happens. Let's go ahead and run this with the typo and see--oh, boy! [chuckles] So I ran this with the typo to see what would happen, and we get an error--a mismatched tag. That kind of makes sense. We have an opening "chilrdren" and a closing "children." Let's just make this proper. Okay, we're going to run this without the typo, and I'm going to store it in a variable so I have access to it. I'll call it x. All right. This time no exception. If we were to take a peek at x, we can see we have this minidom document instance. Let's take a peek at what we have on x. Holy smokes! Look at all this stuff. There are a lot of interesting things here in x. It looks like appendChild, functions for manipulating the document, all this creating nodes, and stuff like that. Some lookup functions--these are what we're going to be using later--getElementById, getElementsByTagName. NS refers to a namespace. All sorts of stuff--parentNode, some output functions. toprettyxml--this is actually an interesting one, so let's play with this one. This is one I use all the time. If we were to take our document object and call "toprettyxml" on it--this actually doesn't look very pretty, does it, at all?
Let's print that, because this is a Python string with newlines in it. If we were to actually print it, it would look a lot prettier. Here is the XML that I entered, and you can see the structure of the document. It indents it nicely for us. That's a handy little function. When you download XML from somewhere, you can see its structure a little bit more clearly with toprettyxml. Okay, there's a function I'd like to show you here: getElementsByTagName. Now, if I were to run this function on our x object and give it "mytag," it returns one dom element. If I were to run it on "item," we actually get two dom elements. Looking at the first tag called "item," we can see that we have an item. If we were to look at its children, we can call childNodes to see a list of children. We can see that we have one text node. If we were to look at the first one of those, we can access the nodeValue attribute and see that it's 1. Now, remember our pretty-printed version of our XML. What we just did here was we said get me all of the elements that are called "item." Here is 1, and here is 2. On this first one, which is this guy here, get me its first child, which is basically this node here, which isn't strictly a node, but in minidom it's represented as a text node, which is basically just this text content. Different libraries may handle contents differently, but in minidom this is how we get it. Then we can actually say get the value of that text node. That's how we got the number 1 right there. This u basically means that's a unicode string. Minidom assumed that we were entering a unicode string, which is fine.
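The whole demo, reconstructed as a runnable sketch. The XML and its tag names are made up, just as in the lecture, and print() is written function-style so the snippet runs under Python 2 or 3:

```python
from xml.dom import minidom

# Arbitrary tag names -- it's up to the reader and writer to agree on them.
xml_string = """<mytag>some text
<children>
  <item>1</item>
  <item>2</item>
</children>
</mytag>"""

# parseString raises an ExpatError on mismatched tags, like the
# "chilrdren"/"children" typo in the lecture.
x = minidom.parseString(xml_string)

print(x.toprettyxml())                    # nicely indented view of the document

items = x.getElementsByTagName("item")
print(len(items))                         # 2
print(items[0].childNodes[0].nodeValue)   # 1 -- the first item's text node
```

The text content of a tag shows up as a child text node, which is why we go through childNodes and nodeValue to get at it.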

08 l RSS

One thing I'd like to expose you to really quickly is RSS. You've probably heard of RSS before. It's how you read a website that has daily content, like a blog or a news site, and you may have a reader that's specialized for just reading that content. RSS stands for "RDF Site Summary," and RDF stands for "Resource Description Framework." I'm not going to quiz you on this. RDF is an XML format for describing just about anything. It's basically for representing knowledge in XML. Actually, in my first job I dealt with a lot of RDF, so I don't want to spend any more time on it. More commonly, RSS actually refers to "Really Simple Syndication." That's more the context in which we're going to be dealing with it. RDF was kind of conceived to solve the world's information-organization problems, and RSS uses RDF, but really it's just a list of content in XML. Let me show you some examples of this in the wild. I'm at The New York Times's homepage right now. If we go down to the bottom of this page, we'll see a link for RSS. I'll click that link, and I'll choose The New York Times's global homepage. Now, you can see in our URL we have global home.xml, and so we've received an XML document. Most browsers will actually display XML in a nice way. This is an XML document. It's actually RSS. What this basically means is there is a particular namespace--a tag space, if you will--for the items in this list. Just like HTML opens with an <html> tag and has a body and very specific tags, an RSS document will also have specific tags. In XML, you can use this header area to describe what namespace you're going to use. We're using the atom namespace and the RSS 2.0 namespace. Basically, what this is telling our parser is you can download descriptions of these tags from these URLs, and then we'll know what tags to expect. I'm not going to quiz you on what RSS actually is.
If we take a little peek at this document, you can see it starts out with some header stuff-- kind of in this channel section--and then we get down to this list of items. I'm going to collapse these first few, and we can see we've got an item. This is basically just a list of stories that are in The New York Times. I can collapse an item, and we can see there's another item. There are actually a whole bunch of items. Each item has a title and a link and the sorts of things that would power an RSS reader-- a little description. This is neat. This is for an RSS reader or a program to download the contents of The New York Times without having to parse all the HTML, which brings us to our next quiz.
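A single item in a feed like this looks roughly like the following. The element names come from RSS 2.0; the content here is invented for illustration:

```xml
<item>
  <title>Example Headline</title>
  <link>http://www.example.com/story.html</link>
  <description>A one-sentence summary an RSS reader would display.</description>
  <pubDate>Mon, 14 May 2012 12:00:00 GMT</pubDate>
</item>
```

A reader just walks the list of item elements and pulls out the title, link, and description from each one.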

09 qs Parsing RSS

A question I'd like you to answer for me: go to that URL that we were just at--The New York Times RSS listing. Here is the URL. We'll also include this in the notes so you can copy and paste it. Use urllib2 and minidom in Python to download this page and tell me how many item elements are in that listing. Remember, the function getElementsByTagName() will be particularly useful to you on the minidom object. Okay. Have at it.

09 s Parsing RSS

And the answer, at least at the time of writing, is 16, and if it's actually different from 16, we will have our grading script updated accordingly. Let me show you how I arrived at this answer. All right, so here we are in Python. I first import the libraries I need, urllib2 and minidom. Okay, and then I download The New York Times page. I am going to use urllib2.urlopen and then paste the URL. And we'll go ahead and call read on that to download the contents, and we're going to store that in a variable called "contents." We'll take a peek at contents. There we go. A lot of stuff, and it looks like RSS. You can see the closing RSS tag there. Now, we're going to parse this with minidom. Okay. That worked. I store it in a variable d. It's a document instance. And I'm going to use getElementsByTagName("item") to find all of those. So, let's give this a whirl. Success. And then we just run len on this. Okay, the answer is 18--the feed has changed since I wrote the quiz. So, grading this is actually going to be a little tricky for us. All we wanted you to do is to go through this process. So hopefully that worked out for you, and you got a number, and hopefully we managed to grade that number. That's how you parse some basic XML. I think you can see the value here, in that many webpages have XML or RSS interfaces to them where you can actually download their content and manipulate it from a program.
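The same steps, sketched as a script. Downloading the live feed needs a network connection and the Python 2-only urllib2 module, so the download line is shown in a comment and the counting runs against a tiny made-up feed instead:

```python
from xml.dom import minidom

# In the lecture this string came from the Times feed:
#   contents = urllib2.urlopen(FEED_URL).read()
# where FEED_URL is the RSS URL from the lecture notes. Here we use a
# miniature invented feed so the example is self-contained.
contents = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>Story one</title></item>
    <item><title>Story two</title></item>
    <item><title>Story three</title></item>
  </channel>
</rss>"""

d = minidom.parseString(contents)
items = d.getElementsByTagName("item")
print(len(items))   # 3
```

Swap in the real contents string and len(items) gives the answer to the quiz.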

10 l JSON

The next thing I'd like to talk about is JSON. JSON serves the same purpose as XML, which is it's a nice, kind of computer- and human-readable way to exchange data in a consistent format. It stands for JavaScript Object Notation. The reason it says JavaScript is because JSON is actually valid JavaScript code. It might look something like this. To use our travel search example from before, we have this dictionary structure. This actually looks a lot like Python code, because Python and JavaScript have very similar syntax for dictionaries and lists. We have this dictionary. It might have a key called "itineraries" whose value may be a list of other dictionaries. In this case we have a dictionary for each routing or something like that, or we have a dictionary for each leg, which may have a key for "from" and a key for "to" and a value for each of those. This might be leg 1, and this might be leg 2. You can see leg 2 is also made up of another dictionary, which is what we use the curly braces for, which has a couple of key-value pairs of its own--key "from," value "IAD," and key "to," value "SEA." We can close our list, and we can close this dictionary. JSON is really handy for expressing these types of objects. Anything you can express in XML, you can also express in JSON, except JSON is a little less verbose, because you don't need these opening and closing tags. You can build things up out of dictionaries--or a mapping or an object or a hash table, depending on what vocabulary you want to use--which is just a curly brace and then a list of key-value pairs, just like you would in Python, just like you would in JavaScript. You can also have lists, which use brackets just like Python does, and separate the values in the list with commas. We could have 1, 2, and the string "three." We can have both integers and strings in our lists or in the values of a hash table. A list can also be the value of a hash table.
A list can also be an item in a list. That could look something like this. So we've got a list inside a list here, with two more data types--a boolean, which you're familiar with--true or false--and a float. These are basically all the data types we can have in JSON--int, string, boolean, float. We can, of course, also have "null," which could appear, for example, inside an otherwise empty list. Our main data structures are dictionaries--or a mapping, which is a key to a value or multiple keys to multiple values--and lists. What I'd like to show you now is how to parse JSON in Python. We're in our Python interpreter. We can import json, which is included in Python as of version 2.6. If you're using Python 2.5, I suggest you try to find 2.6 or 2.7. I think App Engine uses 2.7, which is what we've been using in this class, so you shouldn't have any problem importing json. Let's make a JSON string in Python here; let's call it "j." We'll take a string to represent some JSON--in this case it's basically a dictionary with two keys, "one" and "numbers." The value for "one" is 1, and the value of "numbers" is the list [1, 2, 3.5]. Let's parse that with the json module. We use the function "loads," which basically stands for load string. There is also load, but that expects a file. In this case we're going to be using just loads. When we run that, we get back a Python dictionary with our same keys, "numbers" and "one"--the order doesn't matter in Python dictionaries--and our same values, 1 and [1, 2, 3.5]. If we were to store that in a variable d, we can manipulate it like this. We can look at d['numbers'], and we get our list. We can look at d['one'], and we see our number 1. Actually, because JSON looks just like Python, we could actually eval(j), and what eval does is it actually treats this as Python code, as if I had just typed this at the prompt. This is the result we get. Now, that's a neat thing you can do. Never, ever do it.
Because in addition to valid JSON, somebody could actually send you code that might do something to your computer. Never use eval for parsing JSON. I just wanted to show you that the similarity is there. It's a really convenient thing when you're working with JSON in Python; the two sync up very nicely.
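Putting that together: the itinerary document below is a made-up sketch of the travel-search example, and json.loads is the safe way to parse it (never eval):

```python
import json

# A JSON string in the shape of the travel-search example: a dictionary
# whose "itineraries" key holds a list of leg dictionaries. Data invented.
j = '''{"itineraries": [
          {"from": "SFO", "to": "IAD"},
          {"from": "IAD", "to": "SEA"}
       ]}'''

d = json.loads(j)                    # loads = "load string"
print(d['itineraries'][0]['from'])   # SFO

# The interpreter demo: two keys, "one" and "numbers".
d2 = json.loads('{"one": 1, "numbers": [1, 2, 3.5]}')
print(d2['numbers'])                 # [1, 2, 3.5]
```

The result is an ordinary Python dictionary, so from here on it's just normal dictionary and list manipulation.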

11 p Parsing JSON

One thing I would like to show you in the browser: we're going to go to Reddit, and we're going to go to reddit.com/.json. This is going to load Reddit's front page expressed in JSON, which is something we've implemented on Reddit so computers can browse Reddit, and people can write third-party software that uses Reddit. When you add .json to a Reddit URL you get this wall of text, which brings us to our next quiz. Okay. In this quiz, what I've done for you is I have a variable called reddit_front, which is what I just showed you from the browser. It's Reddit's front page expressed in JSON. Now, this is actually quite a lot of data. Look at my scroll bar. That's a very long line. You don't have to worry too much about that. Just know that it is a string--a JSON string of Reddit's front page. Now, what I would like you to do is use the json library to parse the string and find in there the links. It's kind of a deep data structure. You can copy and paste this into a Python editor if it makes it a little bit easier. You need to find the links, and each link has an "ups" attribute. What I want you to do is write a function called total_ups, which returns the total number of ups--the sum of all the ups of all the links in this list. Now, there isn't anything else in this list that has ups. Basically, find every instance of the key "ups," add up its values, and return the total. I just want to test your ability to load this JSON and manipulate it a little bit. Okay. Good luck.

11 s Parsing JSON

I'm going to show you how I arrived at my answer, so I'm going to select this line, and I'm actually going to use the Python IDE to figure out the structure of this document. Here we are in the IDE. I'm going to paste in that first line. I pasted in that first line--it's in a variable called reddit_front. We can go ahead and run len( ) on it. We see it's 26,000 characters. The first thing I'm going to do is import json, and then I'm going to convert this document from JSON using the loads function in the json module. Now I've got this big dictionary j, and it's got all this stuff in it. Actually that wasn't very useful. It just printed everything. Let's look at j.keys( ). We can see there are two keys here--"kind" and "data." Data is almost certainly the one we want--let's look at that. Ah, another bunch of stuff. Let's look at the keys on this. This has just four keys--"after," "before," "children," and "modhash." Children is going to be the one we want. The other ones are just simple little variables. Let's look at children. Now we're starting to get somewhere. Let's look at the keys of this. Ah--it's a list, so it's probably a list of links, which is what we're expecting. Let's look at one of these--again, a bunch of crap. Let's look at the keys for the first element in this children list. We can see that it has "kind" and "data." Let's look at the data for this guy. Ah, we're starting to get a little bit closer. Let's see what the keys are for this guy. Aha! Perfect. We can see that "ups" is actually in this. If I were to look up "ups," we can see that it is the integer number of ups on this link. That's how I found this. Starting from our total JSON document, we're going to look at data. We're going to look at children, and then for each of the children we're going to sum up the "ups." If I were to change this 0 to a 1 to find the second element in this list, we can see that we get another value.
I'm going to take this piece of code with me into the IDE, and we're going to write our function to add up all of the ups. Here we are in the IDE, and what we want to do is sum up all the ups. I can say "sum," and I'm going to say c['data']['ups'] for c in j['data']['children']. Basically, what I'm doing is iterating over the list data children, which we know is a list; for c, each element in that list, I'm going to look up data ups on that object c. Then we're going to add it all up using the Python built-in function "sum." I'm just going to return it. Let's give that a run. Ah, j is not defined. That means I didn't load the actual string of the Reddit front page into a JSON object. Let's do that. Let's run that again. Here we go. Now we see our answer--103,978. Simple enough. What I wanted you to accomplish there was just to learn how to load this into JSON and then manipulate the data structure a little bit. You can see it's just like manipulating any Python data structure, because JSON maps very cleanly to what we've already been working with in Python--dictionaries and lists and integers and floats and that sort of thing. Pretty handy there. You are now a JSON expert.
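Written out, the solution looks something like this. Note the reddit_front string here is a tiny stand-in with the same kind → data → children → data → ups nesting as the real 26,000-character response, not Reddit's actual front page:

```python
import json

# A miniature version of Reddit's front-page JSON: the real response has
# the same data -> children -> data -> ups nesting, just far more fields.
reddit_front = '''{"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"ups": 15}},
    {"kind": "t3", "data": {"ups": 27}}
]}}'''

def total_ups():
    j = json.loads(reddit_front)
    # Sum the "ups" value of each link in the children list.
    return sum(c["data"]["ups"] for c in j["data"]["children"])

print(total_ups())  # 42
```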

12 l APIs

On Reddit, we can get any page in JSON, or we can also get it in XML, by changing our extension, and a lot of webpages have this feature where you can get their content in different formats. Another good example is Twitter. If we were to go to Twitter and do a search--let's look for Udacity--we can see all the tweets about Udacity. Now, if we want to get a JSON listing of this--I happen to know the URL--we can go to search.twitter.com/search.json?q=Udacity. Now we get a JSON listing of the search results that we were just looking at on Twitter. If you're writing web software and you want to manipulate another website or get data from another website, it usually just takes a little bit of poking around to find their APIs. The way I found this API is I just googled for "Twitter API," and the documentation for this came up. I'll show you that really quick. There's this whole page about Twitter's search API. One of the nice things when you're building websites is that if you have data--let's say you have a blog--making your blog support RSS, an XML representation of its content, and maybe a JSON representation of its content allows other people to build software on top of your website that can do cool things. Many large websites--Wikipedia, Twitter, Reddit--all support these types of functionality, and your homework in this class is actually going to be to add that functionality to the blog that you've been working on. Just something to be aware of: most users don't really see this side of the internet, right? It's really only other developers who get to see these APIs and these other formats for the content. So it's cool that it's all there, and it's good to be aware of it.

13 l JSON Escaping

Do you remember when we were learning HTML and we had to escape our HTML content so it renders appropriately in the browser? If not, we covered that in Lesson 2, I believe. JSON has a similar issue. Say we have this little JSON blob of a little dictionary that maps the key "story" to the string "once upon a time." This is valid JSON. Our key is surrounded by double quotes, and our value is surrounded by double quotes. What if we want to include double quotes in our value? If I were to just put a double quote in here, this would be invalid JSON, because this quote actually ends the string, and then we've got a bunch of garbage after the string. We would need a comma and another opening curly brace--all sorts of things. That totally screwed things up. Let's see what would happen if we tried to use that in our terminal real quick. I'm going to take that JSON string that we just loaded. This actually works. Our Python string is actually this whole piece. This is the JSON piece we're sending into Python. It's surrounded by single quotes. Remember, in Python you can use either single quotes or double quotes to delineate a string. That's what we're going to use here. We're going to use single quotes so we can use the double quotes inside the JSON. That works just fine, and if I were to do what I just did in the editor and replace this p with a quote, let's see what happens. Explosion. The JSON parser did not like that at all. The way we get around that is we escape this quote by putting a slash in front of it. But that slash only escapes the quote in Python, and what the string turns into is still basically the same string we had before, with the bare quote in it. I actually have to escape both the slash and the quote for this to work in Python. Basically, this first slash is the Python escape for the second slash, which says we're inserting a slash in the string, and yes, we mean to do that. The JSON interpreter will then see that slash and say, okay, they must mean to include this quote.
Let's give this a run. There we go. The other way to do this in Python that is a little simpler--instead of using double slashes, which is kind of confusing--is we can put an r in front of our string, which says this is a raw string. That basically means: Python, ignore any escaping we're doing in here for the purposes of Python, and let the JSON module interpret this slash however it will. If we run this, it also works with our quote in the value. The answer, as we just saw, is that if we want to include a quote in our value, we have to escape it with a slash in front of it. This is not a big issue when we're reading JSON, because we assume that the JSON we're going to be reading is valid, and if it's not valid, our JSON module will tell us--it'll throw an exception when we try to read it.
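Here's a runnable sketch of that escaping dance--s1 escapes the backslash itself, while s2 uses a raw string so Python leaves the backslashes alone:

```python
import json

# The backslash before the inner quote belongs to JSON, so from a normal
# Python string we must escape the backslash itself with another backslash...
s1 = '{"story": "once upon a time, she said \\"cool\\""}'

# ...or use a raw string (r'...') so Python passes the backslashes through
# untouched and the JSON parser sees \" as an escaped quote.
s2 = r'{"story": "once upon a time, she said \"cool\""}'

print(json.loads(s1)["story"])  # once upon a time, she said "cool"
```

Both strings contain exactly the same characters; the raw string is just the easier one to read and type.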

14 q Escaping JSON in Python

What if we're writing JSON? Well, the function for writing JSON is called dumps. Just like loads stands for "load string," dumps stands for "dump string." If we just used dump in Python, that would expect a file argument, and we'd be writing directly to a file. But we'll just be using dumps. You can pass dumps a Python object. In this case, let's just say [1, 2, 3]--a list. Dumps will convert that to JSON for us. Let's go ahead and see that in the terminal. You can see I didn't put quotes around this, because this is the actual list object we're converting to JSON. It outputs a string--that's what these quotes are--of the JSON representation, which looks almost identical. That's pretty cool, huh? If we were to make this object a little bit more complicated--map the string "one" to the number 1 and the string "two" to the number 2--we get our JSON. Now, the order changed, because order is not defined in dictionaries, but we get our JSON. Now, let's see where escaping comes into play. What if I were to change the value for the "two" key--if I were to make it the string 'the man said, "cool!"'--what would dumps do? What I did is I changed the value for the "two" key to a string. Remember, these single quotes are delineating the string. Then the string has an internal section with double quotes in it. Our JSON library will escape that for us, and you can see here that it is printing out a Python string that represents valid JSON. The Python version of the string has double slashes in it. You need to be careful, because this is not in itself valid JSON. This is valid Python representing valid JSON. That's why you have these double slashes, and you need to be careful when you are copying and pasting JSON in and out of Python that you get the escaping right. If I were to take this and instead print it, so we can see the actual value, we just have the single backslashes, which is the actual valid JSON. Time for a quick little quiz. What is the valid JSON representation of this Python data structure?
This is Python code. We'll include a text version of this so you can copy and paste it into your editor. I'd like you to put in this text box what the valid JSON version of this is. Basically, I'm testing your ability to use the JSON module and have it escape properly. Remember, I don't want to see the Python version of the JSON string. I want to see the actual JSON string that, if you were sending this over the wire to somebody, that would be interpreted properly by the JSON reader.

14 s Escaping JSON in Python

The correct answer is { "blah" = ["one", 2, "th\"r\"ee" ] }. The main distinction between a Python object and a JSON representation of a very similar object is that JSON has to use double quotes to delineate a string. It can't use single quotes like Python does. You must escape any internal double quotes. So this is basic JSON. As we've seen, it maps very nicely to Python data structures, assuming you're using integers and floats and strings. If you want to map a more complicated Python structure to JSON-- let's say you're doing an object or maybe a date time or some other things-- the JSON dumps function isn't going to work for you. You're going to actually convert those by hand to a simple data structure made up of dictionaries and lists and integers and strings so that we can output it properly.

15 l Being a Good Citizen

I'd like to now take a few moments to talk about how to be a good citizen on the internet. There are two key things you can do when you're writing programs that access or manipulate other people's websites that will make everybody's life a lot easier. One is to use a good user agent. Remember, we talked in Lesson 1 about how the user agent is the header that describes what browser or what program you're using to access somebody. If you're planning on accessing somebody in a consistent fashion--if you're going to poll them every couple of seconds for updates or do something like that--use a good user agent. When you're using urllib2, you can specify headers in your request, and you should set a User-Agent header that says who you are, what your name is, maybe a link to your website, so that somebody on the other end, if they see you pounding them with lots and lots of requests, knows what's up. They have a way of reaching you to ask you to stop, or to tell you they blocked you, or that sort of thing. It's good to always include that. The other really important thing is to rate-limit yourself. If you want to download, let's say, all the search results for the word Udacity on Twitter, yes, you can request them 15 at a time--which is what their API returns, I believe--as fast as you can, but you'd really be sending a lot of requests to Twitter, because your loop runs much, much faster than any human could type, and that would actually hurt Twitter's servers. If you were to have code like this in Python--while there's more stuff, make another request to Twitter--and just run this in a loop that runs through a huge number of iterations, you'd be sending requests as fast as Twitter can possibly serve them. Instead, it's really good to get in the habit of using the sleep function.
In Python you can say import time, then time.sleep(1), and this will cause your interpreter to sleep for 1 second. This is nice. Then you're only hitting them once a second, which is much more sustainable. But if you abuse their service or make too many requests, they'll probably rate-limit you. I know Twitter does, because I thought about having a quiz in this lesson that asked how many requests in a minute you can make before Twitter rate-limits you, but then I realized that that would be the exact opposite of being a good citizen on the net. Asking thousands of students to go hit some website as fast as they can is generally not a nice thing to do. Instead, we're just going to talk about it, and I'm going to ask you to make sure, if you're hitting somebody hard, that you structure your code like this. Include a sleep so you pause a little bit and don't hit anybody too hard.
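Both habits together might look something like this sketch. The user-agent string and the example.com URL are placeholders for your own details; I'm using urllib.request, which is where urllib2's functionality lives in Python 3--in Python 2, as used in this course, urllib2.Request and urllib2.urlopen work the same way:

```python
import time
import urllib.request  # Python 3 home of urllib2's Request/urlopen

# Hypothetical identifying string: your program's name and a contact URL,
# so site operators know who is hitting them and how to reach you.
USER_AGENT = "my-crawler/1.0 (http://example.com/about)"

def make_request(url):
    # Attach a descriptive User-Agent header to every request we build.
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})

def fetch_all(urls):
    results = []
    for url in urls:
        results.append(urllib.request.urlopen(make_request(url)).read())
        time.sleep(1)  # rate-limit: at most one request per second
    return results
```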

16 l SOAP

Okay, another protocol I'd like to talk about is SOAP. Actually, I'm just kidding. We're not going to talk about SOAP. SOAP is based on XML. It is another protocol for communicating between two machines. If you ever have to deal with it, you'll know why I don't even want to bother teaching it. It's very, very complicated, but what I would like to do is list for you a bunch of other common protocols and formats for communicating across the internet, and SOAP is one of them. We won't be spending any time on any of these, but lots of people use them. For example, at Hipmunk a lot of our data sources communicate via SOAP. It was invented by Microsoft, seemingly to make communication online as complicated as possible. We've got protocol buffers, which are from Google. They're similar in concept to JSON--a way of encoding different types of data for sending over the wire. And here's another one called Thrift. This is by Facebook. Then you've got all sorts of plaintext custom formats. Now, these aren't all equivalent--SOAP defines a whole protocol, while protocol buffers and Thrift are really about how to encode data over the wire. Those compare more to JSON; SOAP compares more to HTTP plus JSON--the whole package, the protocol and the data format. And of course, you can always just build your own plaintext protocol and data format, but I wouldn't recommend doing this. It's not that hard to just use JSON instead, and then somebody else who comes along and needs to use the service, whether outside of your company or internally, doesn't have to figure out how to write all this custom code to parse your custom stuff. For JSON and Thrift and protocol buffers and SOAP, you can find implementations in almost any language. We mentioned XML and JSON--those are also in this list. Use something that already exists. It'll save everybody a lot of time. I probably wouldn't use SOAP.

17 q Good Habits

Okay. Quick quiz to summarize this section. Which of these are good habits to get into? Sending proper user agents? Writing custom protocols? Using SOAP? Rate-limiting yourself? Or using common protocols and data formats?

17 s Good Habits

The correct answers--at least according to me--are sending proper user agents--very important. Writing custom protocols? No. There's a time and a place for it, but generally using common protocols and data formats is a better idea. Using SOAP--depending on whether we mean the protocol or the cleaning substance, the correct answer may be yes or no. We're going to go with the protocol and say I wouldn't use it if I could avoid it. You might make somebody at Hipmunk really upset. And rate-limiting yourself--that is also correct. Always be kind. Send a proper user agent. Rate-limit yourself. Use protocols and data formats that everybody knows how to work with and that are easy to work with. Your life, and everybody else's online, will be much easier.

18 l Ascii Chan 2

We're actually going to do some programming, and we're going to add a feature to ASCII Chan. If you don't remember what ASCII Chan is, we started it in Lesson 3, I believe. It looks something like this. We just have two inputs on a simple form, and we can insert some ASCII art into the box here. Then when we submit it, we see our art here on our page, and if we were to reload this page, it would show the 10 most recent pieces of artwork submitted. Obviously, this site is going to be a big deal--a big social community. I'd like to add some features to it. What I'd like to do is to draw over here a map that shows where the most recent submissions came from, so that when you come to ASCII Chan you can see what a global community it is. We have to do a couple of things. Let's talk about them. One thing we're going to need to do is figure out where the submitting user is from. We're going to need some sort of service to convert the request IP into coordinates. For this we'll be using a service I just found called "hostip.info," which is a handy little website. I'll show you right now. I found this by googling. Basically, you go to hostip.info, and for any IP address they will tell you where it's located. They also have an API with very simple documentation. Basically, if we go to api.hostip.info and include the IP address in a GET parameter, they'll give us some XML with location data, and I'll show you what that looks like. For their example IP, we get all this information, including city, country, country abbreviation, and what we're really after--some coordinates. Here we've got the longitude and latitude. This will be a really handy service that we'll be using. We also want to draw a map, and for this we'll be using Google Maps. Google Maps has a really handy service called "static maps," where we can basically make a URL that draws a map with markers on it. I'll show you where the documentation for that lives. That is here.
It's called the Static Maps API version 2. If we just scroll down here to a quick example, we can see the type of thing we can build. Given this URL, it would load this image. If we were to just copy this URL here and load it in a new tab, we would get that image with the markers, and these markers are all defined in the URL itself. That's going to be a pretty cool, handy service. What I want is a map just like this, and I want it to appear right here, and I want it to show the locations of the most recent submissions.

19 p Geolocation

Let's start with the first piece, which is going to be implementing a function that looks up the IP. Let's look at our code. Here is our code for ASCII Chan. It's not a lot of code--64 lines, it looks like. Most of it is contained in this main page handler. Remember, the get function just calls render_front, which is this function. All render_front does is run a basic query on this Google database object to get the 10 most recent pieces of art, and then render front.html, which is my template that draws the whole form and the 10 most recent arts. The post function here grabs the title and the artwork from the request, and if we have both of them, creates a new art object, puts it in the database, and reloads the page by doing a redirect. Otherwise it draws an error. What we're going to do is change our code like this. I'll just put in comments what we want to do. We want to look up the user's coordinates from their IP. If we have coordinates, add them to the art. That's going to be our first phase. Let's work on the first part of this: look up the user's coordinates from their IP. We're going to need a function. We'll go ahead and throw it at the top here. We'll call this "get_coords," and this is the first function we're going to be implementing. It's going to take an IP, and it's going to make a request to hostip.info. That request is going to look something like this. Here is our API documentation. We're just going to take this URL, and we're going to paste it here. We don't want to use their IP. We want to use any IP. That's the URL we'll be requesting. We're going to be requesting this URL using urllib2. Let's go ahead and import that. We're going to be parsing the response. It's in XML, if you recall. We'll verify that. We're going to be parsing this XML using minidom. Let's go ahead and import that. We know how to make a basic URL request. I'm going to show you a few things we do in the real world to be a little less error prone.
The first thing we want to do is store the content of the URL in a variable called "content." Then we're going to load the URL. We're going to call urllib2.urlopen, like we've been using this whole time, and call read on that response. Now, if this URL is invalid or that website is down, this is actually going to raise an exception. I happen to know what that exception is, so I will go ahead and catch it. If there is a URLError--I found this by trial and error when I was working on this the first time--we'll just go ahead and return. There are no coordinates. That's in case this service is somehow broken. Normally we might log this. If we were making a bigger site, I'd probably put some logging in here so that if I'm maintaining the site, I can see that the geocoding is broken for some reason. For our purposes right now, we'll just return None, because we'll be watching it. Then down here we can say if content--there is a chance that the page is broken in some other way and we just get an empty response, so we want to make sure there actually is content. Then in here, we're going to parse the XML and find the coordinates, and you are the lucky person who is going to write that code for me. What I'd like you to do is implement this get_coords function. It's going to take a blob of XML. Here is an example chunk of XML that comes from that website that should work. What this function should do is return the coordinates found in this XML. Keep in mind the coordinates here are longitude-latitude, and I'd like you to return a tuple of latitude-longitude. You're going to have to reverse them. If there are no coordinates in the XML--say, for example, we get a response that looks like this--I changed the IP to, and you can see the response here. There are no coordinates in it. This is a case we're going to have to handle, because hostip.info is a free service, and it only has the locations for IPs that people have entered.
It'd also be worth your while to go to that site and enter your location if it doesn't know it. I'd like you to implement that function. If it receives XML that looks like this, it returns the coordinates in the proper order--latitude, longitude. If it doesn't find coordinates, it just returns None. Good luck.

19 s Geolocation

Okay. My answer looks something like this. I had to import minidom so we could use that, and then we used parseString from minidom, which is what we used previously in lecture, to parse the XML that was passed in via this parameter. Then I used getElementsByTagName with "gml:coordinates". Sometimes, if the XML is more complex, you can't just cheat and find the one tag name that you're looking for, but in this case we know there's only going to be one coordinates element in a valid response, so we can just see if there's any element in here that has this tag name. If we have that element, this should return True. We're actually kind of going out on a limb here. If this weren't such a simple document, such a simple protocol, I'd probably have to check: are there child nodes? Do those child nodes have values? That sort of thing. I'm not doing any of that checking--I'm just assuming it's all there, and if it's not, this doesn't work. What I do is, if it's all there, we call split, which splits on the comma to get the two parts--the longitude and the latitude--and then we reverse them and return latitude and longitude. If we print our response, this is what we get. Now, if I were to change our XML a little bit--let's say we just get rid of the actual coordinates from the sample XML I passed in--and run this, we get None as our answer. That should work for now. Good job if you got that, and let's add this to our program.
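A sketch of that answer, with a trimmed-down stand-in for the real hostip.info response (the real one has more elements wrapped around the coordinates):

```python
from xml.dom import minidom

def get_coords(xml):
    # Parse the hostip.info response and find the one
    # <gml:coordinates> element we expect in a valid reply.
    p = minidom.parseString(xml)
    coords = p.getElementsByTagName("gml:coordinates")
    if coords and coords[0].childNodes:
        # hostip.info returns "lon,lat"; reverse to (lat, lon).
        lon, lat = coords[0].childNodes[0].nodeValue.split(',')
        return lat, lon
    # Implicitly returns None when no coordinates are present.

# A simplified sample of the XML the service returns.
sample = '''<root xmlns:gml="http://www.opengis.net/gml">
  <gml:coordinates>-88.4588,41.7696</gml:coordinates>
</root>'''

print(get_coords(sample))  # ('41.7696', '-88.4588')
```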

20 l Debugging

Okay. Here we are in our get_coords function. If we get content back from that URL, we run this code. Now, I don't have the return statement here, because I want to change this just a little bit. In the quiz, I had you return lat, long. What I want to return here is db.GeoPt(lat, long), which I have learned is a part of Google App Engine. It's a data type for storing a latitude and longitude. We could just store a tuple here, but I figure if Google gives us a data type specifically for a location, that's what we should use. I'm going to use one of these, and that's what we're going to return out of our function. Let's test that this function is working. A quick way we can do that is just to toss in a line right here. We're going to say self.write(repr(get_coords(self.request.remote_addr))). What I've added here is just a quick little hacky thing in our get( ) function. First we're just going to call this function write( ), which--if you recall--I added up here, and it calls self.response.out.write( ), so I don't have to type so much. I can just say .write. Then we're going to call repr( ). Now, this is a handy little trick when you're printing Python objects in HTML, because when you print a Python object, it has angle brackets around it, which the browser will interpret as a tag, and then it won't actually print what you're trying to print. If you print the repr of it, you'll get some extra quotes, and it'll print properly. Then I'm just going to call our function get_coords( ), and I'm going to call it with the requesting IP address, which is the remote_addr attribute of the request object. I learned how to do this just by looking in the App Engine docs. Almost every web framework will give you access to the requesting IP. I knew what I was kind of looking for and just found that in the docs. Let's give this a shot. I reloaded the page, and I see None up here. That's better than an exception, but we didn't get a location. So let's do some investigating.
The first thing I want to do is see what the IP was, so we can kind of debug this service. Let's say self.write( ), and we'll print the IP as well. Let's give that a shot. Ah. Okay. That means we're running this locally. It's not surprising that a service on the internet doesn't know what our local IP is. Let's verify that. We'll go to their API. We'll put in the IP by hand, and we'll run it. And, yes, it's a private address. For those of you who don't know, every machine's local IP address is This is called the loopback address. It's how a machine refers to itself. It's not a public IP address on the internet. Localhost--what we're accessing here--generally refers to that IP. Let's cheat a little bit and see if we can fix this. This is something that's going to come up during development. For our purposes here, I'm just going to overwrite the IP we send in this function to be an IP that I know is real--, a big public name server that helps you resolve DNS names into IPs. We'll just hardcode the IP in this function to be this for now, so we have something to actually test against. Let's go back to our browser, give this page a reload, and see if we get anything useful. Ah-ha. So we are now seeing the coordinates of where that machine is located, or at least where this free service claims that machine is located, which is fine for us. We're not trying to be too accurate here. We just want to draw a pretty map. Our geolocation function is working. We also conveniently tested the error case, which you should always do. We're in good shape there. Undo those hacks, and let's check back to our to-do list.

21 l Updating the Database

We're done turning our request IP into some coordinates--ta-da! All right. The next step is to draw a map--actually, there are some in-between steps. Before we draw a map, let's look at our code again. We actually have two to-do lists. We said we're going to look up the user's coordinates from their IP. Let's go ahead and do that here--coords = get_coords(self.request.remote_addr). Okay. That should work. Then we said if we have coordinates, add them to the art. Right now our art object doesn't take any coordinates. Let's add an extra property to our art--coords = db.GeoPtProperty( ). Again, I found that this existed when I was reading the Google datastore docs. Since we're returning a GeoPt here, we can store it in a GeoPtProperty. Again, this is a Google-specific datatype for storing latitude and longitude, and it's super convenient. Now, I'd like to say required = True, but we already have some art in our database that doesn't have coordinates. We have a couple of options here. We could delete all that art and start over, but being that ASCII Chan is a famous site on the internet, and everybody is using it, we don't want it to just break. We'll just make this parameter not required, and we'll just have it for future entries. This is something that comes up all the time when you're developing web applications--it's kind of a backwards-compatibility issue, because you're often adding features and tweaking your data model. This is one of the reasons things can get a little hairy, but it's also one of the reasons why the web is really neat, because you can develop iteratively. We're adding our coords to our art, and we're going to go down here and say, if we have coordinates, add them to the art. That's easy--if coords--basically, if get_coords didn't return None--p.coords = coords. Now we're good. Let's go ahead and try submitting some art in our browser and see if we get an exception. Reload this.
Let's submit a new picture, call this one "cat," enter in a picture of a cat, and we'll submit it. Okay, I didn't see any exceptions. Let me show you something handy we can do in Google App Engine to make sure this actually submitted properly. You may notice that when you start up App Engine, it actually mentions in the console that the admin console is available at this URL, or at a URL like this. Let's go ahead and visit that, and let me show you something we can do. It defaults to the datastore viewer, and it selects all of the entity kinds we have. We only have one in this drop-down. If I click "list entities," I can see all of my entries, and I can see here is camel and here is cat--my two entries. I can see that camel doesn't have any coordinates, because we entered that before we added this feature, and we can see our cat has map coordinates. So our feature is working, and that's pretty cool. There's all sorts of handy stuff in this tool here. You can check out your indices and all sorts of cool stuff. It's neat to poke around in here. But when you're working with the database, this is a particularly handy view. So let's get out of here for now, and let's move on to the next feature, which is actually drawing this map.

22 l Querying Coordinate Information

Here's our map API. What we need to do is figure out how to draw this map. The URL is going to look like this--maps.googleapis.com/maps/api/staticmap--and then it's going to have some parameters. I've done a little research already and found that the only required parameters are size--which is going to be the size of the map--and sensor--whether, if we were doing this on a phone or in a browser that already knows where you are, to actually try to figure out where the user is. We're not going to use any of that. Then we can just add these marker parameters. Here's one marker, and you can see this marker is defining the color to be blue, and it has a label, and then, also, you can see something that looks like coordinates in here. I've learned that the color and the label aren't required. We can just say markers and give it the coordinates. If you notice, there is another markers parameter here. The way this API works is it just looks for multiple markers parameters. We'll just add one markers parameter to the URL for each point that we have. Let's go ahead and do that. Now we want to update our front page to actually have the map image. Let's think about what we're going to do. First we're going to find which arts have coordinates. If we have any arts with coordinates, then we need to display the image URL. The first thing--finding which arts have coordinates--is fairly easy. We're going to take this list of arts, and for each one we're going to check to see if it has coordinates. Let's say for a in arts: if a.coords: points.append(a.coords). We have to define points as the empty list first. This is pretty straightforward. For a in arts--for each one of these--if there are coordinates-- The way the datastore works is if there are no coordinates in the database, this will just be None. It won't blow up on us or anything. Just add it to the list points.
This is a clear way of writing it. How I would have written it is slightly less clear, but a little shorter. This is just some Python for your own edification: points = filter(None, (a.coords for a in arts)). This is a generator expression--a.coords for a in arts--which returns an iterator of all the coordinates, each of which may be either a coordinate or None. Then filter takes two parameters: a function (or None) and a list or other iterable, and it returns all of the items in the iterable that match the filter. If the filter is None, it matches all the items that aren't None. This gives us a.coords for each a in arts if it's not None--exactly the same thing as the loop. We're going to use the slightly shorter version. This should work. Let's make sure it doesn't blow up on us. Here we are in the browser. We reload the page, and we see our little guys. Maybe we should print out the coordinates so we can see them. Let's go ahead and do that. Okay, we're going to go ahead and write that out. Let's try that in our browser. There we go. There's a list of our points, and in this case we only have one. That's good. There's one subtle bug I'd like to point out to you in this code. arts is a query. When we iterate over arts, that's when we actually run the query. If we never iterated over arts here, this query wouldn't run. Well, we iterate over it here, and then when we're rendering front.html--this template, if you recall, has a loop in it that draws all the arts--that also iterates over the arts query. Each time we do that we run the query. We don't want to run the query twice. Not only is the extra query wasteful, the results of that query could change between the two runs.
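Here are both versions side by side so you can check that they agree. One caveat not in the lecture: in Python 3, filter() returns a lazy iterator, so you would wrap it in list() to get the same result; the lecture's Python 2 filter() returned a list directly.

```python
# Stand-in for a datastore entity: coords is None when absent.
class Art(object):
    def __init__(self, coords=None):
        self.coords = coords

arts = [Art((40.7, -105.0)), Art(), Art((37.4, -122.1))]

# The clear, explicit-loop version.
points = []
for a in arts:
    if a.coords:
        points.append(a.coords)

# The shorter version: filter(None, ...) keeps only the truthy items.
# (The list() call is needed in Python 3, where filter is lazy.)
points2 = list(filter(None, (a.coords for a in arts)))
```

Both produce the same list of coordinate pairs, skipping the entry that has no coords.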
Whenever you find yourself in a situation of iterating over results you get from a database, whether it's the datastore or some other kind of cursor-based abstraction, it's usually good to say something like this: [arts = list(arts)] What this does is create a new list out of the arts query. The arts that comes out of here is a cursor, and we basically call the list constructor on it, which says, okay, make a new list out of that iterable. Then we can iterate over this list as many times as we want, and we've detached it from the query. This is a good habit to get into if you think you're going to be using results from a database more than once, or you're going to be manipulating the results too far away from the query, because generally if you get a list of things from an iterable, you assume that you can go over it a couple of times. That's all this is going to do--cache the results of this query in a list. It's very subtle, and I was actually struggling a little bit with how to demonstrate it to you. I don't know how to debug Google's queries. If I could have made this print in the console, I would have. If we figure that out between now and when we wrap this lesson up, we'll include in the instructor comments how to show what queries are actually running. But I did a little googling and confirmed for myself that this is the case. That's good to do.
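You can see the one-pass behavior of a cursor with a plain generator, which also consumes itself on the first iteration. This sketch (fake data, no datastore involved) shows why the list() call matters:

```python
def fake_query():
    # A generator behaves like a database cursor: it can only be consumed once.
    for coords in [(40.7, -105.0), (37.4, -122.1)]:
        yield coords

results = fake_query()
first_pass = [c for c in results]    # "runs the query"
second_pass = [c for c in results]   # cursor exhausted -- comes back empty!

cached = list(fake_query())          # cache the results in a real list
pass_a = [c for c in cached]
pass_b = [c for c in cached]         # safe to iterate as often as we like
```

With the real datastore the second iteration re-runs the query instead of coming back empty, which is the wastefulness (and potential inconsistency) described above--but the fix is the same list() call.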

23 p Creating the Image URL

We've accomplished this first part--finding which arts have coordinates. Let's just stick our comments up here. Now let's say, if we have any art coords, make an image URL: if points: img_url = gmaps_img(points), and that's going to take some points. And guess what? You get to write this function. What I'd like you to do is implement this function, gmaps_img. It's going to take a list of points. Now, in the ASCII Chan program these are going to be Google GeoPt objects, but since we don't have access to those in the IDE, I made a fake Point class that just has two properties--lat and long--so it should work the same way a Google GeoPt does, and I made a little fake list of points here that we'll be testing with, and we'll also throw some other ones at you. You should generate a URL that looks like this. These parameters are all fixed--the size and the sensor--and you can have a template URL here, and then what you need to do is add these markers parameters to the end of this URL and return it. It shouldn't be too bad. Good luck.

23 s Creating the Image URL

Okay, here's my answer. What I do is make a variable called markers, which is a bunch of strings of the format 'markers=%s,%s', substituting in the lat and long for each p in points. Then we join those together with an ampersand, and then we just append that to the base URL. Simple enough. Okay, if I were to run this with just a single point, it still needs to be in a list--our function still expects a list, just with one point. Make up some fake coordinates here and give this a run. We see that this also works just fine--"markers=100,200"--and there's just this one markers parameter. Let's go ahead and use this in our program.
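Putting the solution together, it looks roughly like this. The base URL's size value (380x263) is an assumption from the demo layout, and the Point class is the quiz's stand-in for Google's GeoPt:

```python
# Sketch of the gmaps_img solution. The size parameter (380x263) is an
# assumed value from the demo; the trailing '&' lets us append markers.
GMAPS_URL = ("http://maps.googleapis.com/maps/api/staticmap"
             "?size=380x263&sensor=false&")

class Point(object):
    # Fake stand-in for the datastore GeoPt: just lat and long properties.
    def __init__(self, lat, long):
        self.lat = lat
        self.long = long

def gmaps_img(points):
    # One markers=lat,long parameter per point, joined with ampersands.
    markers = '&'.join('markers=%s,%s' % (p.lat, p.long) for p in points)
    return GMAPS_URL + markers
```

For example, gmaps_img([Point(100, 200)]) yields a URL ending in "markers=100,200", and a two-point list yields "...markers=1,2&markers=3,4"--one markers parameter per point, exactly as the API expects.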

24 l Putting It All Together

I'm going to go ahead and add that function to our program here. Actually, this one works verbatim, just like out of the quiz. Now if we give it a list of points, it will return the URL for the image. Let's go ahead and use that. If we have any coordinates, make an image URL. We've already implemented that function. Let's go ahead and move our comment. Now we just need to display the image URL. Actually, we need to make one little change here--img_url = None. It doesn't exist by default, and if we actually have some points, we can set it to something else. Then we just need to pass img_url into our template--img_url = img_url. Now we just need to update our template, and we're going to insert our image. Remember, this is our template. It had a bunch of CSS--remember, I explained that kind of controls the layout of things. Then we have all of our HTML. Here is our form. We want to display the map to the right of our form. I'm going to do some things you haven't seen before in terms of getting it to display right, but first let's just get the image in here. If img_url--so basically, if it's not None--include our image. This is how we do that in the Jinja template language. Again, I know we haven't covered this in this class, but it's fairly straightforward. You just have some text, and you have these little escapes for actually running Python code. If the URL is there, include this image. Let's give this a whirl. I reloaded the page, and you can see below the form we have an image with a marker on it. Pretty cool, huh? Let's get our image displayed off to the right here. I'm just going to go ahead and do that, and I'll explain briefly how it's done. I've moved the image over here. I'll show you briefly what I did. I added a class to the map image, so I can refer to it in my CSS. You can use this notion of position: absolute, which allows you to position something anywhere on the page.
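The two template additions described here can be sketched like this (Jinja2 syntax; "map" is the class name I chose, the pixel offsets match the demo layout, and the rest of the template's HTML is omitted):

```html
<style>
  .map {
    /* Pin the map to the top-right of the page; 112px matched the
       form's height in the demo layout. */
    position: absolute;
    right: 0;
    top: 112px;
  }
</style>

{% if img_url %}
  <img class="map" src="{{ img_url }}">
{% endif %}
```

The {% if %} block only emits the <img> tag when img_url is not None, so the page without any located art renders no broken image.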
I positioned the map absolutely, to be zero pixels from the right and 112 pixels from the top--zero from the right, 112 from the top. It fits perfectly. Obviously, I knew the size of the image beforehand when we did this whole URL thing. Let's see if our program works. Remember we've got this hack in here forcing our IP. If we're going to develop this long-term, maybe in debug mode it'll choose from some random IPs or something like that, but when we put it into production it's going to use the real IP. Right now, we're just going to hard-code some IPs so this actually works in the demo. I'll go ahead and include Udacity's office IP, and we'll submit some ASCII art. We'll have this IP, and let's make sure things work. Let's add a picture of a rabbit. We'll paste that in, or we'll draw it by hand, whichever you're faster at. We submit this--ta-da! It took a little longer to submit because we had to look up the IP, but now we can see that we used our office IP, which is down here in Palo Alto. We have our old IP from the one we faked, which is here in Colorado, and we can see our rabbit and our cat. Our camel didn't have a location. Now ASCII Chan is much more worldly. I'll put this online at asciichan.com, and you all can submit all the ASCII art you love to make. We'll see how this map goes. That's it for this lesson. I hope you learned something about how to interact with other websites. Good luck on the homework.