Aaron Bassett: Effortless real time apps in Django

DANIELE PROCIDA: We are going for a coffee break in a moment, so I will keep you only briefly. We are running about 15 minutes behind schedule, so if you can come back from coffee 10 minutes earlier, not in half an hour but in 20 minutes, that will help rescue some time for our lightning talks. Briefly, the numbers are improving, keep them going. As soon as the numbers for Thursday evening become equal to or greater than the numbers for lunch on Friday, I will stop bothering you. If you are coming out to dinner tonight, if you have a ticket for the VFS, that is at 8:00 o’clock. If you are going on the walk, you can still do that if you are going to the VFS. Tickets for the VFS are still available from the website. You need to be at the Clink by 7:30 if you are going there; there are no more tickets available for that. Thanks.

(BREAK).

NEW SPEAKER: Can I introduce Aaron.

AARON BASSETT: This place feels a lot fuller when you are standing up here! I don’t remember there being this many people in the hall when I was down that side. Yes, I am Aaron, a freelance developer based in Glasgow. I’ve been doing some sort of web development for the last 15 years, started with Django at version 1.0, and have been doing that more or less exclusively for the last 5 or 6 years. I run a small studio, small as in me, based out of Glasgow, and as you might be able to tell I’m not actually Scottish myself. I am Irish; I have lived in Glasgow long enough, however, that my accent is somewhat muddled. I really don’t envy the transcribers today, sorry about this, but I will try and make sure my slides give some context to what I’m trying to say, so if you can’t understand me you should be able to understand those. Throughout my talk I will be posting code, links, URLs; don’t worry if you miss them, I will tweet them afterwards.

Before we get into real-time apps today I want to do a little bit of the history of real-time: how we started off and how we’ve progressed to what we now know as real-time applications.

We have to go back to 1996. 1996 saw the introduction of Internet Explorer 3. It was the first web browser to have iframes, and iframes were really the first way we could update a part of our HTML document without refreshing the entire page. In 1997 another Microsoft team, this time the Outlook Web Access team, introduced what would go on to become the XMLHttpRequest object, and that allowed us to start doing real-time in a better way than what we were doing with iframes. It wasn’t until 2004, with Gmail, that we started to see these one-page applications. But they were still using things like short polling. Short polling is where you issue an Ajax request to your server, and you issue lots of them. In this example we’re issuing a request once every second, so it’s a lot of requests from our client to our server, and a lot of wasted requests as well, because we’re issuing requests regardless of whether the server has information or not. We’ve no way of knowing if anything on the server has been updated since our last request; we just issue a request and see what’s there. Then we brought in long polling, and long polling was slightly different. Instead of having lots of very short requests we’d issue one long request with a very long timeout. This example I think uses about a minute. That request would stay open to the server until the server had some information for the browser. It would then send that information via that open connection, we’d close the connection, start a brand new one over again and keep that connection open. That was slightly better, because it meant we weren’t issuing lots and lots of requests whenever the server didn’t have any new information to give us. But both of these methods are dirty; they’re hacks. We’re trying to shoehorn real-time functionality on top of a protocol that wasn’t designed for it. In 2010 WebSockets came along, and WebSockets are what allow us to do effortless real-time apps.
WebSockets were specifically designed to solve this kind of problem, and the major issue they overcome is this. If we use short polling as an example, every time we issue that request to the server we have to send our request headers, and request headers, according to one of {inaudible} white papers, are roughly 700 to 800 bytes. They’re not huge, but it’s something sent on every request, every half second or whatever frequency we set our short polling at, that doesn’t need to be sent. For the majority of these requests we don’t really need to be sending these headers along. With WebSockets it’s slightly different. Instead of a full request header we have only a 2-byte header, with one byte that marks the beginning of the data and one byte that marks the end, and that’s it. So it’s quite a substantial reduction in header size.

I know 700 bytes doesn’t seem like a lot anyway, but let’s blow it up to web scale. Say we have 100,000 users, and these 100,000 users are using short polling, so they’re hitting our server every second. Along with each request they send a header that’s 871 bytes. So 871 bytes times 100,000 users per second is 87,100,000 bytes per second, which is 696,800,000 bits per second, roughly 665 megabits a second - I get bits and bytes mixed up - or roughly 83 megabytes. That is quite a lot of data we’re sending that we really have no use for. It’s not containing any useful information, it’s not new data pushed out to the client, it’s just the same boilerplate going back and forth all the time. If we did the same with WebSockets, so we had 100,000 users and, let’s say, the client gets information once a second, we still have the same frequency of updates, still sending information every second, but the headers are 2 bytes. That’s 200,000 bytes, or 1,600,000 bits, per second, roughly 0.2 meg. Immediately we’re saving ourselves over 80 megabytes per second for 100,000 users. That’s bandwidth we’re not only saving ourselves: in this day and age, where mobile internet is massively overtaking {inaudible} internet and people have restrictions on their cellular networks and it’s costing a lot more for that bandwidth, we’re saving our users and ourselves money. So if you take nothing else away from this talk: polling is bad, WebSockets are good.
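The arithmetic in this passage can be checked with a few lines of Python. This is only a sketch of the calculation: the 871-byte and 2-byte header sizes and the 100,000 users are the figures quoted in the talk, and the use of binary megabytes (1024 × 1024) is an assumption that happens to reproduce the quoted 665 and 83.

```python
# Header overhead per second for 100,000 clients sending one message a second.
# The header sizes are the figures quoted in the talk, not measurements.
USERS = 100_000
HTTP_HEADER_BYTES = 871
WEBSOCKET_HEADER_BYTES = 2

MIB = 1024 * 1024  # assumption: binary "mega", which matches the talk's 665 / 83


def overhead_per_second(header_bytes, users=USERS, requests_per_second=1):
    """Return (bytes, bits) of header data sent per second."""
    total_bytes = header_bytes * users * requests_per_second
    return total_bytes, total_bytes * 8


polling_bytes, polling_bits = overhead_per_second(HTTP_HEADER_BYTES)
socket_bytes, socket_bits = overhead_per_second(WEBSOCKET_HEADER_BYTES)

print(f"polling:   {polling_bytes:,} B/s = {polling_bits:,} bit/s "
      f"(~{polling_bits / MIB:.1f} Mbit/s, ~{polling_bytes / MIB:.1f} MB/s)")
print(f"websocket: {socket_bytes:,} B/s = {socket_bits:,} bit/s "
      f"(~{socket_bytes / MIB:.2f} MB/s)")
```

Changing `requests_per_second` shows how quickly the polling overhead grows if clients poll more often than once a second.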

I’m going to move on now to how we’re going to use WebSockets in Django, and I’ll take a couple of different approaches in this talk.

The first will be a way for you to host your own kind of real-time application infrastructure, and the next will be how you can use some of the same tools but hand off the actual hosting to a cloud-based solution, so you don’t need to worry about the scaling and infrastructure behind it. In both instances we’re going to use a thing called SwampDragon. SwampDragon is a fairly new Django package; I think the first commit was back in March last year, so it’s slightly over a year old. Don’t let that put you off using it. It’s really everything you want from an open source package. It has a great test suite, it has good documentation, it has easy-to-follow examples, and the core developer and maintainer is quite ready to answer questions and feature requests and to help. Yes, it’s slightly immature at this point, but it really is a pretty complete package already.

So, what is SwampDragon? SwampDragon is 3 things. It’s Django. It’s Tornado. And it’s Redis. For those who don’t know, Redis is a key-value cache or key-value store. It is very similar to memcache in that it holds the data in memory, so it’s very, very fast, but it also writes {inaudible} disk, so it can be used as a persistent store as well. So we get the benefits of memcache, but should we need to restart the server or turn it off we won’t lose all our data. It also comes with pub/sub, publish/subscribe, built in, which is what we’ll use for the real-time stuff. The second part is Tornado, a Python-based, non-blocking web server. It uses non-blocking I/O(?) so it can handle up to {inaudible} connections fairly simply. And lastly Django. Well, if you don’t know what Django is you’ll have been very confused the last couple of days. {Laughter}.

So how does this pull together? What is our stack?

Well, we’ve got our regular Django application, and it still saves to a regular database, in this case Postgres. What we also have is SwampDragon. SwampDragon sits on top of our Django application, and whenever we make a modification to our data it sends the data to Redis. Whenever you develop with this, and you’ll see examples shortly, you are only working with Django. All the magic that does the real-time, bidirectional communication via WebSockets is done by SwampDragon; you don’t need to worry about that. You write your Django applications more or less as you would anyway and it takes care of the rest. In front of Redis we have our Tornado server, which is going to be creating our WebSockets, and we’re going to create our subscriptions to these WebSockets in browsers. We’ll be talking about web browsers solely today. That doesn’t mean it only works with web browsers; it can work with anything that understands WebSockets. There are native libraries for iOS and Android. WebSockets is just another protocol like HTTP, so really you can build apps that subscribe to these services across a multitude of different platforms.

The easiest way to demonstrate how you build an application is to build an application, so we’re going to build another to-do app, because the world doesn’t have enough of those.

To get started we’re going to install SwampDragon. SwampDragon is a Python package, so it’s a simple pip install. It will install most of your dependencies as well; you’ll get Tornado and things like that. What it won’t install, unfortunately, is Redis. You need to install Redis yourself. You can do an apt-get or a brew install; it’s straightforward. The Redis folks recommend you download the {inaudible} off the Redis site and install from that to make sure you get an up-to-date version. The only place you’ll find problems is on Windows. Redis doesn’t support Windows at all. If you are a Windows developer then {laughter} I feel for you ... {laughter} if you are a Windows developer you can run it in a virtual machine. There are also services such as Redis To Go, cloud-based Redis that you can connect to instead, but you won’t be able to install it natively on your OS.

Once we have these installed, SwampDragon comes with its own dragon-admin. dragon-admin is very similar to django-admin and has a startproject command. I personally don’t like this approach. startproject will create a new project with the default Django directory structure and, I’m sure like many others here, I don’t use the default Django directory structure. I have a modified directory structure and a different way of setting up my settings. I have multiple settings files depending upon environments, and lots of other things I’ve nicked from the Two Scoops book, so I don’t actually use dragon-admin. But looking at what dragon-admin does: well, it creates a new Django project, and we can do that ourselves, that’s straightforward. It adds a couple of settings to your settings file, which is well documented. And it creates this server.py. server.py is a new file SwampDragon drops into the root of your project that controls the run server command for Tornado. It’s very similar to the manage.py runserver that comes with Django, except instead of starting your Django server it will start the Tornado server, so you need to run both. It’s great for development, same as the Django server. You’re not going to want to put it in production. You probably want to manage your Tornado server much the same way as you manage your Django application, and put it under supervisor or use gunicorn(?). So yes, you can drop it in; you don’t need to use SwampDragon to create it, you can copy it in and drop it in yourself, but when you go into production you probably won’t use it anyway.

So, we have our application all installed and we have updated our settings file. Then we create our models. I’m not going to put the full models up here. It’s a to-do app, it’s pretty basic. We have a to-do list model, which has a title and description, and then we have a to-do item model, which has a text {inaudible} to hold is, {inaudible} and a foreign key to our list so we can group all our to-do items together.

You’ll notice it looks very much like a regular Django model; there is nothing really strange or exciting about it. We do have this SwampDragon code that we’ve imported here, a mixin called SelfPublishModel. What that mixin is going to do is override the save method on our model. So whenever we create a new instance of a model, or whenever we update an instance of our model, it’s going to call this new save method. What that save method will do is take the data we’re interested in from that model and send it to Redis, so that we can then push it out via Tornado to any browsers that subscribe.
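The idea of a mixin that overrides save and publishes afterwards can be sketched without Django or Redis. This is not SwampDragon’s implementation: `FakeModel`, `PublishOnSaveMixin`, `publish` and the `PUBLISHED` list are all hypothetical stand-ins, chosen so the example runs on its own.

```python
import json

PUBLISHED = []  # stands in for Redis: a list of (channel, message) pairs


def publish(channel, message):
    """Hypothetical publisher; in the talk's stack this would be a Redis publish."""
    PUBLISHED.append((channel, message))


class FakeModel:
    """Stand-in for a Django model so the sketch runs without Django."""

    def save(self, *args, **kwargs):
        self.saved = True  # a real model would write to the database here


class PublishOnSaveMixin:
    """Save as usual, then publish the fields listed in publish_fields."""

    publish_fields = ()

    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)  # keep the normal save behaviour
        payload = {name: getattr(self, name) for name in self.publish_fields}
        publish(type(self).__name__.lower(), json.dumps(payload))


class TodoItem(PublishOnSaveMixin, FakeModel):
    publish_fields = ("id", "text", "done")

    def __init__(self, id, text, done=False):
        self.id, self.text, self.done = id, text, done


item = TodoItem(1, "Write slides")
item.save()        # publishes on create
item.done = True
item.save()        # publishes again on update
```

Because the mixin comes first in the class bases, its `save` runs first and hands over to the underlying model’s `save` through `super()`, which is why the same mixin works on any model without further changes.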

What we also have on our model is a serialiser. For anybody who has not come across serialisers before, they’re a way to translate Python code into something that your client can understand. In this instance we are translating our Python objects into JSON, because we’re sending it to a web browser. A web browser can’t understand Python objects, so we convert them to JavaScript, which it can understand. The serialiser is completely new; it’s not a modification of a Django model or anything, so we’re going to create a new .py file for it. In the serialisers we have the model serialiser, and they’re kind of like our model forms in that you don’t have to use a model serialiser; it’s there to make things easier, because a lot of the time you’re going to be dealing directly with models. But you don’t need to only serialise models. If you wanted to build an app that, let’s say, was monitoring your server, so it was looking at your CPU load or at the hard drive space you had left, you could serialise that data and send it. So it’s like the model form and the forms class: you use a model form if you just want to do the basics with a model, or you get into the forms class. It’s the same with serialisers in SwampDragon. You use the model serialiser if you want to serialise a model, or you can dive down into the bare serialisers themselves and write one for any bespoke data you may have. With the model serialiser we’re going to tell it: here is the model we want to serialise. We tell it what fields we’re interested in publishing, so our done flag, to mark whenever we’ve completed an item, and the text of the item. And we’re going to define these update fields. The update fields we’ll see later in the front end, and they allow bidirectional communication. We’re telling it that in the client we want to be able to update the done status; we want to be able to tick a to-do item in our browser and have that saved back into Django as well.
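The split between fields that are published and fields the client may update can be illustrated with a small, library-free sketch. The class below is hypothetical and is not the SwampDragon serialiser; the attribute names `publish_fields` and `update_fields` simply follow the description in the talk.

```python
import json
from dataclasses import dataclass


@dataclass
class TodoItem:
    """Plain stand-in for the to-do item model."""
    id: int
    text: str
    done: bool = False


class TodoItemSerializer:
    """Hypothetical serialiser: which fields go out, and which may come back."""

    publish_fields = ("id", "text", "done")  # sent to the browser as JSON
    update_fields = ("done",)                # the only field a client may change

    def serialize(self, obj):
        return json.dumps({name: getattr(obj, name) for name in self.publish_fields})

    def apply_update(self, obj, data):
        """Apply client-sent data, ignoring anything not listed in update_fields."""
        for name, value in data.items():
            if name in self.update_fields:
                setattr(obj, name, value)
        return obj


serializer = TodoItemSerializer()
item = TodoItem(1, "Write slides")

# The browser ticks the item and also tries to change the text; only "done" is accepted.
serializer.apply_update(item, {"done": True, "text": "something else"})
print(serializer.serialize(item))
```

Keeping the list of updatable fields separate from the published ones means a client can see the text of an item without being able to rewrite it.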

Anybody who has used Django REST framework in the past will probably be very familiar with serialisers. DRF uses something very, very similar. It’s a bit annoying how similar they are, because they can’t be used for the same purpose, so you do end up with a bit of code duplication. There is, however, a branch currently on SwampDragon that adds support for DRF-style serialisers, so hopefully in the very near future you’ll write one set of serialisers and that will {inaudible} REST API via REST framework and also {inaudible} notifications via SwampDragon.

As well as serialisers we need to create routers. Routers are very similar to views in Django. We have our get_object, which is going to return a single instance of an object, and we have this get_query_set, which allows us to return our own filtered query set. In this instance, whenever I am looking at a list I obviously only want the to-do items that are part of that list, so I am overriding get_query_set and saying: OK, here is the list ID, only give me items that correspond to that. There are a bunch of built-in verbs. We have our get_list, we also have get_single, delete, create, subscribe, unsubscribe. These are similar to the allowed HTTP methods in our class-based views. In much the same way you can define your own methods as well; you’re not stuck using these verbs, you can define your own and subscribe to them in the front end.
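As a rough illustration of a router with a filtered query set and a set of allowed verbs, here is a minimal sketch in plain Python. The class, the in-memory `TODO_ITEMS` data and the `mark_all_done` verb are all made up for the example; the real SwampDragon router works against Django query sets and is not reproduced here.

```python
TODO_ITEMS = [
    {"id": 1, "list_id": 1, "text": "Book flights", "done": True},
    {"id": 2, "list_id": 1, "text": "Write slides", "done": False},
    {"id": 3, "list_id": 2, "text": "Buy milk", "done": False},
]


class TodoItemRouter:
    """Hypothetical router: a named route exposing a fixed set of verbs."""

    route_name = "todo-item"
    valid_verbs = ("get_list", "get_single", "mark_all_done")

    def get_query_set(self, **kwargs):
        # Only the items that belong to the requested list.
        return [item for item in TODO_ITEMS if item["list_id"] == kwargs["list_id"]]

    def get_object(self, **kwargs):
        return next(item for item in TODO_ITEMS if item["id"] == kwargs["id"])

    def get_list(self, **kwargs):
        return self.get_query_set(**kwargs)

    def get_single(self, **kwargs):
        return self.get_object(**kwargs)

    def mark_all_done(self, **kwargs):
        # A custom verb, defined alongside the built-in style ones.
        items = self.get_query_set(**kwargs)
        for item in items:
            item["done"] = True
        return items

    def handle(self, verb, **kwargs):
        """Dispatch an incoming message to the matching verb, if it is allowed."""
        if verb not in self.valid_verbs:
            raise ValueError(f"verb not allowed: {verb}")
        return getattr(self, verb)(**kwargs)


router = TodoItemRouter()
print(router.handle("get_list", list_id=1))
```

Listing the verbs explicitly plays the same role as the allowed HTTP methods on a class-based view: anything not listed is rejected before it reaches the data.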

When we do subscriptions we give the route a name, so in this instance it’s called to-do item. You’ll hear me referring to routes and also to channels; depending on the software you are using they are almost interchangeable. Some will refer to them as routes, some refer to them as channels. I’ll use both in this talk, so apologies if it gets confusing, but in this instance they call them routes.

On the front end it’s a very basic kind of to-do app front end. I’ve used this thing called Twitter Bootstrap, anyone heard of it? I don’t know, I never see it anywhere. On our template here we have our list title and list description, then we have some to-do items, 4 of which are marked in green as done, and one of which is in red, still to do.

Now to make this actually function, we need to link it up to our Django code. I’m going to use Angular. I know Angular is no longer the hot new thing. I should have done it in React, is that right? Is that the one that is cool this week? {Laughter} It doesn’t really matter which one you use, to be honest, or which one you prefer. SwampDragon is not tied to any particular framework. It does come with an Angular service, and that’s why I use it, because it’s easier for me. If you wanted to use React or Backbone you are free to do so. It also comes with a JavaScript service, if you are retro like that, so don’t worry if you don’t write Angular, or you prefer something else, or your front-end team prefers something else. I’m using this for ease at the moment. It’s not a deal breaker. You can use what you like with SwampDragon. I’m not going to go {inaudible}, that’s really boring, just the bit that matters, and that’s this small snippet that is going to control that template we saw.

So we have our to-do list name and description, then we have this loop that’s going to run through our list of to-do items and put them on to the page as well.

We are wrapping it in these verbatim tags. The reason for that is that Angular also uses the double-braces syntax for variable names that Django uses. Without them Django would try to replace our Angular variables and they would disappear. We don’t want that to happen, so we have wrapped it up.

Okay, I know it is way too small for people to see, so we will go through it. First, subscribing to your channel. Subscribing to the channel is how we specify that we want to receive any updates on this route, so in this case the to-do item route. We are only really interested in to-do items that belong to the first list, so I have hard-coded the list ID of one in here. If you were going to do this for real you wouldn’t hard-code the ID; you would have some way for users to select which list they are interested in. It’s also worth noting that the query syntax for SwampDragon is similar to Django’s queryset filter syntax; that is the reason for the double underscore, to specify a property on a foreign key.

We have our get single and get list. These are going to run when the page first loads; this is to make sure that when you first arrive at the page it is not empty and blank. We get our initial to-do list and items. We’re only interested in the first list, so get single with an ID of one, then get list, which gets all the items in the database for that list and pops them on the page for the first time. The list ID is hard-coded to one again, but you wouldn’t do that in real life. Etc. etc.

So, a quick run back. We subscribe.

We get a list of our to-do items (I really should have thought better about naming these: to-do lists, querysets and Python lists). When you call get list on the to-do items, that will populate our templates.

Okay so. This is the bit where real-time comes in. We have the on channel message handler. It is going to get called any time the server has new information for our clients; any time there is new information for the browser, it calls this on channel message function.

So we can check if it is the channel we are interested in, the to-do items, and we apply the data mapper. It looks to see whether this is a new item or whether there is an existing item already with that ID. If there is, let’s update that one; we don’t want duplicate to-do items.
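The merge step described here, update the item if its ID is already known and append it otherwise, is sketched below. In the talk this logic lives in the JavaScript front end; it is shown in Python only to make the steps explicit, and the message shape (an `action` plus a `data` dictionary) is an assumption made for the example.

```python
def apply_channel_message(items, message):
    """Merge one incoming message into the local list of to-do items.

    `message` is assumed to look like
    {"action": "created" | "updated" | "deleted", "data": {"id": ..., ...}}.
    """
    data = message["data"]
    existing = next((item for item in items if item["id"] == data["id"]), None)

    if message["action"] == "deleted":
        if existing is not None:
            items.remove(existing)
    elif existing is not None:
        existing.update(data)      # known ID: update in place, no duplicate
    else:
        items.append(dict(data))   # unknown ID: a genuinely new item
    return items


items = [{"id": 1, "text": "Write slides", "done": False}]
apply_channel_message(items, {"action": "updated", "data": {"id": 1, "done": True}})
apply_channel_message(items, {"action": "created", "data": {"id": 2, "text": "Rehearse", "done": False}})
print(items)
```

Matching on the ID rather than on the action means a repeated "created" message for an item the page already has is treated as an update, so the list never shows the same item twice.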

I am nowhere near brave enough to do a real-time demo on conference Wi-Fi, so we will have a video. This is a real application; I do have it on my laptop. I can prove it does work, and I am more than happy to do that away from the pressures of the stage. Come and see me.

Here we have a Chrome window, a Firefox window and a Safari window, and beneath them the Django admin. As I go through, I am updating stuff in the Django admin, changing the status to done, and it immediately changes in all three browsers. Normally it will change in the browsers before my Django admin page even finishes refreshing. Obviously it is going to be quick because it is running on my local machine, but because the connection is always open and ready to have information pushed down, it is almost as quick when you are working with remote servers as well. Notice I am able to update stuff in the actual browser, and that is then sending the information back to Django. So it is bidirectional: it is not just that we are pushing out from Django, the clients can also send information back. We won’t see the updates automatically in the Django admin in the same way, but it is saving it back, so if I refreshed that admin screen you would see all the changes I made reflected there.

All the code for this is on GitHub. Have a look, play around, port it to React, Backbone or whatever you want. If you do miss the URLs, don’t worry, I will tweet them.

So in the first example we were introducing additional infrastructure on top of what is your normal Django stack. I don’t know about anybody else, but before real-time I didn’t have Tornado running; I had to install it just for SwampDragon. So it is adding additional complexity and more things to scale. I have ... I don’t want to have to manage anything else.

I have much better things to be doing.

So we look at platforms as a service. There are a couple of different ones out there: Firebase, we have PubNub, we have Pusher, some of the sponsors today. There are a few database-as-a-service offerings out there too. There is not a huge difference between the ones doing pub/sub and database as a service; the latter are most concerned with data sync, making sure you have lots of read replicas and the ability to sync across platforms, while the other ones, PubNub and Pusher and the ones I mentioned earlier, are more interested in the pub/sub part of it.

There is an interesting aside from when I was researching the different services out there. One of them is called data Fly. I don’t know how many of you work in corporates or enterprise, but good luck putting a purchase order through for data Fly.

Phil, one of the Pusher guys you may have seen earlier, has done a really good blog post, fairly unbiased considering he works for Pusher, on getting a. ... Don’t take my word for which platform to use; read that, and it should help you make your decision.

We will be using Pusher today. It is the one I am most familiar with, and it is kind of good for me up here because they do have this debug console, which makes it better.

So how does Pusher fit into our application stack that we had earlier?

So we are going to get rid of Redis, we are going to use Pusher, and we are going to keep SwampDragon but reduce the amount we use. The idea is we don’t want to have to rewrite the application that we started with earlier. We want to keep as much of the SwampDragon stuff as we can and swap out the data store. We don’t want to have to start off in SwampDragon and then hit scale, put in one of the other services and rewrite the application. Nobody wants to do that. So we try to look at how we can keep much of it the same but swap out what is actually doing the pub/sub.

So now we are in a, ... how do we publish our information? Pusher has got a Python library. It takes a pusher.trigger call to send to a channel. This is what I was talking about earlier: now we are switching to using channel, to confuse everybody. We have a channel name and an event, which is very much like the verbs we had earlier, so things like updated, created, deleted, and then we have a payload, our serialised model. Yes, that is my API key and secret key; yes, I did revoke them before I stepped on stage. Better luck next time (LAUGHTER).
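The channel, event and payload triple can be sketched as a small helper. To keep the example runnable offline it uses a hypothetical `RecordingClient` in place of the real thing; with the Pusher Python library installed, the client would instead be a `pusher.Pusher` instance created with your own app ID, key and secret, whose `trigger` method is assumed here to take the same three arguments. The channel name `todo-item` and the function name are illustrative.

```python
class RecordingClient:
    """Offline stand-in exposing trigger(channel, event, data)."""

    def __init__(self):
        self.sent = []

    def trigger(self, channel, event, data):
        self.sent.append((channel, event, data))


def publish_todo_item(client, item, event):
    """Send one serialised to-do item as `event` (created, updated or deleted)."""
    payload = {"id": item["id"], "text": item["text"], "done": item["done"]}
    client.trigger("todo-item", event, payload)


client = RecordingClient()
publish_todo_item(client, {"id": 1, "text": "Write slides", "done": True}, "updated")
print(client.sent)
```

Passing the client in as an argument, rather than creating it inside the function, keeps the API key and secret out of the publishing code and makes it easy to swap in a stand-in like this one for tests.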

Okay, so how can we end up uniting Pusher and SwampDragon? What sends our data to Redis is our SelfPublishModel, that mixin we included in the models at the start. We are going to rewrite the SelfPublishModel. We are going to keep our serialisers the same; they will still be the SwampDragon ones, we don’t need to modify them in any way, but instead of sending the information to Redis and then on to Tornado, we send it to Pusher. Here you can see I have just used the same Pusher code we saw a couple of slides back, sending it there rather than to Redis. By the way, I wrote both the applications, the SwampDragon one first, and then challenged myself to do the Pusher application using as few edits as I could.

This is the git diff. Most of it was deleting stuff. I didn’t need the server.py. I no longer needed the routers; that is handled by Pusher. I added some code to do the SelfPublishModel that we saw a second ago. Again I am using Angular; I am trying to change as little code as I can. Pusher has an Angular library, so it worked out well.

So this is our new Angular code. I have had to make modifications here, because the Pusher Angular service doesn’t map one to one with the SwampDragon one. We are now creating a Pusher client, but we are still subscribing in much the same way. We subscribe to the to-do item channel, then I have written a bit of code that applies updates to that same list we saw.

If I were to take this further, which is probably a little bit out of scope for this talk as it would be JavaScript heavy, we would look at the Angular services and ensure that the SwampDragon Angular service and our Pusher Angular service map one to one, so the data we are sending through and the methods are the same. Then we could use the exact same code.

Okay, another demo. This time it isn’t all going to be running on my local machine; it is sending information to the server, and then we are going to see that come back to the browser window. I have the two browser windows on the left, Safari and I think Firefox, and the browser on the right is the debug console I mentioned briefly. The one on the right is running on Pusher’s servers; the two on the left are on my local machine. The admin is on my local machine too; that is where I create the information and send it up.

As I create items in the Django admin, you will see them on the Pusher debug console, and at roughly the same time you will also see them in the two left-hand browsers, so you can see it is being pushed back down to the local machine.

That will work for all the same things as we had earlier: I will be able to create new to-do items and edit them, and I will be able to delete to-do items.

Code for that is also up on GitHub. Have a look, have a look at the diff, make sure I wasn’t lying to you, and give me any feedback you have got.

So, the final bit I want to talk about. So far we have been talking about how you serialise model data, but for a lot of the notification stuff you might not want to send the whole model. This is the logo of a project I am working on with the council up in Glasgow: a data portal, a way for the council to open up the data they currently have and make it available to everybody. It is a pretty good product. It has data on everything from bicycle rack locations through to school truancy rates and things like congestion levels and pollution, and all the other stuff that should be available to us as members of that society or of that council area.

When I am working on the application, a lot of the data is automatically harvested from council systems, but a lot of the data can’t be; the council systems are not exactly up to date and there are not a lot of APIs available. It has to be manually entered a lot of the time by a team of data entry people. We found that sometimes, if you have got more than one person working on the same file or data resource, we can run into issues. User one opens the resource, user two opens the resource, user one saves their changes, and then when user two saves, user one’s changes vanish. The end goal was to have some kind of collaborative editing, Google Docs style, but this is government work and we don’t always have the biggest budgets. Also it is not a problem that occurred that often, so we didn’t want to spend the developer time or the budget on it. A service called Help Scout has a unique way of dealing with this: an icon that goes blue or red, telling you that somebody else has accessed the thread and is reading it or replying to it. That is all we needed. We needed to know when somebody else was working on the data set, so you could go three desks over and tell them, please don’t do that. The easiest way was to drop the Pusher code we saw earlier into our view and have it send an event and the user name, so we knew who was using it. That is not really Pythonic or DRY, so I have put it in a mixin. This was published about a week ago and there has been some really good feedback on it. One of the things we are discussing is where the best place to put the mixin is. If it is updating something, what should it override? There is a big discussion, and I would love people’s feedback.

We are only really interested in updates about a particular instance of a model. We don’t want to know when somebody is viewing any instance of the model; we want to know when they are viewing that instance. To achieve that we use per-instance channels. Each channel gets a unique name to subscribe to in the front end, which is the model name plus the primary key. The data is the person’s name: who else is editing this? It doesn’t necessarily need to be something as basic as that; we could use the serialisers we saw previously, there is no reason not to. This could be any kind of data, using the same publishing methods that we did before.
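A per-instance channel name built from the model name and the primary key might look like the sketch below. The function names, the `-` separator and the `editing` event name are assumptions made for the example; the talk only states that the name is the model name plus the primary key and that the data is the person’s name.

```python
def instance_channel(model_name, pk):
    """Channel name unique to one object: model name plus primary key."""
    return f"{model_name.lower()}-{pk}"


def editing_event(model_name, pk, username):
    """Build the (channel, event, data) triple announcing who has the object open."""
    return instance_channel(model_name, pk), "editing", {"user": username}


channel, event, data = editing_event("Dataset", 42, "aaron")
print(channel, event, data)
```

Because the primary key is part of the name, a browser showing one object only ever subscribes to that object’s channel and never hears about edits to any other instance.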

On the front end I haven’t used Angular, because it is so small. We create a Pusher client instance with an API key and subscribe to the channel for the object. This will only work on pages such as detail views or update views, where we have the object in the current context.

Yes, you can see I have kind of specified that there. I don’t have a fancy video of it in action; it is still in development at work. I have wrapped this up in a Python package that is installable now from PyPI. It is a few mixins for your create, update and delete views, and the JavaScript code we saw there is all wrapped together in a nice template tag, so it is really simple to use, and it is also up on GitHub. There is a pull request going on, and fortunately one of the core developers of braces has been giving us some great feedback. I don’t know anybody better to give feedback on mixins than someone involved in Django braces, so I am pleased to see that.

It is on GitHub. I would love people to get involved in the conversation around that, and the arguments about where we should override the different types of events. The links are there.

So, that is it really to be honest. Any questions?

(APPLAUSE).

FROM THE FLOOR: How do you handle login, authentication and access control to the data, especially when you are using a service like Pusher?

AARON BASSETT: I’m probably not the best person to answer that. The SwampDragon stuff does have things for authentication as well; you can authenticate users before they subscribe. With Pusher I know they do have private channels, which can also be encrypted, but I don’t know the ins and outs of that, I am afraid.

FROM THE FLOOR: So when you push back the update from the client, so when you click done on the to-do item, it goes back query {inaudible} direction into the Redis {inaudible} and from there reaches Django how?

AARON BASSETT: So although it’s coming in via Tornado and then into Redis, we still have that SwampDragon layer, and that’s getting it from Tornado and saving it back into your Django model.

NEW SPEAKER: So SwampDragon basically runs a listener thread where it listens for incoming connections from Redis?

AARON BASSETT: SwampDragon is integrated into Tornado; that’s why you see the run server for Tornado in SwampDragon. It actually handles that side of it for you as well. As long as you are using the SwampDragon way of getting Tornado up and running, it should handle that as well.

NEW SPEAKER: OK thanks.

NEW SPEAKER: That’s great, can we thank Aaron again. {Applause}.