Monday, 14 February 2011

Managing the Good and Bad Risks

Like any software project, this one involves a number of significant risks. Here's what we think the main risks are, and how we're trying to guard against them:

Hiring difficulties

Since planning this project, I've lost my developer, Zbigniew Lukasiak, who returned to Poland to work for Opera. JISC didn't give me much lead time to hire someone else, so now I'm scrambling to find a developer at the last minute. There's a risk that this will take longer than it should, especially since, in the current climate, the University of London is reluctant to hire, even using external funds.

The good news is that the process is well under way, we're able to pay a rate that is competitive even for the City, and we're able to hire for longer than the project's duration thanks to the funds we've got from SAS. In the worst case, this will only have 'shifted the project right' by two weeks on the calendar.

All kinds of delays

Hiring difficulties are just one way a software project can be delayed. Anything from corrupt data to bugs in third-party software to bad design decisions can delay a project massively. We're managing the general risk of delay by breaking our outputs down into small, prioritized packages, which is really the core idea behind Agile.

Take-up

The other big problem Agile tries to address is lack of take-up of outputs. How can we make sure that PhilEvents and xEvents are big successes? First, we'll focus on PhilEvents, because that's our main output, and its success will bring the success of xEvents in its wake. The general strategy to ensure that PhilEvents is a success is a) to make sure our target users like it and b) to make sure our target users get to try it. The art of achieving (a) is the art of software development in general - I won't delve into that. Let us say simply that it's a matter of knowing what users need and want most, and of translating that into a working piece of software. Obviously, the devil is in the details. As for (b), we're lucky that almost all our target users are already users of PhilPapers, and on top of that we have a proven marketing formula for reaching these people (it worked for PhilPapers, after all).

Excessive success

My title for this post alludes to 'Good' risks. That's mainly the risk of excessive success - remember Twitter's serious technical problems about a year ago. We're not very worried about that because we're not really targeting the general public. At least as far as PhilEvents goes, we know the size of the target community (on the order of 100,000 people worldwide), and we know that we can serve a community of that size with relatively modest means. The only worry would be if some much larger community, or lots of small communities, started using xEvents.

At this stage, to address this risk satisfactorily, all we need to do is respect some basic principles of scalable web site design (principles I learned to respect after breaking them with PhilPapers):

Design for caching

PhilPapers' entry listings illustrate the challenge perfectly. Whenever a paper is displayed, information from about 5-6 other database tables is displayed with it. This makes each entry expensive to render, so it's natural to try to cache the HTML rendering of each entry. The problem is that a little bookmark checkbox sits squarely in the middle of each record. It's user-specific, so it can't be cached.

The best solution to that problem is what I call the 2nd Order Template pattern: the template that generates a record outputs not HTML but another template, to which you then apply the user-specific data. This way the output of the first template can be cached. Sometimes, of course, it's easier to segregate generic and user-specific content than to use 2nd order templates. Another way to preserve cacheability is to inject user-specific content through Ajax, which is typically the best choice when the content is not visible by default (the 'My bibliography' menu on PP works like that). Either way, one has to think about keeping the expensive content cacheable.
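To make the pattern concrete, here's a minimal sketch in Perl using Template Toolkit. Everything in it (the function name, the data shapes, the tag choice) is an illustrative assumption rather than our actual code. The first pass uses the default [% %] tags; the second pass uses ASP-style <% %> tags, which the first pass leaves untouched, so the cached string is itself a valid, and very cheap, template:

#!/usr/bin/perl
# Sketch of the '2nd Order Template' pattern with Template Toolkit.
use strict;
use warnings;
use Template;

# First pass: expensive, entry-specific, user-agnostic ([% ... %] tags).
my $tt_generic = Template->new();

# Second pass: cheap and user-specific (ASP-style <% ... %> tags).
my $tt_user = Template->new(TAG_STYLE => 'asp');

# The generic template renders the expensive fields and leaves a
# user-specific hole for the second pass to fill.
my $entry_tmpl = <<'EOT';
<div class="entry">
  <h2>[% title %]</h2>
  <p>[% authors.join(', ') %] ([% year %])</p>
  <% bookmark_checkbox %>
</div>
EOT

my %cache;    # stand-in for a real cache, keyed by entry id

sub render_entry {
    my ($entry, $user) = @_;

    # First pass: cacheable, shared by every user who views this entry.
    my $generic = $cache{ $entry->{id} };
    unless (defined $generic) {
        $tt_generic->process(\$entry_tmpl, $entry, \$generic)
            or die $tt_generic->error;
        $cache{ $entry->{id} } = $generic;
    }

    # Second pass: apply the user-specific data; never cached.
    my $checked = $user->{bookmarks}{ $entry->{id} } ? 'checked' : '';
    my $html;
    $tt_user->process(\$generic,
        { bookmark_checkbox => qq{<input type="checkbox" $checked>} },
        \$html) or die $tt_user->error;
    return $html;
}

# Example: render a bookmarked entry.
my $entry = { id => 123, title => 'On Risk', authors => ['A. Author'], year => 2011 };
my $user  = { bookmarks => { 123 => 1 } };
print render_entry($entry, $user);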

Use memcached

It's tempting to prefer a local in-memory cache (like the one provided by the Cache::FastMmap module in Perl) because it performs better and is easier to set up in a one-machine configuration. But this kind of caching makes it very hard to scale to several servers. It's best to go with a caching solution that scales seamlessly to a multi-server setup.
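Here's a rough sketch with the Cache::Memcached module (the key scheme, TTL, and renderer are illustrative assumptions, not PhilPapers code). Because the client takes a list of servers, growing from one machine to a pool is a configuration change rather than a code change:

#!/usr/bin/perl
# Sketch of caching rendered entries in memcached.
use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new({
    servers => ['127.0.0.1:11211'],    # add more 'host:port' entries as you grow
});

# Hypothetical expensive renderer (e.g. the first-pass template above).
sub render_entry_generic { my ($id) = @_; return "<div>entry $id</div>" }

sub cached_entry_html {
    my ($entry_id) = @_;
    my $key  = "entry_html:$entry_id";
    my $html = $memd->get($key);
    unless (defined $html) {
        $html = render_entry_generic($entry_id);
        $memd->set($key, $html, 3600);    # expire after an hour
    }
    return $html;
}

print cached_entry_html(123);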

Design for a master-slave setup if possible

It's a lot easier to scale to a multi-server setup (including multiple database servers) when you can restrict writes to a subset of your users small enough to be handled by one server. This could simply mean restricting writes to people who are signed in (as on PhilPapers). Provided your app meets that condition, you can use session affinity to make sure signed-in users are routed to the master server, and you can multiply slave servers as needed to handle the large amount of anonymous traffic you get.
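At the application level, the routing rule can be as simple as picking a database handle based on sign-in status. A minimal sketch with DBI, assuming a MySQL-style replication pair (the hostnames and credentials are placeholders, and the session affinity itself would be configured in the load balancer, not here):

#!/usr/bin/perl
# Sketch of read/write splitting by sign-in status.
use strict;
use warnings;
use DBI;

my $dbh_master = DBI->connect('DBI:mysql:database=app;host=master.example.org',
                              'app_user', 'secret', { RaiseError => 1 });
my $dbh_slave  = DBI->connect('DBI:mysql:database=app;host=slave1.example.org',
                              'app_user', 'secret', { RaiseError => 1 });

# Signed-in users are the only ones who can write, so they always talk to
# the master and never race their own writes; anonymous traffic can be
# spread across as many read-only slaves as needed.
sub dbh_for {
    my ($session) = @_;
    return $session && $session->{user_id} ? $dbh_master : $dbh_slave;
}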

1 comment:

  1. Ah, hiring risk is often listed as unlikely but high-consequence. We can see that it does happen, though, and I'm glad you seem to have it under control.

Copyright David Bourget and University of London, 2011. This blog's content is licensed under the Attribution-ShareAlike license.