Friday, 11 November 2011

PhilEvents: Events in Philosophy

We're very pleased to announce the official launch of PhilEvents, a new kind of calendar for academic events in philosophy targeted at researchers and graduate students.

PhilEvents is a one-stop shop to keep up with all upcoming events in philosophy, from talks to calls for papers and conference announcements. What makes it special is that it is both geo- and theme-aware: it allows academics to see just the events that are either close enough to them geographically or sufficiently important for their research to warrant the trip. This makes PhilEvents orders of magnitude more convenient than keeping up with mailing lists or sifting through ordinary event calendars.

The main way to use PhilEvents is through its "Upcoming Events" page, which looks like this:



Notice that it knows where I'm browsing from, as well as the research topics I'm interested in:

Another crucial feature of this page is what I like to call the "laziness level selector." That's the part of the page where the user tells PhilEvents how far he or she is willing to go to attend various events:

That's what allows PhilEvents to present just the right events below.

PhilEvents has many other important features not apparent here, for example:
  • It sends off email alerts of upcoming events based on criteria that parallel those of the upcoming events page.
  • It allows anyone to download its events in a variety of formats, for example, CSV and iCal.
  • It supports embedded widgets similar to Google Gadgets, which institutions or individuals can use to embed events on their web pages. This is handy to maintain a calendar in sync with PhilEvents.
  • It supports attaching recordings and live streams of events to event announcements.
PhilEvents itself is not a redistributable product, but its data are available under a Creative Commons Attribution-ShareAlike license. The School of Advanced Study is also making available a Blogger-like platform for other organisations to operate calendars like PhilEvents: xEvents. xEvents is still in beta, but anyone is welcome to try it immediately. The School of Advanced Study is not only committed to supporting PhilEvents and xEvents over the coming years, but it will continue active development on these products over the coming months in order to improve them in response to user feedback.

I'd like to take this opportunity to thank the whole project team for their great work and enthusiasm throughout the project. Thanks Vithun (programming and graphics design), Prabhu (programming), Chrissy (content management), Lee (user rep), Jean-Philippe (graphics design), Martin (user rep), Dave (user rep), Shahrar (administration), Valerie (administration), and Barry (direction). We're also grateful for the support provided by JISC's Geospatial programme and the Dean's Development Fund at the School of Advanced Study.

Table of contents for project posts

Project plan










Wednesday, 2 November 2011

MySQL full-text and spatial search in Grails

For the advanced search in our application, we had to combine full-text and spatial matches within a single query. Since we were using Grails, and in turn Hibernate, we needed a MySQL dialect that would allow us to write such a query. This blog post highlights how this functionality was implemented.

Friday, 16 September 2011

The xEvents production server has arrived


  • 2 x 6-core Xeon processors with hyperthreading, for a total of 24 effective CPU cores
  • 48 GB of RAM
  • 128 GB Intel C300 SSD
  • 2 x 2 TB Seagate ES2 disks

Wednesday, 24 August 2011

Matching IP addresses, cities, geolocations


On xEvents, users search for events in part based on location. Event locations are specified as postal addresses by the users who submit them. In general, all we know about our visitors is their IP address or the city where they live. Given these basic data, how do we enable users to easily find events taking place near them?

First we have to make sure that the cities of users and events can be cross-referenced. To achieve this we require users to pick their cities or the cities of their events from a list of cities derived from the database available at geonames.org. This is a free database of geographic landmarks (including towns of all sizes) which comes with latitudes and longitudes. Given these constraints, we're able to calculate distances between all our events and all the users who have specified their city in their profiles.

Things get more complicated when we want to offer a good default city for a newly registered user, or just to guess the city of an unregistered user to provide a good search default. For that we have to rely on a database which matches IP ranges with city names and latitudes/longitudes. After some research we've settled on the IPLigence database because it was the cheapest that seemed to have all the features we needed.

Unfortunately, it's not good enough for us to get the latitude and longitude of an anonymous user / IP address: we need to know which city they are in, because some of our searches have criteria like “same city”. So we've had to somehow align the city labels provided by IPLigence with those we use for events and registered users. (We couldn't use the IPLigence labels everywhere because the formatting of city names is not good in their data. They also count as cities many administrative regions which are not; for example, many London boroughs are represented as cities in this database. Geonames is much better organised in this regard.)

So we've had to match the city labels in the IPLigence database with the city labels in the Geonames database. This turned out to be more complicated than initially expected. The general algorithm is to take every IPLigence city label and find the best match in the Geonames database. Here's how we find the best match. First we look to see if there is only one entry in Geonames with the same country code and city name. If so, problem solved, we've got our best match. If there is no match, we abandon that city label (this is very rare). What is common and problematic is multiple matches. In this case, we pick the nearest match based on the coordinates that we have both in the IPLigence and the Geonames database.
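The matching steps described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the dictionary keys (`country`, `name`, `lat`, `lon`, `id`), the index structure, and the case-insensitive name comparison are all assumptions made for the example, and the haversine formula is just one reasonable way to pick the "nearest" candidate.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def best_geonames_match(ip_city, geonames_index):
    """Return the best Geonames candidate for an IP-database city, or None.

    ip_city: dict with hypothetical keys 'country', 'name', 'lat', 'lon'.
    geonames_index: dict mapping (country code, lowercased city name) to a
    list of candidate dicts with the same keys (an assumed structure).
    """
    key = (ip_city["country"], ip_city["name"].strip().lower())
    candidates = geonames_index.get(key, [])
    if not candidates:
        return None               # no match: abandon this city label
    if len(candidates) == 1:
        return candidates[0]      # unique match: problem solved
    # Several matches: pick the candidate nearest to the IP database's coordinates.
    return min(
        candidates,
        key=lambda c: haversine_km(ip_city["lat"], ip_city["lon"], c["lat"], c["lon"]),
    )

# Made-up sample data: two US cities called Springfield, one London.
index = {
    ("US", "springfield"): [
        {"id": "il", "country": "US", "name": "Springfield", "lat": 39.80, "lon": -89.64},
        {"id": "ma", "country": "US", "name": "Springfield", "lat": 42.10, "lon": -72.59},
    ],
    ("GB", "london"): [
        {"id": "lon", "country": "GB", "name": "London", "lat": 51.51, "lon": -0.13},
    ],
}
ip_label = {"country": "US", "name": "Springfield", "lat": 42.0, "lon": -72.5}
match = best_geonames_match(ip_label, index)
```

With this sample data the ambiguous "Springfield" label resolves to the candidate whose coordinates are closest to those in the IP database.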

We will follow up soon with another post about how to do geographic distance calculations with MySQL and Hibernate efficiently.  

Tuesday, 31 May 2011

Grails Image Crop Using Jcrop

A user of an xEvents-hosted site should be able to upload a picture and use it as a profile picture. The uploaded picture also has to be cropped to a certain resolution before it is used.

Many websites nowadays let you have a profile picture. When you upload one, you are usually prompted to crop it (sometimes to a forced resolution), and it is this cropped picture that is used from then onwards. I was required to implement similar functionality here. I decided to use Jcrop for the JavaScript side of the cropping mechanism, as it is based on jQuery and we were already using a bit of jQuery in our application. Here I illustrate how I did the job.

Multi-Tenant Architecture

As SaaS (Software as a Service) gains ground, many new or previously little-used methods are getting more attention. Making an application truly multi-tenant is one of the key features of SaaS. In xEvents, we made the application multi-tenant capable in the initial phase of the project, to avoid the extra time and cost of converting it into a multi-tenant application at a later stage. We had many discussions back and forth to reach the right design decision for a multi-tenant platform suited to xEvents. We thought it worth posting here on this topic.

Possible design solutions: One of the key factors when designing a multi-tenant application is the database design. As we intend to store all tenant data on our server, and mostly the same application code will access the different tenants' data, designing a suitable database structure turned out to be a tricky call. Broadly, it can be implemented in three different ways:

1. Isolated database for each tenant
2. Isolated schema for each tenant
3. Shared schema for all tenants

Though I have stated three distinct types, it is really a continuum. Most of the time we need to choose a mixed approach (e.g. sharing a few tables/schemas across tenants while keeping a few separate tables per tenant). In other words, choosing the right type depends entirely on the type of application.
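To make the third option (shared schema for all tenants) a little more concrete, here is a minimal sketch using an in-memory SQLite database. The table name, the `tenant_id` column, and the helper functions are hypothetical and are not taken from xEvents or from any Grails plugin; the point is only that every row carries a tenant identifier and every query filters on it.

```python
import sqlite3

def create_shared_schema(conn):
    # One table shared by all tenants; tenant_id says which tenant owns a row.
    conn.execute(
        "CREATE TABLE event ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id INTEGER NOT NULL,"
        " title TEXT NOT NULL)"
    )

def add_event(conn, tenant_id, title):
    conn.execute(
        "INSERT INTO event (tenant_id, title) VALUES (?, ?)", (tenant_id, title)
    )

def events_for_tenant(conn, tenant_id):
    # All reads go through this filter so one tenant never sees another's rows.
    rows = conn.execute(
        "SELECT title FROM event WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    return [title for (title,) in rows]

conn = sqlite3.connect(":memory:")
create_shared_schema(conn)
add_event(conn, 1, "Metaphysics workshop")
add_event(conn, 2, "Medieval history seminar")
add_event(conn, 1, "Philosophy of mind talk")
print(events_for_tenant(conn, 1))
```

In the isolated-database and isolated-schema options the same filtering would instead be done by choosing which connection or schema to use for the current tenant.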


(source: http://i.msdn.microsoft.com/dynimg/IC150005.gif )


A few factors worth noting when making the design decision: number of tenants, customization required per tenant, scalability, database size / maintenance cost, number of users per tenant, and data privacy/isolation issues. This article [1] clearly lists the pros and cons with a detailed analysis.


Multi-tenant Grails plugin: Once the design is clear, and given that Grails is powered by plugins, we have to pick the right plugin and fine-tune its configuration to our requirements. The Multi-tenant plugin [2] provides an out-of-the-box implementation for multi-tenant applications. Though it is not clearly (or rather, fully) documented, this page [2] gives a very good idea of where to start and of how the plugin works. Here are our two cents:

1. Although this plugin supports a single database per tenant, that support is a little immature as of now [June 2011].
2. There is another version of this plugin [3] (using Hawk Eventing and Hibernate Hijacker) to support a single DB per tenant, but it is still experimental.
3. This plugin has an incompatibility issue with the Searchable plugin. We tried a few suggestions from forums to fix it, but in vain. (I am not sure, but this might be because both plugins modify the datasource at startup, or because of their subscriptions to Hibernate events.)
4. We need to be careful with common pages, such as an admin page, that access data across tenants. This is not available out of the box with this plugin.


[1] http://msdn.microsoft.com/en-us/library/aa479086.aspx
[2] http://multi-tenant.github.com/grails-multi-tenant-core/guide/index.html
[3] http://multi-tenant.github.com/grails-multi-tenant-single-db/docs/v0.7.2/guide/

Custom Password Encoder for Grails Spring-Security Plugin

The spring-security (core and others) plugin is very handy for adding user- and role-based functionality to a Grails application. The plugin comes with a lot of features out of the box and, more often than not, the basic features can be used as is. However, there are occasions when a little customization is required.

In this project, I was required to use a custom password encryption algorithm. The spring-security plugin uses the 'SHA-256' algorithm by default. This can be changed to another standard algorithm (MD2, MD5 etc.) by adding the following line (if, for example, MD5 encoding is required) to /grails-app/conf/Config.groovy:
grails.plugins.springsecurity.password.algorithm="MD5"
But what I wanted was to use a custom algorithm of our own, not one of the standard ones. Fortunately, this is again very easy. Thanks to Spring's dependency injection, we can easily create our own password encoder and inject it. The steps to do so are explained here.

Multiple 'MailSender's in grails-mail Plugin

In the project, there was a use case where we had to use different SMTP configurations for sending mails from different user accounts. To elaborate, each user in the system can create a 'site' which sends notification mails using a custom mailing configuration provided by the user. I had a look at the grails-mail plugin, but found that it did not currently support multiple MailSenders.

So I modified the plugin and added a new method in which we could specify a custom MailSender. This blog post contains information on how I did it.

Perl, Ruby, Java, Groovy, Python?

xEvents is now well on its way, with major components being added to the system at each iteration. Vithun and Prabhu will soon be posting about interesting technical challenges we've encountered so far and are grappling with now. For my part I'd like to say something about our choice of Groovy and Grails as our platform.

PhilPapers, our flagship product at the Institute, is mostly written in Perl. It's made up of kilometres of Perl code which I know inside out. At this stage I'm constantly reusing tools from this code base, and I feel I know every little quirk of the stack PhilPapers is built on: MySQL, Apache, HTML::Mason, FastCGI. These are compelling reasons to stick with a platform, yet I decided to jump ship. There are two main reasons for this:

1. Labour shortage. Perl's title as 'glue of the web' is long lost and the population of Perl mongers appears to be dwindling. Sure, there are still large systems running on Perl (I gather it's in widespread use at Amazon and BBC), but it's not a platform new entrants to the profession choose, and, realistically, junior programmers are often all that I can afford, so Perl is not a good choice for me. Last time I had to hire a Perl programmer, I very nearly failed.

2. Archaic tooling. My love relationship with Vi goes back about 15 years, and I hate lifting my hands from the keyboard to click, so I'm not referring to those colourful IDEs with eye-busting white backgrounds. I'm talking about things like HtmlUnit, a Java headless browser (that's right, a headless browser which runs Javascript et al, not a mere HTTP client + HTML parser). I'm also thinking about Hibernate, an ORM that's so far ahead of the Perl equivalents (DBIx, Rose::DB::Object) I feel like I've wasted years of my life working around the latter's flaws.

Another factor that entered my decision is that one day we'll need to rewrite PhilPapers yet again (hopefully not before 2020) to make it Web 5.0 and incorporate all the latest conveniences of modern life. When that day comes it will be great to be at ease with the state-of-the-art platform. This is also part of why I've chosen Groovy+Grails as my next platform.

Groovy is a language that compiles into Java bytecode. From .groovy files you generate .class files which can be used exactly like .class files compiled from .java files. You can even mix Groovy and Java classes in your programs (using either Java from Groovy or Groovy from Java). In fact when you code in Groovy you use Java libraries all the time because that's a big part of the attraction of the platform: the API is a superset of Java's vast API ecosystem. Grails is an application framework similar to Rails for Ruby. It provides all the main benefits of Rails, plus this: it's an extension of Spring, arguably the most battle-tested and slickest application framework there is.

Many of the virtues of the Groovy+Grails combination are obvious from the above. I won't say much more on this because the Grails web site already has a good sales pitch. The only thing I'd like to add is that Groovy as a programming language is perfect (for me). Its C-style curly brackets make it feel homely to someone who's spent most of his waking adult life staring at Perl, C, or Java code. Its syntax feels maximally clean and concise, but not obscure like Perl's. It's incredibly predictable. It has all the things that Java misses, closures and scripts in particular. It also has the things that Perl misses, for example, proper OO, without missing any of the cool Perl features which make it so well suited to Web development and text processing (e.g. string interpolation, compact regular expression handling). Its dynamic typing system is right on the sweet spot between paranoia and helpfulness.

Barring any major social or economic obstacles (I'm a little worried about Oracle trying too hard to monetize the JVM), I predict that Groovy+Grails is going to steadily gain in popularity over the coming years, because it's a winner on all counts from a technical point of view. As for finding Groovy+Grails developers, any Java programmer can be converted into a Groovy+Grails programmer within a few weeks, because the syntax is incredibly easy to learn and the APIs are basically Java's. I predict a stream of would-be converts for the foreseeable future.

Sunday, 6 March 2011

Big data versus Great data

JISC has commissioned a long report on the latest Strata (Big Data) conference.

Big Data is certainly Big Business, but there's also value in Great Data. 

Take Facebook. The most valuable aspect of Facebook's data is the social graph, not all the gossip and pictures users have posted to date (okay, there's value in that too, but the graph is the key). The graph isn't so big. They say they have 500M active users. Let's allow 10 kilobytes per user to store their name, email, and friend list. That's only 5 terabytes. Conclusion: the most valuable dataset in the world fits on a (high-end) desktop computer. 
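For readers who want to check the back-of-the-envelope figure, the arithmetic works out as follows, assuming decimal units (1 kilobyte = 1,000 bytes, 1 terabyte = 10^12 bytes). The variable names are just for illustration.

```python
active_users = 500_000_000        # 500M active users
bytes_per_user = 10 * 1000        # 10 kilobytes for name, email, and friend list
total_bytes = active_users * bytes_per_user
total_terabytes = total_bytes / 1000 ** 4
print(total_terabytes)
```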

I believe in Great Data: information that's highly useful even if not voluminous. And Great Data that is not voluminous is even better than Big Great Data. xEvents is a Small Great Data project. 



Tuesday, 15 February 2011

The budget

This project's budget breaks down into five spending areas:


The big alarming orange slice (indirect costs) is a standard charge required by the University of London to cover miscellaneous expenses it incurs - indeed, there is a large and powerful administration supporting us. The yellow slice (estates) is likewise a standard charge imposed by the University based on the number of FTEs (Full-Time Equivalents) involved in the project. It's essentially the rent we pay. This is an area where the University actually undercharges (did you know that Bloomsbury is more expensive than Chelsea?).

The blue slice (Staff) is the one that does the actual work. It's designed to provide yours truly and his staff with a competitive salary so that the project gets done enthusiastically.

The green slice is also very important. It might seem a bit high, but we needed to modernize our PCs for this project, and we need a beefy server to have the capability to host several PhilEvents-like sites. In actual fact, the server is going to cost about £6000, leaving us with pretty modest PCs. Since I need a Mac now, and they're absurdly priced in the UK, I'm actually going to have to wait for the machine to do my bidding sometimes.

The red slice is the party slice, that is, it corresponds to the amount of money the project management is expected to spend out of its pocket buying drinks around the various JISC events this money will buy train tickets to.

Spending of these funds is managed by the Project Manager with the oversight (and veto power) of everyone above him in the University's hierarchy.

Projected Timeline, Workplan & Overall Project Methodology

Timeline

Our timeline for this project is summarized by this Gantt chart:














Methodology


We are accustomed to a methodology similar to eXtreme Programming, with a bit less pair programming and a bit more upfront specs. This methodology suits this project well, as it is the outcome of much reflection, probing, and design on the part of the project management, which happens to be from the academic milieu and well positioned to know the end user's needs. Initial specs will be designed covering slightly more detailed user stories than usual (a bit more like traditional use cases) and the key screens in some detail. But these specs will be discussed among the customer reps (including the product manager) in a series of on-paper demos. Then use cases / user stories will be implemented in 1-week or 2-week sprints using pair programming for the trickiest parts. The rest of our methodology is pretty much standard XP.


Workpackages

Management: On-going project management tasks such as keeping track of progress, organising meetings with customer reps, preparing documentation, participating in relevant JISC activities. (Project Manager [DB])

Release planning: Initial gathering of user stories, estimation of effort (Customer reps, Product Manager).

Development: Iterative development of features to fulfil the user stories (Programmers).

Dissemination: Taking care of the project's public representation; participation in relevant JISC events (Project Manager).

Community: Community synthesis project, DevCSI (Project Manager, Programmers).

Evaluation: Evaluation of the project (Project Manager, Product Manager).

Monday, 14 February 2011

The Project Team


David Bourget: Project Manager, Product Manager, Programmer
David serves both as project manager and as product manager -- he manages the release plan with the help of the customer reps included below. David also contributes to the coding effort and contributes most of the formal project documentation.


Vithun Kumar: Programmer
Responsible for implementing user stories, though this being a small project he will perform all roles to some degree. 


Prabhu Seerangan: Programmer
Responsible for implementing user stories, though this being a small project he will perform all roles to some degree. 


Jean-Philippe Cote: Graphics Designer


Chrissy Meijns: Content manager


Lee Walters: Customer rep (philosophy)
Lee is our appointed customer representative for philosophy. His function is to 'negotiate' the release plan with the product manager and provide regular feedback on feature demos.

Martin Steer: Customer rep (non-philosophy)











The IPR Question

It has been decided that this project's outputs will be licensed as follows:

Managing the Good and Bad Risks

Like any software project, this one involves a number of significant risks. Here's what we think the main risks are, and how we're trying to guard against them:

Hiring difficulties

Since planning this project, I've lost my developer, Zbigniew Lukasiak, who returned to Poland to work for Opera. JISC didn't give me much headway to hire someone else, so now I'm scrambling to find a developer at the last minute. There's a risk that this will take longer than it should, especially since in the current climate the University of London does not like to hire, even using external funds.

The good news is that the process is well on its way, we're able to pay a competitive rate even for the City, and we're able to hire for more than the project's duration thanks to the funds we've got from SAS. In the worst case the project will only have been 'shifted right' by two weeks on the calendar because of this.

All kinds of delays

Hiring difficulties are just one way that a software project can be delayed. Anything else from corrupt data to bugs in third-party software to bad design decisions can delay a project massively.  We're managing the general risk of delay by breaking down our outputs into small and prioritized packages, which is really the core idea behind Agile.

Take-up

The other big problem Agile tries to address is lack of output take-up. How can we make sure that PhilEvents and xEvents are big successes? First, we'll focus on PhilEvents, because that's our main output, and its success is going to bring the success of xEvents in its wake. The general strategy to ensure that PhilEvents is a success is a) to make sure our target users like it and b) to make sure our target users get to try it. The art of achieving (a) is the art of software development in general - I won't delve into that. Let us say simply that it's a matter of knowing what the user needs and wants the most, and of translating that into a working piece of software. Obviously, the Devil's in the details. As for (b), we're lucky that we can rely on the fact that almost all our target users are already users of PhilPapers, and on top of that we already have a proven marketing formula to reach these people (it worked with PhilPapers after all).

Excessive success

My title for this post alludes to 'Good' risks. That's mainly the risk of excessive success - remember Twitter's serious technical problems about a year ago. We're not very worried about that because we're not really targeting the general public. At least as far as PhilEvents goes, we know the size of the target community (in the order of 100,000 people worldwide), and we know that we can serve a community of that size with relatively modest means. The only worry is if some much larger community started using xEvents, or lots of small communities.

At this stage, to address this risk satisfactorily, all we need to do is respect some basic principles of scalable web site design (principles that I learned to respect after infringing on them with PhilPapers):

Design for caching

PhilPapers' entry listings perfectly illustrate the challenge here. Whenever a paper is displayed, information from about 5-6 other database tables is displayed as well. This makes each entry expensive to render, which makes it natural to try to cache the HTML rendering of each entry. The problem is that a little bookmark checkbox sits squat in the middle of each record. This is user-specific, so it can't be cached.

The best solution to that problem is what I call the 2nd Order Template pattern: the output of your template that generates a record is not HTML but another template to which you apply user-specific data. This way the output of the first template can be cached. Sometimes of course it's easier to segregate generic and user-specific content than to use 2nd order templates. Another way to preserve cacheability is to inject user-specific content through Ajax, which is typically the best choice when the content is not visible by default (the 'My bibliography' menu on PP works like that). Either way, one has to think about keeping the expensive content cacheable.

Use memcached

It's tempting to prefer a local in-memory cache (like the one provided by the Cache::FastMmap module in Perl) because that's more performant and easier to set up in a one-machine configuration. But this kind of caching makes it very hard to scale to several servers. It's best to go with a caching solution that scales seamlessly to a multi-server setup.

Design for a master-slave setup if possible

It's a lot easier to scale to a multi-server setup (including multiple database servers) when you can restrict writes to a subset of your users small enough to be handled by one server. This could simply mean restricting writes to people who are signed in (as on PhilPapers). Provided your app meets that condition, you can use session affinity to make sure signed-in users are routed to the master server, and you can multiply slave servers as needed to handle the large amount of anonymous traffic you get.
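The routing rule described here can be sketched in a few lines. This is a toy illustration under stated assumptions: the `ServerPool` class, the server names, and the round-robin choice among slaves are hypothetical, and in practice this logic would normally live in a load balancer rather than in application code.

```python
import itertools

class ServerPool:
    """Send signed-in users to the master; spread anonymous traffic over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured yet, everything falls back to the master.
        self._slave_cycle = itertools.cycle(slaves or [master])

    def route(self, signed_in):
        if signed_in:
            return self.master           # only signed-in users can write
        return next(self._slave_cycle)   # read-only anonymous traffic

pool = ServerPool("master", ["slave-1", "slave-2"])
signed_in_target = pool.route(signed_in=True)
anonymous_targets = [pool.route(signed_in=False) for _ in range(3)]
```

Adding capacity for anonymous traffic then amounts to passing a longer list of slaves.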

What to expect from PhilEvents / xEvents

The academic community can expect four main benefits from this project:
  1. PhilEvents will greatly increase awareness of academic events in philosophy both in the UK and abroad, which should lead to an increase in participation and more value for money in the field.
  2. PhilEvents will enable a new level of analysis of research trends in the discipline. A better grasp of current research trends will help researchers and event organisers determine what areas of research need more (or less) attention. 
  3. By providing a suitable dissemination channel, PhilEvents will increase the creation and consumption of videocasts and podcasts of research events, which will increase the impact of research outputs. 
  4. xEvents will empower other communities to support similar services in a cost-efficient manner.
Of course, there's also something in it for the Institute of Philosophy (and, indirectly, the School of Advanced Study): this project will help the Institute fulfill its unique research facilitation and promotion mission.

The xEvents / PhilEvents Project - Overview

Project Rationale

Conferences, workshops, and talks constitute one of the main channels of communication for research outputs. Researchers must systematically keep track of events taking place in their fields for three main reasons:
  1. Because they must attend certain events in person.
  2. Because they must watch broadcasts or recordings of certain events when available. 
  3. Because they must remain aware of the latest research trends and developments.
It is particularly difficult for researchers to consistently monitor all events they might be interested in attending (i.e. to monitor events for purpose 1). To do this, one must consistently monitor both events of general interest in the field taking place in one's region and major specialized events occurring around the world. For example, a specialist in metaphysics (a branch of philosophy) with some interest in other areas of research might want to follow all events occurring in her city (irrespective of the topic), but only want to know about events taking place outside of her city if they have a significant metaphysics component. By and large, one wants to attend an event just in case it is interesting enough or taking place close enough. In many fields, monitoring upcoming academic events based on such criteria is currently impractical because there are no comprehensive, adequately organised listings of events combining geospatial information with sufficiently detailed thematic information.
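The "interesting enough or close enough" criterion can be expressed as a simple filter. The sketch below is only an illustration of the rule: the `Event` fields, the two distance thresholds, and the 25 km radius standing in for "her city" are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Event:
    title: str
    topics: frozenset
    distance_km: float  # distance from the user, computed elsewhere

def is_worth_showing(event, user_topics, max_km_any_topic, max_km_on_topic):
    """Keep an event if it is close enough, or on-topic and within the larger radius."""
    if event.distance_km <= max_km_any_topic:
        return True
    on_topic = bool(event.topics & user_topics)
    return on_topic and event.distance_km <= max_km_on_topic

# The metaphysics specialist: everything nearby, metaphysics events anywhere.
interests = {"metaphysics"}
events = [
    Event("Local ethics seminar", frozenset({"ethics"}), 3.0),
    Event("Metaphysics conference abroad", frozenset({"metaphysics"}), 5500.0),
    Event("Logic workshop abroad", frozenset({"logic"}), 5500.0),
]
shown = [
    e.title
    for e in events
    if is_worth_showing(e, interests, max_km_any_topic=25.0, max_km_on_topic=float("inf"))
]
```

Here the local seminar and the distant metaphysics conference are kept, while the distant logic workshop is filtered out.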

The lack of adequate tools to keep track of academic events leads to inefficiencies. For example, sometimes the same individual will be invited to give the same paper in different departments in the same city, each time with expenses paid by the host department. This happens because members of department A are unaware of what is happening in department B.

While our focus with this project is primarily to improve access to information about academic events to fulfil purpose 1 above, we will by the same token address a growing need for infrastructure to assist event discovery for purposes 2 and 3. Good indexes of research-grade audiovisual content are lacking in most disciplines, as most videocasting and podcasting repositories which cater to the higher education and research sector focus on teaching material and are insufficiently structured to enable efficient access by research topic. This is the case of YouTube EDU and iTunesU in particular. Good tools for surveying recent events for the purposes of forming a view of the latest trends and developments in one's field (purpose 3) are also generally lacking.

Overall Aim

The overarching aim of this project is to facilitate and improve research through better coordination and dissemination of information about academic events. This will be made possible by enriching conventional event descriptions with geospatial information and making the resulting data available both directly to end users through convenient interfaces and in interoperable formats to enable third-party applications.

Products and Objectives

We will create, maintain, and support two related services: xEvents and PhilEvents. xEvents will be a hosted online service (not unlike Blogger) to build and maintain subject-centric and geo-aware calendars that assist end users in event discovery for the three purposes identified above under the heading of 'Rationale'; PhilEvents will be one such service covering events in philosophy.

Researchers will use xEvents-powered calendars (including PhilEvents) mainly to a) monitor upcoming events based on criteria which combine research topics and location; b) browse past events geographically and/or thematically to identify trends; c) search for recordings of past events; d) submit information to power features (a) to (c).
 
Copyright David Bourget and University of London, 2011. This blog's content is licensed under the Attribution-ShareAlike license.