Web Applications and Transactions
I’ve noticed lately that some web application frameworks seem to encourage the practice of automatically putting database transactions around view/controller execution. This may seem like a convenience, but it encourages precisely the wrong way of approaching transactions, at least if your application ever does anything besides interact with the database. Yes, you’ll want to use transactions when your views or controllers are using the database, but in some cases having a transaction over the entirety of view/controller processing is far too coarse-grained.
See, there are two basic principles to bear in mind when working with a transactional store:
- Minimize transaction size and length.
- Avoid performing non-transactional IO during a transaction.
Minimizing transaction length not only saves resources, it also improves throughput and reduces the likelihood of creating conflicts with other transactions.
However, you do also need to be prepared for the case where conflicts occur. I see an awful lot of code out there which assumes that transactions always succeed, and this is dreadfully wrong: if there is writing going on, you basically always need to be prepared for the possibility of a transaction being aborted due to a conflict.
In many cases the appropriate way to recover from a conflict is to retry the aborted transaction after a short delay; repeated conflicts are typically handled with some kind of exponential backoff and a bit of randomness introduced to keep retrying transactions from butting heads too much.
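To make the retry idea concrete, here is a minimal sketch in Python. Everything named in it is an assumption for illustration: TransactionConflict stands in for whatever conflict or serialization error your database driver actually raises, and run_with_retry, txn_fn, and the delay parameters are hypothetical. The sleep function is passed in only so the delays can be observed without really waiting.

```python
import random
import time


class TransactionConflict(Exception):
    """Hypothetical stand-in for a driver's conflict/serialization error."""


def run_with_retry(txn_fn, max_attempts=5, base_delay=0.05, max_delay=2.0,
                   sleep=time.sleep):
    """Call txn_fn, retrying when it raises TransactionConflict.

    txn_fn is assumed to run one complete transaction (begin, work,
    commit) and to be safe to call again from scratch.
    """
    for attempt in range(max_attempts):
        try:
            return txn_fn()
        except TransactionConflict:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller see the conflict.
                raise
            # The cap doubles with each failed attempt (exponential backoff),
            # bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Sleep a random amount up to the cap, so that competing
            # transactions do not all retry at the same moment.
            sleep(random.uniform(0, cap))
```

Note that this only works if txn_fn really is repeatable: anything it does besides talking to the database will be done again on every retry.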
And retrying is one place where the second rule comes in: if you did a whole bunch of non-repeatable IO during the course of the transaction, then you’ve pretty much just screwed yourself.
I’ll give an example of what I saw yesterday which got me thinking about writing this post:
A web application was set up to automatically put a database transaction around each request. One of the functions of this web application is to accept browser file uploads, updating some bits in the database before and after the upload takes place. A consequence of this is that the application would have a transaction held open for the entire duration of the upload—which could be minutes or even hours. This ties up resources associated with the transaction, creates the potential for other transactions to block (to the extent that pessimistic concurrency control is used by the database), and lastly creates a great deal of potential for update conflicts to occur.
And at that point, what happens if the transaction does fail? Since an upload transaction included reading the entire file from the browser, we can’t easily retry it: the browser is only going to send the file once.
The correct solution in such a case is to use multiple, finer-grained transactions during the course of processing the request. Those transactions are shorter-lived than a mega-transaction spanning the entirety of view/controller processing, are less likely to provoke update conflicts, and can be individually retried when conflicts do occur.
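As a rough sketch of what that could look like for the upload case, here is a version using Python's built-in sqlite3 module. This is not the application I saw: the uploads table, its columns, and the handle_upload function are all made up for illustration, and stream and sink are assumed to be ordinary file-like objects standing in for the browser connection and wherever the file ends up.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS uploads (
    id     TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    size   INTEGER NOT NULL
)
"""


def handle_upload(conn, upload_id, stream, sink, chunk_size=64 * 1024):
    """Record an upload using two short transactions instead of one long one."""
    # Transaction 1: note that the upload has started, then commit right away.
    # "with conn" commits on success and rolls back if an exception escapes.
    with conn:
        conn.execute(
            "INSERT INTO uploads (id, status, size) VALUES (?, 'receiving', 0)",
            (upload_id,),
        )

    # The slow, non-repeatable part: no transaction is open while we read
    # the file, so it holds no database resources however long it takes.
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)

    # Transaction 2: record the outcome. If this one hits a conflict it can
    # be retried by itself, because the file has already been received.
    with conn:
        conn.execute(
            "UPDATE uploads SET status = 'complete', size = ? WHERE id = ?",
            (total, upload_id),
        )
    return total
```

The trade-off is that the two database updates are no longer atomic with each other: an upload that dies partway leaves a row stuck in the 'receiving' state, so the application needs some way to notice and clean those up.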