Not so stupid db, and how and why IO works with concurrency in Clojure
Murphy, a friend of mine, asked in a comment on my last post: ‘does stupiddb support the clojure concurrency model out of the box?’ Sadly my first answer had to be ‘perhaps’, since I honestly wasn’t sure. Most of it should, but the logging gave me a headache, since in the worst case two threads might log at the same time. I wasn’t sure how the writer handles this, and my fear was that the output might come out mixed.
So I did some research, mainly by asking in the #clojure channel, a place where many smart people and I hang around and talk about Clojure. My first idea was ‘let’s use agents’. Agents are Clojure’s answer to asynchronous but ordered state changes, which is pretty much what logs are.
How do they work? An agent has a state or value; in the case of stupid db this would be a java.io.SomeKindOfWriter. We can send this agent functions to execute: each function receives the agent’s current value, and whatever it returns becomes the new value. This means something like this:
(def log (agent (io/writer log-file)))
; ...
(defn- write-log [out action key value]
  (binding [*out* out]
    (prn [action key value])
    (flush))
  out)
; ...
(send log write-log :assoc 1 1)
This writes the log entry [:assoc 1 1]. If someone else wants to write the entry [:assoc 1 2] at the same time, the agent runs the two actions one after the other, so the one that is sent later is written later. So if the transaction behind [:assoc 1 1] fails and has to be retried, we would get something like [:assoc 1 1] [:assoc 1 2] [:assoc 1 1], right? That would still be correct: even though [:assoc 1 1] is in there twice, the correct value is the last one in the log.
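My original worry, two threads writing at once and the lines coming out mixed, can be checked without touching the disk. This is only a minimal sketch: it assumes a java.io.StringWriter as a stand-in for the real log file, and the names senders and entries are made up for the example.

```clojure
(require '[clojure.string :as string])

;; Hypothetical in-memory stand-in for the log file, so the sketch runs anywhere.
(def log (agent (java.io.StringWriter.)))

(defn write-log [out action key value]
  (binding [*out* out]
    (prn [action key value])
    (flush))
  out)

;; Ten threads each send ten entries at the same time.
(let [senders (doall (for [t (range 10)]
                       (future
                         (dotimes [i 10]
                           (send log write-log :assoc t i)))))]
  (doseq [s senders] @s))  ; every send has been queued
(await log)                ; every queued action has run

;; Read each line back; a garbled line would not parse as a 3-element vector.
(def entries (map read-string (string/split-lines (str @log))))
(println (count entries) "entries, all intact:"
         (every? #(and (vector? %) (= 3 (count %))) entries))
;; When run as a script, finish with (shutdown-agents) so the JVM can exit.
```

Because the agent runs one action at a time, each prn finishes before the next one starts, no matter how many threads are sending.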
BUT, and it is a big but, Clojure is smarter than that. Some smart person in the channel pointed this line out to me:
Agents are integrated with the STM – any dispatches made in a transaction are held until it commits, and are discarded if it is retried or aborted.
What does this mean? It means Clojure’s STM takes care of the problem for us: the send is only dispatched if the transaction commits, which is pretty darn cool! So the log would actually look like [:assoc 1 2] [:assoc 1 1], which is awesome!
So the essence of this is: when doing IO in a dosync, or in a concurrent situation where you can’t guarantee that the IO is only fired once, use agents, since they ‘just work’. It is a very neat solution to the IO issue, and I’m really surprised it isn’t promoted more.
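The held dispatch can be watched in a small experiment. This is a sketch under assumptions, not stupid db’s code: the log is a plain vector inside an agent instead of a writer, attempts is a made-up counter, and the retry is provoked on purpose by letting a second thread commit to the same ref while the first transaction is still open.

```clojure
(def db       (ref {}))
(def log      (agent []))   ; hypothetical in-memory log: a vector of entries
(def attempts (atom 0))     ; counts how often the transaction body runs

(defn write-log [entries action key value]
  (conj entries [action key value]))

(dosync
  (let [n (swap! attempts inc)]
    (send log write-log :assoc 1 1)       ; held until this transaction commits
    (when (= n 1)
      ;; On the first attempt only, let another thread commit a conflicting
      ;; write (and its own log entry) while this transaction is still open.
      @(future
         (dosync
           (alter db assoc 1 2)
           (send log write-log :assoc 1 2))))
    (alter db assoc 1 1)))                ; conflict on attempt 1 forces a retry

(await log)
(println "attempts:" @attempts)
(println "db:      " @db)
(println "log:     " @log)
;; When run as a script, finish with (shutdown-agents) so the JVM can exit.
```

The body should run twice, but the send from the failed first attempt is thrown away, so the log should end up as [:assoc 1 2] followed by [:assoc 1 1] with no duplicate. The swap! on the atom, by contrast, is an ordinary side effect and happens on every attempt, which is exactly the problem that sending to an agent avoids.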
Is everything perfect now? Not exactly. Since agents are asynchronous, the log might be delayed slightly, which isn’t nice :( but it is a small tradeoff :) for things just working. If I figure out how to write the log right when the change happens, or at the end of the transaction, I’ll give an update.
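One possible way to narrow that delay, offered as a sketch and not as what stupid db does, is to block with await right after the transaction. The helper name db-assoc and the StringWriter standing in for the log file are assumptions made for the example.

```clojure
(def db  (ref {}))
(def log (agent (java.io.StringWriter.)))  ; hypothetical stand-in for the log file

(defn write-log [out action key value]
  (binding [*out* out]
    (prn [action key value])
    (flush))
  out)

(defn db-assoc
  "Hypothetical helper: change the db, then block until the log has caught up."
  [k v]
  (dosync
    (alter db assoc k v)
    (send log write-log :assoc k v))
  ;; await must come after the dosync; calling it inside a transaction throws.
  (await log)
  @db)

(db-assoc 1 1)
(print (str @log))
;; When run as a script, finish with (shutdown-agents) so the JVM can exit.
```

The price is that the caller now waits for the write, so part of the asynchrony is given up, and the log entry still lands after the commit and not as part of it.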
Comments
- Hello, I don't know for sure whether your "log" is what persists the new state of your db to disk. If not, then you can just skip the following. If I'm right, then please note that the I/O operation performed by the agent is not part of the transaction: if there is an IOException in your agent, the new state in the STM and the serialized state on disk will be out of sync. This case can occur in practice, and one must know about it. Connecting the STM with external storage in a synchronized way requires a piece of work that is not yet offered by Clojure: the equivalent of Java TransactionManagers + XA drivers. HTH, -- Laurent
- Hi Laurent, I fear you're right: the log is supposed to represent, or keep track of, the state changes on the disk. Thank you for the advice. Sadly what you say sounds right, so I'll have to find a solution for that, and at some point I'll have a reason for a new post ;). Regards, Heinz.
- Great new design! I love it!
