Matt Whipple, Software Engineer

Writing in a Vacuum

The Perils of Dropping ACID Transactionality

16 Sep 2016

ACID-compliant transactionality has been part and parcel of RDBMS-backed enterprise software development for years, but its absence or relative incompleteness is also one of the major trade-offs of using one of the newer breeds of NoSQL databases.

Transactionality introduces a hefty cost: a cost that often seems disregarded by enterprise developers, even though it should be apparent if the functionality provided is thought through rather than written off as something the database “just does”. Even before getting to the bits about scaling and sharding and the CAP theorem, transactions are expensive and incur a significant performance hit…so I also look forward to opportunities to try to avoid them.

Sometimes avoiding transactions and embracing some kind of event-sourced, eventually-consistent, other-new-buzzword system can be easy, but other times you need some of the more traditional transaction semantics. The challenge then becomes how to model the state within your data so that things may not be fully consistent internally but are never exposed in an inconsistent representation. I just wrapped up a couple weeks at work writing such a system using MongoDB, using the tools it provides to perform atomic operations with mostly compare-and-swap style behavior (the particular logic involved operations spanning multiple documents, which falls outside the transactional boundaries MongoDB provides).
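The compare-and-swap idea can be sketched independently of any particular database: include the state you expect in the update predicate, so the write only lands if nothing changed underneath you, and retry otherwise. Here's a minimal in-memory illustration of the pattern (the document, fields, and helper are all hypothetical; with MongoDB the same effect comes from folding the expected state into the filter of a single-document update, which the server applies atomically):

```python
def compare_and_swap(doc, expected, update):
    """Apply `update` to `doc` only if every key in `expected` still matches."""
    if all(doc.get(k) == v for k, v in expected.items()):
        doc.update(update)
        return True   # swap succeeded
    return False      # someone changed the doc first; caller should retry

# Hypothetical document we read earlier and now want to transition.
order = {"_id": 1, "status": "pending", "version": 3}

# Succeeds: the document is still in the state we read.
assert compare_and_swap(order, {"status": "pending", "version": 3},
                        {"status": "shipped", "version": 4})

# Fails: a concurrent writer already moved the document on.
assert not compare_and_swap(order, {"status": "pending", "version": 3},
                            {"status": "cancelled"})
```

The version counter is what makes this safe across retries: a stale writer's predicate no longer matches, so its update is simply rejected rather than clobbering newer state.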

By the end of the work I had discovered a newfound appreciation for ACID transactionality: designing some core pieces of your business logic in a way that avoids what can be very clunky transactionality can be a lot of fun and provide valuable insight into your business model…having to think through that same stuff for all of the ancillary calls can be an exhausting nuisance. After a couple weeks I found myself staring at my monitor at some trivial bit of supporting logic…or at least it would have been trivial if I didn’t have to think through the different ways any of the data could end up in a potentially inconsistent state if something unexpected happened. I then realized that what I missed most about ACID transactions was not that they helped make sure the primary pieces of my app were holding together the right way, but that they kept the amount of work for the secondary pieces in proportion to their importance. The overhead of transactions suddenly seemed a much more worthwhile price to pay.

There are libraries and approaches for adding ACID transactions to some of the NoSQL stores, including MongoDB, but I’m not about to cash in my chips on avoiding them; it should get easier as recipes are acquired, and a lot of the approaches to ensuring consistency are of general use for things like concurrency. Presently I count it as exhausting time well spent.

Hacking on GnuCash (Options, Part 1)

12 Sep 2016

As outlined here, I’m spending some time noodling around with GnuCash to attempt to bend it to my wishes. Presently, I’m focusing on tweaking the provided budget report a bit with the larger goal of making report authoring an overall more pleasant experience that utilizes some more modern technologies. The grand vision/possibly interesting things will be covered in later posts, but for now I’ll be getting my feet wet with Scheme.

When trying to get my head around the original budget report, one of the first obstacles was working through the way options are handled, starting with look-ups (which I actually ended with, but it’s the simpler topic). A standard option look-up is some derivative of:

(gnc:option-value
  (gnc:lookup-option
    (gnc:report-options report-obj)
    gnc:pagename-display optname-show-zb-accounts))

A single option look-up therefore requires three variables: report-obj, which is the present incarnation of the report, plus the section name and the option name (represented by gnc:pagename-display and optname-show-zb-accounts respectively) to retrieve the value of a given option. The latter two parameters effectively form a compound identifier with no enforced association. Given that they’re both simple strings, there’s also no particularly good place to provide option-specific feedback during look-up.

What I want is a single handle that can be used to keep more of that logic bundled together (a.k.a. an object). We can start with some basic dispatching logic to get back the name of the created object:

(define (opt raw-name section)
  (let ((name (N_ raw-name))) ;internationalize
    (define (get-name) name)
    (lambda args
      (apply
        (case (car args)
          ((name) get-name)
          (else (error "Invalid method: " (car args))))
        (cdr args)))))

The above creates a closure around the provided name and section, and returns a function which will dispatch to a nested function within the closure based on the first parameter, for instance: (define opt-foo (opt "Foo" (N_ "Sec1"))) (opt-foo 'name).

The report object isn’t really owned by the option and so must be passed as another argument. The apply above will forward the rest of the parameters to the nested function, so defining a value method only needs a new entry in the case and a new nested function, which is basically the same code from the beginning of this post but using the variables bound within the closure:

(define (opt raw-name section)
  (let ((name (N_ raw-name))) ;internationalize
    (define (get-name) name)
    (define (value r)
      (gnc:option-value
        (gnc:lookup-option
          (gnc:report-options r) section name)))
    (lambda args
      (apply
        (case (car args)
          ((name) get-name)
          ((value) value)
          (else (error "Invalid method: " (car args))))
        (cdr args)))))

And now the value can be retrieved with a call like (opt-foo 'value my-report) without having to worry about preserving the association with the section name…

Calling it that way is still kinda ugly though; there may be multiple report objects floating around, but most likely there will be one against which a bunch of options will be looked up. Putting the options in some kind of container can also keep them from running amok all over the global namespace and allow for some basic sanity checking, so we can also enclose a report object and an option container (using an alist):

;; opts is an alist of (key procedure) lists, e.g. `((foo ,opt-foo))
(define (opts-for-report r opts)
  (lambda (o)
    (let ((opt (assoc-ref opts o)))
      (if (not opt) (error "Option " o " not found in " opts))
      ((car opt) 'value r))))

Which can be used like:

(define report-opt (opts-for-report report-obj my-opts))
(report-opt 'foo)

Ta-da! A look-up mechanism that is neatly contained, expressive, and provides a slightly better chance of triggering fat-finger alerts earlier and more loudly (thereby fighting the good fight against #f gremlins). The ease with which options can be retrieved can also affect the style used in the report…if looking up options is cumbersome, it’s more likely to be done in bulk with values passed around unpacked. However, if a look-up function such as report-opt is passed around instead, it can lead to more focused function signatures and a clear indication that a particular decision is based on a report option. Passing this object around may sometimes be good and sometimes not, but now laziness may be less of a factor.

The above code by itself doesn’t help with option definition/registration and could lead to painful code or segmented logic…

So Long LogMan

09 Sep 2016

One of my GitHub projects lingering around is LogMan: an interface for management of logging systems which allowed for registration of implementations at run time rather than design or build time. It was a fun, useful little project but was primarily the result of my trying to solve cultural issues through code (the tool at my disposal). Tools like JMX are far better positioned to provide that type of functionality, and if there are constraints that block the use of such tools then that may be a symptom of deeper issues.

In light of that LogMan is now abandon-ware. On the off chance someone finds it useful or interesting then it should be simple enough to take over.

Hacking on GnuCash (Origin Story)

01 Sep 2016

I’ve been a long-time on-and-off user of GnuCash but have only recently started really paying attention to my finances (age + family = time to start being a grown-up). In that pursuit I’m looking for some reporting that goes beyond what GnuCash offers. Customizing GnuCash reports the prescribed way is not for the faint of heart, and having done some basic tweaks several years ago, my thought this time was to avoid the whole ordeal and access the data directly using R or Python: the data is fairly accessible, and last I checked GNU Guile (the extension mechanism that GnuCash uses) seemed to be languishing.

Although GnuCash data is stored either as gzipped XML or in an RDBMS (either of which would be easily read by plenty of tools), the GnuCash architecture consists of a core engine that enforces the model, so accessing the data through the engine seemed a preferable option for maintainable, consistent data rather than reconstructing the associations within the stored data myself. I opted to at least look at using the internal reporting engine to initially extract the data…and in so doing I discovered that my perception of Guile being abandoned was either wrong or outdated (as evidenced by their fancy new Web site).

Perhaps most importantly, Guile now supports multiple languages, including Lua and JavaScript/ECMAScript, which will make it far more likely to be adopted by greedy developers who want more from their language than Scheme’s parentheses. No longer worried about resurrecting long-dead code, I’ve found that further investigation has piqued my interest in Guile. [insert beguile pun]

In light of this new-found wisdom I’m going to be looking at implementing everything I need within GnuCash/Guile and hopefully helping move report customization in GnuCash to a place where it has fewer attached disclaimers.

Working with images from the command line

29 Aug 2016

This post mostly regurgitates readily available information: it’s primarily for my future reference, though I may do something more interesting with it down the road.

As part of working on this Web site I’m working on organizing my pictures a bit, which is something for which I’ve never found a solution that I particularly like (though I haven’t looked too hard). Most photo organizing software that I’ve tried has seemed too needy; I’m not looking for the kind of commitment where I cede control of the underlying storage or have to devote more than a couple seconds at a time.

My usual workflow and the above constraints lead me to the command line, which has the excellent de facto standard suite of utilities: ImageMagick. Metadata from a photo file can be listed using identify -verbose filename, which can be used to belatedly tag or rename image files, and convert can be used to quickly modify an image for uses such as sticking it on a Web site. The photo on this page was re-sized and rotated from my photo library with:

convert 20140614_Bay_Bridge.jpg -resize 8% -rotate 90 bay-bridge.jpg

A huge perk for me of preferring ImageMagick commands over a graphics editor is that source files are likely to have consistent dimensions (whatever the camera uses), as are the desired renditions (to fit a Web page format), so once the percentage(s) are worked out, resizing images is totally painless. I can’t say the same for cropping or other manipulations, but hopefully my laziness can keep me from having to cross that bridge for a while.

All the ImageMagick goodness is covered in the man pages and (presumably) on the site.

Website, take n

24 Aug 2016

I’m launching a new Web site/blog for some of my work (yay!). It’s not much to look at at the moment, but I’ll be incrementally improving the design, so it will probably be pretty snazzy by the time I retire or so.

I’m turning my attention back to attempting to blog because I find that I mostly interact with the Internet in a way that aligns with static-ish Web sites and blogging: preferring more unidirectional, asynchronous batching over the generally more high-touch social media model (in which I tend to participate just sporadically enough to eke out something resembling an existence). The more asynchronous, on-demand nature of Web sites also aligns with at least one of the biases asserted in Douglas Rushkoff’s Program or Be Programmed.

I plan to write a short blog entry every week covering some technical topic which may or may not be of much use to anyone other than me, in addition to jazzing up the site in some way or another as part of the same commit. Each post will also contain a picture, hopefully in some way related to the post but probably not…either way it will help me conquer my aversion to adding images to Web sites and spur me to organize some of my photos.