Foggy sponges and kittens
Sponges are easy. Nobody likes writing loops with lots of error handling for IOError or trying to figure out what exactly rewind() does. Plus, we can’t write our favorite dot-chain-of-death map/select/flatten/zip monstrosity until that file’s contents are in something that implements Enumerable, right? As a result of this type of Ruby-think, I’ve seen far too many “working with files” examples that boil down to contents = File.new("something.txt").readlines or IO.read("blah.txt") and should instead be correctly titled “files are hard, let’s go sponging”.
Sadly, sponges also kill kittens. Not really, but they get pretty close. When you are dealing with file input and write a sponge, what you’re really saying is “I can’t wait until this grows, consumes all memory, and gets OOM killed!” It’s a false economy: you’re trading superficial Ruby idiom simplicity for a design that is built to fail as your data set grows.
If you write a sponge today, you will almost surely rewrite it tomorrow.
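To make the trade-off concrete, here is a small sketch of the same job written both ways. The method names and the "ERROR" marker are made up for illustration; File.readlines and File.foreach are standard Ruby.

```ruby
# Sponge: the whole file becomes an Array of lines before any work happens,
# so memory use grows with the size of the file.
def count_errors_sponge(path)
  File.readlines(path).count { |line| line.include?("ERROR") }
end

# Stream: File.foreach without a block returns an Enumerator that yields one
# line at a time, so only the current line has to be held in memory.
def count_errors_stream(path)
  File.foreach(path).count { |line| line.include?("ERROR") }
end
```

Both return the same number. The streaming version still gets its Enumerable, it just never asks for the whole file at once.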
This came up again recently with respect to dealing with Amazon S3 and a rewrite we had to do because (surprise) it was consuming all the memory on a Resque machine and getting killed. The engineers assigned solved some of the hard problems (dealing with the core algorithm that is spongy), but left an obvious sponge right at the center of things, ensuring the “fix the OOM” branch would suffer a different but related OOM once merged.
This is where we get to Fog. Fog is a really impressive gem and you should probably be using it if you’re dealing with any cloud services and writing Ruby. But we were creating a sponge with Fog when putting and getting potentially large files from S3. The seductive “one clean line” solution was staring at me from yet another pull request.
It seemed odd to me. I know Wes. He and the others that work on Fog are really smart people. The gem is widely used and contributed to. It’s unlikely that nobody has noticed it is impossible to use Fog::Storage unless you can fit the file into local memory. That’d be a silly design deficiency.
And here’s where the corollary to “A bad workman blames his tools” should kick in: “A good workman blames himself.” There had to be something I was missing. Either I had skipped some documentation or I was too stupid to fully understand the very clever (and beautiful) code that makes up Fog. As it turns out, both were true. In the process of digging, code reading, and thinking I found my answers. It’s neither particularly well documented nor obvious from the code, but Fog supports streaming instead of sponging. Briefly:
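A minimal sketch of the two calls, under some assumptions: directory is a Fog::Storage directory object you have already looked up, files.create accepts an open IO-like object as its body, and files.get accepts a block that receives each chunk along with the remaining and total byte counts. The helper names stream_upload and stream_download are hypothetical, and the exact block arguments may differ between Fog providers and versions, so check against the version you run.

```ruby
# Upload: hand Fog an open File instead of a String, so the body can be read
# from disk as it is sent rather than loaded into memory first.
# `directory` is assumed to be a Fog::Storage directory (hypothetical setup).
def stream_upload(directory, key, local_path)
  File.open(local_path, "rb") do |io|
    directory.files.create(key: key, body: io)
  end
end

# Download: pass a block to files.get and write each chunk out as it arrives.
# The (chunk, remaining, total) block arguments are an assumption here.
# Returns the number of bytes written to local_path.
def stream_download(directory, key, local_path)
  written = 0
  File.open(local_path, "wb") do |out|
    directory.files.get(key) do |chunk, _remaining_bytes, _total_bytes|
      written += out.write(chunk)
    end
  end
  written
end
```

Neither helper ever holds more than one chunk of the object in Ruby memory, which is the whole point.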
The “create” example is right there in the Fog::Storage docs, but is easy to look past if you’re not used to dealing with IO-like objects. The “get” example, however, is not documented in a clear place, nor is it obvious from a cursory reading of the code that it takes a block variation (the implementation is actually over in Wes’s similarly excellent Excon library).
Consider sponges an anti-pattern. Watch for them. Avoid them. Get comfortable with chunked or line-based processing. Add StringIO or thorough mocks to your TDD toolkit for testing things that expect IO-like objects.
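As a sketch of what that looks like in practice, here is a chunked reader that only needs something responding to read, plus the StringIO trick for exercising it without touching the filesystem. The checksum method and its chunk_size parameter are illustrative names, not part of any library.

```ruby
require "stringio"
require "digest"

# Reads any IO-like object in fixed-size chunks and feeds each chunk to the
# digest, so at most chunk_size bytes are held at a time.
def checksum(io, chunk_size: 16 * 1024)
  digest = Digest::SHA256.new
  # read(n) returns nil at end of input for both File and StringIO.
  while (chunk = io.read(chunk_size))
    digest.update(chunk)
  end
  digest.hexdigest
end

# In a test, a StringIO stands in for the File (or socket, or S3 body).
fake_file = StringIO.new("think of the kittens")
checksum(fake_file, chunk_size: 4)
```

Using a tiny chunk_size in tests forces the loop to run several times, which catches the off-by-one-chunk bugs a single big read would hide.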
And for goodness’ sake, think of the kittens.