Involuntary Hiatus Hiccup

It’s more than two months since I last wrote a blog article. I’ve been ridiculously busy since then and things are only just now calming down. It now looks as though I’m going to try something new, at least for three months or so, and that should provide more time for blogging. I had to drop more or less all of my personal projects for the last couple of months, which has been frustrating (no work on my data analysis book, no work on arb-fft, very little work on C2HS, a huge backlog of technical reading piling up and up and up like some Tower of Techno-Babel). Things should get back to something more like normal from now on though.

One benefit of working like a donkey for the last couple of months is that I now have a bit of money in the bank, and I’m planning to use that financial window to push some personal projects forwards. I have a few ideas, starting with “finishing” arb-fft and getting back to some work on my book. I’ll do a couple of days of paid work a week, do a bit of open-source stuff (C2HS and Hackage mostly) and work on those personal projects. And blogging. There will be blogging.

Starting tomorrow. Now, though, I’m going to go outside and lie down in the sunshine.


Fatherland and HHhH

book-reviews

by Robert Harris; Laurent Binet (translated by Sam Taylor)

These two books are tied together by the name of Reinhard Heydrich. I can’t think of a polite way of describing Heydrich. He was one of the architects of the Holocaust, a fervid Nazi, and an all-round total bastard. Hitler called him “the man with the iron heart”, which gives you some idea of what kind of a git he was.

In the real world, Heydrich dies in 1942 from injuries sustained during an assassination attempt by Czech and Slovak commandos while he was “Acting Reich Protector of Bohemia and Moravia”. In the world of Fatherland, he survives, the Germans win the Second World War and Europe languishes under Nazi rule, with a sympathetic administration in the USA led by Kennedy (père rather than fils) providing no effective check on their activities. Heydrich continues in his role as second-in-command of the SS and goes on being as much of a bastard as ever.


Using Images in Haddock Documentation on Hackage

haskell

A useful but little-used feature of Haddock is the ability to include inline images in Haddock pages. You can use images for diagrams or for inserting mathematics into your documentation in a readable way; here are a few examples. To use an image, you just put <<path-to-image>> in a Haddock comment. The image can be in any format that browsers support: PNG, JPEG, SVG, whatever.
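For instance, something like this (the module, function and image path here are all just placeholders of mine):

```haskell
module Example where

-- | A length-2 Danielson-Lanczos butterfly:
--
-- <<images/butterfly.png>>
--
-- Haddock turns the double-angle-bracket reference into an inline
-- image in the generated HTML page.
butterfly :: Num a => (a, a) -> (a, a)
butterfly (x, y) = (x + y, x - y)
```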


Functional Differential Geometry

book-reviews

by Gerald Jay Sussman & Jack Wisdom (with Will Farr)

This book is very much in the mould of Sussman & Wisdom’s Structure and Interpretation of Classical Mechanics, in that it’s an attempt to take a body of mathematics (differential geometry here, classical mechanics in SICM) and to “computationalise” it, i.e. to use computer programs to make the mathematical structures involved manifest in a way that a more traditional approach to these subjects doesn’t.

This computational viewpoint is a very powerful one to adopt, for a number of reasons.


The Storage Capacity of Neural Systems

AI

I recently read The Quest for Artificial Intelligence by Nils Nilsson. Interrupting a fairly linear history of the AI field is an interlude on some more philosophical questions about artificial intelligence, minds and thought. Searle’s Chinese Room, things like that.

I got to thinking about some of these questions on my daily peregrinations with Winnie, and I started wondering about the scales that are involved in most discussions of minds and AI. Searle talks about an individual person in his Chinese Room, which makes the idea of some sort of disembodied intelligence actually understanding the Chinese sentences it’s responding to seem pretty absurd. But is that in any sense a realistic representation of a brain or a human mind?

In keeping with my upbringing as a baby physicist, I’m going to take a very reductionist approach to this question. I want to get some idea of what the information storage capacity of a human brain might be. I’m deliberately not going to think about information processing speeds (because that would involve too much thinking about switching rates, communication between different parts of brains, and other biological things about which I know very little). I’ll treat this in the spirit of a Fermi problem, which means I’ll be horrifyingly slapdash with anything other than powers of ten. There will be big numbers.
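To give a flavour of the arithmetic involved (these are textbook ballpark figures of mine, not numbers from Nilsson’s book): a human brain contains something like $10^{11}$ neurons, each with of the order of $10^4$ synapses, so if each synapse stored just a single bit we would have

$$10^{11} \;\text{neurons} \times 10^{4} \;\text{synapses/neuron} \times 1 \;\text{bit/synapse} = 10^{15} \;\text{bits} \approx 10^{14} \;\text{bytes},$$

or around a hundred terabytes. Whether a synapse really stores one bit, or much more, or much less, is exactly the sort of detail a Fermi estimate is allowed to shrug at.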


Haskell FFT 15: Some small Haskell things

data-analysis, haskell

After I released the arb-fft package last week, Daniel Díaz, Pedro Magalhães and Carter Schonwald pointed out a couple of issues. I’ve made another couple of releases to fix them. They were mostly little Haskell things that I didn’t know about, so I’m going to quickly describe them here. (I’m sure these are things that most people know about already...)


Some Thoughts on the MOOCopalypse

moocs

I was obscurely disappointed to discover that I wasn’t the first person to come up with “MOOCopalypse”. I think I could claim first dibs on “MOOCaclysm” if I wanted, since there are no Google hits for it, but it’s not quite as good.

Unless you’ve been living under a rock for the last couple of years, you’ve probably heard of MOOCs (Massive Open Online Courses). These are courses offered online by universities or other providers, intended to bring high-quality higher education resources to a wide and varied audience. I’ve done a few of these courses: Peter Norvig and Sebastian Thrun’s Stanford AI course, one on fantasy and science fiction, and more recently Jeff Ullman’s automata theory course and another on general game playing (about which I’ll write something once I’ve had some time to work on the software I developed as part of the course).


Haskell FFT 14: Wrap-up

data-analysis, haskell

After thirteen blog articles and about two-and-a-half weeks’ equivalent work time spread out over three months, I think it’s time to stop with the FFT stuff for a while. I told myself when I started all this that if I could get within a factor of 20 of the performance of FFTW, I would be “quite obscenely pleased with myself”. For most $N$, the performance of our FFT code is within a factor of 5 or so of the performance of the vector-fftw package. For a pure Haskell implementation with some but not lots of optimisation work, that’s not too shabby.

But now, my notes are up to nearly 90 pages and, while there are definitely things left to do, I’d like to move on for a bit and do some other data analysis tasks. This post is just to give a quick round-up of what we’ve done, and to lay out the things remaining to be done (some of which I plan to do at some point, and others that I’d be very happy for other people to take on!). I’ve released the latest version of the code to Hackage. The package is called arb-fft. There is also an index of all the posts in this series.


Haskell FFT 13: Optimisation Part 3

data-analysis, haskell

In this article, we’ll finish up with optimisation, tidying up a couple of little things and looking at some “final” benchmarking results. The code described here is the pre-release-4 version in the GitHub repository. I’m not going to show any code snippets in this article, for two reasons: first, if you’ve been following along, you should be familiar enough with how things work to find the sections of interest; and second, some of the optimisation means that the code is getting a little more unwieldy, and it’s getting harder to display what’s going on in small snippets. Look in the GitHub repository if you want to see the details.


Haskell FFT 12: Optimisation Part 2

data-analysis, haskell

In the last article, we did some basic optimisation of our FFT code. In this article, we’re going to look at ways of reordering the recursive Cooley-Tukey FFT decomposition to make things more efficient. To make this work well, we’re going to need more straight-line transform “codelets”. We’ll start by looking at our $N=256$ example in detail, then we’ll develop a general approach.
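To make “codelet” concrete before we start: the simplest possible one is the length-2 transform, which is nothing but a sum and a difference, written out with no recursion, no loops and no twiddle factors. A minimal sketch (the name codelet2 is mine, not arb-fft’s):

```haskell
import Data.Complex (Complex)

-- Straight-line length-2 DFT codelet: for inputs a and b the
-- transform is just their sum and difference, so the whole thing
-- compiles down to two additions.
codelet2 :: (Complex Double, Complex Double) -> (Complex Double, Complex Double)
codelet2 (a, b) = (a + b, a - b)
```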


Link Round-up

I haven’t done one of these for a long time, but I have a big old pile of interesting things I’ve read recently. I might make this a weekly thing: Harry Connolly does a Friday Randomness roundup and there’s always something interesting or weird there. I don’t think I can quite attain his level of randomness, so I’ll have to settle for “worth reading” as a criterion.

  1. Dogs Are People, Too: Functional MRI for dogs! And the dogs get to choose whether or not to play, which is really great.

  2. The Mother of All Squid Builds a Library: Lovely short story.

  3. What the World Eats: Twelve families from very different parts of the world photographed with a week’s worth of groceries. I keep going back to look at them.

  4. Meat Atlas: Friends of the Earth and the Böll Foundation produced this report about global meat production, the economics, the environmental impacts, and so on.

  5. Hofstadter interview in The Atlantic: After reading this long interview with Douglas Hofstadter, I was inspired to go back and re-read Gödel, Escher, Bach and I now have a pile of other Hofstadter things to read. He’s a very interesting guy.

  6. The Guardian: Ask a Grown-Up: This is something The Guardian has been doing for a while, but I’ve only just seen it. It’s a great idea. Children can write in with questions about anything at all, and they get answers from experts in whatever field they’re asking about (or sometimes slightly random celebrities, but mostly the answerers know what they’re on about). Best one I’ve seen so far: Douglas Hofstadter (yes, him again!) answering a six-year-old’s question “Does my cat Oscar know he’s a cat?”!

  7. The UK National Archives: I’ve not had much of a chance to rummage around in here yet, but it looks like an amazing resource where I could probably waste infinite amounts of time.

  8. How Doctors Die: End-of-life care choices and the differences between what doctors recommend to their patients and what they do themselves. Pretty grim, but needs to be talked about.

  9. Early Onset Dementia: More grimness, also needing to be talked about. Early onset dementia (not just Alzheimer’s; the article is about something different) is just terrifying. Imagine losing the ability to recognise all of your loved ones, to know who you are or where you are, and for this not to be something that happens towards the end of your life, but at a time when you might conceivably have to live like that for 40 or 50 years. Now imagine that it’s not you that has this problem, but your partner...

  10. Implementing a JIT Compiled Language with Haskell and LLVM: On a completely different topic, this is a Haskell rendering of a long LLVM tutorial that looks really useful. I’ve only read part of it so far, but it’s good.


Haskell FFT 11: Optimisation Part 1

data-analysis, haskell

Based on what we saw concerning the performance of our FFT code in the last article, we have a number of avenues of optimisation to explore. Now that we’ve got reasonable-looking $O(N \log N)$ scaling for all input sizes, we’re going to try to make the basic Danielson-Lanczos step of the algorithm faster, since this will provide benefits for all input sizes. We can do this by looking at the performance for a single input length (we’ll use $N=256$). We’ll follow the usual approach: profiling to find the parts of the code to concentrate on, modifying the code, then looking at benchmarks to see if we’ve made a positive difference.

Once we’ve got some way with this “normal” kind of optimisation, there are some algorithm-specific things we can do: we can include more hard-coded base transforms for one thing, but we can also try to determine empirically what the best decomposition of our input vector length is. For example, for $N=256$, we could decompose as $2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2$, using length-2 base transforms and seven Danielson-Lanczos steps to form the final transform, or as $16 \times 16$, using a length-16 base transform and a single Danielson-Lanczos step to form the final result.
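To get a feel for the size of that search space, here’s a throwaway sketch (mine, not code from arb-fft) that enumerates the ordered factorisations of $N$:

```haskell
-- Ordered factorisations of n into factors >= 2; e.g.
-- factorisations 8 == [[2,2,2],[2,4],[4,2],[8]].
factorisations :: Int -> [[Int]]
factorisations 1 = [[]]
factorisations n = [ f : rest
                   | f <- [2 .. n]
                   , n `mod` f == 0
                   , rest <- factorisations (n `div` f) ]

-- factorisations 256 includes both [2,2,2,2,2,2,2,2] (a length-2 base
-- transform plus seven Danielson-Lanczos steps) and [16,16] (a
-- length-16 base transform plus a single step).
main :: IO ()
main = mapM_ print (factorisations 256)
```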


Haskell FFT 10: Building a package

data-analysis, haskell

So far in this series, we’ve been looking at code piecemeal, writing little test modules to explore the algorithms we’ve been developing. Before we start trying to optimise this code, it makes sense to put it into a Cabal package with a more organised module structure, to provide a sensible API and to make a few other small changes.

From now on, the code we’re going to be talking about will be the arb-fft package, which is hosted on GitHub. In this article, we’ll be talking about the version of the code tagged pre-release-1. (We’ll be uploading a version of the code to Hackage once it’s ready, but there will be a few pre-release versions that we’ll look at before we get to that point.)
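To give an idea of the kind of API I mean, here’s a sketch with a naive placeholder implementation (the module name and the function bodies are illustrative only, not the package’s real code):

```haskell
-- The export list is the public API: users see only fft and ifft,
-- and the internals are free to change between releases.
module Numeric.FFT (fft, ifft) where

import Data.Complex (Complex, cis, conjugate)
import qualified Data.Vector as V

-- Naive O(N^2) DFT as a stand-in body; imagine the mixed-radix code
-- from the earlier articles here instead.
fft :: V.Vector (Complex Double) -> V.Vector (Complex Double)
fft xs = V.generate n $ \k ->
    V.sum (V.imap (\j x -> x * cis (-2 * pi * fromIntegral (j * k) / fromIntegral n)) xs)
  where n = V.length xs

-- Inverse via the conjugation trick: IDFT(x) = conj(DFT(conj(x))) / N.
ifft :: V.Vector (Complex Double) -> V.Vector (Complex Double)
ifft xs = V.map ((/ fromIntegral n) . conjugate) (fft (V.map conjugate xs))
  where n = V.length xs
```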


Gödel, Escher, Bach: An Eternal Golden Braid

book-reviews

by Douglas Hofstadter

I first read GEB about 25 years ago, as far as I can remember. I’m not sure how much I took from it then (I was about 16 years old and knew a little bit of maths, but it was my first exposure to logic and the theory of computation), although I do remember the dialogues and all the wonderful Escher pictures.

I picked up a copy again recently as a result of reading an interview with Hofstadter in The Atlantic. The book has aged really well, with just a few technological fossils from the 1970s (record players playing records that they cannot play, for instance). Hofstadter has a timeless and almost overwhelmingly fecund imagination, bubbling over with novel ideas that seem to pop up from nowhere. His description of the thought process that led to the construction of one of the dialogues (each of which is modelled on a piece of Bach’s music and attempts to explore questions of linguistics or metamathematics on a number of levels at once) gives an impression of a mind of almost obsessional twistiness, trying to pack as many ideas as possible into the smallest possible amount of text using puns, structure and patterns at all available levels. It’s quite wonderful.


Haskell FFT 9: Prime-Length FFT With Rader’s Algorithm

data-analysis, haskell

[The code from this article is available as a Gist]

In the last article in this series, benchmarking results gave a pretty good indication that we need to do something about prime-length FFTs if we want to get good scaling for all input sizes. Falling back on the $O(N^2)$ DFT algorithm just isn’t good enough.
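For reference, the direct DFT we fall back on is

$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i n k / N}, \qquad k = 0, 1, \ldots, N-1,$$

and computing $N$ outputs, each a sum over $N$ inputs, is where the $O(N^2)$ cost comes from.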

In this article, we’re going to look at an implementation of Rader’s FFT algorithm for prime-length inputs. Some aspects of this algorithm are not completely straightforward to understand since they rely on a couple of results from number theory and group theory. We’ll try to use small examples to motivate what’s going on.
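The central fact we’ll lean on is that for a prime $p$, the nonzero residues modulo $p$ form a cyclic group under multiplication: there is a generator $g$ (a “primitive root”) whose powers visit every nonzero residue exactly once, and reindexing the DFT by those powers is what turns it into a convolution. Here’s a naive little sketch of finding such a generator (fine for small example primes; the names are mine, not the article’s code):

```haskell
import Data.List (sort)

-- Successive powers g, g^2, ..., g^(p-1) of g modulo p.
powers :: Integer -> Integer -> [Integer]
powers p g = take (fromIntegral p - 1) (iterate (\x -> x * g `mod` p) g)

-- g generates the whole multiplicative group iff its powers hit
-- every nonzero residue 1, ..., p-1 exactly once.
isGenerator :: Integer -> Integer -> Bool
isGenerator p g = sort (powers p g) == [1 .. p - 1]

-- Smallest primitive root modulo the prime p (naive search).
primitiveRoot :: Integer -> Integer
primitiveRoot p = head [g | g <- [2 .. p - 1], isGenerator p g]

-- E.g. primitiveRoot 7 == 3, since 3, 3^2, ..., 3^6 are
-- 3, 2, 6, 4, 5, 1 modulo 7.
main :: IO ()
main = print (primitiveRoot 7)
```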