[The code from this article is available as a Gist.]

In the first article in this series, we considered a set of $N$ data values from a function $h(t)$ sampled at a regular interval $\Delta $:

$${h}_{n}=h(n\Delta )\phantom{\rule{2em}{0ex}}n=0,1,2,\dots ,N-1.$$

and defined the discrete Fourier transform of this set of data values as

$${H}_{n}=\sum _{k=0}^{N-1}{h}_{k}{e}^{2\pi ikn/N}\phantom{\rule{2em}{0ex}}n=0,1,2,\dots ,N-1.\phantom{\rule{2em}{0ex}}(1)$$

The original Cooley and Tukey fast Fourier transform algorithm is based on the observation that, for even $N$, the DFT calculation can be broken down into two subsidiary DFTs, one for elements of the input vector with even indexes and one for elements with odd indexes:

$$\begin{array}{cc}\hfill {H}_{n}& =\sum _{k=0}^{N-1}{h}_{k}{e}^{2\pi ikn/N}\hfill \\ \hfill & =\sum _{k=0}^{N/2-1}{h}_{2k}{e}^{2\pi i(2k)n/N}+\sum _{k=0}^{N/2-1}{h}_{2k+1}{e}^{2\pi i(2k+1)n/N}\hfill \\ \hfill & =\sum _{k=0}^{N/2-1}{h}_{2k}{e}^{2\pi ikn/(N/2)}+{\omega}_{N}^{n}\sum _{k=0}^{N/2-1}{h}_{2k+1}{e}^{2\pi ikn/(N/2)}\hfill \\ \hfill & ={H}_{n}^{e}+{\omega}_{N}^{n}{H}_{n}^{o},\hfill \end{array}$$

where we write ${\omega}_{N}={e}^{2\pi i/N}$, and where ${H}_{n}^{e}$ is the $n$th component of the DFT of the evenly indexed elements of $h$ and ${H}_{n}^{o}$ is the $n$th component of the DFT of the oddly indexed elements of $h$. This decomposition is sometimes called the *Danielson-Lanczos lemma*.
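Before turning to the matrix view, it's worth seeing the lemma hold numerically. The sketch below (the helper names are ad hoc, not code from this series) computes a DFT directly from $(1)$ and again via one even/odd split; the two results should agree to floating-point accuracy. Note the `mod` in the indexing: the half-size transforms are periodic with period $N/2$, which is what lets $N/2$ output components be reused for $n\ge N/2$.

```haskell
import Data.Complex (Complex, cis, magnitude)

-- Direct O(N^2) evaluation of (1): H_n = sum_k h_k exp(2 pi i k n / N).
dftNaive :: [Complex Double] -> [Complex Double]
dftNaive hs = [ sum [ h * cis (2 * pi * fromIntegral (k * n) / len)
                    | (k, h) <- zip [0 :: Int ..] hs ]
              | n <- [0 .. length hs - 1] ]
  where len = fromIntegral (length hs)

-- One Danielson-Lanczos step: combine the DFTs of the even- and odd-indexed
-- elements.  The half-size transforms have period N/2, hence the `mod m`.
dftSplit :: [Complex Double] -> [Complex Double]
dftSplit hs = [ he !! (n `mod` m) + w n * ho !! (n `mod` m)
              | n <- [0 .. length hs - 1] ]
  where evens = [ h | (k, h) <- zip [0 :: Int ..] hs, even k ]
        odds  = [ h | (k, h) <- zip [0 :: Int ..] hs, odd k ]
        m = length evens
        he = dftNaive evens
        ho = dftNaive odds
        w n = cis (2 * pi * fromIntegral n / fromIntegral (length hs))

-- Maximum componentwise deviation between the two computations.
maxDiff :: [Complex Double] -> [Complex Double] -> Double
maxDiff xs ys = maximum (zipWith (\x y -> magnitude (x - y)) xs ys)
```

For any even-length input, `maxDiff (dftNaive hs) (dftSplit hs)` comes out at rounding-error level.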

In this article, we’re going to look in detail at this decomposition to understand just how it works and how it will help us to calculate the DFT in an efficient way. In the next article, we’ll see how to put a series of these even/odd decomposition steps together to form a full DFT.

The DFT calculation $(1)$ that we implemented in the last article can be viewed as a matrix multiplication. For a complex input vector of length $N$, we can write $(1)$ as

$$\left(\begin{array}{c}{H}_{0}\\ {H}_{1}\\ \vdots \\ {H}_{N-1}\end{array}\right)=\left(\begin{array}{ccccc}1& 1& 1& \dots & 1\\ 1& {\omega}_{N}& {\omega}_{N}^{2}& \dots & {\omega}_{N}^{N-1}\\ 1& {\omega}_{N}^{2}& {\omega}_{N}^{4}& \dots & {\omega}_{N}^{2(N-1)}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1& {\omega}_{N}^{N-1}& {\omega}_{N}^{2(N-1)}& \dots & {\omega}_{N}^{(N-1{)}^{2}}\end{array}\right)\left(\begin{array}{c}{h}_{0}\\ {h}_{1}\\ \vdots \\ {h}_{N-1}\end{array}\right)\phantom{\rule{2em}{0ex}}(2)$$

We’ll call the matrix in $(2)$ the *Fourier matrix*, ${F}_{N}$.

So far, since ${F}_{N}$ is a dense matrix, it seems as though the DFT will require $O({N}^{2})$ floating point operations. The *fast Fourier transform* does much better than this, using a clever decomposition of ${F}_{N}$ to calculate the DFT in $O(N\mathrm{log}N)$ operations. This decomposition and the algorithm that it leads to were introduced to the world by Cooley and Tukey in the 1960s,^{1} although many of the ideas had been developed earlier. We’re going to write some simple Haskell code to understand how this decomposition works. In this article, we’ll consider only vectors with $N$ a power of two; we’ll deal with the general case later. (The Cooley-Tukey algorithm is now one of a group of algorithms called fast Fourier transforms. We may look at another example later on for dealing with vectors of prime length.)
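To see where the $O(N\log N)$ figure comes from: if a length-$N$ transform costs two length-$N/2$ transforms plus $cN$ operations for the recombination, the total cost $T(N)$ satisfies

$$T(N) = 2\,T(N/2) + cN,$$

and unrolling this recurrence $\log_2 N$ times (for $N$ a power of two) gives

$$T(N) = cN\log_2 N + N\,T(1) = O(N\log N).$$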

Before we show an example of the FFT decomposition, let’s look at a couple of instances of ${F}_{N}$:

$${F}_{4}=\left(\begin{array}{cccc}1& 1& 1& 1\\ 1& {\omega}_{4}& -1& {\omega}_{4}^{3}\\ 1& -1& 1& -1\\ 1& {\omega}_{4}^{3}& -1& {\omega}_{4}\end{array}\right)\phantom{\rule{2em}{0ex}}{F}_{8}=\left(\begin{array}{cccccccc}1& 1& 1& 1& 1& 1& 1& 1\\ 1& {\omega}_{8}& {\omega}_{4}& {\omega}_{8}^{3}& -1& {\omega}_{8}^{5}& {\omega}_{4}^{3}& {\omega}_{8}^{7}\\ 1& {\omega}_{4}& -1& {\omega}_{4}^{3}& 1& {\omega}_{4}& -1& {\omega}_{4}^{3}\\ 1& {\omega}_{8}^{3}& {\omega}_{4}^{3}& {\omega}_{8}& -1& {\omega}_{8}^{7}& {\omega}_{4}& {\omega}_{8}^{5}\\ 1& -1& 1& -1& 1& -1& 1& -1\\ 1& {\omega}_{8}^{5}& {\omega}_{4}& {\omega}_{8}^{7}& -1& {\omega}_{8}& {\omega}_{4}^{3}& {\omega}_{8}^{3}\\ 1& {\omega}_{4}^{3}& -1& {\omega}_{4}& 1& {\omega}_{4}^{3}& -1& {\omega}_{4}\\ 1& {\omega}_{8}^{7}& {\omega}_{4}^{3}& {\omega}_{8}^{5}& -1& {\omega}_{8}^{3}& {\omega}_{4}& {\omega}_{8}\end{array}\right)$$

Clearly, although there are ${N}^{2}$ entries in each matrix, there can only be $N$ distinct powers of ${\omega}_{N}$, so there is a lot of redundancy in the entries in the Fourier matrix and in the operations of the naïve DFT algorithm. Further, ${\omega}_{2N}^{2}={\omega}_{N}$, so there is a relationship between the entries in Fourier matrices of different orders. This second observation turns out to be critical for the Cooley-Tukey algorithm.
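This second observation is just a one-line identity between complex roots of unity, so it's easy to spot-check numerically (the helper names here are ad hoc):

```haskell
import Data.Complex (Complex, cis, magnitude)

-- omega_N = exp(2 pi i / N), as in the text.
omega :: Int -> Complex Double
omega n = cis (2 * pi / fromIntegral n)

-- |omega_{2N}^2 - omega_N| should vanish (up to rounding) for every N.
identityError :: Int -> Double
identityError n = magnitude (omega (2 * n) ^ (2 :: Int) - omega n)
```

Evaluating `identityError` over a range of $N$ gives values at the level of machine epsilon.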

As a first example, let’s decompose ${F}_{4}$:

$${F}_{4}=\left(\begin{array}{cccc}1& 1& 1& 1\\ 1& {\omega}_{4}& -1& {\omega}_{4}^{3}\\ 1& -1& 1& -1\\ 1& {\omega}_{4}^{3}& -1& {\omega}_{4}\end{array}\right)=\left(\begin{array}{cccc}1& 0& 1& 0\\ 0& 1& 0& {\omega}_{4}\\ 1& 0& -1& 0\\ 0& 1& 0& -{\omega}_{4}\end{array}\right)\left(\begin{array}{cccc}1& 1& 0& 0\\ 1& -1& 0& 0\\ 0& 0& 1& 1\\ 0& 0& 1& -1\end{array}\right)\left(\begin{array}{cccc}1& 0& 0& 0\\ 0& 0& 1& 0\\ 0& 1& 0& 0\\ 0& 0& 0& 1\end{array}\right)$$

which can be rewritten as

$${F}_{4}=\left(\begin{array}{cc}{I}_{2}& {D}_{2}\\ {I}_{2}& -{D}_{2}\end{array}\right)\left(\begin{array}{cc}{F}_{2}& \\ & {F}_{2}\end{array}\right)P\phantom{\rule{2em}{0ex}}(3)$$

where $P$ is a permutation matrix that picks out first the even then the odd elements of the input vector $h$, ${F}_{2}$ is the $2\times 2$ Fourier matrix, ${I}_{2}$ the $2\times 2$ unit matrix and ${D}_{2}$ a $2\times 2$ diagonal matrix whose diagonal entries are powers of ${\omega}_{4}$.
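Before building the symbolic machinery, we can cross-check $(3)$ in floating point, independently of the exact algebra. The sketch below (names like `f4` and `idFactor` are ad hoc, transcribed from the matrices above) multiplies the three factors out numerically and compares against ${F}_{4}$:

```haskell
import Data.Complex (Complex, cis, magnitude)
import Data.List (transpose)

type CMat = [[Complex Double]]

-- omega_4^k = exp(2 pi i k / 4).
w4 :: Int -> Complex Double
w4 k = cis (2 * pi * fromIntegral k / 4)

-- (F_4)_{nk} = omega_4^{nk}.
f4 :: CMat
f4 = [ [ w4 (n * k) | k <- [0 .. 3] ] | n <- [0 .. 3] ]

-- The three factors of (3), transcribed from the text.
idFactor, doubleF2, permEvenOdd :: CMat
idFactor    = [ [1, 0, 1, 0], [0, 1, 0, w4 1]
              , [1, 0, -1, 0], [0, 1, 0, negate (w4 1)] ]
doubleF2    = [ [1, 1, 0, 0], [1, -1, 0, 0]
              , [0, 0, 1, 1], [0, 0, 1, -1] ]
permEvenOdd = [ [1, 0, 0, 0], [0, 0, 1, 0]
              , [0, 1, 0, 0], [0, 0, 0, 1] ]

matMul :: CMat -> CMat -> CMat
matMul a b = [ [ sum (zipWith (*) r c) | c <- transpose b ] | r <- a ]

-- Largest entrywise deviation between F_4 and the product of the factors.
factorisationError :: Double
factorisationError = maximum
  [ magnitude (x - y)
  | (r1, r2) <- zip f4 (matMul idFactor (matMul doubleF2 permEvenOdd))
  , (x, y) <- zip r1 r2 ]
```

`factorisationError` evaluates to something at rounding-error level, confirming the factorisation.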

Verifying identities like $(3)$ by hand is extremely tedious, so let’s write some Haskell code to help. This code will demonstrate a technique that’s useful for a lot of algebraic calculations. Instead of using a completely general computer algebra system (Mathematica or Maple, for example), we’ll represent a restricted set of algebraic entities as Haskell data types. The main benefit of doing this is that it allows us to encode preconceptions that we have about the calculations we’re doing directly in the code. If those preconceptions turn out not to be correct, or if we extend the domain of our calculations beyond the examples we use to construct those preconceptions, we can get immediate feedback when our ideas turn out to be wrong.

From a little bit of manual calculation, it seems as though we’re going to need to deal with matrices with entries that are either zero or powers of roots of unity. Also, it seems as though all of the matrix multiplications we need to do result in dot products between matrix rows and columns with only a single non-zero entry. Here’s an encoding of these ideas in Haskell:

```
data Sign = P | M deriving (Eq, Show)

data Entry = Z | O Sign | W Sign Int Int deriving Eq

negS :: Sign -> Sign
negS P = M
negS M = P

multS :: Sign -> Sign -> Sign
multS s1 s2 | s1 == s2  = P
            | otherwise = M

instance Num Entry where
  abs _ = error "Absolute value not appropriate"
  signum _ = error "Sign not appropriate"
  negate Z = Z
  negate (O s) = O (negS s)
  negate (W s n p) = W (negS s) n p
  Z + x = x
  x + Z = x
  _ + _ = error "Adding two non-zero terms!"
  x - y = x + negate y
  Z * _ = Z
  _ * Z = Z
  (O s1) * (O s2) = O (multS s1 s2)
  (O s1) * (W s2 n p) = W (multS s1 s2) n p
  (W s1 n p) * (O s2) = W (multS s1 s2) n p
  (W s1 n1 p1) * (W s2 n2 p2)
    | n1 == n2 = W (multS s1 s2) n1 ((p1 + p2) `mod` n1)
    | gcd n1 n2 == 1 = error "Multiplying different omegas"
    | otherwise = if n1 > n2
                  then W (multS s1 s2) n1 ((p1 + p2 * (n1 `div` n2)) `mod` n1)
                  else W (multS s1 s2) n2 ((p2 + p1 * (n2 `div` n1)) `mod` n2)
  fromInteger n
    | n == 0 = Z
    | n == 1 = O P
    | n == -1 = O M
    | otherwise = error "Invalid integer"

normalise :: Entry -> Entry
normalise = fix norm
  where norm (W s _ 0) = O s
        norm (W M n k) | even n = W P n ((k + n `div` 2) `mod` n)
                       | otherwise = W M n (k `mod` n)
        norm (W s 2 k) | odd k = O s * O M
                       | otherwise = O s
        norm (W P n k) | even n && even k = W P (n `div` 2) (k `div` 2)
                       | n == 2 * k = O M
                       | otherwise = W P n (k `mod` n)
        norm x = x
        fix f x = let xs = iterate f x
                  in fst $ head $ dropWhile (\(a, b) -> a /= b) $ zip xs (tail xs)
```

We have an `Entry` data type representing values of zero (constructor `Z`), plus or minus one (constructor `O`) or powers of roots of unity (constructor `W`). We explicitly encode the sign of terms using a `Sign` data type, so that

$$\begin{array}{c}\mathtt{\text{O P}}\to +1\phantom{\rule{2em}{0ex}}\mathtt{\text{O M}}\to -1\\ \mathtt{\text{W P n k}}\to {\omega}_{\mathtt{\text{n}}}^{\mathtt{\text{k}}}\phantom{\rule{2em}{0ex}}\mathtt{\text{W M n k}}\to -{\omega}_{\mathtt{\text{n}}}^{\mathtt{\text{k}}}\end{array}$$

We make `Entry` an instance of `Num` in a way that conforms to our preconceptions about how the matrix entries we have to deal with are computed: we raise an error if an attempt is made to add two non-zero terms, for instance. We also perform normalisation of products of powers of different roots of unity so that, for example, we correctly determine that ${\omega}_{3}{\omega}_{6}={\omega}_{6}^{3}$. There is a separate `normalise` function to perform other canonicalisation of matrix entries:

$$\begin{array}{c}\pm {\omega}_{N}^{0}=\pm 1\phantom{\rule{2em}{0ex}}{\omega}_{2}=-1\phantom{\rule{2em}{0ex}}{\omega}_{2N}^{2}={\omega}_{N}\\ -{\omega}_{2N}^{k}={\omega}_{2N}^{(k+N)\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}2N}\end{array}$$
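These canonicalisation rules are ordinary identities between complex roots of unity, so they too can be spot-checked numerically; a small sketch (with ad hoc names, separate from the article's symbolic code):

```haskell
import Data.Complex (Complex, cis, magnitude)

-- omega_N = exp(2 pi i / N).
omega :: Int -> Complex Double
omega n = cis (2 * pi / fromIntegral n)

near :: Complex Double -> Complex Double -> Bool
near x y = magnitude (x - y) < 1e-12

-- Check all four normalisation rules for a given N and 0 <= k < 2N.
rulesHold :: Int -> Int -> Bool
rulesHold n k = and
  [ omega n ^ (0 :: Int) `near` 1                  -- omega_N^0 = 1
  , omega 2 `near` (-1)                            -- omega_2 = -1
  , omega (2 * n) ^ (2 :: Int) `near` omega n      -- omega_{2N}^2 = omega_N
  , negate (omega (2 * n) ^ k) `near`              -- -omega_{2N}^k =
      omega (2 * n) ^ ((k + n) `mod` (2 * n))      --   omega_{2N}^{(k+N) mod 2N}
  ]
```

The last rule works because ${\omega}_{2N}^{N}={e}^{\pi i}=-1$.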

Armed with these definitions of matrix entries, we can define the matrix operations that we need:

```
newtype Mat = Mat [[Entry]] deriving Eq

normMat :: Mat -> Mat
normMat (Mat rs) = Mat $ map (map normalise) rs

transpose :: Mat -> Mat
transpose (Mat xs) = Mat (L.transpose xs)

dot :: [Entry] -> [Entry] -> Entry
dot v1 v2 = L.sum $ L.zipWith (*) v1 v2

(%*%) :: Mat -> Mat -> Mat
(Mat m1) %*% (Mat m2) =
  normMat $ Mat [[r `dot` c | c <- L.transpose m2] | r <- m1]

diag :: [Entry] -> Mat
diag ds = let bigN = length ds
              zs = replicate (bigN - 1) Z
              zsplits = map (flip splitAt zs) [0..bigN-1]
              rows = zipWith (\d (b, a) -> b ++ [d] ++ a) ds zsplits
          in Mat rows

unit :: Int -> Mat
unit bigN = diag $ replicate bigN (O P)
```

We do this using a very simple nested list matrix representation, since we don’t really care about efficiency here – this code is just for exploration and understanding. We’ve imported the `Data.List` module qualified as `L`, which is where `L.transpose`, `L.sum`, etc. come from. We have simple functions to normalise the entries in a matrix using the entry normalisation function from above, to calculate matrix products (`%*%`) and to generate diagonal matrices with specified elements, as well as unit matrices of a given size.

Using this matrix representation, we can now construct the components of the FFT decomposition:

```
-- Basic Fourier matrix.
fourier :: Int -> Mat
fourier bigN =
  normMat $ Mat [[(W P bigN 1)^(n*k) | k <- [0..bigN-1]] | n <- [0..bigN-1]]

-- Doubled-up half-size Fourier matrices.
fourierDouble :: Int -> Mat
fourierDouble n2 =
  let n = n2 `div` 2
      zs = replicate n Z
      Mat f = fourier n
  in Mat $ (map (++ zs) f) ++ (map (zs ++) f)

-- Even/odd permutation matrix.
perm :: Int -> Mat
perm n2 = let n = n2 `div` 2
              evens = ((O P) : replicate (n2-1) Z) :
                      map (take n2 . (Z :) . (Z :)) evens
              odds = map (take n2 . (Z :)) evens
          in Mat $ take n evens ++ take n odds

-- I + D matrix.
idMatrix :: Int -> Mat
idMatrix n2 = let n = n2 `div` 2
                  Mat i = unit n
                  ds = (O P) : [W P n2 k | k <- [1..n-1]]
                  Mat d = diag ds
                  Mat nd = diag $ map negate ds
              in Mat $ (zipWith (++) i d) ++ (zipWith (++) i nd)
```

First, the `fourier` function calculates the full Fourier matrix – in GHCi:

```
> fourier 4
1 1 1 1
1 W4^1 -1 W4^3
1 -1 1 -1
1 W4^3 -1 W4^1
```

(There are `Show` instances for matrices and entries to produce readable representations – the code isn’t shown here, but is available in the repository.) There are also functions to generate the nested double Fourier matrix, the even-odd permutation matrix and the unit matrix/diagonal matrix composite that appears first in $(3)$:

```
> fourierDouble 4
1 1 0 0
1 -1 0 0
0 0 1 1
0 0 1 -1
> perm 4
1 0 0 0
0 0 1 0
0 1 0 0
0 0 0 1
> idMatrix 4
1 0 1 0
0 1 0 W4^1
1 0 -1 0
0 1 0 -W4^1
```

We can then check that the decomposition is correct:

```
> idMatrix 4 %*% (fourierDouble 4 %*% perm 4) == fourier 4
True
```

What have we done here? We’ve constructed a “toy algebra system” that deals only with the algebraic entities that we believe are going to be involved in the calculations we’re interested in. This restricted system is just enough for us to experiment with and check the Fourier matrix decomposition. Of course, this decomposition is just a restatement of the Danielson-Lanczos lemma in terms of matrices, so we can prove it algebraically, but the Haskell code provides us with a little playground to help our understanding of what’s going on. We’re led to the conclusion that

$${F}_{2N}=\left(\begin{array}{cc}{I}_{N}& {D}_{N}\\ {I}_{N}& -{D}_{N}\end{array}\right)\left(\begin{array}{cc}{F}_{N}& \\ & {F}_{N}\end{array}\right){P}_{2N}$$

where ${D}_{N}=\mathrm{\text{diag}}(1,{\omega}_{2N},\dots ,{\omega}_{2N}^{N-1})$ and ${P}_{2N}$ is the $2N\times 2N$ even-odd permutation matrix. (Note that this decomposition doesn’t rely on $N$ being even – although we’re only considering input vectors whose length is a power of two at the moment, this generality will be important later on.)
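That claim – that the block decomposition works for odd $N$ as well as even – can be spot-checked numerically too. The sketch below (helper names are ad hoc, not from the article's code) builds the three factors for arbitrary $N$ and compares their product with ${F}_{2N}$; taking $N=3$ exercises the odd case:

```haskell
import Data.Complex (Complex, cis, magnitude)
import Data.List (transpose)

type CMat = [[Complex Double]]

-- omega_N^k = exp(2 pi i k / N).
omegaPow :: Int -> Int -> Complex Double
omegaPow n k = cis (2 * pi * fromIntegral k / fromIntegral n)

-- Dense Fourier matrix, (F_N)_{jk} = omega_N^{jk}.
fourierMat :: Int -> CMat
fourierMat n = [ [ omegaPow n (j * k) | k <- [0 .. n-1] ] | j <- [0 .. n-1] ]

matMul :: CMat -> CMat -> CMat
matMul a b = [ [ sum (zipWith (*) r c) | c <- transpose b ] | r <- a ]

-- [ I_N  D_N ; I_N  -D_N ] with D_N = diag(1, w_{2N}, ..., w_{2N}^{N-1}).
idBlock :: Int -> CMat
idBlock n = [ row j s | s <- [1, -1], j <- [0 .. n-1] ]
  where row j s = [ if k == j then 1 else 0 | k <- [0 .. n-1] ]
               ++ [ if k == j then s * omegaPow (2*n) j else 0 | k <- [0 .. n-1] ]

-- Block-diagonal [ F_N 0 ; 0 F_N ].
doubleBlock :: Int -> CMat
doubleBlock n = [ r ++ zs | r <- f ] ++ [ zs ++ r | r <- f ]
  where f = fourierMat n
        zs = replicate n 0

-- Even-odd permutation: picks h_0, h_2, ..., then h_1, h_3, ...
permMat :: Int -> CMat
permMat n = [ [ if k == j then 1 else 0 | k <- [0 .. 2*n-1] ]
            | j <- [0, 2 .. 2*n-1] ++ [1, 3 .. 2*n-1] ]

-- Largest entrywise deviation between F_{2N} and the factored form.
decompositionError :: Int -> Double
decompositionError n =
  maximum [ magnitude (x - y)
          | (r1, r2) <- zip (fourierMat (2*n))
                            (matMul (idBlock n)
                                    (matMul (doubleBlock n) (permMat n)))
          , (x, y) <- zip r1 r2 ]
```

`decompositionError 3` (so $2N=6$, with ${F}_{3}$ blocks) comes out at rounding-error level, just like the even cases.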

In fact, in the powers-of-two length case, these decompositions aren’t too onerous to calculate and check by hand. When we come to consider the general case (data vector lengths that aren’t powers of two), the algebra is quite a lot more tedious (and you won’t find the answer in most textbooks…), and this code will prove invaluable.

Next time, we’ll see how we can use the knowledge we’ve gained here to implement the full Cooley-Tukey algorithm for data vectors whose lengths are powers of two.

Cooley, James W.; Tukey, John W. (1965). “An algorithm for the machine calculation of complex Fourier series”. *Math. Comput.* **19**: 297–301. doi:10.2307/2003354

Site content copyright © 2011-2013 Ian Ross