[The code from this article is available as a Gist]
In the last article in this series, benchmarking results gave a pretty good indication that we needed to do something about primelength FFTs if we want to get good scaling for all input sizes. Falling back on the $O({N}^{2})$ DFT algorithm just isn’t good enough.
In this article, we’re going to look at an implementation of Rader’s FFT algorithm for primelength inputs. Some aspects of this algorithm are not completely straightforward to understand since they rely on a couple of results from number theory and group theory. We’ll try to use small examples to motivate what’s going on.
Before we begin exploring Rader’s algorithm, it’s probably worth taking a minute to look at why we might expect there to be a better algorithm for primelength FFTs than the naïve $O({N}^{2})$ DFT algorithm we’ve been using so far. Throughout this article, we’re going to use ${F}_{5}$ and ${F}_{7}$ as running examples (you’ll see why later). Here’s ${F}_{5}$:
$${F}_{5}=\left(\begin{array}{ccccc}1& 1& 1& 1& 1\\ 1& {\omega}_{5}^{1}& {\omega}_{5}^{2}& {\omega}_{5}^{3}& {\omega}_{5}^{4}\\ 1& {\omega}_{5}^{2}& {\omega}_{5}^{4}& {\omega}_{5}^{6}& {\omega}_{5}^{8}\\ 1& {\omega}_{5}^{3}& {\omega}_{5}^{6}& {\omega}_{5}^{9}& {\omega}_{5}^{12}\\ 1& {\omega}_{5}^{4}& {\omega}_{5}^{8}& {\omega}_{5}^{12}& {\omega}_{5}^{16}\end{array}\right)=\left(\begin{array}{ccccc}1& 1& 1& 1& 1\\ 1& {\omega}_{5}^{1}& {\omega}_{5}^{2}& {\omega}_{5}^{3}& {\omega}_{5}^{4}\\ 1& {\omega}_{5}^{2}& {\omega}_{5}^{4}& {\omega}_{5}^{1}& {\omega}_{5}^{3}\\ 1& {\omega}_{5}^{3}& {\omega}_{5}^{1}& {\omega}_{5}^{4}& {\omega}_{5}^{2}\\ 1& {\omega}_{5}^{4}& {\omega}_{5}^{3}& {\omega}_{5}^{2}& {\omega}_{5}^{1}\end{array}\right)$$
and here’s ${F}_{7}$:
$${F}_{7}=\left(\begin{array}{ccccccc}1& 1& 1& 1& 1& 1& 1\\ 1& {\omega}_{7}^{1}& {\omega}_{7}^{2}& {\omega}_{7}^{3}& {\omega}_{7}^{4}& {\omega}_{7}^{5}& {\omega}_{7}^{6}\\ 1& {\omega}_{7}^{2}& {\omega}_{7}^{4}& {\omega}_{7}^{6}& {\omega}_{7}^{8}& {\omega}_{7}^{10}& {\omega}_{7}^{12}\\ 1& {\omega}_{7}^{3}& {\omega}_{7}^{6}& {\omega}_{7}^{9}& {\omega}_{7}^{12}& {\omega}_{7}^{15}& {\omega}_{7}^{18}\\ 1& {\omega}_{7}^{4}& {\omega}_{7}^{8}& {\omega}_{7}^{12}& {\omega}_{7}^{16}& {\omega}_{7}^{20}& {\omega}_{7}^{24}\\ 1& {\omega}_{7}^{5}& {\omega}_{7}^{10}& {\omega}_{7}^{15}& {\omega}_{7}^{20}& {\omega}_{7}^{25}& {\omega}_{7}^{30}\\ 1& {\omega}_{7}^{6}& {\omega}_{7}^{12}& {\omega}_{7}^{18}& {\omega}_{7}^{24}& {\omega}_{7}^{30}& {\omega}_{7}^{36}\end{array}\right)=\left(\begin{array}{ccccccc}1& 1& 1& 1& 1& 1& 1\\ 1& {\omega}_{7}^{1}& {\omega}_{7}^{2}& {\omega}_{7}^{3}& {\omega}_{7}^{4}& {\omega}_{7}^{5}& {\omega}_{7}^{6}\\ 1& {\omega}_{7}^{2}& {\omega}_{7}^{4}& {\omega}_{7}^{6}& {\omega}_{7}^{1}& {\omega}_{7}^{3}& {\omega}_{7}^{5}\\ 1& {\omega}_{7}^{3}& {\omega}_{7}^{6}& {\omega}_{7}^{2}& {\omega}_{7}^{5}& {\omega}_{7}^{1}& {\omega}_{7}^{4}\\ 1& {\omega}_{7}^{4}& {\omega}_{7}^{8}& {\omega}_{7}^{5}& {\omega}_{7}^{2}& {\omega}_{7}^{6}& {\omega}_{7}^{3}\\ 1& {\omega}_{7}^{5}& {\omega}_{7}^{3}& {\omega}_{7}^{1}& {\omega}_{7}^{6}& {\omega}_{7}^{4}& {\omega}_{7}^{2}\\ 1& {\omega}_{7}^{6}& {\omega}_{7}^{5}& {\omega}_{7}^{4}& {\omega}_{7}^{3}& {\omega}_{7}^{2}& {\omega}_{7}^{1}\end{array}\right)$$
The main thing to notice is that if you look at the submatrix that excludes the top row and left column (both of which have entries that are all ones) from ${F}_{5}$ and ${F}_{7}$, you’ll see a matrix each of whose rows and columns is a permutation of the values ${\omega}_{N}^{1},{\omega}_{N}^{2},\dots ,{\omega}_{N}^{N1}$. This is because each row of ${F}_{N}$ is of the form $({\omega}_{N}^{0k},{\omega}_{N}^{1k},{\omega}_{N}^{2k},\dots ,{\omega}_{N}^{(N1)k})$ for $1\le k\le N1$. For prime $N$, none of these $k$ values divides $N$ exactly, so that there are no “repeats” in the powers of ${\omega}_{N}$.
So, there’s definitely something a little bit special about ${F}_{N}$ for prime $N$. To make effective use of this to produce a primelength FFT requires some nonobvious insight.
Throughout the remainder of this article, we’ll use $p$ to denote some prime number, just to avoid repeatedly writing “for some prime $N$”.
The basic idea of Rader’s algorithm is to make use of the special permutation structure of the powers of ${\omega}_{p}$ in ${F}_{p}$ to write the DFT summation
$${H}_{n}=\sum _{k=0}^{N1}{h}_{k}{e}^{2\pi ikn/N}\phantom{\rule{2em}{0ex}}n=0,1,2,\dots ,N1.\phantom{\rule{2em}{0ex}}(1)$$
as a convolution, which can then be calculated via FFTs of nonprime lengths. We’ll step through how this works, leaving a couple of details aside until we come to the implementation.
The first thing we do is to split out the special zeroindex parts of the Fourier matrix calculation, so that $(1)$ becomes
$$\begin{array}{c}{H}_{0}=\sum _{k=0}^{p1}{h}_{p}\\ {H}_{n}={h}_{0}+\sum _{k=1}^{p1}{h}_{k}\phantom{\rule{0.167em}{0ex}}{\omega}_{p}^{nk}\phantom{\rule{2em}{0ex}}n=1,2,\dots ,p1.\end{array}\phantom{\rule{2em}{0ex}}(2)$$
We thus need to work out an efficient way of calculating the sum in the second expression.
The most common approach to thinking of a group structure for integers modulo some number $n$ is to use addition as the group operation, giving the group $\mathbb{\text{\mathbb{Z}}}/n$ with group elements $0,1,2,\dots ,n1$. A less common approach is to think of the integers in $1,2,\dots ,n1$ that are prime relative to $n$ with the group operation being multiplication modulo $n$. This group is denoted by a range of different notations, but we’ll call it ${\mathbb{\text{\mathbb{Z}}}}_{n}^{\times}$. For prime $n$, all of the numbers $1,2,\dots ,n1$ are elements of ${\mathbb{\text{\mathbb{Z}}}}_{n}^{\times}$. For example, for $n=5$ and $n=7$, we have the following group multiplication tables:
1  2  3  4  
1  1  2  3  4 
2  2  4  1  3 
3  3  1  4  2 
4  4  3  2  1 
1  2  3  4  5  6  
1  1  2  3  4  5  6 
2  2  4  6  1  3  5 
3  3  6  2  5  1  4 
4  4  1  5  2  6  3 
5  5  3  1  6  4  2 
6  6  5  4  3  2  1 
This group turns out to be the key to writing the sum in $(2)$ as a convolution, since both $n$ and $k$ range over the values $1,2,\dots ,p1$ and the multiplication in the power of ${\omega}_{p}$ is of necessity calculated modulo $p$ (since ${\omega}_{p}^{p}=1$).
The group ${\mathbb{\text{\mathbb{Z}}}}_{p}^{\times}$ is cyclic, which means that any element of the group $a$ can be written as an integer power of a single element of the group, the group generator $g$, i.e. $a={g}^{i}$ for some unique positive integer $i$ with $0\le i\le p2$, and equivalently $a={g}^{j}$ for some unique positive integer $j$ with $0\le j\le p2$, where negative powers of $g$ denote powers of the multiplicative inverse (modulo $p$) of $g$. For example, for ${\mathbb{\text{\mathbb{Z}}}}_{5}^{\times}$, either 2 or 3 is a generator:
$$\begin{array}{c}{2}^{0}=1=1\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{2}^{1}=2=2\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{2}^{2}=4=4\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{2}^{3}=8=3\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\\ {2}^{0}=1=1\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{2}^{1}=3=3\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{2}^{2}=9=4\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{2}^{3}=27=2\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\\ \\ {3}^{0}=1=1\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{3}^{1}=3=3\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{3}^{2}=9=4\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{3}^{3}=27=2\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\\ {3}^{0}=1=1\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{3}^{1}=2=2\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{3}^{2}=4=4\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\phantom{\rule{2em}{0ex}}{3}^{3}=8=3\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}5\end{array}$$
In this case, $2={3}^{1}$ and $3={2}^{1}$. For ${\mathbb{\text{\mathbb{Z}}}}_{7}^{\times}$, a suitable generator is 3, and
$$\begin{array}{c}{3}^{0}=1=1\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{1}=3=3\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{2}=9=2\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\\ {3}^{3}=27=6\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{4}=81=4\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{5}=243=5\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\\ \\ {3}^{0}=1=1\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{1}=5=5\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{2}=25=4\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\\ {3}^{3}=125=6\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{4}=625=2\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\phantom{\rule{2em}{0ex}}{3}^{5}=3125=3\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}7\end{array}$$
In this case, $5={3}^{1}$.
We’ll talk about how we determine the group generator $g$ later. For the moment, assume that we know a suitable generator.
Given a generator $g$ for the group ${\mathbb{\text{\mathbb{Z}}}}_{p}^{\times}$, we can write the second equation in $(2)$ as
$${H}_{{g}^{r}}={h}_{0}+\sum _{q=0}^{p2}{h}_{{g}^{q}}\phantom{\rule{0.167em}{0ex}}{\omega}_{p}^{{g}^{qr}}={h}_{0}+\sum _{q=0}^{p2}{h}_{{g}^{q}}\phantom{\rule{0.167em}{0ex}}{\omega}_{p}^{{g}^{(rq)}}\phantom{\rule{2em}{0ex}}r=0,1,\dots ,p2.\phantom{\rule{2em}{0ex}}(3)$$
(This relies on the fact that ${h}_{k}$ and ${H}_{k}$ are both periodic in $p$ and ${\omega}_{p}^{p}=1$.)
For example, if $p=5$, taking $g=2$, we have:
$$\begin{array}{c}r=0,\phantom{\rule{0.278em}{0ex}}{g}^{r}=1\Rightarrow {H}_{1}={h}_{0}+\sum _{q=0}^{p2}{h}_{{g}^{q}}{\omega}_{5}^{{g}^{q}}={h}_{0}+{h}_{1}{\omega}_{5}+{h}_{2}{\omega}_{5}^{2}+{h}_{4}{\omega}_{5}^{4}+{h}_{3}{\omega}_{5}^{3}\\ r=1,\phantom{\rule{0.278em}{0ex}}{g}^{r}=3\Rightarrow {H}_{3}={h}_{0}+\sum _{q=0}^{p2}{h}_{{g}^{q}}{\omega}_{5}^{{g}^{q1}}={h}_{0}+{h}_{1}{\omega}_{5}^{3}+{h}_{2}{\omega}_{5}^{1}+{h}_{4}{\omega}_{5}^{2}+{h}_{3}{\omega}_{5}^{4}\\ r=2,\phantom{\rule{0.278em}{0ex}}{g}^{r}=4\Rightarrow {H}_{4}={h}_{0}+\sum _{q=0}^{p2}{h}_{{g}^{q}}{\omega}_{5}^{{g}^{q2}}={h}_{0}+{h}_{1}{\omega}_{5}^{4}+{h}_{2}{\omega}_{5}^{3}+{h}_{4}{\omega}_{5}^{1}+{h}_{3}{\omega}_{5}^{2}\\ r=3,\phantom{\rule{0.278em}{0ex}}{g}^{r}=2\Rightarrow {H}_{2}={h}_{0}+\sum _{q=0}^{p2}{h}_{{g}^{q}}{\omega}_{5}^{{g}^{q3}}={h}_{0}+{h}_{1}{\omega}_{5}^{2}+{h}_{2}{\omega}_{5}^{4}+{h}_{4}{\omega}_{5}^{3}+{h}_{3}{\omega}_{5}^{1}\end{array}$$
and comparison of the right hand sides of each of these equations with the rows of ${F}_{5}$ shows that this generatorbased approach replicates the rows of the Fourier matrix, albeit in a different order to the “normal” order.
The summation in the expression $(3)$ is in the form of a cyclic convolution of the two sequences ${a}_{q}={h}_{{g}^{q}}$ and ${b}_{q}={\omega}_{p}^{{g}^{q}}$, both of length $p1$ with $0\le q\le p2$. For example, for $p=5$, taking $g=2$:
$q$  0  1  2  3 
${a}_{q}$  ${h}_{1}$  ${h}_{2}$  ${h}_{4}$  ${h}_{3}$ 
${b}_{q}$  ${\omega}_{5}^{1}$  ${\omega}_{5}^{3}$  ${\omega}_{5}^{4}$  ${\omega}_{5}^{2}$ 
and for $p=7$ with $g=3$:
$q$  0  1  2  3  4  5 
${a}_{q}$  ${h}_{1}$  ${h}_{3}$  ${h}_{2}$  ${h}_{6}$  ${h}_{4}$  ${h}_{5}$ 
${b}_{q}$  ${\omega}_{7}^{1}$  ${\omega}_{7}^{5}$  ${\omega}_{7}^{4}$  ${\omega}_{7}^{6}$  ${\omega}_{7}^{2}$  ${\omega}_{7}^{3}$ 
Recall that, for a continuous convolution of the form
$$(f*g)(t)={\int}_{\infty}^{\infty}f(\tau )g(t\tau )\phantom{\rule{0.167em}{0ex}}dt,$$
we have the convolution theorem:
$$\mathcal{\text{\mathcal{F}}}(f*g)=\mathcal{\text{\mathcal{F}}}(f)\cdot \mathcal{\text{\mathcal{F}}}(g),$$
where we denote the Fourier transform of a function $f$ by $\mathcal{\text{\mathcal{F}}}(f)$. A comparable result applies for the discrete cyclic convolution in $(3)$. Let us denote the sequence ${a}_{q}$ defined above as $\stackrel{~}{h}=({h}_{{g}^{q}})$ and the sequence ${b}_{q}$ as ${\stackrel{~}{\omega}}_{p}=({\omega}_{p}^{{g}^{q}})$, both for $0\le q\le p2$, and let us write $\stackrel{~}{H}=({H}_{{g}^{q}})$ for the same range of $q$ to represent the results of the convolution as a sequence. Then:
$$\stackrel{~}{H}{h}_{0}={\mathrm{\text{DFT}}}^{1}[\mathrm{\text{DFT}}[\stackrel{~}{h}]\cdot \mathrm{\text{DFT}}[{\stackrel{~}{\omega}}_{p}]]$$
The discrete Fourier transform of the primelength input can thus be calculated by:
The convolution derived above is of length $p1$, which, since $p$ is prime, must be composite. We can thus calculate the length $p1$ discrete Fourier transforms required by the above approach using our CooleyTukey divide and conquer algorithm. The index reorderings and the DFT of the sequence of powers of ${\omega}_{p}$ can all be determined in advance since they depend only on $p$.
For $p=5$, taking $g=2$, this approach looks like this:
For $p=5$, $p1$ is a power of two, which means that the FFT calculations needed in the Rader algorithm are as efficient as possible. However, if $p1$ itself has large prime factors, it may prove necessary to apply Rader’s algorithm recursively, which will be much less efficient than a direct recursive FFT.
An alternative (and better) approach is provided by zeropadding the sequences involved in the convolution – by padding to a length that is a power of two, recursive application of Rader’s algorithm is avoided. To do this, let us write the two sequences to be convolved as $f[i]$ and $g[i]$, both of length $M1$ (we write $M1$ here to match the convolution length $p1$ in the FFT calculation, and we use the square bracket notation for indexing to make some of the expressions below less ambiguous). The cyclic convolution of these two sequences is
$$(f*g)[n]=\sum _{m=0}^{M1}f[m]g[nm]$$
where all indexes on the sequences $f[i]$ and $g[i]$ are zerobased and are taken modulo $M$.
We produce new sequences $f\prime [j]$ and $g\prime [j]$ of length $M\prime $ where $M\prime \ge 2M3$. This condition on $M\prime $ is required to avoid “interference” between unrelated elements of the original sequences due to the wraparound of the cyclic convolution that we’re going to compute. In our FFT application, we will choose $M\prime $ to be a power of two, but any value works as long as it is large enough to satisfy this condition. The sequence $f\prime [j]$ is constructed by inserting $M\prime M$ zeroes between the zeroth and first element of $f[i]$. Sequence $g\prime [j]$ is constructed by cyclically repeating the values of $g[i]$ to give a sequence of total length $M\prime $. If we now convolve the sequences $f\prime [j]$ and $g\prime [j]$, we find that the result contains the convolution of the original sequences $f[i]$ and $g[i]$ as its first $M1$ elements.
A minimal example gives a feeling for why this works. Suppose that we set $M=4$ (so the sequences are of length 3) and consider
$$f[i]=(1,2,3)\phantom{\rule{1em}{0ex}}\mathrm{\text{and}}\phantom{\rule{1em}{0ex}}g[i]=(a,b,c).$$
The cyclic convolution of these sequences is found as:
$$\begin{array}{c}(f*g)[0]=f[0]g[00]+f[1]g[01]+f[2]g[02]=1a+2c+3b,\\ (f*g)[1]=f[0]g[10]+f[1]g[11]+f[2]g[12]=1b+2a+3c,\\ (f*g)[2]=f[0]g[20]+f[1]g[21]+f[2]g[22]=1c+2b+3a.\end{array}$$
The condition on $M\prime $ is that $M\prime \ge 2M3=5$. Putting $M\prime =5$, the new sequences are then
$$f\prime [j]=(1,0,0,2,3)\phantom{\rule{1em}{0ex}}\mathrm{\text{and}}\phantom{\rule{1em}{0ex}}g\prime [j]=(a,b,c,a,b).$$
and the first three elements of the cyclic convolution of these new sequences are:
$$\begin{array}{c}\begin{array}{cc}\hfill (f\prime *g\prime )[0]& =f\prime [0]g\prime [00]+f\prime [1]g\prime [01]+f\prime [2]g\prime [02]+\hfill \\ \hfill & f\prime [3]g\prime [03]+f\prime [4]g\prime [04]=1a+0b+0a+2c+3b=1a+2c+3b,\hfill \end{array}\\ \begin{array}{cc}\hfill (f\prime *g\prime )[1]& =f\prime [0]g\prime [10]+f\prime [1]g\prime [11]+f\prime [2]g\prime [12]+\hfill \\ \hfill & f\prime [3]g\prime [13]+f\prime [4]g\prime [14]=1b+0a+0b+2a+3c=1b+2a+3c,\hfill \end{array}\\ \begin{array}{cc}\hfill (f\prime *g\prime )[2]& =f\prime [0]g\prime [20]+f\prime [1]g\prime [21]+f\prime [2]g\prime [22]+\hfill \\ \hfill & f\prime [3]g\prime [23]+f\prime [4]g\prime [24]=1c+0b+0a+2b+3a=1c+2b+3a.\hfill \end{array}\end{array}$$
We see that these are identical to the values found from the original sequences $f[i]$ and $g[i]$.
To make use of this result in Rader’s algorithm, we just choose the smallest power of two that satisfies the length condition on the sequences to be convolved, pad the input sequence $f[i]$ with zeroes and convolve with the repeated ${\omega}_{p}$ sequence. (Most of the details can be precomputed as part of the FFT “planning” phase.)
In this section, we’ll look at a basic Haskell implementation of Rader’s algorithm. The implementation here is structured more to be understandable than to be particularly efficient. We’ll reorganise the code to precompute some things for efficiency when we start optimising the overall FFT code later on.
After describing the details of the main algorithm, we’ll look at the approach used for the calculation of primitive roots of $p$ to determine generators of the group ${\mathbb{\text{\mathbb{Z}}}}_{p}^{\times}$, since this requires a little explanation. Finally, we’ll look at some test results. Here’s the code for our Haskell implementation (we’ll refer to the line numbers in what follows):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 

As for the earlier mixedradix FFT, we have toplevel functions to perform forward and inverse FFTs (lines 24), both of which call a common driver function (raderFFT'
), passing in sign and scale parameters to control the direction and final output scaling of the transform.
The final generation of the transformed output (lines 1012) applies the appropriate output scaling to a vector constructed from the sum of the input elements (index 0, corresponding to ${H}_{0}$ in $(2)$) and (other indexes) a DC offset (${h}_{0}$ in $(2)$) and the appropriate element of the generatorbased convolution. In order to deal with the generatorbased indexing for the $H$ values in $(3)$, the convolution results are put into a map (convmap
: line 46) from where they can easily be extracted in output index order.
The determination of the indexes needed for the generatorbased indexing scheme in $(3)$ is done by:
g
for the group ${\mathbb{\text{\mathbb{Z}}}}_{p}^{\times}$ where $p$ is the prime input vector length (line 21: see below for details);invModN
function finds the multiplicative inverse modulo $N$ of a given integer argument);iidxs
: line 35).The iidxs
index vector is used to permute the powers of ${\omega}_{p}$ (line 38) producing the ${b}_{q}$ sequence used in the convolution of $(3)$ and is also used (line 46) to build the index map used to extract result values from the result of the convolution.
If $p1$ is a power of two, no padding of the ${a}_{q}$ and ${b}_{q}$ sequences is needed for efficient calculation of the convolution $(3)$. In all other cases, both sequences need to be padded to a suitable power of two. Calculation of the padded length (lines 1719) take account of the minimum size requirement on the padded inputs to the convolution to avoid “wraparound” effects, and this length is used to control the generation of padded versions of ${a}_{q}$ (apad
: lines 3032) and ${b}_{q}$ (bpad
: line 41). (The “padded” version of ${b}_{q}$ is just a cyclic repeat of the values calculated in line 38.)
Once suitably padded versions of the ${a}_{q}$ and ${b}_{q}$ sequences are computed, the actual convolution step is simple (line 44). Both sequences are Fourier transformed (recall that the padded sequence lengths are a power of two, so this is an efficient computation), multiplied together pointwise and inverse transformed. The convolution of the original unpadded sequences is then found in the first $p1$ entries of this result — this is what is picked out by the indexing defined by convmap
, defined in line 46.
An important step in Rader’s prime length FFT algorithm is the determination of the generator of the group ${\mathbb{\text{\mathbb{Z}}}}_{p}^{\times}$. For this group, the group generator is called the primitive root modulo $p$.
There is no simple general formula to compute primitive roots modulo $n$, but there are methods to determine a primitive root faster than simply trying out all candidate values. In our case, we’re only ever going to be dealing with primitive roots of prime $p$, which simplifies things a little. We proceed by first calculating $\phi (p)$, where $\phi $ is Euler’s totient function. For prime $p$, $\phi (p)=p1$. We then determine the distinct prime factors of $\phi (p)$, which we’ll call ${p}_{1},{p}_{2},\dots ,{p}_{k}$. Then, for each $m\in {\mathbb{\text{\mathbb{Z}}}}_{p}^{\times}$, we compute
$${m}^{\phi (p)/{p}_{i}}\phantom{\rule{0.167em}{0ex}}\mathrm{\text{mod}}\phantom{\rule{0.167em}{0ex}}n\phantom{\rule{2em}{0ex}}\mathrm{\text{for}}\phantom{\rule{0.333em}{0ex}}i=1,\dots ,k$$
using a fast algorithm for modular exponentiation (we use exponentiation by squaring). A number $m$ for which these $k$ values are all different from 1 is a primitive root.
The implementation is pretty much just a straightforward transcription into Haskell of the description above. In cases where there multiple primitive roots (the number of primitive roots modulo $n$, if there are any, is $\phi (\phi (n))$), we just take the first one:
primitiveRoot :: Int > Int
primitiveRoot p
 isPrime p =
let tot = p  1
 ^ Euler's totient function for prime values.
totpows = map (tot `div`) $ fromList $ nub $ toList $ factors tot
 ^ Powers to check.
check n = all (/=1) $ map (expt p n) totpows
 ^ All powers are different from 1 => primitive root.
in fromIntegral $ head $ dropWhile (not . check) $ fromList [1..p1]
 otherwise = error "Attempt to take primitive root of nonprime value"
  Fast exponentation modulo n by squaring.

expt :: Int > Int > Int > Int
expt n b pow = fromIntegral $ go pow
where bb = fromIntegral b
nb = fromIntegral n
go :: Int > Integer
go p
 p == 0 = 1
 p `mod` 2 == 1 = (bb * go (p  1)) `mod` nb
 otherwise = let h = go (p `div` 2) in (h * h) `mod` nb
We also have some support code for QuickCheck testing of the primitive root algorithm:
  QuickCheck generator for prime values. Inefficient...

newtype Prime a = Prime { getPrime :: a }
deriving (Eq, Ord, Show, Read, Num, Integral, Real, Enum)
instance (Integral a, Ord a, Arbitrary a) => Arbitrary (Prime a) where
arbitrary = (Prime . (\n > P.head $ P.dropWhile (< n) primes)) `fmap`
(arbitrary `suchThat` (> 1))
 Test code for primitive root determination.
prop_primitive_root ((Prime n) :: Prime Int) =
primitiveRootTest n $ primitiveRoot n
primitiveRootTest :: Int > Int > Bool
primitiveRootTest p root
 isPrime p = (sort $ toList $ pows) == [1..p1]
 otherwise = error "Attempt to take primitive root of nonprime value"
where pows = generate (p  1) (expt p root)
We need to generate prime numbers to test the primitive root calculation, and it’s much quicker to write a separate QuickCheck generator, using a specialised Arbitrary
instance for a custom Prime
newtype
to do this than to try generating positive integers and filtering out nonprimes using QuickCheck’s ==>
filtering operator. We can collect some evidence that the algorithm works by doing something like this in GHCi:
> :load PrimeFFT.hs
[1 of 1] Compiling PrimeFFT ( PrimeFFT.hs, interpreted )
Ok, modules loaded: PrimeFFT.
> verboseCheckWith (stdArgs { maxSize=25 }) prop_primitive_root
Passed:
Prime {getPrime = 3}
Passed:
Prime {getPrime = 7}
...
Passed:
Prime {getPrime = 14771}
Passed:
Prime {getPrime = 6691}
Passed:
Prime {getPrime = 23753}
+++ OK, passed 100 tests.
We can use the same approach for testing our implementation of Rader’s algorithm that we used for testing the mixedradix FFT algorithm, i.e. by comparing Rader FFT results to results of the basic DFT algorithm, and testing FFT/IFFT roundtrip calculations. The only slight wrinkle here is that we need to test for input vectors of prime length only, so that we need a specialised Arbitrary
instance to generate these:
instance Arbitrary VCD where
arbitrary = do
len < elements $ P.takeWhile (< 500) primes
fromList <$> vectorOf len arbitrary
With this, we can do some testing. In GHCi:
> :load PrimeFFT.hs
[1 of 1] Compiling PrimeFFT ( PrimeFFT.hs, interpreted )
Ok, modules loaded: PrimeFFT.
> quickCheck prop_dft_vs_fft
+++ OK, passed 100 tests.
> quickCheck prop_ifft
+++ OK, passed 100 tests.
Nice. It works. There are definitely improvements that we can make to the code, which we’ll do when we start optimising the overall FFT calculation, but the algorithm is complicated enough that it’s nice just to have a working version to start from!