An excursion into Fast Fourier Transform — The Basics and Time Series Analysis
Let’s begin by understanding how Fourier Transform (or more specifically Discrete Fourier Transform) works and how it is useful in software engineering.
Here we will look at a common problem of times series analysis.
Discrete Fourier Transform
Given any function f(x), one can represent it as a sum of sine and cosine functions. Each such sine and cosine function is characterized by the frequency, amplitude and the phase.
Using Fourier Transform, we obtain a function of the frequencies present in f(x) i.e. the frequencies of the sine and cosine functions that combined to give f(x). From this function of frequency, we can obtain the dominant frequencies i.e. the frequencies of the sine and cosine functions with the highest amplitudes.
Let F(v) be the function of frequencies obtained using Fourier transform on f(x), then F(v) is calculated as:
We use the above integration when the function f(x) is continuous, but in most real life practical applications f(x) is evaluated at discrete points.
For e.g. in time series, x is time and f(t) could be the average CPU usage for every t=5 minutes.
Thus for 24 hours (or 1440 minutes), we will obtain 288 values of f(t) evaluated at t=0, 5, 10,…. minutes.
In such cases, we cannot use integration but instead use summation.
When we are using fourier transform on discrete points, it is known as the Discrete Fourier Transform or DFT.
The values F(v) evaluated at v = 0, 1, 2…. and so on will give an estimate of the contribution of the frequency v to the time series f(t).
How does obtaining the frequencies present in f(t) helps us ?
Recall that a time series is composed of 3 different components — Trend, Seasonality and Residual.
Seasonality is a key component of a time series because it helps us uncover a pattern in the data. For e.g. CPU usage might decrease during the weekends as compared to weekdays.
The periodic pattern resembles a sine or a cosine wave.
Once we obtain the most significant frequency using DFT applied on f(t), we can then find the period of the time series which is usually 1/frequency.
One straightforward way to find the most significant frequency in DFT is the value of v for which F(v) is maximum.
dominant frequency = argmax F(v)
But it is not always true that the most significant frequency from DFT is the most useful for finding the period of the time series. The true period of the series might correspond to the 2nd or the 3rd significant frequency.
The Trend of the time series is obtained by subtracting the seasonal component from the time series and then fitting a linear or a non-linear regression model.
Residual is the random noise present in the time series after subtracting both trend and seasonality. This data is useful as it can be used to detect anomalies in the data.
Let us take an example to demonstrate further. Using the following Python code, we generate a time series with M different sine functions having M different frequencies and amplitudes.
import matplotlib.pyplot as plt
import numpy as np
def generate_points(n, m=20):
# generate m random frequencies and amplitudes
frequencies = np.random.randint(1, 100, m)
amplitudes = np.random.rand(m)
x = 0
for i in range(m):
x += amplitudes[i]*np.sin(2*np.pi*frequencies[i]*np.arange(n)/n)
return x
x = generate_points(288)
plt.figure(figsize = (8, 6))
plt.plot(x, 'r')
plt.xlabel('Time')
plt.ylabel('Usage')
plt.show()
The time series looks like below:
Ignore the ‘negative values for Usage’ for now as this is just an artificial series generated for demonstration purposes.
Also note that the time intervals are scaled from 0 to 287 with time units as t=1 instead of using 0 to 1439 with time units of 5 minutes, this is because in the DFT formula it is assumed that time steps are 0,1,2,… etc.
In case we desire to use a different time unit, we need to scale the formula accordingly.
Assuming that instead of t=1 we use t=H units, then the formula is updated to be:
If we look at the amplitude and frequency values for each sine wave sorted by the amplitude in decreasing order
sorted(zip(amplitudes, frequencies))[::-1]
[(0.9598146272682336, 60),
(0.7900554146878912, 84),
(0.6993323555752244, 60),
(0.6673129032509701, 91),
(0.5914894357361157, 11),
(0.5564108219214691, 24),
(0.5167964185482653, 63),
(0.5162064124539919, 13),
(0.4447807298455215, 76),
(0.39218453053836766, 87),
(0.383488871651234, 41),
(0.35768500440511364, 49),
(0.29957128133876154, 67),
(0.28179068301600996, 98),
(0.2690479882730439, 13),
(0.25546812772461147, 30),
(0.08254692221137805, 86),
(0.06729854000667723, 55),
(0.019577119863376846, 54),
(0.0074330132044285735, 71)]
The frequency with the value of 60 is the dominating frequency i.e. highest amplitude.
Now let’s do the DFT on the time series (using the DFT formula).
def dft(x):
n = len(x)
points = np.arange(n)
freq = points.reshape((n, 1))
e = np.exp(-2j * np.pi * freq * points / n)
return abs(np.dot(e, x))
f = dft(x)
On plotting the frequencies and their corresponding amplitudes we obtain the following plot:
If we look at the frequencies with the highest amplitudes from the DFT, we will see that at frequencies 60 and 228, the highest amplitude occurs, then the next highest occurs at 84 and 204 and so on.
np.argsort(f)[::-1]
array([ 60, 228, 204, 84, 13, 275, 197, 91, 11, 277, 24, 264, 225,
63, 76, 212, 201, ...], dtype=int64)
If we look at our generated data, then the decreasing order of amplitudes occurs at frequencies 60, 84, 13 and so on.
Thus half of our DFT results matches with our intuition, but the other half is unexpected. Also note that the other half has frequency values quite different than intended because the range of frequencies used in our input was in the range 0 to 100 whereas in the DFT, we see frequencies with values like 228, 204 etc. which are all > 100.
If you look at those additional frequencies, you will uncover a pattern i.e. 228=288–60, 204=288–84 and so on. If the frequency v has amplitude of A, then the frequency N-v also has amplitude of A, where N is the size of our data which is 288 in this example.
Mathematically:
Where the last reduction follows from the fact that:
If we take the absolute values of both F(v) and F(T-v), they will be equal although the imaginary parts differ by a negative sign.
i.e. abs(F(v)) = abs(F(T-v))
Thus the plot is symmetrical along the frequency axis. It is enough to take only the 1st half values.
The time complexity for calculating the DFT with input size of n is O(n²) because for each frequency v, we are calculating the sum with n terms of the input f(t) and there are n different values of v from 0 to n-1 that we are evaluating.
For n > 10⁶, the DFT calculation is not scalable and many time series data can have millions or billions of points.
How do we improve the time complexity?
This is where Fast Fourier Transform comes into the picture.
Fast Fourier Transform
From the above DFT formula, we can observe that:
Thus
Assuming that T is divisible by 2, we can decompose the DFT into 2 equal parts as follows:
The left summation corresponds to even points and right summation corresponds to odd points. Rewriting the above, we get:
Let us denote the left and right summations separately as follows:
and..
From our earlier results, we can show that the following are true for G and H.
Thus the values for G(v, T/2) for v ranging from 0 to T/2–1 is same as the corresponding values for v from T/2 to T-1.
Same thing holds true for H(v, T/2).
Once we compute G(v, T/2) and H(v, T/2) for v from 0 to T/2–1, we can obtain F(v) for v=0 to T/2–1.
Also we can re-use the same G and H values to find F(v) from T/2 to T-1.
Here is a python implementation using recursion.
def fft(x):
n = len(x)
if n == 1:
return x
even_fft = fft(x[::2])
odd_fft = fft(x[1::2])
twiddle_factors = np.exp(-2j*np.pi*np.arange(n)/n)
m = int(n/2)
return np.concatenate([even_fft + twiddle_factors[:m]*odd_fft,
even_fft + twiddle_factors[m:]*odd_fft])
f = abs(fft(x))
The time complexity of the above code is O(nlogn) instead of O(n²) in our naive DFT implementation above. This is because we are solving for the following recursion:
where the constant times ’n’ term is due to the concatenation of the terms F(0) to F(n/2–1) with F(n/2) to F(n-1).
Note that the above FFT algorithm works only when in each recursion, the size n is even which requires that the number of points is a power of 2. In case it is not a power of 2, then we can simply pad the remaining points with zeros to make length of x equal to its closest power of 2.
Doing a performance benchmark on both the naive DFT and FFT implementation, we uncover that it is infact true that FFT outperforms naive DFT by several factors for large number of input points.
for n in range(8, 15):
x = generate_points((1 << n))
print((1 << n))
%timeit dft(x)
%timeit abs(fft(x))
print()
The results are as follows:
256
10.3 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.78 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
512
51.6 ms ± 972 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
15.6 ms ± 290 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1024
200 ms ± 14.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
31.5 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2048
824 ms ± 13.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
64.3 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
4096
3.23 s ± 212 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
133 ms ± 8.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
8192
12.4 s ± 578 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
276 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
16384
57 s ± 3.05 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
713 ms ± 53.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For 16384 points, naive DFT takes around 60s or 1 minutes whereas FFT takes around only 760 ms i.e. an improvement of over 79 times.
I will conclude this post here. In the next part of this post we will look at how we can leverage FFT for fast integer multiplication and related problems.
Part 2