intro
A fixedpoint.cpp that was complied by matlab mex, so that realisticfixedpoint fft can be simulated as fast as the builtin matlab fft() function.
Please find the project at https://github.com/yanxianghuang/fpfft/
detailed implementation
You can config the numberofbit for input number and twiddlefactors. Currently,
$$numberOfBits = integerBits + fractionBits$$
integerbit represents the bits before the decimal, whereas fractionbit shows the one after. For twiddle factor, as the range is always [1, 1], the integerbit is always 2, and the user may specify the fractionbit as you like. For input number, a 3+5 bit input example 010.01100 represents 2+0.25+0.125=2.375.
The FFT is implemented with decimatedintime fft (see below for 8point example):
Internally, after each addition, the number is rightshifted by one, to avoid wordlength increase. The same is done for the twiddle factor multiplication (right shift by fractwiddlesize). Therefore, the wordlength within the intern FFT is always kept the same. This minimize hardware cost for the realhardware we want to model.
user’s guide
requirements
64bit Matlab with license
 signal_toolbox (bitrevorder)
 optional: fixed_point_toolbox, just to show the difference againt naive implementation. This is not mandantory.
file structure
 src
 demo.m (a runme demo)
 fix_fft2k.m (2k fixedpoint fft, that uses the mex files)
 fix_fft.(mexa64, mexw64) the core fix_fft function
 fix_fft.cpp source file that was compiled into the mex files
usage
run demo.m. It will call fix_fft2k.m, a 2k fft instance.
You can modify the fix_fft2k.m into other FFT size you want. Currently, it is verifed only for powerof2 fftsize. So if you are using nonepowerof2, please pads zeros. Besides, we are not using dynamical memory allocation in C++, to increase robustness (not freeing space when interrupted execution). Therefore, the maximum size is 2048. If you are looking for longer FFTs, simply change the 2048 in the cpp files, and recompile the cpp file.
hacker’s guide
compiling
If you donot have a mex file yet, or you modified the cpp file, execute in Matlab


to build a new mex file.
outcome
The speed of the Cmex is 60ms for 1k execution of 2k fft, comapring with 40ms for 1k execution of the samesize builtin fft().
The following figure shows the example result for the FFT, for sure the error is larger (plot4) than the case where we only quantify the input and keep the rest compuation as double precesion (plot5). The difference is about 100times, or 7bits. This is due to the onlyquantizeinput method keeps intermidiate computation as double precesion, which is overoptimistic for signal processing hardwares
The followig figure show the pareto points twfrac size againt inputfrac size. On each curve, the rms error is the same. Obviously, rightupper curves denotes less error. It generally tells that, the wordwidth of the input should be 5bit wider than the twiddle factors. Otherwise one of them becomes the bottleneck for error.