It’s well known that running a complex FFT routine on an embedded system is wasteful. The input data have no imaginary part, so about 50% of the processing power is spent for nothing. Memory consumption is the other problem: the output of a complex FFT has 2x redundancy. The solution is also well known: the SplitRadix algorithm for real input. So I repeated the trick I used with the Radix-4 code: I took the readily available C code for SplitRadix, thanks to http://www.jjj.de/, and converted it into an Arduino library. The most troublesome part was figuring out the addresses in the sine-wave LUT that substitute for the integer sine and cosine values.
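To make the LUT addressing concrete, here is a minimal sketch of one common way to map a twiddle angle onto a sine table. It is not taken from the library: the table length `LUT_SIZE`, the full-period layout, and the helper names `sin_index` and `cos_index` are all assumptions made for illustration.

```c
#include <stdint.h>

/* Assumption: a full-period sine table with LUT_SIZE entries,
 * entry i holding a scaled sin(2*pi*i / LUT_SIZE). */
#define LUT_SIZE    2048u
#define LUT_QUARTER (LUT_SIZE / 4u)   /* 90 degrees expressed in table steps */

/* Table index for sin(2*pi*k/n), where n is a power of two <= LUT_SIZE.
 * A smaller transform simply strides through the same table. */
static uint16_t sin_index(uint32_t k, uint32_t n)
{
    uint32_t stride = LUT_SIZE / n;
    return (uint16_t)((k * stride) & (LUT_SIZE - 1u));
}

/* cos(x) = sin(x + 90 degrees), so the cosine shares the table
 * and only needs a quarter-period offset, wrapped around the end. */
static uint16_t cos_index(uint32_t k, uint32_t n)
{
    return (uint16_t)((sin_index(k, n) + LUT_QUARTER) & (LUT_SIZE - 1u));
}
```

With this layout an 8-point pass would read every 256th entry, and no separate cosine table is needed.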

Difference between SplitRadixReal and Radix4:

- SplitRR: 4630 usec.
- RDX4: 6968 usec.

The new library runs about 1.5 times faster and requires half the memory.

One application where the memory saving doubles the spectral resolution (and saves CPU clock cycles as well) is my latest project, Sound Camera. I discovered that I couldn’t get the frequency resolution I wanted (20 Hz) with my Radix4 FFT library. It simply demands too much memory at an FFT size of 2048, all 96 kbytes of the DUE, and my compiler wouldn’t accept that: “section `.bss’ is not within region `ram’ collect2: ld returned 1 exit status”. And that’s on the DUE’s SAM3X8E with its huge memory, compared, for example, to the Maple STM32 and a bunch of other similar Cortex-M3 derivatives.
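The trade-off between resolution and memory comes down to simple arithmetic, sketched below. The helper names are hypothetical, and the 40.96 kHz sampling rate in the usage note is an assumption chosen only because it gives exactly 20 Hz per bin at 2048 points; the Sound Camera's actual rate isn't stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Width of one FFT bin in Hz: sampling rate divided by transform length. */
static double bin_width_hz(double sample_rate_hz, unsigned n)
{
    return sample_rate_hz / (double)n;
}

/* Bytes of working buffer for one n-point transform with 32-bit samples
 * (int is 32 bits on the DUE). A complex FFT needs two arrays
 * (real and imaginary), a real-input FFT needs one. */
static size_t fft_buffer_bytes(unsigned n, int complex_input)
{
    size_t arrays = complex_input ? 2u : 1u;
    return arrays * (size_t)n * sizeof(int32_t);
}
```

For example, `bin_width_hz(40960.0, 2048)` gives 20 Hz, while `fft_buffer_bytes(2048, 1)` is 16384 bytes against 8192 for the real-input case, for every buffer the sketch keeps.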

Have you seen this in your sketch before?

- int f_r[FFT_SIZE] = { 0};
- int f_i[FFT_SIZE] = { 0};

Great news: you don’t need f_i[] anymore.
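With a single array, the real and imaginary parts of the spectrum have to share it. The sketch below shows how a bin's power could be read back, assuming the common "half-complex" packing: real parts in f[0] to f[n/2], imaginary parts stored backwards in f[n/2+1] to f[n-1]. That ordering is an assumption, not something confirmed for this library, and `bin_power` is a hypothetical helper, so check the library source before relying on it.

```c
#include <stdint.h>

/* Squared magnitude of bin k (0 <= k <= n/2) from a packed real-FFT output.
 * ASSUMED layout (verify against the library):
 *   f[k]     = real part of bin k,      k = 0 .. n/2
 *   f[n - k] = imaginary part of bin k, k = 1 .. n/2 - 1
 * Bins 0 (DC) and n/2 (Nyquist) are purely real. */
static int64_t bin_power(const int32_t *f, unsigned n, unsigned k)
{
    int64_t re = f[k];
    int64_t im = (k == 0 || k == n / 2) ? 0 : f[n - k];
    return re * re + im * im;
}
```

Only n/2 + 1 bins carry information for real input, which is why one n-element array is enough.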

And last, a reprint of the summary:

Created for the Arduino DUE and similar boards; word size is optimized for 12-bit input data.

The FFT accepts an input array of any of these sizes: 8, 16, 32, 64, 128, 256, 512, 1024, 2048.

Needs half the memory to run!

(Does DSP Lib offer all these options? Please drop me a message if it does.)

Link: SplitRadixReal Library.

*Update, 29 Sept. 2014:*

Playing with the code, I realized that one of the previous authors had optimized it around the sine function, which would make sense if the library ran on Linux. On Arduino, where sine and cosine aren’t calculated at run time but come from a LUT, optimization requires an entirely different approach. So I swapped the order of two “for” loops, making the trigonometry the innermost loop, and this brought a substantial speed-up. The numbers speak for themselves:

**Before:**

- fft.2048: 4630 usec.
- fft.1024: 2115 usec.

**After:**

- fft.2048: 3479 usec.
- fft.1024: 1589 usec.

Now the new library is just over twice as fast as the Radix-4 code.
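The loop swap from the update can be illustrated with a toy pass that multiplies data by table values. This is not the library's butterfly code: the function names, the Q15 scaling, and the block layout are assumptions. It only shows that both loop orders produce the same result while touching memory in a different pattern.

```c
#include <stdint.h>

/* Order A: table loop outermost. Each table value is fetched once and
 * reused across all blocks, which pays off when it comes from a costly
 * sin()/cos() call. */
static void scale_twiddle_outer(int32_t *x, unsigned blocks, unsigned m,
                                const int16_t *lut, unsigned step)
{
    for (unsigned j = 0; j < m; j++) {
        int32_t w = lut[j * step];
        for (unsigned b = 0; b < blocks; b++)
            x[b * m + j] = (x[b * m + j] * w) / 32768;   /* Q15 scaling */
    }
}

/* Order B: table loop innermost. The data is walked sequentially and the
 * table read is just a cheap array access on every iteration. */
static void scale_twiddle_inner(int32_t *x, unsigned blocks, unsigned m,
                                const int16_t *lut, unsigned step)
{
    for (unsigned b = 0; b < blocks; b++)
        for (unsigned j = 0; j < m; j++)
            x[b * m + j] = (x[b * m + j] * lut[j * step]) / 32768;
}
```

Which order is faster depends on the target; the timings above are the measurements for the real library on the DUE.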

Download: SplitRadixRealP – 2. for DUE board.

**UNO VERSION.**

The same code, “optimized” to run on the 8-bit Arduino UNO (and similar) boards. Short summary:

The FFT accepts an input array of any of these sizes: 8, 16, 32, 64, 128, 256, 512.

The maximum size of 512 is set by the LUT (and by the microcontroller’s memory limits).

Timing results, in usec:

- fft.256: **4316**
- fft.512: **9572**

Download: SplitRadixRealT for UNO board.