SplitRadixReal FFT Library.

It’s well known, that running complex fft subroutine in embedded system isn’t rational. Input data have no complex imaginary part, and consequently about 50% of processing power lost for nothing. Another aspect, is memory consumption, complex fft output has 2x redundancy. Solution to this problems is also known, SplitRadix algorithm.  So I did once more same trick, like with Radix-4 code. I take readily available C code  of the SplitRadix, thanks to  and convert it to arduino library. The most troubling part was to figure out corresponding addresses in the Sinewave LUT, that would substitute integer sine and cosine values.

Difference between SplitRadixReal and Radix4:

  •  SplitRR: 4630 usec.
  •  RDX4:   6968 usec.

New library runs 1.5 times faster. And requires  2x less memory to run.

An example of the application, where saving on memory size doubles spectral resolution (and saves uCPU clock cycles as well), is my latest project Sound Camera,  I discovered that I can’t  get desirable frequency resolution (20 Hz) with my Radix4 fft library in this application. Its simply demands too much memory when fft size 2048,  all DUE 96 kbytes, and my compiler wasn’t agree with that: “section `.bss’ is not within region `ram’ collect2: ld returned 1 exit status”. And it’s on DUE SAM3X8E with its huge memory size, compare, for example, to Maple STM32 and bunch others, similar cortex-3 derivatives.

Seen this before in your sketch:

  • int f_r[FFT_SIZE] = { 0};
    int f_i[FFT_SIZE] = { 0};

Great news, you don’t need  f_i[.] anymore.

And last, re-print :

Created for Arduino DUE & like boards, word size optimized for 12-bits data input.

FFT takes as input ANY size of array 8, 16, 32, 64, 128, 256, 512, 1024, 2048.

Demands 2-x less memory to run !

(Does DSP Lib offer all this options? Please, drop me a message, if it does.)

LInk:   SplitRadixReal Library.

 Updates on 29 Sept. 2014:

Playing with the code, I realized, that one of the previous authors did an optimization for sine function, and it’d make sense if library installed on a Linux. Arduino, where sin & cosine  are n’t calculated at run time, but coming from LUT, requires absolutely different approach for optimization. So, I switch two “for” loops in reverse, making trigonometry math the most internal cycle, and it brings substantial acceleration in speed. Digits talk for itself:


  •  fft.2048:   4630 usec.
    fft.1024    2115 usec.


  •  fft.2048:  3479 usec.
  •  fft.1024:  1589 usec.

Now, the speed of new library more than twice better than Radix-4 code.

Download:  SplitRadixRealP – 2. for DUE board.



 Same code “optimized” to run on 8-bit arduino UNO ( and like ) board. Short summary:

 FFT takes as input ANY size of inputs array 8, 16, 32, 64, 128, 256, 512.
Max. size 512 defined by LUT (and uCPU memory limits).

Timing results, in usec:

  • fft.256:   4316
  • fft.512:   9572

 Download:   SplitRadixRealT  for UNO board.


Posted by on September 25, 2014 in Uncategorized


Sound Camera.



One more project, that shows breathtaking beauty of the FFT (Fast Fourier Transform). Once again, like in last 3D Ultrasonic Radar Project,    Arduino DUE was nominated to be Maestro, doing major part of the Digital Signal Processing in real time.  As you can see below, the hardware includes 4 “modules”:

  1. Sensor board
  2. Arduino DUE
  3. Bluetooth shield
  4. Android tablet.

Last two items aren’t strictly necessary. Alternative would be to connect TFT display directly to arduino, but I decided not to spend my time on re-inventing drawing-rendering software. Better to delegate all visualization stuff to the equipment that was  specifically design by big monsters in high tech industry.  I spend quite time digging into android graphics subject anyway, only hoping I can apply my knowledge somewhere else later on.


Sensor board holds 4 microphones from  SFE.  Plus a few decoupling components, capacitors and inductor in power line.


   Brief summary: Arduino sampling 4 analog inputs, close to 41 kHz,  x 4 = 164 ksps,  software library Radix4 posted on this blog was imported into project practically intact. DMA feature on Arduino DUE allows sampling rate up to 1 MSPS, and I already successfully tested its capability in 3D Radar project.  Having 2048 fft size, at the first processing stage  output there are 1024 bins 20 Hz each. Than, using arctangent LUT, phase of each bin is extracted.  Difference in phases two vertically position microphones gives Y component, and two horizontally spaced mic’s – X component. Sound source is localized with accuracy ~ 0.5 degree. Have to say, that on the lower frequency end, 100 Hz – 1 kHz , where wavelength is huge compare to spacing between two mic’s ( 3.4 meters at 100 Hz ), accuracy is deteriorating proportionally to wavelength.

Arduino is calculating data really fast, providing  X,  Y, and M  every 50 milliseconds. M – is for magnitude. Than, all this data stream flows to android over BT.  Everything else is obvious, watch the video.

Speaker outputs white noise, as for single tone (frequency) only one pixel would be visible on screen. Android software “colorized” picture based on a frequency, low range – starting from red, and up to violet on high end of the frequency band, through all 1024 color wheel possibilities.  You can see, that picture saturated with green and blue, and there is almost no red color. There are two things, first is a speaker, not performing well at low end. Second nuance is the fact, that low frequencies are not “grouped” so effectively, due to the localization error, what I tried to explain in a paragraph above. I created an option in the menu to select different types of colorization, based on a frequency or based on a magnitude. They are look pretty similar for white noise source, so there is only one video clip.

Have fun.


 Edited on 21 Oct. 2014:

 If you come across this page searching an old  “Localizator” project, published over 2 years ago, here is working material I was able to find:   Localizator.


Posted by on September 12, 2014 in Advanced projects., FFT series., Uncategorized


Tags: , , , , , , , , , ,

Ultrasonic 3D Radar.

This page is next level of Virtual Touch Screen project. 

First things is a distance, for virtual touch screen its less than 3 m, because the reflective area is too small. For radar (or sonar) its different, and the bigger size of object the stronger echo. Approximated range of detection the object as big as a wall, 30 meters.

Technically, there are two hardware parts were added, to fully demonstrate extra ordinary sensitivity of the VTS project. First one is the BlueTooth module. And second is a tablet, running android. Device that I have, doesn’t support USB host mode (OTG), otherwise I may be fine w/o BT, just transfer a data over USB cable, as it was done in two previous demo video clips.  Have to say, it was not easy to represent 3D perspective on a flat screen, and picture below shows what I designed to complete a task:


Don’t think it requires a comments, the tricky part was to create an elliptical grid to show a distance. The number of circles is not limited to 2, I’d think about how to film next demo video, that ‘d show a “volume”.

Enjoy the movie:

There are two apples, and arduino measure position in 3D space both of them. X, Y, and Z coordinates plus P – power of reflected ultrasonic wave used to draw circles, with different colors. You can see movement of the red circle on screen when first apples moves.

edited on 21-08-2014

After thinking awhile how to show a “volume” on a flat tablet screen, I decide to remove filtering stage in a software, when a bunch of consecutive “layers” were shown as one single ring (object) on a screen. Now each packet of data received from single “spherical” layer creates a circle. As always packet includes X, Y, Z, and P. To make an image “clear” there are two others filters left over in the processing algorithm. One is rejecting data below ( selectable in a menu ) power threshold, and another rejects anything thats located farther specific (again, selectable) distance. This is why in a video you can see only a ball, but not me – operator making a movie.

Here is how the ball looks like on radar screen:


And video:

That;s it for now.


Posted by on August 20, 2014 in Uncategorized


Tags: , , ,

Virtual touch screen (3D Ultrasonic Radar).

First things: there are no servo motors. No motors or any mechanical moving parts in this project.

There are 4 HC-SR04 ultrasonic sensor units, only one is working as transmitting – pinging module and receiver simultaneously, 3 others are just receivers. Arduino DUE, of course, is a prime violin of the whole project. Small prototype board has an additional 4-channel buffer amplifier (MCP6024).

Technical specification ( approximately ):

  • Scanning range 3 m, limited very low power and sensitivity of the HC-SR04 devices.
  • Spacial XY resolution depends on a distance, has to be determined.  Two object should be position at least 5 cm apart and not on the same spherical surface around sensor board.
  • Directivity diagram +-30 degree, or so.
  • Spacial Z – (distance) resolution 12 um. No typo error, micrometers.
  • Time to complete full scan 16 milliseconds, frame rate may vary from 60 Hz down to 10 Hz if there is strong reverberation.

Have to say, that ultrasonic units were slightly modified, to brought out an analog signal (40 kHz) before it gets quantization in the local uCPU.  After amplification, 4-channels are digitized by arduino’s ADC (12-bits 1 MSPS).

Fast Fourier Transform, not much differs from the published on this blog library. I’m not ready to disclose complete signal processing algorithm, and is not publishing a code, at this time. Nevertheless, I don’t mind to answer reasonable /meaningful questions.

Video: have to practice more -);

A few comments on a video clip. I intentionally use a pen to draw a picture, even it’s almost violate the basic of the physics, because reflective area of the pen practically equals to wave length, 8.5 mm for 40 kHz in the air. You can see, that arduino is loosing track on a few occasions. Distance ~ 1m.

Computer is running Linux with regular “mtPaint 3.40″ from official source. Software is receiving a mouse commands, as its has no idea where this commands come from. In the same way, if you draw a picture manually. To interface with a host, arduino emulates a left button and XY move mouse commands using “build-in” mouse driver, and I copy ButtonMouseControl example from IDE.

The surface of the touch screen is “virtual”, first things arduino does after I send a command over serial monitor console to start a drawing, is “search – scan” an object. Whatever it finds first, the closest one, would be “locked” and distance to this object is interpreted as “touched – untouched”. This is first try, and I was not pushing hard on the gesture pattern recognition yet. But as you can guess, there is no limits to “slide” “rotate” “scroll” etc movement discrimination, with only one exception. There is no “multi-touch”, as I mentioned in the specification section, two object has to be 5 cm apart. This limitation is introduced by two shortcomings of the current hardware design. First one, because there is no phase array, only one unit is transmitting ( in the middle on right side ), so there is no way arduino could identify two objects on the same sphere. Second, is low sampling rate of the ADC. In order to “shrink” XY spatial resolution down to wave length (8.5 mm), sampling rate has to be at least 6 MSPS or so.

Tracking update rate (scan frame rate – don’t confuse with a video)  is set to 32 fps.



eddited: 14 Aug. 2014       “New technology is rising!”

Second video clip is posted, that demonstrates better tracking stability over bigger distance range.

Distance is 1.2 m, same pen. I think, that all for Virtual Touch Screen demonstration. Any improvements I could make in a code ‘d introduced only small changes in overall representativity of the project.

This HID technology is completely new area for me, and I’m not a professional programmer. Be curious, I look into “regular” touch screen (resistive – capacitive)  library free accessible on the i-net. I find over 100 variables that initialized and updated in order to keep track on a bunch of real-time parameters, that “normal” TS supplies with 10 ms time interval. Another 100’s variables are buried inside proprietary driver in the OS. It would takes a years to run a test and debug effects each of this variables on stability, smoothness,  susceptibility etc. And moreover, my invention Virtual TS – 3D would require a lot more than a 100’s….

edited: 26 Aug. 2014   Answering the question, modification to HC-SR04 module.

There is an electrical drawings, I was able to locate on-line:

And photo:                  circuits_mod

As you can see, analog 40 kHz output is taken from pin 7, LM324. Conveniently,  it’s the rightest one, close to the board edge. Module – transmitter has a jumper wire over B-E of the transistor, others 3 units may not have this wire. I find out, that unit doesn’t transmit anything till it gets a response, that may never happened for echo reflected from tiny object like a pen.  It looks like on-board uCPU is waiting a transition in poling mode.  And additional amplification stage I build with MCP6024, is similar to first stage of the LM324 (U2D), with a gain x50.  In my first try, I connect output of LM324 directly to arduino DUE analog inputs, basically its not realy safe, as voltage goes close to 3.6-3.7 V. But than introducing MCP6024 (rail-to-rail) I decrease power voltage of the OPA down to 3.3V,  not to worry about my DUE.


Posted by on August 10, 2014 in Uncategorized


Tags: , , , ,

FFT Library for Arduino Due, SAM3X CPU.

Updates on 25 Sept. 2014

 New version FFT Library available, based on Split Radix Real algorithm. 


Created for Arduino DUE & like boards, word size optimized for 12-bits data input.

FFT takes as input ANY size of array 8, 16, 32, 64, 128, 256, 512, 1024, 2048.
Max. size 2048 defined by LUT.

Library may run on different platform, only PROGMEM macros and variable declaration
type may need to be adjusted accordingly.

Algorithm tested on Arduino DUE and IDE 1.5.6-r2 (Tested on Linux OS only).

Timing results, in usec, fft-2048:

  •   Hamng – 864
  •   Revb   –  817
  •   RDX4 – 6968
  •   GainR  – 320
  •   Sqrt  –  5297
  •   Sqrt2   – 405

There is two sub-functions for magnitude calculation, as you can see second one runs
more than 10x times faster, but it is less accurate, error in worst case scenario
may reach 5 %. Approximation based on

Short summary, DUE is ~15x times faster than UNO.

Analog Input – 0, Default sampling rate 48 000.
Link to   file.

Don’t missed out, here UNO version of the RADIX4 library.



Posted by on March 11, 2014 in Uncategorized


RADIX-4 FFT (integer math).

Updates on 30 Sept. 2014:

Everything below is correct, and may worth to read. But new  code based on Split Radix Real is published. Faster, lower memory demands, both version for UNO and DUE available as libraries.  New algorithm makes it possible to run FFT_SIZE =  512 on UNO board in less than 9.6 milliseconds.


Tweaking the FFT code, that I’ve published earlier in my series of blogs, I hit a “stone wall”. There are nothing could be improved in the “musical note recognition” version of the code, in order to make it faster. At least, nothing w/o completely switching to assembler language, what I’m trying to avoid for now.  I’m sure, it’s the fastest C algorithm. Looking around it didn’t take long to find out that there is other option: change RADIX-2 algorithm for RADIX with higher order, 4, 8, or split-radix approach. Putting split-radix aside, (would it be my next adventure?), RADIX-4 looks promising, with theoretically 1/4 reduction in number of multiplications (what I believe is an “Achilles heel”).

Googling for awhile, I couldn’t find fixed point version in plain C or C++ language. There is TI’s “Autoscaling Radix-4 FFT for MS320C6000TM” application report, which I find useful , but the problem is it’s “bind” with TI microprocessors hardware multiplier, and any attempt to re-write code would, probably, make it’s performance even worse than RADIX-2. Having “tweaking” experience with fix_fft source code from:             I decide to follow same path, as I did before, adapting fix_fft for arduino: take their floating point source, disassemble it to the pieces, and than combine all parts back as fixed point or integer math components.    And you know what ? Thanks God, I successed!!!

I decided not all parts to re-assemble back again, this is why fft_size has to be power of 4 ( 16, 64, 256, 1024 etc.). Next, the software is “adjustable” for different level of the optimization. Trade is always the same, accuracy against speed. I’d highlight 3 level at this point:

1. No optimization, all math operation 15-bits.   The slowest version. Not tested at all.

2. Compromise version.  Switches: 12-bits Sine table, regular multiplication (long) right shifted >>12, Half-Scaling in the sum_dif_I (RSL) >>1. Recorded measurements result:  24 milliseconds with N = 256 fft_size.

3. Maximum optimization. Switches: 8-bits Sine table, macro assembler multiplication short cut, no scaling in the core. Timing 10.1 millisecond!!!

Fastest. Best of the Best Ever written FFT code for 8-bit microprocessor.   Enjoy the meal:

Here is slightly modified copy, where I moved sine table from RAM to FLASH memory using progmem utility. For someone, who was curious to find the answer: how much progmem slower compare to access data in the RAM, there is an answer. 10.16 milliseconds become 10.28, or 120 usec slower. Divide by 84 x 6 = 504 number of readings, each progmem costs 0.24 useconds. Its about 4 cycles CPU.

Screenshot from the running application, signal generator running on the computer, feeding audio wave to OPA and than analog input 0. Look for hardware setup configuration on the “color organ” blog-post.

BTW, there is one more important thing, I missed to emphasize in my short introductory paragraph, code offers FLEXIBILITY over SNR ratio. Basic FFT algorithm has an intrinsic “build-in” GAIN: G(in) = FFT_SIZE / 2 . (in) stands for intrinsic. That is perfect value for fft_size = 64 ( Gain = 64 / 2 = 32) and arduino (Atmel AtMega328)  10-bit ADC ( max value = 1023 ). FFT output would be 32 x 1023 = 32736, exactly 15 bit + sign. In other words, scaling in the algorithm core doesn’t required at all! That alone improve speed and lower rounding noise error significantly. The same time G(in)  grows too high with FFT_SIZE = 256, when G = 256 / 2 = 128 and output of the FFT would overflow size of 16-bit integer math. But again, scaling don’t have to be 100%, as long as there is a way to keep it in balance with ADC data. In this particular case, with 10-bit ADC, we can keep gain just below 32, it’s not necessary to make it exactly “1”.  For 12-bit ADC upper G limit would be 8, still not “1”. To manipulate the gain, division by 2 (>> 1) in the “sum_dif_I” could be set, to prevent overflow with fft_size > 64. Right shift “gain limiter” creates a square root adjustment, according to new formula: G(rsl) = SQRT (FFT_SIZE) / 4 . (rsl) stands for right-shift-limiter.

  1.  G = 1 for fft_size = 16,
  2.  G = 2 for fft_size = 64,
  3.  G = 4 for fft_size = 256,
  4.  G = 8 for fft_size = 1024.

Summing up, for using RADIX-4 with arduino ADC and FFT_SIZE <= 64, keep division by 2 (>> 1) in the “sum_dif_I” commented out. In any other circumstances, >10 bits external ADC, >64 fft_size, uncomment it.

To be continue…..

Leave a comment

Posted by on March 24, 2012 in Advanced projects., FFT series.


Tags: , , , , , , , ,


Get every new post delivered to your Inbox.

Join 51 other followers