Math: FFT: Optimizations with Xtensa HiFi code by singalsu · Pull Request #10637 · thesofproject/sof

singalsu · 2026-03-20T15:42:53Z

No description provided.

This patch optimizes the cycle count of the radix-2 Cooley-Tukey implementation with three changes:

- Dedicated depth-1 stage: all N/2 butterflies use the real twiddle factor W^0 = 1+0j, so the complex multiply is replaced by a plain add or subtract.
- Skip the multiply for j=0 in stages >= 2: the first butterfly in every group also uses W^0, which saves an additional ~N/2 complex multiplications across all remaining stages.
- Pointer arithmetic: per-butterfly index arithmetic (outx[k+j], outx[k+j+n], twiddle[i*j]) is replaced with auto-incrementing pointers and strided twiddle access (tw_r += stride). This eliminates integer multiplies for address computation.

This change saves 11 MCPS (from 74 MCPS to 63 MCPS) in the STFT Process module on the MTL platform with 1024/256 size/hop FFT processing. It was tested with the scripts:

scripts/rebuild-testbench.sh -p mtl
scripts/sof-testbench-helper.sh -x -m stft_process_1024_256_ \
	-p profile-stft_process.txt

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
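The three changes above can be illustrated with a plain-C radix-2 decimation-in-time FFT. This is a minimal sketch under stated assumptions, not the fft_32_hifi3.c code:

- It uses double-precision floats instead of SOF's Q1.31 integers and HiFi3 intrinsics.
- The names struct cplx, bit_reverse() and fft_radix2() are hypothetical.
- It assumes a twiddle table tw[k] = exp(-2*pi*i*k/N) of length N/2.

The function returns the number of complex multiplies it performed, so the savings are visible.

```c
#include <stddef.h>

/* Hypothetical complex type for this sketch; the SOF code uses Q1.31 integers. */
struct cplx {
	double re;
	double im;
};

/* Reorder x[] into bit-reversed index order; n must be a power of two. */
static void bit_reverse(struct cplx *x, size_t n)
{
	size_t j = 0;

	for (size_t i = 0; i + 1 < n; i++) {
		if (i < j) {
			struct cplx t = x[i];

			x[i] = x[j];
			x[j] = t;
		}
		/* Increment j as a bit-reversed counter. */
		size_t m = n >> 1;

		while (j & m) {
			j ^= m;
			m >>= 1;
		}
		j |= m;
	}
}

/*
 * In-place radix-2 DIT FFT, assuming tw[k] = exp(-2*pi*i*k/n), k = 0..n/2-1.
 * Returns the number of complex multiplies performed.
 */
static size_t fft_radix2(struct cplx *x, const struct cplx *tw, size_t n)
{
	size_t cmuls = 0;

	bit_reverse(x, n);

	/* Depth 1: every twiddle is W^0 = 1 + 0j, so butterflies are add/subtract only. */
	for (size_t k = 0; k + 1 < n; k += 2) {
		struct cplx a = x[k];
		struct cplx b = x[k + 1];

		x[k].re = a.re + b.re;
		x[k].im = a.im + b.im;
		x[k + 1].re = a.re - b.re;
		x[k + 1].im = a.im - b.im;
	}

	/* Depth >= 2: butterflies span 'half' points; twiddles advance by 'stride'. */
	for (size_t half = 2, stride = n / 4; half < n; half <<= 1, stride >>= 1) {
		for (size_t k = 0; k < n; k += 2 * half) {
			struct cplx *top = &x[k];
			struct cplx *bot = &x[k + half];
			const struct cplx *w = tw;
			/* j = 0 also uses W^0: skip the complex multiply. */
			struct cplx t = *bot;

			bot->re = top->re - t.re;
			bot->im = top->im - t.im;
			top->re += t.re;
			top->im += t.im;

			for (size_t j = 1; j < half; j++) {
				/* Advance pointers instead of computing x[k + j] and tw[j * stride]. */
				top++;
				bot++;
				w += stride;
				t.re = bot->re * w->re - bot->im * w->im;
				t.im = bot->re * w->im + bot->im * w->re;
				cmuls++;
				bot->re = top->re - t.re;
				bot->im = top->im - t.im;
				top->re += t.re;
				top->im += t.im;
			}
		}
	}
	return cmuls;
}
```

Counting the multiplies in this sketch confirms the ~N/2 estimate:

- A plain radix-2 FFT does (N/2)·log2(N) complex multiplies.
- The depth-1 stage removes N/2 of them.
- The j=0 skip removes N/4 + N/8 + ... + 1 = N/2 - 1 more.

For N = 1024 the count drops from 5120 to 4097. These counts describe the sketch only. The MCPS figures come from the testbench runs in the PR.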

Copilot

Pull request overview

This PR refactors the multi-radix FFT implementation by moving the generic multi-FFT logic out of fft_multi.c and introducing a HiFi3-optimized implementation using Xtensa intrinsics, plus a small HiFi3 FFT kernel optimization.

Changes:

- Split dft3_32() and fft_multi_execute_32() out of fft_multi.c into new generic and HiFi3-specific source files.
- Add a new HiFi3-optimized multi-FFT implementation (fft_multi_hifi3.c) using packed complex arithmetic and fused MAC operations.
- Optimize the HiFi3 32-bit FFT kernel by special-casing the first stage and skipping the twiddle multiply for j=0.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.


File	Description
test/cmocka/src/math/fft/CMakeLists.txt	Updates unit test build inputs to compile the new split multi-FFT sources.
src/math/fft/fft_multi_hifi3.c	Adds HiFi3-optimized `dft3_32()` and `fft_multi_execute_32()` implementations.
src/math/fft/fft_multi_generic.c	Adds the generic `dft3_32()` and `fft_multi_execute_32()` implementations previously in `fft_multi.c`.
src/math/fft/fft_multi.c	Removes `dft3_32()`/`fft_multi_execute_32()` from this file, leaving plan allocation/free + twiddle table inclusion.
src/math/fft/fft_32_hifi3.c	Optimizes HiFi3 FFT stage execution (skip twiddle multiply in first stage and for `j=0`).
src/math/fft/CMakeLists.txt	Adds the new multi-FFT source files to the build when `CONFIG_MATH_FFT_MULTI` is enabled.



This patch adds HiFi3 versions of the functions dft3_32() and fft_multi_execute_32(). The functions are implemented in fft_multi_hifi3.c, and the generic versions are moved to fft_multi_generic.c. On the MTL platform the optimization saves 119 MCPS, from 237 MCPS to 118 MCPS. The test was done with the scripts:

scripts/rebuild-testbench.sh -p mtl
scripts/sof-testbench-helper.sh -x -m stft_process_1536_240_ \
	-p profile-stft_process.txt

The above STFT used an FFT length of 1536 with a hop of 240.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
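An FFT length of 1536 equals 3 × 512, which is presumably why a 3-point DFT stage is needed alongside the power-of-two FFTs. With W = exp(-2*pi*i/3) = -1/2 - i*sqrt(3)/2, the 3-point DFT factors as follows:

- y0 = x0 + (x1 + x2)
- y1 = m - i*(sqrt(3)/2)*(x1 - x2), where m = x0 - (x1 + x2)/2
- y2 = m + i*(sqrt(3)/2)*(x1 - x2)

This is why the review comment below shows constants for -0.5 and sqrt(3)/2. The sketch uses doubles, and the names struct cplx and dft3() are hypothetical. It omits the 1/3 scaling suggested by DFT3_SCALE, because the review does not show where that scaling is applied. It is not the SOF generic or HiFi3 code.

```c
/* Hypothetical complex type for this sketch; the SOF code uses Q1.31 integers. */
struct cplx {
	double re;
	double im;
};

/* Values that DFT3_COEFR and DFT3_COEFI encode in Q1.31. */
#define C_HALF (-0.5)
#define C_SIN60 0.86602540378443864676

/* Unscaled 3-point DFT: y[k] = sum_n x[n] * W^(n*k), W = exp(-2*pi*i/3). */
static void dft3(const struct cplx x[3], struct cplx y[3])
{
	double sr = x[1].re + x[2].re; /* s = x1 + x2 */
	double si = x[1].im + x[2].im;
	double dr = x[1].re - x[2].re; /* d = x1 - x2 */
	double di = x[1].im - x[2].im;
	double mr = x[0].re + C_HALF * sr; /* m = x0 - s/2 */
	double mi = x[0].im + C_HALF * si;

	y[0].re = x[0].re + sr;
	y[0].im = x[0].im + si;
	/* y1 = m - i*(sqrt(3)/2)*d; y2 = m + i*(sqrt(3)/2)*d */
	y[1].re = mr + C_SIN60 * di;
	y[1].im = mi - C_SIN60 * dr;
	y[2].re = mr - C_SIN60 * di;
	y[2].im = mi + C_SIN60 * dr;
}
```

For example, an impulse at x1 returns y1 = -0.5 - 0.866i and y2 = -0.5 + 0.866i, which are W and W^2.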

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


Copilot · 2026-03-20T17:17:01Z

src/math/fft/fft_multi_generic.c

+#ifdef DEBUG_DUMP_TO_FILE
+	FILE *fh1 = fopen("debug_fft_multi_int1.txt", "w");
+	FILE *fh2 = fopen("debug_fft_multi_int2.txt", "w");
+	FILE *fh3 = fopen("debug_fft_multi_twiddle.txt", "w");
+	FILE *fh4 = fopen("debug_fft_multi_dft_out.txt", "w");
+#endif


DEBUG_DUMP_TO_FILE block uses FILE, fopen(), fprintf(), and fclose() but this file does not include <stdio.h>. If DEBUG_DUMP_TO_FILE is enabled, this will fail to compile; include <stdio.h> within the same #ifdef DEBUG_DUMP_TO_FILE guard (or unconditionally if preferred).
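A minimal sketch of the fix the comment describes puts `#include <stdio.h>` under the same guard, so builds without DEBUG_DUMP_TO_FILE do not pull in stdio. The helper debug_dump_int() and its NULL check on fopen() are hypothetical additions for illustration and are not in the PR. DEBUG_DUMP_TO_FILE is defined here only so that the debug path compiles.

```c
/* Assumption: defined here only so this sketch exercises the debug path. */
#define DEBUG_DUMP_TO_FILE

#ifdef DEBUG_DUMP_TO_FILE
#include <stdio.h> /* FILE, fopen(), fprintf(), fclose() */
#endif

/*
 * Hypothetical helper: write n values to path, one per line.
 * Returns 0 on success or -1 if the file cannot be opened.
 */
static int debug_dump_int(const char *path, const int *v, int n)
{
#ifdef DEBUG_DUMP_TO_FILE
	FILE *fh = fopen(path, "w");

	if (!fh)
		return -1;
	for (int i = 0; i < n; i++)
		fprintf(fh, "%d\n", v[i]);
	fclose(fh);
#else
	(void)path;
	(void)v;
	(void)n;
#endif
	return 0;
}
```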

Copilot · 2026-03-20T17:17:01Z

src/math/fft/fft_multi_hifi3.c

+/** @brief Q1.31 constant -0.5 */
+#define DFT3_COEFR -1073741824
+/** @brief Q1.31 constant sqrt(3)/2 */
+#define DFT3_COEFI 1859775393
+/** @brief Q1.31 constant 1/3 */
+#define DFT3_SCALE 715827883
+
+/**
+ * dft3_32() - Compute 3-point DFT of Q1.31 complex data (HiFi3).
+ * @param x_in Pointer to 3 input complex samples in Q1.31.
+ * @param y Pointer to 3 output complex samples in Q1.31.


The DFT3 documentation here says the output is Q1.31, but the public API comment for dft3_32() in sof/math/fft.h describes the output as Q3.29. Please reconcile these so callers have a single, accurate definition of the output format (either update this comment to match the API, or adjust the API docs if they are incorrect).

Suggested change

-/** @brief Q1.31 constant -0.5 */
-#define DFT3_COEFR -1073741824
-/** @brief Q1.31 constant sqrt(3)/2 */
-#define DFT3_COEFI 1859775393
-/** @brief Q1.31 constant 1/3 */
-#define DFT3_SCALE 715827883
-/**
- * dft3_32() - Compute 3-point DFT of Q1.31 complex data (HiFi3).
- * @param x_in Pointer to 3 input complex samples in Q1.31.
- * @param y Pointer to 3 output complex samples in Q1.31.
+/** @brief Q3.29 constant -0.5 (same fixed-point format as dft3_32() inputs) */
+#define DFT3_COEFR -1073741824
+/** @brief Q3.29 constant sqrt(3)/2 */
+#define DFT3_COEFI 1859775393
+/** @brief Q3.29 constant 1/3 */
+#define DFT3_SCALE 715827883
+/**
+ * dft3_32() - Compute 3-point DFT of Q3.29 complex data (HiFi3).
+ * @param x_in Pointer to 3 input complex samples in Q3.29.
+ * @param y Pointer to 3 output complex samples in Q3.29.
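A fixed-point word v with f fractional bits represents v / 2^f. Decoding the three constants this way separates the question of the constants' format from the question of the samples' format. The helper q_to_double() below is hypothetical and is used only for this check.

```c
#include <stdint.h>

/* Interpret a signed 32-bit fixed-point word with frac_bits fractional bits. */
static double q_to_double(int32_t v, int frac_bits)
{
	return (double)v / (double)((int64_t)1 << frac_bits);
}
```

Read as Q1.31 (f = 31), the values decode as follows:

- DFT3_COEFR = -1073741824 is exactly -0.5.
- DFT3_COEFI = 1859775393 is about 0.8660254, which is sqrt(3)/2.
- DFT3_SCALE = 715827883 is about 0.3333333, which is 1/3.

Read as Q3.29 (f = 29), DFT3_COEFR would instead be -2.0. So whichever format is settled on for the dft3_32() samples, the constant comments appear to be correct as Q1.31.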

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 6acb7ad to 0fef7d3, March 20, 2026 16:00

singalsu marked this pull request as ready for review March 20, 2026 16:29

singalsu requested review from dbaluta, lbetlej, lgirdwood, mmaka1 and plbossart as code owners March 20, 2026 16:29

Copilot AI review requested due to automatic review settings March 20, 2026 16:29

singalsu requested a review from kv2019i as a code owner March 20, 2026 16:29

Copilot started reviewing on behalf of singalsu March 20, 2026 16:30

Copilot AI reviewed Mar 20, 2026


src/math/fft/fft_multi_generic.c (outdated, resolved)

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 0fef7d3 to 9e70d76, March 20, 2026 16:51

singalsu requested a review from Copilot March 20, 2026 17:11

Copilot started reviewing on behalf of singalsu March 20, 2026 17:12

Copilot AI reviewed Mar 20, 2026


singalsu wants to merge 2 commits into thesofproject:main from singalsu:math_fft_xtensa_hifi_optimizations

2 participants