Consider a DSP, an ARM processor, or an FPGA.
DSPs are built to do this job.
An ARM processor is much faster and has large amounts of RAM.
I have done this on an FPGA, though with video rather than audio.