I'm trying to generate multiple audio tones using a DAC/resistor ladder (currently 4-bit / 16 values; I will scale up eventually), and right now I'm trying to speed up the loop below for a single tone. Originally I was using a sine lookup table, but for multiple tones it appears to be too slow. I'm now using a roughed-in binary version of Bhaskara I's sine approximation formula, which eliminates any lookups.
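
For reference, here is a sketch of how Bhaskara I's approximation can be mapped onto a 128-step half cycle. The step index $t$, the shorthand $r$, and the scale factor 8 are read off the code in this post, so treat this as one interpretation rather than a definitive derivation.

$$
\begin{aligned}
\sin\theta &\approx \frac{16\,\theta(\pi-\theta)}{5\pi^{2}-4\,\theta(\pi-\theta)}, \qquad 0 \le \theta \le \pi \\
\theta = \frac{\pi t}{128},\quad r = t\,(128-t) \;&\Rightarrow\; \sin\theta \approx \frac{16\,r}{5\cdot 128^{2}-4\,r} = \frac{4\,r}{20480-r} \\
8\sin\theta &\approx \frac{32\,r}{20480-r} = \frac{8\,r}{(20480-r)/4}
\end{aligned}
$$

The last form matches a numerator of `range<<3` and a denominator of `(20480-range)>>2`, with the result truncated by integer division.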

However, my current implementation is also slower than I'd like. I will move up to a 20 MHz oscillator at some point, but would like to get this loop under roughly 25 µs with a 4 MHz oscillator if possible.
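
To put that target in perspective, the budget can be worked out with a small off-target calculation. This is a rough sketch in Python, not PIC code. It assumes a PIC where one instruction cycle takes four oscillator clocks, and that one sample is written per loop pass with 256 passes per tone period (because `timebyte` is a byte). The function names are made up for illustration.

```python
def instruction_cycles(budget_us, fosc_mhz, clocks_per_cycle=4):
    """Whole instruction cycles that fit in budget_us microseconds.

    Assumes one instruction cycle = clocks_per_cycle oscillator clocks.
    """
    return int(budget_us * fosc_mhz // clocks_per_cycle)


def tone_hz(loop_us, steps_per_period=256):
    """Tone frequency if each loop pass outputs one of steps_per_period samples."""
    return 1_000_000 / (loop_us * steps_per_period)


if __name__ == "__main__":
    print(instruction_cycles(25, 4))    # cycle budget at 4 MHz
    print(instruction_cycles(25, 20))   # cycle budget at 20 MHz
    print(tone_hz(25))                  # tone frequency for a 25 us loop
```

Under those assumptions a 25 µs loop leaves about 25 instruction cycles at 4 MHz and 125 at 20 MHz, and a 256-step waveform at 25 µs per step comes out at roughly 156 Hz.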

My question is: which instructions in the loop below are computationally expensive, and what changes would speed it up? I'm also not sure how much additional overhead I incur with word variable operations as opposed to byte variable operations.

Thanks in advance!
Dave

Code:
define OSC 4                  ' 4 MHz oscillator

cmcon=7                       ' comparators off

timebyte    var byte
x           var byte
range       var word
ampvar      var byte
ampvar2     var byte
nvar        var word
dvar        var word
fvar        var byte
negflip     var byte

pausec1     con 25            ' not used in the loop below

TRISB = %00000000             ' PORTB all outputs (drives the resistor ladder)

timebyte=0

high porta.1

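loop1:
x=timebyte & 127              ' position within the half cycle (0-127)
range=128-x
range=range*x                 ' range = x*(128-x), at most 4096
nvar=range<<3                 ' numerator: 8*range
dvar=20480-range
dvar=dvar>>2                  ' denominator: (20480-range)/4
fvar=nvar/dvar                ' word divide, result 0-8
ampvar=8+fvar                 ' first half cycle: 8+fvar
negflip=2*timebyte.7          ' 0 for first half cycle, 2 for second
ampvar2=ampvar-negflip*fvar   ' second half cycle becomes 8-fvar
PORTB=ampvar2                 ' write sample to the resistor ladder
timebyte=timebyte+1           ' byte wraps from 255 back to 0
goto loop1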
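
For checking the output values without the hardware, the loop arithmetic can be modelled on a PC. The sketch below is Python rather than PICBASIC, and it assumes byte and word variables behave as unsigned 8-bit and 16-bit values with truncating integer division. The function name `sample` and the masks are illustrative only.

```python
def sample(timebyte):
    """Model one pass of loop1 and return the value written to PORTB."""
    x = timebyte & 127                       # position within the half cycle
    rng = ((128 - x) * x) & 0xFFFF           # word: x*(128-x)
    nvar = (rng << 3) & 0xFFFF               # word: 8*range
    dvar = ((20480 - rng) & 0xFFFF) >> 2     # word: (20480-range)/4
    fvar = (nvar // dvar) & 0xFF             # byte: truncating divide
    ampvar = (8 + fvar) & 0xFF               # byte: first half cycle value
    negflip = 2 * ((timebyte >> 7) & 1)      # 0 or 2 depending on bit 7
    return (ampvar - negflip * fvar) & 0xFF  # second half becomes 8-fvar


if __name__ == "__main__":
    table = [sample(t) for t in range(256)]
    print(table)
    print(min(table), max(table))
```

By this model the output runs from 0 to 16 rather than 0 to 15: the value 16 appears only at `timebyte` = 64, which would need a fifth bit on the ladder. The model says nothing about execution time, only about the values produced.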