I can find “tons of examples” for the PIC16 and smaller, but I cannot find anything for the PIC18F.
There are many web hits for PIC18F math, but they all turn out to be for the PIC16F and smaller.

I am trying to find the RMS value of 4 audio streams.
I have it running on a bigger computer.
Read in ADC(9:0) 10 bits.
Absolute value(7:0) 8 bits. Lose the sign and drop one bit.
Square ADC^2(15:0) 16 bits
Output(23:8)=Output-from last time(23:0)-((Output(23:8)-ADC^2(15:0))/256)
Convert to dB.
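The first three steps above can be sketched in C roughly like this. This is only a sketch under assumptions that are not stated in the post: the ADC result is taken as unsigned 0..1023 with silence sitting at mid-scale (512), and the one full-scale negative reading that would overflow 8 bits is clamped to 255. The names adc_to_magnitude and square_u8 are made up for illustration.

```c
#include <stdint.h>

/* Assumption: unsigned 10-bit ADC, silence at mid-scale. */
#define ADC_MIDSCALE 512

/* ADC(9:0) -> magnitude(7:0): remove the offset, lose the sign, drop one bit. */
uint8_t adc_to_magnitude(uint16_t adc10)
{
    int16_t centered = (int16_t)(adc10 & 0x03FF) - ADC_MIDSCALE;   /* -512..511 */
    uint16_t mag = (uint16_t)(centered < 0 ? -centered : centered); /* 0..512 */

    mag >>= 1;            /* drop one bit: 0..256 */
    if (mag > 255) {
        mag = 255;        /* clamp the single value that does not fit in 8 bits */
    }
    return (uint8_t)mag;
}

/* magnitude(7:0) -> ADC^2(15:0). 255 * 255 = 65025 still fits in 16 bits.
   On a PIC18 an 8 x 8 multiply like this maps onto the hardware multiplier. */
uint16_t square_u8(uint8_t mag)
{
    return (uint16_t)((uint16_t)mag * (uint16_t)mag);
}
```

Keeping the magnitude to 8 bits is what keeps the square inside 16 bits, so everything after this point stays in 16-bit and 24-bit arithmetic.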

I have the conversion to dB happening very fast.
The square is fast on the PIC18F, but very slow on the PIC16.
The average formula finds the e^-t average, just like a resistor and capacitor find the average. How does it work? Compare the input value to the average from last time. We are only concerned with the difference. The difference is divided by some amount (that is the time constant, 256) and subtracted from the average. In this type of averaging, recent samples greatly affect the output, but samples from hundreds of samples ago have only a little effect. (And it takes up very little RAM.)
It takes no code to divide by 256. Actually I want to divide by 1024, but I think there is no time to do the shifting. There is also a 24-bit minus 16-bit subtraction that is hard to do in PBP.
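For reference, the averaging step described above might look like this in C. It is a sketch, not the code from the post: the 24-bit average is held in the low 24 bits of a uint32_t, and the function names average_update and average_value are hypothetical. Bits 23:8 are the 16-bit average and bits 7:0 are the fraction, so subtracting the 16-bit difference directly from the 24-bit value is the divide by 256.

```c
#include <stdint.h>

/* One step of the RC-style average with a time constant of 256 samples.
   acc24: previous 24-bit accumulator (average scaled by 256).
   sq:    new squared sample, ADC^2(15:0).
   Returns the new 24-bit accumulator. */
uint32_t average_update(uint32_t acc24, uint16_t sq)
{
    uint16_t avg = (uint16_t)(acc24 >> 8);   /* Output(23:8), no shifting needed
                                                on an 8-bit CPU: just skip a byte */

    /* acc24 - (avg - sq), written so no intermediate goes negative.
       avg <= acc24, and the result never exceeds 0xFFFFFF. */
    acc24 = acc24 - avg + sq;

    return acc24 & 0x00FFFFFFu;
}

/* The 16-bit average that goes on to the dB conversion. */
uint16_t average_value(uint32_t acc24)
{
    return (uint16_t)(acc24 >> 8);
}
```

With a constant input the average creeps up to that input and stays there, the same way a capacitor charges through a resistor. A time constant of 1024 would need the difference shifted by 2 more bits before the subtraction, because the divide no longer lines up on a byte boundary.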
I coded part of it in PBP and looked at the output. PBP spends a lot of time moving the variables to temp, calling a subroutine, and moving the variables back from temp, then turning around and moving the same variables back into temp for the next call subroutine.

I am trying some nasty games like:
AverageUpper = bits 23:8 while AverageLower = bits 15:0 of the same 24-bit number, using overlapping memory space. That way I can use the 16-bit math in PBP.
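To show what the overlap means, here is the same layout written out in C. This is only an illustration of the idea, not PBP syntax: the type avg24_t and the four accessor names are invented, and the low byte is assumed to be stored first. The middle byte b[1] belongs to both 16-bit views, so writing one view changes the other.

```c
#include <stdint.h>

/* Three bytes of storage for the 24-bit average.
   b[0] = bits 7:0, b[1] = bits 15:8, b[2] = bits 23:16 (assumed byte order). */
typedef struct {
    uint8_t b[3];
} avg24_t;

/* AverageLower: bits 15:0, made from b[1]:b[0]. */
uint16_t avg_lower(const avg24_t *a)
{
    return (uint16_t)(a->b[0] | ((uint16_t)a->b[1] << 8));
}

/* AverageUpper: bits 23:8, made from b[2]:b[1]. */
uint16_t avg_upper(const avg24_t *a)
{
    return (uint16_t)(a->b[1] | ((uint16_t)a->b[2] << 8));
}

void avg_set_lower(avg24_t *a, uint16_t v)
{
    a->b[0] = (uint8_t)(v & 0xFF);
    a->b[1] = (uint8_t)(v >> 8);
}

void avg_set_upper(avg24_t *a, uint16_t v)
{
    a->b[1] = (uint8_t)(v & 0xFF);
    a->b[2] = (uint8_t)(v >> 8);
}
```

The catch with this trick is the carry or borrow out of the lower 16 bits: 16-bit math on AverageLower does not propagate into b[2] by itself, so that byte still has to be adjusted by hand.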

Considering the speed problem, I should use assembly. I had it running on a PIC10 in assembler, just for a joke. If a PIC10 running at 4 MHz with no multiply can do the job s-l-o-w-l-y, then it should run on a PIC18 at 40 MHz.

I want to keep PBP for the LCD and IO functions.