Quote: Originally Posted by Darrel Taylor
That's all stuff that can be done later if things work out.
I think right now, to prove that it does "optimize" the divide routine, you need a "Base Line".
How long does it take PBP to do a single signed 32-bit divide?
And how long does yours take?
Just a few tests at various bit sizes will tell if there's any benefit or not.
That's the thing... the timing depends on the bits in the divisor and the dividend.
I'm sure you know how binary division works: set a result bit, shift everything over one, and subtract; if the subtract borrows, add the divisor back in, reset the result bit, and try again.
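For anyone following along, here's a rough C sketch of that shift/subtract/restore loop. It's unsigned only, it isn't PBP's library routine or my posted code, and the names (`udiv32_restoring`, `udiv32_result`) are made up for illustration. The 64-bit remainder is just a convenience on a PC; on a PIC you'd be watching the carry flag instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
    bool     ok;    /* false when the divisor is 0 (covers 0/0 too) */
} udiv32_result;

/* Unsigned 32-bit restoring division: one quotient bit per loop pass. */
udiv32_result udiv32_restoring(uint32_t dividend, uint32_t divisor)
{
    udiv32_result res = { 0, 0, false };
    if (divisor == 0) {
        return res;                 /* x/0 has no meaningful result */
    }

    uint32_t q = 0;
    uint64_t rem = 0;               /* wide enough that the shift cannot overflow */

    for (int bit = 31; bit >= 0; bit--) {
        /* Shift everything over one and bring down the next dividend bit. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        q <<= 1;

        /* Trial subtract. Comparing first has the same effect as subtracting,
           seeing a borrow, and adding the divisor back in. */
        if (rem >= divisor) {
            rem -= divisor;
            q |= 1u;                /* the subtract fit, so keep the result bit */
        }
    }

    res.quot = q;
    res.rem  = (uint32_t)rem;
    res.ok   = true;
    return res;
}
```

With this sketch, `udiv32_restoring(100, 7)` gives a quotient of 14 and a remainder of 2, and any divide by zero comes back with `ok` set to false.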
Like I said, I used your little error-checker, and the only error I could get out of the last code posted was 0/0. Everything else checks out good (unless my implementation is wrong!).
As it is, I changed the code a bit to only write to the LCD once every 256 loops (if bl.byte0=0 then -do the lcd-). The LCD was obviously taking up loads of cycles, which it does anyway.
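The every-256-loops trick is just a test on the low byte of the loop counter. A minimal C sketch, assuming `bl` is the test loop's counter (the function names here are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True whenever the low byte of the counter is zero, i.e. once every
   256 passes. This mirrors the bl.byte0=0 test in the post. */
bool lcd_update_due(uint32_t loop_count)
{
    return (loop_count & 0xFFu) == 0;
}

/* Counts how many LCD writes a run of `loops` passes would trigger,
   starting from a counter value of 0. */
uint32_t count_lcd_updates(uint32_t loops)
{
    uint32_t updates = 0;
    for (uint32_t i = 0; i < loops; i++) {
        if (lcd_update_due(i)) {
            updates++;              /* the real loop would write to the LCD here */
        }
    }
    return updates;
}
```

So 1024 passes of the test loop only touch the LCD 4 times instead of 1024.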
The optimizations as they stand right now (the right shift version) do speed things up a little overall, much as a right shift saves time over a divide by 2. In some cases, though, they use a few more cycles.
I know the 'left shift loop reduction' method on R0 and R1 has decent grounding. I just can't figure out how to implement it without wrecking the results!
It might be easier to just use 4 different versions, where the version chosen is based on the larger of R0 and R1 (32, 24, 16 and 8 bit versions). That will use a bit more memory, though...
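Here's a rough C sketch of that idea, assuming R0 holds the dividend and R1 the divisor (that mapping is my assumption for the example). Instead of four copies of the routine it uses one loop with a variable pass count, which is the same thing logically; all the function names are hypothetical, and it's unsigned only.

```c
#include <stdint.h>

/* Smallest of 8, 16, 24 or 32 bits that can hold v. */
unsigned width_bucket(uint32_t v)
{
    if (v <= 0xFFu)     return 8;
    if (v <= 0xFFFFu)   return 16;
    if (v <= 0xFFFFFFu) return 24;
    return 32;
}

/* Restoring divide that only loops over the low `bits` bits of the dividend.
   The caller must guarantee dividend < 2^bits and divisor != 0. */
uint32_t udiv_n(uint32_t dividend, uint32_t divisor, unsigned bits)
{
    uint32_t q = 0;
    uint64_t rem = 0;
    for (int bit = (int)bits - 1; bit >= 0; bit--) {
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        q <<= 1;
        if (rem >= divisor) {
            rem -= divisor;
            q |= 1u;
        }
    }
    return q;
}

/* Picks the pass count from the larger of the two operands, as in the post.
   r1 must be non-zero. */
uint32_t udiv_by_width(uint32_t r0, uint32_t r1)
{
    uint32_t larger = (r0 > r1) ? r0 : r1;
    return udiv_n(r0, r1, width_bucket(larger));
}
```

The skipped passes are safe because every bit above the chosen width is zero in the dividend, so those passes would only have shifted zeros into the quotient and remainder. For example, 200/7 runs 8 passes instead of 32.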