That's the thing... it depends on the bits in the divisor and the dividend.
I'm sure you know how binary division works: set a result bit, shift everything over one, and subtract. If there's a carry, add it back in, reset the result bit, and try again.
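For anyone who wants to see that loop spelled out, here's a rough sketch in Python rather than PIC code. The function name `restoring_divide` and the `bits` parameter are mine, not from the code posted earlier, and I'm treating "the subtraction went negative" as the carry condition.

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned shift-and-subtract division (illustrative sketch only)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    dividend &= (1 << bits) - 1
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift everything over one, pulling in the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Set the result bit, then try the subtraction.
        quotient = (quotient << 1) | 1
        remainder -= divisor
        if remainder < 0:
            # It didn't fit: add the divisor back in and reset the result bit.
            remainder += divisor
            quotient &= ~1
    return quotient, remainder
```

Calling `restoring_divide(100, 7)` walks all 32 bits no matter how small the operands are, which is where the wasted cycles come from.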
Like I said, I used your little error-checker, and the only error I can get out of the last code posted is 0/0. Everything else checks out fine (unless my implementation is wrong!).
As it is, I changed the code a bit to only send out to the LCD once every 256 loops (if bl.byte0=0 then -do the lcd-). So, obviously, the LCD was taking up loads of cycles, which it does anyway.
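The throttling boils down to something like the sketch below. It's Python, not the actual PIC code: `run_loop`, `update_display` and the 16-bit `counter` are made-up names, with `counter & 0xFF` standing in for the `bl.byte0` test.

```python
def run_loop(iterations, update_display):
    """Call update_display only when the counter's low byte rolls over to 0."""
    counter = 0
    for _ in range(iterations):
        counter = (counter + 1) & 0xFFFF  # assumed 16-bit loop counter
        if counter & 0xFF == 0:           # low byte is zero once every 256 loops
            update_display(counter)
```

With this shape the display routine runs on 1 loop in 256, so the rest of the loops run without the LCD overhead.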
The optimizations as they stand right now (the right shift version) do speed things up a little overall, similar to how a right shift saves time over a divide by 2. In some cases, though, they use a few more cycles.
I know the 'left shift loop reduction' method on R0 and R1 has decent grounding. Just can't figure out how to implement it without wrecking the results!
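Here's one way the idea could look, as a Python sketch and not something I've got working on the PIC. I'm assuming the reduction is applied to the dividend only; the function name and the extra `loops` return value are just for illustration. Leading zero bits in the dividend only ever produce zero result bits and leave the remainder at zero, so shifting them out before the loop shouldn't change the answer.

```python
def divide_skip_leading_zeros(dividend, divisor, bits=32):
    """Shift-and-subtract division that skips the dividend's leading zeros."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    mask = (1 << bits) - 1
    top = 1 << (bits - 1)
    dividend &= mask
    loops = bits
    # Left-shift until the top bit is set, dropping one loop per shift.
    while loops > 0 and not (dividend & top):
        dividend = (dividend << 1) & mask
        loops -= 1
    quotient = 0
    remainder = 0
    for _ in range(loops):
        # Move the current top bit of the dividend into the remainder.
        remainder = (remainder << 1) | (1 if dividend & top else 0)
        dividend = (dividend << 1) & mask
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder, loops
```

The catch is that the pre-shift loop costs cycles too, so for operands that already fill most of the 32 bits it can end up slower than the plain version.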
It might be easier to just use 4 different versions, where the version chosen is based on the larger of R0 and R1 (32, 24, 16 and 8 bit versions). Will use a bit more memory though...
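Roughly what I have in mind, sketched in Python. Assumptions: R0 is the dividend, R1 is the divisor, and `pick_width` / `divide_sized` are hypothetical names. On the PIC these would be four separate unrolled routines; here one loop with a variable bit count stands in for all four.

```python
def pick_width(r0, r1):
    """Return the smallest of 8, 16, 24 or 32 bits that holds the larger operand."""
    larger = max(r0, r1)
    for width in (8, 16, 24, 32):
        if larger < (1 << width):
            return width
    raise ValueError("operand does not fit in 32 bits")


def divide_sized(r0, r1):
    """Divide r0 by r1 using only as many loop passes as the chosen width."""
    if r1 == 0:
        raise ZeroDivisionError("divisor is zero")
    width = pick_width(r0, r1)
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        remainder = (remainder << 1) | ((r0 >> i) & 1)
        quotient <<= 1
        if remainder >= r1:
            remainder -= r1
            quotient |= 1
    return quotient, remainder, width
```

So `divide_sized(100000, 7)` would take the 24-bit path, and anything where both operands fit in a byte would only loop 8 times.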