I wonder whether shifting out all 16 values in a single SHIFTOUT would be faster than looping. Darn zeros.
Code:
shiftout dpin, clk, MSBFIRST, [DATA,DATA,DATA,DATA,DATA,...]
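The bits on the wire should be identical either way. Any speed difference would come from the per-command overhead of the loop, because the Stamp has to fetch and decode the FOR/NEXT and SHIFTOUT tokens on every pass. Below is a minimal Python model of that idea. It is not Stamp code and it does not measure timing. It assumes mode 1 means MSB first with 8 bits per value, which is the SHIFTOUT default. The names `shift_out_msb`, `looped` and `batched` are hypothetical. The model only counts commands issued and checks that both approaches emit the same bit stream.

```python
from typing import List, Tuple


def shift_out_msb(values: List[int], wire: List[int]) -> None:
    """Hypothetical model of SHIFTOUT mode 1 (MSB first), 8 bits per value.

    Each bit appended to `wire` stands in for one clock pulse on the data pin.
    """
    for v in values:
        for bit in range(7, -1, -1):  # most significant bit first
            wire.append((v >> bit) & 1)


def looped(values: List[int]) -> Tuple[List[int], int]:
    """One shift-out command per value, as a FOR...NEXT loop would issue."""
    wire: List[int] = []
    calls = 0
    for v in values:
        shift_out_msb([v], wire)
        calls += 1
    return wire, calls


def batched(values: List[int]) -> Tuple[List[int], int]:
    """A single shift-out command carrying all values in its output list."""
    wire: List[int] = []
    shift_out_msb(values, wire)
    return wire, 1


vals = [i * 17 for i in range(16)]  # arbitrary test data, 0..255
w1, c1 = looped(vals)
w2, c2 = batched(vals)
print(f"looped: {c1} commands, batched: {c2} command, {len(w1)} bits each")
print("same bit stream:", w1 == w2)
```

The model shows only that batching cuts 16 commands down to 1 without changing what the receiving chip sees. Whether that saving is noticeable on real hardware would need to be checked with a scope or a timing test.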