By the way: the ARM instruction set isn't inherently slower than Thumb. The CPU executes both at the same speed. ARM code only ends up slower when it runs from the game cartridge, because the cartridge sits on a 16 bit memory bus, so every 32 bit ARM opcode needs two fetches while a 16 bit Thumb opcode needs only one. IWRAM, on the other hand, has a 32 bit bus and can read or write a 32 bit integer in one CPU cycle. That is why ARM is commonly used in sound rendering routines that are executed from IWRAM: you get the larger set of available ARM instructions, and you need fewer instructions (not fewer bytes) for the same work, which makes the code actually faster than the Thumb equivalent.
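To put rough numbers on the bus width argument, here is a minimal sketch in C. It is a deliberately simplified model, not a cycle-accurate one: it only counts how many bus accesses it takes to fetch opcodes, and it ignores wait states, prefetch and data accesses. The function names `fetch_accesses` and `total_fetches` are made up for this illustration.

```c
/* Simplified model: counts opcode fetches only.
   Ignores wait states, prefetch and data accesses (assumption). */

/* Bus accesses needed to fetch one opcode of `opcode_bits`
   over a bus that is `bus_bits` wide, rounded up. */
static unsigned fetch_accesses(unsigned opcode_bits, unsigned bus_bits)
{
    return (opcode_bits + bus_bits - 1) / bus_bits;
}

/* Bus accesses needed to fetch `instructions` opcodes of the given size. */
static unsigned total_fetches(unsigned instructions,
                              unsigned opcode_bits,
                              unsigned bus_bits)
{
    return instructions * fetch_accesses(opcode_bits, bus_bits);
}
```

With these, `fetch_accesses(32, 16)` (ARM opcode over the 16 bit cartridge bus) gives 2, while `fetch_accesses(16, 16)` (Thumb over the same bus) and `fetch_accesses(32, 32)` (ARM from IWRAM) both give 1. So from IWRAM an ARM opcode costs no more to fetch than a Thumb one, and any routine that needs fewer ARM instructions than Thumb instructions comes out ahead in this model.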
... Just to clarify things ;-)