Development: ipatix' High Quality Sound Mixer | V2.1 released!

ipatix · Apr 24, 2014

ipatix' High Quality Sound Mixer V2.1

Introduction:
If you are not interested in the technical details, skip to assembly and insertion. This snippet can be used by any hack, whether or not it uses music hacks. It also improves the vanilla music quality at no cost.

Hello and welcome to a new development thread of mine.
As you might already know from the thread's name, I developed a new sound mixing routine for GBA games that use the M4A driver (aka the Sappy driver).
But what is the sound mixer? To understand what it does, you need to know how digital sound is produced and what hardware abilities the AGB has.
To put it very simply: the AGB only has 2 hardware channels for sampled (PCM) sound playback (usually one for the left and one for the right speaker). With these 2 channels alone we could only play 1 stereo sound at a time. This is where the mixer and a resampler come in handy: the mixer "mixes" several sounds together and produces one output sound. This lets us play back more sounds at the same time, and in combination with a resampler we can play back any sound at any given sampling rate (--> variable pitch for notes) at the same time.
All of this is done by Nintendo's sound engine (with some mods by Game Freak), which comes with their SDK. Sounds cool, doesn't it?
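
To make the mixing and resampling idea a bit more concrete, here is a minimal C sketch. It is not the engine's actual code: the function name mix_resampled, the mono int32_t mix buffer and the plain volume multiply are assumptions made for illustration. Only the 23 fractional bits of the read position are borrowed from the routine further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of fractional bits of the read position (same width as the
 * "fine position" of the routine in this thread). */
#define FRAC_BITS 23

/* Hypothetical helper: resamples one signed 8 bit source with linear
 * interpolation and ADDS it to a mix buffer, so it can be called once
 * per sound. `step` = source rate / output rate in fixed point. */
void mix_resampled(int32_t *mix, size_t out_len,
                   const int8_t *src, size_t src_len,
                   uint32_t step, int volume)
{
    assert(mix != NULL && src != NULL && src_len >= 2);
    uint64_t pos = 0; /* fixed-point read position inside src */

    for (size_t i = 0; i < out_len; i++) {
        size_t idx = (size_t)(pos >> FRAC_BITS);
        if (idx + 1 >= src_len)
            break; /* one-shot sound has ended */

        int32_t frac = (int32_t)(pos & ((1u << FRAC_BITS) - 1));
        int32_t base = src[idx];
        int32_t delta = src[idx + 1] - base;

        /* interpolated sample = base + delta * fractional position */
        int32_t s = base +
            (int32_t)(((int64_t)delta * frac) / ((int64_t)1 << FRAC_BITS));

        mix[i] += s * volume; /* mixing is just a scaled addition */
        pos += step;          /* a bigger step means a higher pitch */
    }
}
```

Calling this once per sound with a different `step` each time gives several notes of different pitch in a single output buffer, which is the job the mixer and resampler do together.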

There is one major flaw in the design of the mixer, though:
The sound mixer produces a short period of sound each frame (~1/60 s), which gets placed in a buffer in memory. This data is then transferred by hardware timers and DMAs to the sound circuit for playback. Since the AGB only supports 8 bit resolution for audio samples, this buffer must have an 8 bit depth. Because Nintendo wanted their code to use fewer system resources and less RAM, they also use this output buffer as the work area for the actual mixing. This might not sound very problematic, but a sound with 8 bit resolution has audible quantization noise. For a single sound this noise is pretty low. However, each time the mixer adds another sound from a virtual channel (these are also called Direct Sound channels, although they have nothing to do with Microsoft's DirectSound), it adds more quantization noise to the buffer, because of the volume scaling that is always done (we can't play all channels at a fixed volume). Because the quantization noise is applied once per channel, it gets really loud and really annoying (even in some commercial titles, not Pokemon though). In Pokemon games it mostly goes unnoticed, partly because most listeners don't have a trained ear and partly because the number of virtual Direct Sound channels is limited to 5. 5 Direct Sound channels aren't much though (if you have ever done music hacking), and I personally have been using the 12 channel hack for quite a long time. This makes the noise way worse, though, and it sometimes made the music sound really bad.
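
The effect of scaling every channel down to 8 bits before adding it can be sketched in C. This is a simplified model, not Nintendo's code: the function names are made up, and rounding by truncating division is an assumption about how the precision gets lost.

```c
#include <assert.h>
#include <stdint.h>

/* Model of mixing in an 8 bit work area: every channel is scaled by its
 * volume (0..255 meaning 0..255/256) and reduced to 8 bit resolution
 * BEFORE it is added, so each channel loses its own fraction. */
int mix_8bit(const int8_t *samples, const uint8_t *volumes, int n)
{
    assert(n > 0);
    int acc = 0;
    for (int ch = 0; ch < n; ch++)
        acc += (samples[ch] * volumes[ch]) / 256; /* error once per channel */
    return acc;
}

/* Reference: add everything at full precision, scale down only once. */
int mix_exact(const int8_t *samples, const uint8_t *volumes, int n)
{
    assert(n > 0);
    int acc = 0;
    for (int ch = 0; ch < n; ch++)
        acc += samples[ch] * volumes[ch];
    return acc / 256; /* error once in total */
}
```

With 12 channels that each play a sample value of 3 at volume 64, mix_8bit returns 0 while mix_exact returns 9: the per-channel error grows with the channel count, which matches why the 12 channel hack sounds worse than 5 channels.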

Then I came up with a solution for this:
Let's use a work area with a higher bit depth (e.g. the commonly used 16 bits) to eliminate quantization noise during the mixing process, and only add the noise once, in the final downscaling to the main output buffer. The only problem is that we need an additional "work buffer" in IRAM. We need to use IRAM and not regular WRAM because of the execution performance we need. This is also why the mixing routine is really complicated and very annoying to read: it is written to get the best performance possible (Nintendo's original does so too, mine is even worse in that aspect). The other thing that is necessary to run things as fast as possible is that the mixing routine is placed in IRAM as well, for faster code loading and the ability to use the ARM instruction set with no performance cost, which reduces the overall number of instructions.
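
A rough C sketch of the work buffer idea follows. The layout is an assumption for illustration (a mono int16_t buffer with 4 extra fractional bits, which has headroom for up to 16 channels); the real routine below packs its data differently. The names mix_channel_16 and downscale_to_8bit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extra fractional bits kept in the work buffer (assumed value). */
#define WORK_FRAC_BITS 4

/* Add one channel to the 16 bit work buffer. volume is 0..255 (meaning
 * 0..255/256); only 4 of the 8 fractional bits are dropped here, so the
 * per-channel error is 16 times smaller than with an 8 bit buffer. */
void mix_channel_16(int16_t *work, const int8_t *src, size_t len,
                    uint8_t volume)
{
    assert(work != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        work[i] = (int16_t)(work[i] +
                            (src[i] * volume) / (1 << WORK_FRAC_BITS));
}

/* Final step: reduce to 8 bits once and clamp to the output range. */
void downscale_to_8bit(int8_t *out, const int16_t *work, size_t len)
{
    assert(out != NULL && work != NULL);
    for (size_t i = 0; i < len; i++) {
        int s = work[i] / (1 << WORK_FRAC_BITS);
        if (s > 127)
            s = 127;  /* prevent clipping artifacts from wrap-around */
        if (s < -128)
            s = -128;
        out[i] = (int8_t)s;
    }
}
```

The work buffer is cleared once per frame, mix_channel_16 is called for every active channel, and downscale_to_8bit writes the result to the 8 bit output buffer that the DMA plays back.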
Originally I disassembled Nintendo's mixer code and found out how it worked. I then developed the first version of this code, which was just a slight modification of the original but was enough to realize my initial goal. It worked, but it had a few issues and the code was a bit slower than the original. I need to go a little off-topic to explain how I came to write version 2 of my mixer:
Well, a friend showed me this game called "Golden Sun" (if you haven't played it, I really recommend doing so!). I played it through and especially liked its soundtrack. Being curious, I tried to extract the music, discovered that the game also uses the M4A sound driver, and got the MIDIs I wanted. Nonetheless, I was even more impressed by the high sound quality of "Golden Sun" and "Golden Sun TLA". It took me some time, but eventually I realised that the games had an incredibly clear sound and a low noise level. All of that was before I ever intended to write any of this. Doing Pokemon hacks, I was just like "I need this high quality sound too", and after writing the first version of my high quality mixer I had the idea of copying whatever Camelot did to improve the sound quality into my new code base. With no idea what I was getting into, I fired up a few debugging tools and tried to find out what the game was doing differently from all other GBA games with the M4A driver. Well, it took me quite some time to read the ugliest assembly code I've ever seen in my life, but I managed to understand and document it. What surprised me even more was that Golden Sun's sound ran at way higher sampling rates than most games did. How could all of that work? Well, I found out that their code was simply 30-100% faster (scaling better with higher sampling rates) than Nintendo's original while still providing higher quality. This led me to where I currently am, writing the new code with the "power of Golden Sun". In the end I didn't copy everything they had, such as an even more obscure reverb algorithm. But even without the fancy reverb, the code has served me and a few other guys out there well (check Bregalad's new Final Fantasy Advance Sound Restorations!)

The Routine:
Version V2.1 is currently being refactored and cleaned up. I don't really work on it actively, but hopefully it'll end up more readable and better documented than it has been before. In the future I might move it to my GitHub, but for now it remains here. Try to understand things ;) I'll definitely be impressed if you do!

Code:

@ created by ~ipatix~
@ revision 2.1

    /* globals */
	.global	main_mixer
	.global	main_mixer_end

    /* game code definitions */
	.equ	GAME_BPED, 0
	.equ	GAME_BPEE, 1
	.equ	GAME_BPRE, 2
	.equ	GAME_KWJ6, 3
	.equ	GAME_AE7E, 4
	.equ	GAME_BPRD, 5

    /* SELECT USED GAME HERE */
	.equ	USED_GAME, GAME_BPRE		@ CHOOSE YOUR GAME

	.equ	FRAME_LENGTH_5734, 0x60
	.equ	FRAME_LENGTH_7884, 0x84	    @ THIS MODE IS NOT SUPPORTED BY THIS ENGINE BECAUSE IT DOESN'T USE AN 8 ALIGNED BUFFER LENGTH
	.equ	FRAME_LENGTH_10512, 0xB0
	.equ	FRAME_LENGTH_13379, 0xE0	@ DEFAULT
	.equ	FRAME_LENGTH_15768, 0x108
	.equ	FRAME_LENGTH_18157, 0x130
	.equ	FRAME_LENGTH_21024, 0x160
	.equ	FRAME_LENGTH_26758, 0x1C0
	.equ	FRAME_LENGTH_31536, 0x210
	.equ	FRAME_LENGTH_36314, 0x260
	.equ	FRAME_LENGTH_40137, 0x2A0
	.equ	FRAME_LENGTH_42048, 0x2C0

	.equ	DECODER_BUFFER_BPE, 0x03001300
	.equ	DECODER_BUFFER_BPR, 0x03002088
	.equ	DECODER_BUFFER_KWJ, 0x03005800

	.equ	BUFFER_IRAM_BPE, 0x03001AA8
	.equ	BUFFER_IRAM_BPR, 0x030028E0
	.equ	BUFFER_IRAM_KWJ, 0x03005840
	.equ	BUFFER_IRAM_AE7, 0x03006D60	@ PUT THE WORKBUFFER ADDRESS FOR FIRE EMBLEM HERE!!!

    /* stack variables */
	.equ	ARG_FRAME_LENGTH, 0x0       @ TODO actually use this variable
	.equ	ARG_REMAIN_CHN, 0x4         @ This is the channel count variable    
	.equ	ARG_BUFFER_POS, 0x8         @ stores the current output buffer pointer
	.equ	ARG_LOOP_START_POS, 0xC     @ stores wave loop start position in channel loop
	.equ	ARG_LOOP_LENGTH, 0x10       @ stores wave loop length in channel loop
@   .equ    ARG_UKNOWN, 0x14 
	.equ	ARG_VAR_AREA, 0x18          @ pointer to the engine's main work area

    /* channel struct */
	.equ	CHN_STATUS, 0x0             @ [byte] channel status bitfield
	.equ	CHN_MODE, 0x1               @ [byte] channel mode bitfield
	.equ	CHN_VOL_1, 0x2              @ [byte] volume right
	.equ	CHN_VOL_2, 0x3              @ [byte] volume left
	.equ	CHN_ATTACK, 0x4             @ [byte] wave attack summand
	.equ	CHN_DECAY, 0x5              @ [byte] wave decay factor
	.equ	CHN_SUSTAIN, 0x6            @ [byte] wave sustain level
	.equ	CHN_RELEASE, 0x7            @ [byte] wave release factor
	.equ	CHN_ADSR_LEVEL, 0x9         @ [byte] current envelope level
	.equ	CHN_FINAL_VOL_1, 0xA		@ [byte] not used anymore!
	.equ	CHN_FINAL_VOL_2, 0xB		@ [byte] not used anymore!
	.equ	CHN_ECHO_VOL, 0xC           @ [byte] pseudo echo volume
	.equ	CHN_ECHO_REMAIN, 0xD        @ [byte] pseudo echo length
	.equ	CHN_POSITION_REL, 0x18		@ [word] sample countdown in mixing loop
	.equ	CHN_FINE_POSITION, 0x1C     @ [word] inter sample position (23 bits)
	.equ	CHN_FREQUENCY, 0x20         @ [word] sample rate (in Hz)
	.equ	CHN_WAVE_OFFSET, 0x24       @ [word] wave header pointer
	.equ	CHN_POSITION_ABS, 0x28		@ [word] points to the current position in the wave data (relative offset for compressed samples)
	.equ	CHN_BLOCK_COUNT, 0x3C       @ [word] only used for compressed samples: contains the value of the block that is currently decoded

    /* wave header struct */
	.equ	WAVE_LOOP_FLAG, 0x3         @ [byte] 0x0 = oneshot; 0x40 = looped
	.equ	WAVE_FREQ, 0x4              @ [word] pitch adjustment value = mid-C samplerate * 1024
	.equ	WAVE_LOOP_START, 0x8        @ [word] loop start position
	.equ	WAVE_LENGTH, 0xC            @ [word] loop end / wave end position
    .equ    WAVE_DATA, 0x10             @ [byte array] actual wave data

    /* pulse wave synth configuration offset */
	.equ	SYNTH_BASE_WAVE_DUTY, 0x1   @ [byte]
	.equ	SYNTH_WIDTH_CHANGE_1, 0x2   @ [byte]
	.equ	SYNTH_MOD_AMOUNT, 0x3       @ [byte]
	.equ	SYNTH_WIDTH_CHANGE_2, 0x4   @ [byte]

    /* CHN_STATUS flags - 0x0 = OFF */
	.equ	FLAG_CHN_INIT, 0x80         @ [bit] write this value to init a channel
	.equ	FLAG_CHN_RELEASE, 0x40      @ [bit] write this value to release (fade out) the channel
	.equ	FLAG_CHN_COMP, 0x20         @ [bit] is wave being played compressed (yes/no)
	.equ	FLAG_CHN_LOOP, 0x10         @ [bit] loop (yes/no)
	.equ	FLAG_CHN_ECHO, 0x4          @ [bit] echo phase
	.equ	FLAG_CHN_ATTACK, 0x3        @ [bit] attack phase
	.equ	FLAG_CHN_DECAY, 0x2         @ [bit] decay phase
	.equ	FLAG_CHN_SUSTAIN, 0x1       @ [bit] sustain phase

    /* CHN_MODE flags */
	.equ	MODE_FIXED_FREQ, 0x8        @ [bit] set to disable resampling (i.e. playback with output rate)
	.equ	MODE_REVERSE, 0x10          @ [bit] set to reverse sample playback
	.equ	MODE_COMP, 0x30             @ [bit] is wave being played compressed or reversed (TODO: rename flag)
	.equ	MODE_SYNTH, 0x40            @ [bit] READ ONLY, indicates synthesized output

    /* variables of the engine work area */
	.equ	VAR_REVERB, 0x5             @ [byte] 0-127 = reverb level
	.equ	VAR_MAX_CHN, 0x6            @ [byte] maximum channels to process
	.equ	VAR_MASTER_VOL, 0x7         @ [byte] PCM master volume
	.equ	VAR_DEF_PITCH_FAC, 0x18     @ [word] this value gets multiplied with the samplerate for the inter sample distance
	.equ	VAR_FIRST_CHN, 0x50         @ [CHN struct] relative offset to channel array

    /* just some more defines */
	.equ	REG_DMA3_SRC, 0x040000D4
    .equ    ARM_OP_LEN, 0x4

@#######################################
@*********** GAME CONFIGS **************
@ add the game's name above to the ASM .equ-s before creating new configs
@#######################################


@*********** IF GERMAN POKEMON EMERALD
.if USED_GAME==GAME_BPED

	.equ	hq_buffer, BUFFER_IRAM_BPE
	.equ	decoder_buffer_target, DECODER_BUFFER_BPE
	.equ	ALLOW_PAUSE, 1
	.equ	DMA_FIX, 1
	.equ	ENABLE_DECOMPRESSION, 1
	.equ	PREVENT_CLIP, 1

.endif
@*********** IF GERMAN POKEMON FIRE RED
.if USED_GAME==GAME_BPRD

	.equ	hq_buffer, BUFFER_IRAM_BPR
	.equ	decoder_buffer_target, DECODER_BUFFER_BPR
	.equ	ALLOW_PAUSE, 1
	.equ	DMA_FIX, 1
	.equ	ENABLE_DECOMPRESSION, 1
	.equ	PREVENT_CLIP, 1

.endif
@*********** IF ENGLISH POKEMON EMERALD
.if USED_GAME==GAME_BPEE

	.equ	hq_buffer, BUFFER_IRAM_BPE
	.equ	decoder_buffer_target, DECODER_BUFFER_BPE
	.equ	ALLOW_PAUSE, 1
	.equ	DMA_FIX, 1
	.equ	ENABLE_DECOMPRESSION, 1
	.equ	PREVENT_CLIP, 1

.endif
@*********** IF ENGLISH POKEMON FIRE RED
.if USED_GAME==GAME_BPRE

	.equ	hq_buffer, BUFFER_IRAM_BPR
	.equ	decoder_buffer_target, DECODER_BUFFER_BPR
	.equ	ALLOW_PAUSE, 1
	.equ	DMA_FIX, 1
	.equ	ENABLE_DECOMPRESSION, 1
	.equ	PREVENT_CLIP, 1

.endif
@*********** IF KAWAs JUKEBOX 2006
.if USED_GAME==GAME_KWJ6

	.equ	hq_buffer, BUFFER_IRAM_KWJ
	.equ	decoder_buffer_target, DECODER_BUFFER_KWJ
	.equ	ALLOW_PAUSE, 0
	.equ	DMA_FIX, 0
	.equ	ENABLE_DECOMPRESSION, 0
	.equ	PREVENT_CLIP, 1

.endif
@*********** IF US FIRE EMBLEM
.if USED_GAME==GAME_AE7E

	.equ	hq_buffer, BUFFER_IRAM_AE7
	.equ	ALLOW_PAUSE, 0
	.equ	DMA_FIX, 0
	.equ	ENABLE_DECOMPRESSION, 0
	.equ	PREVENT_CLIP, 0
.endif
@***********

	.thumb

main_mixer:
    /* load Reverb level and check if we need to apply it */
    LDRB	R3, [R0, #VAR_REVERB]
    LSR	R3, R3, #2
    BEQ  	clear_buffer

    ADR	R1, do_reverb
    BX	R1

	.align	2
	.arm

do_reverb:

    /* 
     * reverb is calculated by the following: new_sample = old_sample * reverb_level / 127
     * note that reverb is mono (both sides get mixed together)
     * 
     * reverb gets applied to the frame we are currently looking at and the one after that
     * the magic below simply calculates the pointer for the one after the current one
     */

    CMP	R4, #2
    ADDEQ R7, R0, #0x350
    ADDNE R7, R5, R8
    MOV	R4, R8
    ORR	R3, R3, R3, LSL#16			
    STMFD SP!, {R8, LR}
    LDR	LR, hq_buffer_label

reverb_loop:
        /* This loop does the reverb processing */
        LDRSB	R0, [R5, R6]
        LDRSB	R1, [R5], #1
        LDRSB	R2, [R7, R6]
        LDRSB	R8, [R7], #1
        LDRSB	R9, [R5, R6]
        LDRSB	R10, [R5], #1
        LDRSB	R11, [R7, R6]
        LDRSB	R12, [R7], #1
        ADD	R0, R0, R1
        ADD	R0, R0, R2
        ADDS	R0, R0, R8
        ADDMI	R0, R0, #0x4
        ADD	R1, R9, R10
        ADD	R1, R1, R11
        ADDS	R1, R1, R12
        ADDMI	R1, R1, #0x4
        MUL	R0, R3, R0
        MUL	R1, R3, R1
        STMIA	LR!, {R0, R1}
        SUBS	R4, R4, #2
        BGT	reverb_loop
        /* end of loop */
    LDMFD	SP!, {R8, LR}
    ADR	R0, (adsr_setup+1)
    BX	R0

	.thumb

clear_buffer:
    /* In case reverb is disabled, the buffer gets set to zero */
    LDR	R3, hq_buffer_label
    MOV	R1, R8
    MOV	R4, #0
    MOV	R5, #0
    MOV	R6, #0
    MOV	R7, #0
    /*
     * Setting the buffer to zero happens in a very efficient loop
     * Depending on the alignment of the buffer length, twice or quadruple the amount of bytes
     * get cleared at once
     */
    LSR	R1, #3
    BCC	clear_buffer_align_8

    STMIA	R3!, {R4, R5, R6, R7}

clear_buffer_align_8:

    LSR	R1, #1
    BCC	clear_buffer_align_16

    STMIA	R3!, {R4, R5, R6, R7}
    STMIA	R3!, {R4, R5, R6, R7}

clear_buffer_align_16:
        /* This repeats until the buffer has been cleared */
        STMIA	R3!, {R4, R5, R6, R7}
        STMIA	R3!, {R4, R5, R6, R7}
        STMIA	R3!, {R4, R5, R6, R7}
        STMIA	R3!, {R4, R5, R6, R7}
        SUB	    R1, #1
        BGT	    clear_buffer_align_16
        /* loop end */
adsr_setup:
    /*
     * okay, before the actual mixing starts
     * the volume and envelope calculation happens
     */
    MOV R4, R8  @ R4 = buffer length
    /* this buffers the buffer length to a backup location
     * TODO: Move this variable to stack
     */
    ADR	R0, hq_buffer_length_label
    STR	R4, [R0]
    /* init channel loop */
    LDR	R4, [SP, #ARG_VAR_AREA]	        @ R4 = main work area pointer
    LDR	R0, [R4, #VAR_DEF_PITCH_FAC]	@ R0 = samplingrate pitch factor
    MOV	R12, R0					        @ --> R12
    LDRB R0, [R4, #VAR_MAX_CHN]		    @ load MAX channels to R0
    ADD	R4, #VAR_FIRST_CHN  			@ R4 = Base channel Offset (Channel #0)

mixer_entry:
        /* this is the main channel processing loop */
        STR	R0, [SP, #ARG_REMAIN_CHN]		
        LDR	R3, [R4, #CHN_WAVE_OFFSET]
        LDRB R6, [R4, #CHN_STATUS]
        MOVS R0, #0xC7					@ check if any of the channel status flags is set
        TST	R0, R6						@ check if none of the flags is set
        BEQ return_channel_null 		@ skip channel
        /* check channel flags */
        LSL	R0, R6, #25 				@ shift over the FLAG_CHN_INIT to CARRY
        BCC	adsr_echo_check				@ continue with normal channel procedure
        /* check leftmost bit */
        BMI	stop_channel_handler		@ if the channel is initiated but on release it gets turned off immediately
        /* channel init procedure */
        MOVS R6, #FLAG_CHN_ATTACK		@ set the channel status to ATTACK
        MOVS R0, R3						@ R0 = CHN_WAVE_OFFSET
        ADD	R0, #WAVE_DATA				@ R0 = wave data offset

        /* Pokemon games seem to init channels differently than other m4a games */
    .if ALLOW_PAUSE==0
        STR	R0, [R4, #CHN_POSITION_ABS]
        LDR	R0, [R3, #WAVE_LENGTH]
        STR	R0, [R4, #CHN_POSITION_REL] 
    .else
        LDR	R1, [R4, #CHN_POSITION_REL]
        ADD	R0, R0, R1
        STR	R0, [R4, #CHN_POSITION_ABS]
        LDR	R0, [R3, #WAVE_LENGTH]
        SUB	R0, R0, R1
        STR	R0, [R4, #CHN_POSITION_REL]
    .endif

        MOVS R5, #0						@ initial envelope = #0
        STRB R5, [R4, #CHN_ADSR_LEVEL]
        STR	R5, [R4, #CHN_FINE_POSITION]
        LDRB R2, [R3, #WAVE_LOOP_FLAG]
        LSR	R0, R2, #6
        BEQ	adsr_attack_handler         @ if loop disabled --> branch
        /* loop enabled here */
        MOVS R0, #FLAG_CHN_LOOP	
        ORR	R6, R0      				@ update channel status
        B adsr_attack_handler

adsr_echo_check:
        /* this is the normal ADSR procedure without init */
        LDRB R5, [R4, #CHN_ADSR_LEVEL]
        LSL	R0, R6, #29				    @ echo flag --> bit 31
        BPL	adsr_release_check			@ PL == false
        /* pseudo echo handler */
        LDRB R0, [R4, #CHN_ECHO_REMAIN]
        SUB	R0, #1
        STRB R0, [R4, #CHN_ECHO_REMAIN]
        BHI	channel_vol_calc			@ if echo still on --> branch

stop_channel_handler:

        MOVS R0, #0
        STRB R0, [R4, #CHN_STATUS]

return_channel_null:
        /* go to end of the channel loop */
        B check_remain_channels

adsr_release_check:
        LSL	R0, R6, #25					@ bit 31 = release bit
        BPL	adsr_decay_check			@ if release == 0 --> branch
        /* release handler */
        LDRB R0, [R4, #CHN_RELEASE]
        @SUB R0, #0xFF                  @ linear decay; TODO make option for triggering it
        @SUB R0, #1
        @ADD R5, R5, R0
        MUL	R5, R5, R0	            	@ default release algorithm
        LSR	R5, R5, #8
        @BMI adsr_released_handler      @ part of linear decay
        BEQ	adsr_released_handler	    @ release gone down to #0 --> branch
        /* pseudo echo init handler */
        LDRB R0, [R4, #CHN_ECHO_VOL]
        CMP	R5, R0
        BHI	channel_vol_calc            @ if release still above echo level --> branch

adsr_released_handler:
        /* if volume released to #0 */
        LDRB R5, [R4, #CHN_ECHO_VOL]    @ TODO: replace with MOV R5, R0
        CMP	R5, #0
        BEQ	stop_channel_handler        @ if pseudo echo vol = 0 --> branch
        /* pseudo echo volume handler */
        MOVS R0, #FLAG_CHN_ECHO
        ORR	R6, R0						@ set the echo flag
        B adsr_update_status

adsr_decay_check:
        /* check if decay is active */
        MOVS R2, #3
        AND	R2, R6                      @ separate phase status bits
        CMP	R2, #FLAG_CHN_DECAY
        BNE	adsr_attack_check			@ decay not active --> branch
        /* decay handler */
        LDRB R0, [R4, #CHN_DECAY]
        MUL	R5, R0
        LSR	R5, R5, #8
        LDRB R0, [R4, #CHN_SUSTAIN]
        CMP	R5, R0
        BHI	channel_vol_calc		    @ sample didn't decay yet --> branch
        /* sustain handler */
        MOVS R5, R0						@ current level = sustain level
        BEQ	adsr_released_handler       @ sustain level #0 --> branch
        /* step to next phase otherwise */
        B adsr_switchto_next

adsr_attack_check:
        /* attack handler */
        CMP	R2, #FLAG_CHN_ATTACK
        BNE	channel_vol_calc			@ if it isn't in attack phase, it has to be in sustain (no adsr change needed) --> branch

adsr_attack_handler:
        /* apply attack summand */
        LDRB R0, [R4, #CHN_ATTACK]
        ADD	R5, R5, R0
        CMP	R5, #0xFF
        BCC	adsr_update_status
        /* cap attack at 0xFF */
        MOVS R5, #0xFF
 
adsr_switchto_next:
        /* switch to next adsr phase */
        SUB	R6, #1

adsr_update_status:
        /* store channel status */
        STRB R6, [R4, #CHN_STATUS]

channel_vol_calc:
        /* store the calculated ADSR level */
        STRB R5, [R4, #CHN_ADSR_LEVEL]
        /* apply master volume */
        LDR	R0, [SP, #ARG_VAR_AREA]
        LDRB R0, [R0, #VAR_MASTER_VOL]
        ADD	R0, #1
        MUL	R5, R0, R5
        /* left side volume */
        LDRB R0, [R4, #CHN_VOL_2]
        MUL	R0, R5
        LSR	R0, R0, #13
        MOV	R10, R0                     @ R10 = left volume
        /* right side volume */
        LDRB R0, [R4, #CHN_VOL_1]
        MUL	R0, R5
        LSR	R0, R0, #13
        MOV	R11, R0						@ R11 = right volume
        /*
         * Now we get closer to actual mixing:
         * For looped samples some additional operations are required
         */
        MOVS R0, #FLAG_CHN_LOOP
        AND	R0, R6
        BEQ	mixing_loop_setup				@ TODO: This label should rather be called "skip_loop_setup"
        /* loop setup handler */
        ADD	R3, #WAVE_LOOP_START
        LDMIA R3!, {R0, R1}					@ R0 = loop start, R1 = loop end
        ADD	R3, R0, R3					    @ R3 = loop start position (absolute)
        STR	R3, [SP, #ARG_LOOP_START_POS]	@ backup loop start
        SUB	R0, R1, R0

mixing_loop_setup:
        /* do the rest of the setup */
        STR	R0, [SP, #ARG_LOOP_LENGTH]		@ if loop is off --> R0 = 0x0
        LDR	R5, hq_buffer_label
        LDR	R2, [R4, #CHN_POSITION_REL]		@ remaining samples for channel
        LDR	R3, [R4, #CHN_POSITION_ABS]		@ current stream position (abs)
        LDRB R0, [R4, #CHN_MODE]
        ADR	R1, mixing_arm_setup
        BX R1

	.align	2
hq_buffer_label:
	.word	hq_buffer
hq_buffer_length_label:     @ TODO: Replace with variable on stack
	.word	0xFFFFFFFF

	.arm
mixing_arm_setup:
        /* frequency and mixing loading routine */
        LDR	R8, hq_buffer_length_label
        ORRS R11, R10, R11, LSL#16		    @ R11 = 00RR00LL
        BEQ	switchto_thumb					@ volume #0 --> branch and skip channel processing
        /* normal processing otherwise */
        TST R0, #MODE_FIXED_FREQ
        BNE	fixed_mixing_setup
        TST R0, #MODE_COMP
        BNE special_mixing	                @ compressed? --> branch
        /* same here */
        STMFD SP!, {R4, R9, R12}
        /*
         * This mixer supports 4 different kinds of synthesized sounds
         * They are triggered when the loop end = 0
         * This gets checked below
         */
        MOVS R2, R2
        ORREQ R0, R0, #MODE_SYNTH
        STREQB R0, [R4, #CHN_MODE]
        ADD	R4, R4, #CHN_FINE_POSITION
        LDMIA R4, {R7, LR}					@ R7 = Fine Position, LR = Frequency
        MUL	R4, R12, LR					    @ R4 = inter sample steps = output rate factor * samplerate
        /* now the first samples get loaded */
        LDRSB R6, [R3], #1
        LDRSB R12, [R3]
        TST	R0, #MODE_SYNTH
        BNE	init_synth
        /* in case no synth mode should be used, code continues here */
        SUB	R12, R12, R6					@ R12 = DELTA
        /*
         * Mixing goes with volume ranges 0-127
         * They come in 0-255 --> divide by 2
         */
        MOVS R11, R11, LSR#1
        ADC	R11, R11, #0x8000
        BIC	R11, R11, #0xFF00
        MOV	R1, R7	    					@ R1 = inter sample position
        /*
         * There are 2 different mixing codepaths for uncompressed data
         *  path 1: fast mixing, but doesn't support loop or stop
         *  path 2: not so fast but supports sample loops / stop
         * This checks if there are enough samples available for path 1.
         * important: R0 is expected to be #0
         */
        UMLAL R1, R0, R4, R8
        MOV	R1, R1, LSR#23
        ORR	R0, R1, R0, LSL#9
        CMP	R2, R0						    @ actual comparison
        BLE	split_sample_loading			@ if not enough samples are available for path 1 --> branch
        /* 
         * This is the mixer path 1.
         * The interesting thing here is that the code will
         * buffer the samples on the stack if enough stack space
         * is available and fewer than 0x400 bytes are needed
         * (otherwise the samples are read directly from ROM)
         */
        SUB	R2, R2, R0
        LDR	R10, stack_capacity
        ADD	R10, R10, R0
        CMP	R10, SP
        ADD	R10, R3, R0
        ADR	R9, custom_stack_3
        /*
         * R2 = remaining samples
         * R10 = final sample position
         * SP = original stack location
         * These values will get reloaded after channel processing
         * due to the lack of registers.
         */
        STMIA	R9, {R2, R10, SP}
        CMPCC	R0, #0x400                  @ > 0x400 bytes --> read directly from ROM rather than buffered
        BCS	select_mixing_mode              @ TODO rename
        /*
         * The code below inits the DMA to read word aligned
         * samples from ROM to stack
         */
        BIC	R1, R3, #3
        MOV	R9, #0x04000000
        ADD	R9, R9, #0xD4
        ADD	R0, R0, #7
        MOV	R0, R0, LSR#2
        SUB SP, SP, R0, LSL#2
        AND	R3, R3, #3
        ADD	R3, R3, SP
        ORR	LR, R0, #0x84000000
        STMIA R9, {R1, SP, LR}              @ actually starts the DMA

        /* Somehow this is necessary for some games not to break */
    .if DMA_FIX==1
        MOV	R0, #0
        MOV	R1, R0
        MOV	R2, R1
        STMIA R9, {R0, R1, R2}
    .endif

select_mixing_mode:
        /*
         * This code decides which piece of code to load
         * depending on playback-rate / default-rate ratio.
         * Modes > 1.0 run with different volume levels.
         */
        SUBS R4, R4, #0x800000
        MOVPL R11, R11, LSL#1
        ADR	R0, math_resources				@ loads the base pointer of the code
        ADDPL R0, R0, #(ARM_OP_LEN*6)       @ 6 instructions further
        SUBPLS R4, R4, #0x800000
        ADDPL R0, R0, #(ARM_OP_LEN*6)
        ADDPL R4, R4, #0x800000				@ TODO how does restoring for > 2.0 ratios work?
        LDR	R2, function_pointer
        CMP	R0, R2						    @ code doesn't need to be reloaded if it's already in place
        BEQ	mixing_init
        /* This loads the needed code to RAM */
        STR	R0, function_pointer
        LDMIA R0, {R0-R2, R8-R10}			@ load 6 opcodes
        ADR	LR, runtime_created_routine

create_routine_loop:
            /* paste code to destination, see below for patterns */
            STMIA	LR, {R0, R1}
            ADD	LR, LR, #0x98
            STMIA	LR, {R0, R1}
            SUB	LR, LR, #0x8C
            STMIA	LR, {R2, R8-R10}
            ADD	LR, LR, #0x98
            STMIA	LR, {R2, R8-R10}
            SUB	LR, LR, #0x80
            ADDS	R5, R5, #0x40000000	    @ do that for 4 blocks
            BCC	create_routine_loop

        LDR	R8, hq_buffer_length_label

mixing_init:
        MOV	R2, #0xFF000000					@ load the fine position overflow bitmask
mixing_loop:
        /* This is the actual processing and interpolation code loop; NOPs will be replaced by the code above */
            LDMIA R5, {R0, R1, R10, LR}	@ load 4 stereo samples to Registers
            MUL	R9, R7, R12
runtime_created_routine:
            NOP							@ Block #1
            NOP
            MLANE R0, R11, R9, R0
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            MULNE	R9, R7, R12
            NOP							@ Block #2
            NOP
            MLANE R1, R11, R9, R1
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            MULNE R9, R7, R12
            NOP							@ Block #3
            NOP
            MLANE R10, R11, R9, R10
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            MULNE R9, R7, R12
            NOP							@ Block #4
            NOP
            MLANE LR, R11, R9, LR
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            STMIA R5!, {R0, R1, R10, LR}	@ write 4 stereo samples
            
            LDMIA R5, {R0, R1, R10, LR}	    @ load the next 4 stereo samples
            MULNE R9, R7, R12	
            NOP							@ Block #1
            NOP
            MLANE R0, R11, R9, R0
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            MULNE R9, R7, R12
            NOP							@ Block #2
            NOP
            MLANE R1, R11, R9, R1
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            MULNE R9, R7, R12
            NOP							@ Block #3
            NOP
            MLANE R10, R11, R9, R10
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            MULNE R9, R7, R12
            NOP							@ Block #4
            NOP
            MLANE LR, R11, R9, LR
            NOP
            NOP
            NOP
            NOP
            BIC	R7, R7, R2, ASR#1
            STMIA R5!, {R0, R1, R10, LR}	@ write 4 stereo samples
            SUBS R8, R8, #8					@ subtract 8 from the sample count
            BGT	mixing_loop
        /* restore previously saved values */
        ADR	R12, custom_stack_3
        LDMIA R12, {R2, R3, SP}
        B mixing_end_func

@ work variables

	.align	2
custom_stack_3:
	.word	0x0, 0x0, 0x0
stack_capacity:
	.word	0x03007910
function_pointer:
	.word	0x0

@ math resources, not directly used

math_resources:

MOV	R9, R9, ASR#22					@ Frequency Lower than default Frequency
ADDS	R9, R9, R6, LSL#1
ADDS	R7, R7, R4
ADDPL	R6, R12, R6
LDRPLSB	R12, [R3, #1]!
SUBPLS	R12, R12, R6

ADDS	R9, R6, R9, ASR#23				@ Frequency < 2x && Frequency > default frequency
ADD	R6, R12, R6
ADDS	R7, R7, R4
LDRPLSB	R6, [R3, #1]!
LDRSB	R12, [R3, #1]!
SUBS	R12, R12, R6

ADDS	R9, R6, R9, ASR#23				@ Frequency >= 2x higher than default Frequency
ADD	R7, R7, R4
ADD	R3, R3, R7, LSR#23
LDRSB	R6, [R3]
LDRSB	R12, [R3, #1]!
SUBS	R12, R12, R6

split_sample_loading:

ADD	R5, R5, R8, LSL#2				@ R5 = End of HQ buffer

uncached_mixing_loop:

MUL	R9, R7, R12					@ calc interpolated DELTA
MOV	R9, R9, ASR#22					@ scale down the DELTA
ADDS	R9, R9, R6, LSL#1				@ Add to Base Sample (upscaled to 8 bits again)
LDRNE	R0, [R5, -R8, LSL#2]				@ load sample from buffer
MLANE	R0, R11, R9, R0					@ add it to the buffer sample
STRNE	R0, [R5, -R8, LSL#2]				@ write the sample
ADD	R7, R7, R4					@ add the step size to the fine position
MOVS	R9, R7, LSR#23					@ write the overflow amount to R9
BEQ	uncached_mixing_load_skip			@ skip the mixing load if it isn't required

SUBS	R2, R2, R7, LSR#23				@ remove the overflow count from the remaining samples
BLLE	loop_end_sub					@ if the loop end is reached call the loop handler
SUBS	R9, R9, #1					@ remove #1 from the overflow count
ADDEQ	R6, R12, R6					@ new base sample is previous sample + DELTA
@RETURN LOCATION FROM LOOP HANDLER
LDRNESB	R6, [R3, R9]!					@ load new sample
LDRSB	R12, [R3, #1]!					@ load the delta sample (always required)
SUB	R12, R12, R6					@ calc new DELTA
BIC	R7, R7, #0x3F800000				@ clear the overflow from the fine position by using the bitmask

uncached_mixing_load_skip:

SUBS	R8, R8, #1					@ reduce the sample count for the buffer by #1
BGT	uncached_mixing_loop

mixing_end_func:

SUB	R3, R3, #1					@ reduce sample pointer by #1 (???)
LDMFD	SP!, {R4, R9, R12}				@ pop values from stack
STR	R7, [R4, #CHN_FINE_POSITION]			@ store the fine position
B	store_coarse_sample_pos				@ jump over to code to store coarse channel position

loop_end_sub:

ADD	R3, SP, #ARG_LOOP_START_POS+0xC			@ prepare sample loop start loading and loop length loading (0xC due to the pushed stack pointer)
LDMIA	R3, {R3, R6}					@ R3 = Loop Start; R6 = Loop Length
CMP	R6, #0						@ check if loop is enabled; if Loop is enabled R6 is != 0
RSBNE	R9, R2, #0					@ the sample overflow from the resampling needs to get subtracted so the remaining samples is slightly less
ADDNE	R2, R6, R2					@ R2 = add the loop length
ADDNE	PC, LR, #8					@ return from the subroutine to 2 instructions after the actual return location
LDMFD	SP!, {R4, R9, R12}				@ restore registers from stack
B	update_channel_status

fixed_freq_loop_end_handler:

LDR	R2, [SP, #ARG_LOOP_LENGTH+0x8]			@ load the loop length value
MOVS	R6, R2						@ copy it to R6 and check if loop is disabled
LDRNE	R3, [SP, #ARG_LOOP_START_POS+0x8]		@ reset the sample pointer to the loop start position
BXNE	LR						@ if it loops return to mixing function, if it doesn't go on and end mixing

LDMFD	SP!, {R4, R9}

update_channel_status:

STRB	R6, [R4]					@ if loop is disabled R6 = 0 and we can disable the channel by writing R6 to R4 (channel area)
B	switchto_thumb					@ switch to thumb

fixed_math_resource:	@ not executed, used to create mixing function

MOVS	R6, R10, LSL#24
MOVS	R6, R6, ASR#24
MOVS	R6, R10, LSL#16
MOVS	R6, R6, ASR#24
MOVS	R6, R10, LSL#8
MOVS	R6, R6, ASR#24
MOVS	R6, R10, ASR#24
LDMIA	R3!, {R10}					@ load chunk of samples
MOVS	R6, R10, LSL#24
MOVS	R6, R6, ASR#24
MOVS	R6, R10, LSL#16
MOVS	R6, R6, ASR#24
MOVS	R6, R10, LSL#8
MOVS	R6, R6, ASR#24
LDMFD	SP!, {R4, R9, R12}

fixed_mixing_setup:

STMFD	SP!, {R4, R9}					@ back up the channel pointer (R4) and R9

fixed_mixing_check_length:

MOV	LR, R2						@ move absolute sample position to LR
CMP	R2, R8						@ 
MOVGT	LR, R8						@ if there is less samples than the buffer to process write the smaller sample amount to LR
SUB	LR, LR, #1					@ shorten samples to process by #1
MOVS	LR, LR, LSR#2					@ calculate the amount of words to process (-1/4)
BEQ	fixed_mixing_process_unaligned			@ process the unaligned samples if there is <= 3 samples to process

SUB	R8, R8, LR, LSL#2				@ subtract the amount of samples we need to process from the buffer length
SUB	R2, R2, LR, LSL#2				@ subtract the amount of samples we need to process from the remaining samples
ADR	R1, fixed_mixing_custom_routine
ADR	R0, fixed_math_resource				@ load the 2 pointers to create function (@R0) by instructions from R1
MOV	R9, R3, LSL#30					@ move sample alignment bits to the leftmost position
ADD	R0, R0, R9, LSR#27				@ alignment * 8 + resource offset = new resource offset
LDMIA	R0!, {R6, R7, R9, R10}				@ load 4 instructions
STMIA	R1, {R6, R7}					@ write the 1st 2 instructions
ADD	R1, R1, #0xC					@ move label pointer over to the next slot
STMIA	R1, {R9, R10}					@ write 2nd block
ADD	R1, R1, #0xC					@ move label pointer to next block
LDMIA	R0, {R6, R7, R9, R10}				@ load instructions for block #3 and #4
STMIA	R1, {R6, R7}					@ write block #3
ADD	R1, R1, #0xC					@ ...
STMIA	R1, {R9, R10}					@ write block #4
LDMIA	R3!, {R10}					@ read 4 samples from ROM

fixed_mixing_loop:

LDMIA	R5, {R0, R1, R7, R9}				@ load 4 samples from hq buffer

fixed_mixing_custom_routine:

NOP
NOP
MLANE	R0, R11, R6, R0					@ add new sample if necessary
NOP
NOP
MLANE	R1, R11, R6, R1
NOP
NOP
MLANE	R7, R11, R6, R7
NOP
NOP
MLANE	R9, R11, R6, R9
STMIA	R5!, {R0, R1, R7, R9}				@ write the samples to the work area buffer
SUBS	LR, LR, #1					@ countdown the sample blocks to process
BNE	fixed_mixing_loop				@ if the end wasn't reached yet, repeat the loop

SUB	R3, R3, #4					@ reduce sample position by #4, we'll need to load the samples again

fixed_mixing_process_unaligned:

MOV	R1, #4						@ we need to repeat the loop #4 times to completely get rid of alignment errors

fixed_mixing_unaligned_loop:

LDR	R0, [R5]					@ load sample from buffer
LDRSB	R6, [R3], #1					@ load sample from ROM to R6
MLA	R0, R11, R6, R0					@ write the sample to the buffer
STR	R0, [R5], #4
SUBS	R2, R2, #1					@ reduce alignment error by #1
BLEQ	fixed_freq_loop_end_handler
SUBS	R1, R1, #1
BGT	fixed_mixing_unaligned_loop			@ repeat the loop #4 times

SUBS	R8, R8, #4					@ reduce the sample amount we wrote to the buffer by #4
BGT	fixed_mixing_check_length			@ go up to repeat the mixing procedure until the buffer is filled

LDMFD	SP!, {R4, R9}					@ pop registers from stack

store_coarse_sample_pos:

STR	R2, [R4, #CHN_POSITION_REL]			@ store relative and absolute sample position
STR	R3, [R4, #CHN_POSITION_ABS]			

switchto_thumb:

ADR	R0, (check_remain_channels+1)			@ load the label offset and switch to thumb
BX	R0

	.thumb

check_remain_channels:

LDR	R0, [SP, #ARG_REMAIN_CHN]			@ load the remaining channels
SUB	R0, #1						@ reduce the amount by #1
BLE	mixer_return					@ end the mixing when finished processing all channels

ADD	R4, #0x40
B	mixer_entry

mixer_return:

ADR	R0, downsampler
BX	R0

downsampler_return:

LDR	R0, [SP, #ARG_VAR_AREA]			@ load the main var area to R0
LDR	R3, mixer_finished_status		@ load some status indication value to R3
STR	R3, [R0]				@ store this value to the main var area
ADD	SP, SP, #0x1C
POP	{R0-R7}
MOV	R8, R0
MOV	R9, R1
MOV	R10, R2
MOV	R11, R3
POP	{R3}
BX	R3

	.align	2

mixer_finished_status:
	.word	0x68736D53

	.arm

downsampler:

LDR	R10, hq_buffer_label
LDR	R9, [SP, #ARG_BUFFER_POS]
LDR	R8, hq_buffer_length_label
MOV	R11, #0xFF
.if PREVENT_CLIP==1

MOV	R12, #0xFFFFFFFF
MOV	R12, R12, LSL#14
MOV	R7, #0x630

downsampler_loop:

LDRSH	R2, [R10], #2
LDRSH	R0, [R10], #2
LDRSH	R3, [R10], #2
LDRSH	R1, [R10], #2

CMP	R0, #0x4000
MOVGE	R0, #0x3F80
CMP	R0, #-0x4000
MOVLT	R0, R12

CMP	R1, #0x4000
MOVGE	R1, #0x3F80
CMP	R1, #-0x4000
MOVLT	R1, R12

CMP	R2, #0x4000
MOVGE	R2, #0x3F80
CMP	R2, #-0x4000
MOVLT	R2, R12

CMP	R3, #0x4000
MOVGE	R3, #0x3F80
CMP	R3, #-0x4000
MOVLT	R3, R12

AND	R0, R11, R0, ASR#7
AND	R1, R11, R1, ASR#7
AND	R2, R11, R2, ASR#7
AND	R3, R11, R3, ASR#7

ORR	R2, R2, R3, LSL#8
ORR	R0, R0, R1, LSL#8

STRH	R2, [R9, R7]
STRH	R0, [R9], #2

SUBS	R8, #2
BGT	downsampler_loop

.else
downsampler_loop:

LDRH	R4, [R10], #2
LDRH	R0, [R10], #2
LDRH	R5, [R10], #2
LDRH	R1, [R10], #2
LDRH	R6, [R10], #2
LDRH	R2, [R10], #2
LDRH	R7, [R10], #2
LDRH	R3, [R10], #2

AND	R0, R11, R0, LSR#7
AND	R1, R11, R1, LSR#7
AND	R2, R11, R2, LSR#7
AND	R3, R11, R3, LSR#7
AND	R4, R11, R4, LSR#7
AND	R5, R11, R5, LSR#7
AND	R6, R11, R6, LSR#7
AND	R7, R11, R7, LSR#7

ORR	R4, R4, R5, LSL#8
ORR	R4, R4, R6, LSL#16
ORR	R4, R4, R7, LSL#24

ORR	R0, R0, R1, LSL#8
ORR	R0, R0, R2, LSL#16
ORR	R0, R0, R3, LSL#24

STR	R4, [R9, #0x630]
STR	R0, [R9], #4

SUBS	R8, #4
BGT	downsampler_loop

.endif

ADR	R0, (downsampler_return+1)
BX	R0

	.align	2

init_synth:

CMP	R12, #0		@ $030057C4
BNE	check_synth_type

LDRB	R6, [R3, #SYNTH_WIDTH_CHANGE_1]			@ for saw wave -> 0xF0 (base duty cycle change)
ADD	R2, R2, R6, LSL#24				@ add it to the current synth state
LDRB	R6, [R3, #SYNTH_WIDTH_CHANGE_2]			@ for saw wave -> 0x80 (base duty cycle change #2)
ADDS	R6, R2, R6, LSL#24				@ add this to the synth state aswell but keep the old value in R2 and put the new one in R6
MVNMI	R6, R6	 					@ negate if duty cycle is > 50%
MOV	R10, R6, LSR#8					@ shift the final duty cycle right by 8 bits (divide by 256) into R10
LDRB	R1, [R3, #SYNTH_MOD_AMOUNT]			@ for saw wave -> 0xE0
LDRB	R0, [R3, #SYNTH_BASE_WAVE_DUTY]			@ for saw wave -> 0x10 (base duty cycle offset)
MOV	R0, R0, LSL#24					@ convert it to a usable duty cycle
MLA	R6, R10, R1, R0					@ calculate the final duty cycle with the offset, and intensity * rotating duty cycle amount
STMFD	SP!, {R2, R3, R9, R12}

synth_type_0_loop:

LDMIA	R5, {R0-R3, R9, R10, R12, LR}			@ load 8 samples
CMP	R7, R6						@ Block #1
ADDCC	R0, R0, R11, LSL#6
SUBCS	R0, R0, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #2
ADDCC	R1, R1, R11, LSL#6
SUBCS	R1, R1, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #3
ADDCC	R2, R2, R11, LSL#6
SUBCS	R2, R2, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #4
ADDCC	R3, R3, R11, LSL#6
SUBCS	R3, R3, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #5
ADDCC	R9, R9, R11, LSL#6
SUBCS	R9, R9, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #6
ADDCC	R10, R10, R11, LSL#6
SUBCS	R10, R10, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #7
ADDCC	R12, R12, R11, LSL#6
SUBCS	R12, R12, R11, LSL#6
ADDS	R7, R7, R4, LSL#3
CMP	R7, R6						@ Block #8
ADDCC	LR, LR, R11, LSL#6
SUBCS	LR, LR, R11, LSL#6
ADDS	R7, R7, R4, LSL#3

STMIA	R5!, {R0-R3, R9, R10, R12, LR}			@ write 8 samples
SUBS	R8, R8, #8					@ remove #8 from sample count
BGT	synth_type_0_loop

LDMFD	SP!, {R2, R3, R9, R12}
B	mixing_end_func

check_synth_type:

SUBS	R12, R12, #1					@ remove #1 from the synth type byte and check if it's #0
BNE	synth_type_2					@ if it still isn't it's synth type 2 (smooth pan flute)

MOV	R6, #0x300					@ R6 = 0x300
MOV	R11, R11, LSR#1					@ halve the volume
BIC	R11, R11, #0xFF00				@ clear bad bits from division
MOV	R12, #0x70					@ R12 = 0x70

synth_type_1_loop:

LDMIA	R5, {R0, R1, R10, LR}				@ load 4 samples from memory
ADDS	R7, R7, R4, LSL#3				@ Block #1 (some oscillator type code)
RSB	R9, R12, R7, LSR#24
MOV	R6, R7, LSL#1
SUB	R9, R9, R6, LSR#27
ADDS	R2, R9, R2, ASR#1
MLANE	R0, R11, R2, R0

ADDS	R7, R7, R4, LSL#3				@ Block #2
RSB	R9, R12, R7, LSR#24
MOV	R6, R7, LSL#1
SUB	R9, R9, R6, LSR#27
ADDS	R2, R9, R2, ASR#1
MLANE	R1, R11, R2, R1

ADDS	R7, R7, R4, LSL#3				@ Block #3
RSB	R9, R12, R7, LSR#24
MOV	R6, R7, LSL#1
SUB	R9, R9, R6, LSR#27
ADDS	R2, R9, R2, ASR#1
MLANE	R10, R11, R2, R10

ADDS	R7, R7, R4, LSL#3				@ Block #4
RSB	R9, R12, R7, LSR#24
MOV	R6, R7, LSL#1
SUB	R9, R9, R6, LSR#27
ADDS	R2, R9, R2, ASR#1
MLANE	LR, R11, R2, LR

STMIA	R5!, {R0, R1, R10, LR}
SUBS	R8, R8, #4
BGT	synth_type_1_loop

B	mixing_end_func					@ goto end

synth_type_2:

MOV	R6, #0x80					@ write base values to the registers
MOV	R12, #0x180

synth_type_2_loop:

LDMIA	R5, {R0, R1, R10, LR}				@ load samples from work buffer
ADDS	R7, R7, R4, LSL#3				@ Block #1
RSBPL	R9, R6, R7, ASR#23
SUBMI	R9, R12, R7, LSR#23
MLA	R0, R11, R9, R0

ADDS	R7, R7, R4, LSL#3				@ Block #2
RSBPL	R9, R6, R7, ASR#23
SUBMI	R9, R12, R7, LSR#23
MLA	R1, R11, R9, R1

ADDS	R7, R7, R4, LSL#3				@ Block #3
RSBPL	R9, R6, R7, ASR#23
SUBMI	R9, R12, R7, LSR#23
MLA	R10, R11, R9, R10

ADDS	R7, R7, R4, LSL#3				@ Block #4
RSBPL	R9, R6, R7, ASR#23
SUBMI	R9, R12, R7, LSR#23
MLA	LR, R11, R9, LR

STMIA	R5!, {R0, R1, R10, LR}				@ store the samples back to the buffer
SUBS	R8, R8, #4					@ subtract #4 from the remaining samples
BGT	synth_type_2_loop

B	mixing_end_func

@****************** SPECIAL MIXING ******************@
.if ENABLE_DECOMPRESSION==1
special_mixing:		@ $03006BF8

LDR	R6, [R4, #CHN_WAVE_OFFSET]		@ load the wave header offset to R6
LDRB	R0, [R4]
TST	R0, #FLAG_CHN_COMP			@ check if the channel is initialized
BNE	setup_compressed_mixing_frequency	@ skip the setup procedure if it's running in compressed mode already

ORR	R0, R0, #FLAG_CHN_COMP			@ enable the flag in the channel status
STRB	R0, [R4]				@ store the channel status
LDRB	R0, [R4, #CHN_MODE]			@ load the channel mode byte
TST	R0, #MODE_REVERSE			@ check if reverse mode is not enabled

BEQ	determine_compression			@ if Reverse Mode isn't enabled we can directly check if the sample has to get decoded

LDR	R1, [R6, #WAVE_LENGTH]			@ load the amount of samples
ADD	R1, R1, R6, LSL#1			@ do some start position calculation (???)
ADD	R1, R1, #0x20
SUB	R3, R1, R3
STR	R3, [R4, #CHN_POSITION_ABS]		@ store the final seek position

determine_compression:

LDRH	R0, [R6]				@ load the compression flag from the sample header
CMP	R0, #0					@ check if the compression is not enabled
BEQ	setup_compressed_mixing_frequency	@ skip the compression handler

SUB	R3, R3, R6				@ calc initial position
SUB	R3, R3, #0x10
STR	R3, [R4, #CHN_POSITION_ABS]		@ store the inital position (relative, not absolute)

setup_compressed_mixing_frequency:

STMFD	SP!, {R4, R9, R12}

MOVS	R11, R11, LSR#1				@ divide master volume by 2
ADC	R11, R11, #0x8000
BIC	R11, R11, #0xFF00

LDR	R7, [R4, #CHN_FINE_POSITION]		@ load the fine position
LDR	R1, [R4, #CHN_FREQUENCY]		@ load the channel frequency
LDRB	R0, [R4, #CHN_MODE]			@ load the channel mode again
TST	R0, #MODE_FIXED_FREQ			@ check if fixed frequency mode is enabled
MOVNE	R1, #0x800000				@ ### SAMPLE STEP FREQUENCY CHANGED TO R7
MULEQ	R1, R12, R1				@ default rate factor * frequency = sample steps

ADD	R5, R5, R8, LSL#2			@ set the buffer pointer to the end of the channel

LDRH	R0, [R6]				@ load the codec type
CMP	R0, #0					@ check if compression is disabled
BEQ	uncompressed_mixing_reverse_check

MOV	R0, #0xFF000000				@ set the current decoding block to "something very high" so that the first block always gets decoded
STR	R0, [R4, #CHN_BLOCK_COUNT]		@ write the last decoded block into the channel vars
LDRB	R0, [R4, #CHN_MODE]			@ check again if reverse mode is enabled
TST	R0, #MODE_REVERSE			@ test if reverse mode is enabled
BNE	compressed_mixing_reverse_init		@ check again of reverse mixing is enabled

BL	bdpcm_decoder				@ load a sample from the stream to R12
MOV	R6, R12					@ move the base sample to R6
ADD	R3, R3, #1				@ increase stream position by #1
BL	bdpcm_decoder				@ load the delta sample and calculate delta value
SUB	R12, R12, R6

@***** MIXING LOOP REGISTER USAGE ***********@
@ R0:	Sample to modify from buffer
@ R1:	sample steps		(MOVED FROM R4)
@ R2:	remaining samples before loop/end
@ R3:	sample position
@ R4:	channel pointer
@ R5:	pointer to the end of buffer
@ R6:	Base sample
@ R7:	fine position
@ R8:	remaining samples for current buffer
@ R9:	interpolated sample
@ R10:	not used
@ R11:	volume
@ R12:	Delta Sample
@ LR:	not used
@********************************************@

compressed_mixing_loop:

MUL	R9, R7, R12				@ delta sample * fine position = interpolated DELTA
MOV	R9, R9, ASR#22				@ scale down the sample
ADDS	R9, R9, R6, LSL#1			@ double the base sample and add it to the interpolated downscaled DELTA
LDRNE	R0, [R5, -R8, LSL#2]			@ if the sample is NOT 0 load the sample from buffer and store the calulated value
MLANE	R0, R11, R9, R0				@ add the sample to the buffer sample and apply volume
STRNE	R0, [R5, -R8, LSL#2]			@ store the sample if it's not Zero
ADD	R7, R7, R1				@ ### changed from R4 to R1
MOVS	R9, R7, LSR#23				@ check if there is new samples to load

BEQ	compressed_mixing_load_skip		@ no new samples need to be loaded

SUBS	R2, R2, R7, LSR#23			@ remove the sample overflow from the remaining samples
BLLE	loop_end_sub				@ call the loop/ending handler if the countdown reached zero or something negative
SUBS	R9, R9, #1				@ check if only one sample has to get loaded
ADDEQ	R6, R12, R6				@ if this is the case we can calculate the new base sample
BEQ	compressed_mixing_base_load_skip

ADD	R3, R3, R9				@ these opcodes are equivalent to LDRNESB R6, [R3, R9]!
BL	bdpcm_decoder
MOV	R6, R12

compressed_mixing_base_load_skip:

ADD	R3, R3, #1					@ equivalent to LDRSB	R12, [R3, #1]!
BL	bdpcm_decoder
SUB	R12, R12, R6
BIC	R7, R7, #0x3F800000			@ clear the overflow bits by using the according bitmask

compressed_mixing_load_skip:

SUBS	R8, R8, #1				@ remove #1 from the remaining samples
BGT	compressed_mixing_loop

@SUB	R3, R3, #1				@ sample pointer -1 (???); ALREADY DONE BY mixing_end_func
B	mixing_end_func




compressed_mixing_reverse_init:

SUB	R3, R3, #1				@ subtract one from the reverse playback location initially
BL	bdpcm_decoder				@ fetch a sample from stream
MOV	R6, R12					@ bdpcm_decoder returns base sample in R12 --> R6
SUB	R3, R3, #1				@ seek one sample further backwards
BL	bdpcm_decoder				@ fetch the DELTA sample
SUB	R12, R12, R6				@ calc the Delta value

compressed_mixing_reverse_loop:

MUL	R9, R7, R12				@ delta sample * fine position = interpolated DELTA
MOV	R9, R9, ASR#22				@ scale down the sample
ADDS	R9, R9, R6, LSL#1			@ double the base sample and add it to the interpolated downscaled DELTA
LDRNE	R0, [R5, -R8, LSL#2]			@ if the sample is NOT 0 load the sample from buffer and store the calulated value
MLANE	R0, R11, R9, R0				@ add the sample to the buffer sample and apply volume
STRNE	R0, [R5, -R8, LSL#2]			@ store the sample if it's not Zero
ADD	R7, R7, R1				@ ### changed from R4 to R1
MOVS	R9, R7, LSR#23				@ check if there is new samples to load

BEQ	compressed_mixing_reverse_load_skip	@ skip sample loading if we don't need to load new samples from ROM

SUBS	R2, R2, R7, LSR#23			@ remove the overflowed samples from the remaining samples
BLLE	loop_end_sub				@ if the sample playback finished go to end handler

SUBS	R9, R9, #1				@ remove sample overflow count by #1
ADDEQ	R6, R12, R6				@ make the previous delta sample the new base sample if only #1 sample needs to get loaded
BEQ	compressed_mixing_reverse_base_load_skip @skip base sample loading

SUB	R3, R3, R9				@
BL	bdpcm_decoder				@
MOV	R6, R12					@

compressed_mixing_reverse_base_load_skip:

SUB	R3, R3, #1
BL	bdpcm_decoder
SUB	R12, R12, R6				@ load next samples???
BIC	R7, R7, #0x3F800000			@ clear overflow bits

compressed_mixing_reverse_load_skip:

SUBS	R8, R8, #1
BGT	compressed_mixing_reverse_loop

@ADD	R3, R3, #2				@ ???, copied from original code
ADD	R3, R3, #3

B	mixing_end_func


uncompressed_mixing_reverse_check:

LDRB	R0, [R4, #1]				@ load the channel mode		=$03006D84
TST	R0, #MODE_REVERSE			@ check if reverse mode is even enabled
BEQ	mixing_end_func				@ skip the channel if the mode is "awkward"

LDRSB	R6, [R3, #-1]!				@ load first negative sample
LDRSB	R12, [R3, #-1]				@ load the DELTA sample
SUB	R12, R12, R6				@ calculate DELTA

reverse_mixing_loop:

MUL	R9, R7, R12				@ delta sample * fine position = interpolated DELTA
MOV	R9, R9, ASR#22				@ scale down the sample
ADDS	R9, R9, R6, LSL#1			@ double the base sample and add it to the interpolated downscaled DELTA
LDRNE	R0, [R5, -R8, LSL#2]			@ if the sample is NOT 0 load the sample from buffer and store the calulated value
MLANE	R0, R11, R9, R0				@ add the sample to the buffer sample and apply volume
STRNE	R0, [R5, -R8, LSL#2]			@ store the sample if it's not Zero
ADD	R7, R7, R1				@ ### changed from R4 to R1
MOVS	R9, R7, LSR#23				@ check if there is new samples to load

BEQ	reverse_mixing_load_skip

SUBS	R2, R2, R7, LSR#23			@ blablabla, all same as above
BLLE	loop_end_sub

MOVS	R9, R9					@ check if the sample overflow count in R9 is zero
ADDEQ	R6, R12, R6
LDRNESB	R6, [R3, -R9]!
LDRSB	R12, [R3, #-1]				@ load samples dependent on conditions
SUB	R12, R12, R6
BIC	R7, R7, #0x3F800000			@ cut off overflow count to get new fine position

reverse_mixing_load_skip:

SUBS	R8, R8, #1				@ remaining samples -1
BGT	reverse_mixing_loop			@ continue loop if there are still samples to process

@ADD	R3, R3, #1				@ copied from original code (???)
ADD	R3, R3, #2				@ =$03006DE8

B	mixing_end_func

@**************** SPECIAL MIXING END ****************@

@************** SPECIAL MIXING LOOPING **************@

compressed_loop_end_sub:




@************ SPECIAL MIXING LOOPING END ************@

@****************** BDPCM DECODER ******************@

bdpcm_decoder:				@ RETURNS SAMPLE FROM POSITION XXX in R12

STMFD	SP!, {R0, R2, R5-R7, LR}		@ push registers to make them free to use: R0, R2, R5, R6, R7, LR
MOV	R0, R3, LSR#6				@ shift the relative position right to clip off everything but the block index
LDR	R12, [R4, #CHN_BLOCK_COUNT]		@ check if the current sample position is at the beginning of the current block
CMP	R0, R12
BEQ	bdpcm_decoder_return

STR	R0, [R4, #CHN_BLOCK_COUNT]		@ store the block position to Channel Vars
MOV	R12, #0x21				@ load decoding byte count to R12 (1 Block = 0x21 Bytes)
MUL	R2, R12, R0				@ multiply the block count with the block length to calc actual byte position of current block
LDR	R12, [R4, #CHN_WAVE_OFFSET]		@ load the wave data offset to R12
ADD	R2, R2, R12				@ add the wave data offset and 0x10 to get the actual position in ROM
ADD	R2, R2, #0x10				@ 
LDR	R5, decoder_buffer			@ load the decoder buffer pointer to R5
ADR	R6, delta_lookup_table			@ load the lookup table pointer to R6
MOV	R7, #0x40				@ load the block sample count (0x40) to R7
LDRB	LR, [R2], #1				@ load the first byte & sample from the wave data to LR (each block starts with a signed 8 bit pcm sample) LDRSB not necessary due to the 24 high bits being cut off anyway
STRB	LR, [R5], #1				@ write the sample to the decoder buffer
LDRB	R12, [R2], #1				@ load the next 2 samples to R12 (to get decoded) --- LSBits is decoded first and MSBits last
B	bdpcm_decoder_lsb

bdpcm_decoder_msb:

LDRB	R12, [R2], #1				@ load the next 2 samples to get decoded
MOV	R0, R12, LSR#4				@ separate the current samples' bits
LDRSB	R0, [R6, R0]				@ load the differential value from the lookup table
ADD	LR, LR, R0				@ add the decoded value to the previous sample value to calc the current samples' level
STRB	LR, [R5], #1				@ write the output sample to the decoder buffer and increment buffer pointer

bdpcm_decoder_lsb:

AND	R0, R12, #0xF				@ separate the 4 LSBits
LDRSB	R0, [R6, R0]				@ put the 4 bit value into the lookup table and save the result to R0
ADD	LR, LR, R0				@ add the value from the lookup table to the previous value to calc the new one
STRB	LR, [R5], #1				@ store the decoded sample to the decoding buffer
SUBS	R7, R7, #2				@ decrease the block sample counter by 2 (2 samples each byte) and check if it is still above 0
BGT	bdpcm_decoder_msb			@ if there is still samples to decode jump to the MSBits

bdpcm_decoder_return:

LDR	R5, decoder_buffer			@ reload the decompressor buffer offset to R5
AND	R0, R3, #0x3F				@ cut off the main position bits to read data from short buffer
LDRSB	R12, [R5, R0]				@ read the decoded sample from buffer
LDMFD	SP!, {R0, R2, R5-R7, PC}		@ pop registers and return to the compressed sample mixer

@**************** END BDPCM DECODER *****************@

decoder_buffer:
	.word	decoder_buffer_target
delta_lookup_table:
	.byte	0x0, 0x1, 0x4, 0x9, 0x10, 0x19, 0x24, 0x31, 0xC0, 0xCF, 0xDC, 0xE7, 0xF0, 0xF7, 0xFC, 0xFF
.endif

main_mixer_end:

	.end
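
For anyone trying to follow bdpcm_decoder above, here is a rough Python model of one block decode, as far as it can be read from the listing. It is only a sketch for understanding and offline testing, not part of the mixer. The names DELTA_TABLE, decode_block, to_signed8 and locate_sample are made up for this example; only the constants (0x21, 0x40, 0x10 and the table values) come from the assembly. Note that, judging from the branch into bdpcm_decoder_lsb, the high nibble of the first packed byte appears to never be read.

```python
# Rough host-side model of the bdpcm_decoder routine (a sketch, not the mixer itself).
# All names are hypothetical; only the constants are taken from the assembly.

BLOCK_BYTES = 0x21    # one compressed block: 1 raw sample + 0x20 packed bytes
BLOCK_SAMPLES = 0x40  # samples produced per block

# delta_lookup_table from the assembly, read as signed 8 bit values
DELTA_TABLE = [0, 1, 4, 9, 16, 25, 36, 49, -64, -49, -36, -25, -16, -9, -4, -1]


def to_signed8(value):
    """Interpret the low 8 bits of value as a signed sample (like LDRSB)."""
    value &= 0xFF
    return value - 0x100 if value >= 0x80 else value


def decode_block(block):
    """Decode one 0x21 byte block into 0x40 signed 8 bit samples."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block must be exactly 0x21 bytes long")
    level = block[0]                     # each block starts with a raw PCM sample
    out = [level]
    # the first packed byte only contributes its low nibble
    level += DELTA_TABLE[block[1] & 0xF]
    out.append(level)
    # every following byte: high nibble first, then low nibble
    for packed in block[2:]:
        level += DELTA_TABLE[packed >> 4]
        out.append(level)
        level += DELTA_TABLE[packed & 0xF]
        out.append(level)
    return [to_signed8(sample) for sample in out]


def locate_sample(position, wave_offset):
    """Return (address of the block, index inside the decoded block)."""
    block_index = position >> 6          # 0x40 samples per block
    block_address = wave_offset + 0x10 + block_index * BLOCK_BYTES
    return block_address, position & 0x3F
```

The mixer only decodes a block when the block index changes (CHN_BLOCK_COUNT), which is why a single 0x40 byte decoder buffer is enough.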

WARNING: DO NOT ATTEMPT TO REMOVE THE 'NOPs' FROM THE ASSEMBLY! THEY ARE PLACEHOLDERS FOR SELF-MODIFYING CODE AND REMOVING THEM BREAKS ABSOLUTELY EVERYTHING!

Assembly and insertion:

Well, since Version 1.0 a few things have changed in the insertion process and things got a little more complicated. However, I will do my best to explain things as well as I can.
If you've read the introduction you will most likely already know that my mixer requires an additional mixing buffer for high quality processing. This buffer requires RAM. The amount of RAM in bytes can be calculated as follows:

Code:

FRAME_LENGTH_XXXXX * 4

XXXXX is the maximum sample rate supported. To get the values for FRAME_LENGTH_XXXXX check the definitions in the code.
Let's say we want to support at least 13379 Hz. Then do the following:

Code:

FRAME_LENGTH_13379 * 4 =
0xE0 * 4 = 0x380

So you'll need 0x380 free bytes in IWRAM for that. The pointer to this area needs to be put into the assembly code. Use a configuration preset for that (".equ hq_buffer", see code, it should be self-explanatory). More on that later.
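
The calculation above can be written down as a tiny helper. This is just a sketch; FRAME_LENGTHS and hq_buffer_size are hypothetical names, and the table only contains the one value quoted in this post. Take the other FRAME_LENGTH_XXXXX values from the definitions in the code.

```python
# Sketch of the HQ buffer size calculation (helper names are made up).
# Only the 13379 Hz entry is quoted in this post; add the others from
# the FRAME_LENGTH_XXXXX definitions in the assembly source.
FRAME_LENGTHS = {13379: 0xE0}


def hq_buffer_size(sample_rate):
    """Return the number of IWRAM bytes needed for the HQ work buffer."""
    return FRAME_LENGTHS[sample_rate] * 4


print(hex(hq_buffer_size(13379)))
```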
The assembly itself can be put anywhere into ROM. However, for speed reasons the code must be loaded to RAM, where it executes much faster. Reserve N bytes in IWRAM for that. Just assemble the code and see how long it is (with all features it should be ~0xB00 bytes).
In comparison to V1.0 of the mixer this new version's code is bigger than Nintendo's code and simply can't be put into the same IWRAM area. This is where things really start to get tricky. IWRAM space in Pokemon games is quite limited and we need a lot of it.
Just out of laziness I'll stick to Pokemon games here. If you work on non Pokemon games you'll need to manage the RAM repointing with your own technique.
Long story short: What I do for Pokemon games is that I move a big structure (0xFB0 bytes, let's call it "Main Sound Area") of the Sound Engine to EWRAM to free things up. This structure contains the output buffers that are used by the Sound DMA. Moving this structure to EWRAM wouldn't make sense for Nintendo's mixer because it uses these output buffers as work buffers (lots of reads and writes) which would slow things down. However, with my code that is not such a big problem because the output buffers are only accessed once.
By freeing up that big chunk of memory there is enough space to put the new mixing code into. Also, by moving the new mixing code to that new location the space of the old mixing code is no longer used (0x800 bytes!) and can be used for the work buffers. This is the common technique I use for Pokemon games.
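
If you want to double check that everything fits before patching, a small sanity check along these lines can help. It is only a sketch under the assumptions from this post (0xFB0 bytes freed by the Main Sound Area, 0x800 bytes freed by the old mixing code); the function and constant names are made up.

```python
# Sketch: check that the new code and the work buffer fit into the freed IWRAM.
MAIN_SOUND_AREA_SIZE = 0xFB0   # freed by moving the Main Sound Area to EWRAM
OLD_MIXER_CODE_SIZE = 0x800    # freed because the old mixing code is unused


def check_iwram_layout(code_length, hq_buffer_length):
    """Return a list of problems; an empty list means everything fits."""
    problems = []
    if code_length > MAIN_SOUND_AREA_SIZE:
        problems.append("mixing code does not fit into the old Main Sound Area")
    if hq_buffer_length > OLD_MIXER_CODE_SIZE:
        problems.append("work buffer does not fit into the old mixer code area")
    return problems


# Example: 0xB88 bytes of code and a 0x380 byte work buffer
print(check_iwram_layout(0xB88, 0x380))
```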
For all GBA Pokemon games the Main Sound Area can safely be moved to 0x0203E000 as long as it doesn't conflict with one of your personal hacks. For Fire Red you need to disable the Help menu. It will break planets otherwise!

Main Sound Area locations:

Fire Red (US): 0x03005F50
Fire Red (GER): 0x03005E40
Emerald (US, GER): 0x03006380

Contact me if you need offsets for other languages.

For the repointing of the Main Sound Area simply search for the pointers above and replace them with 0x0203E000. The pointer should occur exactly 3 times in Emerald and exactly 2 times in Fire Red. All occurrences need to be replaced.
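
If you prefer a script over a hex editor, the search and replace described above could look roughly like this. It is a minimal sketch: repoint is a made up helper, the ROM is assumed to be loaded into a bytearray, and pointers are assumed to be stored as 32 bit little endian values. The expected count makes the script stop before writing anything if the number of hits is not what you expect.

```python
import struct


def repoint(rom, old_pointer, new_pointer, expected_count):
    """Replace every little endian occurrence of old_pointer in rom (a bytearray)."""
    old = struct.pack("<I", old_pointer)
    new = struct.pack("<I", new_pointer)
    offsets = []
    position = rom.find(old)
    while position != -1:
        offsets.append(position)
        position = rom.find(old, position + 1)
    if len(offsets) != expected_count:
        raise ValueError(
            "found %d occurrences, expected %d" % (len(offsets), expected_count)
        )
    for offset in offsets:
        rom[offset:offset + 4] = new
    return offsets


# Tiny fake "ROM" with the Fire Red (US) pointer stored twice
rom = bytearray(b"\x50\x5F\x00\x03" + b"\xAA" * 4 + b"\x50\x5F\x00\x03")
print(repoint(rom, 0x03005F50, 0x0203E000, expected_count=2))
```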

Now that the Main Sound Area has been moved out of its original location it's time for the new mixing code to move in. You first need to assemble the code above. You will need to set the configuration preset beforehand. DO NOT SKIP THIS! See the chapter below for the details.
After the assembly has finished put the binary somewhere into your ROM. Put the pointer to that binary here:

Fire Red (US): 0x1DD0B4 (ROM)
Fire Red (GER): 0x1E134C (ROM)
Emerald (US): 0x2E00F0 (ROM)

After doing that you need to set the new RAM pointer for the newly inserted code. There are 2 pointers for that: One for the CpuSet to copy the code to the right RAM location and one for the actual program call. The one for the program call has the Thumb bit set, the other one doesn't.
The original mixing code is located at the following addresses:

Fire Red (US): 0x030028E0
Fire Red (GER): 0x03002830
Emerald (US, GER): 0x03001AA8
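
To make the two pointer variants concrete: the CpuSet destination is the plain address, the call pointer is the same address with bit 0 (the Thumb bit) set. The sketch below prints the byte patterns you would search for in a hex editor; pointer_patterns is a made up helper and little endian storage is assumed.

```python
import struct


def pointer_patterns(address):
    """Return (plain, thumb) byte patterns for a RAM address, little endian."""
    plain = struct.pack("<I", address & ~1)   # used by CpuSet as copy destination
    thumb = struct.pack("<I", address | 1)    # used for the actual program call
    return plain, thumb


# Fire Red (US): original mixing code location
plain, thumb = pointer_patterns(0x030028E0)
print(plain.hex(" "), "/", thumb.hex(" "))
```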

For repointing search for this pointer and replace it with the new pointer where the Main Sound Area has been. Do the search and replace twice: once with the Thumb bit set and once without. Also, because the new code is longer than the old one you'll need to specify the length of the data to be transferred by CpuSet. Just open the assembled code in a hex editor, check the length of the data and write it down. Because CpuSet (at least in this case) transfers data in units of 4 bytes, you'll need to divide the code length by 4 in order to get the number of units to transfer. Put this value at the following address (2 bytes only! Keep the little endian byte order in mind):

Fire Red (US): 0x1DD0BC (ROM)
Fire Red (GER): 0x1E1354 (ROM)
Emerald (US): 0x2E00F8 (ROM)

Let's do an example: The assembled code has a length of 0xB88 bytes. That results in 0x2E2 units. So you'd need to write the bytes "E2 02" at the specified address.
That's it for the code.
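
The unit calculation from the example can be scripted like this. It is a small sketch with a made up helper name; rounding up for lengths that are not a multiple of 4 is my assumption here (pad the binary accordingly if that happens).

```python
def cpuset_units(code_length):
    """Return (unit count, 2 byte little endian encoding) for a code length in bytes."""
    units = (code_length + 3) // 4          # 4 bytes per unit, rounded up (assumption)
    return units, units.to_bytes(2, "little")


units, encoded = cpuset_units(0xB88)
print(hex(units), encoded.hex(" "))
```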

Remember that the new work buffer for the mixer will go where the old mixing code in RAM was? Good. The locations of the original mixing code are ^above^. Use these addresses for the "hq_buffer" config setting and you're done ;)

Next step: Play the game and have fun with low noise audio ^.^
REMEMBER: If you are using emulator quick saves, you have to save the game in the game itself and reload the ingame save, because the new code is only loaded once during ROM startup and needs a restart of the ROM (quicksaves will still contain the old mixer in IWRAM).

Configuration:
So before you assemble your code you'll need to configure it. The code is designed to have multiple switchable configuration presets. The preset to be used can be selected in the line that says " .equ USED_GAME, GAME_XXXX". Set XXXX to your gamecode and make sure you create a configuration preset if none exists yet. This is done by a code pattern that looks like the following:

Code:

.if USED_GAME==GAME_BPEE

	.equ	hq_buffer, BUFFER_IRAM_BPE
	.equ	decoder_buffer_target, DECODER_BUFFER_BPE
	.equ	ALLOW_PAUSE, 1
	.equ	DMA_FIX, 1
	.equ	ENABLE_DECOMPRESSION, 1
	.equ	PREVENT_CLIP, 1

.endif

Let me explain what all of them do (1 = on, 0 = off):

hq_buffer: Set this to the value where you want your new work buffer to be.
decoder_buffer_target: This is only used if you enable compressed wave support. This points to a buffer 0x40 bytes long.
ALLOW_PAUSE: To be honest, I'm not sure myself what it does but it is required by Pokemon games for the sound channels to init correctly. Set it to 0 for non Pokemon games.
DMA_FIX: Writes zeroes into all DMA3 registers after using it. This magically fixes a rare crash issue I had in Pokemon games. When working with non Pokemon games try to turn it off first. If the game should occasionally crash or other glitches occur try to turn it on.
ENABLE_DECOMPRESSION: Enables compressed sample and reverse playback support. Required for cries and SFX to work properly on Pokemon. Afaik none but Pokemon games need this by default.
PREVENT_CLIP: In case the volume of a song or SFX is too loud it might cause the sound buffer to overflow. Enabling this caps the amplitude at the maximum level and prevents the "wrap around" that can cause VERY LOUD crackling noise. This function comes with a small, usually negligible performance impact. Enable it if you work with very loud songs and sound effects and have issues with crackling noise (you will definitely notice it if you have it). Otherwise turn it off.
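
To illustrate what PREVENT_CLIP changes, here is a rough Python model of what the downsampler does with a single sample, as far as it can be read from the downsampler loop in the listing. The function names are made up; the constants (0x4000, 0x3F80, the shift by 7 and the 0xFF mask) are taken from the assembly. Treat it as a sketch for understanding, not as an exact emulation.

```python
def downsample(sample, prevent_clip):
    """Convert one signed work buffer sample to the 8 bit output byte."""
    if prevent_clip:
        if sample >= 0x4000:
            sample = 0x3F80        # cap at the highest positive level
        if sample < -0x4000:
            sample = -0x4000       # cap at the lowest negative level
    return (sample >> 7) & 0xFF    # Python's >> is arithmetic, like ASR


def as_signed(byte):
    """Show the output byte the way the sound hardware interprets it."""
    return byte - 0x100 if byte >= 0x80 else byte


# A sample that is slightly too loud wraps around without clipping protection
print(as_signed(downsample(0x4000, prevent_clip=False)))  # flips to negative
print(as_signed(downsample(0x4000, prevent_clip=True)))   # stays at the maximum
```

The jump from the highest positive level to the lowest negative one is the "wrap around" described above.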

To save you some work I've already made presets for BPRE, BPEE and some other games. The only thing you'll need to do in this case is to select the right one in the line ".equ USED_GAME, GAME_XXXX".

Comparison (Nintendo's vs. my V1.0):
Here is a video comparing the default mixer (1st) and my mixer (2nd). Keep in mind that this is still the first and not the latest release of my mixer but the results are pretty similar:

Conclusion:
Yeah, it has been quite some time between getting V2.0 working and releasing V2.1, even though it was bug free. I hope you enjoy the work.
As always, feedback appreciated!

Chaos_Darkrai · Apr 24, 2014

Do I take the whole routine, straight from the first line, then change the values? Or do I start at a specific line?

ipatix · Apr 24, 2014

No, you have to start from the top line (the complete routine). You just have to modify the lines I mentioned above.

Chaos_Darkrai · Apr 24, 2014

So if I am using BPRE, I wouldn't change the routine at all? Or would I put the offset where the words are?

ipatix · Apr 24, 2014

If you use BPRE you wouldn't need to change anything, right. And, no, you wouldn't need to change the words because the words are defined by the .equ-s and should all change according to the adjustments you do in the first lines.

Chaos_Darkrai · Apr 24, 2014

ipatix said:
If you use BPRE you wouldn't need to change anything, right. And, no, you wouldn't need to change the words because the words are defined by the .equ-s and should all change according to the adjustments you do in the first lines.

OK. Thanks a bunch ipatix! Wonderful job on this!

Wobbu · Apr 27, 2014

I just tested this on BPEE and all my custom music sounds so much better now! Newer generation music that I ported to my hack doesn't produce nearly as much unnecessary noises as they used to, especially ones that have a heavy use of high-pitched instruments. Thank you a lot for researching this! The difference is very noticeable.

ipatix · Apr 27, 2014

Thanks for the feedback :)

Sniper · Apr 27, 2014

So I tried same as Wobbu did. It's so much better right now! :) Another research that helped a lot! ^^

designmadman · Apr 27, 2014

Mmmph, so I changed the start of your routine so I could use it for Fire Red and inserted it at 0x800660. Then I entered this pointer (60 06 80 08) at 0x1DD0B4, and the game freezes every time it tries to play sound. I also tried (61 06 80 08), the plus 1 Thumb pointer, and still no good, so I'm sure I screwed up somewhere...

PokeBunny · Apr 28, 2014

Not bad. Turns out you are smart. But the AGB music player sounds bad. But they use it because you only have to use swi functions. Gamefreak is lazy!

ipatix · Apr 28, 2014

@PokeBunny:
Actually Gamefreak is not "that" lazy. To clarify things:
The AGB music player is part of the Nintendo SDK. They did a pretty decent job of making a very efficient music player, although there are some flaws here and there (like the noisy sound). This music player was implemented in the BIOS when the AGB came out, to provide a fast music player whose code is stored in the BIOS and so doesn't need IWRAM (remember, BIOS memory is as fast as IWRAM). This however turned out to be a problem: there were some bugs in the old versions of the music player, and because Nintendo let developers choose between the BIOS code and the (updateable) IWRAM code, almost all developers used the IWRAM solution.
So "you only have to use swi functions" is not 100% correct. Anyway, there is not much Nintendo could have done better. With my code (and enough free CPU time, which the Pokemon games don't have) you could reach very good quality. So it's not "all bad".

Some other developers like Camelot (--> developed Golden Sun), however, completely rewrote some parts of the music code, which gives much higher quality and lower CPU load. This is how they could give the game one of the best GBA soundtracks, in my opinion:

https://www.youtube.com/watch?v=NrcG9lgGGNg

I might actually try to port their engine to Pokemon in the future, but I don't promise that. The code they use is about 10 times as complicated as Nintendo's already was. The other thing that'll be tough is adding support for compressed samples and reverse playback to this engine (which no game other than Pokemon can do; the default sound driver was modded by Gamefreak).

@designmadman:
I don't know why it won't work. I tried it once on my own on Firered US and it worked.
Usually "bad settings" that are language-dependent shouldn't crash the game (although the sound might turn really buggy).

Check again if you did the assembly process correctly (correct settings) and if you did all the pointers correctly.
Other than that I don't know what could have gone wrong...
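
For anyone double-checking their pointers: a ROM pointer is the file offset plus the cartridge base address 0x08000000, stored little-endian, and a Thumb entry point additionally has bit 0 set. A minimal Python sketch reproduces the two byte sequences tried above. The helper name `rom_pointer_bytes` is made up for illustration and is not part of the mixer or of any tool; which of the two forms a given pointer slot expects is not stated in this thread.

```python
import struct

ROM_BASE = 0x08000000  # GBA cartridge ROM is mapped at this address


def rom_pointer_bytes(file_offset, thumb=False):
    """Return the 4 little-endian bytes that point at file_offset in the ROM."""
    address = ROM_BASE + file_offset
    if thumb:
        address |= 1  # bit 0 set marks a Thumb entry point
    return struct.pack("<I", address)


# Offset 0x800660 from the post above (hypothetical helper, real arithmetic).
print(rom_pointer_bytes(0x800660).hex(" "))              # 60 06 80 08
print(rom_pointer_bytes(0x800660, thumb=True).hex(" "))  # 61 06 80 08
```

If the bytes in your hex editor don't match what this prints for your insertion offset, the pointer is the first thing to fix.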

designmadman · Apr 29, 2014

Yeah, I tried it on a clean new Fire Red ROM and it works fine. I guess something in my current modified ROM is preventing the routine from working properly. Nice job on this, it sounds really good when I import songs into the clean ROM.

ipatix · Apr 29, 2014

It should work on any ROM that doesn't use the free IWRAM areas I use in the code.
For Emerald 0x03005200 (4*0xE0 Bytes)
For Fire Red 0x03004200 (4*0xE0 Bytes)
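
To check whether another ASM hack collides with these areas, a rough Python sketch can compare address ranges. The two start addresses and the size 4*0xE0 (0x380 bytes) are the ones quoted above; the function names and the address used for the "other hack" in the example are hypothetical.

```python
MIXER_AREA_SIZE = 4 * 0xE0  # 0x380 bytes, as quoted in the post

# Start addresses as given in this post.
MIXER_AREAS = {
    "Emerald": 0x03005200,
    "Fire Red": 0x03004200,
}


def overlaps(start_a, size_a, start_b, size_b):
    """True if the half-open ranges [start, start + size) share any byte."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def conflicts_with_mixer(game, other_start, other_size):
    """True if another hack's IWRAM block touches the mixer's work area."""
    return overlaps(MIXER_AREAS[game], MIXER_AREA_SIZE, other_start, other_size)


# Hypothetical other hack that uses 0x40 bytes at 0x03005400.
print(conflicts_with_mixer("Emerald", 0x03005400, 0x40))   # True
print(conflicts_with_mixer("Fire Red", 0x03005400, 0x40))  # False
```

If this reports a conflict, one of the two hacks has to be moved to a different free IWRAM area before assembling.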

Kawaii Shoujo Duskull · Apr 29, 2014

Huh. I've got a pretty well-trained ear. I didn't really notice much of a difference in your example, just that the second set of audio played was a bit clearer I guess but not much.
lol Is the change easier to hear in the game itself, or am I the only one who doesn't notice a big difference? Just wondering.

But yeah anyway even if I can't tell the difference or if it isn't that big or whatever, still great job on this ASM! Keep up the good work. ^^

ipatix · Apr 30, 2014

Listen carefully to the quieter parts. It should be especially noticeable there.

EDIT:
@ anyone who is having issues:
This routine is not compatible with ASM hacks that access the IWRAM areas specified in the code. I recently found out that I had to move the area for Emerald to 0x03005100 because prime's DNS uses some areas around there as well, and my routine would cause glitchy palette changes all the time.

EDIT 2:
I have now changed the assembly code to a preset system. The only thing you need to do before assembly is set the right "USED_GAME" and run the assembler.

angelXwind · May 17, 2014

There's a typo on line 48 of your code. See here: https://github.com/angelXwind/pokem...mmit/be377b7540ed89a7588941348ed12fd129440669

Also, I wrote a Makefile around your code that executes the following after assembly (BPEE target):

Code:

dd if=main.bin of="out.gba" conv=notrunc seek=3014896 bs=1

(3014896 is 0x2E00F0 in dec, here's how that Makefile works: https://github.com/angelXwind/pokemon-gen3-hq-sound-mixer/blob/master/Makefile)

However, the resulting ROM only causes the emulator to loop at the BIOS forever.

The assembled binary is 0x7A0 bytes long as it should be, so I'm (most likely) assembling it correctly.

Am I injecting the binary into the wrong area? Or...

PokeBunny · May 17, 2014

So there is a music player in the BIOS. I didn't know that. That explains why, in GBATEK, some of the SWI functions are left undocumented as "SoundWhatever #".

ipatix · May 17, 2014

angelXwind said:
There's a typo on line 48 of your code. See here: https://github.com/angelXwind/pokem...mmit/019a667a4612c4bbfa0438c59ed9a3fbdbc983f9

Also, I wrote a Makefile around your code that executes the following after assembly (BPEE target):

Code:

dd if=main.bin of="out.gba" conv=notrunc seek=3014896 bs=1

(3014896 is 0x2E00F0 in dec, here's how that Makefile works: https://github.com/angelXwind/pokemon-gen3-hq-sound-mixer/blob/master/Makefile)

However, the resulting ROM only causes the emulator to loop at the BIOS forever.

The assembled binary is 0x7A0 bytes long as it should be, so I'm (most likely) assembling it correctly.

Am I injecting the binary into the wrong area? Or...

You can't just overwrite the old code. My new routine is slightly bigger, so you'll overwrite other stuff. That could be the problem.

@all: I just want to announce that version 2.0 of the mixer is already very far along in development. The biggest changes are that the code is completely rewritten, executes almost twice as fast and supports a basic synth engine that doesn't need samples in ROM. It'll probably still be incompatible with interdepth's RTC (and/or DNS) for now due to overlapping RAM areas, but I'll tell you more about that later.

angelXwind · May 17, 2014

ipatix said:
You can't just overwrite the old code. My new one is slightly bigger so you'll overwrite other stuff. Perhaps this could be the problem.

Ah, thanks for the information. Injecting the binary into some free space in the ROM then modifying the pointer seems to work.

Also, I created a GitHub repository with a Makefile that completely automates the process of assembling and injecting the binary into a ROM. https://github.com/angelXwind/pokemon-gen3-hq-sound-mixer
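
The fix described here (put the binary in free space instead of over the old routine, then repoint) can be sketched in Python. This is only an illustration and not what the Makefile does: the function names are hypothetical, it assumes unused ROM space is filled with 0xFF, and it stores a plain ROM address with no Thumb bit. `pointer_offset` stands for whatever pointer location the insertion instructions name for your game.

```python
import struct

ROM_BASE = 0x08000000  # GBA cartridge ROM is mapped at this address
FREE_BYTE = 0xFF       # assumption: unused ROM space is filled with 0xFF


def find_free_space(rom, size, start=0, align=4):
    """Return the first align-ed offset >= start holding size free bytes, or -1."""
    needle = bytes([FREE_BYTE]) * size
    pos = rom.find(needle, start)
    while pos >= 0 and pos % align:
        # Round up to the next aligned offset and search again from there.
        pos = rom.find(needle, pos + align - pos % align)
    return pos


def inject_and_repoint(rom, blob, pointer_offset, search_start=0):
    """Copy blob into free space and store its ROM address at pointer_offset."""
    offset = find_free_space(rom, len(blob), search_start)
    if offset < 0:
        raise ValueError("no free block large enough for the routine")
    rom[offset:offset + len(blob)] = blob
    rom[pointer_offset:pointer_offset + 4] = struct.pack("<I", ROM_BASE + offset)
    return offset


# Tiny fake "ROM" for demonstration: 0x10 bytes of data, then free space.
rom = bytearray(b"\x00" * 0x10 + b"\xff" * 0x40)
where = inject_and_repoint(rom, b"\x01\x02\x03\x04", pointer_offset=0x4)
print(hex(where), rom[0x4:0x8].hex(" "))  # 0x10 10 00 00 08
```

With a real ROM you would read the file into a `bytearray`, pass the assembled binary as `blob`, and write the result to a new file so the original stays untouched.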
