
Development: ipatix' High Quality Sound Mixer | V2.1 released!

ipatix

  • ipatix' High Quality Sound Mixer V2.1

    Introduction:
    If you are not interested in the technical details, skip to the assembly and insertion section. This snippet can be used by any hack, whether or not it uses music hacks. It also improves the vanilla music quality at no cost.

    Hello and welcome to a new development thread of mine.
    As you might already know from the thread's name, I developed a new sound mixing routine for GBA games that use the M4A driver (aka Sappy driver).
    But what is the sound mixer? To understand what it does, you need to know how digital sound is produced and what hardware abilities the AGB has.
    Put very simply: the AGB only has 2 hardware channels for sound playback (usually one for the left and one for the right speaker). With these 2 channels alone we could only play 1 stereo sound at a time. This is where the mixer and a resampler come in handy: the mixer "mixes" several sounds together and produces one output sound. This lets us play back more sounds at the same time, and in combination with a resampler we can play back any sound at any given sampling rate (--> variable pitch for notes) at the same time.
    All of this is done by Nintendo's Sound Engine (with some modifications by Game Freak), which comes with their SDK. Sounds cool, doesn't it?
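
    To make the two ideas concrete, here is a minimal sketch in Python. It is not the engine's code: the function names `resample` and `mix` are made up for illustration, samples are plain floats, and linear interpolation is just one possible way for a resampler to fill in the values between two source samples.

```python
def resample(samples, step, count):
    """Produce up to `count` output samples from `samples`.

    `step` is how many source samples we advance per output sample
    (1.0 = original pitch). Values between two source samples are
    linearly interpolated.
    """
    out = []
    pos = 0.0
    for _ in range(count):
        i = int(pos)
        if i + 1 >= len(samples):
            break  # source exhausted (this sketch does not loop)
        frac = pos - i
        out.append(samples[i] + (samples[i + 1] - samples[i]) * frac)
        pos += step
    return out


def mix(channels, length):
    """Sum several (samples, volume) pairs into one output frame."""
    frame = [0.0] * length
    for samples, volume in channels:
        for n, s in enumerate(samples[:length]):
            frame[n] += s * volume
    return frame


# Half the step size plays the same wave one octave lower.
print(resample([0, 10, 20, 30], 0.5, 4))
# Two virtual channels end up in a single output frame.
print(mix([([1, 2, 3], 0.5), ([4, 4, 4], 0.25)], 3))
```

    A `step` of 2.0 would skip every other source sample and play the wave one octave higher; the mixer then simply adds whatever each channel's resampler produced.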

    There is one major flaw in the design of the mixer, though:
    The sound mixer produces a short period of sound each frame (~ 1/60 s), which gets placed in a buffer in memory. This data is then transferred by hardware timers and DMAs to the sound circuit for playback. Since the AGB only supports 8 bit resolution for audio samples, this buffer must have an 8 bit depth. Because Nintendo wanted their code to use fewer system resources and less RAM, they also use this output buffer as the work area for the actual mixing. This might not sound very problematic, but the issue is that a sound with 8 bit resolution has audible quantization noise. On its own this quantization noise is pretty low. However, each time the mixer adds another sound from a virtual channel (these are also called Direct Sound channels, although they have nothing to do with Microsoft's DirectSound), it adds more quantization noise to the buffer because of the volume scaling that is always done (we can't play all channels at a fixed volume). Because the quantization noise is applied once per channel, it gets really loud and really annoying (even in some commercial titles, though not Pokemon). In Pokemon games it mostly goes unnoticed, because untrained ears do not pick it up and because the games are limited to 5 virtual Direct Sound channels. 5 Direct Sound channels aren't much, though (as you know if you have ever done music hacking), and I personally have been using the 12 channel hack for quite a long time. That makes the noise way worse, and it made the music sound really bad at times.
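
    The effect can be shown with a toy example in Python. This is only a sketch under simplifying assumptions: `mix_8bit_in_place` and `mix_exact` are hypothetical names, volumes are 0..255, and the volume scaling is modelled as a multiply followed by a right shift by 8, which may differ in detail from what the original engine does.

```python
def mix_8bit_in_place(channels, length):
    """Mix straight into an 8-bit buffer.

    Each channel's scaled sample is cut down to an integer before it is
    added, so every channel contributes its own rounding error.
    """
    buf = [0] * length
    for samples, volume in channels:
        for n in range(length):
            scaled = (samples[n] * volume) >> 8      # fraction is lost here
            buf[n] = max(-128, min(127, buf[n] + scaled))
    return buf


def mix_exact(channels, length):
    """Reference result without any rounding."""
    return [sum(s[n] * v for s, v in channels) / 256 for n in range(length)]


one = [([10], 100)]
five = one * 5
print(mix_8bit_in_place(one, 1), mix_exact(one, 1))    # small error
print(mix_8bit_in_place(five, 1), mix_exact(five, 1))  # error grows per channel
```

    With one channel the result is off by less than one step; with five identical channels the same loss happens five times and the gap to the exact value is several steps wide.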

    Then I came up with a solution for this:
    Let's use a work area with a higher bit depth (e.g. the commonly used 16 bits) to eliminate quantization noise during the mixing process, and only add the noise once, in the final downscaling to the main output buffer. The only problem is that we need an additional "work buffer" in IRAM. We need IRAM rather than regular WRAM because of the execution performance required. This is also why the mixing routine is really complicated and very annoying to read: it is written for the best performance possible (Nintendo's original is too; mine is even worse in that respect). The other thing needed to run as fast as possible is placing the mixing routine itself in IRAM as well, which gives faster code loading and lets us use the ARM instruction set at no performance cost, reducing the overall number of instructions.
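
    As a rough illustration of the idea (not of the actual routine, which packs its data differently and is written in ARM assembly), the following Python sketch accumulates all channels at full precision and quantizes only once at the end. The name `mix_high_res` and the multiply-then-shift-by-8 volume model are assumptions made for this example.

```python
def mix_high_res(channels, length):
    """Accumulate in a wide work buffer, downscale to 8 bit once."""
    work = [0] * length
    for samples, volume in channels:              # volume is 0..255
        for n in range(length):
            work[n] += samples[n] * volume        # no per-channel rounding
    # The only quantization step: scale down and clamp to signed 8 bit.
    return [max(-128, min(127, w >> 8)) for w in work]


five = [([10], 100)] * 5
print(mix_high_res(five, 1))   # exact value would be 5000 / 256 = 19.53125
```

    Here five channels lose less than one step in total, because the rounding happens once on the sum instead of once per channel.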
    Originally I disassembled Nintendo's mixer code and found out how it worked. I then developed the first version of this code, which was just a slight modification of the original but was enough to reach my initial goal. It worked, but it had a few issues and the code was a bit slower than the original. I need to go a little off topic to explain how I came to write Version 2 of my mixer:
    Well, a friend showed me a game called "Golden Sun" (if you haven't played it, I really recommend doing so!). I played it through and especially liked its soundtrack. Being curious, I tried to extract the music, discovered that the game also uses the M4A sound driver, and got the MIDIs I wanted. Nonetheless, I was even more impressed by the high sound quality "Golden Sun" and "Golden Sun TLA" had. It took me a while to realise that the game had an incredibly clear sound and a low noise level. All of that was before I ever intended to write any of this. Doing Pokemon hacks, I thought "I need this high quality sound too", and after writing the first version of my high quality mixer I had the idea of copying whatever Camelot did to improve the sound quality into my new code base. With no idea what I was getting into, I fired up a few debugging tools and tried to find out what the game was doing differently from all other GBA games with the M4A driver. It took me quite some time to read the ugliest assembly code I've ever seen in my life, but I managed to understand and document it. What surprised me even more was that Golden Sun's sound ran at way higher sampling rates than most games. How could all of that work? Well, I found out that their code was simply 30-100% faster (scaling better with higher sampling rates) than Nintendo's original while still providing higher quality. This led me to where I currently am, writing the new code with the "power of Golden Sun". In the end I didn't copy everything they had, such as an even more obscure reverb algorithm. But even without the fancy reverb, the code has served me and a few other people out there well (check Bregalad's new Final Fantasy Advance Sound Restorations!).


    The Routine:
    Version V2.1 is currently being refactored and cleaned up. I am not actively working on it, but hopefully it will end up more readable and better documented than it has been before. In the future I might move it to my GitHub, but for now it remains here. Try to understand things ;) I am definitely impressed if you do!

    Code:
    @ created by ~ipatix~
    @ revision 2.1
    
        /* globals */
    	.global	main_mixer
    	.global	main_mixer_end
    
        /* game code definitions */
    	.equ	GAME_BPED, 0
    	.equ	GAME_BPEE, 1
    	.equ	GAME_BPRE, 2
    	.equ	GAME_KWJ6, 3
    	.equ	GAME_AE7E, 4
    	.equ	GAME_BPRD, 5
    
        /* SELECT USED GAME HERE */
    	.equ	USED_GAME, GAME_BPRE		@ CHOOSE YOUR GAME
    
    	.equ	FRAME_LENGTH_5734, 0x60
    	.equ	FRAME_LENGTH_7884, 0x84	    @ THIS MODE IS NOT SUPPORTED BY THIS ENGINE BECAUSE IT DOESN'T USE AN 8 ALIGNED BUFFER LENGTH
    	.equ	FRAME_LENGTH_10512, 0xB0
    	.equ	FRAME_LENGTH_13379, 0xE0	@ DEFAULT
    	.equ	FRAME_LENGTH_15768, 0x108
    	.equ	FRAME_LENGTH_18157, 0x130
    	.equ	FRAME_LENGTH_21024, 0x160
    	.equ	FRAME_LENGTH_26758, 0x1C0
    	.equ	FRAME_LENGTH_31536, 0x210
    	.equ	FRAME_LENGTH_36314, 0x260
    	.equ	FRAME_LENGTH_40137, 0x2A0
    	.equ	FRAME_LENGTH_42048, 0x2C0
    
    	.equ	DECODER_BUFFER_BPE, 0x03001300
    	.equ	DECODER_BUFFER_BPR, 0x03002088
    	.equ	DECODER_BUFFER_KWJ, 0x03005800
    
    	.equ	BUFFER_IRAM_BPE, 0x03001AA8
    	.equ	BUFFER_IRAM_BPR, 0x030028E0
    	.equ	BUFFER_IRAM_KWJ, 0x03005840
    	.equ	BUFFER_IRAM_AE7, 0x03006D60	@ PUT THE WORKBUFFER ADDRESS FOR FIRE EMBLEM HERE!!!
    
        /* stack variables */
    	.equ	ARG_FRAME_LENGTH, 0x0       @ TODO actually use this variable
    	.equ	ARG_REMAIN_CHN, 0x4         @ This is the channel count variable    
    	.equ	ARG_BUFFER_POS, 0x8         @ stores the current output buffer pointer
    	.equ	ARG_LOOP_START_POS, 0xC     @ stores wave loop start position in channel loop
    	.equ	ARG_LOOP_LENGTH, 0x10       @ stores wave loop length in channel loop
    @   .equ    ARG_UKNOWN, 0x14 
    	.equ	ARG_VAR_AREA, 0x18          @ pointer to the engine's main work area
    
        /* channel struct */
    	.equ	CHN_STATUS, 0x0             @ [byte] channel status bitfield
    	.equ	CHN_MODE, 0x1               @ [byte] channel mode bitfield
    	.equ	CHN_VOL_1, 0x2              @ [byte] volume right
    	.equ	CHN_VOL_2, 0x3              @ [byte] volume left
    	.equ	CHN_ATTACK, 0x4             @ [byte] wave attack summand
    	.equ	CHN_DECAY, 0x5              @ [byte] wave decay factor
    	.equ	CHN_SUSTAIN, 0x6            @ [byte] wave sustain level
    	.equ	CHN_RELEASE, 0x7            @ [byte] wave release factor
    	.equ	CHN_ADSR_LEVEL, 0x9         @ [byte] current envelope level
    	.equ	CHN_FINAL_VOL_1, 0xA		@ [byte] not used anymore!
    	.equ	CHN_FINAL_VOL_2, 0xB		@ [byte] not used anymore!
    	.equ	CHN_ECHO_VOL, 0xC           @ [byte] pseudo echo volume
    	.equ	CHN_ECHO_REMAIN, 0xD        @ [byte] pseudo echo length
    	.equ	CHN_POSITION_REL, 0x18		@ [word] sample countdown in mixing loop
    	.equ	CHN_FINE_POSITION, 0x1C     @ [word] inter sample position (23 bits)
    	.equ	CHN_FREQUENCY, 0x20         @ [word] sample rate (in Hz)
    	.equ	CHN_WAVE_OFFSET, 0x24       @ [word] wave header pointer
    	.equ	CHN_POSITION_ABS, 0x28		@ [word] points to the current position in the wave data (relative offset for compressed samples)
    	.equ	CHN_BLOCK_COUNT, 0x3C       @ [word] only used for compressed samples: contains the value of the block that is currently decoded
    
        /* wave header struct */
    	.equ	WAVE_LOOP_FLAG, 0x3         @ [byte] 0x0 = oneshot; 0x40 = looped
    	.equ	WAVE_FREQ, 0x4              @ [word] pitch adjustment value = mid-C samplerate * 1024
    	.equ	WAVE_LOOP_START, 0x8        @ [word] loop start position
    	.equ	WAVE_LENGTH, 0xC            @ [word] loop end / wave end position
        .equ    WAVE_DATA, 0x10             @ [byte array] actual wave data
    
        /* pulse wave synth configuration offset */
    	.equ	SYNTH_BASE_WAVE_DUTY, 0x1   @ [byte]
    	.equ	SYNTH_WIDTH_CHANGE_1, 0x2   @ [byte]
    	.equ	SYNTH_MOD_AMOUNT, 0x3       @ [byte]
    	.equ	SYNTH_WIDTH_CHANGE_2, 0x4   @ [byte]
    
        /* CHN_STATUS flags - 0x0 = OFF */
    	.equ	FLAG_CHN_INIT, 0x80         @ [bit] write this value to init a channel
    	.equ	FLAG_CHN_RELEASE, 0x40      @ [bit] write this value to release (fade out) the channel
    	.equ	FLAG_CHN_COMP, 0x20         @ [bit] is wave being played compressed (yes/no)
    	.equ	FLAG_CHN_LOOP, 0x10         @ [bit] loop (yes/no)
    	.equ	FLAG_CHN_ECHO, 0x4          @ [bit] echo phase
    	.equ	FLAG_CHN_ATTACK, 0x3        @ [bit] attack phase
    	.equ	FLAG_CHN_DECAY, 0x2         @ [bit] decay phase
    	.equ	FLAG_CHN_SUSTAIN, 0x1       @ [bit] sustain phase
    
        /* CHN_MODE flags */
    	.equ	MODE_FIXED_FREQ, 0x8        @ [bit] set to disable resampling (i.e. playback with output rate)
    	.equ	MODE_REVERSE, 0x10          @ [bit] set to reverse sample playback
    	.equ	MODE_COMP, 0x30             @ [bit] is wave being played compressed or reversed (TODO: rename flag)
    	.equ	MODE_SYNTH, 0x40            @ [bit] READ ONLY, indicates synthesized output
    
        /* variables of the engine work area */
    	.equ	VAR_REVERB, 0x5             @ [byte] 0-127 = reverb level
    	.equ	VAR_MAX_CHN, 0x6            @ [byte] maximum channels to process
    	.equ	VAR_MASTER_VOL, 0x7         @ [byte] PCM master volume
    	.equ	VAR_DEF_PITCH_FAC, 0x18     @ [word] this value gets multiplied with the samplerate for the inter sample distance
    	.equ	VAR_FIRST_CHN, 0x50         @ [CHN struct] relative offset to channel array
    
        /* just some more defines */
    	.equ	REG_DMA3_SRC, 0x040000D4
        .equ    ARM_OP_LEN, 0x4
    
    @#######################################
    @*********** GAME CONFIGS **************
    @ add the game's name above to the ASM .equ-s before creating new configs
    @#######################################
    
    
    @*********** IF GERMAN POKEMON EMERALD
    .if USED_GAME==GAME_BPED
    
    	.equ	hq_buffer, BUFFER_IRAM_BPE
    	.equ	decoder_buffer_target, DECODER_BUFFER_BPE
    	.equ	ALLOW_PAUSE, 1
    	.equ	DMA_FIX, 1
    	.equ	ENABLE_DECOMPRESSION, 1
    	.equ	PREVENT_CLIP, 1
    
    .endif
    @*********** IF GERMAN POKEMON FIRE RED
    .if USED_GAME==GAME_BPRD
    
    	.equ	hq_buffer, BUFFER_IRAM_BPR
    	.equ	decoder_buffer_target, DECODER_BUFFER_BPR
    	.equ	ALLOW_PAUSE, 1
    	.equ	DMA_FIX, 1
    	.equ	ENABLE_DECOMPRESSION, 1
    	.equ	PREVENT_CLIP, 1
    
    .endif
    @*********** IF ENGLISH POKEMON EMERALD
    .if USED_GAME==GAME_BPEE
    
    	.equ	hq_buffer, BUFFER_IRAM_BPE
    	.equ	decoder_buffer_target, DECODER_BUFFER_BPE
    	.equ	ALLOW_PAUSE, 1
    	.equ	DMA_FIX, 1
    	.equ	ENABLE_DECOMPRESSION, 1
    	.equ	PREVENT_CLIP, 1
    
    .endif
    @*********** IF ENGLISH POKEMON FIRE RED
    .if USED_GAME==GAME_BPRE
    
    	.equ	hq_buffer, BUFFER_IRAM_BPR
    	.equ	decoder_buffer_target, DECODER_BUFFER_BPR
    	.equ	ALLOW_PAUSE, 1
    	.equ	DMA_FIX, 1
    	.equ	ENABLE_DECOMPRESSION, 1
    	.equ	PREVENT_CLIP, 1
    
    .endif
    @*********** IF KAWAs JUKEBOX 2006
    .if USED_GAME==GAME_KWJ6
    
    	.equ	hq_buffer, BUFFER_IRAM_KWJ
    	.equ	decoder_buffer_target, DECODER_BUFFER_KWJ
    	.equ	ALLOW_PAUSE, 0
    	.equ	DMA_FIX, 0
    	.equ	ENABLE_DECOMPRESSION, 0
    	.equ	PREVENT_CLIP, 1
    
    .endif
    @*********** IF US FIRE EMBLEM
    .if USED_GAME==GAME_AE7E
    
    	.equ	hq_buffer, BUFFER_IRAM_AE7
    	.equ	ALLOW_PAUSE, 0
    	.equ	DMA_FIX, 0
    	.equ	ENABLE_DECOMPRESSION, 0
    	.equ	PREVENT_CLIP, 0
    .endif
    @***********
    
    	.thumb
    
    main_mixer:
        /* load Reverb level and check if we need to apply it */
        LDRB	R3, [R0, #VAR_REVERB]
        LSR	R3, R3, #2
        BEQ  	clear_buffer
    
        ADR	R1, do_reverb
        BX	R1
    
    	.align	2
    	.arm
    
    do_reverb:
    
        /* 
         * reverb is calculated by the following: new_sample = old_sample * reverb_level / 127
         * note that reverb is mono (both sides get mixed together)
         * 
         * reverb gets applied to the frame we are currently looking at and the one after that
         * the magic below simply calculates the pointer for the one after the current one
         */
    
        CMP	R4, #2
        ADDEQ R7, R0, #0x350
        ADDNE R7, R5, R8
        MOV	R4, R8
        ORR	R3, R3, R3, LSL#16			
        STMFD SP!, {R8, LR}
        LDR	LR, hq_buffer_label
    
    reverb_loop:
            /* This loop does the reverb processing */
            LDRSB	R0, [R5, R6]
            LDRSB	R1, [R5], #1
            LDRSB	R2, [R7, R6]
            LDRSB	R8, [R7], #1
            LDRSB	R9, [R5, R6]
            LDRSB	R10, [R5], #1
            LDRSB	R11, [R7, R6]
            LDRSB	R12, [R7], #1
            ADD	R0, R0, R1
            ADD	R0, R0, R2
            ADDS	R0, R0, R8
            ADDMI	R0, R0, #0x4
            ADD	R1, R9, R10
            ADD	R1, R1, R11
            ADDS	R1, R1, R12
            ADDMI	R1, R1, #0x4
            MUL	R0, R3, R0
            MUL	R1, R3, R1
            STMIA	LR!, {R0, R1}
            SUBS	R4, R4, #2
            BGT	reverb_loop
            /* end of loop */
        LDMFD	SP!, {R8, LR}
        ADR	R0, (adsr_setup+1)
        BX	R0
    
    	.thumb
    
    clear_buffer:
        /* In case reverb is disabled, the buffer gets set to zero */
        LDR	R3, hq_buffer_label
        MOV	R1, R8
        MOV	R4, #0
        MOV	R5, #0
        MOV	R6, #0
        MOV	R7, #0
        /*
         * Setting the buffer to zero happens in a very efficient loop
         * Depending on the alignment of the buffer length, twice or quadruple the amount of bytes
         * get cleared at once
         */
        LSR	R1, #3
        BCC	clear_buffer_align_8
    
        STMIA	R3!, {R4, R5, R6, R7}
    
    clear_buffer_align_8:
    
        LSR	R1, #1
        BCC	clear_buffer_align_16
    
        STMIA	R3!, {R4, R5, R6, R7}
        STMIA	R3!, {R4, R5, R6, R7}
    
    clear_buffer_align_16:
            /* This repeats until the buffer has been cleared */
            STMIA	R3!, {R4, R5, R6, R7}
            STMIA	R3!, {R4, R5, R6, R7}
            STMIA	R3!, {R4, R5, R6, R7}
            STMIA	R3!, {R4, R5, R6, R7}
            SUB	    R1, #1
            BGT	    clear_buffer_align_16
            /* loop end */
    adsr_setup:
        /*
         * okay, before the actual mixing starts
         * the volume and envelope calculation happens
         */
        MOV R4, R8  @ R4 = buffer length
        /* this buffers the buffer length to a backup location
         * TODO: Move this variable to stack
         */
        ADR	R0, hq_buffer_length_label
        STR	R4, [R0]
        /* init channel loop */
        LDR	R4, [SP, #ARG_VAR_AREA]	        @ R4 = main work area pointer
        LDR	R0, [R4, #VAR_DEF_PITCH_FAC]	@ R0 = samplingrate pitch factor
        MOV	R12, R0					        @ --> R12
        LDRB R0, [R4, #VAR_MAX_CHN]		    @ load MAX channels to R0
        ADD	R4, #VAR_FIRST_CHN  			@ R4 = Base channel Offset (Channel #0)
    
    mixer_entry:
            /* this is the main channel processing loop */
            STR	R0, [SP, #ARG_REMAIN_CHN]		
            LDR	R3, [R4, #CHN_WAVE_OFFSET]
            LDRB R6, [R4, #CHN_STATUS]
            MOVS R0, #0xC7					@ check if any of the channel status flags is set
            TST	R0, R6						@ check if none of the flags is set
            BEQ return_channel_null 		@ skip channel
            /* check channel flags */
            LSL	R0, R6, #25 				@ shift over the FLAG_CHN_INIT to CARRY
            BCC	adsr_echo_check				@ continue with normal channel procedure
            /* check leftmost bit */
            BMI	stop_channel_handler		@ if the channel is initiated but on release it gets turned off immediately
            /* channel init procedure */
            MOVS R6, #FLAG_CHN_ATTACK		@ set the channel status to ATTACK
            MOVS R0, R3						@ R0 = CHN_WAVE_OFFSET
            ADD	R0, #WAVE_DATA				@ R0 = wave data offset
    
            /* Pokemon games seem to init channels differently than other m4a games */
        .if ALLOW_PAUSE==0
            STR	R0, [R4, #CHN_POSITION_ABS]
            LDR	R0, [R3, #WAVE_LENGTH]
            STR	R0, [R4, #CHN_POSITION_REL] 
        .else
            LDR	R1, [R4, #CHN_POSITION_REL]
            ADD	R0, R0, R1
            STR	R0, [R4, #CHN_POSITION_ABS]
            LDR	R0, [R3, #WAVE_LENGTH]
            SUB	R0, R0, R1
            STR	R0, [R4, #CHN_POSITION_REL]
        .endif
    
            MOVS R5, #0						@ initial envelope = #0
            STRB R5, [R4, #CHN_ADSR_LEVEL]
            STR	R5, [R4, #CHN_FINE_POSITION]
            LDRB R2, [R3, #WAVE_LOOP_FLAG]
            LSR	R0, R2, #6
            BEQ	adsr_attack_handler         @ if loop disabled --> branch
            /* loop enabled here */
            MOVS R0, #FLAG_CHN_LOOP	
            ORR	R6, R0      				@ update channel status
            B adsr_attack_handler
    
    adsr_echo_check:
            /* this is the normal ADSR procedure without init */
            LDRB R5, [R4, #CHN_ADSR_LEVEL]
            LSL	R0, R6, #29				    @ echo flag --> bit 31
            BPL	adsr_release_check			@ PL == false
            /* pseudo echo handler */
            LDRB R0, [R4, #CHN_ECHO_REMAIN]
            SUB	R0, #1
            STRB R0, [R4, #CHN_ECHO_REMAIN]
            BHI	channel_vol_calc			@ if echo still on --> branch
    
    stop_channel_handler:
    
            MOVS R0, #0
            STRB R0, [R4, #CHN_STATUS]
    
    return_channel_null:
            /* go to end of the channel loop */
            B check_remain_channels
    
    adsr_release_check:
            LSL	R0, R6, #25					@ bit 31 = release bit
            BPL	adsr_decay_check			@ if release == 0 --> branch
            /* release handler */
            LDRB R0, [R4, #CHN_RELEASE]
            @SUB R0, #0xFF                  @ linear decay; TODO make option for triggering it
            @SUB R0, #1
            @ADD R5, R5, R0
            MUL	R5, R5, R0	            	@ default release algorithm
            LSR	R5, R5, #8
            @BMI adsr_released_handler      @ part of linear decay
            BEQ	adsr_released_handler	    @ release gone down to #0 --> branch
            /* pseudo echo init handler */
            LDRB R0, [R4, #CHN_ECHO_VOL]
            CMP	R5, R0
            BHI	channel_vol_calc            @ if release still above echo level --> branch
    
    adsr_released_handler:
            /* if volume released to #0 */
            LDRB R5, [R4, #CHN_ECHO_VOL]    @ TODO: replace with MOV R5, R0
            CMP	R5, #0
            BEQ	stop_channel_handler        @ if pseudo echo vol = 0 --> branch
            /* pseudo echo volume handler */
            MOVS R0, #FLAG_CHN_ECHO
            ORR	R6, R0						@ set the echo flag
            B adsr_update_status
    
    adsr_decay_check:
            /* check if decay is active */
            MOVS R2, #3
            AND	R2, R6                      @ separate phase status bits
            CMP	R2, #FLAG_CHN_DECAY
            BNE	adsr_attack_check			@ decay not active --> branch
            /* decay handler */
            LDRB R0, [R4, #CHN_DECAY]
            MUL	R5, R0
            LSR	R5, R5, #8
            LDRB R0, [R4, #CHN_SUSTAIN]
            CMP	R5, R0
            BHI	channel_vol_calc		    @ sample didn't decay yet --> branch
            /* sustain handler */
            MOVS R5, R0						@ current level = sustain level
            BEQ	adsr_released_handler       @ sustain level #0 --> branch
            /* step to next phase otherwise */
            B adsr_switchto_next
    
    adsr_attack_check:
            /* attack handler */
            CMP	R2, #FLAG_CHN_ATTACK
            BNE	channel_vol_calc			@ if it isn't in attack phase, it has to be in sustain (no adsr change needed) --> branch
    
    adsr_attack_handler:
            /* apply attack summand */
            LDRB R0, [R4, #CHN_ATTACK]
            ADD	R5, R5, R0
            CMP	R5, #0xFF
            BCC	adsr_update_status
            /* cap attack at 0xFF */
            MOVS R5, #0xFF
     
    adsr_switchto_next:
            /* switch to next adsr phase */
            SUB	R6, #1
    
    adsr_update_status:
            /* store channel status */
            STRB R6, [R4, #CHN_STATUS]
    
    channel_vol_calc:
            /* store the calculated ADSR level */
            STRB R5, [R4, #CHN_ADSR_LEVEL]
            /* apply master volume */
            LDR	R0, [SP, #ARG_VAR_AREA]
            LDRB R0, [R0, #VAR_MASTER_VOL]
            ADD	R0, #1
            MUL	R5, R0, R5
            /* left side volume */
            LDRB R0, [R4, #CHN_VOL_2]
            MUL	R0, R5
            LSR	R0, R0, #13
            MOV	R10, R0                     @ R10 = left volume
            /* right side volume */
            LDRB R0, [R4, #CHN_VOL_1]
            MUL	R0, R5
            LSR	R0, R0, #13
            MOV	R11, R0						@ R11 = right volume
            /*
             * Now we get closer to actual mixing:
             * For looped samples some additional operations are required
             */
            MOVS R0, #FLAG_CHN_LOOP
            AND	R0, R6
            BEQ	mixing_loop_setup				@ TODO: This label should rather be called "skip_loop_setup"
            /* loop setup handler */
            ADD	R3, #WAVE_LOOP_START
            LDMIA R3!, {R0, R1}					@ R0 = loop start, R1 = loop end
            ADD	R3, R0, R3					    @ R3 = loop start position (absolute)
            STR	R3, [SP, #ARG_LOOP_START_POS]	@ backup loop start
            SUB	R0, R1, R0
    
    mixing_loop_setup:
            /* do the rest of the setup */
            STR	R0, [SP, #ARG_LOOP_LENGTH]		@ if loop is off --> R0 = 0x0
            LDR	R5, hq_buffer_label
            LDR	R2, [R4, #CHN_POSITION_REL]		@ remaining samples for channel
            LDR	R3, [R4, #CHN_POSITION_ABS]		@ current stream position (abs)
            LDRB R0, [R4, #CHN_MODE]
            ADR	R1, mixing_arm_setup
            BX R1
    
    	.align	2
    hq_buffer_label:
    	.word	hq_buffer
    hq_buffer_length_label:     @ TODO: Replace with variable on stack
    	.word	0xFFFFFFFF
    
    	.arm
    mixing_arm_setup:
            /* frequency and mixing loading routine */
            LDR	R8, hq_buffer_length_label
            ORRS R11, R10, R11, LSL#16		    @ R11 = 00RR00LL
            BEQ	switchto_thumb					@ volume #0 --> branch and skip channel processing
            /* normal processing otherwise */
            TST R0, #MODE_FIXED_FREQ
            BNE	fixed_mixing_setup
            TST R0, #MODE_COMP
            BNE special_mixing	                @ compressed? --> branch
            /* same here */
            STMFD SP!, {R4, R9, R12}
            /*
             * This mixer supports 4 different kinds of synthesized sounds
             * They are triggered when the loop end = 0
             * This gets checked below
             */
            MOVS R2, R2
            ORREQ R0, R0, #MODE_SYNTH
            STREQB R0, [R4, #CHN_MODE]
            ADD	R4, R4, #CHN_FINE_POSITION
            LDMIA R4, {R7, LR}					@ R7 = Fine Position, LR = Frequency
            MUL	R4, R12, LR					    @ R4 = inter sample steps = output rate factor * samplerate
            /* now the first samples get loaded */
            LDRSB R6, [R3], #1
            LDRSB R12, [R3]
            TST	R0, #MODE_SYNTH
            BNE	init_synth
            /* in case no synth mode should be used, code continues here */
            SUB	R12, R12, R6					@ R12 = DELTA
            /*
             * Mixing goes with volume ranges 0-127
             * They come in 0-255 --> divide by 2
             */
            MOVS R11, R11, LSR#1
            ADC	R11, R11, #0x8000
            BIC	R11, R11, #0xFF00
            MOV	R1, R7	    					@ R1 = inter sample position
            /*
             * There are 2 different mixing codepaths for uncompressed data
             *  path 1: fast mixing, but doesn't support loop or stop
             *  path 2: not so fast but supports sample loops / stop
             * This checks if there are enough samples available for path 1.
             * important: R0 is expected to be #0
             */
            UMLAL R1, R0, R4, R8
            MOV	R1, R1, LSR#23
            ORR	R0, R1, R0, LSL#9
            CMP	R2, R0						    @ actual comparison
            BLE	split_sample_loading			@ if not enough samples are available for path 1 --> branch
            /* 
             * This is the mixer path 1.
             * The interesting thing here is that the code will
             * buffer the needed samples on stack if enough stack space
             * is available and fewer than 0x400 bytes are needed
             */
            SUB	R2, R2, R0
            LDR	R10, stack_capacity
            ADD	R10, R10, R0
            CMP	R10, SP
            ADD	R10, R3, R0
            ADR	R9, custom_stack_3
            /*
             * R2 = remaining samples
             * R10 = final sample position
             * SP = original stack location
             * These values will get reloaded after channel processing
             * due to the lack of registers.
             */
            STMIA	R9, {R2, R10, SP}
            CMPCC	R0, #0x400                  @ > 0x400 bytes --> read directly from ROM rather than buffered
            BCS	select_mixing_mode              @ TODO rename
            /*
             * The code below inits the DMA to read word aligned
             * samples from ROM to stack
             */
            BIC	R1, R3, #3
            MOV	R9, #0x04000000
            ADD	R9, R9, #0xD4
            ADD	R0, R0, #7
            MOV	R0, R0, LSR#2
            SUB SP, SP, R0, LSL#2
            AND	R3, R3, #3
            ADD	R3, R3, SP
            ORR	LR, R0, #0x84000000
            STMIA R9, {R1, SP, LR}              @ actually starts the DMA
    
            /* Somehow this is necessary for some games not to break */
        .if DMA_FIX==1
            MOV	R0, #0
            MOV	R1, R0
            MOV	R2, R1
            STMIA R9, {R0, R1, R2}
        .endif
    
    select_mixing_mode:
            /*
             * This code decides which piece of code to load
             * depending on playback-rate / default-rate ratio.
             * Modes > 1.0 run with different volume levels.
             */
            SUBS R4, R4, #0x800000
            MOVPL R11, R11, LSL#1
            ADR	R0, math_resources				@ loads the base pointer of the code
            ADDPL R0, R0, #(ARM_OP_LEN*6)       @ 6 instructions further
            SUBPLS R4, R4, #0x800000
            ADDPL R0, R0, #(ARM_OP_LEN*6)
            ADDPL R4, R4, #0x800000				@ TODO how does restoring for > 2.0 ratios work?
            LDR	R2, function_pointer
            CMP	R0, R2						    @ code doesn't need to be reloaded if it's already in place
            BEQ	mixing_init
            /* This loads the needed code to RAM */
            STR	R0, function_pointer
            LDMIA R0, {R0-R2, R8-R10}			@ load 6 opcodes
            ADR	LR, runtime_created_routine
    
    create_routine_loop:
                /* paste code to destination, see below for patterns */
                STMIA	LR, {R0, R1}
                ADD	LR, LR, #0x98
                STMIA	LR, {R0, R1}
                SUB	LR, LR, #0x8C
                STMIA	LR, {R2, R8-R10}
                ADD	LR, LR, #0x98
                STMIA	LR, {R2, R8-R10}
                SUB	LR, LR, #0x80
                ADDS	R5, R5, #0x40000000	    @ do that for 4 blocks
                BCC	create_routine_loop
    
            LDR	R8, hq_buffer_length_label
    
    mixing_init:
            MOV	R2, #0xFF000000					@ load the fine position overflow bitmask
    mixing_loop:
            /* This is the actual processing and interpolation code loop; NOPs will be replaced by the code above */
                LDMIA R5, {R0, R1, R10, LR}	@ load 4 stereo samples to Registers
                MUL	R9, R7, R12
    runtime_created_routine:
                NOP							@ Block #1
                NOP
                MLANE R0, R11, R9, R0
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                MULNE	R9, R7, R12
                NOP							@ Block #2
                NOP
                MLANE R1, R11, R9, R1
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                MULNE R9, R7, R12
                NOP							@ Block #3
                NOP
                MLANE R10, R11, R9, R10
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                MULNE R9, R7, R12
                NOP							@ Block #4
                NOP
                MLANE LR, R11, R9, LR
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                STMIA R5!, {R0, R1, R10, LR}	@ write 4 stereo samples
                
                LDMIA R5, {R0, R1, R10, LR}	    @ load the next 4 stereo samples
                MULNE R9, R7, R12	
                NOP							@ Block #1
                NOP
                MLANE R0, R11, R9, R0
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                MULNE R9, R7, R12
                NOP							@ Block #2
                NOP
                MLANE R1, R11, R9, R1
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                MULNE R9, R7, R12
                NOP							@ Block #3
                NOP
                MLANE R10, R11, R9, R10
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                MULNE R9, R7, R12
                NOP							@ Block #4
                NOP
                MLANE LR, R11, R9, LR
                NOP
                NOP
                NOP
                NOP
                BIC	R7, R7, R2, ASR#1
                STMIA R5!, {R0, R1, R10, LR}	@ write 4 stereo samples
                SUBS R8, R8, #8					@ subtract 8 from the sample count
                BGT	mixing_loop
            /* restore previously saved values */
            ADR	R12, custom_stack_3
            LDMIA R12, {R2, R3, SP}
            B mixing_end_func
    
    @ work variables
    
    	.align	2
    custom_stack_3:
    	.word	0x0, 0x0, 0x0
    stack_capacity:
    	.word	0x03007910
    function_pointer:
    	.word	0x0
    
    @ math resources, not directly used
    
    math_resources:
    
    MOV	R9, R9, ASR#22					@ Frequency Lower than default Frequency
    ADDS	R9, R9, R6, LSL#1
    ADDS	R7, R7, R4
    ADDPL	R6, R12, R6
    LDRPLSB	R12, [R3, #1]!
    SUBPLS	R12, R12, R6
    
    ADDS	R9, R6, R9, ASR#23				@ Frequency < 2x && Frequency > default frequency
    ADD	R6, R12, R6
    ADDS	R7, R7, R4
    LDRPLSB	R6, [R3, #1]!
    LDRSB	R12, [R3, #1]!
    SUBS	R12, R12, R6
    
    ADDS	R9, R6, R9, ASR#23				@ Frequency >= 2x higher than default Frequency
    ADD	R7, R7, R4
    ADD	R3, R3, R7, LSR#23
    LDRSB	R6, [R3]
    LDRSB	R12, [R3, #1]!
    SUBS	R12, R12, R6
    
    split_sample_loading:
    
    ADD	R5, R5, R8, LSL#2				@ R5 = End of HQ buffer
    
    uncached_mixing_loop:
    
    MUL	R9, R7, R12					@ calc interpolated DELTA
    MOV	R9, R9, ASR#22					@ scale down the DELTA
    ADDS	R9, R9, R6, LSL#1				@ Add to Base Sample (upscaled to 8 bits again)
    LDRNE	R0, [R5, -R8, LSL#2]				@ load sample from buffer
    MLANE	R0, R11, R9, R0					@ add it to the buffer sample
    STRNE	R0, [R5, -R8, LSL#2]				@ write the sample
    ADD	R7, R7, R4					@ add the step size to the fine position
    MOVS	R9, R7, LSR#23					@ write the overflow amount to R9
    BEQ	uncached_mixing_load_skip			@ skip the mixing load if it isn't required
    
    SUBS	R2, R2, R7, LSR#23				@ remove the overflow count from the remaining samples
    BLLE	loop_end_sub					@ if the loop end is reached call the loop handler
    SUBS	R9, R9, #1					@ remove #1 from the overflow count
    ADDEQ	R6, R12, R6					@ new base sample is previous sample + DELTA
    @RETURN LOCATION FROM LOOP HANDLER
    LDRNESB	R6, [R3, R9]!					@ load new sample
    LDRSB	R12, [R3, #1]!					@ load the delta sample (always required)
    SUB	R12, R12, R6					@ calc new DELTA
    BIC	R7, R7, #0x3F800000				@ clear the overflow from the fine position by using the bitmask
    
    uncached_mixing_load_skip:
    
    SUBS	R8, R8, #1					@ reduce the sample count for the buffer by #1
    BGT	uncached_mixing_loop
    
    mixing_end_func:
    
    SUB	R3, R3, #1					@ reduce sample pointer by #1 (???)
    LDMFD	SP!, {R4, R9, R12}				@ pop values from stack
    STR	R7, [R4, #CHN_FINE_POSITION]			@ store the fine position
    B	store_coarse_sample_pos				@ jump over to code to store coarse channel position
    
    loop_end_sub:
    
    ADD	R3, SP, #ARG_LOOP_START_POS+0xC			@ prepare sample loop start loading and loop length loading (0xC due to the pushed stack pointer)
    LDMIA	R3, {R3, R6}					@ R3 = Loop Start; R6 = Loop Length
    CMP	R6, #0						@ check if loop is enabled; if Loop is enabled R6 is != 0
    RSBNE	R9, R2, #0					@ the sample overflow from the resampling needs to get subtracted so the remaining samples is slightly less
    ADDNE	R2, R6, R2					@ R2 = add the loop length
    ADDNE	PC, LR, #8					@ return from the subroutine to 2 instructions after the actual return location
    LDMFD	SP!, {R4, R9, R12}				@ restore registers from stack
    B	update_channel_status
    
    fixed_freq_loop_end_handler:
    
    LDR	R2, [SP, #ARG_LOOP_LENGTH+0x8]			@ load the loop length value
    MOVS	R6, R2						@ copy it to R6 and check if loop is disabled
    LDRNE	R3, [SP, #ARG_LOOP_START_POS+0x8]		@ reset the sample pointer to the loop start position
    BXNE	LR						@ if it loops return to mixing function, if it doesn't go on and end mixing
    
    LDMFD	SP!, {R4, R9}
    
    update_channel_status:
    
    STRB	R6, [R4]					@ if loop is disabled R6 = 0 and we can disable the channel by writing R6 to R4 (channel area)
    B	switchto_thumb					@ switch to thumb
    
    fixed_math_resource:	@ not executed, used to create mixing function
    
    MOVS	R6, R10, LSL#24
    MOVS	R6, R6, ASR#24
    MOVS	R6, R10, LSL#16
    MOVS	R6, R6, ASR#24
    MOVS	R6, R10, LSL#8
    MOVS	R6, R6, ASR#24
    MOVS	R6, R10, ASR#24
    LDMIA	R3!, {R10}					@ load chunk of samples
    MOVS	R6, R10, LSL#24
    MOVS	R6, R6, ASR#24
    MOVS	R6, R10, LSL#16
    MOVS	R6, R6, ASR#24
    MOVS	R6, R10, LSL#8
    MOVS	R6, R6, ASR#24
    LDMFD	SP!, {R4, R9, R12}
    
    fixed_mixing_setup:
    
    STMFD	SP!, {R4, R9}					@ back up the channel pointer (R4) and R9
    
    fixed_mixing_check_length:
    
    MOV	LR, R2						@ move absolute sample position to LR
    CMP	R2, R8						@ compare the remaining samples with the buffer length
    MOVGT	LR, R8						@ if more samples remain than the buffer needs, use the smaller amount (buffer length) in LR
    SUB	LR, LR, #1					@ shorten samples to process by #1
    MOVS	LR, LR, LSR#2					@ calculate the amount of words to process (-1/4)
    BEQ	fixed_mixing_process_unaligned			@ process the unaligned samples if there is <= 3 samples to process
    
    SUB	R8, R8, LR, LSL#2				@ subtract the amount of samples we need to process from the buffer length
    SUB	R2, R2, LR, LSL#2				@ subtract the amount of samples we need to process from the remaining samples
    ADR	R1, fixed_mixing_custom_routine
    ADR	R0, fixed_math_resource				@ load the 2 pointers to create function (@R0) by instructions from R1
    MOV	R9, R3, LSL#30					@ move sample alignment bits to the leftmost position
    ADD	R0, R0, R9, LSR#27				@ alignment * 8 + resource offset = new resource offset
    LDMIA	R0!, {R6, R7, R9, R10}				@ load 4 instructions
    STMIA	R1, {R6, R7}					@ write the 1st 2 instructions
    ADD	R1, R1, #0xC					@ move label pointer over to the next slot
    STMIA	R1, {R9, R10}					@ write 2nd block
    ADD	R1, R1, #0xC					@ move label pointer to next block
    LDMIA	R0, {R6, R7, R9, R10}				@ load instructions for block #3 and #4
    STMIA	R1, {R6, R7}					@ write block #3
    ADD	R1, R1, #0xC					@ ...
    STMIA	R1, {R9, R10}					@ write block #4
    LDMIA	R3!, {R10}					@ read 4 samples from ROM
    
    fixed_mixing_loop:
    
    LDMIA	R5, {R0, R1, R7, R9}				@ load 4 samples from hq buffer
    
    fixed_mixing_custom_routine:
    
    NOP
    NOP
    MLANE	R0, R11, R6, R0					@ add new sample if necessary
    NOP
    NOP
    MLANE	R1, R11, R6, R1
    NOP
    NOP
    MLANE	R7, R11, R6, R7
    NOP
    NOP
    MLANE	R9, R11, R6, R9
    STMIA	R5!, {R0, R1, R7, R9}				@ write the samples to the work area buffer
    SUBS	LR, LR, #1					@ countdown the sample blocks to process
    BNE	fixed_mixing_loop				@ if the end wasn't reached yet, repeat the loop
    
    SUB	R3, R3, #4					@ reduce sample position by #4, we'll need to load the samples again
    
    fixed_mixing_process_unaligned:
    
    MOV	R1, #4						@ we need to repeat the loop #4 times to completely get rid of alignment errors
    
    fixed_mixing_unaligned_loop:
    
    LDR	R0, [R5]					@ load sample from buffer
    LDRSB	R6, [R3], #1					@ load sample from ROM to R6
    MLA	R0, R11, R6, R0					@ apply the volume and add the sample to the buffer value
    STR	R0, [R5], #4
    SUBS	R2, R2, #1					@ reduce alignment error by #1
    BLEQ	fixed_freq_loop_end_handler
    SUBS	R1, R1, #1
    BGT	fixed_mixing_unaligned_loop			@ repeat the loop #4 times
    
    SUBS	R8, R8, #4					@ reduce the sample amount we wrote to the buffer by #4
    BGT	fixed_mixing_check_length			@ go up to repeat the mixing procedure until the buffer is filled
    
    LDMFD	SP!, {R4, R9}					@ pop registers from stack
    
    store_coarse_sample_pos:
    
    STR	R2, [R4, #CHN_POSITION_REL]			@ store relative and absolute sample position
    STR	R3, [R4, #CHN_POSITION_ABS]			
    
    switchto_thumb:
    
    ADR	R0, (check_remain_channels+1)			@ load the label offset and switch to thumb
    BX	R0
    
    	.thumb
    
    check_remain_channels:
    
    LDR	R0, [SP, #ARG_REMAIN_CHN]			@ load the remaining channels
    SUB	R0, #1						@ reduce the amount by #1
    BLE	mixer_return					@ end the mixing when finished processing all channels
    
    ADD	R4, #0x40
    B	mixer_entry
    
    mixer_return:
    
    ADR	R0, downsampler
    BX	R0
    
    downsampler_return:
    
    LDR	R0, [SP, #ARG_VAR_AREA]			@ load the main var area to R0
    LDR	R3, mixer_finished_status		@ load some status indication value to R3
    STR	R3, [R0]				@ store this value to the main var area
    ADD	SP, SP, #0x1C
    POP	{R0-R7}
    MOV	R8, R0
    MOV	R9, R1
    MOV	R10, R2
    MOV	R11, R3
    POP	{R3}
    BX	R3
    
    	.align	2
    
    mixer_finished_status:
    	.word	0x68736D53
    
    	.arm
    
    downsampler:
    
    LDR	R10, hq_buffer_label
    LDR	R9, [SP, #ARG_BUFFER_POS]
    LDR	R8, hq_buffer_length_label
    MOV	R11, #0xFF
    .if PREVENT_CLIP==1
    
    MOV	R12, #0xFFFFFFFF
    MOV	R12, R12, LSL#14
    MOV	R7, #0x630
    
    downsampler_loop:
    
    LDRSH	R2, [R10], #2
    LDRSH	R0, [R10], #2
    LDRSH	R3, [R10], #2
    LDRSH	R1, [R10], #2
    
    CMP	R0, #0x4000
    MOVGE	R0, #0x3F80
    CMP	R0, #-0x4000
    MOVLT	R0, R12
    
    CMP	R1, #0x4000
    MOVGE	R1, #0x3F80
    CMP	R1, #-0x4000
    MOVLT	R1, R12
    
    CMP	R2, #0x4000
    MOVGE	R2, #0x3F80
    CMP	R2, #-0x4000
    MOVLT	R2, R12
    
    CMP	R3, #0x4000
    MOVGE	R3, #0x3F80
    CMP	R3, #-0x4000
    MOVLT	R3, R12
    
    AND	R0, R11, R0, ASR#7
    AND	R1, R11, R1, ASR#7
    AND	R2, R11, R2, ASR#7
    AND	R3, R11, R3, ASR#7
    
    ORR	R2, R2, R3, LSL#8
    ORR	R0, R0, R1, LSL#8
    
    STRH	R2, [R9, R7]
    STRH	R0, [R9], #2
    
    SUBS	R8, #2
    BGT	downsampler_loop
    
    .else
    downsampler_loop:
    
    LDRH	R4, [R10], #2
    LDRH	R0, [R10], #2
    LDRH	R5, [R10], #2
    LDRH	R1, [R10], #2
    LDRH	R6, [R10], #2
    LDRH	R2, [R10], #2
    LDRH	R7, [R10], #2
    LDRH	R3, [R10], #2
    
    AND	R0, R11, R0, LSR#7
    AND	R1, R11, R1, LSR#7
    AND	R2, R11, R2, LSR#7
    AND	R3, R11, R3, LSR#7
    AND	R4, R11, R4, LSR#7
    AND	R5, R11, R5, LSR#7
    AND	R6, R11, R6, LSR#7
    AND	R7, R11, R7, LSR#7
    
    ORR	R4, R4, R5, LSL#8
    ORR	R4, R4, R6, LSL#16
    ORR	R4, R4, R7, LSL#24
    
    ORR	R0, R0, R1, LSL#8
    ORR	R0, R0, R2, LSL#16
    ORR	R0, R0, R3, LSL#24
    
    STR	R4, [R9, #0x630]
    STR	R0, [R9], #4
    
    SUBS	R8, #4
    BGT	downsampler_loop
    
    .endif
    
    ADR	R0, (downsampler_return+1)
    BX	R0
    
    	.align	2
    
    init_synth:
    
    CMP	R12, #0		@ $030057C4
    BNE	check_synth_type
    
    LDRB	R6, [R3, #SYNTH_WIDTH_CHANGE_1]			@ for saw wave -> 0xF0 (base duty cycle change)
    ADD	R2, R2, R6, LSL#24				@ add it to the current synth state
    LDRB	R6, [R3, #SYNTH_WIDTH_CHANGE_2]			@ for saw wave -> 0x80 (base duty cycle change #2)
    ADDS	R6, R2, R6, LSL#24				@ add this to the synth state as well but keep the old value in R2 and put the new one in R6
    MVNMI	R6, R6	 					@ invert (bitwise NOT) if duty cycle is > 50%
    MOV	R10, R6, LSR#8					@ shift the final duty cycle right by 8 bits (divide by 256) into R10
    LDRB	R1, [R3, #SYNTH_MOD_AMOUNT]			@ for saw wave -> 0xE0
    LDRB	R0, [R3, #SYNTH_BASE_WAVE_DUTY]			@ for saw wave -> 0x10 (base duty cycle offset)
    MOV	R0, R0, LSL#24					@ convert it to a usable duty cycle
    MLA	R6, R10, R1, R0					@ calculate the final duty cycle with the offset, and intensity * rotating duty cycle amount
    STMFD	SP!, {R2, R3, R9, R12}
    
    synth_type_0_loop:
    
    LDMIA	R5, {R0-R3, R9, R10, R12, LR}			@ load 8 samples
    CMP	R7, R6						@ Block #1
    ADDCC	R0, R0, R11, LSL#6
    SUBCS	R0, R0, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #2
    ADDCC	R1, R1, R11, LSL#6
    SUBCS	R1, R1, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #3
    ADDCC	R2, R2, R11, LSL#6
    SUBCS	R2, R2, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #4
    ADDCC	R3, R3, R11, LSL#6
    SUBCS	R3, R3, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #5
    ADDCC	R9, R9, R11, LSL#6
    SUBCS	R9, R9, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #6
    ADDCC	R10, R10, R11, LSL#6
    SUBCS	R10, R10, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #7
    ADDCC	R12, R12, R11, LSL#6
    SUBCS	R12, R12, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    CMP	R7, R6						@ Block #8
    ADDCC	LR, LR, R11, LSL#6
    SUBCS	LR, LR, R11, LSL#6
    ADDS	R7, R7, R4, LSL#3
    
    STMIA	R5!, {R0-R3, R9, R10, R12, LR}			@ write 8 samples
    SUBS	R8, R8, #8					@ remove #8 from sample count
    BGT	synth_type_0_loop
    
    LDMFD	SP!, {R2, R3, R9, R12}
    B	mixing_end_func
    
    check_synth_type:
    
    SUBS	R12, R12, #1					@ remove #1 from the synth type byte and check if it's #0
    BNE	synth_type_2					@ if it still isn't it's synth type 2 (smooth pan flute)
    
    MOV	R6, #0x300					@ R6 = 0x300
    MOV	R11, R11, LSR#1					@ halve the volume
    BIC	R11, R11, #0xFF00				@ clear bad bits from division
    MOV	R12, #0x70					@ R12 = 0x70
    
    synth_type_1_loop:
    
    LDMIA	R5, {R0, R1, R10, LR}				@ load 4 samples from memory
    ADDS	R7, R7, R4, LSL#3				@ Block #1 (some oscillator type code)
    RSB	R9, R12, R7, LSR#24
    MOV	R6, R7, LSL#1
    SUB	R9, R9, R6, LSR#27
    ADDS	R2, R9, R2, ASR#1
    MLANE	R0, R11, R2, R0
    
    ADDS	R7, R7, R4, LSL#3				@ Block #2
    RSB	R9, R12, R7, LSR#24
    MOV	R6, R7, LSL#1
    SUB	R9, R9, R6, LSR#27
    ADDS	R2, R9, R2, ASR#1
    MLANE	R1, R11, R2, R1
    
    ADDS	R7, R7, R4, LSL#3				@ Block #3
    RSB	R9, R12, R7, LSR#24
    MOV	R6, R7, LSL#1
    SUB	R9, R9, R6, LSR#27
    ADDS	R2, R9, R2, ASR#1
    MLANE	R10, R11, R2, R10
    
    ADDS	R7, R7, R4, LSL#3				@ Block #4
    RSB	R9, R12, R7, LSR#24
    MOV	R6, R7, LSL#1
    SUB	R9, R9, R6, LSR#27
    ADDS	R2, R9, R2, ASR#1
    MLANE	LR, R11, R2, LR
    
    STMIA	R5!, {R0, R1, R10, LR}
    SUBS	R8, R8, #4
    BGT	synth_type_1_loop
    
    B	mixing_end_func					@ goto end
    
    synth_type_2:
    
    MOV	R6, #0x80					@ write base values to the registers
    MOV	R12, #0x180
    
    synth_type_2_loop:
    
    LDMIA	R5, {R0, R1, R10, LR}				@ load samples from work buffer
    ADDS	R7, R7, R4, LSL#3				@ Block #1
    RSBPL	R9, R6, R7, ASR#23
    SUBMI	R9, R12, R7, LSR#23
    MLA	R0, R11, R9, R0
    
    ADDS	R7, R7, R4, LSL#3				@ Block #2
    RSBPL	R9, R6, R7, ASR#23
    SUBMI	R9, R12, R7, LSR#23
    MLA	R1, R11, R9, R1
    
    ADDS	R7, R7, R4, LSL#3				@ Block #3
    RSBPL	R9, R6, R7, ASR#23
    SUBMI	R9, R12, R7, LSR#23
    MLA	R10, R11, R9, R10
    
    ADDS	R7, R7, R4, LSL#3				@ Block #4
    RSBPL	R9, R6, R7, ASR#23
    SUBMI	R9, R12, R7, LSR#23
    MLA	LR, R11, R9, LR
    
    STMIA	R5!, {R0, R1, R10, LR}				@ store the samples back to the buffer
    SUBS	R8, R8, #4					@ subtract #4 from the remaining samples
    BGT	synth_type_2_loop
    
    B	mixing_end_func
    
    @****************** SPECIAL MIXING ******************@
    .if ENABLE_DECOMPRESSION==1
    special_mixing:		@ $03006BF8
    
    LDR	R6, [R4, #CHN_WAVE_OFFSET]		@ load the wave header offset to R6
    LDRB	R0, [R4]
    TST	R0, #FLAG_CHN_COMP			@ check if the channel is initialized
    BNE	setup_compressed_mixing_frequency	@ skip the setup procedure if it's running in compressed mode already
    
    ORR	R0, R0, #FLAG_CHN_COMP			@ enable the flag in the channel status
    STRB	R0, [R4]				@ store the channel status
    LDRB	R0, [R4, #CHN_MODE]			@ load the channel mode byte
    TST	R0, #MODE_REVERSE			@ check if reverse mode is not enabled
    
    BEQ	determine_compression			@ if Reverse Mode isn't enabled we can directly check if the sample has to get decoded
    
    LDR	R1, [R6, #WAVE_LENGTH]			@ load the amount of samples
    ADD	R1, R1, R6, LSL#1			@ do some start position calculation (???)
    ADD	R1, R1, #0x20
    SUB	R3, R1, R3
    STR	R3, [R4, #CHN_POSITION_ABS]		@ store the final seek position
    
    determine_compression:
    
    LDRH	R0, [R6]				@ load the compression flag from the sample header
    CMP	R0, #0					@ check if the compression is not enabled
    BEQ	setup_compressed_mixing_frequency	@ skip the compression handler
    
    SUB	R3, R3, R6				@ calc initial position
    SUB	R3, R3, #0x10
    STR	R3, [R4, #CHN_POSITION_ABS]		@ store the initial position (relative, not absolute)
    
    setup_compressed_mixing_frequency:
    
    STMFD	SP!, {R4, R9, R12}
    
    MOVS	R11, R11, LSR#1				@ divide master volume by 2
    ADC	R11, R11, #0x8000
    BIC	R11, R11, #0xFF00
    
    LDR	R7, [R4, #CHN_FINE_POSITION]		@ load the fine position
    LDR	R1, [R4, #CHN_FREQUENCY]		@ load the channel frequency
    LDRB	R0, [R4, #CHN_MODE]			@ load the channel mode again
    TST	R0, #MODE_FIXED_FREQ			@ check if fixed frequency mode is enabled
    MOVNE	R1, #0x800000				@ ### SAMPLE STEP FREQUENCY CHANGED TO R1
    MULEQ	R1, R12, R1				@ default rate factor * frequency = sample steps
    
    ADD	R5, R5, R8, LSL#2			@ set the buffer pointer to the end of the channel
    
    LDRH	R0, [R6]				@ load the codec type
    CMP	R0, #0					@ check if compression is disabled
    BEQ	uncompressed_mixing_reverse_check
    
    MOV	R0, #0xFF000000				@ set the current decoding block to "something very high" so that the first block always gets decoded
    STR	R0, [R4, #CHN_BLOCK_COUNT]		@ write the last decoded block into the channel vars
    LDRB	R0, [R4, #CHN_MODE]			@ check again if reverse mode is enabled
    TST	R0, #MODE_REVERSE			@ test if reverse mode is enabled
    BNE	compressed_mixing_reverse_init		@ if reverse mode is enabled use the reverse init code
    
    BL	bdpcm_decoder				@ load a sample from the stream to R12
    MOV	R6, R12					@ move the base sample to R6
    ADD	R3, R3, #1				@ increase stream position by #1
    BL	bdpcm_decoder				@ load the delta sample and calculate delta value
    SUB	R12, R12, R6
    
    @***** MIXING LOOP REGISTER USAGE ***********@
    @ R0:	Sample to modify from buffer
    @ R1:	sample steps		(MOVED FROM R4)
    @ R2:	remaining samples before loop/end
    @ R3:	sample position
    @ R4:	channel pointer
    @ R5:	pointer to the end of buffer
    @ R6:	Base sample
    @ R7:	fine position
    @ R8:	remaining samples for current buffer
    @ R9:	interpolated sample
    @ R10:	not used
    @ R11:	volume
    @ R12:	Delta Sample
    @ LR:	not used
    @********************************************@
    
    compressed_mixing_loop:
    
    MUL	R9, R7, R12				@ delta sample * fine position = interpolated DELTA
    MOV	R9, R9, ASR#22				@ scale down the sample
    ADDS	R9, R9, R6, LSL#1			@ double the base sample and add it to the interpolated downscaled DELTA
    LDRNE	R0, [R5, -R8, LSL#2]			@ if the sample is NOT 0 load the sample from buffer
    MLANE	R0, R11, R9, R0				@ add the sample to the buffer sample and apply volume
    STRNE	R0, [R5, -R8, LSL#2]			@ store the sample if it's not Zero
    ADD	R7, R7, R1				@ ### changed from R4 to R1
    MOVS	R9, R7, LSR#23				@ check if there is new samples to load
    
    BEQ	compressed_mixing_load_skip		@ no new samples need to be loaded
    
    SUBS	R2, R2, R7, LSR#23			@ remove the sample overflow from the remaining samples
    BLLE	loop_end_sub				@ call the loop/ending handler if the countdown reached zero or something negative
    SUBS	R9, R9, #1				@ check if only one sample has to get loaded
    ADDEQ	R6, R12, R6				@ if this is the case we can calculate the new base sample
    BEQ	compressed_mixing_base_load_skip
    
    ADD	R3, R3, R9				@ these opcodes are equivalent to LDRNESB R6, [R3, R9]!
    BL	bdpcm_decoder
    MOV	R6, R12
    
    compressed_mixing_base_load_skip:
    
    ADD	R3, R3, #1					@ equivalent to LDRSB	R12, [R3, #1]!
    BL	bdpcm_decoder
    SUB	R12, R12, R6
    BIC	R7, R7, #0x3F800000			@ clear the overflow bits by using the according bitmask
    
    compressed_mixing_load_skip:
    
    SUBS	R8, R8, #1				@ remove #1 from the remaining samples
    BGT	compressed_mixing_loop
    
    @SUB	R3, R3, #1				@ sample pointer -1 (???); ALREADY DONE BY mixing_end_func
    B	mixing_end_func
    
    
    
    
    compressed_mixing_reverse_init:
    
    SUB	R3, R3, #1				@ subtract one from the reverse playback location initially
    BL	bdpcm_decoder				@ fetch a sample from stream
    MOV	R6, R12					@ bdpcm_decoder returns base sample in R12 --> R6
    SUB	R3, R3, #1				@ seek one sample further backwards
    BL	bdpcm_decoder				@ fetch the DELTA sample
    SUB	R12, R12, R6				@ calc the Delta value
    
    compressed_mixing_reverse_loop:
    
    MUL	R9, R7, R12				@ delta sample * fine position = interpolated DELTA
    MOV	R9, R9, ASR#22				@ scale down the sample
    ADDS	R9, R9, R6, LSL#1			@ double the base sample and add it to the interpolated downscaled DELTA
    LDRNE	R0, [R5, -R8, LSL#2]			@ if the sample is NOT 0 load the sample from buffer
    MLANE	R0, R11, R9, R0				@ add the sample to the buffer sample and apply volume
    STRNE	R0, [R5, -R8, LSL#2]			@ store the sample if it's not Zero
    ADD	R7, R7, R1				@ ### changed from R4 to R1
    MOVS	R9, R7, LSR#23				@ check if there is new samples to load
    
    BEQ	compressed_mixing_reverse_load_skip	@ skip sample loading if we don't need to load new samples from ROM
    
    SUBS	R2, R2, R7, LSR#23			@ remove the overflowed samples from the remaining samples
    BLLE	loop_end_sub				@ if the sample playback finished go to end handler
    
    SUBS	R9, R9, #1				@ remove sample overflow count by #1
    ADDEQ	R6, R12, R6				@ make the previous delta sample the new base sample if only #1 sample needs to get loaded
    BEQ	compressed_mixing_reverse_base_load_skip @skip base sample loading
    
    SUB	R3, R3, R9				@
    BL	bdpcm_decoder				@
    MOV	R6, R12					@
    
    compressed_mixing_reverse_base_load_skip:
    
    SUB	R3, R3, #1
    BL	bdpcm_decoder
    SUB	R12, R12, R6				@ load next samples???
    BIC	R7, R7, #0x3F800000			@ clear overflow bits
    
    compressed_mixing_reverse_load_skip:
    
    SUBS	R8, R8, #1
    BGT	compressed_mixing_reverse_loop
    
    @ADD	R3, R3, #2				@ ???, copied from original code
    ADD	R3, R3, #3
    
    B	mixing_end_func
    
    
    uncompressed_mixing_reverse_check:
    
    LDRB	R0, [R4, #1]				@ load the channel mode		=$03006D84
    TST	R0, #MODE_REVERSE			@ check if reverse mode is even enabled
    BEQ	mixing_end_func				@ skip the channel if the mode is "awkward"
    
    LDRSB	R6, [R3, #-1]!				@ load first negative sample
    LDRSB	R12, [R3, #-1]				@ load the DELTA sample
    SUB	R12, R12, R6				@ calculate DELTA
    
    reverse_mixing_loop:
    
    MUL	R9, R7, R12				@ delta sample * fine position = interpolated DELTA
    MOV	R9, R9, ASR#22				@ scale down the sample
    ADDS	R9, R9, R6, LSL#1			@ double the base sample and add it to the interpolated downscaled DELTA
    LDRNE	R0, [R5, -R8, LSL#2]			@ if the sample is NOT 0 load the sample from buffer
    MLANE	R0, R11, R9, R0				@ add the sample to the buffer sample and apply volume
    STRNE	R0, [R5, -R8, LSL#2]			@ store the sample if it's not Zero
    ADD	R7, R7, R1				@ ### changed from R4 to R1
    MOVS	R9, R7, LSR#23				@ check if there is new samples to load
    
    BEQ	reverse_mixing_load_skip
    
    SUBS	R2, R2, R7, LSR#23			@ blablabla, all same as above
    BLLE	loop_end_sub
    
    MOVS	R9, R9					@ set the flags from the overflow count in R9
    ADDEQ	R6, R12, R6
    LDRNESB	R6, [R3, -R9]!
    LDRSB	R12, [R3, #-1]				@ load samples dependent on conditions
    SUB	R12, R12, R6
    BIC	R7, R7, #0x3F800000			@ cut off overflow count to get new fine position
    
    reverse_mixing_load_skip:
    
    SUBS	R8, R8, #1				@ remaining samples -1
    BGT	reverse_mixing_loop			@ continue loop if there is still samples to process
    
    @ADD	R3, R3, #1				@ copied from original code (???)
    ADD	R3, R3, #2				@ =$03006DE8
    
    B	mixing_end_func
    
    @**************** SPECIAL MIXING END ****************@
    
    @************** SPECIAL MIXING LOOPING **************@
    
    compressed_loop_end_sub:
    
    
    
    
    @************ SPECIAL MIXING LOOPING END ************@
    
    @****************** BDPCM DECODER *******************@
    
    bdpcm_decoder:				@ RETURNS SAMPLE FROM POSITION XXX in R12
    
    STMFD	SP!, {R0, R2, R5-R7, LR}		@ push registers to make them free to use: R0, R2, R5, R6, R7, LR
    MOV	R0, R3, LSR#6				@ shift the relative position right to drop everything but the block index
    LDR	R12, [R4, #CHN_BLOCK_COUNT]		@ load the index of the block that was decoded last and compare it with the requested block
    CMP	R0, R12
    BEQ	bdpcm_decoder_return
    
    STR	R0, [R4, #CHN_BLOCK_COUNT]		@ store the block position to Channel Vars
    MOV	R12, #0x21				@ load the block byte count to R12 (1 Block = 0x21 Bytes)
    MUL	R2, R12, R0				@ multiply the block count with the block length to calc actual byte position of current block
    LDR	R12, [R4, #CHN_WAVE_OFFSET]		@ load the wave data offset to R12
    ADD	R2, R2, R12				@ add the wave data offset and 0x10 to get the actual position in ROM
    ADD	R2, R2, #0x10				@ 
    LDR	R5, decoder_buffer			@ load the decoder buffer pointer to R5
    ADR	R6, delta_lookup_table			@ load the lookup table pointer to R6
    MOV	R7, #0x40				@ load the block sample count (0x40) to R7
    LDRB	LR, [R2], #1				@ load the first byte & sample from the wave data to LR (each block starts with a signed 8 bit pcm sample) LDRSB not necessary due to the 24 high bits being cut off anyway
    STRB	LR, [R5], #1				@ write the sample to the decoder buffer
    LDRB	R12, [R2], #1				@ load the next byte to R12; only its 4 LSBits get decoded before the loop loads the following byte
    B	bdpcm_decoder_lsb
    
    bdpcm_decoder_msb:
    
    LDRB	R12, [R2], #1				@ load the next 2 samples to get decoded
    MOV	R0, R12, LSR#4				@ separate the 4 MSBits
    LDRSB	R0, [R6, R0]				@ load the differential value from the lookup table
    ADD	LR, LR, R0				@ add the decoded value to the previous sample value to calc the current samples' level
    STRB	LR, [R5], #1				@ write the output sample to the decoder buffer and increment buffer pointer
    
    bdpcm_decoder_lsb:
    
    AND	R0, R12, #0xF				@ separate the 4 LSBits
    LDRSB	R0, [R6, R0]				@ put the 4 bit value into the lookup table and save the result to R0
    ADD	LR, LR, R0				@ add the value from the lookup table to the previous value to calc the new one
    STRB	LR, [R5], #1				@ store the decoded sample to the decoding buffer
    SUBS	R7, R7, #2				@ decrease the block sample counter by 2 (2 samples each byte) and check if it is still above 0
    BGT	bdpcm_decoder_msb			@ if there is still samples to decode jump to the MSBits
    
    bdpcm_decoder_return:
    
    LDR	R5, decoder_buffer			@ reload the decompressor buffer offset to R5
    AND	R0, R3, #0x3F				@ cut off the main position bits to read data from short buffer
    LDRSB	R12, [R5, R0]				@ read the decoded sample from buffer
    LDMFD	SP!, {R0, R2, R5-R7, PC}		@ pop registers and return to the compressed sample mixer
    
    @**************** END BDPCM DECODER *****************@
    
    decoder_buffer:
    	.word	decoder_buffer_target
    delta_lookup_table:
    	.byte	0x0, 0x1, 0x4, 0x9, 0x10, 0x19, 0x24, 0x31, 0xC0, 0xCF, 0xDC, 0xE7, 0xF0, 0xF7, 0xFC, 0xFF
    .endif
    
    main_mixer_end:
    
    	.end
    WARNING: DO NOT ATTEMPT TO REMOVE THE 'NOPs' FROM THE ASSEMBLY! THEY ARE PLACEHOLDERS FOR SELF-MODIFYING CODE AND REMOVING THEM BREAKS ABSOLUTELY EVERYTHING!
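
    Side note for anyone who wants to check compressed samples against the bdpcm_decoder routine in the listing above: here is a rough Python model of how one block gets unpacked. It's only a sketch derived from the routine as listed, not a reference implementation. decode_block, to_signed8 and DELTA_TABLE are names made up for this example, and DELTA_TABLE is simply delta_lookup_table read as signed bytes.

```python
# Signed view of delta_lookup_table from the listing (0xC0 = -64, ..., 0xFF = -1).
DELTA_TABLE = [0, 1, 4, 9, 16, 25, 36, 49, -64, -49, -36, -25, -16, -9, -4, -1]

BLOCK_BYTES = 0x21    # one compressed block in ROM
BLOCK_SAMPLES = 0x40  # samples produced per block


def to_signed8(value):
    """Wrap an integer to a signed 8-bit sample (STRB followed by LDRSB)."""
    value &= 0xFF
    return value - 0x100 if value >= 0x80 else value


def decode_block(block):
    """Decode one 0x21-byte block into 0x40 signed 8-bit samples."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block must be exactly 0x21 bytes")
    sample = to_signed8(block[0])  # byte 0: plain signed PCM sample
    out = [sample]
    # byte 1: the routine only decodes the low nibble of this byte
    sample = to_signed8(sample + DELTA_TABLE[block[1] & 0xF])
    out.append(sample)
    # bytes 2..0x20: high nibble first, then low nibble
    for byte in block[2:]:
        for nibble in (byte >> 4, byte & 0xF):
            sample = to_signed8(sample + DELTA_TABLE[nibble])
            out.append(sample)
    return out


def locate(position):
    """Split a sample position into (block index, index inside the block)."""
    return position >> 6, position & 0x3F


demo = bytes([0x10, 0x01] + [0x00] * 0x1F)
print(decode_block(demo)[:3])  # prints [16, 17, 17]
```

    locate() mirrors the LSR#6 / AND #0x3F split the routine does on the sample position to pick the block and the sample inside the decoder buffer.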

    Assembly and insertion:

    Well, since Version 1.0 a few things have changed in the insertion process and things got a little more complicated. However, I will do my best to explain things as well as I can.
    If you've read the introduction you will most likely already know that my mixer requires an additional mixing buffer for high quality processing. This buffer needs RAM. The amount of RAM in bytes can be calculated as follows:
    Code:
    FRAME_LENGTH_XXXXX * 4
    XXXXX is the maximum samplerate supported. To get the values for FRAME_LENGTH_XXXXX check the definitions in the code.
    Let's say we want to support at least 13379 Hz. Then do the following:
    Code:
    FRAME_LENGTH_13379 * 4 =
    0xE0 * 4 = 0x380
    So you'll need 0x380 free bytes in IWRAM for that. The pointer to this area needs to be put into the assembly code. Use a configuration preset for that (".equ hq_buffer", see the code; it should be self-explanatory). More on that later.
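
    If you'd rather let a script do the arithmetic, the calculation above can be sketched in a few lines of Python. This is only an illustration: hq_buffer_size and BYTES_PER_SAMPLE are names made up for this example, and 0xE0 is the FRAME_LENGTH_13379 value quoted above.

```python
BYTES_PER_SAMPLE = 4  # the "* 4" from the formula above


def hq_buffer_size(frame_length):
    """Return the number of IWRAM bytes needed for the HQ mixing buffer."""
    return frame_length * BYTES_PER_SAMPLE


FRAME_LENGTH_13379 = 0xE0  # value quoted in the example above
print(hex(hq_buffer_size(FRAME_LENGTH_13379)))  # prints 0x380
```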
    The assembly itself can be put anywhere into ROM. However, for speed reasons the code must be copied to RAM and executed from there. Reserve N bytes in IWRAM for that, where N is the length of the assembled code. Just assemble the code and see how long it is (with all features it should be ~0xB00 bytes).
    In comparison to V1.0 of the mixer this new version's code is bigger than Nintendo's code and obviously can't be put into the same IWRAM area. This is where things really start to get tricky. IWRAM space in Pokemon games is quite limited and we need a lot of it.
    Out of laziness I'll stick to Pokemon games here. If you work on non-Pokemon games you'll need to manage the RAM repointing with your own technique.
    Long story short: What I do for Pokemon games is move a big structure (0xFB0 bytes, let's call it "Main Sound Area") of the Sound Engine to EWRAM to free things up. This structure contains the output buffers that are used by the Sound DMA. Moving this structure to EWRAM wouldn't make sense for Nintendo's mixer because it uses these output buffers as work buffers (lots of reads and writes), which would slow things down. However, with my code that is not such a big problem because the output buffers are only accessed once.
    By freeing up that big chunk of memory there is enough space to put the new mixing code into. Also, by moving the new mixing code to that new location the space of the old mixing code obviously is no longer used (0x800 bytes!) and can be used for the work buffers. This is the common technique I use for Pokemon games.
    For all GBA Pokemon games the Main Sound Area can safely be moved to 0x0203E000 as long as it doesn't conflict with one of your personal hacks. For Fire Red you need to disable the Help menu. It will break planets otherwise!
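
    To double check the memory plan described above before touching the ROM, the sizes can be compared in a small script. This is a hedged sketch: check_layout and the constant names are invented for this example, the sizes are the ones quoted in the text, and the EWRAM range is the standard 256 KiB GBA region. It can't tell you whether the target area collides with one of your own hacks.

```python
EWRAM_START, EWRAM_END = 0x02000000, 0x02040000  # GBA EWRAM, end exclusive
MAIN_SOUND_AREA_SIZE = 0xFB0  # size of the structure that gets relocated
OLD_MIXER_CODE_SIZE = 0x800   # IWRAM freed by retiring the original mixer


def check_layout(new_area_addr, code_len, hq_buffer_len):
    """Return a list of problems; an empty list means the plan fits."""
    problems = []
    area_end = new_area_addr + MAIN_SOUND_AREA_SIZE
    if new_area_addr < EWRAM_START or area_end > EWRAM_END:
        problems.append("Main Sound Area does not fit into EWRAM")
    if code_len > MAIN_SOUND_AREA_SIZE:
        problems.append("mixer code is bigger than the freed Main Sound Area")
    if hq_buffer_len > OLD_MIXER_CODE_SIZE:
        problems.append("HQ buffer is bigger than the old mixer code area")
    return problems


print(check_layout(0x0203E000, code_len=0xB00, hq_buffer_len=0x380))  # prints []
```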

    Main Sound Area locations:
    • Fire Red (US): 0x03005F50
    • Fire Red (GER): 0x03005E40
    • Emerald (US, GER): 0x03006380
    Contact me if you need offsets for other languages.

    To repoint the Main Sound Area simply search for the matching pointer above and replace it with 0x0203E000. The pointer should occur exactly 3 times in Emerald and exactly 2 times in Fire Red. All occurrences need to be replaced.
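
    If you prefer scripting over a hex editor, the search and replace could look roughly like this. Treat it as a sketch: repoint is a made-up helper, it does a plain byte search just like a hex editor would, and it refuses to patch if the number of hits doesn't match what you expect (3 for Emerald, 2 for Fire Red).

```python
import struct


def repoint(rom, old_addr, new_addr, expected_count):
    """Replace every little-endian 32-bit copy of old_addr with new_addr."""
    old = struct.pack("<I", old_addr)
    new = struct.pack("<I", new_addr)
    count = rom.count(old)
    if count != expected_count:
        raise ValueError(f"expected {expected_count} pointers, found {count}")
    return rom.replace(old, new), count


# Tiny fake "ROM" holding two copies of the Fire Red (US) pointer.
fake_rom = b"\x00" * 8 + struct.pack("<I", 0x03005F50) * 2
patched, hits = repoint(fake_rom, 0x03005F50, 0x0203E000, expected_count=2)
print(hits)  # prints 2
```

    For a real ROM you would read the file into a bytes object first and write the returned bytes to a new file, so the original stays untouched.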

    Now that the Main Sound Area has been moved out of its original location it's time for the new mixing code to move in. You first need to assemble the code above. You will need to set the configuration preset beforehand. DO NOT SKIP THIS! See the chapter below for the details.
    After the assembly has finished put the binary somewhere into your ROM. Write the pointer to that binary here:
    • Fire Red (US): 0x1DD0B4 (ROM)
    • Fire Red (GER): 0x1E134C (ROM)
    • Emerald (US): 0x2E00F0 (ROM)
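
    A small sketch of how a ROM file offset turns into the 4 bytes you write at the address above. The helper names are made up. It assumes this is a plain data pointer (no Thumb bit), since it is the source the code gets copied from, and it uses the fact that the GBA maps cartridge ROM at 0x08000000.

```python
import struct

ROM_BASE = 0x08000000  # where the GBA maps cartridge ROM


def rom_pointer(file_offset: int) -> bytes:
    """Return the little-endian GBA pointer for an offset in the ROM file."""
    return struct.pack("<I", ROM_BASE + file_offset)


def write_pointer(rom: bytearray, pointer_offset: int, code_offset: int) -> None:
    """Overwrite the 4 bytes at pointer_offset with a pointer to code_offset."""
    rom[pointer_offset:pointer_offset + 4] = rom_pointer(code_offset)


# Code inserted at file offset 0x800660 (the offset used in a later post).
print(rom_pointer(0x800660).hex(" "))  # 60 06 80 08
```

    For Fire Red (US) the call would be `write_pointer(rom, 0x1DD0B4, your_offset)` on a `bytearray` holding the ROM.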
    After doing that you need to set the new RAM pointer for the newly inserted code. There are two pointers for that: one for the CpuSet that copies the code to the right RAM location and one for the actual program call. The one for the program call has the Thumb bit set, the other one doesn't.
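
    To make the two pointer forms concrete, here is a rough sketch that builds both byte patterns for an address and swaps them for the new location. The function names are made up, and the fake ROM only contains the two pointers. The example addresses are the Fire Red (US) ones from this post: the old mixer code at 0x030028E0 and the old Main Sound Area at 0x03005F50.

```python
import struct
from typing import Tuple


def pointer_variants(addr: int) -> Tuple[bytes, bytes]:
    """Return (plain, thumb) little-endian encodings of a code address."""
    plain = struct.pack("<I", addr & ~1)  # CpuSet destination, bit 0 clear
    thumb = struct.pack("<I", addr | 1)   # program call, bit 0 set
    return plain, thumb


def repoint_code(rom: bytes, old_addr: int, new_addr: int) -> bytes:
    """Replace both the plain and the Thumb form of old_addr."""
    old_plain, old_thumb = pointer_variants(old_addr)
    new_plain, new_thumb = pointer_variants(new_addr)
    return rom.replace(old_thumb, new_thumb).replace(old_plain, new_plain)


OLD_MIXER = 0x030028E0            # Fire Red (US) original mixing code
OLD_MAIN_SOUND_AREA = 0x03005F50  # Fire Red (US) freed-up area

fake_rom = b"".join(pointer_variants(OLD_MIXER))
patched = repoint_code(fake_rom, OLD_MIXER, OLD_MAIN_SOUND_AREA)
print(patched.hex(" "))
```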
    The original mixing code is located at the following addresses:
    • Fire Red (US): 0x030028E0
    • Fire Red (GER): 0x03002830
    • Emerald (US, GER): 0x03001AA8
    For repointing, search for this pointer and replace it with the address where the Main Sound Area used to be. Do the search and replace twice: once with the Thumb bit set and once without. Also, because the new code is longer than the old one, you'll need to specify the length of the data to be transferred by CpuSet. Just open the assembled code in a hex editor, check the length of the data and write it down. Because CpuSet (at least in this case) transfers data in units of 4 bytes, you'll need to divide the code length by 4 to get the number of units to transfer. Put this value at the following address (2 bytes only! Keep the little endian byte order in mind):
    • Fire Red (US): 0x1DD0BC (ROM)
    • Fire Red (GER): 0x1E1354 (ROM)
    • Emerald (US): 0x2E00F8 (ROM)
    Let's do an example: The assembled code has a length of 0xB88 bytes. That results in 0x2E2 units, so you'd need to write the bytes "E2 02" at the specified address.
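
    The same arithmetic as a short sketch, in case you want to compute the two bytes from the size of your assembled binary. The function names are made up. Rounding up for a length that isn't a multiple of 4 is my assumption here; the text above only covers lengths that divide evenly.

```python
import struct


def cpuset_units(code_len: int) -> int:
    """Number of 4-byte units needed to cover code_len bytes (rounded up)."""
    return (code_len + 3) // 4


def cpuset_length_bytes(code_len: int) -> bytes:
    """The 2 bytes to write into the ROM, in little endian order."""
    units = cpuset_units(code_len)
    if units > 0xFFFF:
        raise ValueError("unit count does not fit into 2 bytes")
    return struct.pack("<H", units)


print(hex(cpuset_units(0xB88)))             # 0x2e2
print(cpuset_length_bytes(0xB88).hex(" "))  # e2 02
```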
    That's it for the code.

    Remember that the new work buffer for the mixer will go where the old mixing code in RAM was? Good. The locations of the original mixing code are ^above^. Use these addresses for the "hq_buffer" config setting and you're done ;)

    Next step: Play the game and have fun with low noise audio ^.^
    REMEMBER: If you are using emulator quick saves, you have to save in the game itself and reload the ingame save after restarting the ROM, because the new code is only loaded once during ROM startup (quick saves will still contain the old mixer in IWRAM).


    Configuration:
    So before you assemble your code you'll need to configure it. The code is designed to have multiple switchable configuration presets. The preset to be used is selected in the line that says ".equ USED_GAME, GAME_XXXX". Set XXXX to your game code and make sure you create a configuration preset if none exists yet. A preset is a code pattern that looks like the following:
    Code:
    .if USED_GAME==GAME_BPEE
    
    	.equ	hq_buffer, BUFFER_IRAM_BPE
    	.equ	decoder_buffer_target, DECODER_BUFFER_BPE
    	.equ	ALLOW_PAUSE, 1
    	.equ	DMA_FIX, 1
    	.equ	ENABLE_DECOMPRESSION, 1
    	.equ	PREVENT_CLIP, 1
    
    .endif
    Let me explain what all of them do (1 = on, 0 = off):
    • hq_buffer: Set this to the value where you want your new work buffer to be.
    • decoder_buffer_target: This is only used if you enable compressed wave support. This points to a buffer 0x40 bytes long.
    • ALLOW_PAUSE: To be honest, I'm not sure myself what it does but it is required by Pokemon games for the sound channels to init correctly. Set it to 0 for non Pokemon games.
    • DMA_FIX: Writes zeroes into all DMA3 registers after using it. This magically fixes a rare crash issue I had in Pokemon games. When working with non Pokemon games try to turn it off first. If the game should occasionally crash or other glitches occur try to turn it on.
    • ENABLE_DECOMPRESSION: Enables compressed sample and reverse playback support. Required for cries and SFX to work properly in Pokemon games. Afaik no games other than Pokemon need this by default.
    • PREVENT_CLIP: If the volume of a song or SFX is too loud, it might cause the sound buffer to overflow. Enabling this caps the amplitude at the maximum level and prevents the "wrap around" that can cause VERY LOUD crackling noise. This function comes with a general, usually negligible performance impact. Enable it if you work with very loud songs and sound effects and have issues with crackling noise (you definitely won't miss it if you have it). Otherwise turn it off.
    To save you some work I've already made presets for BPRE, BPEE and some other games. The only thing you'll need to do in this case is to select the right one in the line ".equ USED_GAME, GAME_XXXX".
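
    To illustrate what PREVENT_CLIP is about, here is a tiny sketch of the difference between "wrap around" and capping when a mixed value is squeezed into a signed 8 bit sample. This is not the mixer's actual code (that is ARM assembly), only the idea behind it, and both function names are made up.

```python
def wrap_s8(value: int) -> int:
    """Keep only the low 8 bits, as happens when an overflow is not caught."""
    return ((value + 128) & 0xFF) - 128


def clamp_s8(value: int) -> int:
    """Cap the value at the signed 8 bit limits instead of wrapping."""
    return max(-128, min(127, value))


too_loud = 130  # slightly above the maximum of 127
print(wrap_s8(too_loud))   # -126
print(clamp_s8(too_loud))  # 127
```

    Without the cap, a sample that is just a bit too loud jumps from the top of the range to nearly the bottom, which is the crackling described above.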


    Comparison (Nintendo's vs. my V1.0):
    Here is a video comparing the default mixer (1st) and my mixer (2nd). Keep in mind that this is still the first and not the latest release of my mixer, but the results are pretty similar:


    Conclusion:
    Yeah, it has been quite some time between getting V2.0 working and releasing V2.1, even though it was bug free. I hope you enjoy the work.
    As always, feedback appreciated!
     
    173
    Posts
    12
    Years
    • Seen Jan 2, 2015
    Do I take the whole routine, straight from the first line, then change the values? Or do I start at a specific line?
     

    ipatix

    Sound Expert
    145
    Posts
    15
    Years
  • No, you have to start from the top line (the complete routine). You just have to modify the lines I mentioned above.
     
    173
    Posts
    12
    Years
    • Seen Jan 2, 2015
    So if I am using BPRE, I wouldn't change the routine at all? Or would I put the offset where the words are?
     

    ipatix

    Sound Expert
    145
    Posts
    15
    Years
  • If you use BPRE you wouldn't need to change anything, right. And no, you wouldn't need to change the words, because the words are defined by the .equ-s and should all change according to the adjustments you make in the first lines.
     
    173
    Posts
    12
    Years
    • Seen Jan 2, 2015
    OK. Thanks a bunch ipatix! Wonderful job on this!
     

    Wobbu

    bunger bunger bunger bunger
    2,794
    Posts
    12
    Years
  • I just tested this on BPEE and all my custom music sounds so much better now! Newer generation music that I ported to my hack doesn't produce nearly as much unnecessary noises as they used to, especially ones that have a heavy use of high-pitched instruments. Thank you a lot for researching this! The difference is very noticeable.
     
    13
    Posts
    11
    Years
  • Mmmph, so I changed the start of your routine so I could use it for Fire Red and inserted it at 0x800660. Then I entered this pointer (60 06 80 08) at 0x1DD0B4, and the game freezes every time it tries to play sound. I also tried (61 06 80 08), the +1 Thumb pointer, and still no good, so I'm sure I screwed up somewhere...
     

    PokeBunny

    Pokemon Game Maker
    34
    Posts
    11
    Years
  • Not bad. Turns out you are smart. But the AGB music player sounds bad. But they use it because you only have to use swi functions. Gamefreak is lazy!
     

    ipatix

    Sound Expert
    145
    Posts
    15
    Years
  • @PokeBunny:
    Actually Gamefreak is not "that" lazy. To clarify things:
    The AGB music player is part of the Nintendo SDK, and they did a pretty decent job of making a very efficient music player, although there are some flaws here and there (like the noisy sound). This music player was built into the BIOS when the AGB came out, to provide a fast music player that doesn't need IWRAM for its code (remember, BIOS memory is as fast as IWRAM). This however turned out to be a problem: there were some bugs in the old versions of the music player, and because Nintendo let developers choose between the BIOS code and (updateable) IWRAM code, almost all developers used the IWRAM solution.
    So "you only have to use swi functions" is not 100% correct. Anyway, there is not much that Nintendo could have done a lot better. With my code (and enough free CPU time, which Pokemon games don't have) you could reach very good quality. So it's not "all bad".

    Some other developers like Camelot (--> developed Golden Sun), however, completely rewrote parts of the music code, which gives much higher quality and lower CPU load. This is how they could give the game one of the best GBA soundtracks, in my opinion:

    https://www.youtube.com/watch?v=NrcG9lgGGNg

    I might actually try to port their engine to Pokemon in the future, but I don't promise that. The code they use is even 10 times as complicated as Nintendo's already was. The other thing that'll be tough is adding support for compressed samples and reverse playback to this engine (which no game other than Pokemon can do; its default sound driver was modded by Gamefreak).

    @designmadman:
    I don't know why it won't work. I tried it once on my own on Firered US and it worked.
    Usually "bad settings" that are language dependent shouldn't crash the game (although the sound might turn really buggy).

    Check again if you did the assembly process correctly (correct settings) and if you did all the pointers correctly.
    Other than that I don't know what could have gone wrong...
     
    13
    Posts
    11
    Years
  • Yeah, I tried it on a clean new Fire Red ROM and it works fine. I guess something in my current modified ROM is preventing the routine from working properly. Nice job on this, it sounds really good when I import songs into the clean ROM.
     

    ipatix

    Sound Expert
    145
    Posts
    15
    Years
  • It should work on any ROM that doesn't use the free IWRAM areas I use in the code.
    For Emerald 0x03005200 (4*0xE0 Bytes)
    For Fire Red 0x03004200 (4*0xE0 Bytes)
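
    If you want to check whether another hack collides with these areas, a minimal sketch of the overlap test looks like this. The function name and the second range in the example are hypothetical; 4*0xE0 is 0x380 bytes.

```python
def ranges_overlap(start_a: int, len_a: int, start_b: int, len_b: int) -> bool:
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) share a byte."""
    return start_a < start_b + len_b and start_b < start_a + len_a


MIXER_AREA_EMERALD = 0x03005200  # from the post above
MIXER_AREA_SIZE = 4 * 0xE0       # 0x380 bytes

# Hypothetical other hack that uses 0x10 bytes at 0x03005500.
print(ranges_overlap(MIXER_AREA_EMERALD, MIXER_AREA_SIZE, 0x03005500, 0x10))
```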
     

    Kawaii Shoujo Duskull

    The Cutest Duskull
    276
    Posts
    10
    Years
    • Seen Sep 10, 2023
    Huh. I've got a pretty well-trained ear. I didn't really notice much of a difference in your example, just that the second set of audio played was a bit clearer I guess but not much.
    lol Is the change easier to hear in the game itself, or am I the only one who doesn't notice a big difference? Just wondering.


    But yeah anyway even if I can't tell the difference or if it isn't that big or whatever, still great job on this ASM! Keep up the good work. ^^
     

    ipatix

    Sound Expert
    145
    Posts
    15
    Years
  • Listen carefully to the quieter parts. It should be especially noticeable there.

    EDIT:
    @ anyone who is having issues:
    This routine is not compatible with ASM hacks that access IWRAM at the areas specified in the code. I recently found out that I had to move the area for Emerald to 0x03005100 because prime's DNS uses some areas around there as well, and my routine would cause glitchy palette changes all the time.

    EDIT 2:
    I now changed the assembly code to a preset system. The only thing you'll do before assembly is set the right "USED_GAME" and run the assembly.
     

    angelXwind

    SHSL Programmer Pineapple Girl
    2
    Posts
    13
    Years
    • Seen Jun 14, 2014
    There's a typo on line 48 of your code. See here: https://github.com/angelXwind/pokem...mmit/be377b7540ed89a7588941348ed12fd129440669

    Also, I wrote a Makefile around your code that executes the following after assembly (BPEE target):

    Code:
    dd if=main.bin of="out.gba" conv=notrunc seek=3014896 bs=1

    (3014896 is 0x2E00F0 in dec, here's how that Makefile works: https://github.com/angelXwind/pokemon-gen3-hq-sound-mixer/blob/master/Makefile)

    However, the resulting ROM only causes the emulator to loop at the BIOS forever.

    The assembled binary is 0x7A0 bytes long as it should be, so I'm (most likely) assembling it correctly.

    Am I injecting the binary into the wrong area? Or...
     

    PokeBunny

    Pokemon Game Maker
    34
    Posts
    11
    Years
  • SO there is a music player in the BIOS. I didn't know that. You know the reason: in GBATEK, some of the swi functions are undocumented as SoundWhatever #.
     

    ipatix

    Sound Expert
    145
    Posts
    15
    Years
  • You can't just overwrite the old code. My new one is slightly bigger so you'll overwrite other stuff. Perhaps this could be the problem.

    @all: I just want to announce that version 2.0 of the mixer is already far along in development. The biggest changes are that the code is completely rewritten, executes almost twice as fast and supports a basic synth engine without the use of samples in ROM. It'll probably still be incompatible with interdepth's RTC (and/or DNS) for now due to overlapping RAM areas, but I'll tell you more about that later.
     

    angelXwind

    SHSL Programmer Pineapple Girl
    2
    Posts
    13
    Years
    • Seen Jun 14, 2014
    Ah, thanks for the information. Injecting the binary into some free space in the ROM then modifying the pointer seems to work.

    Also, I created a GitHub repository with a Makefile that completely automates the process of assembling and injecting the binary into a ROM. https://github.com/angelXwind/pokemon-gen3-hq-sound-mixer
     