The Concept of ASM: A Brief Introduction to the Land of Assembly
To start off with, this is not an ASM tutorial. I am not going to teach you what ldr r0, [r1] means, and I'm not going to explain how to push and pop registers, mostly because we already have a lot of tutorials like that (at the very end of this post, I will link you to them). My goal here is to explain to the assembly neophyte how ASM works, how registers work, and why we use ASM, so that it all makes sense when you start a more official tutorial like HackMew's or JPAN's.
I also want to say, before I begin, that on a scale of 1-10, my skill in this area is hovering around a 3. This is another reason why I don't feel qualified enough to teach you true assembly, but I can help you understand how it works, which is half the battle.
~What is ASM?~
ASM stands for assembly, which is the human-readable form of the instructions that run directly on the CPU. By now, most people understand that computers operate on binary numbers (numbers consisting of only 0s and 1s, e.g. 01001010), which are almost impossible for the human mind to read, even when converted to decimal and/or hexadecimal. So, early programmers came up with a way to read and write machine code. This is what assembly is: a language understandable by humans that has a one-to-one relationship to the binary code that a computer uses.
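To make the one-to-one idea concrete, here is a minimal sketch in Python (not something you would put in a ROM) that turns a single Thumb instruction, mov rd, #imm8, into its 16-bit binary form. The function name is made up for this example, and it only covers this one instruction format.

```python
def encode_thumb_mov_imm(rd: int, imm8: int) -> int:
    """Encode Thumb `mov rd, #imm8` as a 16-bit number.

    Layout of this one format: 00100 | rd (3 bits) | imm8 (8 bits).
    """
    if not 0 <= rd <= 7:
        raise ValueError("this format only reaches the low registers r0-r7")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("the immediate must fit in 8 bits")
    return (0b00100 << 11) | (rd << 8) | imm8


word = encode_thumb_mov_imm(0, 1)
# Show the same instruction as text, as binary, and as hexadecimal.
print(f"mov r0, #1 -> {word:016b} (0x{word:04X})")
```

The assembler does this kind of translation for every line you write, which is why the same text always produces the same bytes.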
Each and every CPU has an assembly set associated with it. Many CPUs also have unique assembly sets, meaning that assembly written for one processor will not run on another. (This led to the development of higher-level languages like C so that more computers could run the same code, just FYI.) The GBA processor is the ARM7TDMI, which uses the ARMv4T instruction set.
The GBA processor itself can run in 2 modes: ARM and Thumb. Thumb is used the vast majority of the time because each of its instructions is 16 bits long instead of 32, so it takes half the space per instruction, and because the cartridge ROM is read 16 bits at a time, Thumb code running from ROM is generally faster too. Also, for almost every ARM instruction, there is a corresponding Thumb instruction that does the same thing, so why waste space and time using ARM? Because of this, when I refer to ASM in this guide, I am specifically referring to Thumb mode. Some of what I say will not apply in ARM mode, so you will need to find a different guide if you wish to use ARM.
Everything in GBA games runs off of pre-determined ASM routines. All of the scripting commands, how images are written to the screen, even how your game is saved: it is all ASM. In spite of all of this, there are many things we can't do with these pre-made Game Freak routines, which is why we learn ASM and try to write new ones. Making Pokémon shiny, run-time trainer customization, and creating new items are just a few of the things that people have written new ASM routines for.
Now that you understand what exactly ASM is, let's take a look into how it works on the ARM7TDMI. In this processor, there are 16 registers. Every single piece of data used by the game has to pass through one of these registers.
Registers are a tough concept for people, and understanding them is a huge part of learning ASM. The best way to describe them is as cups or plates, basically something that can hold something. These "cups" hold data that we perform calculations on, compare, and change, and then write back to the RAM. The registers are broken up as follows:
Registers 0-7 are general purpose registers and can be used for anything. These are our "low registers".
Registers 8-12 are also general purpose, but in Thumb mode only a few instructions can use them, so they mostly serve as temporary storage, a bit like the scripting variables 0x8000+. These are our "high registers".
The remaining registers each have their own functions, which I will expand upon later:
Register 13- Stack Pointer often abbreviated to SP.
Register 14- Link Register (LR).
Register 15- Program Counter (PC).
-Registers are referred to as rX, where X is the number of the register. So, register 13 would be r13.
For right now, I will explain the general purpose registers. The processor itself cannot act directly on the RAM; it has to load things through the registers. Think about the processor like a hand and the instructions like a brain. As the brain reads the instructions, it uses its hands to pick up and copy data to a cup (register). It can then manipulate this data, compare it to other data, and then take and put the data back.
These registers can also serve as pointers to locations. Whenever a register is written inside of brackets, [rX], its value is treated as an address, and any data loaded or stored through it is actually read from or written to that location in the RAM. So, if r0 = 02000020, then storing r1 to [r0] would end up writing the data in r1 to 02000020.
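If it helps to see the pointer idea in action, here is a rough Python model of it. This is only a sketch: the list of 16 numbers stands in for the registers, the dictionary stands in for the RAM, and the helper names are invented for this example. The Thumb lines in the comments are what each step roughly corresponds to.

```python
regs = [0] * 16   # r0..r15, modelled as plain numbers
ram = {}          # address -> 32-bit word (a stand-in for real RAM)


def store_word(src: int, base: int) -> None:
    """Write the value in register `src` to the address held in register `base`."""
    ram[regs[base]] = regs[src] & 0xFFFFFFFF


def load_word(dst: int, base: int) -> None:
    """Read the word at the address held in register `base` into register `dst`."""
    regs[dst] = ram.get(regs[base], 0)


regs[0] = 0x02000020   # r0 holds an address, so it acts as a pointer
regs[1] = 0x12345678   # r1 holds ordinary data

store_word(src=1, base=0)   # roughly: str r1, [r0]
load_word(dst=2, base=0)    # roughly: ldr r2, [r0]

print(hex(ram[0x02000020]), hex(regs[2]))
```

Notice that r0 itself never changes; only the memory it points at does.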
So, let's give an example. How could you change a byte at address X? Knowing that everything has to pass through the registers, think of a general idea on how we could accomplish this. The answer is in the spoiler.
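One possible way to think about the answer, sketched as a Python model rather than real ASM: put the address in one register, put the new value in another, then store the value through the first register. The function name, the 0x2A value, and the way the RAM is modelled are all assumptions made for this example; the Thumb lines in the comments are approximate equivalents.

```python
EWRAM_BASE = 0x02000000          # where the modelled block of RAM starts
ram = bytearray(0x100)           # a small pretend slice of RAM, all zeroes


def change_byte(address: int, new_value: int) -> None:
    """Model of: address into r0, value into r1, then store a byte to [r0]."""
    regs = [0] * 16
    regs[0] = address              # roughly: ldr r0, =X
    regs[1] = new_value & 0xFF     # roughly: mov r1, #value
    ram[regs[0] - EWRAM_BASE] = regs[1]   # roughly: strb r1, [r0]


change_byte(0x02000020, 0x2A)
print(hex(ram[0x20]))
```

The data never jumps straight into memory; it always sits in a register first.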
~The Special Registers~
r14 is used as the link register. Here is a bit of information from Wikipedia:
To clarify, r14 is used to hold the address which the game can return to after it finishes the routine. So, if we write a script and use the callasm command, when the script branches to the ASM routine, it will store the location it left off at in r14 so it can resume the script.
r15 is the program counter. What the program counter does is point to the next instruction to be executed. After that instruction is run, it is incremented to point to the next instruction.
"Registers are small memory slots, each one being 32 bits (or in other words, 4 bytes) long."
That's all. Some "programming genius" waaaaay in the past decided to call these memory slots the processor uses "registers". But that's about it. And nowadays, each processor uses these memory slots, registers, for working with data.
Since this tutorial is all about understanding, and everybody has their own viewpoint, maybe the viewpoints of others will help even more!
~Pushing and Popping~
I said earlier that I wasn't going to explain how to push and pop registers, and I won't. My goal here is to explain what pushing and popping does so that you can understand it better.
Basically, there is this thing called the "stack". Think of it like a bunch of stored cups that all hold something that you want to save. Whenever you write an ASM routine, it is going to use registers that already have something in them. This could cause some problems, right? You bet. So, we choose to save this data. This is called pushing. Think of it like this: when we push rX, we put a "cap" on the cup (saving its data) and place it at the top of the stack. Now, we can play with rX and use it all we want, knowing that its data is safe. At the end of the routine, we would want to restore the old data, right? This is called popping. A simple explanation is you take the cup off of the top of that stack, remove the cover, and replace the contents in rX with those in the cup. Pretty simple, huh? Just remember, always pop what you push!
One more thing on popping... Think about a stack for a second. Would it be safe to pull something out of the middle of the stack? No, it would end up like a bad game of Jenga and everything comes crashing down. This is why you always pop in the opposite order you pushed. So, if you pushed r0, then r1, and then r2, you would have to pop in the reverse order: r2, r1, and then r0. Just keep the Jenga image in mind: never play Jenga with ASM :p.
I'm sure the word "stack" triggered a little bit of déjà vu. Let's see... Where have we seen that before... Oh! That's right, r13: the stack pointer. I neglected to explain earlier what this is. The stack pointer is a register that stores the address of the top of the stack, meaning the most recently pushed item. This way, we know where the heck the stuff we pushed went, so the game can later recover it.
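Here is the Jenga rule as a tiny Python model, assuming each register is pushed and popped one at a time. The register values and the function name are made up for the example.

```python
def run(pop_order):
    """Push r0, r1, r2 one at a time, trash them, then pop in `pop_order`."""
    regs = {"r0": 10, "r1": 20, "r2": 30}
    stack = []

    for name in ("r0", "r1", "r2"):
        stack.append(regs[name])        # push: newest value ends up on top

    regs.update(r0=0, r1=0, r2=0)       # the routine freely overwrites them

    for name in pop_order:
        regs[name] = stack.pop()        # pop: always takes from the top
    return regs


print(run(("r2", "r1", "r0")))   # reverse order: everything is restored
print(run(("r0", "r1", "r2")))   # same order: r0 and r2 come back swapped
```

When several registers are listed in a single push or pop instruction, the processor works out the ordering on its own; the rule matters when you push them separately.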
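To picture what r13 is doing, here is a small Python sketch of a stack with its own stack pointer. It assumes the usual ARM convention where the stack grows toward lower addresses and every pushed register takes 4 bytes. The class name and the starting address are picked only for illustration.

```python
class Stack:
    """Toy model of the stack: `sp` plays the role of r13."""

    def __init__(self, top: int) -> None:
        self.sp = top      # address of the most recently pushed item
        self.mem = {}      # address -> saved 32-bit value

    def push(self, value: int) -> None:
        self.sp -= 4                       # make room first...
        self.mem[self.sp] = value          # ...then save the value there

    def pop(self) -> int:
        value = self.mem.pop(self.sp)      # read whatever sp points at
        self.sp += 4                       # give the space back
        return value


s = Stack(top=0x03007F00)   # illustrative starting address
s.push(0x11111111)
s.push(0x22222222)
print(hex(s.sp))            # sp has moved down by 8 bytes
print(hex(s.pop()), hex(s.pop()), hex(s.sp))
```

Because the stack pointer always tracks the top, the processor never has to search for what was pushed last.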
~The Correlation between LR and PC~
Pushing and popping has another useful feature when you use r14 and r15 together. If you remember, r14 is the Link Register, which stores where we came from. To make it easy to return to a function that called your routine, or to create a routine that is used by multiple other routines, you can push r14 and save where you came from. Due to how Thumb's push and pop instructions work, you can't pop the Link Register and you can't push the Program Counter (r15). So, how do we go back after we pushed r14? Simple, you pop r15! This may not make sense right now, but it is good to know: if you push r14 in the first line of your routine and pop r15 on the last, it will return to the function which called it upon popping r15. This is especially useful in scripting when using the callasm command, as it will allow the routine to jump back to your script when it is done, right where callasm left off.
After thinking about the above concept for a very, very long time, I finally realized exactly how it works. When I originally wrote this, I knew it worked, but I didn't know why; now I do! So, here it is: you should remember that when you branch to another function, the LR (r14) contains the address to return to. However, how did this address get there? Let's remember that the Program Counter (r15) tracks the next instruction to be run. So, the branch-with-link instruction (bl), which allows us to jump to another function and come back later, saves the address of the instruction right after the branch into the LR, then loads into the PC the address to go to, which causes the processor to see that as the next instruction and jump to the new function. Makes sense, right? So, now when we push LR, it stores that address (where we left off) on the stack. Now we can run our function like usual, but how on earth does popping PC jump back to where we were? Remember my image of the stack and how you always pull from the top when you pop? If everything else you pushed has already been popped, the LR which you pushed will be on top. So, popping PC will place the pushed LR (the address we came from) in PC. Now, PC contains the address we came from, which it will read as the next instruction to run, and it will therefore go back there.
You probably didn't need to know that, but I personally find that understanding how things work makes using them a lot easier.
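The whole round trip can be followed in a short Python model. This is a simplified sketch: it ignores the extra low bit that real Thumb return addresses carry, the addresses are hypothetical, and the function name is invented for the example.

```python
ROUTINE = 0x08800000   # hypothetical address of our routine


def call_and_return(call_site: int) -> int:
    """Follow LR and PC through a call, a push of LR, and a pop into PC."""
    regs = {"lr": 0, "pc": call_site}
    stack = []

    # The branch-with-link: remember where to come back to, then jump.
    regs["lr"] = call_site + 4     # a Thumb bl takes up 4 bytes
    regs["pc"] = ROUTINE

    stack.append(regs["lr"])       # first line of the routine: push LR

    # The routine body may call other functions, which overwrites LR.
    regs["lr"] = 0xDEADBEEF

    regs["pc"] = stack.pop()       # last line of the routine: pop into PC
    return regs["pc"]


print(hex(call_and_return(0x08000100)))   # back at the instruction after the call
```

Even though LR was overwritten inside the routine, the copy saved on the stack still brings us home.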
I honestly have no more to say about ASM. My goal here was to bridge the gap between knowing nothing about ASM and being able to actually follow an assembly tutorial, which can prove quite confusing if you don't understand anything.
True to what I said in the beginning, the next step in your journey is to hit up an ASM tutorial which will teach you what each command means and does, as well as how to write your own routines. So, here is the list of tutorials I have right now:
HackMew's Having fun with ASM: Lesson One
HackMew's Shinies Unleashed: Lesson 2 - Gettin' better
JPAN's ASM Tutorial Document
shiny quagsire's GBA ASM Programming
knizz and FullMetal's Yet another ASM-Tutorial
Feel free to comment with any other tutorials you know of that you think should be added to this list.
I really hope this helps anyone trying to learn ASM and makes it a little less intimidating!
If you want/need any other ASM concepts, feel free to request them. I will not teach you ASM, just the concepts.
ASM Hackers I have learned from:
People who read and proofed this tutorial for me to make sure it was worth it:D:
~Pink Parka Girl