A bit of background. I'm a decent tool-based hacker with a basic knowledge of hex and XSE scripting. I know a bit of computer programming (Python, specifically) but I know almost nothing about computer hardware. And I'm trying to learn ASM.
From the tutorials I've explored, it seems like I'm in WAY over my head. This line at the end of FBI's Inserting Routines Into Your ROM tutorial sums it up nicely:
If you still don't understand, I would advise you to go and play outside instead
To which I say "what is this outside you speak of?" And no, I didn't understand at that point.
But I kept going. I was able to insert HackMew's tutorial ASM routine after a bit of deduction. Unfortunately, I've hit a brick wall I can't get past on my own. The big problem is that I don't understand how ASM code actually works yet. I'm trying to figure out what it's doing, and I'm making a bit of progress, but not enough to make my own routines.
From my little time skimming ASM questions, I don't think I'm in the same mindset as other ASM newbies. So I went through the "How does it work" portion of HackMew's ASM tutorial line-by-line. I'd like to know if I'm at least somewhere in the ballpark with my train of thought, and where I'm going wrong.
It's a HUGE response, so spoiler tag. Thanks in advance to anyone who spares the time.
Spoiler:
Starting at the first lines that I don't think appear at the head of every routine:
Code:
push {r0-r1, lr}
Here the true THUMB code starts. This instruction will push registers from r0 to r1, along with the Link Register into the stack. "What the heck are registers? Stack??"
Registers are special memory areas which are 32 bits wide hence they can hold numbers up to 4 bytes.
Ignore the bits part, you don't need to worry about that yet.
According to Google, a byte is a character. I'm interpreting that a character would be a symbol, such as a letter, number or anything you can type in a word processor. So that would make a register just 4 symbols.
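Trying this out in Python, the one language I sort of know. This is only a sketch, assuming "32 bits wide" means the register holds one number that takes 4 bytes to write down. The variable names are mine, not the tutorial's.

```python
# Assumption: a register is modeled as a plain unsigned 32-bit number.
REGISTER_BITS = 32

# Largest value that fits in 32 bits.
max_value = 2**REGISTER_BITS - 1

# Split that number into its individual bytes (8 bits each).
as_bytes = max_value.to_bytes(REGISTER_BITS // 8, "little")

print(hex(max_value))  # 0xffffffff
print(len(as_bytes))   # 4
```

If that's right, the 4 bytes would be one big number rather than 4 separate typed characters.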
They can be accessed by simply calling their name.
What do you mean by "accessed" and "calling their name"? Would the names of registers be "r0", "r1", "r2", etc.? Would accessed mean "used by a command"? No idea what "calling their name" means. A type of command, maybe?
There's a total of 16 registers, from r0 to r15. A bit more in detail:
- r0-r12: These 13 registers are the so-called General Purpose Registers, which means they can be used for whatever reason you may have. However, in THUMB mode r0 - r7 (Low Registers) can always be used, whereas r8 - r12 (High Registers) can be used only by some instructions.
- r13: While in ARM mode the user can choose to use r13 or another register as a Stack Pointer, in THUMB mode this register is always used as Stack Pointer.
- r14: This is used as Link Register. When calling a sub-routine by a branch with a Link instruction, the return address is stored in this register. Storing the return address is a lot faster than pushing it into memory, however there's only one LR register for each mode so the user must manually push its content before issuing "nested" subroutines.
- r15: This is used as Program Counter. When reading from r15 it will return a value of PC+n because of read-ahead (pipelining), where "n" depends on the instruction and on the CPU state (THUMB or ARM).
The two numbered registers used are general use registers, like a variable in scripting? r14, Link Register, is probably the lr from the previous command. Dunno what a "sub-routine" or "branch" is. The word "faster" is used; maybe it's talking about FPS? What does he mean by "there's only one LR register for each mode"? What are modes?
Stack: besides registers, there's another special memory area called "Stack". It's used to store the value of registers into it so that you can safely modify them. When you store something into the stack, that's called "pushing". When you're done, you will do the opposite. That is, "popping". When you pop a register, its value will be restored to its previous state.
A frequently used metaphor is the idea of a stack of plates in a spring loaded cafeteria stack. In such a stack, only the top plate is visible and accessible to the user, all other plates remain hidden. As new plates are added, each new plate becomes the top of the stack, hiding each plate below, pushing the stack of plates down. As the top plate is removed from the stack, it can be used, the plates pop back up, and the second plate becomes the top of the stack. Two important principles are illustrated by this metaphor: the Last In First Out principle is one; the second is that the contents of the stack are hidden. Only the top plate is visible, so to see what is on the third plate, the first and second plates will have to be removed.
To sum it up, when we use push {r0-r1, lr}, we're storing - or better, pushing - registers from r0 to r1 and the Link Register into the stack. So, following the above metaphor, r0, r1 and lr would become the top plate.
So the "stack" would be like a workbench. You put registers onto the workbench so that you can modify them. Putting a register onto the workbench is called a "push", while taking a register off the workbench is called a "pop".
But like most things involving computers, there's a catch. Your workbench is the worst workbench ever made. Firstly, the changes you make with the workbench aren't permanent. Once a register leaves the workbench, everything you did is cleared (its value will be restored to its previous state). Fantastic. Secondly, you need to put the registers you want to use on at the exact same time (so to see what is on the third plate, the first and second plates will have to be removed). Otherwise, your registers will start stacking on top of each other. And you can only see the registers on the top of the stack. It's a weird workbench.
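To check whether I'm reading the push/pop description right, here's how I'd fake it in Python. It's a rough sketch: the dict of registers, the list used as a stack, and the starting values are all made up by me, and I'm popping back into lr where the tutorial's routine pops into pc.

```python
# Assumption: registers are entries in a dict and the stack is a Python list.
registers = {"r0": 111, "r1": 222, "lr": 999}
stack = []

# push {r0-r1, lr}: save copies of the current values on the stack.
for name in ("r0", "r1", "lr"):
    stack.append(registers[name])

# The routine is now free to scribble over r0 and r1.
registers["r0"] = 0xDEAD
registers["r1"] = 0xBEEF

# pop: take the values back in reverse order (last in, first out).
for name in ("lr", "r1", "r0"):
    registers[name] = stack.pop()

print(registers)  # {'r0': 111, 'r1': 222, 'lr': 999}
```

If this model is anywhere near right, then the stack holds saved copies of the values, and the work itself happens in the registers.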
Code:
ldr r0, .PLAYER_DATA
This THUMB instruction will load the value of our custom symbol called .PLAYER_DATA into the register r0.
This is how we use the workbench. "ldr" is the tool we're going to use, r0 is the register we're using it on, and .PLAYER_DATA is how we're using the tool.
Now for ldr. ldr is like using a mold. We put our register, r0, into the mold, .PLAYER_DATA. ldr does its magic and presto, r0 is changed to look like .PLAYER_DATA!
Code:
ldr r0, [r0]
This THUMB instruction will load into r0 the value pointed by the actual value of r0. Yes, you've guessed right: .PLAYER_DATA is a memory address which holds a pointer to the player data. First we loaded the address into r0, then we loaded into the same register the value located at the address stored in the register itself.
And suddenly I'm back to being totally confused. "actual value"? Were we using fake values? Did we just put r0 into a mold of itself?
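My best guess at the two loads, written out in Python. This is a sketch under my own assumptions: memory is faked as a dict, 0x0300500C is the address from the tutorial, and 0x02024000 is a made-up value standing in for whatever is really stored there.

```python
# Assumption: memory is a dict mapping address -> 32-bit value.
# 0x0300500C comes from the tutorial; 0x02024000 is a made-up pointer value.
PLAYER_DATA = 0x0300500C
memory = {PLAYER_DATA: 0x02024000}

# ldr r0, .PLAYER_DATA  -> r0 now holds the address 0x0300500C
r0 = PLAYER_DATA

# ldr r0, [r0]  -> look up what is stored AT that address, put it in r0
r0 = memory[r0]

print(hex(r0))  # 0x2024000
```

So maybe "actual value" just means "whatever number r0 holds right now", and the square brackets mean "go look at that address".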
Code:
ldr r1, .VAR
This THUMB instruction will load into r1 the value of the symbol .VAR, which is the memory address of the variable 0x800D, LASTRESULT.
Okay, same thing as ldr r0, .PLAYER_DATA. This time we're using a different mold on a different offset, but it's the same process.
Code:
ldrh r0, [r0, #0xC]
Right now in r0 we have the memory address of the player data. If you start counting from the first byte of the name till the Secret ID, you'll end up with 12 (0xC) bytes. So this THUMB instruction will load a half-word stored at the address r0 + 0xC. Not surprisingly, that's exactly where the Secret ID is stored. Why half-word? And, more importantly: what are half-words? Except for "byte", which is always 8 bits, there isn't a strict convention about its multiples. When talking about ASM, anyway, we define a word as a 32-bit value. Therefore a half-word (as the name suggests) is 16 bits. And the Secret ID takes 2 bytes, or 16 bits indeed.
Seriously, why does this guy keep talking about bits? Now I gotta explain this to myself.
A bit is either a 0 or 1. A computer uses it like a base-2 number system. You put in 8 bits and you get a symbol. This symbol is called a byte. 8 bits = 1 byte. If it's a perfect conversion, just stop giving us the amount of bits and instead just give us the amount of bytes. So maybe it's not always a perfect conversion? Whatever, I'll find out later.
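Quick sanity check in Python on the bit and byte sizes. It only assumes the 8-bits-per-byte figure the tutorial gives; the names are mine.

```python
BITS_PER_BYTE = 8

# How many different values one byte can take.
values_per_byte = 2 ** BITS_PER_BYTE
print(values_per_byte)       # 256

# Sizes the tutorial mentions, converted from bits to bytes.
print(32 // BITS_PER_BYTE)   # 4  (register / word)
print(16 // BITS_PER_BYTE)   # 2  (half-word)

# 0xC written out as the 8 bits of a single byte.
print(format(0xC, "08b"))    # 00001100
```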
Okay, back on topic. I'm guessing "memory address" is like a house address. It's a system that shows you where the memory is. So when I follow a memory address, I get to a certain thing of memory.
What name is this guy talking about? Name entered by the player in-game, maybe? Looking back:
So far we merely inserted a routine and we called it from a script. We have no idea what happens behind the scenes, yet.
Except one thing: the routine is used to retrieve the Secret ID. Well, not truly secret any more, eh?
The Secret ID is stored into the RAM along with other info about the player. The structure is the following:
[Name (8 bytes)] [Gender (1 byte)] [??? (1 byte)] [Trainer ID (2 bytes)] [Secret ID (2 bytes)]
[Hours of play (2 bytes)] [Minutes (1 byte)] [Seconds (1 byte)] [Frames (1 byte)]
[??? (1 byte)] [Options (2 bytes)]
That's the only time a name is mentioned. The name is 8 bytes, the gender is 1 byte, ??? is 1 byte, and the Trainer ID is 2 bytes. 8+1+1+2=12. That's probably what he means about counting from the first byte till the Secret ID then. If the name is the same one entered by the player in-game, it would explain why there's an 8 character maximum; one character for each byte. Though then wouldn't the Secret ID be 5 bytes? Whatever, push on.
So a word is 4 bytes, a half-word is 2 bytes. I still don't know what this command is doing. 0xC is hexadecimal for 12. If you can add a number to an address, then addresses must be stored as numbers. You add 12 to the address and it's still readable as an address, but it's now for a different thingy of memory. Think of it like having a house address of 7 Stupid Lane, adding 12, and getting 19 Stupid Lane. It's still an address, but it's to a different house down the street.
I might have lost track at ldr r0, [r0], but I think .PLAYER_DATA is still "in" r0. Whatever .PLAYER_DATA is, it's probably a variable, like an x in algebra. The variable was mentioned to be an address; .PLAYER_DATA must be an address. I'm not clear on the exact syntax, but ldrh must be "you know where this thing of memory is? from there, go to the thingy of memory a couple houses down". It's like going through a shelf full of molds, picking one, and then you get told you have to use the mold a couple rows down. ASM really is just working in the world's worst factory.
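Here's my attempt at the "couple houses down" idea in Python, using the struct module. It's a sketch under assumptions: the field sizes come from the tutorial's structure, but the name, the ID values, and the little-endian byte order are things I filled in for illustration.

```python
import struct

# Layout from the tutorial: name (8), gender (1), ??? (1), Trainer ID (2), Secret ID (2).
# "<" = little-endian, no padding. The values below are made up for illustration.
HEADER = "<8sBBHH"
player_data = struct.pack(HEADER, b"PLAYER\x00\x00", 0, 0, 12345, 54321)

# Offset of the Secret ID = size of everything before it.
secret_id_offset = struct.calcsize("<8sBBH")

# ldrh r0, [r0, #0xC]: read 2 bytes starting 12 bytes past the start.
(secret_id,) = struct.unpack_from("<H", player_data, 0xC)

print(secret_id_offset)  # 12
print(secret_id)         # 54321
```

The offset works out to 12 either way, which at least matches the 0xC in the instruction.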
Code:
strh r0, [r1]
This THUMB instruction will store the value held by r0 (which is our Secret ID) at the address pointed by r1, which is LASTRESULT. Note that we're using the "h" suffix once again. In fact we're storing a half-word since variables are 16 bits wide (from 0x0 to 0xFFFF).
Confusing once again, but I've figured out enough now to learn some new info immediately.
1. r0 is currently our Secret ID.
2. r1 is "pointed" at the address LASTRESULT. I recognize LASTRESULT from the Yes/No textboxes in XSE. It's used in the script this guy made me compile; perhaps I need to look into the buffernumber command.
3. The "h" suffix is...something. Dunno what it does yet.
4. A variable is 2 bytes. I've been using variable in my notes but it's probably unrelated to the type of variable this guy is talking about. Push on.
5. This script is "storing" the contents of r0 in r1. Dunno if this means copy n' pasting or cut n' pasting.
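For point 5, this is how I picture the store in Python. It's a sketch with assumptions: RAM is a tiny bytearray, LASTRESULT sits at a made-up index of 4 instead of its real address, and 54321 is a pretend Secret ID.

```python
# Assumption: RAM is a bytearray and LASTRESULT lives at a made-up index 4.
ram = bytearray(8)
LASTRESULT = 4

r0 = 54321        # pretend this is the Secret ID
r1 = LASTRESULT   # r1 holds the address, not the value

# strh r0, [r1]: write the low 16 bits of r0 into the 2 bytes at address r1.
ram[r1:r1 + 2] = (r0 & 0xFFFF).to_bytes(2, "little")

print(r0)                                        # 54321
print(int.from_bytes(ram[r1:r1 + 2], "little"))  # 54321
```

In this model r0 still holds its value afterwards, so it would be copy n' paste, and the data lands at the address r1 holds rather than in r1 itself.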
Code:
pop {r0-r1, pc}
This THUMB instruction will revert the effect of our previous push. Remember that when you push a variable and you change its value, you should always pop it back.
Did we just erase everything we just did? We didn't even use a register called pc. Are variables registers? By "you should always pop it back", does it mean "if you don't clear the stack by the end of the code bad things will happen"? I don't even know. Next line.
Code:
.align 2
Assembler directive. Nothing new actually. Just don't forget it.
Guessing this needs to go at the end of every script. Don't care why, as long as it works.
Code:
.PLAYER_DATA:
This is a label used to define the .PLAYER_DATA symbol used by the routine.
Wait, what? We've been using .PLAYER_DATA and we haven't even defined it yet? Just...how? What? Next line.
Code:
.word 0x0300500C
Assigns the word (32 bits) 0x300500C to .PLAYER_DATA.
We're now modifying .PLAYER_DATA. I'm guessing "assigning" in this case means "adding onto", cause we've already been working with .PLAYER_DATA. I think. Very ambiguous wording.
Code:
.VAR:
This is a label used to define the .VAR symbol used by the routine.
Did the same thing as we just did with .PLAYER_DATA, only this makes sense since we haven't used .VAR yet. "symbol" is an odd choice of words, I might be seeing a technical definition I don't know yet. Hopefully I guessed right about a byte storing a symbol. Would this make .VAR 1 byte? But then earlier he said variables were 2 bytes. I dunno, next line.
Code:
.word 0x020270B6 + (0x800D * 2)
Assigns the word (32 bits) 0x020270B6 + (0x800D * 2) = 0x020370D0 to .VAR. If you're wondering about the "weird" format, I did that to make it easier to change the variable used. Note however this would work only for temporary variables, 0x800D onwards. For the previous temporary variables, just increase the main address by 2 (in the example above, you would change it to .word 0x020270B8 + (0x8000 * 2), if you were to use variable 0x8000).
It looks like he's doing math in hexadecimal. Dunno what he's talking about with making it easier to change, though I don't know how you would even go about changing it in the first place. No idea what "temporary variables" are.
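At least the hexadecimal math I can check in Python. Both numbers are straight from the tutorial; my guess (only a guess) is that the "* 2" is there because each variable takes 2 bytes.

```python
BASE = 0x020270B6     # main address from the tutorial
LASTRESULT = 0x800D   # variable number from the tutorial

# Address of the variable = base + variable number * 2.
address = BASE + LASTRESULT * 2

print(hex(address))  # 0x20370d0
```

That matches the 0x020370D0 he gives, so swapping 0x800D for another variable number would presumably be the "easier to change" part.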
For the sake of precision, I'll explain how memory is used by the GBA.
- System ROM/BIOS: starts at 0x0000000 with a length of 16 KB, this section contains BIOS memory which is strictly read-only.
- External Working RAM/EWRAM: starts at 0x2000000 and has a length of 256 KB. Since it contains a 16-bit databus, THUMB is best used here.
It allows 8, 16 and 32 bits read/write.
- Internal Working RAM/IWRAM: begins at 0x3000000 and has a length of 32 KB with a 32-bit databus thus making it fast for ARM code.
It allows 8, 16 and 32 bit read/write.
- Register Memory/IO: begins at 0x4000000 going up to 1 KB. This is where you control graphics, sound, timing, keypressing etc.
Despite the name, it has absolutely nothing to do with the actual registers: r0-r15.
- Palette Memory: starts at 0x5000000 going up to 1 KB. This area contains 2 palettes: backgrounds and sprites, respectively.
- Video Memory/VRAM: starts at 0x6000000, graphic data (tilesets, sprites, tilemaps) are stored here. Sprites are usually stored starting from 0x6010000.
- Object Attribute Memory/OAM: begins at 0x7000000 with a length of 1 KB. This is where you control sprites such as storing width, height, or location to sprite graphic data.
- ROM: begins at 0x8000000 going to a maximum of 32 MB, usually. THUMB is the best choice over here.
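The one thing I can do with that list is turn it into a lookup in Python and feed it the two addresses from the routine. It's a rough sketch: the start addresses are copied from the list, the short region names and the function are mine, and I'm ignoring the lengths entirely.

```python
# Start addresses copied from the tutorial's list; the lookup logic is my own.
REGIONS = [
    (0x0000000, "BIOS"),
    (0x2000000, "EWRAM"),
    (0x3000000, "IWRAM"),
    (0x4000000, "IO"),
    (0x5000000, "Palette"),
    (0x6000000, "VRAM"),
    (0x7000000, "OAM"),
    (0x8000000, "ROM"),
]

def region_of(address):
    """Return the name of the last region whose start is <= address."""
    name = None
    for start, label in REGIONS:
        if address >= start:
            name = label
    return name

print(region_of(0x0300500C))  # IWRAM  (.PLAYER_DATA)
print(region_of(0x020370D0))  # EWRAM  (.VAR)
```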
I'm completely lost at this point. I dunno if this even relates to the code. Let's just end it here.