The PokéCommunity Forums  

Newbie tries to read ASM tutorial

#1
November 9th, 2015 (7:29 PM)
Oloolooloo!
Join Date: Feb 2015
Posts: 77
A bit of background. I'm a decent tool-based hacker with a basic knowledge of hex and XSE scripting. I know a bit of computer programming (Python, specifically) but I know almost nothing about computer hardware. And I'm trying to learn ASM.

From the tutorials I've explored, it seems like I'm in WAY over my head. This line at the end of FBI's Inserting Routines Into Your ROM tutorial sums it up nicely:

Quote:
Originally Posted by FBI View Post
If you still don't understand, I would advise you to go and play outside instead
To which I say "what is this outside you speak of?" And no, I didn't understand at that point.

But I kept going. I was able to insert HackMew's tutorial ASM routine after a bit of deduction. Unfortunately, I've hit a brick wall I can't get past on my own. The big problem is that I don't understand how ASM code actually works yet. I'm trying to figure out what it's doing, and I'm making a bit of progress, but not enough to make my own routines.

From my little time skimming ASM questions, I don't think I'm in the same mindset as other ASM newbies. So I went through the "How does it work" portion of HackMew's ASM tutorial line-by-line. I'd like to know if I'm at least somewhere in the ballpark with my train of thought, and where I'm going wrong.

It's a HUGE response, so spoiler tag. Thanks in advance to anyone who spares the time.

Spoiler:

Starting at the first lines that I don't think appear at the head of every routine:
Quote:
Originally Posted by HackMew View Post
Code:
push {r0-r1, lr}
Here the true THUMB code starts. This instruction will push registers from r0 to r1, along with the Link Register into the stack. "What the heck are registers? Stack??"

Registers are special memory areas which are 32 bits wide hence they can hold numbers up to 4 bytes.
Ignore the bits part, you don't need to worry about that yet.

According to Google, a byte is a character. I'm interpreting that a character would be a symbol, such as a letter, number or anything you can type in a word processor. So that would make a register just 4 symbols.

Quote:
Originally Posted by HackMew View Post
They can be accessed by simply calling their name.
What do you mean by "accessed" and "calling their name"? Would the names of registers be "r0", "r1", "r2", etc.? Would accessed mean "used by a command"? No idea what "calling their name" means. A type of command, maybe?

Quote:
Originally Posted by HackMew View Post
There's a total of 16 registers, from r0 to r15. A bit more in detail:

- r0-r12: These 13 registers are the so called General Purpose Registers, which means they can be used for whatever reason you may have. However in THUMB mode r0 - r7 (Low Registers), can always be used whereas r8 - r12 (High Registers) can be used only by some instructions.

- r13: While in ARM mode the user can choose to use r13 or another register as a Stack Pointer, in THUMB mode this register is always used as Stack Pointer.

- r14: This is used as Link Register. When calling to a sub-routine by a branch with a Link instruction, the return address is stored in this register. Storing the return address is a lot faster than pushing it into memory, however there's only one LR register for each mode so the user must manually push its content before issuing "nested" subroutines.

- r15: This is used as Program Counter, when reading from r15 it will return a value of PC+n because of read-ahead (pipelining) while "n" depends on the instruction and on the CPU state (THUMB or ARM).
The two numbered registers used are general use registers, like a variable in scripting? r14, Link Register, is probably the lr from the previous command. Dunno what a "sub-routine" or "branch" is. The word "faster" is used; maybe it's talking about FPS? What does he mean by "there's only one LR register for each mode"? What are modes?

Quote:
Originally Posted by HackMew View Post
Stack: besides registers, there's another special memory area called "Stack". It's used to store the value of registers into it so that you can safely modify them. When you store something into the stack, that's called "pushing". When you're done, you will do the opposite. That is, "popping". When you pop a register, its value will be restored to its previous state.

A frequently used metaphor is the idea of a stack of plates in a spring loaded cafeteria stack. In such a stack, only the top plate is visible and accessible to the user, all other plates remain hidden. As new plates are added, each new plate becomes the top of the stack, hiding each plate below, pushing the stack of plates down. As the top plate is removed from the stack, they can be used, the plates pop back up, and the second plate becomes the top of the stack. Two important principles are illustrated by this metaphor: the Last In First Out principle is one; the second is that the contents of the stack are hidden. Only the top plate is visible, so to see what is on the third plate, the first and second plates will have to be removed.

To sum it up, when we use push {r0-r1, lr}, we're storing - or better, pushing - registers from r0 to r1 and the Link Register into the stack. So, following the above metaphor, r0, r1 and lr would become the top plate.
So the "stack" would be like a workbench. You put registers onto the workbench so that you can modify them. Putting a register onto the workbench is called a "push", while taking a register off the workbench is called a "pop".

But like most things involving computers, there's a catch. Your workbench is the worst workbench ever made. Firstly, the changes you make with the workbench aren't permanent. Once a register leaves the workbench, everything you did is cleared (its value will be restored to its previous state). Fantastic. Secondly, you need to put the registers you want to use on at the exact same time (so to see what is on the third plate, the first and second plates will have to be removed). Otherwise, your registers will start stacking on top of each other. And you can only see the registers on the top of the stack. It's a weird workbench.

Quote:
Originally Posted by HackMew View Post
Code:
ldr r0, .PLAYER_DATA
This THUMB instruction will load the value of our custom symbol called .PLAYER_DATA into the register r0.
This is how we use the workbench. "ldr" is the tool we're going to use, r0 is the register we're using it on, and .PLAYER_DATA is how we're using the tool.

Now for ldr. ldr is like using a mold. We put our register, r0, into the mold, .PLAYER_DATA. ldr does its magic and presto, r0 is changed to look like .PLAYER_DATA!

Quote:
Originally Posted by HackMew View Post
Code:
ldr r0, [r0]
This THUMB instruction will load into r0 the value pointed by the actual value of r0. Yes, you've guessed right: .PLAYER_DATA is a memory address which holds a pointer to the player data. First we loaded the address into r0, then we loaded into the same register the value located at the address stored in the register itself.
And suddenly I'm back to being totally confused. "actual value"? Were we using fake values? Did we just put r0 into a mold of itself?

Quote:
Originally Posted by HackMew View Post
Code:
ldr r1, .VAR
This THUMB instruction will load into r1 the value of the symbol .VAR, which is the memory address of the variable 0x800D, LASTRESULT.
Okay, same thing as ldr r0, .PLAYER_DATA. This time we're using a different mold on a different offset, but it's the same process.

Quote:
Originally Posted by HackMew View Post
Code:
ldrh r0, [r0, #0xC]
Right now in r0 we have the memory address of the player data. If you start counting from the first byte of the name till the Secret ID, you'll end up with 12 (0xC) bytes. So this THUMB instruction will load an half-word stored at the address r0 + 0xC. Not surprisingly, that's exactly where the Secret ID is stored. Why half-word? And, more important: what are half-words? Except for "byte" which is always 8 bits, there isn't a strict convention about its multiples. When talking about ASM, anyway, we define a word as a 32-bit value. Therefore an half-word (as the name suggests) is 16 bits. And the Secret ID takes 2 bytes, or 16 bits indeed.
Seriously, why does this guy keep talking about bits? Now I gotta explain this to myself.

A bit is either a 0 or 1. A computer uses it like a base-2 number system. You put in 8 bits and you get a symbol. This symbol is called a byte. 1 bit = 8 bytes. If it's a perfect conversion, just stop giving us the amount of bits and instead just give us the amount of bytes. So maybe it's not always a perfect conversion? Whatever, I'll find out later.

Okay, back on topic. I'm guessing "memory address" is like a house address. It's a system that shows you where the memory is. So when I follow a memory address, I get to a certain thing of memory.

What name is this guy talking about? Name entered by the player in-game, maybe? Looking back:

Quote:
Originally Posted by HackMew View Post
So far we merely inserted a routine and we called it from a script. We have no idea what happens behind the scenes, yet.
Except one thing: the routine is used to retrieve the Secret ID. Well, not truly secret any more, eh?
The Secret ID is stored into the RAM along with other info about the player. The structure is the following:

Quote:
[Name (8 bytes)] [Gender (1 byte)] [??? (1 byte)] [Trainer ID (2 bytes)] [Secret ID (2 bytes)]
[Hours of play (2 bytes)] [Minutes (1 byte)] [Seconds (1 byte)] [Frames (1 byte)]
[??? (1 byte)] [Options (2 bytes)]
That's the only time a name is mentioned. The name is 8 bytes, the gender is 1 byte, ??? is 1 byte, and the Trainer ID is 2 bytes. 8+1+1+2=12. That's probably what he means about counting from the first byte till the Secret ID then. If the name is the same one entered by player in-game, it would explain why there's an 8 character maximum; one character for each byte. Though then wouldn't the Secret ID be 5 bytes? Whatever, push on.

So a word is 4 bytes, a half-word is 2 bytes. I still don't know what this command is doing. 0xC is hexadecimal for 12. If you can add a number to an address, then addresses must be stored as numbers. You add 12 to the address and it's still readable as an address, but it's now for a different thingy of memory. Think of it like having a house address of 7 Stupid Lane, adding 12, and getting 19 Stupid Lane. It's still an address, but it's to a different house down the street.

I might have lost track at ldr r0, [r0], but I think .PLAYER_DATA is still "in" r0. Whatever .PLAYER_DATA is, it's probably a variable, like an x in algebra. The variable was mentioned to be an address; .PLAYER_DATA must be an address. I'm not clear on the exact syntax, but ldrh must be "you know where this thing of memory is? from there, go to the thingy of memory a couple houses down". It's like going through a shelf full of molds, picking one, and then you get told you have to use the mold a couple rows down. ASM really is just working in the world's worst factory.

Quote:
Originally Posted by HackMew View Post
Code:
strh r0, [r1]
This THUMB instruction will store the value held by r0 (which is our Secret ID) at the address pointed by r1, which is LASTRESULT. Note that we're using the "h" suffix once again. In fact we're storing an half-word since variables are 16 bits wide (from 0x0 to 0xFFFF).
Confusing once again, but I've figured out enough now to learn some new info immediately.

1. r0 is currently our Secret ID.
2. r1 is "pointed" at the address LASTRESULT. I recognize LASTRESULT from the Yes/No textboxes in XSE. It's used in the script this guy made me compile; perhaps I need to look into the buffernumber command.
3. The "h" suffix is...something. Dunno what it does yet.
4. A variable is 2 bytes. I've been using variable in my notes but it's probably unrelated to the type of variable this guy is talking about. Push on.
5. This script is "storing" the contents of r0 in r1. Dunno if this means copy n' pasting or cut n' pasting.

Quote:
Originally Posted by HackMew View Post
Code:
pop {r0-r1, pc}
This THUMB instruction will revert the effect of our previous push. Remember that when you push a variable and you change its value, you should always pop it back.
Did we just erase everything we just did? We didn't even use a register called pc. Are variables registers? By "you should always pop it back", does it mean "if you don't clear the stack by the end of the code bad things will happen"? I don't even know. Next line.

Quote:
Originally Posted by HackMew View Post
Code:
.align 2
Assembler directive. Nothing new actually. Just don't forget it.
Guessing this needs to go at the end of every script. Don't care why, as long as it works.

Quote:
Originally Posted by HackMew View Post
Code:
.PLAYER_DATA:
This is a label used to define the .PLAYER_DATA symbol used by the routine.
Wait, what? We've been using .PLAYER_DATA and we haven't even defined it yet? Just...how? What? Next line.

Quote:
Originally Posted by HackMew View Post
Code:
.word 0x0300500C
Assigns the word (32 bits) 0x300500C to .PLAYER_DATA.
We're now modifying .PLAYER_DATA. I'm guessing "assigning" in this case means "adding onto", cause we've already been working with .PLAYER_DATA. I think. Very ambiguous wording.

Quote:
Originally Posted by HackMew View Post
Code:
.VAR:
This is a label used to define the .VAR symbol used by the routine.
Did the same thing as we just did with .PLAYER_DATA, only this makes sense since we haven't used .VAR yet. "symbol" is an odd choice of words, I might be seeing a technical definition I don't know yet. Hopefully I guessed right about a byte storing a symbol. Would this make .VAR 1 byte? But then earlier he said variables were 2 bytes. I dunno, next line.

Quote:
Originally Posted by HackMew View Post
Code:
.word 0x020270B6 + (0x800D * 2)
Assigns the word (32 bits) 0x020270B6 + (0x800D * 2) = 0x020370D0 to .VAR. If you're wondering about the "weird" format, I made that to make it easier changing the variable used. Note however this would work only for temporary variables, 0x800D onwards. For the previous temporary variables, just increase the main address by 2 (in the example above, you would change it to .word 0x020270B8 + (0x8000 * 2), if you were to use variable 0x8000).
It looks like he's doing math in hexadecimal. Dunno what he's talking about with making it easier to change, though I don't know how you would even go about changing it in the first place. No idea what "temporary variables" are.

Quote:
Originally Posted by HackMew View Post
For the sake of precision, I'll explain you how memory is used by the GBA.

- System ROM/BIOS: starts at 0x0000000 with a length of 16KBs, this section contains BIOS memory which is strictly read-only.
- External Working RAM/EWRAM: starts at 0x2000000 and has a length of 256 KB. Since it contains a 16-bit databus, THUMB is best used here.
It allows 8, 16 and 32 bits read/write.
- Internal Working RAM/IWRAM: begins at 0x3000000 and has a length of 32 KB with a 32-bit databus thus making it fast for ARM code.
It allows 8, 16 and 32 bit read/write.
- Register Memory/IO: begins at 0x4000000 going up to 1 KB. This is where you control graphics, sound, timing, keypressing etc.
Besides the name, it has absolutely nothing to do with the actual registers: r0-r15.
- Palette Memory: starts at 0x5000000 going up to 1 KB. This area contains 2 palettes: backgrounds and sprites, respectively.
- Video Memory/VRAM: starts at 0x6000000, graphic data (tilesets, sprites, tilemaps) are stored here. Sprites are usually stored starting from 0x6010000.
- Object Attribute Memory/OAM: begins at 0x7000000 with a length of 1 KB. This is where you control sprites such as storing width, height, or location to sprite graphic data.
- ROM: begins at 0x8000000 going to a maximum of 32 MB, usually. THUMB is the best choice over here.
I'm completely lost at this point. I dunno if this even relates to the code. Let's just end it here.
__________________
In your self-improvement efforts, strive for perfection, knowing you'll never make it. But you'll damn surely hit high if you aim high.
- Gene Duncan, U.S. Marine Corps (Retired)

#2
November 9th, 2015 (8:16 PM)
azurile13
Join Date: Mar 2015
Posts: 417
Well, the reason FBI's post won't make sense to you is that it wasn't trying to teach you ASM. It was telling you how to insert other people's ASM. But I believe he has a number of other tutorials on writing ASM. I was going to read your comments, but yeah. It is very long. I may read it eventually. Until then, did you read Touched's tutorial? It is more conceptual than "implement x feature" guides, which sounds like what you're looking for.

https://github.com/Touched/asm-tutorial/blob/master/doc.md

#3
November 10th, 2015 (5:06 AM)
Touched
Resident ASMAGICIAN
Join Date: Jul 2014
Gender: Male
Posts: 625
I'll give this a go.

Basically, you seem to be very confused about basic terminology. Most of the time you can use Wikipedia to look up the term you are confused by and it will explain the jargon, or at least give you some more words to Google. Seriously, get into the habit of looking up every word that seems like jargon. ASM is one of the few things in ROM hacking that isn't domain specific; there is a lot more information out there than on this forum. Also, much of what I say is going to sound super pedantic, but in a very technical field like computer science, the nomenclature is everything. Getting the jargon right will also help you recognise subtle differences between concepts, which is super important.

Quote:
Originally Posted by Oloolooloo! View Post
Starting at the first lines that I don't think appear at the head of every routine:
Fair warning. HackMew's tutorial is out of date. We don't write routines this way anymore because some of us actually bothered to read the ARM manual. Registers r0-r3 are known as "scratch registers", which means you can mess them up in a subroutine as much as you want without pushing them to the stack.

Quote:
Originally Posted by Oloolooloo! View Post
According to Google, a byte is a character. I'm interpreting that a character would be a symbol, such as a letter, number or anything you can type in a word processor. So that would make a register just 4 symbols.
A byte is not a character. A character may be a byte, but the reverse is not necessarily true (see multibyte encodings). I understand this may be confusing. You say you have basic knowledge of hex editing, so you should understand what a byte is. A byte is 8 bits, which is just a convenient grouping of digits. Hexadecimal and binary (and octal) are closely related, since they're bases that are powers of two. This grouping is convenient, because it helps us split numbers into logically distinct units. Like in decimal, powers of ten (10, 100, 1000) are "nice" numbers, powers of two are "nice" in binary/hexadecimal/octal. In decimal, we group numbers in terms of powers of ten. 10,000,000 makes it easy to read the 10 million, in the same way grouping stuff in powers of two (8 bits, say) is convenient for computers.
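
To make the grouping concrete, here is a small sketch in Python (picked only because it was mentioned as a familiar language; it is not GBA code). The value is the .PLAYER_DATA word from the routine, and the variable names are just illustrative. It shows that one byte is always two hex digits, or eight binary digits.

```python
# Illustrative only: the same 32-bit value viewed as hex, as binary and as bytes.
value = 0x0300500C  # the .PLAYER_DATA word from the routine

hex_text = format(value, "08X")   # 8 hex digits = 32 bits
bin_text = format(value, "032b")  # 32 binary digits

# Split the hex text into bytes: every byte is 2 hex digits, i.e. 8 bits.
byte_list = [hex_text[i:i + 2] for i in range(0, 8, 2)]

print(hex_text)            # 0300500C
print(byte_list)           # ['03', '00', '50', '0C']
print(len(bin_text) // 8)  # 4
```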

Think of register capacity in number ranges. A register can hold 32 bits, or a range of 0 to 2^32 - 1 for unsigned numbers (numbers that can't be negative). For signed numbers (negative or positive numbers), the largest value is smaller because half of the bit patterns are used for negative numbers and half for non-negative ones. See two's complement for how negative numbers are represented in binary.
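
The ranges can be worked out directly. This is a minimal Python sketch, assuming the usual two's complement interpretation; the helper name to_signed is made up for illustration.

```python
BITS = 32  # width of one register

# Unsigned: every bit pattern is a non-negative number.
unsigned_min, unsigned_max = 0, 2**BITS - 1

# Signed (two's complement): half of the patterns are negative.
signed_min, signed_max = -(2**(BITS - 1)), 2**(BITS - 1) - 1

def to_signed(raw, bits=BITS):
    """Interpret an unsigned bit pattern as a two's complement number."""
    if raw >= 2**(bits - 1):
        return raw - 2**bits
    return raw

print(hex(unsigned_max))      # 0xffffffff
print(to_signed(0xFFFFFFFF))  # -1
print(to_signed(0x7FFFFFFF))  # 2147483647
```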

Quote:
Originally Posted by Oloolooloo! View Post
What do you mean by "accessed" and "calling their name"? Would the names of registers be "r0", "r1", "r2", etc.? Would accessed mean "used by a command"? No idea what "calling their name" means. A type of command, maybe?
Terrible wording. He means you can use the name of the register in conjunction with a mnemonic (the short name of an instruction, such as ldr or push) to modify/use it.

Quote:
Originally Posted by Oloolooloo! View Post
The two numbered registers used are general use registers, like a variable in scripting? r14, Link Register, is probably the lr from the previous command. Dunno what a "sub-routine" or "branch" is. The word "faster" is used; maybe it's talking about FPS? What does he mean by "there's only one LR register for each mode"? What are modes?
Subroutine is wikipediable. It's just a logical unit of instructions, sometimes known as a function. Branch is also on wiki. You say you do scripting. Ever do "if 0x1 goto bleh"? That's a branch. It just means that the program jumps to a new location.

He says faster, but it's not very clear what he is talking about. I assume he means that having a special register to hold the return location for a function is faster than using memory for the same purpose. This is faster because the CPU doesn't have to transfer data over the memory bus, saving CPU cycles. When talking about ASM speed, we rarely mean FPS, and almost always mean CPU cycles. A frame is counted in hundreds of thousands of cycles (apparently 280,896 cycles).
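
To put the cycle count in perspective, here is a quick Python calculation. The clock speed of 2^24 Hz (about 16.78 MHz) is an assumption brought in from outside this thread; the cycles-per-frame figure is the one quoted above.

```python
CPU_HZ = 16_777_216         # assumed GBA CPU clock (2**24 Hz), not stated in the thread
CYCLES_PER_FRAME = 280_896  # figure quoted above

frames_per_second = CPU_HZ / CYCLES_PER_FRAME
print(round(frames_per_second, 2))  # 59.73
```

So a handful of cycles saved by using a register instead of memory is tiny next to a whole frame, which is why routine speed is discussed in cycles and not FPS.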

Modes are CPU modes. You can find a list on GBATEK. These modes are used for a specialised purpose, but you probably don't want to know because it involves me using words like interrupts and BIOS. You're gonna be in User mode basically 100% of the time. The GBA has other modes, but you don't really have to worry about those for a while. Rather get the basic theory down before worrying about this.

Quote:
Originally Posted by Oloolooloo! View Post
So the "stack" would be like a workbench. You put registers onto the workbench so that you can modify them. Putting a register onto the workbench is called a "push", while taking a register off the workbench is called a "pop".

But like most things involving computers, there's a catch. Your workbench is the worst workbench ever made. Firstly, the changes you make with the workbench aren't permanent. Once a register leaves the workbench, everything you did is cleared (its value will be restored to its previous state). Fantastic. Secondly, you need to put the registers you want to use on at the exact same time (so to see what is on the third plate, the first and second plates will have to be removed). Otherwise, your registers will start stacking on top of each other. And you can only see the registers on the top of the stack. It's a weird workbench.
You seem to mix up what you're calling a workbench. One minute you call it the stack, then it blurs into registers. Forget registers. Forget the stack. You have a misconception of what the registers are for. It does not matter that they are transient. All that matters is memory. On the GBA (simply), we have the ROM (read-only memory) and we have RAM (quite a few areas). The working RAM holds temporary data that we can use while the machine is running. But on its own that isn't enough, since nothing in it survives a power-off, so we also have a region for permanent storage: that's usually the SRAM, which is used for save games and the like. Now we have all this memory, how do we read it? And, more importantly, how do we perform calculations on it? The CPU, of course! One problem: it can't calculate directly on values sitting in memory. It therefore needs some really fast data "pockets" (the registers) to load data into from memory, calculate stuff, and then store the result somewhere. This is known as a load/store architecture and is just one way of doing things.

Quote:
Originally Posted by Oloolooloo! View Post
This is how we use the workbench. "ldr" is the tool we're going to use, r0 is the register we're using it on, and .PLAYER_DATA is how we're using the tool.
Now for ldr. ldr is like using a mold. We put our register, r0, into the mold, .PLAYER_DATA. ldr does its magic and presto, r0 is changed to look like .PLAYER_DATA!
As I said, your analogy got a bit confused. This is a load/store architecture. We load from memory into registers using LDR variants, use the registers to perform some calculation, and then use STR variants to put it somewhere.
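
The load, calculate, store cycle can be imitated in a few lines of Python. This is only a toy model: the 16-byte memory, the register dictionary and the helper names ldr and str_ (with a trailing underscore because str is already a Python built-in) are all made up for illustration.

```python
import struct

memory = bytearray(16)          # pretend RAM, 16 bytes
registers = {"r0": 0, "r1": 0}  # pretend CPU registers

def ldr(reg, address):
    """Load a 32-bit little-endian word from memory into a register."""
    registers[reg] = struct.unpack_from("<I", memory, address)[0]

def str_(reg, address):
    """Store a register's 32-bit value into memory."""
    struct.pack_into("<I", memory, address, registers[reg])

struct.pack_into("<I", memory, 0, 41)  # a value already sitting in memory
ldr("r0", 0)                           # load: memory -> register
registers["r0"] += 1                   # calculate, using registers only
str_("r0", 4)                          # store: register -> memory

print(struct.unpack_from("<I", memory, 4)[0])  # 42
```

The original value at address 0 is untouched; loading copies data, it does not move it.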

Quote:
Originally Posted by Oloolooloo! View Post
And suddenly I'm back to being totally confused. "actual value"? Were we using fake values? Did we just put r0 into a mold of itself?
Yay! Pointer confusion. Everyone has this, don't worry.

The GBA has a lot (not compared to modern computers, but bear with me) of memory. Registers are small comparatively, and we have only a few. Sure registers can hold numbers, but that isn't very useful when the data is structured (i.e. it is a combination of basic data types, like numbers). Say we have a list of numbers. We can't fit all the numbers into the registers, so how do we use this list? Well, we can just remember where the list is in memory, and use that to operate on the list. This location is known as an address. Addresses on the GBA are segmented, since we have a few different sections (ROM, RAM, etc. all on different physical chips). We store the segment and the offset of that segment.
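
As a rough Python illustration of "section plus offset", the top byte of an address can be used to name the section, using the start addresses from the memory list quoted in the first post. The table and the describe helper are a simplification made up for this sketch, not anything the game itself does.

```python
# Top byte of the address -> memory section (start addresses from the quoted list).
REGIONS = {
    0x00: "BIOS",
    0x02: "EWRAM",
    0x03: "IWRAM",
    0x04: "IO registers",
    0x05: "Palette",
    0x06: "VRAM",
    0x07: "OAM",
    0x08: "ROM",
}

def describe(address):
    """Split an address into (section name, offset inside that section)."""
    region = REGIONS.get(address >> 24, "unknown")
    offset = address & 0x00FFFFFF
    return region, offset

region, offset = describe(0x0300500C)  # the .PLAYER_DATA word
print(region, hex(offset))             # IWRAM 0x500c

region, offset = describe(0x020370D0)  # the .VAR word
print(region, hex(offset))             # EWRAM 0x370d0
```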

That's all a pointer is: an address of some piece of data. Of course, this gets confusing when we have pointers to pointers (yes, this is possible, and can be abused so we have nested pointer hell). PLAYER_DATA is such a pointer to a pointer. Game Freak thought an ingenious mechanism for stopping hackers (probably GameShark hackers) was to move certain important data (such as player data) around in memory. This means the game needs to keep a pointer at a fixed location that always says where the data currently is. PLAYER_DATA is a pointer to that pointer.

When doing LDR R0, .PLAYER_DATA, we load the first pointer into R0. LDR R0, [R0] gets the pointer at the address in R0 and puts it in R0. This is the "true" pointer to the data. That's what he means by actual value.
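
The two loads can be mimicked in Python with a dictionary standing in for memory. The address 0x0300500C is the .PLAYER_DATA word from the routine; the value stored there (0x02024000) is invented purely for this sketch, since the real one changes while the game runs.

```python
# Hypothetical memory contents: address -> 32-bit word stored at that address.
PLAYER_DATA = 0x0300500C      # the .PLAYER_DATA word from the routine
memory = {
    PLAYER_DATA: 0x02024000,  # made-up current location of the player structure
}

r0 = PLAYER_DATA  # ldr r0, .PLAYER_DATA -> r0 holds the address of the pointer
r0 = memory[r0]   # ldr r0, [r0]         -> r0 holds the pointer itself

print(hex(r0))  # 0x2024000
```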

Quote:
Originally Posted by Oloolooloo! View Post
Okay, same thing as ldr r0, .PLAYER_DATA. This time we're using a different mold on a different offset, but it's the same process.

Seriously, why does this guy keep talking about bits? Now I gotta explain this to myself.

A bit is either a 0 or 1. A computer uses it like a base-2 number system. You put in 8 bits and you get a symbol. This symbol is called a byte. 1 bit = 8 bytes. If it's a perfect conversion, just stop giving us the amount of bits and instead just give us the amount of bytes. So maybe it's not always a perfect conversion? Whatever, I'll find out later.
We just generally talk about bits when talking about CPUs, as that's what we're working with mainly. That's probably a typo, but a bit is not 8 bytes. Other way round.
Everyone knows their powers of two, (so do you if you've ever bought a flash drive or portable hard drive) so it doesn't really bother us.

Quote:
Originally Posted by Oloolooloo! View Post
Okay, back on topic. I'm guessing "memory address" is like a house address. It's a system that shows you where the memory is. So when I follow a memory address, I get to a certain thing of memory.
Yes

Quote:
Originally Posted by Oloolooloo! View Post
What name is this guy talking about? Name entered by the player in-game, maybe? Looking back:
Player's name

Quote:
Originally Posted by Oloolooloo! View Post
That's the only time a name is mentioned. The name is 8 bytes, the gender is 1 byte, ??? is 1 byte, and the Trainer ID is 2 bytes. 8+1+1+2=12. That's probably what he means about counting from the first byte till the Secret ID then. If the name is the same one entered by player in-game, it would explain why there's an 8 character maximum; one character for each byte. Though then wouldn't the Secret ID be 5 bytes? Whatever, push on.
Why would it be 5 bytes?

Quote:
Originally Posted by Oloolooloo! View Post
So a word is 4 bytes, a half-word is 2 bytes. I still don't know what this command is doing. 0xC is hexadecimal for 12. If you can add a number to an address, then addresses must be stored as numbers. You add 12 to the address and it's still readable as an address, but it's now for a different thingy of memory. Think of it like having a house address of 7 Stupid Lane, adding 12, and getting 19 Stupid Lane. It's still an address, but it's to a different house down the street.
Since we have the pointer to the structure (explained what that is earlier), we now need a specific value in that structure. This happens to be at offset 12 (0xC) in the structure. So we load the halfword (2 bytes) at offset 12. Since that is the ID, we now have the ID in the register, and it can be used however we wish.
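
Here is a hedged Python sketch of the same idea using the struct module. The layout follows the structure quoted in the first post; the actual values (an empty name, trainer ID 12345, secret ID 54321) are invented for the example.

```python
import struct

# Made-up player data following the quoted layout:
# name (8 bytes), gender (1), unknown (1), trainer ID (2), secret ID (2)
player = struct.pack("<8sBBHH", b"\x00" * 8, 0, 0, 12345, 54321)

SECRET_ID_OFFSET = 8 + 1 + 1 + 2  # = 12 = 0xC

# ldrh r0, [r0, #0xC]: read 2 bytes (a halfword) starting at offset 12
(secret_id,) = struct.unpack_from("<H", player, SECRET_ID_OFFSET)

print(SECRET_ID_OFFSET, secret_id)  # 12 54321
```

The "<" in the format string means little-endian with no padding, which matches how the GBA stores multi-byte values.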

Quote:
Originally Posted by Oloolooloo! View Post
I might have lost track at ldr r0, [r0], but I think .PLAYER_DATA is still "in" r0. Whatever .PLAYER_DATA is, it's probably a variable, like an x in algebra. The variable was mentioned to be an address; .PLAYER_DATA must be an address. I'm not clear on the exact syntax, but ldrh must be "you know where this thing of memory is? from there, go to the thingy of memory a couple houses down". It's like going through a shelf full of molds, picking one, and then you get told you have to use the mold a couple rows down. ASM really is just working in the world's worst factory.

Confusing once again, but I've figured out enough now to learn some new info immediately.

1. r0 is currently our Secret ID.
2. r1 is "pointed" at the address LASTRESULT. I recognize LASTRESULT from the Yes/No textboxes in XSE. It's used in the script this guy made me compile; perhaps I need to look into the buffernumber command.
3. The "h" suffix is...something. Dunno what it does yet.
4. A variable is 2 bytes. I've been using variable in my notes but it's probably unrelated to the type of variable this guy is talking about. Push on.
5. This script is "storing" the contents of r0 in r1. Dunno if this means copy n' pasting or cut n' pasting.
Hopefully I've answered most of these.

This isn't a script though. It's a routine. Very different.

Quote:
Originally Posted by Oloolooloo! View Post
Did we just erase everything we just did? We didn't even use a register called pc. Are variables registers? By "you should always pop it back", does it mean "if you don't clear the stack by the end of the code bad things will happen"? I don't even know. Next line.
As I've explained, registers are temporary containers. However, other subroutines/functions also use them. We back them up on the stack (I explain this thoroughly in my tutorial). The only ones we don't back up are r0-r3 (which is why this push/pop is actually stupid/useless/wrong - it works, but it's useless). You don't "use" PC. It's updated for you. It's a pointer to the currently running instruction and it's how the CPU keeps track of what to execute. Read my tutorial. It explains the relationships between LR/PC and how they interact over push/pop.
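
A loose Python model of the push/pop pair from the routine, with a list as the stack. The starting register values (including the return address in lr) are made up; only the order of operations is the point.

```python
stack = []
registers = {"r0": 1, "r1": 2, "lr": 0x08001235}  # made-up starting values

# push {r0-r1, lr}: save copies of the current values on the stack
stack.append((registers["r0"], registers["r1"], registers["lr"]))

# ... the routine is now free to overwrite r0 and r1 ...
registers["r0"] = 0xBEEF
registers["r1"] = 0x020370D0

# pop {r0-r1, pc}: restore r0 and r1, and put the saved lr value into pc,
# which makes the CPU carry on at the caller (that is the "return")
registers["r0"], registers["r1"], registers["pc"] = stack.pop()

print(registers["r0"], registers["r1"], hex(registers["pc"]))  # 1 2 0x8001235
```

Note that the pop only puts the registers back. Anything the routine wrote to memory with strh stays written, so the work is not erased.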

Quote:
Originally Posted by Oloolooloo! View Post
Guessing this needs to go at the end of every script. Don't care why, as long as it works.
Wait, what? We've been using .PLAYER_DATA and we haven't even defined it yet? Just...how? What? Next line.
We don't need to define it earlier. We define it at the bottom, because you want the first thing in your routine to be code. Otherwise it's difficult to get a pointer to the code and you'll end up executing data (the CPU can't tell the difference - it's all ones and zeroes). You don't need to define it earlier because the assembler doesn't resolve the names until later. We're just defining a symbol. That is just a mapping of a name to a value.
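
How a name can be used before it is defined is easier to see with a toy two-pass "assembler" in Python. This is a deliberately simplified sketch, not how any real assembler is implemented: every non-label line is assumed to take 4 bytes, whereas real THUMB instructions take 2.

```python
# Toy listing: each line is either a label ("name:") or an item taking 4 bytes.
source = [
    "ldr r0, .PLAYER_DATA",
    "ldr r0, [r0]",
    ".PLAYER_DATA:",
    ".word 0x0300500C",
]

# Pass 1: walk the whole listing and record where every label lands.
symbols = {}
offset = 0
for line in source:
    if line.endswith(":"):
        symbols[line[:-1]] = offset
    else:
        offset += 4  # simplification, see the note above

# Pass 2 (not shown) can now look up names that were used before their definition.
print(symbols[".PLAYER_DATA"])  # 8
```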

Quote:
Originally Posted by Oloolooloo! View Post
We're now modifying .PLAYER_DATA. I'm guessing "assigning" in this case means "adding onto", cause we've already been working with .PLAYER_DATA. I think. Very ambiguous wording.

Did the same thing as we just did with .PLAYER_DATA, only this makes sense since we haven't used .VAR yet. "symbol" is an odd choice of words, I might be seeing a technical definition I don't know yet. Hopefully I guessed right about a byte storing a symbol. Would this make .VAR 1 byte? But then earlier he said variables were 2 bytes. I dunno, next line.

It looks like he's doing math in hexadecimal. Dunno what he's talking about with making it easier to change, though I don't know how you would even go about changing it in the first place. No idea what "temporary variables" are.

I'm completely lost at this point. I dunno if this even relates to the code. Let's just end it here.
Symbol isn't an odd choice of words, it's jargon. I linked it earlier.

Temporary variables are the 0x8000 series. These are used internally all over the place, and you generally don't want to keep data in there permanently as it might be overwritten.

Hopefully I've cleared some stuff up, but I hope I've shown you that you really need to start looking up terminology. It might help if you started trying to learn to program more on the side. Maybe try more Python, then pick up C. It may sound stupid, but it will really help you learn theory. The most important thing you can do is try to use the wealth of knowledge on the internet.
__________________

A Pokemon that is discriminated!
Support squirtle and make it everyone's favourite.
  #4
Old November 10th, 2015 (12:25 PM).
Oloolooloo!
 
Join Date: Feb 2015
Posts: 77
Quote:
Originally Posted by Touched View Post
Sniped to save space
Super duper uber wooper thanks for this. It looks like you took the same amount of time I did taking these notes to help me, and I can't express how much I appreciate that. This not only cleared up a few points of confusion, but also helps me adjust my learning style. I'll take your advice to start looking up jargon and maybe rekindle my programming.
__________________
In your self-improvement efforts, strive for perfection, knowing you'll never make it. But you'll damn surely hit high if you aim high.
- Gene Duncan, U.S. Marine Corps (Retired)
  #5
Old November 10th, 2015 (1:08 PM).
Deokishisu
Mr. Magius
 
Join Date: Feb 2006
Location: If I'm online, it's a safe bet I'm at a computer.
Gender: Male
Nature: Relaxed
Posts: 984
Quote:
Originally Posted by Touched View Post
Cut for brevity. Summary: I'm Touched and I'm pretty awesome, let me use my ASMagix to drop some knowledge on y'all.
As an aside, this cleared up a lot of my misconceptions as well, and I've read your tutorials! You may want to incorporate some of this stuff into the tutorial actually, as it's really gold. Thanks for taking the time to answer Oloolooloo!'s question.
  #6
Old November 10th, 2015 (5:27 PM).
Blah
Free supporter
 
Join Date: Jan 2013
Location: Unknown Island
Gender: Male
Posts: 1,924
Quote:
Originally Posted by Touched View Post
I'll give this a go.

Hopefully I've cleared some stuff up, but I hope I've shown you that you really need to start looking up terminology. It might help if you started trying to learn to program more on the side. Maybe try more Python, then pick up C. It may sound stupid, but it will really help you learn theory. The most important thing you can do is try to use the wealth of knowledge on the internet.
I agree with everything here except the bold parts. I don't think knowing Python will help you learn ASM at all, Python is very high level and there is too much abstraction to even see what's going on in a low level (You rarely, if ever, deal with pointers). Also, it doesn't help with the low level tricks we use either. C on the other hand is more related, as instructions can be converted directly. However, I don't think learning C -> ASM is as good either, the opposite seems better to me. There's nothing wrong learning ASM as a first language, I've helped people learn ASM who haven't had prior programming knowledge. After you compile your first routine, you sorta get going. douevencompilebro

--
@Os and Ls guy

As Touched suggested, I too suggest you try and learn some of the lingo. "What is a pointer?", "What is a table?", "What is hex?", "What is a bit/byte/word/dword (also related hword and word can mean different things depending on context)?".

If you know the answers to these questions, the next step is to realize that ASM is extremely low level, you're manipulating memory addresses directly, rather than through an object interface, or through high level function calls. Read some of the tutorials which are local to PC. I see you've read HackMew's tutorial as your starter tutorial, I don't exactly recommend that because on top of being old, it's actually not even the easiest tutorial here. Have you read some of my tutorials? I see you've read the "How to insert ASM" one, but that is definitely not what you were looking for. That tutorial is just how to insert already written ASM (meant for the leechers at the ASM resource thread). I've written a few more, give those a go.

Don't try to get everything in one sitting, take it as it comes, do little by little, there's no rush. I'm glad you made a thread when you were confused, rather than just giving up like most people. These kinds of threads are rather rare, so it makes me happy to see one :)
__________________
...
  #7
Old November 12th, 2015 (3:23 PM).
Oloolooloo!
 
Join Date: Feb 2015
Posts: 77
Bit of an epilogue to show I'm using your advice. I'm going through Touched's ASM tutorial and looking up computer terms as I go. I'm in chapter 2 now, here's a snippet of my notes.

Spoiler:
Program Counter (PC) = Current line of code, specifically where the current line of code is. Technically, the address of the current line of code.

Subroutine = Lines of code making, for lack of a better vocabulary, a very simple program. Like a mini-program. A program inside another program.

Nested Subroutine = A subroutine inside another subroutine. In other words, a program inside a program inside another program. Wouldn't it be awesome if computers were simple?

Code:
mov r0, #3
mov r1, #1

push {r0}
mov r0, r1
pop {r1}
1. First we set the values of r0 and r1 using mov. (r0 = 3, r1 = 1)
2. Next, we back up the value of r0 on the stack. (the stack now contains 3, r0 contains 3 somewhere else. 3 is "backed up")
3. mov r0, r1 COPIES r1 onto r0. At this point the original value of r0 is lost. This is half the swap. (r0 = 1, r1 = 1, the stack contains 3)
4. Now, we get the original value of r0 back. Instead of restoring it back onto r0 (this would put us back where we started), we pop it onto r1. (the stack is cut n' pasted onto r1. r1 = 3, r0 = 1, stack is empty)
5. The swap is now complete. r1 is now 3 and r0 is now 1.
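A rough Python model of the same swap, assuming the stack behaves like a plain list where push appends and pop removes the last item (the function name `swap_via_stack` is made up for illustration):

```python
def swap_via_stack(r0, r1):
    """Swap two 'registers' using a list as the stack, mirroring the three instructions."""
    stack = []
    stack.append(r0)    # push {r0}  : back up r0 on the stack
    r0 = r1             # mov r0, r1 : copy r1 into r0 (old r0 now lives only on the stack)
    r1 = stack.pop()    # pop {r1}   : take the backed-up value off the stack into r1
    return r0, r1, stack


if __name__ == "__main__":
    print(swap_via_stack(3, 1))
```

The stack comes back empty, which matches the "stack is empty" note in step 4.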

Quote:
Push and pop are very useful, however they can be a bit confusing since a lot of their operation is hidden. Whenever we push, we simply decrement (decrease) the stack pointer by 4, and then store the value of the register at the new pointer.
- This means the pointer to the stack is moving all over the place. Also explains why ASM needs to be inserted on a multiple of 4; try to move the stack off a multiple of 4 and the GBA goes "LOL WUT", panics, and kills itself. The robot revolution will be short lived.

Code:
@ push {r0} equivalent
sub sp, #4 @ SUBtract 4 from the Stack Pointer (SP)
str r0, [sp] @ STore the 32-bit value in Register 0 at the address held in SP

@ pop {r0} equivalent
ldr r0, [sp] @ LoaD the 32-bit value at the address held in SP into Register 0
add sp, #4 @ ADD 4 to the Stack Pointer (SP)
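The same bookkeeping as a small Python model, assuming memory is a dict keyed by address and a starting SP picked purely for illustration (the class name `ToyStack` and the start value are hypothetical). Note the order for pop: the value has to be read at the current SP before 4 is added, otherwise the read would land one slot above the value that was pushed.

```python
class ToyStack:
    """Toy model of a descending stack of 4-byte values."""

    def __init__(self, sp=0x1000):
        self.sp = sp          # stack pointer: address of the most recently pushed value
        self.memory = {}      # address -> 32-bit value

    def push(self, value):
        self.sp -= 4                      # sub sp, #4
        self.memory[self.sp] = value      # str rX, [sp]

    def pop(self):
        value = self.memory[self.sp]      # ldr rX, [sp]
        self.sp += 4                      # add sp, #4
        return value


if __name__ == "__main__":
    stack = ToyStack()
    stack.push(3)
    print(hex(stack.sp), stack.pop(), hex(stack.sp))
```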
Quote:
Link register is used to keep track of return location when calling lines of code. Whenever a BL instruction is encountered, LR (Link Register) is automatically set to PC (Program Counter) + 4. Since BL is 4 bytes long, PC + 4 is the next instruction.
I get how BL being 4 bytes long makes sense; push and pop move the stack 4 bytes. But what exactly is BL?


I'm learning. Thank you.
  #8
Old November 13th, 2015 (1:14 AM).
Touched
Resident ASMAGICIAN
 
Join Date: Jul 2014
Gender: Male
Posts: 625
It's good that you're learning, but let me just correct a few things here.

Quote:
Originally Posted by Oloolooloo! View Post
This means the pointer to the stack is moving all over the place. Also explains why ASM needs to be inserted on a multiple of 4; try to move the stack off a multiple of 4 and the GBA goes "LOL WUT", panics, and kills itself. The robot revolution will be short lived.
The alignment of the stack does not have to do with the alignment of the code.

THUMB code technically only needs to be aligned to an offset that is a multiple of 2 (2 byte alignment/.align 1). The reason most people align it to 4 bytes is because a literal pool needs 4 byte alignment. Rather than confuse noobs with the distinction, we just say all ASM needs that alignment. The reason the literal pool needs that sort of alignment is because of code like:

Code:
.align 2
.thumb

ldr r0, SOME_VALUE
bx lr

@ Here comes the literal pool
.align 2 
SOME_VALUE: .word 0xDEADBEEF
Basically, LDR tells the CPU to load 0xDEADBEEF into r0. But how does it do that in 2 bytes? Well, what actually happens is that this is a PC-relative load. The assembler works out the distance between ldr r0, SOME_VALUE and SOME_VALUE: .word 0xDEADBEEF and encodes it in the instruction, which tells the CPU that it will find the value to place in r0 at PC + that distance. However, for performance and space reasons, "that distance" must be a multiple of 4. This is because the value saved will actually be DISTANCE/4. The CPU will then read the stored distance, multiply it by 4, add it to PC, then go to that location and load the word at that address. PC-relative loads can only load words. The CPU requires word alignment when reading words, so this is why you need 4 byte alignment (a word is 4 bytes on the GBA).

The reason THUMB code needs to be 2 byte aligned is that each opcode is 2 bytes, and when reading halfwords the CPU requires halfword alignment.
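For anyone who thinks better in Python, here is a hedged sketch of the arithmetic an assembler has to do for a THUMB PC-relative LDR. The function name and the addresses in the usage lines are made up for illustration. One detail the explanation above glosses over: in THUMB the CPU sees PC as the instruction's address + 4, rounded down to a multiple of 4, and the sketch uses that as the base.

```python
def encode_thumb_ldr_literal(rd, instr_addr, literal_addr):
    """Return the 16-bit THUMB encoding of `ldr rd, [pc, #distance]`."""
    if not 0 <= rd <= 7:
        raise ValueError("this encoding can only target r0-r7")
    if instr_addr % 2:
        raise ValueError("THUMB instructions must be 2-byte aligned")
    if literal_addr % 4:
        raise ValueError("the literal must be 4-byte aligned")
    # PC as the CPU sees it: 4 bytes ahead of the instruction, rounded down to a multiple of 4.
    base = (instr_addr + 4) & ~3
    distance = literal_addr - base
    if not 0 <= distance <= 1020:
        raise ValueError("the literal must sit 0..1020 bytes after the base")
    # Only DISTANCE/4 is stored, in the low 8 bits of the opcode.
    return 0x4800 | (rd << 8) | (distance // 4)


if __name__ == "__main__":
    # Layout from the listing above: ldr at +0, bx lr at +2, literal at +4 (base address is arbitrary).
    print(hex(encode_thumb_ldr_literal(0, 0x08000000, 0x08000004)))
```

If the literal were not 4-byte aligned, there would be no whole number DISTANCE/4 to store, which is the practical reason for the `.align 2` before the pool.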

Quote:
Originally Posted by Oloolooloo! View Post
I get how BL being 4 bytes long makes sense; push and pop move the stack 4 bytes. But what exactly is BL?
BL is a mnemonic for Branch with Link, which means that it branches to an address (sets PC) and sets the return location (LR) so that the subroutine can return to the opcode after the BL (PC + 4) when it has completed its work. Bear in mind that this being 4 bytes wide has nothing to do with the stack. It is 4 bytes wide because it needs more space to store the address to branch to.

Hopefully I haven't confused you :P
  #9
Old November 14th, 2015 (9:11 AM). Edited November 14th, 2015 by Oloolooloo!.
Oloolooloo!
 
Join Date: Feb 2015
Posts: 77
Quote:
Originally Posted by Touched View Post
Hopefully I haven't confused you
Because I am a horrible person, I'm going to crush your hopes. I'll repeat everything back for clarity.

Quote:
Originally Posted by Touched View Post
The alignment of the stack does not have to do with the alignment of the code.

THUMB code technically only needs to be aligned to an offset that is a multiple of 2 (2 byte alignment/.align 1). The reason most people align it to 4 bytes is because a literal pool needs 4 byte alignment. Rather than confuse noobs with the distinction, we just say all ASM needs that alignment. The reason the literal pool needs that sort of alignment is because of code like...
Some of the terminology goes over my head, but I think I understand. You can align your code to 2 bytes, but if you try and use a literal pool (AKA a place that usually stores addresses before they're loaded onto a register), things get weird. The CPU tries to save space and memory by dividing the address by 4, and if it isn't easily divisible by 4 then bad thingies happen.

Quote:
Originally Posted by Touched View Post
BL is a mnemonic for Branch with Link, which means that it branches to an address (sets PC) and sets the return location (LR) so that the subroutine can return to the opcode after the BL (PC + 4) when it has completed its work. Bear in mind that this being 4 bytes wide has nothing to do with the stack. It is 4 bytes wide because it needs more space to store the address to branch to.
So branch with link means jumping around in the code. You have to
  1. Tell the gameboy where you are currently.
  2. Tell the gameboy where you want to go.
Here's my point of confusion.
Quote:
Originally Posted by Touched's ASM tutorial
When the subroutine is done execution, it returns execution back to the calling routine by setting PC = LR. This is done in a number of ways:
Do you mean this:
Quote:
Originally Posted by Touched's ASM tutorial
When the subroutine is done execution, it returns execution back to the calling routine by setting PC = LR. You can set PC = LR in a number of ways:
Or this:
Quote:
Originally Posted by Touched's ASM tutorial
When the subroutine is done execution, it returns execution back to the calling routine by setting PC = LR. You can return execution back to the calling routine in a number of ways:
Side note: this is my current understanding of how a line of code is read:
  1. The CPU reads Program Counter, aka PC. PC is an address to a line of code.
  2. The CPU then reads the code at PC's address.
  3. PC is then automatically changed so it points to the next line of code.
  4. Repeat.
If you edit PC's address, you start reading a different line of code. That's what BL does. It automatically sets the Link Register to the current address (technically it sets Link Register to current address + 4, but it acts the same as the current address. I'll cross that bridge later). I start getting confused at this point. Could I get an example piece of code showing a Branch with Link and showing what values are at each register during each line of code? Like this:
Code:
mov r0, #3 (r0 = 3) 
mov r1, #1 (r0 = 3, r1 = 1)  

push {r0} (r0 = 3, r1 = 1, stack = 3) 
mov r0, r1 (r0 = 1, r1 = 1, stack = 3)
pop {r1} (r0 = 1, r1 = 3, stack is empty)
P.S. I'm not actually a horrible person. Usually.
  #10
Old November 15th, 2015 (1:08 PM).
Touched
Resident ASMAGICIAN
 
Join Date: Jul 2014
Gender: Male
Posts: 625
Quote:
Originally Posted by Oloolooloo! View Post
Here's my point of confusion.
Do you mean this ... Or this ...
I mean both? Setting PC = LR and returning are mostly equivalent. The only time they're not is when you're returning to ARM code, as that can only be done with a BX in THUMB.

Quote:
Originally Posted by Oloolooloo! View Post
Side note: this is my current understanding of how a line of code is read:
  1. The CPU reads Program Counter, aka PC. PC is an address to a line of code.
  2. The CPU then reads the code at PC's address.
  3. PC is then automatically changed so it points to the next line of code.
  4. Repeat.
Yeah, there is more to it, but that is a simplistic understanding of the process. When you say "line of code" you should rather talk about "instructions" or "opcodes", as there is no concept of source code on the machine level.

Quote:
Originally Posted by Oloolooloo! View Post
If you edit PC's address, you start reading a different line of code. That's what BL does. It automatically sets the Link Register to the current address (technically it sets Link Register to current address + 4, but it acts the same as the current address. I'll cross that bridge later). I start getting confused at this point. Could I get an example piece of code showing a Branch with Link and showing what values are at each register during each line of code? Like this:
Code:
main: @ Pretend PC = 0 here, LR = X (we don't actually care about it, it points to the instruction after the call to the "main" function)
push {lr} @ PC = 0. main is about to use BL, which overwrites LR, so save X on the stack first
bl func_a @ PC = 2. This will set LR = 6, and PC = address of func_a
bl func_b @ Return from func_a, PC is now 6. This will set PC = func_b and LR = 10
pop {pc} @ PC is now 10. LR no longer holds X (the BLs overwrote it), so pop the saved X into PC.
@ After this pop {pc} we will be at whatever called main.

func_a: @ LR can be either 6 (called by main) or func_b+6 (when called by func_b)
bx lr @ Set PC = LR, returning.

func_b:
push {lr} @ Put LR on the stack (10)
bl func_a @ PC is now func_b+2, this will set PC=func_a and LR=func_b+6
pop {pc} @ Pop the saved LR value (10) off the stack into PC, effectively doing PC = 10. You need to push LR onto the stack whenever your code modifies LR, as the BL directly above does.
The reason we do LR = PC+4 is because at a BL, PC is equal to the address of the BL instruction. Since a BL instruction is two opcodes (4 bytes) wide, we do PC+4 to get the address of the instruction directly after the BL (as shown above) and set LR to it, because that is the part of the code you want to go back to. If you just did LR = PC it would continue to call the same function over and over again since it would never get past the BL instruction.
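If it helps to watch the registers change, here is a toy Python trace of a call sequence like the one above. It is a sketch under stated assumptions, not how the hardware really works: the layout (main at address 0, saving LR with push {lr} before its first BL and returning with pop {pc}; func_a at 12; func_b at 14) and the return address X = 1000 are chosen for illustration, BL is 4 bytes and everything else 2, and LR is modeled as simply "address of the BL + 4" without any of the extra details real hardware adds.

```python
# address -> (size in bytes, instruction, branch target or None)
PROGRAM = {
    0:  (2, "push {lr}", None),   # main: save X
    2:  (4, "bl", 12),            # bl func_a
    6:  (4, "bl", 14),            # bl func_b
    10: (2, "pop {pc}", None),    # return from main
    12: (2, "bx lr", None),       # func_a: return
    14: (2, "push {lr}", None),   # func_b: save LR before it gets overwritten
    16: (4, "bl", 12),            # bl func_a
    20: (2, "pop {pc}", None),    # return from func_b
}


def run(program, entry, return_address):
    """Execute until PC reaches return_address; return (address, instruction, new PC, LR) per step."""
    pc, lr, stack, trace = entry, return_address, [], []
    while pc != return_address:
        here = pc
        size, op, target = program[here]
        if op == "bl":
            lr = here + size      # return location: the instruction right after the BL
            pc = target
        elif op == "bx lr":
            pc = lr
        elif op == "push {lr}":
            stack.append(lr)
            pc = here + size
        elif op == "pop {pc}":
            pc = stack.pop()
        trace.append((here, op, pc, lr))
    return trace


if __name__ == "__main__":
    for address, op, new_pc, lr in run(PROGRAM, entry=0, return_address=1000):
        print(f"at {address:>2}: {op:<10} -> PC = {new_pc:>4}, LR = {lr}")
```

Reading the output top to bottom shows LR being overwritten by every BL, which is why any function that itself uses BL has to keep its own return address on the stack.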
All times are GMT -8.