Bits and Bytes.

Full Metal · May 7, 2011

About.
Many new ROM hackers have trouble understanding things such as bits, bytes and offsets. The goal of this thread is to provide a reference and explanation that makes them easier to understand. Here are a few things you need in order to get the most out of this thread:

Basic Algebra
Will to learn and understand
A positive attitude

If you feel something should be added to this thread, don't hesitate to suggest it, and I will do my best to add it to the first post in a reasonable time.
Index:

Number Bases.
Bits.
Bytes.
Shorts.
Words.
Pointers.
Arrays.
Structures.
BitWise Operators.
Logic Operators.
Byte Endianness.

Download this tutorial: Here.


Number Bases.
A number base is the number of distinct symbols a single digit can take. For example, base 10 has 10 possible symbols per digit: 0,1,2,3,4,5,6,7,8,9. Base 2 has 2 possible symbols per digit: 0,1. And so on. The base is usually assumed, but it can be notated with a subscript.
For example: 101 - You would normally assume base 10 ( which is the number system most people grow up with nowadays ).
But if you put a subscript-2 next to it...
101₂ it becomes equivalent to: 5₁₀
Here is a list of common number systems:

Decimal - base 10.
Binary - base 2.
Octal - base 8.
Hexadecimal (or hex) - base 16.

*Hex numbers are usually notated with a '0x', '&h' or '$' prefix, as opposed to using a subscript number to notate the base.
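If you want to check a conversion without doing it by hand, here is a small C sketch. It leans on the standard library's strtoul, which takes the base as its third argument. The name parse_in_base is just something I made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: read the digits in `text` as a number written in `base`. */
unsigned long parse_in_base(const char *text, int base)
{
    assert(base >= 2 && base <= 36); /* the explicit bases strtoul accepts */
    return strtoul(text, NULL, base);
}
```

With this, parse_in_base("101", 2) gives 5 and parse_in_base("101", 10) gives 101. C itself also understands the '0x' prefix for hex and a leading 0 for octal, so 0x65 and 0145 are both just other ways of writing 101.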

Back to Top.


Bits.
A bit is the tiniest possible storage unit in modern computing, and uses the binary number system ( base 2 ). This means that it has two possible values: 0 and 1. Let's say you want to know the highest possible value that can be held in x bits.
y = 2^x - 1
Using that we can figure out that...
2 bits has a maximum value of: 3.
3 bits has a maximum value of: 7.
And so forth.
Now you can adapt that function to fit other number systems.
y = n^x - 1
Where n is the base ( base 2 - binary, base 10 - decimal, base 8 - octal ) and x is the number of digits.
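The formula is easy to try out in code. This is only a sketch: max_value is a made-up name, and it assumes the result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits in `digits` digits of the given `base`: base^digits - 1. */
uint64_t max_value(unsigned base, unsigned digits)
{
    uint64_t power = 1;
    assert(base >= 2); /* base 0 and base 1 are not positional number systems */
    for (unsigned i = 0; i < digits; i++) {
        power *= base; /* the caller has to keep base^digits within 64 bits */
    }
    return power - 1;
}
```

max_value(2, 2) is 3 and max_value(2, 3) is 7, matching the numbers above. max_value(10, 3) is 999, which is exactly what you would expect from three decimal digits.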

Back to Top.


Bytes.
A byte consists of 8 bits, and has a maximum value of 0xFF ( 255 ).
In programming, a byte ( or char, if you must ) is either 'signed' or 'unsigned'. A signed byte sacrifices the most significant bit as a 'negative' flag. The most significant bit is the bit with the highest place-value ( furthest away from the bit with place-value 1 ).

For the sake of simplicity, know that in the rest of this document I will only notate the base of non-decimal numbers, and the most significant bit ( Bit of significance, or BOS ) will be considered to be on the left 'side' of a number ( (left)100001000110(right) ), the same way decimal numbers are written.

The sacrificing of the BOS means that a signed byte only has 7 bits to store the actual number ( y = 2⁷ - 1 = 127 ), which effectively cuts the maximum value in half. On the GBA, as on virtually all modern hardware, signed values are stored in two's complement, so a signed byte ranges from -128 to 127. Unsigned bytes have no such limitation, but they cannot hold negative numbers.
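To make the signed / unsigned difference concrete, here is a sketch that reads the same 8 bits both ways. It assumes two's complement, which is what the GBA uses: the top bit is worth -128 rather than being a plain minus sign. The function name is my own.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret the 8 bits of an unsigned byte as a two's complement signed value. */
int as_signed_byte(uint8_t raw)
{
    /* If the top bit (0x80) is set the value is negative: subtract 2^8 = 256. */
    int value = (raw & 0x80) ? (int)raw - 256 : (int)raw;
    assert(value >= -128 && value <= 127); /* always lands in the signed byte range */
    return value;
}
```

So the byte 0xFF is 255 when treated as unsigned and -1 when treated as signed, 0x80 is 128 or -128, and anything up to 0x7F ( 127 ) is the same either way.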

Back to Top.


Shorts.

It would be good of you to notice that GBA Thumb Instructions (with 1 exception that I am aware of, long branch with link ) are 16 bits in size.

Shorts ( Or half-words ) are 16 bits long ( Max Val: 0xFFFF ). The same information regarding signed-ness applies to shorts, as bytes.

Back to Top.


Words.

It would be good of you to notice that GBA ARM Instructions, and GBA Registers, are 32 bits in size. Also notice that in x86 and Windows terminology a WORD is 16 bits and a DWORD ( double word ) is 32 bits. ARM, which the GBA uses, calls 32 bits a word and 16 bits a half-word instead.

Words are 32 bits long ( Max Val: 0xFFFFFFFF ). The same information regarding signed-ness applies to words, as to bytes and shorts.
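In C the three sizes are usually given short names. The u8 / u16 / u32 typedefs below are an assumption on my part: they are a common convention in GBA homebrew headers, not something the language defines. bits_in is a made-up helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed naming convention, built on the fixed-width types from <stdint.h>. */
typedef uint8_t  u8;   /* byte: 8 bits */
typedef uint16_t u16;  /* short / half-word: 16 bits */
typedef uint32_t u32;  /* word: 32 bits */

/* Number of bits in a type that is `size` bytes big. */
unsigned bits_in(size_t size)
{
    assert(size > 0);
    return (unsigned)(size * CHAR_BIT);
}
```

bits_in(sizeof(u16)) is 16 and bits_in(sizeof(u32)) is 32. Casting -1 to an unsigned type gives its maximum value, so (u16)-1 is 0xFFFF and (u32)-1 is 0xFFFFFFFF.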

Back to Top.


Pointers.

I borrowed the house-address metaphor from C++ for Dummies, 5th edition.

In the world of programming, there exists a thing known as 'variables'. Variables are a programmer's way of storing and holding data. As a programmer, you need more than 16 variables, which means you can't just keep all of your variables in registers. Instead, the variables are stored in memory ( usually the RAM ).

Now that you have the variables in memory, what next? If you want to work with them, you have to know where each variable is in memory.

Think of a city. A city has many houses, apartments, etc. A city also has a mail-man. Mail-men carry letters that belong to houses, and each letter carries the address of the house it belongs to. Think of the city as your memory, containing all the houses and apartments ( variables ). The mail carries the address ( pointer ) of a house ( variable ) so that the mail-man ( processor ) can get to the house ( variable ) and deliver the mail ( use the data ).

In ROM-hacking, an address is commonly referred to as an offset ( the two are equivalent in actuality, but some people hesitate to make the connection ).
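Here is the mail-man metaphor as a few lines of C. The function and variable names are mine and only there to match the story.

```c
#include <assert.h>
#include <stddef.h>

/* The mail-man: follow the address and drop the mail off at that house. */
void deliver(int *house, int mail)
{
    assert(house != NULL); /* an address that points nowhere can't be delivered to */
    *house = mail;         /* '*' means "the variable at this address" */
}
```

If you declare `int house = 0;` and call `deliver(&house, 42);`, the variable house holds 42 afterwards. The `&` takes the address of the variable, and that address is the pointer.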

Back to Top.


Arrays.

A c-style string is an array of chars ( bytes ), and the end of the string is notated by a null-byte ( 0 )

Back to our City metaphor. Houses aren't just randomly dispersed in the city ( usually ). They have neighborhoods. The houses stand in a nice row, evenly spaced out and identical on the outside, but what is inside each house can vary. Think of an array as a neighborhood. It contains many houses ( variables ), and each variable can hold its own value.
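A quick sketch of walking down such a neighborhood: counting the characters of a c-style string by stepping from house to house until the null-byte. This is essentially what the standard strlen does. string_length is my own name for it.

```c
#include <assert.h>
#include <stddef.h>

/* Walk down the street until the null byte, counting houses on the way. */
size_t string_length(const char *text)
{
    size_t count = 0;
    assert(text != NULL);
    while (text[count] != 0) { /* text[count] is the house `count` doors down */
        count++;
    }
    return count;
}
```

string_length("ROM") is 3. The array itself is 4 bytes long, because the null-byte at the end takes up a house too.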

Back to Top.


Structures.

You'll notice that I explain things using C and C++ terms quite often. I do apologize to those who do not program in those languages, but try to bear with me.

In many processors and architectures, registers are 32 bits. The processor can only work with registers. So what happens if your variable is larger than 32 bits? What happens is a struct.

Consider this: a file header has a File-Signature ( provides information about the file type, e.g. what version, ensures the correct file type, etc ) followed by a WORD ( 32 bit integer ). We'll assume that the signature is also 32 bits. 32+32 = 64. This means that our FileHeader variable can not fit inside a register.

So what do we do? We take the variable's pointer and use pointer arithmetic. The first part of our variable ( signature ) is 32 bits ( 4 bytes ). So we add 4 to our pointer, and it now points to the WORD contained in the header, which is what we wanted. You can now work with that WORD.
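The same FileHeader, sketched in C. The struct layout is the hypothetical one from the paragraph above ( two 32-bit fields ), and read_value_by_offset does by hand what the compiler normally does for you when you write header->value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout: two 32-bit fields back to back. */
struct FileHeader {
    uint32_t signature; /* bytes 0..3 */
    uint32_t value;     /* bytes 4..7, the WORD we are after */
};

/* Start at the header's address, step 4 bytes forward, read 32 bits. */
uint32_t read_value_by_offset(const struct FileHeader *header)
{
    const unsigned char *base = (const unsigned char *)header;
    uint32_t result;
    assert(header != NULL);
    memcpy(&result, base + offsetof(struct FileHeader, value), sizeof result);
    return result;
}
```

offsetof(struct FileHeader, value) is 4 here, the "add 4 to our pointer" step. The memcpy is just a safe way to read 4 bytes from a byte address in C.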

Back to Top.


BitWise Operators.

A byte is 8 bits, and has a maximum value of 0xFF. A little shortcut for BitWise operations: a byte is two hexadecimal digits, and 4 bits belong to each digit. EG: 1111₂ is equal to 0x0F. 1111 1111₂ is equal to 0xFF. So if you learn how to count to 0xF in binary, you should be good to go, and doing BitWise operations, as well as converting between number bases, inside your head should be a breeze. :)

Bit wise operators are just that. They do things to bits: move bits, reverse bits, set bits, unset bits, etc. Bit shifting does not apply to a single bit. Instead, it applies to a group of bits ( Bytes, Shorts, Words, etc ). To BitShift (BS) a unit, you need to know two things:
The amount of bits to shift, and the direction of the shift.
If you bit shift towards the BOS ( left ), the numerical value of the unit will increase. The opposite is also true. Bits shifted past either end of the unit are lost.
BS-ing to the left: X << N =(exact, as long as no set bits fall off the end) X * 2^N
BS-ing to the right: X >> N =(rounded down, for unsigned values) X / 2^N
AND operator:
AND-ing involves two corresponding bits of two units. IF both of the bits are set ( == 1 ), then the resulting bit is also set ( X = A AND B; X = result, A = Unit 1, B = Unit 2 ). Otherwise, the resulting bit is 0. For a single pair of bits this works out the same as multiplication ( A * B ). In programming ( save for ASM ), the AND operator is represented with the '&' character.
OR operator:
OR-ing, also uses two corresponding bits of two units. IF either bit A, OR bit B is set, then the resulting bit is also set. The only way to get 0 from this operator, is for both bits to be 0. OR-ing is represented with the pipe ( '|' ) character.
XOR ( eXclusive OR operator )
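XOR is a bit more complicated than the previous operators. If exactly one of the two bits is 1, the result is 1. If BOTH bits are 1, or BOTH bits are 0, the result is 0. So 1 XOR 1 = 0, 1 XOR 0 = 1 and 0 XOR 0 = 0. For a single pair of bits this is the same as adding them and keeping only the lowest bit ( addition modulo 2 ). XOR also undoes itself: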
X XOR Y = C;
C XOR X = Y;
Y XOR C = X;
XOR-ing is represented with a '^' character.
NOT operator:
NOT-ing a bit is simply reversing it. EG if a bit is set, it becomes unset. If a bit is not set, it becomes set. Typically applied to whole units, but is applicable to a single bit. In C and C++ the bitwise NOT is represented by a tilde ( '~' [ a C++ destructor reference ] ). The exclamation point ( '!' ) is the logical NOT instead: it treats the whole value as true or false and gives 1 if the value is 0, and 0 otherwise.
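Putting the operators to work, here is a sketch of the four things you end up doing all the time when poking at flags: setting, clearing, flipping and testing a single bit of a byte. The function names are mine. Bit 0 is the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Set bit `bit` of `value`: OR with a mask that has only that bit set. */
uint8_t set_bit(uint8_t value, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(value | (1u << bit));
}

/* Clear bit `bit`: AND with the NOT of the mask, so every other bit survives. */
uint8_t clear_bit(uint8_t value, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(value & ~(1u << bit));
}

/* Flip bit `bit`: XOR with the mask. */
uint8_t toggle_bit(uint8_t value, unsigned bit)
{
    assert(bit < 8);
    return (uint8_t)(value ^ (1u << bit));
}

/* Read bit `bit`: shift it down to position 0 and AND away everything else. */
int test_bit(uint8_t value, unsigned bit)
{
    assert(bit < 8);
    return (int)((value >> bit) & 1u);
}
```

For example set_bit(0x00, 3) is 0x08, clear_bit(0xFF, 0) is 0xFE, toggle_bit(0x0F, 0) is 0x0E and test_bit(0x80, 7) is 1. The plain operators behave as described above: 3 << 2 is 12, 13 >> 2 is 3, 0xF0 & 0x3C is 0x30, 0xF0 | 0x3C is 0xFC and 0xF0 ^ 0x3C is 0xCC.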

Back to Top.


Logic Operators.

In C++, you signify a destructor with a tilde ( '~' ) followed by the corresponding class name. So in a sense, you're saying NOT X. EG: make X NOT exist. Very clever C++. Very clever.

Without logic, computers would be redundant, at best ( see what I did there? )
Fortunately for us, computer logic is easy to understand. There are a few basic operators you need to know.
X == Y - returns true if X = Y
X <= Y - returns true if X is less than, or equal to Y
X >= Y - returns true if X is greater than, or equal to Y
X < Y - returns true if X is less than Y
X > Y - returns true if X is greater than Y
X != Y - returns true if X is NOT equal to Y
X - returns true if X is NOT 0
!X - returns true if X IS 0.
Take the return value, and IF it is TRUE, then do this. In thumb-ASM, this is what that would look like:

cmp rn,ry @ compares register N with register Y and sets the appropriate processor flags ( look them up in GBATEK )

beq label @ if ( rn == ry ) goto label
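For comparison, here is the same idea in C. Each `if` below typically ends up as a cmp followed by a conditional branch once compiled, though exactly what the compiler emits is up to the compiler. clamp and is_zero are example names of my own.

```c
#include <assert.h>

/* Keep x inside [low, high] using nothing but comparisons and IFs. */
int clamp(int x, int low, int high)
{
    assert(low <= high); /* a range that is upside down makes no sense */
    if (x < low) {       /* roughly: cmp x, low  then branch if less than */
        return low;
    }
    if (x > high) {      /* roughly: cmp x, high then branch if greater than */
        return high;
    }
    return x;
}

/* Non-zero counts as true, zero as false, so !x is 1 only when x is 0. */
int is_zero(int x)
{
    return !x;
}
```

clamp(5, 0, 10) gives 5, clamp(-3, 0, 10) gives 0 and clamp(42, 0, 10) gives 10. is_zero(0) is 1 and is_zero(7) is 0.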

Back to Top.


Byte Endianness.

This is what pointers look like. :) If you have a pointer to address 0xABCDEF, the value in the hex-editor is 0xEFCDAB. HOWEVER, for the most part the pointers that ROM hackers deal with are pointers into the ROM area, which on the GBA is either 0x08NNNNNN or 0x09NNNNNN. SO, when you see a pointer in a hex editor with 08 or 09 as its last byte, that's what that means. A pointer to the ROM area 0xABCDEF looks like 0xEFCDAB08 in a hex editor.

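Byte endianness refers to the order in which the bytes of a multi-byte value ( a short, WORD or DWORD ) are stored in memory. You write numbers like so: 1234, with the most significant digit first. Storing the bytes in that order is known as "Big Endian". In Big Endian, the byte holding the BOS ( of the DWORD itself ) comes first, all the way on the left. eg:
10101010 10101010 10101010 10100101
The alternative is "Little Endian", where the bytes are in the opposite order. The GBA is Little Endian.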
The best way for me to explain this is by example.
Big Endian: 0x(12 34 56 78)
Little Endian: 0x(78 56 34 12)
Those who know, correct me if I'm wrong, but I believe this is the reasoning behind this madness.
This seems a little pointless ( and with modern technology, it kind of is ), but in the past processors were slow and the difference between processing 1 byte and 2 bytes may have been significant. Say you have a word and you only want the low 16 bits of it ( 0x12345678 is what you have, you want 0x5678 ). What you would do is:
u32 value = 0x12345678;
u16* pVal = (u16*)&value; //u16* is syntax to define a pointer of u16 type; the cast is needed because &value is a u32*.
If you look at *pVal ( what pVal points to ) on a Little Endian machine, you will get 0x5678. Why? Because you took the address of a u32 ( 0x12345678 ) and it is stored in memory as the bytes 78 56 34 12. The pointer points at the first of those bytes, so reading a short picks up 78 56, which ( the 16 bits are also "flipped" ) is 0x5678. The same address works for the byte, the short and the word, with no extra arithmetic. On a Big Endian machine the same read would give 0x1234.
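To tie this back to hex editing, here is a sketch that builds a value from bytes in Little Endian order and turns a GBA ROM pointer into a file offset. The helper names are made up. The only hardware fact it relies on is that the ROM area starts at 0x08000000, as mentioned in the note above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GBA_ROM_BASE 0x08000000u /* start of the ROM area in the GBA memory map */

/* Assemble a 32-bit value from 4 bytes stored least significant byte first. */
uint32_t load_le32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Split a 32-bit value into 4 bytes, least significant byte first. */
void store_le32(uint8_t *bytes, uint32_t value)
{
    assert(bytes != NULL);
    bytes[0] = (uint8_t)(value & 0xFF);
    bytes[1] = (uint8_t)((value >> 8) & 0xFF);
    bytes[2] = (uint8_t)((value >> 16) & 0xFF);
    bytes[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Hypothetical helper: turn a ROM pointer (0x08NNNNNN) into a file offset. */
uint32_t pointer_to_offset(uint32_t pointer)
{
    assert(pointer >= GBA_ROM_BASE);
    return pointer - GBA_ROM_BASE;
}
```

Feeding the bytes EF CD AB 08 to load_le32 gives 0x08ABCDEF, and pointer_to_offset of that is 0xABCDEF, the place to jump to in your hex editor. Going the other way, store_le32 with 0x12345678 writes the bytes 78 56 34 12. Because these functions work byte by byte, they give the same answer no matter what endianness the machine running them has.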

Back to Top.

miksy91 · May 8, 2011

"In the world of programming, there exists a thing known as 'variables'. Variables are a programmers way of storing and holding data. As a programmer, you need more than 16 variables, which means you can't just put your variables in your registers. Instead, the variables are stored into memory ( Usually the RAM )."

Just quoting to this that game doesn't use all of its ROM data all the time.
Instead, to make the game run smoothly, by ASM the game is told to load several parts of the ROM data to the game's RAM where all the data that's going on is stored.

The processor of GameBoy isn't for example capable of running data of 0x200000 bytes (the size of G/S/C) at the same time. Instead, only 0x10000 bytes are used and their values can vary.

esperance · May 8, 2011

This is a very useful resource! The best part is that these pertain not just to GBA stuff, but also to programming as a whole. :)

Full Metal · May 8, 2011

miksy91 said:
"In the world of programming, there exists a thing known as 'variables'. Variables are a programmers way of storing and holding data. As a programmer, you need more than 16 variables, which means you can't just put your variables in your registers. Instead, the variables are stored into memory ( Usually the RAM )."

Just quoting to this that game doesn't use all of its ROM data all the time.
Instead, to make the game run smoothly, by ASM the game is told to load several parts of the ROM data to the game's RAM where all the data that's going on is stored.

The processor of GameBoy isn't for example capable of running data of 0x200000 bytes (the size of G/S/C) at the same time. Instead, only 0x10000 bytes are used and their values can vary.

Sorry, but I'm not quite sure what you're trying to say... ( I get what you're saying, about not being able to process 0x200000 bytes at the same time and all ), could you possibly re-phrase this?

agentgeo said:
This is a very useful resource! The best part is that these pertain not just to GBA stuff, but also to programming as a whole. :)

I'm glad you found it useful, and as always, am completely open to suggestions.
*edit* added bit-wise operators, and logical operators. :) Bon appétit~!

Darthatron · May 8, 2011

There was nothing about endianness and the note thingo in the Logic Operators section is totally irrelevant.

And the CSS makes it a bit hard to read. Make the text black or something.

It also seems a bit chunky. Try breaking up the paragraphs a bit more.

Full Metal · May 9, 2011

* Sorry, I couldn't think of anything else to put in there, I'll try to think harder today at school. X)
* CSS - fixed the text
* Endianness - I'll also work on this ( in my head ) at school today. :)
* I'll try to keep the next sections smaller and less 'chunky' :) ( I'll de-chunk previous sections this weekend )

Paupir · May 9, 2011

Hey there Full Metal, this really helped me wrap my head around a few things; much appreciated!

I just wanted to clarify one thing though, so that I'm sure. There is a difference in the way a pointer and an offset is formatted, but it's the same "address information" in both?
Could you explain the formatting difference and perhaps state where and/or when either is more useful (if at all)? Thank you kindly :)

Binary · May 9, 2011

I've only skimmed the tutorial. Will read it once I find more time. Quite a bounty of information though. Yeah, more appropriate paragraphs would've made it easier to read, and perhaps some CODE boxes?

Paupir said:
I just wanted to clarify one thing though, so that I'm sure. There is a difference in the way a pointer and an offset is formatted, but it's the same "address information" in both?
Could you explain the formatting difference and perhaps state where and/or when either is more useful (if at all)? Thank you kindly :)

On the basis of what I've understood, an offset and pointer are equivalent, i.e. they both contain the address, of where the mail must be sent - with reference to the context. The pointer contains the address of the house (variables; a set of commands, I'm assuming). An offset is the same thing, only most ROM hackers use the word offset to determine the address after the pointer has been processed (compiled, in the case of scripting). Correct me if I'm wrong, Full Metal.

Full Metal · May 9, 2011

This is probably something I should have explained.
This is known as byte endianness, which I forgot to mention.
I will update the tutorial with this info this weekend. :)

Paupir · May 10, 2011

^ Cool, I look forward to it!

Edit:

Thanks for your explanation by the way, Binary- it helped me understand a little more :P

@Full Metal: It would also be great if you could explain some things about the Register, the RAM and any other kind of memory of significance. I was doing some reading, and it said somewhere that the Register contains addresses to specific data stored in the memory- which fits with your explanations.
However, I have some questions in the context of rom hacking:

1.) Is the actual processing code stored and executed in the Register? And when a variable is needed, a Pointer is used to obtain the information needed from the memory (RAM)?

2.) Does this mean that the memory (RAM) only contains tables of information? If so, could you explain the basic structure of a table?

3.) Back on the topic of pointers: I know you might be getting around to explaining this, but I just wanted to state my ignorance so that you're aware anyway. I was wondering just how much data the pointer takes in one go- and how does it know how much to take? If I'm misunderstanding something basic, forgive me.

Thanks :)

FoggyDoggy · May 10, 2011

Very informative guide for those confused.
I read through it and noticed a hiccup...

Code:

Words are 32 bits long ( Max Val: 0xFFFFFFFF ). The same information regarding signed-ness applies to [B]shorts[/B], as bytes and shorts.

Mean words?

Great though, I wish this was around when I was having problems learning this stuff.

Full Metal · May 10, 2011

Paupir said:
^ Cool, I look forward to it!

Edit:

Thanks for your explanation by the way, Binary- it helped me understand a little more :P

@Full Metal: It would also be great if you could explain some things about the Register, the RAM and any other kind of memory of significance. I was doing some reading, and it said somewhere that the Register contains addresses to specific data stored in the memory- which fits with your explanations.
However, I have some questions in the context of rom hacking:

1.) Is the actual processing code stored and executed in the Register? And when a variable is needed, a Pointer is used to obtain the information needed from the memory (RAM)?

2.) Does this mean that the memory (RAM) only contains tables of information? If so, could you explain the basic structure of a table?

3.) Back on the topic of pointers: I know you might be getting around to explaining this, but I just wanted to state my ignorance so that you're aware anyway. I was wondering just how much data the pointer takes in one go- and how does it know how much to take? If I'm misunderstanding something basic, forgive me.

Thanks :)

1. No, the THUMB / ARM code is never loaded into a register. HOWEVER, PC ( Program Counter ) is a register that tells the processor where the next instruction is. An instruction such as branch-with-link ( bl in ARM and THUMB ASM ) jumps somewhere else and stores the address of the instruction right after itself into lr ( Link Register ). What does this mean? It means that you can jump from one area of code to another, and then teleport back again, without having to have all the possible return locations stored in DWORDS. Example:

Function_A:
...
ldr rn, OFFSET
branch-with-link rn
eor rn,rn @ Nifty trick to set rn to 0 ( eor is the THUMB mnemonic for XOR ), same result as mov rn,#0.
...
function_B: @ This is located at 0xOFFSET
push { lr } @ store it for later
...
pop { pc } @ PC now equals the address of the instruction after your branch-with-link, which is eor rn,rn

2. No, memory is not only filled with tables. Tables are generally used for large numbers of instances of the same kind of thing ( 123 ABC street is an instance of an address, for example ). In Pokemon games specifically, this could be items, pokemon stats, songs, maps, etc. From what I know, there isn't really a 'standard' for tables. Although, I do use 'tables' of sorts in my C++ programming for dynamic memory. I have an array of pointers, and the last pointer is always NULL ( = 0x00000000 ). So basically:
POINTER POINTER POINTER NULL
By using a loop, I know that I have 3 elements to use. ~~( I know it's inefficient, but memory management is a hassle, and what little performance strain there is, it's worth it. )~~

3. I'm not entirely sure I'm reading the question correctly, but here goes.
A pointer doesn't know anything. When code is compiled/assembled/etc, the compiler writes specific instructions telling the processor the size of an element. So that when you call to advance your pointer one element in your code, the code says how much to advance it ( the pointer ).
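Both of the last two answers can be shown in a few lines of C. This is only a sketch with names I made up: count_until_null loops over a pointer table that ends in NULL, and the two stride functions measure how many bytes "advance one element" really moves a pointer of each type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the entries of a pointer table that ends with a NULL pointer. */
size_t count_until_null(const char *const *table)
{
    size_t count = 0;
    assert(table != NULL);
    while (table[count] != NULL) {
        count++;
    }
    return count;
}

/* How many bytes does "advance one element" move a 16-bit pointer? */
size_t stride_of_u16(const uint16_t *p)
{
    return (size_t)((const char *)(p + 1) - (const char *)p);
}

/* Same question for a 32-bit pointer. */
size_t stride_of_u32(const uint32_t *p)
{
    return (size_t)((const char *)(p + 1) - (const char *)p);
}
```

A table of POINTER POINTER POINTER NULL counts as 3. The source code says p + 1 in both stride functions, but the compiler turns that into +2 bytes for the 16-bit pointer and +4 bytes for the 32-bit one, because the element size is part of the pointer's type.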

0m3GA ARS3NAL said:
Very informative guide for those confused.
I read through it and noticed a hiccup...

Code:

Words are 32 bits long ( Max Val: 0xFFFFFFFF ). The same information regarding signed-ness applies to [B]shorts[/B], as bytes and shorts.

Mean words?

Great though, I wish this was around when I was having problems learning this stuff.

You and me both. haha.
Yes, actually. Haha, you have to understand, my mind works in C and C++ ( more or less ). Shorts == Words in C++ ( 16 bits ) and I also had a blonde moment where I completely forgot about DWORDS. :P Again, look forward to an update this coming weekend.

ShadowMrk · May 10, 2011

Very nice tutorial. I'm not well known ,so people will probably disregard my post. Nonetheless, this is a very professional looking tutorial ,and although it didn't help me (because I already know this stuff XD) I'm sure the people that have the dedication and time will benefit greatly from it. Keep up the good work! :D

Full Metal · May 13, 2011

I have updated the tutorial, and added info on Byte Endianness. Enjoy~ ( and also, I'm not entirely certain I got this correct, so those who know it well, please feel free to correct it, as this area always confuses me a little bit every time X) )

Darthatron · May 14, 2011

Full Metal said:
I have updated the tutorial, and added info on Byte Endianness. Enjoy~ ( and also, I'm not entirely certain I got this correct, so those who know it well, please feel free to correct it, as this area always confuses me a little bit every time X) )

Just in the little note thing on endianess: "A pointer to the ROM area 0xABCDEF looks like 0xEFCDAB08 in a hex editor." Fix is in bold.

And yeah, that all seems right. Though admittedly, I just skimmed across it. :) Good work.

Full Metal · May 14, 2011

^ fixed the pointer bit as suggested

Also, a download of this tutorial is now available: here.
To those reading this, I'm considering doing one of these on the ins and outs of ASM ( and for those of you who know I'm not the best at that: writing tutorials is how I learn, and it lets others profit as well ), but I would like some ( practical ) suggestions of things I could apply the knowledge to. IE: I need examples related to ( pokemon 3rd gen ) Rom Hacking. VM me those suggestions :)

Johto_legend · May 14, 2011

hey there! this is a very nice tutorial, helped me clear up some things ive been trying to learn. anyways i have a suggestion on what to add. this may be only me but, maybe u can add a part where is shows how to use a hex editor, because even with this knowledge, im not sure how to use it if i dont understand how to use a hex editor.=/

Full Metal · May 15, 2011

You mean as far as actually changing the bytes?
That much is pretty simple, but I guess I can add that.
Just gimme a few minutes, a lot happened while I was gone over saturday. o.O

FoggyDoggy · May 15, 2011

Johto_legend said:
hey there! this is a very nice tutorial, helped me clear up some things ive been trying to learn. anyways i have a suggestion on what to add. this may be only me but, maybe u can add a part where is shows how to use a hex editor, because even with this knowledge, im not sure how to use it if i dont understand how to use a hex editor.=/

Open Hex Editor, Open file, click a byte, change it, voilà, you've made a change to the ROM, though it won't help you if you're changing random bytes, you've gotta know what it is you are changing, and that is a different story.

Full Metal · May 15, 2011

^ ha. ha. ha. X)
Maybe I'll just quote you on that.
( ? )
*edit* but in all seriousness, if you don't know how to use your Hex Editor, look up your Editor's documentation and support area. It's there for a reason. :\
