
The Nuts & Bolts of ROM Hacking

Alexander Nicholi

what do you know about computing?
5,500
Posts
14
Years
  • So. You've sat down, looked at a ROM hack or two and decided to try it out for yourself. You want to ROM hack. You see countless tools and tutorials and resources on tPC, but are they really what you're looking for? Friend, there is so much more to ROM hacking than programs and tools, and even how-tos!

    This tutorial aims at covering that general purpose nitty-gritty knowledge some of you walk in lacking. Questions like, "what is a pointer?" "Can someone explain endianness?" and "How do I find things not already documented?" will be answered here.

    It's dangerous to go alone, so be sure to take these before you start:
    1. A hex editor. HxD is a good choice, but if you want table file support you'll have to look at Hexecute.
    2. Some general tools to start off and get your feet wet with: Advance Map (get both 1.92 and 1.95), XSE, and Advance Trainer. The forums here wouldn't hurt, either. :)
    3. If you want to hack more easily, go with either FireRed or Emerald. Here's why.

    What is "hexadecimal"?

    Hexadecimal is a counting base for numbers. What you call decimal is base 10, while hexadecimal is base 16. Octal is base 8, but we won't be using that anyway (it's rarely seen outside of UNIX file permissions). Hexadecimal, as I'm sure you could tell, uses 0 through 9 and also A through F as digits, with A through F standing for the values 10 through 15.

    If you want to make conversions between binary, octal, decimal, and hexadecimal, Windows Calculator is your friend. While the NT 6.1+ versions have a new UI separating the scientific, programmer, and statistics modes, versions as far back as Windows 3.1 support base conversions, so you shouldn't have to worry. Just enter the numbers and switch modes, and it'll convert :)
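
    If you'd rather not open Calculator, the same conversions can be sketched in a few lines of Python. This is only an illustration: Python is my pick of language here, and the helper name to_bases is made up for the example.

```python
def to_bases(n: int) -> dict:
    """Return n written out in binary, octal, decimal, and hexadecimal."""
    return {
        "binary": format(n, "b"),
        "octal": format(n, "o"),
        "decimal": format(n, "d"),
        "hexadecimal": format(n, "X"),
    }

# Going from a number to its digits in each base
print(to_bases(255))

# Going the other way: int() takes the digits plus the base they are written in
print(int("FF", 16))       # 255
print(int("11111111", 2))  # 255
```

    Every entry in the result is the same value, just written with a different set of digits.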

    Another important thing to remember is how hex is written, to avoid confusion with decimals. Hexadecimal numbers are usually prefixed with 0x or sometimes &H or $ (depending on purpose) to show they are in fact specifically hexadecimal. It's commonplace for the first two to be used for numbers and $ for offsets, but it doesn't matter all that much.
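
    To make the prefixes concrete, here is a rough Python sketch that accepts any of the three notations and gives back the plain number. The function parse_hex is a hypothetical helper for this example, not part of any ROM hacking tool.

```python
def parse_hex(text: str) -> int:
    """Parse a hexadecimal string written with a 0x, &H, or $ prefix."""
    cleaned = text.strip()
    for prefix in ("0x", "0X", "&H", "&h", "$"):
        if cleaned.startswith(prefix):
            # Drop the prefix and read the remaining digits as base 16
            return int(cleaned[len(prefix):], 16)
    raise ValueError(f"no recognised hexadecimal prefix in {text!r}")

print(parse_hex("0x1F"))    # 31
print(parse_hex("&H1F"))    # 31
print(parse_hex("$30AAC"))  # 199340
```

    Whichever prefix is used, the number underneath is the same; the prefix only tells the reader which base the digits are in.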

    What are "bits," "bytes," and "words"?

    A bit is the smallest possible unit of information (there's some study suggesting otherwise, but that's no matter here), stored physically in some medium. That medium could be a hard drive platter, a NAND module in an SSD, a volatile circuit in an SDRAM module... or any number of other things. It means only one of two things, in the most abstract sense: on, or off.

    A byte is a larger unit of information, consisting of 8 bits strung together little endian (which we'll discuss later). It is most common to look at bytes in hexadecimal, as one byte completely and exactly fills up two hexadecimal digits. Words are groups of bytes, whose size usually matches the width of the CPU's registers (in this case, with the GBA's 32-bit ARM CPU, 32 bits, or 4 bytes). Working with bytes and words makes programming a lot more human.
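
    A quick Python sketch of those sizes, assuming the 8-bit byte and 32-bit GBA word described above. The names BITS_PER_BYTE, BYTES_PER_WORD, and byte_to_hex are just for illustration.

```python
BITS_PER_BYTE = 8
BYTES_PER_WORD = 4  # a 32-bit word, as on the GBA

def byte_to_hex(value: int) -> str:
    """Write one byte (0 to 255) as exactly two hexadecimal digits."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a byte only holds values from 0 to 255")
    return format(value, "02X")

print(byte_to_hex(0))    # 00
print(byte_to_hex(255))  # FF

# The largest value a 4-byte word can hold fills exactly eight hex digits
largest_word = 2 ** (BITS_PER_BYTE * BYTES_PER_WORD) - 1
print(format(largest_word, "08X"))  # FFFFFFFF
```

    Two hex digits per byte is why a hex editor shows the ROM as neat pairs of digits.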

    Thus far we've only really covered a bit of number theory to do with programming. While that's all great and wonderful, we obviously need some application of these numbers, right? Every program has a flow to it - a logical "circuit". It's impossible to include everything you want to do in a program inline in that circuit, correct? This is where pointers come in.

    What is a "pointer"? And more

    A pointer is a short block of data that (quite literally) points to data stored somewhere else: it holds the address of that data. While modern PCs usually use 64-bit addresses, older PCs and our own GBA use 32-bit addresses, so a GBA pointer is 4 bytes long. The GBA also uses memory-mapped I/O, which essentially means all of the console's hardware is reached through addresses in the same address space as RAM and ROM. Still, the GBA has its memory mapped in a very peculiar way, with different parts of the hardware in different "banks", so we'll see.

    It's important you understand the difference between a pointer and an offset. An offset is simply a location represented as the number of bytes from the start of the ROM; it is commonly formatted like $30AAC or $114C9D8. The GBA handles pointers in a completely different manner than what we see in a hex editor.

    The offset $ACC864 as a pointer for the GBA would be 0x08ACC864. The 08 at the beginning tells the CPU that we're reading from the first 16MiB of the ROM. Say you used 09 and you'd be addressing the second 16MiB, with an extended ROM. However, going to 0A or beyond won't get you any more ROM, as the ROM's maximum readable capacity is 32MiB, just so you know.
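
    In other words, turning an offset into a pointer is just adding 0x08000000 to it. A minimal Python sketch, where ROM_BASE, MAX_ROM_SIZE, and the two function names are labels I made up for the example:

```python
ROM_BASE = 0x08000000            # where the ROM starts in the GBA's address space
MAX_ROM_SIZE = 32 * 1024 * 1024  # 32MiB, the most ROM the GBA can read

def offset_to_pointer(offset: int) -> int:
    """Turn a ROM file offset into the address the GBA uses for it."""
    if not 0 <= offset < MAX_ROM_SIZE:
        raise ValueError("offset is outside the 32MiB ROM area")
    return ROM_BASE + offset

def pointer_to_offset(pointer: int) -> int:
    """Turn a GBA ROM address back into a ROM file offset."""
    offset = pointer - ROM_BASE
    if not 0 <= offset < MAX_ROM_SIZE:
        raise ValueError("pointer does not point into the ROM area")
    return offset

print(hex(offset_to_pointer(0xACC864)))   # 0x8acc864
print(hex(offset_to_pointer(0x1000000)))  # 0x9000000, the start of the second 16MiB
print(hex(pointer_to_offset(0x08ACC864))) # 0xacc864
```

    Note that Python drops the leading zero when printing, so 0x08ACC864 shows up as 0x8acc864; it is the same number.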

    But the CPU isn't going to read 0x08ACC864 like that, no no!

    "Big Endian" and "Little Endian"

    We humans usually read numbers in what's termed big endian, meaning the biggest numbers are read first. However, computers don't exactly work like that, and here's why. Physically speaking, the way computers feed binary data is a lot like vertical text to us, so it's neither left-to-right nor right-to-left like we normally read. At the most minute level, bits (pieces of information representing either true or false) are fed into the CPU with the smallest numbers first, opposite of us. And when you think about it, our brains actually work in this little endian fashion anyway, simply converting the numbers we see in our heads into the language it understands.

    Let's take it back to reality for a moment. Say you had a number, $C05022. To the computer, the bits making up each byte are already in little endian, though you probably won't see it. To the processor, it would be natural for the bytes to be in little endian as well, right? How would it make sense for the bits to be little endian and the bytes be big endian, just for our eyes?

    If we put that offset into a GBA ROM as a pointer, it would be 08 C0 50 22. But the processor reads in little endian, so we have to convert the numbers into little endian for the CPU to read them, making it 22 50 C0 08. I know that may look a bit confusing – something doesn't look right. Since we read in big endian the hex editor converts each byte from the computer's native little endian into the big endian that makes sense in our heads. But the hex editor isn't aware of what we're doing beyond that so it can only do that conversion abstractly, see? To the computer both the bits and bytes are now little endian, we just only see the bytes in little endian because we're reading the numbers as humans.
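
    Here is that byte swap as a small Python sketch, using the standard struct module, where "<I" means a little endian unsigned 32-bit value. The two function names are hypothetical helpers for this example.

```python
import struct

def pointer_to_rom_bytes(pointer: int) -> bytes:
    """Pack a 32-bit pointer into the four bytes as they sit in the ROM."""
    return struct.pack("<I", pointer)

def rom_bytes_to_pointer(data: bytes) -> int:
    """Read four bytes from the ROM back into a 32-bit pointer."""
    return struct.unpack("<I", data)[0]

stored = pointer_to_rom_bytes(0x08C05022)
print(stored.hex(" ").upper())            # 22 50 C0 08
print(hex(rom_bytes_to_pointer(stored)))  # 0x8c05022
```

    The digits inside each byte stay as they are; only the order of the four bytes is reversed.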

    Again, this is all great and wonderful in the abstract, but Alex, how can I use this knowledge?

    Well, I'm sure you know how nowadays there's a tool for everything. You don't ever have to touch a hex editor... wrong! You are limiting yourself if you can't open up a hex or image editor and find data to edit. Here are the basics of how you can do that.

    How do I change things myself?

    Unfortunately there will never be a tool for everything (if there is, God help us all). Sometimes you have to do things on your own! This is where the R&D forum on PC comes in handy. This is where disassemblies shine, and all the little offsets people have collected over the years come to use. You can plug them in and do what you need to do, and be like "wow, that wasn't that painful at all!" and heck, you may even grow to like that kind of ROM editing. The truth is, that's all there used to be for the longest time. You used to have Advance Map and a script editor if you were lucky, and you hoped to God they didn't break on you with how much of their work you did in hex. It's still that way in other fandoms, even. Pokémon is just a bit big, lol.

    There are a few tools you'll want in your DIY endeavours. The first is a sturdy and stable hex editor such as HxD, and a way to hex edit with table file support. [ At the moment that's a dilemma of mine as Goldfinger's interface is absurd, Hexecute is always finding an excuse to crash, and Thingy (including 32) also has a ridonkulous UI. Anyone with a fix let me know. ] The second thing you'll want is NSE 2.X, and unLZ GBA + NTME. These tools are all indispensable in GBA ROM hacking.

    There is a general procedure to executing a modification:
    1. Figure out what it is you want to change
    2. Search your documents, PC's TT&R section, PC's R&D section, other sites like PHO/DataCrystal, and Google (in that order) for potential documentation about your hack
    3. If documentation is found, use as necessary
    4. If no documentation or insufficient documentation is collected, use the VBA-SDL-H debugger to find a starting point where the original data or routine you want to change may exist in the game, and trace back pointers and such through the debugger until you find the data you need.
    5. Once done, edit the routines/data as needed, changing pointers and such to reflect your new hack, and test.
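
    As a low-tech complement to the debugger in step 4 (this is a plain byte search, not the debugger tracing described above), you can look for every place in the ROM that stores a pointer to an offset you already know. A rough Python sketch follows; find_pointers and its aligned_only parameter are made up for this example, and the assumption that pointers sit on 4-byte boundaries is only a common convention, not a guarantee.

```python
import struct

ROM_BASE = 0x08000000  # where the ROM starts in the GBA's address space

def find_pointers(rom: bytes, target_offset: int, aligned_only: bool = True) -> list:
    """Return every ROM offset where a little endian pointer to target_offset is stored."""
    needle = struct.pack("<I", ROM_BASE + target_offset)
    hits = []
    position = rom.find(needle)
    while position != -1:
        # Assumption: real pointers usually sit on 4-byte boundaries
        if not aligned_only or position % 4 == 0:
            hits.append(position)
        position = rom.find(needle, position + 1)
    return hits

# A tiny fake "ROM": one aligned pointer at offset 8, one unaligned match at offset 15
pointer_bytes = struct.pack("<I", ROM_BASE + 0xC05022)
fake_rom = bytes(8) + pointer_bytes + bytes(3) + pointer_bytes
print(find_pointers(fake_rom, 0xC05022))                      # [8]
print(find_pointers(fake_rom, 0xC05022, aligned_only=False))  # [8, 15]
```

    With a real ROM you would read the file in with open(path, "rb").read() and pass the result as rom. Each hit is a spot you would repoint in step 5 if you moved the data.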

    If you wish to implement something entirely new, you first need to check whether it's been done before (either here on PC or elsewhere), and if not, use the above reverse-engineering model to find the needed hooks for your assembly routine that does what you wish for it to do.

    If that didn't make a whole lot of sense, I suggest you read the different assembly tutorials done by Shiny Quagsire, FBI Agent, and the like.


    Welp, that's about it for this tutorial. Hopefully this'll help out a lot of those who are new to ROM hacking in understanding a lot of the often-forgotten basics.
     

    Blah

    Free supporter
    1,924
    Posts
    11
    Years
  • Hey Alex, this is great! However, I'm thinking that it may be a little too complex?

    If I don't know what a bit/byte/word is, it's very likely I wouldn't be able to understand the presented information. The hardware tangent should be avoided in favor of the abstract idea. Something like:

    A bit can be considered as a representation of a binary digit. It has two states, either on or off, 0 or 1. When you combine one or more bits, you can create a number. Consider two bits "1,1". Here both bits are turned on, and in binary "11" is the decimal number "3". Like this, you can create any number in decimal using binary, and as such, it can be represented in bits in circuitry.

    A byte is just a pairing of eight bits. ...

    [continue awesome explanation]

    Another thing is the explanation of big endian and little endian. The word "biggest" is a little ambiguous, because in the offset provided "08" isn't the biggest number. However, it is the most significant byte in big endian. Perhaps changing "biggest" to "significant" would clear this potential misunderstanding up.

    Finally, VBA-SDL-H should only be used to backtrace ASM related things. If I want to trace an image, using a hex editor/ VBA's SWI logs would be better. If VBA-SDL-H were used I'd have to set a break upon read at the offset of the image, which implies I already know where the image is.

    Otherwise, a nice guide :)
     

    Touched

    Resident ASMAGICIAN
    625
    Posts
    9
    Years
    Hey man. This is great, I just have issues with some of the terminology:

    A byte is a larger unit of information, consisting of 8 bits strung together little endian (which we'll discuss later).

    ...
    How would it make sense for the bits to be little endian and the bytes be big endian, just for our eyes?

    Bits don't have endianness, only bytes do. Endianness refers only to the byte order in words, not to how the bits are strung together. I get that you were probably trying to explain bit numbering, but I feel that needs a separate section rather than being lumped in with endianness, which is just confusing terminology.
     

    Alexander Nicholi

  • Hey man. This is great, I just have issues with some of the terminology:



    Bits don't have endianness, only bytes do. Endianness refers only to the byte order in words, not to how the bits are strung together. I get that you were probably trying to explain bit numbering, but I feel that needs a separate section rather than being lumped in with endianness, which is just confusing terminology.
    I meant the bits within the bytes, and then later the bytes within the word. Sorry, I need to figure out how to clarify that :P

    That's how daniilS explained it to me in IRC.
     