
[ASM & Hex] [Pokemon Emerald + C] Indexing arrays with a non-const variable leads to undefined behavior?

    Hello. I'm working on a ROM hack of Pokemon Emerald and I'm trying to add a few custom battle types. I'm actually a lot more comfortable writing code in C than ASM, so that's what I've been doing. I'm using this GitHub repository as a template (can't post links yet, sorry: https://github.com/EternalCode/Empty-Template), obviously replacing BPRE.ld with a BPEE.ld composed of various references I've been able to find and dig up around the Internet.

    The problem is as described in the title. The application here is that I have an array of structs stored in a fixed part of the ROM. I want to be able to pass an index to a C/ASM function so that the game knows which element in the array to load data from. It seems to work fine when I use a literal (array[0];) or a const variable (const uint8_t index = 0; array[index];). But non-const variables access the wrong part of memory.

    Consider the following self-contained example:

    example.h

    Spoiler:


    example.c

    Spoiler:


    example_script.rbc

    Spoiler:


    Now here's what the script looks like in-game. Using literals, function1 performs as expected. But function2 returns garbage values, even though we've confirmed that 0x8004 is correctly being set to 0. The visual proof is shown in the attachments. The same thing happens with manual pointer arithmetic (i.e. &numbers + var_8004 is just as wrong).

    This is obviously really problematic. As the GBA has really piss-poor memory management, it's really easy to read into memory you're not supposed to. In more complicated examples, simply accessing an array has led to everything from the game force-resetting to the emulator crashing, unable to recognize which command to execute next.

    Never in my years of programming C have I ever encountered a system where pointer arithmetic was only valid with constants and literals. This is one of the most basic and fundamental aspects of programming in general so I doubt this is just how the system is "supposed" to behave. There has to be something stupid I'm not seeing.

    For your convenience, I'm also attaching the ASM gcc generates for function1 and function2. They seem fine and valid to me, but my C is way better than my ASM so maybe there's a problem you can spot that I haven't.

    Spoiler:


    Thanks in advance!
     

    Attachments

    • example2.png
    • exmaple1.PNG
    Not at all an answer to your question, but obviously the ASM that GCC generates for function1 is using #100 and #200, i.e. it has inlined the values from the array. Perhaps it would be worth trying to adapt that code until you find the breaking point?

    That said, I've never encountered a problem with indexing. The ASM for function2 looks reasonable to me. My guess is that your numbers array isn't being populated correctly in the ROM (or perhaps it isn't aligned? But I can't see how GCC would decide to violate alignment like that). If you view its address in a hex editor do you see the numbers you're expecting?
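    To illustrate the point about inlining, here is a minimal sketch of the two access patterns. The array contents and function names are hypothetical stand-ins (the original example.c is not shown); only the values 100 and 200 are taken from the #100/#200 immediates mentioned above. With a constant index an optimizing compiler is free to fold the value into the instruction and never touch the array, while a runtime index forces it to load the array's address and read memory, so only the second path depends on that address being right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the "numbers" array. 100 and 200 match the
   immediates seen in the generated ASM; 300 is made up for illustration. */
static const uint16_t numbers[] = { 100, 200, 300 };

/* Constant index: an optimizing compiler may replace this with the
   literal 100 and emit no load from the array at all. */
uint16_t read_constant_index(void)
{
    return numbers[0];
}

/* Runtime index: the value is only known when the function runs, so the
   compiler has to load the array's address and perform a real memory
   read. index_var plays the role of a script variable such as 0x8004. */
uint16_t read_runtime_index(volatile const uint16_t *index_var)
{
    uint16_t index = *index_var;
    assert(index < sizeof numbers / sizeof numbers[0]);
    return numbers[index];
}
```

    On a desktop build both functions return the same value for index 0. If they disagree on the target, the address used for the array is the first thing to check, rather than the indexing itself.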
     
    That's actually somewhat helpful to point out. I had thought the issue was something like "non-const variables don't work for indexing arrays", but the only time the arrays are ever really indexed is with non-const variables. It could be that indexing in general is messed up for some reason, and const variables only work because the compiler literally bypasses this step.

    The debug says "numbers" was inserted at 0x08E61F94. In the ROM the hex values look like

    Spoiler:


    Which seems correct.
     
    I decided to run through the code again and annotated it line by line to make sure that it's really doing what you'd expect, and I came to the same conclusion: the assembly is correct. Go figure, the professionals over at the GNU project know what they're doing. I thought maybe it was somehow being inserted incorrectly, so I used a HEX to ASM converter to see if the code in the ROM matches the assembler output. It came back with this:

    Spoiler:


    Which, surprise surprise, is exactly like what the assembler gives us, minus the pretty labels for everything. Though I will say that towards the bottom of the section in HEX, I do see a few pointers, chiefly:

    Spoiler:


    This is on a different build than what I was using to demonstrate in the original post. This is the end of another function, not function2. Still, I've annotated what these pointers are supposed to be and a few look suspicious. Furthermore, I can't actually find a pointer to where "numbers" was inserted anywhere in my ROM, at least according to HxD (numbers in this index was inserted and confirmed at 0x08E3D068 and a pointer to that value does not exist).
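    For anyone repeating this kind of search: the GBA stores 32-bit values little-endian, so a pointer to 0x08E3D068 appears in the ROM as the bytes 68 D0 E3 08, not 08 E3 D0 68. The sketch below shows that encoding and a naive scan for it. The function names are hypothetical, and the scan assumes the ROM has already been read into a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a 32-bit address as the four little-endian bytes that would be
   stored in the ROM (least significant byte first). */
void encode_pointer_le(uint32_t address, uint8_t out[4])
{
    out[0] = (uint8_t)(address & 0xFF);
    out[1] = (uint8_t)((address >> 8) & 0xFF);
    out[2] = (uint8_t)((address >> 16) & 0xFF);
    out[3] = (uint8_t)((address >> 24) & 0xFF);
}

/* Return the offset of the first occurrence of the encoded pointer in
   rom, or -1 if it does not occur. Checks every byte offset, not only
   word-aligned ones. */
long find_pointer(const uint8_t *rom, size_t rom_size, uint32_t address)
{
    uint8_t needle[4];

    assert(rom != NULL);
    encode_pointer_le(address, needle);

    for (size_t i = 0; i + 4 <= rom_size; i++) {
        if (rom[i] == needle[0] && rom[i + 1] == needle[1] &&
            rom[i + 2] == needle[2] && rom[i + 3] == needle[3]) {
            return (long)i;
        }
    }
    return -1;
}
```

    A result of -1 for the address where the array really lives means no code in the ROM points at it, which matches what HxD reported here.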

    Here's what the pointers at the end of function2 look like:

    Spoiler:


    So the current working theory is that somewhere between compilation and insertion, the address for whatever array I want to index is stored in HEX wrong. This causes the address call to be wrong and a bunch of other issues. I'm going to investigate this lead and see what turns up. In the meantime, I would appreciate continued input from anyone who has experienced this kind of thing before.
     
    The issue is resolved as of now. As you can see I never really post here though I guess that might change, so I'm not sure if I'm supposed to mark as solved or if a mod will.

    Here's a description of what happened, for those who run into this problem in the future.

    The GitHub repository was made for FireRed and assumes free space starts at 0x08800000 (ROM offset 0x800000). It's actually unsafe to insert code there in Emerald for a variety of reasons, so I use the script to insert code much later in the ROM. Unfortunately, the script compiles your code as if you're inserting at 0x800000, regardless of what you actually set the insertion offset to. This means any pointer to your custom variables will be incorrect.

    So in my example, the "numbers" array ended up inserted at 0x08E3D068, but in the code it was written as a pointer to 0x08800104. I'm sure that would've been valid if I had inserted at 0x800000, but I didn't.

    There's probably a way to fix this in the insertion script. Otherwise I'll make sure to manually go back in and fix the pointers to variables with a HEX editor before I test.
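    The manual fix amounts to rebasing each pointer from the base address the code was linked for to the base address it was actually inserted at. A minimal sketch of that arithmetic follows. The function name and the blob size are hypothetical, and the actual base of 0x08E3CF64 used in the usage note is an assumption derived from the numbers above: it is 0x08E3D068 minus the 0x104 offset, which only holds if the array sits at the same offset within the inserted code.

```c
#include <assert.h>
#include <stdint.h>

/* Rebase a pointer that was linked against assumed_base so that it is
   valid for code actually inserted at actual_base. Pointers outside the
   inserted blob (for example, pointers to existing game functions) are
   returned unchanged, because those addresses did not move. */
uint32_t rebase_pointer(uint32_t ptr, uint32_t assumed_base,
                        uint32_t actual_base, uint32_t blob_size)
{
    assert(blob_size > 0);

    if (ptr < assumed_base || ptr - assumed_base >= blob_size) {
        return ptr; /* not part of the inserted code, leave as-is */
    }
    return actual_base + (ptr - assumed_base);
}
```

    With a hypothetical blob size of 0x200, rebase_pointer(0x08800104, 0x08800000, 0x08E3CF64, 0x200) gives 0x08E3D068, the address where "numbers" really ended up. The result still has to be written back to the ROM in little-endian byte order.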
     
    I have. For the time being, I started this project as a ROM hack and I think that's how I'm going to finish it. When I do a second project that's a bit more complicated, I'll start with the decomp for sure. Obviously with pokeemerald you don't get weird issues like this one, but it's a learning curve in its own right.
     