
Development: Gen III Split Disassembly

Shiny Quagsire

I'm Still Alive, Elsewhere
697
Posts
14
Years
Greets all,

So for the last few days I noticed a significant number of new hacks coming up which were based on iimarckus's Pokemon Red disassembly. After a bit of discussion on #GoGo, Touched, Pia, and I decided to create the first Gen III disassembly for use with new ROM Hacks.

What is a Disassembly?
A disassembly, for those who are unaware, is a full in-assembly source file which can be compiled into a ROM of choice. In the case of iimarckus's disassembly, it's a full split disassembly which, when compiled, results in a Pokemon Red ROM. Disassemblies are useful in that they can assist in making ASM hacking and the addition of more resources easier, without having to worry about free space.

Why a Disassembly?
As I already mentioned, having the full source (in ASM) of the entire game in one file can make all kinds of ASM hacking incredibly easy. However, once all the resources are properly split and stripped of their static offsets, all the advantages of a split disassembly start to pour in. Free from the original restrictions of offsets, you can easily add and remove code from the original engine and have all the resources reallocate around it. Freespace is managed by the compiler, as opposed to manual, fragmented freespace management. Resources can be changed and manipulated as files instead of offsets, and in the case of iimarckus's disassembly, tools can even take editable PNGs and automatically manage and insert them throughout the ROM.
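To make the "resources reallocate around it" point concrete, here is a minimal Python sketch of what an assembler does once everything is referred to by name instead of by offset. The resource names and sizes are made up for illustration and are not taken from the real ROM; the only real number is 0x08000000, where the GBA maps cartridge ROM.

```python
# Toy model of laying out named resources one after another.
# Names and sizes are hypothetical, for illustration only.
ROM_BASE = 0x08000000  # where the GBA maps the cartridge ROM


def assign_addresses(resources, base=ROM_BASE, align=4):
    """Place (name, size) resources in order and return name -> address."""
    addresses = {}
    cursor = base
    for name, size in resources:
        # Round the cursor up to the next aligned address.
        cursor = (cursor + align - 1) // align * align
        addresses[name] = cursor
        cursor += size
    return addresses


original = [("engine", 0x100), ("script_a", 0x22), ("string_a", 0x10)]
# Same list with a new routine dropped into the middle of it.
expanded = [("engine", 0x100), ("new_routine", 0x40),
            ("script_a", 0x22), ("string_a", 0x10)]

before = assign_addresses(original)
after = assign_addresses(expanded)

for name in ("script_a", "string_a"):
    print(f"{name}: 0x{before[name]:08X} -> 0x{after[name]:08X}")
```

Anything that refers to `script_a` or `string_a` by name picks up the new address for free. Anything that still holds the old number is now wrong, which is why static offsets have to be stripped out first.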

Pros and Cons of a Disassembly
As with most ASM hacks and fun things, there are always some key things to know. The advantages I outlined above make managing and creating hacks easier than ever. However, by freeing the resources from their offset-based bounds, compatibility with all tools is broken. Yes, all tools. This can be a huge dealbreaker for many, especially in terms of scripting, map making, or even music inserting. The solution to this, however, is for tool makers to adapt to the split disassembly method, and with that, the entire community. On the other hand, some tools, like freespace finders or sprite editors, are no longer needed in some cases, since their job can be done at compile time instead.

Where can I get this "Disassembly"?
As of today, one of the 3 main Gen III engines has been disassembled, and none are completely split (meaning it's pretty much worse than a ROM at this point). The Fire Red disassembly, based on knizz's IDB, can be found on my GitHub here. As of now it compiles to an exact ROM of Fire Red, but it requires large binary blobs from an original ROM to work. However, now that there is a source file to go off of, people can assist in breaking down all the resources of Fire Red into separate files and, slowly but surely, removing the need for binary blobs altogether. A split disassembly of Emerald is planned next, and after that perhaps a split disassembly of Ruby (or German Debug Ruby) if someone feels so inclined.

How can I Help?
At the moment, with the disassembly successfully compiling into a proper Fire Red ROM, the main task is to start separating resources into separate files. A full list of offsets for the Fire Red split disassembly can be found in newsyms.sym (the file which links binary offsets to the source ASM), as well as in knizz's split disassembly. A lot of resources may have pointers which are not contained in the actual code (i.e. pointers in the resources to other resources), so frequent testing is key to making sure the built ROM doesn't change. Cleanup would help as well, especially in the 0x1E0000ish region where the Nintendo inbuilt library resides, which uses a lot of ARM code that doesn't usually play nicely with IDA Pro.
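One rough way to spot resources that hide pointers to other resources is to scan a blob for aligned 32-bit words that fall inside the ROM address range. The Python sketch below is only a heuristic and not part of the project's tooling: it assumes a 16 MiB ROM mapped at 0x08000000, it assumes the blob starts on a 4-byte boundary, and it will produce false positives on data that merely looks like an address.

```python
import struct

ROM_START = 0x08000000
ROM_END = 0x09000000  # assumption: 16 MiB ROM image


def find_pointer_candidates(blob, blob_offset=0):
    """Return (file_offset, value) for aligned words that look like ROM pointers.

    blob_offset is the position of the blob inside the ROM file and is
    assumed to be a multiple of 4.
    """
    hits = []
    for i in range(0, len(blob) - 3, 4):
        (value,) = struct.unpack_from("<I", blob, i)  # little-endian word
        if ROM_START <= value < ROM_END:
            hits.append((blob_offset + i, value))
    return hits


# Made-up sample data: one plain value, one ROM-looking pointer, one filler word.
sample = struct.pack("<III", 0x12345678, 0x081C68F4, 0xFFFFFFFF)
candidates = find_pointer_candidates(sample, blob_offset=0x100)
for offset, value in candidates:
    print(f"0x{offset:06X}: 0x{value:08X}")
```

Every hit is a place where the blob would break if the target moved, so it is a good candidate for being turned into a named reference before the resource is split out.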

Questions, comments, concerns? Feel free to talk about this below. A split disassembly is definitely a step forward in my mind, but I'd definitely like to hear your take on it.

GitHub Link: DisFire, DisEmerald (coming soon), DisRuby (maybe but probably not coming soon)
 

ShyRayq

Unprofessional Unprofessional
1,856
Posts
16
Years
  • Seen Apr 2, 2024
Wow, this is something big. I can't believe that you're starting this massive project. I wish you and any other helpers the best of luck. I bet when this is finished, this will change hacking in a big way.
 

Shiny Quagsire

I'm Still Alive, Elsewhere
697
Posts
14
Years
So it seems that the entire first binary blob needed (located between the main engine and Nintendo's library) is pretty much made up of just scripts. So using MEH, I generated a list of every single script in the game, coming out to a total of 2148 unique scripts. I'll probably end up doing some automagic conversion of these files to ASM or scripts using SEA or something else. The main issue to tackle is turning the static offsets of strings and whatnot into proper relative names. In addition to that, I'll probably need to make up a sort of Script -> ASM markup into SEA so that the scripts are editable when it comes time to do that. I'm thinking maybe something similar to mid2agb where it's all assembly, but it uses some extra bits to make it more script-like.
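As a rough illustration of turning static offsets into relative names, the Python sketch below renders 32-bit values as assembler lines and swaps in a label whenever the value points at a known offset. The `.word` directive, the label name and the symbol table layout are all assumptions made for the example; they are not the actual format of newsyms.sym or of SEA's output.

```python
def symbolize(words, symbols, rom_base=0x08000000):
    """Render 32-bit values as .word lines, using a label where one is known.

    symbols maps ROM file offsets to label names (hypothetical layout).
    """
    lines = []
    for value in words:
        label = symbols.get(value - rom_base)
        if label is not None:
            lines.append(f".word {label}")
        else:
            lines.append(f".word 0x{value:08X}")
    return lines


# Hypothetical symbol table and data.
known = {0x1A0000: "str_hello"}
rendered = symbolize([0x081A0000, 0x00000005], known)
print("\n".join(rendered))
```

Once the pointer is written as `str_hello` rather than a fixed number, the string can move anywhere in the ROM and the reference still resolves at build time.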

For those curious, the list of scripts in the game:
Spoiler:
 

MrDollSteak

Formerly known as 11bayerf1
858
Posts
15
Years
I don't know if this is useful at all, but I've started work on identifying a few routines and noting what new abilities and items could be added to these checks. This doesn't really take into account how to add these routines to the disassembly, but it's just something to think about. I'd be glad to give you all the routines I have so that you can work on implementing them into the calcs. I think putting it in directly like this is great, as it avoids spending free space on branches into free space in the long run and risks fewer bugs.

Routines

Spoiler:


I'll either update this section or make new posts detailing new routines once I can be bothered to document them.
I'd also consider including lots of the fixes or features that hackers have found, such as the PSS, JPAN's Save Block Hack, Hackmew's Pokedex Fix, Running Shoes fix etc.
 

Touched

Resident ASMAGICIAN
625
Posts
9
Years
  • Age 122
  • Seen Feb 1, 2018
Congratulations on getting FireRed to assemble. I neglected to mention something to you on the IRC that is probably the reason for your crashes. Knizz's IDB is not a perfect disassembly of FireRed. He changed some locations to pure ASCII (such as the intro, some menu items, etc.) for readability reasons. I don't think he anticipated anyone using his IDB to create a split disassembly. This, of course, means that strings are null terminated instead of ~0 terminated, meaning that the strcpy and other string related routines will break.
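To show why the terminator matters, here is a small Python sketch of a reader that stops at 0xFF (taking "~0" to mean the single byte 0xFF). The byte values in the sample strings are arbitrary and are not real character codes from the game.

```python
TERMINATOR = 0xFF  # "~0" truncated to a single byte


def read_game_string(data, start=0):
    """Return the bytes from start up to (not including) the next 0xFF."""
    end = data.index(TERMINATOR, start)  # raises ValueError if no terminator
    return data[start:end]


# A string left in its original encoding ends where it should.
game_encoded = bytes([0xC2, 0xC3, 0xFF, 0x01, 0x02])
# An ASCII-patched string ends in 0x00, so the reader runs on into
# whatever follows until it happens to hit a 0xFF byte.
ascii_patched = b"HI\x00" + bytes([0x01, 0x02, 0xFF])

good = read_game_string(game_encoded)
overrun = read_game_string(ascii_patched)
print(len(good), len(overrun))
```

The second call swallows the null byte and the unrelated data after it, which is the kind of overrun that would make a copy routine misbehave.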

Anyway, I've started work on the (IDA Python) scripts that we spoke about. Here is the one you suggested. It starts at the address of the cursor in IDA, and prompts you for an end address. It then jumps to the next aligned unexplored region and attempts to identify it as code. If it succeeds, it tries to make it a subroutine.

Just go File -> Script File to run it.
Spoiler:
 

knizz

192
Posts
16
Years
  • Seen Oct 28, 2020
… I neglected to mention something to you on the IRC that is probably the reason for your crashes. Knizz's IDB is not a perfect disassembly of FireRed. He changed some locations to pure ASCII (such as the intro, some menu items, etc.) for readability reasons. I don't think he anticipated anyone using his IDB to create a split disassembly.

With the following IDC script you can revert any selection to its original form:
Code:
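#include <idc.idc>
// Restore every byte in the current selection to its value in the original input file.
static main(void) {
        auto ptr, end;
        ptr=SelStart(); if(ptr==BADADDR) return;  // nothing selected
        end=SelEnd(); if (end==BADADDR) return;
        while(ptr<end) {
                PatchByte(ptr, GetOriginalByte(ptr));
                ptr++;
        }
}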
 

Shiny Quagsire

I'm Still Alive, Elsewhere
697
Posts
14
Years
So I've been working on splitting apart the scripts in the ROM, and I almost have it done. However, there were some issues which caused certain areas of the ROM to be decompiled improperly, as well as some "mysterious" data areas which were seemingly neither scripts nor strings. After some closer inspection I found that they were actually Japanese strings left over from the initial translation. For anyone curious, a full list of the Japanese texts and their offsets can be found here. I found it very interesting that these happened to be in the ROM in the first place, since you'd think they'd remove them during translation. Google Translate can give a somewhat decent idea of what they say, but frankly it's really horrible at translating. As an example, this section:

Code:
この あいだ やまおくで
きんのたまを ひろい ましてね!
つかえない しなもの ですが
うったら なんと 5000$でした

was Google Translated to this:
Code:
This Aida mountains 
I much less broad balls of gold! 
Although it is an article which can not be used 
It was a whopping $5000 Uttara

With the actual translation (by GameFreak) being:
Code:
A NUGGET is totally useless.
So I sold it for $5000.

Granted, this isn't a perfect translation on GameFreak's part (as in, word for word), but it does show that a lot of these strings are just duplicates of already existing translated strings.
 

daniilS

busy trying to do stuff not done yet
409
Posts
10
Years
  • Age 24
  • Seen Jan 29, 2024
The translation issue is because the games only use kana. I have tried replacing it with kanji, which gives:
この間山奥で
きんのたまを拾いましてね!
使えない品物ですが
売ったらなんと5000$でした
Translated by Google:
During this time in the mountains
I have picked up balls of gold!
Although it is an article which can not be used
It was a whopping $ 5000 if you sell
which is fairly decent
 

U.Flame

Maker of Short Games
1,326
Posts
15
Years
Yes! The Pokemon hacking community is taking the first steps of evolving into advanced hacking the likes of which the Sonic hacking community has been! Once the methods are perfected and tools adapted, just think of what's possible, what the future has in store for our fan games! I'm so excited! I wish I had any helpful knowledge, but all I can do is cheer you on and offer emotional support. That and work on my own unrelated experiments.
 

IceGod64

In the Lost & Found bin!
624
Posts
15
Years
Thank goodness this is FINALLY taking off; I've tried ROM hacking for many different games in the past, and it's really a complete mess. I really hope to see this progress. There's just so much room for expansion once this gets to a more hackable state. I will attempt to download and look into it as well once I'm able.
 
416
Posts
11
Years
  • Age 35
  • Seen Feb 10, 2024
Once it's ripped, we can make a full IDE to write scripts, ASM code, and even maps, similar to Visual Studio... Then we'd be set lol.
 
4
Posts
9
Years
  • Age 34
  • Seen Oct 17, 2014
I'd try to help out. I've done some disassembly of Gen 1, and I know how to extract tables and code from the ROM. I really prefer ASM-level stuff anyways. It might take longer, but you have complete control over the outcome and you're only limited by the console itself.
 

Shiny Quagsire

I'm Still Alive, Elsewhere
697
Posts
14
Years
Heyo all,

First off, long time no update (about a month it seems), but do not worry. I've been (somewhat) busy working on this and a few other small projects, and I'm finally done with a major part of the ROM: scripts. Yep, pretty much every part of the first binary block has been picked apart by SEA's new Linear Script Decompiler and formatted into a readable, organized, and compilable ASM file. However, a small block of battle scripts has yet to be decompiled, and remains purposely stuck as a string until I feel like adding battle script support to my command database and SEA itself. That said, I feel that the battle scripts will be a bit easier to work with than normal scripts because they aren't as... loopy. And by loopy I mean completely jumbled up with all sorts of different data like strings, movement data, mart data, level scripts, Japanese strings, etc.

Basically, at this point every script in the game is decompiled, and can be viewed and edited. That being said, the best thing anyone could probably do to help is to find parts in the file which are improperly decompiled. This basically consists of scripts being missing or forced to a string. If you find a string which looks like gibberish (or a script which looks like gibberish), you can just post it here or post an issue on GitHub. In most cases, the gibberish scripts are actually Japanese strings which happen to get picked up as scripts due to the large number of values which normally don't occur in English strings. And if a string looks like gibberish, it's either misinterpreted movement/mart/level script data or it's a script which got forced as a string for some reason. I will say that all data after (I think) 0x1C68F4 is battle scripts forced as strings, so those will look like complete garbage, and they should look like garbage.

In terms of what to do next, I'll probably start by going down the next binary blob and converting everything to data down there. MEH will likely become crucial in getting map data converted, so some work will likely get done to make sure I have every bit of data read from MEH.
 
30
Posts
19
Years
  • Seen Apr 21, 2020
So, I dunno if this is useful for this, but let me know--long ago, I reverse-engineered the sound engine. By this, I mean...I have the actual m4a sound engine from the GBA dev kit, and essentially reverse engineered the Pokemon sound data into files this engine can compile back again, including all music and sound effects from the entire Gen 3 series--think of it as essentially having the entire audio source code of the games available to you.

Again, I don't know if that will help this project, though it does compile to assembly output, so do let me know if it will be of any value and I can start getting it into a condition anyone other than myself could use :P This is a really awesome project and I am happy to see it happening!

 

Shiny Quagsire

I'm Still Alive, Elsewhere
697
Posts
14
Years
That would actually help out a ton. I wouldn't be able to use any official GBA SDK code though if that's what you mean, but source files always help out a ton, especially for sounds.
 
30
Posts
19
Years
  • Seen Apr 21, 2020
ah yeah, totally. I'll try and package it up with a quick little explanation and send it your way a bit later then!
 

Shiny Quagsire

I'm Still Alive, Elsewhere
697
Posts
14
Years
Greets all,

I was talking with Sufflejoy on the IRC and I decided to add a few example commits of how to commit data in .bins vs .asms. Basically, if the data is a table, a struct, something that should be readable, or contains any pointer whatsoever, it needs to be committed as a .asm file and .include'd (or put within firered.asm itself to be separated later). Large binary data like bitmaps, LZ77 compressed data, and other bits should be committed as a .bin, or if it has accompanying data (i.e. for fonts, font widths), a .asm with the accompanying data and a .bin for the blob. If you need an example for this, see here. Examples for .bin and .asm commits can be found here and here respectively. The disassembly is based on knizz's IDB, so if you want to commit, get that set up for referencing and grabbing data lengths and such.
 