Perhaps 3 sets of 3 layers would be an alternative. Each set corresponds to a level the player walks on, and contains 3 tile layers a la RMXP. I suppose the player would be drawn between the middle and top layers of their current level, which the map-maker would need to change via some kind of eventing at user-defined points for the sake of bridges, etc. Tile passabilities are considered only for tiles in the bottom and middle layers, and only for the player's current level and below. Oh, and a single "layer" of events, perhaps with each event's level being definable so that it can appear above/below certain layers (an event's behaviour could also depend on the player's current level if necessary). Importantly, there's just one layer for events rather than one per level - it makes working with them easier.
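To make the draw order and passability idea a bit more concrete, here's a rough Python sketch. Everything in it is made up for illustration: the names (draw_order, is_passable), the levels[level][layer][y][x] data layout, and using tile ID 0 for "empty". I've also assumed that the topmost non-empty tile at or below the player's level decides passability, so that a bridge can override the water underneath it. That rule is my own guess and not something worked out above.

```python
BOTTOM, MIDDLE, TOP = 0, 1, 2  # the 3 tile layers within each level
EMPTY = 0                      # assumed tile ID meaning "no tile here"


def draw_order(num_levels, player_level):
    """List what gets drawn, back to front.

    The player is inserted between the middle and top layers
    of the level they are currently on.
    """
    order = []
    for level in range(num_levels):
        for layer in (BOTTOM, MIDDLE, TOP):
            order.append(("tiles", level, layer))
            if level == player_level and layer == MIDDLE:
                order.append(("player",))
    return order


def is_passable(levels, passable, x, y, player_level):
    """Check whether the player can step onto (x, y).

    Only the bottom and middle layers are looked at, and only for
    the player's current level and the levels below it.
    Assumption: the topmost non-empty tile found decides the result.
    """
    for level in range(player_level, -1, -1):
        for layer in (MIDDLE, BOTTOM):
            tile = levels[level][layer][y][x]
            if tile != EMPTY:
                return passable[tile]
    return True  # assumption: nothing there at all counts as passable
```

For a bridge over water, the map-maker's event would bump player_level up as the player steps onto the bridge, so the bridge tile gets checked first; walking underneath at the lower level, only the water tile is seen and the player is blocked.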
Just a suggestion. I haven't thought about it in too much detail so I don't know about all the ramifications.
I presume that in your system a tile's passability is defined in the tileset, as RMXP does. Priority doesn't need to be defined, though, of course - that information now comes from the layer the tile is in.
For example, say I choose the 6th tile in the second tileset. The first tileset has 40 tiles in it, so the tile ID saved in the map data (the "cumulative tile ID") is 46. In-game, the renderer sees that 46 is greater than the number of tiles in the first tileset (40), so it subtracts 40 from the ID and checks the next tileset for the result (now 6). With more than two tilesets, it would just repeat that until the ID fits.
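That lookup could be sketched like this in Python. The function name resolve_tile and the idea of passing the tileset sizes as a plain list are just assumptions for the example, and tile IDs are treated as 1-based to match the numbers above.

```python
def resolve_tile(cumulative_id, tileset_lengths):
    """Turn a 1-based cumulative tile ID into (tileset index, local tile ID).

    tileset_lengths is the number of tiles in each tileset,
    in the order the map uses them.
    """
    if cumulative_id < 1:
        raise ValueError("tile IDs are assumed to start at 1")
    remaining = cumulative_id
    for index, length in enumerate(tileset_lengths):
        if remaining <= length:
            return index, remaining
        # ID is past the end of this tileset: subtract and try the next one
        remaining -= length
    raise ValueError("tile ID is past the end of the last tileset")


# The 6th tile of the second tileset, where the first tileset has 40 tiles:
print(resolve_tile(46, [40, 30]))  # (1, 6)
```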
Or you could literally create a new tileset on the fly in-game made up of the component tilesets, and use that for map drawing instead. It means each map ends up using just one tileset graphic. I believe Minecraft does this nowadays (or will once the next version comes out).
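A minimal sketch of the combined-tileset idea, again in Python. I'm leaving out the actual image handling and just treating each tileset as a list of tiles; the function names, the 8-tiles-per-row layout and the 32-pixel tile size are assumptions for the example. The point is only that once the tilesets are joined in order, the cumulative tile ID indexes straight into the result with no per-tileset lookup.

```python
def build_combined_tileset(tilesets):
    """Join several tilesets (lists of tiles) into one, keeping their order."""
    combined = []
    for tiles in tilesets:
        combined.extend(tiles)
    return combined


def atlas_position(cumulative_id, tiles_per_row=8, tile_size=32):
    """Pixel position (x, y) of a 1-based tile ID in the combined graphic.

    Assumes the combined graphic is laid out left to right, top to bottom.
    """
    index = cumulative_id - 1
    column = index % tiles_per_row
    row = index // tiles_per_row
    return column * tile_size, row * tile_size


first = [f"first-{n}" for n in range(1, 41)]    # 40 placeholder tiles
second = [f"second-{n}" for n in range(1, 31)]  # 30 placeholder tiles
combined = build_combined_tileset([first, second])

print(combined[46 - 1])    # second-6
print(atlas_position(46))  # (160, 160)
```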
I don't know how autotiles would work, though.