Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[ FF7 ] Huge Upside-down Cloud model #295

Closed
DLPB2 opened this issue Oct 27, 2021 · 18 comments
Closed

[ FF7 ] Huge Upside-down Cloud model #295

DLPB2 opened this issue Oct 27, 2021 · 18 comments
Assignees
Labels
bug Something isn't working

Comments

@DLPB2
Copy link

DLPB2 commented Oct 27, 2021

I've tried to find out why this is and I can confirm it is 100% not the field script itself. It could be something that is specific to the 3 fields (bugin1c, md1_1, mtcrl_3) bug it happens on. The only commonality I have found is that the 'field scale value' is different with the fields above. But that may just be a coincidence.

from Reunion Database

""Returning to md1_1 from md1_2 can cause Cloud's model to become huge and upside down. This happens more regularly after a battle. [This bug is confirmed to not be caused by field script. I think it might be some sort of weird memory issue - or something unique to this field compared to others. Currently testing field scale value. - DLPB]

Note: This happens at Mt Corel and in Bugenhagen's observatory also.""

The field scale value (walkmesh area info) is 1024 + for these 3 fields when usually it's 512. Could be relevant but if that's the case we should also see this bug happen on bugin1b and so far we don't. In any case, there's something seriously wrong here. It happens intermittently. Sometimes it can take 100s of entries to the fields above before Cloud's model is corrupted.

Please see the following picture from Ori - tested with FFNx

https://drive.google.com/file/d/18U9OM7yWDznG2C--A0Todox6JcKPcslf/view?usp=sharing

@julianxhokaxhiu julianxhokaxhiu added bug Something isn't working help wanted Extra attention is needed labels Oct 27, 2021
@julianxhokaxhiu julianxhokaxhiu changed the title Upside down cloud model - bug present with ffnx and since aali's original build [ FF7 ] Huge Upside-down Cloud model Oct 27, 2021
@DLPB2
Copy link
Author

DLPB2 commented Oct 28, 2021

I'm currently resolving the issue. From my discord to Myst6re:

The model size is stored as a string - not a number. 1024 is literally 1 0 2 4 in memory [hex 31, 30, 32, 34]. So a function is called to convert this to a number [in this case 1024]. It's among the worst things I've ever seen in my life. It uses a look up table to decide when the string of numbers has ended. 0-9 are assigned value of 0x84 - a check is done for 0x4. If that is found, it will proceed to next number. if not, it decides the number has ended and the function returns the value. It's REALLY bad. What's happening is that in memory 1 0 2 4 is followed by a random number from elsewhere - if one of those values is 0x30-0x39 the game will think there is another number... and add that number too lol.

I don't really know what to say other than it's terrible. This function is used in a lot of other places... so i'd feel a lot more comfortable if we knew it can't fail elsewhere. Still - the way it is at present I think the function expects that your model size strings are terminated with 00
btw the reason more maps aren't affected by this issue is because few models are 1000 or above - if 3 digits, the string is being terminated by 00, eliminating the bug.
00413144 is the place where this function is failing due to not finding the end of the string correctly. The only fix I see is to make the model size function call at 0063E74D a new function that only checks maximum 4 digits. The game only copies 4 bytes, so adding 00 to the end of a 4 digit string with Makou Reactor won't work.

@julianxhokaxhiu
Copy link
Owner

Thank you very much for sharing this precious info. I'll take a look at the sub this weekend and I'll see if I can manage to replace it in a more safe way. But yeah, ugh, what a weird way to handle that...

@DLPB2
Copy link
Author

DLPB2 commented Oct 28, 2021

There's a lot of other stuff going on in there that I'm not too sure about.

Also, I have confirmed that when Makou saves 1,2 or 3 digits as opposed to 4, the remaining digits are set to 00, eliminating the bug. So this bug will only happen if the field model size is 1000 or greater - as that's 4 digits and then you're at the mercy of game memory to decide what the 5th digit will be at runtime.

The game only copies 4 bytes - so a fix will be needed to the function. It may be that the fix only needs making to the model size call at 0063E74D. I'm going to write in a replacement there that checks for 4 digits only.

@DLPB2
Copy link
Author

DLPB2 commented Oct 28, 2021

Here's how I've fixed it at 63E74D and confirmed working. Hopefully it makes sense for you to convert to C. Assuming you don't want to fix the original function entirely but it looks messy.

Function FieldModelSizeStringtoInteger63E74D(offset: LongInt): LongInt; stdcall; //Fix huge inverted model
Var
i: LongInt;
NumString: AnsiString;
Begin

Result:= 0;
NumString:= '';

for i:= 0 to 3 do
begin
if pbyte(offset + i)^ - $30 > 9 then
break;

NumString:= NumString + pChar(offset + i)^;
end;

if NumString <> '' then
Result:= StrtoIntDef(NumString, 0);
end;

@julianxhokaxhiu
Copy link
Owner

julianxhokaxhiu commented Oct 30, 2021

I've investigated a bit the whole thing and it seems the game is using a wrapper function atoi that calls itself atol ( which looks like the native implementation of atol in the compiler ).

I have various idea on how to fix this:

  1. Replace the built-in atoi with the one from MSVC today implementation
  2. Copy the string in a new buffer, add a NULL terminated character and return the converted value

I'll try first with 1 and if somehow it will not be working, I'll fallback to 2. But I think 1 should work well.

Look forward for commit updates on this issue.

@julianxhokaxhiu julianxhokaxhiu self-assigned this Oct 30, 2021
@julianxhokaxhiu julianxhokaxhiu removed the help wanted Extra attention is needed label Oct 30, 2021
@julianxhokaxhiu julianxhokaxhiu added this to the 1.10 milestone Oct 30, 2021
@julianxhokaxhiu
Copy link
Owner

Some updates on the issue.

I've done some tests and checked what the games give and what we output, indeed using the new MSVC std::atoi implementation we can correctly convert the values, and I could catch some really weird cases where atoi behaved correctly now.

Some of them I catched in the logs:

[00001687] TRACE: ff7_atoi: 1024���1��< -> atoi -> 1024
[00001687] TRACE: ff7_atoi: 1024�o5 -> atoi -> 1024
[00001687] TRACE: ff7_atoi: 1024�L���$;fA���p=4���� �2���;� C -> atoi -> 1024
[00001687] TRACE: ff7_atoi: 1024���������^����Qq4�a��!�1��4�q -> atoi -> 1024
[00001687] TRACE: ff7_atoi: 1024S/e/w/ -> atoi -> 1024
[00001687] TRACE: ff7_atoi: 102465
@�&�&. "�������1c0��oo5:�0 -> atoi -> 102465
[00001806] TRACE: ff7_atoi: 1024���1��< -> atoi -> 1024

Feel free to test the latest canary and let me know if you're still able to reproduce the bug.

julianxhokaxhiu added a commit that referenced this issue Oct 30, 2021
julianxhokaxhiu added a commit that referenced this issue Oct 30, 2021
@julianxhokaxhiu julianxhokaxhiu removed this from the 1.10 milestone Oct 31, 2021
@julianxhokaxhiu
Copy link
Owner

Patch dropped. I'll try to find another way.

@julianxhokaxhiu
Copy link
Owner

I reworked the patch. I double checked the load model function and basically, the only safe way to fix this is by replacing only the atoi call in that function and ALWAYS use only the first 4 bytes of the buffer to convert, as inside there's a check that doesn't allow the model size to be bigger than 4096.

@DLPB2
Copy link
Author

DLPB2 commented Nov 1, 2021

Yup that's how I do it too :)

@julianxhokaxhiu
Copy link
Owner

Thank you again for the hint, really appreciated!

@dolkow
Copy link

dolkow commented Oct 2, 2023

I heard about this in a 4-8 Productions video, and reacted to this "worst things" quote:

The model size is stored as a string - not a number. 1024 is literally 1 0 2 4 in memory [hex 31, 30, 32, 34]. So a function is called to convert this to a number [in this case 1024]. It's among the worst things I've ever seen in my life.

Allow me to speak in defense of the developers. :)

It uses a look up table to decide when the string of numbers has ended. 0-9 are assigned value of 0x84 - a check is done for 0x4.

This sounds like there's a table of information about characters, which is queried for the implementation of isdigit(), e.g. something like this implementation that I just made up:

static inline int isdigit(int c) {
    return (char_attributes[c] & CHATTR_DIGIT) != 0;
}

The PSYQ library seems to have done something very similar, but in a macro instead. The +1 is because the C standard mandates that isdigit() handles all valid char values as well as EOF -- it is defined as -1 in that library, and the content of the _ctype_ array is presumably shifted one step up to accomodate this special value.

If that is found, it will proceed to next number. if not, it decides the number has ended and the function returns the value.

Like @julianxhokaxhiu has mentioned above, this just looks like a standard atoi() implementation: convert until there are no more valid digits.

It's REALLY bad. What's happening is that in memory 1 0 2 4 is followed by a random number from elsewhere

Yes, this is where it gets bad. Looks like they've just allocated 4 bytes for this text, regardless of the value. As I understand it, they use 4-digit values very rarely, so that's probably why they didn't notice the problem.

if one of those values is 0x30-0x39 the game will think there is another number... and add that number too lol.

Yup, classic unterminated string problem. But that's really the only issue here; the conversion itself (lookup table + run-until-nondigit) is reasonable, and probably not even implemented by the game developers -- they just call atoi().

Sorry about necro'ing this old post, I just couldn't resist. :)

@DLPB2
Copy link
Author

DLPB2 commented Oct 2, 2023 via email

@dolkow
Copy link

dolkow commented Oct 3, 2023

this issue is because the string was not terminated

I agree.

and the mechanism they used to read it was broken.

I disagree. The mechanism could be a valid (possibly inlined) implementation of atoi, with no inherent problems. My point is: using atoi on a nonterminated string is sufficient to get the described behavior.

Allow me to demonstrate:

#include <stdio.h>
#include <string.h>

#ifndef CUSTOM
#include <stdlib.h>
#else
// imagine this part is in some .h files of the c library

#define EOF (-1)

#define NM (_CHATTR_DIGIT | _CHATTR_HEX | _CHATTR_PRINT)
#define PU (_CHATTR_PRINT | _CHATTR_UPPER)
#define PL (_CHATTR_PRINT | _CHATTR_LOWER)
#define SP (_CHATTR_PRINT | _CHATTR_SPACE)
#define P  (_CHATTR_PRINT)
#define XU (PU | _CHATTR_HEX)
#define XL (PL | _CHATTR_HEX)

static enum { // probably shared, not static, in reality.
    _CHATTR_UPPER = 0x01,
    _CHATTR_LOWER = 0x02,
    _CHATTR_DIGIT = 0x04,
    _CHATTR_HEX   = 0x08,
    _CHATTR_SPACE = 0x10,
    _CHATTR_PRINT = 0x80,
} _charattr[] = {
    0, // for EOF (-1)
    /* 0x00 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x08 */  0,SP,SP, 0, 0, 0, 0, 0,
    /* 0x10 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x18 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x20 */ SP, P, P, P, P, P, P, P,
    /* 0x28 */  P, P, P, P, P, P, P, P,
    /* 0x30 */ NM,NM,NM,NM,NM,NM,NM,NM,
    /* 0x38 */ NM,NM, P, P, P, P, P, P,
    /* 0x40 */  P,XU,XU,XU,XU,XU,XU,PU,
    /* 0x48 */ PU,PU,PU,PU,PU,PU,PU,PU,
    /* 0x50 */ PU,PU,PU,PU,PU,PU,PU,PU,
    /* 0x58 */ PU,PU,PU, P, P, P, P, P,
    /* 0x60 */  P,XL,XL,XL,XL,XL,XL,PL,
    /* 0x68 */ PL,PL,PL,PL,PL,PL,PL,PL,
    /* 0x70 */ PL,PL,PL,PL,PL,PL,PL,PL,
    /* 0x78 */ PL,PL,PL, P, P, P, P, 0,
    /* 0x80 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x88 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x90 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x98 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xa0 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xa8 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xb0 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xb8 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xc0 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xc8 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xd0 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xd8 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xe0 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xe8 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xf0 */  0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xf8 */  0, 0, 0, 0, 0, 0, 0, 0,
};

static inline int isdigit(int c) {
    int flags = _charattr[c+1]; // +1 to handle EOF
    return (flags & _CHATTR_DIGIT) != 0;
}

static inline int atoi(const char* s) {
    int v = 0;
    while (isdigit(*s)) {
        v *= 10;
        v += (*s - '0');
        s++;
    }
    return v;
}
#endif

// imagine this part is in a .c file

__attribute__((noinline))
int loadmodel(const char* sizestr) {
    int size = atoi(sizestr);
    /* blah blah blah */
    return size;
}

struct {
    char my_number[4];
    char morestuff[4];
} container;

int main() {
    memcpy(container.my_number, "1024", 4);
    memcpy(container.morestuff, "7abc", 4);
    printf("%d\n", loadmodel(container.my_number));
    return 0;
}

Compiling this with -DCUSTOM will use my custom implementation of isdigit and atoi, emulating what I believe their C library did. Compiling without that flag will use the normal atoi from standard libc.

This godbolt link shows the assembly and execution result of both.

Note that:

  • My custom implementation ends up doing exactly what you've described: lookup table, bitmask, stops when the relevant bit isn't set.
  • My custom implementation has no visible atoi call (because it's been inlined from either a function or a macro).
  • Both output "10247" -- i.e. my custom implementation matches the behavior of the standard atoi.

So, again, I think the conclusion here should be that the string is unterminated, which causes the problem -- and that's a nice find, kudos! -- but there's nothing inherently wrong with how the game is reading this value.

@julianxhokaxhiu
Copy link
Owner

but there's nothing inherently wrong with how the game is reading this value.

How would you explain then the bug we had to fix?

@dolkow
Copy link

dolkow commented Oct 3, 2023

By the string being unterminated. That's wrong and bad. Using atoi (which in turn uses a lookup table and continues as long as there are digits) is not.

@julianxhokaxhiu
Copy link
Owner

julianxhokaxhiu commented Oct 3, 2023

To be honest I see this discussion as pointless because you're defending the C implementation of atoi like we're saying that's the problem, when we clearly had to fix a bad implementation on the FF7 engine side of things where they used atoi on top of non-null terminated strings, which is the problem we had to fix.

The fact the patch replaces atoi ( or better to say the "inline atoi" ) in the game engine is because is the most common entry point that is used across of many other subs. We're not doing it "because we don't like atoi".

@dolkow
Copy link

dolkow commented Oct 3, 2023

I see this discussion as pointless

Probably, but it's also interesting. :)

you're defending the C implementation of atoi like we're saying that's the problem

I honestly thought you were, in this part:

It's among the worst things I've ever seen in my life. It uses a look up table to decide when the string of numbers has ended. 0-9 are assigned value of 0x84 - a check is done for 0x4. If that is found, it will proceed to next number. if not, it decides the number has ended and the function returns the value. It's REALLY bad.

@dolkow
Copy link

dolkow commented Oct 3, 2023

To be clear, I was just trying to add some information/analysis of the described behavior, which I thought was interesting (and not yet explicitly mentioned here). I'm sorry if it came across as criticism, that was not my intention.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants