Uses UTF8MB4 everywhere #8425
base: release-3.0
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Looks like 90% of this is just removing and hardcoding UTF-8 on everything. InnoDB is a fairly safe conversion. It just may take longer for larger forums on certain tables, but nothing can be done in the timeout protections to avoid that. It looks good from what I see.
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
I'll run some test upgrades. Merge conflicts need to be resolved first, though. Or is there a prerequisite PR?
Signed-off-by: Jon Stovell <jonstovell@gmail.com> # Conflicts: # Sources/Db/APIs/MySQL.php
Great! Thank you. 🙂 Merge conflict has been resolved.
Nope.
First test was an upgrade of a new, vanilla 2.1.4 forum to 3.0, via CLI. I followed the old 2.1.x protocol, where I would copy the upgrade files over from the /other folder, then run upgrade.php. DB: MySQL, version 8.4.0. Had a few errors; here is the complete output:
Okay, so it looks like we have some unrelated upgrader bugs to fix before you can even get to the point of testing the new ConvertUtf8() logic in this PR. Oh, the joys of the upgrader never cease. 😒
Looks like 2 things here...
Probably the same issues in a different form... But I attempted an install in the same environment. This comes up after entering the DB credentials, etc. Same issues in PHP 8.3 & 8.4:
Same errors occur in Unix.
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
3.0 install: 🍻 OK. The upgrader wouldn't run in CLI mode: no error, no nothing, immediate exit. In browser mode, I get an error:
If I ignore it and attempt to proceed, it immediately WSOD's with:
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
98dab02 fixes that.
Hm. I'm not sure about that one yet. I'll look into it next.
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
795fdda to e4690d3
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Okay, @sbulen, ready for another round. All upgrader errors introduced since merging from release-3.0 have been fixed according to my tests. Here's hoping that it passes your tests, too.
MySQL 8.4, PHP 8.4.2. 3.0 install was fine. Just attempted a 3.0 => 3.0 upgrade & got the same as before: CLI produces an immediate exit; when run from the browser you get a "try again" link that doesn't work. Also attempted a 2.1 => 3.0 upgrade & got this (partial log; 2.1 was in utf8mb3):
I cannot reproduce that. I've tried both the CLI and in browser repeatedly on an existing 3.0 database. What output do you see if you make the following temporary change in QueryString and then try again using CLI? Find:
Replace:
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Odd. I was also testing 2.1 → 3.0 using a database in utf8mb3, and I never saw that. Still, dc702ae should fix that.
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Hmmm... I'm still seeing this on a 2.1 => 3.0 upgrade:
I believe MySQL is still in a kind of limbo... For DB functions, I don't think we should ever use 'utf8', only 'utf8mb3' and 'utf8mb4'. My understanding is that the very definition of 'utf8' will change. In 'older' versions of MySQL it means 'utf8mb3'. At some point it will mean 'utf8mb4'. I don't know when that cutoff will be. But the point is that, across MySQL versions, 'utf8' is an ambiguous term. I am running MySQL 8.4. I believe that in 8.4, 'utf8' as an alias for 'utf8mb3' is deprecated. This kinda sorta explains that 'utf8' only appears in SHOW statements, etc:
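To make the ambiguity concrete, here is a minimal sketch (in Python, purely for illustration; it is not the code in this PR) of normalizing charset names before using them. The function names are hypothetical, and treating a bare 'utf8' as 'utf8mb3' is an assumption based on its current alias meaning as described above.

```python
def normalize_mysql_charset(name: str) -> str:
    """Return an unambiguous MySQL character set name.

    Assumption for this sketch: a bare 'utf8' is treated as the legacy
    3-byte 'utf8mb3', which is what the alias currently stands for.
    """
    cleaned = name.strip().lower()
    if cleaned == "utf8":
        return "utf8mb3"
    return cleaned


def charset_of_collation(collation: str) -> str:
    """Derive the charset from a collation name such as 'utf8_general_ci'.

    MySQL collation names start with the name of their character set,
    so the part before the first underscore is enough.
    """
    return normalize_mysql_charset(collation.split("_", 1)[0])


print(normalize_mysql_charset("UTF8"))             # utf8mb3
print(charset_of_collation("utf8mb4_unicode_ci"))  # utf8mb4
```

With something like this in place, nothing downstream ever sees the bare 'utf8' spelling, whichever way a given server reports it.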
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Please try b7db1fe.
The issue we are hitting right now (and that b7db1fe should solve properly) is not with the database but rather with mb_convert_encoding(). Most of the character set names that MySQL uses are recognized by mb_convert_encoding(), but not
Actually, wait. I have an even better idea than b7db1fe.
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Sorry if I'm doing that thing again where I change stuff while you are in the middle of testing. 🤪
No prob - I was doing some reading... Just a couple notes on the old logic, since the code doesn't look very familiar to me anymore...
The problem is that some of the old MySQL versions were horrible at enforcing charsets. So, there is a lot of utf8 data stuffed into old latin1 DBs. Just running a simple utf8 conversion double-encoded everything. The bounce-it-off-of-binary approach addresses that. Yes, I'm always kicking the posts on Chesterton's fence...
I don't think I said this clearly above, but this is ready for another round of testing whenever you are, @sbulen.
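For anyone following along, the double-encoding effect described above can be reproduced outside the database. This is a small Python sketch, not upgrader code; the function names are made up for illustration, and it only models the byte-level behavior (a latin1 column holding bytes that are really UTF-8).

```python
def naive_convert(stored: bytes) -> bytes:
    """Model a plain latin1 -> utf8 conversion.

    The bytes are interpreted as latin1 text and then re-encoded as
    UTF-8, which is what double-encodes data that was already UTF-8.
    """
    return stored.decode("latin-1").encode("utf-8")


def binary_bounce(stored: bytes) -> str:
    """Model the bounce-off-binary approach.

    The bytes are left untouched and only relabelled as UTF-8.
    """
    return stored.decode("utf-8")


original = "café"
stored = original.encode("utf-8")  # UTF-8 bytes sitting in a "latin1" column

print(naive_convert(stored).decode("utf-8"))  # cafÃ©
print(binary_bounce(stored))                  # café
```

The naive path turns each multi-byte character into two or more mangled characters, while the relabel path leaves the stored bytes exactly as they were.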
I don't think that's an issue. Since database transactions are always atomic, changing the whole table at once whenever possible is actually better and safer. When we do one column at a time using the method where we change the column to a binary encoding and then back, an interruption at an inopportune moment can leave the column sitting there in a binary encoding. If anything, we should probably build more safety checks around the column-based method in the new upgrader code.
I don't think that's accurate. The old logic was inherited from SMF 2.0, when the mbstring extension was not required by SMF and therefore the upgrader couldn't rely on it. Regarding double encoding, all that I found on the matter was a single StackOverflow discussion in which the original poster was trying to do something different than we are (and didn't seem to understand what they were doing very well). Perhaps your searches turned up something mine didn't, though, so if there's something more, please share the link. I am often wrong, after all, and always glad to discover a better understanding. 🙂
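To illustrate the difference between the two strategies compared above (whole table at once versus one column at a time through a binary type), here is a rough Python sketch that only builds the SQL strings. The helper names and the smf_messages/body example are hypothetical, and this is not the upgrader's actual code; the statements themselves use standard MySQL ALTER TABLE syntax.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def whole_table_sql(table: str, charset: str = "utf8mb4") -> str:
    """One statement that converts every text column in the table."""
    return f"ALTER TABLE {quote_ident(table)} CONVERT TO CHARACTER SET {charset}"


def column_bounce_sql(table: str, column: str, text_type: str,
                      binary_type: str, charset: str = "utf8mb4") -> list:
    """Two statements: text -> binary, then binary -> text in the new charset.

    If the process stops between the two, the column is left in the
    binary type, which is the risk mentioned above.
    """
    target = f"ALTER TABLE {quote_ident(table)} MODIFY {quote_ident(column)}"
    return [
        f"{target} {binary_type}",
        f"{target} {text_type} CHARACTER SET {charset}",
    ]


print(whole_table_sql("smf_messages"))
for statement in column_bounce_sql("smf_messages", "body", "TEXT", "BLOB"):
    print(statement)
```

Note that MODIFY replaces the whole column definition, so a real implementation would also have to restate attributes such as NOT NULL and defaults; the sketch leaves those out to keep the two-step window easy to see.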
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
I was just looking over the code again and noticed the spot that you were probably referring to; 14ea6ae should fix it. Again, ready for testing whenever you are. 🙂
Fixes #7938
Fixes #7173
Closes #6409
Closes #6406