Use UTF8MB4 everywhere #7938

Open
3 tasks done
Sesquipedalian opened this issue Dec 4, 2023 · 7 comments · May be fixed by #8425

@Sesquipedalian
Member

Sesquipedalian commented Dec 4, 2023

  • Upgrade all MySQL tables to use utf8mb4.
  • Get rid of Utils::$context['utf8']
  • Change Utils::fixUtf8mb4() to encode or decode as appropriate based on database. (This will allow us to update old data on the fly.)
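The third task can be illustrated with a rough sketch. This is not SMF's code: the function names and the `db_supports_mb4` flag are hypothetical, and it assumes 4-byte characters are stored as decimal numeric entities when the database cannot hold them.

```python
import re

# Hypothetical stand-in for the behaviour proposed for Utils::fixUtf8mb4();
# names and the db_supports_mb4 flag are assumptions, not SMF's API.
_NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def encode_4byte_chars(text: str) -> str:
    """Replace characters above U+FFFF (4 bytes in UTF-8) with numeric entities."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch for ch in text)


def decode_4byte_entities(text: str) -> str:
    """Turn numeric entities for code points above U+FFFF back into characters."""
    def repl(match: re.Match) -> str:
        code = int(match.group(1))
        # Leave other entities (e.g. &#60;) untouched; only 4-byte
        # characters are assumed to have been entity-encoded.
        return chr(code) if 0xFFFF < code <= 0x10FFFF else match.group(0)

    return _NUMERIC_ENTITY.sub(repl, text)


def fix_utf8mb4(text: str, db_supports_mb4: bool) -> str:
    """Decode when the database can store utf8mb4, otherwise encode."""
    if db_supports_mb4:
        return decode_4byte_entities(text)
    return encode_4byte_chars(text)
```

Calling `fix_utf8mb4()` with `db_supports_mb4=True` on old rows as they are read is one way the "update old data on the fly" idea could work.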
@Sesquipedalian Sesquipedalian added Database Charset/Encoding UTF8 & mb4 encoding related issues labels Dec 4, 2023
@Sesquipedalian Sesquipedalian added this to the 3.0 Alpha 2 milestone Dec 4, 2023
@live627
Contributor

live627 commented Dec 4, 2023

ref #6409

@jdarwood007
Member

The work on this will depend on the upgrader/installer logic. We have plans for that overhaul, so we need to get that in place before we can even get a PR for the upgrade logic.

@Sesquipedalian
Member Author

Sesquipedalian commented Dec 4, 2023

That's a good point. I'll adjust the milestones in our internal roadmap. There isn't a specific issue for the upgrader and installer improvements here on GitHub yet, but basically I'm going to move the "Installer and upgrader improvements" item from Alpha 3 to Alpha 2.

@sbulen
Contributor

sbulen commented Dec 6, 2023

Note that some of the DB changes in #6409 aren't required if the DB meets some minimum requirements, e.g., tables must be InnoDB & must have a row_format that is not COMPACT or REDUNDANT.

These DB constraints are a problem for DBs created prior to MySQL 5.7 and simply migrated forward. Note the 2.1 upgrader did not change or address these.

So... It is quite likely we have a lot of MyISAM 2.1 DBs out there. Or, even if InnoDB, COMPACT rows will create a problem. This is explained in the writeup for #6409.

Note also that if a table's row format (which follows innodb_default_row_format unless set explicitly) is COMPACT or REDUNDANT, the table would need to be rebuilt before converting to MB4. (EDIT: An ALTER TABLE to change the row format should be sufficient...)

The approach in #6409 was to modify some of the indexes, to sidestep these constraints, so the conversion would be successful no matter what the engine & row format.
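For anyone wondering where the index limits come from, here is the arithmetic as a small sketch. It assumes InnoDB's documented index key prefix limits (767 bytes for COMPACT/REDUNDANT, 3072 bytes for DYNAMIC/COMPRESSED with the default 16 KB page size); the function name is just for illustration.

```python
# Maximum bytes per character for each MySQL character set.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}

# InnoDB index key prefix limits in bytes (assumes the default 16 KB page size).
INDEX_PREFIX_LIMIT = {
    "REDUNDANT": 767,
    "COMPACT": 767,
    "DYNAMIC": 3072,
    "COMPRESSED": 3072,
}


def max_indexed_chars(row_format: str, charset: str) -> int:
    """Longest column prefix, in characters, that fits in a single index key."""
    return INDEX_PREFIX_LIMIT[row_format.upper()] // BYTES_PER_CHAR[charset]


print(max_indexed_chars("COMPACT", "utf8"))     # 255
print(max_indexed_chars("COMPACT", "utf8mb4"))  # 191
print(max_indexed_chars("DYNAMIC", "utf8mb4"))  # 768
```

So a varchar(255) index that fits under utf8 with COMPACT rows no longer fits under utf8mb4, unless the index is cut to 191 characters or the table is moved to DYNAMIC.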

@sbulen
Contributor

sbulen commented Dec 7, 2023

Bottom line is that either DB changes are needed (limiting the indexed columns to 191 characters), or the upgrader needs steps to convert the engine (InnoDB) & row format (DYNAMIC).

At the very least, a check/error to get folks to do that first manually.
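A pre-flight check along those lines might look roughly like the sketch below. It is an assumption-laden illustration, not upgrader code: `TableInfo` and `preflight_statements` are hypothetical names, the table names are made up, and the input is assumed to come from the TABLE_NAME, ENGINE and ROW_FORMAT columns of information_schema.TABLES.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """One row of table metadata (hypothetical container for this sketch)."""
    name: str
    engine: str
    row_format: str


def preflight_statements(tables: list[TableInfo]) -> list[str]:
    """Return the ALTER TABLE statements an admin would need to run first."""
    statements = []
    for table in tables:
        needs_engine = table.engine.lower() != "innodb"
        # Non-InnoDB tables get an explicit row format as well, so the result
        # does not depend on the server's innodb_default_row_format.
        needs_row_format = needs_engine or table.row_format.upper() in (
            "COMPACT",
            "REDUNDANT",
        )
        clauses = []
        if needs_engine:
            clauses.append("ENGINE=InnoDB")
        if needs_row_format:
            clauses.append("ROW_FORMAT=DYNAMIC")
        if clauses:
            statements.append(f"ALTER TABLE `{table.name}` {', '.join(clauses)};")
    return statements


if __name__ == "__main__":
    example = [
        TableInfo("smf_messages", "MyISAM", "Dynamic"),
        TableInfo("smf_members", "InnoDB", "Compact"),
        TableInfo("smf_topics", "InnoDB", "Dynamic"),
    ]
    for statement in preflight_statements(example):
        print(statement)
```

An empty result would mean the tables already meet the requirements; otherwise the upgrader could either stop with an error listing the statements or run them itself.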

@Sesquipedalian
Member Author

Thanks, @sbulen. That's helpful.

@sbulen
Contributor

sbulen commented Dec 8, 2023

FWIW, I'd suggest going all InnoDB & DYNAMIC. Greater consistency & more modern across the board.

One other thing to think about... Entity conversion. Today, utf8mb4 (4-byte) chars are entity-encoded.

The whole point here is to not have to do that...

At upgrade time??? Under database maintenance as well???

Search impacts here, too, as words may be based on entity-encoded words. #6405

@Sesquipedalian Sesquipedalian self-assigned this Jun 29, 2024
@Sesquipedalian Sesquipedalian linked a pull request Jan 30, 2025 that will close this issue