MariaDB vs MySQL adaptations #206

Merged
merged 2 commits into from
Aug 9, 2017

@ryanrath ryanrath commented Aug 4, 2017

Description

  • Due to the inability of MySQL to handle UTF-8 -> latin1 coercion (which
    MariaDB apparently has no problem with), we cast all values that are provided
    as strings to BINARY during comparison with column values that are stored as
    latin1 (which are also cast to BINARY). This was done for the following files:
    • acl-config
    • create_public_user.sql

Motivation and Context

When these scripts were executed against a MySQL server, the queries involved errored out; the same queries ran without error against a MariaDB instance.

Tests performed

Manual Testing:

  • Select a query that currently fails with an "illegal mix of collations" error.
  • Execute: SELECT COLLATION('dfsdf');
    • This reports the collation your database uses for string literals.
  • Execute: SELECT table_schema, table_name, column_name, character_set_name, collation_name FROM information_schema.columns WHERE table_schema IN ('<your_schema_here>') AND table_name IN ('<your_table_name_here>') ORDER BY table_schema, table_name, ordinal_position;
    • This reports the character set and collation used by each column in table_schema.table_name.
    • If the collation of the string literal differs from that of the columns it is compared against, add casts wherever the selected query contains a comparison such as:
      • table.column = 'textual data', changing it to
      • BINARY table.column = BINARY 'textual data'
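
The comparison step in the manual test above can be sketched in Python. This is only an illustration, not code from this PR: the function name `columns_needing_binary_cast` and the sample schema, table, and column names are hypothetical. It assumes the rows have the same five fields as the information_schema.columns query above, and that the literal collation is the value returned by SELECT COLLATION('...').

```python
from typing import Iterable, List, Optional, Tuple

# One row per column, in the order selected by the information_schema query:
# (table_schema, table_name, column_name, character_set_name, collation_name)
ColumnRow = Tuple[str, str, str, Optional[str], Optional[str]]


def columns_needing_binary_cast(
    literal_collation: str, column_rows: Iterable[ColumnRow]
) -> List[str]:
    """Return the columns whose collation differs from the literal collation."""
    mismatched = []
    for schema, table, column, _charset, collation in column_rows:
        # Non-character columns (e.g. integers) report a NULL collation; skip them.
        if collation is None:
            continue
        if collation != literal_collation:
            mismatched.append(f"{schema}.{table}.{column}")
    return mismatched


# Hypothetical sample rows, shaped like the query result.
rows = [
    ("example_schema", "example_table", "username", "latin1", "latin1_swedish_ci"),
    ("example_schema", "example_table", "id", None, None),
    ("example_schema", "example_table", "email", "utf8", "utf8_general_ci"),
]

print(columns_needing_binary_cast("utf8_general_ci", rows))
# prints ['example_schema.example_table.username']
```

Any column reported here is a candidate for the BINARY cast on both sides of the comparison.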

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project as found in the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

- Due to the inability of MySQL to handle UTF-8 -> latin1 coercion (which
  MariaDB apparently has no problem with), we cast all values that are provided
  as strings to BINARY during comparison with column values that are stored as
  latin1. This was done for the following files:
    - acl-config
    - create_public_user.sql
- Making sure that BINARY is used only where necessary.
@ryanrath ryanrath requested review from plessbd, tyearke and smgallo August 8, 2017 15:35

@tyearke tyearke left a comment

Having discussed this with @ryanrath, I think this is the least risky way of solving the problem. It works so long as:

  1. The character set of one string is a subset of the character set of the other string, and
  2. Both character sets use the same binary representations for shared characters.

Since I can only ever foresee us using latin1, utf8, or utf8mb4, this should continue to work even if we change character set and collation settings in the future.

EDIT: Only the first 128 characters of latin1 - the ASCII characters - are binary compatible with utf8 and utf8mb4. As long as we stick to those characters for this purpose before we switch to UTF-8, we're fine.
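
A quick way to see the binary-compatibility point is to compare encodings directly. The sketch below is an illustration only, and the helper name `is_binary_compatible` is hypothetical. It uses Python's `latin-1` codec (ISO-8859-1) as a stand-in for MySQL's latin1, which is an assumption: MySQL's latin1 is based on cp1252, but the two agree on the ASCII range, which is the part that matters here.

```python
def is_binary_compatible(text: str) -> bool:
    """True if text has identical bytes when encoded as latin1 and as UTF-8."""
    try:
        return text.encode("latin-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains characters that latin1 cannot represent at all.
        return False


print(is_binary_compatible("pub"))   # ASCII only
# prints True
print(is_binary_compatible("café"))  # 'é' is one byte in latin1, two bytes in UTF-8
# prints False
```

So a BINARY comparison between a latin1 column and a UTF-8 literal only matches reliably while both values stay within ASCII.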

smgallo commented Aug 8, 2017

I agree; any change should be to utf8mb4. We encountered similar issues ingesting data from Postgres, which supports 4-byte UTF-8 characters by default, and ended up converting to latin1 as a temporary solution.

@ryanrath ryanrath merged commit eb2e282 into ubccr:xdmod7.0 Aug 9, 2017
@tyearke tyearke added this to the v7.0.0 milestone Aug 14, 2017
@tyearke tyearke added the bug Bugfixes label Aug 14, 2017
ryanrath added a commit to ryanrath/xdmod that referenced this pull request Sep 18, 2017
chakrabortyr pushed a commit to chakrabortyr/xdmod that referenced this pull request Oct 17, 2017