Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Specify collation on columns #19275

Closed
arrkaye opened this issue Dec 11, 2019 · 20 comments · Fixed by #20602
Closed

Specify collation on columns #19275

arrkaye opened this issue Dec 11, 2019 · 20 comments · Fixed by #20602
Assignees
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Milestone

Comments

@arrkaye
Copy link

arrkaye commented Dec 11, 2019

Hi,

This has been raised before and closed, but I would like to emphasis how useful it is.

I want to be able to specify the collation on a column, perhaps through an annotation and have EF Core respect and generate the column as such in the migration.

I have temporarily worked around this by adding some SQL to ALTER the column manually in the migration, but this is a slightly tenuous solution as any subsequent ALTER on the column will reset it back to default.

I posit that setting a column as case-sensitive in the DB is a usual and valid use-case and ideally would like it to be supported in EFCore.

Thank you for considering it.

Ark

@ajcvickers
Copy link
Contributor

@arrkaye Can you point to where this was closed before?

@ajcvickers ajcvickers added this to the Backlog milestone Dec 15, 2019
@roji
Copy link
Member

roji commented Feb 6, 2020

Duplicate of #19275?

@roji roji removed this from the Backlog milestone Feb 6, 2020
@ajcvickers
Copy link
Contributor

@roji This is 19275. :-)

@roji
Copy link
Member

roji commented Feb 6, 2020

Oops, I was referring to #6577 but on second look, that issue is about specifying collation (and other things) on the database at creation time, rather than as on the column here. So not at a dup..

@ajcvickers
Copy link
Contributor

ajcvickers commented Feb 6, 2020

@roji Agreed, but my conclusion from thinking about this last night, backed up by the discussions on the other issue today, is that we need to take a holistic approach to collations.

Ideally we should be able to run efficient queries when the collation and query are compatible, and throw good, provider-specific exception messages for other cases indicating how to get to efficient queries for other cases.

I'll pull some issues out of the milestone and we'll discuss with the team on Friday.

You already did!

@smitpatel
Copy link
Contributor

EF.Collate!

@roji
Copy link
Member

roji commented Feb 7, 2020

EF.Collate!

Thinking about this in general, per-operation collation support (as opposed to column/database collation settings) is a bit scary... Assuming we add this to SqlBinaryExpression (in the extreme case we can make SqlExpression IAnnotatable 😨), preserving these across all transformations is not going to be easy, can see lots of bugs coming.

@smitpatel
Copy link
Contributor

EF.Collate is quite similar to EF.StoreType.

@roji
Copy link
Member

roji commented Feb 7, 2020

Right, but we already have type mapping on SqlExpression, and a whole inference infrastructure for composing expressions with regards to type mapping... But if you think this is simple 😃

@smitpatel
Copy link
Contributor

I am not saying it is simple but we would need to do it regardless ;)

@roji
Copy link
Member

roji commented Feb 7, 2020

Then possibly related: #6717 (comment) :trollface:

@ajcvickers
Copy link
Contributor

Tracking this via #19866 and linked issues.

@roji
Copy link
Member

roji commented Apr 8, 2020

Reopening to track column collation specification.

@roji
Copy link
Member

roji commented Apr 11, 2020

Column specs with both collation and generated:

MySQL

Both orderings work:

CREATE TABLE BothWork
(
    name1 VARCHAR(32) COLLATE utf8mb4_german2_ci AS ('hello'),
    name2 VARCHAR(32) AS ('hello') COLLATE utf8mb4_german2_ci
);

Sqlite

Both orderings work:

CREATE TABLE works (
    id INT,
    name1 TEXT COLLATE NOCASE AS ('hello') STORED,
    name2 TEXT AS ('hello') STORED COLLATE NOCASE
);

PostgreSQL

Both orderings work:

CREATE TABLE works
(
    name1 TEXT COLLATE "en-US-x-icu" GENERATED ALWAYS AS ('hello') STORED,
    name2 TEXT GENERATED ALWAYS AS ('hello') STORED COLLATE "en-US-x-icu"
);

SQL Server

... wait for it... wait for it... Only one ordering works (likely because of SQL Server's peculiar syntax around generated columns).

CREATE TABLE works
(
    id INT,
    name AS 'hello' COLLATE German_PhoneBook_CI_AS
);

CREATE TABLE doesntwork
(
    id INT,
    name1 COLLATE German_PhoneBook_CI_AS AS 'hello'
);

@roji roji changed the title Collation on a column in CodeFirst Specify collation on columns and at the database level Apr 11, 2020
@roji roji changed the title Specify collation on columns and at the database level Specify collation on columns Apr 11, 2020
roji added a commit to roji/efcore that referenced this issue Apr 11, 2020
* Added metadata and migration support for column- and database- level
  collations.
* Scaffolding implemented for SQL Server. Sqlite seems to not be
  reporting the collation in pragma_table_info (need newer version?)

Closes dotnet#19275
Closes dotnet#6577
@roji roji added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Apr 11, 2020
roji added a commit to roji/efcore that referenced this issue Apr 11, 2020
* Added metadata and migration support for column- and database- level
  collations.
* Scaffolding implemented for SQL Server. Sqlite seems to not be
  reporting the collation in pragma_table_info (need newer version?)

Closes dotnet#19275
Closes dotnet#6577
@minichma
Copy link

@ajcvickers

Can you point to where this was closed before?

#2779

We're also lacking this feature for quite some time now and almost ran into some serious security issues due to this. We have columns containing cryptographic hashes and IDs encoded in Base64. Due to the lack of support for column collations by EF we modified affected columns within CodeFirst migrations. At a later point in time we applied some other modification to the column and created another CodeFirst migration, this time without manual modification. As EF didn't know about the collation, it created migration code that didn't consider this aspect and we almost moved back to case insensitive collation, opening vulnerabilities for collision attacks.

I think, having base64-encoded fields or other columns where collation matters, is not so uncommon. Please add this feature!

@roji
Copy link
Member

roji commented Apr 12, 2020

@minichma thanks for the issue link.

This feature is being added for 5.0 - a PR is already out, so you should be able to try this in an upcoming preview.

Out of sheer curiosity, any specific reason to store BASE64 in a database text field rather than the binary data directly in a binary/varbinary? That would avoid any text-processing issues (such as collation, but there are others), and also be a more efficient.

@minichma
Copy link

@roji In some cases we see IDs that happen to be composed from base64 string. But regardless of the way they have been composed, they are just plain strings but in contrast to (e.g.) GUIDs, they require case sensitive treatment. The shape of the IDs often has historic reasons and is not something up to the DB designer.

In other cases we see key material where the standard way of storing it is base64 encoded, like it's the case with PEM-encoded keys.

But even when talking about plain byte arrays, storing them string-encoded often has benefits over storing as varbinary, mainly because on C# side string is easier to handle than byte[]. One aspect is, that comparison of byte[] behaves differently when part of an EF query than when executed locally. While a reference comparison locally, it's a value comparison when run against the DB. That's great on the one hand and the reason for this is obvious, but it also makes code less readable, because the reader always has to keep the semantic differences in mind and consider the context of the read code. When dealing with strings, this aspect is somewhat relaxed, but only if collation is treated appropriately for each field.

@roji
Copy link
Member

roji commented Apr 13, 2020

Thanks for the added info @minichma, makes sense!

roji added a commit that referenced this issue Apr 14, 2020
* Added metadata and migration support for column- and database- level
  collations.
* Scaffolding implemented for SQL Server. Sqlite seems to not be
  reporting the collation in pragma_table_info (need newer version?)

Closes #19275
Closes #6577
@ajcvickers ajcvickers modified the milestones: 5.0.0, 5.0.0-preview4 Apr 20, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0-preview4, 5.0.0 Nov 7, 2020
@marmagni
Copy link

marmagni commented Nov 9, 2020

Here's my solution:

public static PropertyBuilder<string> AsVarchar(this PropertyBuilder<string> propertyBuilder, int? maxLength = null, COLLATE? collate = null)
        {
            var size = maxLength?.ToString() ?? "max";
            var collateType = collate.HasValue ? " COLLATE " + collate.ToString() : "";
            var builder = propertyBuilder.HasColumnType($"varchar({size}){collateType}");

            return builder;
        }

The enum type:

public enum COLLATE
    {
        SQL_LATIN1_GENERAL_CP1_CI_AI
    }

Usage example:

builder.Property(x => x.Name).AsVarchar(200, COLLATE.SQL_LATIN1_GENERAL_CP1_CI_AI)

@roji
Copy link
Member

roji commented Nov 9, 2020

@marmagni in case you're not aware, EF Core 5.0 has 1st-class support for collations, so you don't have to resort to the above; see the docs.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. type-enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants