fix(NODE-3803): Fix _id typing on collection create operations #3077

baileympearson · 2021-12-13T18:47:09Z

Description

This PR improves type inference around the _id fields for schemas used by the Collection class.

What is changing?

This PR updates the type inference that extracts the type of the _id field from schemas to

be easier to understand
correctly handle the case where an optional _id is passed in

See https://jira.mongodb.org/browse/NODE-3761.

This PR also updates the parameter types for write operations on the collection class to match the following behavior (discussed in this slack thread)

_id not defined on collection means the default ObjectId will be used, and won't be required on inserts;
_id defined as required on a collection means that the user will provide the _id on every insert
_id defined as optional means that the user is expecting to rely on a pkFactory method to generate a consistent id

Is there new documentation needed for these changes?

Yes (I think)

What is the motivation for this change?

Improved typing of the _id field on schemas results in a better developer experience for typescript consumers of the node driver.

Double check the following

Ran npm run check:lint script
Self-review completed using the steps outlined here
PR title follows the correct format: <type>(NODE-xxxx)<!>: <description>
Changes are covered by tests

- Update the inferType helper type to correctly infer optional _id types - Update the OptionalId helper to make the _id optional in all cases - Create a new helper to require _id if the _id is defined on the schema

src/mongo_types.ts

test/types/community/collection/insertX.test-d.ts

- Remove union type for optional _id fields - Add tests for insertX operations to test custom defined _id types

test/types/schema_helpers.test-d.ts

src/mongo_types.ts

test/types/community/collection/filterQuery.test-d.ts

dariakp

LGTM, thanks!

Changes addressed

octogonz · 2022-02-20T03:23:50Z

@baileympearson @dariakp I don't understand the intended API usage after this change.

Consider this example:

old API usage (before this PR)

interface Book {
  _id: ObjectId;
  title: string
}

const books: Collection<Book> = db.collection('books');

// Let's add a book. Normally books must have an _id, 
// but during creation the service will generate the _id for us,
// so we omit that field for this particular operation:
const newBook: WithoutId<Book> = {
  title: 'My Book'
};
await books.insertOne(newBook);

// Later if I retrieve the book, its _id field is present:
const foundBook: Book | null = await books.findOne({ title: 'My Book' });
if (foundBook) {
  console.log('The book's ID is: ' + foundBook._id);
}

The above code worked fine prior to this PR. However after upgrading the mongodb package, it seems that now we are supposed to change the interface to have an optional _id:

new API usage (after this PR)

interface Book {
  // "_id defined as optional means that the user is expecting to rely on 
  // a pkFactory method to generate a consistent id"
  _id?: ObjectId; 
  title: string
}

But then the type of foundBook._id becomes ObjectId | undefined. This requires me to check for undefined whenever I try to access it.

This doesn't make sense. The foundBook._id cannot ever be undefined, so why should I need these checks? The only situation where _id might be missing is the partially constructed input for APIs like insertOne().

Am I misunderstanding something? (Maybe we just need to expand the unit tests to better indicate the intended usage for this API.)

dariakp · 2022-02-22T16:18:00Z

@octogonz Thanks for reaching out. The intended usage for the API is to not specify the _id on the collection at all if you are intending to rely on the automatic _id insertion. The _id should only be specified (as optional or required) if you are intending to manage it manually. Please see my comment here for more detail about the reasoning behind this implementation and the use cases we are trying to support. As far as the _id coming back as ObjectId | undefined, it should not do that on findOne since findOne always returns a WithId<TSchema> where the _id will always be defined.

octogonz · 2022-02-23T05:11:34Z

@dariakp Thanks for explaining. I've removed the _id field from all our interfaces as suggested.

However the result is a bit awkward: Basically everywhere that used to say e.g. Book now must say WithId<Book>. If before it said Promise<Book[]>, now it needs to be Promise<WithId<Book>[]>. If before it said Map<string, Partial<Book>>, now it has to be Map<string, Partial<WithId<Book>>>.

In practice, this effect cascades across the code base, because you never know where a function may need to read the _id field of a Book.

And in theory, "book" objects always have an ID. So it seems odd to declare Book without an _id, and to complicate the 99% of non-insertion scenarios, versus for example doing WithoutId<> for the occasional call to insertOne().

I wonder if you might reconsider this API design choice.

dariakp · 2022-02-23T16:32:58Z

@octogonz We do appreciate the concern here, but as I highlighted in the linked comment:

We are aware that this may be different from the intuitive notion of a collection document type; unfortunately, since it is impossible to reverse engineer the desired insert shape from the final document shape, only vice versa, we cannot use the final shape of the document to fully define the collection for all driver operations.

The purpose of the collection type is to inform the driver of the expected document shape across all supported operations, hence this is currently the only viable api choice that would allow us to correctly assert _id types on inserts. To mitigate the transition pain, you could keep the _id on Book type and pass WithoutId<Book> when specifying the collection type, which should reduce the amount of code changes required overall.

justinmchase · 2022-07-08T21:47:00Z

This seems like a regression to me, its very frustrating. I'm reading the thought process behind it and its not good logic.

...since it is impossible to reverse engineer the desired insert shape from the final document shape...

    insertOne(doc: OptionalId<TSchema>): Promise<InsertOneResult<TSchema>>;

Boom done.

I need to reference the _id field all over my code so I need it to be non-optional on my schema. But now I'm stuck, unable to insert documents with the .insertOne(book) api because of this regression, unless I go down the path that @octogonz described, where I have to wrap everything with the WithId<Book> interface, its brutal.

Did you all test this out on a real code base before implementing it? I can't really see how this choice would have gotten past the prototype phase if it had actually been applied to a codebase even a single time.

This whole notion that you shouldn't require it to be defined on your schemas because you're using auto id's on creation seems to completely overlook the fact that you'll need to reference the _id on the schema still.

My workaround is to just cast it when inserting, even though that sucks too for different reasons:

const book = WithoutId<Book> {
  title: 'How to Design APIs'
} 
await books.insertOne(book as Book)

Please reconsider the comments of @octogonz because he's 100% right and this feels like a regression, what was there before was better.

justinmchase · 2022-07-08T21:51:01Z

Looking at the code in this PR in more detail, I would just recommend straight up reverting it.

dariakp · 2022-07-08T23:48:03Z

Please see reply here: #3100 (comment)

tl;dr There are users who need the id to be required on inserts. What you pass in as the collection type is the insert shape, and regardless of the insert shape, finds will always return a document with a correctly typed required id field, therefore there is no need to cast anything in ANY of the individual methods (read or write) if you accurately specify the id as optional in the collection insert schema.

interface IdPet {
  _id: ObjectId;
  name: string;
  age: number;
}

const database = client.db("<your database>");
const collection = db.collection<OptionalId<IdPet>>("<your collection>");
 
collection.insertOne({
  name: "Spot",
  age: 2
});

const puppy = collection.findOne();  // type of find result is IdPet
// puppy: IdPet

const puppyId = collection.findOne()._id; //_id is defined and accessible
// puppyId: ObjectId

We have more examples in our docs: https://www.mongodb.com/docs/drivers/node/master/fundamentals/typescript/

justinmchase · 2022-07-12T18:33:50Z

Ok not gonna lie that's pretty confusing, in my schema its not optional, that's probably the source of the confusion. If I set it to optional on the schema then its optional on the insert but included in the results. Doesn't make sense but it does work.

dariakp · 2022-07-12T22:23:37Z

I agree that it is counterintuitive, but it should make sense if you think of it from the perspective of what a typescript schema in the driver can actually enforce. When you specify a typescript schema for the collection, the driver has no way to know whether those types are actually accurate in the database itself - it is not uncommon for more than one microservice to share a collection and modify different attributes. So, you aren't expected to describe the exact shape of the data that exists in the database, instead, you are expected to tell the driver what fields your service will be interacting with (i.e., reading and writing). *(see note)

Ultimately, the collection schema is the "fields you care about when using the driver methods" schema with an indication of which fields you care about always and which only sometimes.

If you don't expect to touch a field at all, don't specify it at all. The driver won't know about it and so won't allow it on inserts or return it on finds (unless it's the _id).
If you care that a given field is always present on a document when your application interacts with it (i.e. when calling read AND write methods), specify the field as required. The driver will ensure that it's required on inserts and know to expect it to be returned on finds.
If you only expect your application to sometimes interact with a field, specify the field as optional. The driver will allow but not require it on inserts and will expect it to be optionally (unless it's the _id) present on finds.

The _id field is only special in that the driver knows that documents without an _id field cannot exist, so it always at least allows the _id field on inserts and knows that it is always present on finds with the default type of ObjectId OR the type you specify in your schema - even if you only sometimes or never care about interacting with it. However, for the users who always care about interacting with it, we allow the tightening of that condition from optional to required on inserts, since there is no harm in allowing you to set a property you ultimately do not use, but there is harm in not enforcing the requirement of a property you care about personally setting.

*Note: In fact, you could have objects of different shapes living side by side in the same collection, and if you have a module in your application that you wish to only work with a specific type of object, then you could specify only that object type as the collection schema used by that module and would get typescript-backed safety for operations performed by the driver inside that module.

octogonz · 2022-07-16T01:39:03Z

instead, you are expected to tell the driver what fields your service will be interacting with (i.e., reading and writing).

I guess my point was that reading should be favored over writing. In other words, this is the important basic scenario:

const foundBook: Book | null = await books.findOne({ title: 'My Book' });
if (foundBook) {
  console.log('The book's ID is: ' + foundBook._id);
}

In this scenario we expect Book to have a simple type expression (not a clumsy WithId<Book>). The interface should define our idea of what a "book" is, even if some other microservice might consider a "book" to have extra fields or different required/optional status.

By contrast, writing is the messier scenario:

const newBook: WithoutId<Book> = {
  title: 'My Book'
};
await books.insertOne(newBook);

The clunky WithoutId<Book> is more natural here, because we're not really describing a "book," we're describing the partial inputs to be used to initialize a new document.

tl;dr There are users who need the id to be required on inserts.

In the old design, those users would accomplish that by using Book with insertOne instead of WithoutId<Book>. What I understand from #3100 (comment) is that you wanted to provide an extra safeguard against mistakes -- a policy ensuring that insertOne() will reject WithoutId<Book>.

In the new implementation, this policy is controlled by the Book._id declaration:

export declare type OptionalUnlessRequiredId<TSchema> = TSchema extends {
    _id: any;
} ? TSchema : OptionalId<TSchema>;

The intended relationship between the schema and the _id in the typings is as follows:

If no _id is specified on the schema, it is inferred to be of type ObjectId and is optional on inserts.

If an _id is specified on the schema as required, then the _id type is inferred to be of the specified type and is required on inserts.

If an _id is specified on the schema as optional, it is inferred to be of the specified type and is optional on inserts:

But the effect is more like this:

For the minority of users who require the _id on inserts, explicitly declare Book._id as non-optional, mission accomplished 👍👍
For everyone else, insertOne() is broken unless you declare Book._id as optional. Oops, your Book interface no longer accurately describes the document. The mitigation is to use WithId<Book> everywhere. 😥

By adding a feature to help the minority, it seems like things got worse for the majority.

Isn't there some better way to solve this? Like maybe instead of TSchema extends { _id: any; } we check for some kind of _id: RequiredOnInsert<TSchema>. Or maybe Collection<T> has a parameterization as Collection<T, IdRequiredOnInsert> or CollectionWithExplicitId<T>. I don't have time to solve this type algebra problem, but maybe someone else can think of something...

@justinmchase It might help to create a proper GitHub issue rather than discussing this problem in a closed PR thread.

dariakp · 2022-07-16T15:46:26Z

@octogonz I appreciate the thoughtful feedback here. Opening a ticket in our Jira project: https://jira.mongodb.org/projects/NODE would be the best way to keep track of the discussion and potential changes.

Write schema is preferred over read schema because the write schema is what has the highest risk of leading to data inconsistency. However, if it truly is a major pain point for many users and there are suggestions for a better implementation, we are always open to hearing them.

octogonz · 2022-07-25T21:46:28Z

Here's a ticket : https://jira.mongodb.org/browse/NODE-4470

baileympearson added 5 commits December 13, 2021 13:44

fix(NODE-3803): Fix _id typing on collection create operations

f2863d7

- Update the inferType helper type to correctly infer optional _id types - Update the OptionalId helper to make the _id optional in all cases - Create a new helper to require _id if the _id is defined on the schema

rename new helper type and clean up doc comments

b072a74

Export new type && actually rename type

4faf513

Add clarifying comment to InferType

2965bd7

Add tsd tests

2cfa137

baileympearson changed the title ~~WIP fix(NODE-3803): Fix _id typing on collection create operations~~ fix(NODE-3803): Fix _id typing on collection create operations Dec 13, 2021

baileympearson marked this pull request as ready for review December 13, 2021 19:53

baileympearson requested review from nbbeeken, dariakp and durran December 13, 2021 19:54

baileympearson added 3 commits December 13, 2021 15:27

Fix linting error in test file

f34a4ca

Finish merge with main

aa2adcc

Fix issues introduced with merge

e94571d

dariakp added the Primary Review In Review with primary reviewer, not yet ready for team's eyes label Dec 14, 2021

dariakp requested changes Dec 14, 2021

View reviewed changes

src/mongo_types.ts Outdated Show resolved Hide resolved

test/types/community/collection/insertX.test-d.ts Show resolved Hide resolved

Address PR comments

ace4b95

- Remove union type for optional _id fields - Add tests for insertX operations to test custom defined _id types

dariakp requested changes Dec 14, 2021

View reviewed changes

test/types/schema_helpers.test-d.ts Outdated Show resolved Hide resolved

baileympearson added 4 commits December 15, 2021 10:39

Update InferIdType tests

1a41fb2

Prevent 'objectidlike' objects being used as filter conditions

c078125

fix spelling of object id in tests

cce4784

Remove usages of ObjectID and run linter

ba07253

nbbeeken previously requested changes Dec 16, 2021

View reviewed changes

src/mongo_types.ts Show resolved Hide resolved

test/types/community/collection/filterQuery.test-d.ts Outdated Show resolved Hide resolved

Fix linting and spelling errors

144217e

dariakp added Team Review Needs review from team and removed Primary Review In Review with primary reviewer, not yet ready for team's eyes labels Dec 16, 2021

baileympearson added 2 commits December 16, 2021 15:40

Test scenario where user does not specify collection type

f933c85

Add test one more test

deba500

dariakp approved these changes Dec 16, 2021

View reviewed changes

dariakp requested a review from nbbeeken December 16, 2021 22:32

dariakp merged commit f1979db into main Dec 16, 2021

dariakp deleted the NODE-3803/fix-_id-typing-on-collection-methods branch December 16, 2021 22:33

mark-langer mentioned this pull request Jan 25, 2022

Update mongodb driver to 4.3.1 Automattic/mongoose#11246

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(NODE-3803): Fix _id typing on collection create operations #3077

fix(NODE-3803): Fix _id typing on collection create operations #3077

baileympearson commented Dec 13, 2021 •

edited

Loading

dariakp left a comment

octogonz commented Feb 20, 2022 •

edited

Loading

dariakp commented Feb 22, 2022

octogonz commented Feb 23, 2022

dariakp commented Feb 23, 2022

justinmchase commented Jul 8, 2022 •

edited

Loading

justinmchase commented Jul 8, 2022

dariakp commented Jul 8, 2022 •

edited

Loading

justinmchase commented Jul 12, 2022

dariakp commented Jul 12, 2022

octogonz commented Jul 16, 2022

dariakp commented Jul 16, 2022

octogonz commented Jul 25, 2022 •

edited

Loading

fix(NODE-3803): Fix _id typing on collection create operations #3077

fix(NODE-3803): Fix _id typing on collection create operations #3077

Conversation

baileympearson commented Dec 13, 2021 • edited Loading

Description

What is changing?

Is there new documentation needed for these changes?

What is the motivation for this change?

Double check the following

dariakp left a comment

Choose a reason for hiding this comment

octogonz commented Feb 20, 2022 • edited Loading

dariakp commented Feb 22, 2022

octogonz commented Feb 23, 2022

dariakp commented Feb 23, 2022

justinmchase commented Jul 8, 2022 • edited Loading

justinmchase commented Jul 8, 2022

dariakp commented Jul 8, 2022 • edited Loading

justinmchase commented Jul 12, 2022

dariakp commented Jul 12, 2022

octogonz commented Jul 16, 2022

dariakp commented Jul 16, 2022

octogonz commented Jul 25, 2022 • edited Loading

baileympearson commented Dec 13, 2021 •

edited

Loading

octogonz commented Feb 20, 2022 •

edited

Loading

justinmchase commented Jul 8, 2022 •

edited

Loading

dariakp commented Jul 8, 2022 •

edited

Loading

octogonz commented Jul 25, 2022 •

edited

Loading