Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

sql: optimize DROP and TRUNCATE #2003

Closed
mberhault opened this issue Aug 6, 2015 · 7 comments
Closed

sql: optimize DROP and TRUNCATE #2003

mberhault opened this issue Aug 6, 2015 · 7 comments
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community
Milestone

Comments

@mberhault
Copy link
Contributor

TRUNCATE currently deletes all rows in the table before returning, and DROP just calls TRUNCATE followed by table descriptor deletions.

Instead, we should introduce a deleted field for tables causing the following:

  • best-effort data deletion
  • hidden from all sql statements
  • not included in accounting (well, we don't have accounting yet, so that's easy).

DROP would just mark the table as deleted, remove the name -> ID entry and exit. TRUNCATE would call DROP and create a new table with the same name.

@mberhault mberhault added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Aug 6, 2015
@petermattis
Copy link
Collaborator

I was thinking about this as well, though it is obviously lower priority than getting various bits of functionality working first. It would be great if we could figure out how to perform the deletion as part of normal compactions.

@mberhault
Copy link
Contributor Author

We don't have priorities, or I would have put it as "get around to it when someone's bored".

@petermattis
Copy link
Collaborator

We can make our own priority labels.

@tbg
Copy link
Member

tbg commented Aug 7, 2015

so the idea here is to make data deletion asynchronous? There's probably value in not making it too asynchronous. If I drop a table, I'd at the very least want the storage stats to update quickly, and to do that you might as well delete it all as soon as you can.

@mberhault
Copy link
Contributor Author

We haven't started doing accounting for structured data, but we'll probably need per-table and per-database stats, so we'll need some logic for aggregation.
My suggestion would be that marking a table a deleted (and therefore unreachable to the user) would remove that table from the accounting aggregation. From the user's point of view, this would provide the proper behavior: the table is no longer visible or accessible, accounting is decremented, and the name is reusable.

@vivekmenezes
Copy link
Contributor

related to

#3274

@vivekmenezes
Copy link
Contributor

Implementing Truncate, one thought is to use a new table ID and delete all the keys from the old table ID. Unfortunately that's complicated by the fact that we have cross table references from foreign keys and interleaved tables that use the table ID as the reference, and we would also need to update those. I think it's best to just disable the table from reads/writes, truncate it and reenable it. I'll try that option for now.

@vivekmenezes vivekmenezes modified the milestones: Later, 1.0 Feb 26, 2017
@dianasaur323 dianasaur323 added the O-community Originated from the community label Apr 24, 2017
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Jul 17, 2017
The DROP TABLE is deemed complete as soon as the table name is no
longer in use. The table data GC cleanup is executed asynchronously
through the asynchronous schema change path, and can be made more
performant later.

fixes cockroachdb#14279

related to cockroachdb#2003
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Jul 17, 2017
In this change TRUNCATE is executed by swapping the
ID of a table. The old id is dropped and a new id
is created for the table. The old table descriptor is
gc-ed through the asynchronous schema changer.

fixed cockroachdb#2003
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Jul 17, 2017
The DROP TABLE is deemed complete as soon as the table name is no
longer in use. The table data is deleted asynchronously
through the asynchronous schema change path, and can be made more
performant later.

fixes cockroachdb#14279

related to cockroachdb#2003
vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Jul 18, 2017
In this change TRUNCATE is executed by swapping the
table ID. The old id is dropped and a new id
is created for the table. The old table descriptor is
deleted through the asynchronous schema changer.

Note: with this change TRUNCATE is no longer transactional.
The TRUNCATE itself is implemented in a transaction, but
the name -> old id map could be present on a node after the
transaction resulting in data being added using the old id.
The newly inserted data using the old mapping will eventually
be deleted.

fixed cockroachdb#2003
@vivekmenezes vivekmenezes modified the milestones: 1.1, Later Jul 20, 2017
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community
Projects
None yet
Development

No branches or pull requests

6 participants