
sql: fix perf problems with DROP TABLE/INDEX #7499

Closed
petermattis opened this issue Jun 28, 2016 · 21 comments
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior.

@petermattis
Collaborator

DROP {TABLE,INDEX} both issue a single transactional DelRange operation to delete all of the table/index data. In large tables, this creates an extremely large transaction which has various performance problems (that should be fixed separately).

Since DROP {TABLE,INDEX} have already updated the table descriptor and other metadata to make the table/index unavailable, we should be able to use the chunked-style implementation used for backfilling indexes/columns upon creation. Even better, it isn't immediately obvious why we need to use a transaction at all here. @tschottdorf would it be reasonable to perform a single non-transactional DelRange for all of a table's data?
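The chunked-style approach mentioned above can be sketched roughly as follows. This is a toy simulation, not CockroachDB code: an in-memory map stands in for the KV store, and `chunkedDelete` is a hypothetical name; real code would scan and delete batches of keys against the KV layer, one transaction per chunk, instead of one giant transactional DelRange.

```go
package main

import (
	"fmt"
	"sort"
)

// chunkedDelete simulates chunked deletion of a table's key span:
// rather than one huge transactional DelRange, delete at most
// chunkSize keys per "transaction". Returns the number of batches.
func chunkedDelete(kv map[string]string, start, end string, chunkSize int) int {
	batches := 0
	for {
		// Scan the keys currently in [start, end).
		var keys []string
		for k := range kv {
			if k >= start && k < end {
				keys = append(keys, k)
			}
		}
		if len(keys) == 0 {
			return batches
		}
		sort.Strings(keys)
		if len(keys) > chunkSize {
			keys = keys[:chunkSize]
		}
		// Delete this chunk in its own (simulated) transaction.
		for _, k := range keys {
			delete(kv, k)
		}
		batches++
	}
}

func main() {
	kv := map[string]string{}
	for i := 0; i < 10; i++ {
		kv[fmt.Sprintf("/table/1/%03d", i)] = "v"
	}
	n := chunkedDelete(kv, "/table/1/", "/table/1/~", 3)
	fmt.Println(n, len(kv)) // 4 batches (3+3+3+1), 0 keys remain
}
```

Each batch commits independently, so no single transaction grows with table size — the property the backfiller relies on.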

@tbg
Member

tbg commented Jun 28, 2016

Yes, that would be reasonable but is not trivial: we currently auto-wrap range-spanning operations in a transaction at the DistSender level (because we generally have to). It's worth allowing for something like that, though. The "trivial" solution is a flag on BatchRequest (or a special consistency mode; we could even make INCONSISTENT mean that range ops are allowed to span ranges without requiring a txn). Those operations could also be heavily parallelized.
Once we have that, the nice part is that we don't have to do any upstream chunking at all.

It's also good to keep in mind that with leader-proposed Raft, these are precisely the worst case: they essentially send snapshot-sized commands through Raft, so it would be good to avoid them. Also, we don't need MVCC at all. Instead of DeleteRange, maybe we should invest in a ClearRange command which would perform the deletion as a side effect instead.

The two paragraphs above are mostly orthogonal, so forgetting about ClearRange for now and looking at the other thing is a good way to start.

@andreimatei
Contributor

andreimatei commented Jun 28, 2016

I was thinking we'd wait for a distributed implementation of DROP at the SQL level; it should be trivial to parallelize, having each range clear itself non-transactionally (or one txn per range). This would avoid having to introduce any special KV support.

In the meantime, we can implement the truncation in a streaming fashion, with a series of DELETE ... LIMIT statements.

@petermattis
Collaborator Author

Related, TRUNCATE <table> performs a DelRange on all of the table data. I wonder if we should turn TRUNCATE into a schema change operation.

FYI, see #2003 which is related to (or a dup of) this issue.

@petermattis
Collaborator Author

@tschottdorf Can we really avoid using MVCC when dropping index/table data? How would that interact with time-travel queries?

@tbg
Member

tbg commented Jun 28, 2016

Ah, forgot about those. Yes, we need to keep that for time-travel queries.

What's also potentially interesting is that we wouldn't really need to actively delete the data if we could instead set a TTL and let the GC queue toss out the data over time, as permitted by time-travel considerations. So if TTLs are anywhere on our roadmap, that's worth a thought.

@tbg
Member

tbg commented Jun 28, 2016

Regardless of how we go about it, we should also figure out why the intents aren't proactively resolved. In an ideal world, the intent range is attached to EndTransaction by TxnCoordSender and used even when the aborting EndTransaction isn't valid any more. This is better since a single ResolveIntentRange can clear out the whole swath of keyspace we're otherwise running into one by one. Worth investigating what exactly the process is.

@petermattis
Collaborator Author

Allowing dropped table and index data to be deleted asynchronously by the GC queue is certainly attractive, though (as mentioned in #2003) we have to consider how that will eventually interact with accounting.

Not exactly sure what you're envisioning for TTLs. I think we could enhance the GC queue so that it could determine when a range of data was deleted. For example, if TableDescriptor had a deleted_at field, the GC queue could see if the table associated with the range it is considering has been deleted. This would essentially be moving per-key tombstones to a per-table tombstone.

@tbg
Member

tbg commented Jun 28, 2016

I'm not envisioning anything for TTLs right now, but it has been requested a bunch of times and if we were to entertain it, then there would be an elegant way to remove old tables as they go out of scope by means of the GCQueue (which would be nice because that's also the safeguard of time travel queries).
Hauling the GC queue up to a SQL level seems like a project I'm not sure is worth it.

I think we'll do fine with what we have, at least for some time, once the deletion is less completely stupid than it is now.

Since the current situation is pretty bad, we should do something about it. I don't know if waiting for distributed SQL is an option or even the right thing to do. At some point we'll have to figure out what the responsibilities of DistSender vs DistSQL are. I'd rather we do that sooner than later.

@petermattis
Collaborator Author

No need to haul the GC queue up to the SQL level in order to give it knowledge of keys. For example, we could add hooks to the GC queue which the SQL level could hook into: given a span of keys, return a list of spans with corresponding deletion timestamps.
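The proposed hook could have a shape along these lines. This is a sketch under stated assumptions — `Span`, `DeletedSpan`, and `DeletionHook` are made-up names for illustration, not existing interfaces: the GC queue hands the hook the span a replica covers, and gets back the sub-spans known (at the SQL level) to be deleted, with their deletion timestamps.

```go
package main

import "fmt"

// Span is a [Key, EndKey) range of the keyspace.
type Span struct{ Key, EndKey string }

// DeletedSpan pairs a span with the timestamp at which its data was
// logically deleted (e.g. when the owning table was dropped).
type DeletedSpan struct {
	Span
	DeletedAtNanos int64
}

// DeletionHook is the hypothetical callback the GC queue would invoke:
// given a span of keys, return the sub-spans with corresponding
// deletion timestamps.
type DeletionHook func(span Span) []DeletedSpan

func main() {
	// A toy hook that knows table 51 (span [/51,/52)) was dropped at t=100.
	hook := DeletionHook(func(s Span) []DeletedSpan {
		if s.Key <= "/51" && s.EndKey >= "/52" {
			return []DeletedSpan{{Span{"/51", "/52"}, 100}}
		}
		return nil
	})
	for _, ds := range hook(Span{"/50", "/60"}) {
		fmt.Printf("gc span [%s,%s) deleted at %d\n", ds.Key, ds.EndKey, ds.DeletedAtNanos)
	}
}
```

The KV layer never learns what a table is; it only sees opaque spans and timestamps, which keeps the dependency pointing the right way.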

@vivekmenezes
Contributor

@petermattis not sure what you had in mind. Another approach would be to create a new system table holding all the key spans that have been deleted.

@vivekmenezes
Contributor

Talked to Peter: the API, given a span of keys, will return spans with corresponding deletion timestamps. The implementation will identify all table IDs that fit within the span provided, and check whether those tables, or the indexes within them, have been deleted.

@tbg
Member

tbg commented Jun 28, 2016

That's possible, as is a special RPC that marks the whole range for deletion (in which case the GC will pick it up at high priority). I'd avoid upcalling into SQL from KV because that would create a dependency we should minimize (on top of being a little opaque). Let's not get sidetracked on this GC issue.

@vivekmenezes
Contributor

Given that we need a fix for this soon, I'm tempted to move forward by adding a limit to DelRange(). I think it's a useful addition to the DelRange API even outside of this issue. Any concerns with moving forward on that approach?

@petermattis
Collaborator Author

@vivekmenezes No concerns. I thought you were already moving forward with that approach.

@vivekmenezes
Contributor

I've not been paying any attention to this issue and was expecting to get to it in two weeks. It's hurting users and I'm going to drop what I'm doing (adding checkpointing to schema changes) in favor of this issue.

@tbg
Member

tbg commented Jul 7, 2016

SGTM. Unfortunately there isn't a trivial way to do this within storage.


@vivekmenezes
Contributor

Can I do it the same way Scan() is implemented?

@petermattis
Collaborator Author

You mean add a maxRows parameter to DelRange? Yes, I think that should work.
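A maxRows-limited DelRange, mirroring Scan's limit-and-resume behavior, could work roughly like this. Again a toy in-memory simulation with hypothetical names (`delRangeMax`), not the actual KV API: the call deletes at most maxRows keys and hands back a resume key so the caller can loop until the span is empty.

```go
package main

import (
	"fmt"
	"sort"
)

// delRangeMax deletes at most maxRows keys in [start, end) from kv and
// returns the number deleted plus the key to resume from ("" when done),
// analogous to how Scan bounds its result set with maxRows.
func delRangeMax(kv map[string]string, start, end string, maxRows int) (int, string) {
	var keys []string
	for k := range kv {
		if k >= start && k < end {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if len(keys) <= maxRows {
		for _, k := range keys {
			delete(kv, k)
		}
		return len(keys), "" // done: no resume key
	}
	for _, k := range keys[:maxRows] {
		delete(kv, k)
	}
	return maxRows, keys[maxRows] // caller resumes at this key
}

func main() {
	kv := map[string]string{"a": "1", "b": "2", "c": "3", "d": "4"}
	deleted, start := 0, "a"
	for {
		n, resume := delRangeMax(kv, start, "z", 3)
		deleted += n
		if resume == "" {
			break
		}
		start = resume
	}
	fmt.Println(deleted, len(kv)) // 4 0
}
```

Each bounded call can run in its own small transaction, which is the whole point: no single txn ever spans the full table.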

@andreimatei
Contributor

This was also on my list, but I'm glad Vivek's taking it! :)
I was thinking of leaving KV alone, and doing the limit in SQL, in whatever way the DeletedNode deals with a LIMIT clause (I guess a scan followed by point deletes). Might be some trickiness because the table descriptor cannot be read in the usual way any more in the schema changer (since it's been marked for deletion), but I'm sure we can hack around that somehow.

I'm not sure how DelRange would work with a limit... how would DistSender split it across the different KV ranges?

@petermattis petermattis modified the milestone: Q3 Jul 11, 2016
@petermattis petermattis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Jul 11, 2016
@vivekmenezes
Contributor

Ran a drop table on a 3M+ row table on a GCE cluster with 3 nodes

root@104.196.170.123:26257> show tables;
+--------+
| Table  |
+--------+
| blocks |
+--------+
(1 row)
root@104.196.170.123:26257> select count(*) from blocks;
+----------+
| count(*) |
+----------+
|  3458985 |
+----------+
(1 row)
root@104.196.170.123:26257> drop table blocks;
DROP TABLE
root@104.196.170.123:26257> SELECT count(*) FROM blocks;
pq: table "blocks" does not exist

while the table was being dropped in a separate window I ran

root@104.196.170.123:26257> show tables;
+--------+
| Table  |
+--------+
| blocks |
+--------+
(1 row)
root@104.196.170.123:26257> select count(*) from blocks;
pq: table is being deleted

@vivekmenezes
Contributor

#9318 tracks the issue of the dropped table still showing up in show tables. I think it should still show up there, with some deletion-progress indicator.

I'm closing this issue because it's fixed.
