-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
sql: fix perf problems with DROP TABLE/INDEX #7499
Comments
Yes, that would be reasonable but is not trivial: we currently auto-wrap range-spanning operations in a transaction at the DistSender level (because we generally have to). It's worth allowing for something like that, though. The "trivial" solution is a flag on BatchRequest (or a special consistency mode, or we could even make It's also good to keep in mind that with leader-proposed Raft, these are precisely the worst case which will essentially send snapshot-sized commands through Raft, so it would be good to avoid them. Also, we don't need MVCC at all. Instead of DeleteRange, maybe we should invest in a ClearRange command which would perform the deletion as a side-effect instead. The two paragraphs above are mostly orthogonal, so forgetting about ClearRange for now and looking at the other thing is a good way to start. |
I was thinking we'd wait for a distributed implementation of DROP at the SQL level; it should be trivial to parallelize, having each range clear itself non-transactionally (or one txn per range). This would avoid having to introduce any special KV support. In the meantime, we can implement the truncation in a streaming fashion, with a series of |
Related, FYI, see #2003 which is related to (or a dup of) this issue. |
@tschottdorf Can we really avoid using MVCC when dropping index/table data? How would that interact with time-travel queries? |
Ah, forgot about those. Yes, we need to keep that for time-travel queries. What's also potentially interesting is that we didn't really need to actively delete the data if we could instead set a TTL and the GC queue would toss out the data over time, as permitted by time travel considerations. So if TTLs are anywhere on our roadmap, that's worth a thought. |
Regardless of how we go about it, we should also figure out why the intents aren't proactively resolved. In an idea world, the intent range is attached to |
Allowing dropped table and index data to be deleted asynchronously by the GC queue is certainly attractive, though (as mentioned in #2003) we have to consider how that will eventually interact with accounting. Not exactly sure what you're envisioning for TTLs. I think we could enhance the GC queue so that it could determine when a range of data was deleted. For example, if |
I'm not envisioning anything for TTLs right now, but it has been requested a bunch of times and if we were to entertain it, then there would be an elegant way to remove old tables as they go out of scope by means of the GCQueue (which would be nice because that's also the safeguard of time travel queries). I think we'll do fine with what we have once the deletion is less completely stupid as it is now, at least for some time. Since the current situation is pretty bad, we should do something about it. I don't know if waiting for distributed SQL is an option or even the right thing to do. At some point we'll have to figure out what the responsibilities of DistSender vs DistSQL are. I'd rather we do that sooner than later. |
No need to haul the GC queue up to the SQL level in order to give it knowledge of keys. For example, we could add hooks to the GC queue which the SQL level could hook into: given a span of keys, return a list of spans with corresponding deletion timestamps. |
@petermattis not sure what you had in mind. Another approach creates a new system table holding all the key spans that have been deleted. |
Talk to peter, the API will given a span of keys will return spans with corresponding deletion timestamps. The implementation will identify all table ids that fit within the span provided, and check if those tables or indexes within have been deleted. |
That's possible, as is a special RPC that marks the whole range for deletion (in which case the gc will pick it up at high priority). I'd avoid upcalling to SQL because that would require KV, and it's good to minimize that (on top of being a little intransparent). Let's not get sidetracked on this GC issue. |
Given that we need a fix for this soon, I'm tempted to move forward on this by adding a limit to DelRange(). I think it's a useful addition to the DelRange API even outside of this issue. Any concerns with move forward on that approach. |
@vivekmenezes No concerns. I thought you were already moving forward with that approach. |
I've not been paying any attention to this issue and was expecting to get to it in two weeks. It's hurting users and I'm going to drop what I'm doing (adding checkpointing to schema changes) in favor of this issue. |
SGTM. Unfortunately there isn't a trivial way to do this within On Thu, Jul 7, 2016 at 10:43 AM vivekmenezes notifications@github.com
-- Tobias |
Can I do it the same way Scan() is implemented? |
You mean add a |
This was also on my list, but I'm glad Vivek's taking it! :) I'm not sure how |
Ran a drop table on a 3M+ row table on a GCE cluster with 3 nodes
while the table was being dropped in a separate window I ran
|
#9318 is tracking the issue with the deleted table showing up on show tables. I think it should still show up there with some deletion progress indicator I'm closing this issue because it's fixed. |
DROP {TABLE,INDEX}
both issue a single transactionalDelRange
operation to delete all of the table/index data. In large tables, this creates an extremely large transaction which has various performance problems (that should be fixed separately).Since
DROP {TABLE,INDEX}
have already updated the table descriptor and other metadata to make the table/index unavailable, we should be able to use the chunked-style implementation used for backfilling indexes/columns upon creation. Even better, it isn't immediately obvious why we need to use a transaction at all here. @tschottdorf would it be reasonable to perform a single non-transactionalDelRange
for all of a table's data?The text was updated successfully, but these errors were encountered: