Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

QB moves multiple complete torrents at a time, leading to super high IO contention. #9407

Closed
CyrusNajmabadi opened this issue Aug 27, 2018 · 73 comments · Fixed by #12035
Closed
Assignees
Labels
Confirmed bug An issue confirmed by project team to be considered as a bug Core

Comments

@CyrusNajmabadi
Copy link

CyrusNajmabadi commented Aug 27, 2018

Please provide the following information

qBittorrent version and Operating System

QB: 4.1.2
OS: Win10Pro 1803

What is the problem

I have my qb set to download torrents to a very fast SSD array. When complete, those torrents are then moved to a much slower HD array. However, QB is now moving many completed torrents over at hte same time. This leads to huge contention on the HD array, tanking moving speed. i.e. instead of being able to move at 100+ MB/s, the contention makes the overall transfer speed go at 15-25 MB/s.

QB didn't use to work this afaict. It used to move the completed torrents over one at a time.

@thalieht
Copy link
Contributor

thalieht commented Aug 28, 2018

The cause was identified #9373 (comment)

@CyrusNajmabadi
Copy link
Author

@thalieht the conversation you linked to seems to be about "checking" torrents, not "moving" torrents . I can imagine the same issue is causing both problems. However, i wanted to actually check to make sure that this really is the same problem.

Thanks!

@CyrusNajmabadi
Copy link
Author

An example of three of the moves happening at the same time, totally tanking perf to an abysmal 30ish mb/s:

image

@thalieht
Copy link
Contributor

thalieht commented Aug 28, 2018

Oops didn't notice the key word. I just moved 2 queued torrents (auto_managed) at the same time from one disk to another and Resource monitor showed activity on the Read column for both of them so i suppose this does not have the same cause as when checking.

@CyrusNajmabadi
Copy link
Author

and Resource monitor showed activity on the Read column for both of them

Ok great. So it's not just me. :)

As the original post, this is fairly brutal give how bad HDDs will perform when there is a lot of contention on them.

@CyrusNajmabadi
Copy link
Author

Is there any chance that this can be looked at. The regression is super severe and has made qb unusable on my machine. To put it in perspective, here's the backlog of files to move in the qb queue:

image

The list goes far beyond that, but this is all i could fit in a screen. Effectively, because of the excessively slow IO (due to copying multiple files at a time to HDDs), my system is now nearly 10 days behind on actually writing out the data that i've downloaded to the final destination.

Copying multiple files at the same time to the same location makes no sense for HDDs as it means that instead of being able to just go at full throughput speed to a single place in the drive, the drive has to constantly jump back and forth to different locations, decimating the throughput rate.

@Seeker2
Copy link

Seeker2 commented Sep 14, 2018

May be similar to this issue:
#9120 (comment)

@CyrusNajmabadi
Copy link
Author

Is there any chance this can be looked at? The performance here is still terrible. I basically can only download one torrent at a time since qbittorrent just absolutely hammers my disks when doing the final move. The contention means writes of only a few MB per second. Downloading multiple torrents is impossible since the download rate far surpasses the disk write speed during these moves.

@CyrusNajmabadi
Copy link
Author

CyrusNajmabadi commented Nov 23, 2018

Hi Qbittorrent/@sledgehammer999. I tried reporting this issue over at libtorrent in case there was maybe something that could be done over there. From @arvidn's perspective:

It sounds like this is primarily a qBT issue. libtorrent could implement some queuing mechanism and detection of which volume is being copied to, but it seems like a stretch of the scope of the library.

libtorrent currently just does what it's told. If qBT asks it to move 3 torrents, they will be moved, and 3 I/O threads will be used doing it (if available). I think the simplest way to address this would be for qBT to have an option to not move torrents in parallel.

Would it be possible to not have qbittorrent try to queue so many moves simultaneously? As moving is going to be fundamentally throughput bound, i can't imagine how doing things in parallel ever really helps anything. Even if you're lucky, and your storage medium experiences no contention slowdown, the best you'll ever do is move things the same speed as you would have if you were doing things serially. Except that each individual file will still take longer to move than doing things one at a time.

Note: this used to not be a problem. I'm not sure if QB changed something in 4.x to do things in parallel with moves. But it would be nice if it could be rolled back. Thanks!

@JamesG269
Copy link

JamesG269 commented Apr 3, 2019

This is still in an issue in 4.1.5. Selecting many files, and moving them to another drive, results in 3 simultaneous copies at a time according to Resource Monitor. After those complete, it starts 3 more. This results in transferring the files at 40-50MB/s instead of the 80-90MB/s the drives are capable of when doing one single copy.

@iheartcsharp
Copy link

Looks like libtorrent 1.2 will fix this: https://github.com/arvidn/libtorrent/releases/tag/libtorrent-1_2-RC2

@arvidn
Copy link
Contributor

arvidn commented May 16, 2019

@iheartcsharp which change are you referring to specifically in that list? off the top of my head, I don't think libtorrent-1.2 changes this situation.

Also, libtorrent-1.2.1 has been released, that's a link to an old pre-release.

@ctlaltdefeat
Copy link

ctlaltdefeat commented May 16, 2019

@arvidn "move files one-by-one when moving storage for a torrent"
does this not refer to what OP means?

Also, what libtorrent version is qbittorrent on right now?

@arvidn
Copy link
Contributor

arvidn commented May 17, 2019

I think that refers to how an individual torrent is moved. Instead of moving its entire directory, only files that are actually part of the torrent are moved. I.e. if some other files were put there, they won’t be moved.

Multiple torrents can still be moved in parallel.

@iheartcsharp
Copy link

iheartcsharp commented May 22, 2019

I think that refers to how an individual torrent is moved. Instead of moving it’s entire directory, only files that are actually part of the torrent are moved. I.e. if some other files were put there, they won’t be moved.

Multiple torrents can still be moved in parallel.

Hmmm...I see what you are saying. So any idea/plans on limiting 1 torrent being moved at a time? It can just be a simple global limit for any torrents that are being moved.

@arvidn
Copy link
Contributor

arvidn commented May 22, 2019

So any idea/plans on limiting 1 torrent being moved at a time? It can just be a simple global limit for any torrents that are being moved.

I don't have any plans to build that into libtorrent itself. It seems out of scope for the core library. I'm open to be convinced otherwise though, if someone has any ideas of how it could be implemented and also simplify the API/model at the same time. Just tacking on more complexity to the library does not seem like the right approach.

@JamesG269
Copy link

JamesG269 commented May 26, 2019

Not a c++ programmer, but could you do something like just have an array of the drives' usage as source/destinations in moving and checking, and queue further operations on that drive? Here's some pseudo C# I wrote to illustrate:

int[] drives_used = int[26];

Threads_to_use = 4;
Threads_used = 0;
move_files()
{
	int i = 0;
	while (move_queue.count > 0)
	{
		while (Threads_used == Threads_to_use)
		{
			await Task.Delay(100);
		}
		if (drives_used[move_queue[i].file1.drive] == 0 && drives_used[move_queue[i].file2.drive] == 0)
		{
			Threads_used++;
			drives_used[move_queue[i].file1.drive]++;
			drives_used[move_queue[i].file2.drive]++;			
			Task.Run(() => { move_file(move_queue[i]); });
			drives_used[move_queue[i].file1.drive]--;
			drives_used[move_queue[i].file2.drive]--;
			Threads_used--;
			move_queue.remove(move_queue[i]);			
		}
		i++;		
		if (i == move_queue.length)
		{
			await Task.Delay(100);
			i = 0;
		}
	}
}

@CyrusNajmabadi
Copy link
Author

Is there any chance this can happen? Moving torrents remains utterly brutal. It can take hours at these sorts of speeds. Any sort of change in torrent location can literally take days/weeks as it churns so slowly through all these files at a 4-5x slower rate than what we'd get if one file would happen at a time.

@yaxv
Copy link

yaxv commented Nov 18, 2019

I concur with the rest. It would be nice if files were moved one at a time.

EDIT: Then again, I suppose that a temporary solution is to move the files manually and update their new locations in Qbittorrent afterwards.

@CyrusNajmabadi
Copy link
Author

@arvidn

I don't have any plans to build that into libtorrent itself. It seems out of scope for the core library. I'm open to be convinced otherwise though, if someone has any ideas of how it could be implemented and also simplify the API/model at the same time. Just tacking on more complexity to the library does not seem like the right approach.

How do you currently move files? Hard to give an idea on how it should be done without knowing that information.

@CyrusNajmabadi
Copy link
Author

@arvidn can you explain how you check files one at a time?

@arvidn
Copy link
Contributor

arvidn commented Jan 15, 2020

I think it should be covered here.

The short answer is:

  • set settings_pack::active_checking_limit to 1
  • make torrents set the auto_managed flag on torrents (see here and here)

@arvidn
Copy link
Contributor

arvidn commented Jan 15, 2020

How do you currently move files? Hard to give an idea on how it should be done without knowing that information.

Who is "you" in this question? I don't think I follow, files are essentially moved by calling rename() and if that fails, the file is copied and the original is deleted.

if "you" is qBT, then it calls torrent_handle::move_storage() on the torrent handle.

@arvidn
Copy link
Contributor

arvidn commented Jan 15, 2020

I don't think there's any disagreement that it would be better if torrents would be moved one at a time. There's just a shortage of engineers to make that happen (just like everywhere else in the world).

@CyrusNajmabadi
Copy link
Author

Ok. So if you have a mechanism to only check one torrent at a time, can you not apply that same mechanism to moving one torrent at a time?

I.e. add an active_moving_limit value and use it in the same way you use active_checking_value

@arvidn
Copy link
Contributor

arvidn commented Jan 15, 2020

sure I could. I don't think that's in scope for libtorrent though, but I'm willing to be convinced otherwise

@CyrusNajmabadi
Copy link
Author

Can you explain the reasoning for why 'checking' (an extremely heavy io op that needs to process entire files) has support for being serial... But 'moving' (a very similar op in nature) does not?

I guess I don't see you made it possible to limit one but not the other. Having either happen in parallel is just not good at all...

@Mike-EE
Copy link

Mike-EE commented Feb 2, 2020

Since this is already being discussed here...

I participate in several open-source radio projects, as my programming experience is heavily rooted in embedded software and FPGA programming, and I have to say that the ethos of these programs is "core functions come first". Topics related to I/O and other platform functions get the most attention and tickets related to I/O issues are handled first.

I don't perceive that the core qBt development team thinks the same way and it's incredibly frustrating to experience the debilitating I/O issues in qBt, try to work around them and then read debates about implementing features that have no impact on debilitating I/O issues. I would want, if it were me, the "basics" to work well before building more "stuff" into the project.

I sense in this Git and the forum a growing frustration from dealing with memory consumption, storage handling and similar issues for (in some cases) years and we're seeing that reflected in threads such as this. I know from my own experience that no-one wants their passion project to become another "day job" but I would expect that anyone with a key role in this project would want their project to be broadly useful and resolutely reliable, which sadly isn't the way I would describe qBt today.

I implore the qBt development team, including the library developers, to "go after" the I/O (network, storage, etc.) problems and retire these lingering, multi-year issues before efforting feature growth. I know it's more fun to work on new things, but nobody can enjoy the latest "new thing" if the program self-terminates or overloads the storage subsystem before they can really enjoy that "new thing".

Thanks for reading.

@arvidn
Copy link
Contributor

arvidn commented Feb 2, 2020

@Mike-EE I assume by "library developers" you mean me. Specifically regarding moving the storage of a torrent from one disk to another, I've made the case, in this thread, that I don't consider adding any sophistication around queuing/serializing this operation in scope of libtorrent (sort of like how one wouldn't expect copyfile() to have any such sophistication). I welcome arguments against my position (but it should probably be on the libtorrent issue tracker, to not pollute this thread any more). One reason for this is exactly because disk I/O and memory efficiency has much higher priority in my book, along with fundamental networking performance (again, I welcome discussions about this on the libtorrent issue tracker). Memory consumption almost always spill into the behavior of the operating system's disk cache though, which (to some extent) I also consider outside the scope of libtorrent, at least heroics to strong-arm the OS' caching strategies.

@arvidn
Copy link
Contributor

arvidn commented Feb 2, 2020

also, for context. These are the libtorrent donations: https://bitref.com/373ZDeQgQSQNuxdinNAPnQ63CRNn4iEXzg :)

@CyrusNajmabadi
Copy link
Author

this operation in scope of libtorrent

I would understand this if libtorrent didn't have anything to do with moving files. However, it provides the entrypoint that torrent programs use to actually move the files. As such, it should exhibit good behavior around that.

The same issue existed (and was fixed) wrt to checking files. The same argument could have been made that it could just be up to the torrent program to queue files that need to be checked, and libtorrent simply checked when asked. However, libtorrent dealt with this significant problem. It exposes the functionality to check, and it does so intelligently with IO. If it's also handling moving IO then i think it should do the same.

THe arguments that this isn't a core part of bittorrent ring a little hollow to me. If it's not a party of bittorrent, then just don't expose moving at all as part of libtorrent API. However, if it is useful and vlauable have in your library/api, i think it behooves the library/api to do it well.

sort of like how one wouldn't expect copyfile() to have any such sophistication)

Actually, i would. And OSs do expose this sort of functionality. Precisely because thrashing the IO subsystem is not necessarily a good thing (which i presume is why libtorrent doesn't check multiple files at the same time anymore either). I know i've personally used those sorts of APIs in my heavy IO apps. And, for any heavy IO lib, it's def been important to ensure that the lib itself is conscientious of how it schedules IO.

@CyrusNajmabadi
Copy link
Author

also, for context. These are the libtorrent donations:

What sort of donation would you be looking for here to do this work? What's the right amount of money to motivate it? Thanks! :)

@arvidn
Copy link
Contributor

arvidn commented Feb 3, 2020

As such, it should exhibit good behavior around that.

"Good behavior" for one user may be bad behavior for another. For instance, someone might expect to be able to move files located on separate drives in parallel. I think it's more appropriate to talk about the behavior as higher or lower level. The current functionality is low level. It allows implementing the higher level on top of it. If libtorrent would only provide the high level API, how would you get access to the low level functionality? The API alone is a non-trivial challenge.

The same issue existed (and was fixed) wrt to checking files.

The original behavior for checking files in libtorrent was to only check one at a time. This was extended to allow checking files in parallel (but the default was still one at a time).

The same argument could have been made that it could just be up to the torrent program to queue files that need to be checked, and libtorrent simply checked when asked. However, libtorrent dealt with this significant problem. It exposes the functionality to check, and it does so intelligently with IO. If it's also handling moving IO then i think it should do the same.

You made this argument earlier in the thread already, I won't repeat my response.

THe arguments that this isn't a core part of bittorrent ring a little hollow to me. If it's not a party of bittorrent, then just don't expose moving at all as part of libtorrent API.

So, you can pretend it isn't there. done.

However, if it is useful and vlauable have in your library/api, i think it behooves the library/api to do it well.

This sounds like the same argument as the "good behavior" one.

Actually, i would.

copyfile() copies the file, it doesn't queue it up and do it later. It's reasonable because it's a low level API.

Precisely because thrashing the IO subsystem is not necessarily a good thing

I don't think anyone is arguing that it is.

If you want to respond, PLEASE file a ticket in the libtorrent repo. It's right here. You can still link to the ticket in this thread.

@CyrusNajmabadi
Copy link
Author

"Good behavior" for one user may be bad behavior for another. For instance, someone might expect to be able to move files located on separate drives in parallel.

Someone might expect to be able to check multiple files in parallel. Such expectations are reasonable, and i would have no problem with the software making it possible for people to specify that thsi is what they want.

However, this seems like the abnormal case. i.e. which is more likely across all your users: that they have torrents being moved simultaneously to multiple different drives? Or that they're moving to the same drive?

The software should work well for the more common case, and optionally provide ways to work well for the less common cases.

The original behavior for checking files in libtorrent was to only check one at a time. This was extended to allow checking files in parallel (but the default was still one at a time).

Sounds great. Having the same for moving would be awesome. Looks like there's good precedent for it in the library :)

I won't repeat my response.

As i said, your response rings hollow to me. All the reasons you've given for 'checking' to be done in the manner it is done, also seem to apply just as well to 'moving'.

copyfile() copies the file, it doesn't queue it up and do it later. It's reasonable because it's a low level API.

That's a choice of the API currently. There's no reason that you only have to supply a low-level copy op, just like there would be no need to just have a low-level 'check' API that the app layer would have to coordinate to prevent thrashing.

If you want to respond, PLEASE file a ticket in the libtorrent repo.

Sure.

@glassez
Copy link
Member

glassez commented Feb 3, 2020

Well, apparently this is a really serious problem, so I'll try to make my contribution here.
I must say again that I have no idea about the money that you are talking about here. qBittorrent is just my hobby, I spend my personal free time (although some money could save me from some other worries, so I could devote the free time to this project).
Before I introduce any implementation, I would like to discuss some key aspects of its user interface.
By and large, all we need to do is implement "queued" storage moving. I.e.:

  1. When torrent need to be moved (by user or by some trigger, e.g. "moving completed torrent from temp folder") we don't perform "moving" immediately but just mark it as "moving storage" and append it to "moving queue" (it corner case, when there are no items in the queue, it still performs moving immediately).
  2. When torrent has finished its moving we check queue for the next item.

Note: the finished torrent should be considered as really finished only when it's done moving from temp directory.

Are there any other global aspects that I need to take into account?

@arvidn
Copy link
Contributor

arvidn commented Feb 3, 2020

just don't make the "special case" of an empty queue too special. You still need to add it to the queue, so that any subsequent move gets queued

@glassez
Copy link
Member

glassez commented Feb 3, 2020

just don't make the "special case" of an empty queue too special. You still need to add it to the queue, so that any subsequent move gets queued

Sure. "Task in progress" must remain in the queue until it is completed.

@CyrusNajmabadi
Copy link
Author

Are there any other global aspects that I need to take into account?

Probably want to ensure that this works even if qb is closed/reopened. I.e. torrents queued for moving aren't somehow 'lost' if the user does that.

Thanks for looking into this!

@CyrusNajmabadi
Copy link
Author

Also, rather than having two queues (one for moving, and one for checking), there should really only be one 'heavy io' queue that is shared by both ops. While a check is going on, moves should not happen (and vice versa).

Basically, both these ops completely saturate the underlying io subsystem. So running and concurrently just kills performance.

Thanks!

@arvidn
Copy link
Contributor

arvidn commented Feb 3, 2020

Ideally, moving torrents within the same drive would not be subject to queueing

@CyrusNajmabadi
Copy link
Author

Ideally, moving torrents within the same drive would not be subject to queueing

FWIW, this feels like an unnecessary special case. Say you are moving on the same drive. Even if you queue, it won't be a problem as the queue will blitz through things super quickly. even if you had a ton of moves (i.e. thousands), that's only going to be thousands of simple metadata-writes in a row, which should still take a tiny amount of time (perhaps a couple of seconds tops). So trying to optimize for that seems to likely be unnecessary.

Another thing that can make this difficult is that "the same drive" can be non-obvious. For example, in my system i use raided storage where the relationship between folders and drives isn't at all necessarily clear. I woudl hate for the system to think it could do things in parallel, while it as still hitting the same disk, thus negating the entire benefit of all this work :-/

@arvidn
Copy link
Contributor

arvidn commented Feb 3, 2020

Say you are moving on the same drive. Even if you queue, it won't be a problem as the queue will blitz through things super quickly.

not if you have to sit and wait at the end of the queue for a bunch of moves that actually move across drives.

Another thing that can make this difficult is that "the same drive" can be non-obvious.

Yes, it's not trivial. one sure way to discover this is to attempt a rename(), and if it fails, you know you have to perform a copy.

@arvidn
Copy link
Contributor

arvidn commented Feb 3, 2020

perhaps move_flags_t could be extended with an option to "no_fallback_to_copy", to have a cheap way to try the cheap operation first.

@CyrusNajmabadi
Copy link
Author

perhaps move_flags_t could be extended with an option to "no_fallback_to_copy", to have a cheap way to try the cheap operation first.

That seems very reasonable.

@glassez
Copy link
Member

glassez commented Feb 4, 2020

Wait, wait... It looks like you want to get something "excellent" right away, without having anything yet. This is a way to nowhere (given our current capabilities). As they say, the best is the enemy of the good.
In addition, it seems that the "moving queue" will cover most of the problematic cases. At the very least, further improvement looks significantly less profitable in terms of cost/benefit ratio (at least having current libtorrent implementation).
Anyway, we can add improvements incrementally. So if the "moving queue" doesn't suit you, I won't even start doing anything.

@arvidn
Copy link
Contributor

arvidn commented Feb 4, 2020

@glassez I'm not sure who you're referring to as "you" there. I have no skin in this game so feel free to ignore me.

@glassez
Copy link
Member

glassez commented Feb 7, 2020

@arvidn, is it hard to implement an option to set libtorrent to use only one thread for "move storage" jobs?

@arvidn
Copy link
Contributor

arvidn commented Feb 7, 2020

well. you could set the number of disk I/O threads to 1 as long as there are any pending move jobs. that would affect all other disk I/O though, and prevent peers to request data while waiting for the move to complete. not ideal.

I can't think of a particularly simple way to do this. There are two queues right now, one for hash jobs and one for all other disk I/O jobs. iirc, this is to prevent hash jobs to starve all other jobs. maybe something similar could be done for move jobs.

@iheartcsharp
Copy link

I wish I could help out but I'm heavy on the C# side and learning the dev tools and C++ will take a long time. I think the best option is to just have qBittorrent have a global queue variable and move the files one by one as a stop-gap measure until there is time to add a smarter logic. At least if the files move 1 by 1 it will be a much better than the current behavior which causes thrashing of the drives and fragmenting of the files.

@glassez
Copy link
Member

glassez commented Feb 11, 2020

Working on it currently...

@glassez glassez self-assigned this Feb 11, 2020
@FranciscoPombal FranciscoPombal added Confirmed bug An issue confirmed by project team to be considered as a bug Core labels Feb 19, 2020
@qbittorrent qbittorrent locked and limited conversation to collaborators Mar 1, 2021
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
Confirmed bug An issue confirmed by project team to be considered as a bug Core
Projects
None yet
Development

Successfully merging a pull request may close this issue.