-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Improving boxplot performance #543
Comments
Hi, indeed that looks like it could be improved. Have you profiled this? At first glance it looks like d3.bisectLeft might be appropriate here. |
Hi. Not sure what you mean by profiling this. If you mean benchmarking, no real benchmarks, but there were noticeable differences when I used d3.bisectLeft as you suggested. Huge improvements when combined with underscore.js's indexOf function flagged to use binary search: speedArrayGroup = experimentDimension.group().reduce(
function(p,v) {
p.splice(d3.bisectLeft(p, v.Speed), 0, v.Speed);
return p;
},
function(p,v) {
p.splice(_.indexOf(p, v.Speed, true), 1);
return p;
},
function() {
return [];
}
); If you feel that this can be further improved, please let me know. Thanks! |
Cool, yes _.indexOf and d3.bisectLeft should be equivalent here, I was only suggesting the d3 function because dc is already dependent on d3. I don't know which is more efficient |
Ah, I didn't realize that d3.bisectLeft was a binary search. I would imagine that minor (or maybe not so minor?) improvements can be made by leaving the reduceAdd function to p.push(v.Speed), and somehow have the p called in reduceRemove to be the sorted array of values generated when calculating the quartiles, but I don't know if that modification would have to occur in dc.js, crossfilter, or d3... or how trivial it is would be to implement, and if it would mess up functionality elsewhere. Either way, thanks for the quick fix. |
The action item here is to fix the example. |
Changing from p.indexOf to d3.bisectLeft saved 25% (no pre-sorting). _.indexOf was only about 12%. |
A second action item here is to add an option saying that the data is pre-sorted. The d3 boxplot plugin is always sorting the data, which does not make sense in case of incremental changes. |
Moving the pre-sorting option to #1527. Fixing the examples now. |
Hi,
I'm trying to use dc.js to generate dynamic boxplots, but was having some performance issues with larger datasets (~50k float entries). I narrowed the issue down to this line under the reduce().reduceRemove function (taken from example):
I was thinking that if the p returned was pre-sorted (I'd imagine that the boxplot function already sorts the array to calculate quartiles) so that a more efficient binary search tree can be used to find the index for the splice. Is this something that can be implemented?
Thanks
The text was updated successfully, but these errors were encountered: