
Loading file list of large directory is too slow #4706

Open · esevan opened this issue Jun 21, 2019 · 8 comments
@esevan (Contributor) commented Jun 21, 2019:

This is a duplicate of #3114, but that thread never got an answer. Has there been any progress on it, such as pagination?

@kevin-bates (Member) commented:
I think this issue is related to #4575. Did you see how these issues fare in Lab?

@esevan (Contributor, Author) commented Jun 22, 2019:

@kevin-bates Thanks for the response. I'm actually using Lab. In more detail: I opened a directory containing 10,000 images. In that case everything hangs, even terminal input stops working, and I eventually see a request timeout.

I'll add my environment details soon. Sorry for the lack of information.

@kevin-bates (Member) commented:

I think you might get better traction opening this issue in https://github.com/jupyterlab/jupyterlab since that's where the front-end focus is these days. I suspect "files" are treated differently than directories (which is where I saw the difference in Lab) with respect to rendering.

@esevan (Contributor, Author) commented Jun 26, 2019:

@kevin-bates Well, I decided to post here because I could confirm this happens in the classic notebook (/tree) as well as in JupyterLab (/lab).

The following shows it takes 7.40 seconds to get a response from the notebook server for a request for the 25,089 directory entries under trainB.

$ ls datasets/horse2zebra/trainB | wc -c
25089

[screenshot: the /api/contents request takes 7.40 s]

EDIT:
I'm guessing the server gets stuck in the following code.

if content:
    model['content'] = contents = []
    os_dir = self._get_os_path(path)
    for name in os.listdir(os_dir):
        try:
            os_path = os.path.join(os_dir, name)
        except UnicodeDecodeError as e:
            self.log.warning(
                "failed to decode filename '%s': %s", name, e)
            continue
        try:
            st = os.lstat(os_path)
        except OSError as e:
            # skip over broken symlinks in listing
            if e.errno == errno.ENOENT:
                self.log.warning("%s doesn't exist", os_path)
            else:
                self.log.warning("Error stat-ing %s: %s", os_path, e)
            continue
        if (not stat.S_ISLNK(st.st_mode)
                and not stat.S_ISREG(st.st_mode)
                and not stat.S_ISDIR(st.st_mode)):
            self.log.debug("%s not a regular file", os_path)
            continue
        if self.should_list(name):
            if self.allow_hidden or not is_file_hidden(os_path, stat_res=st):
                contents.append(
                    self.get(path='%s/%s' % (path, name), content=False)
                )
    model['format'] = 'json'

The sample data used in this description is from https://github.com/junyanz/CycleGAN
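To make the cost concrete, here is a minimal standalone sketch (not the notebook server's actual code) that reproduces the per-entry work of the loop above, one os.lstat() per directory entry, and times it over a synthetic directory. It illustrates why the wall time grows linearly with the number of entries.

```python
import os
import stat
import tempfile
import time

def list_dir_like_contents_manager(os_dir):
    """Approximate the per-entry work of the loop above: one os.lstat()
    per directory entry, keeping only regular files, dirs, and symlinks."""
    contents = []
    for name in os.listdir(os_dir):
        os_path = os.path.join(os_dir, name)
        try:
            st = os.lstat(os_path)
        except OSError:
            # skip over broken symlinks in listing
            continue
        if (stat.S_ISLNK(st.st_mode) or stat.S_ISREG(st.st_mode)
                or stat.S_ISDIR(st.st_mode)):
            contents.append(name)
    return contents

# Demo: N entries -> N lstat() calls, so wall time scales with N.
with tempfile.TemporaryDirectory() as d:
    for i in range(10_000):
        open(os.path.join(d, "zzz_%d" % i), "w").close()
    t0 = time.perf_counter()
    entries = list_dir_like_contents_manager(d)
    print("%d entries listed in %.3fs" % (len(entries), time.perf_counter() - t0))
```

Note the real server additionally calls self.get(..., content=False) per entry, which does more work than a bare lstat, so the actual per-entry cost is higher than this sketch shows.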

@kevin-bates (Member) commented:

Thanks for the information. So it sounds like you see roughly the same behavior between classic and Lab with files (unlike what I saw with directories). I figured the delay was in the client-side rendering, but you're showing essentially server-side code, which implies thousands of directories should have produced the same behavior - i.e., a delay in both Notebook and Lab (contrary to what I found).

(That repo you link is interesting. The sample for the failure case is particularly entertaining.)

@kevin-bates (Member) commented:

Hmm, I still see the same behaviors with files. I touched 10,000 files in my notebook directory (for i in {1..10000}; do touch zzz_${i}; done). Then ran notebook (with debug enabled).

With Notebook "classic", I see the contents API complete in just over 1 second, but the rendering (not sure if that's the appropriate use of the term here; I'm not a front-end dev) takes on the order of 48 seconds as I attempt to scroll. This scrolling is also accompanied by "Page Unresponsive" dialogs (using Chrome).

[D 08:55:48.309 NotebookApp] 200 GET /api/sessions?_=1561563455783 (::1) 1.09ms
[D 08:55:48.312 NotebookApp] 200 GET /api/terminals?_=1561563455784 (::1) 1.11ms
[D 08:55:49.608 NotebookApp] 200 GET /api/contents?type=directory&_=1561563455785 (::1) 1115.29ms
[D 08:56:36.954 NotebookApp] 200 GET /api/sessions?_=1561563455786 (::1) 0.90ms
[D 08:56:36.955 NotebookApp] 200 GET /api/terminals?_=1561563455787 (::1) 0.71ms
[D 08:56:38.371 NotebookApp] 200 GET /api/contents?type=directory&_=1561563455788 (::1) 1126.69ms
[D 08:57:30.000 NotebookApp] 200 GET /api/sessions?_=1561563455789 (::1) 0.94ms
[D 08:57:30.003 NotebookApp] 200 GET /api/terminals?_=1561563455790 (::1) 1.16ms
[D 08:57:31.326 NotebookApp] 200 GET /api/contents?type=directory&_=1561563455791 (::1) 1130.55ms

Switching the url to Lab, I see the same contents api taking just over 1 second, but the scrolling appears to be fine, with gaps between contents calls taking on the order of 8 seconds. However, I see no delay in the UI, so I suspect this "retrieval & rendering work" is happening in the background.

[D 08:36:54.697 NotebookApp] 200 GET /api/sessions?1561563414694 (::1) 1.13ms
[D 08:36:54.698 NotebookApp] 200 GET /api/terminals?1561563414695 (::1) 0.85ms
[D 08:36:56.337 NotebookApp] 200 GET /api/contents/?content=1&1561563415174 (::1) 1160.75ms
[D 08:37:04.696 NotebookApp] 200 GET /api/sessions?1561563424693 (::1) 1.07ms
[D 08:37:04.698 NotebookApp] 200 GET /api/terminals?1561563424694 (::1) 0.90ms
[D 08:37:06.371 NotebookApp] 200 GET /api/contents/?content=1&1561563425175 (::1) 1193.78ms
[D 08:37:14.696 NotebookApp] 200 GET /api/sessions?1561563434693 (::1) 1.08ms
[D 08:37:14.698 NotebookApp] 200 GET /api/terminals?1561563434694 (::1) 0.81ms
[D 08:37:16.374 NotebookApp] 200 GET /api/contents/?content=1&1561563435179 (::1) 1192.88ms

Not sure why the contents API call is occurring during scrolling, given the contents service doesn't appear to have paging. This might just be how the front end is written in order to deal with updates. I suspect there's a general assumption that notebook directories are sparsely populated - which is reasonable IMO.

@esevan (Contributor, Author) commented Jun 27, 2019:

Not sure why the contents API call is occurring during scrolling, given the contents service doesn't appear to have paging.

I suspect this is due to JupyterLab's periodic refresh of the directory listing, since the contents API does not return the paths under a directory incrementally (referring to the code I attached above).

As for the test result, my environment magnifies the problem since the server is remote and its allocated resources are quite small (2 CPUs, 4 GiB memory).

So I tested locally, and the result is similar to what @kevin-bates reported.
After touching 10,000 files with for i in {1..10000}; do touch zzz_${i}; done,
I can see the RTT of the contents request is quite large, and I suspect it increases linearly with the file count.

[D 12:57:27.835 LabApp] 200 GET /api/contents/contents_test/10000?content=1&1561607589692 (10.113.66.26) 1233.79ms

So I increased the number of files tenfold with for i in {1..100000}; do touch zzz_${i}; done:

[D 13:05:59.067 LabApp] 200 GET /api/contents/contents_test/100000?content=1&1561608090756 (10.113.66.26) 11399.31ms

I can confirm the contents API takes almost 10x longer to respond.

Here are the problems as I see them:

  1. Listing a directory on the server side takes time proportional to the number of files in the directory.
    -> This can be a big problem. I think the notebook contents API needs to support incremental listing, and a responsive UI or pagination should be developed on the front-end side.

  2. Jupyter Lab hangs while the server handles the request, so other requests to the server cannot be served.
    -> I'm not sure whether this is due to the server-side logic alone; I suspect both sides (front and back).
    -> I could confirm the browser cannot send subsequent requests while the server handles the request: maybe a head-of-line (HOL) blocking issue.
    -> The coroutine appears blocked because the following code is not async.

    if content:
        model['content'] = contents = []
        os_dir = self._get_os_path(path)
        for name in os.listdir(os_dir):
            try:
                os_path = os.path.join(os_dir, name)
            except UnicodeDecodeError as e:
                self.log.warning(
                    "failed to decode filename '%s': %s", name, e)
                continue
            try:
                st = os.lstat(os_path)
            except OSError as e:
                # skip over broken symlinks in listing
                if e.errno == errno.ENOENT:
                    self.log.warning("%s doesn't exist", os_path)
                else:
                    self.log.warning("Error stat-ing %s: %s", os_path, e)
                continue
            if (not stat.S_ISLNK(st.st_mode)
                    and not stat.S_ISREG(st.st_mode)
                    and not stat.S_ISDIR(st.st_mode)):
                self.log.debug("%s not a regular file", os_path)
                continue
            if self.should_list(name):
                if self.allow_hidden or not is_file_hidden(os_path, stat_res=st):
                    contents.append(
                        self.get(path='%s/%s' % (path, name), content=False)
                    )
        model['format'] = 'json'

  3. The browser cannot render 100,000 files, even though the server managed to respond.
    -> I believe this is largely a front-end problem. The front end should request incrementally and provide a better UX.
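On point 2, here is a hedged sketch of one way the blocking could be avoided (this is not what the notebook server does today, and list_directory / get_directory_model are illustrative names, not its API): hand the blocking os.listdir walk to a thread-pool executor so the event loop keeps serving other requests while a large directory is being listed.

```python
import asyncio
import os

def list_directory(os_dir):
    # The blocking part: reading all directory entries at once.
    return sorted(os.listdir(os_dir))

async def get_directory_model(os_dir):
    loop = asyncio.get_running_loop()
    # run_in_executor hands the blocking walk to a worker thread and
    # yields control back to the event loop, so a slow listing no longer
    # stalls unrelated requests (the head-of-line blocking noted above).
    entries = await loop.run_in_executor(None, list_directory, os_dir)
    return {'content': entries, 'format': 'json'}
```

With this shape, other handlers (e.g. /api/terminals) could be answered while the listing runs, though the listing itself still takes O(n) time; pagination would be needed to reduce the total cost.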

@miraculixx commented Jun 17, 2020:

@esevan Great analysis. I am experiencing the same issue (in particular, JupyterLab hangs intermittently). Here are a few more insights from my POV.

I believe this is largely a front-end problem.

In my case the key problem is that JupyterLab seems to issue a new call to the /api/contents API before and after (?) each cell execution, with the ?content=1 flag set. This in turn calls FileContentsManager.get(..., content=True), requesting the actual file contents. Note we have subclassed FileContentsManager to support storing notebooks in a database, which aggravates the problem -- already with 100 or so notebooks this can slow things down to the point where the API call takes 5-10 seconds to complete. In conclusion, it's not really a UI issue, though it is caused by the way the UI requests the contents listing.

-> I'm not sure whether this is due to the server-side logic alone; I suspect both sides (front and back).

The server-side logic seems OK (except that it is blocking; I'm not sure the ContentsManager API supports async). However, it is not clear to me why JupyterLab requests the file contents when all it really does is refresh the directory listing. In particular, JupyterLab - like the previous Jupyter file listing, i.e. /tree - issues a specific GET request for the actual contents once a file/notebook is opened.

I see several possible approaches to improve the situation:

  1. Return dummy contents on directory requests, which would speed up the process.
  2. Change JupyterLab to request directory listings with ?content=0.
  3. Cache the actual contents for some time on the server.

Not sure if option 1 interferes with the JupyterLab UI logic (it may use the actual contents to display icons or other information).

From my perspective, option 2 would be the best. We should avoid option 3, as it is bound to introduce consistency issues.
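For option 2, the Contents REST API accepts a content=0 query parameter, which asks the server to skip filling the expensive content field. A small sketch of how a client would build such a request; BASE_URL and TOKEN are placeholders for a running notebook server, and contents_request is an illustrative helper, not part of any Jupyter client library.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholders for a running notebook server.
BASE_URL = 'http://localhost:8888'
TOKEN = '...'

def contents_request(path, content):
    """Build (but do not send) a Contents API request.
    content=0 asks the server to skip the per-entry listing work."""
    query = urlencode({'content': int(content)})
    return Request(
        '%s/api/contents/%s?%s' % (BASE_URL, path, query),
        headers={'Authorization': 'token %s' % TOKEN},
    )

req = contents_request('contents_test/10000', content=False)
print(req.full_url)
# http://localhost:8888/api/contents/contents_test/10000?content=0
```

The trade-off is that the response's content field comes back empty, so the front end would need another way to obtain the entries it actually displays (e.g. incrementally).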

Currently I don't have the capacity to dig further or open an issue in the JupyterLab tracker, any support would be appreciated.
