Extremely slow line iteration on Windows/ProactorEventLoop #47

Open
lych77 opened this issue Aug 31, 2018 · 6 comments
lych77 commented Aug 31, 2018

Windows 7 64-bit, Python 3.7.0 64-bit, aiofiles 0.4.0

Test program:

import asyncio, aiofiles, timeit

with open('sample.txt', 'w') as f:
	for i in range(10000):
		print('A'*100, file=f)

loop = asyncio.ProactorEventLoop()
asyncio.set_event_loop(loop)

def traditional_read():
	with open('sample.txt', encoding='utf-8') as f:
		for ln in f:
			pass

async def aiofiles_read():
	async with aiofiles.open('sample.txt', encoding='utf-8') as f:
		async for ln in f:
			pass

print('traditional:', timeit.timeit('traditional_read()', number=1, globals=globals()))
print('aiofiles:', timeit.timeit('loop.run_until_complete(aiofiles_read())', number=1, globals=globals()))

Output:

traditional: 0.005477579999999982
aiofiles: 1.563328274

The sample file is not very big (less than 1 MB), but it contains quite a lot of lines. Is the slowdown caused by excessive thread context switches? Can it be avoided somehow?
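For what it's worth, one possible workaround (a minimal stdlib-only sketch, not part of the aiofiles API; the function names here are made up): since the cost is the per-line executor dispatch, doing a single blocking read in the executor and splitting lines in-process replaces N round trips with one, assuming the file fits in memory.

```python
import asyncio

def _read_all(path):
    # Plain blocking read; runs on the executor thread.
    with open(path, encoding='utf-8') as f:
        return f.read()

async def fast_line_iter(path):
    # One executor round trip for the whole file instead of one per line.
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(None, _read_all, path)
    return data.splitlines()
```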

@Jitsusama
I'm noticing the same thing. I have a 240 MB file that I'm trying to read in, and I've tried three different approaches: first, reading lines with the builtin open function and inserting an asyncio.sleep call after each line; second, using aiofiles; third, reading synchronously with the builtin open function. Here's the code:

import asyncio
import logging
import aiofiles

logging.basicConfig(format='%(asctime)s %(message)s')
logging.getLogger().setLevel(level=logging.INFO)

async def async_read_with_aiofiles(file_name: str):
    line_count = 0
    logging.info('starting async aiofiles')
    async with aiofiles.open(file=file_name, mode='r', newline='\r\n') as file:
        async for _ in file:
            line_count += 1
    logging.info(f'read {line_count} lines with async aiofiles')

async def async_read_with_stdio(file_name: str):
    line_count = 0
    logging.info('starting async stdio')
    with open(file=file_name, mode='r', newline='\r\n') as file:
        for _ in file:
            line_count += 1
            await asyncio.sleep(0)
    logging.info(f'read {line_count} lines with async stdio')

def sync_read_with_stdio(file_name: str):
    line_count = 0
    logging.info('starting sync stdio')
    with open(file=file_name, mode='r', newline='\r\n') as file:
        for _ in file:
            line_count += 1
    logging.info(f'read {line_count} lines with sync stdio')

loop = asyncio.get_event_loop()

Here's the results of running it on a ~240MB file through each of the 3 options:

>>> sync_read_with_stdio(file_name='large_file')
2018-11-28 11:04:21,546 starting sync stdio
2018-11-28 11:04:22,935 read 2769652 lines with sync stdio
>>> loop.run_until_complete(async_read_with_stdio(file_name='large_file'))
2018-11-28 11:06:05,393 starting async stdio
2018-11-28 11:06:57,863 read 2769652 lines with async stdio
>>> loop.run_until_complete(async_read_with_aiofiles(file_name='large_file'))
2018-11-28 11:07:14,362 starting async aiofiles
2018-11-28 11:14:26,550 read 2769652 lines with async aiofiles

BTW, I'm on the exact same platform and versions as @lych77.

ccosby commented Apr 30, 2019

Both SelectorEventLoop and ProactorEventLoop show the same behavior. It's just plain slow. I used the test code from @lych77.

MacOS / Python-3.7.3 / aiofiles-0.4.0 / SelectorEventLoop

traditional: 0.001899950000000039
aiofiles: 1.288835948

RHEL-7.6 / Python-3.7.3 / aiofiles-0.4.0 / SelectorEventLoop

traditional: 0.0033165980130434036
aiofiles: 2.984602615237236

espdev commented Jul 14, 2019

I have the same problem with aiofiles.

Here is my simple benchmark and the results on Linux with uvloop:

import time
import asyncio
import aiofiles
import uvloop

uvloop.install()


async def _async_file_reader(file_name):
    async with aiofiles.open(file_name, 'r') as fp:
        async for line in fp:
            yield line

async def _async_write_file(file_name, lines):
    async with aiofiles.open(file_name, 'w') as fp:
        for line in lines:
            await fp.write(f"{line}\n")
        await fp.flush()

def _file_reader(file_name):
    with open(file_name, 'r') as fp:
        for line in fp:
            yield line

def _write_file(file_name, lines):
    with open(file_name, 'w') as fp:
        for line in lines:
            fp.write(f"{line}\n")
        fp.flush()

async def async_main(file_name):
    ts = time.time()
    lines = []

    async for line in _async_file_reader(file_name):
        lines.append(line)

    te = time.time() - ts
    print(f'ASYNC FILE READING: {te:.2f} sec')

    ts = time.time()
    await _async_write_file('/tmp/bbb.txt', lines)
    te = time.time() - ts
    print(f'ASYNC FILE WRITING: {te:.2f} sec')


def main(file_name):
    ts = time.time()
    lines = []

    for line in _file_reader(file_name):
        lines.append(line)

    te = time.time() - ts
    print(f'FILE READING: {te:.2f} sec')

    ts = time.time()
    _write_file('/tmp/aaa.txt', lines)
    te = time.time() - ts
    print(f'FILE WRITING: {te:.2f} sec')


if __name__ == '__main__':
    file_name = 'lines.txt'

    main(file_name)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_main(file_name))
    loop.close()

Output for a file with 10K lines:

FILE READING: 0.01 sec
FILE WRITING: 0.06 sec
ASYNC FILE READING: 2.90 sec
ASYNC FILE WRITING: 2.68 sec

In my real code, aiofiles is ~14000x slower than non-async I/O when I use aiofiles with async for in a reader -> chunker -> concurrent_workers pipeline.

The problem may be in this code, which is called for every line when async for is used:

def _make_delegate_method(attr_name):
    @asyncio.coroutine
    def method(self, *args, **kwargs):
        cb = functools.partial(getattr(self._file, attr_name),
                               *args, **kwargs)
        return (yield from self._loop.run_in_executor(self._executor, cb))
    return method

@asyncio.coroutine
def __anext__(self):
    """Simulate normal file iteration."""
    line = yield from self.readline()
    if line:
        return line
    else:
        raise StopAsyncIteration

Therefore, I think async for is not practical here; read() should be used instead.
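For files too large to read whole, the same idea works per chunk (a hedged stdlib sketch, not part of the aiofiles API; the helper names and default chunk_size are made up): read large chunks through the executor and split lines in-process, so the per-call dispatch cost is amortized over thousands of lines.

```python
import asyncio

def _read_chunk(f, size):
    # Blocking read of one chunk; runs on the executor thread.
    return f.read(size)

async def iter_lines_chunked(path, chunk_size=1 << 20):
    # One executor round trip per chunk instead of one per line.
    loop = asyncio.get_running_loop()
    with open(path, encoding='utf-8') as f:
        tail = ''
        while True:
            chunk = await loop.run_in_executor(None, _read_chunk, f, chunk_size)
            if not chunk:
                break
            lines = (tail + chunk).split('\n')
            tail = lines.pop()  # possibly incomplete last line
            for ln in lines:
                yield ln
        if tail:
            yield tail
```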

Bobronium commented Jul 14, 2019

The reason for this issue is that for each line aiofiles starts a new task in the executor, which is a huge overhead just to read one line:

loop.run_in_executor(executor, file.readline)

The code above is just for demonstration; the original code is:

@delegate_to_executor('close', 'flush', 'isatty', 'read', 'readable',
                      'readline', 'readlines', 'seek', 'seekable', 'tell',

def delegate_to_executor(*attrs):
    def cls_builder(cls):
        for attr_name in attrs:
            setattr(cls, attr_name, _make_delegate_method(attr_name))
        return cls
    return cls_builder

def _make_delegate_method(attr_name):
    @asyncio.coroutine
    def method(self, *args, **kwargs):
        cb = functools.partial(getattr(self._file, attr_name),
                               *args, **kwargs)
        return (yield from self._loop.run_in_executor(self._executor, cb))
    return method
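That per-call dispatch cost is easy to measure directly (a rough sketch; absolute numbers are machine-dependent, but even tens of microseconds per dispatch add up over millions of lines):

```python
import asyncio
import time

async def executor_dispatch_overhead(n=1000):
    # Time n no-op round trips through the default executor.
    # This per-call cost is paid for every single readline().
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    for _ in range(n):
        await loop.run_in_executor(None, lambda: None)
    return (time.perf_counter() - start) / n
```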

espdev commented Jul 14, 2019

Thanks @MrMrRobat for investigating.

tomchristie pushed a commit to encode/starlette that referenced this issue May 6, 2020
* Tinche/aiofiles#47
Thread pool is no longer used for files in memory

* fix tests

* fix import sorted

* little change
@denravonska

Reading binary files also seems to be super slow. I'm streaming files from a VM to a server on the host OS and I only get ~1-1.6 Mbit/s.
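For binary streaming the same amortization applies (a stdlib-only sketch, not the aiofiles API; the helper name and chunk_size default are made up): reading larger blocks means fewer executor round trips per megabyte streamed.

```python
import asyncio

def _read_block(f, size):
    # Blocking read of one block; runs on the executor thread.
    return f.read(size)

async def stream_file(path, chunk_size=1 << 16):
    # Fewer, larger executor round trips per megabyte streamed.
    loop = asyncio.get_running_loop()
    with open(path, 'rb') as f:
        while True:
            block = await loop.run_in_executor(None, _read_block, f, chunk_size)
            if not block:
                break
            yield block
```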
