Extremely slow line iteration on Windows/ProactorEventLoop #47
I'm noticing the same thing. I have a 240 MB file that I'm trying to read in, and I've tried it with three different approaches: the first inserts asyncio.sleep calls after each line read with the builtin open function, the second uses aiofiles, and the third reads synchronously with the builtin open function. Here's the code:

```python
import asyncio
import logging

import aiofiles

logging.basicConfig(format='%(asctime)s %(message)s')
logging.getLogger().setLevel(level=logging.INFO)


async def async_read_with_aiofiles(file_name: str):
    line_count = 0
    logging.info('starting async aiofiles')
    async with aiofiles.open(file=file_name, mode='r', newline='\r\n') as file:
        async for _ in file:
            line_count += 1
    logging.info(f'read {line_count} lines with async aiofiles')


async def async_read_with_stdio(file_name: str):
    line_count = 0
    logging.info('starting async stdio')
    with open(file=file_name, mode='r', newline='\r\n') as file:
        for _ in file:
            line_count += 1
            await asyncio.sleep(0)
    logging.info(f'read {line_count} lines with async stdio')


def sync_read_with_stdio(file_name: str):
    line_count = 0
    logging.info('starting sync stdio')
    with open(file=file_name, mode='r', newline='\r\n') as file:
        for _ in file:
            line_count += 1
    logging.info(f'read {line_count} lines with sync stdio')


loop = asyncio.get_event_loop()
```

Here are the results of running it on a ~240 MB file through each of the three options:

```
>>> sync_read_with_stdio(file_name='large_file')
2018-11-28 11:04:21,546 starting sync stdio
2018-11-28 11:04:22,935 read 2769652 lines with sync stdio
>>> loop.run_until_complete(async_read_with_stdio(file_name='large_file'))
2018-11-28 11:06:05,393 starting async stdio
2018-11-28 11:06:57,863 read 2769652 lines with async stdio
>>> loop.run_until_complete(async_read_with_aiofiles(file_name='large_file'))
2018-11-28 11:07:14,362 starting async aiofiles
2018-11-28 11:14:26,550 read 2769652 lines with async aiofiles
```

BTW, I'm on the exact same platform and versions as @lych77.

---
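One way to sidestep the per-line slowdown measured above is to do the whole blocking read in a single thread-pool job and split lines in memory afterwards. This is a minimal sketch, not an aiofiles API; the function name is mine:

```python
import asyncio


async def read_lines_in_one_call(file_name: str):
    """Read the whole file in one thread-pool job, then split in memory."""
    loop = asyncio.get_running_loop()

    def _read_all():
        # The blocking read happens in the default executor exactly once.
        with open(file_name, 'r') as f:
            return f.read()

    data = await loop.run_in_executor(None, _read_all)
    # splitlines() is pure CPU work on an in-memory string: there is no
    # per-line thread hop, so no per-line executor overhead.
    return data.splitlines()
```

The trade-off is holding the whole file in memory, which is fine for a 240 MB file on most machines but not for arbitrarily large inputs.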
Both SelectorEventLoop and ProactorEventLoop display the same behavior. It's just plain slow. Used the code from @lych77:

- MacOS / Python-3.7.3 / aiofiles-0.4.0 / SelectorEventLoop
- RHEL-7.6 / Python-3.7.3 / aiofiles-0.4.0 / SelectorEventLoop

---
I have the same problem with aiofiles. My simple benchmark and results on linux/uvloop:

```python
import time
import asyncio

import aiofiles
import uvloop

uvloop.install()


async def _async_file_reader(file_name):
    async with aiofiles.open(file_name, 'r') as fp:
        async for line in fp:
            yield line


async def _async_write_file(file_name, lines):
    async with aiofiles.open(file_name, 'w') as fp:
        for line in lines:
            await fp.write(f"{line}\n")
        await fp.flush()


def _file_reader(file_name):
    with open(file_name, 'r') as fp:
        for line in fp:
            yield line


def _write_file(file_name, lines):
    with open(file_name, 'w') as fp:
        for line in lines:
            fp.write(f"{line}\n")
        fp.flush()


async def async_main(file_name):
    ts = time.time()
    lines = []
    async for line in _async_file_reader(file_name):
        lines.append(line)
    te = time.time() - ts
    print(f'ASYNC FILE READING: {te:.2f} sec')

    ts = time.time()
    await _async_write_file('/tmp/bbb.txt', lines)
    te = time.time() - ts
    print(f'ASYNC FILE WRITING: {te:.2f} sec')


def main(file_name):
    ts = time.time()
    lines = []
    for line in _file_reader(file_name):
        lines.append(line)
    te = time.time() - ts
    print(f'FILE READING: {te:.2f} sec')

    ts = time.time()
    _write_file('/tmp/aaa.txt', lines)
    te = time.time() - ts
    print(f'FILE WRITING: {te:.2f} sec')


if __name__ == '__main__':
    file_name = 'lines.txt'
    main(file_name)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_main(file_name))
    loop.close()
```
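The write side of the benchmark above has the same shape of problem as the read side: one await (and thus one thread-pool job) per line. A stdlib-only sketch of batching the whole write into a single executor call, under the assumption that the lines fit in memory (the function name is mine, not a library API):

```python
import asyncio


async def write_lines_in_one_call(path: str, lines):
    """Write all lines with one thread-pool job instead of one await per line."""
    loop = asyncio.get_running_loop()
    # Build the payload on the event loop; it is pure in-memory CPU work.
    payload = ''.join(f'{line}\n' for line in lines)

    def _write():
        # Single blocking write, dispatched to the executor exactly once.
        with open(path, 'w') as f:
            f.write(payload)

    await loop.run_in_executor(None, _write)
```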
In my real code, aiofiles is ~14000x slower than non-async IO when I use […]. Maybe the problem is in this code, which is called for each line:

- aiofiles/aiofiles/threadpool/utils.py, lines 31 to 37 in cba6910
- aiofiles/aiofiles/threadpool/utils.py, lines 19 to 26 in cba6910

Therefore, I think […]

---
The reason for this issue is that for each line the library effectively runs:

```python
loop.run_in_executor(executor, file.readline)
```

The code above is just for demonstration; the original code is:

- aiofiles/aiofiles/threadpool/text.py, lines 6 to 7 in d85a65f
- aiofiles/aiofiles/threadpool/utils.py, lines 5 to 10 in 1451075
- aiofiles/aiofiles/threadpool/utils.py, lines 31 to 37 in 1451075

---
Thanks @MrMrRobat for investigating.

---
Referenced in commit:

* Tinche/aiofiles#47 Thread pool is no longer used for files in memory
* fix tests
* fix import sorted
* little change
Reading binary files also seems to be super slow. I'm streaming files from a VM to a server on the host OS, and I get only ~1 to 1.6 Mbit/s.

---
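For binary streaming, throughput like that usually points at many small reads, each paying the per-job overhead discussed above. A stdlib-only sketch of reading in large chunks so each executor job moves a lot of data; the chunk size is a tuning assumption, not a library default:

```python
import asyncio

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; tune for your workload


async def stream_chunks(path: str, chunk_size: int = CHUNK_SIZE):
    """Yield a binary file in large chunks, one executor job per chunk."""
    loop = asyncio.get_running_loop()
    with open(path, 'rb') as f:
        while True:
            # One thread-pool round trip amortized over chunk_size bytes.
            chunk = await loop.run_in_executor(None, f.read, chunk_size)
            if not chunk:
                break
            yield chunk
```

With 1 MiB chunks the fixed per-job cost is paid once per megabyte instead of once per line or small buffer.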
Windows 7 64-bit, Python 3.7.0 64-bit, aiofiles 0.4.0

Test program:

Output:

The sample file is not very big (less than 1 MB), but it contains a great many lines. So is the result caused by excess thread context switches? Can this be avoided by some approach?
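The context-switch cost per line can be reduced without giving up async iteration by fetching a batch of lines per thread-pool job. This is a stdlib-only sketch of that idea (the helper name and batch size are mine, not part of aiofiles):

```python
import asyncio
from itertools import islice


async def iter_lines_batched(path: str, batch_size: int = 1000):
    """Yield lines, paying one thread hop per batch instead of per line."""
    loop = asyncio.get_running_loop()
    with open(path, 'r') as f:
        while True:
            # islice pulls up to batch_size lines inside the worker thread,
            # so the event loop sees one round trip per batch.
            batch = await loop.run_in_executor(
                None, lambda: list(islice(f, batch_size)))
            if not batch:
                return
            for line in batch:
                yield line
```

With batch_size=1000 the per-job overhead is spread over a thousand lines, while the event loop still gets control between batches.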