Receptor prints Python tracebacks when it ought to print human-friendly error messages #219

Open
elyezer opened this issue May 5, 2020 · 4 comments
Labels: bug (Something isn't working)

@elyezer
Member

elyezer commented May 5, 2020

In general, receptor prints raw stack traces when it hits an error while running the CLI. Consider receptor status as an example.

The help message for receptor status does not list any required arguments:

$ poetry run receptor status --help
usage: receptor status [-h] [--peer STATUS_PEER]
                       [--ws_extra_header STATUS_WS_EXTRA_HEADERS]
                       [--show-ephemeral] [--ws_heartbeat STATUS_WS_HEARTBEAT]

optional arguments:
  -h, --help            show this help message and exit
  --peer STATUS_PEER    The peer to access the mesh through. If unspecified
                        here or in a config file, localhost:8888 will be used.
  --ws_extra_header STATUS_WS_EXTRA_HEADERS
                        Set additional headers to provide when connecting to
                        websocket peers.
  --show-ephemeral      Show ephemeral nodes in output
  --ws_heartbeat STATUS_WS_HEARTBEAT
                        Set heartbeat interval for websocket connection

But running it without any options produces the following:

$ poetry run receptor status
ERROR 2020-05-05 13:35:35,710  __main__ main: an error occured while running receptor
Traceback (most recent call last):
  File "/home/elyezer/code/receptor/receptor/receptor/entrypoints.py", line 210, in run_as_status
    controller = Controller(config)
  File "/home/elyezer/code/receptor/receptor/receptor/controller.py", line 34, in __init__
    self.receptor = Receptor(config)
  File "/home/elyezer/code/receptor/receptor/receptor/receptor.py", line 94, in __init__
    os.makedirs(os.path.join(self.config.default_data_dir, self.node_id))
  File "/usr/lib64/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib64/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/var/lib/receptor'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elyezer/code/receptor/receptor/receptor/__main__.py", line 59, in main
    config.go()
  File "/home/elyezer/code/receptor/receptor/receptor/config.py", line 570, in go
    self._parsed_args.func(self)
  File "/home/elyezer/code/receptor/receptor/receptor/entrypoints.py", line 213, in run_as_status
    controller.cleanup_tmpdir()
UnboundLocalError: local variable 'controller' referenced before assignment

Expected behavior: when run without any options, receptor status should either work or print a meaningful message explaining why it could not run.

In the example above I had nodes running on ports 9999 and 9998, so it probably failed because it tried to connect to the default localhost:8888 and did not find anything running there.

@ghjm
Contributor

ghjm commented May 7, 2020

There are several related problems here.

The reason "receptor status" failed is that it has to start an ephemeral node to communicate with the Receptor mesh (problem 1), so it tried to store its manifest data under /var/lib/receptor, and did not have permission to do so (problem 2). After this failure, it printed a traceback instead of a nicely-formatted error message (problem 3), and while it was doing this, the exception handler itself generated another exception (problem 4), which was then printed as a traceback instead of a nicely-formatted error message (problem 3 again).
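The way problem 2 cascades into problem 4 can be reproduced in isolation. The following is a minimal sketch, not Receptor's actual code: `Controller` here is a hypothetical stand-in whose constructor always fails the way `os.makedirs` did in the traceback, and `reproduce` is a helper invented for illustration.

```python
class Controller:
    """Hypothetical stand-in for Receptor's Controller (assumption)."""

    def __init__(self, config):
        # Simulate os.makedirs failing on an unwritable data directory.
        raise PermissionError(13, "Permission denied", "/var/lib/receptor")

    def cleanup_tmpdir(self):
        pass


def run_as_status(config):
    try:
        # The constructor raises, so the name 'controller' is never bound.
        controller = Controller(config)
    finally:
        # Referencing the unbound name raises UnboundLocalError, which
        # replaces the PermissionError as the exception that propagates.
        controller.cleanup_tmpdir()


def reproduce():
    """Return the names of the propagated exception and its context."""
    try:
        run_as_status(config=None)
    except UnboundLocalError as exc:
        # The original error survives only as the implicit __context__,
        # which is what produces the "During handling of the above
        # exception, another exception occurred" section of the output.
        return type(exc).__name__, type(exc.__context__).__name__
```

Calling `reproduce()` shows that the caller sees the `UnboundLocalError`, with the `PermissionError` attached only as context.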

Taking these problems one at a time:

  • Problem 1 will be solved if and when "Fix race condition when send happens after close" (ansible/receptor#208) is implemented. Until then, we just have to live with ephemeral nodes.
  • Problem 2 can be solved by giving your uid access to /var/lib/receptor, passing a -d /whatever/dir argument, or by editing /etc/receptor/receptor.conf to specify a different directory (I use /tmp/receptor).
  • Problem 3 probably requires a top-down rethink of how we handle error conditions in Receptor, or if we can't get that done, maybe at least a global exception handler with better default behavior. Also, I'll just mention that the whole concept of exception handling in Python asyncio in general, and Receptor in particular, is a maze of twisty little passages, all different.
  • Problem 4 can probably be fixed by putting the Controller constructor call outside the try..finally block in entrypoints.py.
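The suggestion for problem 4 could look roughly like this. It is a sketch under assumptions: `Controller`, its `run` method, and the `writable` and `data_dir` config keys are hypothetical stand-ins, not the real code in entrypoints.py.

```python
class Controller:
    """Hypothetical stand-in for Receptor's Controller (assumption)."""

    def __init__(self, config):
        # 'writable' is an invented flag used to simulate the failure.
        if not config.get("writable", True):
            raise PermissionError(13, "Permission denied", config["data_dir"])
        self.cleaned_up = False

    def run(self):
        return "ok"

    def cleanup_tmpdir(self):
        self.cleaned_up = True


def run_as_status(config):
    # Construct outside the try block: if the constructor raises, there is
    # nothing to clean up yet and the original exception propagates as-is.
    controller = Controller(config)
    try:
        return controller.run()
    finally:
        # 'controller' is guaranteed to be bound when this runs.
        controller.cleanup_tmpdir()
```

With this ordering a constructor failure surfaces as a single `PermissionError`, so whatever reports the error (problem 3) only has one exception to deal with.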

@elyezer, can you clarify which of these problems you mean for this issue card to track?

@elyezer
Member Author

elyezer commented May 7, 2020

@ghjm it seems that problem 3, maybe also 4, would be a great candidate here. Usually it is not a great experience to try things and get all these stacktraces as the output.
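For problem 3, one possible shape of the "global exception handler with better default behavior" mentioned above is sketched here. All names (`friendly_message`, `main`, the `debug` flag) are hypothetical and the message wording is only illustrative. It covers the synchronous CLI entry point only and does not address exceptions raised inside asyncio tasks.

```python
import sys
import traceback


def friendly_message(exc):
    """Map an exception to a one-line, human-readable message."""
    if isinstance(exc, PermissionError):
        return (
            f"cannot write to {exc.filename!r}: permission denied; "
            "choose another data directory or fix its permissions"
        )
    if isinstance(exc, ConnectionError):
        return f"could not connect to peer: {exc}"
    return f"unexpected error: {exc}"


def main(func, debug=False, stream=sys.stderr):
    """Run func() and return a process exit code instead of a traceback."""
    try:
        func()
    except Exception as exc:
        print(f"receptor: error: {friendly_message(exc)}", file=stream)
        if debug:
            # Keep the full traceback available for developers on request.
            traceback.print_exc(file=stream)
        return 1
    return 0
```

A caller would use the return value as the exit status, for example `sys.exit(main(some_entrypoint))`, where `some_entrypoint` is whatever function the CLI dispatches to.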

About problem 2, we have a separate issue ansible/receptor#195.

ghjm added the bug (Something isn't working) label on May 7, 2020
@ghjm
Contributor

ghjm commented May 7, 2020

@elyezer That's great, I hadn't seen #195 and we do need an issue to track problem 3. Can you edit the title of this to something like "Receptor prints Python tracebacks when it ought to print human-friendly error messages" ?

elyezer changed the title from "receptor status raises an error when run without any option" to "Receptor prints Python tracebacks when it ought to print human-friendly error messages" on May 7, 2020
@elyezer
Member Author

elyezer commented May 7, 2020

@ghjm sure thing, I will edit the description a bit to reflect the intent here.

matburt added this to the 1.0 Release milestone on May 13, 2020