This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Message Queue #4111

Merged
merged 35 commits into from
Apr 24, 2019

Conversation

jiivan
Contributor

@jiivan jiivan commented Apr 11, 2019

Final step of implementing the message queue in golem. Plug all functionality into TaskServer. Move unnecessary logic from TaskSession into TaskServer or its submodules in golem.task.server.

fixes: #2223
fixes: #2403
fixes: #4117
fixes: #2404
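
For readers unfamiliar with the pattern, the idea of a per-node message queue can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `MessageQueue`, `FakeSession`, and `on_connection_ready` are hypothetical names, and the queue here lives in memory only. It assumes that outgoing messages are stored per node id, flushed once a session to that node is ready, and that a session with nothing pending is simply closed.

```python
from collections import defaultdict, deque


class MessageQueue:
    """Hypothetical in-memory queue of outgoing messages, keyed by node id."""

    def __init__(self):
        self._pending = defaultdict(deque)

    def put(self, node_id, msg):
        # Remember the message until a connection to node_id is available.
        self._pending[node_id].append(msg)

    def get(self, node_id):
        """Yield and remove every message queued for node_id, oldest first."""
        queue = self._pending.pop(node_id, deque())
        while queue:
            yield queue.popleft()


class FakeSession:
    """Stand-in for a network session (assumption, for illustration only)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.sent = []
        self.disconnected = False

    def send(self, msg):
        self.sent.append(msg)

    def disconnect(self):
        self.disconnected = True


def on_connection_ready(queue, session):
    """Flush queued messages to the session, or disconnect if none are pending."""
    sent_any = False
    for msg in queue.get(session.node_id):
        session.send(msg)
        sent_any = True
    if not sent_any:
        session.disconnect()
    return sent_any
```

With this shape, the code that produces a message only needs to call `put()`; it does not have to care whether a connection to the target node currently exists.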

@ghost ghost assigned jiivan Apr 11, 2019
@ghost ghost added the in progress label Apr 11, 2019
@jiivan jiivan marked this pull request as ready for review April 15, 2019 11:22
@jiivan jiivan requested a review from maaktweluit April 15, 2019 12:20
Contributor

@shadeofblue shadeofblue left a comment

Looks good as far as I could tell. It would be good to have it tested by QA before we merge it; adding it to the QA project's backlog.

@codecov

codecov bot commented Apr 16, 2019

Codecov Report

Merging #4111 into develop will increase coverage by 0.13%.
The diff coverage is 82.38%.

@@             Coverage Diff             @@
##           develop    #4111      +/-   ##
===========================================
+ Coverage    88.63%   88.77%   +0.13%     
===========================================
  Files          214      215       +1     
  Lines        18700    18632      -68     
===========================================
- Hits         16575    16540      -35     
+ Misses        2125     2092      -33

Contributor

@maaktweluit maaktweluit left a comment

LGTM!

left some small comments

will check it again tomorrow and approve :)

@jiivan jiivan force-pushed the msg_queue branch 6 times, most recently from 1c8b6a0 to 4086c3f Compare April 18, 2019 10:34
@ederenn

ederenn commented Apr 18, 2019

Tested on:

INFO     [golemapp                           ] Protocol Version: 32
INFO     [golemapp                           ] golem_messages Version: 3.3.1

After connecting to other nodes in the network I received the error below.

Encountered on Linux18 and Windows 10:

INFO     [golem.client                       ] Resumed
Unhandled Error
Traceback (most recent call last):
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\iocpreactor\reactor.py", line 119, in _callEventCallback
    evt.callback(rc, numBytes, evt)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\iocpreactor\tcp.py", line 552, in cbAccept
    self.handleAccept(rc, evt)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\iocpreactor\tcp.py", line 594, in handleAccept
    protocol.makeConnection(transport)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\protocol.py", line 510, in makeConnection
    self.connectionMade()
  File "C:\Users\ederenn\Projects\golem\golem\network\transport\tcpnetwork.py", line 435, in connectionMade
    self.server.new_connection(self.session)
  File "C:\Users\ederenn\Projects\golem\golem\task\taskserver.py", line 405, in new_connection
    session.disconnect(message.base.Disconnect.REASON.NoMoreMessages)
  File "C:\Users\ederenn\Projects\golem\golem\network\transport\session.py", line 108, in disconnect
    self._send_disconnect(reason)
  File "C:\Users\ederenn\Projects\golem\golem\network\transport\session.py", line 140, in _send_disconnect
    self.send(message.base.Disconnect(reason=reason))
  File "C:\Users\ederenn\Projects\golem\golem\task\tasksession.py", line 897, in send
    raise RuntimeError('Connection unverified')
builtins.RuntimeError: Connection unverified

CRITICAL [twisted                            ] Unhandled Error
Traceback (most recent call last):
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\python\context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\iocpreactor\reactor.py", line 119, in _callEventCallback
    evt.callback(rc, numBytes, evt)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\iocpreactor\tcp.py", line 552, in cbAccept
    self.handleAccept(rc, evt)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\iocpreactor\tcp.py", line 594, in handleAccept
    protocol.makeConnection(transport)
  File "C:\Users\ederenn\Projects\golem-env\lib\site-packages\twisted\internet\protocol.py", line 510, in makeConnection
    self.connectionMade()
  File "C:\Users\ederenn\Projects\golem\golem\network\transport\tcpnetwork.py", line 435, in connectionMade
    self.server.new_connection(self.session)
  File "C:\Users\ederenn\Projects\golem\golem\task\taskserver.py", line 405, in new_connection
    session.disconnect(message.base.Disconnect.REASON.NoMoreMessages)
  File "C:\Users\ederenn\Projects\golem\golem\network\transport\session.py", line 108, in disconnect
    self._send_disconnect(reason)
  File "C:\Users\ederenn\Projects\golem\golem\network\transport\session.py", line 140, in _send_disconnect
    self.send(message.base.Disconnect(reason=reason))
  File "C:\Users\ederenn\Projects\golem\golem\task\tasksession.py", line 897, in send
    raise RuntimeError('Connection unverified')
builtins.RuntimeError: Connection unverified

The network consists of 5 nodes: 1 requestor on Linux and 4 providers on Linux, Mac OS, and Windows.
On the first try, 3 different providers were computing for the requestor. Only Mac and Linux provided successfully; providing by Windows ended in failure on every try.
The task (1 frame, 6 subtasks) finished successfully, but mainly on one node: two providers computed only one subtask in the task, and the rest were finished on the third provider.
The situation repeated in the next four tasks.
After a restart of a provider, the providing nodes were not starting computations at all, with no additional errors in the regular logs. It might be worth trying at debug level.

After another restart of the requestor, the nodes started to provide.

Errors encountered during first round of computations:

from requestor:

2019-04-18 15:42:19 INFO     golem.task.taskserver               Computation for task '3c98b2c8-61df-11e9-8e9d-5ad86f3605c2' failed: 'Error downloading resources: [Failure instance: Traceback (failure with no frames): <class \'requests.exceptions.HTTPError\'>: {"error":"d620bddd8cf0439914e84f631bcabead1849a6a8572ca3ac866d398bf53fa5e5 download timed out after 234 s"}\n]'.

The subtask was provided by the Windows node.

When the subtask is checked from the CLI:

>> subtasks show 3c98b2c8-61df-11e9-8e9d-5ad86f3605c2
node_id: d178f6b5bb2c4533ef16df75f8407a99c8d10f1475639d76ac94dfc272d814baf42ebffac96c0a14e408f8bb882043045ffedd3a43d07689e7916fa0042a2b93
node_name: vm verif
progress: 1.0
results: []
status: Failure
stderr: Error downloading resources: [Failure instance: Traceback (failure with no frames): <class 'requests.exceptions.HTTPError'>: {"error":"d620bddd8cf0439914e84f631bcabead1849a6a8572ca3ac866d398bf53fa5e5 download timed out after 234 s"}
]
stdout: 
subtask_id: 3c98b2c8-61df-11e9-8e9d-5ad86f3605c2
time_remaining: 0.0
time_started: 1555594703.4309285

from requestor:

2019-04-18 15:39:33 CRITICAL twisted                             Unhandled error in Deferred:
2019-04-18 15:39:33 CRITICAL twisted                             
Traceback (most recent call last):
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ederenn/projects/golem/golem/task/tasksession.py", line 379, in _offer_chosen
    ctd["resources"] = self.task_server.get_resources(msg.task_id)
TypeError: 'NoneType' object does not support item assignment

from requestor:

2019-04-18 15:39:33 INFO     golem.task.taskmanager              no more computation needed: 2e2d79ae-61df-11e9-abf1-5ad86f3605c2
2019-04-18 15:39:33 CRITICAL twisted                             Unhandled error in Deferred:
2019-04-18 15:39:33 CRITICAL twisted                             
Traceback (most recent call last):
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ederenn/projects/golem/golem/task/tasksession.py", line 379, in _offer_chosen
    ctd["resources"] = self.task_server.get_resources(msg.task_id)
TypeError: 'NoneType' object does not support item assignment
2019-04-18 15:39:46 INFO     golem.task.tasksession              Received offer to compute. task_id='2e2d79ae-61df-11e9-abf1-5ad86f3605c2', node="'Oncoming Storm'(6c15fa96..d6a66c61)"

from provider:

019-04-18 14:37:36 ERROR    golem.task.taskkeeper               Problem restoring dumpfile: /home/ederenn/.local/share/golem/default/mainnet/tasks/comp_task_keeper.pickle; deleting broken file
Traceback (most recent call last):
  File "/home/ederenn/projects/golem/golem/task/taskkeeper.py", line 160, in restore
    data = pickle.load(f)
AttributeError: 'TaskHeader' object has no attribute 'resource_size'

Repeated on providers several times in a row:

2019-04-18 16:01:57 ERROR    twisted                             TypeError: __init__() missing 1 required keyword-only argument: 'outfilebasename': Traceback (most recent call last):
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/autobahn/twisted/websocket.py", line 162, in _onMessage
    self.onMessage(payload, isBinary)
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/autobahn/wamp/websocket.py", line 95, in onMessage
    self._session.onMessage(msg)
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/autobahn/wamp/protocol.py", line 895, in onMessage
    on_reply = txaio.as_future(endpoint.fn, *invoke_args, **invoke_kwargs)
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/txaio/tx.py", line 417, in as_future
    return maybeDeferred(fun, *args, **kwargs)
--- <exception caught here> ---
  File "/home/ederenn/projects/golem-env/lib/python3.6/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/home/ederenn/projects/golem/golem/client.py", line 901, in get_task_stats
    'provider_state': self.get_provider_status(),
  File "/home/ederenn/projects/golem/golem/client.py", line 1355, in get_provider_status
    task_computer.get_progress()
  File "/home/ederenn/projects/golem/golem/task/taskcomputer.py", line 219, in get_progress
    **c.extra_data,
builtins.TypeError: __init__() missing 1 required keyword-only argument: 'outfilebasename'

@jiivan
Contributor Author

jiivan commented Apr 19, 2019

  1. builtins.RuntimeError: Connection unverified resolved in 64bebf8
  2. CTD None (NoMoreSubtasks before offer_chosen) resolved in cd15c3d
  3. AttributeError: 'TaskHeader' object has no attribute 'resource_size' looks unrelated to this PR
  4. builtins.TypeError: __init__() missing 1 required keyword-only argument: 'outfilebasename' also looks unrelated
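
The first two failure modes above can be illustrated with a small sketch. This is not the code from 64bebf8 or cd15c3d; it only shows, under stated assumptions, the kind of guard that avoids each error. `UnverifiedSession` and `attach_resources` are hypothetical names: the first assumes a session whose `send()` raises until the peer is verified, the second assumes the ComputeTaskDef (`ctd`) can be `None` when no subtask is left to hand out.

```python
class UnverifiedSession:
    """Hypothetical session that refuses to send before verification."""

    def __init__(self):
        self.verified = False
        self.closed = False
        self.sent = []

    def send(self, msg):
        if not self.verified:
            raise RuntimeError('Connection unverified')
        self.sent.append(msg)

    def disconnect(self, reason):
        # Only announce the reason to the peer when sending is allowed;
        # otherwise just close the connection without a Disconnect message.
        if self.verified:
            self.send(('Disconnect', reason))
        self.closed = True


def attach_resources(ctd, resources):
    """Return ctd with resources attached, or None when no ctd was produced."""
    if ctd is None:
        # Nothing to compute (e.g. no more subtasks), so there is nothing to fill in.
        return None
    ctd['resources'] = resources
    return ctd
```

In both cases the guard turns an unhandled exception into an ordinary early exit: an unverified session is closed silently, and a missing `ctd` is passed back to the caller to handle.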

@jiivan jiivan merged commit 7eeee39 into develop Apr 24, 2019
@jiivan jiivan deleted the msg_queue branch April 24, 2019 15:03
@ghost ghost removed the in progress label Apr 24, 2019