The temporary shell script file /home/dir/.julia-htc/julia-1195449.sh seems OK:
#!/bin/sh
cd /tmp
/usr/bin/julia --worker=o7tjjc9VsZGKA8qn | /usr/bin/telnet machinenode.from_which_I_ran.julia 8848
All output *.o files look like:
Trying 192.168.1.3...
All output *.e files look like:
telnet: connect to address 192.168.1.3: Connection refused
(machinenode.from_which_I_ran.julia has the local IP address 192.168.1.3.)
As far as I can tell, the problem is still present! I still cannot launch workers with HTCondor, and the symptom is the same.
telnet keeps complaining:
telnet: connect to address 192.168.1.3: Connection refused
If I run "nc -l 8200" directly on a machine mmm in the cluster and then run "telnet mmm 8200", the telnet connection succeeds!
It seems to me that the equivalent of the nc -l command is the listen(portnum) call at line 45 of the condor.jl script...
Anyhow, I'd be interested to hear from anyone who is, or is not, facing the same issue when using ClusterManagers with an HTCondor scheduler!
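One thing that may be worth checking here: in Julia's Sockets standard library, listen(port) called without an address binds to localhost only, whereas "nc -l" typically accepts connections on every interface. Whether that is what causes the "Connection refused" in condor.jl is an assumption on my part, not something confirmed. A minimal sketch of the difference, using listenany with arbitrary port hints (8200 and 8300) so that it does not clash with a running manager:

```julia
using Sockets

# Loopback-only listener: the same binding that listen(port) uses by default.
port_lo, srv_lo = listenany(IPv4("127.0.0.1"), 8200)

# Listener on every interface: passing IPv4(0) makes it reachable from other hosts.
port_all, srv_all = listenany(IPv4(0), 8300)

# getsockname reports the address each server is actually bound to.
addr_lo, _ = getsockname(srv_lo)
addr_all, _ = getsockname(srv_all)
println("loopback listener bound to ", addr_lo, ":", Int(port_lo))
println("all-interface listener bound to ", addr_all, ":", Int(port_all))

close(srv_lo)
close(srv_all)
```

If the manager's listener turns out to be bound to 127.0.0.1, a telnet from a worker node to 192.168.1.3 would be refused even though the port is open locally.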
I get the following error on my local cluster with the HTCondor scheduler (Julia version 1.1.0-DEV).
The created condor script file seems OK:
Other issue: The method "addprocs_htc(np::Integer) = addprocs(HTCManager(np))" does not seem to allow specifying a different working directory. In many cases, HTCondor will place julia-1195449.sh and its associated files in a temporary scratch working directory, where one may want to stay for the lifetime of the worker. Couldn't we handle that with a
(dir!=nothing) && println(scriptf, "cd $(Base.shell_escape(dir))")
and
addprocs_htc(np::Integer ; dir=nothing ) = addprocs(HTCManager(np) , dir=dir)
change in condor.jl
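To make the idea concrete, here is a rough sketch of how an optional dir keyword could control the "cd" line of the generated script. The helper name write_worker_script and the simplified worker command are hypothetical and only for illustration; this is not the actual condor.jl code.

```julia
# Hypothetical helper (not part of ClusterManagers): write the worker launch
# script and emit a `cd` line only when a directory is given.
function write_worker_script(io::IO, worker_cmd::AbstractString; dir=nothing)
    println(io, "#!/bin/sh")
    # Base.shell_escape quotes the path so that spaces survive in /bin/sh.
    dir !== nothing && println(io, "cd $(Base.shell_escape(dir))")
    println(io, worker_cmd)
end

# With an explicit directory, the script contains a `cd` line.
buf = IOBuffer()
write_worker_script(buf, "/usr/bin/julia --worker"; dir="/tmp/my scratch")
script_with_dir = String(take!(buf))
print(script_with_dir)

# Without one, the worker stays in whatever directory HTCondor starts it in.
buf = IOBuffer()
write_worker_script(buf, "/usr/bin/julia --worker")
script_without_dir = String(take!(buf))
print(script_without_dir)
```

Defaulting dir to nothing means the worker would stay in the scratch directory unless a directory is requested explicitly.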