atlas-connect-l - Re: [Atlas-connect-l] Condor jobs switched back to idle on atlas connect?

atlas-connect-l AT lists.bnl.gov

Subject: Atlas-connect-l mailing list

  • From: Matthew Epland <matthew.epland AT cern.ch>
  • To: jlstephen AT uchicago.edu
  • Cc: atlas-connect-l AT lists.bnl.gov
  • Subject: Re: [Atlas-connect-l] Condor jobs switched back to idle on atlas connect?
  • Date: Wed, 7 Nov 2018 16:31:08 -0500

Hello,

The jobs ended up restarting, but I killed them and resubmitted with some other updates. Now I'd like to log into one of the worker nodes and check on my payload's log file, but I'm getting an error:

[mepland@login logs]$ condor_ssh_to_job 369232.0
slot1@7251@uct2-c363.mwt2.org: Cannot execute /tmp/condorinstall0/libexec/condor_ssh_to_job_sshd_setup: No such file or directory

[mepland@login logs]$ condor_ssh_to_job -debug 369232.0
11/07/18 15:27:31 SharedPortClient: sent connection request to schedd at <192.170.231.50:9618> for shared port id 2635_f824_3
11/07/18 15:27:31 SharedPortClient: sent connection request to local schedd for shared port id 2635_f824_3
11/07/18 15:27:31 Response for GET_JOB_CONNECT_INFO:
RemoteHost = "slot1@7251@uct2-c363.mwt2.org"
Result = true
ServerTime = 1541626051
CondorVersion = "$CondorVersion: 8.4.12 Jul 06 2017 BuildID: 409562 $"

11/07/18 15:27:31 No shared_port cookie available; will fall back to using on-disk $(DAEMON_SOCKET_DIR)
11/07/18 15:27:31 SharedPortClient: sent connection request to collector 192.170.227.143:11120?addrs=192.170.227.143-11120&noUDP&sock=collector for shared port id collector
slot1@7251@uct2-c363.mwt2.org: Cannot execute /tmp/condorinstall0/libexec/condor_ssh_to_job_sshd_setup: No such file or directory
11/07/18 15:27:31 Attempting to remove /tmp/mepland.condor_ssh_to_job_8c57114f as unknown user

The documentation says:

"condor_ssh_to_job is intended to work with OpenSSH as installed in typical environments. It does not work on Windows platforms. If the ssh programs are installed in non-standard locations, then the paths to these programs will need to be customized within the HTCondor configuration. Versions of ssh other than OpenSSH may work, but they will likely require additional configuration of command-line arguments, changes to the sshd configuration template file, and possibly modification of the $(LIBEXEC)/condor_ssh_to_job_sshd_setup script used by the condor_starter to set up sshd."

So could something possibly be misconfigured with LIBEXEC?
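
One rough way to test that idea is to check whether the setup script named in the error actually exists under a given libexec directory. The sketch below is only an illustration: `check_sshd_setup` is a hypothetical helper, not part of HTCondor. The error is reported for the slot on uct2-c363.mwt2.org, and the documentation says the script is used by the condor_starter, so the value that matters is the LIBEXEC seen on that worker; running the check on the login node only shows the local configuration.

```shell
# Hypothetical helper (not part of HTCondor): report whether the sshd setup
# script named in the error is a regular, executable file under the given
# libexec directory.
check_sshd_setup() {
    libexec_dir="$1"
    script="$libexec_dir/condor_ssh_to_job_sshd_setup"
    if [ -f "$script" ] && [ -x "$script" ]; then
        echo "OK: $script"
        return 0
    fi
    echo "MISSING: $script"
    return 1
}

# Example usage, on the machine whose configuration you want to inspect
# (assumes condor_config_val is on the PATH there):
#   check_sshd_setup "$(condor_config_val LIBEXEC)"
```

If the script turns out to be missing on the worker, that would point to the HTCondor installation under /tmp/condorinstall0 rather than to anything in the job itself.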

Thank you,
Matt Epland

On Wed, Nov 7, 2018 at 10:35 AM Judith Stephen <jlstephen AT uchicago.edu> wrote:
Hi Giordon,

login.usatlas.org was rebooted half an hour ago, but the existing running jobs should have reconnected by now. I am looking into it.

Judith

> On Nov 7, 2018, at 9:22 AM, Giordon Stark <gstark AT cern.ch> wrote:
>
> Hi,
>
> I'm reporting on behalf of Matthew cc'd -- it seems like his condor jobs that were running got pushed back to idle. Any ideas why that happened? Did something get restarted?
>
> Giordon
> --
> Giordon Stark
> _______________________________________________
> Atlas-connect-l mailing list
> Atlas-connect-l AT lists.bnl.gov
> https://lists.bnl.gov/mailman/listinfo/atlas-connect-l



--
Matthew Epland
Duke University Department of Physics


