Failure on TASK [fubarhouse.golang : Go-Lang | Moving to installation directory] #66

Open
markdorison opened this issue Aug 9, 2017 · 9 comments

Comments

@markdorison

This role has worked for me in the past but I am now encountering the following error on the "Moving to installation directory" task. I redacted the information about the box it is running on.

fatal: [FQDN_REDACTED -> IP_REDACTED]: FAILED! => {"changed": false, "cmd": "/usr/bin/rsync --delay-updates -F --compress --delete-after --archive --rsh 'ssh -S none -o StrictHostKeyChecking=no' --rsync-path=\"sudo rsync\" --out-format='<<CHANGED>>%i %n%L' \"/tmp/go/\" \"IP_REDACTED:/root/go\"", "failed": true, "msg": "Warning: Permanently added 'IP_REDACTED' (ECDSA) to the list of known hosts.\r\nPermission denied (publickey).\r\nrsync: connection unexpectedly closed (0 bytes received so far) [sender]\nrsync error: unexplained error (code 255) at io.c(226) [sender=3.1.0]\n", "rc": 255}

@fubarhouse
Owner

@markdorison first of all thanks for submitting a ticket!

Second, as you're getting the following: Permission denied (publickey), could you please check the ssh-agent to ensure keys have been added?

You can use the following:

echo $SSH_AGENT_PID
eval $(ssh-agent)
ssh-add
ssh-add -l

This will rule out any obvious errors. I intend to go back through this role for a new release, much like I've just done for my curl role.
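As a rough sketch, the agent checks above could be wrapped in one function. The name `check_agent` and the messages are made up for illustration. It relies on the documented `ssh-add -l` exit codes: 0 when keys are listed, 1 when the agent holds no identities, 2 when no agent can be contacted.

```shell
#!/bin/sh
# Report whether this shell can reach an ssh-agent and whether it holds keys.
# check_agent is a hypothetical helper, not part of the role.
check_agent() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "no agent socket in this shell"
        return 2
    fi
    # Capture the exit code without tripping "set -e" in a calling script.
    ssh-add -l >/dev/null 2>&1 && rc=0 || rc=$?
    case $rc in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, no keys loaded"; return 1 ;;
        *) echo "agent not reachable"; return 2 ;;
    esac
}

check_agent || echo "try: eval \"\$(ssh-agent -s)\" && ssh-add"
```

Note that this only describes the shell it runs in. A playbook started from a different terminal or via cron may see a different agent, or none.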

@ctorgalson

@fubarhouse: Thanks!

(I work with @markdorison and it was me who ran into this issue). The machine the Ansible playbook was running from does have a passphrase-protected ssh-key to the remote machine, but we know it was ssh-added because the playbook was able to connect to the remote machine at all (and successfully complete all tasks up to the golang role).

But reading through the code, it looks like the fact that ansible_ssh_user is undefined is what forces the role to use synchronize instead of shell. This playbook is usually run without -u, and I don't think we'd found the fubarhouse_user variable.

I'll run the playbook again at a quiet time with a value for user or fubarhouse_user to see if that's the issue & report back.

@fubarhouse
Owner

@ctorgalson that actually describes something I can diagnose much better, so I'll look into this for you and respond.

Under no circumstances should fubarhouse_user be left without a value, but at least now I know there is a case where it may be.

Can you think of any reason the following two tasks would be skipped?

- name: "Go-Lang | Define user variable for ssh use"
  set_fact:
    fubarhouse_user: "{{ ansible_ssh_user }}"
  when: ansible_ssh_user is defined and fubarhouse_user is not defined

- name: "Go-Lang | Define user variable for non-ssh use"
  set_fact:
    fubarhouse_user: "{{ ansible_user_id }}"
  when: ansible_ssh_user is not defined and fubarhouse_user is not defined
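To make the two `when` conditions easier to reason about, here is a small shell model of them. It is only an illustration: `resolve_user` is a made-up name, unset shell variables stand in for undefined Ansible variables, and `deploy` is a placeholder user. Read this way, both tasks are skipped only when fubarhouse_user is already defined before the role runs.

```shell
#!/bin/sh
# Model of the two set_fact tasks: prints the value fubarhouse_user ends up with.
# Unset shell variables stand in for undefined Ansible variables.
resolve_user() {
    if [ -n "${fubarhouse_user+set}" ]; then
        # Both tasks are skipped: the variable was already supplied.
        echo "$fubarhouse_user"
    elif [ -n "${ansible_ssh_user+set}" ]; then
        # First task: "Define user variable for ssh use".
        echo "$ansible_ssh_user"
    else
        # Second task: "Define user variable for non-ssh use".
        echo "${ansible_user_id:-}"
    fi
}

# A run without -u: ansible_ssh_user is undefined, so the second task applies.
unset fubarhouse_user ansible_ssh_user
ansible_user_id=deploy   # placeholder user name
resolve_user             # prints: deploy
```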

@ctorgalson

ctorgalson commented Aug 28, 2017

@fubarhouse Thanks. I think I can explain what's failing.

Background

  • We use this role to install Go (naturally!)
  • Any of several users can run the relevant playbook.
  • All these users are sudoers.
  • The users all use public-keys to authenticate as separate users.
  • Running ansible -i hosts.yml -m setup hostname shows me that:
    • neither ansible_ssh_user nor ansible_user (since this is Ansible 2.2.2.0) is defined, and
    • Ansible has correctly set ansible_user_id to ctorgalson.

Analysis

  1. Since ansible_ssh_user is undefined, the role sets fubarhouse_user to the value of ansible_user_id.

  2. Since ansible_ssh_user is undefined, the role will attempt to use the synchronize task.

  3. The synchronize task uses the value of fubarhouse_user for become_user. This is the direct cause of the error we see (but possibly not the actual problem):

    "cmd": "/usr/bin/rsync --delay-updates -F --compress --delete-after --archive --rsh 'ssh  -S none -o StrictHostKeyChecking=no' --rsync-path=\"sudo rsync\" --out-format='<<CHANGED>>%i %n%L' \"/tmp/go/\" \"xxx.xxx.xxx.xxx:/root/go\"",
    

    Even though ctorgalson is in the sudoers list, that user should become root in order to move files into root's home directory (i.e. it should run rsync with sudo rsync ... and not sudo -u ctorgalson rsync ...); this suggests that the actual problem is something else...

  4. The rsync command shown in the error above attempts to copy files to xxx.xxx.xxx.xxx:/root/go. This shows that the {{ GOROOT }} fact is set to /root/go even though the code appears to try to set it to the fubarhouse_user's home directory.

I think (4) is the core issue, though I'm not sure what an appropriate solution might be--installing in individual users' home directories is not a viable solution for us :)


PS: according to Ansible's documentation, the Synchronize module "...is run and originates on the local host where Ansible is being run". Which sounds like the generated rsync command above might always fail (since the get_url task downloads to the remote host, but the Ansible-generated rsync command's source is /tmp/go and not e.g. xxx.xxx.xxx.xxx:/tmp/go).
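The point about source and destination can be checked mechanically. The sketch below uses a made-up helper, `rsync_side`, to apply rsync's operand rule: a path with a colon before its first slash is remote, anything else is local. Fed the two operands from the failing command, it shows the source being read from the machine running Ansible.

```shell
#!/bin/sh
# Classify an rsync path operand as local or remote.
# rsync treats "host:path" (a colon before the first slash) as remote.
# rsync_side is a hypothetical helper for illustration only.
rsync_side() {
    case "$1" in
        *:*)
            # Text before the first colon; a slash in it means a local path.
            case "${1%%:*}" in
                */*) echo local ;;
                *)   echo remote ;;
            esac ;;
        *) echo local ;;
    esac
}

rsync_side "/tmp/go/"                   # prints: local
rsync_side "xxx.xxx.xxx.xxx:/root/go"   # prints: remote
```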

@fubarhouse
Owner

@ctorgalson I've actually been doing a bit of work on similar things. I've rolled some more changes into the dev branch and kicked off some tests. Here's a summary:

  • changed the source distro (build from source) for git
  • completely removed become and become_user, as no sudo access is actually required. It's something I should be doing more of in my other roles.
  • much to my disinterest, I've completely removed the synchronize task and favoured the shell cp... task. The synchronize module is inconsistent, but it's all Ansible offers for that purpose...
  • an assortment of unrelated changes to ensure this role works on 20 different Linux platforms.

You can test this out on the dev-2.5.x branch, but I'll get a release out in the next day for you. I'd appreciate it if you could tell me whether the above changes solve your problem!

Link to tests:

@fubarhouse
Owner

2.5.0 is officially released, available via the galaxy.

As previously stated, I'd like to know if the changes have resolved your problems.

@markdorison
Author

markdorison commented Sep 21, 2017

@fubarhouse I updated the role to 2.5.0. When attempting a run it fails, but in a different place:

TASK [fubarhouse.golang : Go-Lang | Run get commands] *******************************************************************************************************************************************************
failed: [jenkins.chromatic.is] (item={u'url': u'github.com/StackExchange/dnscontrol', u'name': u'dnscontrol'}) => {"changed": false, "cmd": "/root/go/bin/go get -u github.com/StackExchange/dnscontrol", "delta": "0:00:02.402064", "end": "2017-09-19 19:18:47.305134", "failed": true, "item": {"name": "dnscontrol", "url": "github.com/StackExchange/dnscontrol"}, "rc": 2, "start": "2017-09-19 19:18:44.903070", "stderr": "# runtime\n/root/go/src/runtime/mstkbar.go:151:10: debug.gcstackbarrieroff undefined (type struct { allocfreetrace int32; cgocheck int32; efence int32; gccheckmark int32; gcpacertrace int32; gcshrinkstackoff int32; gcrescanstacks int32; gcstoptheworld int32; gctrace int32; invalidptr int32; sbrk int32; scavenge int32; scheddetail int32; schedtrace int32 } has no field or method gcstackbarrieroff)\n/root/go/src/runtime/mstkbar.go:162:24: division by zero\n/root/go/src/runtime/mstkbar.go:162:43: invalid expression unsafe.Sizeof(composite literal)\n/root/go/src/runtime/mstkbar.go:162:44: undefined: stkbar\n/root/go/src/runtime/mstkbar.go:212:4: gp.stkbar undefined (type *g has no field or method stkbar)\n/root/go/src/runtime/mstkbar.go:213:15: gp.stkbar undefined (type *g has no field or method stkbar)\n/root/go/src/runtime/mstkbar.go:216:23: undefined: stackBarrierPC\n/root/go/src/runtime/mstkbar.go:226:28: gp.stkbarPos undefined (type *g has no field or method stkbarPos)\n/root/go/src/runtime/mstkbar.go:227:19: gp.stkbarPos undefined (type *g has no field or method stkbarPos)\n/root/go/src/runtime/mstkbar.go:248:41: undefined: stkbar\n/root/go/src/runtime/mstkbar.go:227:19: too many errors", "stderr_lines": ["# runtime", "/root/go/src/runtime/mstkbar.go:151:10: debug.gcstackbarrieroff undefined (type struct { allocfreetrace int32; cgocheck int32; efence int32; gccheckmark int32; gcpacertrace int32; gcshrinkstackoff int32; gcrescanstacks int32; gcstoptheworld int32; gctrace int32; invalidptr int32; sbrk int32; scavenge int32; scheddetail int32; schedtrace int32 } has no field or method gcstackbarrieroff)", "/root/go/src/runtime/mstkbar.go:162:24: division by zero", "/root/go/src/runtime/mstkbar.go:162:43: invalid expression unsafe.Sizeof(composite literal)", "/root/go/src/runtime/mstkbar.go:162:44: undefined: stkbar", "/root/go/src/runtime/mstkbar.go:212:4: gp.stkbar undefined (type *g has no field or method stkbar)", "/root/go/src/runtime/mstkbar.go:213:15: gp.stkbar undefined (type *g has no field or method stkbar)", "/root/go/src/runtime/mstkbar.go:216:23: undefined: stackBarrierPC", "/root/go/src/runtime/mstkbar.go:226:28: gp.stkbarPos undefined (type *g has no field or method stkbarPos)", "/root/go/src/runtime/mstkbar.go:227:19: gp.stkbarPos undefined (type *g has no field or method stkbarPos)", "/root/go/src/runtime/mstkbar.go:248:41: undefined: stkbar", "/root/go/src/runtime/mstkbar.go:227:19: too many errors"], "stdout": "", "stdout_lines": []}

@markdorison
Author

The cause of this failure seems to be further upstream in the playbook: a bunch of tasks are being skipped and Go is not being installed successfully. Investigating further.

@fubarhouse
Owner

fubarhouse commented Sep 21, 2017

@markdorison,

I have just identified the problem, so I'll get a fix under way asap.

Edit: see 0231ee8

I'm just waiting for some tests (now running) to complete and I'll release it.

Edit:

2.6.1 is released, which includes the above commit.

Changelog will be added tonight, but it's available via the galaxy.

Edit (again):

If the distribution tasks are skipping, the removal of the old Go install will also fail.

It's my recommendation to delete your GOROOT in the event this fails again.
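If you do delete it by hand, a guarded removal is safer than a bare rm -rf. The sketch below is only a suggestion: `remove_goroot` is a made-up helper, and the check for bin/go assumes the directory still contains a Go binary, as /root/go did in the log above.

```shell
#!/bin/sh
# Remove a stale Go install, but only if the path looks like one.
# remove_goroot is a hypothetical helper, not part of the role.
remove_goroot() {
    dir=${1:-}
    case "$dir" in
        ""|"/") echo "refusing to remove '$dir'" >&2; return 1 ;;
    esac
    if [ ! -e "$dir/bin/go" ]; then
        echo "$dir does not look like a Go install, leaving it alone" >&2
        return 1
    fi
    rm -rf -- "$dir"
}

# Example (run as a user allowed to delete the directory):
# remove_goroot /root/go
```

Re-run the playbook afterwards so the role installs Go into a clean directory.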
