Merge lp://qastaging/~fginther/jenkins-launchpad-plugin/wait-for-node into lp://qastaging/~private-ps-quality-team/jenkins-launchpad-plugin/trunk

Proposed by Francis Ginther
Status: Needs review
Proposed branch: lp://qastaging/~fginther/jenkins-launchpad-plugin/wait-for-node
Merge into: lp://qastaging/~private-ps-quality-team/jenkins-launchpad-plugin/trunk
Diff against target: 419 lines (+333/-23)
6 files modified
jenkinsutils.py (+73/-1)
jsonjenkins.py (+0/-19)
launchpad.py (+3/-3)
tests/test_jenkinsutils.py (+134/-0)
tests/test_waitForNodeIdle.py (+60/-0)
waitForNodeIdle.py (+63/-0)
To merge this branch: bzr merge lp://qastaging/~fginther/jenkins-launchpad-plugin/wait-for-node
Reviewer Review Type Date Requested Status
Martin Mrazik (community) Needs Fixing
PS Jenkins bot continuous-integration Needs Fixing
Review via email: mp+148956@code.qastaging.launchpad.net

Commit message

Adds a script, waitForNodeIdle.py to wait for a jenkins slave node to become idle. This can be used to automate slave node maintenance tasks which require the node to be offline and idle.

Description of the change

Adds a script to wait for a jenkins slave node to become idle. This can be used to automate slave node maintenance tasks which require the node to be offline and idle. See ps-qa-pbuilder-clean for an example.

Testing:
 - Includes unit tests for wait_for_node_idle()
 - Tested with jenkins job ps-qa-pbuilder-clean to perform a pbuilder --clean.

To post a comment you must log in.
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Approve (continuous-integration)
Revision history for this message
Martin Mrazik (mrazik) wrote :

Few things:

1. Can we make the timeout arg optional with some reasonable default value? It will make the use of this easier as I won't need to think of what "reasonable timeout" is. Not quite sure what the default should be but I would expect something like 2 hours.

2. I would prefer an "--mark-offline" option (maybe on by default) which would put the node to maintenance so no new jobs will be scheduled. You will need to do this anyway and if we do it directly in this script we will shorten the window where a race condition can occur. After the node is in maintenance I would do the check for idle again just to make sure nothing was scheduled while we were doing the lock. Once --mark-offline is there it would be nice to have --mark-online which would put the node back online again without any additional checks. That way all I need is this python script and the rest can be done in shell as a separate build step.

3. I wonder if "def wait_for_node_idle" shouldn't be in jenkinsutils.py (as well as any helpers for making the node online/offline).

review: Needs Fixing
Revision history for this message
Martin Mrazik (mrazik) wrote :

btw. I just tried to put a node offline while a test was running and the test continues to run. Maybe with the --mark-offline option the script can first mark the node offline and then wait for being idle.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Francis Ginther (fginther) wrote :

This doesn't work right due to some authorization issue:
Traceback (most recent call last):
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/waitForNodeIdle.py", line 95, in <module>
    sys.exit(main())
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/waitForNodeIdle.py", line 87, in main
    mark_offline(json_jenkins, args.url, args.node)
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/waitForNodeIdle.py", line 46, in mark_offline
    data = json_jenkins.get_json_data(offline_url, False)
  File "/var/lib/jenkins/jenkins-launchpad-plugin-wfn/jsonjenkins.py", line 19, in get_json_data
    data = urllib2.urlopen(url)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 406, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 519, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 444, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

It's the call to load http://{server}/computer/{node}/toggleOffline which works fine when triggered from a browser. Yes, I know that get_json_data isn't the right call for this, but it's failing on the urlopen method, not the json.loads method.

The alternative solution is to call jenkins-cli from within the python script. I don't want to do that as it's mixing two APIs when a pure web based solution should work.

Revision history for this message
Martin Mrazik (mrazik) wrote :

using PreemptiveBasicAuthHandler class from[1] instead of urllib2.HTTPBasicAuthHandler seems to fix the issue (well, then it fails on the json.load call but that seems to be straightforward).

What about adding PreemptiveBasicAuthHandler and renaming JSONJenkins to something else (JenkinsURLlib?) providing two different methods -- get_json_data (with the json.load as its right now) and get_data (returning raw data)?

[1]
http://pythonhosted.org/jenkinsapi/_modules/jenkinsapi/utils/urlopener.html

Revision history for this message
Francis Ginther (fginther) wrote :

Address the earlier comments. This version was tested on a local jenkins server.

mark-offline will mark the node offline, then wait for the timeout (currently 2 hours) for the node to become idle. When a node is offline, jenkins will not schedule new jobs, but does wait for any running jobs to complete. This allows for a graceful shutdown of a node without impacting any running jobs.

Revision history for this message
Francis Ginther (fginther) wrote :

Now ready for review.

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :
review: Needs Fixing (continuous-integration)
Revision history for this message
Martin Mrazik (mrazik) wrote :

Tests are failing.

review: Needs Fixing

Unmerged revisions

85. By Francis Ginther

Added logging messages.

84. By Francis Ginther

Merge to trunk

83. By Francis Ginther

Updated waitForNodeIdle test cases.

82. By Francis Ginther

Refactored and renamed JSONJenkins to JenkinsUrl and fixed authorization to allow setting nodes offline. Moved code to manipulate nodes into jenkinsutils.

81. By Francis Ginther

Add options to mark the node offline and online.

80. By Francis Ginther

Added test_waitForNodeIdle.py for unit tests.

79. By Francis Ginther

Fixed timeout issue found in testing

78. By Francis Ginther

Add wait_for_node_idle.py to determine when a jenkins node is idle and ready for clean-up actions.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to all changes: