Merge lp://qastaging/lava-dispatcher/multinode into lp://qastaging/lava-dispatcher

Proposed by Neil Williams
Status: Merged
Approved by: Neil Williams
Approved revision: 693
Merged at revision: 659
Proposed branch: lp://qastaging/lava-dispatcher/multinode
Merge into: lp://qastaging/lava-dispatcher
Diff against target: 4419 lines (+2791/-513) (has conflicts)
42 files modified
doc/conf.py (+2/-2)
doc/debugging.rst (+119/-0)
doc/index.rst (+1/-0)
doc/multinode-usecases.rst (+8/-0)
doc/multinode.rst (+291/-0)
doc/multinodeapi.rst (+302/-0)
doc/usecaseone.rst (+521/-0)
doc/usecasetwo.rst (+224/-0)
lava/dispatcher/commands.py (+8/-1)
lava/dispatcher/node.py (+411/-0)
lava_dispatcher/__init__.py (+1/-1)
lava_dispatcher/actions/deploy.py (+1/-0)
lava_dispatcher/actions/launch_control.py (+60/-3)
lava_dispatcher/actions/lava_test_shell.py (+69/-1)
lava_dispatcher/config.py (+0/-1)
lava_dispatcher/context.py (+20/-4)
lava_dispatcher/default-config/lava-dispatcher/device-types/aa9.conf (+16/-0)
lava_dispatcher/default-config/lava-dispatcher/device-types/capri.conf (+0/-46)
lava_dispatcher/default-config/lava-dispatcher/device-types/mx53loco.conf (+16/-6)
lava_dispatcher/default-config/lava-dispatcher/device-types/nexus10.conf (+0/-46)
lava_dispatcher/default-config/lava-dispatcher/device-types/rtsm_foundation-armv8.conf (+0/-20)
lava_dispatcher/default-config/lava-dispatcher/device-types/rtsm_ve-a15x1-a7x1.conf (+0/-117)
lava_dispatcher/default-config/lava-dispatcher/device-types/rtsm_ve-a15x4-a7x4.conf (+0/-117)
lava_dispatcher/default-config/lava-dispatcher/device-types/rtsm_ve-armv8.conf (+0/-128)
lava_dispatcher/device/master.py (+15/-10)
lava_dispatcher/downloader.py (+1/-1)
lava_dispatcher/job.py (+148/-4)
lava_dispatcher/signals/__init__.py (+88/-1)
lava_dispatcher/tests/test-config/bin/fake-qemu (+3/-0)
lava_dispatcher/tests/test_device_version.py (+24/-0)
lava_dispatcher/utils.py (+2/-2)
lava_test_shell/multi_node/lava-group (+19/-0)
lava_test_shell/multi_node/lava-multi-node.lib (+210/-0)
lava_test_shell/multi_node/lava-network (+104/-0)
lava_test_shell/multi_node/lava-role (+14/-0)
lava_test_shell/multi_node/lava-self (+9/-0)
lava_test_shell/multi_node/lava-send (+17/-0)
lava_test_shell/multi_node/lava-sync (+20/-0)
lava_test_shell/multi_node/lava-wait (+21/-0)
lava_test_shell/multi_node/lava-wait-all (+23/-0)
requirements.txt (+2/-1)
setup.py (+1/-1)
Text conflict in lava_dispatcher/actions/lava_test_shell.py
Text conflict in lava_dispatcher/context.py
Conflict adding file lava_dispatcher/default-config/lava-dispatcher/device-types/aa9.conf.  Moved existing file to lava_dispatcher/default-config/lava-dispatcher/device-types/aa9.conf.moved.
Text conflict in lava_dispatcher/default-config/lava-dispatcher/device-types/mx53loco.conf
Text conflict in lava_dispatcher/device/master.py
Text conflict in lava_dispatcher/job.py
Conflict adding file lava_dispatcher/tests/test-config/bin.  Moved existing file to lava_dispatcher/tests/test-config/bin.moved.
Text conflict in lava_dispatcher/tests/test_device_version.py
To merge this branch: bzr merge lp://qastaging/lava-dispatcher/multinode
Reviewer Review Type Date Requested Status
Neil Williams Approve
Antonio Terceiro Needs Fixing
Review via email: mp+181233@code.qastaging.launchpad.net

This proposal supersedes a proposal from 2013-08-20.

Description of the change

Landing MultiNode.

Handles the communication between jobs in a MultiNode group to deliver the LAVA MultiNode API with synchronisation primitives.

This branch applies with conflicts. The conflicts are proposed to be resolved as per this temporary branch: lp:~codehelp/lava-dispatcher/multinode-merge

lava-dispatcher will be the merged after dashboard but before scheduler, so that MultiNode jobs can start as soon as the scheduler is ready.

Updated: Include missing changes from tip.

To post a comment you must log in.
Revision history for this message
Antonio Terceiro (terceiro) wrote :
Download full text (68.2 KiB)

Hi guys,

This is great work! I like how we managed to minimize the impact on existing
code, so we are not risking breaking what already works! :-)

I have a fair number of comments below. I hope it helps us to get to a even
better code.

On Wed, Aug 21, 2013 at 09:45:46AM -0000, Neil Williams wrote:
> Neil Williams has proposed merging lp:lava-dispatcher/multinode into lp:lava-dispatcher.
>
> Requested reviews:
> Linaro Validation Team (linaro-validation)
>
> For more details, see:
> https://code.launchpad.net/~linaro-automation/lava-dispatcher/multinode/+merge/181233
>
> Landing MultiNode.
>
> Handles the communication between jobs in a MultiNode group to deliver
> the LAVA MultiNode API with synchronisation primitives.
>
> This branch applies with conflicts. The conflicts are proposed to be
> resolved as per this temporary branch:
> lp:~codehelp/lava-dispatcher/multinode-merge

I don't undertand why you didn't resolve the conflict already before making the
merge proposal ... specially because you already have that done somewhere else
:-)

> lava-dispatcher will be the merged after dashboard but before
> scheduler, so that MultiNode jobs can start as soon as the scheduler
> is ready.
>
> Updated: Include missing changes from tip.
[...]

There were a bunch of file renamings/removals in lava_dispatcher/default-config
in the diff, I don't think they should be here. I'm ommitting those from the
review. Maybe the merge from trunk was not complete, please make sure you
review the diff wrt trunk before going further with this.

> === modified file 'lava/dispatcher/commands.py'
> --- lava/dispatcher/commands.py 2013-07-16 15:58:16 +0000
> +++ lava/dispatcher/commands.py 2013-08-21 09:44:41 +0000
> @@ -7,7 +7,7 @@
> from json_schema_validator.errors import ValidationError
> from lava.tool.command import Command
> from lava.tool.errors import CommandError
> -
> +from lava.dispatcher.node import NodeDispatcher
> import lava_dispatcher.config
> from lava_dispatcher.config import get_config, get_device_config, get_devices
> from lava_dispatcher.job import LavaTestJob, validate_job_data
> @@ -93,6 +93,7 @@
> # Set process id if job-id was passed to dispatcher
> if self.args.job_id:
> try:
> + # noinspection PyUnresolvedReferences

what is this?

> from setproctitle import getproctitle, setproctitle
> except ImportError:
> logging.warning(
> @@ -107,6 +108,14 @@
> jobdata = stream.read()
> json_jobdata = json.loads(jobdata)
>
> + # detect multinode and start a NodeDispatcher to work with the LAVA Coordinator.
> + if not self.args.validate:
> + if 'target_group' in json_jobdata:
> + node = NodeDispatcher(json_jobdata, oob_file, self.args.output_dir)
> + node.run()
> + # the NodeDispatcher has started and closed.
> + # FIXME: get any error state from nodeDispatcher!

is it OK to land with this issue here unsolved?

> + exit(0)
> if self.args.target is None:
> if 'target' not in json_jobdata:
> ...

review: Needs Fixing
Revision history for this message
Neil Williams (codehelp) wrote :
Download full text (13.9 KiB)

On Fri, 23 Aug 2013 00:03:18 -0000
Antonio Terceiro <email address hidden> wrote:

> > This branch applies with conflicts. The conflicts are proposed to be
> > resolved as per this temporary branch:
> > lp:~codehelp/lava-dispatcher/multinode-merge
>
> I don't undertand why you didn't resolve the conflict already before
> making the merge proposal ... specially because you already have that
> done somewhere else :-)

What I have somewhere else is just a copy of the merge after manual
resolution of the conflicts. Some of the conflicts are entirely
unavoidable - I'm inserting a new block into an existing function and
launchpad/bazaar consider this a conflict when there is no reason for
it to conflict.

> > lava-dispatcher will be the merged after dashboard but before
> > scheduler, so that MultiNode jobs can start as soon as the scheduler
> > is ready.
> >
> > Updated: Include missing changes from tip.
> [...]
>
> There were a bunch of file renamings/removals in
> lava_dispatcher/default-config in the diff, I don't think they should
> be here. I'm ommitting those from the review. Maybe the merge from
> trunk was not complete, please make sure you review the diff wrt
> trunk before going further with this.

That's why the merge branch is there. The file removals have been fixed
but there are other conflicts which cannot be avoided as bazaar /
launchpad just don't seem to accept that there is no conflict.

> > === modified file 'lava/dispatcher/commands.py'
> > --- lava/dispatcher/commands.py 2013-07-16 15:58:16 +0000
> > +++ lava/dispatcher/commands.py 2013-08-21 09:44:41 +0000
> > @@ -7,7 +7,7 @@
> > from json_schema_validator.errors import ValidationError
> > from lava.tool.command import Command
> > from lava.tool.errors import CommandError
> > -
> > +from lava.dispatcher.node import NodeDispatcher
> > import lava_dispatcher.config
> > from lava_dispatcher.config import get_config, get_device_config,
> > get_devices from lava_dispatcher.job import LavaTestJob,
> > validate_job_data @@ -93,6 +93,7 @@
> > # Set process id if job-id was passed to dispatcher
> > if self.args.job_id:
> > try:
> > + # noinspection PyUnresolvedReferences
>
> what is this?

PyCharm code inspection flag. It allows PyCharm to indicate that the
file is clean when otherwise PyCharm itself would get confused by the
failure to import. It's harmless but I can remove it. (There are lots
of others which I keep in the git versions of these files but this one
slipped into the bazaar copy too.)

> > + node.run()
> > + # the NodeDispatcher has started and closed.
> > + # FIXME: get any error state from nodeDispatcher!
>
> is it OK to land with this issue here unsolved?

I'll remove the FIXME. There were error conditions when NodeDispatcher
used a different form of comms but these no longer apply. There is
nothing else for node.run to return.

> > + json_data = None
> > + polling = False
>
> polling is only used inside poll(), and its value does not depend on
> anything that is outside of that method. make it a local variable
> instead.

Replaced with True as the l...

687. By Neil Williams

Fu Wei 2013-08-23 Fix the bug of disabling debug info in multi-node API
 shell scripts.

Revision history for this message
Antonio Terceiro (terceiro) wrote :

> > > values were +#passed, nothing is printed.
> > > +LAVA_MULTI_NODE_API="LAVA_WAIT"
> > > +#MESSAGE_TIMEOUT=5
> > > +#MESSAGE_NEED_ACK=yes
> > > +
> > > +source $LAVA_TEST_BIN/lava-multi-node.lib
> > > +
> > > +lava_multi_node_send $1
> > > +
> > > +lava_multi_node_wait_for_message
> >
> > lava-wait looks too similar to lava-sync ... the only difference I
> > see is the LAVA_MULTI_NODE_API assignment, which seems to be used
> > only for debugging purposes. lava-wait shouldn't send anything, just
> > wait.
>
> lava-wait has to send a messageID, that's the point above, but it's
> handled in the python, not in the shell. lava-sync is just a lava-send
> followed by a lava-wait for the same messageID.

My point is that both lava-sync and lava-wait are at the moment
implement the exact same way: first lava_multi_node_send $1, then
lava_multi_node_wait_for_message. In my understanding they do the very
same thing. Maybe I still don't understand the mechanics, or maybe this
means we don't actually need both.

688. By Neil Williams

Neil Williams 2013-08-23 Drop PyCharm inspection comment.
Neil Williams 2013-08-23 Set the context once in __init__ instead of each
 time in the handler.
Neil Williams 2013-08-23 Set the context once in __init__ instead of each
 time in the handler. Rename setConnection to set_connection.
Neil Williams 2013-08-23 Drop PyCharm inspection comment.
Neil Williams 2013-08-23 Move delay into a local function variable.

689. By Neil Williams

Neil Williams 2013-08-23 Add the stream parameter to the pending call so
 that it can be passed down to put_pending over xmlrpc.
Neil Williams 2013-08-23 Add the stream parameter to the pending call
 so that it can be passed down to put_pending over xmlrpc.
 Switch to simplejson for the bundle parsing which makes a better
 job of result bundles with Decimal(0.0) output.

690. By Neil Williams

Neil Williams 2013-08-23 Add debugging help page.
Neil Williams 2013-08-23 Add details of how to use the API in the use case.
Neil Williams 2013-08-23 Add hrefs and a section on installing packages.
Neil Williams 2013-08-23 Add debugging page and section on balancing
 timeouts.
Neil Williams 2013-08-23 Add internal hrefs to make it easier to link
 from examples.

691. By Neil Williams

Merge from tip 657:
[Tyler Baker] Implemented deploy_linaro_kernel in qemu class. Refactored dispatcher code for easier reuse. Added lava-test-shell ability to bootloader targets.

692. By Neil Williams

Neil Williams 2013-08-27 Fix LNG result bundle aggregation error with
 measurements.

Revision history for this message
Neil Williams (codehelp) wrote :

> > >
> > > lava-wait looks too similar to lava-sync ... the only difference I
> > > see is the LAVA_MULTI_NODE_API assignment, which seems to be used
> > > only for debugging purposes. lava-wait shouldn't send anything, just
> > > wait.
> >
> > lava-wait has to send a messageID, that's the point above, but it's
> > handled in the python, not in the shell. lava-sync is just a lava-send
> > followed by a lava-wait for the same messageID.
>
> My point is that both lava-sync and lava-wait are at the moment
> implement the exact same way: first lava_multi_node_send $1, then
> lava_multi_node_wait_for_message. In my understanding they do the very
> same thing. Maybe I still don't understand the mechanics, or maybe this
> means we don't actually need both.

LAVA_SYNC and LAVA_WAIT (as LAVA_MULTI_NODE_API) are handled very differently inside the LAVA Coordinator. Wait sends an Ack immediately that any device in the group sends the requested messageID. Sync requires that *all* devices in the group send the requested messageID before allowing the Ack to be sent, instead it causes the NodeDispatcher to wait.

It's the difference between lava-wait and lava-wait-all.

lava-wait-all is needed because it handles roles but also because it is very likely that tests will want to issue a lava-send at some point and only use lava-wait a little bit later.

So whilst the handling inside signals/__init__.py can be the same apart from the API designation, that doesn't mean that the operation will do the same thing during a test.

Revision history for this message
Antonio Terceiro (terceiro) wrote :

On Wed, Aug 28, 2013 at 10:48:48AM -0000, Neil Williams wrote:
> LAVA_SYNC and LAVA_WAIT (as LAVA_MULTI_NODE_API) are handled very
> differently inside the LAVA Coordinator.

OK, this is the bit I was missing. Thanks for the explanation! :-)

693. By Neil Williams

Fu Wei 2013-08-27 multinode review:Fix a bug about using 'printf'
Fu Wei 2013-08-26 multinode review: make 'waiting message ack' be
 supported, only if the shell script is running in bash
Fu Wei 2013-08-26 multinode review: delete some bash-specific syntax
 for multinode API shell scripts.

Revision history for this message
Neil Williams (codehelp) wrote :

Final review update: approved.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: