Merge lp://qastaging/~justin-fathomdb/nova/restart-instance into lp://qastaging/~hudson-openstack/nova/trunk

Proposed by justinsb
Status: Work in progress
Proposed branch: lp://qastaging/~justin-fathomdb/nova/restart-instance
Merge into: lp://qastaging/~hudson-openstack/nova/trunk
Prerequisite: lp://qastaging/~justin-fathomdb/nova/refresh-instance-states
Diff against target: 684 lines (+323/-86)
13 files modified
nova/api/ec2/cloud.py (+7/-1)
nova/compute/api.py (+3/-2)
nova/compute/manager.py (+11/-11)
nova/db/sqlalchemy/migrate_repo/versions/014_add_persistence_mode_to_instance.py (+52/-0)
nova/db/sqlalchemy/models.py (+2/-0)
nova/tests/db/fakes.py (+9/-1)
nova/utils.py (+7/-2)
nova/virt/driver.py (+159/-11)
nova/virt/fake.py (+0/-15)
nova/virt/hyperv.py (+4/-5)
nova/virt/libvirt_conn.py (+56/-32)
nova/virt/xenapi/vm_utils.py (+7/-0)
nova/virt/xenapi_conn.py (+6/-6)
To merge this branch: bzr merge lp://qastaging/~justin-fathomdb/nova/restart-instance
Reviewer: termie (community)
Review status: Needs Fixing
Review via email: mp+54652@code.qastaging.launchpad.net

This proposal supersedes a proposal from 2011-03-15.

Description of the change

Tighter control of auto-start-on-boot.

Particularly in conjunction with delete-on-shutdown, we want to be sure that a machine that we expect to be persistent isn't deleted just because the host reboots. On the other hand, for EC2-style ephemeral machines, we do want delete-on-shutdown.

Revision history for this message
justinsb (justin-fathomdb) wrote :

(Adjust with the correct pre-req branch)

OK... I think this is ready for merge.

There are a few big changes in this patch:
1) We create persistent, not transient, libvirt instances. A transient libvirt instance is destroyed whenever it is shut down, including when the host restarts. Not good. (See the sketch after this list.)

2) We set instances to auto-start. Currently we rely on this to ensure that if an instance is shut down, we treat it as if it should be deleted (well, tombstoned).

3) Much more careful error handling in get_info(). I hit a really nasty bug - Bug #741468

4) The Amazon API defaults to destroy-on-shutdown, while the Rackspace API defaults to do-not-destroy-on-shutdown. Rackspace-style instances are therefore set to auto-start on host start; Amazon-style instances are not.
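
For anyone who hasn't hit the distinction in point 1 before, here's a minimal libvirt-python sketch of transient vs persistent domains plus autostart. It's illustrative only, not the code in this branch:

    import libvirt

    def launch(xml_desc, persistent=True, autostart=True):
        conn = libvirt.open('qemu:///system')
        if persistent:
            # Persistent: the definition survives guest shutdown and a host
            # reboot; libvirt can also auto-start it when the host comes up.
            domain = conn.defineXML(xml_desc)
            if autostart:
                domain.setAutostart(1)
            domain.create()
        else:
            # Transient: the domain disappears as soon as it shuts down;
            # this is the behaviour the branch moves away from.
            domain = conn.createXML(xml_desc, 0)
        return domain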

Open questions:

1) If the user shuts down a machine (shutdown now), what should we do? Restart it?

2) What should we do with an Amazon-style instance after a host restart?

3) Do we want different handling based on the API used (and potentially a parameter in future)? There was an IRC chat where we came down strongly in favor of having both options.

Revision history for this message
justinsb (justin-fathomdb) wrote :

Incidentally, although this has some complexities in terms of deciding the best policy, it does need to go into Cactus. Without it, instances are routinely lost when the host restarts :-(

Revision history for this message
Vish Ishaya (vishvananda) wrote :

A couple comments:

A) The instances aren't actually "lost" in the current version. A euca-reboot-instances will bring them back without any data loss. Libvirt doesn't know they exist, but the files are still there.

B) I think the default should be to not lose the instances on reboot. Amazon's default is silly.

Vish

On Mar 24, 2011, at 12:08 AM, justinsb wrote:

> Incidentally, although this has some complexities in terms of deciding the best policy, it does need to go in to Cactus. Without it, instances are routinely lost when the host restarts :-(

Revision history for this message
justinsb (justin-fathomdb) wrote :

Good point; the disks aren't actually lost currently. I think that on boot though, the instances will be marked deleted, so I'm not sure that it's as easy as just a quick euca command.

Here's what I think happens with libvirt today, before my changes, after a host reboot:

1) All libvirt instances are deleted, because they are transient
2) When the compute node is started, it sets the state to SHUTOFF because it can't find a VM
3) It then deletes the instance in the DB (because it is SHUTOFF), so they disappear from the GUI

After this patch, here's what should happen, for instances in "non-ephemeral" mode:

1) The libvirt instances are still there, because they are persistent. In fact, they should be running, because they are set to auto-start.
2) When the compute node is started, it just adopts the running VMs (see the sketch below).
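
To make the adoption step concrete, here's a rough sketch; the set_state/tombstone callbacks are hypothetical stand-ins for the real DB calls, not code from this branch:

    from nova.compute import power_state

    def reconcile_on_boot(driver, instances, set_state, tombstone):
        """Adopt guests libvirt auto-started; flag the ones that are gone.

        ``instances`` is whatever the DB says should live on this host;
        ``set_state`` and ``tombstone`` stand in for the real DB calls.
        """
        for instance in instances:
            try:
                state = driver.get_info(instance['name'])['state']
            except Exception:
                state = power_state.SHUTOFF  # hypervisor doesn't know it
            if state == power_state.RUNNING:
                # Persistent + autostart: libvirt already brought the guest
                # back, so just adopt it and record the observed state.
                set_state(instance, power_state.RUNNING)
            else:
                # The guest did not come back: record SHUTOFF and let the
                # persistence policy decide what to do next, instead of
                # deleting the instance outright.
                set_state(instance, power_state.SHUTOFF)
                tombstone(instance)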

I do agree that the Amazon decision wrt persistence is not what most people expect. I can change our default so that everybody gets 'persistent' instances, whether they use the EC2 or the OpenStack API (I like that default). Then people that know what they're doing and do want the ephemeral behaviour can explicitly request it through the OpenStack API (e.g. in "system metadata"), or I guess the cloud operator can switch it system-wide for EC2 API users.

Fairly easy to change the flag default, so can we get all the other stuff nailed down?

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Below:

On Mar 24, 2011, at 10:29 AM, justinsb wrote:

> 3) It then deletes the instance in the DB (because it is SHUTOFF), so they disappear from the GUI

Is this due to a recent patch? This should never happen IMO. Instances should not be deleted from the db because of a state change. Auto-reboot is quite nice, but what if libvirt fails? We don't want the instances deleted from the db. A state update to SHUTOFF or something would be way better.

Revision history for this message
justinsb (justin-fathomdb) wrote :

http://bazaar.launchpad.net/~hudson-openstack/nova/trunk/annotate/head:/nova/virt/libvirt_conn.py

It's this patch, late January:
http://bazaar.launchpad.net/~hudson-openstack/nova/trunk/revision/604.3.1

I'm not saying that's a bad patch - quite the contrary. There's even a
massive TODO stating that this is likely to be a problem. We haven't looked
at this behaviour in detail, which is what my patch is trying to address.

Want to talk this through on IRC?

Revision history for this message
Jesse Andrews (anotherjesse) wrote :

Would it make sense to separate the concepts of persistence and API interface?

I might want to run the OpenStack API in non-persistence mode or the EC2 API in persistence mode.

Revision history for this message
justinsb (justin-fathomdb) wrote :

Definitely they should be separated... right now the operator can set this
system-wide with a flag for EC2. I could easily add a flag for the
OpenStack API also.

Vish suggested that the default should be persistent even for EC2 - I do
like the idea that the default doesn't vary based on the API, and I do like
the idea of 'failing safe' and not deleting customers' instances on reboots
(even if we put in safety measures to let them recover them within 24 hours).

We don't yet have a mechanism to let the customer choose on a per-instance
basis which behaviour they want, but the infrastructure is there in this
patch. It could easily be a piece of "system metadata". It's important for
this to be configurable, though, because some users don't want us restarting their
instances after a crash (e.g. memcached, or even a database server if my
operations system has already remounted the volume on another machine), but
other users are more traditional and just want the instance back up (e.g.
virtual desktops).

I would suggest that we need to focus on the details of the two behaviour
modes (ephemeral and persistent), even if we expect that some modes won't
actually be used by default in Cactus.

Here's what I suggest they should be:

Ephemeral (EC2 style):

Host reboot: Gone
Guest shutdown: Gone
Guest restart: Still there
Guest crash: Gone
API delete: Gone

Persistent (CloudServers style):

Host reboot: Restarted asap, as if nothing happened
Guest shutdown: We auto-restart it.
Guest restart: Still there
Guest crash: We auto-restart it.
API delete: Gone

(In other words, in persistent mode we keep restarting the instance until
you explicitly delete it; the table sketched below encodes this.)
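
Purely for illustration (the event and action names here are mine, not anything in the code), that matrix could be written down as:

    EPHEMERAL_BEHAVIOUR = {
        'host_reboot':    'gone',
        'guest_shutdown': 'gone',
        'guest_restart':  'still there',
        'guest_crash':    'gone',
        'api_delete':     'gone',
    }

    PERSISTENT_BEHAVIOUR = {
        'host_reboot':    'auto-restart',
        'guest_shutdown': 'auto-restart',
        'guest_restart':  'still there',
        'guest_crash':    'auto-restart',
        'api_delete':     'gone',
    }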

There's a separate consideration, which is what we want to happen once a
guest is "gone". We probably want to protect against "need more coffee"
moments: put the instance in the SHUTOFF state and keep it visible in the
API for a while. It should probably be possible to relaunch it and get your
instance drives back, though the devil is in the details here. After, say,
4 hours it should go into the deleted state, so it's invisible in the API,
and then maybe after 24 hours we actually wipe the image. This is obviously
much nicer behaviour than EC2's ephemeral images.

All time periods would be configurable, of course, but I thought I would
take a first pass at describing what I think the broad behaviour could be.

It's a tricky situation here, because obviously on the one hand we should
discuss this properly at the design summit, but on the other hand we can't
have instances disappearing in Cactus!

Should I summarize this for the ML?

Revision history for this message
termie (termie) wrote :

- seems like your flag should be DEFINE_boolean rather than integer, and instead of persistence mode it would be compute_persist and applicable to both APIs. Alternatively, if you really think it is going to have more than two settings, use an enum so that we don't have magic numbers... but I think persist true/false is probably enough; we can either add another axis later on if there is more granularity, or switch to an enum at that point
- where you pass persistence mode in compute.api it should default to None and use the flag value if it is None, rather than 0
- you can remove the note in the migrate file, that is how it works
- please format the docstring for init_host correctly
- ditto for _init_instance, get_info and spawn
- lots of magic numbers in the PersistenceModel class
- for your todo around line 419 of the diff, please don't generate multiple levels of indentation, same with most of your comments that start with '#'
- for your note in _create_domain please make that into a docstring instead; it explains what is going on in the method more than it explains an unexpected situation
- why did you comment out the entire init_host at the end? seems like it is already explaining what it does and is commented out
- in vmutils you may as well remove the extra config option rather than make the note about it
- at line 527 you ask how to re-raise and keep the stack trace; assuming you mean the original error, you can just do 'raise' instead of 'raise e' to re-raise the original exception (see the snippet after this list)
- in the notes around 545 you don't really need the NOTE(justinsb) stuff, it is just explaining what is going on, not explaining an unexpected condition
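
For reference, the re-raise difference looks like this in Python 2, which is what nova targets (illustrative snippet, not code from the branch):

    def risky():
        raise ValueError("boom")

    def careful():
        try:
            risky()
        except ValueError:
            # ... log or clean up here ...
            raise  # bare raise: the traceback still points at risky()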

I'm not sure I agree with how you're doing PersistenceMode as part of the driver and then having it talk to the db: either it is an option on the model or it is a driver-specific thing, and it doesn't look like determining whether persist is enabled is driver-specific. The part that is driver-specific is how to deal with persisting something.

I know you prefer object-based data types, but the standard at the moment is simple data types, and your creation of PersistenceMode goes against that quite a bit; it is also a rather new pattern in the system as a whole. I'd rather simply have "persist" be a property/option on the instance and that be the end of it; the rest is driver implementation of what to do based on what that value is equal to.

review: Needs Fixing
Revision history for this message
justinsb (justin-fathomdb) wrote :

Thanks termie ... agree with your style notes. (In my defense, some of them are in docstrings moved over from the fake driver.)

The commented-out init_host in xenapi_conn is because it now inherits the functionality from the base class, so the function (which was a stub) should no longer be there. However, there were some implementor's notes there that weren't my place to delete, so I didn't.

The big issue is around PersistenceMode. While at the moment there are only two values (so it should be a boolean), I can well imagine that we might have more. For example:

On guest crash, should we restart?
On host crash, should we auto-migrate or shutdown?

etc

So can we focus first on what the behaviour should be?

The idea was that PersistenceMode would have functions defining each 'decision point' (like "auto-restart on host start?"), so that it was clear what the code was doing, and so that we weren't tying together decisions that didn't actually need to be linked just to get away with a simpler flag. Right now there's only one such function (autostart_on_host_boot), but that's because our desired behaviours still aren't very clear to me.

What I might do is extend PersistenceMode so that it does define all these decision points as functions, and then we can clearly see the decisions we have to make, and whether we want a boolean or an enum or multiple booleans. Sound good?
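
As a strawman, the decision-point idea might look something like this; autostart_on_host_boot is the function mentioned above, the other method names are made up for illustration and aren't in the branch:

    class PersistenceMode(object):
        """Hypothetical sketch, not the class in this branch."""

        EPHEMERAL = 'ephemeral'
        PERSISTENT = 'persistent'

        def __init__(self, mode):
            self.mode = mode

        def autostart_on_host_boot(self):
            """Should libvirt bring the guest back after a host reboot?"""
            return self.mode == self.PERSISTENT

        def restart_on_guest_crash(self):
            """Should we restart the guest if it crashes?"""
            return self.mode == self.PERSISTENT

        def tombstone_on_guest_shutdown(self):
            """Should a clean guest shutdown be treated as a delete?"""
            return self.mode == self.EPHEMERAL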

829. By justinsb

Committing a few cleanups prior to merge with trunk

830. By justinsb

Merged with trunk

831. By justinsb

Fixed merge problem with names, better message on state change

832. By justinsb

More documentation on our persistence modes

Revision history for this message
justinsb (justin-fathomdb) wrote :

This branch is probably overkill for the problem at hand. Created a more focused bugfix:
https://code.launchpad.net/~justin-fathomdb/nova/where-has-my-instance-gone

We can discuss the big picture at the Design Summit

Unmerged revisions

832. By justinsb

More documentation on our persistence modes

831. By justinsb

Fixed merge problem with names, better message on state change

830. By justinsb

Merged with trunk

829. By justinsb

Committing a few cleanups prior to merge with trunk

828. By justinsb

Better documentation, logging and more use of tombstone function

827. By justinsb

Documented the _tombstone_instance a bit better

826. By justinsb

Brought over the docstring from fake and added some additional commentary

825. By justinsb

More cautious exception handling in get_info. See Bug #741468

824. By justinsb

Refactored init_host into the base driver.

For each machine discovered on boot, it calls init_instance. A driver doesn't need to implement that, but we log a warning if the driver doesn't. The 'public humiliation' approach.

823. By justinsb

Ah... createWithFlags is probably on the domain object

Preview Diff

The diff is not available at this time.