Merge lp://qastaging/~vishvananda/nova/cow into lp://qastaging/~hudson-openstack/nova/trunk

Proposed by Vish Ishaya
Status: Merged
Approved by: Soren Hansen
Approved revision: 523
Merged at revision: 562
Proposed branch: lp://qastaging/~vishvananda/nova/cow
Merge into: lp://qastaging/~hudson-openstack/nova/trunk
Diff against target: 542 lines (+195/-154)
4 files modified
contrib/nova.sh (+3/-2)
nova/virt/disk.py (+84/-103)
nova/virt/libvirt.xml.template (+13/-3)
nova/virt/libvirt_conn.py (+95/-46)
To merge this branch: bzr merge lp://qastaging/~vishvananda/nova/cow
Reviewer Review Type Date Requested Status
Soren Hansen (community) Approve
Devin Carlen (community) Approve
Review via email: mp+45293@code.qastaging.launchpad.net

Description of the change

This modifies libvirt to use CoW images instead of raw images. This is much more efficient and allows us to use the snapshotting capabilities available for qcow2 images. It also changes local storage to be a separate drive instead of a separate partition.

I'm proposing this branch for review to get feedback. I may have inadvertently broken a few things. Comments and possible issues:

1. I haven't tested the other hypervisors. I may have broken libvirt xen support and uml support with this patch.
2. Is it useful to have a use_cow_images param, or should it just be automatic for qemu/kvm and turned off for everything else.
3. create_image is a large annoying method. I tried to clean it up a bit, but it could probably use a bit more refactoring.
4. disk.py seems to be only used by the hypervisors, so perhaps it should move into virt dir.
5. disk.py/partition() is unused now. Should we leave it in or throw it away?

Comments welcome

To post a comment you must log in.
Revision history for this message
Thierry Carrez (ttx) wrote :

Looks like this won't make it by FeatureFreeze, and given the regression risks it looks like a good target for early Cactus rather than late Bexar...

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Very sad, as this is a huge optimization and is required for us to put in snapshotting. I guess no one is bothering to review it.

lp://qastaging/~vishvananda/nova/cow updated
521. By Vish Ishaya

merged trunk

Revision history for this message
Devin Carlen (devcamcar) wrote :

approve

review: Approve
Revision history for this message
Jay Pipes (jaypipes) wrote :

I can't review much other than the code style, and general responses to your questions, as I don't have any experience in this area unfortunately. :(

The code style looks perfectly fine to me, FWIW.

I can't figure out how nova/compute/disk.py ever worked before this patch, as all the execute()'s were changed to utils.execute() and I don't see where execute was every in the namespace of nova/compute/disk.py... but that's a side-issue :)

As for your questions:

1. I haven't tested the other hypervisors. I may have broken libvirt xen support and uml support with this patch.

I suppose there's no way of having unittests do this. Would be good to get the functional/smoke tests updated and installed on some test machine linked to Hudson to better automate testing on these types of changes... it makes me nervous to approve a patch that the patch author admits things may break, but as I said, I'm no expert in any of this and can't speak to whether breakages are likely. :(

2. Is it useful to have a use_cow_images param, or should it just be automatic for qemu/kvm and turned off for everything else.

I'd say make it a flag so that we can actually switch it off/on and test the performance differences.

3. create_image is a large annoying method. I tried to clean it up a bit, but it could probably use a bit more refactoring.

You could prolly say the same about lots of things. But, refactoring can always come later ;)

4. disk.py seems to be only used by the hypervisors, so perhaps it should move into virt dir.

That would make sense to me, yes.

5. disk.py/partition() is unused now. Should we leave it in or throw it away?

Chuck it if is not used. IMHO, of course ;)

Revision history for this message
Soren Hansen (soren) wrote :
Download full text (10.2 KiB)

> 1. I haven't tested the other hypervisors. I may have broken libvirt
> xen support and uml support with this patch.

I haven't checked either, but your changes look as likely to break QEmu
as the others, so if that works, it's probably good :)

> 2. Is it useful to have a use_cow_images param, or should it just be
> automatic for qemu/kvm and turned off for everything else.

For now, let's leave it configurable. If we can credibly (i.e. have
reasonably benchmarks to back it up) say that it doesn't cause
regressions, we can yank the flag. Provided that leaving the flag at its
default doesn't screw up UML or Xen, that is.

> 3. create_image is a large annoying method. I tried to clean it up a
> bit, but it could probably use a bit more refactoring.

Yes. Yes, it could. And tests! :)

> 4. disk.py seems to be only used by the hypervisors, so perhaps it should move into virt dir.

Sounds reasonable. No need for it to block this patch, though.

> 5. disk.py/partition() is unused now. Should we leave it in or throw it away?

KILL IT!

> === modified file 'contrib/nova.sh'
> --- contrib/nova.sh 2010-12-30 00:07:41 +0000
> +++ contrib/nova.sh 2011-01-13 20:44:35 +0000
> @@ -83,9 +83,10 @@
> sudo /etc/init.d/iscsitarget restart
> sudo modprobe kvm
> sudo /etc/init.d/libvirt-bin restart
> + sudo modprobe nbd

This belongs in Nova proper. If qemu-nbd doesn't do this on its own, we
should do it from Nova.

> === modified file 'nova/compute/disk.py'
> --- nova/compute/disk.py 2011-01-04 05:23:35 +0000
> +++ nova/compute/disk.py 2011-01-13 20:44:35 +0000
> @@ -24,10 +24,12 @@
[...]
> -def extend(image, size, execute):
> +def extend(image, size):
> + """Increase image to size"""
> file_size = os.path.getsize(image)
> if file_size >= size:
> return
> - return execute('truncate -s size %s' % (image,))
> -
> -
> -def inject_data(image, key=None, net=None, partition=None, execute=None):
> + utils.execute('truncate -s %s %s' % (size, image))

Uh.. Yeah, good call :) I apparantly always use m1.tiny, so this has
probably never been execised.

> + # NOTE(vish): attempts to resize filesystem
> + utils.execute('e2fsck -fp %s' % image, check_exit_code=False)
> + utils.execute('resize2fs %s' % image, check_exit_code=False)

When we get rid of the injection business, this needs to go as well. I
really think we should keep our hands off of the users's images.

> +def _link_device(image, nbd):
> + """Link image to device using loopback or nbd"""
> + if nbd:
> + device = _allocate_device()
> + utils.execute('sudo qemu-nbd -c %s %s' % (device, image))
> + # NOTE(vish): this forks into another process, so give it a chance
> + # to set up before continuuing
> + time.sleep(1)
> + return device

Yeah, this is really annoying. qemu-nbd can either go into the
background immediately or not go into the background at all, so there's
no feedback available as to when its been connected.

You're going to laugh when I tell you this, but the way to detect if a
given nbd device is "active" is to check if /sys/block/<devname>/pid
exists. I kid you not. It's how the canonical...

review: Needs Information
Revision history for this message
Vish Ishaya (vishvananda) wrote :
Download full text (11.6 KiB)

Below

On Jan 13, 2011, at 2:03 PM, Soren Hansen wrote:

> Review: Needs Information
>> 1. I haven't tested the other hypervisors. I may have broken libvirt
>> xen support and uml support with this patch.
>
> I haven't checked either, but your changes look as likely to break QEmu
> as the others, so if that works, it's probably good :)
>
>> 2. Is it useful to have a use_cow_images param, or should it just be
>> automatic for qemu/kvm and turned off for everything else.
>
> For now, let's leave it configurable. If we can credibly (i.e. have
> reasonably benchmarks to back it up) say that it doesn't cause
> regressions, we can yank the flag. Provided that leaving the flag at its
> default doesn't screw up UML or Xen, that is.
>
>> 3. create_image is a large annoying method. I tried to clean it up a
>> bit, but it could probably use a bit more refactoring.
>
> Yes. Yes, it could. And tests! :)
>
>> 4. disk.py seems to be only used by the hypervisors, so perhaps it should move into virt dir.
>
> Sounds reasonable. No need for it to block this patch, though.
>
>> 5. disk.py/partition() is unused now. Should we leave it in or throw it away?
>
> KILL IT!

Done

>
>> === modified file 'contrib/nova.sh'
>> --- contrib/nova.sh 2010-12-30 00:07:41 +0000
>> +++ contrib/nova.sh 2011-01-13 20:44:35 +0000
>> @@ -83,9 +83,10 @@
>> sudo /etc/init.d/iscsitarget restart
>> sudo modprobe kvm
>> sudo /etc/init.d/libvirt-bin restart
>> + sudo modprobe nbd
>
> This belongs in Nova proper. If qemu-nbd doesn't do this on its own, we
> should do it from Nova.

I feel like punting on this for now, since I'm not sure where this should be done.

>
>> === modified file 'nova/compute/disk.py'
>> --- nova/compute/disk.py 2011-01-04 05:23:35 +0000
>> +++ nova/compute/disk.py 2011-01-13 20:44:35 +0000
>> @@ -24,10 +24,12 @@
> [...]
>> -def extend(image, size, execute):
>> +def extend(image, size):
>> + """Increase image to size"""
>> file_size = os.path.getsize(image)
>> if file_size >= size:
>> return
>> - return execute('truncate -s size %s' % (image,))
>> -
>> -
>> -def inject_data(image, key=None, net=None, partition=None, execute=None):
>> + utils.execute('truncate -s %s %s' % (size, image))
>
> Uh.. Yeah, good call :) I apparantly always use m1.tiny, so this has
> probably never been execised.
>
>> + # NOTE(vish): attempts to resize filesystem
>> + utils.execute('e2fsck -fp %s' % image, check_exit_code=False)
>> + utils.execute('resize2fs %s' % image, check_exit_code=False)
>
> When we get rid of the injection business, this needs to go as well. I
> really think we should keep our hands off of the users's images.

Agreed

>
>> +def _link_device(image, nbd):
>> + """Link image to device using loopback or nbd"""
>> + if nbd:
>> + device = _allocate_device()
>> + utils.execute('sudo qemu-nbd -c %s %s' % (device, image))
>> + # NOTE(vish): this forks into another process, so give it a chance
>> + # to set up before continuuing
>> + time.sleep(1)
>> + return device
>
> Yeah, this is really annoying. qemu-nbd can either go into the
> background immedia...

lp://qastaging/~vishvananda/nova/cow updated
522. By Vish Ishaya

Modified per sorens review.

Moved disk.py
Removed disk.partition
Changed docstrings
Use pid to find nbd devices

523. By Vish Ishaya

merged trunk

Revision history for this message
Soren Hansen (soren) wrote :

>>> + sudo modprobe nbd
>> This belongs in Nova proper. If qemu-nbd doesn't do this on its own, we
>> should do it from Nova.
> I feel like punting on this for now, since I'm not sure where this should be
> done.

Ok. Right before the first call to qemu-nbd, I would guess, but we can
fix that post-FF.

>> Have you given any thought to how the move from partitions to full disks
>> will affect guests? I mean.. The whole partitioning madness wasn't added
>> to being with because someone was bored :)
> It may require changes to some images, but amazon already has a second drive
> instead of partitions for some of their instance types, so there is precedent

There is precedent, yes. In the Ubuntu images we (at least used to)
handle this by having different fstab's depending on whether it was i386
images or amd64 (which is the deciding factor for this on EC2). Bah,
let's try it, deal with the fall out later.

>>> +
>>> + If cow is True, it will make a CoW image instead of a copy.
>>> + """
>>> + if not os.path.exists(target):
>>> + base_dir = os.path.join(FLAGS.instances_path, '_base')
>>> + if not os.path.exists(base_dir):
>>> + os.mkdir(base_dir)
>>> + os.chmod(base_dir, 0777)
> > I really don't like this chmod call. Who needs access? Let's work ou
> > twho to make that happen properly.
> yes the create image is doing the same thing to make sure libvirt/qemu can
> access it. I don't want to risk breaking permissions so I made a bug for
> this:
> https://bugs.launchpad.net/nova/+bug/702651

Cool.

> >> + if inst['instance_type'] == 'm1.tiny' or prefix == 'rescue-':
> >> + size = None
> >> + root_fname += "_sm"
> > What does this suffix denote?
> The sm is for images that aren't resized to 10G
> so if you launch ami-tty as m1.tiny, the 25M image will be cached as ami-
> tty_sm and if you launch it as m1.small the 10G resized image will be cached
> as ami-tty. I needed to differentiate so that we can still launch without
> forcing a resize, which is truly necessary for testing.

Makes sense.

Alright, lgtm.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.