Merge lp://qastaging/~justin-fathomdb/nova/constraint-scheduler into lp://qastaging/~hudson-openstack/nova/trunk

Proposed by justinsb
Status: Work in progress
Proposed branch: lp://qastaging/~justin-fathomdb/nova/constraint-scheduler
Merge into: lp://qastaging/~hudson-openstack/nova/trunk
Diff against target: 1136 lines (+898/-85)
8 files modified
nova/scheduler/constraint.py (+226/-0)
nova/scheduler/constraint_lib.py (+254/-0)
nova/scheduler/datastore.py (+148/-0)
nova/scheduler/driver.py (+13/-3)
nova/scheduler/simple.py (+24/-35)
nova/scheduler/zone.py (+1/-2)
nova/tests/test_constraintlib.py (+170/-0)
nova/tests/test_scheduler.py (+62/-45)
To merge this branch: bzr merge lp://qastaging/~justin-fathomdb/nova/constraint-scheduler
Reviewer Review Type Date Requested Status
Nachi Ueno (community) Needs Fixing
Nova Core security contacts Pending
Review via email: mp+51857@code.qastaging.launchpad.net

Description of the change

Implementation of constraint-based scheduler.
Created a simple contraint solver and a scheduler that uses it; created constraints that mirrored the existing selection criteria; refactored DB access code to avoid duplication and support future constraints (Working towards 'proximity' allocation or 'specific zone' allocation)

To post a comment you must log in.
Revision history for this message
Soren Hansen (soren) wrote :

I question the usefulness of best-match algorithms. I don't believe that the best match is really interesting at all. Finding best matches typically involves (and indeed that seems to be what happens in this implementation) looking at the entire set of candidates and ordering them according to some criteria. I don't believe this approach scales, and I don't believe it's necessary. It's not unlikely that the top XX% are all just fine candidates, so finding the very best offers no real advantage. Furthermore, the host that is the best match right now might be significantly worse two minutes from now.

Revision history for this message
Sandy Walsh (sandy-walsh) wrote :

Justin, nice work on a very interesting branch.

We're currently working on a similar approach in our Zones and Distributed Scheduler blueprints.

https://blueprints.launchpad.net/nova/+spec/multi-cluster-in-a-region
https://blueprints.launchpad.net/nova/+spec/bexar-distributed-scheduler

Ed Leafe (dabo) is working on a similar body of work to this, based on the current Rackspace/Slicehost implementation of Server Best Match.

Soren has a point that the search doesn't have to be exhaustive. We recently changed our approach from server-side to nearly fully db-side and saw massive performance improvements. Also, limiting the result set to XX% as Soren mentioned was possible once the problem was pushed off.

I think we need to review your branch in some depth before giving a pass/fail. I've sure there are things we can leverage to bring it in line with Distributed Scheduler.

Cheers! ... and stay tuned.

Revision history for this message
justinsb (justin-fathomdb) wrote :

Soren: It's true that an exhaustive search can be expensive, which is why I
didn't code an exhaustive search (other than in the unit tests) :-) There's
a framework for 'Criteria', which the 'Solver' tries to solve as best it
can. You can plug in whatever Solver technique you think best (which might
actually be exhaustive for small sets), but I believe the Solver I've got
here is likely to be reasonable in practice. The approach is that it
identifies the 'most selective criteria' and then steps through those
results in order to find one where all the other criteria are no-more
unhappy (mini-max). I'll document the magic better! I haven't coded
heuristic algorithms yet because - frankly - we're nowhere near the scales
where it becomes necessary and we have no way to take advantage of
heuristics when we're sourcing data from a relational DB, and because there
are difficult questions around behavior when the heuristics fail to find a
good solution or any solution at all.

Sandy: I think the work is non-overlapping with the distributed & multi
schedulers, but I'll check them out in more detail. My goal is to support
more constraints in the scheduler (in particular, co-placement of volumes
and servers), and I'm going to work on this constraint to help motivate this
patch.

I'm going to check out the other branches, and code up a co-placement
Criteria so this isn't just work in the abstract!

Revision history for this message
justinsb (justin-fathomdb) wrote :

I've got a branch up which makes use of the constraint scheduler:
lp:~justin-fathomdb/nova/schedule-compute-near-volume

It's now super-easy to add constraints, even conflicting constraints, and (hopefully) this will still yield reasonable decisions.

As the number of constraints grows, I don't believe that trying to satisfy the constraints manually will scale.

The dependent branch isn't yet ready for merge (I doubt it actually works); I need to write tests for it and that is at the end of a really long chain of merge requests!!

Revision history for this message
Nachi Ueno (nati-ueno) wrote :

As you pointed, I suppose your code needs more test code before merged.
In addtion unit tests for ConstraintLib fail.

======================================================================
ERROR: test_big (nova.tests.test_constraintlib.ConstraintLibTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nati/workspace/constraint-scheduler/nova/tests/test_constraintlib.py", line 106, in test_big
    self._do_random_test(rnd, item_count, constraints_count)
  File "/home/nati/workspace/constraint-scheduler/nova/tests/test_constraintlib.py", line 122, in _do_random_test
    solver = constraint_lib.Solver(None, None)
TypeError: __init__() takes exactly 2 arguments (3 given)

======================================================================
ERROR: test_five_five__five (nova.tests.test_constraintlib.ConstraintLibTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/nati/workspace/constraint-scheduler/nova/tests/test_constraintlib.py", line 97, in test_five_five__five
    self._do_random_test(rnd, 5, 5)
  File "/home/nati/workspace/constraint-scheduler/nova/tests/test_constraintlib.py", line 122, in _do_random_test
    solver = constraint_lib.Solver(None, None)
TypeError: __init__() takes exactly 2 arguments (3 given)

review: Needs Fixing
Revision history for this message
justinsb (justin-fathomdb) wrote :

Fixed up the unit tests (I was working in a derived branch and tried to keep this one up to date - probably more trouble than it's worth with bazaar)

This branch has good test coverage; and the dependent one now does also.

Revision history for this message
Nachi Ueno (nati-ueno) wrote :

Thank you for your fix.
Would you add more test case for ConstraintDriverTestCase?

Revision history for this message
justinsb (justin-fathomdb) wrote :

It's easy to miss, but the ConstraintDriverTestCase derives from
_SchedulerBaseTestCase, so inherits basically the same code coverage as the
SimpleScheduler (although the SimpleScheduler supports some 'directed
placement' in a bit of a hacky way, which I don't support in the
ConstraintScheduler) But the ConstraintScheduler (which is never used
unless someone specifically requests it) behaves the same way as the
SimpleScheduler under the basic tests.

In addition, there are unit tests for the constraint solving library.

I believe that's reasonable test coverage. There can always be more tests,
but I think this is a reasonable start for something that is only
user-selectable. As we add use-cases, I'm sure we'll find issues and add
tests, both for bugs, and places where extra behaviour is needed. I did
find a bug when developing the derived directed-location branch, and I
back-ported that fix (which is how I broke the unit tests in the first
place!) That bug was when the min-scores were tied, it didn't fall-back to
consider the secondary criteria. The constraint solving library tests
didn't hit this because they were using randomized data, so were very
unlikely to get ties.

Is that OK?

Revision history for this message
Nachi Ueno (nati-ueno) wrote :

I'm sorry for late reply, because of earthquake in Japan.
As you said, it is hard to write sufficient test code.
However, is it possible to add Schedure test code which corresponds to constraint solving library tests? Test code can be used a document, so it shows how to use the Constraint Schedure.

> It's easy to miss, but the ConstraintDriverTestCase derives from
> _SchedulerBaseTestCase, so inherits basically the same code coverage as the
> SimpleScheduler (although the SimpleScheduler supports some 'directed
> placement' in a bit of a hacky way, which I don't support in the
> ConstraintScheduler) But the ConstraintScheduler (which is never used
> unless someone specifically requests it) behaves the same way as the
> SimpleScheduler under the basic tests.
>
> In addition, there are unit tests for the constraint solving library.
>
> I believe that's reasonable test coverage. There can always be more tests,
> but I think this is a reasonable start for something that is only
> user-selectable. As we add use-cases, I'm sure we'll find issues and add
> tests, both for bugs, and places where extra behaviour is needed. I did
> find a bug when developing the derived directed-location branch, and I
> back-ported that fix (which is how I broke the unit tests in the first
> place!) That bug was when the min-scores were tied, it didn't fall-back to
> consider the secondary criteria. The constraint solving library tests
> didn't hit this because they were using randomized data, so were very
> unlikely to get ties.
>
> Is that OK?

Revision history for this message
justinsb (justin-fathomdb) wrote :

Nachi: There are tests already for the constraint scheduler, but the only constraints implemented in this branch are ones that assign to the least-loaded machine. If you look in the derived branch, there are examples of more advanced constraints and better tests

In the derived branch, I add a constraint that favors putting a compute node as close as possible to a specified location, based on a topology:
https://code.launchpad.net/~justin-fathomdb/nova/schedule-compute-near-volume/+merge/52520

Revision history for this message
justinsb (justin-fathomdb) wrote :

Moving to WIP - we're going to discuss at Design Summit

Unmerged revisions

725. By justinsb

Back-ported fixes from derived branch

724. By justinsb

Added (derived) unit test for constraint scheduler

723. By justinsb

Fixed pep8, missing copyrights

722. By justinsb

Use datastore in simple scheduler for retrieval in non-forced case

721. By justinsb

Remove DB update code from schedulers; they deal with the datastore now

720. By justinsb

Refactoring data store so that we're not using DB objects
(Different constraints will likely need different queries, and the DB binding will be problematic)

719. By justinsb

Optimization for when we know we're not the worst constraint

718. By justinsb

Small cleanup of tests & pep8

717. By justinsb

Fix logical error, use a priority queue in the constraint solver

716. By justinsb

A few fixes (passes most simple scheduler tests)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.