MAAS

Merge lp://qastaging/~rvb/maas/pc-bug-1064777 into lp://qastaging/~maas-committers/maas/trunk

pc-bug-1064777
Merge into trunk

Proposed by Raphaël Badin on 2013-06-07

Status:

Merged

Approved by:

Raphaël Badin on 2013-06-12

Approved revision:

no longer in the source branch.

Merged at revision:

1521

Proposed branch:

lp://qastaging/~rvb/maas/pc-bug-1064777

Merge into:

lp://qastaging/~maas-committers/maas/trunk

Diff against target:

122 lines (+51/-0)

4 files modified

src/maasserver/api.py (+2/-0)
src/maasserver/models/node.py (+14/-0)
src/maasserver/tests/test_api.py (+15/-0)
src/maasserver/tests/test_node.py (+20/-0)

To merge this branch:

bzr merge lp://qastaging/~rvb/maas/pc-bug-1064777

High

Fix Released

Link a bug report

Reviewer	Date Requested	Status
Jeroen T. Vermeulen (community)	2013-06-07	Approve on 2013-06-12
Gavin Panella (community)		Approve on 2013-06-12
Review via email: mp+168059@code.qastaging.launchpad.net

Commit message

Add API method to fetch the IP addresses attached to a node.

Description of the change

This branch adds a new method on the Node object to return the IP addresses allocated to that node. It also adds a new API method to return these IP addresses.

I initially wanted to add a property on the API representation of a node but since the list of IP addresses allocated to a node is something that is impossible to cache (given the structure of our database and the tools that Django uses to cache objects), it made sense to add a new API method.

I found one solution to have the ip addresses as a property on the node and still get a number of queries independant of the number of nodes returned by the API: http://paste.ubuntu.com/5754070/. The problem with that is that the value of 'dhcpleases_qs' has to be kept in memory and this is the entire mac->IP mapping for the whole nodegroup. Having that object in memory and iterating over it to find the relevant mappings for each node was not an unacceptable solution.

This was pre-imp'ed with Jeroen.

Revision history for this message

Jeroen T. Vermeulen (jtv) wrote on 2013-06-11:

I hate to slow you down, but I don't think this branch is quite done yet.

Given that the querying is now under our control rather than Piston's, I wonder if it might make more sense to have an API method that retrieves node IPs in bulk, similar to what you did in the UI, and returns the result as a mapping from each node to its IP addresses. It'd be a bit more complicated, of course, but it would naturally guide the caller towards an efficient approach.

Once upon a time, in the early days of the MAAS project, it was suggested that we do everything this way and we all nodded but then the practicalities of real life took over. Given the explicit scaling worries here, I think this would be a good place to do it the scalable way.

Now that Node.ip_addresses is a method rather than a property, its name ought to start with a verb.

The docstring for ip_addresses says that there is no database relation between the DHCPLease and MACAddress tables. I believe this is technically incorrect: the problem is that there is no ORM relationship, but all such connections are ad-hoc in SQL, rather than pre-declared. (And to get really anorakky about it, in the relational model tables *are* the relations so it's a bit silly to say there's no relation between tables.)

More importantly, the docstring has a warning to take care with this method at scale. To someone who's unfamiliar with the situation, that's rather vague. A bit like warning that "viewer discretion is advised." What kind of scale are you warning about? I think it boils down to "try not to call this very very often" — which actually applies to most functions.

Where are the negative tests? All that's tested here is the happy path, which is why the overlap between the two tests is so great. In unit testing we want to see not just that the method returns what it should, but also that it doesn't return what it shouldn't.

Jeroen

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

> I hate to slow you down, but I don't think this branch is quite done yet.
>
> .
>
> Given that the querying is now under our control rather than Piston's, I
> wonder if it might make more sense to have an API method that retrieves node
> IPs in bulk, similar to what you did in the UI, and returns the result as a
> mapping from each node to its IP addresses. It'd be a bit more complicated,
> of course, but it would naturally guide the caller towards an efficient
> approach.
>
> Once upon a time, in the early days of the MAAS project, it was suggested that
> we do everything this way and we all nodded but then the practicalities of
> real life took over. Given the explicit scaling worries here, I think this
> would be a good place to do it the scalable way.

Adding such a method might make sense but this would be different from what I'm trying to achieve here. We want the IP addresses to be listed on the node's page and we also want the API counterpart for this: get the IP addresses for a node.

>
> .
>
> Now that Node.ip_addresses is a method rather than a property, its name ought
> to start with a verb.

Well, I'm not sure we have a very firm rul when it comes to this sort of things (tag_names() is a method for instance). But okay…

> The docstring for ip_addresses says that there is no database relation between
> the DHCPLease and MACAddress tables. I believe this is technically incorrect:
> the problem is that there is no ORM relationship, but all such connections are
> ad-hoc in SQL, rather than pre-declared. (And to get really anorakky about
> it, in the relational model tables *are* the relations so it's a bit silly to
> say there's no relation between tables.)
>
> More importantly, the docstring has a warning to take care with this method at
> scale. To someone who's unfamiliar with the situation, that's rather vague.
> A bit like warning that "viewer discretion is advised." What kind of scale
> are you warning about? I think it boils down to "try not to call this very
> very often" — which actually applies to most functions.

Okay, I'll get rid of the warning.

> Where are the negative tests? All that's tested here is the happy path, which
> is why the overlap between the two tests is so great. In unit testing we want
> to see not just that the method returns what it should, but also that it
> doesn't return what it shouldn't.

We also want to keep the number of test limited but okay, I'll add some.

> I hate to slow you down, but I don't think this branch is quite done yet.
> 
> .
> 
> Given that the querying is now under our control rather than Piston's, I
> wonder if it might make more sense to have an API method that retrieves node
> IPs in bulk, similar to what you did in the UI, and returns the result as a
> mapping from each node to its IP addresses.  It'd be a bit more complicated,
> of course, but it would naturally guide the caller towards an efficient
> approach.
> 
> Once upon a time, in the early days of the MAAS project, it was suggested that
> we do everything this way and we all nodded but then the practicalities of
> real life took over.  Given the explicit scaling worries here, I think this
> would be a good place to do it the scalable way.

Adding such a method might make sense but this would be different from what I'm trying to achieve here.  We want the IP addresses to be listed on the node's page and we also want the API counterpart for this: get the IP addresses for a node.

> 
> .
> 
> Now that Node.ip_addresses is a method rather than a property, its name ought
> to start with a verb.

Well, I'm not sure we have a very firm rul when it comes to this sort of things (tag_names() is a method for instance).  But okay…

> The docstring for ip_addresses says that there is no database relation between
> the DHCPLease and MACAddress tables.  I believe this is technically incorrect:
> the problem is that there is no ORM relationship, but all such connections are
> ad-hoc in SQL, rather than pre-declared.  (And to get really anorakky about
> it, in the relational model tables *are* the relations so it's a bit silly to
> say there's no relation between tables.)
> 
> More importantly, the docstring has a warning to take care with this method at
> scale.  To someone who's unfamiliar with the situation, that's rather vague.
> A bit like warning that "viewer discretion is advised."  What kind of scale
> are you warning about?  I think it boils down to "try not to call this very
> very often" — which actually applies to most functions.

Okay, I'll get rid of the warning.

> Where are the negative tests?  All that's tested here is the happy path, which
> is why the overlap between the two tests is so great.  In unit testing we want
> to see not just that the method returns what it should, but also that it
> doesn't return what it shouldn't.

We also want to keep the number of test limited but okay, I'll add some.

Revision history for this message

Gavin Panella (allenap) wrote on 2013-06-11:

> I found one solution to have the ip addresses as a property on the
> node and still get a number of queries independant of the number of
> nodes returned by the API: http://paste.ubuntu.com/5754070/. The
> problem with that is that the value of 'dhcpleases_qs' has to be
> kept in memory and this is the entire mac->IP mapping for the whole
> nodegroup. Having that object in memory and iterating over it to
> find the relevant mappings for each node was not an unacceptable
> solution.

Fwiw, a mapping of MAC to IP address with ~1,000,000 entries is only
about 150MB.

{{{
from maastesting.factory import factory
from sys import getsizeof
from itertools import imap

def print_vmsize():
    with open("/proc/self/status", "rb") as status:
        for line in status:
            if line.startswith("VmSize"):
                print line.strip()

getrandip = factory.getRandomIPAddress
getrandmac = factory.getRandomMACAddress

print_vmsize()

mapping = {
    getrandmac(): getrandip()
    for _ in xrange(1000000)
    }

size = getsizeof(mapping)
size = sum(imap(getsizeof, mapping.iterkeys()), size)
size = sum(imap(getsizeof, mapping.itervalues()), size)

print_vmsize()

print "mapping is %dB, %dkB, %dMB" % (
size, size // 1024, size // 1048576)
}}}

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

> > I found one solution to have the ip addresses as a property on the
> > node and still get a number of queries independant of the number of
> > nodes returned by the API: http://paste.ubuntu.com/5754070/. The
> > problem with that is that the value of 'dhcpleases_qs' has to be
> > kept in memory and this is the entire mac->IP mapping for the whole
> > nodegroup. Having that object in memory and iterating over it to
> > find the relevant mappings for each node was not an unacceptable
> > solution.
>
> Fwiw, a mapping of MAC to IP address with ~1,000,000 entries is only
> about 150MB.

Interesting… but we're talking about the result of a queryset here, which contains Django objects… and I suspect that will be significantly more heavy. And that's just for the memory, there is also the CPU overhead of having to extract the right mappings out of the general mapping for each node.

Revision history for this message

Gavin Panella (allenap) wrote on 2013-06-11:

> Interesting… but we're talking about the result of a queryset here,
> which contains Django objects… and I suspect that will be
> significantly more heavy.

Well, if the results are only needed to construct the mapping it can
be queried flat. Also, that's for 1000000 nodes; a cluster should not
have more than ~2000 nodes or so.

> And that's just for the memory, there is also the CPU overhead of
> having to extract the right mappings out of the general mapping for
> each node.

Well, if it's a better solution conceptually, try it out and measure
it. Perhaps it's okay after all.

+ @operation(idempotent=True)
+ def get_ip_addresses(self, request, system_id):

This is an invitation to potato; this should not land. The result of a
hungry client calling this in a loop will be far far worse than
holding a few hundred MBs in memory once.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

> > Interesting… but we're talking about the result of a queryset here,
> > which contains Django objects… and I suspect that will be
> > significantly more heavy.
>
> Well, if the results are only needed to construct the mapping it can
> be queried flat. Also, that's for 1000000 nodes; a cluster should not
> have more than ~2000 nodes or so.

I don't think ~2000 nodes is what we've been told to aim for. More like ~100000 nodes IIRC.

We are using Django's caching mechanism here so we cannot really decide what we're caching. Here is what it looks like http://paste.ubuntu.com/5754928/.

> > And that's just for the memory, there is also the CPU overhead of
> > having to extract the right mappings out of the general mapping for
> > each node.
>
> Well, if it's a better solution conceptually, try it out and measure
> it. Perhaps it's okay after all.
>
> + @operation(idempotent=True)
> + def get_ip_addresses(self, request, system_id):
>
> This is an invitation to potato; this should not land. The result of a
> hungry client calling this in a loop will be far far worse than
> holding a few hundred MBs in memory once.

Well, you can say the exact same thing about the GET operation on a node. Calling it in a loop would be a disaster as well.

The solution, as Jeroen suggest, and if we think we *really* want to provide the entire mapping, is to create a different, nodegroup-level method but it does not mean that this method here should be removed.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

On the bright side, when the API calls will be batched, we will be able to set of maximum to the number of nodes returned by the API in one go.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

> Well, if it's a better solution conceptually, try it out and measure
> it. Perhaps it's okay after all.

Okay, I'll time this precisely. Having the list of addresses as a field on the node is definitely much cleaner and much more RESTish so if it's doable, we should do it this way. I'll do tests with 100000 mac->mappings.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

s/100000 mac->mappings/100000 mac->ip mappings/

Revision history for this message

Gavin Panella (allenap) wrote on 2013-06-11:

> I don't think ~2000 nodes is what we've been told to aim for. More like
> ~100000 nodes IIRC.

For some reason I thought this was likely to be limited to a cluster,
not a region. Still, 100000 would mean only 15MB or so if you were
able to use just IP and MAC values instead of row-level objects.

> We are using Django's caching mechanism here so we cannot really decide what
> we're caching. Here is what it looks like http://paste.ubuntu.com/5754928/.

That's better than I expected!

Can we not do a query just for values with Django instead of having to
materialize whole row objects? Would that help?

...
> > This is an invitation to potato; this should not land. The result of a
> > hungry client calling this in a loop will be far far worse than
> > holding a few hundred MBs in memory once.
>
> Well, you can say the exact same thing about the GET operation on a node.
> Calling it in a loop would be a disaster as well.

That's true, although it's unlikely anyone will do that, because they
can't loop until they have system IDs (or whatever it is we use to
index nodes), and when they get those they always(?) get the whole
node object, i.e. they don't need to GET the node.

In any case, arguing that we did it wrong in the past so it doesn't
matter if we do it wrong again is not really in the interests of
making a good product. I know we have to be pragmatic (indeed, to the
point that it really hurt in 2012) but I think doing this right is
comparatively cheap now, compared to expensive to deal with and
transition people off in the future if we do it the potato way.

> The solution, as Jeroen suggest, and if we think we *really* want to provide
> the entire mapping, is to create a different, nodegroup-level method but it
> does not mean that this method here should be removed.

Fwiw, I think we should have only the method on the nodegroup.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-11:

> > We are using Django's caching mechanism here so we cannot really decide what
> > we're caching. Here is what it looks like http://paste.ubuntu.com/5754928/.
>
> That's better than I expected!
>
> Can we not do a query just for values with Django instead of having to
> materialize whole row objects? Would that help?

That's my point, we cannot do that easily (i.e. not without doing brain surgery on Django).

> In any case, arguing that we did it wrong in the past so it doesn't
> matter if we do it wrong again is not really in the interests of
> making a good product.

That's *really* not what I said. My point is that you cannot prevent a user to shoot himself in the foot if he wants to do that.

> Fwiw, I think we should have only the method on the nodegroup.

I strongly disagree. What if you want the IP of *one* node?

Revision history for this message

Gavin Panella (allenap) wrote on 2013-06-11:

> > > We are using Django's caching mechanism here so we cannot really
> > > decide what we're caching. Here is what it looks like
> > > http://paste.ubuntu.com/5754928/.
> >
> > That's better than I expected!
> >
> > Can we not do a query just for values with Django instead of
> > having to materialize whole row objects? Would that help?
>
> That's my point, we cannot do that easily (i.e. not without doing
> brain surgery on Django).

I know I like to know Django, but this actually sucks.

>
> > In any case, arguing that we did it wrong in the past so it doesn't
> > matter if we do it wrong again is not really in the interests of
> > making a good product.
>
> That's *really* not what I said. My point is that you cannot prevent a user
> to shoot himself in the foot if he wants to do that.

Let's not leave the guns lying around at least.

>
> > Fwiw, I think we should have only the method on the nodegroup.
>
> I strongly disagree. What if you want the IP of *one* node?

Call it with one node.

Let me clarify what I meant though: if it's not possible, or too much
work, to populate node representations with IP addresses, then let's
just have a function on the nodegroup. I might have given the
impression that I preferred just having a function over all
alternatives.

Revision history for this message

Gavin Panella (allenap) wrote on 2013-06-11:

> I know I like to know Django, but this actually sucks.

*knock*

Revision history for this message

Julian Edwards (julian-edwards) wrote on 2013-06-12:

On Tuesday 11 Jun 2013 14:57:27 you wrote:
> > > We are using Django's caching mechanism here so we cannot really decide
> > > what we're caching. Here is what it looks like
> > > http://paste.ubuntu.com/5754928/.>
> > That's better than I expected!
> >
> > Can we not do a query just for values with Django instead of having to
> > materialize whole row objects? Would that help?
>
> That's my point, we cannot do that easily (i.e. not without doing brain
> surgery on Django).
> > In any case, arguing that we did it wrong in the past so it doesn't
> > matter if we do it wrong again is not really in the interests of
> > making a good product.
>
> That's *really* not what I said. My point is that you cannot prevent a user
> to shoot himself in the foot if he wants to do that.

As Gavin said, leaving the gun around is also not a good idea. We should not
be encouraging users to do it the wrong way at all. We did that with LP and
paid the price.

Gavin said:
> > Fwiw, I think we should have only the method on the nodegroup.

Actually, preferably the global level.

> I strongly disagree. What if you want the IP of *one* node?

Then you pass only one node ID into a bulk operation, which returns a set of
one node's IPs.

When this project started I remember having the discussion with you guys about
making *all* API methods take multiple objects to work on and return where
possible, rather than single object operations. Somewhere, we dropped the
ball and we've ended up with a few API ops that encourage potatoes.

Let's please endeavour to be:
a) consistent with naming - when coding and reviewing take special care to
ensure that our api op names are consistent and predictable
b) put API operations as bulk ops *first*. If it doesn't make sense to do
that in rare situations, then by all means put it on single objects. But the
instinctive change should be to do a bulk op.

Cheers.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-12:

> As Gavin said, leaving the gun around is also not a good idea. We should not
> When this project started I remember having the discussion with you guys about
> making *all* API methods take multiple objects to work on and return where
> possible, rather than single object operations. Somewhere, we dropped the
> ball and we've ended up with a few API ops that encourage potatoes.

I agree with the general philosophy here but there is something that we also need to take into account: we have a RESTful API. This is a major constraint as it "pushes" us toward returning objects (JSON structures corresponding to business objects: nodes, clusters, etc) rather than creating ad-hoc methods returning ad-hoc objects. Here, the only reasonable thing to do is to include the list of IP addresses in the node's JSON representation. Everything else will be a workaround.

Revision history for this message

Julian Edwards (julian-edwards) wrote on 2013-06-12:

On Wednesday 12 Jun 2013 07:00:37 you wrote:
> > As Gavin said, leaving the gun around is also not a good idea. We should
> > not When this project started I remember having the discussion with you
> > guys about making *all* API methods take multiple objects to work on and
> > return where possible, rather than single object operations. Somewhere,
> > we dropped the ball and we've ended up with a few API ops that encourage
> > potatoes.
> I agree with the general philosophy here but there is something that we also
> need to take into account: we have a RESTful API. This is a major
> constraint as it "pushes" us toward returning objects (JSON structures
> corresponding to business objects: nodes, clusters, etc) rather than
> creating ad-hoc methods returning ad-hoc objects. Here, the only
> reasonable thing to do is to include the list of IP addresses in the node's
> JSON representation. Everything else will be a workaround.

Including the list of IP addresses in the node's JSON representation is the
only reasonable thing to do. But there's no reason why we can't have a
collection of those JSON representations.

In terms of being RESTful, we need objects that represent collections. Then
we define operations on those objects. But honestly I'd rather have a sane
API than slavishly following some dogma.

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-12:

All right, so I did some performance testing of my (initial) solution for this, which implies simply putting the list of IP addresses allocated to a node in the JSON representation of a node. I gave up the idea of doing this simply because I thought it would not be acceptable from a performance p.o.v. Turns out the alternative solutions are much worse in terms of design and the performance is actually not that bad!

This is the time it takes to fetch (using the CLI) 100 nodes with IP addresses info attached to all of them. The time is dependent on the number of leases present in the nodegroup (this is the list we cache and iterate over):
100k leases: 3.1s
90k leases: 3s
50k leases: 1.8s
10k leases: 0.8s
0 leases (no IP addresses): 0.6s

The RAM usage is around 1Mb for 10k leases. So even for 100k leases, we're talking about 100Mb.

This sounds acceptable to me… what do you think?

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-12:

All right, I've updated this branch now, please have another look.

Revision history for this message

Gavin Panella (allenap) on 2013-06-12:

review: Approve

Revision history for this message

Jeroen T. Vermeulen (jtv) wrote on 2013-06-12:

I'd still much prefer a verb on ip_addresses(). The method queries several database tables, and naming that method as if it barely even qualifies for a getter-style name makes that cost very implicit.

review: Approve

Revision history for this message

Raphaël Badin (rvb) wrote on 2013-06-12:

> I'd still much prefer a verb on ip_addresses(). The method queries several
> database tables, and naming that method as if it barely even qualifies for a
> getter-style name makes that cost very implicit.

The name of the method is what ends up in the JSON structure. A verb would be strange there. Hence the name I've chosen.

Revision history for this message

Jeroen T. Vermeulen (jtv) wrote on 2013-06-12:

Oh! I see.