Merge lp://qastaging/~edwin-grubbs/launchpad/bug-490659-series-timeout into lp://qastaging/launchpad/db-devel

Proposed by Edwin Grubbs
Status: Merged
Approved by: Stuart Bishop
Approved revision: no longer in the source branch.
Merged at revision: 9696
Proposed branch: lp://qastaging/~edwin-grubbs/launchpad/bug-490659-series-timeout
Merge into: lp://qastaging/launchpad/db-devel
Diff against target: 429 lines (+135/-62)
17 files modified
database/schema/patch-2208-01-0.sql (+9/-0)
database/schema/security.cfg (+1/-0)
database/schema/trusted.sql (+27/-0)
lib/canonical/launchpad/webapp/sorting.py (+3/-2)
lib/lp/registry/browser/__init__.py (+1/-1)
lib/lp/registry/browser/configure.zcml (+7/-0)
lib/lp/registry/browser/productseries.py (+14/-0)
lib/lp/registry/browser/tests/productseries-views.txt (+1/-1)
lib/lp/registry/browser/tests/test_milestone.py (+1/-1)
lib/lp/registry/interfaces/milestone.py (+2/-0)
lib/lp/registry/model/milestone.py (+9/-2)
lib/lp/registry/model/projectgroup.py (+17/-5)
lib/lp/registry/stories/productseries/xx-productseries-series.txt (+2/-2)
lib/lp/registry/templates/object-milestones.pt (+2/-2)
lib/lp/registry/templates/productseries-detailed-display.pt (+38/-0)
lib/lp/registry/templates/productseries-macros.pt (+0/-42)
lib/lp/registry/templates/productseries-status.pt (+1/-4)
To merge this branch: bzr merge lp://qastaging/~edwin-grubbs/launchpad/bug-490659-series-timeout
Reviewer Review Type Date Requested Status
Stuart Bishop (community) db Approve
Robert Collins (community) Approve
Brad Crittenden (community) code Approve
Review via email: mp+32555@code.qastaging.launchpad.net

Description of the change

Summary
-------

Bug 490659 reports a timeout caused by a project with over 700 releases.
The ProductSeries' milestones and releases attributes both used a python
function to expand the version numbers in the milestone name so that
milestone version numbers such as 1.1, 1.10, and 1.100 would sort
correctly. To avoid loading all 700 releases from the database just to
sort them in python to find the most recent ones, I added the
milestone_sort_key(timestamp dateexpected, text name) postgresql
function, so that the sorting can take place in the db. I also added an
index to the Milestone table using this function.

I have never submitted a db function for review before, so I'm sure that
I'm doing something wrong. For some reason, the configuration I added to
security.cfg doesn't seem to have any effect, so I have to run the
following sql on launchpad_ftest_template and laucnhpad_dev.

GRANT EXECUTE ON FUNCTION milestone_sort_key(timestamp, text) TO PUBLIC;

Implementation details
----------------------

Added milestone_sort_key(timestamp, text) db function.
    database/schema/patch-2207-99-0.sql
    database/schema/security.cfg

Added __len__ to DecoratedResultSet, so that code expecting the model to
return a list won't blow up on the result set.
    lib/canonical/launchpad/components/decoratedresultset.py
    lib/canonical/launchpad/zcml/decoratedresultset.zcml

Changed the productseries detailed-display macro into its own page since
it makes it cleaner to limit the number of milestones and releases in
the view attributes.
    lib/lp/registry/browser/configure.zcml
    lib/lp/registry/browser/productseries.py
    lib/lp/registry/templates/productseries-macros.pt
    lib/lp/registry/templates/productseries-status.pt
    lib/lp/registry/stories/productseries/xx-productseries-series.txt

Converted HasMilestones' milestones and releases attributes to return a
DecoratedResultSet instead of alist.
    lib/lp/registry/model/milestone.py
    lib/lp/registry/browser/tests/productseries-views.txt

Tests
-----

./bin/test -vv -t 'xx-productseries-series.txt|productseries-views.txt'

Demo and Q/A
------------

Create a bunch of sample milestones and releases on launchpad.dev:
    INSERT INTO Milestone (name, product, productseries)
    SELECT 'a' || i, (select id from product where name = 'bzr'),
        (select id from productseries where name='trunk' and product = (
                select id from product where name = 'bzr'))
    FROM generate_series(1, 100) as i;

    INSERT INTO Milestone (name, product, productseries, active)
    SELECT
        'rel-' || i,
        (select id from product where name = 'bzr'),
        (select id from productseries where name='trunk' and product = (
                select id from product where name = 'bzr')),
        FALSE
    FROM generate_series(1, 100) as i;

    INSERT INTO ProductRelease (owner, milestone, datereleased)
    SELECT 1, id, now()
    FROM Milestone
    WHERE name ~ 'rel-.*';

* Open http://launchpad.dev/bzr/+series
  * Only 12 milestones and 12 releases should be listed in the gray box
    for trunk.

To post a comment you must log in.
Revision history for this message
Robert Collins (lifeless) wrote :

I don't like the __len__ introduction as it blurs the line between lists and resultsets more : and we suffer because the line is blurry already. I realise you probably have a stack of stuff to put on top of this that is doing lens : however I'm worried about the knock on effects of the change : we should do it in IResultSet if we're doing it at all, not piecemeal on DecoratedResultSet. You could use a lazr.delegates to add __len__ just to your specific case (or, and perhaps better?) work up the stack replacing listified stuff with resultset aware code.

Other than that I think this is conceptually fine and an appropriate solution.

I'm a tad rusty on my plpython gotchas, so I'll let Stuart review that for you.

review: Needs Fixing
Revision history for this message
Stuart Bishop (stub) wrote :

The function and a database comment should be added directly to trusted.sql. You will still need the database patch to add the index. I'll allocate a number shortly.

The function should return """ '%s %s' % (str(date_expected), name) """, rather than the implicit cast of repr(date_expected, name).

You are importing plpy but no longer using it.

I agree with Robert on __len__ - it is deliberately left out to avoid it accidentally being called, as it can be very expensive. In particular, Python invoked __len__ if it exists when casting something to a list, doubling the amount of database time materializing a result set (we reported this as a Python bug - not sure if it is still open).

review: Approve (db)
Revision history for this message
Stuart Bishop (stub) wrote :

patch-2208-01-0.sql

Revision history for this message
Edwin Grubbs (edwin-grubbs) wrote :
Download full text (14.4 KiB)

I got rid of __len__ on DecoratedResultSet. Instead, I added IHasMilestones.has_milestones, since this is what len() was being used to test for, and a database count is much more intensive than finding a single unsorted matching row.

I made the changes to to the db function, and I discovered why my configuration in security.cfg wasn't granting execute to public. It expects me to specify "timestamp without time zone" instead of just "timestamp". A GRANT statement with just "timestamp" works, so it must just be the introspection that security.py does to figure out whether it is dealing with a function or a table.

I have included a new full diff below, since it is less confusing than the incremental diff and not much bigger.

-Edwin

=== added file 'database/schema/patch-2208-01-0.sql'
--- database/schema/patch-2208-01-0.sql 1970-01-01 00:00:00 +0000
+++ database/schema/patch-2208-01-0.sql 2010-08-19 17:15:22 +0000
@@ -0,0 +1,9 @@
+-- Copyright 2010 Canonical Ltd. This software is licensed under the
+-- GNU Affero General Public License version 3 (see the file LICENSE).
+SET client_min_messages=ERROR;
+
+CREATE INDEX milestone_dateexpected_name_sort
+ON Milestone
+USING btree (milestone_sort_key(dateexpected, name));
+
+INSERT INTO LaunchpadDatabaseRevision VALUES (2208, 01, 0);

=== modified file 'database/schema/security.cfg'
--- database/schema/security.cfg 2010-08-10 05:49:16 +0000
+++ database/schema/security.cfg 2010-08-19 23:52:51 +0000
@@ -17,6 +17,7 @@
 public.person_sort_key(text, text) = EXECUTE
 public.calculate_bug_heat(integer) = EXECUTE
 public.debversion_sort_key(text) = EXECUTE
+public.milestone_sort_key(timestamp without time zone, text) = EXECUTE
 public.null_count(anyarray) = EXECUTE
 public.valid_name(text) = EXECUTE
 public.valid_bug_name(text) = EXECUTE

=== modified file 'database/schema/trusted.sql'
--- database/schema/trusted.sql 2010-08-06 09:32:02 +0000
+++ database/schema/trusted.sql 2010-08-19 19:50:52 +0000
@@ -1705,3 +1705,30 @@

     return int(total_heat)
 $$;
+
+-- This function is not STRICT, since it needs to handle
+-- dateexpected when it is NULL.
+CREATE OR REPLACE FUNCTION milestone_sort_key(
+ dateexpected timestamp, name text)
+ RETURNS text
+AS $_$
+ # If this method is altered, then any functional indexes using it
+ # need to be rebuilt.
+ import re
+ import datetime
+
+ date_expected, name = args
+
+ def substitude_filled_numbers(match):
+ return match.group(0).zfill(5)
+
+ name = re.sub(u'\d+', substitude_filled_numbers, name)
+ if date_expected is None:
+ # NULL dates are considered to be in the future.
+ date_expected = datetime.datetime(datetime.MAXYEAR, 1, 1)
+ return '%s %s' % (date_expected, name)
+$_$
+LANGUAGE plpythonu IMMUTABLE;
+
+COMMENT ON FUNCTION milestone_sort_key(timestamp, text) IS
+'Sort by the Milestone dateexpected and name. If the dateexpected is NULL, then it is converted to a date far in the future, so it will be sorted as a milestone in the future.';

=== modified file 'lib/lp/registry/browser/__init__.py'
--- lib/lp/registry/browser/__init__....

Revision history for this message
Brad Crittenden (bac) wrote :

Hi Edwin,

Interesting branch!

typo: substitude -> substitute

Should has_milestones be exported?

Any reason you cannot use ISlaveStore for the queries?

review: Approve (code)
Revision history for this message
Edwin Grubbs (edwin-grubbs) wrote :

> Hi Edwin,
>
> Interesting branch!
>
> typo: substitude -> substitute

Fixed both the db function and the code that I stole that from.

> Should has_milestones be exported?

I think it would be better not to slow down every query for the pillars and series by adding another attribute that is hardly ever used. The other milestones attributes are references to collections, so you don't actually invoke the extra query unless you follow that link.

> Any reason you cannot use ISlaveStore for the queries?

IStore() chooses the slave or master based on whether the user has modified anything lately. If I forced it to use ISlaveStore, it would not show the user any changes they made until they propagated to the slave servers.

Revision history for this message
Robert Collins (lifeless) wrote :

69- def substitude_filled_numbers(match):
70+
71+ def substitute_filled_numbers(match):

That new VWS breaks up a small function - closures like that are more readable when they look like one conceptual thing rather than two IMO; please consider removing the new VWS.

I think the index is new? Needs a new stamp from stuart if so.

Lastly, it seems to me that LBYL isn't needed here: surely doing *neither* a .count() nor a .any() is appropriate: rather just iterate the latest_milestones, and if the iterator outputs no rows don't show the table? Perhaps we don't have a construct for doing that; if thats the case I'm happy with this approach, but suggest that you file a bug saying we should have such a construct - it will be the least work of all and thus fastest.

review: Approve
Revision history for this message
Stuart Bishop (stub) wrote :

db still approved.

review: Approve (db)
Revision history for this message
Edwin Grubbs (edwin-grubbs) wrote :

> 69- def substitude_filled_numbers(match):
> 70+
> 71+ def substitute_filled_numbers(match):

Brad also pointed that out, and it has been fixed.

> That new VWS breaks up a small function - closures like that are more readable
> when they look like one conceptual thing rather than two IMO; please consider
> removing the new VWS.

I actually disagree that it makes it less readable. Secondly, pocketlint complains when you don't have a blank line above a function definition.

> I think the index is new? Needs a new stamp from stuart if so.

The index isn't new. Since I included a full diff after the first set of changes instead of an incremental diff, it may have appeared new, but it was in the patch file below the db function before the db function was moved to trusted.sql.

> Lastly, it seems to me that LBYL isn't needed here: surely doing *neither* a
> .count() nor a .any() is appropriate: rather just iterate the
> latest_milestones, and if the iterator outputs no rows don't show the table?

The problem with not checking .any() before iterating is that a storm ResultSet object will query the database every time you iterate over it; therefore, you would have to create a new cached property on the view that would hold a list in order to eliminate the query. We can't have the model just cache the property as a list for us, since the whole point of this refactoring was to allow the view code to slice the model's attribute without the model first querying a really large number of rows.

> Perhaps we don't have a construct for doing that; if thats the case I'm happy
> with this approach, but suggest that you file a bug saying we should have such
> a construct - it will be the least work of all and thus fastest.

I definitely think there is a better universal solution to this, but it will need more discussion on the mailing list, since either template/view coding practices or storm's ResultSet will have to change significantly. Also, to put it into perspective, we are talking about eliminating one or two single-row queries, compared to the 700-rows of results that this branch eliminates, so improving this might not be as urgent as other optimizations.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
The diff is not available at this time. You can reload the page or download it.

Subscribers

People subscribed via source and target branches

to status/vote changes: