Database policy reevaluated frequently, slowing everything down during database maintenance

Bug #1885859 reported by William Grant
This bug affects 1 person
Affects: Launchpad itself
Status: In Progress
Importance: Low
Assigned to: Colin Watson
Milestone: (none listed)

Bug Description

lp.services.webapp.adapter.StoreSelector invokes lp.services.database.policy.BaseDatabasePolicy.getStore each time it's used (e.g. during an IStore(Product) call). In normal operation this is cheap, since the replication lag check is in LaunchpadDatabasePolicy.install and only run once per request, so getStore already knows if it's looking for a master or slave.

But if the requested database is down (e.g. it's a slave-capable request but the slave is offline), each getStore call will try to connect and only then fall back to the other flavour, often taking a few milliseconds if it fails at the pgbouncer level. If a request context is available, a failed connection should probably blacklist the flavour for the remainder of the request.
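
A minimal sketch of that idea, not Launchpad's actual code: RequestScopedPolicy, ConnectionFailed and the connect callable are hypothetical stand-ins, and caching the opened stores per request is an extra assumption. It only illustrates remembering a failed flavour so later getStore-style calls skip straight to the fallback.

```python
class ConnectionFailed(Exception):
    """Hypothetical stand-in for the error a failed connection raises."""


class RequestScopedPolicy:
    """Hypothetical policy that remembers failed flavours for one request."""

    # Assumed fallback order: each flavour falls back to the other one.
    FALLBACK = {"slave": "master", "master": "slave"}

    def __init__(self, connect):
        # connect(flavour) returns a store or raises ConnectionFailed.
        self._connect = connect
        self._stores = {}     # flavour -> store already opened in this request
        self._failed = set()  # flavours known to be down in this request

    def get_store(self, flavour):
        for candidate in (flavour, self.FALLBACK[flavour]):
            if candidate in self._failed:
                continue  # do not retry a flavour that already failed
            if candidate in self._stores:
                return self._stores[candidate]
            try:
                store = self._connect(candidate)
            except ConnectionFailed:
                self._failed.add(candidate)
                continue
            self._stores[candidate] = store
            return store
        raise ConnectionFailed("no database flavour is reachable")
```

With this shape, a request whose slave is offline pays for one failed connection attempt instead of one per call, and every call in the request resolves to the same store.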

This could also cause weird behaviour if a DB is flaky: repeated requests for the same object could return two copies, each from a different store.

There's possibly also another bug here: LaunchpadDatabasePolicy.getReplicationLag will actually check lag against the master if the slave is down, due to the same getStore fallback behaviour.
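
One possible shape for avoiding that, sketched under assumptions: connect_strict (a connection helper that never falls back) and query_lag are hypothetical callables, and returning None when the slave is unreachable is an illustrative choice rather than Launchpad's existing behaviour.

```python
class ConnectionFailed(Exception):
    """Hypothetical stand-in for the error a failed connection raises."""


def get_replication_lag(connect_strict, query_lag):
    """Return the slave's replication lag, or None if the slave is down.

    connect_strict(flavour) must not fall back to another flavour, so the
    lag is never measured against the master by accident.
    """
    try:
        slave = connect_strict("slave")
    except ConnectionFailed:
        return None  # lag is unknown; the caller decides how to treat this
    return query_lag(slave)
```

A caller could then treat None as "slave unusable" and go straight to the master, rather than reading a misleading lag value.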

Colin Watson (cjwatson)
Changed in launchpad:
assignee: nobody → Colin Watson (cjwatson)
status: Triaged → In Progress