dee

Merge lp://qastaging/~kamstrup/dee/tree-index into lp://qastaging/dee

tree-index
Merge into trunk

Proposed by Mikkel Kamstrup Erlandsen on 2011-11-23

Status:

Merged

Approved by:

Michal Hruby on 2011-11-29

Approved revision:

343

Merged at revision:

309

Proposed branch:

lp://qastaging/~kamstrup/dee/tree-index

Merge into:

lp://qastaging/dee

Diff against target:

4389 lines (+2926/-603)

32 files modified

bindings/python/Dee.py (+2/-2)
dee/Makefile.am (+8/-2)
dee/dee-analyzer.c (+504/-0)
dee/dee-analyzer.h (+174/-0)
dee/dee-glist-result-set.c (+4/-3)
dee/dee-glist-result-set.h (+3/-3)
dee/dee-hash-index.c (+30/-17)
dee/dee-hash-index.h (+3/-2)
dee/dee-index.c (+61/-19)
dee/dee-index.h (+6/-47)
dee/dee-model-reader.c (+157/-229)
dee/dee-model-reader.h (+60/-11)
dee/dee-term-list.c (+130/-27)
dee/dee-term-list.h (+8/-4)
dee/dee-text-analyzer.c (+192/-0)
dee/dee-text-analyzer.h (+90/-0)
dee/dee-tree-index.c (+713/-0)
dee/dee-tree-index.h (+83/-0)
dee/dee.h (+4/-1)
doc/reference/dee-1.0/dee-1.0-docs.sgml (+4/-1)
doc/reference/dee-1.0/dee-1.0.types (+3/-0)
tests/Makefile.am (+18/-3)
tests/test-analyzer.c (+137/-0)
tests/test-dee.c (+4/-2)
tests/test-glist-result-set.c (+3/-3)
tests/test-index.c (+346/-112)
tests/test-model-readers.c (+49/-83)
tests/test-python.py (+31/-0)
tests/test-term-list.c (+54/-0)
vapi/Dee-0.5-custom.vala (+2/-12)
vapi/Dee-0.5.metadata (+0/-1)
vapi/dee-1.0.vapi (+43/-19)

To merge this branch:

bzr merge lp://qastaging/~kamstrup/dee/tree-index

Related bugs:

Bug #622446: Implement a DeeTreeIndex

Medium

Fix Released

Link a bug report

Reviewer	Review Type	Date Requested	Status
Michal Hruby (community)		2011-11-23	Approve on 2011-11-28
Review via email: mp+83141@code.qastaging.launchpad.net

Description of the change

WARNING API+ABI break: Big overhaul of the whole text analysis, DeeAnalyzer, and DeeIndex framework.

Primary intent is to implement a DeeTreeIndex that can do O(logN) prefix searching.

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-23:

Wow, this is huge...

There are (closure) annotations missing for DeeModelReaderFunc, DeeCollatorFunc, and all the *_reader helpers.

dee_term_list_clone_real: is trying to be insanely quick, doing memcpy of the ptrarr->data fields could squeeze up some extra performance.

find_term_real: shouldn't it use g_sequence_lookup when doing MATCH_EXACT?

2573 +//fixme fixme - is there something to fix?

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-23:

Oh yea, and regenerate the Vala bindings please.

Revision history for this message

Michal Hruby (mhr3) on 2011-11-23:

review: Needs Fixing

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-23:

Also please add a test that creates TreeIndex on a model that already has a few rows. Hint: it'll crash. ;)

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2011-11-23:

Just pushed fixes for everything except the last crasher you pointed out. I added a failing unit test as well for the latter, but will wait until tomorrow to actually fix it because it'll require some work (the root cause is described in //FIXMEs).

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2011-11-24:

Just pushed a fix for the last issue. Please review again.

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-24:

I can't say I like the pointer property, and moreover it's copying the delegate which is *really* dangerous if the original was also used. That being said, this allows easy use of the existing readers, so it shouldn't be that bad. But please, make the "reader" property write-only and add one more ModelReader initializer that will have the (ModelReaderFunc, gpointer, GDestroyNotify) parameters.

As I mentioned on IRC, my test still crashes with SIGFPE, and after some digging, it seems it has something to do with the exact data I chose, so adding that here (the clear call crashes):

void lookup_rows (Dee.Index index, string term)
{
  var rs = index.lookup (term, Dee.TermMatchFlag.PREFIX);
  var m = rs.get_model ();
  foreach (var iter in rs)
  {
    debug ("resultset contains iter: %p", iter);
    debug (" data: %d, %s", m.get_int32 (iter, 0), m.get_string (iter, 1));
  }
}

void main ()
{
var m = new Dee.SequenceModel ();
m.set_schema ("i", "s");

var reader = Dee.ModelReader.new_for_string_column (1);

  var i = new Dee.TreeIndex (m, new Dee.TextAnalyzer (), reader);
  m.append (44, "file:///local_uri");
  m.append (23, "http://foo.com");
  m.append (11, "ftp://bar.org");
  m.append (07, "file:///home/username/file@home");

lookup_rows (i, "user");

m.clear ();
}

Revision history for this message

Mikkel Kamstrup Erlandsen (kamstrup) wrote on 2011-11-25:

Thanks for spotting this. Finally nailed it and pushed the fixes. See
the commit comments. Please review again.

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-28:

There are just minor things: ModelReader shouldn't be marked as Compact class (+vapi regen), dee_model_reader_new annotations should have (scope notified) for the first param, the third should be just (allow-none).

Other than that I think we're good. Awesome job! :)

review: Approve

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-28:

+ unitialized -> initialized :)

Revision history for this message

Michal Hruby (mhr3) wrote on 2011-11-28:

> + unitialized -> initialized

Actually, "uninitialized" :)

Revision history for this message

Unity Merger (unity-merger) wrote on 2011-11-29:

The Jenkins job https://jenkins.qa.ubuntu.com/job/automerge-dee/1/console reported an error when processing this lp:~kamstrup/dee/tree-index branch.
Not merging it.

lp://qastaging/~kamstrup/dee/tree-index updated on 2011-11-29

343. By Mikkel Kamstrup Erlandsen on 2011-11-29: Fix test /Index/ResultSet/GList/Test3 by making dee_glist_result_set_new() a public symbol - although not part of the contractually "stable API". You'll need to include non-distributed headers to access it. The problem with the tests was that we compiled DeeGListResultSet into the test executable making it register the DeeGListResultSet GType twice which causes a g_error.