Tabulated review threads sorted by average score

JeBuS · Nov 16, 2013

Umm, the reason TF:R&D is ranked #1 is because it's picking up the wrong edition. This versus this. Huge difference. It begs the question how correct the rest of them are.

Defcon · Nov 16, 2013

The problem lies with Goodreads, since they have multiple listings for the book. On Shelfari you can request a merger of duplicate books, but I haven't found this option on Goodreads yet.

JeBuS · Nov 17, 2013

Defcon said: ↑

The problem lies with Goodreads since they have multiple listings for the book. On Shelfari you can request a merger of duplicate books, but I haven't found this option on Goodreads yet

As a Goodreads librarian, I thought I'd do it myself, but it's only possible when an item is on fewer than 5 people's shelves. That one had 7. I've put in a merge request so a super-librarian can do it. Hopefully it'll be sorted soon.

Sho, why don't you use the ISBN of the books for the Goodreads API?

Sho · Nov 17, 2013

JeBuS said: ↑

Sho, why don't you use the ISBN of the books for the goodreads API?

Because the threads here don't contain it. Remember that the whole thing runs fully autonomously and automated off the forum (and now Goodreads); I don't want to hunt for threads manually or collect metadata about them manually.

I could use some sort of other source to map from title + author to ISBN -- but then Goodreads effectively does that too.

Pretty sure this is an outlier, anyhow.

JeBuS · Nov 17, 2013

Merger complete on TF:R&D

Sho said: ↑

Pretty sure this is an outlier, anyhow.

But it's likely to happen every time a new book is announced. These outliers are the result of regular users adding the books as they can, without complete info. If a few people use it instead of the real one, it gets stuck, as happened with TF:R&D. Yeah, they probably eventually get merged, but it's not uncommon for them to be there.

Sho · Nov 17, 2013

JeBuS said: ↑

Merger complete on TF:R&D

Thanks, I kicked off an out-of-schedule run and the new data is in now.

JeBuS said: ↑

But it's likely to happen every time a new book is announced. These outliers are the result of regular users adding the books as they can, without complete info. If a few people use it instead of the real one, it gets stuck, as happened with TF:R&D. Yeah, they probably eventually get merged, but it's not uncommon for them to be there.

No doubt. That's why, if you go back two pages to when this was first proposed, I wrote "Most likely the main challenge is reliably mapping a review thread to the goodreads entry in an automated fashion", anticipating such problems.

I see no trivial solution. You seem to know Goodreads better though. If you have suggestions for algorithmically selecting the best search result in a way that beats Goodreads' own result weighting (example: "use the entry with the highest number of ratings that still has Star Trek in the title"), I'm interested.

JeBuS · Nov 17, 2013

Sho said: ↑

No doubt. That's why, if you go back two pages to when this was first proposed, I wrote "Most likely the main challenge is reliably mapping a review thread to the goodreads entry in an automated fashion", anticipating such problems.

I see no trivial solution. You seem to know Goodreads better though. If you have suggestions for algorithmically selecting the best search result better than Goodreads' result weighting works by itself (example: "use the entry with the highest number of ratings that still has Star Trek in the title") I'm interested.

That would have been my suggestion, actually. Though who knows how similar some of the Trek novels' titles are to one another? It may just end up kicking the can down the road to a different problem.

Sho · Nov 17, 2013

JeBuS said: ↑

That would have been my suggestion, actually. Though who knows how similar some of the Trek novels' titles are to one another? It may just end up kicking the can down the road to a different problem.

I think we'll just have to keep an eye out for how it behaves in practice.

For the record, currently, the search process works like this:

1. Take the thread title.
2. Chop off the first segment ending in a colon (i.e. the series label) if there is one.
3. Strip out the trigger phrase the thread discovery keys on ("Review Thread").
4. Strip out anything that looks like author name initials (because the Goodreads search is super strict, and searching for an author initial can already mean no results, since there is no basic string match against the fully written-out name; if I were implementing Goodreads' search engine, I would actually account for this).
5. Strip out anything that looks like "(spoiler*)".
6. Strip out the last occurrence of the word "by".
7. Strip out special chars like "&".
8. Simplify and trim whitespace.
9. Prepend "Star Trek ".
10. Search Goodreads for it.

So for example, "TF: A Ceremony of Losses by David Mack Review Thread (Spoilers!)" results in a search for "Star Trek A Ceremony of Losses David Mack".
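The ten steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name, the regular expressions, and the set of "special chars" are assumptions chosen only so that the example title yields the query quoted above.

```python
import re


def build_search_query(thread_title: str) -> str:
    """Turn a review thread title into a search string (hypothetical helper)."""
    query = thread_title  # 1. Take the thread title.

    # 2. Chop off the first segment ending in a colon (the series label).
    query = re.sub(r"^[^:]*:", "", query, count=1)

    # 3. Strip the trigger phrase used for thread discovery.
    query = query.replace("Review Thread", "")

    # 4. Strip anything that looks like author initials, e.g. "K.W.".
    query = re.sub(r"(?:\b[A-Z]\.)+", "", query)

    # 5. Strip "(spoiler*)" in any capitalization.
    query = re.sub(r"\(spoiler[^)]*\)", "", query, flags=re.IGNORECASE)

    # 6. Strip the last occurrence of the word "by" (lowercase only here).
    matches = list(re.finditer(r"\bby\b", query))
    if matches:
        last = matches[-1]
        query = query[:last.start()] + query[last.end():]

    # 7. Strip special characters; this particular set is an assumption.
    query = re.sub(r"[&!?#*]", " ", query)

    # 8. Simplify and trim whitespace.
    query = " ".join(query.split())

    # 9. Prepend "Star Trek "; step 10 (the actual search) is out of scope.
    return "Star Trek " + query


print(build_search_query(
    "TF: A Ceremony of Losses by David Mack Review Thread (Spoilers!)"
))
# prints: Star Trek A Ceremony of Losses David Mack
```

Matching "by" case-sensitively is a deliberate choice in this sketch, so that a title word such as "By" in "By the Book" is left alone.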

However, in the case of Revelation and Dust, the author name is included in the thread title as "DRGIII" (for space reasons). If you include that in the search, Goodreads' strictness results in zero search results. So I introduced a simple list of words to remove from titles and seeded it with DRGIII and KRAD for now.

Now, before the two R&D entries were merged, the result for its search query ("Star Trek Revelation and Dust") contained both entries, but in the wrong order, i.e. with the bad one first. If I had written additional code to grab the entry with the largest number of ratings, that would have corrected for it.
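A minimal sketch of the "grab the entry with the largest number of ratings" idea, assuming the search results have already been parsed into dictionaries. The `ratings_count` key and the function name are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
from typing import Dict, List, Optional


def pick_most_rated(results: List[Dict]) -> Optional[Dict]:
    """Return the search result with the most ratings, or None if empty."""
    if not results:
        return None
    # Entries without a count are treated as having zero ratings.
    return max(results, key=lambda entry: entry.get("ratings_count", 0))


# Made-up numbers: the duplicate entry comes first but has fewer ratings.
results = [
    {"title": "duplicate entry", "ratings_count": 3},
    {"title": "real entry", "ratings_count": 120},
]
print(pick_most_rated(results)["title"])  # prints: real entry
```

This ignores the order Goodreads returns entirely, so it trades one heuristic for another: a popular but wrong book with a similar title would win.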

However, I did a quick test, and just replacing the kill words list with a substitution table (turning "DRGIII" into "David George") would have done the trick too: The same search with the author appended ("Star Trek Revelation and Dust David George") put the desired entry first.
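A substitution table along these lines might look like the sketch below. Only the "DRGIII" to "David George" mapping comes from the post; the table name and helper function are assumptions, and further entries would be added the same way.

```python
import re

# Abbreviation -> search-friendly replacement. Only the DRGIII entry is
# taken from the discussion; the structure itself is an assumption.
SUBSTITUTIONS = {
    "DRGIII": "David George",
}


def expand_abbreviations(title: str, table=SUBSTITUTIONS) -> str:
    """Replace whole words found in the table, leave everything else alone."""
    def replace(match):
        word = match.group(0)
        return table.get(word, word)

    return re.sub(r"\w+", replace, title)


print(expand_abbreviations("Revelation and Dust DRGIII"))
# prints: Revelation and Dust David George
```

A kill-words list is the special case where every replacement is the empty string, so one table can cover both behaviors as long as whitespace is cleaned up afterwards.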

The difference is most likely because the bad entry's title had a higher similarity to our search terms - it contained "Star Trek" in the title, while for the right entry, the association to Star Trek was moved out into metadata. Supplying the author as well somehow convinced the scoring algorithm in Goodreads' search engine that the other entry was the better match.

This is why I made the outlier claim early: Revelation and Dust is the only book we aren't searching for with author name included, and if you do include the author name, Goodreads' own result weighting seems to do an acceptable job.

I'll implement the substitution thing later.

Edit: Another idea would be to append instead of prepend "Star Trek" to the search terms. Since the right entries always have the Trek association in metadata, only bad entries would have "Star Trek" at the beginning of the title, and prepending then biases the scoring badly because of the high string similarity. Goodreads is most likely using a trivial Levenshtein distance algo for that.
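To illustrate the string-similarity argument, here is a plain Levenshtein distance in Python. Whether Goodreads uses anything like this is only the guess made above, and the two candidate titles are hypothetical stand-ins for the duplicate and the real entry, not the actual Goodreads titles.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


# Hypothetical titles for the duplicate and the real entry.
bad_title = "Star Trek: Revelation and Dust"
good_title = "Revelation and Dust"

prepended = "Star Trek Revelation and Dust"
print(levenshtein(prepended, bad_title))   # prints: 1
print(levenshtein(prepended, good_title))  # prints: 10
```

Under this assumption, the prepended query is almost identical to a title that itself starts with "Star Trek", which is the bias described above. Appending the phrase instead removes that near-exact match for the duplicate.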

JeBuS · Nov 17, 2013

^That is an interesting read. That's all I can say.

Sho · Nov 17, 2013

Thanks ...

If you're curious, this is how this looks in code form: http://www.eikehein.com/repositorie...13ca9742721af10fff7d4e3258d8edf785caf;hb=HEAD

It's all a bit quick and dirty, but has been working pretty reliably so far. Lines 113-149 interact with Goodreads.

I try to habitually future-proof this sort of stuff. If I run out of time or interest to look after the forum (the latter is unlikely, but who knows about the former?) I want things to just continue working unattended. And for the get-hit-by-a-bus scenario the source code is available so someone else can set it up.

Another idea to make the search better btw. is to just run the search terms through a standard spell checker and allow it to do high-confidence substitutions. That would have taken care of the What Judgements Come problem.
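A rough sketch of high-confidence substitution, with Python's standard `difflib` standing in for a real spell checker. The vocabulary, the 0.9 cutoff, and the function name are assumptions, as is the guess that the problem was "Judgements" in the thread title against "Judgments" in the catalog.

```python
import difflib


def correct_title(title: str, vocabulary, cutoff: float = 0.9) -> str:
    """Replace words only when a very close match exists in the vocabulary."""
    corrected = []
    for word in title.split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # n=1: best candidate only; a high cutoff keeps substitutions rare.
        candidates = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(candidates[0] if candidates else word)
    return " ".join(corrected)


vocabulary = ["What", "Judgments", "Come"]
print(correct_title("What Judgements Come", vocabulary))
# prints: What Judgments Come
```

The high cutoff is what makes this safe for book titles, which are full of made-up words: anything without a near-identical match is passed through unchanged.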

JeBuS · Nov 17, 2013

If that's quick and dirty, I'm afraid of what slow and clean looks like. It seems perfectly readable to me as-is.

Sho · Nov 17, 2013

^ Thanks.

Defcon · Nov 17, 2013

Christopher said: ↑

Interesting... My three listed books are all grouped together in the TrekBBS rankings,[...]

And now Ex Machina has joined the group.

Sho · Nov 17, 2013

I've now added the search improvements I mentioned (except for the spelling correction for now), and they don't seem to have impacted things negatively at least.

Edit: I'm now also reporting Goodreads scores as "n/a" when there are fewer than 4 votes, the same bar used against thread votes for inclusion in the table.
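The vote threshold could be expressed like this. The threshold of 4 comes from the post; the function name and the two-decimal formatting are assumptions.

```python
from typing import Optional

MIN_VOTES = 4  # same bar as for thread votes, per the post


def format_score(average: Optional[float], votes: int,
                 min_votes: int = MIN_VOTES) -> str:
    """Report a score only when it is backed by enough votes."""
    if average is None or votes < min_votes:
        return "n/a"
    return f"{average:.2f}"


print(format_score(4.5, 3))  # prints: n/a
print(format_score(4.5, 4))  # prints: 4.50
```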

Avro Arrow · Nov 17, 2013

Sho said: ↑

And for the get-hit-by-a-bus scenario the source code is available so someone else can set it up.

I didn't realize the "hit by a bus" thing was some sort of industry standard, that apparently even crosses national boundaries. Why do we never talk about other forms of death when discussing future supportability? Apparently coders spend a lot of their time running out into traffic suddenly...

Sho · Nov 17, 2013

We also talk about the "bus number" of a codebase.

The "bus number" is the number of developers who'd need to get hit by a bus to seriously disrupt further development of the codebase because of the knowledge exclusive to their heads.

I.e. if a codebase has a bus number of 1, only one developer needs to get up close and personal with the front of a bus traveling at high velocity, and the project no longer functions. Increasing the bus number isn't as easy as just adding more people, either; it's more about documenting things and other forms of spreading knowledge, making code accessible enough so someone can jump in and take over, and removing other barriers to someone else stepping up and accepting responsibility.

trampledamage · Nov 20, 2013

I haven't come across bus number before, but we did use to have people designated as "bus people" in that they, absolutely, could not get hit or the project would collapse!

Nice work, Sho!

Sho · Nov 20, 2013

^ Thanks!

Defcon · Nov 23, 2013

So I have thought about a rough schedule for the next few "classic" review threads, mixing "newer" novels with older ones while rotating through the series (I'll go with the one-book-every-two-weeks schedule, by the way):

TNG: Death in Winter * Michael Jan Friedman
DS9: Warped * K.W. Jeter
VOY: String Theory #1: Cohesion * Jeffrey Lang
Enterprise: By the Book * Dean Wesley Smith & K.K. Rusch
IKS Gorkon: A Good Day to Die * Keith R.A. DeCandido
Titan: Taking Wing * Andy Mangels & Michael A. Martin

Any thoughts?

Markonian · Nov 23, 2013

Sounds good to me. I'm going to read A Good Day to Die soon for the first time, and consider re-reading Taking Wing, too. It's great to share thoughts about books so old.

Do you have a specific pattern for which books/series you choose for the classics threads?
