November 17 2013, 12:26 AM   #121
JeBuS
Lieutenant Commander
 
Re: Tabulated review threads sorted by average score

Umm, TF:R&D is ranked #1 because it's picking up the wrong edition. This versus this. Huge difference. It raises the question of how correct the rest of them are.
November 17 2013, 12:36 AM   #122
Defcon
Rear Admiral
Location: Germany

The problem lies with Goodreads, since they have multiple listings for the book. On Shelfari you can request a merger of duplicate books, but I haven't found this option on Goodreads yet.
November 17 2013, 01:00 AM   #123
JeBuS
Lieutenant Commander

Defcon wrote:
The problem lies with Goodreads, since they have multiple listings for the book. On Shelfari you can request a merger of duplicate books, but I haven't found this option on Goodreads yet.
As a Goodreads librarian, I thought I'd do it myself, but that's only possible when an item is on fewer than 5 people's shelves. That one had 7. I've put in a merge request so a super-librarian can do it. Hopefully it'll be sorted soon.

Sho, why don't you use the ISBN of the books for the Goodreads API?
November 17 2013, 01:39 AM   #124
Sho
Fleet Captain
Location: Berlin, Germany

JeBuS wrote:
Sho, why don't you use the ISBN of the books for the Goodreads API?
Because the threads here don't contain it. Remember that the whole thing runs fully automated and unattended off the forum (and now Goodreads); I don't want to hunt for threads or collect metadata about them manually.

I could use some sort of other source to map from title + author to ISBN -- but then Goodreads effectively does that too.

Pretty sure this is an outlier, anyhow.

Last edited by Sho; November 17 2013 at 01:53 AM.
November 17 2013, 01:53 AM   #125
JeBuS
Lieutenant Commander

Merger complete on TF:R&D

Sho wrote:
Pretty sure this is an outlier, anyhow.
But it's likely to happen every time a new book is announced. These outliers are the result of regular users adding the books as they can, without complete info. If a few people use it instead of the real one, it gets stuck, as happened with TF:R&D. Yeah, they probably eventually get merged, but it's not uncommon for them to be there.
November 17 2013, 02:03 AM   #126
Sho
Fleet Captain
Location: Berlin, Germany

JeBuS wrote:
Merger complete on TF:R&D
Thanks, I kicked off an out-of-schedule run and the new data is in now.


JeBuS wrote:
But it's likely to happen every time a new book is announced. These outliers are the result of regular users adding the books as they can, without complete info. If a few people use it instead of the real one, it gets stuck, as happened with TF:R&D. Yeah, they probably eventually get merged, but it's not uncommon for them to be there.
No doubt. That's why, if you go back two pages to when this was first proposed, I wrote "Most likely the main challenge is reliably mapping a review thread to the goodreads entry in an automated fashion", anticipating such problems.

I see no trivial solution. You seem to know Goodreads better, though. If you have suggestions for algorithmically picking the best search result more reliably than Goodreads' own result weighting does (example: "use the entry with the highest number of ratings that still has Star Trek in the title"), I'm interested.
November 17 2013, 02:50 AM   #127
JeBuS
Lieutenant Commander

Sho wrote:
No doubt. That's why, if you go back two pages to when this was first proposed, I wrote "Most likely the main challenge is reliably mapping a review thread to the goodreads entry in an automated fashion", anticipating such problems.

I see no trivial solution. You seem to know Goodreads better, though. If you have suggestions for algorithmically picking the best search result more reliably than Goodreads' own result weighting does (example: "use the entry with the highest number of ratings that still has Star Trek in the title"), I'm interested.
That would have been my suggestion, actually. Though who knows how similar some of the Trek novels' titles are to one another? It may just end up kicking the can down the road to a different problem.
November 17 2013, 03:10 AM   #128
Sho
Fleet Captain
Location: Berlin, Germany

JeBuS wrote:
That would have been my suggestion, actually. Though who knows how similar some of the Trek novels' titles are to one another? It may just end up kicking the can down the road to a different problem.
I think we'll just have to keep an eye out for how it behaves in practice.

For the record, currently, the search process works like this:

1. Take the thread title.
2. Chop off the first segment ending in a colon (i.e. the series label) if there is one.
3. Strip out the trigger phrase the thread discovery keys on ("Review Thread").
4. Strip out anything that looks like author name initials. The Goodreads search is super strict: searching for an author initial can already mean no results, since there is no basic string match against the fully written-out name. (If I were implementing Goodreads' search engine, I would account for this.)
5. Strip out anything that looks like "(spoiler*)".
6. Strip out the last occurrence of the word "by".
7. Strip out special chars like "&".
8. Simplify and trim whitespace.
9. Prepend "Star Trek ".
10. Search Goodreads for it.

So for example, "TF: A Ceremony of Losses by David Mack Review Thread (Spoilers!)" results in a search for "Star Trek A Ceremony of Losses David Mack".
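
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual script: the function name `build_query`, the regular expressions, and the exact set of "special chars" are assumptions made for the example.

```python
import re

# Assumed set of "special chars"; the post only names "&" as an example.
SPECIAL_CHARS = re.compile(r"[&:!?*/+]")


def build_query(thread_title: str) -> str:
    """Turn a review thread title into a search string (hypothetical helper)."""
    q = thread_title
    # Step 2: chop off the first segment ending in a colon (the series label).
    q = re.sub(r"^[^:]*:", "", q)
    # Step 3: strip the trigger phrase the thread discovery keys on.
    q = re.sub(r"review thread", "", q, flags=re.IGNORECASE)
    # Step 4: strip author name initials such as "R." or "R.A.".
    q = re.sub(r"\b(?:[A-Z]\.\s*)+", "", q)
    # Step 5: strip anything that looks like "(spoiler*)".
    q = re.sub(r"\(\s*spoiler[^)]*\)", "", q, flags=re.IGNORECASE)
    # Step 6: strip only the last occurrence of the word "by".
    matches = list(re.finditer(r"\bby\b", q, flags=re.IGNORECASE))
    if matches:
        last = matches[-1]
        q = q[:last.start()] + q[last.end():]
    # Step 7: replace special characters with spaces.
    q = SPECIAL_CHARS.sub(" ", q)
    # Step 8: simplify and trim whitespace.
    q = " ".join(q.split())
    # Step 9: prepend "Star Trek ".
    return "Star Trek " + q


print(build_query("TF: A Ceremony of Losses by David Mack Review Thread (Spoilers!)"))
# → Star Trek A Ceremony of Losses David Mack
```

Step 10, the Goodreads search itself, is left out because it depends on the API client in use.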

However, in the case of Revelation and Dust, the author name is included in the thread title as "DRGIII" (for space reasons). If you include that in the search, Goodreads' strictness results in zero search results. So I introduced a simple list of words to remove from titles and seeded it with DRGIII and KRAD for now.

Now, before the two R&D entries were merged, the result for its search query ("Star Trek Revelation and Dust") contained both entries, but in the wrong order, i.e. with the bad one first. If I had written additional code to grab the entry with the largest number of ratings, that would have corrected for it.
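
A minimal sketch of that "grab the entry with the largest number of ratings" idea, assuming the search results have already been parsed into simple records. The `SearchResult` type, its field names, and the rating counts are made up for illustration and are not taken from the Goodreads API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SearchResult:
    # Hypothetical record; a real API response carries more fields.
    title: str
    ratings_count: int


def pick_most_rated(results: Sequence[SearchResult]) -> Optional[SearchResult]:
    """Return the result with the most ratings, or None if there are none."""
    if not results:
        return None
    return max(results, key=lambda r: r.ratings_count)


# Illustrative numbers only: the duplicate comes first but has few ratings.
results = [
    SearchResult("Star Trek: Revelation and Dust", 3),
    SearchResult("Revelation and Dust", 120),
]
print(pick_most_rated(results).title)
```

This ignores the search engine's ordering entirely, so it trades one failure mode (a duplicate ranked first) for another (a popular but unrelated book outranking the right one).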

However, I did a quick test, and just replacing the kill words list with a substitution table (turning "DRGIII" into "David George") would have done the trick too: The same search with the author appended ("Star Trek Revelation and Dust David George") put the desired entry first.
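
The switch from a kill-word list to a substitution table could look something like this. It is a sketch only: the names `SUBSTITUTIONS` and `expand_abbreviations` are hypothetical, and the table holds just the one mapping mentioned above.

```python
# Abbreviation -> search-friendly replacement. Only the DRGIII mapping comes
# from the post; an entry for KRAD would have to be added the same way.
SUBSTITUTIONS = {
    "DRGIII": "David George",
}


def expand_abbreviations(query: str, table=SUBSTITUTIONS) -> str:
    """Replace whole words found in the table, leaving other words untouched."""
    return " ".join(table.get(word, word) for word in query.split())


print(expand_abbreviations("Star Trek Revelation and Dust DRGIII"))
# → Star Trek Revelation and Dust David George
```

Mapping a word to an empty string reproduces the old kill-word behaviour, as long as the whitespace is simplified again afterwards.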

The difference is most likely because the bad entry's title had a higher similarity to our search terms - it contained "Star Trek" in the title, while for the right entry, the association to Star Trek was moved out into metadata. Supplying the author as well somehow convinced the scoring algorithm in Goodreads' search engine that the other entry was the better match.

This is why I made the outlier claim early: Revelation and Dust is the only book we aren't searching for with author name included, and if you do include the author name, Goodreads' own result weighting seems to do an acceptable job.

I'll implement the substitution thing later.

Edit: Another idea would be to append instead of prepend "Star Trek" to the search terms. Since the right entries always have the Trek association in metadata, only bad entries would have "Star Trek" at the beginning of the title, and prepending then biases the scoring badly because of the high string similarity. Goodreads is most likely using a trivial Levenshtein distance algo for that.
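
To illustrate the prepend-versus-append argument, here is a small sketch that compares plain Levenshtein distances. It assumes, as guessed above, that string similarity drives the ranking; how Goodreads actually scores results is not known here, and the two entry titles are hypothetical stand-ins for the duplicate and the proper entry.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


bad_title = "Star Trek: Revelation and Dust"   # hypothetical duplicate entry
good_title = "Revelation and Dust"             # hypothetical proper entry

prepended = "Star Trek Revelation and Dust"
appended = "Revelation and Dust Star Trek"

# Prepending puts the query one edit away from the duplicate's title.
print(levenshtein(prepended, bad_title), levenshtein(prepended, good_title))
# Appending leaves the proper entry closer than the duplicate.
print(levenshtein(appended, bad_title), levenshtein(appended, good_title))
```

Under this assumption the prepended query favours the duplicate, while the appended one no longer does.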
November 17 2013, 03:35 AM   #129
JeBuS
Lieutenant Commander

^That is an interesting read. That's all I can say.
November 17 2013, 03:50 AM   #130
Sho
Fleet Captain
Location: Berlin, Germany

Thanks.

If you're curious, this is how this looks in code form: http://www.eikehein.com/repositories...785caf;hb=HEAD

It's all a bit quick and dirty, but has been working pretty reliably so far. Lines 113-149 interact with Goodreads.

I try to habitually future-proof this sort of stuff. If I run out of time or interest to look after the forum (the latter is unlikely, but who knows about the former?) I want things to just continue working unattended. And for the get-hit-by-a-bus scenario the source code is available so someone else can set it up.

Another idea to make the search better btw. is to just run the search terms through a standard spell checker and allow it to do high-confidence substitutions. That would have taken care of the What Judgements Come problem.
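
A rough sketch of the high-confidence substitution idea, using Python's standard `difflib` as a stand-in for a real spell checker. The function name, the vocabulary, and the 0.9 cutoff are assumptions, and the example presumes the problem was a "Judgements" versus "Judgments" spelling mismatch.

```python
import difflib


def correct_terms(query: str, vocabulary, cutoff: float = 0.9) -> str:
    """Replace a word only if a vocabulary word is a very close match."""
    lookup = {word.lower(): word for word in vocabulary}
    corrected = []
    for word in query.split():
        if word.lower() in lookup:
            corrected.append(word)  # already a known word, keep as typed
            continue
        close = difflib.get_close_matches(word.lower(), list(lookup),
                                          n=1, cutoff=cutoff)
        corrected.append(lookup[close[0]] if close else word)
    return " ".join(corrected)


# Hypothetical vocabulary; a real checker would bring its own dictionary.
print(correct_terms("What Judgements Come", ["What", "Judgments", "Come"]))
```

The high cutoff keeps the checker from rewriting names and invented words, which Trek titles are full of.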
November 17 2013, 04:33 AM   #131
JeBuS
Lieutenant Commander

If that's quick and dirty, I'm afraid of what slow and clean looks like. It seems perfectly readable to me as-is.
November 17 2013, 04:42 AM   #132
Sho
Fleet Captain
Location: Berlin, Germany

^ Thanks.
November 17 2013, 02:02 PM   #133
Defcon
Rear Admiral
Location: Germany

Christopher wrote:
Interesting... My three listed books are all grouped together in the TrekBBS rankings,[...]
And now Ex Machina has joined the group.
November 17 2013, 07:16 PM   #134
Sho
Fleet Captain
Location: Berlin, Germany

I've now added the search improvements I mentioned (except for the spelling correction for now), and they don't seem to have impacted things negatively at least.

Edit: I'm now also reporting Goodreads scores as "n/a" when there are fewer than 4 votes, the same bar used against thread votes for inclusion in the table.
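
In code, that vote threshold might amount to something like the following sketch. The function name and the two-decimal formatting are assumptions; only the "n/a" label and the bar of 4 votes come from the post.

```python
MIN_VOTES = 4  # same bar as for thread votes, per the post


def display_score(average: float, votes: int, min_votes: int = MIN_VOTES) -> str:
    """Show the average only when enough votes back it up."""
    if votes < min_votes:
        return "n/a"
    return f"{average:.2f}"


print(display_score(4.5, 3))  # → n/a
print(display_score(4.5, 4))  # → 4.50
```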

Last edited by Sho; November 17 2013 at 07:41 PM.
November 17 2013, 11:52 PM   #135
Avro Arrow
Fleet Captain
Location: Ontario, Canada

Sho wrote:
And for the get-hit-by-a-bus scenario the source code is available so someone else can set it up.
I didn't realize the "hit by a bus" thing was some sort of industry standard, that apparently even crosses national boundaries. Why do we never talk about other forms of death when discussing future supportability? Apparently coders spend a lot of their time running out into traffic suddenly...