January 18 2012, 07:29 PM   #26
Fleet Captain
Sho
Location: Berlin, Germany
Re: Tabulated review threads sorted by average score

Defcon wrote:
Maybe it would be wise to wait with expanding the list until after the software-switch to Xenforo that is planned for the relatively near future.

Probably that will just be a minor change for you in the code (I have absolutely no idea about programming/coding), but I thought you should at least know about it.
Whoa. Thanks for pointing this out; I indeed had no idea.

I've taken a quick peek at XenForo's own installation of XenForo to determine how hard it will be to adapt the data miner:

First, let's take a quick look at how the data miner currently operates - you can probably skip this if you don't care for technobabble, but then what are you doing in a Star Trek forum? Anyway, here we go:
  1. It visits the forum's thread list page, using variables in the URL it loads to convey its desired view options, which are to sort the threads by their creation time from new to old, and to limit the listing to a specific timeframe. On the first run that timeframe was actually the limitless "since the beginning", and on subsequent runs it asks for only the last 24 hours, which is the lowest queryable timeframe and sufficient to catch new threads when running twice daily.
  2. With the thread list page loaded up, it walks over it (while following pagination links if there are any, i.e. loading further pages as needed) and notes down the URLs of any threads which contain a configurable trigger phrase, currently "review thread" (the character string search is done case-insensitively).
  3. Then it reads in the contents of its thread cache, a file containing the URLs of threads gathered in previous runs.
  4. From the set of previously gathered thread URLs in the cache and the set of thread URLs just gathered from the thread list page, it builds both the union and the difference of the two, i.e. one set containing all now-known thread URLs and one containing the thread URLs not yet written to the cache. The latter is then added to the cache to serve the next run, and the former is used in the next step.
  5. It visits each URL in the just-computed set of all known thread URLs, mining any poll data from the thread page.
  6. Then it loops over the assembled thread data again to eliminate the threads which do not follow the expected poll format, to separate the rest into two bins (those meeting the minimum vote threshold and those which do not), to sort them, and to generate the report page. The new, fancier page can actually sort client-side using JavaScript, as the ability to dynamically re-sort by clicking the column headers demonstrates, but in case you don't have JavaScript enabled in your browser, the static HTML served up to it still presorts the table rows by average score.
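For the curious, the set juggling in steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration, not the data miner's actual code: the function names, the (title, URL) pair format, and the idea that the trigger phrase is matched against the thread title are all assumptions made for the example.

```python
TRIGGER = "review thread"  # the configurable trigger phrase from step 2


def select_review_threads(listing, trigger=TRIGGER):
    """Return the URLs of listed threads whose title contains the trigger.

    `listing` is assumed to be an iterable of (title, url) pairs that some
    other code has already parsed out of the thread list page(s).
    """
    needle = trigger.lower()
    # Lower-casing both sides makes the substring search case-insensitive.
    return {url for title, url in listing if needle in title.lower()}


def merge_with_cache(cached_urls, found_urls):
    """Return (all_known, not_yet_cached) as described in step 4."""
    all_known = cached_urls | found_urls       # union: everything to visit
    not_yet_cached = found_urls - cached_urls  # difference: add to the cache
    return all_known, not_yet_cached
```

Calling `merge_with_cache(cache, select_review_threads(listing))` then yields the set to visit in step 5 plus the set to append to the cache file.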
In effect, this gives it the following properties:
  • It picks up on new threads and their votes.
  • It picks up on new votes in old threads.
  • It picks up on new polls in old threads (i.e. it's already ready for grafts).
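The filtering, binning and sorting from step 6 could look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the `MIN_VOTES` value, the function names, and the representation of a poll as a mapping from numeric score to vote count (with `None` standing in for a poll that did not match the expected format).

```python
MIN_VOTES = 10  # hypothetical threshold; the real value is not stated here


def average_score(votes_by_score):
    """Vote-weighted mean of a non-empty {score: votes} mapping."""
    total = sum(votes_by_score.values())
    return sum(score * n for score, n in votes_by_score.items()) / total


def build_report(threads, min_votes=MIN_VOTES):
    """Split threads into (ranked, below_threshold) lists of rows.

    `threads` maps a thread URL to a {score: votes} dict, or to None when
    the poll did not follow the expected format. Each row is a tuple of
    (url, average, total_votes), sorted by average from high to low.
    """
    ranked, below = [], []
    for url, votes in threads.items():
        if not votes:
            continue  # missing or malformed poll: drop the thread
        total = sum(votes.values())
        if total == 0:
            continue  # no votes yet, so there is nothing to average
        row = (url, average_score(votes), total)
        (ranked if total >= min_votes else below).append(row)
    ranked.sort(key=lambda row: row[1], reverse=True)
    below.sort(key=lambda row: row[1], reverse=True)
    return ranked, below
```

Presorting the rows here is what keeps the static HTML useful for browsers without JavaScript.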
Now, back to XenForo. Looks like it allows me:
  • ... to list threads ordered by thread creation time by loading the right URL. However, it doesn't appear to allow me to limit the timeframe of the query. That means I'll have to adjust the thread gathering logic slightly to make it "walk over threads, possibly following pagination links, until hitting a thread that's older than the time of the last run" (or some fixed cut-off time delta vs. the current time). Fortunately it includes the creation dates right in the listing, so that's easy to compute. There's a bit of a curveball in the form of sticky threads with old dates leading the list, but I can easily skip over those of course.
  • ... to view poll results without needing to be logged in. It includes nearly the same data in its display of them as vBulletin does: Poll option texts, votes per option - what's missing is the number of total votes, but that's of course easily built by adding together the per-option counts.
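The two adjustments above can be sketched like this. It is a rough illustration under assumptions, not working XenForo code: the (url, created, is_sticky) tuple format and both function names are made up for the example, and the parsing that would produce those tuples is left out entirely.

```python
from datetime import datetime, timedelta


def gather_new_threads(listing, cutoff):
    """Collect URLs of threads created at or after `cutoff`.

    `listing` is assumed to yield (url, created, is_sticky) tuples in the
    order the forum shows them: stickies first, then newest to oldest.
    """
    urls = []
    for url, created, is_sticky in listing:
        if is_sticky:
            continue  # stickies lead the list with old dates; skip them
        if created < cutoff:
            break  # sorted new to old, so everything after is older too
        urls.append(url)
    return urls


def total_votes(option_counts):
    """Rebuild the missing total by adding up the per-option counts."""
    return sum(option_counts.values())


# Example cut-off: a fixed 24-hour delta relative to the current time.
cutoff = datetime.now() - timedelta(hours=24)
```

If `listing` is a generator that loads further pages on demand, the `break` also stops it from following pagination links any further than needed.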
Bottom line: The thread gatherer will need a bit of work, and I'll have to adapt the parsing code to slightly different HTML, but it shouldn't take me longer than writing this post did.

So, no reason to hold off on poll grafts. A more pressing question is whether XenForo allows for adding/grafting polls, which I haven't looked into yet.

Edit: After not finding a direct way to add a poll to a test thread I made in their forum, I found this question-and-answer thread:

So, grafting is possible in XenForo, but only using the cumbersome merge maneuver, which we may actually be able to avoid in vBulletin after all. Dunno if it would be a good idea to add all the polls in one fell swoop while still on vBulletin, and then just throttle the "hey, did you see the new poll" replies (if we want to do those at all, etc etc). Then again I guess we had already come to terms with the merge requirement earlier.

Defcon wrote:
I like the distribution graphs by the way.

Last edited by Sho; January 18 2012 at 09:33 PM.