Fixing "Random Page"

From Knot Atlas
Revision as of 18:37, 25 November 2007 by Drorbn

The Knot Atlas contains a few hundred human-generated pages and a few thousand computer-generated pages, one per knot with up to 14 crossings and a few more for a few other classes of knots and links. If we were to choose a page at random within the Knot Atlas, with uniform distribution, we'd be likely to get the page of some anonymous 13- or 14-crossing knot, about which there is not much of interest to say. We thus needed to change the randomization procedure to one that gives a higher weight to the more "interesting" pages.

The standard MediaWiki random page algorithm is a bit strange: at creation time, every page is assigned a random number between 0 and 1 (the column page_random in the MySQL table mw_page). Then, in order to select a random page, a further random number is generated, and the MySQL database is queried to order all pages by their page_random and return the first page whose page_random is higher (in a cyclic sense) than that number. The page_random attribute of any given page never changes. Thus if the page_random of a given page is just a tiny bit more than the page_random of some other page, then the gap "behind" this page is tiny and it is unlikely ever to be selected "at random".
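The selection rule described above can be sketched in a few lines. This is an illustration only, not MediaWiki code: the function name pickPage() and the three sample pages are hypothetical, and the random draw is replaced by an even sweep over [0, 1] so that the counts are reproducible.

```php
<?php
// Illustrative sketch only; pickPage() and the sample data are hypothetical.

/**
 * Return the title with the smallest page_random that is >= $r,
 * wrapping around to the smallest page_random if none qualifies.
 *
 * @param array $pages map of title => page_random
 * @param float $r     the freshly drawn number
 */
function pickPage(array $pages, float $r): string {
    asort($pages);                            // order by page_random, keep titles
    foreach ($pages as $title => $pageRandom) {
        if ($pageRandom >= $r) {
            return (string)$title;
        }
    }
    return (string)array_key_first($pages);   // cyclic wrap-around
}

// B sits just above A, so the gap "behind" B is only 0.01.
$pages = ['A' => 0.10, 'B' => 0.11, 'C' => 0.60];

// Sweep $r over an even grid of 1000 points and count the picks.
$hits = ['A' => 0, 'B' => 0, 'C' => 0];
for ($i = 0; $i < 1000; $i++) {
    $hits[pickPage($pages, ($i + 0.5) / 1000)]++;
}
print_r($hits);   // A => 500, B => 10, C => 490
```

Each count is proportional to the cyclic gap behind the page: 0.5 for A (from 0.60 around to 0.10), 0.01 for B, and 0.49 for C.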

We exploit the strange MediaWiki selection algorithm to our benefit by re-assigning the page_random attribute non-randomly, so that the gaps "behind" less-interesting pages are shorter and thus those pages are chosen with a lower probability. This is done by the program RebuildRandomPage.php, which we run periodically in the maintenance subdirectory of our MediaWiki installation. The program is quoted below.

<?php
// Rebuilds the page_random column. As quoted, this is a dry run that
// only prints the UPDATE statements; to write to the database, uncomment
// "mysql_query($action, $connection);" near the end of the file.

require_once( "commandLine.inc" );

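// Relative weight of a page. Ordinary pages weigh 1; pages in a
// computer-generated family weigh 15 divided by the size of the family
// (e.g. 21 knots with 8 crossings).
function p($title) {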
  $p = 1;
  if (ereg("^8_[0-9]+$", $title)) $p=15/21;
  if (ereg("^8_[0-9]+_Quantum_Invariants$", $title)) $p=15/21;
  if (ereg("^9_[0-9]+$", $title)) $p=15/49;
  if (ereg("^9_[0-9]+_Quantum_Invariants$", $title)) $p=15/49;
  if (ereg("^10_[0-9]+$", $title)) $p=15/165;
  if (ereg("^10_[0-9]+_Quantum_Invariants$", $title)) $p=15/165;
  if (ereg("^K11[an][0-9]+$", $title)) $p=15/552;
  if (ereg("^K12[an][0-9]+$", $title)) $p=15/2176;
  if (ereg("^K13[an][0-9]+$", $title)) $p=15/9988;
  if (ereg("^K14[an][0-9]+$", $title)) $p=15/46972; // 46972 prime knots with 14 crossings
  if (ereg("^L8[an][0-9]+$", $title)) $p=15/29;
  if (ereg("^L9[an][0-9]+$", $title)) $p=15/83;
  if (ereg("^L10[an][0-9]+$", $title)) $p=15/287;
  if (ereg("^L11[an][0-9]+$", $title)) $p=15/1007;
  if (ereg("^L12[an][0-9]+$", $title)) $p=15/4276;
  if (ereg("^L13[an][0-9]+$", $title)) $p=15/7539;
  if (ereg("^T\([0-9]+,[0-9]+\)$", $title)) $p=15/36;
  return $p;
}

$connection = mysql_connect($wgDBserver, $wgDBuser, $wgDBpassword);
mysql_select_db($wgDBname, $connection);
$query = "SELECT *
          FROM mw_page
          WHERE page_namespace=0 AND page_is_redirect=0";

$res = mysql_query($query, $connection);
// First pass: compute the total weight $Z and the page count $N.
$Z = 0; $N=0;
while ($row = mysql_fetch_array($res)) {
  $title = $row["page_title"];
  $p = p($title);
  print "$title -> $p\n";
  $Z += $p; ++$N;
}

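// Second pass: give each page the running sum of the normalized weights,
// so that the gap behind it, and hence its selection probability, is $p/$Z.
$res = mysql_query($query, $connection);
$r = 0;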
while ($row = mysql_fetch_array($res)) {
  $id = $row["page_id"];
  $title = $row["page_title"];
  $random = $row["page_random"];
  $p = p($title);
  $r += $p/$Z;
  $action = "UPDATE mw_page SET page_random=$r WHERE page_id=$id";
  print "$title, $random: $action\n";
  // mysql_query($action, $connection);
}

print "\$N=$N; \$Z=$Z\n";

?>
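
The two passes of the program can be tried out without a database. The following is a minimal sketch under stated assumptions, not a replacement for RebuildRandomPage.php: the names weight() and assignPageRandom() are hypothetical, only three of the title patterns from p() are reproduced, and the sample titles are made up for illustration. It uses preg_match because ereg() was removed in PHP 7.0.

```php
<?php
// Illustrative sketch only: weight() and assignPageRandom() are hypothetical
// names, and only three of the title patterns from p() are reproduced.

/** Relative weight of a page, matched with preg_match instead of ereg. */
function weight(string $title): float {
    $rules = [
        '/^8_[0-9]+(_Quantum_Invariants)?$/' => 15 / 21,
        '/^K11[an][0-9]+$/'                  => 15 / 552,
        '/^T\([0-9]+,[0-9]+\)$/'             => 15 / 36,
    ];
    foreach ($rules as $pattern => $p) {
        if (preg_match($pattern, $title) === 1) {
            return $p;
        }
    }
    return 1.0;   // all other pages keep full weight
}

/**
 * Map each title to the running sum of normalized weights, so that the
 * gap behind each page equals weight / total.
 */
function assignPageRandom(array $titles): array {
    $Z = 0.0;                         // first pass: total weight
    foreach ($titles as $title) {
        $Z += weight($title);
    }
    $r = 0.0;                         // second pass: cumulative sums
    $assigned = [];
    foreach ($titles as $title) {
        $r += weight($title) / $Z;
        $assigned[$title] = $r;
    }
    return $assigned;
}

// Sample titles, made up for illustration.
$assigned = assignPageRandom(['Main_Page', '8_17', 'K11a5', 'T(3,2)']);
foreach ($assigned as $title => $r) {
    printf("%-10s %.4f\n", $title, $r);
}
```

The last page always receives a value of 1, up to floating-point rounding, so the assigned values partition the interval from 0 to 1 into gaps of length weight / total.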