- 1 Extension:OAIRepository
- 2 Install OAI extension on source wiki (oai server)
- 3 The oai repository on the server
- 4 The wiki mirror (oai client)
- 5 Issues
This page describes how to install the OAIRepository extension for MediaWiki, and how to use it to mirror wikis. The instructions are based on those on the extension page, mediawikiwiki:Extension:OAIRepository, and its talk page, mediawikiwiki:Extension_talk:OAIRepository.
We'll use this extension to mirror wiki content from a 'server' (the 'source' wiki) to a client (the 'mirror' wiki). The client should be 'read-only' (!). You won't be able to synchronise the two wikis safely. For more, see Mediawiki/OAI mirror.
2 Install OAI extension on source wiki (oai server)
2.1 Get the extension
Get the extension files, and put them in an OAI/ subdirectory of your extensions directory (probably 'w/extensions' or 'wiki/extensions'). For example, w/extensions/OAI/ should work.
The following MySQL steps are a little long-winded. Here's a Perl script that does all of them in one go: Mediawiki/OAI mirror/OAIRepository/install oai server.pl. Use at your own risk: I haven't tested it extensively! If you run the script, you can skip the modification of LocalSettings.php and the MySQL steps, but you still need to modify a PHP file (OAIRepo_body.php), see below.
2.2 Amend LocalSettings file
Add to LocalSettings.php:
# OAI repository for update server
@include( $IP.'/extensions/OAI/OAIRepo.php' );
// $oaiAgentRegex = '/experimental/';
// $oaiAuth = true; # broken... squid? php config? wtf
// $oaiAudit = true;
$wgDebugLogGroups['oai'] = '/home/wikipedia/logs/oai.log';
The last line needs amending: choose a suitable directory.
2.3 Add OAI tables to the MySQL database
You then need to run some SQL against your database. In extensions/OAI, you've got
update_table.sql
oaiaudit_table.sql
oaiharvest_table.sql
oaiuser_table.sql
(1) You need the values of $wgDBprefix and $wgDBname.
- There may not be a prefix set. Check the value of $wgDBprefix in LocalSettings.php.
- The standard wiki db is 'wikidb', but it may be called something else. Check the value of $wgDBname in LocalSettings.php.
A typical configuration might be
$wgDBprefix = "mw_";
$wgDBname = "wikidb";
But in what follows we assume that
$wgDBprefix = "mwsource_";
$wgDBname = "wikimirrordb";
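If you'd rather not read the two values off by eye, here is a minimal sketch of pulling them out of the text of LocalSettings.php. It assumes the settings are plain quoted string assignments at the start of a line. The function name read_setting and the sample text are made up for illustration; anything computed at runtime in PHP won't be picked up.

```python
import re
from typing import Optional


def read_setting(php_source: str, name: str) -> Optional[str]:
    """Return the quoted string assigned to $<name>, or None if it is not set."""
    pattern = re.compile(
        r"^\s*\$" + re.escape(name) + r"\s*=\s*(['\"])(.*?)\1\s*;",
        re.MULTILINE,
    )
    matches = pattern.findall(php_source)
    # A later assignment overrides an earlier one, as it would in PHP.
    return matches[-1][1] if matches else None


# Hypothetical LocalSettings.php fragment; the commented-out line is ignored.
sample = '''
$wgDBname = "wikimirrordb";
# $wgDBprefix = "old_";
$wgDBprefix = "mwsource_";
'''

print(read_setting(sample, "wgDBprefix"))
print(read_setting(sample, "wgDBname"))
```

A result of None means the variable isn't assigned in the file, which for $wgDBprefix usually just means no prefix is in use.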
(2) Edit update_table.sql to replace
/*$wgDBprefix*/
with the actual value of the prefix (determined above). You can use
perl -i.bak -pe 's/\/\*\$wgDBprefix\*\//mwsource_/g' update_table.sql
to make the edit (replacing mwsource_ with your prefix).
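For anyone who finds the escaping in the Perl one-liner hard to read, the same substitution can be sketched in Python. This is only an illustration of what the edit does: the function name apply_prefix and the sample SQL line are invented, and the real contents of update_table.sql will differ.

```python
PLACEHOLDER = "/*$wgDBprefix*/"


def apply_prefix(sql_text: str, prefix: str) -> str:
    """Replace every occurrence of the prefix placeholder with the real prefix."""
    return sql_text.replace(PLACEHOLDER, prefix)


# Hypothetical line in the style of the extension's .sql files.
sample_sql = "CREATE TABLE /*$wgDBprefix*/updates (up_page int);"

print(apply_prefix(sample_sql, "mwsource_"))
```

An empty prefix simply removes the placeholder, which is the right result for a wiki that has no $wgDBprefix set.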
(3) update_table.sql now needs to be run in the wiki DB. (mediawikiwiki:Extension:OAIRepository notes that this will take a significant amount of time on rather large wikis.) So you run
mysql wikimirrordb -uroot -p < update_table.sql
(where the database name and username may differ depending on your circumstances; the database name is the value of $wgDBname, as above).
2.3.2 The other three sql files
You now need to have some tables for the OAI process itself. This can be any db to which the wiki db user has access. We choose the same db as the wikidb, but mediawikiwiki:Extension:OAIRepository has instructions for using a separate db.
Add the following in LocalSettings.php:
$oaiAuditDatabase = 'wikimirrordb';
You then need to create three tables for OAI, which is done by these SQL scripts: oaiuser_table.sql, oaiharvest_table.sql, oaiaudit_table.sql.
As before (for update_table.sql), replace
/*$wgDBprefix*/ with the actual prefix. You can use
perl -i.bak -pe 's/\/\*\$wgDBprefix\*\//mwsource_/g' oaiuser_table.sql oaiharvest_table.sql oaiaudit_table.sql
to make the edit (where you need to replace mwsource_ with your prefix).
Now create additional tables:
mysql wikimirrordb -uroot -p < oaiaudit_table.sql
mysql wikimirrordb -uroot -p < oaiharvest_table.sql
mysql wikimirrordb -uroot -p < oaiuser_table.sql
(again: the database name and username may differ depending on your circumstances; the database name is the value of $wgDBname, as above).
2.3.3 Adding a login for the oai user
To be able to log in to the OAIRepository, you'll have to add a login to the oaiuser table. The login details don't need to be the same as $wgDBpassword, and because they may be passed in the clear, it's better to use something else:
Create a file called add_user.sql containing
INSERT INTO mwsource_oaiuser(ou_name, ou_password_hash) VALUES ('SomeUserName', md5('SomePassword') );
and amend 'SomeUserName' and 'SomePassword'. Then run
mysql wikimirrordb -uroot -p < add_user.sql
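To check what ends up in ou_password_hash, the value can be reproduced outside MySQL. The sketch below assumes the password is plain ASCII or UTF-8, so that MySQL's md5() and Python's hashlib see the same bytes; the function name oai_password_hash is made up for illustration.

```python
import hashlib


def oai_password_hash(password: str) -> str:
    """Return the 32-character lowercase hex MD5 digest, like MySQL's md5()."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Compare this with: SELECT ou_password_hash FROM mwsource_oaiuser;
print(oai_password_hash("SomePassword"))
```

If the printed digest doesn't match the stored value, the INSERT was probably run with a different password than you think.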
2.4 Edit OAIRepo_body.php
As detailed on the page for the extension (mediawikiwiki:Extension:OAIRepository), there's an error in OAIRepo_body.php. A bug report has been filed, and you should check whether it has been resolved. If not, you need to make some manual changes.
Basically, you need to insert $wgDBprefix in three places:
$this->auditTableName( $wgDBprefix . 'oaiuser' ),
$this->auditTableName( $wgDBprefix . 'oaiaudit' ),
$this->mAuditDb = $lb->getConnection( DB_MASTER, $wgDBprefix . 'oaiAudit', $oaiAuditDatabase );
$wgDBprefix has to be visible inside those functions as well (in PHP that normally means a global declaration).
3 The oai repository on the server
The OAI repository is now installed on the server, and something like http://www.sciencemedianetwork.org/w/index.php?title=Special:OAIRepository&verb=ListMetadataFormats should now work. (You'll need the username and password to authenticate). You can try these queries:
- To get wiki text, use http://www.sciencemedianetwork.org/w/index.php?title=Special:OAIRepository&verb=ListRecords&metadataPrefix=mediawiki
(This won't work on the present wiki, as the extension isn't installed yet.)
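The query URLs above all have the same shape, so they can be assembled programmatically. A minimal sketch: the helper name oai_url is made up, the verbs are the standard OAI-PMH ones, and the base URL is the one used in the examples above. It only builds the URL and doesn't attempt the authentication step.

```python
from urllib.parse import urlencode

BASE = "http://www.sciencemedianetwork.org/w/index.php"


def oai_url(base: str, verb: str, **arguments: str) -> str:
    """Build a Special:OAIRepository request URL for the given OAI-PMH verb."""
    query = {"title": "Special:OAIRepository", "verb": verb}
    query.update(arguments)
    # Keep ':' unescaped so the result matches the URLs quoted on this page.
    return base + "?" + urlencode(query, safe=":")


print(oai_url(BASE, "ListMetadataFormats"))
print(oai_url(BASE, "ListRecords", metadataPrefix="mediawiki"))
```

Extra OAI-PMH arguments can be passed as keyword arguments in the same way as metadataPrefix.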
Once you have the repository set up, you can do things with it. For instance, mediawikiwiki:Extension:OAIRepository explains how to set up a Lucene search. On these pages, we'll set up another wiki as a client.
4 The wiki mirror (oai client)
The idea is to set up a second wiki that acts as a harvester for the source wiki.
It's not clear how much of the above needs to be repeated for the client wiki, but going through all the steps seems to work.
4.1 Install the extension
... as above.
4.2 Modify LocalSettings.php
... as above.
For testing, I installed both wikis on the same server, in the same database, and thus I used a different $wgDBprefix:
$wgDBprefix = "mwmirror_";
$wgDBname = "wikimirrordb";
4.3 Do the mysql modifications
... as above.
4.4 Edit OAIRepo_body.php
... probably not necessary, as we won't be using the wiki as a repo.
Add the following lines to LocalSettings.php to enable the harvester:
@include( $IP.'/extensions/OAI/OAIHarvest.php' );
$oaiSourceRepository = "http://url.to.the.source.wiki/wiki/index.php/Special:OAIRepository";
(where the URL points to the repository created above).
The client wiki is now pointed at the repository created previously, and we can run the harvester on the client. This will return a number of messages about pages to be updated. (However, see the issues below!)
5 Issues
5.1 Authentication
I couldn't get this to work with authentication. If you comment out these lines on both client and server, it works.
// $oaiAgentRegex = '/experimental/';
// $oaiAuth = true; # broken... squid? php config? wtf
// $oaiAudit = true;
5.2 Files and images
When trying to transfer files between wikis (v.1.15), you get:
File updating temporarily broken on 1.11, sorry!
so file/image transfer seems to be broken, due to the change in image handling from 1.11 onwards (and the removal of wfImageDir).
A hack to fix this: in OAIHarvest.php, below the line
echo "File updating temporarily broken on 1.11, sorry!\n";
add
$image = Image::newFromTitle($upload['filename']);
$new_image = preg_replace("/^.*?images\//","", $upload["src"]);
$new_image = $image->repo->directory . "/" . $new_image;
$filename = $new_image;
This relies on the source wiki using /images/ as upload directory. A better solution would probably be to enter this into the configuration.
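To make the path rewrite in the hack easier to follow, here is the same logic sketched in Python, with the upload directory name as a parameter rather than hard-coded. The function name local_upload_path, the example URL and the local directory are all made up for illustration; the PHP above is what actually runs.

```python
import re


def local_upload_path(src: str, repo_directory: str, upload_dir: str = "images") -> str:
    """Map a source-wiki file URL onto the client's local upload directory."""
    # Strip everything up to and including the first "<upload_dir>/".
    pattern = r"^.*?" + re.escape(upload_dir) + "/"
    relative = re.sub(pattern, "", src, count=1)
    return repo_directory + "/" + relative


# Hypothetical source URL and client repo directory.
print(local_upload_path(
    "http://source.example.org/w/images/a/ab/Foo.png",
    "/var/www/w/images",
))
```

Note that if the source URL doesn't contain the upload directory name at all, nothing is stripped and the result is not a usable path, which is the same weakness the PHP hack has.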
There's also an issue with caching. Suppose you do the following, in this order:
- Add image link to source
- Run harvester on client
- Add actual image to source
- Run harvester again
In that case, the page on the client needs to be purged for the image to show. If the client is low-traffic, caching could be turned off to work around this.
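Purging could also be scripted after each harvester run. The sketch below only builds a purge request for the MediaWiki web API (action=purge), assuming the client's MediaWiki version provides it and that api.php is enabled. The api.php URL, the page titles and the function name purge_request are placeholders, and nothing is sent over the network here.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def purge_request(api_url: str, titles: List[str]) -> Tuple[str, str]:
    """Return (url, form-encoded body) for a MediaWiki API purge request."""
    body = urlencode({
        "action": "purge",
        "titles": "|".join(titles),  # the API takes titles separated by '|'
        "format": "json",
    })
    return api_url, body


# Hypothetical client wiki and pages that embed the newly harvested image.
url, body = purge_request(
    "http://mirror.example.org/w/api.php",
    ["Main Page", "File:Foo.png"],
)
print(url)
print(body)
```

The body would be sent as a POST to the URL with whatever HTTP client you prefer.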