Saturday 4 June 2011

User Story: Gamification and Wikimedia Commons mass upload

Gamification is the use of game play in non-game applications.

This is a user story of the near future for Wikimedia Commons, based on a mini-brainstorm chat in the pub with Tom Morris and Shimgray immediately after the British Library English & Drama editathon. It also draws on the recent workshop with selected Wellcome Trust researchers on how to release mass image donations, and on some of the known issues and discussions on Commons around previous mass donations.

It is June 2012. Over the last six months, three types of mass donation have led the Wikimedia Foundation, two major public bodies and two Chapter-coordinated volunteer groups to radically revise the possible application of Commons and, for the first time, to form direct partnerships with external digital libraries. All three have gamification programmes.

1. Philately and crowd-sourcing the catalogue

A large archive has donated a set of high-quality scans of philatelic artefacts on CD-ROM. These contain an estimated 350,000 different objects, including many known to be unique to the archive. The archive does not have the resources to catalogue these artefacts fully, but they are hierarchically organized into several main groups, including pre-1955 postage stamps from African countries, Cinderella stamps from many countries dating from 1860 to 1968, and a wide collection of airmail stamps. The objects were selected as being out of copyright, but because the catalogue is only partial this is not guaranteed for every image.

The volunteer teams have access to a hosted staging area for the images. New e-volunteers from the philatelic community are encouraged to apply to help, but registration is necessary because of the possible copyright issues. A total of 40 participants are gradually working through the collection, applying a standard set of categorization tags through a visual front-end that draws seamlessly on the live Commons categorization structures. The front-end gives participants feedback on their own progress, on the quality of their classification work (based on sample expert checks) and on the total programme progress. Images whose copyright status is not obvious are flagged for further expert review.
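To make the "sample expert checks" feedback a little more concrete, here is a minimal sketch of how a per-participant quality score might be computed. Everything in it is an assumption made for illustration: the `SpotCheck` record, the participant names, the image identifiers and the scoring rule (the share of spot-checked images where the volunteer's tag matches the expert's) are hypothetical, not a description of any real Commons tool.

```python
from dataclasses import dataclass


@dataclass
class SpotCheck:
    """One expert re-check of a volunteer's tag (hypothetical record)."""
    participant: str
    image_id: str
    volunteer_tag: str
    expert_tag: str


def accuracy_by_participant(checks):
    """Return the fraction of spot checks each participant got right."""
    totals = {}
    correct = {}
    for check in checks:
        totals[check.participant] = totals.get(check.participant, 0) + 1
        if check.volunteer_tag == check.expert_tag:
            correct[check.participant] = correct.get(check.participant, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in totals.items()}


# Made-up sample data for illustration only.
checks = [
    SpotCheck("alice", "img-001", "Airmail stamps", "Airmail stamps"),
    SpotCheck("alice", "img-002", "Cinderella stamps", "Airmail stamps"),
    SpotCheck("bob", "img-003", "Cinderella stamps", "Cinderella stamps"),
]
scores = accuracy_by_participant(checks)
```

A real front-end would probably want more than exact tag equality (a near-miss in a sibling category is not the same as a wild guess), but a simple agreement rate is enough to drive a progress display.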

By April 2012 over 80,000 images have been released to the public on Commons, and the same images and metadata support the archive's public catalogue (which includes the copyrighted images not released on Commons). Based on the current burn-down rate, the 350,000 images will be fully catalogued by the end of July, and the archive is now considering a collaborative digitization programme for a further 200,000 objects.
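The burn-down projection is simple arithmetic, sketched below. The story gives the total (350,000) but not the number catalogued so far or the monthly rate, so the `done` and `done_per_month` values here are invented purely to show the calculation.

```python
import math


def months_to_finish(total, done, done_per_month):
    """Whole months needed to clear the remaining backlog at a steady rate."""
    if done_per_month <= 0:
        raise ValueError("done_per_month must be positive")
    remaining = max(total - done, 0)
    return math.ceil(remaining / done_per_month)


# Hypothetical inputs: 230,000 catalogued so far, 40,000 more each month.
months = months_to_finish(total=350_000, done=230_000, done_per_month=40_000)
```

With these assumed figures the backlog clears in three months; a different rate would move the projected end date accordingly.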

The wider philatelic community has already recognized Commons as a reference resource and an external organization has used the Commons API to produce a simplified front end which can be used as an integrated part of their established on-line catalogue.
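As a rough idea of what such a front end might ask of Commons, the sketch below builds a request URL for the MediaWiki `categorymembers` query, which lists the files in a category. The category name "Airmail stamps" and the helper function are assumptions for illustration; the sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def category_files_url(category, limit=50):
    """Build a MediaWiki API URL listing the files in a Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "file",      # only files, not subcategories or pages
        "cmlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


# Hypothetical category name, used only as an example.
url = category_files_url("Airmail stamps")
```

An external catalogue could fetch this URL, read the file titles from the JSON response, and render them in its own layout.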

2. Political history and identification

A small cultural archive has had a co-funded volunteer programme to digitize a number of artefacts related to political history and upload them to Commons. Though the majority of these date from within the last 70 years, they are being released on a "no known copyright" basis. The programme is ongoing, and 3,000 high-quality images have been released out of an estimated 40,000.

A key issue with classification has been identifying the subjects of a number of professional photographs that were donated from a magazine archive spanning 1970 to the 1990s. The subjects were mostly interviewees for the magazine, so all have consent for release, but whether the photographs were used in the final print is unknown, and the photographs carry no identification or date.

Images are uploaded to Commons the same day they are scanned. Using a system of barnstars, the wider community has been encouraged to compete to classify the backlog of new images, and an optional user-script has been created to help compare them with related images already on Commons and automatically cross-match them against candidate matches that TinEye finds on the internet.

The attribution on every Commons image page links back to the donor archive's website. As a result, the archive has seen a large increase in archive access requests as the images have gradually been put to use in nearly all of the 300 available language editions of Wikipedia. The archive has put forward a funding plan for a paid intern placement to support the ongoing digitization and sharing programme.

3. Medical research and funware

A large research body has established an image-sharing programme in partnership with four other image services, including Commons (in practice the Wikimedia Foundation and the UK chapter), to provide an image clearing service for mass donations of research images. Each of the five organizations has access to the image staging area. Planned mass donations must meet minimal upload criteria: known copyright status (some images may be restricted to non-commercial use), EXIF data with complete originator information, and associated "at the bench" description data for the majority of uploads. There is an expectation that images hosted on the clearing service will persist for at least a year at a time before being deleted.

Any or all of the five participating organizations are free to host the images. Because some donations are very large (over 10,000 images in a batch), each major donation type has required Commons community discussion and planning, though only the fully copyright-free images are considered. Not all of the donations have been judged likely to be of general public educational value.

Separate from the main Commons site, a funware rating system (cheezburger) has been developed, which can run on mobile devices, to help rank categories with over 400 images for further decomposition. As well as helping to tag images with suitable sub-categories, game players are able to subjectively compare images for interest and quality. As a result of this work, the main Commons interface includes options for these categories to display recommended images from larger sets. This has been successful enough to be extended beyond the medical research image sets in order to gamify all large categories on Commons, and has proved particularly popular with a range of "soft" categories such as "1960s" and "cats".
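One way the subjective "which image is better?" comparisons could be turned into a list of recommended images is an Elo-style rating, sketched below. This is a technique swapped in for illustration, not something the story specifies: the starting rating of 1000, the `k` factor of 32 and the file names are all assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A is preferred over B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def record_vote(ratings, winner, loser, k=32.0, start=1000.0):
    """Update both ratings after a player prefers `winner` over `loser`."""
    r_win = ratings.get(winner, start)
    r_lose = ratings.get(loser, start)
    shift = k * (1 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + shift
    ratings[loser] = r_lose - shift


def recommended(ratings, n):
    """Return the n highest-rated images."""
    return sorted(ratings, key=ratings.get, reverse=True)[:n]


# Made-up votes from game players: (preferred image, other image).
votes = [
    ("cat_a.jpg", "cat_b.jpg"),
    ("cat_a.jpg", "cat_c.jpg"),
    ("cat_b.jpg", "cat_c.jpg"),
]
ratings = {}
for winner, loser in votes:
    record_vote(ratings, winner, loser)
top = recommended(ratings, 2)
```

Each vote moves the two ratings by the same amount in opposite directions, so a category's recommended images can be refreshed cheaply as new votes arrive from the game.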
