Cardiff paper

by Thomas Krichel

Status

This is the Cardiff paper. It describes a work program for building a new interface for selecting documents in the ACIS document selection screen.

This is the version of 2008–04–30.

Introduction

A key function of the ACIS software is the selection of documents whose authorship or editorship corresponds to the registrant's name variations profile.

To settle terminology, let us call such documents “nominal documents”.

ACIS was first implemented for the RePEc document data collection, which is medium-sized, at roughly 500k records. There, the set of nominal documents remains within reasonable limits, so the manual selection of documents is not too onerous.

AuthorClaim's document collection stands at 34 times the size of RePEc. Much of that size comes from the PubMed dataset, which does not have first names, just initials.

Joanna P. Davies found that there are in excess of 400 hits for her name. And she is lucky because she is called Davies rather than Davis, which is the more common form.

Debates between Joanna P. Davies and Thomas Krichel have focussed on whether the addition of a keyword search will be sufficient, or whether a machine learning approach is required. This proposal seeks to integrate keywords with machine learning.

Project contents

The document selection screen will be redesigned. Each document will have one of three selection states: claimed (green), refused (red), or unstated (black).

Add buttons

Order will push the claimed documents to the top and the refused documents to the bottom.
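The grouping step can be sketched as a stable sort on the selection state. This is only an illustration: the state labels, the `STATE_RANK` table, the `orderByState` function and the shape of the document records are assumptions, not part of ACIS.

```javascript
// Hypothetical state labels; the proposal names the states but not their encoding.
const STATE_RANK = { claimed: 0, unstated: 1, refused: 2 };

// Return a new array with claimed documents first, unstated ones in the
// middle, and refused ones last. Array.prototype.sort is stable (ES2019),
// so documents keep their relative order within each group.
function orderByState(documents) {
  return [...documents].sort(
    (a, b) => STATE_RANK[a.state] - STATE_RANK[b.state]
  );
}

const sample = [
  { id: "d1", state: "refused" },
  { id: "d2", state: "unstated" },
  { id: "d3", state: "claimed" },
  { id: "d4", state: "unstated" },
];
console.log(orderByState(sample).map((d) => d.id).join(" "));
// prints "d3 d2 d4 d1"
```

Copying the array before sorting leaves the original list untouched, which may be convenient if the screen needs to restore the initial order.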

Order will also use a statistical learning algorithm, written in JavaScript, that will push the unstated documents that are closest to the claimed documents to the top of the unstated part of the list.
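The choice of algorithm is left open here, so the following is only a stand-in to show the shape of the ranking step. It assumes each document has a `title` field and scores an unstated document by its best Jaccard word overlap with any claimed document. The functions `tokens`, `jaccard` and `rankUnstated` are hypothetical names, and a real learner may well use different features.

```javascript
// Split a title into a set of lowercase word tokens.
function tokens(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: shared tokens divided by all distinct tokens.
function jaccard(a, b) {
  let shared = 0;
  for (const t of a) if (b.has(t)) shared += 1;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Rank unstated documents by their best similarity to any claimed document,
// most similar first. With no claimed documents every score is 0.
function rankUnstated(claimed, unstated) {
  const claimedSets = claimed.map((d) => tokens(d.title));
  const score = (d) => {
    const s = tokens(d.title);
    return Math.max(0, ...claimedSets.map((c) => jaccard(s, c)));
  };
  return unstated
    .map((d) => ({ doc: d, score: score(d) }))
    .sort((x, y) => y.score - x.score)
    .map((x) => x.doc);
}

const claimedDocs = [{ id: "c1", title: "Open access repositories for economics" }];
const unstatedDocs = [
  { id: "u1", title: "Protein folding in yeast" },
  { id: "u2", title: "Economics repositories and open data" },
];
console.log(rankUnstated(claimedDocs, unstatedDocs).map((d) => d.id).join(" "));
// prints "u2 u1"
```

Since a single claimed document is enough to produce scores, a scheme of this kind can start ordering after the very first claim.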

The Cardiff project develops an encoding for the history of all actions, and makes that history available in a hidden form field.
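The encoding itself is still to be designed. As one possible shape, the sketch below writes each action as a state letter followed by the position of the document on the screen, joined with dots. The letter codes, the record layout and the functions `encodeHistory` and `decodeHistory` are all assumptions made for illustration.

```javascript
// Hypothetical one-letter codes for the three selection states.
const CODE = { claimed: "c", refused: "r", unstated: "u" };
const STATE = { c: "claimed", r: "refused", u: "unstated" };

// Encode a list of actions as a short string such as "c12.r3.u12":
// one state letter plus the document position, in the order performed.
function encodeHistory(actions) {
  return actions.map((a) => CODE[a.state] + a.index).join(".");
}

// Decode the string back into the list of actions.
function decodeHistory(text) {
  if (text === "") return [];
  return text.split(".").map((item) => ({
    state: STATE[item[0]],
    index: Number(item.slice(1)),
  }));
}

const actions = [
  { state: "claimed", index: 12 },
  { state: "refused", index: 3 },
  { state: "unstated", index: 12 },
];
console.log(encodeHistory(actions));
// prints "c12.r3.u12"

// In the browser the string could be stored in the hidden field, e.g.
// (the element id "history" is an assumption):
// document.getElementById("history").value = encodeHistory(actions);
```

Keeping every action, including reversals, rather than only the final state is what makes the record usable for evaluating the interface later.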

One feature, which can be switched off, is the "jump to first" (j2f). Users will be told to select the first document they can claim. Then all documents before that document can be thought of as refused. The ordering can start with just one paper.
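A minimal sketch of the j2f rule follows, assuming documents are held in screen order with a `state` field. The function name `jumpToFirst` and the choice to leave already stated documents alone are assumptions.

```javascript
// Apply "jump to first": the document at firstClaimedIndex becomes claimed,
// and every still-unstated document listed before it becomes refused.
// Documents after it are left unchanged. Returns a new array.
function jumpToFirst(documents, firstClaimedIndex) {
  return documents.map((doc, i) => {
    if (i === firstClaimedIndex) return { ...doc, state: "claimed" };
    if (i < firstClaimedIndex && doc.state === "unstated") {
      return { ...doc, state: "refused" };
    }
    return doc;
  });
}

const listed = [
  { id: "d1", state: "unstated" },
  { id: "d2", state: "unstated" },
  { id: "d3", state: "unstated" },
  { id: "d4", state: "unstated" },
];
console.log(jumpToFirst(listed, 2).map((d) => d.id + ":" + d.state).join(" "));
// prints "d1:refused d2:refused d3:claimed d4:unstated"
```

One click therefore yields both a positive example and several negative ones, which is why the ordering can begin after a single claimed paper.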

The project software will include storage of the information within an ACIS table, but this is something that Thomas Krichel may do himself.

Student dissertation

The title will be “Item selection with machine learning in a web interface”.

The dissertation reports on the choice of the machine learning algorithm used. It implements the algorithm in JavaScript if an existing implementation cannot be found.

It sets out a short encoding for the history of the user's interaction with the system, which is used to collect data for system evaluation.

It will report on j2f, determining whether it is a good or a bad idea.

It gives some overview of the running of the system, but no serious stats, just a few graphs.
