Diverse Cross-document Coreference and Media Bias Analysis
2024-5-17 23:11:16 Author: hackernoon.com

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Jakob Vogel, M.A. Digital Humanities, Institute for Digital Humanities, Faculty of Philosophy, Georg August University of Göttingen.

The software we will use for annotation is called Inception (Klie et al., 2018). Inception is an open-source annotation tool that can be freely downloaded from the authors’ GitHub repository. For this project, however, every annotator will be provided with a ready-to-code version of the program in which all necessary annotation layers and settings are already implemented and some sample annotations are included. This instance of Inception can be requested from the project administrator, Jakob Vogel.

3.1. Setup

To set up Inception on your local computer, make sure you have already received your personal instance of the software. If not, please contact the project administrator.

Inception comes as a jar-file. To run it, you need to have the Java Runtime Environment (JRE) installed, and the file must be set as executable. Then open the directory “Inception” in your command prompt and start the jar-file with the command java -jar followed by the name of the jar-file.

To access Inception’s graphical user interface (GUI), go to a web browser and open: http://localhost:8080/
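Starting the jar-file and checking that the GUI is reachable can be sketched in Python as follows. This is only a minimal sketch under assumptions: the file name inception.jar is hypothetical (use the name of the jar-file in your personal instance), and the helper names are introduced here for illustration only.

```python
import shutil
import socket
from pathlib import Path
from typing import List


def build_launch_command(jar_path: Path) -> List[str]:
    """Return the argument list that starts a jar-file with Java."""
    return ["java", "-jar", str(jar_path)]


def java_available() -> bool:
    """Return True if a `java` executable is found on the PATH."""
    return shutil.which("java") is not None


def gui_reachable(host: str = "localhost", port: int = 8080,
                  timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical file name; replace it with the jar-file you received.
    jar = Path("Inception") / "inception.jar"
    print("Java found:", java_available())
    print("Launch command:", " ".join(build_launch_command(jar)))
    print("GUI reachable on port 8080:", gui_reachable())
```

If the last line reports that the GUI is not reachable, the program is most likely not running yet or is still starting up.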

On your first time running Inception, you will need to import the project and set up your personal user account:

• First, log in as admin (User ID: admin; Password: admin).

• Click on “Import project” and select the file “proj-div-CDCR.zip” from the “Inception” directory. Make sure to check the boxes “Import permissions” (already checked by default) and “Create missing users” (unchecked by default). Then click “Import”.

• Click on “Administration” in the GUI’s top right corner. Then click on “Users”.

• Select your personal user or create a new one here. Assign a password to your user. Additionally, assign the role “ROLE_USER” to your user (already assigned by default). Finally, check the box “Account enabled” and click “Save”.

• Log out of the current Inception session.

From now on, to log into Inception, use your personal user account details instead of the admin account.

Figure 3: Screenshot of Inception window showing the user management settings. Make sure to create or activate your own user account here at first login.

To get to the annotation GUI, log in with your personal account now and click on the highlighted project name “Diverse cross-document coreference”. Then, in the left taskbar, click on “Annotation”. A window opens that shows a list of all documents to be annotated. The first digit in every title is a discourse identifier that sorts all documents according to their topic, followed by an underscore and a newspaper abbreviation (see Introduction). You can annotate documents in chronological order or randomly, whichever you prefer. Click on one of the documents to start your annotation.
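The naming scheme of the document titles can be illustrated with a short Python sketch. It assumes that a title starts with exactly one digit, an underscore, and then the abbreviation; the title "2_ABC.txt" and the abbreviation "ABC" are hypothetical examples and are not taken from the project.

```python
import re
from typing import NamedTuple


class DocumentTitle(NamedTuple):
    discourse_id: int  # first digit of the title
    newspaper: str     # abbreviation after the underscore


# Assumed pattern: one leading digit, an underscore, then the abbreviation,
# which ends at the next underscore, dot, or whitespace.
_TITLE_PATTERN = re.compile(r"^(\d)_([^_.\s]+)")


def parse_title(title: str) -> DocumentTitle:
    """Split a document title into discourse identifier and newspaper."""
    match = _TITLE_PATTERN.match(title)
    if match is None:
        raise ValueError(f"Unexpected title format: {title!r}")
    return DocumentTitle(int(match.group(1)), match.group(2))


if __name__ == "__main__":
    # Hypothetical title, used for illustration only.
    print(parse_title("2_ABC.txt"))
```

Grouping titles by discourse_id in this way gives the same topic-wise ordering that the document list shows.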

Figure 4: Screenshot of Inception window showing a list of all documents to be annotated.

Figure 5: Screenshot of Inception window showing a not yet annotated document loaded into the annotation GUI.

3.2. User manual

Inception offers a variety of functionalities, of which only those relevant for our project are described here. For a full explanation of how to use Inception, please check the official documentation, which can be accessed online or from within the Inception GUI by clicking on “Help” in the top right corner.

Every annotator’s instance of Inception contains two basic layers of annotation. The first layer, called Entity layer, is triggered when a mention is marked by highlighting text with a simple press-hold-drag mechanism. This opens the layer’s side panel. Here, annotators can fill in the Entity layer’s three parameters:

• Entity-type: a drop-down list to select a mention’s entity-type by clicking on or typing the type’s abbreviation.

• Global entity-name: a combination of free text field and drop-down list to assign a global entity’s name to a mention. If the name has already been used before, it can be selected as an item from the drop-down list, again by clicking or typing. If not, it can be typed freely, which adds it as a new tag to the list.

• Wikidata: a search field to type the name of an entity and find its respective Wikidata URI.

Figure 6: Annotating a mention of “Donald Trump”: in the right panel, annotators can fill in values for the Entity layer’s three parameters Entity-type, Global entity-name, and Wikidata. Automatically suggested annotations are displayed in gray boxes above the text rows.
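To make the three parameters of the Entity layer more concrete, they can be thought of as fields of a small record, sketched here in Python. This is not how Inception stores annotations internally; the class names, the character offsets, and the entity-type abbreviation "PER" are assumptions made for illustration. The second class mimics the behaviour of the Global entity-name field, where a new name is added to the list the first time it is typed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityAnnotation:
    begin: int                          # character offset where the mention starts
    end: int                            # character offset where the mention ends
    entity_type: str                    # abbreviation chosen from the drop-down list
    global_entity_name: str             # name shared by all mentions of one entity
    wikidata_uri: Optional[str] = None  # stays empty if no Wikidata entry is chosen


class GlobalEntityNames:
    """Grows like the drop-down list: unknown names are added on first use."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def assign(self, name: str) -> str:
        if name not in self._names:
            self._names.append(name)
        return name

    def options(self) -> List[str]:
        return list(self._names)


if __name__ == "__main__":
    text = "Donald Trump met reporters."  # made-up sentence
    names = GlobalEntityNames()
    mention = EntityAnnotation(
        begin=0,
        end=12,
        entity_type="PER",  # hypothetical abbreviation
        global_entity_name=names.assign("Donald Trump"),
    )
    print(text[mention.begin:mention.end], mention)
    print(names.options())
```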

The second layer, called Relation, is triggered when two already marked mentions are connected to each other, again simply by clicking and holding on one mention and dragging the mouse to the other mention. This layer only contains one parameter which is named Label. It is a drop-down list to select a relation-type for labelling the connection between both mentions.

After the first annotations have been made, Inception starts to suggest spans and values for new annotations on the Entity layer. These suggestions are displayed in gray boxes. A single click on a box accepts the suggestion and turns it into a proper annotation; a double-click rejects the suggestion and makes the box disappear.

The GUI’s upper panel is mostly for navigating through the document. However, it also contains a button for resetting the document, which deletes all annotations made so far, and a padlock-shaped button to mark the annotation of the document as finished. This button should be pressed at the very end of the annotation process, though it is advisable to first annotate every document and then mark all of them as finished together. Clicking on the gear wheel opens the GUI’s style settings. Here, annotators can adjust the panels’ margin sizes, the colouring of annotations, and how many text rows are displayed simultaneously. Annotations are saved automatically, which is why there is no save button in the GUI.

Figure 7: Annotating a relation between two mentions: the mention “North Korea” is connected to “North and South Korea” with a meronymy relation (MER).
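The relation from Figure 7 can be written down as a small Python sketch. It is an illustration under assumptions and not Inception’s internal data model: the class and function names are hypothetical, and the set of labels contains only MER because that is the only relation-type named in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str


@dataclass(frozen=True)
class Relation:
    source: Mention  # mention where the drag starts
    target: Mention  # mention where the drag ends
    label: str       # value of the Relation layer's single parameter


# Only the label named in this section; the full tagset is not listed here.
RELATION_LABELS = {"MER"}


def connect(source: Mention, target: Mention, label: str) -> Relation:
    """Create a labelled relation, rejecting labels outside the tagset."""
    if label not in RELATION_LABELS:
        raise ValueError(f"Unknown relation label: {label!r}")
    return Relation(source, target, label)


if __name__ == "__main__":
    relation = connect(
        Mention("North Korea"),
        Mention("North and South Korea"),
        "MER",
    )
    print(relation)
```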


Source: https://hackernoon.com/diverse-cross-document-coreference-and-media-bias-analysis?source=rss