Automatic Summarizer Manual

Introduction

The Automatic Summarizer creates a summary by assigning a value to each sentence based on specific statistical features of the text. The feature values of each sentence are combined into a single score, and the highest-scoring sentences are displayed. This summarizer is specifically tailored to scientific papers and will therefore not perform very well on regular texts.
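
The scoring idea can be illustrated with a minimal Python sketch. It is not the summarizer's actual code: the feature names ("f1", "f2"), the plain sum used to combine the values, and the choice to show the selected sentences in source order are all assumptions made for the illustration.

```python
from typing import Dict, List


def summarize(sentences: List[str],
              feature_values: List[Dict[str, float]],
              summary_size: int) -> List[str]:
    """Return the `summary_size` highest-scoring sentences.

    Assumptions (not taken from the manual): feature values are combined
    by a plain sum, and selected sentences are returned in source order.
    """
    # One combined score per sentence.
    scores = [sum(values.values()) for values in feature_values]
    # Rank sentence indices by score, highest first; sorted() is stable,
    # so sentences with equal scores keep their source order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    # Keep the best ones and restore source order for display.
    chosen = sorted(ranked[:summary_size])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    demo_sentences = ["A.", "B.", "C."]
    demo_features = [
        {"f1": 0.2, "f2": 0.1},  # hypothetical feature values
        {"f1": 0.9, "f2": 0.5},
        {"f1": 0.4, "f2": 0.4},
    ]
    print(summarize(demo_sentences, demo_features, 2))
```

With these made-up values the second and third sentences have the highest combined scores, so they form the two-sentence summary.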

Sentence features

Five features are used to calculate the score of a sentence:

Using the Automatic Summarizer

Formatting the source text with Tag-buttons

The first step in generating a summary is to paste the source text into the large text area. If the text consists only of a title and headings of a single level, no tags have to be added. In that case the title is the first line of the source text and must be followed by a blank line. Every heading should be preceded and followed by a blank line. A (non-heading) paragraph that consists of a single sentence must be tagged with the <T> (Text) tag, even if the sentence spans multiple lines. To tag text, highlight the sentence(s) you wish to tag and press the corresponding tag button below the source text.

If the source text contains multi-level headings (e.g. headings of sections and headings of subsections), tag them as follows. Leave the highest-level headings untagged and tag the subheadings, starting with <H2> for the first sublevel and using the next <H#> tag for each deeper sublevel. Also make sure that single-sentence paragraphs are tagged with the <T> tag.

It is also possible to tag the text completely, in the following way. The title must be tagged with the <H0> (Title) tag. The highest-level headings after the title (e.g. headings of sections) must be tagged with the <H1> (Heading 1) tag. Lower headings (e.g. headings of subsections) should be tagged with the subsequent <H#> tags. Note that headings of the same level should be tagged with the same tag. Finally, the complete text below each lowest-level heading should be selected and tagged with the <T> (Text) tag. Only one text-block can appear below each lowest-level heading (so the text-block can be multiple paragraphs long, must always contain all text below the heading and may never contain other headings). To remove tags from (part of) the source text, select the desired part and press the Delete Tags button.
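
The rules for a completely tagged text can be expressed as a small consistency check. The Python sketch below is only an illustration: it represents the tagged text as a list of (tag, text) pairs, which is a hypothetical representation and not the summarizer's internal format, and it does not model the sentence-level <F> tag.

```python
import re
from typing import List, Tuple

# Hypothetical representation of a fully tagged text: (tag, text) pairs,
# e.g. ("H0", "Title"), ("H1", "Introduction"), ("T", "Body text ...").
Block = Tuple[str, str]


def check_tagging(blocks: List[Block]) -> List[str]:
    """Return a list of problems; an empty list means no rule was broken."""
    problems = []
    if not blocks or blocks[0][0] != "H0":
        problems.append("the first block must be the <H0> title")
    previous_tag = None
    for position, (tag, _text) in enumerate(blocks):
        if tag == "H0" and position > 0:
            problems.append(f"block {position}: second <H0> title")
        elif tag == "T" and (previous_tag is None or previous_tag == "T"):
            # Only one text-block may appear below a heading.
            problems.append(
                f"block {position}: <T> must directly follow a heading")
        elif tag != "T" and not re.fullmatch(r"H\d+", tag):
            problems.append(f"block {position}: unknown tag <{tag}>")
        previous_tag = tag
    return problems


if __name__ == "__main__":
    tagged = [
        ("H0", "Title"),
        ("H1", "Introduction"),
        ("T", "All text of the introduction."),
        ("H1", "Method"),
        ("H2", "Setup"),
        ("T", "All text of the setup subsection."),
    ]
    print(check_tagging(tagged))
```

Two <T> blocks in a row are reported as a problem, because all text below a lowest-level heading belongs in a single text-block.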

To make sure certain sentences appear in the summary (e.g. author information), highlight these sentences and tag them with the <F> (Forward to summary) tag. Note that the number of sentences that can be forwarded is limited by the size of the summary.
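
The effect of forwarding on the selection can be sketched as follows. This is an assumed model, not the summarizer's documented behaviour: forwarded sentences take summary slots first, the remaining slots go to the highest-scoring other sentences, and if more sentences are forwarded than fit, the earliest ones in the text are kept. The function name and parameters are hypothetical.

```python
from typing import List, Set


def select_with_forwarded(scores: List[float],
                          forwarded: Set[int],
                          summary_size: int) -> List[int]:
    """Return the indices of the sentences that make up the summary."""
    # Forwarded sentences claim slots first, capped at the summary size
    # (assumption: the earliest forwarded sentences win when there are
    # too many).
    kept = sorted(forwarded)[:summary_size]
    remaining = summary_size - len(kept)
    # Fill the remaining slots with the best-scoring other sentences.
    candidates = [i for i in range(len(scores)) if i not in forwarded]
    candidates.sort(key=lambda i: -scores[i])
    return sorted(kept + candidates[:remaining])


if __name__ == "__main__":
    demo_scores = [0.1, 0.9, 0.5, 0.7]  # made-up sentence scores
    print(select_with_forwarded(demo_scores, {0}, 2))
    print(select_with_forwarded(demo_scores, {0, 2, 3}, 2))
```

In the second call three sentences are forwarded but the summary holds only two, so one forwarded sentence is dropped; this is the limit described above.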

A parser is used to separate the source text into sentences. In most cases this will succeed, but a single sentence may be incorrectly split into two sentences (e.g. because of initials, or abbreviations that are not in the text-file). In that case, place the cursor after the token that forces the split (e.g. a period (.)) and press the <NS> button; the sentence will then not be split. It is also possible that two sentences are incorrectly merged into one sentence. In that case, place the cursor after the first sentence and press the <S> button; the sentences will then not be merged.
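
To show why such corrections are needed, here is a deliberately naive sentence splitter in Python. It is a sketch under stated assumptions, not the parser used by the summarizer: the abbreviation list is hypothetical, and the <NS> and <S> corrections are modelled as sets of character offsets (the cursor positions) passed in as the hypothetical parameters no_split and force_split.

```python
from typing import AbstractSet, List

# Hypothetical abbreviation list; the real one is not documented here.
ABBREVIATIONS = {"e.g.", "i.e.", "al.", "fig."}


def split_sentences(text: str,
                    no_split: AbstractSet[int] = frozenset(),
                    force_split: AbstractSet[int] = frozenset()) -> List[str]:
    """Split `text` into sentences.

    no_split:    offsets just after a token where a split is suppressed
                 (models the <NS> button).
    force_split: offsets where a split is forced (models the <S> button).
    """
    boundaries = set(force_split)
    for i, ch in enumerate(text):
        end = i + 1
        if ch not in ".!?" or end in no_split:
            continue
        # A terminator only counts when followed by whitespace or the end.
        if end < len(text) and not text[end].isspace():
            continue
        # Skip known abbreviations such as "e.g.".
        if text[:end].split()[-1].lower() in ABBREVIATIONS:
            continue
        boundaries.add(end)

    sentences, start = [], 0
    for end in sorted(boundaries):
        piece = text[start:end].strip()
        if piece:
            sentences.append(piece)
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    text = "J. Smith wrote it. It works"
    print(split_sentences(text))  # the initial "J." causes a wrong split
    print(split_sentences(text, no_split={text.index(".") + 1}))
```

The first call splits after the initial "J.", which is the kind of error the <NS> button corrects; the second call suppresses that split by passing the offset just after the period.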

Parameters and buttons