The Summarization API performs extractive summarization: the summary is composed of sentences extracted from the text, namely those expected to carry the most information. The input text is first divided into sentences. Each sentence is then analyzed by a combination of algorithms, each of which takes a different aspect of the text into account, to calculate a sentence score. Finally, the most relevant sentences (those with the highest scores) are returned as output after minor post-processing (capitalization adjustments, removal or addition of end-of-sentence punctuation, etc.).
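The split, score, and select steps can be illustrated with a minimal sketch. It is not the API's implementation: the function names, the naive regex sentence splitter, and the word-frequency score are all assumptions chosen to keep the example short, and the post-processing step is omitted.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, word_freq):
    # Placeholder score (assumption): mean document frequency of the sentence's words.
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return 0.0
    return sum(word_freq[w] for w in words) / len(words)


def summarize(text, max_sentences=2):
    sentences = split_sentences(text)
    word_freq = Counter(re.findall(r"\w+", text.lower()))
    scored = [(score_sentence(s, word_freq), i, s) for i, s in enumerate(sentences)]
    # Keep the highest-scoring sentences; ties go to the earlier sentence.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    # Return the selected sentences in their original document order.
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats and dogs are pets. The weather was fine."
    print(summarize(sample, max_sentences=2))
```

Restoring the original sentence order after selection is a design choice that keeps the extracted summary readable. A real scorer would replace `score_sentence` with the combination of algorithms described in this section.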
Currently, we combine the TextTeaser algorithm with TextRank and with additional scores based on the relative position of the sentence in the text, titles and section headers, the presence of words in italics or bold, numbers, and certain special keywords or phrases.
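As a rough illustration of how several signals can be merged into one sentence score, the sketch below combines a TextRank-style graph score with a position score. It does not reproduce TextTeaser or the API's actual formula. The word-overlap similarity follows the spirit of the original TextRank formulation, while the linear position decay, the weights `w_rank` and `w_pos`, and all function names are hypothetical.

```python
import math
import re


def tokenize(sentence):
    # Unique lowercase words of a sentence.
    return set(re.findall(r"\w+", sentence.lower()))


def similarity(a, b):
    # Word overlap normalized by the log of the sentence lengths.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_scores(sentences, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence similarity graph, by power iteration.
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def position_scores(n):
    # Hypothetical position signal: linear decay from 1.0 (first sentence) to 1/n.
    return [(n - i) / n for i in range(n)]


def combined_scores(sentences, w_rank=0.7, w_pos=0.3):
    # Weighted sum of the normalized graph score and the position score.
    rank = textrank_scores(sentences)
    top = max(rank) if rank else 0.0
    rank_norm = [r / top for r in rank] if top > 0 else rank
    pos = position_scores(len(sentences))
    return [w_rank * r + w_pos * p for r, p in zip(rank_norm, pos)]
```

Further signals mentioned above, such as overlap with titles, emphasized words, numbers, or keyword lists, could be added as extra weighted terms in `combined_scores` in the same way.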
We are also researching alternative methods, mainly abstractive summarization, which rewrites the text instead of extracting sentences from it, using neural approaches based on sequence-to-sequence models. However, these methods have not yet reached production quality for a generic domain.