Response

The Lemmatization, PoS and Parsing API performs a deep analysis of a text, including syntactic and morphological analysis. The morphological analysis (Part of Speech tagging) includes lemmatization of each token of the text.

This API uses a complex structure to describe the elements analyzed, which we will refer to as tokens. A token that represents the morphological analysis of a word is the token at the deepest level of that word's syntactic analysis. In other words, the leaves of the syntactic tree hold the morphological analysis, or PoS tagging.

The information provided is the same for the different output formats and the naming convention used for all fields is lowercase_separated_by_underscore.

The output contains information about the status of the request and about the complete analysis requested. As already mentioned, the basic element of these analyses is the token, which, through different configurations, can represent any node in a syntactic tree.

The syntactic tree will have as many levels as the analysis requires. The first level will separate the sentences in the text and the lowest levels, the leaves of the tree, will be the basic elements: elemental tokens that will provide the morphological analysis where the PoS are assigned.

The following table shows the fields included in the response object.

PoS tagging and syntactic analysis output

Name / Description

status

Describes the request outcome in terms of success or failure.

status.code

Numerical value of the result code. Refer to the error code catalog.

status.msg

Human-readable error message, if any, or OK.

status.credits

Credits consumed by the request.

status.remaining_credits

Credits left to reach the usage limit.

token_list

List of tokens/units in which the input text is divided. Most of the time they correspond to words, but in some cases more than one word will be included in the same token. Each token will be represented by the object token and will represent any node in a syntactical tree, considering that the leaves in this tree will correspond to the morphological analysis of the element.

Check the token object to see in detail the fields in a token element.

global_sentiment

Sentiment analysis information at a global level. The sentiment analysis information will be included in the response only when the sm (sentiment model) parameter is sent in the request and it's not empty. This field includes:

global_sentiment.model

Shows the model used in the evaluation followed by an underscore and its language.

global_sentiment.score_tag

Indicates the polarity found (or not found) in the text. The possible values are the following:

  • P+: strong positive
  • P: positive
  • NEU: neutral
  • N: negative
  • N+: strong negative
  • NONE: without sentiment

global_sentiment.agreement

Marks the agreement between the polarities detected in the text. It has two possible values:

  • AGREEMENT: the different elements have the same polarity.
  • DISAGREEMENT: there is disagreement between the different elements' polarity.

global_sentiment.subjectivity

Marks the subjectivity of the text. It has two possible values:

  • OBJECTIVE: the text does not have any subjectivity marks.
  • SUBJECTIVE: the text has subjective marks.

global_sentiment.confidence

Represents the confidence associated with the sentiment analysis performed on the text. Its value is an integer number in the 0-100 range.

global_sentiment.irony

Indicates the irony of the text. It has two possible values:

  • NONIRONIC: the text does not have ironic marks
  • IRONIC: the text has ironic marks
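As a rough illustration of how a client might read these fields, the sketch below checks status.code and turns global_sentiment into a short summary. It assumes the response has already been decoded from JSON into a Python dict and that a code of "0" means success, as in the sample response; the helper name summarize_response is hypothetical.

```python
# Meanings of score_tag, copied from the list above.
SCORE_TAG_MEANINGS = {
    "P+": "strong positive",
    "P": "positive",
    "NEU": "neutral",
    "N": "negative",
    "N+": "strong negative",
    "NONE": "without sentiment",
}


def summarize_response(response):
    """Return a one-line summary of a decoded response (hypothetical helper)."""
    status = response["status"]
    # Assumption: code "0" signals success, as in the sample response.
    if str(status["code"]) != "0":
        raise RuntimeError(f"request failed ({status['code']}): {status['msg']}")
    sentiment = response.get("global_sentiment")
    if sentiment is None:
        # global_sentiment is only present when the sm parameter was sent.
        return "no sentiment information"
    polarity = SCORE_TAG_MEANINGS.get(sentiment["score_tag"], "unknown")
    return f"{polarity}, confidence {int(sentiment['confidence'])}/100"


# Values taken from the sample response in this document.
sample_response = {
    "status": {"code": "0", "msg": "OK", "credits": "1"},
    "global_sentiment": {
        "model": "general_en",
        "score_tag": "P",
        "agreement": "AGREEMENT",
        "subjectivity": "OBJECTIVE",
        "confidence": "100",
        "irony": "NONIRONIC",
    },
}
print(summarize_response(sample_response))  # positive, confidence 100/100
```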

Token object

The following table contains the fields that will appear in a token, and how we will represent each node of our morphosyntactic tree.

PoS tagging and syntactic analysis output

Name / Description

type

Indicates the type of the token.

  • multiword: groups tokens that carry out the same elemental morphological function but that do not come directly from resources.
  • phrase: type assigned to the group of tokens that perform a syntactic function.
  • sentence: type of the sentences that form the text.

If no value appears, the token will be considered of the elemental type. Elemental types come from resources.

form

Form of the token.

normalized_form

Normalized form of the token. It will contain different values depending on what the token contains. The values will be identified by a prefix:

  • 'numeric@' for numerals: actual numeric value (integer/float value)
  • 'semverb@' for verbal periphrasis: semantic head of the periphrasis (main verb).
  • 'date@' for time and date elements: string associated through which its value is represented; it will follow the format century|era|season|weekday|year|month|day|hour|minutes|seconds|timezone where each field will take the values below:
    • century, year, month, day, hour, minutes, seconds: numeric values
    • era: after Christ (aC), before Christ (dC)
    • season: spring (s), summer (v), autumn (a), winter (w)
    • weekday: Monday (m), Tuesday (t), Wednesday (w), Thursday (j), Friday (f), Saturday (s), Sunday (d)
    • timezone: must be specified either by using the standard timezones designations or with the offset with respect to GMT, e.g.: CE+02:00
    • +/- indicate references after/before the returned value (e.g. +2 days)
    • ~ indicates approximate values ("about")
  • 'hashtags@': with the content of the hashtag without the # symbol, and separating the words it contains (marked by capital letters) using spaces. For example, the token for #BringBackOurGirls will have the value 'Bring Back Our Girls'.
  • 'checkinfo@': with all the proofreading suggestions associated to the token separated by pipes, |.
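One way a client could split a normalized_form value into its prefix and content is sketched below. It assumes the value arrives as prefix@content without the surrounding quotes and that unused date fields are left empty; the helper name and the sample strings (other than the hashtag content) are hypothetical and not taken from real responses.

```python
# Prefixes and date field order as listed above.
KNOWN_PREFIXES = {"numeric", "semverb", "date", "hashtags", "checkinfo"}
DATE_FIELDS = [
    "century", "era", "season", "weekday", "year", "month",
    "day", "hour", "minutes", "seconds", "timezone",
]


def parse_normalized_form(value):
    """Split a normalized_form string into (prefix, content).

    Returns (None, value) when the string has no recognised prefix.
    """
    prefix, separator, content = value.partition("@")
    if not separator or prefix not in KNOWN_PREFIXES:
        return None, value
    if prefix == "date":
        # Keep only the date fields that carry a value.
        parts = content.split("|")
        return prefix, {name: part for name, part in zip(DATE_FIELDS, parts) if part}
    if prefix == "checkinfo":
        # Proofreading suggestions are separated by pipes.
        return prefix, content.split("|")
    return prefix, content


print(parse_normalized_form("date@||||2014|5|7||||"))
print(parse_normalized_form("hashtags@Bring Back Our Girls"))
```

Numeric content is returned as a string here; converting it to int or float is left to the caller.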
id

Identifier of the token unique for the request and represented by a natural number.

inip

Initial position of the token, starting from 0. Tokens with type sentence will always have 0 in this field.

endp

End position of the token. Tokens with type sentence will always have 0 in this field.

style

This object contains information about the style of the text:

  • isBold: it will have the value yes if the token in the input is in bold, no otherwise.
  • isItalics: it will have the value yes if the token in the input is in italics, no otherwise.
  • isUnderlined: it will have the value yes if the token in the input is underlined, no otherwise.
  • isTitle: it will have the value yes if the token in the input is in a title, no otherwise.

separation

Describes how the token is separated with respect to the previous one. The possible values are the following:

  • _: paragraph break
  • -: line break
  • A: no break
  • 1: one blank space
  • 2: several blank spaces

quote_level

Index that indicates the level of quote of an element. Currently only one level is supported.

affected_by_negation

If a token is affected by a negation, its sentiment may vary, so this field is added to the analysis. Possible values are yes or no. It only applies when the lang parameter is en, es or fr.

syntactic_tree_relation_list

List of syntactic relations of the token. Each element is represented by the syntactic_tree_relation object, which has the following fields:

  • id: id of the token to which it is related.
  • type: type of relation. There's more information about the values that may appear in this field in the syntactic relations section.

analysis_list

List with all the possible morphosyntactic analyses of the token. Each analysis will be represented with an analysis element. Its structure is explained in detail in the analysis object section.

sense_list

List of senses or semantic analyses associated to the token. Each sense has four different fields:

  • id: identifier of the sense which will be used to link the morphosyntactic analyses with the semantic ones.
  • info: string with all the attributes that conform the semantic information associated to the sense. The formatting of this string is explained in more detail in the semantic information section.
  • form: Form associated to this sense in the language specified in the ilang parameter.
  • official_form: Official form associated to this sense, that is, its official name in cases when it's different from the form. For instance, "United States" vs "United States of America". It's returned in the language specified in the ilang parameter.

sentiment

Object that shows the polarity of the token in which it is included, or the polarity inherited from another token. See sentiment object for further information.

The sentiment analysis information will be included in the response only when the sm (sentiment model) parameter is sent in the request and it's not empty.

topic_list

This element will show any topics associated to the token. The possible topic types are entity, concept, time_expression, money_expression, quantity_expression [beta], other_expression, quotation_list and relation_list. The format is the same as the one used in the response of the Topics Extraction API; the only difference is that the aggregated fields (such as variant_list and relevance) will not appear.

token_list

List of children of a token. Elemental tokens will not have any children, so this field will only appear in non-leaf tokens. Each element will be a token object and will follow the same structure that has been described in this section.
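Because token_list nests recursively, the elemental tokens can be reached with a depth-first walk. The sketch below is one possible way to do that, assuming the response has been decoded into Python dicts and lists; the function names and the sample tree are hypothetical.

```python
def iter_leaves(token):
    """Yield the elemental tokens (tree leaves) below a token, left to right."""
    children = token.get("token_list", [])
    if not children:
        # A token without children is treated as a leaf of the tree.
        yield token
        return
    for child in children:
        yield from iter_leaves(child)


def leaf_forms(response):
    """Collect the form of every leaf token in the response."""
    return [
        leaf.get("form")
        for sentence in response.get("token_list", [])
        for leaf in iter_leaves(sentence)
    ]


# Hypothetical, heavily trimmed tree: one sentence holding a phrase and two words.
sample_response = {
    "token_list": [
        {
            "type": "sentence",
            "token_list": [
                {"type": "phrase", "token_list": [{"form": "The"}, {"form": "cat"}]},
                {"form": "sleeps"},
                {"form": "."},
            ],
        }
    ]
}
print(leaf_forms(sample_response))  # ['The', 'cat', 'sleeps', '.']
```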

Sentiment element

The following table shows the different fields that will appear in the sentiment object.

Sentiment Object

Name / Description

self_sentiment

Sentiment analysis associated to this token. It's an object with the following fields:

  • text: Text of the token, including between parentheses the polarity modifiers it is affected by, and the context words used to determine its polarity.
  • inip: position in which the token begins (starting from 0).
  • endp: position in which the token ends.
  • tag_stack: polarity modifiers affecting this token. It appears only when verbose=y.
  • confidence: confidence associated with the sentiment analysis performed on the text. Its value is an integer number in the 0-100 range.
  • score_tag: This tag indicates the polarity found (or not found) in the token it refers to. The possible values are the following:
    • P+: strong positive
    • P: positive
    • NEU: neutral
    • N: negative
    • N+: strong negative
    • NONE: without sentiment

inherited_sentiment

Sentiment analysis affecting this token, but inherited from another one. This field is an object containing the following fields:

  • relation_list: array of relation objects. It will contain information about the different tokens from which this token inherits sentiment.
    • id: unique identifier of the related token.
    • type: hasInheritedSentiment
  • score_tag: polarity found. Same possible values as in self_sentiment score_tag.
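A client that needs a single polarity per token has to decide how to combine the two fields. The sketch below prefers self_sentiment and falls back to inherited_sentiment; that precedence is an assumption made for illustration, not something this documentation prescribes, and the function name and sample tokens are hypothetical.

```python
def effective_score_tag(token):
    """Return (score_tag, source) for a token, or None if it carries no polarity.

    Assumption: the token's own sentiment takes precedence over inherited sentiment.
    """
    sentiment = token.get("sentiment") or {}
    for source in ("self", "inherited"):
        tag = (sentiment.get(f"{source}_sentiment") or {}).get("score_tag")
        if tag and tag != "NONE":
            return tag, source
    return None


own = {"sentiment": {"self_sentiment": {"score_tag": "P+"}}}
inherited = {
    "sentiment": {
        "self_sentiment": {"score_tag": "NONE"},
        "inherited_sentiment": {
            "relation_list": [{"id": "7", "type": "hasInheritedSentiment"}],
            "score_tag": "N",
        },
    }
}
print(effective_score_tag(own))        # ('P+', 'self')
print(effective_score_tag(inherited))  # ('N', 'inherited')
```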

Syntactic Relations

These are the syntactic relationships detected and how they will appear in the type field of the syntactic_tree_relation element.

  • isAgentComplement
  • isAnaphora
  • isAttribute
  • isComplement
  • isCoreference
  • isDirectObject
  • isIndirectObject
  • isLocationComplement
  • isMannerComplement
  • isNegationComplement
  • isNonAnaphoric
  • isNonRestrictiveApposition
  • isPossessor
  • isPredicative
  • isQuantityComplement
  • isRestrictiveApposition
  • isSubject
  • isTimeComplement

In order to provide bidirectional relations, every type has an inverse version. This means that if token A is related to B through type, then B is related to A through the inverse of type. The notation used for every inverse relation type is an abbreviation of 'inverse of': iof_ + type. For example, if A is related to B through isSubject, then B is related to A through iof_isSubject.
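The naming rule for inverse relations can be sketched as a small helper. The function name is hypothetical; only the iof_ prefix comes from the description above.

```python
INVERSE_PREFIX = "iof_"


def inverse_relation(relation_type):
    """Return the inverse of a relation type, following the iof_ + type rule."""
    if relation_type.startswith(INVERSE_PREFIX):
        # The inverse of an inverse is the original type.
        return relation_type[len(INVERSE_PREFIX):]
    return INVERSE_PREFIX + relation_type


print(inverse_relation("isSubject"))      # iof_isSubject
print(inverse_relation("iof_isSubject"))  # isSubject
```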

Analysis element

The following table contains the fields that will appear in an analysis, and how we will represent each node of our morphosyntactic tree.

Analysis Object

Name / Description

origin

Specifies where the analysis comes from, specifically if any additional techniques have been applied to find the analysis. The possible values are the following:

  • SEG: secure origin, obtained from resources
  • SUF: known suffix added to a word from resources
  • PREF: known prefix added to a word from resources
  • AORT: words written without the necessary accent marks
  • ORTO: the analysis has been obtained by the spellchecker (Deal with unknown words parameter, uw)

variety_dictionary

Shows, in the cases where it applies, which language variety dictionary the analysis comes from. It only appears when the analysis is not from the general dictionary or from a user dictionary (specified in the ud parameter).

These are the possible values:

  • es-LA: Latin American Spanish
  • es-NL: Northern Latin American Spanish
  • es-SL: Southern Latin American Spanish
  • es-CA: Central American Spanish
  • es-CB: Caribbean Spanish
  • es-MX: Mexican Spanish
  • en-GB: British English
  • en-US: American English

tag

Morphosyntactic tag associated to the token. A detailed explanation of what every feature in each tag means can be found in the morphosyntactic tagset section.

lemma

Lemma associated to the analysis.

original_form

Original form of the token, that is, how it appears in the text.

tag_info

Explained values of the morphosyntactic tag. It will only be output when the verbose parameter is enabled, and it's shown in the language analyzed.

variety_dictionary_info

Expanded value of the language variety dictionary that appears in variety_dictionary. It will only be output when the verbose parameter is enabled, and it's shown in the language specified in the ilang parameter.

check_info

Checker information associated to the token, which may include spelling, grammar and style errors and recommendations.

  • tag: tag associated to the error/recommendation. A detailed explanation can be found at the checkinfo section of the morphosyntactic tagset.
  • form_list: list of forms of the suggestions provided to improve the text. Each one will be tagged as a form element.
  • check_extra_info: extended information related to the tag of the error/recommendation.

remission

In the cases where the form is a variant of something else, this field will show the complete form. For instance 'cause will remit to because.

sense_id_list

List of sense identifiers, or in other words, list of semantic analyses associated to the morphosyntactic analysis. Each one will be tagged as a sense_id element.
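The ids in sense_id_list can be resolved against the sense_list of the same token. The sketch below shows one way to do that; it assumes each sense_id_list entry is either a bare id or an object with a sense_id field, which this documentation does not specify, and the function name, ids and sample token are hypothetical.

```python
def senses_for_analysis(token, analysis):
    """Return the sense objects of a token that an analysis refers to."""
    senses_by_id = {sense["id"]: sense for sense in token.get("sense_list", [])}
    linked = []
    for entry in analysis.get("sense_id_list", []):
        # Assumption: an entry is a bare id or an object with a sense_id field.
        sense_id = entry["sense_id"] if isinstance(entry, dict) else entry
        if sense_id in senses_by_id:
            linked.append(senses_by_id[sense_id])
    return linked


# Hypothetical token with made-up sense ids.
sample_token = {
    "form": "United States",
    "analysis_list": [
        {"lemma": "United States", "sense_id_list": [{"sense_id": "sense-1"}]}
    ],
    "sense_list": [
        {
            "id": "sense-1",
            "form": "United States",
            "official_form": "United States of America",
        },
        {"id": "sense-2", "form": "United States"},
    ],
}
first_analysis = sample_token["analysis_list"][0]
linked_senses = senses_for_analysis(sample_token, first_analysis)
print([sense["official_form"] for sense in linked_senses])
# ['United States of America']
```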

Semantic Information

In the semantic information string, the attributes are separated by specific characters that depend on the level of the information.

    Level  Character  Description
    1      \t         Separates first-level attributes
    2      /          Separates a complex attribute from its content (*)
    3      |          Separates the different values associated to an attribute
    4      @          Separates the different attributes within a complex attribute
    5      =          Separates the name from the value in simple attributes
    6      #          Separates attributes associated to a simple attribute value
    8      >          Separates hierarchy values within a simple attribute value

    (*) Except for the semld elements with URL format.
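A minimal sketch of how these separators could be applied is shown below. It handles the tab, /, |, @, = and > characters and leaves the # level untouched inside the values. The sample string and its attribute names are hypothetical, and treating an attribute as complex only when / appears before = is an assumption made here so that URL values are not split.

```python
def parse_simple(attribute):
    """Parse name=value1|value2 into (name, values), splitting each value on >."""
    name, _, raw = attribute.partition("=")
    return name, [value.split(">") for value in raw.split("|")]


def parse_semantic_info(info):
    """Parse a semantic information string into a dict (illustrative only)."""
    parsed = {}
    for attribute in info.split("\t"):
        if not attribute:
            continue
        slash, equals = attribute.find("/"), attribute.find("=")
        # Assumption: a complex attribute has its "/" before any "=".
        if slash != -1 and (equals == -1 or slash < equals):
            name, _, content = attribute.partition("/")
            parsed[name] = dict(parse_simple(part) for part in content.split("@"))
        else:
            name, values = parse_simple(attribute)
            parsed[name] = values
    return parsed


# Hypothetical string, not taken from a real response.
sample_info = (
    "id=abc123"
    "\tsementity/class=instance@type=Top>Location>City"
    "\tsemld=http://example.org/a|http://example.org/b"
)
semantic_info = parse_semantic_info(sample_info)
print(semantic_info["sementity"]["type"])  # [['Top', 'Location', 'City']]
```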

Response examples

Sample response:

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "token_list": [
    {
      "type": "sentence",
      "id": "15",
      "inip": "0",
      "endp": "66",
      "style": {
        ...
      },
      "separation": "A",
      "quote_level": "0",
      "affected_by_negation": "no",
      "sentiment": {
        ...
      },
      "token_list": [
        ...
      ]
    }
  ],
  "global_sentiment": {
    "model": "general_en",
    "score_tag": "P",
    "agreement": "AGREEMENT",
    "subjectivity": "OBJECTIVE",
    "confidence": "100",
    "irony": "NONIRONIC"
  }
}

If of=img, the response will be a GIF image of the syntactic tree. When a sentiment model is selected in the call to the API (for instance, sm=general), the tree will also show the sentiment information.


The field self_sentiment is shown through nodes of different colors, while inherited_sentiment appears as yellow arrows between nodes.

The brighter the node, the stronger the polarity, and modifiers and negators are represented in yellow.