Lemmatization, PoS and Parsing 2.0 Documentation

Response

The Lemmatization, PoS and Parsing API performs a deep analysis of a text, including syntactic and morphological analysis. The morphological analysis (Part of Speech tagging) includes lemmatization of each token of the text.

This API uses a nested structure to describe the analyzed elements, which we will refer to as tokens. The token that holds the morphological analysis of a word is the token at the deepest level of its syntactic analysis. In other words, the leaves of the syntactic tree carry the morphological analysis (PoS tagging).

The information provided is the same for the different output formats and the naming convention used for all fields is lowercase_separated_by_underscore.

The output contains information about the status of the request and about the complete analysis that has been requested. As already mentioned, the basic element of these analyses is the token, which, through different configurations, represents every possible node in a syntactic tree analysis.

The syntactic tree will have as many levels as the analysis requires. The first level will separate the sentences in the text and the lowest levels, the leaves of the tree, will be the basic elements: elemental tokens that will provide the morphological analysis where the PoS are assigned.
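As a rough illustration of the tree described above, the following sketch walks a parsed response and collects the leaf tokens. It assumes the response has already been decoded into Python dictionaries; the `iter_leaves` helper and the trimmed `response` data are hypothetical and only serve to show the recursion over `token_list`.

```python
from typing import Iterator


def iter_leaves(token: dict) -> Iterator[dict]:
    """Yield the elemental (leaf) tokens below `token`, in document order."""
    children = token.get("token_list", [])
    if not children:
        # No children: this is an elemental token carrying the PoS analysis.
        yield token
        return
    for child in children:
        yield from iter_leaves(child)


# Hypothetical, heavily trimmed response used only for illustration.
response = {
    "token_list": [
        {
            "type": "sentence",
            "id": "5",
            "token_list": [
                {
                    "type": "phrase",
                    "id": "4",
                    "token_list": [
                        {"form": "The", "id": "1"},
                        {"form": "cat", "id": "2"},
                    ],
                },
                {"form": "sleeps", "id": "3"},
            ],
        }
    ]
}

leaf_forms = [
    leaf["form"]
    for sentence in response["token_list"]
    for leaf in iter_leaves(sentence)
]
print(leaf_forms)
```

A generator keeps the traversal lazy, which may help with long texts where only the first few leaves are needed.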

The following table shows the fields included in the response object.

PoS tagging and syntactic analysis output

Name	Description
`status`	Describes the request outcome in terms of success or failure.
`status`.`code`	Numerical value of result code. Refer to the error code catalog.
`status`.`msg`	Human-readable error code, if any, or `OK`.
`status`.`credits`	Credits consumed by the request.
`status`.`remaining_credits`	Credits left to reach the usage limit.
`token_list`		List of tokens/units in which the input text is divided. Most of the time they correspond to words, but in some cases more than one word will be included in the same token. Each token will be represented by the object `token` and will represent any node in a syntactical tree, considering that the leaves in this tree will correspond to the morphological analysis of the element. Check the token object to see in detail the fields in a `token` element.
`global_sentiment`	Sentiment analysis information at a global level. The sentiment analysis information will be included in the response only when the `sm` (sentiment model) parameter is sent in the request and is not empty. This field includes:
`global_sentiment`.`model`	Shows the model used in the evaluation followed by an underscore and its language.
`global_sentiment`.`score_tag`	Indicates the polarity found (or not found) in the text. The possible values are the following: P+: strong positive P: positive NEU: neutral N: negative N+: strong negative NONE: without sentiment
`global_sentiment`.`agreement`	Marks the agreement between the polarities detected in the text. It has two possible values: AGREEMENT: the different elements have the same polarity. DISAGREEMENT: there is disagreement between the different elements' polarity.
`global_sentiment`.`subjectivity`	Marks the subjectivity of the text. It has two possible values: OBJECTIVE: the text does not have any subjectivity marks. SUBJECTIVE: the text has subjective marks.
`global_sentiment`.`confidence`	Represents the confidence associated with the sentiment analysis performed on the text. Its value is an integer number in the 0-100 range.
`global_sentiment`.`irony`	Indicates the irony of the text. It has two possible values: NONIRONIC: the text does not have ironic marks IRONIC: the text has ironic marks
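The fields in the table above could be consumed along these lines. This is a minimal sketch, not an official client: the `summarize` function is hypothetical, and it assumes that a `status.code` of `0` means success (as in the sample response, where it appears next to the message `OK`) and that numeric values arrive as strings.

```python
SCORE_TAG_LABELS = {
    "P+": "strong positive",
    "P": "positive",
    "NEU": "neutral",
    "N": "negative",
    "N+": "strong negative",
    "NONE": "without sentiment",
}


def summarize(response: dict) -> dict:
    """Check the request status and summarize the global sentiment, if any."""
    status = response["status"]
    # Assumption: code "0" signals success; compare as text because the
    # sample response encodes numbers as strings.
    if str(status["code"]) != "0":
        raise RuntimeError(f"request failed: {status['code']} {status.get('msg', '')}")

    summary = {"tokens": len(response.get("token_list", []))}
    # global_sentiment is only present when a sentiment model was requested.
    sentiment = response.get("global_sentiment")
    if sentiment is not None:
        summary["polarity"] = SCORE_TAG_LABELS.get(sentiment["score_tag"], "unknown")
        summary["confidence"] = int(sentiment["confidence"])
    return summary


# Values mirror the sample response shown in this documentation.
sample = {
    "status": {"code": "0", "msg": "OK", "credits": "1"},
    "token_list": [{"type": "sentence", "id": "15"}],
    "global_sentiment": {
        "model": "general_en",
        "score_tag": "P",
        "agreement": "AGREEMENT",
        "subjectivity": "OBJECTIVE",
        "confidence": "100",
        "irony": "NONIRONIC",
    },
}

summary = summarize(sample)
print(summary)
```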

Token object

The following table contains the fields that will appear in a token, and how we will represent each node of our morphosyntactic tree.

PoS tagging and syntactic analysis output

Name	Description
`type`		Indicates which is the type of the token. multiword: groups tokens that carry out the same elemental morphological function but that do not come directly from resources. phrase: type assigned to the group of tokens that perform a syntactic function. sentence: type of the sentences that form the text. If no value appears, the token will be considered of the elemental type. Elemental types come from resources.
`form`	Form of a Token
`normalized_form`		Normalized form of the token. Its content depends on what the token contains, and each kind of value is identified by a prefix. 'numeric@' for numerals: actual numeric value (integer/float value). 'semverb@' for verbal periphrasis: semantic head of the periphrasis (main verb). 'date@' for time and date elements: string that represents the value, in the format `century\|era\|season\|weekday\|year\|month\|day\|hour\|minutes\|seconds\|timezone`, where each field takes the following values: century, year, month, day, hour, minutes, seconds: numeric values; era: after Christ (aC), before Christ (dC); season: spring (s), summer (v), autumn (a), winter (w); weekday: Monday (m), Tuesday (t), Wednesday (w), Thursday (j), Friday (f), Saturday (s), Sunday (d); timezone: specified either with the standard timezone designations or with the offset with respect to GMT, e.g. CE+02:00; +/- indicate references after/before the returned value (e.g. +2 days); ~ indicates approximate values ("about"). 'hashtags@': content of the hashtag without the `#` symbol, with the words it contains (marked by capital letters) separated by spaces. For example, the token for #BringBackOurGirls will have the value 'Bring Back Our Girls'. 'checkinfo@': all the proofreading suggestions associated with the token, separated by pipes (`\|`).
`id`	Identifier of the token unique for the request and represented by a natural number.
`inip`	Initial position of the token, starting from 0. Tokens with type sentence will always have 0 in this field.
`endp`	End position of the token. Tokens with type sentence will always have 0 in this field.
`style`		This object contains information about the style of the text: `isBold`: it will have the value yes if the token in the input is in bold, no otherwise. `isItalics`: it will have the value yes if the token in the input is in italics, no otherwise. `isUnderlined`: it will have the value yes if the token in the input is underlined, no otherwise. `isTitle`: it will have the value yes if the token in the input is in a title, no otherwise.
`separation`	Describes how the token is separated with respect to the previous one. The possible values are the following: _: paragraph break -: line break A: no break 1: one blank space 2: several blank spaces
`quote_level`	Index that indicates the level of quote of an element. Currently only one level is supported.
`affected_by_negation`		If a `token` is affected by a negation, its sentiment may vary, and so, this field will be added to the analysis. Possible values are yes or no. It only applies when `lang` parameter is `en`, `es` or `fr`.
`head`		Identifies which child of the token defines its function by its token `id`. For instance, the head in a noun group will be the noun included in it; in a prepositional group, the head will be the preposition, etc. Clauses will not have a head child.
`syntactic_tree_relation_list`		List of syntactic relations of the token. Each element is represented by the `syntactic_tree_relation` object. This object will have fields: `id`: token id of the token to which it is related. `type`: type of relation. There's more information about the values that may appear in this field in the syntactic relations section.
`analysis_list`		List with all the possible morphosyntactic analyses of the token. Each analysis will be represented with an `analysis` element. Its structure is explained in detail in the analysis object section.
`sense_list`		List of senses or semantic analyses associated to the token. Each sense has four different fields: `id`: identifier of the sense which will be used to link the morphosyntactic analyses with the semantic ones. `info`: string with all the attributes that conform the semantic information associated to the sense. The formatting of this string is explained in more detail in the semantic information section. `form`: Form associated to this sense in the language specified in the `ilang` parameter. `official_form`: Official form associated to this sense, that is, its official name in cases when it's different from the `form`. For instance, "United States" vs "United States of America". It's returned in the language specified in the `ilang` parameter.
`sentiment`		Object that shows the polarity of the `token` in which it is included, or the polarity inherited from another token. See sentiment object for further information. The sentiment analysis information will be included in the response only when the `sm` (sentiment model) parameter is sent in the request and is not empty.
`topic_list`	This element will show any topics associated to the token. The possible topics types that will appear are `entity`, `concept`, `time_expression`, `money_expression`, `quantity_expression` [beta], `other_expression`, `quotation_list` and `relation_list`. The format followed will be the same as the one used in the response of the Topics Extraction API; the only difference will be in the aggregated fields (such as `variant_list` and `relevance`), which will not appear.
`token_list`		List of children of a token. Elemental tokens will not have any children, so this field will only appear in non-leaf tokens. Each element will be a `token` object and will follow the same structure that has been described in this section.
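The prefixes of `normalized_form` described in the table above could be handled as in the sketch below. The `parse_normalized_form` function is hypothetical. It assumes the value has the shape `prefix@payload`, that date fields are separated by a plain `|` character, and that missing date fields are left empty; none of these details are confirmed beyond the description above, so treat the example strings as illustrative.

```python
DATE_FIELDS = (
    "century", "era", "season", "weekday", "year", "month",
    "day", "hour", "minutes", "seconds", "timezone",
)


def parse_normalized_form(value: str) -> tuple:
    """Split a normalized form into (prefix, parsed payload)."""
    prefix, separator, payload = value.partition("@")
    if not separator:
        # No prefix present: return the value untouched.
        return ("", value)
    if prefix == "numeric":
        # Assumption: the payload is a plain integer/float literal.
        return (prefix, float(payload))
    if prefix == "date":
        # Map each pipe-separated field to its name, in the documented order.
        return (prefix, dict(zip(DATE_FIELDS, payload.split("|"))))
    if prefix == "checkinfo":
        # Proofreading suggestions are separated by pipes.
        return (prefix, payload.split("|"))
    # 'semverb' and 'hashtags' payloads are kept as plain strings.
    return (prefix, payload)


numeric = parse_normalized_form("numeric@42")
hashtag = parse_normalized_form("hashtags@Bring Back Our Girls")
# Hypothetical date string with only year, month and day filled in.
date = parse_normalized_form("date@||||2014|05|07||||")
print(numeric, hashtag, date[1]["year"])
```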

Sentiment element

The following table shows the different fields that will appear in the sentiment object.

Sentiment Object

Name	Description
`self_sentiment`		Sentiment analysis associated to this token. It's an object with the following fields: `text`: Text of the token, including between parentheses the polarity modifiers it is affected by, and the context words used to determine its polarity. `inip`: position in which the token begins (starting from 0). `endp`: position in which the token ends. `tag_stack`: polarity modifiers affecting this token. It appears only when `verbose=y`. `confidence`: confidence associated with the sentiment analysis performed on the text. Its value is an integer number in the 0-100 range. `score_tag`: This tag indicates the polarity found (or not found) in the token it refers to. The possible values are the following: P+: strong positive P: positive NEU: neutral N: negative N+: strong negative NONE: without sentiment
`inherited_sentiment`		Sentiment analysis affecting this token, but inherited from another one. This field is an object containing the following fields: `relation_list`: array of relation objects with information about the tokens from which this token inherits sentiment; each relation has an `id` (unique identifier of the related token) and a `type` (`hasInheritedSentiment`). `score_tag`: polarity found. Same possible values as in `self_sentiment` `score_tag`.

Syntactic Relations

These are the syntactic relationships detected and how they will appear in the type field of the syntactic_tree_relation element.

isAgentComplement	isDirectObject	isNonAnaphoric	isRestrictiveApposition
isAnaphora	isIndirectObject	isNonRestrictiveApposition	isSubject
isAttribute	isLocationComplement	isPossessor	isTimeComplement
isComplement	isMannerComplement	isPredicative
isCoreference	isNegationComplement	isQuantityComplement

In order to provide bidirectional relations, every type has an inverse version. This means that if token A is related to B through type, then B will be related to A through the inverse of type. The notation used for every inverse relation type is an abbreviation of 'inverse of': iof_ + type. If A is related to B through isSubject, then B will be related to A through iof_isSubject.
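The naming rule for inverse relations can be sketched in a few lines. The `inverse_relation` helper is hypothetical, and it assumes that the inverse of an inverse relation is the original type, which follows from the bidirectional description above but is not stated explicitly.

```python
INVERSE_PREFIX = "iof_"


def inverse_relation(relation_type: str) -> str:
    """Return the name of the inverse of a syntactic relation type."""
    if relation_type.startswith(INVERSE_PREFIX):
        # Assumption: inverting an inverse relation yields the original type.
        return relation_type[len(INVERSE_PREFIX):]
    return INVERSE_PREFIX + relation_type


print(inverse_relation("isSubject"))
print(inverse_relation("iof_isDirectObject"))
```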

Analysis element

The following table contains the fields that will appear in an analysis, and how we will represent each node of our morphosyntactic tree.

Analysis Object

Name	Description
`origin`		Specifies where the analysis comes from, specifically if any additional techniques have been applied to find the analysis. The possible values are the following: SEG: secure origin, obtained from resources SUF: known suffix added to a word from resources PREF: known prefix added to a word from resources AORT: words written without the necessary accent marks ORTO: the analysis has been obtained by the spellchecker (Deal with unknown words parameter, `uw`)
`variety_dictionary`		Shows, in the cases where it applies, which language variety dictionary the analysis comes from. It only appears when the analysis does not come from the general dictionary or from a user dictionary (specified in the `ud` parameter). These are the possible values: es-LA: Latin American Spanish es-NL: Northern Latin American Spanish es-SL: Southern Latin American Spanish es-CA: Central American Spanish es-CB: Caribbean Spanish es-MX: Mexican Spanish en-GB: British English en-US: American English
`tag`	Morphosyntactic tag associated to the token. A detailed explanation of what every feature in each tag means can be found in the morphosyntactic tagset section.
`lemma`	Lemma associated to the analysis.
`original_form`	Original form of the token, that is, how it appears in the text.
`tag_info`	Explained values of the morphosyntactic `tag`. It will only be output when the `verbose` parameter is enabled, and it's shown in the language analyzed.
`variety_dictionary_info`	Expanded value of the language variety dictionary that appears in `variety_dictionary`. It will only be output when the `verbose` parameter is enabled, and it's shown in the language specified in the `ilang` parameter.
`check_info`		Checker information associated to the token, which may include spelling, grammar and style errors and recommendations. `tag`: tag associated to the error/recommendation. A detailed explanation can be found at the checkinfo section of the morphosyntactic tagset. `form_list`: list of forms of the suggestions provided to improve the text. Each one will be tagged as a `form` element. `check_extra_info`: extended information related to the `tag` of the error/recommendation.
`remission`	In the cases where the form is a variant of something else, this field will show the complete form. For instance 'cause will remit to because.
`sense_id_list`	List of sense identifiers, or in other words, list of semantic analyses associated to the morphosyntactic analysis. Each one will be tagged as a `sense_id` element.
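The link between `sense_id_list` in an analysis and `sense_list` in its token could be resolved roughly as follows. The `senses_for_analysis` helper and the sample data are hypothetical. In particular, the exact shape of each `sense_id_list` entry is an assumption: the sketch accepts either a plain identifier or an object with a `sense_id` key.

```python
def senses_for_analysis(token: dict, analysis: dict) -> list:
    """Return the sense objects of `token` referenced by `analysis`."""
    senses_by_id = {sense["id"]: sense for sense in token.get("sense_list", [])}
    linked = []
    for entry in analysis.get("sense_id_list", []):
        # Assumption: an entry is either {"sense_id": "..."} or a bare id.
        sense_id = entry["sense_id"] if isinstance(entry, dict) else entry
        if sense_id in senses_by_id:
            linked.append(senses_by_id[sense_id])
    return linked


# Hypothetical token; identifiers and attribute values are placeholders.
token = {
    "form": "London",
    "analysis_list": [
        {"lemma": "London", "sense_id_list": [{"sense_id": "sense-1"}]},
    ],
    "sense_list": [
        {"id": "sense-1", "form": "London"},
        {"id": "sense-2", "form": "London (Ontario)"},
    ],
}

linked_forms = [
    sense["form"]
    for sense in senses_for_analysis(token, token["analysis_list"][0])
]
print(linked_forms)
```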

Semantic Information

In the string with semantic information the different attributes are separated by specific characters depending on the level of the type of information.

Level	Character	Description
1	\t	Separates first level attributes
2	/	Separates a complex attribute from its content (*)
3	\|	Separates the different values associated to an attribute
4	@	Separates the different attributes within a complex attribute
5	=	Separates the name from the value in simple attributes
6	#	Separates attributes associated to a simple attribute value
8	>	Separates hierarchy values within a simple attribute value

(*) Except for the semld elements with URL format.
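A partial parser for the separators in the table above might look like the sketch below. It only handles simple attributes (tab, `=`, `|` and `>`); complex attributes that use `/` and `@`, and the `#` separator, are returned unparsed. The `parse_semantic_info` function and the attribute names in the example string are hypothetical, not taken from real API output.

```python
def parse_semantic_info(info: str) -> tuple:
    """Parse simple attributes of a semantic information string.

    Returns (attributes, unparsed), where attributes maps each simple
    attribute name to a list of values, and each value is a list of
    hierarchy levels.
    """
    attributes = {}
    unparsed = []
    for field in info.split("\t"):  # level 1: first level attributes
        name, separator, raw = field.partition("=")  # level 5: name=value
        if not separator or "/" in name:
            # Complex attributes (level 2) are out of scope for this sketch.
            unparsed.append(field)
            continue
        attributes[name] = [
            value.split(">")  # hierarchy levels within one value
            for value in raw.split("|")  # level 3: multiple values
        ]
    return attributes, unparsed


# Hypothetical string; attribute names are illustrative only.
example = "id=abc123\ttype=Top>Location>City\tlabels=one|two\tcomplex/a=1@b=2"
attributes, unparsed = parse_semantic_info(example)
print(attributes["type"], unparsed)
```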

Response examples

Sample response:

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "token_list": [
    {
      "type": "sentence",
      "id": "15",
      "inip": "0",
      "endp": "66",
      "style": {...},
      "separation": "A",
      "quote_level": "0",
      "affected_by_negation": "no",
      "sentiment": {...},
      "token_list": [...]
    }
  ],
  "global_sentiment": {
    "model": "general_en",
    "score_tag": "P",
    "agreement": "AGREEMENT",
    "subjectivity": "OBJECTIVE",
    "confidence": "100",
    "irony": "NONIRONIC"
  }
}

If of=img, the response will be a GIF image with the syntactic tree. When a sentiment model is selected in the call to the API (for instance, sm=general), the tree will show the sentiment information.