Name | Description |
---|---|
status | Describes the request outcome in terms of success or failure. |
status .code | Numerical value of result code. Refer to the error code catalog. |
status .msg | Human-readable error message, if any, or OK. |
status .credits | Credits consumed by the request. A credit corresponds to a bucket of 500 words. Only successful requests consume credits. |
status .remaining_credits | Credits left to reach the usage limit. |
entity_list | Contains the named entities found in the text, represented as entity objects. |
concept_list | Contains the concepts found in the text, represented as concept objects. |
time_expression_list | Contains the time expressions found in the text, represented as time_expression objects. |
money_expression_list | Contains the money expressions found in the text, represented as money_expression objects. |
quantity_expression_list | [beta] Contains the quantity expressions found in the text, represented as quantity_expression objects. |
other_expression_list | Contains the unknown alphanumeric patterns found in the text, represented as other_expression objects. |
quotation_list | Contains the quotations found in the text, represented as quotation objects. |
relation_list | Contains the syntactic triples (subject-action-object) found in the text, represented as relation objects. |
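As a rough illustration of how the fields in the table above might be consumed, the following Python sketch checks `status`, counts the objects in each list, and estimates credits from a word count. The sample `response` dictionary and its values are invented for illustration, the helper names are hypothetical, and rounding up to the next bucket of 500 words is an assumption the table does not state.

```python
import math

# Hypothetical response fragment: field names follow the table above,
# the values are invented for illustration only.
response = {
    "status": {"code": "0", "msg": "OK", "credits": "1", "remaining_credits": "39999"},
    "entity_list": [{"form": "London"}],
    "concept_list": [],
}

LIST_FIELDS = [
    "entity_list", "concept_list", "time_expression_list",
    "money_expression_list", "quantity_expression_list",
    "other_expression_list", "quotation_list", "relation_list",
]


def is_ok(resp):
    """Return True when status.msg reports OK."""
    return resp.get("status", {}).get("msg") == "OK"


def count_items(resp):
    """Count the objects in each list field; absent lists count as 0."""
    return {name: len(resp.get(name, [])) for name in LIST_FIELDS}


def estimated_credits(word_count):
    """One credit per bucket of 500 words (rounding up is an assumption)."""
    return math.ceil(word_count / 500)


if __name__ == "__main__":
    print(is_ok(response))
    print(count_items(response))
    print(estimated_credits(1200))
```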
Both entities and concepts have the same basic structure, even if some of the specific values found in each field are different. In the following explanation, element will refer to both `entity` and `concept` objects.
Each element found is a node in our ontology. There are two types of information associated with each element:

- Information specific to the node: its form (`form`), how many times and in which form it appears in the text (`variant_list`), its global `relevance`, whether it belongs to a specific `dictionary` and, when it is a known element, its unique identifier (`id`) in the ontology and in known standards (`standard_list`).
- Semantic information that relates the node to the rest of the ontology: its entity type (`sementity`), geographical and thematic information (`semgeo_list` and `semtheme_list`) and other, more generic types (`semrefer_list`).

`sementity` is the only mandatory semantic aspect of an element, as it is associated with the sense of the element found, and each sense translates into an `entity`/`concept` object in the output. In terms of the ontology, `sementity` contains information from the node in the ODENTITY_TOP branch to which the element node is related. For example, London has two senses, last name and city, so in a scenario with no disambiguation two entities will be found, each with a different `sementity` object: one with the `id` ODENTITY_LAST_NAME and the other with the `id` ODENTITY_CITY.
The `sementity` element contains a field called `type` with the expanded hierarchy of the entity type, which gives a much more intuitive grasp of the sense associated with the element. Each level of the hierarchy follows a more user-friendly notation than the node names seen so far: the entity type id loses the ODENTITY_ prefix, the underscores are deleted and the capitalization follows the upper CamelCase style. Using the previous examples, ODENTITY_LAST_NAME appears as LastName and ODENTITY_CITY as City at their level of the hierarchy.
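The renaming rule described above (drop the prefix, delete the underscores, use upper CamelCase) can be sketched in Python as follows. The function name `friendly_type_name` is hypothetical, and the sketch only converts a single node id; it does not reconstruct the full hierarchy, which comes from the ontology itself.

```python
def friendly_type_name(node_id, prefix="ODENTITY_"):
    """Convert an ontology node id into the notation used in the type field.

    Applies the rule from the text: strip the prefix, split on underscores,
    and join the words in upper CamelCase.
    """
    if not node_id.startswith(prefix):
        raise ValueError(f"{node_id!r} does not start with {prefix!r}")
    words = node_id[len(prefix):].split("_")
    return "".join(word.capitalize() for word in words)


if __name__ == "__main__":
    print(friendly_type_name("ODENTITY_LAST_NAME"))
    print(friendly_type_name("ODENTITY_CITY"))
    # The same rule with the theme prefix.
    print(friendly_type_name("ODTHEME_TOP", prefix="ODTHEME_"))
```

The `prefix` parameter lets the same helper cover the ODTHEME_ branch, which follows the same pattern.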
`sementity` also includes an attribute called `class`, which indicates whether the element in question is an instance of the entity type or a class. In the case of an `entity`, this value will always be instance, as a named entity is always an example of the class that the `sementity` node represents. Elements with `class=class` appear as `concept` objects.
`semtheme_list` is made up of `semtheme` objects. `semtheme` is quite similar to `sementity`, but instead of referring to the entity type (a node in the ODENTITY_TOP branch of the ontology), it points to the theme or themes the node belongs to (nodes in the ODTHEME_TOP branch of the ontology). `semtheme` also contains a `type` field with the expanded version of the hierarchy; it follows the same pattern described for `sementity`, but with ODTHEME_ as the prefix.
There will be as many `semtheme` elements as themes the node relates to.
Both `sementity` and `semtheme` always refer to class nodes. The rest of the semantic information associated with the `entity` refers to instance nodes. The main difference this makes in the output is that classes are identified by their name (e.g. ODENTITY_CITY), while instances are referred to by a unique alphanumeric string that unambiguously identifies the node in the ontology (`id`).
Similarly to `sementity` and `semtheme`, `semgeo` (each element contained in `semgeo_list`) provides information on the node's hierarchy, although in this case the hierarchy follows geopolitical criteria. Because some cases involve multiple inheritance (for instance, a mountain chain that belongs to two different countries), the values are not included in a single field; instead, there is a specific object for each level, identified by its `form` and its node `id`.
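To make the per-level layout concrete, here is a minimal Python sketch that walks one `semgeo` object from the broadest level to the narrowest. The level names are the ones listed for `semgeo_list[]` in this document; the sample object, its ids and the helper name `geo_path` are invented for illustration.

```python
# Levels a semgeo object may carry, from broadest to narrowest.
GEO_LEVELS = ["continent", "country", "adm1", "adm2", "adm3", "city", "district"]


def geo_path(semgeo):
    """Return (level, form, id) tuples for the levels present, in order."""
    return [
        (level, semgeo[level]["form"], semgeo[level]["id"])
        for level in GEO_LEVELS
        if level in semgeo
    ]


# Hypothetical semgeo object: the ids are placeholders, not real node ids.
sample_semgeo = {
    "country": {"form": "United Kingdom", "id": "hypothetical-id-2"},
    "continent": {"form": "Europe", "id": "hypothetical-id-1"},
}

if __name__ == "__main__":
    for level, form, node_id in geo_path(sample_semgeo):
        print(f"{level}: {form} ({node_id})")
```

Since multiple inheritance produces several objects in `semgeo_list`, a caller would apply `geo_path` to each element of the list.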
`semrefer_list` contains other references between the `entity`/`concept` node and other instances in the ontology. There are currently two types: `organization`, which links to an instance of the ODENTITY_ORGANIZATION type (or its descendants), and `affinity`, which shows an affinity relationship between the `entity` node and another instance in the ontology. Each object in `semrefer` is represented by its `form` and its node `id`.
The last field included in an `entity`/`concept`, `semld`, is a mix of the two types of information described: it contains information specific to the node, but that information consists of links to external ontologies such as SUMO, Wikipedia or YAGO.
The following table contains the fields that appear in `entity` and `concept` objects.
Name | Description |
---|---|
form | Form of the entity, in the language specified by ilang. |
official_form | Official form of the entity, like United States vs United States of America , in the language specified by ilang . |
dictionary | User dictionary name where the entity is found. |
id | Alphanumeric string that identifies uniquely the entity. This ID will correspond to the entity senseID in resources (which includes user dictionaries). If the entity is not in any of the resources but has been detected in the analysis, the ID will be specifically created for that analysis and will begin by two underscores. |
sementity | Describes the entity |
sementity .class | Contains the fixed value instance for entities. |
sementity .fiction | Contains the value fiction for fictional elements or nonfiction for non-fictional. |
sementity .id | Identifier of the node associated to the entity type. |
sementity .type | Provides a more user-friendly notation for the type classification hierarchy of the entity. It starts with the highest node (Top) and each level is separated by > . Top will always appear. |
sementity .confidence | Uses the values unknown and uncertain to denote entity types inferred from heuristic rules and ambiguous classifications, respectively. |
semgeo_list | Geographical information the entity is associated to. |
semgeo_list[] .continent | Continent-level information in the geographical hierarchy, represented as semgeo objects. |
semgeo_list[] .country | Country-level information in the geographical hierarchy, represented as semgeo objects. |
semgeo_list[] .adm1 | adm1-level information in the geographical hierarchy, represented as semgeo objects. |
semgeo_list[] .adm2 | adm2-level information in the geographical hierarchy, represented as semgeo objects. |
semgeo_list[] .adm3 | adm3-level information in the geographical hierarchy, represented as semgeo objects. |
semgeo_list[] .city | City-level information in the geographical hierarchy, represented as semgeo objects. |
semgeo_list[] .district | District-level information in the geographical hierarchy, represented as semgeo objects. |
semld_list | Provides a list of gateways to different open data sources. These gateways will be provided in two different formats: through a link or by providing an identifier to access the information. Refer to semld gateways to learn more. |
semrefer_list | Includes references to other nodes in the ontology (instance type nodes) represented by semrefer objects. |
semtheme_list | List of the thematic classifications, represented by semtheme objects. |
standard_list | List of international standards relevant to the sense associated to the element, represented by standard objects. |
variant_list | Alternative appearances of the entity/concept in the text, represented by variant objects. |
relevance | Relative relevance of the entity in the text compared to the other entities found. |
subentity_list | This element is composed of subentity elements that have exactly the same structure as entity . It applies only to entity objects. |
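The sketch below shows one way the fields in this table could be read from a single `entity` object in Python: it splits `sementity.type` on `>` and flags ids created only for the current analysis (those beginning with two underscores). The helper names, the sample entity and its values are hypothetical.

```python
def type_levels(sementity_type):
    """Split the type hierarchy (levels separated by '>') into a list."""
    return [level.strip() for level in sementity_type.split(">") if level.strip()]


def is_analysis_specific_id(entity_id):
    """Ids created for a single analysis begin with two underscores."""
    return entity_id.startswith("__")


def summarize_entity(entity):
    """Collect a few fields of an entity/concept object into a flat dict."""
    sementity = entity.get("sementity", {})
    return {
        "form": entity.get("form"),
        "levels": type_levels(sementity.get("type", "")),
        "analysis_specific": is_analysis_specific_id(entity.get("id", "")),
    }


# Hypothetical entity object: values are invented for illustration.
sample_entity = {
    "form": "London",
    "id": "__1234567890",
    "sementity": {"class": "instance", "type": "Top>Person>LastName"},
}

if __name__ == "__main__":
    print(summarize_entity(sample_entity))
```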
The geographical information extracted from the text is represented as `semgeo` objects with the following structure:
Name | Description |
---|---|
form | Form of the country, city, etc. |
id | Identifier of the node associated to the country, city, etc. |
standard_list | Contains standard code names of the entity |
standard_list[] .id | Name of the standard |
standard_list[] .value | Name of the country, city, etc. in the given standard |
The following table includes the gateways associated to an identifier, and how to use it:
Source | Format | How to use it |
---|---|---|
SUMO | sumo:xxxxx | http://sigma-01.cim3.net:8080/sigma/Browse.jsp?kb=SUMO&term=xxxxx |
Twitter | @xxxxx | http://twitter.com/xxxxx |
Name | Description |
---|---|
organization | Organizational relationships with the node specified through the subattributes form and id . An example of this type of relationship would be a company and its subsidiary. |
organization .form | Form of the organization. |
organization .id | Identifier of the node that represents the organization. |
affinity | Affinity relationships with the node specified through the subattributes form and id . An example of this type of relationship would be a company and its subsidiary. |
affinity .form | Form of the related entity. |
affinity .id | Identifier of the node that represents the related entity. |
Name | Description |
---|---|
id | Identifier of the node associated to the theme the entity belongs to. |
type | Provides a more user-friendly name for all the levels of the theme classification hierarchy. It starts with the highest node (Top) and each level is separated by > . |
Name | Description |
---|---|
id | Identifier of the standard. |
value | Specific value in the standard |
For example, the ISO3166-1 standard for countries is identified as `ISO3166-1-a2` when it refers to the two letters that identify each country and as `ISO3166-1-a3` for the three-letter id. `NYSE` is the value used to identify the ticker of a company that trades on the New York Stock Exchange.
These are all the values that may appear in `id`:
ID | Description |
---|---|
ISO3166-1-a2, ISO3166-1-a3 | Country codes |
BEL20, BMAD, BUENOSAIRES, BVL, CAC_40, CARACAS, CORROELECTRONICO, DAX_30, EURO_STOXX50, Euronext, FTSE_100, FTSE_LATIBEX, IBEX35, LSE, LuxSE, MAB, MEXICO, MIB, NASDAQ, NYSE, OMXH25, OMXS30, SANTIAGO, SMI, SP100 | Stock exchanges |
ISO4217 | Currency codes |
ISO639-1, ISO639-2, ISO639-3, ISO639-5 | Language codes |
ISO8601 | Dates standard |
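As an illustration of how a `standard_list` could be queried, this Python sketch looks up the value for a given standard `id`. The helper name `standard_value` is hypothetical and the sample list is built by hand, using the ISO 3166-1 codes for the United Kingdom.

```python
def standard_value(standard_list, standard_id):
    """Return the value for the given standard id, or None if it is absent."""
    for standard in standard_list:
        if standard.get("id") == standard_id:
            return standard.get("value")
    return None


# Hand-built example list, following the id/value structure described above.
sample_standards = [
    {"id": "ISO3166-1-a2", "value": "GB"},
    {"id": "ISO3166-1-a3", "value": "GBR"},
]

if __name__ == "__main__":
    print(standard_value(sample_standards, "ISO3166-1-a2"))
    print(standard_value(sample_standards, "NYSE"))
```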
Name | Description |
---|---|
form | The exact form found in the text |
inip | The initial position of the appearance |
endp | The final position of the appearance |
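The `inip` and `endp` positions can be used to recover each appearance from the analyzed text. The Python sketch below assumes 0-based character positions and an inclusive `endp`; the table does not state either, so both are assumptions to verify against real output. The helper names and the sample data are hypothetical.

```python
def variant_text(text, variant):
    """Slice the analyzed text using a variant's positions.

    Assumes 0-based positions and an inclusive endp (not stated in the table).
    """
    start = int(variant["inip"])
    end = int(variant["endp"])
    return text[start:end + 1]


def mismatched_variants(text, variant_list):
    """Return the variants whose positions do not reproduce their form."""
    return [v for v in variant_list if variant_text(text, v) != v["form"]]


# Invented example text and variants.
sample_text = "London is big. I love London."
sample_variants = [
    {"form": "London", "inip": "0", "endp": "5"},
    {"form": "London", "inip": "22", "endp": "27"},
]

if __name__ == "__main__":
    print(mismatched_variants(sample_text, sample_variants))
```

Running `mismatched_variants` against a real response is a quick way to confirm or refute the indexing assumption.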
For time expressions that refer to a specific date, the following format will be used to represent its associated value:
century|era|season|weekday|year|month|day|hour|minutes|seconds|timezone
These are the values each field may have:
If an expression has no value for one of the positions, it will be empty.
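The pipe-separated format above maps naturally onto named fields. A minimal Python sketch, assuming the value always carries all eleven positions and that empty positions are empty strings; the function name and the sample value are invented for illustration and do not reflect the actual value formats of each field.

```python
TIME_FIELDS = [
    "century", "era", "season", "weekday", "year", "month",
    "day", "hour", "minutes", "seconds", "timezone",
]


def parse_normalized_form(normalized_form):
    """Map each position of the normalized form to its field name.

    Empty positions are returned as None.
    """
    parts = normalized_form.split("|")
    if len(parts) != len(TIME_FIELDS):
        raise ValueError(
            f"expected {len(TIME_FIELDS)} positions, got {len(parts)}"
        )
    return {name: (value or None) for name, value in zip(TIME_FIELDS, parts)}


if __name__ == "__main__":
    # Invented sample: only year, month and day are filled in.
    sample = "||||2013|09|22||||"
    parsed = parse_normalized_form(sample)
    print({name: value for name, value in parsed.items() if value is not None})
```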
These would be some examples of how this would look:
This representation of the time is used to calculate the value in `actual_time`, which uses `timeref` as reference and returns a date value in one of the following three formats: YYYY-MM-DD hh:mm:ss GMT±HH:MM, YYYY-MM-DD and hh:mm:ss GMT±HH:MM. For the examples seen and using as reference 2013-01-01 12:12:12 GMT+01:00, the result would be:
In some cases, `actual_time` returns values that are not certain (for example, minutes and seconds in the second example), so a `precision` value is added to filter these out. The values for `precision` are the positions of the `normalized_form` field plus hourAMPM, minutesAMPM and secondsAMPM. This results in different objects for "it's 7:30" and "it's 7:30 in the evening".
Name | Description |
---|---|
form | Form of the time expression. |
normalized_form | Normalized form associated to the time expression. |
actual_time | Actual time relative to the given time reference, based on the normalized form. |
precision | Level of precision for actual_time. |
inip | Initial position of the time expression. |
endp | End position of the time expression. |
List of the money expressions found in the text, represented as `money_expression` objects.
A money expression is detected when there is both a currency and an amount in a valid structure. The currency is expressed using ISO4217 codes; where more than one currency may apply, all the possible values are returned separated by `|` and ordered alphabetically.
Name | Description |
---|---|
form | Form of money expression. |
amount_form | Amount associated to the money expression as it appears in the text. |
numeric_value | Equivalent numeric value of the amount of money. |
currency | ISO4217 value associated to the currency in the money expression. Different values are separated by the character | . |
inip | Initial position of the money expression. |
endp | End position of the money expression. |
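Because `currency` may hold several ISO4217 codes separated by `|`, a consumer typically needs to split it before use. A small Python sketch, with hypothetical helper names and an invented sample object:

```python
def currency_candidates(money_expression):
    """Split the currency field into its individual ISO4217 codes."""
    return money_expression["currency"].split("|")


def is_ambiguous(money_expression):
    """True when more than one currency may apply to the expression."""
    return len(currency_candidates(money_expression)) > 1


# Invented sample: "dollars" could refer to more than one currency.
sample_money = {
    "form": "a thousand dollars",
    "amount_form": "a thousand",
    "numeric_value": "1000",
    "currency": "AUD|CAD|USD",
}

if __name__ == "__main__":
    print(currency_candidates(sample_money))
    print(is_ambiguous(sample_money))
```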
Some specific patterns are considered known ones and are identified as such through the field `type`. The patterns detected are the following:
Name | Description |
---|---|
form | Form of expression. |
type | Type of expression (default: unknown) |
inip | Initial position of the expression. |
endp | End position of the expression. |
Name | Description |
---|---|
form | Content of the quote as it appears in the text. |
who | Who the quote is attributed to. It will have two fields, the form, and the lemma. |
verb | Verb associated to the quotation. It will have two fields, the form, and the lemma. |
inip | Initial position of the expression. |
endp | End position of the expression. |
Quotations in direct speech will not always include information regarding who they are attributed to; in those cases the fields `who` and `verb` will not appear.
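Since `who` and `verb` may be absent, code reading a `quotation` object should treat them as optional. A minimal Python sketch; the helper name and both sample objects are invented for illustration.

```python
def attribution(quotation):
    """Return '<who> <verb>' for attributed quotes, or None when absent."""
    who = quotation.get("who")
    verb = quotation.get("verb")
    if who is None or verb is None:
        return None
    return f'{who["form"]} {verb["form"]}'


# Invented sample objects following the table above.
attributed = {
    "form": "his brother was at Harvard University",
    "who": {"form": "The child", "lemma": "child"},
    "verb": {"form": "said", "lemma": "say"},
}
unattributed = {"form": "We will see."}

if __name__ == "__main__":
    print(attribution(attributed))
    print(attribution(unattributed))
```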
The syntactic triples are defined by `subject`-`verb` pairs and all the complements associated with that `verb`. There are two possible exceptions to this:

- Relations with no explicit `verb` (for example, appositions). In this case, the `verb` is assumed to be "to be" (or its equivalent, depending on the language), and its `form` will appear between parentheses.
- Sentences where the `subject` is omitted (very common in some languages such as Spanish), in which case the `subject` will not appear.

Name | Description |
---|---|
form | Sentence in which the relation appears. |
inip | Initial position of the sentence the relation appears in. |
endp | End position of the sentence the relation appears in. |
subject | Subject of the relation. In the cases where the subject is an anaphora, the anaphora will be solved and the details that will appear will be those of the element that solves it. |
subject .form | How it appears in the text. |
subject .lemma_list | List of lemmas of the element. Coordinated elements by definition don't have a lemma, so the field will not appear. |
subject .sense_id_list | ID associated to the entity or concept the subject refers to. |
verb | Verb of the relation. |
verb .form | How it appears in the text. |
verb .lemma_list | List of lemmas of the verb. |
verb .sense_id_list | id associated to the verb. |
verb .semantic_lemma_list | List of semantic lemmas associated to the verb. It will only be included when its values are different than the ones in lemma_list . |
complement_list | List of complements of the verb. |
complement_list[] .form | How it appears in the text. Anaphoras will be solved to obtain this value. |
complement_list[] .type | Type of complement. The different types of syntactic relations detected are included in the response of the Lemmatization, PoS and Parsing, specifically in the section regarding syntactic_tree_relation elements. |
degree | Degree of proximity of the relation, that is, if the relation included is in the same sentence as the subject (in the cases where an anaphora has been solved, it won't be). |
If a `subject`-`verb` pair appears several times in the same text, it will only appear once, associated with the sentence it first appears in; the `complement_list` of the following appearances will be included in that `relation`.
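A `relation` object can be flattened into a readable triple while tolerating the omitted-subject case described earlier. The Python sketch below is one possible approach; the helper name, the `" | "` separator and the sample objects (including the complement `type` value) are invented for illustration.

```python
def format_relation(relation):
    """Join subject, verb and complements into one string.

    The subject is optional (it may be omitted in the source sentence),
    and complement_list may be missing.
    """
    subject = relation.get("subject", {}).get("form", "")
    verb = relation["verb"]["form"]
    complements = [c["form"] for c in relation.get("complement_list", [])]
    parts = [part for part in [subject, verb, *complements] if part]
    return " | ".join(parts)


# Invented sample relations following the table above.
with_subject = {
    "subject": {"form": "The child"},
    "verb": {"form": "said"},
    "complement_list": [
        {"form": "that his brother was at Harvard University", "type": "hypothetical"},
    ],
}
without_subject = {"verb": {"form": "llueve"}}

if __name__ == "__main__":
    print(format_relation(with_subject))
    print(format_relation(without_subject))
```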
The format in which this information is shown depends on the value of the `of` parameter.
Arsene Wenger’s side sit third in the Premier League for the first time since September 22.
A thousand dollars could be spent trying to tackle a parking problem.
To cancel your flight, go to our web site www.example.com. If you do not see the option to revoke your flight online, call at 1 877 781 3229 to cancel your flight giving us your flight number (e.g: AA5683). Cancellations can be done until twenty four hours before flight
The child said that his brother was at Harvard University.