Response

Sample response:

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "entity_list": [
    {
      "form": "Robert Downey Jr",
      "id": "__12123288058840445720",
      "sementity": {...},
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "Forbes",
      "id": "db0f9829ff",
      "sementity": {...},
      "semgeo_list": [...],
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "Iron Man",
      "id": "529e97f38e",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "Dwayne Johnson",
      "id": "__4280586672389134159",
      "sementity": {...},
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "Bradley Cooper",
      "official_form": "Bradley Charles Cooper",
      "id": "3e7c9ae34b",
      "sementity": {...},
      "semgeo_list": [...],
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "Chris Hemsworth",
      "id": "b2e6c3b771",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "Leonardo DiCaprio",
      "id": "8119b88b6d",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    }
  ],
  "concept_list": [
    {
      "form": "magazine",
      "id": "a0a1a5401f",
      "sementity": {...},
      "semld_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "actor",
      "id": "99e6d7a3f6",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "star",
      "id": "35d8a8e65d",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "star",
      "id": "c5994b45cc",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "avenger",
      "id": "65fdadcbff",
      "sementity": {...},
      "semld_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "film",
      "id": "4e7e3490af",
      "sementity": {...},
      "semld_list": [...],
      "semtheme_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "dollar",
      "official_form": "United States dollar",
      "id": "7b6858c50a",
      "sementity": {...},
      "semgeo_list": [...],
      "semld_list": [...],
      "semtheme_list": [...],
      "standard_list": [...],
      "variant_list": [...],
      "relevance": "100"
    },
    {
      "form": "opponent",
      "id": "d556261ad1",
      "sementity": {...},
      "semld_list": [...],
      "variant_list": [...],
      "relevance": "100"
    }
  ],
  "time_expression_list": [],
  "money_expression_list": [
    {
      "form": "$75m",
      "amount_form": "75m",
      "numeric_value": "7.5e+07",
      "currency": "USD",
      "inip": "189",
      "endp": "192"
    }
  ],
  "quantity_expression_list": [],
  "other_expression_list": [],
  "quotation_list": [],
  "relation_list": [
    {
      "form": "The 49-year-old star of the Iron Man and Avengers films made an estimated $75m over the past year, beating rivals Dwayne Johnson, Bradley Cooper, Chris Hemsworth and Leonardo DiCaprio.",
      "inip": "115",
      "endp": "297",
      "subject": {...},
      "verb": {...},
      "complement_list": [],
      "degree": "1"
    },
    {
      "form": "Robert Downey Jr has topped Forbes magazine's annual list of the highest paid actors for the second year in a row.",
      "inip": "0",
      "endp": "112",
      "subject": {...},
      "verb": {...},
      "complement_list": [...],
      "degree": "1"
    },
    {
      "form": "The 49-year-old star of the Iron Man and Avengers films made an estimated $75m over the past year, beating rivals Dwayne Johnson, Bradley Cooper, Chris Hemsworth and Leonardo DiCaprio.",
      "inip": "115",
      "endp": "297",
      "subject": {...},
      "verb": {...},
      "complement_list": [...],
      "degree": "1"
    }
  ]
}

Response object:

| Name | Description |
| --- | --- |
| status | Describes the request outcome in terms of success or failure. |
| status.code | Numerical value of the result code. Refer to the error code catalog. |
| status.msg | Human-readable error code, if any, or OK. |
| status.credits | Credits consumed by the request. A credit corresponds to a bucket of 500 words. Only successful requests consume credits. |
| status.remaining_credits | Credits left to reach the usage limit. |
| entity_list | Contains the named entities found in the text, represented as entity objects. |
| concept_list | Contains the concepts found in the text, represented as concept objects. |
| time_expression_list | Contains the time expressions found in the text, represented as time_expression objects. |
| money_expression_list | Contains the money expressions found in the text, represented as money_expression objects. |
| quantity_expression_list | [beta] Contains the quantity expressions found in the text, represented as quantity_expression objects. |
| other_expression_list | Contains the unknown alphanumeric patterns found in the text, represented as other_expression objects. |
| quotation_list | Contains the quotations found in the text, represented as quotation objects. |
| relation_list | Contains the syntactic triples (subject-action-object) found in the text, represented as relation objects. |
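
As a rough illustration of how a client might consume this object, the sketch below checks the status and collects the form of every item in each list. It assumes that a status.code of "0" signals success (as in the sample response) and that lists may be missing from a response; the function name summarize and the trimmed-down payload are hypothetical.

```python
import json

# Hypothetical, trimmed-down response body using only fields described above.
raw = (
    '{"status": {"code": "0", "msg": "OK", "credits": "1"},'
    ' "entity_list": [{"form": "Forbes", "relevance": "100"}],'
    ' "concept_list": []}'
)

LIST_FIELDS = (
    "entity_list", "concept_list", "time_expression_list",
    "money_expression_list", "quantity_expression_list",
    "other_expression_list", "quotation_list", "relation_list",
)


def summarize(response):
    """Return {list_name: [form, ...]}; raise if the request did not succeed."""
    status = response["status"]
    # Assumption: "0" is the success code, as shown in the sample response.
    if status["code"] != "0":
        raise RuntimeError(f"request failed: {status['code']} {status['msg']}")
    # A list may be absent from the response, so default to an empty list.
    return {
        name: [item.get("form") for item in response.get(name, [])]
        for name in LIST_FIELDS
    }


summary = summarize(json.loads(raw))
print(summary["entity_list"])
```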

Entity/Concept object

Entities and concepts share the same basic structure, although some of the specific values found in each field differ. In the following explanation, element refers to both entity and concept objects.

Each element found will be a node in our ontology. There are two types of information associated with each element:

  • Basic element information, that is, the information specific to the element found. It tells which element it is (form), how many times and in which form it appears in the text (variant_list), its global relevance (relevance), whether it belongs to a specific dictionary (dictionary) and, when it is a known element, its unique identifier in the ontology (id) and known standards (standard_list).
  • Semantic information, that is, the different nodes the element node is related to. There are different aspects of semantic information: type of entity (sementity), geographical and thematic information (semgeo_list and semtheme_list) and other, more generic references (semrefer_list).

sementity is the only mandatory semantic aspect of an element, as it is tied to the sense of the element found, and each sense translates into an entity/concept object in the output. In terms of the ontology, sementity contains information from the node in the ODENTITY_TOP branch that the element node is related to. For example, London has two senses, last name and city, so in a scenario with no disambiguation, two entities will be found, each with a different sementity object: one with the id ODENTITY_LAST_NAME and the other with the id ODENTITY_CITY.

The sementity element contains a field called type with the expanded hierarchy of the entity type, which gives a much more intuitive grasp of the sense associated with the element. Each level of the hierarchy follows a more user-friendly notation than the node names seen so far: the entity type id loses the prefix ODENTITY_, the underscores are deleted and the capitalization follows the upper CamelCase style. Using the previous examples, the last level of type would read LastName for ODENTITY_LAST_NAME and City for ODENTITY_CITY.
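
The renaming rule for a single level can be sketched as follows. This is only an illustration of the notation described above: the function name friendly_type_name is hypothetical, and the full type value also includes the parent levels of the hierarchy, which come from the ontology and cannot be derived from the node name alone.

```python
def friendly_type_name(node_id, prefix="ODENTITY_"):
    """Turn an ontology node name such as ODENTITY_LAST_NAME into LastName."""
    # Drop the prefix when present.
    if node_id.startswith(prefix):
        node_id = node_id[len(prefix):]
    # Remove underscores and apply upper CamelCase to each word.
    return "".join(part.capitalize() for part in node_id.split("_") if part)


print(friendly_type_name("ODENTITY_LAST_NAME"))
print(friendly_type_name("ODENTITY_CITY"))
```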

sementity also includes an attribute called class, which indicates whether the element in question is an instance of the entity type or a class. For an entity, this value is always instance, as a named entity is always an example of the class that the sementity node represents. Elements with class=class appear as concept objects.

semtheme_list is made up of semtheme objects. semtheme is quite similar to sementity, but instead of referring to the entity type (a node in the ODENTITY_TOP branch of the ontology), it points to the theme or themes the node belongs to (a node in the ODTHEME_TOP branch of the ontology). semtheme also contains a type field with the expanded version of the hierarchy; it follows the same pattern described for sementity, but with ODTHEME_ as the prefix.

There will be as many semtheme elements as themes the node relates to.

Both sementity and semtheme always refer to class nodes. The rest of the semantic information associated with the entity refers to instance nodes. The main difference this makes in the output is that classes are identified by their name (e.g. ODENTITY_CITY), while instances are referred to by a unique alphanumeric string that identifies the node in the ontology (id).

Like sementity and semtheme, semgeo (each element contained in semgeo_list) provides information on the node's hierarchy, although in this case the hierarchy follows a geopolitical criterion. Because some cases involve multiple inheritance (for instance, a mountain chain that belongs to two different countries), the values are not included in a single field; instead, there is a specific object for each level, identified by its form and its node id.

semrefer_list contains other references between the entity/concept node and other instances in the ontology. There are currently two types: organization, which links to an instance of the ODENTITY_ORGANIZATION type (or its descendants), and affinity, which shows an affinity relationship between the entity node and another instance in the ontology. Each object in semrefer is represented by its form and its node id.

The last field included in an entity/concept, semld, is a mix of the two types of information described: it contains information specific to the node, but that information consists of links to external ontologies such as SUMO, Wikipedia or YAGO.

The following table contains the fields that can appear in entity and concept objects.

Entity/Concept object attributes

| Name | Description |
| --- | --- |
| form | Form of the entity, in the language specified by ilang. |
| official_form | Official form of the entity, like United States vs United States of America, in the language specified by ilang. |
| dictionary | User dictionary name where the entity is found. |
| id | Alphanumeric string that uniquely identifies the entity. This ID will correspond to the entity senseID in resources (which includes user dictionaries). If the entity is not in any of the resources but has been detected in the analysis, the ID will be created specifically for that analysis and will begin with two underscores. |
| sementity | Describes the entity. |
| sementity.class | Contains the fixed value instance for entities. |
| sementity.fiction | Contains the value fiction for fictional elements or nonfiction for non-fictional. |
| sementity.id | Identifier of the node associated with the entity type. |
| sementity.type | Provides a more user-friendly notation for the type classification hierarchy of the entity. It starts with the highest node (Top) and each level is separated by >. Top will always appear. |
| sementity.confidence | Uses the values unknown and uncertain to denote entity types inferred from heuristic rules and ambiguous classifications, respectively. |
| semgeo_list | Geographical information the entity is associated with. |
| semgeo_list[].continent | Continent-level information in the geographical hierarchy, represented as semgeo objects. |
| semgeo_list[].country | Country-level information in the geographical hierarchy, represented as semgeo objects. |
| semgeo_list[].adm1 | adm1-level information in the geographical hierarchy, represented as semgeo objects. |
| semgeo_list[].adm2 | adm2-level information in the geographical hierarchy, represented as semgeo objects. |
| semgeo_list[].adm3 | adm3-level information in the geographical hierarchy, represented as semgeo objects. |
| semgeo_list[].city | City-level information in the geographical hierarchy, represented as semgeo objects. |
| semgeo_list[].district | District-level information in the geographical hierarchy, represented as semgeo objects. |
| semld_list | Provides a list of gateways to different open data sources. These gateways are provided in two different formats: through a link or by providing an identifier to access the information. Refer to semld gateways to learn more. |
| semrefer_list | Includes references to other nodes in the ontology (instance type nodes), represented by semrefer objects. |
| semtheme_list | List of the thematic classifications, represented by semtheme objects. |
| standard_list | List of international standards relevant to the sense associated with the element, represented by standard objects. |
| variant_list | Alternative appearances of the entity/concept in the text, represented by variant objects. |
| relevance | Relative relevance of the entity in the text compared to the other entities found. |
| subentity_list | Composed of subentity elements that have exactly the same structure as entity. It applies only to entity objects. |

Semgeo object

The geographical information extracted from the text is represented as semgeo objects with the following structure:

Semgeo object attributes

| Name | Description |
| --- | --- |
| form | Form of the country, city, etc. |
| id | Identifier of the node associated with the country, city, etc. |
| standard_list | Contains standard code names of the entity. |
| standard_list[].id | Name of the standard. |
| standard_list[].value | Name of the country, city, etc. in the given standard. |

Semld gateways

The following table lists the gateways that are accessed through an identifier, and how to use each one:

| Source | Format | How to use it |
| --- | --- | --- |
| SUMO | sumo:xxxxx | http://sigma-01.cim3.net:8080/sigma/Browse.jsp?kb=SUMO&term=xxxxx |
| Twitter | @xxxxx | http://twitter.com/xxxxx |
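
A minimal sketch of how the table above could be applied on the client side is shown below. The function name semld_gateway_url and the sample identifiers are hypothetical; values that are already links are passed through unchanged, and any format not covered by the table yields None.

```python
def semld_gateway_url(value):
    """Map a semld value to a URL, or return None for formats not in the table."""
    # Gateways given as links need no conversion.
    if value.startswith(("http://", "https://")):
        return value
    # SUMO identifiers: sumo:xxxxx
    if value.startswith("sumo:"):
        term = value[len("sumo:"):]
        return "http://sigma-01.cim3.net:8080/sigma/Browse.jsp?kb=SUMO&term=" + term
    # Twitter identifiers: @xxxxx
    if value.startswith("@"):
        return "http://twitter.com/" + value[1:]
    return None


# Hypothetical identifiers, used only to exercise the two formats.
print(semld_gateway_url("sumo:City"))
print(semld_gateway_url("@example"))
```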

Semrefer object

Semrefer object attributes

| Name | Description |
| --- | --- |
| organization | Organizational relationships with the node specified through the subattributes form and id. An example of this type of relationship would be a company and its subsidiary. |
| organization.form | Form of the organization. |
| organization.id | Identifier of the node that represents the organization. |
| affinity | Affinity relationships with the node specified through the subattributes form and id. An example of this type of relationship would be a company and its subsidiary. |
| affinity.form | Form of the related entity. |
| affinity.id | Identifier of the node that represents the related entity. |

Semtheme object

Semtheme object attributes

| Name | Description |
| --- | --- |
| id | Identifier of the node associated with the theme the entity belongs to. |
| type | Provides a more user-friendly name for all the levels of the theme classification hierarchy. It starts with the highest node (Top) and each level is separated by >. |

Standard object

Standard object attributes

| Name | Description |
| --- | --- |
| id | Identifier of the standard. |
| value | Specific value in the standard. |

For example, the ISO3166-1 standard for countries is identified as ISO3166-1-a2 when it refers to the two-letter country code and as ISO3166-1-a3 when it refers to the three-letter code. NYSE is the id used for the ticker of a company that trades on the New York Stock Exchange.

These are all the values that may appear in id:

| ID | Description |
| --- | --- |
| ISO3166-1-a2, ISO3166-1-a3 | Country codes |
| BEL20, BMAD, BUENOSAIRES, BVL, CAC_40, CARACAS, CORROELECTRONICO, DAX_30, EURO_STOXX50, Euronext, FTSE_100, FTSE_LATIBEX, IBEX35, LSE, LuxSE, MAB, MEXICO, MIB, NASDAQ, NYSE, OMXH25, OMXS30, SANTIAGO, SMI, SP100 | Stock exchanges |
| ISO4217 | Currency codes |
| ISO639-1, ISO639-2, ISO639-3, ISO639-5 | Language codes |
| ISO8601 | Date standard |

Variant object

Variant object attributes

| Name | Description |
| --- | --- |
| form | The exact form found in the text. |
| inip | The initial position of the appearance. |
| endp | The final position of the appearance. |
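
The positions can be used to recover the matched text from the original input. The sketch below assumes that inip and endp are zero-based character offsets delivered as strings and that endp points at the last character (inclusive); this is consistent with the sample response, where "$75m" spans positions 189 to 192, but it is an assumption rather than documented behavior. The function name variant_span and the sample text are hypothetical.

```python
def variant_span(text, variant):
    """Return the substring of text that a variant object points at."""
    start = int(variant["inip"])
    end = int(variant["endp"])
    # Assumption: endp is inclusive, so the slice must extend one past it.
    return text[start:end + 1]


# Hypothetical input text and variant object.
text = "Forbes published its list."
variant = {"form": "Forbes", "inip": "0", "endp": "5"}
print(variant_span(text, variant))
```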

Time expression object

For time expressions that refer to a specific date, the following format will be used to represent its associated value:

century|era|season|weekday|year|month|day|hour|minutes|seconds|timezone

These are the values each field may have:

  • century, year, month, day, hour, minutes, seconds: numeric values
  • era: after Christ (aC), before Christ (dC)
  • season: spring (s), summer (v), autumn (a), winter (w)
  • weekday: Monday (m), Tuesday (t), Wednesday (w), Thursday (j), Friday (f), Saturday (s), Sunday (d)
  • timezone: must be specified either by using the standard timezones designations (CET, EST, etc.) or with the offset with respect to GMT, e.g.: GMT+02:00
  • +/- indicate references after/before the returned value (e.g. +2 days)
  • ~ indicates approximate values

If an expression has no value for one of the positions, it will be empty.

These would be some examples of how this would look:

  • It's 7:30 in the evening -- |||||||19|30||
  • 27th February at 3pm -- |||||2|27|15|||
  • 5th june 2008 -- 21||||2008|6|5||||
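
The positional format above can be unpacked with a simple split. The following sketch is one possible way to do it; the function name parse_normalized_form is hypothetical, and values are kept as strings because they may carry the +/- and ~ markers described above.

```python
FIELDS = (
    "century", "era", "season", "weekday", "year", "month",
    "day", "hour", "minutes", "seconds", "timezone",
)


def parse_normalized_form(value):
    """Split a normalized time expression into a dict of its non-empty fields."""
    parts = value.split("|")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    # Empty positions mean the expression has no value for that field.
    return {name: part for name, part in zip(FIELDS, parts) if part}


print(parse_normalized_form("|||||||19|30||"))
print(parse_normalized_form("|||||2|27|15|||"))
print(parse_normalized_form("21||||2008|6|5||||"))
```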

This representation of the time will be used to calculate the value in actual_time, which will use as reference timeref and will return a date value in one of the following three formats: YYYY-MM-DD hh:mm:ss GMT±HH:MM, YYYY-MM-DD and hh:mm:ss GMT±HH:MM. For the examples seen and using as reference 2013-01-01 12:12:12 GMT+01:00, the result would be:

  • It's 7:30 in the evening -- 19:30:00 GMT+01:00
  • 27th February at 3pm -- 2013-02-27 15:00:00 GMT+01:00
  • 5th june 2008 -- 2008-06-05

In some cases, actual_time returns values that are not certain (for example, minutes and seconds in the second example), so a precision value is added to filter these out. The values for precision are the positions of the normalized_form field plus hourAMPM, minutesAMPM and secondsAMPM. This will result in obtaining different objects for it's 7:30 and it's 7:30 in the evening.

Time expression object attributes

| Name | Description |
| --- | --- |
| form | Form of the time expression. |
| normalized_form | Normalized form associated with the time expression. |
| actual_time | Actual time relative to the given time reference, based on the normalized form. |
| precision | Level of precision for actual_time. |
| inip | Initial position of the time expression. |
| endp | End position of the time expression. |

Money expression object

Money expressions found in the text are represented as money_expression objects.

A money expression is detected when both a currency and an amount appear in a valid structure. The currency is expressed using the ISO4217 standard; when more than one currency may apply, all the possible values are returned separated by | and ordered alphabetically.

Money expression object attributes

| Name | Description |
| --- | --- |
| form | Form of the money expression. |
| amount_form | Amount associated with the money expression as it appears in the text. |
| numeric_value | Equivalent numeric value of the amount of money. |
| currency | ISO4217 value associated with the currency in the money expression. Different values are separated by the character \|. |
| inip | Initial position of the money expression. |
| endp | End position of the money expression. |
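
As an illustration, the sketch below converts a money_expression object into a number and a list of candidate currencies. The object is copied from the sample response; the function name parse_money and the ambiguous currency string in the second call are hypothetical.

```python
def parse_money(expr):
    """Return the numeric amount and the list of candidate ISO4217 codes."""
    return {
        # numeric_value arrives as a string, possibly in exponent notation.
        "amount": float(expr["numeric_value"]),
        # Several possible currencies are separated by the | character.
        "currencies": expr["currency"].split("|"),
    }


money = {
    "form": "$75m",
    "amount_form": "75m",
    "numeric_value": "7.5e+07",
    "currency": "USD",
    "inip": "189",
    "endp": "192",
}
parsed = parse_money(money)
print(parsed["amount"], parsed["currencies"])

# Hypothetical ambiguous case: more than one currency may apply.
print(parse_money({"numeric_value": "10", "currency": "AUD|CAD|USD"}))
```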

Other expression object

Some specific patterns will be considered known ones, and identified as such through the field type. The patterns detected are the following:

  • Spanish:
    • bank account number: 20 digits with the format xxxx xxxx xx xxxxxxxxxx
    • license plate: Spanish license plate with two formats: dddd-LLL and ddddLLL (where d are digits and L are capital letters)
    • id: national id document: ddddddddL or dddddddd-L (with d digits, L a capital letter)
  • All languages:
    • flight number: detects flight numbers with the format LLdddd (where d are digits and L are capital letters)
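
To make the formats above concrete, they can be written as regular expressions. This is only a reading of the listed patterns, not the service's actual detection logic: it ignores the language restriction, and the labels used as dictionary keys are illustrative rather than confirmed values of the type field.

```python
import re

# Illustrative patterns derived from the formats listed above.
PATTERNS = {
    "bank account number": re.compile(r"\d{4} \d{4} \d{2} \d{10}"),
    "license plate": re.compile(r"\d{4}-?[A-Z]{3}"),
    "id": re.compile(r"\d{8}-?[A-Z]"),
    "flight number": re.compile(r"[A-Z]{2}\d{4}"),
}


def classify(token):
    """Return the first pattern name that matches the whole token, else 'unknown'."""
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(token):
            return name
    return "unknown"


print(classify("AA5683"))
print(classify("1234-BCD"))
print(classify("12345678-Z"))
```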

Other expression object attributes

| Name | Description |
| --- | --- |
| form | Form of the expression. |
| type | Type of expression (default: unknown). |
| inip | Initial position of the expression. |
| endp | End position of the expression. |

Quotation object

Quotation object attributes

| Name | Description |
| --- | --- |
| form | Content of the quote as it appears in the text. |
| who | Who the quote is attributed to. It has two fields: form and lemma. |
| verb | Verb associated with the quotation. It has two fields: form and lemma. |
| inip | Initial position of the expression. |
| endp | End position of the expression. |

Quotations in direct speech will not always include information regarding who they are attributed to; in those cases the fields who and verb will not appear.

Relation object

The syntactic triples will be defined by subject-verb pairs, and all the complements associated to that verb. There are two possible exceptions to this:

  • Cases where the existing relation has an omitted verb (for example, appositions). In this case, the verb is assumed to be "to be" (or its equivalent, depending on the language), and its form will appear between parentheses.
  • Cases where the subject is omitted (very common in some languages such as Spanish). In this case, the subject will not appear.

Relation object attributes

| Name | Description |
| --- | --- |
| form | Sentence in which the relation appears. |
| inip | Initial position of the sentence the relation appears in. |
| endp | End position of the sentence the relation appears in. |
| subject | Subject of the relation. When the subject is an anaphora, the anaphora is resolved and the details shown are those of the element that resolves it. |
| subject.form | How it appears in the text. |
| subject.lemma_list | List of lemmas of the element. Coordinated elements by definition don't have a lemma, so the field will not appear. |
| subject.sense_id_list | id associated with the entity or concept the subject refers to. |
| verb | Verb of the relation. |
| verb.form | How it appears in the text. |
| verb.lemma_list | List of lemmas of the verb. |
| verb.sense_id_list | id associated with the verb. |
| verb.semantic_lemma_list | List of semantic lemmas associated with the verb. It is only included when its values differ from the ones in lemma_list. |
| complement_list | List of complements of the verb. |
| complement_list[].form | How it appears in the text. Anaphoras are resolved to obtain this value. |
| complement_list[].type | Type of complement. The different types of syntactic relations detected are included in the response of the Lemmatization, PoS and Parsing, specifically in the section regarding syntactic_tree_relation elements. |
| degree | Degree of proximity of the relation, that is, whether the relation is in the same sentence as the subject (when an anaphora has been resolved, it won't be). |

If a subject-verb pair appears several times in the same text, it will appear only once, associated with the sentence in which it first appears; the complements of the later appearances will be included in the complement_list of that relation.
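
A client can flatten relation objects into simple triples along these lines. The sketch assumes only the fields listed in the table above; the function name relation_triples and the sample relations are hypothetical, and a missing subject is reported as None.

```python
def relation_triples(relation_list):
    """Return (subject form, verb form, [complement forms]) for each relation."""
    triples = []
    for rel in relation_list:
        # The subject is absent when it is omitted in the text.
        subject = rel.get("subject", {}).get("form")
        verb = rel["verb"]["form"]
        complements = [c["form"] for c in rel.get("complement_list", [])]
        triples.append((subject, verb, complements))
    return triples


# Hypothetical relation objects with made-up values.
relations = [
    {
        "subject": {"form": "The child"},
        "verb": {"form": "said"},
        "complement_list": [{"form": "that his brother was at Harvard University"}],
    },
    {"verb": {"form": "(is)"}, "complement_list": []},
]
for triple in relation_triples(relations):
    print(triple)
```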

Response examples

The format in which this information is shown depends on the value of the output format parameter, of.

Arsene Wenger’s side sit third in the Premier League for the first time since September 22.

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "entity_list": [
    {...},
    {...}
  ],
  "time_expression_list": [
    {...},
    {...}
  ]
}

A thousand dollars could be spent trying to tackle a parking problem.

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "concept_list": [
    {...},
    {...}
  ],
  "money_expression_list": [
    {...}
  ]
}

To cancel your flight, go to our web site www.example.com. If you do not see the option to revoke your flight online, call at 1 877 781 3229 to cancel your flight giving us your flight number (e.g: AA5683). Cancellations can be done until twenty four hours before flight

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "entity_list": [
    {...},
    {...}
  ],
  "quantity_expression_list": [
    {...}
  ],
  "other_expression_list": [
    {...}
  ]
}

The child said that his brother was at Harvard University.

{
  "status": {
    "code": "0",
    "msg": "OK",
    "credits": "1"
  },
  "quotation_list": [
    {...}
  ],
  "relation_list": [
    {...},
    {...}
  ]
}