Topics Extraction 2.0 Documentation


Response

Sample response:

{9 items
"status":{3 items
"code":"0"
"msg":"OK"
"credits":"1"
}
"entity_list":[7 items
0:{5 items
"form":"Robert Downey Jr"
"id":"__12123288058840445720"
"sementity":{...
}3 items
"variant_list":[...
]1 item
"relevance":"100"
}
1:{8 items
"form":"Forbes"
"id":"db0f9829ff"
"sementity":{...
}4 items
"semgeo_list":[...
]1 item
"semld_list":[...
]28 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
2:{7 items
"form":"Iron Man"
"id":"529e97f38e"
"sementity":{...
}4 items
"semld_list":[...
]28 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
3:{5 items
"form":"Dwayne Johnson"
"id":"__4280586672389134159"
"sementity":{...
}3 items
"variant_list":[...
]1 item
"relevance":"100"
}
4:{9 items
"form":"Bradley Cooper"
"official_form":"Bradley Charles Cooper"
"id":"3e7c9ae34b"
"sementity":{...
}4 items
"semgeo_list":[...
]1 item
"semld_list":[...
]28 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
5:{7 items
"form":"Chris Hemsworth"
"id":"b2e6c3b771"
"sementity":{...
}4 items
"semld_list":[...
]28 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
6:{7 items
"form":"Leonardo DiCaprio"
"id":"8119b88b6d"
"sementity":{...
}4 items
"semld_list":[...
]43 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
]
"concept_list":[8 items
0:{6 items
"form":"magazine"
"id":"a0a1a5401f"
"sementity":{...
}4 items
"semld_list":[...
]28 items
"variant_list":[...
]1 item
"relevance":"100"
}
1:{7 items
"form":"actor"
"id":"99e6d7a3f6"
"sementity":{...
}4 items
"semld_list":[...
]28 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
2:{7 items
"form":"star"
"id":"35d8a8e65d"
"sementity":{...
}4 items
"semld_list":[...
]1 item
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
3:{7 items
"form":"star"
"id":"c5994b45cc"
"sementity":{...
}4 items
"semld_list":[...
]28 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
4:{6 items
"form":"avenger"
"id":"65fdadcbff"
"sementity":{...
}4 items
"semld_list":[...
]17 items
"variant_list":[...
]1 item
"relevance":"100"
}
5:{7 items
"form":"film"
"id":"4e7e3490af"
"sementity":{...
}4 items
"semld_list":[...
]29 items
"semtheme_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
6:{10 items
"form":"dollar"
"official_form":"United States dollar"
"id":"7b6858c50a"
"sementity":{...
}4 items
"semgeo_list":[...
]1 item
"semld_list":[...
]2 items
"semtheme_list":[...
]1 item
"standard_list":[...
]1 item
"variant_list":[...
]1 item
"relevance":"100"
}
7:{6 items
"form":"opponent"
"id":"d556261ad1"
"sementity":{...
}4 items
"semld_list":[...
]4 items
"variant_list":[...
]1 item
"relevance":"100"
}
]
"time_expression_list":[]0 items
"money_expression_list":[1 item
0:{6 items
"form":"$75m"
"amount_form":"75m"
"numeric_value":"7.5e+07"
"currency":"USD"
"inip":"189"
"endp":"192"
}
]
"quantity_expression_list":[]0 items
"other_expression_list":[]0 items
"quotation_list":[]0 items
"relation_list":[3 items
0:{7 items
"form":"The 49-year-old star of the Iron Man and Avengers films made an estimated $75m over the past year, beating rivals Dwayne Johnson, Bradley Cooper, Chris Hemsworth and Leonardo DiCaprio."
"inip":"115"
"endp":"297"
"subject":{...
}3 items
"verb":{...
}3 items
"complement_list":[]0 items
"degree":"1"
}
1:{7 items
"form":"Robert Downey Jr has topped Forbes magazine's annual list of the highest paid actors for the second year in a row."
"inip":"0"
"endp":"112"
"subject":{...
}3 items
"verb":{...
}2 items
"complement_list":[...
]2 items
"degree":"1"
}
2:{7 items
"form":"The 49-year-old star of the Iron Man and Avengers films made an estimated $75m over the past year, beating rivals Dwayne Johnson, Bradley Cooper, Chris Hemsworth and Leonardo DiCaprio."
"inip":"115"
"endp":"297"
"subject":{...
}3 items
"verb":{...
}3 items
"complement_list":[...
]1 item
"degree":"1"
}
]
}

Response object:

Name	Description
`status`	Describes the request outcome in terms of success or failure.
`status`.`code`	Numerical value of result code. Refer to the error code catalog.
`status`.`msg`	Human-readable error code, if any, or `OK`.
`status`.`credits`	Credits consumed by the request. A credit corresponds to a bucket of 500 words. Only successful requests consume credits.
`status`.`remaining_credits`	Credits left to reach the usage limit.
`entity_list`	Contains the named entities found in the text, represented as entity objects.
`concept_list`	Contains the concepts found in the text, represented as concept objects.
`time_expression_list`	Contains the time expressions found in the text, represented as time_expression objects.
`money_expression_list`	Contains the money expressions found in the text, represented as money_expression objects.
`quantity_expression_list`	[beta] Contains the quantity expressions found in the text, represented as quantity_expression objects.
`other_expression_list`	Contains the unknown alphanumeric patterns found in the text, represented as other_expression objects.
`quotation_list`	Contains the quotations found in the text, represented as quotation objects.
`relation_list`	Contains the syntactic triples (subject-action-object) found in the text, represented as relation objects.
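
As a minimal sketch of how a client might walk this object, the snippet below checks `status` and collects the `form` of every item in each list. The helper name `summarize` and the abridged `response` dictionary are illustrative assumptions; only the field names come from the table above.

```python
# Illustrative response fragment shaped like the sample above (values abridged).
response = {
    "status": {"code": "0", "msg": "OK", "credits": "1"},
    "entity_list": [{"form": "Robert Downey Jr", "relevance": "100"}],
    "concept_list": [{"form": "actor", "relevance": "100"}],
    "money_expression_list": [{"form": "$75m", "currency": "USD"}],
}

# Top-level lists named in the response object table.
LIST_FIELDS = (
    "entity_list", "concept_list", "time_expression_list",
    "money_expression_list", "quantity_expression_list",
    "other_expression_list", "quotation_list", "relation_list",
)


def summarize(resp):
    """Return {list_name: [form, ...]} for a successful response (hypothetical helper)."""
    status = resp["status"]
    # In the sample, the code is the string "0" and accompanies the message "OK".
    if status["code"] != "0":
        raise RuntimeError(f"request failed: {status['code']} {status['msg']}")
    # A list may be missing or empty, so fall back to an empty list.
    return {name: [item["form"] for item in resp.get(name, [])]
            for name in LIST_FIELDS}


summary = summarize(response)
print(summary["entity_list"])
```

Treating a missing list the same as an empty one keeps the calling code uniform across responses that return different subsets of topics.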

Entity/Concept object

Both entities and concepts have the same basic structure, even if some of the specific values found in each field are different. In the following explanation, "element" refers to both entity and concept objects.

Each element found is a node in our ontology. There are two types of information associated with each element:

Basic element information, that is, the information specific to the element found: which element it is (form), how many times and in which forms it appears in the text (variant_list), its global relevance, whether it belongs to a specific dictionary and, when it is a known element, its unique identifier in the ontology (id) and known standards (standard_list).
Semantic information, or the different nodes to which the element node is related. There are several aspects of semantic information: type of entity (sementity), geographical and thematic information (semgeo_list and semtheme_list), and other, more generic references (semrefer_list).

sementity is the only mandatory semantic aspect of an element, as it is tied to the sense of the element found, and each sense translates into an entity/concept object in the output. In terms of the ontology, sementity contains information from the node in the ODENTITY_TOP branch to which the element node is related. For example, London has two senses, last name and city, so in a scenario with no disambiguation two entities will be found, each with a different sementity object: one with the id ODENTITY_LAST_NAME and the other with the id ODENTITY_CITY.

The sementity element contains a field called type with the expanded hierarchy of the entity type, which gives a much more intuitive grasp of the sense associated with the element. Each level of the hierarchy follows a more user-friendly notation than the node names seen so far: the entity type id loses the prefix ODENTITY_, the underscores are removed and the capitalization follows the upper CamelCase style. Using the previous examples:

ODENTITY_CITY -- City
ODENTITY_LAST_NAME -- LastName

sementity also includes an attribute called class, which indicates whether the element in question is an instance of the entity type or a class. For an entity, this value will always be instance, as a named entity is always an example of the class that the sementity node represents. Elements with class=class will appear as concept objects.

semtheme_list is made up of semtheme objects. semtheme is quite similar to sementity, but instead of referring to the entity type (a node in the ODENTITY_TOP branch of the ontology), it points to the theme or themes the node belongs to (a node in the ODTHEME_TOP branch of the ontology). semtheme also contains a type field with the expanded version of the hierarchy; it follows the same pattern described for sementity, but with ODTHEME_ as prefix:

ODTHEME_BASIC_SCIENCES -- BasicSciences
ODTHEME_MYTHOLOGY -- Mythology

There will be as many semtheme elements as themes the node relates to.
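
The naming convention used in the type fields of sementity and semtheme can be sketched as follows. The function name `friendly_type_name` is an assumption made for illustration; the conversion rule (drop the prefix, remove underscores, upper CamelCase) is the one described above.

```python
def friendly_type_name(node_id):
    """Convert a class node name such as 'ODENTITY_LAST_NAME' to 'LastName'."""
    # Drop whichever of the two documented prefixes is present.
    for prefix in ("ODENTITY_", "ODTHEME_"):
        if node_id.startswith(prefix):
            node_id = node_id[len(prefix):]
            break
    # Remove underscores and capitalize each part (upper CamelCase).
    return "".join(part.capitalize() for part in node_id.split("_"))


print(friendly_type_name("ODENTITY_CITY"))           # City
print(friendly_type_name("ODENTITY_LAST_NAME"))      # LastName
print(friendly_type_name("ODTHEME_BASIC_SCIENCES"))  # BasicSciences
```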

Both sementity and semtheme always refer to class nodes. The rest of the semantic information associated with the entity refers to instance nodes. The main difference this makes in the output is that classes are identified by their name (e.g. ODENTITY_CITY), while instances are referred to by a unique alphanumeric string that unambiguously identifies the node in the ontology (id).

Like sementity and semtheme, semgeo (each element contained in semgeo_list) provides information on the node's hierarchy, although in this case the hierarchy follows geopolitical criteria. Instead of including the values in a single field, and taking into account that some cases may involve multiple inheritance (for instance, a mountain chain that belongs to two different countries), there will be a specific object for each level, identified by its form and its node id.

semrefer_list contains other references between the entity/concept node and other instances in the ontology. There are currently two types: organization, which links to an instance of the ODENTITY_ORGANIZATION type (or its descendants), and affinity, which shows an affinity relationship between the entity node and another instance in the ontology. Each object in semrefer is represented by its form and its node id.

The last field included in an entity/concept, semld, is a mix of the two types of information described: it contains information specific to the node, but that information consists of links to external ontologies such as SUMO, Wikipedia or YAGO.

The following table contains the fields that will appear in entity and concept objects.

Entity/Concept object attributes

Name	Description
`form`	Form of the entity, in the language specified by `ilang`.
`official_form`	Official form of the entity, like `United States` vs `United States of America`, in the language specified by `ilang`.
`dictionary`	User dictionary name where the entity is found.
`id`	Alphanumeric string that uniquely identifies the entity. This ID will correspond to the entity senseID in resources (which includes user dictionaries). If the entity is not in any of the resources but has been detected in the analysis, the ID will be specifically created for that analysis and will begin with two underscores.
`sementity`	Describes the entity.
`sementity`.`class`	Contains the fixed value `instance` for entities.
`sementity`.`fiction`	Contains the value `fiction` for fictional elements or `nonfiction` for non-fictional.
`sementity`.`id`	Identifier of the node associated to the entity type.
`sementity`.`type`	Provides a more user-friendly notation for the type classification hierarchy of the entity. It will start with the highest node (Top) and each level will be separated by `>`. Top will always appear.
`sementity`.`confidence`	It will use the values `unknown` and `uncertain` to denote entity types inferred from heuristic rules and ambiguous classifications, respectively.
`semgeo_list`	Geographical information the entity is associated to.
`semgeo_list[]`.`continent`	Continent-level information in the geographical hierarchy, represented as semgeo objects.
`semgeo_list[]`.`country`	Country-level information in the geographical hierarchy, represented as semgeo objects.
`semgeo_list[]`.`adm1`	adm1-level information in the geographical hierarchy, represented as semgeo objects.
`semgeo_list[]`.`adm2`	adm2-level information in the geographical hierarchy, represented as semgeo objects.
`semgeo_list[]`.`adm3`	adm3-level information in the geographical hierarchy, represented as semgeo objects.
`semgeo_list[]`.`city`	City-level information in the geographical hierarchy, represented as semgeo objects.
`semgeo_list[]`.`district`	District-level information in the geographical hierarchy, represented as semgeo objects.
`semld_list`	Provides a list of gateways to different open data sources. These gateways will be provided in two different formats: through a link or by providing an identifier to access the information. Refer to semld gateways to learn more.
`semrefer_list`	Includes references to other nodes in the ontology (instance type nodes) represented by semrefer objects.
`semtheme_list`	List of the thematic classifications, represented by semtheme objects.
`standard_list`	List of international standards relevant to the sense associated to the element, represented by standard objects.
`variant_list`	Alternative appearances of the entity/concept in the text, represented by variant objects.
`relevance`	Relative relevance of the entity in the text compared to the other entities found.
`subentity_list`	This element is composed of `subentity` elements that have exactly the same structure as `entity`. It applies only to `entity` objects.
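
Two details from this table lend themselves to a short sketch: ids created for a single analysis begin with two underscores, and `sementity`.`type` separates hierarchy levels with `>`. The helper names below are hypothetical, and the sample type string is an illustrative value rather than output copied from the API.

```python
def is_analysis_specific_id(entity_id):
    """True when the id was created for this analysis (begins with two underscores)."""
    return entity_id.startswith("__")


def type_levels(sementity_type):
    """Split a sementity type such as 'Top>Person' into its hierarchy levels."""
    return sementity_type.split(">")


# Ids taken from the sample response at the top of this page.
print(is_analysis_specific_id("__12123288058840445720"))  # True
print(is_analysis_specific_id("db0f9829ff"))              # False

# Hypothetical type value, used only to show the splitting.
print(type_levels("Top>Location>GeoPoliticalEntity>City"))
```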

Semgeo object

The geographical information extracted from the text is represented as semgeo objects with the following structure:

Semgeo object attributes

Name	Description
`form`	Form of the country, city, etc.
`id`	Identifier of the node associated to the country, city, etc.
`standard_list`	Contains standard code names of the entity
`standard_list[]`.`id`	Name of the standard
`standard_list[]`.`value`	Name of the country, city, etc. in the given standard

Semld gateways

The following table includes the gateways associated to an identifier, and how to use it:

Source	Format	How to use it
SUMO	sumo:xxxxx	http://sigma-01.cim3.net:8080/sigma/Browse.jsp?kb=SUMO&term=xxxxx
Twitter	@xxxxx	http://twitter.com/xxxxx
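
As a rough illustration of the table above, the sketch below expands an identifier into its gateway URL. The function name `semld_gateway_url` is hypothetical, the sample identifiers are placeholders, and the assumption that link-style gateways start with `http` is ours.

```python
SUMO_BASE = "http://sigma-01.cim3.net:8080/sigma/Browse.jsp?kb=SUMO&term="
TWITTER_BASE = "http://twitter.com/"


def semld_gateway_url(identifier):
    """Return a browsable URL for a semld identifier, or None if the format is unknown."""
    if identifier.startswith("sumo:"):
        return SUMO_BASE + identifier[len("sumo:"):]
    if identifier.startswith("@"):
        return TWITTER_BASE + identifier[1:]
    if identifier.startswith(("http://", "https://")):
        # Assumed: gateways already given as links need no expansion.
        return identifier
    return None


# Placeholder identifiers, not real API output.
print(semld_gateway_url("sumo:City"))
print(semld_gateway_url("@example"))
```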

Semrefer object

Semrefer object attributes

Name	Description
`organization`	Organizational relationships with the node specified through the subattributes `form` and `id`. An example of this type of relationship would be a company and its subsidiary.
`organization`.`form`	Form of the organization.
`organization`.`id`	Identifier of the node that represents the organization.
`affinity`	Affinity relationships with the node specified through the subattributes `form` and `id`.
`affinity`.`form`	Form of the related entity.
`affinity`.`id`	Identifier of the node that represents the related entity.

Semtheme object

Semtheme object attributes

Name	Description
`id`	Identifier of the node associated to the theme the entity belongs to.
`type`	Provides a more user-friendly name for all the levels of the theme classification hierarchy. It will start with the highest node (Top) and each level will be separated by `>`.

Standard object

Standard object attributes

Name	Description
`id`	Identifier of the standard.
`value`	Specific value in the standard.

For example, the ISO3166-1 standard for countries is identified as ISO3166-1-a2 when it refers to the two-letter country code and as ISO3166-1-a3 for the three-letter code. NYSE identifies the ticker of a company that trades on the New York Stock Exchange.

These are all the values that may appear in id:

ID	Description
ISO3166-1-a2, ISO3166-1-a3	Country codes
BEL20, BMAD, BUENOSAIRES, BVL, CAC_40, CARACAS, CORROELECTRONICO, DAX_30, EURO_STOXX50, Euronext, FTSE_100, FTSE_LATIBEX, IBEX35, LSE, LuxSE, MAB, MEXICO, MIB, NASDAQ, NYSE, OMXH25, OMXS30, SANTIAGO, SMI, SP100	Stock exchanges
ISO4217	Currency codes
ISO639-1, ISO639-2, ISO639-3, ISO639-5	Language codes
ISO8601	Date standard
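
A minimal sketch of how a `standard_list` might be queried by standard id follows. The helper name `standard_value` and the contents of the sample list are assumptions for illustration; only the `id` and `value` field names come from the table above.

```python
def standard_value(standard_list, standard_id):
    """Return the value for the given standard id, or None when it is not present."""
    for standard in standard_list:
        if standard["id"] == standard_id:
            return standard["value"]
    return None


# Hypothetical standard_list, shaped like the standard objects described above.
standards = [
    {"id": "ISO3166-1-a2", "value": "US"},
    {"id": "ISO3166-1-a3", "value": "USA"},
]

print(standard_value(standards, "ISO3166-1-a3"))
print(standard_value(standards, "NYSE"))
```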

Variant object

Variant object attributes

Name	Description
`form`	The exact form found in the text
`inip`	The initial position of the appearance
`endp`	The final position of the appearance
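
The positions can be used to recover the matching span from the analyzed text. In the sample response, `$75m` has `inip` 189 and `endp` 192, a four-character span, which suggests that `endp` is inclusive; the sketch below relies on that reading. The helper name `span_text` is hypothetical, and the text is reconstructed from the two sentences in the sample, assuming a single space between them.

```python
def span_text(text, inip, endp):
    """Return the substring between inip and endp, treating endp as inclusive."""
    # Positions are returned as strings in the JSON output.
    return text[int(inip):int(endp) + 1]


# Text reconstructed from the sample response (single space between sentences assumed).
text = ("Robert Downey Jr has topped Forbes magazine's annual list of the "
        "highest paid actors for the second year in a row. "
        "The 49-year-old star of the Iron Man and Avengers films made an "
        "estimated $75m over the past year, beating rivals Dwayne Johnson, "
        "Bradley Cooper, Chris Hemsworth and Leonardo DiCaprio.")

print(span_text(text, "189", "192"))  # $75m
```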

Time expression object

For time expressions that refer to a specific date, the following format will be used to represent its associated value:

These are the values each field may have:

century, year, month, day, hour, minutes, seconds: numeric values
era: after Christ (aC), before Christ (dC)
season: spring (s), summer (v), autumn (a), winter (w)
weekday: Monday (m), Tuesday (t), Wednesday (w), Thursday (j), Friday (f), Saturday (s), Sunday (d)
timezone: must be specified either by using the standard timezones designations (CET, EST, etc.) or with the offset with respect to GMT, e.g.: GMT+02:00
+/- indicate references after/before the returned value (e.g. +2 days)
~ indicates approximate values

If an expression has no value for one of the positions, it will be empty.

These would be some examples of how this would look:

It's 7:30 in the evening -- |||||||19|30||
27th February at 3pm -- |||||2|27|15|||
5th june 2008 -- 21||||2008|6|5||||
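
The pipe-separated values in these examples can be split into positions as sketched below. The mapping from position to field name is an assumption inferred only from the three examples above (century, year, month, day, hour and minutes); the remaining positions are left unnamed rather than guessed, and `parse_normalized_form` is a hypothetical helper.

```python
# 0-based positions that the three examples allow us to infer (assumption).
INFERRED_POSITIONS = {
    0: "century", 4: "year", 5: "month", 6: "day", 7: "hour", 8: "minutes",
}


def parse_normalized_form(value):
    """Return the non-empty positions of a normalized time expression."""
    fields = value.split("|")
    return {INFERRED_POSITIONS.get(index, f"position_{index}"): field
            for index, field in enumerate(fields) if field}


print(parse_normalized_form("|||||||19|30||"))
print(parse_normalized_form("|||||2|27|15|||"))
print(parse_normalized_form("21||||2008|6|5||||"))
```

Empty positions are dropped, which mirrors the rule that a position with no value is left empty in the normalized form.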

This representation of the time will be used to calculate the value in actual_time, which will use as reference timeref and will return a date value in one of the following three formats: YYYY-MM-DD hh:mm:ss GMT±HH:MM, YYYY-MM-DD and hh:mm:ss GMT±HH:MM. For the examples seen and using as reference 2013-01-01 12:12:12 GMT+01:00, the result would be:

It's 7:30 in the evening -- 19:30:00 GMT+01:00
27th February at 3pm -- 2013-02-27 15:00:00 GMT+01:00
5th june 2008 -- 2008-06-05

In some cases, actual_time returns values that are not certain (for example, minutes and seconds in the second example), so a precision value is added to filter these out. The values for precision are the positions of the normalized_form field plus hourAMPM, minutesAMPM and secondsAMPM. This will result in obtaining different objects for it's 7:30 and it's 7:30 in the evening.

Time expression object attributes

Name	Description
`form`	Form of the time expression.
`normalized_form`	Normalized form associated to the time expression.
`actual_time`	Actual time relative to the given time reference, based on the normalized form.
`precision`	Level of precision for actual_time.
`inip`	Initial position of the time expression.
`endp`	End position of the time expression.

Money expression object

List of money expressions found in the text, represented as money_expression objects.

A money expression is detected when there is both a currency and an amount in a valid structure. The currency is expressed using ISO4217 and, when more than one currency may apply, all the possible values are returned separated by | and ordered alphabetically.

Money expression object attributes

Name	Description
`form`	Form of money expression.
`amount_form`	Amount associated to the money expression as it appears in the text.
`numeric_value`	Equivalent numeric value of the amount of money.
`currency`	ISO4217 value associated to the currency in the money expression. Different values are separated by the character `|`.
`inip`	Initial position of the money expression.
`endp`	End position of the money expression.
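
A short sketch of reading a money expression: `numeric_value` is converted to a number and `currency` is split on `|` to cover the ambiguous case. The helper name `parse_money` is hypothetical; the first object is the one from the sample response, and the ambiguous currency string is an invented illustration.

```python
def parse_money(expression):
    """Return the numeric amount and the list of candidate currencies."""
    return {
        "amount": float(expression["numeric_value"]),
        "currencies": expression["currency"].split("|"),
    }


# Money expression from the sample response at the top of this page.
sample = {"form": "$75m", "amount_form": "75m", "numeric_value": "7.5e+07",
          "currency": "USD", "inip": "189", "endp": "192"}
print(parse_money(sample))  # {'amount': 75000000.0, 'currencies': ['USD']}

# Hypothetical ambiguous case: several currencies, alphabetically ordered.
print(parse_money({"numeric_value": "1000", "currency": "AUD|CAD|USD"}))
```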

Other expression object

Some specific patterns will be considered known ones, and identified as such through the field type. The patterns detected are the following:

Spanish:

bank account number: 20 digits with the format xxxx xxxx xx xxxxxxxxxx
license plate: Spanish license plate with two formats: dddd-LLL and ddddLLL (where d are digits and L are capital letters)
id: national id document: ddddddddL or dddddddd-L (with d digits, L a capital letter)

All languages:

flight number: detects flight numbers with the format LLdddd (where d are digits and L are capital letters)
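
The formats listed above can be approximated on the client side with regular expressions, as in the sketch below. This is not the matching logic used by the API; the dictionary keys are hypothetical labels (the actual values of the `type` field are not listed here), and the sample strings other than the flight number are invented.

```python
import re

# Client-side approximations of the documented formats (d = digit, L = capital letter).
PATTERNS = {
    "flight_number": re.compile(r"\b[A-Z]{2}\d{4}\b"),              # LLdddd
    "license_plate": re.compile(r"\b\d{4}-?[A-Z]{3}\b"),            # dddd-LLL or ddddLLL
    "national_id": re.compile(r"\b\d{8}-?[A-Z]\b"),                 # ddddddddL or dddddddd-L
    "bank_account": re.compile(r"\b\d{4} \d{4} \d{2} \d{10}\b"),    # xxxx xxxx xx xxxxxxxxxx
}


def find_patterns(text):
    """Return every match of each pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


found = find_patterns("Give us your flight number (e.g: AA5683) and plate 1234-BCD.")
print(found["flight_number"])  # ['AA5683']
print(found["license_plate"])  # ['1234-BCD']
```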

Other expression object attributes

Name	Description
`form`	Form of expression.
`type`	Type of expression (default: unknown)
`inip`	Initial position of the expression.
`endp`	End position of the expression.

Quotation object

Quotation object attributes

Name	Description
`form`	Content of the quote as it appears in the text.
`who`	Who the quote is attributed to. It will have two fields, the form, and the lemma.
`verb`	Verb associated to the quotation. It will have two fields, the form, and the lemma.
`inip`	Initial position of the expression.
`endp`	End position of the expression.

Quotations in direct speech will not always include information regarding who they are attributed to; in those cases the fields who and verb will not appear.
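
Because `who` and `verb` may be absent, client code should not assume they exist. The sketch below shows one way to handle both cases; the helper name `describe_quotation` and the sample objects are hypothetical.

```python
def describe_quotation(quotation):
    """Format a quotation, falling back when who/verb are missing."""
    who = quotation.get("who", {}).get("form")
    verb = quotation.get("verb", {}).get("form")
    if who and verb:
        return f'{who} {verb}: "{quotation["form"]}"'
    return f'(unattributed): "{quotation["form"]}"'


# Hypothetical quotation objects shaped like the table above.
attributed = {
    "form": "his brother was at Harvard University",
    "who": {"form": "The child", "lemma": "child"},
    "verb": {"form": "said", "lemma": "say"},
}
print(describe_quotation(attributed))
print(describe_quotation({"form": "We will be back"}))
```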

Relation object

The syntactic triples will be defined by subject-verb pairs, and all the complements associated to that verb. There are two possible exceptions to this:

Cases where the relation has an omitted verb (for example, appositions). In this case, the verb is assumed to be "to be" (or its equivalent, depending on the language), and its form will appear between parentheses.
Cases where the subject is omitted (very common in some languages, such as Spanish), in which case the subject will not appear.

Relation object attributes

Name	Description
`form`	Sentence in which the relation appears.
`inip`	Initial position of the sentence the relation appears in.
`endp`	End position of the sentence the relation appears in.
`subject`	Subject of the relation. When the `subject` is an anaphora, the anaphora will be resolved and the details shown will be those of the element that resolves it.
`subject`.`form`	How it appears in the text.
`subject`.`lemma_list`	List of lemmas of the element. Coordinated elements by definition don't have a lemma, so the field will not appear.
`subject`.`sense_id_list`	`id` associated to the `entity` or `concept` the subject refers to.
`verb`	Verb of the relation.
`verb`.`form`	How it appears in the text.
`verb`.`lemma_list`	List of lemmas of the verb.
`verb`.`sense_id_list`	`id` associated to the verb.
`verb`.`semantic_lemma_list`	List of semantic lemmas associated to the verb. It will only be included when its values are different than the ones in `lemma_list`.
`complement_list`	List of complements of the verb.
`complement_list[]`.`form`	How it appears in the text. Anaphoras will be resolved to obtain this value.
`complement_list[]`.`type`	Type of complement. The different types of syntactic relations detected are included in the response of the Lemmatization, PoS and Parsing API, specifically in the section regarding `syntactic_tree_relation` elements.
`degree`	Degree of proximity of the relation, that is, whether the relation is in the same sentence as the `subject` (when an anaphora has been resolved, it won't be).

If a subject-verb pair appears several times in the same text, it will appear only once, associated with the sentence in which it first appears; the complements of later appearances will be included in the complement_list of that relation.
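
A relation object can be reduced to a simple triple as sketched below, taking into account that the subject may be omitted. The helper name `relation_triple` is hypothetical, and the subject, verb and complement values in the sample are invented for illustration, since the sample response on this page shows those objects collapsed.

```python
def relation_triple(relation):
    """Return (subject form or None, verb form, list of complement forms)."""
    # The subject is missing when it is omitted in the text.
    subject = relation.get("subject", {}).get("form")
    verb = relation["verb"]["form"]
    complements = [c["form"] for c in relation.get("complement_list", [])]
    return subject, verb, complements


# Hypothetical relation shaped like the table above.
relation = {
    "form": "Robert Downey Jr has topped Forbes magazine's annual list.",
    "subject": {"form": "Robert Downey Jr"},
    "verb": {"form": "has topped"},
    "complement_list": [{"form": "Forbes magazine's annual list"}],
    "degree": "1",
}
print(relation_triple(relation))

# Omitted subject: the first element of the triple is None.
print(relation_triple({"verb": {"form": "(is)"}, "complement_list": []}))
```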

Response examples

The format in which this information is shown depends on the value of the `of` parameter.

Arsene Wenger’s side sit third in the Premier League for the first time since September 22.

{3 items
"status":{3 items
"code":"0"
"msg":"OK"
"credits":"1"
}
"entity_list":[2 items
0:{...
}8 items
1:{...
}8 items
]
"time_expression_list":[2 items
0:{...
}3 items
1:{...
}6 items
]
}

A thousand dollars could be spent trying to tackle a parking problem.

{3 items
"status":{3 items
"code":"0"
"msg":"OK"
"credits":"1"
}
"concept_list":[2 items
0:{...
}11 items
1:{...
}7 items
]
"money_expression_list":[1 item
0:{...
}6 items
]
}

To cancel your flight, go to our web site www.example.com. If you do not see the option to revoke your flight online, call at 1 877 781 3229 to cancel your flight giving us your flight number (e.g: AA5683). Cancellations can be done until twenty four hours before flight

{4 items
"status":{3 items
"code":"0"
"msg":"OK"
"credits":"1"
}
"entity_list":[2 items
0:{...
}5 items
1:{...
}5 items
]
"quantity_expression_list":[1 item
0:{...
}6 items
]
"other_expression_list":[1 item
0:{...
}4 items
]
}

The child said that his brother was at Harvard University.

{3 items
"status":{3 items
"code":"0"
"msg":"OK"
"credits":"1"
}
"quotation_list":[1 item
0:{...
}7 items
]
"relation_list":[2 items
0:{...
}7 items
1:{...
}7 items
]
}
