If you are working with an on-premises installation, you will need to replace api.meaningcloud.com
with your own server address.
Content type: multipart/form-data
Name | Description | Values | Notes |
---|---|---|---|
key | Authorization key for using MeaningCloud services. Create an account for free to create your key. | | Required |
of | Output format. | json, xml | Optional. Default: json |
lang | Language of the text. | en: English, es: Spanish, it: Italian, fr: French, pt: Portuguese, ca: Catalan, da: Danish, sv: Swedish, no: Norwegian, fi: Finnish, zh: Chinese, ru: Russian, ar: Arabic | Required |
txt | One or more texts, one per line. Each text sent in this parameter is automatically assigned an ID used to identify it in the output. These IDs are numerical and start from 1. For mode=dg, more than one text must be sent. | UTF-8 encoded text (plain text, HTML or XML). | Required |
id | IDs associated with the input texts. Each ID must be on a separate line, and the number of IDs must be the same as the number of texts included in txt. | UTF-8 encoded text (plain text, HTML or XML). | Optional. Default: id="" |
mode | Approach used to carry out the clustering process. To read more about the possibilities, check the Clustering modes section. | tm: Topic Modeling (default), dg: Document Grouping | Optional. Default: mode="tm" |
sw | Stopwords to be ignored by the algorithm, both in the clustering process and as labels for the clusters. The valid format is one stopword per line (separated by a linefeed, "\n"). These stopwords are added to the ones used by default for the selected lang. | UTF-8 encoded. | Optional. Default: sw="" |
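
The line-based parameters in the table (txt, id and sw) are easy to get wrong when assembling a request by hand, so here is a minimal Python sketch of how the form fields could be put together. The helper names (`build_clustering_fields`, `send_clustering_request`) are hypothetical, and the full endpoint URL is not given in this section, so it is left as an argument for the caller to supply. Collapsing whitespace inside each text is an assumption made here to keep every text on one line. The sending step assumes the third-party `requests` library is installed.

```python
from typing import Dict, Optional, Sequence

VALID_MODES = {"tm", "dg"}


def _one_line(text: str) -> str:
    """Collapse all runs of whitespace (including line breaks) into single spaces."""
    return " ".join(text.split())


def build_clustering_fields(
    key: str,
    lang: str,
    texts: Sequence[str],
    ids: Optional[Sequence[str]] = None,
    mode: str = "tm",
    stopwords: Optional[Sequence[str]] = None,
    output_format: str = "json",
) -> Dict[str, str]:
    """Build the form fields described in the parameter table (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not texts:
        raise ValueError("txt is required: send at least one text")
    if mode == "dg" and len(texts) < 2:
        raise ValueError("mode=dg needs more than one text")
    if ids is not None and len(ids) != len(texts):
        raise ValueError("the number of IDs must match the number of texts")

    fields = {
        "key": key,
        "of": output_format,
        "lang": lang,
        "mode": mode,
        # One text per line, as the txt parameter requires.
        "txt": "\n".join(_one_line(t) for t in texts),
    }
    if ids:
        fields["id"] = "\n".join(ids)        # one ID per line
    if stopwords:
        fields["sw"] = "\n".join(stopwords)  # one stopword per line
    return fields


def send_clustering_request(url: str, fields: Dict[str, str]) -> dict:
    """POST the fields as multipart/form-data; `url` must be supplied by the caller."""
    import requests  # third-party dependency, imported lazily

    # Passing (None, value) tuples through `files` makes requests
    # encode the body as multipart/form-data without attaching files.
    multipart = {name: (None, value) for name, value in fields.items()}
    response = requests.post(url, files=multipart, timeout=30)
    response.raise_for_status()
    return response.json()  # assumes of=json
```

Validating locally before sending (for example, rejecting mode=dg with a single text) surfaces mistakes without spending a request.
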
The current clustering modes available are the following:

Topic modeling (mode=tm): labels each text sent in the txt parameter with the n-gram that is most representative of its meaning. This changes the pipeline found in classical clustering algorithms, as it selects the representative labels before grouping the texts. This approach helps to discover hidden themes in document collections and provides more descriptive labels than classical clustering algorithms. Cluster assignment is not exclusive (a text can belong to more than one cluster), and there is always a default cluster called Other Topics with the texts that do not belong to any other cluster.

Document grouping (mode=dg): the main difference from topic modeling is that cluster assignment is exclusive, that is, a text can only be assigned to a single cluster. In this case, labels are not as descriptive; they are composed of a collection of terms that describe the documents assigned to the cluster. For large collections, the label will be a single term.

So, which one to choose? It will depend on your use case, but the main factors to take into account are that topic modeling gives more descriptive labels and more weight to outliers in the collection, while document grouping is the only one that provides exclusive clustering.
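
To make the exclusive versus non-exclusive distinction concrete, here is a small Python sketch that checks whether any text appears in more than one cluster. The mapping from cluster label to text IDs is a hypothetical structure chosen for illustration, not the API's response format, and the sample labels and IDs are made up.

```python
from collections import Counter
from typing import List, Mapping, Sequence


def texts_in_multiple_clusters(clusters: Mapping[str, Sequence[str]]) -> List[str]:
    """Return the sorted IDs of texts assigned to more than one cluster."""
    counts = Counter(
        text_id
        for members in clusters.values()
        for text_id in set(members)  # count each cluster at most once per text
    )
    return sorted(text_id for text_id, n in counts.items() if n > 1)


def is_exclusive(clusters: Mapping[str, Sequence[str]]) -> bool:
    """True when every text belongs to at most one cluster."""
    return not texts_in_multiple_clusters(clusters)


# Hypothetical assignments: text "2" sits in two clusters, as topic modeling allows.
topic_style = {
    "solar panels": ["1", "2"],
    "energy prices": ["2", "3"],
    "Other Topics": ["4"],
}
# Hypothetical assignments where each text has a single cluster, as in document grouping.
grouping_style = {
    "solar energy": ["1", "2"],
    "prices": ["3", "4"],
}

print(texts_in_multiple_clusters(topic_style))  # ['2']
print(is_exclusive(grouping_style))             # True
```

If downstream code assumes one cluster per text (for example, when writing a single category column), a check like this shows whether that assumption holds for the chosen mode.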