Request

Endpoint:

POST
https://api.meaningcloud.com/clustering-1.1


If you are working with an on-premises installation, replace api.meaningcloud.com with your own server address.
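
As a small illustration of the substitution described above, the sketch below swaps the host in the endpoint URL. The function name `endpoint_for` and the example host `nlp.example.internal` are hypothetical, and the sketch assumes an on-premises server keeps the same scheme and path, which may not hold for every installation.

```python
from urllib.parse import urlsplit, urlunsplit

# Public endpoint, as given in this section.
PUBLIC_ENDPOINT = "https://api.meaningcloud.com/clustering-1.1"


def endpoint_for(server=None):
    """Return the clustering endpoint, optionally pointing at another server.

    `server` is a host name (optionally with a port). Keeping the scheme and
    path unchanged is an assumption made for this sketch.
    """
    if server is None:
        return PUBLIC_ENDPOINT
    parts = urlsplit(PUBLIC_ENDPOINT)
    # Only the network location changes; scheme and path stay as they are.
    return urlunsplit(parts._replace(netloc=server))


print(endpoint_for())
print(endpoint_for("nlp.example.internal"))  # hypothetical on-premises host
```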

Content-Type:

multipart/form-data

Parameters:

key
    Description: Authorization key for using MeaningCloud services. Create an account for free to create your key.
    Notes: Required

of
    Description: Output format.
    Values: json, xml
    Notes: Optional. Default: json

lang
    Description: Language of the texts.
    Values:
        en: English
        es: Spanish
        it: Italian
        fr: French
        pt: Portuguese
        ca: Catalan
        da: Danish
        sv: Swedish
        no: Norwegian
        fi: Finnish
        zh: Chinese
        ru: Russian
        ar: Arabic
    Notes: Required

txt
    Description: One or more texts, one per line. Each text sent in this parameter is automatically assigned a numeric ID, starting from 1, that identifies it in the output. For mode=dg, more than one text must be sent.
    Values: UTF-8 encoded text (plain text, HTML or XML).
    Notes: Required

id
    Description: IDs associated with the input texts. Each ID must be on its own line, and the number of IDs must match the number of texts in txt.
    Values: UTF-8 encoded text (plain text, HTML or XML).
    Notes: Optional. Default: id=""

mode
    Description: Approach used to carry out the clustering process. See the Clustering modes section for details.
    Values:
        tm: Topic Modeling (default)
        dg: Document Grouping
    Notes: Optional. Default: mode="tm"

sw
    Description: Stopwords to be ignored by the algorithm, both in the clustering process and as labels for the clusters. Send one stopword per line (separated by a linefeed, "\n"). These stopwords are added to the ones used by default for the selected lang.
    Values: UTF-8 encoded.
    Notes: Optional. Default: sw=""
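
The parameters above can be put together into a request along the following lines. This is a minimal sketch using only the Python standard library, not an official client: the function names (`build_fields`, `encode_multipart`, `send`) are hypothetical, the key is a placeholder, and collapsing line breaks inside a text so that each text stays on one line is a choice made for this sketch.

```python
import urllib.request
import uuid

ENDPOINT = "https://api.meaningcloud.com/clustering-1.1"


def build_fields(key, lang, texts, ids=None, mode="tm", stopwords=None, of="json"):
    """Assemble the form fields described in the parameter list."""
    if mode not in ("tm", "dg"):
        raise ValueError("mode must be 'tm' or 'dg'")
    if mode == "dg" and len(texts) < 2:
        raise ValueError("mode=dg needs more than one text")
    if ids is not None and len(ids) != len(texts):
        raise ValueError("ids must contain one entry per text")

    # txt expects one text per line, so whitespace runs (including line
    # breaks) inside a text are collapsed to single spaces. Sketch choice.
    one_line = [" ".join(text.split()) for text in texts]
    fields = {"key": key, "of": of, "lang": lang, "mode": mode,
              "txt": "\n".join(one_line)}
    if ids is not None:
        fields["id"] = "\n".join(str(i) for i in ids)
    if stopwords:
        fields["sw"] = "\n".join(stopwords)
    return fields


def encode_multipart(fields, boundary=None):
    """Encode simple text fields as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def send(fields, endpoint=ENDPOINT):
    """POST the fields and return the raw response text (needs a valid key)."""
    body, content_type = encode_multipart(fields)
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call might look like `send(build_fields("YOUR_KEY", "en", ["First text.", "Second text."], mode="dg"))`, where `YOUR_KEY` stands in for a real authorization key.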

Clustering modes

The current clustering modes available are the following:

  • Topic modeling: this method groups the documents passed in the txt parameter by the n-gram that is most representative of their meaning. It changes the pipeline found in classical clustering algorithms, as it selects the representative labels before grouping the texts. This approach helps to discover hidden themes in document collections, providing more descriptive labels than classical clustering algorithms. Cluster assignment is not exclusive (a text can belong to more than one cluster), and there will always be a default cluster called Other Topics with the texts that do not belong to any other cluster.
  • Document grouping: this method implements the classic bisecting k-means algorithm. One of its most significant differences from topic modeling is that cluster assignment is exclusive, that is, a text can only be assigned to a single cluster. In this case, labels are not as descriptive; they are composed of a collection of terms that describe the documents assigned to the cluster. For large collections, the label will be a single term.
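
To make the exclusive assignment of document grouping more concrete, here is a textbook-style sketch of bisecting k-means on small numeric vectors. It is not MeaningCloud's implementation: how texts are turned into vectors, how the number of clusters is chosen, and how labels are produced are not covered here. The deterministic seeding and the choice to split the cluster with the largest squared error are assumptions of this sketch.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def _sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    center = _mean(points)
    return sum(_dist2(p, center) for p in points)


def _two_means(points, iterations=20):
    """Split points in two with plain 2-means, seeded deterministically."""
    first = points[0]
    second = max(points, key=lambda p: _dist2(p, first))
    centers = [first, second]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        new_centers = [_mean(groups[0]), _mean(groups[1])]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups


def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest error until k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        worst = max(candidates, key=_sse)
        left, right = _two_means(worst)
        if not left or not right:
            break
        clusters.remove(worst)
        clusters.extend([left, right])
    return clusters


points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
for cluster in bisecting_kmeans(points, 3):
    print(cluster)
```

Every point ends up in exactly one cluster, which is the exclusive behaviour described for document grouping; topic modeling, by contrast, can place the same text in several clusters.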

So, which one to choose? It depends on your use case, but the main factors to take into account are that topic modeling gives more descriptive labels and more weight to outliers in the collection, while document grouping is the only one that provides exclusive clustering.