The current clustering modes available are the following:
Topic modeling: this mode labels each text in the txt parameter with the n-gram that is most representative of its meaning. It reverses the pipeline of classical clustering algorithms: it selects the representative labels first and then groups the texts. This approach helps discover hidden themes in document collections and provides more descriptive labels than classical clustering algorithms. Cluster assignment is not exclusive (a text can belong to more than one cluster). There is always a default cluster called Other Topics, which contains the texts that do not belong to any other cluster.
Document grouping: the main difference from topic modeling is that cluster assignment is exclusive, that is, a text can be assigned to only one cluster. In this mode, labels are less descriptive: each label is a collection of terms that describe the documents assigned to the cluster. For large collections, the label will be a single term.
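The difference between the two assignment styles can be illustrated with a toy sketch. It is not the service's actual algorithm. As an assumption, it uses document frequency of word bigrams as a stand-in for "most representative n-gram". The names `label_first_clusters` and `exclusive_assignment` are hypothetical. The first function picks labels before grouping and allows a text to sit in several clusters, with unmatched texts placed in Other Topics. The second keeps each text in only one cluster.

```python
import re
from collections import Counter

OTHER = "Other Topics"


def tokenize(text):
    # Lowercase and keep alphabetic words only (toy tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def label_first_clusters(texts, n=2, min_docs=2, max_labels=5):
    """Toy label-first clustering (hypothetical, not the real service):
    1) choose labels: n-grams shared by at least `min_docs` texts,
    2) group: a text joins every cluster whose label it contains (non-exclusive),
    3) texts that match no label go to the default Other Topics cluster."""
    doc_grams = [set(ngrams(tokenize(t), n)) for t in texts]
    # Document frequency: in how many texts each n-gram appears.
    df = Counter(g for grams in doc_grams for g in grams)
    # Deterministic order: most frequent first, then alphabetical.
    labels = sorted((g for g, c in df.items() if c >= min_docs),
                    key=lambda g: (-df[g], g))[:max_labels]
    clusters = {label: [i for i, grams in enumerate(doc_grams) if label in grams]
                for label in labels}
    assigned = {i for members in clusters.values() for i in members}
    clusters[OTHER] = [i for i in range(len(texts)) if i not in assigned]
    return clusters


def exclusive_assignment(clusters):
    """Toy exclusive variant: each text keeps only the largest cluster it
    belongs to. In this toy, unmatched texts stay in Other Topics."""
    labels = sorted((l for l in clusters if l != OTHER),
                    key=lambda l: -len(clusters[l]))
    result, seen = {}, set()
    for label in labels:
        members = [i for i in clusters[label] if i not in seen]
        seen.update(members)
        if members:
            result[label] = members
    result[OTHER] = list(clusters[OTHER])
    return result


if __name__ == "__main__":
    docs = [
        "machine learning models need data",
        "deep learning and machine learning research",
        "stock market prices fell today",
        "the stock market rallied after machine learning news",
        "a recipe for chocolate cake",
    ]
    overlapping = label_first_clusters(docs)
    print(overlapping)                        # text 3 appears in two clusters
    print(exclusive_assignment(overlapping))  # every text appears exactly once
```

With the sample texts, the fourth text mentions both "machine learning" and "stock market". It therefore lands in two clusters under the non-exclusive scheme and in only one under the exclusive one.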
So which one should you choose? It depends on your use case, but the main factors to take into account are that
topic modeling gives more descriptive labels and more weight to outliers in the collection, while
document grouping is the only one that provides exclusive clustering.