Universidade Federal da Bahia | Repositório Institucional da UFBA
Use this identifier to cite or link to this item: https://repositorio.ufba.br/handle/ri/33507
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Rios, Tatiane Nogueira | -
dc.contributor.author | Eustáquio, Fernanda Silva | -
dc.creator | Eustáquio, Fernanda Silva | -
dc.date.accessioned | 2021-05-27T21:10:35Z | -
dc.date.issued | 2021-05-27 | -
dc.date.submitted | 2020-04-16 | -
dc.identifier.uri | http://repositorio.ufba.br/ri/handle/ri/33507 | -
dc.description.abstract | Most of the well-known and widely used conventional clustering algorithms, such as k-Means and Fuzzy c-Means (FCM), were designed under the assumption that, in most cases, the number of objects in a dataset is greater than its number of dimensions (features). However, this assumption fails when a dataset consists of text documents or DNA microarrays, in which the number of dimensions is much larger than the number of objects. Most studies have shown that FCM and the fuzzy cluster validity indices (CVIs) perform poorly on high-dimensional data, even when a similarity or dissimilarity measure suitable for this type of data is used. The problems caused by high dimensionality are known as the curse of dimensionality, and approaches such as feature transformation, feature selection, feature weighting, and subspace clustering were defined to deal with thousands of dimensions. On the conviction that the number of dimensions should be maintained in order to learn as much as possible from an object, and that a single subset of features might not be enough for all clusters, the soft subspace clustering technique was used in the proposed work. Besides FCM, three soft subspace algorithms, Simultaneous Clustering and Attribute Discrimination (SCAD), Maximum-entropy-regularized Weighted Fuzzy c-Means (EWFCM), and Enhanced Soft Subspace Clustering (ESSC), were applied to cluster three types of high-dimensional data (Gaussian mixture, text, microarray). They were evaluated with fuzzy CVIs rather than with external measures such as Clustering Accuracy, Rand Index, and Normalized Mutual Information, which use information from class labels, as is usually done in most research studies. In a general evaluation of the experimental results, all the clustering algorithms performed similarly, although ESSC presented the best result and FCM was better than the remaining soft subspace algorithms. Besides the use of the soft subspace technique, in the search for the cause of the poor performance of conventional techniques on high-dimensional data, it was investigated which distance measure and which value of the fuzzy weighting exponent (m) produced the best clustering result. Furthermore, the performance of nineteen fuzzy CVIs was evaluated by verifying whether tendencies and problems reported in previous research studies persist when validating soft subspace clustering results. The analysis made clear that the type of data was decisive for the performance of the clustering algorithms and fuzzy CVIs. | pt_BR
dc.description.sponsorship | Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) | pt_BR
dc.language.iso | en | pt_BR
dc.rights | Open Access | pt_BR
dc.subject | Fuzzy cluster validity indices | pt_BR
dc.subject | Soft subspace clustering | pt_BR
dc.subject | Fuzzy clustering | pt_BR
dc.subject | Fuzzy c-Means model | pt_BR
dc.subject | High-dimensional data | pt_BR
dc.subject | Clustering algorithm | pt_BR
dc.title | On fuzzy cluster validity indices for soft subspace clustering of high-dimensional datasets | pt_BR
dc.type | Dissertation | pt_BR
dc.embargo.liftdate | 10000-01-01 | -
dc.contributor.referees | Camargo, Heloísa de Arruda | -
dc.contributor.referees | Marcacini, Ricardo Marcondes | -
dc.publisher.departament | Universidade Federal da Bahia | pt_BR
dc.publisher.departament | Instituto de Matemática e Estatística | pt_BR
dc.publisher.departament | Departamento de Ciência da Computação | pt_BR
dc.publisher.program | em Ciência da Computação | pt_BR
dc.publisher.initials | UFBA | pt_BR
dc.publisher.country | Brazil | pt_BR
dc.subject.cnpq | Exact and Earth Sciences | pt_BR
dc.subject.cnpq | Computer Science | pt_BR
Appears in collections: Dissertação (PGCOMP)

Files associated with this item:
There are no files associated with this item.


Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.