SciELO - Scientific Electronic Library Online

 
Perfiles

Online version ISSN 2477-9105

Abstract

MORALES-ONATE, Víctor  y  MORALES-ONATE, Bolívar. A robust clustering technique for a Big Data approach: CLARABD for Mixed data types. Perfiles [online]. 2019, vol.2, n.22, pp.87-97. ISSN 2477-9105.  https://doi.org/10.47187/perf.v2i22.68.

When a researcher has no a priori knowledge of how the groups in a given data set are configured, an unsupervised classification is needed. In addition, the data set may be mixed (qualitative and/or quantitative data) or very large. The k-means algorithm, for example, cannot compare mixed data and is limited to a maximum of 65536 objects in the R software. K-medoids, on the other hand, can compare mixed data but is subject to the same limit on the number of objects. The traditional CLARA algorithm easily exceeds this limit, but it cannot compare mixed data. In this context, this work proposes the CLARABD algorithm, an extension of CLARA to mixed data. The Gower distance is central to this extension because it allows mixed data to be compared, while still making it possible to process a data set with more than 65536 observations. To show the benefits of the proposed algorithm, a simulation study and an application to real data were carried out, with consistent results in each case.
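The idea described in the abstract, CLARA-style subsampling combined with the Gower distance, can be illustrated with a minimal Python sketch. This is not the authors' CLARABD implementation (which targets R). The function names `gower`, `feature_ranges`, `total_cost` and `clara_gower`, the toy data, and the exhaustive medoid search used in place of PAM are all assumptions made for illustration.

```python
import random
from itertools import combinations

def gower(a, b, ranges):
    """Gower distance: mean of per-feature dissimilarities, each in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:                      # categorical feature: simple mismatch
            total += 0.0 if x == y else 1.0
        else:                              # numeric feature: range-scaled difference
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(ranges)

def feature_ranges(rows):
    """max - min for numeric columns, None for categorical columns."""
    ranges = []
    for col in zip(*rows):
        if all(isinstance(v, (int, float)) for v in col):
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    return ranges

def total_cost(rows, medoids, ranges):
    """Sum of distances from every row to its nearest medoid."""
    return sum(min(gower(r, m, ranges) for m in medoids) for r in rows)

def clara_gower(rows, k, n_samples=5, sample_size=8, seed=0):
    """CLARA-style loop: pick medoids on small samples, score them on all rows."""
    rng = random.Random(seed)
    ranges = feature_ranges(rows)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(rows, min(sample_size, len(rows)))
        # Exhaustive search over the sample (a simple stand-in for PAM).
        cand = min(combinations(sample, k),
                   key=lambda m: total_cost(sample, m, ranges))
        cost = total_cost(rows, cand, ranges)   # evaluate on the full data set
        if cost < best_cost:
            best, best_cost = cand, cost
    labels = [min(range(k), key=lambda j: gower(r, best[j], ranges)) for r in rows]
    return best, labels, best_cost

# Hypothetical mixed data: one numeric and one categorical feature per row.
data = [(1.0, "a"), (1.2, "a"), (0.9, "a"), (1.1, "a"),
        (9.0, "b"), (9.3, "b"), (8.8, "b"), (9.1, "b")]
medoids, labels, cost = clara_gower(data, k=2)
```

Because only the small samples are clustered and the full data set is used just to score each candidate set of medoids, no full pairwise distance matrix is ever built, which is what lets a CLARA-type procedure scale beyond the object limit mentioned above.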

Keywords: Classification; CLARA; K medoids; mixed data types; R software.
