SciELO - Scientific Electronic Library Online

 
Perfiles

Online version ISSN 2477-9105

Abstract

MORALES-ONATE, Víctor; MORETA, Luis and MORALES-ONATE, Bolívar. SMOTEMD: a mixed data balancing algorithm for Big Data in R. Perfiles [online]. 2020, vol.1, n.24, pp.20-26. ISSN 2477-9105. https://doi.org/10.47187/perf.v1i24.75.

Analyzing samples with unbalanced data is a modeling challenge. A typical case arises when the response variable is binary and one of its classes makes up a very small proportion of the total. Binary responses are usually modeled with probability models such as logit or probit. However, these models run into problems when the sample is not balanced and the goal is to build the confusion matrix used to evaluate the model's predictive power. One technique for balancing the observed data is the SMOTE algorithm, which works exclusively with numerical data. This work extends SMOTE so that it can handle mixed data (numerical and categorical). By supporting mixed data, the proposal also overcomes the limit of 65536 observations that the R software imposes when computing distances on categorical data. A simulation study verifies the benefits of the proposed algorithm, SMOTEMD, for mixed data.
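
To make the idea of SMOTE-style balancing on mixed data concrete, the following is a minimal sketch in Python. It is not the authors' SMOTEMD implementation (which is written in R and is not reproduced in this abstract). The function names `mixed_distance` and `smote_mixed_sketch` are hypothetical, and so are the specific choices made here: a Gower-style distance, interpolation for numerical features, and a majority vote among neighbours for categorical features. All of these are assumptions made for illustration.

```python
import random
from collections import Counter


def mixed_distance(a, b, numeric_idx, ranges):
    """Gower-style distance (an assumption of this sketch): range-scaled
    absolute difference for numeric features, 0/1 mismatch for categorical."""
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_idx:
            total += abs(x - y) / ranges[j] if ranges[j] > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def smote_mixed_sketch(minority, numeric_idx, n_new, k=3, seed=0):
    """Generate n_new synthetic minority rows from a list of tuples.

    Requires at least two minority rows. Numeric features are interpolated
    between a base row and one of its k nearest minority neighbours;
    categorical features take the most frequent value among those neighbours.
    """
    rng = random.Random(seed)
    numeric_idx = set(numeric_idx)
    # Range of each numeric feature, used to scale its contribution.
    ranges = {
        j: max(r[j] for r in minority) - min(r[j] for r in minority)
        for j in numeric_idx
    }
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]
        # Distances are computed from the base row only, one row at a time.
        neighbours = sorted(
            others, key=lambda r: mixed_distance(base, r, numeric_idx, ranges)
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        row = []
        for j in range(len(base)):
            if j in numeric_idx:
                row.append(base[j] + gap * (mate[j] - base[j]))
            else:
                votes = Counter(r[j] for r in neighbours)
                row.append(votes.most_common(1)[0][0])
        synthetic.append(tuple(row))
    return synthetic


# Toy minority class: one numeric and one categorical feature.
minority_rows = [(1.0, "a"), (2.0, "a"), (3.0, "b"), (4.0, "a")]
new_rows = smote_mixed_sketch(minority_rows, numeric_idx=[0], n_new=5, k=2)
```

The sketch computes distances from one base row at a time instead of storing a full pairwise distance matrix. That is a design choice of this illustration only and makes no claim about how SMOTEMD itself avoids the 65536-observation limit.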

Keywords: SMOTE; Classification; Unbalanced samples.
