
How to generate clusters with the k-means algorithm for the dataset below in Java

I'm a beginner programmer and would like to generate clusters with the k-means algorithm for the dataset below in Java:

ExternalReviewer,Reviewer,0.41,0.5,0.57,1.0,1.0,1.0,false
PaperFullVersion,Conference_announcement,0.0,0.32,0.0,0.11,0.51,0.75,false
Conference,Conference,1.0,1.0,1.0,1.0,1.0,1.0,true
Decision,Conference_proceedings,0.0,0.43,0.0,0.1,0.35,0.67,false
Reviewer,Reviewer,1.0,1.0,1.0,1.0,1.0,1.0,false
ProgramCommitteeChair,Chair,0.0,0.45,0.33,1.0,1.0,1.0,false
Review,Review,1.0,1.0,1.0,1.0,1.0,1.0,true
PaperAbstract,Abstract,0.0,0.7,0.64,1.0,1.0,1.0,true
Document,Conference_document,0.4,0.54,0.45,1.0,1.0,1.0,true
Co-author,Contribution_co-author,0.63,0.62,0.57,1.0,1.0,1.0,true
Person,Person,1.0,1.0,1.0,1.0,1.0,1.0,true
Chairman,Chair,0.94,0.71,0.59,0.07,1.0,1.0,true
ExternalReviewer,Extended_abstract,0.77,0.41,0.22,0.05,0.0,0.18,false
Author,Regular_author,0.49,0.42,0.42,1.0,1.0,1.0,true
Rejection,Accepted_contribution,0.6,0.4,0.24,0.08,0.29,0.75,false
Co-author,Contribution_1th-author,0.63,0.62,0.5,0.0,0.0,0.0,false
AuthorNotReviewer,Reviewed_contribution,0.0,0.53,0.24,0.06,0.0,0.21,false
ConferenceChair,Chair,0.56,0.5,0.5,1.0,1.0,1.0,false
Co-author,Co-chair,0.82,0.6,0.38,0.0,0.0,0.0,false

1 answer

Hi Roberto, how are you?

To help you, I first need to understand how you want to treat this data. For example:

The k-means algorithm operates on vectors (lists of numbers).

If we think of two-dimensional vectors, we can draw one point for each vector in our dataset.

These points are the data sample you collected. What the algorithm does is partition them into K groups so that each vector is as close as possible to the center of its group (strictly speaking, it minimizes the sum of squared distances from the vectors to their group centers).
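To make the idea concrete, here is a minimal sketch of the usual k-means loop (assign each point to its nearest center, then move each center to the mean of its points). The class name KMeansSketch, its method names, and the choice of using the first k points as initial centers are assumptions made for illustration; real implementations normally pick the initial centers at random or with a smarter seeding strategy.

```java
import java.util.Arrays;

class KMeansSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Returns the cluster index (0..k-1) assigned to each point.
    // Assumptions: k <= points.length, all points have the same dimension,
    // and the first k points are used as the initial centroids.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int n = nearest(points[p], centroids);
                if (n != assignment[p]) {
                    assignment[p] = n;
                    changed = true;
                }
            }
            if (!changed) {
                break; // Converged: no point switched cluster.
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // Empty cluster: keep its previous centroid.
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Four made-up 2D points forming two obvious groups.
        double[][] points = { {0.0, 0.0}, {1.0, 1.0}, {0.1, 0.2}, {0.9, 1.0} };
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
        // prints [0, 1, 0, 1]
    }
}
```

The same method works for vectors of any dimension, so it can be applied to your rows once each of them has been turned into an array of numbers.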

Since the algorithm works with vectors, and vectors are made of numbers, you first need to decide how to handle the strings and booleans in your data.
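Just to illustrate one possible choice (not necessarily the right one for your problem), the sketch below skips the two name columns and turns the boolean into 1.0 or 0.0, leaving a vector of seven numbers per row. The class name RowEncoder and the method toVector are hypothetical, and whether the names should be dropped, or the boolean included at all, is exactly the decision you need to make.

```java
import java.util.Arrays;

class RowEncoder {

    // Converts one row of the dataset into a numeric vector.
    // Assumptions: fields are comma-separated, the first two fields are
    // names (skipped here), the last field is "true"/"false", and every
    // field in between is a decimal number.
    static double[] toVector(String line) {
        String[] fields = line.trim().split(",");
        double[] vector = new double[fields.length - 2];
        // Copy the numeric columns.
        for (int i = 2; i < fields.length - 1; i++) {
            vector[i - 2] = Double.parseDouble(fields[i]);
        }
        // Encode the boolean as 1.0 (true) or 0.0 (false).
        boolean flag = Boolean.parseBoolean(fields[fields.length - 1]);
        vector[vector.length - 1] = flag ? 1.0 : 0.0;
        return vector;
    }

    public static void main(String[] args) {
        String row = "ExternalReviewer,Reviewer,0.41,0.5,0.57,1.0,1.0,1.0,false";
        System.out.println(Arrays.toString(toVector(row)));
        // prints [0.41, 0.5, 0.57, 1.0, 1.0, 1.0, 0.0]
    }
}
```

If the names carry information you want to cluster on, they would need their own numeric encoding (for example, one 0/1 column per distinct name), which changes the distances between rows considerably.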

Do you already have a clearer idea of how you want to treat these fields?