Gaio DataOS

Cluster


Last updated 1 year ago

Traditionally used in customer segmentation, cluster analysis has many applications. Its purpose is to place very similar rows in the same group. As its basic output, the task generates a table with a new column that identifies the group assigned to each row.

Gaio uses the K-Means technique to identify groups, and the calculations are performed in H2O; see the H2O documentation for details.
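To make the idea concrete, here is a minimal pure-Python sketch of K-Means. It is not Gaio's or H2O's implementation, only an illustration of the technique: the column names (`monthly_visits`, `avg_ticket`), the sample values, and the choice of using the first k rows as starting centroids are all assumptions made for this example. Only the `clusterPredict` column name comes from the task output described on this page.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain K-Means on a list of numeric tuples. Returns (labels, centroids)."""
    # Assumption for reproducibility: start from the first k points.
    # Real engines typically use smarter or random initialization.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical source table (column names and values are illustrative only).
table = [
    {"monthly_visits": 1.0, "avg_ticket": 1.0},
    {"monthly_visits": 9.0, "avg_ticket": 9.0},
    {"monthly_visits": 1.5, "avg_ticket": 2.0},
    {"monthly_visits": 8.0, "avg_ticket": 8.5},
    {"monthly_visits": 1.0, "avg_ticket": 0.5},
    {"monthly_visits": 9.5, "avg_ticket": 8.0},
]

points = [(row["monthly_visits"], row["avg_ticket"]) for row in table]
labels, centroids = kmeans(points, k=2)

# Output table: every source column is kept and a clusterPredict column is added.
result = [{**row, "clusterPredict": label} for row, label in zip(table, labels)]
for row in result:
    print(row)
```

The sketch works on raw values. When columns are on very different scales, the distance is dominated by the largest one, so in practice the inputs are usually standardized before clustering.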

1. Configuration

To build a cluster analysis, click the table that will be used, open the Tasks menu, and choose Cluster.

  1. Set the task name.

  2. Define the name of the table that will be generated from the execution.

  3. Exclude any fields that should not be used to form the groups (clusters).

  4. Set the maximum time allowed for identifying the groups.

  5. Choose how the number of groups is defined. There are two options. The first is to let the platform identify on its own how many clusters make the most sense for the data. The goal is to place similar rows in the same cluster. Identical rows are easy to group; the challenge begins when different rows have to share a cluster. The more this happens, the larger the "error" becomes, so the platform looks for the most homogeneous groups possible without generating a large number of clusters.

  6. The second option is for you, as the analyst, to set how many groups should be generated. For example, if the company cannot build differentiated value propositions for more than 5 groups of customers, it may make sense to define 5 clusters.
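The "error" mentioned in step 5 can be illustrated with the within-cluster sum of squared distances, a measure commonly used with K-Means. The sketch below is an assumption about what that error looks like, not a description of how Gaio or H2O selects the number of clusters. The rows and the groupings are hypothetical and hand-picked to show that identical rows cost nothing to group, while merging different rows raises the error.

```python
def within_cluster_error(groups):
    """Sum of squared distances from each row to the mean of its group."""
    total = 0.0
    for rows in groups:
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        total += sum(
            (value - center) ** 2
            for row in rows
            for value, center in zip(row, centroid)
        )
    return total

# Hypothetical rows: a and b are identical, c is close to them, d is far away.
a, b, c, d = (1.0, 1.0), (1.0, 1.0), (2.0, 1.0), (9.0, 9.0)

three_groups = [[a, b], [c], [d]]   # only identical rows share a group
two_groups = [[a, b, c], [d]]       # a slightly different row joins a and b
one_group = [[a, b, c, d]]          # everything is forced into one group

error_three = within_cluster_error(three_groups)
error_two = within_cluster_error(two_groups)
error_one = within_cluster_error(one_group)

for k, err in [(3, error_three), (2, error_two), (1, error_one)]:
    print(f"k={k}: error={err:.2f}")
```

Fewer clusters always means an equal or larger error, so automatic selection has to balance homogeneity against the number of groups. Fixing the number of clusters by hand, as in step 6, simply accepts whatever error that number produces.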

2. Results

In the output table, the clusterPredict column indicates which group each row belongs to. All columns from the source table are also repeated.

To understand the differences between the groups created, run descriptive statistical analyses on the numeric and categorical columns, such as:

  1. Numerical: compare the groups by mean, minimum, maximum, and standard deviation to see, for example, which groups have higher average salaries.

  2. Categorical: use bar charts that cross the clusters with a categorical variable to see, for example, which cluster has a greater concentration of men and which has a greater concentration of women.
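Both comparisons can be sketched with the Python standard library. The example assumes the task output has been exported as a list of rows. The `salary` and `gender` columns and all values are hypothetical; only `clusterPredict` is taken from the output described above.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev

# Hypothetical export of the task output.
rows = [
    {"clusterPredict": 0, "salary": 3000.0, "gender": "F"},
    {"clusterPredict": 0, "salary": 3500.0, "gender": "F"},
    {"clusterPredict": 0, "salary": 4000.0, "gender": "M"},
    {"clusterPredict": 1, "salary": 8000.0, "gender": "M"},
    {"clusterPredict": 1, "salary": 9000.0, "gender": "M"},
    {"clusterPredict": 1, "salary": 10000.0, "gender": "F"},
]

def numeric_profile(rows, column):
    """Mean, min, max and sample standard deviation of a column per cluster."""
    values = defaultdict(list)
    for row in rows:
        values[row["clusterPredict"]].append(row[column])
    return {
        cluster: {
            "mean": mean(v),
            "min": min(v),
            "max": max(v),
            # stdev needs at least two values; report 0.0 for single-row clusters
            "std": stdev(v) if len(v) > 1 else 0.0,
        }
        for cluster, v in sorted(values.items())
    }

def categorical_profile(rows, column):
    """Count of each category per cluster (the numbers behind a bar chart)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["clusterPredict"]][row[column]] += 1
    return {cluster: dict(c) for cluster, c in sorted(counts.items())}

salary_stats = numeric_profile(rows, "salary")
gender_counts = categorical_profile(rows, "gender")

for cluster in salary_stats:
    print(cluster, salary_stats[cluster], gender_counts[cluster])
```

In this made-up data, cluster 1 has the higher average salary and a larger share of men, which is the kind of difference the comparison is meant to surface.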

Below are some comparison examples.