Gaio DataOS

Cluster


Last updated 3 days ago

The Cluster task in Gaio DataOS applies clustering algorithms to group records with similar characteristics. It's ideal for use cases such as customer segmentation, pattern recognition, and data-driven decision-making based on behavioral or structural profiles.

Gaio uses the K-Means algorithm to identify groups. The calculations run in H2O; see the H2O documentation for details on the algorithm.
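To make the idea behind K-Means concrete, here is a minimal sketch in plain Python. It is not Gaio's or H2O's implementation, only the textbook assign-and-update loop; the function name `kmeans` and the sample data are hypothetical.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input rows
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical data: (monthly_spend, visits) for six customers.
customers = [(10.0, 1.0), (12.0, 2.0), (11.0, 1.0), (95.0, 20.0), (100.0, 22.0), (98.0, 19.0)]
centroids, labels = kmeans(customers, k=2)
```

With this data the first three customers end up in one cluster and the last three in the other. Which group receives ID 0 or 1 depends on the random starting centroids.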


How to Use the Cluster Task


1. Open the Cluster Task

  • In the Studio, go to the Tasks panel.

  • Under the Analytics section, select Cluster.


2. Configure the Task

  • Task label: (optional) Name for identifying this step in your flow.

  • Result table: Output table that will contain the clustered results. Example: cluster_campaign.

  • Table name: Automatically populated with the selected table (e.g., new_sales).


3. Exclude Columns (Optional)

  • In the Exclude columns field, add columns that should not be considered in the clustering process, such as unique IDs (e.g., cod_cliente).

  • This helps avoid bias or noise in the algorithm.
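The effect of an ID column on distance-based clustering can be illustrated with a short sketch. The helpers `drop_columns` and `sq_distance` and the sample rows are hypothetical; only the column name `cod_cliente` comes from the example above.

```python
def drop_columns(rows, excluded):
    """Return copies of the rows without the excluded columns."""
    excluded = set(excluded)
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]

def sq_distance(a, b):
    """Squared Euclidean distance between two rows with the same numeric columns."""
    return sum((a[k] - b[k]) ** 2 for k in a)

# Two customers who behave almost identically but have distant IDs.
rows = [
    {"cod_cliente": 1001, "monthly_spend": 10.0, "visits": 1},
    {"cod_cliente": 9001, "monthly_spend": 12.0, "visits": 2},
]

with_id = sq_distance(rows[0], rows[1])  # dominated by the ID difference
features = drop_columns(rows, ["cod_cliente"])
without_id = sq_distance(features[0], features[1])  # reflects behavior only
```

Here `with_id` is in the tens of millions while `without_id` is 5.0, so leaving the ID in would make these two similar customers look far apart.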


4. Adjust Execution Settings

Execution time

  • Defines the maximum runtime of the clustering algorithm (in seconds).

  • Recommended: between 20 and 60 seconds, depending on dataset size and complexity.

Max cluster size

  • Sets the maximum number of clusters the algorithm can create.

  • Example: if set to 3, the output will contain up to 3 distinct groups.

Automatic clusters size

  • When enabled, Gaio will automatically determine the ideal number of clusters based on the data's variability.

  • When disabled, it will strictly follow the manual limit set in Max cluster size.
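The three settings above resemble parameters of H2O's K-Means estimator: `max_runtime_secs`, `k`, and `estimate_k`. The sketch below shows one plausible mapping. It is an assumption, not a description of what Gaio does internally, and the function `h2o_kmeans_params` is hypothetical.

```python
def h2o_kmeans_params(execution_time, max_cluster_size, automatic):
    """Translate the three task settings into H2O K-Means keyword arguments."""
    if execution_time <= 0:
        raise ValueError("execution_time must be a positive number of seconds")
    if max_cluster_size < 1:
        raise ValueError("max_cluster_size must be at least 1")
    return {
        "max_runtime_secs": execution_time,  # upper bound on training time
        "k": max_cluster_size,               # cluster count, or its upper bound
        "estimate_k": automatic,             # True: H2O searches for a k up to the limit
    }

params = h2o_kmeans_params(execution_time=30, max_cluster_size=3, automatic=True)
```

If H2O is installed, a dictionary like this could be passed as keyword arguments to `H2OKMeansEstimator` from `h2o.estimators`. Whether Gaio forwards its settings this way is not confirmed by this page.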


5. Save and Run

  • Click Save to confirm the task configuration.

  • Run the flow. The output table will contain your clustered data.


Output

The resulting table will include:

  • All original columns (excluding those set to be ignored)

  • A new column indicating the assigned cluster ID for each row
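Once the output table is available, a common next step is to profile each cluster. The sketch below assumes the rows have been loaded as Python dictionaries. The column names `cluster` and `monthly_spend` are hypothetical, since this page does not state the name of the cluster ID column.

```python
from collections import defaultdict

def profile_clusters(rows, cluster_col, value_col):
    """Return {cluster_id: (row_count, mean of value_col)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_col]].append(row[value_col])
    return {cid: (len(vals), sum(vals) / len(vals)) for cid, vals in groups.items()}

# Hypothetical output rows: original columns plus the assigned cluster ID.
result = [
    {"monthly_spend": 10.0, "cluster": 0},
    {"monthly_spend": 12.0, "cluster": 0},
    {"monthly_spend": 95.0, "cluster": 1},
]
profile = profile_clusters(result, "cluster", "monthly_spend")
# profile == {0: (2, 11.0), 1: (1, 95.0)}
```

Comparing the size and average values of each cluster is a quick way to check whether the groups are meaningful before using them downstream.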


Best Practices

  • Use the Sample task beforehand to reduce the number of rows, or the Principal Components task (PCA) to reduce dimensionality. Both can improve performance.

  • Remove irrelevant or high-cardinality columns that could distort clustering results.

  • Leverage clustering to personalize campaigns, identify customer profiles, detect anomalies, or support retention strategies.