Cluster
The Cluster task in Gaio DataOS applies clustering algorithms to group records with similar characteristics. It's ideal for use cases such as customer segmentation, pattern recognition, and data-driven decision-making based on behavioral or structural profiles.
Gaio uses the K-Means technique to identify groups. The calculations run in H2O; see the H2O documentation for details on the algorithm.
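To make the idea of K-Means concrete, here is a minimal pure-Python sketch of the technique. It is not Gaio's or H2O's implementation: the function name `kmeans`, the sample points, and the deterministic farthest-first initialisation are all assumptions chosen to keep the example small and reproducible.

```python
import math

def kmeans(points, k, max_iterations=50):
    """Group points into k clusters; returns (centroids, labels)."""
    # Deterministic farthest-first initialisation (an assumption made for
    # reproducibility; real engines typically use randomised seeding).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical records with two numeric columns each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
```

Each record ends up with a cluster label, which corresponds to the cluster ID that the task writes to the result table.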
In the Studio, go to the Tasks panel.
Under the Analytics section, select Cluster.
Task label: (optional) Name for identifying this step in your flow.
Result table: Output table that will contain the clustered results. Example: cluster_campaign.
Table name: Automatically populated with the selected table (e.g., new_sales).
In the Exclude columns field, add columns that should not be considered in the clustering process, such as unique IDs (e.g., cod_cliente).
This helps avoid bias or noise in the algorithm.
Execution time
Defines the maximum runtime of the clustering algorithm (in seconds).
Recommended: between 20 and 60 seconds, depending on dataset size and complexity.
Max cluster size
Sets the maximum number of clusters the algorithm can create.
Example: if set to 3, the output will contain up to 3 distinct groups.
Automatic clusters size
When enabled, Gaio will automatically determine the ideal number of clusters based on the data's variability.
When disabled, it will strictly follow the manual limit set in Max cluster size.
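The interaction between Max cluster size and Automatic clusters size can be illustrated with a simplified sketch. It assumes a basic stopping rule: keep adding clusters, up to the maximum, while each extra cluster reduces the within-cluster squared error by at least a given fraction. The names `choose_k` and `min_gain` are hypothetical, and the heuristic that H2O actually applies may differ.

```python
import math

def kmeans(points, k, max_iterations=50):
    """Plain K-Means with deterministic farthest-first seeding; returns (labels, sse)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Within-cluster sum of squared distances: lower means tighter clusters.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse

def choose_k(points, max_k, min_gain=0.5):
    """Pick a cluster count between 1 and max_k (illustrative heuristic only)."""
    best_k = 1
    best_labels, best_sse = kmeans(points, 1)
    for k in range(2, max_k + 1):
        labels, sse = kmeans(points, k)
        # Stop once an extra cluster no longer reduces the error enough.
        if best_sse == 0 or (best_sse - sse) / best_sse < min_gain:
            break
        best_k, best_labels, best_sse = k, labels, sse
    return best_k, best_labels

# Hypothetical data with three visible groups; the maximum is set to 5.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
    (1.0, 8.0), (1.1, 8.2), (0.9, 7.9),
]
k, labels = choose_k(points, max_k=5)
```

In this sketch the automatic mode settles on fewer clusters than the maximum because a fourth cluster adds little. Calling `kmeans(points, 5)` directly would correspond to using the manual setting as-is.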
Click Save to confirm the task configuration.
Run the flow. The output table will contain your clustered data.
The resulting table will include:
All original columns (excluding those set to be ignored)
A new column indicating the assigned cluster ID for each row
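The shape of the result table described above can be sketched in a few lines of Python. The column names other than cod_cliente, the sample values, and the name of the new column (`cluster`) are assumptions for illustration; the actual column name produced by Gaio may differ.

```python
# Hypothetical input rows from a table such as new_sales.
rows = [
    {"cod_cliente": 101, "total_spent": 120.0, "orders": 3},
    {"cod_cliente": 102, "total_spent": 980.0, "orders": 14},
    {"cod_cliente": 103, "total_spent": 135.0, "orders": 4},
]

# Columns listed in the Exclude columns field.
exclude = ["cod_cliente"]

# Placeholder cluster assignments, one per row, as a clustering step would produce.
cluster_ids = [0, 1, 0]

# Result: original columns minus the excluded ones, plus the cluster ID.
result = [
    {**{name: value for name, value in row.items() if name not in exclude},
     "cluster": cluster_id}
    for row, cluster_id in zip(rows, cluster_ids)
]
```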
Use tasks like Sample or Principal Component Analysis (PCA) beforehand to reduce dimensionality and improve performance.
Remove irrelevant or high-cardinality columns that could distort clustering results.
Leverage clustering to personalize campaigns, identify customer profiles, detect anomalies, or support retention strategies.
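As a rough aid for the tip about high-cardinality columns, the sketch below flags text columns whose values are almost all distinct, which usually indicates an identifier rather than a useful feature. The function name `suggest_exclusions`, the 0.9 threshold, and the sample rows are assumptions, not part of Gaio.

```python
def suggest_exclusions(rows, threshold=0.9):
    """Return text columns whose share of distinct values is at or above threshold."""
    flagged = []
    for column in rows[0]:
        values = [row[column] for row in rows]
        # Only text columns are checked; numeric measures are naturally varied.
        if not all(isinstance(v, str) for v in values):
            continue
        if len(set(values)) / len(values) >= threshold:
            flagged.append(column)
    return flagged

# Hypothetical sample rows.
rows = [
    {"cod_cliente": "C001", "segment": "retail", "total_spent": 120.0},
    {"cod_cliente": "C002", "segment": "retail", "total_spent": 980.0},
    {"cod_cliente": "C003", "segment": "wholesale", "total_spent": 135.0},
    {"cod_cliente": "C004", "segment": "wholesale", "total_spent": 410.0},
]
candidates = suggest_exclusions(rows)
```

Columns flagged this way are candidates for the Exclude columns field, subject to your own review.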