# Cluster

<figure><img src="/files/zJNAMj4RhSynvyH1cgse" alt=""><figcaption></figcaption></figure>

The **Cluster** task in Gaio DataOS applies clustering algorithms to **group records with similar characteristics**. It's ideal for use cases such as customer segmentation, pattern recognition, and data-driven decision-making based on behavioral or structural profiles.

Gaio uses the **K-Means** algorithm to identify the groups. The calculations are performed in H2O, whose documentation is [available here](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/k-means.html).
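To make the grouping step concrete, the sketch below implements plain K-Means (Lloyd's algorithm) in pure Python. Each row is assigned to its nearest centroid, each centroid then moves to the mean of its rows, and the loop repeats until the assignments stop changing. It is an illustration only, not Gaio's or H2O's implementation: H2O adds its own initialization, standardization, and stopping options, so its results can differ. The function name `kmeans` and the sample data are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Group `points` (equal-length numeric tuples) into `k` clusters.

    Returns (labels, centroids): labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialization: use k randomly chosen input points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # the first three points share one label, the last three the other
```

The cluster numbers themselves are arbitrary; only which rows share a number carries meaning.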

***

## How to Use the Cluster Task

***

### 1. **Open the Cluster Task**

* In the **Studio**, go to the **Tasks** panel.
* Under the **Analytics** section, select **Cluster**.

***

### 2. **Configure the Task**

* **Task label**: (optional) Name for identifying this step in your flow.
* **Result table**: Output table that will contain the clustered results (e.g., `cluster_campaign`).
* **Table name**: Automatically populated with the selected table (e.g., `new_sales`).

***

### 3. **Exclude Columns (Optional)**

* In the **Exclude columns** field, add columns that should **not be considered** in the clustering process, such as unique IDs (e.g., `cod_cliente`).
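* This keeps identifiers from adding bias or noise: K-Means groups rows by distance, so a numeric ID would pull records apart for reasons unrelated to their behavior.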
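The effect of leaving an ID column in is easy to see with Euclidean distance, the measure K-Means relies on. In this hypothetical two-row example, a numeric `cod_cliente` swamps the small differences in `age` and `visits` until it is dropped. The helper names and data are illustrative and not part of Gaio.

```python
import math


def drop_excluded(rows, exclude):
    """Return copies of `rows` without the columns listed in `exclude`."""
    excluded = set(exclude)
    return [{col: val for col, val in row.items() if col not in excluded} for row in rows]


def distance(a, b):
    """Euclidean distance between two rows that share the same columns."""
    cols = sorted(a)
    return math.dist([a[c] for c in cols], [b[c] for c in cols])


rows = [
    {"cod_cliente": 1001, "age": 34, "visits": 12},
    {"cod_cliente": 1950, "age": 35, "visits": 11},
]

# With the ID included, the 949-unit gap between IDs dominates the distance.
print(distance(rows[0], rows[1]))

# Without it, only the behavioral columns count: sqrt(1**2 + 1**2) ≈ 1.414.
features = drop_excluded(rows, exclude=["cod_cliente"])
print(distance(features[0], features[1]))
```

The two customers are nearly identical in behavior, yet with the ID included they look far apart.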

***

### 4. **Adjust Execution Settings**

**Execution time**

* Defines the **maximum runtime** of the clustering algorithm (in seconds).
* Recommended: between **20 and 60 seconds**, depending on dataset size and complexity.
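A runtime cap like **Execution time** typically bounds an iterative algorithm by checking a deadline between iterations. The sketch below assumes that general behavior; it is not a description of Gaio's internals. A hypothetical helper, `run_with_time_budget`, drives one-dimensional K-Means steps until they converge or the budget runs out.

```python
import time


def run_with_time_budget(step, state, max_seconds):
    """Repeat `state, converged = step(state)` until convergence or the deadline.

    Hypothetical helper: shows how a runtime cap bounds an iterative algorithm.
    Returns the last state and the number of iterations performed.
    """
    deadline = time.monotonic() + max_seconds
    iterations = 0
    while time.monotonic() < deadline:
        state, converged = step(state)
        iterations += 1
        if converged:
            break
    return state, iterations


values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]


def kmeans_step(centroids):
    """One 1-D K-Means iteration: assign values, then recompute centroids."""
    groups = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        groups[nearest].append(v)
    new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return new, new == centroids  # converged once centroids stop moving


centroids, iterations = run_with_time_budget(kmeans_step, [0.0, 5.0], max_seconds=20)
print(centroids, iterations)
```

On small inputs the loop converges long before the deadline, so the extra time goes unused. The cap matters on large or complex tables, which is why the recommended range depends on dataset size.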

**Max cluster size**

* Sets the **maximum number of clusters** the algorithm can create.
* Example: if set to `3`, the output will contain up to 3 distinct groups.

**Automatic clusters size**

* When enabled, Gaio will **automatically determine the ideal number of clusters** based on the data's variability.
* When disabled, it will strictly follow the manual limit set in **Max cluster size**.
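One common way to pick the number of clusters automatically is the elbow heuristic. It runs K-Means for k = 1, 2, ... up to a ceiling and stops once an extra cluster no longer reduces the within-cluster sum of squares (WCSS) by a meaningful fraction. Gaio's exact rule may differ (H2O, for its part, offers an `estimate_k` option), so treat this as a generic illustration. `choose_k`, `kmeans_1d`, and the 20% `min_gain` cut-off are hypothetical choices.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on 1-D data; returns the final WCSS."""
    ordered = sorted(values)
    # Deterministic seeding: k values spread evenly across the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    # Within-cluster sum of squares: squared distance to the nearest centroid.
    return sum(min((v - c) ** 2 for c in centroids) for v in values)


def choose_k(values, max_k, min_gain=0.2):
    """Elbow heuristic: stop adding clusters once the relative WCSS drop < min_gain."""
    best_k = 1
    previous = kmeans_1d(values, 1)
    for k in range(2, max_k + 1):
        current = kmeans_1d(values, k)
        if previous == 0 or (previous - current) / previous < min_gain:
            break  # the extra cluster no longer pays off
        best_k, previous = k, current
    return best_k


values = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
print(choose_k(values, max_k=5))  # 3 well-separated groups
```

Here `max_k` plays the role of **Max cluster size**: the search never goes beyond it.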

***

### 5. **Save and Run**

* Click **Save** to confirm the task configuration.
* Run the flow — the output table will contain your clustered data.

***

#### Output

The resulting table will include:

* All original columns except those listed in **Exclude columns**
* A new column indicating the **assigned cluster ID** for each row
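Once the output table exists, a simple way to turn cluster IDs into profiles is to count the rows and average a few numeric columns per cluster. The column name `cluster_id`, the function `profile_clusters`, and the sample rows are hypothetical; check the actual name of the cluster column in your result table.

```python
from collections import defaultdict


def profile_clusters(rows, id_column, numeric_columns):
    """Return {cluster_id: {"count": n, <column>: mean, ...}} for an output table."""
    members = defaultdict(list)
    for row in rows:
        members[row[id_column]].append(row)
    profile = {}
    for cluster, group in sorted(members.items()):
        summary = {"count": len(group)}
        for col in numeric_columns:
            summary[col] = sum(r[col] for r in group) / len(group)
        profile[cluster] = summary
    return profile


output = [
    {"age": 34, "visits": 12, "cluster_id": 0},
    {"age": 30, "visits": 10, "cluster_id": 0},
    {"age": 61, "visits": 2, "cluster_id": 1},
]
summary = profile_clusters(output, "cluster_id", ["age", "visits"])
print(summary)
# {0: {'count': 2, 'age': 32.0, 'visits': 11.0}, 1: {'count': 1, 'age': 61.0, 'visits': 2.0}}
```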

***

#### Best Practices

* Use tasks such as **Sample** (to reduce the number of rows) or **Principal Component Analysis (PCA)** (to reduce the number of dimensions) beforehand to improve performance.
* Remove irrelevant or high-cardinality columns that could distort clustering results.
* Leverage clustering to personalize campaigns, identify customer profiles, detect anomalies, or support retention strategies.
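High-cardinality columns can be spotted before clustering by comparing each column's number of distinct values with the row count. The sketch below flags columns whose distinct-value ratio reaches a threshold. The 0.9 cut-off, the function name, and the sample rows are arbitrary, illustrative choices.

```python
def high_cardinality_columns(rows, threshold=0.9):
    """Flag columns whose distinct-value ratio (distinct / rows) reaches `threshold`."""
    if not rows:
        return []
    flagged = []
    for col in rows[0]:
        distinct = len({row[col] for row in rows})
        if distinct / len(rows) >= threshold:
            flagged.append(col)
    return flagged


rows = [
    {"cod_cliente": 1001, "region": "south", "age": 34},
    {"cod_cliente": 1002, "region": "south", "age": 34},
    {"cod_cliente": 1003, "region": "north", "age": 51},
    {"cod_cliente": 1004, "region": "north", "age": 29},
]
print(high_cardinality_columns(rows))  # ['cod_cliente']
```

Apply the check mainly to identifier and text columns: continuous measures such as amounts or ages naturally have many distinct values and are often exactly what you want to cluster on.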


