# Principal Components

<figure><img src="/files/TQ8EsD1lmpCmF5bMZloO" alt=""><figcaption></figcaption></figure>

When a table has many columns (mainly numeric), it can be useful to reduce them to a few new columns that still capture most of the variability present in the original ones.
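
To make the idea concrete, here is a minimal, dependency-free sketch of principal component analysis. It centers the columns, builds the covariance matrix, and finds the top components with power iteration and deflation. This is an educational illustration under those assumptions, not the solver H2O or Gaio actually uses; the function name `pca` and its parameters are hypothetical.

```python
import math
import random


def pca(rows, k, iterations=500, seed=0):
    """Project `rows` (equal-length lists of numbers) onto their top-k principal components.

    Returns (scores, components, eigenvalues). Educational sketch only: it uses
    power iteration with deflation on the covariance matrix, not H2O's solver.
    """
    n, p = len(rows), len(rows[0])
    # Center each column so components describe variation around the column means.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix, p x p.
    cov = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(k):
        # Start from a random unit vector.
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Power iteration: repeated multiplication by cov converges to the dominant eigenvector.
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T cov v gives the eigenvalue (the variance along v).
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    # Scores: the centered data projected onto each component.
    scores = [[sum(x[j] * c[j] for j in range(p)) for c in components] for x in X]
    return scores, components, eigenvalues
```

For points that all lie on the line y = 2x, such as `pca([[1, 2], [2, 4], [3, 6], [4, 8]], k=1)`, a single component captures all of the variability: its eigenvalue is about 25/3, which equals the total variance of the two columns.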

One method for this is Principal Component Analysis (PCA). Gaio uses [H2O](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/pca.html) to perform the calculations and summarize the data in a few columns. The algorithm accepts both numeric and categorical variables.
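
Since the task accepts categorical columns directly, you do not need to encode them yourself. To illustrate how a category can take part in a numeric decomposition at all, the sketch below shows one common approach, one-hot encoding, which expands a categorical column into 0/1 indicator columns. This is an assumption-based illustration, not H2O's exact internal transformation; the function `one_hot` and the `column=level` naming scheme are hypothetical.

```python
def one_hot(rows, column):
    """Replace the categorical `column` in each dict row with 0/1 indicator columns."""
    # Fixed, sorted list of levels so every row gets the same indicator columns.
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            # One indicator per level: 1 when the row has that level, else 0.
            new_row[f"{column}={level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded
```

For example, a row `{"price": 10.0, "region": "north"}` in a table whose regions are `north` and `south` becomes `{"price": 10.0, "region=north": 1, "region=south": 0}`.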

***

## How to Use the PCA Task

***

### 1. **Open the Principal Component Analysis Task**

* In the **Studio**, go to the **Tasks** panel.
* Under the **Analytics** section, select **Principal Component Analysis**.

***

### 2. **Configure the Main Fields**

* **Task label**: (optional) Name for identifying this step in your flow.
* **Result table:** Output table that will contain the principal components, for example `pca`.
* **Source table:** Automatically populated with the selected table (e.g., `new_sales`).
* **Components amount:** Number of principal components to extract.
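
The fields above can be pictured as a small configuration object. The sketch below is hypothetical: `PcaTaskConfig` and its field names only mirror the form and are not part of Gaio. It adds one sanity check grounded in how PCA works: a decomposition cannot produce more components than there are input columns. This simple check counts source columns only; categorical columns expanded into indicators would raise the real limit, and whether Gaio enforces this limit is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PcaTaskConfig:
    """Hypothetical mirror of the PCA task form; field names are illustrative, not Gaio's."""

    result_table: str
    source_table: str
    components_amount: int
    task_label: str = ""
    columns_to_remove: list = field(default_factory=list)

    def validate(self, source_columns):
        """Return the columns fed to PCA, or raise if the component count is impossible."""
        # Columns actually used: everything except those marked for removal.
        used = [c for c in source_columns if c not in self.columns_to_remove]
        # PCA cannot extract more components than there are input columns.
        if not 1 <= self.components_amount <= len(used):
            raise ValueError(f"components_amount must be between 1 and {len(used)}")
        return used
```

For example, with `components_amount=2` and `columns_to_remove=["id"]`, validating the columns `["id", "qty", "price", "discount"]` returns `["qty", "price", "discount"]`.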

***

### 3. **Select Columns to Remove (Optional)**

* In **Columns to remove**, exclude columns that should **not** take part in the PCA calculation, such as IDs, codes, or irrelevant fields.
* Identifiers and codes carry no meaningful variability. If they are left in, they can dominate the components and hide the structure of the real features.
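
When columns are not standardized, the first principal component tends to follow whichever column has the largest variance. Whether Gaio standardizes columns before PCA is not stated here, so the sketch below assumes it does not. It computes each column's share of the total variance, using a hypothetical `variance_shares` helper and made-up column names and values, to show how an arbitrary identifier can swamp a real feature.

```python
from statistics import variance


def variance_shares(rows, columns):
    """Fraction of the total (sample) variance contributed by each column."""
    variances = {c: variance([row[c] for row in rows]) for c in columns}
    total = sum(variances.values())
    return {c: v / total for c, v in variances.items()}
```

With `customer_id` values `101, 205, 330, 412, 587` and `price` values `10, 12, 11, 13, 9`, the ID column accounts for more than 99% of the total variance, so an unstandardized first component would mostly describe the IDs rather than the prices.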

***

### 4. **Save and Execute**

* After setting the configuration, click **Save**.
* Run the flow. The output table will contain the extracted principal components.

***

#### Output

The resulting table will include:

* First, one column per principal component (e.g., `PCA_1`, `PCA_2`, etc.), as many as set in **Components amount**.
* Then, all the columns of the source table.

The component columns form a reduced dataset ready for further use in tasks like **Clustering**, **AutoML**, or 2D visualizations.
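
A minimal sketch of that column layout, assuming each source row is a Python dict and a list of component scores is already available for every row. The helper `build_output_rows` is hypothetical; it only reproduces the ordering described above, component columns first and source columns after them.

```python
def build_output_rows(source_rows, scores):
    """Prepend PCA_1..PCA_k to each source row, keeping the original columns after them."""
    if len(source_rows) != len(scores):
        raise ValueError("each source row needs exactly one score vector")
    output = []
    for row, row_scores in zip(source_rows, scores):
        # Component columns first, in order PCA_1, PCA_2, ...
        out = {f"PCA_{i}": s for i, s in enumerate(row_scores, start=1)}
        # Then every original column, unchanged (dicts keep insertion order).
        out.update(row)
        output.append(out)
    return output
```

For a source row `{"id": 1, "qty": 3}` with scores `[0.5, -1.25]`, the output row's columns are `PCA_1, PCA_2, id, qty`.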

***

#### ✅ Best Practices

* Use PCA to:
  * Reduce the number of variables in datasets with many numeric features
  * Optimize performance of clustering or classification algorithms
  * Simplify visualizations when working with high-dimensional data
* Combine PCA with tasks that benefit from dimensionality reduction, such as **Cluster** or **Forecast**

![](/files/DWS11CqyKK7l9Gvo8qjZ)

In this example, **Components amount** was set to 5, so five component columns were created.

{% hint style="info" %}
A diagnostic report for the generated components is in development. For now, the task only generates the components; it does not show what percentage of the data's variability each component captures.
{% endhint %}
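
Until that report is available, the standard way to measure this is the explained variance ratio: the variance captured by a component (its eigenvalue of the covariance matrix) divided by the total variance of the input columns. The total variance is the sum of all column variances, which also equals the sum of all eigenvalues. The task does not currently report these values, so the inputs to the hypothetical `explained_variance` helper below are illustrative.

```python
from itertools import accumulate


def explained_variance(eigenvalues, total_variance):
    """Per-component and cumulative percentage of the total variance explained.

    `total_variance` is the sum of all input column variances (the trace of the
    covariance matrix), not just the sum of the eigenvalues that were kept.
    """
    per_component = [100.0 * lam / total_variance for lam in eigenvalues]
    cumulative = list(accumulate(per_component))
    return per_component, cumulative
```

For two uncorrelated columns with sample variances 8/3 and 2/3, the eigenvalues equal those variances, so `explained_variance([8/3, 2/3], 10/3)` gives roughly 80% and 20%, with a cumulative total of 100%.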

