> For the complete documentation index, see [llms.txt](https://docs.gaiodataos.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.gaiodataos.com/tools/tasks/analytics/principal-components.md).

# Principal Components

<figure><img src="/files/TQ8EsD1lmpCmF5bMZloO" alt=""><figcaption></figcaption></figure>

When a table has many columns (mostly numeric), it can be useful to reduce them to a few columns that capture most of the variability in the original data.

One method for this is Principal Component Analysis (PCA). Gaio uses [H2O](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/pca.html) to perform the calculations and summarize the data in a few columns. The algorithm accepts both numeric and categorical variables.
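To make the idea concrete, here is a minimal sketch of PCA in plain Python. It is not Gaio's or H2O's implementation: it assumes numeric-only input with no constant columns, standardizes each column first (an assumption, since the task may transform data differently), and finds components by power iteration. The table values and the function names `standardize`, `covariance`, `dominant_axis`, and `pca` are made up for illustration.

```python
import math

def standardize(rows):
    """Center each column on 0 and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def dominant_axis(matrix, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    p = len(matrix)
    vec = [1.0] * p  # fixed start; fine for this sketch, not for every matrix
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    variance = sum(v * w for v, w in zip(vec, product))
    return variance, vec

def pca(rows, k):
    """Return (scores, variances) for the first k principal components."""
    data = standardize(rows)
    cov = covariance(data)
    p = len(cov)
    axes, variances = [], []
    for _ in range(k):
        variance, axis = dominant_axis(cov)
        axes.append(axis)
        variances.append(variance)
        # Deflation: subtract the component just found before the next search.
        cov = [[cov[i][j] - variance * axis[i] * axis[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
              for row in data]
    return scores, variances

# Hypothetical table: units, revenue, discount (three correlated columns).
rows = [
    [10, 100, 5],
    [12, 125, 3],
    [15, 148, 4],
    [18, 185, 1],
    [20, 198, 2],
    [25, 255, 0],
]
scores, variances = pca(rows, 2)
print([round(v, 3) for v in variances])
print([round(s, 3) for s in scores[0]])
```

Because the three columns move together, the first component carries most of the variance, which is the situation where replacing many columns with a few components loses little information.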

***

## How to Use the PCA Task

***

### 1. **Open the Principal Component Analysis Task**

* In the **Studio**, go to the **Tasks** panel.
* Under the **Analytics** section, select **Principal Component Analysis**.

***

### 2. **Configure the Main Fields**

* **Task label:** (Optional) A name that identifies this step in your flow.
* **Result table:** The output table that will contain the principal components, for example `pca`.
* **Source table:** Automatically populated with the selected table (e.g., `new_sales`).
* **Components amount:** The number of principal components to extract.
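One common way to pick a value for **Components amount** is to keep the smallest number of components whose combined variance reaches a chosen share of the total. The sketch below assumes you already have per-component variances from an exploratory analysis elsewhere; the task form itself only asks for the count. The function name `components_needed` and the numbers are illustrative.

```python
def components_needed(variances, target=0.9):
    """Smallest number of leading components whose variance share reaches target."""
    total = sum(variances)
    running = 0.0
    for count, value in enumerate(sorted(variances, reverse=True), start=1):
        running += value
        if running / total >= target:
            return count
    return len(variances)

# Illustrative per-component variances, not output from Gaio.
variances = [2.9, 0.6, 0.3, 0.2]
print(components_needed(variances, target=0.85))
```

A higher target keeps more components and more detail; a lower one gives a more compact table.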

***

### 3. **Select Columns to Remove (Optional)**

* In **Columns to remove**, you can exclude columns that should **not** be considered in the PCA calculation (e.g., IDs, codes, irrelevant fields).
* Excluding them keeps identifiers and codes from distorting the components and improves the quality of the results.

***

### 4. **Save and Execute**

* After setting the configuration, click **Save**.
* Run the flow. The output table will contain the extracted principal components.
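Before running the flow, it can help to check which columns will actually feed the calculation once the **Columns to remove** setting is applied. The sketch below only mimics that selection on a list of column names; the column names and the helper `split_columns` are hypothetical and not part of Gaio.

```python
def split_columns(columns, to_remove):
    """Return (kept, ignored) column names, preserving the original order."""
    unknown = set(to_remove) - set(columns)
    if unknown:
        raise ValueError(f"not in source table: {sorted(unknown)}")
    kept = [c for c in columns if c not in to_remove]
    ignored = [c for c in columns if c in to_remove]
    return kept, ignored

# Hypothetical source columns: two identifiers and three measures.
columns = ["order_id", "store_code", "units", "revenue", "discount"]
kept, ignored = split_columns(columns, to_remove={"order_id", "store_code"})
print("used in PCA:", kept)
print("excluded:", ignored)
```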

***

#### Output

The resulting table will include:

* One or more columns representing the **principal components** (e.g., `PCA_1`, `PCA_2`), placed first.
* All the columns of the source table, following the component columns.

The component columns give you a simplified dataset ready for further use in tasks like **Clustering**, **AutoML**, or 2D visualizations.
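The layout described above can be pictured with a small sketch that places the component columns first and the source columns after them. The row values and the helper `build_output` are made up for illustration; only the `PCA_1`, `PCA_2` naming comes from this page.

```python
def build_output(source_rows, scores):
    """Place PCA_1..PCA_k first, then every column of the source row."""
    if len(source_rows) != len(scores):
        raise ValueError("need one score list per source row")
    output = []
    for row, row_scores in zip(source_rows, scores):
        out = {f"PCA_{i}": s for i, s in enumerate(row_scores, start=1)}
        out.update(row)  # source columns follow the component columns
        output.append(out)
    return output

# Hypothetical source rows and component scores.
source_rows = [
    {"order_id": 1, "units": 10, "revenue": 100},
    {"order_id": 2, "units": 25, "revenue": 255},
]
scores = [[-1.9, 0.2], [2.3, -0.1]]
table = build_output(source_rows, scores)
print(list(table[0].keys()))
```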

***

#### Best Practices

* Use PCA to:
  * Reduce the number of variables in datasets with many numeric features
  * Optimize performance of clustering or classification algorithms
  * Simplify visualizations when working with high-dimensional data
* Combine PCA with tasks that benefit from dimensionality reduction, such as **Cluster** or **Forecast**.