# Principal Components

<figure><img src="https://1671598980-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F0muDd0LkZG6CmTQvGQ4D%2Fuploads%2Fib8PQq8CJ8XvHtHueFes%2FPCA.png?alt=media&#x26;token=cc65dfa4-4a6a-4749-a0ed-22a8b059d8d1" alt=""><figcaption></figcaption></figure>

When a table has many columns (mainly numeric), it can be useful to reduce them to a few new columns that still capture most of the variability present in the original ones.

One method for this is Principal Component Analysis (PCA). Gaio uses [H2O](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/pca.html) to perform the calculations and summarize the data in a few columns. The algorithm accepts both numeric and categorical variables.
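To make the idea concrete, here is a minimal pure-Python sketch of what PCA computes for numeric columns: center each column, build the covariance matrix, find its leading eigenvectors (here by power iteration with deflation) and project every row onto them. It is a textbook illustration under simplifying assumptions, not H2O's implementation: it handles numeric columns only (categorical columns would first need an encoding such as one-hot), it does not standardize columns, and `pca` is a hypothetical helper name.

```python
import math
import random


def pca(rows, k, iters=500, seed=0):
    """Top-k principal components of a numeric table (textbook sketch, not H2O).

    rows: list of records, each a list of floats of equal length.
    Returns (components, variances, scores):
      components[i] -- unit vector (loadings) of component i+1
      variances[i]  -- variance captured by component i+1 (eigenvalue)
      scores[r][i]  -- value of component i+1 for record r
    """
    n, d = len(rows), len(rows[0])
    # 1. Center every column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # 2. Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]

    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        # 3. Power iteration converges to the direction of largest variance.
        v = [rng.uniform(0.5, 1.5) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
        components.append(v)
        variances.append(lam)
        # 4. Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # 5. Scores: project each centered record onto each component.
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return components, variances, scores


if __name__ == "__main__":
    # The second column is exactly twice the first, so one component suffices.
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    comps, var, sc = pca(data, k=2)
    total = sum(var)
    for i, v in enumerate(var, start=1):
        print(f"PCA_{i}: explained variance ratio {v / total:.3f}")
```

Because the sketch does not rescale columns, a column with a much larger numeric range will dominate the first component; standardizing the columns first is a common choice when they use different units.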

***

## How to Use the PCA Task

***

### 1. **Open the Principal Component Analysis Task**

* In the **Studio**, go to the **Tasks** panel.
* Under the **Analytics** section, select **Principal Component Analysis**.

***

### 2. **Configure the Main Fields**

* **Task label:** (optional) Name that identifies this step in your flow.
* **Result table:** Output table that will contain the principal components (e.g., `pca`).
* **Source table:** Automatically populated with the selected table (e.g., `new_sales`).
* **Components amount:** Number of principal components to extract.
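A common way to choose **Components amount** (a general PCA heuristic, not a Gaio-specific rule) is to look at how much of the total variance the components explain. With $$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$$ the variances of all $$p$$ components (the eigenvalues of the covariance matrix of the input columns), the share explained by the first $$k$$ components is:

$$
\text{explained}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j}
$$

For example, with hypothetical variances 4, 2, 1 and 1, the first component explains 4/8 = 50% and the first two explain 6/8 = 75%; you would then pick the smallest $$k$$ whose share reaches the level your analysis needs.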

***

### 3. **Select Columns to Remove (Optional)**

* In **Columns to remove**, you can exclude columns that should **not** be considered in the PCA calculation (e.g., IDs, codes, irrelevant fields).
* Identifiers and codes do not measure anything, but if they are numeric, PCA still treats their values as variation to explain. Excluding them keeps the components focused on meaningful fields.

***

### 4. **Save and Execute**

* After setting the configuration, click **Save**.
* Run the flow. The output table will contain the extracted principal components.

***

#### Output

The resulting table contains:

* The **principal components** in the first columns (e.g., `PCA_1`, `PCA_2`, etc.), one column per component requested in **Components amount**.
* All the columns of the source table, after the component columns.

The component columns give a reduced representation of the data, ready for further use in tasks like **Clustering**, **AutoML**, or 2D visualizations.
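The column layout described above can be mimicked with a short Python sketch, assuming rows are held as dictionaries. `assemble_output`, `component_columns` and the `PCA_` prefix are hypothetical names for illustration; the exact column names Gaio writes may differ. The second helper shows how a downstream step could keep only the component columns, since the output table still carries every source column.

```python
def assemble_output(source_rows, scores, prefix="PCA_"):
    """Build output rows: component columns first, then every source column.

    source_rows: list of dicts (column name -> value), one per record.
    scores: list of lists; scores[r][i] is component i+1 for record r.
    prefix: hypothetical naming scheme matching the PCA_1, PCA_2 example.
    """
    if len(source_rows) != len(scores):
        raise ValueError("one score list is needed per source row")
    output = []
    for row, row_scores in zip(source_rows, scores):
        # Dicts keep insertion order, so the component columns come first.
        out = {f"{prefix}{i}": s for i, s in enumerate(row_scores, start=1)}
        out.update(row)  # a source column already named PCA_n would overwrite it
        output.append(out)
    return output


def component_columns(row, prefix="PCA_"):
    """Keep only the component columns, e.g. as input for clustering."""
    return {k: v for k, v in row.items() if k.startswith(prefix)}


if __name__ == "__main__":
    source = [{"id": 1, "price": 10.0, "qty": 3}, {"id": 2, "price": 12.5, "qty": 1}]
    scores = [[-0.8, 0.1], [0.8, -0.1]]  # illustrative values, not real output
    table = assemble_output(source, scores)
    print(list(table[0]))  # ['PCA_1', 'PCA_2', 'id', 'price', 'qty']
    print(component_columns(table[1]))  # {'PCA_1': 0.8, 'PCA_2': -0.1}
```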

***

#### Best Practices

* Use PCA to:
  * Reduce the number of variables in datasets with many numeric features
  * Improve the performance of clustering or classification algorithms
  * Simplify visualizations when working with high-dimensional data
* Combine PCA with tasks that benefit from dimensionality reduction, such as **Cluster** or **Forecast**.
