Principal Components

When you have a large set of columns (mainly numeric), it can be useful to reduce them to a few columns that still capture most of the variability across the original columns.

One method for this is Principal Component Analysis (PCA). Gaio uses H2O to perform the calculations and summarize the data in a few columns. The algorithm accepts both numeric and categorical variables.
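As a rough illustration of what "summarizing the data in a few columns" means, here is a minimal plain-Python sketch of PCA. It is not Gaio's or H2O's implementation: it assumes numeric columns only (H2O also handles categorical ones), standardizes each column so that columns with large units do not dominate, builds the covariance matrix of the standardized data, and finds the leading components by power iteration with deflation. All function names are hypothetical.

```python
import math
import random


def standardize(columns):
    """Center each column on its mean and scale it to unit sample standard deviation.

    Needs at least two rows per column.
    """
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        # A constant column has no variability; keep it as zeros instead of dividing by 0.
        result.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return result


def covariance_matrix(columns):
    """Sample covariance between every pair of (already centered) columns."""
    n = len(columns[0])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
        for ci in columns
    ]


def top_eigenpairs(matrix, k, iterations=500, seed=0):
    """Largest k eigenvalues/eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    p = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        # Random unit start vector, so it is (almost surely) not orthogonal to the target.
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along this direction.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Deflation: remove this component so the next pass finds the next-largest one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


def pca(columns, k):
    """Return (variances, scores); scores[c][r] is component c for row r."""
    z = standardize(columns)
    pairs = top_eigenpairs(covariance_matrix(z), k)
    n = len(z[0])
    scores = [
        [sum(vec[j] * z[j][r] for j in range(len(z))) for r in range(n)]
        for _, vec in pairs
    ]
    return [lam for lam, _ in pairs], scores
```

For two perfectly correlated columns, such as pca([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]], k=2), the first variance comes out at about 2.0 and the second at about 0.0: a single component carries all the variability of both standardized columns. Power iteration keeps the sketch dependency-free; real workloads would use a linear-algebra library or, as Gaio does, delegate the work to H2O.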


How to Use the PCA Task


1. Open the Principal Component Analysis Task

  • In the Studio, go to the Tasks panel.

  • Under the Analytics section, select Principal Component Analysis.


2. Configure the Main Fields

  • Task label: (optional) Name for identifying this step in your flow.

  • Result table: Output table that will contain the principal components. Example: pca.

  • Source table: Automatically populated with the selected table (e.g., new_sales).

  • Components amount: Define how many principal components you want to extract.
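Choosing a value for Components amount is up to you. One common heuristic, not something Gaio prescribes, is to keep the smallest number of components whose combined variance reaches a target share of the total (for example 90%). The sketch below assumes you already have the variance of each component, for instance from a PCA run with a generous number of components; the numbers in the example are made up, and the helper names are hypothetical.

```python
def explained_variance_ratio(variances):
    """Share of the total variance captured by each component."""
    total = sum(variances)
    return [v / total for v in variances]


def components_for_threshold(variances, threshold=0.9):
    """Smallest number of components whose cumulative share reaches `threshold`."""
    ordered = sorted(variances, reverse=True)
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio(ordered), start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return k
    # Floating-point rounding can leave the total just under 1.0; fall back to all components.
    return len(ordered)


# Illustrative component variances (not from a real dataset).
example = [2.5, 1.0, 0.3, 0.2]
print(components_for_threshold(example, 0.9))  # → 3
```

Here the first two components explain 87.5% of the variance, so a 90% target needs a third one.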


3. Select Columns to Remove (Optional)

  • In Columns to remove, you can exclude columns that should not be considered in the PCA calculation (e.g., IDs, codes, irrelevant fields).

  • This keeps identifiers and unrelated fields from being counted as variability in the data, which would otherwise distort the components.


4. Save and Execute

  • After setting the configuration, click Save.

  • Run the flow. The output table will contain the extracted principal components.
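Conceptually, Columns to remove filters the source table before the calculation. The hypothetical helper below sketches that step on rows held as Python dictionaries, and also flags kept columns with a single distinct value, since a column with zero variance cannot contribute to any component. The table contents and column names are invented for illustration; a categorical column such as region is kept because the H2O algorithm accepts categorical variables.

```python
def prepare_pca_input(rows, columns_to_remove):
    """Apply a 'Columns to remove' list and return the remaining columns as name -> values.

    Also reports kept columns that hold a single value: they have zero variance,
    so they cannot contribute to any principal component.
    """
    if not rows:
        raise ValueError("source table has no rows")
    all_columns = list(rows[0].keys())
    removed = set(columns_to_remove)
    unknown = removed - set(all_columns)
    if unknown:
        raise KeyError(f"columns to remove not found: {sorted(unknown)}")
    kept = {name: [row[name] for row in rows] for name in all_columns if name not in removed}
    constant = [name for name, values in kept.items() if len(set(values)) == 1]
    return kept, constant


# Hypothetical rows from a table such as new_sales; the column names are illustrative.
new_sales = [
    {"order_id": 101, "customer_code": "C-7", "quantity": 3, "unit_price": 9.9, "region": "south"},
    {"order_id": 102, "customer_code": "C-2", "quantity": 5, "unit_price": 9.9, "region": "south"},
    {"order_id": 103, "customer_code": "C-7", "quantity": 1, "unit_price": 9.9, "region": "south"},
]
kept, constant = prepare_pca_input(new_sales, ["order_id", "customer_code"])
print(list(kept))  # → ['quantity', 'unit_price', 'region']
print(constant)    # → ['unit_price', 'region']
```

In this example, unit_price and region would be good additional candidates for Columns to remove, because every row has the same value.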


Output

The resulting table will include:

  • One or more columns representing the principal components (e.g., PCA_1, PCA_2, etc.), one per component requested in Components amount.

  • The component columns placed first, followed by all the columns of the source table.

  • In the component columns, a simplified dataset ready for further use in tasks like Clustering, AutoML, or 2D visualizations.
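The column layout described above can be sketched as below, assuming the component scores are already computed and follow the PCA_1, PCA_2, ... naming from the example. The helper names and the score values are hypothetical; the second helper shows how the component columns alone could be taken forward to a task such as Clustering.

```python
def build_result_rows(source_rows, scores):
    """Place PCA_1..PCA_k first in every row, followed by the original source columns.

    scores[c][r] is the value of component c for row r.
    """
    result = []
    for r, row in enumerate(source_rows):
        out = {f"PCA_{c + 1}": component[r] for c, component in enumerate(scores)}
        clash = out.keys() & row.keys()
        if clash:
            raise ValueError(f"source already has columns named {sorted(clash)}")
        out.update(row)  # dicts keep insertion order, so source columns follow the components
        result.append(out)
    return result


def components_only(result_rows, k):
    """Keep just PCA_1..PCA_k, e.g. as input for clustering or a 2D scatter plot."""
    names = [f"PCA_{c + 1}" for c in range(k)]
    return [{name: row[name] for name in names} for row in result_rows]


source = [{"id": 1, "quantity": 3}, {"id": 2, "quantity": 5}]
scores = [[-0.7, 0.7], [0.1, -0.1]]  # illustrative values, not computed from the rows
rows = build_result_rows(source, scores)
print(list(rows[0]))                # → ['PCA_1', 'PCA_2', 'id', 'quantity']
print(components_only(rows, 2)[1])  # → {'PCA_1': 0.7, 'PCA_2': -0.1}
```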


Best Practices

  • Use PCA to:

    • Reduce the number of variables in datasets with many numeric features

    • Improve the performance of clustering or classification algorithms

    • Simplify visualizations when working with high-dimensional data

  • Combine PCA with tasks that benefit from dimensionality reduction, such as Cluster or Forecast.
