Principal Components
When a table has a large set of columns (mainly numeric), it can be useful to reduce them to a few columns that capture most of the variability present across the original columns.
One method for this is Principal Component Analysis (PCA). Gaio uses it to perform the calculations and summarize the data in a few columns. The algorithm accepts both numeric and categorical variables.
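The core idea of summarizing many columns in a few can be sketched in NumPy. This is a generic PCA via singular value decomposition, not Gaio's implementation. It assumes numeric input and z-score standardization of each column, and the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto its first n_components principal components.

    Assumption: X is a 2D numeric array (rows = records, columns = variables),
    and each column is standardized so that columns in different units
    contribute comparably. Gaio's internal preprocessing may differ.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - mean) / std

    # SVD of the standardized data: rows of Vt are the principal axes,
    # ordered by decreasing variance (singular values S are sorted).
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)

    # Scores: coordinates of each record along the first principal axes.
    scores = Z @ Vt[:n_components].T

    # Share of the total variance captured by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Usage: two strongly correlated columns plus one independent column.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([
    base,
    2 * base + rng.normal(scale=0.1, size=(100, 1)),
    rng.normal(size=(100, 1)),
])
scores, ratio = pca(X, 2)
print(scores.shape)  # (100, 2)
print(ratio)         # variance share of PCA_1 and PCA_2
```

Because the first two columns carry nearly the same information, most of their variance ends up in a single component, which is exactly the kind of reduction described above.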
In the Studio, go to the Tasks panel.
Under the Analytics section, select Principal Component Analysis.
Task label: (optional) Name for identifying this step in your flow.
Result table: Output table that will contain the principal components. Example: pca.
Source table: Automatically populated with the selected table (e.g., new_sales).
Components amount: Define how many principal components you want to extract.
In Columns to remove, you can exclude columns that should not be considered in the PCA calculation (e.g., IDs, codes, irrelevant fields).
This helps avoid bias and improves the quality of the results.
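The effect of Columns to remove and Components amount can be illustrated with a small sketch. It assumes records arrive as Python dicts, drops the excluded columns, and one-hot encodes text columns; one-hot encoding is one common way to feed categorical variables to PCA, not necessarily what Gaio does. The helper `components_for_variance` is a hypothetical heuristic for choosing the number of components: the smallest count whose cumulative explained variance reaches a target.

```python
import numpy as np

def prepare_matrix(rows, columns_to_remove):
    """Build a numeric matrix from a list of dict records.

    - Drops every column listed in columns_to_remove (e.g. IDs or codes).
    - Keeps numeric columns as-is.
    - One-hot encodes text columns (one 0/1 column per distinct value).
    """
    kept = [c for c in rows[0] if c not in columns_to_remove]
    feature_names, columns = [], []
    for col in kept:
        values = [r[col] for r in rows]
        if all(isinstance(v, (int, float)) for v in values):
            feature_names.append(col)
            columns.append([float(v) for v in values])
        else:
            for level in sorted(set(values)):
                feature_names.append(f"{col}={level}")
                columns.append([1.0 if v == level else 0.0 for v in values])
    # columns is a list of per-feature lists; transpose to rows x features.
    return feature_names, np.array(columns).T

def components_for_variance(X, target=0.9):
    """Smallest number of components whose cumulative explained variance reaches target."""
    std = X.std(axis=0)
    Z = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)
    S = np.linalg.svd(Z, compute_uv=False)  # singular values only
    ratio = S ** 2 / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(ratio), target)) + 1
    return min(k, len(ratio))  # guard against floating-point round-off

# Usage with a tiny, made-up table: "id" is excluded as an identifier.
rows = [
    {"id": 1, "price": 10.0, "qty": 3, "region": "north"},
    {"id": 2, "price": 12.5, "qty": 2, "region": "south"},
    {"id": 3, "price": 9.0, "qty": 4, "region": "north"},
    {"id": 4, "price": 15.0, "qty": 1, "region": "east"},
]
names, X = prepare_matrix(rows, columns_to_remove={"id"})
print(names)  # ['price', 'qty', 'region=east', 'region=north', 'region=south']
k = components_for_variance(X, target=0.9)
```

Leaving an identifier such as "id" in the input would let an arbitrary sequence number influence the components, which is why excluding such columns matters.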
After setting the configuration, click Save.
Run the flow. The output table will contain the extracted principal components.
The resulting table will include:
One or more columns representing the principal components (e.g., PCA_1, PCA_2, etc.), placed first and followed by all the columns of the source table
A simplified dataset ready for further use in tasks like Clustering, AutoML, or 2D visualizations
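The output layout (component columns first, then the source columns) can be sketched as follows. The function `build_output_table` is hypothetical, and the PCA_1, PCA_2 names follow this page's example; the exact names Gaio generates may differ.

```python
def build_output_table(rows, scores):
    """Prepend PCA_1..PCA_k columns to each source record.

    rows:   list of dict records (the source table)
    scores: one sequence of k component values per row, in the same order as rows
    """
    if len(rows) != len(scores):
        raise ValueError("rows and scores must have the same length")
    table = []
    for record, comps in zip(rows, scores):
        # Component columns come first (dicts keep insertion order).
        out = {f"PCA_{i}": float(v) for i, v in enumerate(comps, start=1)}
        out.update(record)  # then every column of the source table
        table.append(out)
    return table

# Usage with made-up component values for two records.
rows = [{"customer": "A", "sales": 100.0}, {"customer": "B", "sales": 80.0}]
scores = [[1.2, -0.3], [-1.2, 0.3]]
table = build_output_table(rows, scores)
print(list(table[0]))  # ['PCA_1', 'PCA_2', 'customer', 'sales']
```

Keeping the source columns alongside the components lets later tasks, such as Cluster, use only the PCA_ columns while still reporting results against the original fields.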
Use PCA to:
Reduce the number of variables in datasets with many numeric features
Optimize performance of clustering or classification algorithms
Simplify visualizations when working with high-dimensional data
Combine PCA with tasks that benefit from dimensionality reduction, such as Cluster or Forecast.