Gaio DataOS

Principal Components


Last updated 2 days ago

When a table has many columns (mostly numeric), it can be useful to reduce them to a few columns that still capture most of the variability in the original data.

One method for this is Principal Component Analysis (PCA). Gaio uses H2O to perform the calculations and summarize the data in a few columns. The algorithm accepts both numeric and categorical variables.
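To make the idea concrete, here is a minimal sketch of what a principal component is, written in plain Python. It is not Gaio's or H2O's implementation: it handles numeric columns only, extracts just the first component, and uses power iteration as a simple stand-in for a full eigendecomposition. The function name and the sample data are made up for illustration.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component of numeric rows."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean so the component captures spread, not offset.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # which is the direction of greatest variance.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered row onto that direction: one score per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Made-up data: two strongly correlated columns collapse into one score column.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 6.0], [4.0, 8.2], [5.0, 9.9]]
direction, scores = first_principal_component(rows)
print(direction)
print(scores)
```

Each row ends up with a single score that preserves most of the spread of the two original columns, which is the kind of summary column the task produces.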


How to Use the PCA Task


1. Open the Principal Component Analysis Task

  • In the Studio, go to the Tasks panel.

  • Under the Analytics section, select Principal Component Analysis.


2. Configure the Main Fields

  • Task label: (optional) Name for identifying this step in your flow.

  • Result table: Output table that will contain the principal components. Example: pca.

  • Source table: Automatically populated with the selected table (e.g., new_sales).

  • Components amount: The number of principal components to extract.
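The fields above can be pictured as a small configuration record. The sketch below is purely illustrative: `PcaTaskConfig` and its `validate` method are hypothetical names, not part of any Gaio API, and the upper bound on the component count assumes all-numeric input (categorical columns may be expanded internally, which this sketch does not model).

```python
from dataclasses import dataclass

@dataclass
class PcaTaskConfig:
    """Hypothetical container mirroring the task fields; not a Gaio API."""
    result_table: str
    source_table: str
    components_amount: int
    task_label: str = ""  # optional, only identifies the step in the flow

    def validate(self, input_column_count: int) -> None:
        if not self.result_table or not self.source_table:
            raise ValueError("result_table and source_table are required")
        # Assumption: with numeric input, PCA cannot yield more components
        # than there are input columns.
        if not 1 <= self.components_amount <= input_column_count:
            raise ValueError(
                f"components_amount must be between 1 and {input_column_count}"
            )

# Values taken from the examples in the field descriptions.
config = PcaTaskConfig(result_table="pca", source_table="new_sales",
                       components_amount=2)
config.validate(input_column_count=6)  # 6 is a made-up column count
```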


4. Select Columns to Remove (Optional)

  • In Columns to remove, you can exclude columns that should not be considered in the PCA calculation (e.g., IDs, codes, irrelevant fields).

  • Excluding them keeps identifier-like values from distorting the components and improves the quality of the results.
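As a rough illustration of why identifier columns are worth excluding, the sketch below compares per-column variances on made-up data and then drops the ID column. The column names and values are invented, and whether Gaio rescales columns before running PCA is not stated here, so the variance comparison applies to unscaled data only.

```python
from statistics import pvariance

# Made-up sample rows; customer_id is an identifier with no analytic meaning.
rows = [
    {"customer_id": 10001, "price": 12.5, "quantity": 3},
    {"customer_id": 20450, "price": 11.0, "quantity": 5},
    {"customer_id": 31877, "price": 14.2, "quantity": 2},
    {"customer_id": 47012, "price": 13.1, "quantity": 4},
]

# On unscaled data the ID column has by far the largest variance, so it
# would dominate the first component without describing anything real.
variances = {col: pvariance([r[col] for r in rows]) for col in rows[0]}
largest = max(variances, key=variances.get)

# Equivalent of listing the column under "Columns to remove".
columns_to_remove = {"customer_id"}
filtered = [
    {col: value for col, value in r.items() if col not in columns_to_remove}
    for r in rows
]
print(largest)
print(filtered[0])
```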


5. Save and Execute

  • After setting the configuration, click Save.

  • Run the flow. The output table will contain the extracted principal components.


Output

The resulting table will include:

  • One or more columns representing the principal components (e.g., PCA_1, PCA_2), placed in the first columns.

  • All the columns of the source table, after the component columns.

  • A simplified dataset ready for further use in tasks like Clustering, AutoML, or 2D visualizations
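The layout described above can be sketched as follows. This only mimics the column order of the result table; `build_output_rows` is a hypothetical helper, and the source columns and score values are placeholders rather than real PCA output.

```python
def build_output_rows(source_rows, component_scores):
    """Place PCA_1..PCA_k first, then every source column, for each row."""
    output = []
    for row, scores in zip(source_rows, component_scores):
        # Component columns come first, numbered from 1.
        out = {f"PCA_{i}": score for i, score in enumerate(scores, start=1)}
        # Source columns follow in their original order.
        out.update(row)
        output.append(out)
    return output

# Made-up source rows and placeholder component scores (two components).
source_rows = [
    {"region": "north", "price": 12.5, "quantity": 3},
    {"region": "south", "price": 11.0, "quantity": 5},
]
component_scores = [(-0.8, 0.1), (0.8, -0.1)]

result = build_output_rows(source_rows, component_scores)
print(list(result[0]))
```

Because the component columns sit next to the original ones, a later task can use only the PCA_ columns while the source columns remain available for labeling or filtering.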


Best Practices

  • Use PCA to:

    • Reduce the number of variables in datasets with many numeric features

    • Optimize performance of clustering or classification algorithms

    • Simplify visualizations when working with high-dimensional data

  • Combine PCA with tasks that benefit from dimensionality reduction, such as Cluster or Forecast.

H2O