Sample

The Sample task in Gaio DataOS allows you to extract a subset of data from a table in a simple and controlled way. This functionality is ideal for testing, validation, initial visualizations, or preprocessing in Machine Learning workflows.


How to Use the Sample Task


1. Add the Sample Task to Your Flow

  • In the Studio, go to the Tasks panel.

  • Under the Analytics section, select the Sample task.


2. Configure the Main Fields

  • Task label: (optional) A name for this task within your flow (default: sample).

  • Result table: The name of the output table that will contain the sampled data (e.g., sample_sample).


3. Choose the Sampling Type

You can choose between two options:

Percentage

  • Allows you to define the percentage of rows to be sampled from the original table.

  • You can adjust the slider or manually input the value.

  • Example: 0.7 (70%) → returns 70% of the rows from the source table.

Rows

  • Allows you to define a fixed number of rows to extract as a sample.

  • Example: 1,000 → the output table will contain exactly 1,000 randomly selected rows.
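The difference between the two sampling types can be sketched in plain Python. This is only an illustration of the concept, not how Gaio DataOS implements the task: the function names `sample_by_percentage` and `sample_by_rows`, the `seed` parameter, the rounding rule, and the choice to cap the row count at the table size are all assumptions made for this example.

```python
import random


def sample_by_percentage(rows, fraction, seed=None):
    """Return a random subset holding `fraction` of the rows (0.7 means 70%)."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    # Assumption: the row count is rounded to the nearest whole number.
    k = round(len(rows) * fraction)
    return rng.sample(rows, k)


def sample_by_rows(rows, n, seed=None):
    """Return `n` randomly selected rows, without repeating any row."""
    rng = random.Random(seed)
    # Assumption: if the table has fewer than n rows, return all of them.
    return rng.sample(rows, min(n, len(rows)))


# Hypothetical source table with 10,000 rows (illustrative data only).
source_table = [{"id": i} for i in range(10_000)]

by_percentage = sample_by_percentage(source_table, 0.7, seed=42)
by_rows = sample_by_rows(source_table, 1_000, seed=42)

print(len(by_percentage))  # 7000
print(len(by_rows))        # 1000
```

With Percentage, the size of the output grows and shrinks with the source table. With Rows, the output size stays fixed no matter how large the source table becomes.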


4. Save and Execute

  • Once you’ve configured the sample type and value, click Save.

  • Run the flow. A new table is generated based on the selected sample configuration.


Best Practices

  • Use the Sample task to:

    • Reduce dataset size during development or dashboard previews.

    • Create smaller datasets for training ML models.

    • Test queries and transformations without processing the full dataset.

  • Combine with other tasks like AutoML, Cluster, or Scoring to streamline your experimentation and modeling.
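As a rough illustration of testing a transformation on a sample before running it on the full dataset, the sketch below applies a transformation to a 1% sample and checks the result. Everything here is hypothetical and independent of Gaio DataOS: the `add_total` transformation, the `price` and `quantity` columns, and the generated data exist only for this example.

```python
import random


def add_total(row):
    """Hypothetical transformation: derive a total from price and quantity."""
    return {**row, "total": row["price"] * row["quantity"]}


# Hypothetical full table (illustrative data only).
full_table = [{"price": 2.5, "quantity": q} for q in range(100_000)]

# Draw a 1% sample; the fixed seed makes the preview reproducible.
rng = random.Random(7)
preview = rng.sample(full_table, len(full_table) // 100)

# Run the transformation on the small sample first.
transformed_preview = [add_total(row) for row in preview]

# Cheap sanity checks before committing to the full dataset.
assert len(transformed_preview) == 1_000
assert all(
    row["total"] == row["price"] * row["quantity"]
    for row in transformed_preview
)
```

Once the checks pass on the sample, the same transformation can be applied to the full table with more confidence and less wasted processing time.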

