AutoML
Gaio uses H2O AutoML (Automatic Machine Learning) to create predictive models. Gaio handles the connection to the data and its processing, sends the training data and modeling directives to H2O AutoML, retrieves the results of the run, and presents them in a user-friendly interface. This entire process can be automated within Gaio.
In the left-side menu, go to Analytics and select the AutoML task.
In the configuration screen:
Model Name (optional): Enter a name for your model (e.g., auto_ML).
Table: Select the data source table.
Target: Choose the variable you want to predict (e.g., status).
Columns to remove: If there are columns that should be excluded (such as IDs), list them here.
Training Time (Seconds): Estimated time the system will use to train the models.
Rows limit: By default, Gaio uses up to 100,000 rows to train the model. You can adjust this, but higher values may overload the server.
Click Save and Train to begin the process.
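For readers who want to picture what these settings drive, here is a minimal Python sketch of how the form fields could map onto a direct H2O AutoML call. It is an illustration under assumptions, not Gaio's actual implementation: the `AutoMLTaskConfig` class, the helper names, and the 60-second training time are hypothetical, while `H2OAutoML`, `max_runtime_secs`, `nfolds`, and `train` belong to the public H2O Python API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AutoMLTaskConfig:
    """Hypothetical mirror of the AutoML task form; this is not a Gaio API."""
    table: str
    target: str
    columns_to_remove: List[str] = field(default_factory=list)
    training_time_secs: int = 60      # assumed value, not a documented default
    rows_limit: int = 100_000         # default row limit stated in this page
    model_name: Optional[str] = None


def predictor_columns(all_columns, config):
    """Columns left as predictors after dropping the target and excluded columns."""
    excluded = set(config.columns_to_remove) | {config.target}
    return [c for c in all_columns if c not in excluded]


def train_with_h2o(pandas_df, config):
    """Run H2O AutoML with the settings above (needs the h2o package and Java)."""
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    # Apply the row limit before handing the data to H2O.
    frame = h2o.H2OFrame(pandas_df.head(config.rows_limit))
    aml = H2OAutoML(
        max_runtime_secs=config.training_time_secs,
        nfolds=5,  # 5-fold cross-validation
        project_name=config.model_name,
    )
    aml.train(
        x=predictor_columns(frame.columns, config),
        y=config.target,
        training_frame=frame,
    )
    return aml


config = AutoMLTaskConfig(table="customers", target="status", columns_to_remove=["id"])
predictors = predictor_columns(["id", "age", "income", "status"], config)
```

With the hypothetical columns above, `predictors` contains only `age` and `income`: the ID column and the target are both excluded from the inputs.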
While training, the interface displays two progress bars:
Preparation: Data preprocessing stage.
Training: Model construction and testing.
Several techniques are used in the automatic modeling process. Each one is described in the official H2O documentation:
GLM: Generalized Linear Model.
XGBoost: Optimized gradient boosting that combines many decision trees, with the construction of each tree parallelized.
GBM: Gradient Boosting Machine.
DeepLearning: Multi-layer feedforward neural networks.
Training and validation criteria are applied. Gaio uses cross-validation to evaluate how well the models predict. With 5-fold cross-validation, the data is split into 5 random samples of the same size; each model is trained on four of them and validated on the remaining one, rotating until every sample has been used for validation once.
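The 5-fold scheme can be sketched in plain Python as follows. This is a generic illustration of k-fold splitting, not the code H2O or Gaio runs; the function names and the seed are assumptions.

```python
import random


def k_fold_indices(n_rows, k=5, seed=42):
    """Shuffle row indices and split them into k folds of (nearly) equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_validation_splits(n_rows, k=5, seed=42):
    """Yield (train, validation) index lists, using each fold once for validation."""
    folds = k_fold_indices(n_rows, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(train_validation_splits(n_rows=100, k=5))
```

Each of the five splits holds out a different 20 rows for validation and trains on the other 80, so every row is validated exactly once.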
The criterion for ranking the models is Accuracy.
Both categorical (text) and numeric variables are accepted as the target. A numeric target is always treated as a value to predict: the model estimates the number itself, not the probability of an event occurring.
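The rule above can be illustrated with a small sketch. The `infer_problem_type` function is hypothetical and only mirrors the behavior described here; it is not how Gaio or H2O detects the target type internally.

```python
def infer_problem_type(target_values):
    """Return 'regression' for an all-numeric target, 'classification' otherwise."""
    non_missing = [v for v in target_values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_missing
    )
    return "regression" if non_missing and is_numeric else "classification"


text_target = infer_problem_type(["approved", "denied", "approved"])
numeric_target = infer_problem_type([0, 1, 1, 0])
```

Note that a 0/1 column stored as numbers falls into the numeric case. If the goal is to predict a category, storing the target as text labels would presumably be the way to have it treated as categorical.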
Once completed, the system will display a full report including:
Summary: An overview of the automatic model-building process, including the overall quality of the model.
Model Accuracy: Shows the accuracy of the best model created.
ROC Curve: A visual representation of model performance.
Most Important Variables: Lists the top predictive features in order of importance.
Models: All models created within the configured training time, with quality statistics for each.
Supporting Tables:
Cross Validation
Confusion Matrix
Gain Table
Maximum Metrics
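To make the Confusion Matrix table and the Accuracy figure concrete, here is a minimal sketch of how the two relate. The labels and predictions are made-up illustration data, and the functions are generic definitions rather than Gaio's reporting code.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows whose predicted label matches the actual label."""
    matrix = confusion_matrix(actual, predicted)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual)


# Hypothetical values for a "status" target.
actual = ["paid", "paid", "late", "late", "paid"]
predicted = ["paid", "late", "late", "late", "paid"]

matrix = confusion_matrix(actual, predicted)
score = accuracy(actual, predicted)
```

Here four of the five predictions match, so `score` is 0.8. The single miss shows up in the matrix as one row with actual `paid` and predicted `late`.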
The trained model is saved and can be reused through the Scoring task to apply predictions to new data.