AutoML
Gaio uses H2O AutoML (Automatic Machine Learning) technology to create predictive models. Gaio handles the connection to the data and the data processing, passes the training data and modeling directives to H2O AutoML, retrieves the results of the execution, and presents them in a user-friendly interface. This entire process can be automated within Gaio.
Within Gaio, creating a predictive model takes only a few steps:
Click on the table with historical data to train the models
From the Tasks menu, choose AutoML
Define the name of the model that will be saved by Gaio
Define what the response variable will be
Define the time that Gaio will have to search for patterns in the data
Exclude fields that don't make sense in training, such as Customer Code
Click Train or Save. Run the task and wait for the set time.
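For readers who want to see what these choices correspond to in H2O's own Python API, the steps above can be sketched roughly as follows. This is an illustration, not Gaio's internal code: the `AutoMLTaskSettings` class, its field names, and the file and column names are hypothetical, and the `train` function assumes the `h2o` package and a Java runtime are installed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AutoMLTaskSettings:
    """Hypothetical container mirroring the fields of the AutoML task form."""
    model_name: str            # name under which the model is saved
    response: str              # response variable to predict
    max_runtime_secs: int      # time budget for the pattern search
    excluded_fields: List[str] = field(default_factory=list)

    def predictors(self, columns: List[str]) -> List[str]:
        """Columns used for training: all except the response and exclusions."""
        skip = set(self.excluded_fields) | {self.response}
        return [c for c in columns if c not in skip]


def train(settings: AutoMLTaskSettings, csv_path: str):
    """Run H2O AutoML with the given settings (requires h2o and Java)."""
    import h2o                          # imported lazily: only needed here
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.import_file(csv_path)
    aml = H2OAutoML(
        max_runtime_secs=settings.max_runtime_secs,
        nfolds=5,                       # 5-fold cross-validation
        project_name=settings.model_name,
        seed=1,
    )
    aml.train(
        x=settings.predictors(frame.columns),
        y=settings.response,
        training_frame=frame,
    )
    return aml


# All names below are made up for the example.
settings = AutoMLTaskSettings(
    model_name="churn_model",
    response="cancelled",
    max_runtime_secs=300,
    excluded_fields=["customer_code"],
)
columns = ["customer_code", "age", "plan", "cancelled"]
print(settings.predictors(columns))
```

The last lines only show which columns would be passed to training once the response and the excluded identifier are removed; calling `train(settings, "history.csv")` would start the actual search.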
The model building interface is very simple and does not require specialized knowledge. However, it is important that the analyst understands what happens while the models are being built.
Several techniques are used in the automatic modeling process. Each technique in the following list is described in the official H2O documentation:
GLM: Generalized Linear Model.
XGBoost: An optimized gradient boosting implementation that combines multiple decision trees and parallelizes their construction.
GBM: Gradient Boosting Machine.
DeepLearning: use of Neural Networks.
Training and validation criteria are applied. Gaio uses Cross-Validation to evaluate whether the models are accurate. With 5-Fold Cross-Validation, the data is divided into 5 random samples of the same size; each model is trained on four of them and validated on the remaining one, rotating until every sample has been used for validation, as shown in the image below:
The criterion for prioritizing the model is Accuracy.
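To make the 5-fold idea concrete, here is a minimal sketch in plain Python. It is not how H2O assigns folds internally; the function names and the seed are assumptions chosen for the illustration.

```python
import random


def k_fold_indices(n_rows, k=5, seed=42):
    """Split row indices 0..n_rows-1 into k random folds of nearly equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def train_validation_splits(n_rows, k=5, seed=42):
    """Yield (train, validation) index lists, one pair per fold."""
    folds = k_fold_indices(n_rows, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(train_validation_splits(n_rows=100, k=5))
for train, validation in splits:
    print(len(train), len(validation))
```

Every row is used for validation exactly once and for training four times, which is why the cross-validated score is a fairer estimate than a score measured on the training data itself.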
Both categorical (text) and numeric fields are accepted as response variables. When the response variable is numeric, Gaio always assumes that the goal is to predict the number itself, not the probability of an event occurring.
After the AutoML task is executed, the results are made available in a new object in the process. Below is an example whose response variable is categorical.
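The rule can be pictured with a small sketch. The `problem_type` function is hypothetical and only mirrors the behavior described here; it is not Gaio's actual type detection.

```python
def problem_type(response_values):
    """Return 'regression' for a numeric response, 'classification' for text."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in response_values
    )
    return "regression" if numeric else "classification"


print(problem_type(["yes", "no", "no"]))   # text labels
print(problem_type([0, 1, 1, 0]))          # a 0/1 flag stored as numbers
print(problem_type([120.5, 87.0, 310.2]))  # a continuous amount
```

Under this rule a 0/1 flag stored as a number is treated as a quantity to predict, so an event flag would presumably need to be stored as text (for example "yes" and "no") for the model to classify it.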
A summary of the automatic model building process is generated, and the overall quality of the model is reported.
The variables that most impacted the model are ordered. In the example above, Age was the variable that most contributed to predicting the event, reaching a 57.3% contribution.
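As an illustration of how such a ranking can be read, the sketch below converts raw importance scores into percentage contributions, assuming that a variable's contribution is its share of the total importance. The variable names and scores are invented and are not the values from the example above.

```python
def contribution_percentages(relative_importance):
    """Rank variables by importance and express each as a share of the total."""
    total = sum(relative_importance.values())
    ranked = sorted(
        relative_importance.items(), key=lambda item: item[1], reverse=True
    )
    return [(name, 100.0 * value / total) for name, value in ranked]


# Hypothetical raw importance scores.
scores = {"Tenure": 30.0, "Age": 120.0, "Plan": 50.0}
ranking = contribution_percentages(scores)
for name, pct in ranking:
    print(f"{name}: {pct:.1f}%")
```

Because the percentages are shares of a total, they always add up to 100%, and a high value for one variable necessarily lowers the values of the others.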
The Summary screen is the default view when opening the model result and provides the main information about the model chosen as the best.
The confusion matrix indicates the percentages of correct answers for each value of the categorical response variable (see image below).
The result also lists all the models that were created in the predetermined time, together with some model quality statistics.
In the confusion matrix, the model's hits, where the prediction coincided with what happened in the past, are circled in green. The red circles mark where the model made a mistake, differing from what happened in the past. In the example above, when the model predicts (first row) that the customer will not cancel, it is wrong 5 times and is therefore correct 99.2% of the time. When the model predicts that the customer will cancel, it is wrong 26 times, resulting in a 92.4% success rate. Overall, the accuracy (degree of success) is 97.3%.
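The arithmetic behind these percentages can be sketched as follows. The counts are hypothetical, chosen to keep the numbers easy to follow, and are not taken from the screenshot; the row and column layout is also an assumption.

```python
def row_hit_rates(matrix, labels):
    """Share of correct cases in each row (the diagonal cell over the row total)."""
    return {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }


def overall_accuracy(matrix):
    """All diagonal cells (hits) divided by the total number of cases."""
    hits = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return hits / total


labels = ["stays", "cancels"]
# Hypothetical counts: each row is one class, the diagonal holds the hits.
matrix = [
    [90, 10],   # "stays": 90 hits, 10 misses
    [10, 40],   # "cancels": 40 hits, 10 misses
]
rates = row_hit_rates(matrix, labels)
accuracy = overall_accuracy(matrix)
print(rates)
print(round(accuracy, 4))
```

The overall accuracy is not the simple average of the row rates: it weights each row by its number of cases, so the larger class has more influence on the final figure.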
In this run, 16 different models were generated, ordered from best to worst. The columns on the right present some model quality indicators, including the AUC (Area Under the Curve) and the RMSE (Root Mean Square Error).
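For readers unfamiliar with these two indicators, the following plain-Python sketch shows one common way to compute them. The labels and scores are invented, and the pairwise AUC formulation is used for clarity rather than speed; H2O's own implementation may differ in detail.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive case has the higher score,
    0.5 when the scores tie, and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = cancelled) and predicted probabilities.
labels = [1, 0, 1, 0]
scores = [0.9, 0.3, 0.4, 0.6]
print(auc(labels, scores))
print(round(rmse(labels, scores), 4))
```

A higher AUC is better (1.0 means every positive case is ranked above every negative one, 0.5 is no better than chance), while a lower RMSE is better.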