Optional RASON Data Science Sections
RASON DM also supports several optional sections that can be used to further refine the RASON model: data, fittedModel, preProcessor and weakLearner.
"data"
Data arrays may be defined and calculated in this optional section, to be used later in a data science method. Scalars, arrays or tables containing scalars may be defined in the data section. If pulling data from an external source, this section may be used to "bind" the data to an array or table.
In the example code below, data from the qty column of the parts_data data source is assigned to the parts table. Note: A table is created here, rather than an array, because the valueCol property is used.
"data": {
  "parts": {
    "binding": "parts_data", "valueCol": "qty"
  }
},
The properties available for "data" are:
"fittedModel"
Used (only) when scoring a model. This section is similar to "datasets" but rather than refining imported data,
this section defines a model that you can bind to when performing an "action" such as "forecast", "predict", "fit"
or "transform".
In the example below, a linear regression model that was previously fit and POSTed to the RASON server is used to
score the hald-small-score.txt dataset (which was imported into RASON as "dataSrc" and then bound to "myData").
{
  "modelName": "PMMLRegressor",
  "modelType": "datamining",
  "modelDescription": "regression: linear model scoring from pmml",
  "datasources": {
    "dataSrc": {
      "type": "csv",
      "connection": "hald-small-score.txt",
      "direction": "import"
    }
  },
  "datasets": {
    "myData": {
      "binding": "dataSrc"
    }
  },
  "fittedModel": {
    "mlrModel": {
      "modelName": "LinearRegression"
    }
  },
  "actions": {
    "myDataPrediction": {
      "data": "myData",
      "fittedModel": "mlrModel",
      "action": "predict",
      "evaluations": [
        "prediction"
      ]
    }
  }
}
This section includes two properties: "modelName" and "binding".
- Use "modelName" when the fitted model is residing on the RASON Server.
- Use "binding" when importing a file containing the fitted model.
See the section above for an example of each.
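The sections of the scoring model above are linked by name: each dataset binds to a data source, and each action names a dataset and a fitted model. As a minimal sketch of how those references line up, the Python snippet below checks them on a dictionary that mirrors the example. The helper check_references is hypothetical and is not part of RASON; it only illustrates the naming relationships.

```python
# Hypothetical helper, not part of RASON: checks that the names used in
# "datasets" and "actions" refer to entries defined elsewhere in the model.
model = {
    "datasources": {
        "dataSrc": {
            "type": "csv",
            "connection": "hald-small-score.txt",
            "direction": "import",
        }
    },
    "datasets": {"myData": {"binding": "dataSrc"}},
    "fittedModel": {"mlrModel": {"modelName": "LinearRegression"}},
    "actions": {
        "myDataPrediction": {
            "data": "myData",
            "fittedModel": "mlrModel",
            "action": "predict",
            "evaluations": ["prediction"],
        }
    },
}


def check_references(model: dict) -> list:
    """Return descriptions of unresolved name references (empty if none)."""
    problems = []
    sources = model.get("datasources", {})
    datasets = model.get("datasets", {})
    fitted = model.get("fittedModel", {})

    # Each dataset's "binding" should name a defined data source.
    for name, spec in datasets.items():
        if spec.get("binding") not in sources:
            problems.append(f"dataset {name!r}: unknown data source {spec.get('binding')!r}")

    # Each action's "data" and "fittedModel" should name defined entries.
    for name, spec in model.get("actions", {}).items():
        if spec.get("data") not in datasets:
            problems.append(f"action {name!r}: unknown dataset {spec.get('data')!r}")
        if spec.get("fittedModel") not in fitted:
            problems.append(f"action {name!r}: unknown fitted model {spec.get('fittedModel')!r}")
    return problems


print(check_references(model))  # prints []
```

An empty list means every name used in "datasets" and "actions" resolves to a definition in the same model.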
"preProcessor"
This optional section may be used for preliminary data preparation or to compute the values of some properties, which are passed later, at parse time, to the RASON DM engine. This section is parsed once, before the model is parsed.
In the example below, "numLeafRecords" is defined within the "preProcessor" section and is then referenced within the estimator "treeEstimator" to set the parameter "minNumRecordsInLeaves".
"preProcessor": {
  "numLeafRecords": {
    "formula": "INT(MAX(1, ROWS(myTrainData) / 10))"
  }
},
"estimator": {
  "treeEstimator": {
    "type": "classification",
    "algorithm": "decisionTree",
    "parameters": {
      "priorProbMethod": "EMPIRICAL",
      "minNumRecordsInLeaves": "numLeafRecords",
      "maxNumNodes": 5,
      "maxNumLevels": 3,
      "maxNumSplits": 10,
      "categoricalFeaturesNames": [ "X1" ],
      "prunedTreeType": "MIN_ERROR"
    }
  }
},
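To make the formula concrete, the Python sketch below reproduces the arithmetic of INT(MAX(1, ROWS(myTrainData) / 10)) for a few training set sizes. It assumes that INT rounds down, MAX returns the larger argument and ROWS returns the number of records, as their Excel counterparts do; the function name num_leaf_records is illustrative only and is not part of RASON.

```python
import math


def num_leaf_records(n_rows: int) -> int:
    # Mirrors INT(MAX(1, ROWS(myTrainData) / 10)), assuming INT rounds down:
    # one tenth of the training rows, but never less than 1.
    return math.floor(max(1, n_rows / 10))


for n_rows in (5, 57, 1000):
    print(n_rows, num_leaf_records(n_rows))
# prints:
# 5 1
# 57 5
# 1000 100
```

Under these assumptions, the minimum number of records per leaf scales with the size of the training set but never drops below 1.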
The properties available for this section include:
"weakLearner"
This section is only required when a bagging or boosting estimator is specified in "estimator", and is used to define the weak learner used in these algorithms.
The following example defines the treeWeakLearner weak learner.
"weakLearner": {
  "treeWeakLearner": {
    "type": "classification",
    "algorithm": "decisionTree",
    "parameters": {
      "minNumRecordsInLeaves": 2
    }
  }
},
In this example, the weak learner "treeWeakLearner" is set up to perform classification ("type": "classification") using the decision tree algorithm, for use by a bagging or boosting algorithm defined within "estimator".
The following parameters are available for this section: