Essential RASON Model Sections
As mentioned in Defining a Data Science Model, there are four essential sections that must exist in a single
RASON DM model. This Help topic describes each section, apart from modelName and modelType, in more detail.
"datasources"
This section specifies how the data will be acquired. Typically, the data is contained in an external
data source such as a delimited file, an Excel workbook, or a database.
The "datasources" section is an object with user-defined attributes, where each attribute
defines an object with "type", "connection" and "direction" properties. The following example defines three data
sources: myTrainingData, myValidationData and myTestData.
"datasources": {
"myTrainingData":{
"type":"csv",
"connection":"PathToDataFilesOrTrainingData.txt",
"direction": "import"
},
"myValidationData":{
"type":"csv",
"connection":"PathToDataFilesOrValidationData.txt",
"direction": "import"
} ,
"myTestData":{
"type":"csv",
"connection":"PathToDataFilesOrTestData.txt",
"direction": "import"
}
}
In this example, the "type" property describes the file type of the data being imported into
the RASON model; here, the data for all three data sources is contained in CSV files.
The "connection" property gives the location of each data file, and the "direction" property
specifies whether the file is imported or exported. The default for "direction" is "import".
Aside from "type" and "connection", additional properties exist for specific types of
data sources, such as "headerExists" for delimited files or "selection" for SQL database selection.
For the full list of properties for the "datasources" section, see the RASON Reference Guide. For
examples of how to import from various data sources, see the RASON Reference Guide or the
Editor page on RASON.com.
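As a rough illustration, the sketch below assembles the same "datasources" section as a Python dictionary and serializes it to JSON. The helper `make_datasource` is hypothetical (it is not part of RASON); it only fills in "direction" with the documented default of "import" when the caller omits it.

```python
import json

def make_datasource(type_, connection, direction="import"):
    """Hypothetical helper: build one "datasources" entry.

    "direction" defaults to "import", mirroring RASON's documented default.
    """
    if direction not in ("import", "export"):
        raise ValueError(f"direction must be 'import' or 'export', got {direction!r}")
    return {"type": type_, "connection": connection, "direction": direction}

# The three CSV sources from the example above.
datasources = {
    name: make_datasource("csv", path)
    for name, path in [
        ("myTrainingData", "PathToDataFilesOrTrainingData.txt"),
        ("myValidationData", "PathToDataFilesOrValidationData.txt"),
        ("myTestData", "PathToDataFilesOrTestData.txt"),
    ]
}

print(json.dumps({"datasources": datasources}, indent=2))
```

The printed JSON matches the hand-written snippet above, with "direction" made explicit for every source.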
Note: RASON V2020 makes it easy to work with data sources in the Microsoft ecosystem: users can create
a Data Connection on their My Account page on www.RASON.com. The RASON service supports the following data
connections:
- OneDrive and OneDrive for Business
- Common Data Service for Dynamics 365, Power Apps and Power Automate
- OData and CDS support for Power BI
- CData Cloud Hub support for access to 100+ enterprise data sources.
For more information on how to create and maintain Data Connections, see the previous Data Connections
topic within the RASON Subscriptions topic.
"datasets"
The "datasets" section is an object with user-defined attributes, where each attribute
defines an object with a "binding" property. The following example defines two datasets:
myTrainData and myValidData.
"datasets": {
"myTrainData": {
"binding": "myTrainSrc",
"targetCol": "Y"
},
"myValidData": {
"binding": "myValidSrc",
"targetCol": "Y"
}
},
In this example, two datasets are defined: "myTrainData" and "myValidData". The data source
"myTrainSrc" is bound to the "myTrainData" dataset and, likewise, the data source "myValidSrc" is
bound to the "myValidData" dataset. Both data sources must be defined in the model's "datasources"
section. In both datasets, the "targetCol" property names the target column, "Y".
The "binding" property specifies the data source to be bound. It can also be bound to the output
of, or data sources in, other stages. "binding" is not applicable if the user provides the data
inline, i.e., enters the data manually into the RASON model. For a list of all properties that may
appear in a dataset definition, see the RASON Reference Guide.
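To make the binding rule concrete, here is a minimal sketch, assuming the model has already been loaded into a Python dict, that reports datasets whose "binding" does not name a defined data source. The function `unresolved_bindings` and the connection file names are hypothetical placeholders; as a simplification, bindings to the outputs of other stages are not considered.

```python
def unresolved_bindings(model):
    """Return names of datasets whose "binding" matches no datasource.

    Datasets without a "binding" (data supplied inline) are skipped.
    Bindings to outputs of other stages are not modeled here.
    """
    sources = model.get("datasources", {})
    return [
        name
        for name, ds in model.get("datasets", {}).items()
        if "binding" in ds and ds["binding"] not in sources
    ]

model = {
    "datasources": {
        # Placeholder connections; any valid path would do.
        "myTrainSrc": {"type": "csv", "connection": "train.txt", "direction": "import"},
        "myValidSrc": {"type": "csv", "connection": "valid.txt", "direction": "import"},
    },
    "datasets": {
        "myTrainData": {"binding": "myTrainSrc", "targetCol": "Y"},
        "myValidData": {"binding": "myValidSrc", "targetCol": "Y"},
        "myTestData": {"binding": "myTestSrc"},  # deliberately undefined source
    },
}

print(unresolved_bindings(model))  # ['myTestData']
```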
"estimator"/"transformer"
The "estimator" object estimates a model from the training data and stores the fitted model,
which may be used later; it implements the "fit" interface. The "transformer" object is used for
algorithms that do not produce a model, i.e., algorithms that do not implement the "fit" interface.
These algorithms implement only the "transform" interface.
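The fit/transform distinction can be pictured with a plain-Python analogy. This is a conceptual sketch only, not RASON's internals: the class names `Estimator`, `Transformer`, `MeanEstimator` and `Sampler` are hypothetical. The estimator learns and keeps state (its "fitted model") in `fit`, while the transformer keeps no learned state and only maps data in to data out.

```python
import random
from abc import ABC, abstractmethod

class Estimator(ABC):
    """Analogy for an "estimator": fit() learns a model and stores it."""
    @abstractmethod
    def fit(self, rows): ...

class Transformer(ABC):
    """Analogy for a "transformer": transform() only; nothing is learned."""
    @abstractmethod
    def transform(self, rows): ...

class MeanEstimator(Estimator):
    def fit(self, rows):
        self.mean_ = sum(rows) / len(rows)  # the stored "fitted model"
        return self

    def predict(self, rows):
        # Apply the fitted model to new data.
        return [self.mean_ for _ in rows]

class Sampler(Transformer):
    def __init__(self, sample_size, seed):
        self.sample_size = sample_size
        self.seed = seed

    def transform(self, rows):
        # Sampling without replacement; no state is derived from the data.
        return random.Random(self.seed).sample(rows, self.sample_size)

model = MeanEstimator().fit([1.0, 2.0, 3.0])
print(model.predict([10, 20]))  # [2.0, 2.0]
print(len(Sampler(sample_size=2, seed=123).transform([5, 6, 7, 8])))  # 2
```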
"estimator"
The "estimator" section defines the estimator used to fit the model. Estimators extract a model
from the input data, and other stages can use this model through a dataset binding to the output.
This element is mutually exclusive with the "transformer" element: the two may not both appear in
the same stage definition. An example of the estimator "baggingEstimator" is shown below.
"estimator": {
"baggingEstimator": {
"type": "classification",
"algorithm": "bagging",
"parameters": {
"numWeakLearners": 2,
"bootstrapSeed": 10
}
}
},
In this example, a new estimator, baggingEstimator, is initialized. This estimator will
perform a classification using the bagging algorithm. Two options, numWeakLearners and
bootstrapSeed, are specified.
Properties for "estimator" are:
- "type" – Must be one of the following: "classification", "regression", "clustering",
"textMining", "transformation", "timeSeries".
- "algorithm" – The selection for this property varies with the selected "type". See the
chart below to see which algorithms correspond to the selected "type".
Options for Type and Algorithm Properties
Type | Algorithm Choice
"classification" | "boosting", "bagging", "neuralNetwork", "decisionTree", "randomTrees", "nearestNeighbors", "naiveBayes", "discriminantAnalysis" or "logisticRegression"
"regression" | "boosting", "bagging", "neuralNetwork", "decisionTree", "randomTrees", "nearestNeighbors", "linearRegression"
"clustering" | "kMeans" or "hierarchical"
"textMining" | "tfIdf" or "latentSemanticAnalysis"
"transformation" | "oneHotEncoding", "imputation", "rescaling", "principalComponentAnalysis", "binning", "factorization", "canonicalVariateAnalysis", "syntheticDataGenerator", "summarization"
"featureSelection" | "univariate", "linearWrapping" or "logisticWrapping"
"timeSeries" | "addHoltWinters", "mulHoltWinters", "noTrendHoltWinters", "doubleExponential", "exponential", "movingAverage", "arima" or "lagAnalysis"
- "parameters" – The property options for "parameters" will vary depending on the algorithm
selected. For a complete list of properties for each algorithm, see the RASON Reference Guide.
- "simulation" – To run the synthetic data generator, described later in this chapter, include a
"simulation": {} object within the estimator. All parameters that apply to the synthetic data
generator are passed within "simulation". For example:
"estimator": {
"mlrEstimator": {
"type": "regression",
"algorithm": "linearRegression",
"parameters": {
"fitIntercept": true
},
"simulation": {
"metalogAuto": true,
"numMetalogTerms": [
["CRIM", 5],
["ZN", 5],
["INDUS", 5],
…
]
}
}
}
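A quick way to catch a mismatched "type"/"algorithm" pair before submitting a model is to check it against the estimator table above. The sketch below assumes the "estimator" section has been parsed into a Python dict; `ESTIMATOR_ALGORITHMS` is simply a transcription of that table (including its "featureSelection" row), `estimator_errors` is a hypothetical helper, and `badEstimator` is an invented entry used to show a failure. The RASON Reference Guide remains the authoritative list.

```python
# Estimator "type" -> allowed "algorithm" values, transcribed from the table above.
ESTIMATOR_ALGORITHMS = {
    "classification": {"boosting", "bagging", "neuralNetwork", "decisionTree", "randomTrees",
                       "nearestNeighbors", "naiveBayes", "discriminantAnalysis",
                       "logisticRegression"},
    "regression": {"boosting", "bagging", "neuralNetwork", "decisionTree", "randomTrees",
                   "nearestNeighbors", "linearRegression"},
    "clustering": {"kMeans", "hierarchical"},
    "textMining": {"tfIdf", "latentSemanticAnalysis"},
    "transformation": {"oneHotEncoding", "imputation", "rescaling", "principalComponentAnalysis",
                       "binning", "factorization", "canonicalVariateAnalysis",
                       "syntheticDataGenerator", "summarization"},
    "featureSelection": {"univariate", "linearWrapping", "logisticWrapping"},
    "timeSeries": {"addHoltWinters", "mulHoltWinters", "noTrendHoltWinters", "doubleExponential",
                   "exponential", "movingAverage", "arima", "lagAnalysis"},
}

def estimator_errors(estimators):
    """Hypothetical checker: return a list of problems (empty if none)."""
    errors = []
    for name, spec in estimators.items():
        etype, algo = spec.get("type"), spec.get("algorithm")
        if etype not in ESTIMATOR_ALGORITHMS:
            errors.append(f"{name}: unknown type {etype!r}")
        elif algo not in ESTIMATOR_ALGORITHMS[etype]:
            errors.append(f"{name}: algorithm {algo!r} is not valid for type {etype!r}")
    return errors

estimator = {
    "baggingEstimator": {"type": "classification", "algorithm": "bagging",
                         "parameters": {"numWeakLearners": 2, "bootstrapSeed": 10}},
    "badEstimator": {"type": "clustering", "algorithm": "linearRegression"},  # invented
}
print(estimator_errors(estimator))
# ["badEstimator: algorithm 'linearRegression' is not valid for type 'clustering'"]
```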
"transformer"
A "transformer" applies to algorithms that do not fit a model but instead transform data, such as
Feature Selection or Sampling. Since nothing is stored (transformers take data in and return
data out), a transformation algorithm is represented by a single object. For example, when a
sampling algorithm is applied to a dataset, there is nothing to estimate from the training data,
so there is nothing to store in a model for future actions.
This element is mutually exclusive with the "estimator" element: the two may not both appear in the
same RASON model. An example of the transformer "mySampler" (from the Transformation -
Sampling.json RASON example on RASON.com) is shown below.
"transformer": {
"mySampler": {
"type": "transformation",
"algorithm": "sampling",
"parameters": {
"sampleSize": 4,
"replaceOption": "false",
"sortIndexes": "false",
"seed": 123
}
}
},
In the example code snippet above, the transformer "mySampler" is initialized. This transformer
will perform a "transformation" (type: transformation) using the sampling algorithm (algorithm: sampling).
Four options, sampleSize, replaceOption, sortIndexes and seed, are specified.
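The sketch below, assuming the model has been parsed into a Python dict, checks the two transformer rules stated in this topic: "estimator" and "transformer" may not both appear, and each transformer's "algorithm" must be allowed for its "type". `TRANSFORMER_ALGORITHMS` transcribes the transformer type/algorithm table in this topic; "featureSelection" is a listed transformer type with no row in that table, so it is reported rather than guessed. The function name is hypothetical.

```python
# Transformer "type" -> allowed "algorithm" values, transcribed from the
# transformer table in this topic ("featureSelection" has no row there).
TRANSFORMER_ALGORITHMS = {
    "affinityAnalysis": {"associationRules"},
    "bigData": {"sampling", "summarization"},
    "transformation": {"sampling", "stratifiedSampling", "partitioning",
                       "oversamplePartitioning", "categoryReduction",
                       "syntheticDataGenerator", "summarization"},
}

def transformer_model_errors(model):
    """Hypothetical checker: returns a list of problems (empty if none)."""
    errors = []
    # "estimator" and "transformer" are mutually exclusive.
    if "estimator" in model and "transformer" in model:
        errors.append('"estimator" and "transformer" may not both appear')
    for name, spec in model.get("transformer", {}).items():
        ttype, algo = spec.get("type"), spec.get("algorithm")
        allowed = TRANSFORMER_ALGORITHMS.get(ttype)
        if allowed is None:
            errors.append(f"{name}: type {ttype!r} has no entry in the table")
        elif algo not in allowed:
            errors.append(f"{name}: algorithm {algo!r} is not valid for type {ttype!r}")
    return errors

# The "mySampler" example, transcribed as-is.
model = {
    "transformer": {
        "mySampler": {
            "type": "transformation",
            "algorithm": "sampling",
            "parameters": {"sampleSize": 4, "replaceOption": "false",
                           "sortIndexes": "false", "seed": 123},
        }
    }
}
print(transformer_model_errors(model))  # []
print(transformer_model_errors({**model, "estimator": {}}))
# ['"estimator" and "transformer" may not both appear']
```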
Properties for "transformer" are:
- "type" – Must be one of the following: "affinityAnalysis", "bigData", "featureSelection" or
"transformation".
- "algorithm" – The selection for this property varies with the selected "type". See the chart
below to see which algorithms correspond to the selected "type".
Options for Type and Algorithm Properties
Type | Algorithm Choice
"affinityAnalysis" | "associationRules"
"bigData" | "sampling" or "summarization"
"transformation" | "sampling", "stratifiedSampling", "partitioning", "oversamplePartitioning", "categoryReduction", "syntheticDataGenerator" or "summarization"
"actions"
The estimator or transformer is applied to the data within the "actions" section. An example of the action
"nnpModel" (RASON Example Models – Data Science – Regression – Fitted Models POSTed to Server –
NeuralNetworkPostFM.json) is shown below.
"actions": {
"nnpModel": {
"trainData": "myTrainData",
"estimator": "nnpEstimator",
"export": "json",
"action": "fit",
"evaluations": [
"trainingLog",
"neuronWeights",
"numEpochsUsed",
"trainingTime",
"stoppingReason",
"partitionCausedStopping"
]
}
},
In the example above, the "nnpModel" action fits (action: fit) the model defined by "nnpEstimator"
(estimator: nnpEstimator) to the "myTrainData" dataset. Several results, or "evaluations", are requested
in the final results: the training log (trainingLog), the neuron weights (neuronWeights), the number of
epochs used (numEpochsUsed), the time spent training the model (trainingTime), the reason the algorithm
stopped (stoppingReason) and the partition whose performance caused training to stop
(partitionCausedStopping).
Note the "export" property. This property POSTs the fitted model, in either JSON or PMML format
("export": "json" or "export": "pmml"), to the RASON Server under the name given by the "modelName" property.
To export the fitted model to a JSON or XML file instead, replace the "export" property with a "binding"
property. If neither "export" nor "binding" is included within "actions", the fitted model is produced only
in memory. An in-memory fitted model may be used in a decision flow. For more information on POSTing a fitted
model to the RASON server or exporting a fitted model, see the section POSTing/Exporting Fitted Models, below.
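The rules for where a fitted model ends up can be summarized in a small decision function. This is a sketch under stated assumptions, not RASON behavior: `fitted_model_destination` is hypothetical; treating "export" and "binding" as mutually exclusive is an assumption drawn from the "replace" wording above; the format sets transcribe the Notes on fitted models in this topic using the algorithm names from the type/algorithm tables; and mapping the note's "affinityAnalysis" to the "associationRules" algorithm is also an assumption.

```python
# Fitted-model format restrictions, transcribed from the Notes in this topic.
NO_FITTED_MODEL = {"sampling", "partitioning", "univariate", "linearWrapping", "logisticWrapping"}
JSON_ONLY = {"categoryReduction", "factorization", "imputation", "principalComponentAnalysis",
             "hierarchical", "kMeans"}
PMML_ONLY = {"associationRules"}  # assumption: the note's "affinityAnalysis"

def fitted_model_destination(action, algorithm):
    """Hypothetical helper: describe where an action's fitted model goes."""
    if "export" in action and "binding" in action:
        # Assumption: the two properties are alternatives, not used together.
        raise ValueError('use either "export" or "binding", not both')
    if algorithm in NO_FITTED_MODEL:
        return "no fitted model is produced"
    if "export" in action:
        fmt = action["export"]
        if fmt not in ("json", "pmml"):
            raise ValueError(f"export must be 'json' or 'pmml', got {fmt!r}")
        if (fmt == "pmml" and algorithm in JSON_ONLY) or (fmt == "json" and algorithm in PMML_ONLY):
            raise ValueError(f"{algorithm} cannot produce a {fmt.upper()} fitted model")
        return f"POSTed to the RASON server as {fmt.upper()}"
    if "binding" in action:
        return f"exported through the data source {action['binding']!r}"
    return "kept in memory only"

nnp_model = {"trainData": "myTrainData", "estimator": "nnpEstimator",
             "export": "json", "action": "fit"}
print(fitted_model_destination(nnp_model, "neuralNetwork"))  # POSTed to the RASON server as JSON
```

In the usage line, "neuralNetwork" is the algorithm assumed for "nnpEstimator", whose definition is not shown in this topic.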
A second example, an action for a transformer, is shown below. Notice that, unlike the example above,
"actions" contains no "estimator" keyword. When using a transformer there is no estimator or model, so
the action can refer unambiguously to the transformer object only.
"transformer": {
"mySampler": {
"type": "transformation",
"algorithm": "sampling",
"parameters": {
"sampleSize": 4,
"replaceOption": false,
"sortIndexes": false,
"seed": 123
}
}
},
"actions": {
"sampleData": {
"data": "myData",
"action": "transform",
"export": "json",
"evaluations": [
"transformation"
]
}
}
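The sketch below, assuming the model is available as a Python dict, checks an "actions" entry against the rules stated in this topic: the "action" value must be one of "fit", "predict", "transform" or "forecast"; an action in a transformer model has no estimator to reference; and "fit" needs an "estimator" and a "trainData" dataset. The function `action_errors` is hypothetical, and the contents of the "myData" dataset are elided because the example does not show them.

```python
VALID_ACTIONS = {"fit", "predict", "transform", "forecast"}

def action_errors(name, action, model):
    """Hypothetical checker for one "actions" entry; returns a list of problems.

    Only rules stated in this topic are checked; the RASON Reference Guide is
    authoritative.
    """
    errors = []
    kind = action.get("action")
    if kind not in VALID_ACTIONS:
        errors.append(f"{name}: invalid action {kind!r}")
    # A transformer model has no estimator, so actions must not name one.
    if "transformer" in model and "estimator" in action:
        errors.append(f"{name}: a transformer model has no estimator to reference")
    # "fit" fits the model given "estimator" and "trainData".
    if kind == "fit":
        if action.get("estimator") not in model.get("estimator", {}):
            errors.append(f'{name}: "estimator" must name an entry in the "estimator" section')
        if action.get("trainData") not in model.get("datasets", {}):
            errors.append(f'{name}: "trainData" must name a dataset')
    return errors

# The transformer example above, as a Python dict.
sampling_model = {
    "datasets": {"myData": {}},  # dataset definition elided
    "transformer": {"mySampler": {"type": "transformation", "algorithm": "sampling",
                                  "parameters": {"sampleSize": 4, "replaceOption": False,
                                                 "sortIndexes": False, "seed": 123}}},
    "actions": {"sampleData": {"data": "myData", "action": "transform",
                               "export": "json", "evaluations": ["transformation"]}},
}
for action_name, act in sampling_model["actions"].items():
    print(action_name, action_errors(action_name, act, sampling_model))  # sampleData []
```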
Notes
- The following transformation methods do not generate a fitted model: sampling, partitioning and SQL
transformation.
- The following transformation methods produce a fitted model in JSON format only: categoryReduction,
factorization, imputation and principalComponentAnalysis.
- The following estimators do not generate a fitted model: Feature Selection (logisticWrapping,
linearWrapping, univariate).
- The following estimators produce a fitted model in JSON format only: hierarchical and kMeans
clustering.
- The following estimator produces a fitted model in PMML format only: affinityAnalysis.
The following properties are available for the "actions" section:
"fittedModel" – Used when scoring a model, this property references the model generated inside the
"model" object. For more information on scoring, see the example below.
"action" – Valid values for this property are "fit", "predict", "transform" or "forecast".
As the name suggests, "fit" fits the model given "estimator" and "trainData". The remaining options, "predict",
"transform" and "forecast", apply the fitted model to partitions or new data.
"parameters" – The options for this property depend on the "model" or "estimator" selected. Parameters that
apply directly to the prediction, transformation or forecast performed by this specific action (e.g.,
"successProbability" when classifying) may take different values for each dataset scored with the same model.
Similarly, with Principal Components Analysis you may request a different number of components
("numPrincipalComponents") when transforming each dataset with the same PCA model. For all valid parameters and
evaluations for each algorithm, see the RASON Reference Guide.
"evaluations" – This property specifies the results to be reported back to the user. Only the evaluations
listed in this property are computed or reported. Evaluation results may be either (1) part of the RASON
response or (2) bound to a writable data source. In the example below, "fittedModelJson" and
"regressionSummary" are part of the RASON response, while "influenceDiagnostics" is bound to the writable
data source "myExportSrc". To view this complete example, see LinearRegression.json on the Editor page on
RASON.com. Note: Some code has been removed from the example below for simplicity.
{
"modelName": "LinearRegression",
"modelDescription": "regression: linear model; scoring examples JSONLinearRegression.json and PMMLRegressor.json use exported fitted model, mlrModel, to score new data",
"modelType": "datamining",
"datasources": {
"myTrainSrc": {
"type": "csv",
"connection": "hald-small-train.txt",
"direction": "import"
},
…
"myExportSrc": {
"type": "csv",
"content": "export",
"connection": "influence-diagnostics.csv",
"direction": "export"
}
},
"datasets": {
"myTrainData": {
"binding": "myTrainSrc",
"targetCol": "Y"
},
…
},
"estimator": {
"mlrEstimator": {
"type": "regression",
"algorithm": "linearRegression",
"parameters": {
"fitIntercept": true
}
}
},
"actions": {
"mlrModel": {
"trainData": "myTrainData",
"estimator": "mlrEstimator",
"action": "fit",
"evaluations": [
"fittedModelJson",
{
"name": "influenceDiagnostics",
"binding": "myExportSrc"
},
"regressionSummary"
…
]
…
}
}
}
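The split between evaluations returned in the RASON response and evaluations bound to a data source can be sketched as follows, assuming the model has been parsed into Python dicts. String entries are returned in the response; object entries with "name" and "binding" are routed to the named data source. `split_evaluations` is hypothetical, and treating a "writable" data source as one whose "direction" is "export" is an assumption based on the "myExportSrc" definition above.

```python
def split_evaluations(action, datasources):
    """Hypothetical helper: separate response evaluations from bound ones."""
    in_response, bound = [], {}
    for ev in action.get("evaluations", []):
        if isinstance(ev, str):
            in_response.append(ev)  # returned as part of the RASON response
            continue
        src = datasources.get(ev.get("binding"))
        # Assumption: a writable data source is one with "direction": "export".
        if src is None or src.get("direction") != "export":
            raise ValueError(f"{ev.get('name')!r} must be bound to an export data source")
        bound[ev["name"]] = ev["binding"]
    return in_response, bound

datasources = {
    "myTrainSrc": {"type": "csv", "connection": "hald-small-train.txt", "direction": "import"},
    "myExportSrc": {"type": "csv", "content": "export",
                    "connection": "influence-diagnostics.csv", "direction": "export"},
}
mlr_model = {
    "trainData": "myTrainData", "estimator": "mlrEstimator", "action": "fit",
    "evaluations": ["fittedModelJson",
                    {"name": "influenceDiagnostics", "binding": "myExportSrc"},
                    "regressionSummary"],
}
print(split_evaluations(mlr_model, datasources))
# (['fittedModelJson', 'regressionSummary'], {'influenceDiagnostics': 'myExportSrc'})
```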
Notes on exporting to a writable data source
For more examples of exporting results to a writable data source, see the "datasources" topic in the
RASON Reference Guide.