Partitioning Example
Now let's take a look at a partitioning example.
{
  "modelName": "Partitioning",
  "modelDescription": "transformation: partitioning",
  "modelType": "datamining",
  "datasources": {
    "mySrc": {
      "type": "csv",
      "connection": "hald-small-binary.txt",
      "direction": "import"
    }
  },
  "datasets": {
    "myData": {
      "binding": "mySrc"
    }
  },
  "transformer": {
    "myPartitioner": {
      "type": "transformation",
      "algorithm": "partitioning",
      "parameters": {
        "partitionMethod": "RANDOM",
        "ratios": [
          [ "training", 0.5 ],
          [ "validation", 0.3 ],
          [ "test", 0.2 ]
        ],
        "seed": 123
      }
    }
  },
  "actions": {
    "partitions": {
      "data": "myData",
      "action": "transform",
      "evaluations": [ "transformation" ]
    }
  }
}
The "datasources" section in this example is identical to the one in the previous Sampling Example. Inside "datasets", the datasource "mySrc" is bound to the "myData" dataset. Since partitioning transforms the data, the "transformer" section is used, with type "transformation" and algorithm "partitioning". Within "parameters", "partitionMethod" is set to "RANDOM", which selects random partitioning.
In random partitioning, every observation in the main dataset has an equal probability of being selected for a given partition. For example, if you specify 60% for the training dataset, each observation has a 60% chance of being selected, so 60% of the total observations are randomly assigned to the training dataset. By default, the random number seed is initialized from the system clock. Alternatively, the seed can be set manually, as with "seed": 123 in this example, in which case the same observations are chosen for the training, validation and test sets each time the partition is created.
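The effect of a fixed seed can be illustrated with a minimal Python sketch. This is not RASON's generator: the function name `shuffled_records` is hypothetical, and Python's `random` module will not reproduce the record selection shown in the results of this example. It only demonstrates that the same seed yields the same ordering, and therefore the same partitions.

```python
import random

def shuffled_records(n_records, seed=None):
    """Return record numbers 1..n_records in random order.

    With seed=None, Python seeds the generator from OS entropy or the
    system time, so each run differs. A fixed seed makes the order repeatable.
    """
    rng = random.Random(seed)
    records = list(range(1, n_records + 1))
    rng.shuffle(records)
    return records

# Two runs with the same (illustrative) seed give the same ordering.
first = shuffled_records(13, seed=123)
second = shuffled_records(13, seed=123)
print(first == second)
```

Cutting the shuffled list into consecutive slices then gives every record the same chance of landing in any one partition.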
In this example, 50% of the records will be included in the training partition, 30% will be included in the validation partition and 20% will be included in the test partition.
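Because the dataset in this example holds 13 records, the ratios cannot be met exactly. The sketch below shows one way to turn ratios into whole-record counts. The rounding rule (round half up for every partition but the last, which receives the remainder) is an assumption chosen because it matches the 7, 4 and 2 records returned in this example; it is not documented RASON behavior, and `partition_sizes` is a hypothetical helper.

```python
import math

def partition_sizes(n_records, ratios):
    """Turn (name, ratio) pairs into whole-record counts.

    Assumption: every partition but the last is rounded half up, and the
    last partition receives whatever records remain.
    """
    sizes = {}
    remaining = n_records
    for name, ratio in ratios[:-1]:
        count = min(remaining, math.floor(n_records * ratio + 0.5))
        sizes[name] = count
        remaining -= count
    sizes[ratios[-1][0]] = remaining
    return sizes

ratios = [("training", 0.5), ("validation", 0.3), ("test", 0.2)]
sizes = partition_sizes(13, ratios)
print(sizes)
```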
Within "actions", the partitioning (or transformation) is performed on the "myData" dataset. The returned result contains the three partitions: training, validation and test.
Getting model results: GET https://rason.net/api/model/2590+Partitioning+2020-01-20-01-18-37-436902/result
{
  "status": {
    "id": "2590+Partitioning+2020-01-20-01-18-37-436902",
    "code": 0,
    "codeText": "Success"
  },
  "results": ["partitions.transformation"],
  "partitions": {
    "transformation": {
      "objectType": "dataFrameVector",
      "name": "myData - Partitioned",
      "data": {
        "training": {
          "objectType": "dataFrame",
          "name": "training",
          "order": "col",
          "rowNames": ["Record 1", "Record 12", "Record 6", "Record 13", "Record 9", "Record 4", "Record 2"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [0, 1, 1, 1, 0, 0, 0],
            [7, 11, 11, 10, 2, 11, 1],
            [26, 66, 55, 68, 54, 31, 29],
            [6, 9, 9, 8, 18, 8, 15],
            [60, 12, 22, 12, 22, 47, 52],
            [1, 1, 1, 1, 1, 2, 3]
          ]
        },
        "validation": {
          "objectType": "dataFrame",
          "name": "validation",
          "order": "col",
          "rowNames": ["Record 11", "Record 3", "Record 10", "Record 7"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [0, 1, 1, 1],
            [1, 11, 21, 3],
            [40, 56, 47, 71],
            [23, 8, 4, 17],
            [34, 20, 26, 6],
            [3, 2, 1, 1]
          ]
        },
        "testing": {
          "objectType": "dataFrame",
          "name": "testing",
          "rowNames": ["Record 5", "Record 8"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [1, 0],
            [7, 1],
            [52, 31],
            [6, 22],
            [33, 44],
            [1, 2]
          ]
        }
      }
    }
  }
}
From the results, we can see which records were allocated to each partition: seven to training, four to validation and two to test (returned under the name "testing").
Training Partition Results
| Index | Y | X1 | X2 | X3 | X4 | Weights |
|---|---|---|---|---|---|---|
| 1 | 0 | 7 | 26 | 6 | 60 | 1 |
| 12 | 1 | 11 | 66 | 9 | 12 | 1 |
| 6 | 1 | 11 | 55 | 9 | 22 | 1 |
| 13 | 1 | 10 | 68 | 8 | 12 | 1 |
| 9 | 0 | 2 | 54 | 18 | 22 | 1 |
| 4 | 0 | 11 | 31 | 8 | 47 | 2 |
| 2 | 0 | 1 | 29 | 15 | 52 | 3 |
Validation Partition Results
| Index | Y | X1 | X2 | X3 | X4 | Weights |
|---|---|---|---|---|---|---|
| 11 | 0 | 1 | 40 | 23 | 34 | 3 |
| 3 | 1 | 11 | 56 | 8 | 20 | 2 |
| 10 | 1 | 21 | 47 | 4 | 26 | 1 |
| 7 | 1 | 3 | 71 | 17 | 6 | 1 |
Test Partition Results
| Index | Y | X1 | X2 | X3 | X4 | Weights |
|---|---|---|---|---|---|---|
| 5 | 1 | 7 | 52 | 6 | 33 | 1 |
| 8 | 0 | 1 | 31 | 22 | 44 | 2 |
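The tables above list one record per row, while the response stores each partition column by column: every inner list of "data" holds the values of one entry in "colNames". A minimal Python sketch of that conversion follows. The helper name `frame_to_rows` is hypothetical, and the "testing" frame is assumed to be column-ordered like the other two, although its "order" field is absent from the response.

```python
def frame_to_rows(frame):
    """Convert a column-ordered dataFrame object into one dict per record."""
    columns = frame["data"]  # one inner list per entry in colNames
    rows = []
    for i, row_name in enumerate(frame["rowNames"]):
        row = {"Index": row_name}
        for col_name, column in zip(frame["colNames"], columns):
            row[col_name] = column[i]
        rows.append(row)
    return rows

# The "testing" partition, copied from the response shown earlier.
testing = {
    "objectType": "dataFrame",
    "name": "testing",
    "rowNames": ["Record 5", "Record 8"],
    "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
    "data": [[1, 0], [7, 1], [52, 31], [6, 22], [33, 44], [1, 2]],
}

rows = frame_to_rows(testing)
for row in rows:
    print(row)
```

Each printed dict corresponds to one row of the Test Partition Results table.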