An Example Data Science Model
The simple data science example below performs stratified random sampling on the hald-small-binary dataset. Beginning in datasources, the contents of hald-small-binary.txt are imported into the mySrc data source. Next, under datasets, the mySrc data source is bound to the myData dataset, and the Y variable is selected as the stratum variable. Moving on to transformer, the mySampler transformer is constructed using the stratified random sampling algorithm with a sample size of 10. Lastly, under actions, the sampleData action has the stratified random sampler, mySampler, "transform" (that is, sample from) the myData dataset.
{
    modelName: "ExampleDM",
    modelType: "datamining",
    datasources: {
        mySrc: {
            type: 'csv',
            connection: 'hald-small-binary.txt',
            direction: 'import'
        }
    },
    datasets: {
        myData: {
            binding: 'mySrc',
            strataCol: 'Y'
        }
    },
    transformer: {
        mySampler: {
            type: 'transformation',
            algorithm: 'stratifiedSampling',
            parameters: {
                sampleSize: 10
            }
        }
    },
    actions: {
        sampleData: {
            data: 'myData',
            action: 'transform',
            evaluations: [
                'transformation'
            ]
        }
    }
}
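To make the sampler's role concrete, the following is a minimal Python sketch of stratified random sampling, not the platform's implementation. The source does not state how the stratifiedSampling algorithm allocates the sample across strata, so proportional allocation with largest-remainder rounding is an assumption here. The function name stratified_sample, the seed parameter, and the toy records with their Y values are likewise hypothetical and are not the contents of hald-small-binary.txt.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, sample_size, seed=None):
    """Draw sample_size records without replacement, stratum by stratum.

    Assumption: draws are allocated to strata in proportion to stratum size,
    using largest-remainder rounding so the counts sum to sample_size.
    """
    if not 0 < sample_size <= len(records):
        raise ValueError("sample_size must be between 1 and len(records)")

    rng = random.Random(seed)

    # Group the records by the value of the stratum variable.
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)

    # Ideal (fractional) share of the sample for each stratum.
    total = len(records)
    quotas = {k: sample_size * len(v) / total for k, v in strata.items()}

    # Round down, then hand the leftover draws to the largest remainders.
    counts = {k: int(q) for k, q in quotas.items()}
    leftover = sample_size - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1

    # Simple random sampling without replacement inside each stratum.
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, counts[k]))
    return sample


# Hypothetical toy data: 13 records, 8 in stratum Y=0 and 5 in stratum Y=1.
records = [{"id": i, "Y": 0 if i <= 8 else 1} for i in range(1, 14)]

sample = stratified_sample(records, "Y", 10, seed=0)
print(sorted(r["id"] for r in sample))
```

With these toy records, a sample of 10 takes 6 records from the Y=0 stratum and 4 from the Y=1 stratum, which keeps the strata close to their 8:5 ratio in the full data.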
As in both the optimization and simulation examples above, the results are returned as JSON, which contains the ten sampled records: 1, 2, 3, 5, 6, 7, 8, 9, 11 and 13.
{
    "status": {
        "id": "2590+ExampleDM+2020-02-26-17-22-28-146373",
        "code": 0,
        "codeText": "Success"
    },
    "results": ["sampleData.transformation"],
    "sampleData": {
        "transformation": {
            "objectType": "dataFrame",
            "name": "Sample:myData",
            "order": "col",
            "rowNames": [
                "Record 11", "Record 8", "Record 2", "Record 9", "Record 1",
                "Record 7", "Record 13", "Record 6", "Record 3", "Record 5"
            ],
            "colNames": ["X1", "X2", "X3", "X4", "Weights"],
            "colTypes": ["double", "double", "double", "double", "double"],
            "indexCols": null,
            "data": [
                [1, 1, 1, 2, 7, 3, 10, 11, 11, 7],
                [40, 31, 29, 54, 26, 71, 68, 55, 56, 52],
                [23, 22, 15, 18, 6, 17, 8, 9, 8, 6],
                [34, 44, 52, 22, 60, 6, 12, 22, 20, 33],
                [3, 2, 3, 1, 1, 1, 1, 1, 2, 1]
            ]
        }
    }
}
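The result above can be reshaped into per-record rows with a short Python sketch. It assumes that "order": "col" means each inner list of data holds one column, which is consistent with five lists of ten values against five colNames and ten rowNames. It also assumes that each entry in results is a dot-separated path into the response. The helper names lookup and frame_to_rows are hypothetical, and the JSON is trimmed to the fields the sketch needs.

```python
import json

# Response from the example above, trimmed to the fields used here.
response_text = """
{
  "results": ["sampleData.transformation"],
  "sampleData": {
    "transformation": {
      "order": "col",
      "rowNames": [
        "Record 11", "Record 8", "Record 2", "Record 9", "Record 1",
        "Record 7", "Record 13", "Record 6", "Record 3", "Record 5"
      ],
      "colNames": ["X1", "X2", "X3", "X4", "Weights"],
      "data": [
        [1, 1, 1, 2, 7, 3, 10, 11, 11, 7],
        [40, 31, 29, 54, 26, 71, 68, 55, 56, 52],
        [23, 22, 15, 18, 6, 17, 8, 9, 8, 6],
        [34, 44, 52, 22, 60, 6, 12, 22, 20, 33],
        [3, 2, 3, 1, 1, 1, 1, 1, 2, 1]
      ]
    }
  }
}
"""


def lookup(response, path):
    """Follow a dot-separated path (assumed convention) into the response."""
    node = response
    for key in path.split("."):
        node = node[key]
    return node


def frame_to_rows(frame):
    """Turn a column-ordered dataFrame object into {row name: {column: value}}."""
    if frame["order"] != "col":
        raise ValueError("this sketch only handles column-ordered frames")
    columns = frame["data"]
    return {
        row_name: {col: columns[j][i] for j, col in enumerate(frame["colNames"])}
        for i, row_name in enumerate(frame["rowNames"])
    }


response = json.loads(response_text)
frame = lookup(response, response["results"][0])
rows = frame_to_rows(frame)

print(rows["Record 11"])
# {'X1': 1, 'X2': 40, 'X3': 23, 'X4': 34, 'Weights': 3}

# Record numbers present in the sample, in ascending order.
sampled = sorted(int(name.split()[1]) for name in rows)
print(sampled)
# [1, 2, 3, 5, 6, 7, 8, 9, 11, 13]
```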