Classifier
-
class mangrove_surface.wrapper.classifier.ClassifierWrapper
Classifier resource
A classifier provides:
- the list of relevant features (including level, weight and discretization attributes)
- the assessments over each train/test schema
- a method to export scores over a schema
- a method to improve the classifier
-
add_schema(type_schm, schema, name=None)
Upload a new schema of datasets
Parameters:
- type_schm – train, test or export
- schema – a Python dictionary recording datasets like this:
    {
        "tags": ["dataset", "tag"],
        "datasets": [
            {
                "name": "Dataset Name",
                "filepath": "/path/to/dataset.csv",
                "tags": ["optional", "tags"],
                "central": True | False,
                "keys": ["index"],  # optional if there is only
                                    # one dataset
                "separator": ",",   # could be `|`, `,`, `;` or ` `
            },
            ...
        ]
    }
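The schema layout above can be checked locally before it is uploaded. The following is a minimal sketch: check_schema is a hypothetical helper, not part of mangrove_surface, and the rules it applies (name and filepath required, a comma as the default separator, the blank separator read as a space) are assumptions drawn from the example rather than documented behaviour.

```python
# Hypothetical helper, not part of mangrove_surface.
# Separators listed in the schema example; the blank one is assumed to be a space.
ALLOWED_SEPARATORS = {"|", ",", ";", " "}


def check_schema(schema):
    """Return a list of problems found in a schema dictionary (empty if none)."""
    problems = []
    datasets = schema.get("datasets", [])
    if not datasets:
        problems.append("schema has no datasets")
    for ds in datasets:
        label = ds.get("name", "<unnamed>")
        # Assumption: every dataset needs a name and a file path.
        for field in ("name", "filepath"):
            if field not in ds:
                problems.append(f"{label}: missing '{field}'")
        # 'keys' is documented as optional only when there is a single dataset.
        if len(datasets) > 1 and not ds.get("keys"):
            problems.append(f"{label}: 'keys' is required with several datasets")
        # Assumption: a missing separator defaults to a comma.
        sep = ds.get("separator", ",")
        if sep not in ALLOWED_SEPARATORS:
            problems.append(f"{label}: unsupported separator {sep!r}")
    return problems


schema = {
    "tags": ["dataset", "tag"],
    "datasets": [
        {
            "name": "Dataset Name",
            "filepath": "/path/to/dataset.csv",
            "tags": ["optional", "tags"],
            "central": True,
            "keys": ["index"],
            "separator": ",",
        }
    ],
}

problems = check_schema(schema)
```

An empty list means the dictionary matches the documented layout; it says nothing about whether the files exist or the upload will succeed.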
-
add_schema_and_export(schema, name=None, modalities=[], bin_format='label', raw_variables=[], binned_variables=[], predicted_modality=False)
Upload a new schema and export it
Parameters:
- schema – a Python dictionary of datasets (see add_schema())
- name – (optional) the schema name
- modalities – (optional) the modalities to score. If no modality is provided, no scores are exported (only variables)
- raw_variables – the list of variables to export as raw values
- binned_variables – the list of variables to export as binned values
- bin_format – (default: label) selects how binned variables are expressed: label (default) expresses a value as its interval or group; id expresses it as a concise value
- predicted_modality – if predicted_modality==True, a column with the predicted value is provided (default: predicted_modality==False)
-
compatible_schemas(test=True, export=True)
List compatible schemas (with their types)
-
compute_assessments(schm_name, outcome_modality=None)
Compute assessments over the schema named schm_name (focusing on the modality outcome_modality)
Parameters:
- schm_name – name of the schema used to compute assessments
- outcome_modality – the modality used to compute assessments (by default, assessments are computed over the main modality)
-
compute_export(schm_name, export_name=None, modalities=[], bin_format='label', raw_variables=[], binned_variables=[], predicted_modality=False)
Compute a new export
Parameters:
- schm_name – the dataset schema to export
- export_name – name of the export
- modalities – (optional) the modalities to score. If no modality is provided, no scores are exported (only variables)
- raw_variables – the list of variables to export as raw values
- binned_variables – the list of variables to export as binned values
- bin_format – (default: label) selects how binned variables are expressed: label (default) expresses a value as its interval or group; id expresses it as a concise value
- predicted_modality – if predicted_modality==True, a column with the predicted value is provided (default: predicted_modality==False)
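Since the export options are shared by compute_export() and add_schema_and_export(), it can be convenient to assemble them once. The sketch below uses a hypothetical helper, build_export_options, which is not part of mangrove_surface; it only mirrors the documented parameter names and rejects a bin_format other than the two documented values.

```python
def build_export_options(modalities=(), bin_format="label",
                         raw_variables=(), binned_variables=(),
                         predicted_modality=False):
    """Hypothetical helper: collect the documented export keyword arguments."""
    # Only 'label' and 'id' are documented values for bin_format.
    if bin_format not in ("label", "id"):
        raise ValueError("bin_format must be 'label' or 'id'")
    return {
        "modalities": list(modalities),
        "bin_format": bin_format,
        "raw_variables": list(raw_variables),
        "binned_variables": list(binned_variables),
        "predicted_modality": bool(predicted_modality),
    }


# '1' and 'Car_Type' are borrowed from the examples in this section.
options = build_export_options(
    modalities=["1"],
    binned_variables=["Car_Type"],
    bin_format="id",
)
```

With a real classifier object, the dictionary could then be unpacked into a call such as classifier.compute_export(schema_name, **options), where schema_name stands for an existing schema.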
-
discretization_attribute(*args, **kwargs)
Return the discretization attribute of the contributive feature name
Parameters: name – feature name
>>> classifier.discretization_attribute("Car_Type")
[
    {
        'coverage': 0.0248497,
        'frequency': 529,
        'target_distribution': {
            '0': 0.837429,
            '1': 0.162571
        },
        'value_list': ['Full-size luxury car']
    },
    ...
]
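The returned list can be processed with plain Python. As a minimal sketch, the snippet below picks the part in which a given modality has the highest share of the target distribution. The first part is copied from the example output; the second one is invented for illustration, and best_part is a hypothetical helper.

```python
parts = [
    {   # copied from the example output above
        'coverage': 0.0248497,
        'frequency': 529,
        'target_distribution': {'0': 0.837429, '1': 0.162571},
        'value_list': ['Full-size luxury car'],
    },
    {   # hypothetical part, invented for illustration only
        'coverage': 0.05,
        'frequency': 1064,
        'target_distribution': {'0': 0.6, '1': 0.4},
        'value_list': ['Compact car'],
    },
]


def best_part(parts, modality):
    """Return the part where `modality` has the largest share of the target."""
    return max(parts, key=lambda part: part['target_distribution'].get(modality, 0.0))


top = best_part(parts, '1')
```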
-
download(*args, **kwargs)
Download the classifier
Parameters: filepath – the file path where the classifier will be stored
-
exports()
List all exports
-
feature(*args, **kwargs)
Information about the feature name
It returns the level, weight and discretization attributes.
Parameters: name – feature name
>>> classifier.feature('Car_Type')
{
    'level': 0.103459,
    'maximum_a_posteriori': True,
    'name': 'Car_Type',
    'nb_parts': 4,
    'parts': [
        {
            'coverage': 0.0248497,
            'frequency': 529,
            'target_distribution': {
                '0': 0.837429,
                '1': 0.162571
            },
            'value_list': ['Full-size luxury car']
        },
        ...
    ],
    'weight': 0.832425
}
-
feature_set(*args, **kwargs)
Return the underlying feature set
Note
This feature set can be used to change types or to mark some features as unused
-
features(*args, **kwargs)
List all the features used by the current classifier
>>> classifier.features()
[
    {
        'level': 0.103459,
        'maximum_a_posteriori': True,
        'name': 'Car_Type',
        'nb_parts': 4,
        'parts': [
            {
                'coverage': 0.0248497,
                'frequency': 529,
                'target_distribution': {
                    '0': 0.837429,
                    '1': 0.162571
                },
                'value_list': ['Full-size luxury car']
            },
            ...
        ],
        'weight': 0.832425
    },
    ...
]
-
improve(name=None, tags=[], nb_aggregates=None, maximum_features=None)
Create a new classifier
Parameters:
- name – (optional) classifier name
- tags – (optional) list of project tags
- nb_aggregates – (optional) number of aggregates generated for the new classifier
- maximum_features – (optional) maximum number of features used by the new classifier
Raises: MangroveError – if the number of requested aggregates is provided and is smaller than .nb_aggregates()
-
level(*args, **kwargs)
Return the level of the feature named name
Parameters: name – feature name
The level indicates the correlation between the feature and the outcome
-
nb_aggregates()
Return the number of aggregates
-
outcome()
Outcome field predicted by the current classifier
-
set_unused(*args, **kwargs)
Set the feature name as unused
Parameters: name – feature name
-
update_name(new_name)
Update the classifier name
Parameters: new_name – new classifier name
-
weight(*args, **kwargs)
Return the weight of the feature named name
Parameters: name – feature name
The weight indicates how much more the feature discriminates than the other relevant features (those with level > 0)
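The level and weight fields of the dictionaries returned by features() can be combined to rank features. A minimal sketch, assuming that a feature is relevant when its level is strictly positive (as the description of weight suggests): only the Car_Type entry comes from the examples in this section, the other two are invented, and rank_features is a hypothetical helper.

```python
features = [
    # from the features() example in this section (other keys omitted)
    {'name': 'Car_Type', 'level': 0.103459, 'weight': 0.832425},
    # hypothetical entries, invented for illustration only
    {'name': 'Hypothetical_A', 'level': 0.05, 'weight': 0.91},
    {'name': 'Hypothetical_B', 'level': 0.0, 'weight': 0.0},
]


def rank_features(features):
    """Keep features with level > 0 and sort them by decreasing weight."""
    relevant = [f for f in features if f['level'] > 0]
    return sorted(relevant, key=lambda f: f['weight'], reverse=True)


ranking = [f['name'] for f in rank_features(features)]
```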
Assessment
-
class mangrove_surface.wrapper.classifier_evaluation_report.ClassifierEvaluationReportWrapper
Classifier Evaluation Report resource
-
ACC()
Accuracy
Note
- This method has the following alias: ACC
-
AUC(*args, **kwargs)
Area under curve
-
DOR()
Diagnostic odds ratio
Note
- This method has the following alias: DOR
-
F1_score(outcome_modality=None)
F1 score
Parameters: outcome_modality – (optional) the modality
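For reference, the F1 score is the harmonic mean of precision and recall. The sketch below uses the counts from the confusion_matrix() example in this section with 'Y' as the positive modality, assuming that rows are actual modalities and columns predicted ones (the orientation is not stated in the documentation). It illustrates the standard formula, not the library's internal code.

```python
# Counts for modality 'Y', under the row = actual / column = predicted assumption.
tp, fp, fn = 4084, 1393, 683

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form written directly with counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)
```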
-
FDR(outcome_modality)
False discovery rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: FDR
-
FNR(outcome_modality=None)
False negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: FNR, miss_rate
-
FOR(outcome_modality)
False omission rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: FOR
-
FPR(outcome_modality=None)
False positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: FPR, fall_out
-
LRm()
Negative likelihood ratio
Note
- This method has the following alias: LRm
-
LRp()
Positive likelihood ratio
Note
- This method has the following alias: LRp
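The two likelihood ratios and the diagnostic odds ratio are linked by their standard definitions: LR+ = TPR / FPR, LR- = FNR / TNR and DOR = LR+ / LR-. The sketch below checks this on the counts from the confusion_matrix() example in this section, with 'Y' as the positive modality and assuming rows are actual modalities and columns predicted ones. How LRp(), LRm() and DOR() choose the positive modality is not documented, so this is only an illustration of the formulas.

```python
# Counts for modality 'Y', under the row = actual / column = predicted assumption.
tp, fp, fn, tn = 4084, 1393, 683, 13376

tpr = tp / (tp + fn)   # true positive rate
fnr = fn / (tp + fn)   # false negative rate
fpr = fp / (fp + tn)   # false positive rate
tnr = tn / (fp + tn)   # true negative rate

lr_plus = tpr / fpr    # positive likelihood ratio
lr_minus = fnr / tnr   # negative likelihood ratio
dor = lr_plus / lr_minus

# The same odds ratio written directly with counts.
dor_from_counts = (tp * tn) / (fp * fn)
```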
-
NPV(outcome_modality)
Negative predictive value
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: NPV
-
PPV(outcome_modality=None)
Precision
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: positive_predictive_value
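The predictive values and their complements follow from the same four counts: PPV = TP / (TP + FP), FDR = 1 - PPV, NPV = TN / (TN + FN) and FOR = 1 - NPV. A short sketch using the counts from the confusion_matrix() example in this section, with 'Y' as the positive modality and the assumption that rows are actual modalities and columns predicted ones:

```python
# Counts for modality 'Y', under the row = actual / column = predicted assumption.
tp, fp, fn, tn = 4084, 1393, 683, 13376

ppv = tp / (tp + fp)                   # positive predictive value (precision)
fdr = fp / (tp + fp)                   # false discovery rate
npv = tn / (tn + fn)                   # negative predictive value
false_omission_rate = fn / (tn + fn)   # 'for' is a Python keyword
```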
-
SPC(outcome_modality=None)
True negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: TNR, specificity, SPC
-
TNR(outcome_modality=None)
True negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: TNR, specificity, SPC
-
TPR(outcome_modality=None)
True positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: recall, TPR, sensitivity, probability_of_detection
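The four rates (TPR, FNR, TNR, FPR) can be derived from a confusion matrix once one modality is taken as positive. The sketch below does this for the dictionary shown in the confusion_matrix() example in this section. binary_counts is a hypothetical helper, and the orientation (rows = actual modalities, columns = predicted ones) is an assumption, since the documentation does not state it.

```python
cm = {'matrix': [[13376, 1393], [683, 4084]], 'modalities': ['N', 'Y']}


def binary_counts(cm, positive):
    """Return (tp, fp, fn, tn) for `positive`.

    Assumes rows are actual modalities and columns are predicted ones.
    """
    matrix = cm['matrix']
    i = cm['modalities'].index(positive)  # raises ValueError if unknown
    tp = matrix[i][i]
    fn = sum(matrix[i]) - tp                    # rest of the actual row
    fp = sum(row[i] for row in matrix) - tp     # rest of the predicted column
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, fp, fn, tn


tp, fp, fn, tn = binary_counts(cm, 'Y')

tpr = tp / (tp + fn)   # recall, sensitivity
fnr = fn / (tp + fn)   # miss rate
tnr = tn / (tn + fp)   # specificity
fpr = fp / (tn + fp)   # fall-out
```

By construction TPR + FNR = 1 and TNR + FPR = 1, which is a quick sanity check on any values read from a report.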
-
accuracy()
Accuracy
Note
- This method has the following alias: ACC
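Accuracy is the share of correct predictions, that is, the diagonal of the confusion matrix divided by the number of instances. The sketch below applies this to the dictionary shown in the confusion_matrix() example in this section; totals is a hypothetical helper. The diagonal does not depend on whether rows are actual or predicted modalities, so no orientation assumption is needed here.

```python
cm = {'matrix': [[13376, 1393], [683, 4084]], 'modalities': ['N', 'Y']}


def totals(cm):
    """Return (total, correct, incorrect) instance counts of a confusion matrix."""
    matrix = cm['matrix']
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal
    return total, correct, total - correct


total, correct, incorrect = totals(cm)
accuracy = correct / total
```

Here correct is 17460 and incorrect is 2076, which agree with the values shown in this section for true_positive() and false_positive() called without a modality.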
-
area_under_curve(*args, **kwargs)
Area under curve
-
auc(*args, **kwargs)
Area under curve
-
confusion_matrix(*args, **kwargs)
Confusion matrix
>>> ass.confusion_matrix()
{
    'matrix': [
        [13376, 1393],
        [  683, 4084]
    ],
    'modalities': ['N', 'Y']
}
-
diagnostic_odds_ratio()
Diagnostic odds ratio
Note
- This method has the following alias: DOR
-
fall_out(outcome_modality=None)
False positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: FPR, fall_out
-
false_discovery_rate(outcome_modality)
False discovery rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: FDR
-
false_negative(outcome_modality)
Number of false negative errors for the outcome_modality
False negative = incorrectly rejected
Parameters: outcome_modality – (optional) compute the number of incorrect rejections of the modality
-
false_negative_rate(outcome_modality=None)
False negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: FNR, miss_rate
-
false_omission_rate(outcome_modality)
False omission rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: FOR
-
false_positive(outcome_modality=None)
Number of incorrect predictions
False positive = incorrectly identified
Parameters: outcome_modality – (optional) compute the number of incorrect predictions associated with this modality
Raises: KeyError – if the outcome_modality does not exist
>>> ass.false_positive()
2076
>>> ass.false_positive('Y')
4084
-
false_positive_rate(outcome_modality=None)
False positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: FPR, fall_out
-
gini()
Gini coefficient
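The documentation does not say how the Gini coefficient is computed. A common convention for classifiers relates it to the area under the ROC curve by Gini = 2 * AUC - 1; the sketch below assumes that convention, which may differ from what gini() returns. Both function names are hypothetical.

```python
def auc_trapezoid(points):
    """Area under a curve given (x, y) points sorted by increasing x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between two points
    return area


def gini_from_auc(auc):
    """Assumed convention: Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


# Two reference ROC curves: a random classifier and a perfect one.
roc_random = [(0.0, 0.0), (1.0, 1.0)]
roc_perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

gini_random = gini_from_auc(auc_trapezoid(roc_random))
gini_perfect = gini_from_auc(auc_trapezoid(roc_perfect))
```

Under this convention a random classifier has a Gini of 0 and a perfect one a Gini of 1.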
-
instances(outcome_modality=None)
Number of instances evaluated
-
lift_curve(*args, **kwargs)
Lift curve over the schema
Parameters: using – either classifier or optimal; by default, the lift curve associated with the classifier
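To illustrate the difference between the classifier and optimal curves, the sketch below computes cumulative gains on a tiny invented dataset: instances are ordered by decreasing score for the classifier curve and by true label for the optimal one. The data, the function name and the cumulative-gains reading of "lift curve" are all assumptions; the format returned by lift_curve() is not documented here.

```python
def cumulative_gain(labels_in_order):
    """Fraction of all positives captured after each instance, in the given order."""
    total_positives = sum(labels_in_order)
    captured = 0
    curve = []
    for label in labels_in_order:
        captured += label
        curve.append(captured / total_positives)
    return curve


# Hypothetical scores and true labels (1 = target modality), for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]

# Classifier curve: instances ranked by decreasing score.
ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
classifier_curve = cumulative_gain([label for _, label in ranked])

# Optimal curve: all positives ranked first.
optimal_curve = cumulative_gain(sorted(labels, reverse=True))
```

The optimal curve is an upper bound: at every position it captures at least as many positives as the classifier curve.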
-
miss_rate(outcome_modality=None)
False negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: FNR, miss_rate
-
negative_likehood_ratio()
Negative likelihood ratio
Note
- This method has the following alias: LRm
-
negative_predictive_value(outcome_modality)
Negative predictive value
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: NPV
-
positive_likehood_ratio()
Positive likelihood ratio
Note
- This method has the following alias: LRp
-
positive_predictive_value(outcome_modality=None)
Precision
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: positive_predictive_value
-
precision(outcome_modality=None)
Precision
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following alias: positive_predictive_value
-
prevalence()
Prevalence
-
probability_of_detection(outcome_modality=None)
True positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: recall, TPR, sensitivity, probability_of_detection
-
recall(outcome_modality=None)
True positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: recall, TPR, sensitivity, probability_of_detection
-
sensitivity(outcome_modality=None)
True positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: recall, TPR, sensitivity, probability_of_detection
-
specificity(outcome_modality=None)
True negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: TNR, specificity, SPC
-
target_rate(outcome_modality)
Target rate of the modality outcome_modality
Parameters: outcome_modality – a modality
-
true_negative(outcome_modality)
Number of true negatives for the outcome_modality
True negative = correctly rejected
Parameters: outcome_modality – (optional) compute the number of correct rejections of the modality
-
true_negative_rate(outcome_modality=None)
True negative rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: TNR, specificity, SPC
-
true_positive(outcome_modality=None)
Number of correct predictions
True positive = correctly identified
Parameters: outcome_modality – (optional) compute the number of correct predictions associated with this modality
Raises: KeyError – if the outcome_modality does not exist
>>> ass.true_positive()
17460
>>> ass.true_positive('Y')
4084
-
true_positive_rate(outcome_modality=None)
True positive rate
Parameters: outcome_modality – (optional) the modality
Note
- This method has the following aliases: recall, TPR, sensitivity, probability_of_detection
-