Classifier

class mangrove_surface.wrapper.classifier.ClassifierWrapper

Classifier resource

A classifier provides

  • the list of relevant features (including level, weight and discretization attributes)
  • the assessments over each train/test schema
  • a method to export scores over a schema
  • a method to improve the classifier
add_schema(type_schm, schema, name=None)

Upload a new schema of datasets

Parameters:
  • type_schm – train, test or export
  • schema – a Python dictionary describing the datasets, like this:
{
    "tags": ["dataset", "tag"],
    "datasets": [
        {
            "name": "Dataset Name",
            "filepath": "/path/to/dataset.csv",
            "tags": ["optional", "tags"],
            "central": True,  # or False
            "keys": ["index"],  # optional if there is only
                                # one dataset
            "separator": ",", # could be `|`, `,`, `;` or ` `
        }, ...
    ]
}
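As a minimal sketch, a two-dataset schema following the layout above could be built and sanity-checked like this. The dataset names, file paths and key column are hypothetical, `check_schema` is an illustrative helper that is not part of the library, and treating `name` and `filepath` as required is an assumption. The actual upload is left commented out because it needs a live `classifier` object.

```python
# Hypothetical two-table schema: one central table plus one peripheral table.
schema = {
    "tags": ["demo"],
    "datasets": [
        {
            "name": "Customers",
            "filepath": "/path/to/customers.csv",  # placeholder path
            "central": True,
            "keys": ["customer_id"],  # assumed key column
            "separator": ",",
        },
        {
            "name": "Orders",
            "filepath": "/path/to/orders.csv",  # placeholder path
            "central": False,
            "keys": ["customer_id"],
            "separator": ";",
        },
    ],
}


def check_schema(schema):
    """Light sanity checks mirroring the documented layout (not library code)."""
    allowed_separators = {"|", ",", ";", " "}
    datasets = schema["datasets"]
    for dataset in datasets:
        # Assumption: every dataset needs at least a name and a file path.
        missing = {"name", "filepath"} - set(dataset)
        if missing:
            raise ValueError(f"dataset is missing {sorted(missing)}")
        # Assumption: a missing separator is treated as a comma here.
        if dataset.get("separator", ",") not in allowed_separators:
            raise ValueError(f"unsupported separator in {dataset['name']}")
        # "keys" is documented as optional only when there is a single dataset.
        if len(datasets) > 1 and not dataset.get("keys"):
            raise ValueError(f"{dataset['name']} needs a 'keys' entry")
    return True


check_schema(schema)
# classifier.add_schema("test", schema, name="demo_test")  # needs a real classifier
```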
add_schema_and_export(schema, name=None, modalities=[], bin_format='label', raw_variables=[], binned_variables=[], predicted_modality=False)

Upload a new schema and export it

Parameters:
  • schema – a Python dictionary of datasets (see add_schema())
  • name – (optional) the schema name
  • modalities – (optional) the modalities scored. If no modality is provided then scores are not provided (only variables)
  • raw_variables – the list of variables to export as raw value
  • binned_variables – the list of variables to export as binned value
  • bin_format – (default: label) how binned variables are expressed: label writes each value as its interval or group, id writes it as a concise identifier
  • predicted_modality – if True, add a column with the predicted value (default: False)
compatible_schemas(test=True, export=True)

List compatible schemas (with their type)

compute_assessments(schm_name, outcome_modality=None)

Compute assessment over schema named schm_name (focus on modality outcome_modality)

Parameters:
  • schm_name – name of the schema used to compute assessments
  • outcome_modality – the modality used to compute assessments (by default, assessments are computed over the main modality)
compute_export(schm_name, export_name=None, modalities=[], bin_format='label', raw_variables=[], binned_variables=[], predicted_modality=False)

Compute a new export

Parameters:
  • schm_name – the dataset schema which is exported
  • export_name – name of the export
  • modalities – (optional) the modalities scored. If no modality is provided then scores are not provided (only variables)
  • raw_variables – the list of variables to export as raw value
  • binned_variables – the list of variables to export as binned value
  • bin_format – (default: label) how binned variables are expressed: label writes each value as its interval or group, id writes it as a concise identifier
  • predicted_modality – if True, add a column with the predicted value (default: False)
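The export parameters above can be assembled ahead of the call. The sketch below is only an illustration: `export_options` is a hypothetical helper, the schema and export names are placeholders, and the modality `Y` and variable `Car_Type` are borrowed from examples elsewhere in this reference. The call itself is commented out because it needs a live `classifier` object.

```python
def export_options(modalities=(), raw_variables=(), binned_variables=(),
                   bin_format="label", predicted_modality=False):
    """Hypothetical helper: build keyword arguments for compute_export."""
    # Only the two documented formats are accepted.
    if bin_format not in ("label", "id"):
        raise ValueError("bin_format must be 'label' or 'id'")
    return {
        # Fresh lists are created on every call.
        "modalities": list(modalities),
        "raw_variables": list(raw_variables),
        "binned_variables": list(binned_variables),
        "bin_format": bin_format,
        "predicted_modality": bool(predicted_modality),
    }


# Score modality "Y", export one binned variable as identifiers,
# and ask for the predicted-value column.
options = export_options(
    modalities=["Y"],
    binned_variables=["Car_Type"],
    bin_format="id",
    predicted_modality=True,
)
# classifier.compute_export("my_schema", export_name="scores", **options)
```

Leaving `modalities` empty would, per the parameter description, produce an export with variables only and no scores.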
discretization_attribute(*args, **kwargs)

Return the discretization attributes of the contributing feature named name

Parameters: name – feature name
>>> classifier.discretization_attribute("Car_Type")
[
    {
        'coverage': 0.0248497,
        'frequency': 529,
        'target_distribution': {
            '0': 0.837429,
            '1': 0.162571
        },
        'value_list': ['Full-size luxury car']
    },
    ...
]
download(*args, **kwargs)

Download the classifier

Parameters: filepath – the file path where the classifier is stored
exports()

List all exports

feature(*args, **kwargs)

Information about feature name

It returns level, weight, discretization attributes.

Parameters: name – feature name
>>> classifier.feature('Car_Type')
{
    'level': 0.103459,
     'maximum_a_posteriori': True,
     'name': 'Car_Type',
     'nb_parts': 4,
     'parts': [
         {
            'coverage': 0.0248497,
            'frequency': 529,
            'target_distribution': {
                '0': 0.837429,
                '1': 0.162571
            },
            'value_list': ['Full-size luxury car']
        },
        ...
    ],
    'weight': 0.832425
}
feature_set(*args, **kwargs)

Return the underlying feature set

Note

This feature set can be used to change feature types or to mark some features as unused

features(*args, **kwargs)

List all the features used by the current classifier

>>> classifier.features()
[
    {
        'level': 0.103459,
         'maximum_a_posteriori': True,
         'name': 'Car_Type',
         'nb_parts': 4,
         'parts': [
             {
                'coverage': 0.0248497,
                'frequency': 529,
                'target_distribution': {
                    '0': 0.837429,
                    '1': 0.162571
                },
                'value_list': ['Full-size luxury car']
            },
            ...
        ],
        'weight': 0.832425
    },
    ...
]
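Because each entry carries both `level` and `weight`, the list can be ranked client-side. This is a minimal sketch that works on plain dictionaries shaped like the output above; `rank_by_weight` is a hypothetical helper, and the second feature is made up purely to give the sort something to do.

```python
def rank_by_weight(features, top=None):
    """Sort feature dictionaries by decreasing weight, then decreasing level."""
    ranked = sorted(features, key=lambda f: (f["weight"], f["level"]), reverse=True)
    return ranked if top is None else ranked[:top]


features = [
    # Values copied from the example output above (parts omitted).
    {"name": "Car_Type", "level": 0.103459, "weight": 0.832425, "nb_parts": 4},
    # Made-up entry, for illustration only.
    {"name": "Hypothetical_Feature", "level": 0.05, "weight": 0.9, "nb_parts": 2},
]

names = [f["name"] for f in rank_by_weight(features)]
best = rank_by_weight(features, top=1)[0]["name"]
```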
improve(name=None, tags=[], nb_aggregates=None, maximum_features=None)

Create a new classifier

Parameters:
  • name – (optional) classifier name
  • tags – (optional) list of project tags
  • nb_aggregates – (optional) number of aggregates generated for the new classifier
  • maximum_features – (optional) maximal number of features used by the new classifier
Raises:

MangroveError – if nb_aggregates is provided and is smaller than the value returned by nb_aggregates()

level(*args, **kwargs)

Return the level of the feature named name

Parameters: name – feature name

The level indicates the correlation between the feature and the outcome

nb_aggregates()

Return the number of aggregates

outcome()

Outcome field predicted by the current classifier

set_unused(*args, **kwargs)

Set feature name unused

Parameters: name – feature name
update_name(new_name)

Update the classifier name

Parameters: new_name – new classifier name
weight(*args, **kwargs)

Return the weight of the feature named name

Parameters: name – feature name

The weight indicates how much more discriminating the feature is than the other relevant features (those with level > 0)

Assessment

class mangrove_surface.wrapper.classifier_evaluation_report.ClassifierEvaluationReportWrapper

Classifier Evaluation Report resource
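Most of the rates listed below have standard textbook definitions in terms of the four confusion counts for one modality (true/false positives and negatives). The sketch below uses those textbook formulas; it is not taken from the library, so the exact conventions of the methods (for example how empty denominators are handled) may differ. The counts in the example are illustrative.

```python
def binary_rates(tp, fp, tn, fn):
    """Textbook one-vs-rest rates from the four confusion counts."""
    def ratio(numerator, denominator):
        # Assumption: an empty denominator yields NaN rather than an error.
        return numerator / denominator if denominator else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),   # recall, sensitivity
        "FNR": ratio(fn, tp + fn),   # miss rate
        "FPR": ratio(fp, fp + tn),   # fall-out
        "TNR": ratio(tn, fp + tn),   # specificity
        "PPV": ratio(tp, tp + fp),   # precision
        "FDR": ratio(fp, tp + fp),   # false discovery rate
        "NPV": ratio(tn, tn + fn),   # negative predictive value
        "FOR": ratio(fn, tn + fn),   # false omission rate
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Illustrative counts, not taken from any real report.
rates = binary_rates(tp=40, fp=10, tn=45, fn=5)
```

Each pair (TPR, FNR), (FPR, TNR), (PPV, FDR) and (NPV, FOR) sums to 1, which is a quick way to check a report.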

ACC()

Accuracy

Note

This method has the following aliases:
  • ACC
AUC(*args, **kwargs)

Area under curve

DOR()

Diagnostic odds ratio

Note

This method has the following aliases:
  • DOR
F1_score(outcome_modality=None)

F1 score

Parameters: outcome_modality – (optional) the modality
FDR(outcome_modality)

False discovery rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FDR
FNR(outcome_modality=None)

False negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FNR
  • miss_rate
FOR(outcome_modality)

False omission rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FOR
FPR(outcome_modality=None)

False positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FPR
  • fall_out
LRm()

Negative likelihood ratio

Note

This method has the following aliases:
  • LRm
LRp()

Positive likelihood ratio

Note

This method has the following aliases:
  • LRp
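The two likelihood ratios and the diagnostic odds ratio are usually defined from the true and false positive rates: LR+ = TPR / FPR, LR− = FNR / TNR, and DOR = LR+ / LR−. The sketch below applies these textbook definitions to illustrative rates; whether the library follows exactly these conventions is an assumption.

```python
def likelihood_ratios(tpr, fpr):
    """Return (LR+, LR-, DOR) from a true positive rate and a false positive rate."""
    fnr = 1.0 - tpr
    tnr = 1.0 - fpr
    # The ratios are undefined when a denominator vanishes.
    if fpr == 0 or tnr == 0 or fnr == 0:
        raise ValueError("ratios are undefined for these rates")
    lr_plus = tpr / fpr
    lr_minus = fnr / tnr
    return lr_plus, lr_minus, lr_plus / lr_minus


# Illustrative rates: 80% sensitivity, 10% fall-out.
lr_plus, lr_minus, dor = likelihood_ratios(tpr=0.8, fpr=0.1)
# lr_plus is about 8, lr_minus about 0.22, dor about 36
```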
NPV(outcome_modality)

Negative predictive value

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • NPV
PPV(outcome_modality=None)

Precision

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • positive_predictive_value
SPC(outcome_modality=None)

True negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • TNR
  • specificity
  • SPC
TNR(outcome_modality=None)

True negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • TNR
  • specificity
  • SPC
TPR(outcome_modality=None)

True positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • recall
  • TPR
  • sensitivity
  • probability_of_detection
accuracy()

Accuracy

Note

This method has the following aliases:
  • ACC
area_under_curve(*args, **kwargs)

Area under curve

auc(*args, **kwargs)

Area under curve

confusion_matrix(*args, **kwargs)

Confusion matrix

>>> ass.confusion_matrix()
{
    'matrix': [
        [13376, 1393],
        [  683, 4084]
    ],
    'modalities': ['N', 'Y']
}
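The returned dictionary is enough to recompute a few headline numbers by hand. The sketch below only uses quantities that do not depend on whether rows hold actual or predicted modalities, since that orientation is not stated here: the per-modality diagonal counts, the total of correct predictions, and the total of errors. `diagonal_counts` is a hypothetical helper.

```python
def diagonal_counts(report):
    """Return (correct per modality, total correct, total errors, total instances)."""
    matrix = report["matrix"]
    modalities = report["modalities"]
    # Diagonal cells are the correctly predicted instances of each modality.
    correct = {m: matrix[i][i] for i, m in enumerate(modalities)}
    total = sum(sum(row) for row in matrix)
    total_correct = sum(correct.values())
    return correct, total_correct, total - total_correct, total


# Values copied from the example output above.
report = {"matrix": [[13376, 1393], [683, 4084]], "modalities": ["N", "Y"]}
correct, n_correct, n_errors, n_total = diagonal_counts(report)
accuracy = n_correct / n_total
```

With this matrix the total of correct predictions is 17460 and the total of errors is 2076, which matches the values shown in the `true_positive()` and `false_positive()` examples of this reference.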
diagnostic_odds_ratio()

Diagnostic odds ratio

Note

This method has the following aliases:
  • DOR
fall_out(outcome_modality=None)

False positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FPR
  • fall_out
false_discovery_rate(outcome_modality)

False discovery rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FDR
false_negative(outcome_modality)

Number of false negative errors of the outcome_modality

False negative = incorrectly rejected

Parameters: outcome_modality – (optional) compute the number of incorrect rejections of this modality
false_negative_rate(outcome_modality=None)

False negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FNR
  • miss_rate
false_omission_rate(outcome_modality)

False omission rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FOR
false_positive(outcome_modality=None)

Number of incorrect predictions

False positive = incorrectly identified

Parameters: outcome_modality – (optional) compute the number of incorrect predictions associated with this modality
Raises: KeyError – if the outcome_modality does not exist
>>> ass.false_positive()
2076

>>> ass.false_positive('Y')
4084
false_positive_rate(outcome_modality=None)

False positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FPR
  • fall_out
gini()

Gini coefficient
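In classifier evaluation, the Gini coefficient is commonly derived from the area under the ROC curve as Gini = 2 × AUC − 1. The sketch below shows that conversion; whether `gini()` uses exactly this convention is an assumption, and the AUC value is illustrative.

```python
def gini_from_auc(auc):
    """Common convention: rescale AUC from [0.5, 1] to a Gini in [0, 1]."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("auc must lie in [0, 1]")
    return 2.0 * auc - 1.0


# A random classifier (AUC 0.5) maps to 0, a perfect one (AUC 1.0) maps to 1.
gini = gini_from_auc(0.75)
```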

instances(outcome_modality=None)

Number of instances evaluated

lift_curve(*args, **kwargs)

Lift curve over the schema

Parameters: using – either classifier or optimal; by default, the lift curve associated with the classifier is returned.
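To make the classifier/optimal distinction concrete, here is a sketch of a cumulative gains curve computed from scratch. It is not the library's implementation and the format returned by `lift_curve` is not described here, so the point layout is an assumption; the scores and labels are toy data.

```python
def cumulative_gains(scores, labels):
    """Return (population fraction, captured positives fraction) points.

    Instances are visited from the highest score to the lowest.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(labels)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        points.append((rank / len(scores), captured / total_positives))
    return points


# Toy data: five instances, two of them positive.
scores = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [1, 0, 0, 1, 0]

# Curve obtained by ranking with the classifier scores.
classifier_curve = cumulative_gains(scores, labels)
# Best possible curve: rank with the true labels themselves.
optimal_curve = cumulative_gains(labels, labels)
```

The optimal curve reaches all positives after visiting 40% of the population, while the score-based ranking needs 60%; the gap between the two curves is what a lift chart makes visible.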
miss_rate(outcome_modality=None)

False negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • FNR
  • miss_rate
negative_likehood_ratio()

Negative likelihood ratio

Note

This method has the following aliases:
  • LRm
negative_predictive_value(outcome_modality)

Negative predictive value

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • NPV
positive_likehood_ratio()

Positive likelihood ratio

Note

This method has the following aliases:
  • LRp
positive_predictive_value(outcome_modality=None)

Precision

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • positive_predictive_value
precision(outcome_modality=None)

Precision

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • positive_predictive_value
prevalence()

Prevalence

probability_of_detection(outcome_modality=None)

True positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • recall
  • TPR
  • sensitivity
  • probability_of_detection
recall(outcome_modality=None)

True positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • recall
  • TPR
  • sensitivity
  • probability_of_detection
sensitivity(outcome_modality=None)

True positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • recall
  • TPR
  • sensitivity
  • probability_of_detection
specificity(outcome_modality=None)

True negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • TNR
  • specificity
  • SPC
target_rate(outcome_modality)

Target rate of the modality outcome_modality

Parameters: outcome_modality – a modality
true_negative(outcome_modality)

Number of true negatives for the outcome_modality

True negative = correctly rejected

Parameters: outcome_modality – (optional) compute the number of correct rejections of this modality
true_negative_rate(outcome_modality=None)

True negative rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • TNR
  • specificity
  • SPC
true_positive(outcome_modality=None)

Number of correct predictions

True positive = correctly identified

Parameters: outcome_modality – (optional) compute the number of correct predictions associated with this modality
Raises: KeyError – if the outcome_modality does not exist
>>> ass.true_positive()
17460

>>> ass.true_positive('Y')
4084
true_positive_rate(outcome_modality=None)

True positive rate

Parameters: outcome_modality – (optional) the modality

Note

This method has the following aliases:
  • recall
  • TPR
  • sensitivity
  • probability_of_detection