Classifier¶

class mangrove_surface.wrapper.classifier.ClassifierWrapper¶

Classifier resource

A classifier provides:

the list of relevant features (including level, weight and discretization attributes)
the assessments over each train/test schema
a method to export scores over a schema
a method to improve the classifier

add_schema(type_schm, schema, name=None)¶

Upload a new schema of datasets

Parameters:

type_schm – the schema type: `train`, `test` or `export`
schema – a Python dictionary describing the datasets, structured as follows

{
    "tags": ["dataset", "tag"],
    "datasets": [
        {
            "name": "Dataset Name",
            "filepath": "/path/to/dataset.csv",
            "tags": ["optional", "tags"],
            "central": True | False,
            "keys": ["index"],  # optional if there is only
                                # one dataset
            "separator": ",", # could be `|`, `,`, `;` or ` `
        }, ...
    ]
}
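A minimal sketch of how such a schema dictionary might be assembled in Python. The helpers `make_dataset` and `make_schema` are hypothetical (they are not part of `mangrove_surface`), and the dataset names and file paths are placeholders; the sketch only builds the layout described above.

```python
def make_dataset(name, filepath, central=False, keys=None, separator=",", tags=None):
    """Build one dataset entry (hypothetical helper, not part of the library)."""
    entry = {
        "name": name,
        "filepath": filepath,
        "central": central,
        "separator": separator,
    }
    # "keys" and "tags" are only added when given, since both are optional.
    if keys is not None:
        entry["keys"] = list(keys)
    if tags:
        entry["tags"] = list(tags)
    return entry


def make_schema(datasets, tags=None):
    """Wrap dataset entries in the schema layout shown above."""
    return {"tags": list(tags or []), "datasets": list(datasets)}


# Placeholder datasets: one central table and one secondary table.
schema = make_schema(
    [
        make_dataset("Customers", "/path/to/customers.csv", central=True, keys=["index"]),
        make_dataset("Orders", "/path/to/orders.csv", keys=["index"], separator=";"),
    ],
    tags=["example"],
)
```

The resulting dictionary could then be passed as the `schema` argument, for example `classifier.add_schema("test", schema, name="my_test")`.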

add_schema_and_export(schema, name=None, modalities=[], bin_format='label', raw_variables=[], binned_variables=[], predicted_modality=False)¶

Upload a new schema and export it

Parameters:

schema – a Python dictionary of datasets (see add_schema())
name – (optional) the schema name
modalities – (optional) the modalities scored. If no modality is provided then scores are not provided (only variables)
raw_variables – the list of variables to export as raw value
binned_variables – the list of variables to export as binned value
bin_format – (default: `label`) how binned variables are expressed: `label` gives each value as its interval or group, `id` gives each value as a concise value
predicted_modality – if True, add a column with the predicted value (default: False)

compatible_schemas(test=True, export=True)¶: List compatible schemas (with their type)

compute_assessments(schm_name, outcome_modality=None)¶

Compute assessment over schema named schm_name (focus on modality outcome_modality)

Parameters:

schm_name – name of the schema used to compute assessments
outcome_modality – the modality used to compute assessments (by default, assessments are computed over the main modality)

compute_export(schm_name, export_name=None, modalities=[], bin_format='label', raw_variables=[], binned_variables=[], predicted_modality=False)¶

Compute a new export

Parameters:

schm_name – the dataset schema which is exported
export_name – name of the export
modalities – (optional) the modalities scored. If no modality is provided then scores are not provided (only variables)
raw_variables – the list of variables to export as raw value
binned_variables – the list of variables to export as binned value
bin_format – (default: `label`) how binned variables are expressed: `label` gives each value as its interval or group, `id` gives each value as a concise value
predicted_modality – if True, add a column with the predicted value (default: False)

discretization_attribute(*args, **kwargs)¶

Return the discretization attribute of the contributive feature name

Parameters:	name – feature name

>>> classifier.discretization_attribute("Car_Type")
[
    {
        'coverage': 0.0248497,
        'frequency': 529,
        'target_distribution': {
            '0': 0.837429,
            '1': 0.162571
        },
        'value_list': ['Full-size luxury car']
    },
    ...
]

download(*args, **kwargs)¶

Download the classifier

Parameters:	filepath – the file path where the classifier will be stored

exports()¶: List all exports

feature(*args, **kwargs)¶

Information about the feature named name

It returns level, weight, discretization attributes.

Parameters:	name – feature name

>>> classifier.feature('Car_Type')
{
    'level': 0.103459,
     'maximum_a_posteriori': True,
     'name': 'Car_Type',
     'nb_parts': 4,
     'parts': [
         {
            'coverage': 0.0248497,
            'frequency': 529,
            'target_distribution': {
                '0': 0.837429,
                '1': 0.162571
            },
            'value_list': ['Full-size luxury car']
        },
        ...
    ],
    'weight': 0.832425
}

feature_set(*args, **kwargs)¶: Return the underlying feature set

Note

This feature set can be used to change feature types or to mark some features as unused

features(*args, **kwargs)¶

List all the features used by the current classifier

>>> classifier.features()
[
    {
        'level': 0.103459,
         'maximum_a_posteriori': True,
         'name': 'Car_Type',
         'nb_parts': 4,
         'parts': [
             {
                'coverage': 0.0248497,
                'frequency': 529,
                'target_distribution': {
                    '0': 0.837429,
                    '1': 0.162571
                },
                'value_list': ['Full-size luxury car']
            },
            ...
        ],
        'weight': 0.832425
    },
    ...
]
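As an illustration of working with this return value, the sketch below keeps the features with a positive level and orders them by weight. The list is a hand-written stand-in for the result of `classifier.features()`: only the `Car_Type` values come from the example above, the other entries are made up, and `rank_features` is a hypothetical helper.

```python
# Stand-in for the list returned by classifier.features().
features = [
    {"name": "Feature_C", "level": 0.051, "weight": 0.41, "nb_parts": 3},       # made-up entry
    {"name": "Car_Type", "level": 0.103459, "weight": 0.832425, "nb_parts": 4},
    {"name": "Feature_B", "level": 0.0, "weight": 0.0, "nb_parts": 1},          # made-up entry
]


def rank_features(features, min_level=0.0):
    """Keep features whose level exceeds min_level, highest weight first."""
    relevant = [f for f in features if f["level"] > min_level]
    return sorted(relevant, key=lambda f: f["weight"], reverse=True)


ranked = rank_features(features)
for f in ranked:
    print(f"{f['name']}: level={f['level']:.3f} weight={f['weight']:.3f}")
```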

improve(name=None, tags=[], nb_aggregates=None, maximum_features=None)¶

Create a new classifier

Parameters:

name – (optional) classifier name
tags – (optional) list of project tags
nb_aggregates – (optional) number of aggregates generated for the new classifier
maximum_features – (optional) maximal number of features used by the new classifier
Raises:	MangroveError – if the number of requested aggregates is provided and it is smaller than `.nb_aggregates()`
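The constraint on `nb_aggregates` can be checked before calling `improve`. The sketch below is only an illustration: `FakeClassifier` is a stand-in written for this example (it raises `ValueError` rather than the library's `MangroveError`, and assumes `improve` returns the new classifier), and `safe_improve` is a hypothetical helper that raises the request to the current count when it is too small.

```python
class FakeClassifier:
    """Stand-in used only to make this sketch runnable; not the real wrapper."""

    def __init__(self, nb_aggregates):
        self._nb = nb_aggregates

    def nb_aggregates(self):
        return self._nb

    def improve(self, name=None, tags=None, nb_aggregates=None, maximum_features=None):
        # Mimics the documented rule: a request below the current count fails.
        if nb_aggregates is not None and nb_aggregates < self._nb:
            raise ValueError("requested aggregates below current count")
        return FakeClassifier(self._nb if nb_aggregates is None else nb_aggregates)


def safe_improve(classifier, requested_aggregates, **kwargs):
    """Never request fewer aggregates than the classifier already has."""
    current = classifier.nb_aggregates()
    return classifier.improve(nb_aggregates=max(requested_aggregates, current), **kwargs)


improved = safe_improve(FakeClassifier(100), 50, name="improved")
print(improved.nb_aggregates())
```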

level(*args, **kwargs)¶

Return the level of the feature named name

Parameters:	name – feature name

The level indicates the correlation between the feature and the outcome

nb_aggregates()¶: Return the number of aggregates

outcome()¶: Outcome field predicted by the current classifier

set_unused(*args, **kwargs)¶

Mark the feature named name as unused

Parameters:	name – feature name

update_name(new_name)¶

Update the classifier name

Parameters:	new_name – new classifier name

weight(*args, **kwargs)¶

Return the weight of the feature named name

Parameters:	name – feature name

The weight indicates how much more the feature discriminates than the other relevant features (those with level > 0)

Assessment¶

class mangrove_surface.wrapper.classifier_evaluation_report.ClassifierEvaluationReportWrapper¶

Classifier Evaluation Report resource

ACC()¶

Accuracy

Note

This method has the following aliases:

ACC

AUC(*args, **kwargs)¶: Area under curve

DOR()¶

Diagnostic odds ratio

Note

This method has the following aliases:

DOR
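For reference, the diagnostic odds ratio is conventionally defined as (TP × TN) / (FP × FN). The sketch below applies that textbook definition to raw counts; `diagnostic_odds_ratio` is a hypothetical helper, and whether the library handles zero counts the same way is an assumption. The counts are taken from the `confusion_matrix()` example in this reference; because the formula is symmetric in FP and FN, the result does not depend on the matrix orientation.

```python
import math


def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Textbook DOR = (TP * TN) / (FP * FN)."""
    numerator = tp * tn
    denominator = fp * fn
    if denominator == 0:
        # No errors of one kind: infinite odds ratio, or undefined if 0/0.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


dor = diagnostic_odds_ratio(tp=4084, fp=1393, fn=683, tn=13376)
print(round(dor, 2))
```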

F1_score(outcome_modality=None)¶

F1 score

Parameters:	outcome_modality – (optional) the modality
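The F1 score is the harmonic mean of precision and recall. A minimal sketch of that standard definition on raw counts follows; `f1_from_counts` is a hypothetical helper, and returning 0.0 when there are no positives at all is a convention chosen for this example.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


f1 = f1_from_counts(tp=4084, fp=1393, fn=683)
print(round(f1, 4))
```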

FDR(outcome_modality)¶

False discovery rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FDR

FNR(outcome_modality=None)¶

False negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FNR
miss_rate

FOR(outcome_modality)¶

False omission rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FOR

FPR(outcome_modality=None)¶

False positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FPR
fall_out

LRm()¶

Negative likelihood ratio

Note

This method has the following aliases:

LRm

LRp()¶

Positive likelihood ratio

Note

This method has the following aliases:

LRp
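The two likelihood ratios are usually defined as LR+ = TPR / FPR and LR− = FNR / TNR, and their quotient equals the diagnostic odds ratio. The sketch below uses those textbook definitions with made-up counts; `likelihood_ratios` is a hypothetical helper, not the library's implementation.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR+, LR-) from raw counts using the textbook definitions."""
    tpr = tp / (tp + fn)   # sensitivity
    fpr = fp / (fp + tn)   # fall-out
    fnr = fn / (tp + fn)   # miss rate
    tnr = tn / (fp + tn)   # specificity
    lr_plus = tpr / fpr if fpr > 0 else math.inf
    lr_minus = fnr / tnr if tnr > 0 else math.inf
    return lr_plus, lr_minus


# Made-up counts for illustration.
lr_plus, lr_minus = likelihood_ratios(tp=30, fp=10, fn=20, tn=40)
print(lr_plus, lr_minus)
```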

NPV(outcome_modality)¶

Negative predictive value

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

NPV

PPV(outcome_modality=None)¶

Precision

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

positive_predictive_value
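The four predictive-value metrics (PPV, FDR, NPV, FOR) are related in pairs: FDR = 1 − PPV and FOR = 1 − NPV. A minimal sketch of the standard definitions on made-up counts follows; `predictive_values` is a hypothetical helper and makes no claim about how the library computes these per modality.

```python
def predictive_values(tp, fp, fn, tn):
    """Standard definitions over predicted-positive and predicted-negative counts."""
    predicted_positive = tp + fp
    predicted_negative = tn + fn
    return {
        "PPV": tp / predicted_positive,   # precision
        "FDR": fp / predicted_positive,   # 1 - PPV
        "NPV": tn / predicted_negative,
        "FOR": fn / predicted_negative,   # 1 - NPV
    }


# Made-up counts for illustration.
pv = predictive_values(tp=30, fp=10, fn=20, tn=40)
print(pv)
```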

SPC(outcome_modality=None)¶

True negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

TNR
specificity
SPC

TNR(outcome_modality=None)¶

True negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

TNR
specificity
SPC

TPR(outcome_modality=None)¶

True positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

recall
TPR
sensitivity
probability_of_detection
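The four rate metrics (TPR, FNR, FPR, TNR) are computed over actual positives and actual negatives, so TPR + FNR = 1 and FPR + TNR = 1. The sketch below shows the standard definitions on made-up counts; `rates_from_counts` is a hypothetical helper.

```python
def rates_from_counts(tp, fp, fn, tn):
    """Standard rate definitions over actual-positive and actual-negative counts."""
    actual_positive = tp + fn
    actual_negative = fp + tn
    return {
        "TPR": tp / actual_positive,   # recall, sensitivity
        "FNR": fn / actual_positive,   # miss rate
        "FPR": fp / actual_negative,   # fall-out
        "TNR": tn / actual_negative,   # specificity
    }


# Made-up counts for illustration.
rates = rates_from_counts(tp=30, fp=10, fn=20, tn=40)
print(rates)
```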

accuracy()¶

Accuracy

Note

This method has the following aliases:

ACC

area_under_curve(*args, **kwargs)¶: Area under curve

auc(*args, **kwargs)¶: Area under curve

confusion_matrix(*args, **kwargs)¶

Confusion matrix

>>> ass.confusion_matrix()
{
    'matrix': [
        [13376, 1393],
        [  683, 4084]
    ],
    'modalities': ['N', 'Y']
}
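The returned dictionary can be summarised with plain Python. The sketch below uses only the diagonal, which holds the agreeing predictions whichever axis represents the actual modality (the orientation is not stated in this reference). `diagonal_summary` is a hypothetical helper; its totals agree with the `true_positive()` and `false_positive()` examples in this reference (17460 and 2076).

```python
# Literal copy of the example returned by ass.confusion_matrix().
cm = {
    "matrix": [
        [13376, 1393],
        [683, 4084],
    ],
    "modalities": ["N", "Y"],
}


def diagonal_summary(cm):
    """Totals that do not depend on the row/column orientation of the matrix."""
    matrix = cm["matrix"]
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return {
        "instances": total,
        "correct": correct,
        "incorrect": total - correct,
        "accuracy": correct / total,
    }


summary = diagonal_summary(cm)
# Correct predictions per modality, read off the diagonal.
correct_by_modality = {
    modality: cm["matrix"][i][i] for i, modality in enumerate(cm["modalities"])
}
print(summary, correct_by_modality)
```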

diagnostic_odds_ratio()¶

Diagnostic odds ratio

Note

This method has the following aliases:

DOR

fall_out(outcome_modality=None)¶

False positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FPR
fall_out

false_discovery_rate(outcome_modality)¶

False discovery rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FDR

false_negative(outcome_modality)¶

Number of false negative errors of the outcome_modality

False negative = incorrectly rejected

Parameters:	outcome_modality – (optional) compute the number of incorrect rejections of this modality

false_negative_rate(outcome_modality=None)¶

False negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FNR
miss_rate

false_omission_rate(outcome_modality)¶

False omission rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FOR

false_positive(outcome_modality=None)¶

Number of incorrect predictions

False positive = incorrectly identified

Parameters:	outcome_modality – (optional) compute the number of incorrect predictions associated with this modality
Raises:	KeyError – if the `outcome_modality` does not exist

>>> ass.false_positive()
2076

>>> ass.false_positive('Y')
4084

false_positive_rate(outcome_modality=None)¶

False positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FPR
fall_out

gini()¶: Gini coefficient
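In classifier evaluation the Gini coefficient is commonly derived from the area under the ROC curve as Gini = 2 × AUC − 1. Whether `gini()` uses exactly this definition is an assumption; the sketch below only illustrates the common relation, and `gini_from_auc` is a hypothetical helper.

```python
def gini_from_auc(auc):
    """Common relation between AUC and the Gini coefficient: 2 * AUC - 1."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("auc must lie in [0, 1]")
    return 2.0 * auc - 1.0


# A random classifier (AUC 0.5) maps to 0, a perfect one (AUC 1.0) to 1.
print(gini_from_auc(0.5), gini_from_auc(1.0))
```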

instances(outcome_modality=None)¶: Number of instances evaluated

lift_curve(*args, **kwargs)¶

Lift curve over the schema

Parameters:	using – `classifier` or `optimal`; by default, the lift curve associated with the classifier is returned
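To illustrate the difference between a classifier curve and an optimal curve, the sketch below builds a cumulative gain curve (fraction of positives captured against fraction of the population contacted) from made-up scores and labels. This is a generic construction, not the library's implementation, and the format returned by `lift_curve` is not assumed; `cumulative_gain_curve` is a hypothetical helper.

```python
def cumulative_gain_curve(scores, labels):
    """Return (population_fraction, captured_positive_fraction) points."""
    total_positive = sum(labels)
    if total_positive == 0:
        raise ValueError("at least one positive instance is required")
    # Rank instances from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += labels[i]
        points.append((rank / len(order), captured / total_positive))
    return points


scores = [0.9, 0.8, 0.7, 0.4, 0.2]   # made-up classifier scores
labels = [1, 0, 1, 0, 0]             # made-up outcomes (1 = target modality)

classifier_curve = cumulative_gain_curve(scores, labels)
# Ranking by the true labels puts every positive first: the best possible curve.
optimal_curve = cumulative_gain_curve(labels, labels)
print(classifier_curve)
print(optimal_curve)
```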

miss_rate(outcome_modality=None)¶

False negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

FNR
miss_rate

negative_likehood_ratio()¶

Negative likelihood ratio

Note

This method has the following aliases:

LRm

negative_predictive_value(outcome_modality)¶

Negative predictive value

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

NPV

positive_likehood_ratio()¶

Positive likelihood ratio

Note

This method has the following aliases:

LRp

positive_predictive_value(outcome_modality=None)¶

Precision

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

positive_predictive_value

precision(outcome_modality=None)¶

Precision

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

positive_predictive_value

prevalence()¶: Prevalence
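Prevalence is conventionally the share of instances that actually belong to the positive modality, (TP + FN) / total. A minimal sketch of that definition on made-up counts follows; `prevalence_from_counts` is a hypothetical helper, and which modality the library treats as positive is not assumed here.

```python
def prevalence_from_counts(tp, fp, fn, tn):
    """Share of actual positives among all evaluated instances."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no instances")
    return (tp + fn) / total


# Made-up counts: 50 actual positives out of 100 instances.
print(prevalence_from_counts(tp=30, fp=10, fn=20, tn=40))
```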

probability_of_detection(outcome_modality=None)¶

True positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

recall
TPR
sensitivity
probability_of_detection

recall(outcome_modality=None)¶

True positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

recall
TPR
sensitivity
probability_of_detection

sensitivity(outcome_modality=None)¶

True positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

recall
TPR
sensitivity
probability_of_detection

specificity(outcome_modality=None)¶

True negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

TNR
specificity
SPC

target_rate(outcome_modality)¶

Target rate of the modality outcome_modality

Parameters:	outcome_modality – a modality

true_negative(outcome_modality)¶

Number of true negatives for the outcome_modality

True negative = correctly rejected

Parameters:	outcome_modality – (optional) compute the number of correct rejections of this modality

true_negative_rate(outcome_modality=None)¶

True negative rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

TNR
specificity
SPC

true_positive(outcome_modality=None)¶

Number of correct predictions

True positive = correctly identified

Parameters:	outcome_modality – (optional) compute the number of correct predictions associated with this modality
Raises:	KeyError – if the `outcome_modality` does not exist

>>> ass.true_positive()
17460

>>> ass.true_positive('Y')
4084

true_positive_rate(outcome_modality=None)¶

True positive rate

Parameters:	outcome_modality – (optional) the modality

Note

This method has the following aliases:

recall
TPR
sensitivity
probability_of_detection