
Using the Evaluation Metric Tools

2023-06-28 05:16:08 Source: 博客园 (cnblogs)


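Evaluating a trained model requires evaluation metrics such as accuracy, precision, recall, and F1 score. Different task types call for different metrics, and HuggingFace provides a unified tool for working with them.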
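To make the four metrics named above concrete, here is a minimal pure-Python sketch for a binary classification task. The function name binary_classification_metrics and the sample labels are hypothetical and chosen only for illustration; this is not how the HuggingFace tools compute the values internally.

```python
def binary_classification_metrics(predictions, references, positive_label=1):
    """Compute accuracy, precision, recall and F1 for a binary task.

    Illustrative helper only (hypothetical name), not part of any library.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")

    pairs = list(zip(predictions, references))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for p, r in pairs if p == positive_label and r == positive_label)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for p, r in pairs if p == positive_label and r != positive_label)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for p, r in pairs if p != positive_label and r == positive_label)
    correct = sum(1 for p, r in pairs if p == r)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, for illustration only.
references = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1]
scores = binary_classification_metrics(predictions, references)
print(scores)
```

With these sample labels there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 3/4 and accuracy is 4/6.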

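1. List the available metrics

Use the list_metrics() function to list the available metrics. Note that list_metrics() and load_metric() were deprecated in later releases of datasets in favor of the separate evaluate library, so the examples here assume an older datasets version: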

def list_metric_test():
    # Chapter 4 / List the available evaluation metrics
    from datasets import list_metrics

    metrics_list = list_metrics()
    print(len(metrics_list), metrics_list[:5])

The output is as follows:

157 ["accuracy", "bertscore", "bleu", "bleurt", "brier_score"]

At the time of writing there are 157 metrics available; the output shows the first five.

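2. Load a metric

Load a metric with load_metric(). Note that some metrics are designed to be used with a specific dataset; the example here uses the mrpc subset of the glue dataset: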

def load_metric_test():
    # Chapter 4 / Load an evaluation metric
    from datasets import load_metric

    metric = load_metric(path="accuracy")  # load the accuracy metric
    print(metric)

    # Chapter 4 / Load a metric that is paired with a dataset
    metric = load_metric(path="glue", config_name="mrpc")  # metric for the mrpc subset of glue
    print(metric)

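3. Get a metric's usage description

A metric's inputs_description attribute describes how to use it. The following code prints that description and then computes the metric on a small example: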

def load_metric_description_test():
    # Chapter 4 / Load an evaluation metric
    from datasets import load_metric

    glue_metric = load_metric("glue", "mrpc")  # metric for the mrpc subset of glue
    print(glue_metric.inputs_description)

    references = [0, 1]
    predictions = [0, 1]
    results = glue_metric.compute(predictions=predictions, references=references)
    print(results)  # {"accuracy": 1.0, "f1": 1.0}

The output is as follows:

Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:

    >>> glue_metric = datasets.load_metric("glue", "sst2")  # "sst2" or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"accuracy": 1.0}

    >>> glue_metric = datasets.load_metric("glue", "mrpc")  # "mrpc" or "qqp"
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"accuracy": 1.0, "f1": 1.0}

    >>> glue_metric = datasets.load_metric("glue", "stsb")
    >>> references = [0., 1., 2., 3., 4., 5.]
    >>> predictions = [0., 1., 2., 3., 4., 5.]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print({"pearson": round(results["pearson"], 2), "spearmanr": round(results["spearmanr"], 2)})
    {"pearson": 1.0, "spearmanr": 1.0}

    >>> glue_metric = datasets.load_metric("glue", "cola")
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"matthews_correlation": 1.0}

{"accuracy": 1.0, "f1": 1.0}

The output first shows the metric's usage description and then the computed accuracy and f1 values.
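The usage description also mentions "pearson" and "matthews_correlation" for the stsb and cola subsets. As a rough illustration of what those two keys measure, the following pure-Python sketch recomputes them on the same toy inputs shown in the description. The helper names pearson and matthews_corrcoef_binary are hypothetical, and this is a sketch of the textbook formulas rather than the library's own implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def matthews_corrcoef_binary(predictions, references):
    """Matthews correlation coefficient for labels in {0, 1}."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Same toy inputs as the stsb and cola examples in the description.
stsb_values = [0., 1., 2., 3., 4., 5.]
print(round(pearson(stsb_values, stsb_values), 2))
print(matthews_corrcoef_binary([0, 1], [0, 1]))
```

Because the predictions equal the references in both cases, both calls report a perfect correlation of 1.0, matching the values shown in the description.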
