Using the Evaluation Metrics Tool
2023-06-28 05:16:08
Source: cnblogs (博客园)
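Evaluating a trained model requires evaluation metrics such as accuracy, precision, recall, and F1 score. Different task types call for different metrics, and HuggingFace provides a unified tool for loading and computing them.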
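To make the four metrics named above concrete, here is a minimal pure-Python sketch for the binary case. It is not HuggingFace's implementation. The function name `binary_classification_metrics` and the `positive_label` parameter are assumptions made for illustration only.

```python
def binary_classification_metrics(predictions, references, positive_label=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")

    pairs = list(zip(predictions, references))
    # Confusion counts with respect to the positive label.
    tp = sum(1 for p, r in pairs if p == positive_label and r == positive_label)
    fp = sum(1 for p, r in pairs if p == positive_label and r != positive_label)
    fn = sum(1 for p, r in pairs if p != positive_label and r == positive_label)
    correct = sum(1 for p, r in pairs if p == r)

    # Guard every division so empty or degenerate inputs yield 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = binary_classification_metrics(predictions=[1, 0, 1, 1], references=[1, 0, 0, 1])
print(scores)
```

In this example three of the four predictions are correct, and one negative sample is wrongly predicted as positive, so precision drops below recall.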
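1. Listing the available evaluation metrics
The list_metrics() function lists the available evaluation metrics. Note that later releases of the datasets library deprecated list_metrics() and load_metric() in favor of the separate evaluate library, so the code in this article assumes an older datasets version.
def list_metric_test():
    # Chapter 4 / List the available evaluation metrics
    from datasets import list_metrics

    metrics_list = list_metrics()
    # Print the total count and the first five metric names
    print(len(metrics_list), metrics_list[:5])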
The output is as follows:
157 ["accuracy", "bertscore", "bleu", "bleurt", "brier_score"]
At the time of writing there were 157 evaluation metrics; the first five are printed.
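2. Loading an evaluation metric
Use load_metric() to load an evaluation metric. Some metrics are designed to be used with a specific dataset; the example here uses the mrpc subset of the glue dataset:
def load_metric_test():
    # Chapter 4 / Load evaluation metrics
    from datasets import load_metric

    metric = load_metric(path="accuracy")  # load the accuracy metric
    print(metric)

    # Chapter 4 / Load an evaluation metric tied to a dataset
    metric = load_metric(path="glue", config_name="mrpc")  # GLUE metric for the mrpc subset
    print(metric)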
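3. Getting a metric's usage instructions
A metric's inputs_description attribute describes how to use it. The following code prints that description and then computes the metric on a small example:
def load_metric_description_test():
    # Chapter 4 / Load an evaluation metric
    from datasets import load_metric

    glue_metric = load_metric("glue", "mrpc")  # GLUE metric for the mrpc subset
    print(glue_metric.inputs_description)

    references = [0, 1]
    predictions = [0, 1]
    results = glue_metric.compute(predictions=predictions, references=references)
    print(results)  # {"accuracy": 1.0, "f1": 1.0}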
The output is as follows:
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:
    >>> glue_metric = datasets.load_metric("glue", "sst2")  # "sst2" or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"accuracy": 1.0}
    >>> glue_metric = datasets.load_metric("glue", "mrpc")  # "mrpc" or "qqp"
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"accuracy": 1.0, "f1": 1.0}
    >>> glue_metric = datasets.load_metric("glue", "stsb")
    >>> references = [0., 1., 2., 3., 4., 5.]
    >>> predictions = [0., 1., 2., 3., 4., 5.]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print({"pearson": round(results["pearson"], 2), "spearmanr": round(results["spearmanr"], 2)})
    {"pearson": 1.0, "spearmanr": 1.0}
    >>> glue_metric = datasets.load_metric("glue", "cola")
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {"matthews_correlation": 1.0}
{"accuracy": 1.0, "f1": 1.0}
The output first prints the metric's usage description, then the computed accuracy and f1 values for the mrpc example.
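The usage description shows that the keys in the result depend on which GLUE subset the metric was loaded for. The sketch below only restates that mapping in plain Python so a result can be sanity-checked. `EXPECTED_GLUE_KEYS`, `expected_keys`, and `has_expected_keys` are hypothetical names, not part of the datasets library, and the mapping covers only the subsets named in the description.

```python
# Hypothetical lookup table transcribed from the usage description above.
EXPECTED_GLUE_KEYS = {
    "sst2": ["accuracy"],
    "mnli": ["accuracy"],
    "mnli_mismatched": ["accuracy"],
    "mnli_matched": ["accuracy"],
    "qnli": ["accuracy"],
    "rte": ["accuracy"],
    "wnli": ["accuracy"],
    "hans": ["accuracy"],
    "mrpc": ["accuracy", "f1"],
    "qqp": ["accuracy", "f1"],
    "stsb": ["pearson", "spearmanr"],
    "cola": ["matthews_correlation"],
}


def expected_keys(subset):
    """Return the result keys the description lists for a GLUE subset."""
    if subset not in EXPECTED_GLUE_KEYS:
        raise ValueError(f"unknown GLUE subset: {subset!r}")
    return EXPECTED_GLUE_KEYS[subset]


def has_expected_keys(subset, results):
    """Check that a results dict has exactly the keys listed for the subset."""
    return set(results) == set(expected_keys(subset))


print(expected_keys("mrpc"))
print(has_expected_keys("mrpc", {"accuracy": 1.0, "f1": 1.0}))
```

A check like this can catch a metric that was loaded with the wrong subset name, since the returned keys would not match what the downstream code expects.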