HF Multimodal

BM-K/KoSimCSE-roberta-multitask

https://github.com/BM-K/Sentence-Embedding-is-all-you-need

Korean-Sentence-Embedding

Korean sentence embedding repository. You can download the pre-trained models and run inference right away; the repository also provides an environment for training your own models.

Quick tour

import torch
from transformers import AutoModel, AutoTokenizer

def cal_score(a, b):
    # Cosine similarity scaled to a 0-100 range; accepts single vectors or batches.
    if len(a.shape) == 1: a = a.unsqueeze(0)
    if len(b.shape) == 1: b = b.unsqueeze(0)
    a_norm = a / a.norm(dim=1)[:, None]
    b_norm = b / b.norm(dim=1)[:, None]
    return torch.mm(a_norm, b_norm.transpose(0, 1)) * 100

model = AutoModel.from_pretrained('BM-K/KoSimCSE-roberta-multitask')
tokenizer = AutoTokenizer.from_pretrained('BM-K/KoSimCSE-roberta-multitask')

sentences = ['치타가 들판을 가로 질러 먹이를 쫓는다.',
             '치타 한 마리가 먹이 뒤에서 달리고 있다.',
             '원숭이 한 마리가 드럼을 연주한다.']

inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
embeddings, _ = model(**inputs, return_dict=False)  # (last_hidden_state, pooler_output)

# Compare the first sentence's [CLS] embedding against the other two.
score01 = cal_score(embeddings[0][0], embeddings[1][0])
score02 = cal_score(embeddings[0][0], embeddings[2][0])
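Because cal_score normalizes its inputs and uses a matrix multiply, it also accepts whole batches, so all pairwise similarities can be computed in one call. A minimal sketch, assuming the embeddings and sentences variables from the quick tour above (the printing is purely illustrative):

# Pairwise similarities for the whole batch, using each sentence's [CLS]
# embedding (position 0 of the last hidden state).
cls_embeddings = embeddings[:, 0]            # shape: (num_sentences, hidden_size)
similarity_matrix = cal_score(cls_embeddings, cls_embeddings)

for sentence, row in zip(sentences, similarity_matrix.tolist()):
    print(sentence, ['%.2f' % s for s in row])

The two cheetah sentences should score markedly higher with each other than either does with the monkey sentence.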

Performance

  • Semantic Textual Similarity test set results
| Model | AVG | Cosine Pearson | Cosine Spearman | Euclidean Pearson | Euclidean Spearman | Manhattan Pearson | Manhattan Spearman | Dot Pearson | Dot Spearman |
|---|---|---|---|---|---|---|---|---|---|
| KoSBERT (SKT) | 77.40 | 78.81 | 78.47 | 77.68 | 77.78 | 77.71 | 77.83 | 75.75 | 75.22 |
| KoSBERT | 80.39 | 82.13 | 82.25 | 80.67 | 80.75 | 80.69 | 80.78 | 77.96 | 77.90 |
| KoSRoBERTa | 81.64 | 81.20 | 82.20 | 81.79 | 82.34 | 81.59 | 82.20 | 80.62 | 81.25 |
| KoSentenceBART | 77.14 | 79.71 | 78.74 | 78.42 | 78.02 | 78.40 | 78.00 | 74.24 | 72.15 |
| KoSentenceT5 | 77.83 | 80.87 | 79.74 | 80.24 | 79.36 | 80.19 | 79.27 | 72.81 | 70.17 |
| KoSimCSE-BERT (SKT) | 81.32 | 82.12 | 82.56 | 81.84 | 81.63 | 81.99 | 81.74 | 79.55 | 79.19 |
| KoSimCSE-BERT | 83.37 | 83.22 | 83.58 | 83.24 | 83.60 | 83.15 | 83.54 | 83.13 | 83.49 |
| KoSimCSE-RoBERTa | 83.65 | 83.60 | 83.77 | 83.54 | 83.76 | 83.55 | 83.77 | 83.55 | 83.64 |
| KoSimCSE-BERT-multitask | 85.71 | 85.29 | 86.02 | 85.63 | 86.01 | 85.57 | 85.97 | 85.26 | 85.93 |
| KoSimCSE-RoBERTa-multitask | 85.77 | 85.08 | 86.12 | 85.84 | 86.12 | 85.83 | 86.12 | 85.03 | 85.99 |
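As background on how columns such as Cosine Pearson/Spearman are typically produced in STS evaluation: embed both sentences of each test pair, score the pair with a similarity function (cosine here), and correlate the predicted scores with the human gold labels. The sketch below is illustrative only; pairs and gold_scores are hypothetical placeholders, not the repository's actual evaluation data or script.

from scipy import stats

# Hypothetical STS-style pairs with human-annotated gold scores (typically 0-5).
pairs = [(sentences[0], sentences[1]),
         (sentences[0], sentences[2]),
         (sentences[1], sentences[2])]
gold_scores = [4.5, 0.5, 0.5]  # placeholder labels

predicted = []
for s1, s2 in pairs:
    batch = tokenizer([s1, s2], padding=True, truncation=True, return_tensors="pt")
    emb, _ = model(**batch, return_dict=False)
    predicted.append(cal_score(emb[0][0], emb[1][0]).item())

# "Cosine Pearson" / "Cosine Spearman" correlate predicted similarities with gold labels.
print(stats.pearsonr(predicted, gold_scores)[0])
print(stats.spearmanr(predicted, gold_scores)[0])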
