I am trying to fine-tune a BERT model for sentiment analysis (classifying text as positive/negative) with the Hugging Face Trainer API. My dataset has two columns, Text and Sentiment:
Text Sentiment
This was good place 1
This was bad place 0
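(Since I pass sep=";" to load_dataset below, the raw ./train/test.csv presumably looks like the following; the exact file layout is my assumption:)
Text;Sentiment
This was good place;1
This was bad place;0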
My code is:
from datasets import load_dataset
from datasets import load_dataset_builder
from datasets import Dataset
import datasets
import transformers
from transformers import TrainingArguments
from transformers import Trainer
dataset = load_dataset("csv", data_files="./train/test.csv", sep=";")
tokenizer = transformers.BertTokenizer.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1")
model = transformers.BertForSequenceClassification.from_pretrained("TurkuNLP/bert-base-finnish-cased-v1", num_labels=1)
def tokenize_function(examples):
    return tokenizer(examples["Text"], truncation=True, padding="max_length")
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.rename_column("Sentiment", "label")
tokenized_datasets = tokenized_datasets.remove_columns("Text")
training_args = TrainingArguments("test_trainer")
trainer = Trainer(
    model=model, args=training_args, train_dataset=tokenized_datasets["train"]
)
trainer.train()
Running this throws an error:
Variable._execution_engine.run_backward(
RuntimeError: Found dtype Long but expected Float
The error probably comes from the dataset itself, but can I fix it from my code? I searched the internet, and this error seems to have been solved before by casting the labels to float, but how do I do that with the Trainer API (see my guess below)? Any suggestions are highly appreciated.
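For reference, my rough guess at what "cast the labels to float" could look like with the datasets API is the snippet below; the cast_column call and the float32 type are my own assumptions, and I have not confirmed that this is the right place for the fix:

from datasets import Value

# guess: force the renamed "label" column to float32 before handing it to the Trainer
tokenized_datasets = tokenized_datasets.cast_column("label", Value("float32"))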
Reference:
https://discuss.pytorch.org/t/run-back-expected-dtype-float-but-got-d-type-long/61650/10