I have a pretrained chatbot model. According to its developer documentation, it can be fine-tuned on custom data in the following format: [{"input": input_query, "target": target_query}, ...] Each item requires two parts: an input and a target.

I currently have a lot of txt files containing the works of a famous writer (someone as well known in literature as Da Vinci or Picasso is in art). What I want to achieve is chatting with a bot that responds in the style of that writer. In other words, I want to train a chatbot with a specific style/personality.

The question is: how do I convert the txt files (the writer's works) into a dataset in the [input, target] format? Any comments are welcome.
The pretrained model I am using: https://github.com/clue-ai/ChatYuan
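
To make the target format concrete, here is a minimal sketch of one naive conversion. It assumes the texts are split into passages on blank lines and that each passage is paired with the passage that follows it (input = passage, target = next passage). That pairing rule is only an assumption for illustration, not something taken from the ChatYuan documentation, and the function names (split_passages, build_pairs, convert_directory) are hypothetical. It shows the shape of the output rather than a recommended way to capture a writer's style.

```python
import json
import re
from pathlib import Path


def split_passages(text):
    """Split raw text into non-empty passages separated by blank lines."""
    parts = re.split(r"\n\s*\n", text)
    # Collapse internal line breaks and repeated spaces inside each passage.
    return [" ".join(p.split()) for p in parts if p.strip()]


def build_pairs(passages):
    """Pair each passage with the next one (a naive, assumed pairing rule)."""
    return [
        {"input": prev, "target": nxt}
        for prev, nxt in zip(passages, passages[1:])
    ]


def convert_directory(txt_dir, out_path):
    """Read every .txt file in txt_dir and write one JSON list of pairs."""
    dataset = []
    for path in sorted(Path(txt_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Pairs are built per file so passages never cross file boundaries.
        dataset.extend(build_pairs(split_passages(text)))
    Path(out_path).write_text(
        json.dumps(dataset, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return dataset
```

For example, convert_directory("writer_works", "train.json") would write a single JSON list of {"input": ..., "target": ...} items, where both paths are placeholders.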