Need help: fine-tuning BERT for question answering on provided papers
Hello everyone. I am a newbie in the NLP world, and I have received a technical task from a firm for an intern position. Here is the description of the task:
Your task is to process the provided technical articles and implement continual training for one of the large language models, BERT. The goal is for your BERT model to understand the context of those papers and be ready to answer questions about them. For that, you need to work with Hugging Face. It is also suggested that you work in Colab. Your deliverables are:
· Deploy the original BERT model and test it by asking it questions (I sketch how I understand this step below the list)
· Do continual training of BERT and write code that allows asking questions about the papers' content
· Compare the answers of the original and your BERT models and show that your model is fit for purpose
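For the first deliverable, my understanding is something like the sketch below. This is only how I read it, and it assumes the SQuAD-fine-tuned BERT checkpoint from the Hugging Face Hub counts as the "original" model; the context and question strings are placeholders, not text from the actual PDFs.

```python
# A minimal sketch of deliverable 1, assuming the publicly available
# SQuAD-fine-tuned BERT checkpoint is acceptable as the "original" model.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

# Placeholder text; in practice this would be a passage extracted from
# one of the provided PDFs.
context = "BERT is a transformer-based model pretrained on large text corpora."
question = "What kind of model is BERT?"

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```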
Here is my problem. As far as I know, when we fine-tune BERT for extractive question answering, we need a question, an answer, a context, and the start and end positions of the answer. But there is a lot of content: they provided 6 PDFs, each of which is a separate book. Is there an easy way to generate those questions, answers, and so on?
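To make the data format concrete, here is roughly what I understand a single fine-tuning example has to look like (SQuAD-style, as used in Hugging Face datasets, with the start position being the character offset of the answer inside the context). The question, context, and answer below are placeholders I made up, not content from the books.

```python
# A sketch of one SQuAD-style training example as I understand the format;
# the text here is a made-up placeholder, not from the provided books.
context = (
    "BERT is pretrained with two objectives: masked language modeling "
    "and next-sentence prediction."
)
answer_text = "masked language modeling and next-sentence prediction"

example = {
    "id": "0001",
    "question": "Which objectives is BERT pretrained with?",
    "context": context,
    "answers": {
        "text": [answer_text],
        # character offset of the answer inside the context
        "answer_start": [context.find(answer_text)],
    },
}
print(example)
```

So my question boils down to whether examples like this can be generated from the six books in some easy way, instead of writing them all by hand.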