Large Language Models (LLMs) are large, general-purpose language models that are pre-trained and can then be fine-tuned for specific purposes. They are trained to solve common language problems, such as text classification, question answering, document summarization, and text generation. The models can then be adapted to specific problems in different fields via fine-tuning on relatively small domain-specific datasets.
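The pretrain-then-fine-tune idea can be illustrated with a toy sketch: a frozen "pretrained" encoder produces features, and only a small head is trained on a handful of labeled examples. Everything here (the encoder, the data, the learning rate) is invented for illustration, not an actual LLM fine-tuning recipe.

```python
import math

def pretrained_encoder(x):
    """Stand-in for a frozen pretrained model: a fixed feature map."""
    return [x, x * x, 1.0]  # features stay fixed during fine-tuning

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny invented domain-specific dataset: label is 1 when x > 0.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

weights = [0.0, 0.0, 0.0]  # only the small head's weights are trained
lr = 0.5
for _ in range(200):       # a few passes of gradient descent on log loss
    for x, y in data:
        feats = pretrained_encoder(x)
        pred = sigmoid(sum(w * f for w, f in zip(weights, feats)))
        grad = pred - y    # gradient of log loss w.r.t. the logit
        weights = [w - lr * grad * f for w, f in zip(weights, feats)]

accuracy = sum(
    (sigmoid(sum(w * f for w, f in zip(weights, pretrained_encoder(x)))) > 0.5) == bool(y)
    for x, y in data
) / len(data)
```

Real fine-tuning updates millions of parameters with libraries built for the purpose; the point of the sketch is only that a small labeled dataset suffices when the heavy lifting was done during pre-training.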
The ability of LLMs to take knowledge learned from one task and apply it to another is enabled by transfer learning. Given an input string of text, LLMs predict the probability of the next word (token) based on the language in their training data. In addition, instruction-tuned language models predict a response to the instructions given in the input. Such instructions might be "summarize a text", "generate a poem in the style of X", or "give a list of keywords based on semantic similarity for X".
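Next-token prediction can be sketched in a few lines: a model emits a raw score (logit) per vocabulary entry, and a softmax turns those scores into probabilities. The vocabulary and logit values below are made up for illustration; a real model's vocabulary has tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a model might emit after the prompt
# "The cat sat on the" -- purely illustrative values.
vocab = ["mat", "dog", "moon", "chair"]
logits = [4.1, 1.2, 0.3, 2.5]

probs = softmax(logits)
prediction = vocab[probs.index(max(probs))]
```

Generation then proceeds by sampling (or greedily picking) a token from this distribution, appending it to the input, and repeating.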
LLMs are "large" not only because of the size of their training data but also because of their large number of parameters. They display behaviors different from those of smaller models, with important implications for those who develop and use AI systems. To build effective LLMs, researchers must address complex engineering issues and either work alongside engineers or have engineering expertise themselves.