This technology is a method for selecting training examples for Large Language Models (LLMs), enabling the development of more accurate and relevant models.
Large Language Models (LLMs) are powerful, highly adaptable machine learning models that depend on their training datasets. The amount of available training data influences model parameters and the accuracy of output predictions. One major concern is that limited training data can leave imbalances in data coverage. This becomes a problem when an LLM must make predictions in an area with few training examples to learn from, degrading the model's accuracy and performance.
This technology describes a method to improve LLM accuracy by increasing the relevancy of the training examples. The method scores each candidate training example by computing the cosine similarity between it and the query. By optimizing the relevance of the training set, this method aims to improve the accuracy of the LLM's predictions.
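The core selection step can be sketched as follows. This is a minimal illustrative example, not the patented implementation: the toy embedding vectors, the `select_examples` helper, and the dictionary structure are all assumptions made for demonstration; a real system would embed the query and candidates with a learned embedding model.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_examples(query_vec, candidates, k=2):
    # Rank candidate training examples by similarity to the query embedding
    # and keep the k most relevant ones (hypothetical helper for illustration).
    ranked = sorted(
        candidates,
        key=lambda ex: cosine_similarity(query_vec, ex["embedding"]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings, invented for this sketch.
query = [1.0, 0.0, 1.0]
candidates = [
    {"text": "relevant A", "embedding": [0.9, 0.1, 0.8]},
    {"text": "off-topic",  "embedding": [0.0, 1.0, 0.0]},
    {"text": "relevant B", "embedding": [1.0, 0.2, 0.9]},
]
top = select_examples(query, candidates, k=2)
print([ex["text"] for ex in top])  # the two on-topic examples rank highest
```

Selecting the top-k candidates in this way concentrates the training set on examples relevant to the query region, which is how the method addresses sparse data coverage.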
Patent Pending
IR CU23217
Licensing Contact: Greg Maskel