Scientists develop AI monitoring agent to detect and stop harmful outputs
According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs, such as code attacks, before they occur. Per the paper: "Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans."

Source: Naihin et al., 2023

To train the monitoring agent, the researchers built a data set of nearly 2,000 safe human-AI interactions across 29 different tasks, ranging from simple text-retrieval tasks and coding corrections all the way to building entire webpages from scratch.

Related: Meta dissolves responsible AI division amid restructuring

They also created a competing testing data set filled with manually crafted adversarial outputs, including dozens intentionally designed to be unsafe.

The data sets were then used to train an agent on OpenAI's GPT-3.5 Turbo, a state-of-the-art system, capable of distinguishing between potentially harmful and harmless outputs with an accuracy of nearly 90%.
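As a rough illustration of the monitoring pattern the paper describes, the sketch below wraps each proposed agent action in an audit step: a classifier scores the action, anything crossing a safety boundary is blocked, and suspect actions are logged and ranked for later human review. The names (`score_safety`, `audit`, `SAFETY_THRESHOLD`) and the keyword-based stub classifier are hypothetical stand-ins; the actual monitor in the research is a fine-tuned GPT-3.5 Turbo model, not a keyword check.

```python
import logging
from dataclasses import dataclass, field

# Hypothetical threshold; the paper does not publish its exact scoring scheme.
SAFETY_THRESHOLD = 0.5


@dataclass
class MonitorLog:
    """Keeps flagged actions ranked by suspicion for human review."""
    flagged: list = field(default_factory=list)

    def record(self, action: str, score: float) -> None:
        self.flagged.append((score, action))
        self.flagged.sort(reverse=True)  # most suspect first


def score_safety(action: str) -> float:
    """Stand-in for a model-based classifier (e.g. fine-tuned GPT-3.5 Turbo);
    returns an estimated probability that the proposed action is unsafe."""
    unsafe_markers = ("rm -rf", "DROP TABLE", "exfiltrate")
    return 1.0 if any(marker in action for marker in unsafe_markers) else 0.05


def audit(action: str, log: MonitorLog) -> bool:
    """Block the action if it crosses the safety boundary; otherwise allow it.
    Suspect actions are logged and ranked for human examination."""
    score = score_safety(action)
    if score >= SAFETY_THRESHOLD:
        log.record(action, score)
        logging.warning("Blocked unsafe action (score=%.2f): %s", score, action)
        return False  # stop the unsafe test before the action executes
    return True


if __name__ == "__main__":
    log = MonitorLog()
    for proposed in ["fetch https://example.com", "rm -rf / --no-preserve-root"]:
        if audit(proposed, log):
            print("executing:", proposed)
    print("queued for human review:", log.flagged)
```

In this toy run, the benign fetch is executed while the destructive shell command is stopped and queued for review, mirroring the audit-then-block flow quoted from the paper.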