Large language models (LLMs) are artificial intelligence (AI) tools that respond to text prompts. They can be made to pursue goals and carry out complex, multi-stage tasks by wrapping them in repeat-prompting systems. Many such systems already exist and are freely accessible. Given Internet access and user assistance, they can cause significant real-world harm as soon as they become sufficiently effective, and the guardrails built into current LLMs are unable to prevent such harm.
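To make the term concrete, the sketch below shows one minimal form a repeat-prompting system can take: a loop that feeds the model's own output, and the results of any actions it requests, back in as the next prompt until the model reports the goal as done. The names call_llm, run_tool, and pursue_goal, the message format, and the "TASK COMPLETE" convention are illustrative assumptions, not the interface of any particular system.

```python
# Minimal sketch of a repeat-prompting ("agent") loop around an LLM.
# call_llm and run_tool are hypothetical placeholders, not a real API.

def call_llm(messages):
    """Send the conversation so far to an LLM and return its reply (placeholder)."""
    raise NotImplementedError("plug in a real LLM client here")

def run_tool(action):
    """Carry out a model-requested action, e.g. a web search or shell command (placeholder)."""
    raise NotImplementedError("plug in real tool handlers here")

def pursue_goal(goal, max_steps=50):
    """Repeatedly prompt the LLM, feeding its own output and tool results back in."""
    messages = [{"role": "user", "content": f"Goal: {goal}. Plan and act step by step."}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if "TASK COMPLETE" in reply:   # the model signals it considers the goal achieved
            break
        result = run_tool(reply)       # act on the model's latest instruction
        messages.append({"role": "user", "content": f"Result: {result}"})
    return messages
```

Everything the loop does between prompts, including the tools it exposes, is ordinary code under the developer's control, which is why the loop itself cannot be relied on as a safety layer.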
The effectiveness of repeat-prompting systems is currently constrained by the immaturity of their code and by the limited context window of the underlying LLMs. With hundreds of developers cooperating to improve these systems, however, they may become effective even without further advances in the underlying LLMs. The larger the context window, the easier this will be and the sooner such systems will pose significant risks.
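A rough sketch of why the context window is the binding constraint, assuming a crude characters-per-token estimate and an arbitrary 8,000-token budget: the transcript grows with every step, so older steps must be dropped (or summarized) once the budget is exhausted, and the system starts losing track of its own earlier work.

```python
# Illustrative only: the token estimate and the 8,000-token budget are assumptions.

def estimate_tokens(message):
    """Crude approximation: roughly 4 characters per token."""
    return len(message["content"]) // 4

def fit_to_context(messages, budget_tokens=8_000):
    """Keep the original goal message plus as many recent messages as fit the budget."""
    kept = [messages[0]]
    used = estimate_tokens(messages[0])
    for message in reversed(messages[1:]):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break  # everything earlier is forgotten; the system loses track of past steps
        kept.insert(1, message)
        used += cost
    return kept
```

The larger the budget, the less the system forgets between steps, which is why larger context windows make such systems easier to bring to effectiveness.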
Because repeat-prompting systems are freely downloadable, the only way to ensure their safety is through the underlying LLM. As an alternative to shutting down LLMs outright, which may be neither possible nor desirable, vendors and open-source developers of LLMs should temporarily limit the context window size available to the public. An evidence-based mechanism can then be used to lift these restrictions as guardrail systems are shown to be effective in an adversarial setting.
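Purely as an illustration of what a temporary public limit might look like at an inference API boundary, the sketch below rejects oversized prompts before they reach the model. The 4,000-token cap, the token estimate, and the request shape are assumptions for illustration, not any vendor's actual policy or API.

```python
# Hypothetical vendor-side enforcement of a temporary public context cap.

PUBLIC_CONTEXT_CAP_TOKENS = 4_000  # assumed cap; raised as guardrails are validated

def estimate_tokens(text):
    """Crude approximation: roughly 4 characters per token."""
    return len(text) // 4

def check_request(prompt_messages):
    """Reject any request whose combined prompt exceeds the public context cap."""
    total = sum(estimate_tokens(m["content"]) for m in prompt_messages)
    if total > PUBLIC_CONTEXT_CAP_TOKENS:
        raise ValueError(
            f"Prompt uses about {total} tokens; the public limit is {PUBLIC_CONTEXT_CAP_TOKENS}."
        )
    return total
```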
For a detailed explanation, please refer to the PDF linked above.