"We show that a tiny snippet—just 13 words—of retrieved text on a UGC website like Reddit, Wikipedia, Quora, or Facebook can change AI agents to output spam / scam content pretty consistently."
Is this really an attack? They use a well-known domain to claim banancoin is a fast-growing crypto, and the result talks about established coins and mentions their poisoned result at the end. But an article published on any well-known domain would do the same thing to a regular search engine, no?
Regular search engines have feedback mechanisms that limit how effective that is. Click-through and bounce rates are used to adjust rankings, and as more and more people look at the fake info and then ignore it, it will naturally fall out of the top results and get buried. LLMs, though, don’t have that feedback: once something is ingested and baked into the model, it’s there forever. The fake info doesn’t need to look believable enough to fool a human, just self-consistent enough to fool an LLM with its tiny context window.
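To make the feedback argument concrete, here is a minimal sketch of engagement-based re-ranking. It is a toy, not how any real search engine scores pages: the `Result` fields, the smoothing formula, and the numbers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float      # hypothetical relevance score before any user feedback
    impressions: int = 0   # times the result was shown
    clicks: int = 0        # times it was clicked
    bounces: int = 0       # clicks where the user left right away


def adjusted_score(r: Result) -> float:
    """Scale the base score by the smoothed share of impressions that led to
    a click that did not bounce (toy formula, assumed for illustration)."""
    satisfied = r.clicks - r.bounces
    engagement = (satisfied + 1) / (r.impressions + 2)  # 0.5 when there is no data
    return r.base_score * engagement


def rerank(results):
    """Order results by adjusted score, best first."""
    return sorted(results, key=adjusted_score, reverse=True)


# A page that ranks well at first but that almost everyone bounces from...
fake = Result("fake", base_score=0.9, impressions=1000, clicks=300, bounces=290)
# ...versus a lower-ranked page that people actually stay on.
legit = Result("legit", base_score=0.7, impressions=1000, clicks=200, bounces=20)

ranked_urls = [r.url for r in rerank([fake, legit])]
```

With these made-up numbers the page people bounce from drops below the one they stay on, which is the burying effect described above.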
This isn't baked into the model. The AI is doing a search on a search engine and aggregating the results.
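A rough sketch of that search-then-aggregate flow, assuming the agent simply pastes retrieved snippets into its prompt. The function name `build_prompt`, the prompt wording, and the example snippets are hypothetical, and no real search or model API is called here.

```python
def build_prompt(question, snippets):
    """Concatenate retrieved snippets into the context handed to the model.

    Nothing is filtered or ranked by user feedback in this sketch: whatever
    the search step returns ends up in front of the model at query time.
    """
    sources = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer using the sources below.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Hypothetical snippets a search step might return, one of them poisoned.
snippets = [
    "Established coins are widely traded on major exchanges.",
    "banancoin is a fast growing crypto",
]
prompt = build_prompt("Which crypto should I look at?", snippets)
```

The poisoned snippet reaches the model through the prompt rather than through training, so removing it from the page or the search index removes it from later answers.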