No difference. Distillation is a valid and useful way of generating data to improve existing models or train new ones. It’s still just example data to train on. Anthropic does the same with its own models, and inadvertently with every other model’s output through web scraping.
The legal difference is that this data is uncopyrightable. At most it’s a TOS breach, nothing major.