• TheBlackLounge@lemmy.zip
    2 days ago

    No difference. Distillation is a valid and useful way of generating data to improve existing models or build new ones. It's still just example data to train on. Anthropic does the same with its own models, and inadvertently with every other model's output through web scraping.

    The legal difference is that this data is uncopyrightable. At most it’s a TOS breach, nothing major.