Bluesky has published a draft proposal on GitHub that would let users specify whether their posts and data may be used for AI training. Bluesky's recent user growth is partly a reaction to X's policy of opening user data to AI training: although X lets users opt out, the policy prompted many to switch. Bluesky reiterates that it will not use user data for AI training, but because the platform is publicly accessible, reports of its data being used for AI training continue to surface, and Bluesky has promised to find a solution.
Jay Graber, CEO of Bluesky, discussed the idea of such settings at a SXSW seminar last week, where the response ran against it. Graber pointed out that companies developing generative AI already extract public data from many websites, including Bluesky, for training. Bluesky's proposal is to set a standard similar to robots.txt, so that developers can determine which data users permit them to extract.
The proposed format works at the level of the AT Protocol, letting users specify which kinds of data extraction they allow, such as generative AI training, cross-protocol data extraction, large dataset extraction, and archiving.
Graber admits that this approach is more like a labeling system than an enforcement mechanism: developers can still violate it if they choose to.
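To make the labeling idea concrete, here is a minimal sketch of how a well-behaved developer might check a user's stated preferences before reusing their data. It is not taken from Bluesky's draft: the `UserIntents` class, its field names, and the `may_use` helper are all hypothetical, chosen only to mirror the four uses listed above, and the real proposal may name and structure them differently.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical field names for the four uses the article lists;
# the actual draft may name and structure these differently.
@dataclass
class UserIntents:
    generative_ai: Optional[bool] = None      # None = no preference stated
    protocol_bridging: Optional[bool] = None
    bulk_datasets: Optional[bool] = None
    public_archiving: Optional[bool] = None


def may_use(intents: UserIntents, category: str, default: bool = False) -> bool:
    """Return the user's stated preference, or `default` if none was stated."""
    if category not in UserIntents.__dataclass_fields__:
        raise ValueError(f"unknown category: {category}")
    value = getattr(intents, category)
    return default if value is None else value


intents = UserIntents(generative_ai=False, public_archiving=True)
print(may_use(intents, "generative_ai"))     # False
print(may_use(intents, "public_archiving"))  # True
print(may_use(intents, "bulk_datasets"))     # False (falls back to default)
```

As with robots.txt, nothing in this check stops a developer from skipping it, which is the limitation Graber acknowledges.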
**TLDR**: Bluesky's draft proposal aims to give users a say over data extraction for AI training, amid concerns about public data being used without consent. Jay Graber discussed the idea at a recent SXSW seminar, acknowledging that AI companies already extract public data and proposing a robots.txt-style standard that developers could still choose to ignore.