Webpage: https://datatracker.ietf.org/group/aicontrolws/about/
Large Language Models and other machine learning techniques require voluminous input data, and one common source of such data is the Internet -- usually, "crawling" Web sites for publicly available content, much in the same way that search engines crawl the Web.
This similarity has led to an emerging practice of allowing the Robots Exclusion Protocol (RFC 9309) to control the behavior of AI-oriented crawlers.
This emerging practice raises many design and operational questions. It is not yet clear whether robots.txt (the mechanism specified by RFC 9309) is well-suited to controlling AI crawlers. A content creator or host may not be able to distinguish a crawler used for search indexing from a crawler used for LLM ingest – and indeed some crawlers may be used for both purposes. Potential use cases may extend across many different units of content, policies to be signaled, and types of content creators. Before robots.txt becomes a de facto solution to AI crawling opt-out, it is necessary to examine whether it is an appropriate mechanism: in particular, whether the creator of a particular unit of content can realistically and fully exercise their right to opt out, and what scope of data ingest that opt-out applies to.
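As an illustration of the practice described above, the following is a minimal sketch of how a robots.txt file keyed on crawler names is evaluated. It uses Python's standard urllib.robotparser module. The crawler names ExampleAIBot and ExampleSearchBot, the example.com URLs, and the helper is_allowed are hypothetical and chosen only for this example; they do not refer to real crawlers or to any mechanism proposed by the workshop.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented crawler
# name used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The AI-oriented crawler is excluded from the whole site.
print(is_allowed("ExampleAIBot", "https://example.com/articles/1"))      # False
# Any other crawler falls back to the "*" group.
print(is_allowed("ExampleSearchBot", "https://example.com/articles/1"))  # True
print(is_allowed("ExampleSearchBot", "https://example.com/private/x"))   # False
```

The rules in this sketch are selected purely by the crawler's product token. A single crawler that gathers content for both search indexing and LLM ingest presents one token, so a file of this shape cannot express "allow indexing, disallow training" for that crawler.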
This workshop aims to explore practical opt-out mechanisms for AI, and build an understanding of use cases, requirements, and other considerations in this space. The workshop will focus on mechanisms to communicate the opt-out choice and their associated data models. Technical enforcement of opt-out signals is not in scope.
The IAB is looking for short position papers on the following topics; however, this list is non-exhaustive and should be interpreted broadly:
Because robots.txt is emerging as a solution in this space, the discussion will be anchored on it as a starting point, but not limited to that mechanism. Proposals for alternative solutions may be made, but time will not be available for a detailed presentation or discussion.
Interested participants are invited to submit position papers on the workshop topics. Participants can choose their preferred format, including Internet-Drafts, text- or word-based documents, or papers formatted similarly to those used by academic publication venues. Submission as PDF is preferred. Paper size is not limited, but brevity is encouraged. By default, submissions that are considered relevant will be published on the workshop website. If you wish for your submission to be anonymised or withheld from such publication, please indicate that clearly in the submission.
The organizers will issue invitations based on the submissions received. Sessions will be organized according to the submissions received, and not every accepted submission or invited attendee will have an opportunity to present; the intent is to foster an active discussion and not simply to have a sequence of presentations.
Discussion at the workshop will be held under the Chatham House Rule, and therefore will not be recorded or minuted. However, a workshop report will be published afterwards. It is anticipated that the workshop report will include:
The workshop will be by invitation only. Those wishing to attend should submit a position paper to ai-control-workshop-pc@iab.org. Position papers from those not planning to attend the workshop themselves are also encouraged.
Logistics: