06:13:57 RRSAgent has joined #GenerativeAI
06:14:01 logging to https://www.w3.org/2023/09/13-GenerativeAI-irc
06:26:52 tidoust has joined #GenerativeAI
06:26:59 RRSAgent, do not leave
06:26:59 RRSAgent, make logs public
06:27:01 Meeting: The Impact of Generative AI on the Web
06:27:01 Chair: diogocortiz, reinaldoferraz, Vagner Diniz
06:27:01 Agenda: https://github.com/w3c/tpac2023-breakouts/issues/6
06:27:03 clear agenda
06:27:05 agenda+ Pick a scribe
06:27:07 agenda+ Reminders: code of conduct, health policies, recorded session policy
06:27:09 agenda+ Goal of this session
06:27:11 agenda+ Discussion
06:27:13 agenda+ Next steps / where discussion continues
07:20:14 tidoust has joined #GenerativeAI
08:00:19 christianliebel has joined #GenerativeAI
08:04:31 DigoCortiz has joined #GenerativeAI
08:22:31 ReinaldoFerraz__ has joined #generativeai
08:23:00 DiogoCortiz has joined #GenerativeAI
08:35:40 zolkis has joined #GenerativeAI
08:43:43 Vagner_NIC_br_ has joined #GenerativeAI
08:56:17 DiogoCortiz has joined #GenerativeAI
08:58:37 present+
08:58:48 start meeting
08:58:58 Mizushima has joined #GenerativeAI
08:58:58 npdoty has joined #GenerativeAI
08:59:49 q?
09:01:00 tidoust has joined #GenerativeAI
09:01:03 present+
09:02:35 shunya has joined #GenerativeAI
09:05:33 present+
09:06:09 anssik has joined #GenerativeAI
09:06:09 Present+ Anssi_Kostiainen
09:06:09 diogo: agenda
09:06:09 diogo: data collection on the Web
09:06:09 diogo: integration of LLMs in a search engine
09:06:09 cpn has joined #GenerativeAI
09:06:09 diogo: production and publication of synthetic content on the Web
09:06:09 present+ Chris_Needham
09:06:09 JohnRochford has joined #GenerativeAI
09:06:09 present+
09:06:16 diogo: we hope to anticipate challenges to understand the issues and plan to use inputs at IGF's meeting.
09:06:59 diogo: three main questions to be discussed
09:06:59 present+ Tomoaki_Mizushima
09:07:38 ...
1) what is the limit on scraping web data to train Generative AI
09:09:49 2) what are the potential impacts of incorporating LLMs
09:09:49 ... 3) how can Web help detect AI-generated content
09:09:49 ling has joined #GenerativeAI
09:09:49 @@: directive in the EU, regarding Text and Data Mining, and allows for opt-out through machine-readable means
09:09:49 ... generic definition, but it is possible
09:09:49 ... can't specify every other reason that a publisher might not want their data to be used
09:09:58 tzviya has joined #GenerativeAI
09:09:58 present+
09:09:58 ... Community Group, TDM Reservation Protocol, machine-readable way for publishers to opt-out .. and how to contact me if you want to use the data
09:09:58 ... some publishers are using it, in France, Sweden, etc.
09:09:58 ... people still fear that companies like Google who use AI for translation, snippets, etc.
09:10:07 q?
09:10:07 ... need to be sure that saying no to TDM doesn't mean no to indexing
09:10:29 gendler has joined #generativeai
09:10:44 alexm has joined #GenerativeAI
09:10:45 reillyg has joined #GenerativeAI
09:11:28 bmay has joined #GenerativeAI
09:11:28 gendler: thanks for starting us off. News Corp is a conglomerate of newspapers and television
09:11:28 Bert has joined #GenerativeAI
09:11:28 alexrudenko_ has joined #GenerativeAI
09:11:28 +present
09:11:28 present+
09:11:28 present+
09:11:28 fremy has joined #generativeai
09:11:28 ... three distinct questions: 1) how users can express opinions; 2) scraping content on the Web generally; 3) xxxx
09:11:38 humera-eyeo_ has joined #GenerativeAI
09:11:39 RRSAgent, pointer?
09:11:39 See https://www.w3.org/2023/09/13-GenerativeAI-irc#T09-11-39
09:11:53 GautierC has joined #GenerativeAI
09:11:54 ...
US news publishers have been thinking about this question, don't have a formal proposal, but do have an idea that we have started to build consensus around in the US
09:11:57 danbri_ has joined #generativeai
09:12:02 Christian has joined #GenerativeAI
09:12:05 present+
09:12:06 dsinger has joined #GenerativeAI
09:12:07 ... see the best place to do the work is a technical standards body
09:12:16 ... purpose is not something we can express easily in machine-readable form
09:12:24 ... don't have specific law to be aiming for in the US
09:12:25 q+ to add a different main question
09:12:32 s /xxxx/how can Web help detect AI-generated content/
09:12:45 ... things that aren't covered by terms of service
09:12:59 ... crawlers indicate that they don't have any way to know that scraping for a particular purpose isn't allowed
09:13:16 ... need to be able to indicate what purpose the scraper is grabbing this for
09:13:23 ... arguing about how specific the purposes should be
09:13:30 q+ to ask if we are writing for the AI engines or AI users
09:13:38 ... would prefer to align this with copyright law in general, taking inspiration from Creative Commons
09:13:46 q?
09:13:54 ... attribution, commercial, derivative works
09:14:06 hyojin has joined #GenerativeAI
09:14:11 q+ on something not focused purely on copyright
09:14:21 ... similar formatting to how robots.txt works
09:14:33 ... interested in feedback from technical experts
09:14:54 ack dsinger
09:14:54 dsinger, you wanted to add a different main question
09:15:02 present+
09:15:30 dsinger: embedded URLs make up a big part of the Web, but even for AI engines that generate references in the Web, those citations are just a plausible set of citations, rather than the actual data source that is relied upon
09:15:34 present+
09:15:38 ... and some don't provide citations/urls at all
09:15:57 ... no assurance that the same AI engine will give the same output to a different person
09:16:14 ...
not a Web technology, if it's not pointing in or out
09:16:25 ... is it an existential threat to the Web at all?
09:16:44 DiogoCortiz: may point to the second question on the slide
09:16:47 q!
09:17:00 q+
09:17:14 q+ to respond to David
09:17:35 ... different players trying to include yyyy
09:18:14 ... Google is reading my website and using the response, and may make a reference to my website, but users still might not use my website any more
09:18:28 ... and so it could change the traffic distributed on the web, or move the power to search engines
09:18:35 I can take over scribe
09:18:38 Scribe+
09:18:49 Laurent__ has joined #GenerativeAI
09:19:08 tzviya: One of the things we have at Wiley, a publisher of books and journals, I work on governance ...
09:19:20 q?
09:19:27 ... whether we like it or not, Gen AI exists. Whether or not it's an existential issue, the toothpaste is out of the tube....
09:19:33 ack tzviya
09:19:33 tzviya, you wanted to ask if we are writing for the AI engines or AI users
09:19:48 https://syntheticmedia.partnershiponai.org
09:19:54 ...what specs can we write for the folks that create and use the engines? It is worth taking a look at the materials that have been created by Practices for Synthetic Media...
09:20:50 ...these are best practices created by orgs you will know well. While the proposed solution of TDM is good, it's unclear whether that would be respected when added as a signal on the web.
09:21:19 npdoty: Nick Doty, CDT. I believe it's very important to have these publisher perspectives in the room. Important industry...
09:21:56 ...I urge that when we look at these controls, we look past copyright. All these big companies will sue each other, good luck. Most of us will not be part of those lawsuits, but we will have plenty of interests....
09:22:11 ...privacy is an issue that is not going to be well covered by copyright issues.
09:22:21 s/issues./issues....
09:22:27 q?
09:22:47 ack npdoty
09:22:47 npdoty, you wanted to comment on something not focused purely on copyright
09:22:48 ...In response to David Singer, there is some standards work that we can do around provenance.
09:22:55 q+
09:23:04 q+ to offer that we could write guidelines and best practices
09:23:52 remy: Thank you very much, I wanted to address the question: if you don't have links, does that destroy the web as we know it? Naturally, there will be a stronger connection between search engines and data content providers...
09:24:03 ...most of the queries that matter to people are questions of a real-time nature, sports, political events...
09:24:23 bdekoz has joined #generativeai
09:24:31 ...basically if you want to provide a service like G search or Bing, these things will exist, but people will create and provide high quality data.
09:25:03 hey can somebody repost the link to the google/meta document about generative content guidelines?
09:25:12 ?: I would like to respond to David, I believe that the concept of the website may be archaic in ten years. Some people think this is a threat, others think it's massive hype....
09:25:38 s/remy/fremy/
09:25:40 thanks!
09:25:47 s/?/danbri_/
09:26:10 ...it could be as big as the web, could be a blessing, could be a curse. We've made a change here, around computers that understand natural languages. How does that change issues?
09:26:15 q?
09:26:21 Jaunita- has joined #Generativeai
09:26:37 ack fremy
09:26:40 Q+
09:26:51 ack danbri_
09:26:51 danbri_, you wanted to respond to David
09:27:16 q?
09:27:41 ...it could be good for technical reasons, we could summarize millions of pages with an LLM, turn those into 10,000 pages, a tweet, a tag, it might be much better from an eco-wide perspective. What will be the need for browser bookmarks? Could these systems adjust how we use browsers, how they're organized? Some things will be expedited...
09:27:59 ...possible that medical publishers will advance faster with LLM help.
09:28:21 ack gendler
09:28:47 ack dsinger
09:28:47 dsinger, you wanted to offer that we could write guidelines and best practices
09:28:55 dsinger: Answering my own question, we tend to think that we can write a specification to fix these issues, but I think we should focus on guidelines, for things that help build best practices....
09:29:27 ....if you train on systems that work on high quality, but not truth, you can't then claim your system builds truth.
09:29:39 q?
09:29:41 q?
09:30:22 Juanita: I find that it would be interesting to figure out how we can democratize information and education around the world....
09:30:31 ack Jaunita
09:30:44 ...there are issues with governments that may try to block access to generative AI, and we should think of these issues.
09:30:55 q+
09:31:29 diogo: Quick response to this, we are from Brazil and we have similar questions around this. We have research around the topic of bias in LLMs, for example...
09:32:10 ...in ChatGPT in Portuguese, in Google Bard, there is a lot of bias especially with our cultural work. The bot responds in Portuguese, but it's clear the content is based on American culture....
09:32:15 s/work on high quality/work on high quality writing/
09:32:16 q
09:32:17 q?
09:32:32 ...we need to make sure more data is in those LLMs, so that a global perspective is covered.
09:33:15 brian: I think it's an interesting question, as we discuss the shape of the web post gen-AI, we need to think of gen-AI as potentially having a 'Napster moment', where this new technology affects how people interact with the web...
09:33:33 q?
09:33:36 q+
09:33:47 q+ to ask what standards/guidance w3c can write
09:33:51 ....how will people use information without permission, how will that change things similarly to how things didn't matter if they were physical.
09:34:25 q?
09:34:30 bdekoz: Hello, I am from Africa, and very excited about this conversation around Gen AI. I'm glad we've woken up to all of the possibilities...
09:34:32 ack bmay
09:35:39 q?
09:35:48 ...a question from the perspective of those who create content. A question for the big companies, are you going to block generated content from your systems? Someone can create genAI content just in an attempt to monetize it, will you give penalties to those that do not create work from scratch? Will you protect journalists who need to put in groundwork on these questions and creation?
09:35:50 q+
09:36:34 danbri: For images, we've collaborated with the ? to help with embedding information into the metadata in order to help us understand whether it came out of a generative tool...
09:36:54 Q+
09:36:56 ...but not subtle enough. Could be a completely generated image, or something that upscaled a low quality photo...
09:37:32 q+ on content provenance, https://c2pa.org/
09:38:20 ...text is different. I think there's been a risk of LLM models being used by all sorts of supports. It's hard to find whether something is from a generative model as opposed to a human writer. You end up with material that is part you, part machine. Should that be on the content creators? People may want to find cheaters, but will that take away tools for accessibility....
09:38:44 ...there is no single answer on what we can rush into, we need to be careful about how we handle this.
09:39:18 q?
09:39:21 Diogo: Google has updated their policy, that if you create political content with AI systems, you need to label this content. But how you catch those seems to be a technical challenge.
09:39:49 ack bdekoz
09:41:00 bdekoz: I'm Benjamin from Mozilla. On the limits of scraping content, and protecting creators, robots.txt is not sufficient. We need something like a generative.json with more options than what we're getting out of robots.txt....
09:41:10 ...creators need prompted metadata or watermarks that will clearly indicate when Gen systems are used, and artistic style should be referenced and labeled....
09:41:27 ...we should make sure this information is connected to the image.
09:42:34 diogo: There have been legal questions around how we can or cannot copyright prompts. Some say prompts are like an essay, and so they are significant enough to be covered by the copyright laws.
09:42:34 https://www.w3.org/TR/webmachinelearning-ethics/
09:43:09 ack tzviya
09:43:09 tzviya, you wanted to ask what standards/guidance w3c can write
09:43:09 tzviya: I think we should focus on what kinds of standards or guidelines we can actually write in W3C. Referencing the Ethical Principles for Web Machine Learning....
09:43:14 https://www.w3.org/TR/webmachinelearning-ethics/
09:43:28 q?
09:43:42 qq+ to give a status update on https://www.w3.org/TR/webmachinelearning-ethics/
09:43:52 ...it walks through many of the issues, including the possible exploitation of private information. We have not focused enough on issues outside of copyright. One that was briefly touched on was bias in a system becoming worse as it gets built upon. You need human intervention, but how do you write guidance for what that intervention needs to be? We certainly need more guidance.
09:44:07 diogo: beyond copyright, in EU/USA there are VARA rights for artists that allow disclaiming artworks and disassociating creators from works that become broken or mangled by their owners
09:44:57 q?
09:45:01 humera: My question is in reference to how the economy will be transformed. What will the impact of search results be when the ads are embedded into the chat bot? How will ad blockers affect this? How will this transform the economics of the web and how ads play a role?
09:45:52 JohnRochford has joined #GenerativeAI
09:45:55 anssik: I would like to follow up on tzviya's point on ethical principles. I was part of the taskforce that worked on this. We lost some contributors, and welcome new members to take part and contribute to the guidelines. I will put a link in the IRC. Please follow up!
09:45:55 https://www.w3.org/TR/webmachinelearning-ethics/
09:46:10 q?
09:46:16 ack annsik
09:46:29 ack humera-eyeo_
09:46:36 ack Jaunita-
09:46:46 jaunita: are there any standards that could be written to protect ChatGPT against governments or bad actors that would be interested in creating misinformation versions of ChatGPT?
09:46:46 q- anssik
09:46:55 q?
09:46:56 q?
09:47:23 q+ to talk about the need to discuss and be visible
09:47:35 q?
09:47:35 https://c2pa.org/
09:47:40 nick: So many great ideas, it sounds like we have a few areas of work to consider. I wanted to follow up on some of the ideas about provenance. The need to say where this image or writing came from, or the multiple steps that are involved. One source to look at is the ???...
09:47:44 ack npdoty
09:47:44 npdoty, you wanted to comment on content provenance, https://c2pa.org/
09:48:34 ...this is related to a problem we've already seen, how has this image been altered. We have many questions about misinformation, we don't need AI to make misinformation....
09:48:34 q+
09:48:34 s/???/Coalition for Content Provenance and Authenticity (C2PA)
09:48:34 q?
09:48:39 AWK_ has joined #GenerativeAI
09:48:41 ...this may be a mechanism that many people find useful. How do we indicate how our content was developed, and what steps were taken?
09:48:48 q?
09:48:53 q+
09:49:07 jamesn has joined #generativeai
09:49:21 q?
09:49:28 david: I think you're absolutely right, this problem and the problem of misinformation are intertwined. We've made a conclusion jump, that we don't know enough, and so we have to not say much. We should be much more urgent to talk about it.
09:49:39 npdoty: is there a way for human content creators to indicate authenticity or embed some kind of private key that can be authenticated later?
09:49:39 ack dsinger
09:49:39 dsinger, you wanted to talk about the need to discuss and be visible
09:49:59 gendler: prompted by npdoty, generative AI is often talked about as a completely new thing
09:50:12 q?
09:50:28 +1 AI is not new
09:50:29 ... thinking along similar lines has happened. generative AI is only exacerbating issues that have already existed for the web
09:50:37 Because we don't know how to solve a problem, we think we should avoid discussing it or talking about it. That's a conclusion jump. In discussion, and in publication of discussion papers or guidelines, we may well make progress.
09:50:46 ... people have been working on them, but now AI may have accelerated the concern
09:50:56 ... look at work that has already been done
09:51:00 Resource for image provenance metadata: https://contentauthenticity.org/
09:51:03 q?
09:51:03 q?
09:51:19 q?
09:51:32 brian: I wanted to follow up on my earlier point, when we discuss things such as provenance, we're talking about being able to trace a relatively discrete thing from one point to another point...
09:51:35 +1 AWK, that was a work item from the c2pa group that I was mentioning
09:51:49 ack gendler
09:51:50 ack bmay
09:51:51 AWK_: thanks
09:52:04 ...in the world of AI, I'm seeing many points being thrown into one. I don't understand how you indicate the 100 million provenances...
09:52:04 q+
09:52:26 ....if a new input is put into the machine, how do we plot those inputs? We need to think of the world in a new way.
09:52:27 q?
09:52:28 q?
09:52:48 gendler: useful again to look at the work that has already been done
09:53:11 ... algorithmic auditing work may be relevant to large language models
09:53:17 ... potential references to look at from CDT
09:53:17 q?
09:53:21 q?
09:53:24 ack gendler
09:53:35 Diogo: we solicit comments on a conclusion.
09:54:01 Diogo: the idea of this session was to generate insights, and not to discuss a specific point, but to cover many perspectives on the impact of LLMs and generative AI on the web...
09:54:02 q+ on tdmrep and google proposal re robots.txt opt-out
09:54:39 ...we are seeing different types of concerns, and we will clearly be having more conversations in the coming years, because it's not just on the technical side, there will be regulations that will impact how the technical side should work...
09:54:48 npdoty: can you link to the google proposal plz?
09:55:04 ...for example in Brazil, they are trying to pass a new law for copyright that is quite different from EU systems...
09:55:13 q?
09:55:32 google has described interest in this work, not a specific technical proposal: https://blog.google/technology/ai/ai-web-publisher-controls-sign-up/
09:55:34 ...it allows research, but companies will not be allowed to scrape. So the technical ways will have to be different for different regulations.
09:55:51 ack npdoty
09:55:51 npdoty, you wanted to comment on tdmrep and google proposal re robots.txt opt-out
09:55:52 danbri_ has joined #generativeai
09:56:32 nick: In terms of the concrete work, and where to continue the conversation, for the specific area of reserving rights, there has been a public proposal from Google where they would like to start a conversation, and the question is whether the TDMRep CG is the right place to discuss this, or a new CG.
09:56:49 ?: could technologists help detect AI generated content?
09:56:54 +1 tzviya on GAI CG
09:57:01 re iptc, acquireLicense schema -- https://x.com/brendanquinn/status/1375011632137580544
09:57:21 q?
09:57:48 I'd also be happy to see a GAI CG
09:57:56 +1 to a CG
09:58:06 Diogo: certainly that will be the challenge in the next few years, but it seems like the issue for text will be significantly different to the work for images. Different companies are using different approaches, but how do we add watermarks to a piece of content...
09:58:24 +1 to a CG
09:58:27 close queue
09:58:29 q+ to suggest a CG to produce one or more discussion papers, then guidelines – and that might inspire one or more specs
09:58:33 ...can we create a standard or guideline on how we detect if this is posted on the web? Is that possible?
09:58:35 For the non-AI misinformation world, we have ClaimReview and MediaReview schemas, eg see. https://fullfact.org/blog/2020/mar/how-we-spoke-77-fact-checkers-and-helped-21-them-start-using-claim-review-schema/
09:58:43 also, to dwsinger's point on misinformation, we had a Credible Web CG, now inactive. can we restart it?
09:59:00 close queue
09:59:24 zakim, close the queue
09:59:24 ok, tzviya, the speaker queue is closed
09:59:36 david: I would like to make sure that this discussion doesn't evaporate if we leave the room. Should we write a paper or two? That may lead to guidelines, that may lead to specifications. Let's get a CG together to publish things, publicize concern, etc.
09:59:38 I don't think we should do it slowly, but I'm happy to in parallel work on guidelines and some technical proposals
09:59:45 about TDM Reservation Protocol: https://www.w3.org/community/tdmrep/
10:00:31 rrsagent, make minutes
10:00:32 I have made the request to generate https://www.w3.org/2023/09/13-GenerativeAI-minutes.html ReinaldoFerraz__
10:03:29 ling has left #GenerativeAI
10:42:45 bdekoz has joined #generativeai
11:54:10 dsinger has joined #GenerativeAI
12:05:13 tidoust has joined #GenerativeAI