meeting: Immersive-Web WG/CG TPAC 2023 Day 2
agenda: https://github.com/immersive-web/administrivia/tree/main/TPAC-2023#agenda
Day 1 minutes -> https://www.w3.org/2023/09/11-immersive-web-minutes.html
log: https://www.w3.org/2023/09/12-immersive-web-irc
present+ Laszlo_Gombos
present+ Piotr_Bialecki
present+ Francois_Daoust
scribe+ etienne

topic: https://github.com/immersive-web/capture/issues/1

ada: I think this is an important issue for immersive-ar sessions. Right now the only way to capture is to request raw camera access and composite the content on top. That's using a very powerful API for a simple capture use case.
... it seems a bit dangerous for users to get used to allowing raw camera access for a simple use case that the OS would handle better
alcooper: something we proposed a while back, we have a repo somewhere and did some thinking about it, we would like to push it forward
cabanier: on the headset you could end up capturing someone else
... there will definitely be a permission prompt needed, but maybe also a sound
... not aware of anyone asking for this feature
ada: did people ask for raw camera access?
cabanier: no one except for Nick
... but people have asked for more information about the environment
bajones: it concerns me that people are using raw camera access to capture the session, it's overly complex and requires you to do a lot of compositing
... in the past we considered that most systems have an easy gesture to do this kind of capture
... I wouldn't mind looking at the proposal again, but what I recall is that it was a way to request capture via the share API, the capture being made by the OS
... I would be very concerned if we had evidence of people using raw camera access for this
alcooper: dropped the link to that explainer: https://github.com/alcooper91/xrcapture
... can move it to the official repo
... the shape of the API is to start capture and get a handle to an object that you can share but maybe not access, so we wouldn't need as strict of a prompt
... I think AR is really where this makes the most sense, for VR the page is already in control of everything
bajones: I will point out that even in VR, if you're using layers it could get non-trivial to composite yourself
... so it could be a nice feature, even if it's more critical for AR
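A rough TypeScript sketch of the capture shape alcooper describes above, where the page only ever holds an opaque, shareable handle rather than camera pixels; startCapture and XRCaptureHandle are purely illustrative names, not the explainer's actual API.

    // Hypothetical shape only: "startCapture" and XRCaptureHandle are illustrative names.
    interface XRCaptureHandle {
      stop(): void;                               // end the OS-driven capture
    }
    interface XRSessionWithCapture {
      startCapture(): Promise<XRCaptureHandle>;   // OS composites camera + XR content
    }

    async function captureMoment(session: XRSessionWithCapture): Promise<void> {
      // The page gets back an opaque, shareable handle, never the raw camera pixels,
      // so the permission prompt can be much lighter than raw camera access.
      const handle = await session.startCapture();
      setTimeout(() => handle.stop(), 5_000);     // stop after a short clip
    }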
ada: maybe alcooper should move the proposal to the repo
alcooper: will do!

topic: https://github.com/immersive-web/webxr/issues/1338

ada: the issue here is that the input device the user is holding is also being rendered in XR, looking for the best way to tell the session that a controller does not need to be rendered
... the only other example of this is screen input
... but when you have a tracked pointer, how do you express things like the absence of a squeeze event without a profile
... one way to do this would be to send an empty glb file in that case
... that wouldn't render anything
... it could be pretty generic
... ['generic-hand-invisible', 'generic-hand']
bialpio: one comment, I think we should have profile names on transient input sources
... we should be sending generic-touch-screen or something
bajones: I think we do communicate profiles for transient input events. in the past we used external signals, like "if the environment mode is not opaque you don't need to render controllers"
... usually a reasonable assumption
... in the case of transient input on a touchscreen, usually in an AR session, we're not opaque
... in the case of this laptop, or for hands, we might be in a VR mode but have an outline for hands. so the render mode is opaque but you shouldn't draw controllers
... the input profile with the empty glb is a reasonable way to trick people into doing the right thing
... but I wouldn't mind an explicit signal
... having the profile named something -invisible might be enough of a signal
... I don't know if we need an explicit boolean for that, the input profile string is _probably_ enough
cabanier: I agree with bajones that this feels slightly hacky
... in the case of the laptop, WebXR knows that hands are present
... and some experiences render custom controls, but you probably don't want to render them either
... so it sounds better if there was something on the session, or somewhere
Brandel: I think having the specific name -invisible might not be a good fit. is it invisible or independently rendered?
... it's not that it's invisible, it's that it's rendered elsewhere
... (outside of the XR session)
cabanier: I don't know if you saw ??? that does hand tracking with the laptop camera, and in that case you want to say that the input is hands but you don't want to show them
ada: commenting on the issue to recap, adding an enum but should we also tweak the profiles?
bajones: even if we have an explicit signal it's probably worth doing both
... the worst case scenario is that experiences with custom controllers will render them anyway
... and we can't fully avoid that, but we can make it easy for people
... the input profile could probably be called -invisible, since the asset is, but the explicit signal should be called something else: system-rendered? but the system might _not_ be rendering, you might see the physical thing
... user-visible doesn't sound right either, it's confusing
... I don't know but I like Brandel's point
... the flag shouldn't say "invisible"
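A small TypeScript sketch of how a page could consume such a profile hint; XRInputSource.profiles is the existing API, but the "-invisible" suffix is just the naming idea discussed here, not a registered profile.

    // Sketch: deciding whether to draw a controller model from the profiles array.
    // Profiles are ordered most- to least-specific, e.g.
    // ['generic-hand-invisible', 'generic-hand'].
    function shouldRenderController(inputSource: { profiles: readonly string[] }): boolean {
      return !inputSource.profiles.some(p => p.endsWith('-invisible'));
    }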
ada: I'll make a note on the issue
Brandel: I agree on the asset / profile naming, but I would prefer "empty" instead of "invisible"

topic: https://github.com/immersive-web/real-world-meshing/issues/3

cabanier: in our system and OpenXR, objects in the world can be a plane or a mesh
... or both, but the site doesn't know which plane belongs to a mesh
... the site could look at the space associated with each mesh or plane, and assume the same space means the same object
... but there might be a better way to report this
bialpio: one thing about using the space: for planes we say that all points in the plane should be at z=0 in the space, which might be a constraint
... also, do you need to render both the mesh and the planes, or does rendering planes only give you a simplified view?
... what will the app be expected to render in those cases
... are the OpenXR bindings any indication?
... for some use cases it might be important to know if the planes are enough or if you need to look at the mesh
cabanier: OpenXR gives us all the spaces in the scene, and we can filter them by plane or mesh, they just have different metadata
... but the space returned is the same
... to answer your question, the mesh will also describe the plane and be a more precise representation
bajones: I can see scenarios where you would render a mesh, like for occlusion
... but are there scenarios where you would render a plane? or is it just for hit-testing?
cabanier: if you have both you might not need the planes at all
bajones: if you report both it might be for fallback
... so older experiences still get planes
... but if you request mesh data you probably don't need to mix it with plane data
bialpio: as far as use cases, for plane detection we were thinking about hit testing and areas
... like "will your object fit?"
... this is a different use case from meshing
... we should at least give freedom to the implementation
... but then developers will have to plan for both scenarios
... maybe we could spec something where you can request meshing but fall back to planes?
... how can we be helpful to developers?
ada: it shouldn't be too complex to generate a bounding box for the mesh
bialpio: one remark: we had similar conversations while discussing hit test, the "entity type"
... you would get a type with the hit test result
... the idea was to give you a separate opaque object to correlate when a hit comes from the same object
... you could use JS equality to know that the mesh comes from the same object as the plane
cabanier: yes that was in my proposal
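A TypeScript sketch of the correlation-by-space idea above, assuming the plane-detection module shape (detectedPlanes / planeSpace) and the draft mesh shape (detectedMeshes / meshSpace); whether a plane and its mesh really share one XRSpace object is exactly the open question here.

    type XRSpace = object;                        // stand-in for the real XRSpace interface
    interface DetectedPlane { planeSpace: XRSpace }
    interface DetectedMesh  { meshSpace: XRSpace }

    function groupByObject(planes: Iterable<DetectedPlane>, meshes: Iterable<DetectedMesh>) {
      const objects = new Map<XRSpace, { plane?: DetectedPlane; mesh?: DetectedMesh }>();
      for (const plane of planes) {
        objects.set(plane.planeSpace, { plane });
      }
      for (const mesh of meshes) {
        // Same space object => assumed to be the same real-world thing.
        const entry: { plane?: DetectedPlane; mesh?: DetectedMesh } =
          objects.get(mesh.meshSpace) ?? {};
        entry.mesh = mesh;
        objects.set(mesh.meshSpace, entry);
      }
      return objects;                             // render the mesh if present, else the plane
    }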
bialpio: since it ties to hit-testing we should do something similar
cabanier: agreed
bialpio: in these past conversations the concern was about tying the spec to a specific subsystem
... you could have point clouds, vs plane detection, vs mesh
... but why would the site care
cabanier: I wasn't really thinking about hit testing, more about drawing things
... in a way meshes are more modern but not every object has a mesh, a wall wouldn't get a mesh
bialpio: that's also related to object annotations on the Meta device
... how would it work for other systems?
cabanier: it would probably be the same for systems that scan
bialpio: but how to know if a plane is just a simplified mesh?
cabanier: meshes are pretty new still
... but using the space would be _a_ solution for now

topic: https://github.com/immersive-web/webxr-hand-input/issues/120

cabanier: we talked about this a bit, on the quest pro because controllers are self-tracked we can report both hands and controllers
... but do we need to expose this as a capability?
... are there any use cases? an obvious one is to use the controller as a pen
... is it worth the effort to expose, and should it be a feature flag?
ada: it couldn't be on the same XR input, because the grip spaces would be different
ada: it generally feels weird (TODO: ada to add some details :))
ada: I guess you could have handedness for accessories too
... definitely something we should work on
... we'll have to solve this eventually
bajones: not sure I agree with ada. I think it's reasonable and desirable to have the hand information and controller info on the same XRInput
bajones: the "dangling controller" is an interesting edge case
... one thing we discussed in the past: we could provide an estimated hand pose with a controller
... we probably want some sort of signal that this is just an estimation
... and then you can have a case where you get hand poses that are tracked and not estimated
... the only weirdness is in the case where I have a controller + hand pose and I get a select event? what is it?
... I guess the system will use the controller as a higher quality signal for select, but probably OS specific
... in any case, apart from the "dangling controller" scenario, sounds like that should work
cabanier: one use case where they should be different is quest pro controllers being used as a pen
... because you hold the controller differently
... what bajones said about estimating the hands and rendering them is true, we have an API for that but not in WebXR
... I like ada's proposal of calling this "accessories", it could be a controller or something else
... in OpenXR these are exposed as "extra controllers", I think it makes sense to do the same
ada: you mean the hands are extra controllers?
cabanier: instead of 2 controllers you just get 4
ada: then we might want to have "primary controllers" in your sources list
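With today's API each XRInputSource already tells a page whether it is a tracked hand or a controller; under the "extra controllers" idea above, the same loop would simply see four sources (two hands plus two self-tracked controllers) instead of two. A small TypeScript sketch, using structural types only to stay self-contained:

    interface InputSourceLike {
      readonly handedness: 'left' | 'right' | 'none';
      readonly profiles: readonly string[];
      readonly hand?: unknown;      // present when the source is a tracked hand
      readonly gamepad?: unknown;   // present when the source has buttons/axes
    }

    function describeInputs(inputSources: Iterable<InputSourceLike>): string[] {
      return Array.from(inputSources, (s) => {
        const kind = s.hand ? 'hand' : s.gamepad ? 'controller' : 'other';
        return `${s.handedness} ${kind} (${s.profiles[0] ?? 'unknown profile'})`;
      });
    }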
cabanier: today the controllers already disappear
cabanier: but we didn't discuss if we _should_ expose this
ada: I think it's interesting!
... it's the kind of thing people are going to want to do
... especially in the future with full body detection?
ada: what does the room think?
... just do +1 / -1 in IRC
(several +1s in IRC)
ada: I think it's worth looking at
cabanier: definitely nobody is against it :)
... but is it useful?
ada: it is!

topic: https://github.com/immersive-web/webxr/pull/1343

ada: this is a new PR, as promised! now that Vision Pro was announced
ada: it's a small change
ada: the idea behind this input type, currently named "transient intent"
... was related to gaze tracking
... but it could be useful for general intents
... and it's transient
... the input only exists for the duration of the gesture
... on Vision Pro when you pinch it uses your gaze for the target ray, but then it uses your hand
... it's a bit like tracked pointers but the target ray space comes "out of your face"
... it could be a fit for different input types, like accessibility
... the conversation on the PR addresses some comments from bajones
... he pointed out that the target ray space not updating is not really useful
... one way would be for the developer to provide a depth map of the interactive objects to the XRInput at the start of the event, so that the system knows where you might want to interact
... we could then update the input ray accordingly
... but that's a huge API
... another alternative would be to make the assumption that you probably interacted with something 1-2m away and then estimate
... the last thing would be to modify the target ray space to act as a child of the grip space
... (gestures to explain)
... if the developer attaches the object to the target ray space instead of the grip space it would still work as expected
... these are the options I have in mind
... we should discuss and bikeshed the enum itself
bialpio: I think I understand why the space is not updated, but poses are a property of the spaces
... and we need to update them even if nothing changed
... the other concern is that we can create hit test subscriptions out of spaces
ada: yes but the input is transient
bialpio: need to think about hit testing a bit more
... but we should make sure that the XR system knows the origin of that space at all times during the gesture
... someone might subscribe to the hit test, and we need to know the latest state of the space
ada: in this case the third option might work out
bialpio: yeah that might be the easiest way for this to work
ada: it is important that hit test works
bialpio: do we need to spec out that something _should not change_
ada: or _might not change_
... for gaze tracking we don't expose continuous gaze
... for obvious privacy reasons
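A TypeScript sketch of how a page can consume such a transient source with the existing event model, reading the target ray only inside the select events' frames, which is all the proposed "transient intent" source would guarantee (WebXR DOM types, e.g. from @types/webxr, assumed):

    function onTransientSelect(session: XRSession, refSpace: XRReferenceSpace,
                               pick: (ray: XRRigidTransform) => void): void {
      session.addEventListener('selectstart', (event: XRInputSourceEvent) => {
        // The transient source only exists for the duration of the gesture, so read the
        // target ray in the event's frame rather than caching the space for later use.
        const pose = event.frame.getPose(event.inputSource.targetRaySpace, refSpace);
        if (pose) pick(pose.transform);
      });
    }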
bialpio: we could not change it, but keep things open
... it's not guaranteed not to change
ada: we should probably say something about solution #3 for cases where it won't update continuously
... it shouldn't just work well for button presses but also gestures
bajones: lots of thoughts
... but first thank you ada!
... not sure I fully understand the concerns about spaces updating or not
... first thing, should this be a "gaze input"? we should strive to match the pointer type to what fits it best
... and introduce a new enum as needed
... on Vision Pro it's a gaze input that happens to be transient
... on some platforms it might not be
... to extend it to assistive devices it would be desirable for them to advertise as gaze input
... if there's nothing particular to call out
... as long as there's a reasonable behavior analog
ada: just to clarify, you think we should also use gaze like the existing gaze input
bajones: yes it's probably okay to have the behavior you described on a gaze pointer
... people don't have that much expectation about how it should behave
... people are using this to inform "should I render something" and "where should I render it"
... same for screen input, when advertised as tracked-pointer people tend to generate an input that goes with it
... so we should see which behavior we most closely align with
... a transient gaze input is not an issue
ada: but it has a grip space too
bajones: that's good to refine the behavior
... so if you take a system like this (Vision Pro), advertised as gaze but with a grip present
... that's a strong signal for the page
... I do have questions about how this interacts with hand input
... tracked input + transient eye-based input
... what's the grip pose at that point
ada: I have experience with this
... the demo I tested this with is a chess game, to see how to interact with fine objects
... the gaze has enough granularity for this to work
... I also built this with pure hand tracking
... and ended up with both on
... but it just worked together pretty well
... for buttons and menus it was nice to just look & pinch
... they just work together very well
... the hand-based input didn't generate unwanted squeeze events
... a developer could just use each for different interactions
bajones: just to clarify, in that mode, the hand input gives you full poses but never generates a select event
ada: correct
bajones: that should be okay, the spec specifies that you need at least one primary input for select events
... the eye-based primary input would be that
... it's transient but that's probably okay
... again thank you for this work
... I want to discuss the manipulation of the ray when you select _again_
... I'm concerned about the automatic parenting of the 2 spaces
... that would be the thing for precise interactions
... but I'm worried that it's contextually sensitive and wouldn't generalize
... it makes a lot of sense to do gaze, pinch, drag
... but then how would the ray pivot?
... there's a lot of nuance to that
... very specific to the application, only appropriate in some cases
... in the chess scenario I might want to pivot the piece in place, rather than swinging it around
... but in the positional audio sample, I probably want the speaker to pivot around my hand
... not sure the browser is the right place to make this decision
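A TypeScript sketch of the application-side alternative bajones alludes to: instead of the UA re-parenting the target ray space, the page records the grabbed object's offset from the grip space at selectstart and re-applies it every frame, choosing its own pivot behavior. WebXR DOM types are assumed; the 4x4 column-major matrix helpers (invert, multiply) stand in for whatever math library the page uses.

    interface Grab { gripSpace: XRSpace; objectFromGrip: Float32Array; }

    function startGrab(frame: XRFrame, source: XRInputSource, refSpace: XRReferenceSpace,
                       objectMatrix: Float32Array,
                       invert: (m: Float32Array) => Float32Array,
                       multiply: (a: Float32Array, b: Float32Array) => Float32Array): Grab | null {
      if (!source.gripSpace) return null;                  // e.g. a pure gaze source
      const grip = frame.getPose(source.gripSpace, refSpace);
      if (!grip) return null;
      // Remember where the object sits relative to the hand at selectstart...
      return { gripSpace: source.gripSpace,
               objectFromGrip: multiply(invert(grip.transform.matrix), objectMatrix) };
    }

    function updateGrab(frame: XRFrame, grab: Grab, refSpace: XRReferenceSpace,
                        multiply: (a: Float32Array, b: Float32Array) => Float32Array): Float32Array | null {
      const grip = frame.getPose(grab.gripSpace, refSpace);
      // ...and re-apply that offset each frame, so the object pivots around the hand.
      return grip ? multiply(grip.transform.matrix, grab.objectFromGrip) : null;
    }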
ada: so maybe we should go back to having the target ray space not update
bajones: if it stays static, we lose simple click and drag gestures
... buttons are definitely the interaction that I see most
... I do wonder if you could take the initial eye ray, then lock it to the head
... then the application could opt into something more precise
... not sure if I advocate for that or not
... it would allow the existing applications to work, but weird
ada: I would rather have things obviously not work
bajones: it's what you get with Cardboard but it's expected
... there's a subset of experiences that won't be updated
... it would be sad if they didn't work on shiny new hardware
... but it might be an awkward fallback
ada: video intermission
... showing the chess demo
cabanier: so what happens if you enable hands at the same time?
ada: you get both, but hands don't send select events
cabanier: so you get the input source until you let go of the gesture?
ada: yes
... they come through as not-handed
cabanier: you say it's a hand?
ada: currently we say it's a single button
... and not rendering anything
... but this is hardcoded (the not rendering anything part)
... otherwise you get a generic controller rendered doing the pinch
... but if we do what bajones suggested and use gaze it might just work
... what's the input profile for gaze?
ada: we know the line we're looking down, but not the depth
cabanier: so you don't know the point in space
ada: no
cabanier: in OpenXR there is an extension that gives you that: the point in space you're looking at
ada: do you think you would use that if we had it?
cabanier: yes I would prefer using that
ada: so you'd start the ray from the point you're looking at
cabanier: yes that would be the origin of the ray, not exactly sure
ada: it might be weird if you're deeper and you want to do a raycast
... I can look into that
cabanier: if it has hands enabled as well, people are probably going to use that
ada: nothing changes with hands enabled
... you get a third input (2 hands + additionals)
... in this case up to 2 extra transient inputs
cabanier: how does it work with existing experiences?
ada: works pretty well!
... it works well with both hands + the gaze
... if you want you can test it with the positional audio demo
alcooper: bajones made the suggestion of just using gaze
... it was my first instinct too
... especially with transient clearly in the name
... having scrolled through the spec there's a bit of a note about the gaze
... we might want to soften the language
... but adding a new type might add confusion
ada: you think it's okay to have 2 simultaneous gaze inputs?
alcooper: that would probably be fine, the spec does allow for transient inputs already
bajones: we already had situations where you can have 2 different select events at the same time
... I don't think it would be a problem having 2 different gazes
... they will just track until the select is over
... they usually don't do anything specific for "things going on at the same time"
dino: who's actually going to ship gaze tracking on the web?
cabanier: not sure
... we haven't exposed eye tracking
... I was thinking about it
... quest pro already has hand tracking
... if we introduce a system gesture, it could break some assumptions
dino: so you don't yet have plans to provide raw gaze information
cabanier: no we don't
dino: does anyone?
bajones: I'm not sure...
... mozilla advocated against it a while back
... I don't think anyone is planning on it (raw gaze)
alcooper: when we say gaze we're talking about the enum value
... not raw gaze
bajones: I would imagine that in the case of an eye-tracked pointer, you might actually lock to the head at the periphery, not tracking your eyeball
... looking at the Chrome codebase, Cardboard doesn't surface an input profile
... but we should still have an input profile for this scenario
alcooper: this goes back to the grip space being null if it's not trackable
... since you shouldn't render anything
bajones: if this was anything more than a 1-button input
... you probably want an input profile
... if a single button produces a ray you probably can get away without an input profile
... input profiles should be used for rendering and capabilities
... we touched on this in the other issue
vicki: I think gaze came up previously when we discussed the name
... just want to clarify that this API proposal is not about "Vision Pro input"
... and as dino said, no one is implementing gaze tracking
bajones: it's true that gaze is a bit of a misnomer
... it was the name chosen to imply that the input was coming from the user's head
... it's not about the mechanism, how it's being produced
... more of a signal to the dev, don't draw anything at the source!
... it's what it's trying to imply
... so reusing the same name, while uncomfortable, just says it came from the user's head
... it's a hint to the developer
cabanier: thinking about how we could implement this without breaking everybody
... very few experiences expect hands
... on quest pro we could do this when hands aren't requested
... it seems that if you do request hands we shouldn't
ada: hands on meta quest emit events?
cabanier: yes
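A TypeScript sketch of how a page typically acts on that hint today, keyed off the existing XRInputSource.targetRayMode values ("don't draw anything at the ray origin for gaze"); WebXR DOM types assumed.

    function renderInputSource(source: XRInputSource,
                               drawControllerModel: () => void,
                               drawCursorAtHit: () => void): void {
      switch (source.targetRayMode) {
        case 'gaze':            // ray comes from the head: never draw at the origin
        case 'screen':          // transient touch input: nothing to draw at the origin either
          drawCursorAtHit();
          break;
        case 'tracked-pointer': // controller or hand: pick a model from source.profiles
          drawControllerModel();
          drawCursorAtHit();
          break;
      }
    }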
(break)

scribenick: bialpio

topic: https://github.com/immersive-web/webxr/issues/1339

ada: developers need to know where to place their content, cannot assume things about placement now
... it would be useful to have a space such that when you place things at that space's origin, it'd be in a good spot
... it'd be interesting to have it be customizable on the fly
... e.g. user could configure it appropriately
... its value could change per session so that it won't be used for fingerprinting
... it could also work for someone who's lying down
... so the interactive content space would shift accordingly
... and the user wouldn't need to sit up to interact with the content
... unsure if it needs to be a feature or just something available by default
bajones: curiosity, is it z-space?
ada: dell + z-space
bajones: not sure if right tool for this specific case
... maybe, but I think they need their own separate space to track in
... I talked to them previously and they seemed to be OK w/ having to work with content built specifically for their form factor
... so old content compat may not be too big a concern
... interesting idea, the spot for interactive content will be different for different people
... concern is that it goes both ways, on one hand if it's an explicit space, then there's a wide range of experiences that don't have access to this feature
... you essentially opt into an accessibility feature, which isn't great
... there may be cases where exposing an accessibility feature will be abused in a competitive space
... but we don't want to allow opting out of accessibility either
... I like the idea, I like that it could be used for accessibility, but we shouldn't want accessibility to rely on this feature
... if you want this to be truly useful for accessibility, you need to allow moving the viewer space, which could enable cheating in a competitive space
... but apps could implement protections against cheating
ada: they don't have to use the comfort space - they can opt out
bajones: correct, but we should consider accessibility features separately from spaces (?)
... I want to make sure we don't conflate it with being an accessibility tool, but this shouldn't be the only way towards accessibility
ada: it will make life better for everyone and also could be used for accessibility
Brandel: another aspect of the interactable zone this makes me think of is the Magic Leap device
... it had 2 focal planes - 45cm and 1m
... in case it's possible to project (?) at 2 different distances
cabanier: unsure about our plans, quest has a couple of modes to accommodate people in different positions
... there's bed mode, unsure what will happen if you enter WebXR then
... but it is reasonable that everything tilts
bajones: how do you enable it
cabanier: unsure if it persists, but I'd assume if you enter WebXR in bed mode, it would persist and not force a different space on you
... maybe there's no local-floor
... there's an airplane mode with no expectations that you'll move
ada: I'd be ok if it were up to the user agent to change this space to accommodate the user
bajones: curious about Brandel's scenario - do you have thoughts about how the comfort space (?) may be determined
... examples would be useful
Brandel: HTC Vive and Quest have a fixed focal distance so it may be uncomfortable to have an object presented as if it's at 50cm
... but on ML it could be at different focal distances
... when varying focal systems come into play it could be possible to leverage those capabilities

scribenick: cabanier

bialpio: don't we mandate that the y-axis is opposite to the gravity vector
ada: that would still be the case
... if you're playing building blocks, you might want to move the gravity
bialpio: I was worried if we already backed ourselves into a corner
... if it's a regular XRSpace, we can change it
... for reference spaces we cannot
bajones: no, that's not correct
... at one point the y-axis was tied to gravity but then HoloLens was used on the space station so we removed it
bialpio: I think we need to check the other modules
bajones: we removed it from the spec. It used to be there
... (quotes spec)
... the spec has non-normative text that says how the y-axis should work
... this comfort space should still have a y-axis aligned with the y-axes of the other spaces
... I would expect all reference spaces to be aligned
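For comparison, a TypeScript sketch of what a page can already approximate today with getOffsetReferenceSpace; the proposal above would have the UA supply such a space directly, and the offset values below are made up for illustration. WebXR DOM types assumed.

    async function getComfortSpace(session: XRSession): Promise<XRReferenceSpace> {
      const local = await session.requestReferenceSpace('local');
      // Example offset only: 40 cm forward and 20 cm down from the headset's starting
      // origin, with no rotation so the y-axis stays aligned with the other reference spaces.
      return local.getOffsetReferenceSpace(new XRRigidTransform({ x: 0, y: -0.2, z: -0.4 }));
    }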
scribenick: bialpio

(https://www.w3.org/TR/webxr-hit-test-1/#:~:text=(towards%20negative%20gravity%20vector).)
ada: will kick the spec process off
cabanier: RE ML & focal planes, one of them was an infinity plane and the way they switched is they looked at your eyes to decide
ada: you may want to place it at the focal distance of the headset

topic: https://github.com/immersive-web/real-world-meshing/issues/1
topic: https://github.com/immersive-web/webxr/issues/1341

bajones: what are people using and what features are causing impl difficulties, is it worth the complexity if no one is using those, etc.
... I called out "inline" because inline sessions are causing issues for us and their usage is low
... the WebXR samples use them and yonet uses them and maybe a couple of other places, but for the most part it is handled by frameworks where WebXR is just an additional mode to switch to
... perhaps inline is not needed
... unclear if it's fulfilling the purpose it was designed for
... may not be worth the complexity
... e.g. layers stuff caused us to ask how it will interact with inline
... & we have mostly ignored those questions
cabanier: chrome has telemetry, can you share some numbers?
... is it already published somewhere?
bajones: (looking) unsure what we can share
... looking at one graph, cannot give numbers
... very low proportion of sessions
cabanier: maybe they use the polyfill?
bajones: low single-digits
... if used then maybe through libraries & polyfill
... if we do want to remove it, it'd be a very long process with a long deprecation period
... the problematic case is content that was abandoned
... so if this is not useful and creating impl. burden then it may make sense to deprecate
... worst case they'll lose the inline preview prior to entering the experience
alcooper: viewer-only sessions may not hit telemetry
... but since it's done blink-side, it means we can polyfill
yonet: I use it to lazy-load resources, it may be useful to keep it
Brandel: let's go with the first step of deprecation and see who complains?
bajones: there's been a decent chunk of the browser ecosystem that didn't implement WebXR
... so you still had to rely on a polyfill or three.js / babylon
... but we are approaching the moment where more browsers implement it
... so now might be the right moment to go with deprecation
ada: it'd be a good thing to start on it now so that when we launch things we can launch with the right state
... there were discussions about other kinds of inline or other kinds of sessions
cabanier: it could be useful to have inline stereo
... a 3d movie does not need head pose
... there is already plenty of content that could work in 3d
Brandel: I wonder about whether head pose is going to be needed to make the experience comfortable
... we haven't fully resolved it yet
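For context, a TypeScript sketch of the inline pattern being weighed here: an 'inline' session drives an in-page preview and the page upgrades to 'immersive-vr' on user activation; if inline were deprecated, the preview path would fall back to plain WebGL rendering. WebXR DOM types (navigator.xr) assumed.

    async function startPreview(): Promise<XRSession | null> {
      if (!navigator.xr) return null;
      try {
        return await navigator.xr.requestSession('inline');   // no permission prompt needed
      } catch {
        return null;                                           // render the preview without XR
      }
    }

    // Call from a user gesture (e.g. an "Enter VR" button click).
    async function enterVR(): Promise<XRSession | null> {
      if (!navigator.xr || !(await navigator.xr.isSessionSupported('immersive-vr'))) return null;
      return navigator.xr.requestSession('immersive-vr', { optionalFeatures: ['local-floor'] });
    }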
bajones: the scenarios cabanier pointed out are way simpler cases overall
... render into 2 views and that's it
... a more interesting case for inline is having punched-through, magic-window-style content similar to what we chatted about yesterday
... this could be the thing we want for z-space
... worth talking about it at some point
... but it may be too complicated for someone who just wants to display the stereo content
ada: it could be useful to have the simple case that can be upgraded
cabanier: the reason to use WebXR is that there are already frameworks that talk WebXR
... if you want to watch stereo it'd be good to have curved screens
... I don't know what'd happen if you requested it on a flat screen - would it curve?
ada: you could go from inline into "fullscreen" immersive session
cabanier: it'd be useful but it may not be appropriate for this group - stereo video is pretty messy now
... it's a pretty bad experience now
ada: mentions layers
cabanier: even just the video element is messy
cabanier: we tried displaying stereo when in fullscreen but content gets displayed side by side
ada: unsure which group to take it to
cabanier: I don't think there were any meetings about it during this TPAC
ada: are we pro-deprecating inline? let's make a PR?
ada: untracked inline stereo session would be interesting
cabanier: we need to discuss this more
ada: that's it for today! next round at 11:30 on Friday
ada: https://twitter.com/KMkota0/status/1620090836930400257?s=20 -> hands tracking using the camera