Meeting minutes
Consider whether to allow multiple sessions in browsers
<jgraham_> github: w3c/
<jgraham> RRSAgent: make minutes
jgraham: the original issue was that Classic only allowed one session. With BiDi, we could have one tool running tests, and another constructing HAR files from network traffic.
jgraham: obvious way to support this is to have more than one session connecting to a browser.
jgraham: the spec theoretically currently assumes this is the case. There's a PR that never landed in Classic to make this actually work, since the spec currently uses Classic's definition of a session
orkon: in puppeteer, we can connect to a running browser. Want to clarify status. Maybe not use `session.new`, but `session.connect` to connect to a running browser. Why? Because you can't change capabilities of a running session, so connecting is a different workflow
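The two workflows orkon contrasts could be sketched as distinct protocol messages. This is purely illustrative: `session.new` exists in the BiDi spec, but `session.connect` is only a proposal from this discussion, and all field names here are assumptions.

```python
import json

def new_session_command(command_id, capabilities):
    """Create a session, negotiating capabilities (the existing model)."""
    return {
        "id": command_id,
        "method": "session.new",
        "params": {"capabilities": capabilities},
    }

def connect_command(command_id, session_id):
    """Proposed alternative: attach to an already-running session. No
    capability negotiation, since a running browser's capabilities are
    fixed. Hypothetical command name from the discussion, not spec text."""
    return {
        "id": command_id,
        "method": "session.connect",
        "params": {"sessionId": session_id},
    }

# Commands are serialised as JSON text frames over the WebSocket transport.
frame = json.dumps(new_session_command(1, {"alwaysMatch": {"acceptInsecureCerts": False}}))
```

The distinction matters to orkon because connecting is a different workflow: capabilities of a running browser cannot be changed, so a connect-style command would carry only an identifier.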
AutomatedTester: this is something we are going to need to do. Potential overlap between sandboxing and multiple sessions. The use case I'd like to support is to create a sandbox tab and have a session per tab, so a client can run tests in parallel more quickly. This is something that Playwright does.
jgraham: about session.new I don't think it's necessary to have a new command. You can always fail to create a session if you're trying to change the capabilities of a running browser in a way that's not supported.
jgraham: I agree that conceptually there are two different operations (starting a browser process and connecting to one). Historically we've done this to make the protocol more RAII. Adding a new command feels unnecessary.
jgraham: we should get to this point when we come back to sandboxing. There seem to be two different ideas about how this might work. Need to allow complete access to a running session (eg. to capture network traffic, or monitor windows)
whimboo: we also have situations where you are running a browser and later connect. In our case, we've seen people are running the browser with the protocol active, but without a current session. We would still be able to pass in capabilities. This new session would definitely want to set capabilities.
whimboo: so we could probably move this to the later sandbox discussion
orkon: after hearing the arguments, I think an extra command is unnecessary. Some capabilities are sent to the browser, some to the session, and some to both. I also wanted to mention the sandbox discussion we had yesterday; that doesn't require a separate BiDi session. We can discuss that later.
whimboo: when we want front end sessions to connect with capabilities. Will they be overridden by the new session? If a new session that wants to connect can't do this, how should it fail?
<gsnedders> I think we do need a way to distinguish whether a new session is being connected, because the local end probably cares about whether they're getting a browser in a clean state or not.
<gsnedders> I don't care whether that's via a new command or a capability or whatever, but I think it's important that it is distinguishable
jgraham: regarding capabilities: you can fail session creation if the session can't be created; the spec is reasonably clear about those cases. In terms of needing to know whether you're getting a new browser instance or not, the local end would be in control of that, as it would know whether it was trying to start a new browser instance or connect to an existing one.
<gsnedders> if you're trying to run tests in parallel, you might not know that. imagine Selenium with Python and pytest and pytest-xdist.
jgraham: I assume clients would make it clear to users which it was trying to do (connect to new or existing browser)
jgraham: The service would make it clear whether you're getting an existing browser or not. Not sure what would happen at the protocol layer to support that.
jgraham: if you're connecting to an existing browser, presumably you could query to find out open windows, or such
<jgraham> Sam Sneddon [:gsnedders]: I'm imagining that, but I don't quite see the case where it would be a problem.
shs96c: a problem can arise with Selenium providers in the cloud: there is no guarantee that routing will reach the right running machine
… I think we should solve this at a protocol level so we can handle this
… we can have a session.connect that can put it to the right machine
… I can see that I would want a session connected to run a test, and then the provider to have an internal tool for building HARs
jgraham: if you have the session I can't see this being an issue
shs96c: a user might have a session using the network intercept module to change things, while the provider also has a session connected to collect this info
jgraham: I can see that then
shs96c: the routing in a cloud provider is why we should have a new command
<jgraham> Specifically the thing that makes sense to me is that there might be a situation where you want different clients to get different events, even if you can share session ids.
orkon: regarding Sam Sneddon [:gsnedders] question. I think it would be really useful to be able to show how many connections are available
… then you can see if you are getting a "fresh" browser
<shs96c> orkon: it would be really useful if the response to new session tells you which sessions are already available in the instance. Then you can tell whether you're "alone"
orkon: unsure about using session id for routing, you would get a different socket url
shs96c: traditionally, all cloud providers have used session id as a routing key
AutomatedTester: generally what happens is that a client connects to a hub. The hub then attempts to find the closest new machine, and then will route commands to that. Will do that over websockets using forwarding of commands. Any messages coming back are sent to the local end.
jgraham: I'm a little nervous about returning "there are N open sessions". It feels like it should be opaque to you that there are other sessions, and you can't leak the session IDs. It's not obviously a security concern to know whether there's another session, but it's not the information you care about.
jgraham: you probably want to know the state that the browser is in, and we have commands for that. If you expect the browser in a specific state, then your tests should query the session to ensure that, rather than assuming some default base case.
jgraham: we could make this work with a single command, even in the cloud case (eg. by providing an existing session id as a capability, and returning a new session only if there's an existing session with that id)
jgraham: for the local use case, forcing the design to support the remote use case seems like a lot of work
shs96c: two points. 1/ sessions should be opaque to each other so that cloud providers can (eg) monitor sessions without users being aware that the monitoring is happening. 2/ disambiguating connect from new session is useful for a number of reasons, such as memory footprint (capabilities can be large if there's a profile in there), and so that it's crystal clear in logs what was happening
jgraham: your use case of cloud provider running a session and you don't need to know about that suggests there are cases that users will want to run "new session"
jgraham: cloud providers might have a pool of "warm" sessions ready to go, so I think new session should work to connect to an existing browser. There may be a use case for connect specifically for cloud service providers, where you might only know the session id.
orkon: I feel that the problem is a bit related to the fact that `new session` does multiple things. It not only creates a new session, but is also responsible for launching a browser. If we separated these cases, we could launch the browser without any sessions, and then we could create a new session. Third use case would be about creating a session <missed the rest, but the gist is clear, I think>
jgraham: there's nothing in the spec that says that new session will launch the browser.
jgraham: with WPT the typical thing we do is that we pre-launch the browsers, and then create a webdriver session. That allows us to have a pool of browsers waiting.
jgraham: there's nothing in the spec that requires new session to start a browser, but that's typically what we do. Cloud providers might also maintain a pool of instances.
orkon: from the perspective of cloud providers, don't they want some way to get a browser launched?
jgraham: new session just somehow makes the session available. Describes typical local workflow, where the command finds a driver binary, starts it, and then that starts a browser. But there's nothing in the spec that says that has to happen
AutomatedTester: when it comes to cloud providers, we can't necessarily have a warm pool to connect to, because one of the key things is that people want to amend the `new session` requests on startup. If the browser is already started, you need to restart the browser, so it doesn't make sense to have a pool. You also can't ever trust the previous session, so you have to clean up the entire OS (eg. by rolling back to a previous OS snapshot, or
starting a clean container)
<jgraham> I didn't mean to suggest that there was necessarily going to be a warm pool, just that it seems like a valid optimization (e.g. wpt has exactly this optimisation for Firefox instances). Maybe cloud providers aren't the best example here.
AutomatedTester: I don't think we can connect to something that's already spun up in that sense. But there are going to be times when we want to (eg) collect all the logs or performance data, so that we can build in features that customers might want. But the end user clients are likely going to just want a clean session. If they find out other sessions are connected, they'll want to know why. If customers have a sense that their sessions
aren't entirely there, they'll lose trust in the service.
jgraham: connect feels like a subset of what new session does. A lot of the scenarios I'm thinking of are ones where you may not know the session id. For example, if I want to connect my VS Code instance to running tests, I may know the process ID of the browser, but not the session id.
jgraham: I agree for a cloud provider, the opposite is true.
shs96c: in Selenium, /status gives you a list of the session ids running in the process, so if you have the port to connect to, you can find out the session ids
AutomatedTester: Christian Bromann was advocating for a more async `new session` command, which fits in with the "warming up" idea. That's because it may take a long time to start a new browser. That might be something to look into.
sadym (IRC): to clarify, we have a webdriver server, and we want to support a few websocket connections to that server, and some of those connections should connect to the same browser instance, and some of those connections to different browsers
sadym (IRC): if I create a new websocket connection, I need some kind of ID, and that's the session id
jgraham: the answer is "maybe". In the case of a cloud provider, the session id unambiguously identifies the browser. If I'm running Firefox locally, there isn't necessarily a server instance involved, but I have shared information that there is a Firefox instance that was started with WebDriver BiDi, and I can connect to the local port. I can do that without knowing whether another program is running.
jgraham: so I can start a test, start a network recording program, and then the port is the bit of shared information we need.
jgraham: in some cases the session id is enough. In other the PID or the port number is enough. If we connect via pipes, then the PID would be the thing you want to share around
sadym (IRC): for geckodriver, it only supports one instance of firefox. But in the more general case, you have a N:M connection.
shs96c: test runners typically run multiple tests locally
(discussion between shs96c and jgraham)
shs96c: what about a connect key to allow the use case of session connect
jgraham: for the non-cloud use case, that feels like overkill
shs96c: but there's no guarantee of a stable identifier that can be used locally
jgraham: that would be the URL, wouldn't it?
orkon: it sounds like cloud providers should add a session id or similar to a websocket url
orkon: the websocket url is the identifier that can be used. Therefore it's fine to have just a `new session` command. Cloud providers know which sessions have started and can maintain a routing table based on url.
AutomatedTester: that's what the session id is trying to do. Users connect to a single URL, not to multiple ones. The session ID can be part of the URL (Classic) or the payload (BiDi). One of the downsides with the CDP model is that we need to build out forwarding websockets that go through a hub for accounting purposes; Selenium makes that second connection. The suggestion of using the URL puts a lot of complexity into the intermediary nodes.
Sam Sneddon [:gsnedders]: when you're running locally, it's often the case that you can specify the port the session will listen on. That gives you an ID you can look up on at the point where you create it. As soon as you add in any other ID being required, you either need to continue being able to pass in the additional information, or you need to pay more attention when the session is established.
Sam Sneddon [:gsnedders]: let's not break the simple case locally
<jgraham> +1 on not breaking the simple case locally
<Zakim> gsnedders, you wanted to react to shs96c
shs96c: I believe that consistency is key, and something that we should be striving for. I also believe that the "local" case is not always local: many people run a Selenium server (or similar) to connect to a webdriver session. As such, optimisations such as keeping track of a URL can't be relied upon locally. Having new session return some unique identifier (either a session id or a connection key) is consistent, and is guaranteed to work.
shs96c: adding in support for things like finding out PIDs locally adds complexity to local ends in a way that isn't beneficial in all cases, and serves to muddy the waters about what's going on.
whimboo: I would like to raise a security concern about using a session id to connect to an existing session. With CDP, we have access to all the data from the session, no matter which client is connecting. I don't see a problem when this is running locally, but remotely it may allow an attacker to hijack a session and exfiltrate data. So new connections should be sandboxed. In Classic there's just one session.
whimboo: maybe sessions can prevent any other session from connecting, perhaps through a new capability
<jgraham> shs96c: I think we're relying on connections being made through HTTPS without MitM attacks. We can rely on connection security. Then the problem is whether the key can be exfiltrated. If it's encoded in the URL, it can be. If it's in the payload, security is just like it would be when running locally. The use case we're thinking of for being able to connect is cloud providers adding extra facilities; they might also want to record a video stream. Preventing any other session from connecting to a session would prevent that functionality. I don't think it's a problem unless we're worried about transport security being breached. Being able to lock a session would prevent cloud providers being able to add these things.
whimboo: we may need to check; in Firefox we don't have a secure connection yet. WSS will be important.
<orkon> in chrome it is recommended to use pipes instead of ws
<Zakim> jgraham, you wanted to mention that we're already returning the websocket URL in session.new
jgraham: Firefox at the moment only supports local connections, so unless we're worried about a process on the same machine acting as an attacker, I'm not really worried. Cloud providers would wrap the connection. Local attackers could also read pipe data.
jgraham: in new session we return a url that allows one to connect to the session. We could also return a new url to create a new session. For firefox, that would be the url you connected to originally. For a cloud provider that would be something different to connect to an existing session. URL would be tied to an existing instance. Lacking the URL would indicate that only one session at a time is supported.
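The idea jgraham describes could look roughly like the sketch below. A `webSocketUrl`-style value already exists in WebDriver Classic for upgrading to BiDi; the `newSessionUrl` field and the exact response shape here are assumptions for illustration only.

```python
def parse_new_session_response(response):
    """Extract routing information from a hypothetical session.new result."""
    result = response["result"]
    return {
        "session_id": result["sessionId"],
        # URL for reconnecting to this same session (e.g. via a cloud hub).
        "session_url": result.get("webSocketUrl"),
        # Hypothetical: URL at which further sessions could be created.
        # If absent, the endpoint supports only one session at a time.
        "new_session_url": result.get("newSessionUrl"),
    }

# Example response from a single-session endpoint (all values invented):
example = {
    "id": 1,
    "result": {
        "sessionId": "0c9a2b",
        "webSocketUrl": "ws://localhost:9222/session/0c9a2b",
        # "newSessionUrl" omitted: only one session at a time is supported
    },
}
```

For Firefox, the returned URL would be the one you originally connected to; for a cloud provider it could point back through the hub, which keeps routing opaque to the client.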
sadym: currently we have a websocket connection, which can be upgraded to a BiDi session. That's the only way to connect to a server. From that perspective, the URL we need to connect to is the browser instance. It seems like the websocket is a bit overloaded, and it doesn't seem the right place to connect a new session to.
sadym: There is a thing that creates the browser that is out of bidi, and it seems natural to put this logic in the thing that starts the browser.
sadym: the problem is that it would move capabilities out of `new session` in bidi, and it has to be set before the instance is running.
<AutomatedTester> shs96c: in summary: we are agreed that we want multiple sessions, and we are agreed that they are opaque to each other. In Puppeteer it would be good to separate initialising the browser from connecting a session. I think that a connect command is useful for cloud providers. We are in agreement that a session needs a unique identifier; this can be in a URL. We return a ws url for clients to connect back. I propose we add `session.connect` that connects to the ws url in the capabilities that were returned when starting the browser.
<AutomatedTester> jgraham: I think this is good but there are some things I disagree with that it would be good to continue the implementation detail in the issue
Sandbox mode
<AutomatedTester> github: w3c/
<orkon> spec draft:... (full message at <https://
<AutomatedTester> shs96c: to summarize a previous discussion: we want a way to create session isolation for automation, working across browsers. This would be something like containers.
<AutomatedTester> Sam Sneddon [:gsnedders]: in Safari the concept of profiles is not really exposed and can only be changed when the browser is started. Safari is a singleton, like all macOS applications.
<AutomatedTester> ... what safari can do is have many different private sessions
<AutomatedTester> ... and we only provide ephemeral session for webdriver
<AutomatedTester> shs96c: Is the proposed API going to be implementable?
<Zakim> gsnedders, you wanted to react to sadym
<sadym> proposed API: https://
<AutomatedTester> whimboo: I think that it should probably sit on the browser module rather than the browsing context module
<AutomatedTester> ... and then it would likely fit the model already in CDP
<jgraham> +1 to this being the wrong place in the API to define this kind of thing.
<jgraham> RRSAgent: make logs public
<orkon> I don't mind defining it elsewhere, my thought process was that it is a container for browsing contexts, therefore, could be in the browsing context. In CDP, it is in the Target domain
<AutomatedTester> orkon: the idea again is that we can achieve storage isolations
<AutomatedTester> ... and then you wouldn't be aware of the other sessions that are happening
<AutomatedTester> ... the first API would create a container, which would do something implementation-specific
<AutomatedTester> ... the container itself is implementation-specific
<AutomatedTester> ... and then you create a browsing context in the container
<AutomatedTester> ... and then the next item is that you can close all the browsing contexts in a container when shutting it down
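The flow orkon describes might look like this on the wire. This is a sketch only: the command and parameter names follow the draft's placement in the browsing context module and are illustrative, not spec text (whimboo and jgraham suggest this may belong in the browser module instead).

```python
def create_container(command_id):
    """Hypothetical command: create an isolated storage container."""
    return {
        "id": command_id,
        "method": "browsingContext.createContainer",
        "params": {},
    }

def create_context_in_container(command_id, container_id):
    """Open a new browsing context (tab) inside a given container."""
    return {
        "id": command_id,
        "method": "browsingContext.create",
        "params": {"type": "tab", "container": container_id},
    }

def close_container(command_id, container_id):
    """Hypothetical command: close a container, and with it every
    browsing context it holds."""
    return {
        "id": command_id,
        "method": "browsingContext.closeContainer",
        "params": {"container": container_id},
    }
```

A client would create the container once, then pass its id when opening contexts, so storage isolation (cookies, localStorage, etc.) follows the container rather than the session.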
<AutomatedTester> sadym (IRC): this API aligns with the realm sandbox API already being proposed in BiDi
<AutomatedTester> jgraham: I think this seems implementable in Firefox
<AutomatedTester> ... aligns well with how the containers API works in Firefox
<AutomatedTester> ... it comes with some limitations around temporary storage but they are being worked on
<gsnedders> re: the specific API, does browsingContext.GetTreeParameters need to contain the Container? does the browsingContext.BrowsingContext not already uniquely identify the browsing context across all containers?
<orkon> gsnedders: it might be redundant. It looked like it was needed but perhaps we can remove that part.
<AutomatedTester> ... my question re: WebKit: if you have multiple automated ephemeral sessions, does it map to this API where webdriver can see everything, or is every ephemeral session isolated?
<AutomatedTester> Sam Sneddon [:gsnedders]: I don't see any reason why it can't be implemented
<whimboo> orkon: for a list of top-level browsing contexts, we probably want it to exclude tabs not using this container
<AutomatedTester> ... ultimately the UI process knows about everything and everything passes through it, so it probably doesn't matter
<gsnedders> Also, I'd suggest we use some terminology aligned with https://
<AutomatedTester> jgraham: I think the thing we will need to do is go through the APIs and see if need to share information about the container so people can route back to that container
<AutomatedTester> ... or commands take an argument that has the container ID
<AutomatedTester> ... there is some implementation complexity but something worth doing
<AutomatedTester> shs96c: I think the requirement about how profile data is stored is an implementation detail
<jgraham> In particular, it seems important that we can refer to all contexts in a container now or in the future for things like event subscriptions.
<AutomatedTester> ... and I 2nd what Sam says about using https://
<AutomatedTester> jgraham: is that what we want?
<AutomatedTester> Sam Sneddon [:gsnedders]: yes I think so. There are some differences we need to be aware of and this CG document is still very early stages
<AutomatedTester> ... but the terms in there are probably what we want to be using
<AutomatedTester> ... we should check what the term ephemeral sessions means and make sure it matches what everyone is doing
<AutomatedTester> orkon: I think using terminology from partition is good
<AutomatedTester> ... I think that the grouping of this is going to move things in the future, so it could be more than storage partitioning
<AutomatedTester> ... I am open to any suggestions on this moving forward
<AutomatedTester> sadym (IRC): API-wise this is one of the options, and it is quite meaningful
<AutomatedTester> ... we could also put this onto new session as an argument
<AutomatedTester> jgraham: I don't want us to overload new session here, as we have already discussed earlier
<AutomatedTester> ... this is a top level concept
<orkon> I am with James on this
<AutomatedTester> ... I prefer the API shown above as it is simpler to follow and use, and is explicit
<AutomatedTester> sadym (IRC): another question is around subscription to events. What happens to global events? Do we subscribe to the container from a global session
<AutomatedTester> jgraham: this is why I suggested we review all APIs when doing this work
<AutomatedTester> orkon: re: subscriptions, I don't think we would want to subscribe to the container and get those events, as you subscribe to the context and events bubble from that
<AutomatedTester> ... so we don't need to change anything
<AutomatedTester> jgraham: It is not part of the MVP for this to be a feature. I am sure we can land this and review. We would need to update the wdspec tests and <explains use case>
Add support for examining/manipulating intercepted network response bodies
<AutomatedTester> github: w3c/
jgraham: with network request interception, we can change the request body and headers, but we don't allow intercepting the actual network response body and editing it in place
jgraham: the reason is that the body can be very large
jgraham: sending large bodies in a JSON base64 message is problematic
jgraham: there is a feature request to be able to rewrite the body, and we should support it
jgraham: so supporting it depends on a general facility for doing streamed IO of large payloads
jgraham: basically, it works in CDP: instead of returning text in one go, you return a handle that allows you to pull more data in chunks
jgraham: I think it is the most obvious design. For network req interception you also need a way to send the data from the client to the browser (the reverse)
jgraham: you might need a write stream in addition to the read stream
jgraham: and a way to fall back to the existing stream
jgraham: for network request interception it will be an opt in per request
jgraham: so you intercept the requests and tell that you will need a body later
jgraham: do we want this to be declared upfront, or decided during the request lifecycle?
shs96c: just two things: we are going to be sending binary data back and forth, and JSON is not an ideal format for it. At some point we might consider an alternative format for the spec, perhaps protobuf or some other encoding supporting binary data. Second: we want to read and write memory remotely; ByteStreams might be a good fit.
shs96c: perhaps we don't need to create new IO mechanism
<shs96c> orkon: Two things. Firstly, I would prefer to decide on a request by request basis. It's more consistent with how puppeteer works, and it allows us to be more flexible (eg. reacting to other data, or modifying data for every N requests)
<shs96c> orkon: Secondly, uploading the binary data after the request, we'll need to see what support we have in CDP. It only supports a base64 encoded body. Not sure if there's a precedent for file upload
sadym (IRC): is the use case modifying the response body?
Jim Evans: the point of the issue is that, as a user, I want to modify the entire body, or parts of it, as it gets returned in the response
Jim Evans: not only replacing the complete body, but also manipulating parts of the body
sadym (IRC): we can currently fulfil a request with a static body
jgraham: we already have it
shs96c: only if the response fits into memory, currently no streaming
shs96c: the first thing is that the list of URLs is optional; the second is whether providing the complete response will be handled in a consistent way, e.g. indicating whether it is the final part of the body, or whether we always send the IO handle
shs96c: so we need a way to get the body from the response
shs96c: but when we do network provideResponse we replace the body with the IO handle
shs96c: I guess the IO handle for read will return ???
shs96c: it will return network.Bytes
… and if it is the end it will return EOF
… and write would take a network.Bytes value
shs96c: so we perhaps have a read handle and a write handle
jgraham: it makes sense to have separate handles
jgraham: next steps is to define the details
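The handle-based streaming jgraham and shs96c discuss could be sketched as a chunked read loop: instead of one large base64 payload, the remote end returns a handle and the client pulls chunks until EOF. The command name `network.readStream` and the field names below are hypothetical; a fake transport stands in for a real WebSocket client so the sketch is self-contained.

```python
def read_body(send_command, handle, chunk_size=64 * 1024):
    """Pull a response body chunk by chunk through a read handle,
    concatenating until the remote end signals EOF."""
    chunks = []
    while True:
        result = send_command({
            "method": "network.readStream",  # hypothetical command
            "params": {"handle": handle, "max": chunk_size},
        })
        chunks.append(result["data"])  # one network.Bytes chunk
        if result["eof"]:              # no more data to pull
            return b"".join(chunks)

def make_fake_transport(chunks):
    """Stand-in for a real WebSocket client, yielding fixed chunks."""
    state = {"i": 0}
    def send(command):
        i = state["i"]
        state["i"] += 1
        return {"data": chunks[i], "eof": i == len(chunks) - 1}
    return send
```

A matching write path would be the mirror image: a write handle accepting network.Bytes values chunk by chunk, closed once the full replacement body has been sent.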
Web Extensions testing
orkon: we support limited extension testing. For Chrome, install and remove are only possible via capabilities. The Web Extensions group tell us this is difficult to change in Chromium.
orkon: is there some cross-browser agreement we can have on how to install an extension
<whimboo> i don't see an issue. probably we can file one and then attach the log in there afterward
jgraham: from the discussion earlier in the week, the Web Extensions CG's goal is to incorporate web extensions into WPT, where each test will be its own extension. The API they're looking for is `install extension` and `remove extension` without needing to restart the browser every time. They may want more (eg. activate and de-activate extensions), but the main thing is being able to install extensions at run time.
jgraham: this would be useful for Mozilla.
jgraham: I think there are use cases for things other than the extension just lasting the lifetime of the browser.
<whimboo> request for geckodriver and WebDriver classic: mozilla/
orkon: was told that installing extensions at run time was "practically impossible"
shs96c: do we want to make this a problem for the web extension CG to solve using a webdriver bidi extension?
orkon: perhaps, yes
whimboo: in geckodriver we can install an extension at runtime. Can even do this in private browsing mode.
whimboo: the goal is to avoid changing the payload of the test per browser
whimboo: the most important short-term goal is to be able to install extensions at `new session` time by having a standardised capability containing the extensions we want to install. Both Firefox and Chromium have support for this.
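The capability whimboo proposes might be shaped like the sketch below. The capability name `webExtensions` and the value shape are assumptions for illustration; nothing like this is standardised yet, and installation failure would presumably fail session creation.

```python
def session_request_with_extensions(extension_paths):
    """Build a session.new request declaring extensions to install
    before the session is returned (hypothetical capability)."""
    return {
        "method": "session.new",
        "params": {
            "capabilities": {
                "alwaysMatch": {
                    # Hypothetical standardised capability: list of
                    # extension packages to install at session start.
                    "webExtensions": list(extension_paths),
                }
            }
        },
    }
```

The point of standardising the capability is that the same test payload would work across browsers, instead of each browser needing vendor-specific options for extension installation.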
jgraham: ignoring the Web Extensions CG stuff, do we have use cases in Firefox that require us to install and remove extensions at run time?
whimboo: probably not
whimboo: I can imagine that extension authors might want to test the `install` and `unload` scripts are being correctly called
shs96c: I remember from the Selenium project what SauceLabs has said: being able to install extensions at runtime is really useful because it allows having the browser ready to go and then users amend it with extensions at runtime
shs96c: it is debatable whether the timing is any different, but it looks like a valid use case
<sadym> I created an issue: w3c/
whimboo: whenever people want to test extensions, enabling and disabling is useful. It would still be good to have extension installation at runtime
jgraham: summarising. There seems to be a general appetite for a capability for installing extensions at start up time, and that's something we can standardise. Runtime seems more of a requirement of the Web Extensions CG. The CG cannot write web standards, so for practical reasons it might be easier for us to write the standard, but they could also specify a non-standard track document which specifies the whole thing
<orkon> sgtm
shs96c: Is anyone opposed to us adding that capability ourselves?
sadym (IRC): I've filed issue 548 for this
<jgraham> github: w3c/
<whimboo> shs96c: fyi it's sadym (IRC)
orkon: imagine an extension is there, and it has background pages and popups. Do we need to change anything in the spec to make this work? I'm not sure all the extension contexts are the same as the regular browser contexts. Does anyone see any blockers in the current spec for working with extension contexts? We may need some special types to differentiate them
orkon: maybe this is some sort of bidi extension for dealing with web extensions. Not sure
jgraham: I think generally the stuff that's in extensions fits into the model, which is running scripts in specific realms.
jgraham: I'm not sure how close the models of browser and extension contexts are. Chromium may be doing something different. My best guess is that supporting web extensions means a few more properties or enum options indicating that a script is associated with a specific web extension. Maybe `context created` needs a web extension ID. Similarly for realms.
jgraham: It probably makes sense to include the Web Extension CG to make sure we understand the model correctly. The question I have is how much of a priority is this for you?
orkon: it's P3 for Puppeteer, P2 for webdriver.io, and P5 for Selenium. Depends on how much support is needed. We have subscription events, which allow you to subscribe to everything. Should this apply to extensions?
jgraham: is everything the same priority? The proposal has two separate parts. The capability model seems fairly high priority. For access to the extension, it seems to be lower priority
whimboo: it's not only background pages. web extensions can also add chrome to the browser. When there's a button in the toolbar, there are popups that are opened, these may also be a browsing context
orkon: the thing with background pages is that in CDP there's a special kind of target, and that's supported in puppeteer (e.g. service workers)
orkon: there is no way to automatically handle popups
orkon: how will we handle extension actions? Developers also want to test that.
orkon: the difficulty is that we don't have support for this in CDP or Firefox. It's not something that's already supported.
orkon: the summary is that we need to explore more details, and figure out which changes we need to make to the spec. The feeling is that no-one is opposed to being able to access the extension context from bidi.
shs96c: and the feeling is that we should be able to install extensions from new session
Sam Sneddon [:gsnedders]: I think our general concern is about webdriver being able to interfere with local browser state (extensions are installed at the browser level, so they interact with the user's Safari install). It remains the case that we don't want the user's state to be observable. We are probably better getting people from the Web Extension CG involved. There are questions around interacting with things that are shared state
jgraham: particularly for installing extensions, it would make sense for the spec to say that if you can't install an extension for "whatever reason" then you can fail, and if that's in `new session` it can fail session creation. End users would then have the option of not doing that. The spec could easily not make extension installation a requirement
orkon: asking a clarifying question: does this position include installing at session start up, or just at runtime?
Sam Sneddon [:gsnedders]: Safari on macOS is a singleton. There's only one instance running. If Safari is already running, the session connects to that running instance.
<jgraham> zakim close the queue
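As a rough sketch of the startup capability discussed above, a `session.new` payload might carry the extensions to install. The capability name `webExtensions` and the payload shape are illustrative assumptions only; nothing here was resolved by the group:

```python
import json

# Hypothetical "new session" payload with an extension-install capability.
# The "webExtensions" capability name and entry shape are assumptions for
# illustration; they are not standardised.
new_session_cmd = {
    "id": 1,
    "method": "session.new",
    "params": {
        "capabilities": {
            "alwaysMatch": {
                "webExtensions": [  # assumed capability name
                    {"type": "path", "path": "/tmp/my-extension"},
                ]
            }
        }
    },
}

# A remote end that cannot install extensions (e.g. for the state-isolation
# reasons gsnedders raises) would simply fail session creation.
wire_message = json.dumps(new_session_cmd)
decoded = json.loads(wire_message)
print(decoded["method"])
```

Failing session creation when the capability can't be honoured matches jgraham's point that no new command is needed for unsupported configurations.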
Document id & routing commands by document
<AutomatedTester> github: w3c/
jgraham: The proposal here is that when we serialise a document, it has a unique document id, different from the context/navigable id
… the reason for doing this is that some commands want to target the document rather than just the context
… the example is that you want to run a script in a specific document in a context. If the document navigates, targeting the document means the script won't run in the wrong place
… from a spec point of view this is generally how things work internally, and having a document id makes targeting easier
… and it removes the chance that what you think will happen might not happen
… in the original issue in github there is the same request from web extensions
orkon: I was wondering if targeting realms would solve this since a realm is tied to the document lifetime
jgraham: you could specify the realm for execute script but in the case of locateNode and navigate you want it to be specific to that document
orkon: I was meaning that realm could be a sandbox and we target that
… do we feel that it is still useful to target a context?
jgraham: I think this adds complexity, as it means there is more than one way to do it
… and we have to do something in the implementation to route things to the right document, and handle the error cases
… and for subscribing to events we would want to target the document
… I think we would probably need to support both context and document
… I think the high priority item is to have a serialisable document id
… that then opens up the opportunity to do any other changes later
orkon: it is possible for a client to learn when things have executed in the wrong space so I think this could be good
jgraham: so we have agreement that we will serialise the document id back and at a later stage update the other commands
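The two targeting styles discussed can be sketched side by side. Only the idea of a serialised document id was agreed; the `"document"` field name and id format below are assumptions:

```python
import json

# Today's model: target by context (navigable). If the navigable has since
# navigated, the script may run in the wrong document.
by_context = {
    "id": 10,
    "method": "script.evaluate",
    "params": {
        "expression": "document.title",
        "target": {"context": "context-1"},
        "awaitPromise": True,
    },
}

# The proposal: target by document id. The field name "document" and the id
# value are illustrative. A document-targeted command can fail cleanly if
# the document has been navigated away, instead of running elsewhere.
by_document = {
    "id": 11,
    "method": "script.evaluate",
    "params": {
        "expression": "document.title",
        "target": {"document": "document-af31"},
        "awaitPromise": True,
    },
}

print(json.dumps(by_document["params"]["target"]))
```

The same distinction would apply to commands like `browsingContext.locateNodes`, where running against a stale document should be an error rather than a silent mismatch.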
Provide support for batching commands, either only for new session, or for the general case
github: w3c/
jgraham: one of the goals for bidi is that it should work well with relatively high latency connections
… one of the problems is that the more commands we send over the protocol, the worse it gets for high-latency connections
… so it makes sense to do batching
… like we have in actions. There was a basic proposal for locateNode to batch up searches for elements
… we could batch up things in the new session like "create session, insert these preload scripts and set something else up"
… the use cases are very interesting but only work in certain contexts like new session, find element...
… so do we want to build something generic, and do we want to handle the case where there is data flow between commands?
<Zakim> orkon, you wanted to react to orkon
orkon: I wanted to mention that with high latency you can batch out commands in parallel
… if things need to happen sequentially then we could have an aggregate command, so that we can e.g. scroll into view then do an interaction
… I wonder how many cases would have dependencies between commands
shs96c: not just latency but bandwidth is something we need to care about
… a chatty protocol will slow things down too
… a lot of this is not relevant for localhost
… this is a problem when we introduce the internet and we can see about zipping up data
… and batching could be another
… a number of people have looked at doing this in the past
… I don't think we need to worry about it in the spec. I think we can put a note in the spec saying this could be made more efficient, so you can do it
… I don't think we need to solve it, but be cognizant: we don't want to be too chatty
jgraham: stuff like compression can be done transparently and I am not worried about it
… for batching I am worried: we are already doing a kind of batching in places, so I don't think we can just bolt on a generic mechanism
… <explains use case with actions and executeScript which can't be done in 1 action>
… and then there is the locateNode that goes from element to shadow to another node to ...
… we are building batching in different ways and that might be the right way to do it but we could also see if this should be generic
sadym (IRC): are we talking about batching, or about setting dependencies between commands?
jgraham: yes
jgraham: if we are talking about commands that have sequence then we need to figure out a way to share data
shs96c: in locateNode there could be ways to do this, building out an AST describing how you want the data to flow across the steps
sadym (IRC): if we want dependencies we can do that. We might need to expose the realm to it, and then you can do that
jgraham: we could have done this all in JavaScript, but that means executing things
… so people could execute the bidi commands from JavaScript sent across the wire
shs96c: I think sadym (IRC) covered my point
orkon: I think if we had batching with sequential execution it would be beneficial
… I wonder if there are other specs that show how to describe workflows
… but I don't think we want to have that
… it would then be very hard for intermediaries to understand and execute across platforms
jgraham: I think there is an example, Cap'n Proto's time travel RPC, that allows you to describe a way to batch workflows
… with intermediaries I think it's better to not have a scripting language that does this
… they wouldn't be able to capture and manipulate the payload
<shs96c> CapnProto time travel RPC: https://
jgraham: if people are excited by this it might be good for someone to create a prototype and then we can go from there. We dont fully understand all the tradeoffs
orkon: there is the PAC (Proxy Auto-Configuration) format for proxies, which is JS-like
… it is hard to parse and compile
… and it is probably too many steps to handle
<jgraham> PAC (Proxy AutoConfiguration)
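A minimal generic batch envelope, as floated above, might look like the following. The `batch.run` method name and `stopOnError` flag are invented for illustration; data flow between commands (the hard part of the discussion) is deliberately not modelled:

```python
import json

# Hypothetical batch envelope: one wire round trip carries several commands.
# "batch.run" and "stopOnError" are assumed names, not spec text.
batch_cmd = {
    "id": 20,
    "method": "batch.run",  # assumed method name
    "params": {
        "stopOnError": True,
        "commands": [
            {"method": "script.addPreloadScript",
             "params": {"functionDeclaration":
                        "() => { window.__ready = true; }"}},
            {"method": "browsingContext.create",
             "params": {"type": "tab"}},
        ],
    },
}

# One round trip instead of one per command, which is the point on
# high-latency connections.
print(len(json.loads(json.dumps(batch_cmd))["params"]["commands"]))
```

This covers the "independent commands in one message" case only; sequencing with shared results (scroll into view, then click) would need the dependency/AST machinery discussed by shs96c and sadym.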
Emulation features - User agent override
github: w3c/
orkon: allows you to set a custom UA, and is related to client hints
… in CDP you can do UA and client hints in one command
… client hints allow you to set up things that can affect the UA
<orkon> https://
AutomatedTester: I see little to no value in being able to change the UA. Users will assume that by making Chrome change its UA to be like Firefox it will act like Firefox, and there might actually be interoperability issues exposed. Since we are making sure that things work cross-browser, people should be encouraged to use the relevant browser. E.g. pretending to be mobile Safari in Chrome isn't going to give the same result
Sam Sneddon [:gsnedders]: I was going to point out that Mozilla is opposed to UACH, and that Apple has some concerns
… I think we should avoid doing things that build on other specs that don't have cross-browser buy-in
jgraham: Mozilla's stance on UACH is negative, so this could be done as a Chrome-specific extension
… UA Client Hints has some privacy concerns. This could be covered in a Chrome extension method or defined in the UACH spec; slight preference towards the UACH spec, because then others can implement it
… the overall value in doing this is for people testing their site, e.g. varying the version number
… there are use cases internally for browser vendors
… in Firefox people can set this in prefs, but that won't be scoped to a specific test
shs96c: UA string makes sense but I am curious if this will be used in network interception
<jgraham> Request interception doesn't affect navigator.userAgent
shs96c: I have a concern that if we allow this we are going to encourage people to use a specific browser, and that won't have a positive impact on an interoperable web
sadym (IRC): it is way more expensive to test on some devices and I don't think it will encourage people to use the wrong browser
shs96c: I think that having the UA set can be useful but pretending to be a different browser is bad
orkon: Let's drop CH as that is out of scope. UA changes can be good for browsers vendors but not for the average tester
… what is WebKit's stance on changing this?
Sam Sneddon [:gsnedders]: I think this is fine. We can already do this; it's just about exposing it to webdriver
<whimboo> > Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36
whimboo: I wanted to mention that a set UA string is hard to interpret
… e.g. this example doesn't show what the browser really is, and can lead to false assumptions about the browser
… taking this into account, we could give them presets that they can use
jgraham: I think there are use cases for this. The concerns raised are reasonable. I think it's on clients to not overly expose this
… if we have cases where people can reproduce an issue on a desktop instead of a mobile device then it's fine, but we shouldn't encourage people to use that as the final testing setup
… and to whimboo's point, we should give them something they can edit rather than have them make it all up and get it all wrong
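whimboo's "edit a preset rather than invent one" idea can be sketched as follows: start from the browser's real UA string and substitute only the version token. The command name `emulation.setUserAgent` is an assumption (it mirrors CDP's override but is not BiDi spec text):

```python
import re

# A real-looking Firefox UA string used as the preset to edit.
real_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:118.0) "
           "Gecko/20100101 Firefox/118.0")

def with_version(ua: str, version: str) -> str:
    """Replace the rv: and Firefox/ version tokens, keeping the rest intact."""
    return re.sub(r"(rv:|Firefox/)[\d.]+", lambda m: m.group(1) + version, ua)

override_cmd = {
    "method": "emulation.setUserAgent",  # assumed command name
    "params": {"userAgent": with_version(real_ua, "119.0")},
}
print(override_cmd["params"]["userAgent"])
```

Editing a preset keeps the string internally consistent (token order, platform segment), which is exactly the "don't make it all up and get it all wrong" failure mode being avoided.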
Emulation features - Get the unmodified (original) user agent
github: w3c/
orkon: this is related to the previous issue. It can be useful to get the UA; when a session is started we could return it
… are there some concerns about doing this?
shs96c: jgraham pointed out that on some URLs the browser changes the UA, but having the default UA returned can be useful
jgraham: I think in general we should have getters for any setters we have
… so in the case where the browser changes things underneath you, we should return it if requested
… for returning it in new session caps that makes sense
… if there is a use for it i am happy to have it in the spec
Sam Sneddon [:gsnedders]: I wanted to point out there are cases where the UA string can change depending on the setting. e.g. on mobile requesting the desktop version of the site
… I broadly agree with jgraham that we should have getters for setters
jgraham: I think we should return things that people could set, or that the browser can change underneath them
… I dont think its bad to return it in new session
<jgraham> RRSAgent: make minutes
Emulation features - Support emulation of the device pixel ratio
github: w3c/
whimboo: this is similar to viewport emulation
… what we are missing is the device pixel ratio, so we are going to need a command to handle this
… we could add it as part of setting the viewport emulation
Sam Sneddon [:gsnedders]: if you are a headless browser then this makes sense
… if you are not headless, does it make sense that you can try to set this higher than the screen supports?
jgraham: no
… an example of this is in wpt reftests, setting it to 1 so that it's the same across browsers
<orkon> makes sense to me too, including the unsupported operation
<gsnedders> Also we might want to question whether we want to call it device-pixel-ratio (c.f. -webkit-device-pixel-ratio) or follow the standard resolution name?
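Folding the ratio into viewport emulation, as whimboo suggests, might look like this. `browsingContext.setViewport` is the existing draft command; the `devicePixelRatio` field is the addition under discussion, and its exact name (see gsnedders' naming question) is an assumption:

```python
import json

# Sketch: device pixel ratio as an optional field alongside the viewport.
# Field name "devicePixelRatio" is one of the candidates discussed, not
# settled spec text.
cmd = {
    "id": 30,
    "method": "browsingContext.setViewport",
    "params": {
        "context": "context-1",
        "viewport": {"width": 800, "height": 600},
        "devicePixelRatio": 1,  # e.g. wpt reftests pin this to 1
    },
}

# A headful browser constrained by the real screen could return an
# "unsupported operation" error, per the discussion.
print(json.dumps(cmd["params"]))
```

Making the field optional means existing `setViewport` calls keep their behaviour when the ratio is omitted.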
Webdriver BiDi Cookies
github: w3c/
Lola: We are working on the cookies implementation
… we wanted some clarification on the design
<Lola> w3c/
Lola: I am not sure if people have reviewed jugglinmike (IRC)'s last comment in 501
… the question is "Do we need storage.cookie since there is network.cookie?"
shs96c: I think we should only have one thing for cookies. I don't know if it should be storage or network, but I feel there should only be one
jgraham: on the naming, there should be one. There might be differences observable to clients; if they look the same to clients then it should be the same thing
… if you were to do things through network interception, we should be able to set cookies easily
… with one simple way
… for context on cookies: browsers partition cookies according to the site that set them
… browsers are now starting to allow a different way of setting cookies. So in firefox cookies are being partitioned from where they come.
… and containers and private browsing can set these in different ways
… partitioning is not standardised
… I've looked at Mike's draft; it tries to set out what needs to be used, and hints at what the values could be
… we could see about setting the values in the spec
… and we can mention about setting the partition key
… and we should leave it open for future development
… and I will stop talking. Don't hold me to it... or maybe do
<Zakim> gsnedders, you wanted to point out correspondence to sandboxing discussion earlier
Sam Sneddon [:gsnedders]: I want to build on what jgraham said about sandboxing.
<gsnedders> ... I don't think it makes sense to consider storage (or cookies) separate to the earlier sandboxing discussion.
<gsnedders> ... We probably can't enumerate all the keys that every implementation might use to key their storage (and network, etc.) into different partitions, because that's something where implementations continue to try and innovate.
<gsnedders> ... So we probably, to use the language of Infra, need a way to select a User Agent that exists within that Implementation. Which is exactly what the sandboxing discussion was about earlier.
jgraham: I think that makes sense for nameable partitions
… I don't know how that works for cookies keyed on source origin
… if we have partitioned containers like 1 and 2, cookies can be keyed to them
… there are cases where example.org and example.net are stored differently in different browsers
… and we can't enumerate things, since it's an unbounded set of implicit partitions
… for first-party cookies this can be easy, but each browser treats third-party cookies differently
… and not being obvious to users
jugglinmike (IRC): I wonder if there is a precedent in storage. With cookies on top of containers, is it our job to explain to users how this works?
… <explains different ways of containers and urls>
… is there any concern on this from a standardisation point of view on how this interacts with the containers
shs96c: I think the real concern here is that we will make something the average tester has no idea how to use
… they won't understand all the differences in these keys
… I think we'd create a situation that confuses folks
… I think we should go back to the use case of "I visit a site and then ask for cookies"; that makes sense
… so that we don't have users needing to know what browser they are using
… what is the bare minimum we can put in the spec
… and then iterate
jgraham: I agree we should have reasonable defaults
… for the cases we know we have reasonable defaults
… once we have containers we will make sure that it makes sense for the user
… for origin-partitioned cookies we default to them all being the same
… the SSO case is likely where things "break"
… e.g. auth.org is doing the auth for corp.com, and people will want to set cookies for auth.org
shs96c: what if we pregenerate the cookie and then allow people to set it?
jgraham: we will try to do the simple case well. There will also be cases where they need to understand what they are doing with cookies and partitions, and they will do it
shs96c: can we do the bare minimum or should we do more
jgraham: if user really wants, they can set up network interception for the specific auth request and return set cookie.
… if we think that containers and source origin make sense across all browsers, then we could encode that in the spec, and if a browser can't do it, just document it
… e.g. you might be able to set it in Firefox and Safari, and Chrome ignores it
shs96c: can we do this so we don't slam the door on iterating over this?
jgraham: yes. We can have a key for the container, which could be the session, and then we have the source origin
orkon: I wanted to say that this capability-based negotiation can allow us to build useful defaults
… if we wanted to allow people to set source origin via a caps
… I think this is a workable solution in the draft PR and what jgraham suggests
… and some of the container stuff could be automatically computed
jugglinmike (IRC): to shs96c point on simplifying... would it work to define this command on the browsingContext module?
… or is this going to paint us into a corner?
jgraham: the organisation of modules is arbitrary
… as orkon said people want to set this near the beginning
… it could be in browser... is it about controlling the browser? I don't think it matters
orkon: we could maybe put containers into storage too?
ACTION: give feedback on the PR
jgraham: the summary is we should try a model where we have two partition keys (the container name from the containers discussion, and the source origin; both are optional)
<jgraham> I think the model is:... (full message at <https://
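The two optional partition keys from the summary can be sketched like this. The field names (`partition`, `userContext`, `sourceOrigin`) are assumptions based on the discussion and the draft PR, not settled spec text:

```python
def cookie_partition(user_context=None, source_origin=None):
    """Build a partition descriptor; both keys are optional, per the summary.

    user_context: the "container" name from the containers discussion.
    source_origin: the origin the cookies are partitioned under.
    Field names are illustrative assumptions.
    """
    partition = {}
    if user_context is not None:
        partition["userContext"] = user_context
    if source_origin is not None:
        partition["sourceOrigin"] = source_origin
    return partition

# Simple default case: no keys; the remote end picks reasonable defaults,
# matching the "visit a site, then ask for cookies" use case.
default_case = {"method": "storage.getCookies", "params": {}}

# SSO-style case: auth.org cookies partitioned under the embedding site.
sso_case = {
    "method": "storage.getCookies",
    "params": {"partition": cookie_partition(
        source_origin="https://corp.example.com")},
}
print(sso_case["params"]["partition"])
```

Keeping both keys optional is what leaves the door open for later partition dimensions without breaking the simple case.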
Enable calling Classic commands directly from Bidi
<jgraham> github: w3c/
shs96c: motivation: there are specs out there that we don't know whether they will be updated
shs96c: it will be useful to call those sections directly from webdriver bidi
shs96c: otherwise you won't be able to use some of the features
sadym (IRC): do I understand right that it does not make sense to put it into a classic extension to bidi?
shs96c: but we want bidi to call into webdriver classic
<Zakim> jgraham, you wanted to clarify the design proposal
jgraham: I think it functionally does not make any difference where the spec goes
jgraham: just to clarify what you are thinking: let's take an example from the existing spec, something like getAllCookies. So the proposal would be to have a module in bidi called classic, which would say with prose that it defines a command for each classic spec command.
shs96c: no, the suggestion is to have the classic module with a single command that takes the parameter, the url template, and values of the parameters.
jgraham: so it will be modelling HTTP directly?
jgraham: so you send this command and it will respond with the result of the classic command
jgraham: it would basically be saying: for each command, re-implement the classic command
shs96c: I'm imagining that I will be connecting via chromedriver or geckodriver, so this module can use existing code paths.
jgraham: a reasonable question: if you have both an http and a bidi session, would it make sense to have that?
shs96c: not sure I have a good answer
jgraham: probably it is not so much more work to update the specs
shs96c: I just want to have no regression in capabilities
<jgraham> orkon: Not all BiDi implementations will have access to classic, and if they have access to classic it's not clear that it's much different to just use classic.
jgraham: potential problem is that changing to async semantics would cause trouble
sadym (IRC): my concern is the assumption that not all bidi implementations have a way to invoke classic, and that makes the proposal not achieve its goal
whimboo: that is complicated, especially in our case. Passing classic commands via bidi won't be possible for us. First, we have HTTP WebDriver, and our protocol managers will not be compatible
whimboo: we will need to do a big rewrite
whimboo: you would need to send HTTP requests from bidi, because bidi runs in the browser; if we are testing on mobile devices we would need to issue HTTP requests there. Not sure if that is easy enough.
whimboo: how much time would it take us to implement this, compared to converting and implementing the specs?
whimboo: so the number of missing APIs from classic is reducing step by step
sadym (IRC): even though not all the servers have classic. ChromeDriver technically can do what you describe. We have two implementations ChromeDriver and mapper. In mapper it is not possible. But it can be in ChromeDriver.
shs96c: I am going to suggest that we drop this idea and have an action to review existing standards that have webdriver extensions
ACTION: review existing standards that have webdriver extensions
ACTION: provide guidelines on how design BiDi extensions
How should WebDriver BiDi expose prerendering activation?
<jgraham> github: w3c/
https://
<jgraham> orkon: Prerendering is a chrome-only feature not implemented by other browsers. Problem is that it will ship in Chrome and it affects how navigations happen in the browser. Changes the lifetime of the browsing context. There is a spec in WICG (see above). Main idea is that one tab can have two active browsing contexts. A link could destroy the current context and activate the prerendered context. We could discuss if this is something we can accommodate in
<jgraham> BiDi, e.g. allowing prerendering to have hooks into BiDi so they get predictable results. Could pretend that prerendering doesn't happen at all, but that doesn't work well with e.g. network interception; should it intercept requests in the prerender tab? Opinions on how this should work?
<jgraham> orkon: Also it's a topic that requires some background reading, so maybe no solution today, but we should have it on our radar, since there might be other similar changes to the lifecycle.
<jgraham> Sam Sneddon [:gsnedders]: Some of the problems already exist with preloading, even without prerendering. There are already network requests you could intercept at the point that you're preloading. Other browsers do this, but I don't know if webdriver can trigger them.
<jgraham> orkon: Currently changing the viewport is specific to context, but for prerendering it's technically a different viewport but you'd expect to get the same result.
<jgraham> jgraham: it's hard to know if this violates any invariants that authors might depend on without examining the spec in detail. For example do we get load events without a navigation, or if we get a navigation for the preload do we get a second one for the actual user initiated navigation?
<jgraham> shs96c: I am opposed to magic. I would be tempted to say that the events should be fired as normal as if it was happening in a normal tab. Then we should have a different event to describe the navigable being activated. We expose the underlying internals. Might break some mental models but it's opt-in, so if you're enabling this you should be able to handle it.
<jgraham> jgraham: I'm worried about clients invariants being broken which users can't fix.
<jgraham> shs96c: Selenium doesn't really maintain state, but maybe other clients do
<Zakim> gsnedders, you wanted to ask questions about debugging
<jgraham> Sam Sneddon [:gsnedders]: How does this interact with the ability to debug browsing contexts? Can you break on exception in preload navigables?
<jgraham> orkon: Yes
<jgraham> Sam Sneddon [:gsnedders]: So you can debug these. Presumably at this point it gets shown somewhere. This seems like an argument that we should pass the events through to the client.
<jgraham> sadym (IRC): To clarify the proposal we should add a new type of browsing context and add a new event when one is replaced with another.
<jgraham> shs96c: That seems like a reasonable model
<jgraham> orkon: For puppeteer having some context on the events. Question is about state e.g. enabling interception for a specific top level context it won't be enabled for the background context, so you'll lose events until it's intercepted. So you need to subscribe to everything.
<jgraham> shs96c: Like an inheritable thread local in java
<jgraham> shs96c: Events could flow through to associated contexts
<shs96c> jgraham: the spec will be informed by what you can do in CDP. Agree that events should flow through to clients. If this breaks clients, then that's unfortunate. There will need to be flags on these events to let people know that the events are happening on pre-rendered content, and there will need to be a new event that a navigable will switch
<shs96c> jgraham: the existing model is limited. For example, if a context forms an associated browsing context, then that would inherit the event subscriptions from the parent context, but that's not how the spec works at the moment. The spec expects global or specific subscription to events. May need to consider inheriting subscriptions, or more flexible models
<sadym> The current mechanism already supports inheriting: if we consider the pre-rendered context a special kind of child of the main one, all the event subscriptions will be automatically provided, and the client will be able to filter them out by the browsingContextId, which is provided for all the events (the latter should be double-checked).
<jgraham> orkon: What worries me a bit is that we could inherit events, but that would technically not be spec compliant, because we route commands and events to and from specific top level browsing contexts, and the prerender model gives us multiple top level browsing contexts.
<Zakim> shs96c, you wanted to respond about inheriting subscriptions
<jgraham> shs96c: James was right.
<jgraham> We might need to model certain things in the spec as inheriting event subscriptions. I think it would also allow nice functionality. e.g. as a tester I set up request rewriting. The page opens an iframe. I want to automatically inherit the request interception for that iframe.
<jgraham> jgraham: window.open is a better example
<jgraham> orkon: Or service workers
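sadym's filtering point can be sketched client-side: if prerendered contexts inherit subscriptions, the client can still discard events for contexts it isn't tracking by context id. The event shapes and the `prerendered` flag below are simplified assumptions (jgraham notes events would need some such flag):

```python
# Simplified inbound events: one from the visible top-level context, one
# from an inherited subscription on a prerendered context. The "prerendered"
# flag is an assumed field from the discussion, not spec text.
events = [
    {"method": "network.beforeRequestSent",
     "params": {"context": "top-1", "prerendered": False}},
    {"method": "network.beforeRequestSent",
     "params": {"context": "top-2", "prerendered": True}},
]

def for_context(event, wanted_context):
    """Keep only events belonging to the context the client is tracking."""
    return event["params"]["context"] == wanted_context

filtered = [e for e in events if for_context(e, "top-1")]
print(len(filtered))
```

This is the fallback if inheritance lands without per-context routing; the alternative orkon raises (subscribe to everything) just moves the same filter to a broader event stream.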
Perform Actions: "scroll" action incompatible with pointer origin type
<jgraham> github: w3c/
<shs96c> scribenick shs96c
jgraham: the spec has a bug: it claims you can use the wheel input source at a coordinate relative to the pointer, but the wheel input source does not have a pointer. You can scroll on a specific element or at a specific point, but the scroll wheel itself doesn't have a current position.
jgraham: this is unlike an actual mouse with a scroll wheel, where the scroll happens at the current pointer position. We merged the PR that says if you have a wheel input source, you can't accept `pointer` as the origin for that action.
jgraham: So that raises the question of why `scroll` isn't an action on a `pointer`? Is it worth making `scroll` an action on a `pointer` type or not?
shs96c: I think it probably is
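For reference, the shape that remains valid after the merged PR is a wheel scroll with an element (or viewport) origin. The element reference value below is illustrative; per the resolution above, `"origin": "pointer"` would be rejected for a wheel source:

```python
import json

# Sketch of input.performActions with a wheel source scrolling at an
# element origin. The sharedId value is a placeholder.
actions_cmd = {
    "id": 40,
    "method": "input.performActions",
    "params": {
        "context": "context-1",
        "actions": [{
            "type": "wheel",
            "id": "wheel-1",
            "actions": [{
                "type": "scroll",
                "x": 0, "y": 0,
                "deltaX": 0, "deltaY": 120,
                # "pointer" is not a valid origin here, per the merged PR.
                "origin": {"type": "element",
                           "element": {"sharedId": "node-123"}},
            }],
        }],
    },
}
print(json.dumps(actions_cmd["params"]["actions"][0]["type"]))
```

The open question in the minutes is whether `scroll` should instead become an action on a `pointer` source, which would make a pointer-relative origin meaningful again.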
Support for "Is Element visible"
<jgraham> github: w3c/
jgraham: test authors want to figure out whether an element is visible, to check their app is working
jgraham: the most obvious thing is to check that the bounding client rect is on screen
jgraham: and then it turns out that visibility also means white-on-white, transforms, etc.
jgraham: we discussed it in Classic endlessly, but ended up putting code based on Selenium's into an appendix
jgraham: so the question is to what extent we want to revisit this for bidi
jgraham: we now have preload scripts, so it is now easy to send the scripts to the page; we also sometimes wanted to update the javascript
jgraham: we have seen regressions when we updated the atoms
jgraham: there are possibilities: 1) we do nothing, i.e., use a preload script to detect this whenever you want 2) we spec the same set of scripts 3) revisit the whole thing
jgraham: 3) being something not based on the script
<Zakim> shs96c, you wanted to discuss even more entertaining things people want to do with visibility
shs96c: there was one thing you missed. In WD Classic we use the visibility atom to check if the text is displayed because there was no innerText. It is still more consistent in many cases.
shs96c: I think a lot of the information needed to check if an element is visible might only be in the render tree
shs96c: I am not sure we will be able to define it well in a w3c spec
shs96c: my preference would be to go with option 1
<shs96c> orkon: just wanted to mention that option 1 sounds good (laughs) but if we want a solution, we should break the definition of "visible" into different parts. In puppeteer, visibility checks are used for figuring out if an element can receive actions (eg. is clickable) Maybe we could split out that part (eg. "what makes an element clickable")
<shs96c> orkon: we don't need to cover all aspects, but we can figure out which assertions are relevant. We could probably do hit testing on specific clickable points
<jgraham> RRSAgent: make minutes
sadym (IRC): +1 to what alex said. How can a preload script be used for visibility?
shs96c: you put the check function on the global object via the preload script
shs96c: so I think hit testing is sufficient for interactions, but you might get weird results for some elements
shs96c: if you have a web component, how do you figure out what is interactable?
jgraham: yes, I agree with all of that
jgraham: I think we should try to do nothing in terms of visibility; having something in the API that directly exposes the pointer interaction APIs makes sense
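"Option 1" above amounts to the client shipping its own visibility check as a preload script rather than the spec defining one. A sketch, with a deliberately naive JS heuristic (the global name `__isRoughlyVisible` is made up for illustration):

```python
# The client-defined check, installed into every page via a preload script.
# This heuristic is intentionally minimal and misses the hard cases the
# minutes mention (white-on-white, transforms, shadow DOM).
visibility_check_js = """
() => {
  window.__isRoughlyVisible = (el) => {
    const style = window.getComputedStyle(el);
    if (style.visibility === 'hidden' || style.display === 'none') return false;
    const rect = el.getBoundingClientRect();
    return rect.width > 0 && rect.height > 0;
  };
}
""".strip()

preload_cmd = {
    "id": 50,
    "method": "script.addPreloadScript",
    "params": {"functionDeclaration": visibility_check_js},
}
# Later the client calls window.__isRoughlyVisible(el) via script.callFunction.
print(preload_cmd["method"])
```

Keeping the check in the client is what lets tools update it without a spec change, which was the pain point with the frozen Classic atoms.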
<whimboo> RRSAgent: make minutes