Meeting minutes
Security review
anssi: we received feedback from the Chrome Security team
… they help us build our wide review of the WebNN spec on our path to CR
General Security Questions - 1. new scripting language
General Security Questions #241
WebNN Security and Privacy Self-Review Questionnaire responses
2.9. Do features in this specification enable new script execution/loading mechanisms?
Anssi: this touches on question 2.9 of the security questionnaire wrt new script execution
anssik: questionnaire section 2.9 reads: "New mechanisms for executing or loading scripts have a risk of enabling novel attack surfaces. Generally, if a new feature needs this you should consult with a wider audience, and think about whether or not an existing mechanism can be used or the feature is really necessary."
Anssi: the Google security reviewer suggests WebNN introduces a new script execution mechanism that runs in different contexts (CPU, GPU, etc.)
… this creates new attack surface for malicious sites
… any concerns with updating our response to 2.9 to acknowledge that report?
RafaelCintron: I strongly disagree with the characterization of this as a scripting language
… I agree with the risks around out-of-bound access
… that will need good tests and validation to prevent that in the API
… as would be needed with any graph API
Anssi: so you're suggesting we respond by focusing on the out-of-bounds aspects in the spec / security explainer
… while disagreeing with the characterization of a new scripting language
General Security Questions - 2. ops that change shape mid-calculation
> Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance if eliding bounds checks this could lead to out of bounds accesses. It would be good for their to be operation level metadata that might be consumed by implementors to help prevent such problems.
anssi: are there effective mitigations against this already?
RafaelCintron: similar to previous question - I'm not sure what they really mean by operation metadata
… we need to clearly spec what each operator ingests
… and mark graphs as invalid when they generate out-of-bounds access
… we should definitely prevent these problems
chai: the question is a bit unclear, but it is true that in some cases that the exact shape of the tensor is unknown until runtime
… it's not insecure per se, but implementations need to mitigate the risk of out-of-bounds access
… we should spend a bit more time looking at this
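The kind of validation discussed above (rejecting a graph when an operator would read outside its input) can be illustrated with a minimal sketch. It assumes a slice-like operator described by per-axis `starts` and `sizes`; the function name `validate_slice` and its signature are hypothetical and not taken from the WebNN spec.

```python
from typing import List, Sequence


def validate_slice(input_shape: Sequence[int],
                   starts: Sequence[int],
                   sizes: Sequence[int]) -> List[int]:
    """Return the slice's output shape, or raise ValueError if it is out of bounds.

    Hypothetical helper: checks every axis before any data is touched, so
    later operations can rely on the returned shape.
    """
    if not (len(input_shape) == len(starts) == len(sizes)):
        raise ValueError("starts and sizes need one entry per input dimension")
    for axis, (dim, start, size) in enumerate(zip(input_shape, starts, sizes)):
        # The slice [start, start + size) must lie entirely inside [0, dim).
        if start < 0 or size <= 0 or start + size > dim:
            raise ValueError(
                f"axis {axis}: start={start}, size={size} exceeds dimension {dim}")
    return list(sizes)
```

When the exact shape is only known at runtime, as chai notes, the same check would presumably have to run at compute time rather than at graph build time.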
General Security Questions - 3. op availability and deprecation
> The universe of operations is likely to vary in future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance if they turn out to be badly implemented?)
anssi: I interpret this as a question on feature detection and deprecation
… can we shape the API to make deprecation easier?
… the spec does a great job to explain how to polyfill higher level ops in terms of lower level ops
… this would help in case higher level ops need to be deprecated, they could still be polyfilled
… I'm thinking we should note that concern in our security considerations and develop a clear answer
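As an illustration of polyfilling a higher-level op in terms of lower-level ops, here is a minimal sketch. It assumes a leaky-ReLU-style op decomposed into element-wise max, min, mul and add; the helper names are hypothetical and operate on plain Python lists rather than on WebNN operands.

```python
from typing import List


def elementwise_max(x: List[float], scalar: float) -> List[float]:
    return [max(scalar, v) for v in x]


def elementwise_min(x: List[float], scalar: float) -> List[float]:
    return [min(scalar, v) for v in x]


def mul_scalar(x: List[float], scalar: float) -> List[float]:
    return [scalar * v for v in x]


def add(a: List[float], b: List[float]) -> List[float]:
    return [p + q for p, q in zip(a, b)]


def leaky_relu_polyfill(x: List[float], alpha: float) -> List[float]:
    """Higher-level op built only from the lower-level ops above:
    max(0, x) + alpha * min(0, x)."""
    positive_part = elementwise_max(x, 0.0)
    negative_part = mul_scalar(elementwise_min(x, 0.0), alpha)
    return add(positive_part, negative_part)
```

If the higher-level op were ever deprecated, callers could switch to such a decomposition without changing their results.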
General Security Questions - 4. async APIs
> It feels like .build and .compute should be asynchronous in all cases?
Should restrict the sync APIs to only exist in Workers? #229
Should WebNN support async APIs? #230
Anssi: we're discussing this in issue #229 and #230
anssi: not sure if there is a security-specific rationale behind that one
General Security Questions - 5. Side channels from shared resources
> New side channels will be made available from shared resources (cpu/gpu). Timeable things should be out of process so incur at least some ipc to achieve anything. Probably not a massive worry when compared with already sharing a cpu between processes running renderers.
RafaelCintron: I don't understand why we would need to run timeable things out of process
dom: worth a clarification given timing attacks have been of interest in the past
RafaelCintron: we could make the timing less precise as a mitigation
… as has been done e.g. in WebGL
… doesn't necessarily need out of proc, but worth getting more information
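The "less precise timing" mitigation mentioned above can be sketched as rounding timestamps down to a coarse grid before exposing them to script. This is a minimal illustration only; the function name and the 5 ms resolution are assumptions, not values from any spec or implementation.

```python
import math


def coarsen_timestamp(timestamp_ms: float, resolution_ms: float = 5.0) -> float:
    """Round a timestamp down to a multiple of resolution_ms.

    Hypothetical helper: durations shorter than the resolution become
    indistinguishable, which limits what a timing side channel can observe.
    """
    if resolution_ms <= 0:
        raise ValueError("resolution_ms must be positive")
    return math.floor(timestamp_ms / resolution_ms) * resolution_ms
```

With a 5 ms grid, two computations finishing at 10.2 ms and 14.9 ms would both be reported as 10.0 ms.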
General Security Questions - 6. Permission delegation
> Verify: Sites must delegate permission to host/run models.
anssi: if I understand this correctly, we're covered with the permission policy integration we have in place for the spec
… the top-level Web site needs to delegate permission to iframe to use this feature
RafaelCintron: +1 on permission delegation as important
… permission policy gets us this indeed
dom: my reading is similar to Anssi's: we satisfy the requirement; given the complexity of the security model, let's loop back and confirm
General Security Questions - 7. serialization and caching
> Verify: No serialization or caching yet - although this is likely in future.
<RafaelCintron> +1 to Anssi
Ningxin: that's also an implementation question
anssik: serialization or caching is out of scope for the WebNN API spec
Ningxin: does DirectML imply caching, e.g. in the driver, for comparison?
dom: I think in the context of security reviews, caching creates timing attack surface
… likely implementation considerations, what is asked is to call out this risk to implementations
dom: questionnaire is a tool for us, the responses should be reflected in the spec either normative language or security considerations
chai: the OS security deals with this, probably doesn't depend specifically on WebNN
dom: I fully trust Chai on that, I think there could be mitigations on the WebNN side to complement that, e.g. timing attacks would need to be considered and protected against at the browser level, since browsers execute code from anywhere and from anyone
RafaelCintron: following up - why specifically timing attacks on caching? e.g. shaders have the same property
dom: you go to your bank site that runs shader, then iframe runs the same shader and loads faster, can detect history of browsing
… that's an off-the-top-of-my-head attack; the general principle is to understand how much information you can get cross-origin from existing caches using timing attack vectors
dom: we should think about this, not sure if there's a mitigation, worth investigating
General Security Questions - 8. Control over how a model is run
> Control over how a model is run - (selecting cpu/gpu/tpu say) - is this too much power for the consuming site - it will for instance make it possible to more directly target a flawed implementation. It's not clear why this is required.
RafaelCintron: goes back to how much choice we should give the developer on where to run
… this is a decision that really matters in terms of performance
… some models really don't run well on GPU, some don't run on CPUs
anssi: one aspect is that this is a hint, it's left to the implementation
dom: I think a hint is a partial answer
… if the UA runs on a platform with specific narrow vuln on some processing unit, higher risk for exploitation
zkis: I was wondering, hinting could be one thing, but how do you handle errors when something is not available?
… is the following a fingerprinting concern: you have CPU and GPU on most devices, while a dedicated accelerator is less common
… we should not allow enumerating devices, and let implementations decide whether to respect the hint or not, but how to handle errors if that causes the model to fail?
… an issue in OpenVINO, you can run hybrid CPU-GPU models
anssi: I don't think we should allow device enumeration, not sure about how WebGPU deals with this
chai: I'm confused about how this would be a fingerprinting issue
… I'm more worried about not being clear about which device is going to be used to run this
… it has big implications on sync/async
<dom> +1 to chai's point re sync/async impact
dom: I don't think the current API shape is so fingerprintable
… agree with Chai's point that if CPU vs GPU is a hint it impacts greatly our discussion on sync vs async
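To make the "device preference is a hint" idea concrete, here is a minimal sketch of how an implementation might resolve a hint internally without exposing device enumeration to the site. The function name, the device strings and the CPU fallback are all assumptions for illustration; the minutes do not settle how errors should be handled.

```python
from typing import Iterable


def resolve_device(hint: str, available: Iterable[str], fallback: str = "cpu") -> str:
    """Pick the device to run on, treating the caller's preference as a hint.

    Hypothetical implementation-internal helper: the site only supplies the
    hint and never sees the list of available devices.
    """
    available_set = set(available)
    if hint in available_set:
        return hint          # hint honored
    if fallback in available_set:
        return fallback      # hint ignored, silent fallback
    raise RuntimeError("no usable device for this graph")
```

A silent fallback avoids surfacing an error that reveals which devices exist, but as noted in the discussion, not knowing the actual device complicates the sync/async question.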
Guidelines/philosophy for new operations, including security principles
Guidelines/philosophy for new operations, including security principles #242
Adding new operators, view from ONNX by Michal Karzynski
anssi: this reinforces the value of creating guidance for creating new ops
Rationale/criteria for adding new ops to the WebNN API (TPAC 2021 minutes)
anssi: could be part of the Operand section
<dom> +1
Op metadata that helps avoid implementation mistakes
Op metadata that helps avoid implementation mistakes #243
A conformance suite with disallowed intra-op examples would be helpful for hardening
A conformance suite with disallowed intra-op examples would be helpful for hardening #244
dom: I heard earlier that out-of-bounds access is something we need to look into; formalizing that into test cases for w-p-t makes a lot of sense to me
anssik: on behalf of the group, I want to thank Alex Gough and Chrome Security team for this security review!
dom: really valuable indeed
… including for wide review
dom: this is well beyond the expectation of a security review, very good feedback for CR horizontal review purposes
Integration with real-time video processing
Review proposed prototype next steps
Proposed detailed GPU pipeline processing steps for semantic segmentation prototype
Anssi: Ningxin has proposed a plan to make progress here
dom: Ningxin thanks for formalizing this into concrete next steps!
… is this blocked until WebCodecs proposal is implemented in Chromium?
ningxin_hu: GPU Import of VideoFrame?
… is a dependency
dom: can we proceed without this?
ningxin_hu: another is WebGPU-WebNN interop, per request by Corentin opened an issue with WebGPU WG to investigate this
… that is another dependency in this proposal
… we can look into that in parallel
… so this proposal has these two dependencies and we can work on these in parallel
dom: I'm hearing we need improvements in both specs and implementation to make meaningful progress on the prototype
ningxin_hu: I need to confirm with people working on the WebCodecs import to WebGPU, there's some prototype code for that in Chromium
dom: I followed the great discussion on WebNN-WebGPU integration, Ningxin, is this going to a good direction? Need for a joint meeting?
ningxin_hu: I'm fine with GH discussion on this, also checked with WebGPU people and they have a monthly meeting and welcome us to have an agenda item there to discuss
… this issue is also marked as "post v1" in WebGPU
anssi: I understand WebGPU people are focusing on shipping their v1
chai: the outstanding topic remains the support for async
… the control over the GPU timeline matters when integrating with WebGPU
… WebNN isn't clear on this timeline intersection
… since we're also dealing with CPU, this could have a really big impact on the API shape
… we need to resolve that issue sooner rather than later
<chai> https://
chai: #230 mentions integration with WebGPU as a consideration in this discussion
Review proposed use case for spec inclusion
Add real-time video processing use case #249
anssi: do we refer to WebGPU/WebCodecs, or remain abstract?
dom: this combines use cases and requirements
dom: I guess if we want to highlight technical aspects, then that would be more of requirements derived from the use case
<ningxin_hu> sgtm
Double-precision baseline implementation of WebNN operations for testing
anssik: Review the double-precision baseline implementation of WebNN operations for web-platform-tests purposes
The baseline implementation of WebNN ops #245
anssik: any concerns with proceeding with this implementation work?
ningxin_hu: would like to get confirmation it is fine to move this repo to WebML GH
… would like to set up the repo, initial PR, people can review then
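The idea behind a double-precision baseline for testing can be sketched as follows: compute the expected value in double precision, then check that a lower-precision result agrees within a tolerance. This is a minimal illustration under assumed names (`baseline_dot`, `to_float32`, `matches_baseline`) and an assumed tolerance; it is not the proposed baseline implementation.

```python
import math
import struct
from typing import Sequence


def baseline_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Reference dot product computed in double precision."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return math.fsum(x * y for x, y in zip(a, b))


def to_float32(value: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def matches_baseline(actual: float, expected: float, rel_tol: float = 1e-6) -> bool:
    """True if an implementation's result is within rel_tol of the baseline."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=rel_tol)
```

A test would run the op under test at its native precision and compare each output element against the baseline with `matches_baseline`.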