Distributed multi-party media-rich content review

Presenter: Oleg Sidorkin (Bluescape)
Duration: 9 minutes

Hello, I'm Oleg Sidorkin. I'm a Principal Software Architect at Bluescape.

I'm also one of the developers of our co-browsing technology, and that's what I'm going to be talking about today. The pandemic and the shift to remote work have challenged the collaboration landscape altogether. Where gathering a team in one room in front of a high-definition screen used to be a no-brainer, today it has become a challenge, and one that probably isn't coming back anytime soon. Classic web conferencing and asset-sharing tools are not well adapted to group work when it comes to high-fidelity content such as art, animation, and video.

We at Bluescape created a novel approach where virtually any website or web-enabled tool can gain additional functionality for a live co-experience in viewing, commenting on, and editing creative content. When a user or a team starts a co-browsing session, the desired website is loaded in the cloud, while an exact replica and all incremental changes are broadcast to every connected client, giving them the same quality, the same latency, and about the same experience as if they were browsing the content on their local device or sitting in front of the same screen.

On the server side, we use Chromium as a headless web browser, thanks to its open-source nature and the availability of the mature Chrome DevTools Protocol (CDP). Another point in its favor was Chrome's roughly 65% worldwide browser market share. We chose old-fashioned WebSockets as a low-latency transport that guarantees in-order message delivery. The challenges start when we try to capture and deliver a live, exact replica of a web page to the remote client.
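
To give a sense of the shape of this setup, here is a minimal sketch assuming Node.js with the puppeteer and ws packages; the port, the URL, the broadcast helper, and the exposed sendDomUpdate hook are illustrative, not our production API.

const puppeteer = require('puppeteer');
const { WebSocketServer } = require('ws');

// Fan incremental updates out to every connected client; WebSockets keep
// messages in order per connection.
const clients = new Set();
const wss = new WebSocketServer({ port: 8080 });
wss.on('connection', (socket) => {
  clients.add(socket);
  socket.on('close', () => clients.delete(socket));
});

function broadcast(update) {
  const message = JSON.stringify(update);
  for (const socket of clients) socket.send(message);
}

(async () => {
  // The target website runs only in the cloud, inside headless Chromium.
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // The injected page script (sketched below) reports DOM diffs through
  // this hook, and broadcast() relays them to the clients.
  await page.exposeFunction('sendDomUpdate', broadcast);
  await page.goto('https://example.com');
})();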

W3C DOM4 introduced MutationObserver, deprecating Mutation Events. MutationObserver is a great way to get notified about changes in the DOM, but it doesn't provide enough information to understand what exactly changed and to form an incremental DOM update to broadcast to connected clients. We ended up with injected JavaScript code that translates the browser DOM into a virtual DOM and computes the difference between the previous and the new version of that virtual DOM. To keep the cost of the computation at bay, we use persistent data structures and the hints from MutationObserver about which areas were affected. The immutability of the virtual DOM lets us rely on shallow comparison via the triple-equals JavaScript operator, so we can skip the analysis of entire subtrees when we know for sure they were not affected.
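
Greatly condensed, the injected script looks something like the sketch below; the virtual-node shape, the patch format, and the sendDomUpdate hook are illustrative rather than our actual implementation.

// Previous virtual node for each live DOM node. Clean subtrees are reused
// as-is, so a later triple-equals check can skip them without descending.
const prevByNode = new Map();
let dirty = new Set();

function markDirty(node) {
  for (let n = node; n; n = n.parentNode) dirty.add(n);   // node + ancestors
}

function toVNode(node) {
  const prev = prevByNode.get(node);
  if (prev && !dirty.has(node)) return prev;              // untouched subtree
  const vnode = Object.freeze({
    tag: node.nodeName,
    text: node.nodeType === Node.TEXT_NODE ? node.nodeValue : null,
    attrs: node.attributes
      ? Array.from(node.attributes, (a) => [a.name, a.value]) : null,
    children: Array.from(node.childNodes, toVNode),
  });
  prevByNode.set(node, vnode);
  return vnode;
}

function diff(oldNode, newNode, path, patch) {
  if (oldNode === newNode) return;                        // shallow skip
  if (!oldNode || oldNode.tag !== newNode.tag ||
      oldNode.children.length !== newNode.children.length) {
    patch.push({ op: 'replace', path, vnode: newNode });  // coarse fallback
    return;
  }
  if (JSON.stringify(oldNode.attrs) !== JSON.stringify(newNode.attrs) ||
      oldNode.text !== newNode.text) {
    patch.push({ op: 'update', path, attrs: newNode.attrs, text: newNode.text });
  }
  newNode.children.forEach((child, i) =>
    diff(oldNode.children[i], child, path.concat(i), patch));
}

let prevVdom = toVNode(document.documentElement);

function handleMutations(mutations) {
  dirty = new Set();
  mutations.forEach((m) => markDirty(m.target));          // observer hints
  const nextVdom = toVNode(document.documentElement);
  const patch = [];
  diff(prevVdom, nextVdom, [], patch);
  if (patch.length) window.sendDomUpdate(patch);
  prevVdom = nextVdom;
}

new MutationObserver(handleMutations).observe(document.documentElement, {
  subtree: true, childList: true, attributes: true, characterData: true,
});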

This works well for almost all DOM elements, but there are some exceptions to take care of. MutationObserver doesn't notice changes inside canvas elements, and extracting information from canvas contexts poses its own challenges. We use a mixed approach. For less dynamic canvases, like charts, we extract the content as a data URL that becomes a special attribute in the virtual DOM, so the same synchronization logic can be used as for regular elements. For canvases with highly dynamic content, and specifically for WebGL ones, we have an experimental approach based on the captureStream API and WebRTC video streaming.
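
Both paths can be sketched roughly as follows; the attribute name, the frame rate, and the peerConnection wiring are illustrative.

// Low-churn canvases (e.g. charts): serialize the pixels to a data URL and
// let the regular virtual-DOM synchronization carry it as an attribute.
function snapshotCanvas(canvas) {
  return ['data-canvas-snapshot', canvas.toDataURL('image/png')];
}

// Highly dynamic or WebGL canvases: capture a MediaStream and ship frames
// over WebRTC instead of trying to diff pixel data.
function streamCanvas(canvas, peerConnection) {
  const stream = canvas.captureStream(30);   // up to 30 frames per second
  stream.getTracks().forEach((track) => peerConnection.addTrack(track, stream));
}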

A MutationObserver attached to the document doesn't see changes inside Shadow DOM trees, so every such shadowRoot has to be observed individually. That adds extra logistics to the monitoring subsystem, but it's solvable. On top of that, Shadow DOM trees created in closed mode are invisible to JavaScript. In that case we could either give up or, and this is what we do, monkey-patch the attachShadow method of Element to keep every root open. It might contradict a particular website's logic, but it works.
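
The patch itself is small; here is a minimal sketch, where handleMutations stands for the same mutation callback as in the earlier diffing sketch.

const originalAttachShadow = Element.prototype.attachShadow;

Element.prototype.attachShadow = function (init) {
  // Force every shadow root open so the injected script can still read it.
  const root = originalAttachShadow.call(this, { ...init, mode: 'open' });
  observeShadowRoot(root);
  return root;
};

function observeShadowRoot(root) {
  // Each shadow root needs its own observer: the document-level
  // MutationObserver does not report mutations inside shadow trees.
  new MutationObserver(handleMutations).observe(root, {
    subtree: true, childList: true, attributes: true, characterData: true,
  });
}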

When we serve the content of a web page to the connected clients, we have to introduce certain modifications to it. We strip out all JavaScript code, effectively making the page passive and embeddable, so it can even bypass restrictions on iframe embedding; all JavaScript logic is executed in the cloud, in the headless browser. We patch links to external assets so that all of them are served from the cloud as well, with no direct access to the target websites. And we inject event handlers to intercept client interactions with the page and reroute them to the cloud instance of the website. This interception is what enables a true co-browsing experience, where a single instance of the website can be controlled and viewed by multiple distributed clients.
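
On the client side, the interception boils down to something like this sketch; the endpoint, the message format, and the getStableId helper (one possible implementation is sketched further below) are illustrative.

const socket = new WebSocket('wss://cobrowse.example.com/session');

['click', 'keydown', 'input'].forEach((type) => {
  document.addEventListener(type, (event) => {
    // The local replica is passive: swallow the event here and replay it
    // against the cloud instance of the page instead.
    event.preventDefault();
    socket.send(JSON.stringify({
      type,
      target: getStableId(event.target),   // stable id assigned during sync
      key: event.key,                      // undefined for mouse events
      x: event.clientX,
      y: event.clientY,
    }));
  }, { capture: true });
});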

And there are more challenges to come.

Event handlers can distinguish “trusted” events generated by user actions from synthetic events that are generated or modified by script or dispatched via the dispatchEvent API. Certain websites use this to verify that user input is genuine. The workaround we've found so far is to dispatch such events through the Chrome DevTools Protocol; delivered that way, they arrive as trusted events.
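
As a rough illustration, replaying a forwarded click through CDP might look like the sketch below, assuming the chrome-remote-interface package and a Chromium instance with remote debugging on port 9222; puppeteer's page.mouse and page.keyboard helpers go through the same Input domain.

const CDP = require('chrome-remote-interface');

async function replayClick(x, y) {
  const client = await CDP({ port: 9222 });
  // Input events injected through CDP originate from the browser itself,
  // so the page receives them with isTrusted === true.
  await client.Input.dispatchMouseEvent({
    type: 'mousePressed', x, y, button: 'left', clickCount: 1,
  });
  await client.Input.dispatchMouseEvent({
    type: 'mouseReleased', x, y, button: 'left', clickCount: 1,
  });
  await client.close();
}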

Patching external assets becomes tricky when it comes to malformed content. For example, a browser might ignore a handful of errors in a CSS file and still apply the valid part, while most CSS parsing libraries in the Node.js ecosystem will simply fail to process it. Our current solution is to mimic the browser's behavior as closely as possible, effectively borrowing that CSS parsing logic from the browser's code base, again thanks to its open-source nature.

We also found that the page synchronization logic is sensitive to intrusive browser extensions that may be enabled on the client side. Some DOM changes are positional, and if an extension alters the browser DOM in an unexpected way, the sync protocol can get confused. As a partial mitigation, we introduced unique identifiers for key DOM elements. JavaScript doesn't expose object pointers or any other kind of object identity, so we leveraged a distinctive property of the Map container: its keys can be any value, including DOM elements. Maintaining such a map on the side allowed us to give DOM elements unique identifiers and, effectively, object pointers, while keeping the original DOM intact and avoiding interference with the website's logic.
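
A minimal version of that side table, which is also one way to implement the getStableId helper used in the interception sketch above, could look like this.

let nextId = 1;
const idByElement = new Map();   // DOM element -> stable id (any value works as a Map key)
const elementById = new Map();   // stable id -> DOM element

function getStableId(el) {
  let id = idByElement.get(el);
  if (id === undefined) {
    id = `el-${nextId++}`;
    idByElement.set(el, id);
    elementById.set(id, el);
  }
  return id;
}
// The elements themselves stay untouched: no extra attributes or patched
// properties, so the page's own logic never notices the bookkeeping.

A WeakMap could serve the element-keyed direction to avoid retaining nodes that have been removed from the page; a plain Map simply keeps the example symmetric.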

Implementing co-browsing, we faced many challenges, and I've described maybe a quarter of them. A significant part of them stems from limitations or design decisions in the corresponding web technologies. And that's understandable, because what we're doing is not what you typically do: we are trying to implement a completely remote browsing experience. For most of these issues we managed either to find a workaround or to introduce certain hacks that leverage the dynamic nature of the web stack. So when it works, it works flawlessly. When it doesn't, finding a solution is tricky; it feels like walking through a minefield. But in the end, it's doable.

Thank you.
