Multimedia Semantics: Overview of Relevant Tools and Resources
Editor
Zeljko Obrenovic (CWI)
Contributors
Tobias Burger (DERI)
Pasquale Popolizio (IWA/HWG)
Raphaël Troncy (CWI)
1. Introduction
This page contains an overview of tools and resources relevant to multimedia semantics. If you are interested in other Semantic Web tools, you can visit the ESW semantic web tools page.
We will update this page regularly to add or remove tools. Discussion of this document is invited on the XG public mailing list public-xg-mmsem@w3.org (public archives). Public comments should include "comments: [Tools]" at the start of the Subject header. If you want to add a new tool, please send a public comment to the XG public mailing list and provide RDF metadata: a DOAP description of the project and FOAF data about the contact person.
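A DOAP/FOAF description for a submission can be produced with a few lines of code. The Python sketch below is only an illustration: the helper names `doap_description` and `turtle_literal`, the selection of properties (`doap:name`, `doap:homepage`, `doap:description`, and a `doap:maintainer` typed as `foaf:Person` with `foaf:name` and `foaf:mbox`) and all example values are assumptions, not a format required by the XG. The DOAP and FOAF vocabularies define many more terms.

```python
# Sketch: emit a minimal DOAP + FOAF description in Turtle for a tool submission.
# The selection of properties is an assumption; see the DOAP and FOAF vocabularies.

PREFIXES = (
    "@prefix doap: <http://usefulinc.com/ns/doap#> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
)


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def doap_description(project_uri: str, name: str, homepage: str,
                     description: str, contact_name: str, contact_mbox: str) -> str:
    """Return a Turtle document describing a project and its contact person."""
    lines = [
        PREFIXES,
        f"<{project_uri}> a doap:Project ;",
        f"    doap:name {turtle_literal(name)} ;",
        f"    doap:homepage <{homepage}> ;",
        f"    doap:description {turtle_literal(description)} ;",
        # The contact person is a blank node typed as foaf:Person.
        "    doap:maintainer [",
        "        a foaf:Person ;",
        f"        foaf:name {turtle_literal(contact_name)} ;",
        f"        foaf:mbox <mailto:{contact_mbox}>",
        "    ] .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example values.
    print(doap_description("http://example.org/mytool", "MyTool",
                           "http://example.org/mytool/", "An example annotation tool",
                           "Jane Doe", "jane@example.org"))
```

The output is plain Turtle, so it can be pasted directly into the body of a public comment or saved as a `.ttl` file and linked from it.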
Useful pointers/references:
2. Tools
We have classified the tools into three categories: image, audio and video tools. The category reflects each tool's main focus, although some tools, such as most of the video tools, can also be used with other media.
The order in the listings below does not reflect any assessment of the product or tool; simple alphanumeric order is used.
2.1. Image Related Tools
- Description: AKtive Media is an ontology-based cross-media annotation system (images and text). Its goal is to automate the annotation process by interactively suggesting knowledge to users while they annotate, thereby minimizing user effort. The system works actively in the background, interacting with web services and querying its central annotation store to look for context-specific knowledge.
- Type: Standalone application
- Features: manual annotation, query system (SPARQL)
- Input: JPG, GIF, BMP, PNG, TIFF
- Output: RDFS, OWL, DAML
- Operating System: OS Independent (Written in an interpreted language - Java), OS Portable (Source code to work with many OS platforms)
- License: Academic Free License (AFL), Educational Community License, GNU General Public License (GPL)
- Description: MPEG-7-based Java prototypes for digital photo and image annotation and retrieval, supporting graph-like annotation of semantic metadata and content-based image retrieval using MPEG-7 descriptors.
- Type: Standalone application
- Features: automatic annotation, manual annotation
- Input: JPEG
- Output: MPEG-7 (extracts IPTC and EXIF metadata and converts them to MPEG-7)
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: GNU General Public License (GPL)
- DOAP:
Foto RDF-Gen (English, Spanish)
- Description: A simple XHTML + JavaScript tool that generates the content of RDF files describing images.
- Type: Web application
- Features: manual annotation
- Input: text entered in an HTML form
- Output: RDF
- DOAP:
- Description: A toolkit that lets users annotate regions of images with respect to an ontology and publish the automatically generated metadata to the Web.
- Type: Standalone application
- Features: manual annotation
- Input: JPEG + RDF
- Output: RDF
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: Mozilla Public License Version 1.1
- Description: The tool is written in JavaScript and uses RESTful web services to access remote information. It is designed as a quick and easy way to create structured information about images, including who or what is depicted, where and when the image was created, and creator and licensing information. The aim is to create, and enable the reuse of, alternative formats for both text and images in an accessibility context, although the potential applications are much wider.
- Type: Web application
- Features: manual annotation
- Input: JPEG
- Output: RDF
- License:
- DOAP:
2.2. Audio Related Tools
- Description: An MPEG-7-based audio database that reads MPEG-7 descriptions (see its twin project MPEG7AUDIOENC) and supports useful database-like operations.
- Type: Standalone application
- Features: database
- Input: MPEG-7
- Output: RDF
- Operating System:
- License:
- DOAP:
- Description: The Java MPEG-7 Audio Encoder extracts a subset of the descriptors and description schemes of the MPEG-7 standard to describe audio content (in this case, an audio file).
- Type: Standalone application
- Features: automatic annotation (feature extraction)
- Input: WAV, AU, AIFF, MP3
- Output: MPEG-7
- Operating System: OS Independent (Written in an interpreted language - Java), OS Portable (Source code to work with many OS platforms)
- License: GNU Library or Lesser General Public License (LGPL)
- DOAP:
MPEG-7 Low Level Audio Descriptors Extractor
- Description: Extracts 17 Low Level Descriptors (LLDs) defined within the MPEG-7 standard
- Type: Web application
- Features: automatic annotation (feature extraction)
- Input: WAV, MP3
- Output: MPEG-7
- License:
- DOAP:
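To give a feel for what such feature extractors compute, here is a minimal Python sketch of one low-level descriptor, AudioPower. It assumes AudioPower is the mean of the squared samples over consecutive, non-overlapping frames of one hop (10 ms by default). Windowing options and the MPEG-7 XML serialization are left out, the function name `audio_power` is hypothetical, and this is not the code used by the extractor listed above.

```python
# Sketch of the MPEG-7 AudioPower low-level descriptor.
# Assumption: mean of squared samples over non-overlapping frames of one hop
# (10 ms by default). No XML output; not a conformant MPEG-7 extractor.
from typing import List, Sequence


def audio_power(samples: Sequence[float], sample_rate: int,
                hop_ms: float = 10.0) -> List[float]:
    """Return one power value per hop-sized frame of the signal."""
    hop = max(1, int(round(sample_rate * hop_ms / 1000.0)))  # samples per frame
    powers = []
    for start in range(0, len(samples), hop):
        frame = samples[start:start + hop]  # the last frame may be shorter
        powers.append(sum(x * x for x in frame) / len(frame))
    return powers
```

A full extractor would compute the other low-level descriptors in the same frame-by-frame fashion and then serialize the resulting series as MPEG-7 XML.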
MPEG-7 SpokenContent Description Scheme Extractor
- Description: A demonstration tool that extracts an MPEG-7 SpokenContent description from an input speech signal.
- Type: Web application
- Features: automatic annotation (feature extraction)
- Input: WAV, MP3
- Output: MPEG-7
- License:
- DOAP:
- Description: A tool for segmenting, labeling and transcribing speech
- Type: Standalone application
- Features: manual annotation, automatic segmentation
- Input: Most standard audio formats (uses the Snack Sound Toolkit)
- Output: SGML
- Operating System: OS Portable (Source code to work with many OS platforms)
- License: GNU General Public License
- DOAP:
2.3. Video Related Tools
Note: Most of the tools referenced below support multiple media types, i.e. video, images and audio.
4M - MultiMedia Metadata Management
- Description: 4M is a Java-based MPEG-7 prototype for managing digital images, audio and video, supporting semantic metadata and content-based image retrieval using MPEG-7 descriptors.
- Type: Web-based application
- Features: automatic MPEG-7 feature extraction (annotation and algorithms in progress)
- Input: GIF/JPEG (images), WAV/MP3 (audio), AVI/MPEG4 (video) also zip/rar/gz of the previous types
- Output: MPEG-7
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: Unknown, not available for download, free access to a limited version
- Description: A tool for annotating videos (mainly DVDs) and sharing these annotations on the Internet. Advene is intended for creating stories/hypervideos about a video. Images and video sequences can be extracted from videos as illustrations, and hypervideos can be converted to XHTML.
- Type: Standalone application
- Features: manual annotation; extraction of keyframes / video sequences
- Input: Video and audio formats supported by the VLC player (including DVDs)
- Output: Proprietary annotation package format, XHTML, and an OpenDocument-like Zip file
- Operating System: Linux, Windows, Mac OS X
- License: GPL
- Description: A platform to generically perform annotation and hyperlinking of fragments of time-continuously sampled data in a Web-integrated manner.
- Type: Firefox Extension, server extensions, standalone applications
- Features: manual annotation
- Input: Ogg Theora video codec and the Ogg Vorbis audio codec
- Output: XML (The Continuous Media Markup Language - CMML)
- Operating System: Windows, Linux, Mac OS X
- License: Freely available, licence unknown
- DOAP:
- Description: A free video annotation tool, used at research institutes world-wide. It offers frame-accurate, hierarchical multi-layered annotation driven by user-defined annotation schemes. The intuitive annotation board shows color-coded elements on multiple tracks in time-alignment.
- Type: Standalone application
- Features: manual annotation
- Input: MPEG-1, MPEG-2, QuickTime, AVI
- Output: XML
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: Free for research and educational purposes
- DOAP:
- Description: A professional tool for the creation of complex annotations on video and audio resources.
- Type: Standalone application
- Features: manual annotation
- Input: MPEG, MOV, WAV
- Output: XML (EUDICO Annotation Format)
- Operating System: OS Independent (Written in an interpreted language - Java), OS Portable (Source code to work with many OS platforms)
- License: GNU General Public License
- DOAP:
- Description: "Extensible Markup Language for Discourse Annotation". It is a system of concepts, data formats and tools for the computer assisted transcription and annotation of spoken language.
- Type: Standalone application
- Features: manual annotation
- Input: Video and audio formats supported by Java Media Framework
- Output: XML
- Operating System: Windows, Linux, Mac OS X
- License: Freely available, licence unknown
- DOAP:
- Description: Frameline 47 from Versatile Delivery Systems. Billed as the first commercial MPEG-7 application, Frameline 47 uses an advanced content schema based on MPEG-7 to annotate entire video files, or segments and groups of segments within a video file, according to the MPEG-7 conventions.
- Type: Standalone application
- Features: manual annotation
- Input: MPEG-4
- Output: MPEG-7
- Operating System: Mac OS X 10.4 'Tiger' or above
- License: Commercial licence
- DOAP:
- Description: Metadata extraction and search engine based on MPEG-7.
- Type: Standalone application
- Features: automatic annotation (speech and video recognition)
- Input:
- Output: MPEG-7
- Operating System: Mac OS X 10.4 'Tiger' or above
- License: Unknown, not available for download
- DOAP:
- Description: A tool used to collect contributions from many people in order to build a large, high-quality database for research on object recognition.
- Type: Web application
- Features: manual annotation
- Input: JPEG
- Output: XML
- DOAP:
M-OntoMat-Annotizer (CERTH-ITI)
- Description: M-OntoMat-Annotizer (M stands for Multimedia) is a user-friendly tool developed within the aceMedia project. It is an extension of the CREAM (CREAting Metadata for the Semantic Web) framework and its reference implementation, OntoMat-Annotizer.
- Type: Standalone application
- Features: automatic annotation, manual annotation
- Input: JMF supported formats
- Output: RDF(S), MPEG-7
- Operating System: OS Independent (Written in an interpreted language - Java), OS Portable (Source code to work with many OS platforms)
- License: GNU Library or Lesser General Public License (LGPL)
- DOAP:
- Description: A tool for creating MPEG-7 metadata for videos; it allows free-text annotation and temporal decomposition and is part of the ViTooKi toolkit.
- Type: Standalone application
- Features: manual annotation
- Input: MPEG-1, MPEG-2, MPEG-4, MP4, AVI and any other video format supported by the ffmpeg or Xvid libraries
- Output: MPEG-7
- Operating System: any
- License: GPL
- DOAP:
- Description: A tool for annotating (describing and indexing) video and audio using ontologies. It contains default description schemes that are usable for many purposes.
- Type: Standalone application (runs via Java Web Start)
- Features: manual annotation
- Input: Media formats supported by QuickTime player or Java Media Framework
- Output: RDF
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: Unknown, freely available via Java Web Start
- DOAP:
- Description: A tool for management of large multimedia collections using semantic integration techniques for metadata by applying ontology driven Semantic Web technology. The user can organize multimedia collections with a graphical user interface which includes easy metadata indexing and search capabilities.
- Type: Standalone application (runs via Java Web Start)
- Features: metadata indexing, search
- Input: Media formats supported by QuickTime player or Java Media Framework
- Output: RDF, relational databases
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: GNU General Public License (GPL)
- DOAP:
- Description: A web-based system for media annotation and collaboration for teaching and learning and scholarly applications.
- Type: Web application (a Java servlet and a collection of web-based client programs written in Flash ActionScript 2, Java, and Python)
- Features: manual annotation
- Input: Flash FLV, MP3, QuickTime
- Output: XML, RDF
- Operating System: OS Independent (Written in an interpreted language - Java)
- License: Educational Community License
- DOAP:
- Description: A tool for creating video content descriptions conforming to MPEG-7 syntax interactively.
- Type: Standalone application
- Features: manual annotation
- Input: Videos in MPEG-1 System or MPEG-1 Video formats
- Output: MPEG-7
- Operating System: Windows 2000
- License: The tool is no longer available for download or supported by Ricoh
- DOAP:
- Description: The tool allows the annotation and transcription of video (multi-channel) and audio data.
- Type: Standalone application
- Features: manual annotation
- Input: Video and audio formats supported by Java Media Framework
- Output: XML
- Operating System: Windows, Linux, Mac OS X
- License: GNU General Public License
- DOAP:
- Description: A tool for collaborative indexing, annotation and discussion of audiovisual content over high bandwidth networks
- Type: Standalone application
- Features: manual annotation
- Input: MPEG-1, MPEG-2, MPEG-4, MP2, QTVR, JPEG
- Output: MPEG-7
- Operating System: Microsoft Windows XP
- License: Unknown, not available for download
- DOAP:
VideoAnnex IBM MPEG-7 Annotation Tool
- Description: The IBM MPEG-7 Annotation Tool assists in annotating video sequences with MPEG-7 metadata. Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets. The annotated descriptions are associated with each video shot and are output and stored as MPEG-7 descriptions in an XML file.
- Type: Standalone application
- Features: manual annotation, automatic annotation (shot boundary detection)
- Input: MPEG-1
- Output: MPEG-7
- Operating System: Windows NT, Windows 98, Windows 2000, Windows XP
- License: Free for evaluation; commercial licence available on request
- DOAP:
- Description: Available only in German
- Type: Standalone application
- Features: video indexing, manual annotation, video segmentation, keyframe extraction
- Input:
- Output: MPEG-7
- Operating System: Unknown
- License: Unknown, not available for download
- DOAP:
- Add the following tools to the list:
- Description: ViPER-GT can be used to annotate video with metadata about the file (date, place of filming) or about the topics of the video. Areas of the video can be selected to annotate parts of frames (e.g. humans) in order to track them. It offers spreadsheet and timeline views of the annotations and supports both text and spatial annotations. Annotations can serve as input (ground truth) for ViPER-PE, a performance evaluation tool.
- Type: Standalone Java application
- Features: manual annotation; propagation and interpolation of frame-based annotations across frames
- Input: MPEG-1; additional formats may be supported if QuickTime for Java is installed. The Windows version also supports MPEG-2 via the included Virtual Dub 4 Java library
- Output: XML (Proprietary Viper file format)
- License: Unknown, source and binary distributions available for download
- DOAP:
- Description: An Innovative Tool for Video Navigation, Retrieval, Annotation and Editing
- Type: Standalone application
- Features: manual annotation
- Input: MPEG-2, DV
- Output: MPEG-7
- Operating System: Microsoft Windows 2000/XP
- License: Free for a trial period of 30 days
- DOAP:
3. Additional Resources
3.1. Programming libraries and toolkits
- C++
MP7JRS C++ Library: a complete implementation of MPEG-7 parts 3, 4 and 5 (Visual, Audio and MDS) by the Institute of Information Systems and Information Management (IIS), JOANNEUM RESEARCH.
- Java
- Python
- Web applications (PHP, Ajax)
PHP JPEG Metadata Toolkit: http://www.ozhiker.com/electronics/pjmt/
Collection of Ajax and PHP scripts for annotating images based on the FotoNotes specification
- Matlab:
MATLAB Toolbox for the LabelMe Image Database
3.2. RDF Converters
Flickcurl: Flickr to RDF, a C library for the Flickr Web Service API that exports data in RDF
XMP -> RDF: extracts XMP metadata from various binary formats and exports it as RDF.
- Other (non-multimedia) RDF converters:
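XMP metadata is itself serialized as RDF/XML and embedded in the host file, which is why converting it to RDF is straightforward in principle. The Python sketch below illustrates the idea under simplifying assumptions: the packet is stored uncompressed and delimited by `<x:xmpmeta` ... `</x:xmpmeta>`, which is common in JPEG and TIFF files but not guaranteed for every format. The helper names `extract_xmp` and `xmp_properties` are hypothetical; a real converter such as the one listed above also has to handle compressed streams, multiple packets and structured values.

```python
# Sketch: pull an embedded XMP packet out of a binary file and list its RDF properties.
# Assumption: one uncompressed packet delimited by <x:xmpmeta ...> ... </x:xmpmeta>.
import xml.etree.ElementTree as ET
from typing import List, Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
END_TAG = b"</x:xmpmeta>"


def extract_xmp(data: bytes) -> Optional[bytes]:
    """Return the raw <x:xmpmeta> element, or None if no packet is found."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end = data.find(END_TAG, start)
    if end == -1:
        return None
    return data[start:end + len(END_TAG)]


def xmp_properties(packet: bytes) -> List[str]:
    """Return the namespaced names of properties on each rdf:Description."""
    root = ET.fromstring(packet)
    names = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Simple properties may be written as attributes (skip rdf:about etc.)...
        names.extend(k for k in desc.attrib if not k.startswith(f"{{{RDF_NS}}}"))
        # ...or as child elements.
        names.extend(child.tag for child in desc)
    return names
```

ElementTree reports names in `{namespace}local` form, so each property can be mapped directly to a full RDF predicate URI by joining the namespace and the local name.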
4. Examples and demonstrators
Foafing the Music: Bridging the semantic gap in music recommendation
- Collaborative authoring demos
CONFOTO demo "CONFOTO is an experimental sharing and annotation service for conference photos. It utilizes common RDF vocabularies (dc, foaf, rev, cc, ical, w3photo) to combine simple tagging with rich annotations (e.g. depicted persons, related events, ratings). RDF data is accessible via SPARQL, URIQA, or a link at the bottom of each page."
w3photo "envisions a royalty-free archive of conference pictures from WWW1 to Today -- searchable by the Semantic Web and ready for your tools". It uses various vocabularies, including Dublin Core, FOAF, CYC, Creative Commons, FotoNotes etc.
- Search engine demos
Squiggle Ski: a demonstrative image search engine dedicated to alpine skiing, developed using Squiggle (CEFRIEL's Semantic Search Engine)
A demonstrative search engine dedicated to music
- Multimedia production demos
Cuypers multimedia transformation engine
Vox Populi automatic editing of video documentaries
5. To Probe Further
5.1. Relevant Projects and Conferences
- Projects
- Conferences, Workshops and Summer Schools
International Conference on Semantic and Digital Media Technologies: SAMT'2006, SAMT'2007
International Workshop on Semantic Web Annotations for Multimedia: SWAMM'2006
Multimedia and the Semantic Web Workshop @ ESWC 2005
European Workshop on the Integration of Knowledge, Semantic and Digital Media Technologies: EWIMT'2004, EWIMT'2005.
Summer School on Multimedia Semantics - Analysis, Annotation, Retrieval and Applications: SSMS'2006, SSMS'2007
5.2. Readings: RDF-Based Multimedia Annotations on the Semantic Web
- Photo annotation and social networking
Introduction & background reading (see Easy Image Annotation for the Semantic Web, ILRT Tech report)
FOAF co-depiction "Co-depiction is simply the state of being depicted in the same picture as someone else. We're cataloguing this using FOAF RDF documents, sharing and collecting these in the Web, as a way of documenting in a visual way some connections between people."
- Other image annotation projects
- Using RDF for describing visual resources in the art domain
Laura Hollink et al. on spatial semantics, also using WordNet, SUMO, VRA, AAT etc. in RDF (ISWC 2004 workshop paper, K-CAP 2003 paper). See Laura's Visual Ontology page for an up-to-date version of the RDF/OWL schemata.
Schreiber et al. on annotation templates (IEEE IS paper)
Schreiber et al. on the use of a home-grown RDF version of Getty AAT/ULAN/TGN and IconClass for image annotation (see K-CAP 2001 paper)
HP's and other SIMILE work on describing paintings etc. (see HP's data conversion tech report)
- Embedding RDF image annotations in other formats
Van Ossenbruggen et al. on embedding RDF in SMIL (see the QuestionHow work: http://homepages.cwi.nl/~media/cuypers/QuestionHow/)
Ivan Herman et al. on embedding RDF in SVG, which also raises relevant accessibility issues (see Computer Graphics Forum paper).
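The FOAF co-depiction idea listed under photo annotation reduces to grouping depiction statements by image and pairing the people found in each one. The Python sketch below assumes the depictions are already available as (image, person) pairs, for example taken from `foaf:depicts` triples; fetching and parsing the RDF is out of scope, and the function name `codepictions` is hypothetical.

```python
# Sketch: derive co-depiction pairs from (image, person) depiction statements.
# Assumption: the pairs were already extracted from foaf:depicts triples.
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def codepictions(depicts: Iterable[Tuple[str, str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each unordered pair of people to the images they appear in together."""
    people_in: Dict[str, Set[str]] = defaultdict(set)
    for image, person in depicts:
        people_in[image].add(person)
    pairs: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for image, people in people_in.items():
        # Every pair of people in the same image is co-depicted there.
        for a, b in combinations(sorted(people), 2):
            pairs[frozenset((a, b))].add(image)
    return dict(pairs)
```

Using a `frozenset` as the key makes the relation symmetric, so "Alice with Bob" and "Bob with Alice" land in the same entry, and the resulting pairs can be read as edges of a social graph.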
5.3. Readings: Non-RDF based work for Multimedia Annotation
EXIF "stands for Exchangeable Image File Format, and is a standard for storing interchange information in image files, especially those using JPEG compression. Most digital cameras now use the EXIF format. The format is part of the DCF standard created by JEITA to encourage interoperability between imaging devices."
Getty images collection, annotations and vocabularies
IconClass: an iconographic classification system and thesaurus for describing icons and other visual art
Marc Davis's work on Media Streams
- MPEG-7 Related:
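EXIF data in a JPEG file normally sits in an APP1 (0xFFE1) segment whose payload starts with the ASCII bytes `Exif` followed by two NUL bytes, then a TIFF-structured block. The Python sketch below only locates that block and reports its byte order. It assumes a well-formed baseline JPEG without fill bytes between segments, uses the hypothetical helper names `find_exif_tiff` and `exif_byte_order`, and does not decode individual tags, which toolkits such as the PHP JPEG Metadata Toolkit in section 3.1 handle.

```python
# Sketch: locate the EXIF block inside a JPEG file.
# Assumption: baseline JPEG, EXIF stored in an APP1 segment starting with "Exif\0\0".
import struct
from typing import Optional


def find_exif_tiff(data: bytes) -> Optional[bytes]:
    """Return the TIFF-structured EXIF payload, or None if absent."""
    if data[:2] != b"\xff\xd8":          # SOI marker missing: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                  # lost marker sync; give up
        marker = data[pos + 1]
        if marker == 0xDA:               # Start of Scan: no more metadata segments
            return None
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes the 2 length bytes
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None


def exif_byte_order(tiff: bytes) -> str:
    """TIFF headers start with b"II" (little-endian) or b"MM" (big-endian)."""
    return {b"II": "little", b"MM": "big"}.get(tiff[:2], "unknown")
```

The byte order matters because every offset and tag value in the TIFF block that follows must be read with it.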
5.4. Readings: Prototype systems
This section lists prototype and experimental systems that have been described in the literature but are not available for use or testing.
DIANE - A Multimedia Annotation System
MRAS - Microsoft Research Annotation System (Microsoft Research)
6. Acknowledgments
Thanks to Libby Miller, Benjamin Nowack, Tasos Gounaris and Yannis Avrithis for the pointers and descriptions.