SIOC/CommonMistakes
This page lists common mistakes made when exporting SIOC data.
Although this page is written for SIOC data exporters on community sites, most of the problems and solutions apply to any application that produces XML or RDF/XML, especially one that embeds HTML content in RDF/XML.
Invalid XML entities
Symptoms:
XML parser error - Entity 'nbsp' not defined
Reason:
XML predefines only 5 entities: &lt; &quot; &apos; &amp; &gt;. All other named entities (such as &nbsp;) are invalid in XML unless they are explicitly declared.
Solutions:
a) change all invalid entities to their numeric equivalents, e.g. &nbsp; becomes &#160; (preferred)
b) explicitly declare the missing entities in the document's DOCTYPE, e.g. <!ENTITY nbsp "&#160;">
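Solution a) can be automated. The following is a minimal sketch in Python, assuming the exporter handles the content as a string before writing it out; the function name to_numeric_entities is made up for this example. It uses the HTML entity table from the standard library, leaves the 5 predefined XML entities alone, and leaves unknown names unchanged so they can be reviewed by hand.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines; these are left untouched.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}

# Matches named entity references such as &nbsp; (numeric ones like &#160; do not match).
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text):
    """Replace named HTML entities with numeric character references (hypothetical helper)."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Not a known HTML entity name: leave it as-is for manual review.
            return match.group(0)
        return "&#%d;" % codepoint

    return _ENTITY_RE.sub(replace, text)


# Usage: the converted content now parses as XML.
content = to_numeric_entities("a&nbsp;b &amp; c&eacute;")
element = ET.fromstring("<content>" + content + "</content>")
```

This should not be applied to text that will be placed inside a CDATA section, since entity references are not expanded there.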
Invalid labels
When adding labels to a resource, take care to replace each " with its XML equivalent &quot;, especially when extracting links from posts, e.g.
<sioc:reference rdfs:label=""State of the blogosphere"" rdfs:resource="http://www.sifry.com/alerts/archives/000419.html"/>
should be
<sioc:reference rdfs:label="&quot;State of the blogosphere&quot;" rdfs:resource="http://www.sifry.com/alerts/archives/000419.html"/>
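Rather than replacing quotes by hand, an exporter can escape every label before writing it into an attribute. A minimal sketch in Python, assuming labels are plain strings; label_attribute is a hypothetical helper name, while escape comes from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape


def label_attribute(label):
    """Build an rdfs:label attribute with the value safely escaped (hypothetical helper)."""
    # escape() handles &, < and > by default; the extra mapping also covers the double quote.
    return 'rdfs:label="%s"' % escape(label, {'"': "&quot;"})


attribute = label_attribute('"State of the blogosphere"')
```

The same escaping also protects labels that contain & or <, which would otherwise break the document in the same way.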
CDATA sections
When using <![CDATA[ ]]> to enclose character data, check whether the content itself contains "]]>", because that sequence ends the CDATA section early.
Wikipedia's CDATA article describes the fix: to encode "]]>" in the middle of a CDATA section, replace all occurrences with the following:
]]]]><![CDATA[>
(This effectively stops and restarts the CDATA section).
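The replacement can be applied when the section is built. A minimal sketch in Python, assuming the content is available as a string; wrap_cdata is a hypothetical helper name. The usage lines parse the result with the standard library to check that the original text comes back unchanged.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text):
    """Enclose text in a CDATA section, splitting it wherever "]]>" occurs (hypothetical helper)."""
    # Each "]]>" is split so that "]]" ends one section and ">" starts the next.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Usage: the parser joins the adjacent sections back into the original text.
original = "if (a[b[0]]> 1) { ... }"
element = ET.fromstring("<content>" + wrap_cdata(original) + "</content>")
```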