Warning:
This wiki has been archived and is now read-only.
Comments to be considered before publishing the last working draft
No. | Subject | Comment | Author | Comment or Proposal | Resolution and Implementation |
---|---|---|---|---|---|
6 | General issues | Possible approaches to implementation should not include the word "should". That implies normativeness. This is a general issue with implementation sections. We say in the Audience section that "The normative element of each best practice is the intended outcome." | Annette | Berna's proposal: To remove the word "should" from the intended outcome sections. To remove the sentence "The normative element of each best practice is the intended outcome" from the BP template and the Audience section. | Resolved: "Should" was replaced by "can". There are some cases that still need a review (Data Access section).
Implementation: https://github.com/w3c/dwbp/commit/c9c5cb7a188e5858fe53ca60ff4363cf0a294851 |
2 | General issues | Subtitles should all be written in the same mode. (Mine were written in imperative -- "do this, don't do that", but most are declarative -- "this should be done".) I think imperative is better, because it gets away from RFC2119 keywords, which we voted not to use. It becomes a call to action, which is our goal, right? | Annette | This needs to be discussed by the group. (see also: https://docs.google.com/spreadsheets/d/1eSTt3A6kTfXYTcVMt5VGDardLIk8b7FnsgENpCuNRBA/edit?usp=sharing) | Resolved: Phil and Annette will rewrite the subtitles.
Implementation: https://github.com/w3c/dwbp/commit/047e2f78443fd917aee356eb2b405f6086efd732 |
3 | provide metadata | The intended outcome is "Human-readable metadata will enable humans to understand the metadata and machine-readable metadata will enable computer applications, notably user agents, to process the metadata." This is tautological. Metadata is necessary because, without it, the data will have no context or meaning. | Annette | Phil suggested "I'd write the intended outcome as simply: Humans and machines are able to understand the data." [Re: partial review] | Resolved BP01: to keep Phil’s suggestion for the intended outcome.
Implementation: https://github.com/w3c/dwbp/commit/000593604a95bcb2d57ca15040e9fd9281edcc9b |
4 | provide metadata | Also, I disagree that "If multiple formats are published separately, they should be served from the same URL using content negotiation." publishing multiple files is also reasonable, and it's even what we used in all our examples about metadata. (in BP2, the machine readable example gives the name of the distribution as bus-stops-2015-05-05.csv; in BP4, the entire URI is given, ending in .csv, etc.) | Annette | Phil's comment:
I think BP21 (#conneg) gets it right. You assign a URI to the dataset and use conneg to return whatever is the most appropriate version. However, you *also* provide direct URIs for each version, that bypass the conneg. (...) [Re: partial review] |
Resolved: BP01: to rewrite "If multiple formats are published separately, they should be served from the same URL using content negotiation." New sentence:
If multiple formats are published separately, they should be served from the same URL using content negotiation and made available under separate URIs, distinguished by filename extension. Implementation: https://github.com/w3c/dwbp/commit/b9f17696be1a39effdfecd35cf1470b1b3847e4e |
5 | provide metadata | There is an inconsistency between the suggestion that one should use content negotiation for different formats (csv vs. rdf) and the .:mobility and :themes are referred to as URIs, but they are not URIs. (I know DCAT did this, but I think it's a mistake; colons are not legal in the first segment of a relative URI.) | Annette | Phil's comment "I would word the intended outcome as: Humans and machines can discover the dataset; humans can understand the nature of the data." and there are more comments about this on the message [Re: partial review]
Editor's comment: Phil slightly reword that section and take out the colons as they refer specifically to Turtle representation. (http://w3c.github.io/dwbp/bp.html#DescriptiveMetadata) |
Resolved: BP01: to rewrite "If multiple formats are published separately, they should be served from the same URL using content negotiation." New sentence:
If multiple formats are published separately, they should be served from the same URL using content negotiation and made available under separate URIs, distinguished by filename extension. Implementation: https://github.com/w3c/dwbp/commit/b9f17696be1a39effdfecd35cf1470b1b3847e4e |
6 | locale parameters | The human-readable example for the first three BPs is exactly the same. Can we make the examples more specific (maybe include them in the doc rather than link to one big external example)? The ttl in the machine-readable example could be trimmed to just the bold parts. | Annette | Berna's proposal: The doc is very long already. Instead of splitting the example, maybe we can link to specific parts of the page according to the BP.
Phil's comment: +1. All the data is in the HTML and TTL files, just highlight the relevant bits by including those and those only in the main doc. Incidentally, I expect to set up conneg between those two files, yes? [Re: partial review] |
Resolved: include the relevant parts of the html example for each BP in the document itself. Newton will also see other ways to do this.
Implementation: https://github.com/w3c/dwbp/commit/52d136067f63a02b6a9dad24478a0fe237362abd |
7 | locale parameters | I think the Why section is unnecessarily repetitive. A textual example
might also clarify things a little. I suggest: Providing <a href="#locale_parameter">locale</a> parameters helps humans and computer applications to work accurately with things like dates, currencies and numbers that may look similar but have different meanings in different locales. For example, the 'date' 4/7 can be read as 7th of April or the 4th of July depending on where the data was created. Similarly €2,000 is either two thousand Euros or an over-precise representation of two Euros. Making the locale and language explicit allows users to determine how readily they can work with the data and may enable automated translation services. My wording for the intended outcome: To enable humans and software agents accurately to interpret the meaning of strings representing dates, times, currencies and numbers etc. |
Phil | (answering on Annette's message [Re: partial review]) | Implementation: https://github.com/w3c/dwbp/commit/ca49005021c7ffc36d7e90657cdad82303ee31b1 |
8 | Licenses | We say "the license of a dataset can be specified within the data". I think we mean within the *metadata*. | Annette | Phil's comment: +1 Suggested rewording: The presence of license information is essential for data consumers to assess the usability of data. User agents may use the presence/absence of license information as a trigger for inclusion or exclusion of data presented to a potential consumer. |
Resolved: BP was updated according to Annette's and Phil's proposals
Implementation: https://github.com/w3c/dwbp/commit/25f6ce052098a61fa1a6e8c17b998e37b512adf9 |
9 | Provenance | The "Why" is pretty sparse and essentially says the same thing as the intended outcome. I think we could make it stronger. "Provenance is one means by which consumers of a dataset judge its quality. Understanding its origin and history helps one determine whether to trust the data and provides important interpretive context." | Annette | Phil's comment: +1.
My suggested wording for the intended outcome is: To enable humans to know the origin or history of the dataset and to enable software agents to automatically process provenance information. |
Resolved: BP was updated according to Phil's and Annette's comments
Implementation: https://github.com/w3c/dwbp/commit/11dc5ed7aa9a23b287b5037d3cc731e37b2a9e12 |
10 | Provenance | The example links to the metadata example page. It would be more helpful to put the provenance-specific info into the BP doc itself. | Annette | Berna's proposal: to keep the example as an external page. if we present just parts of the human-readable example it will be out of context. | Resolved: include the relevant parts of the html example for each BP in the document itself. Newton will also see other ways to do this.
Implementation: https://github.com/w3c/dwbp/commit/52d136067f63a02b6a9dad24478a0fe237362abd |
11 | Quality | We say "Data quality information will enable humans to know the quality of the dataset and its distributions, and software agents to automatically process quality information about the dataset and its distributions." That's rather tautological. We could say something about enabling humans to determine whether the dataset is suitable for their purposes. | Annette | Phil's comment: Annette and I are in agreement here. I'd phrase the intended outcome as:
To enable people and software to assess the quality and therefore suitability of a dataset for their application. |
Resolved: BP was updated according to Phil's comment
Implementation: https://github.com/w3c/dwbp/commit/8610970ef47d1d7a983763797a9703d6b6053087 |
12 | Quality | We probably should refer to DQV as a finished thing, as it will be soon. The human-readable example links to the metadata one. | Annette | Berna's proposal: to refer to DQV as a finished document and fix the human-readable example.
Phil's comment: +1. I suggest: The machine readable version of the dataset quality metadata may be provided using the Data Quality Vocabulary developed by the DWBP working group VOCAB-DQV. |
Resolved: include the relevant parts of the html example for each BP in the document itself. Newton will also see other ways to do this.
Implementation: https://github.com/w3c/dwbp/commit/52d136067f63a02b6a9dad24478a0fe237362abd
|
13 | Versioning | Of the four implementation bullets, only the last is really a possible approach. The first three belong in the intended outcome. | Annette | Editors' question: Why do the first three belong in the intended outcome? If they are intended outcomes, then the whole intended outcome section needs to be rewritten. In this case, would you like to make a proposal?
Phil's comment: Unusually, I disagree with Annette here. For me, intended outcomes are short "this is what will be possible." The implementation steps are how you make it so, which I think you have in this case. |
Resolved: BP8 won’t change, just the subtitle. Subtitle should be more explicitly about what "has to be done".
Implementation: https://github.com/w3c/dwbp/commit/047e2f78443fd917aee356eb2b405f6086efd732 |
14 | Versioning | The human-readable example links to the metadata one. The version history there lists only 1.1, which is illogical. (1.0 must exist at least.) | Annette | Berna's proposal: to fix the link and the example page. | Resolved: The example will be updated to be more detailed and part of the human-readable example will be included in the doc. |
15 | Version history | The human-readable example links to the metadata one. The version history there lists only 1.1, which is illogical. (1.0 must exist at least.) This example doesn't meet the requirements of the BP. Neither the ttl version nor the Memento example provides a full version history, only a list of versions released. This BP is intended to be about providing the details of what changed. | Annette | In the machine-readable example of this BP there is a property rdfs:comment to show how the dataset was updated. If this is not enough, could you please tell us what else we should present? |
Resolved: BP8 won’t change, just the subtitle. Subtitle should be more explicitly about what "has to be done". Implementation: https://github.com/w3c/dwbp/commit/047e2f78443fd917aee356eb2b405f6086efd732 Resolved: include the relevant parts of the html example for each BP in the document itself. Newton will also see other ways to do this. Implementation: https://github.com/w3c/dwbp/commit/52d136067f63a02b6a9dad24478a0fe237362abd |
16 | Identifiers | Intro item 5 refers to an API which could be confusing, since we talk about APIs as web APIs elsewhere. | Annette | Phil's proposal: De-referencing a URI triggers a computer program to run on a server that may do something as simple as return a single, static file, or it may carry out complex processing. Precisely what processing is carried out, i.e. the software on the server, is completely independent of the URI itself. | Resolved: To update the introduction of Identifiers Section according to Phil's proposal.
"De-referencing a URI triggers a computer program to run on a server that may do something as simple as return a single, static file, or it may carry out complex processing. Precisely what processing is carried out, i.e. the software on the server, is completely independent of the URI itself. " Implementation: https://github.com/w3c/dwbp/commit/819d08a479682d5399b1bc89c69949a0631c6223 |
17 | Persistent URIs as identifiers |
|
Annette | Phil's proposal item 1: delete that sentence so it's just: "To be persistent, URIs must be designed as such. A lot has been written on this topic, see, for example, the European Commission's Study on Persistent URIs [PURI] which in turn links to many other resources."
Proposal (item 2): Annette agreed to keep it as it is now. Proposal (item 3): How to test section was updated. |
Resolved: BP was updated according to proposals.
Implementation: https://github.com/w3c/dwbp/pull/373/commits/1941fc3fe7e360169b584e98235e0d2293065fdb |
18 | Persistent URIs within datasets | The word "affordances" is misused. Affordances are how we know what something is intended to do, not what the thing does. Affordances do not act on things, they inform. | Annette | Phil's proposal: "These ideas are at the heart of the 5 Stars of Linked Data where one data point links to another, and of Hypermedia where links may be to further data or to services that can act on or relate to the data in some way." | Resolved: To change the sentence according to Phil's proposal. "These ideas are at the heart of the 5 Stars of Linked Data where one data point links to another, and of Hypermedia where links may be to further data or to services that can act on or relate to the data in some way."
Implementation: https://github.com/w3c/dwbp/commit/819d08a479682d5399b1bc89c69949a0631c6223 |
19 | Persistent URIs within datasets | The intended outcome should be a free-standing piece of text. Starting with "that one item" is confusing. | Annette | Phil's proposal: to rewrite the sentence as follows: "One data item can be related to others across the Web, creating a global information space accessible to humans and machines alike." | Resolved: to update the intended outcome according to Phil's proposal: "One data item can be related to others across the Web, creating a global information space accessible to humans and machines alike."
Implementation: https://github.com/w3c/dwbp/commit/e03313c72ef913aceb40d84ca3f57437f1fe01cb |
20 | Persistent URIs within datasets | Much of the implementation section is about minting new URIs, which is the subject of the previous BP. It is off topic here. Everything from "If you can't find an existing set of identifiers that meet your needs, you'll need to create your own" down to the end of the example doesn't belong in a BP that is about using other people's identifiers. | Annette | Ask Phil to review | Resolved: Approach to implementation didn't change. How to test section was modified as follows:
"Check that within the dataset, references to things that don't change or that change slowly, such as countries, regions, organizations and people, are referred to by URIs or by short identifiers that can be appended to a URI stub. Ideally the URIs should resolve, however, they have value as globally scoped variables whether they resolve or not." |
21 | Persistent URIs within datasets | The last paragraph of the example is almost exactly the same as the last paragraph before the example. | Annette | Phil's comment: "Correct. I have deleted it in my native speaker review copy." | Resolved and Implemented. |
22 | URIs for versions and series | This BP is confusing two issues. One is the use of a shorter URI for the latest version of a dataset while also assigning a version-specific URI for it. The other issue is making a landing page for a collection of datasets. The initial intent was the former. I don't think this applies to time series. What we're talking about here is use of dates for version identifiers. The example is incomplete; it doesn't say what the latest version URI would be.
|
Annette | Phil's proposal (item 1): to change the example as follows:
Suppose that a new bus stop is created. To keep
Phil's proposal (item 2): True, I offer this as a better alternative:
"In different circumstances, it will be appropriate to refer separately to each of these examples (and many like them)." is replaced with "In different circumstances, it will be appropriate to refer to the current situation (the current set of bus stops, the current elected officials etc.). In others, it may be appropriate to refer to the situation as it exists/existed at a specific time." Annette proposes to use just "existed" rather than "exists/existed". Phil's proposal (item 3): Rewrite How to test as follows: "Check that each version of a dataset has its own URI, and that there is also a 'latest version' URI." |
Resolved Item 1: to update the BP according to Phil's proposal Resolved Item 2: to update the BP according to Phil's proposal Resolved Item 3: to update the BP according to Phil's proposal Implementation: https://github.com/w3c/dwbp/commit/4a2fd830baaa3d9780f0614ff1e30548246304d1 |
23 | Introduction | First paragraph, some examples have no clear relationship to the web; "this phenomenon" has no clear antecedent.
Needs a careful native-speaker edit. |
Annette | Editors asked Annette to be more specific about the examples. | Resolved: Phil updated introduction.
Implementation: https://github.com/w3c/dwbp/commit/c1b89c386a0540b0b833af332175839a17d699be |
24 | Audience | Remove "such as CSV, JSON and RDF." They are too specific; don't use examples here. | Annette | Editors' question: Why do you think that we shouldn't use examples? I'm ok with removing the examples, but I'd like to understand the reason for this. | Implementation: "such as CSV, JSON and RDF." was removed from the Audience. |
25 | Context | The word "mainly" needs to be removed here: "The DWBP document is mainly interested on the Identification principle that says that URIs should be used to identify resources." "Mainly" means that it is more important than other considerations, which isn't true and probably isn't what was meant. | Annette | Implementation: Phil made slight changes so that the sentence now reads:
"An important aspect of publishing and sharing data on the Web concerns the architectural basis of the Web WEBARCH. An important aspect of this is the identification principle that says that URIs should be used to identify resources." | |
26 | Context | I disagree with the statement that "multiple Dataset Access mechanisms should be available." | Annette | Editors' question: Could you please explain why you disagree with this statement? Maybe this is also a rewriting issue. | Resolved: rewrite the sentence. New sentence: "multiple Dataset Access mechanisms can be available"
Implementation: https://github.com/w3c/dwbp/commit/19205e4f3fa92c695f660afde09b9c82a6d42dfb |
27 | Context | The diagram is still confusing for me. I can't tell what it is trying to say. What is the relationship between the blue dataset and the green, yellow, and orange rectangles supposed to be showing? Why does a blue box refer to a dataset and then to distributions? Why is the grouping within blue boxes different after the arrow? What does the dotted line represent? What does the arrow represent?
This section wanders between discussion of basic definitions and an incomplete enumeration of the best practices themselves. It needs to be rewritten so that it has a clear purpose and adheres to it. |
Annette | Implementation: https://github.com/w3c/dwbp/commit/9e6ad2afea3379cc4bde17ba833cf4c00c680b61 | |
28 | Basic Example | It should be about more modes of transit than just buses. We have some examples that use multiple modes. | Annette | Instead of changing the example description, the examples that mention multiple modes could be rewritten. If we mention multiple modes in the example description we might create big expectations in the public (only a few BP examples really consider this aspect). | Resolved: to be more general in the example.
Implementation: https://github.com/w3c/dwbp/commit/2426e4285c3154e8fcc902ee99063a1a1185ea24 |
29 | machine-readable standardized data formats | There is no definition of 'machine readable', or of proprietary software. "computational tools typically available in the relevant domain" will surely include .docx and .xlsx, for example.
I looked at the Wikipedia page which links to a doc from the US government (https://en.wikipedia.org/wiki/Machine-readable_data). From that I suggest the following: Paragraph 1: There is an important distinction between formats that can be read and edited by humans using a computer and formats that are machine readable. The latter term implies that the data is readily extracted, transformed and processed by a computer. The following definition of machine readable is based on that provided by the US Office of Management and Budget's definition in their Preparation and Submission of Strategic Plans, Annual Performance Plans, and Annual Program Performance Reports OMB-A11. Paragraph 2: Machine readable: A format in a standard computer language (not natural language text) that can be read automatically by a computer system. Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret. Formats such as XML, JSON, NetCDF, RDF or spreadsheets with header columns that can be exported as CSV are machine readable formats. |
Phil | to include the first paragraph in the Why section of the BP and the second one in the glossary.
Annette's proposal: "machine-readable" is used differently here than in the metadata section. Technically, nothing on the web is not machine-readable. I think we could remove that phrase. |
Resolved: use the following definition in the glossary
"Machine-readable data: Data in a standard format that can be read and processed automatically by a computing system. Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret and manipulate. Formats such as XML, JSON, HDF5, RDF and CSV are machine-readable data formats." adapted from [include the link proposed by Phil] Implementation: https://github.com/w3c/dwbp/commit/683492861826b8688de0401bc08e69124d42fbcc Implementation: https://github.com/w3c/dwbp/commit/cc779f06892a7d58ab91d682429868e994001911 |
30 | machine-readable standardized data formats | In the 'Why' para, consider adding 'open', 'well documented', 'RAND', etc to 'non-proprietary'. | Chris Little | We removed the sentence "The use of non-proprietary data formats should also be considered since it increases the possibilities for use and reuse of data". The focus of the BP is on machine-readable standardized data formats rather than on recommending data formats with specific characteristics.
Implementation: https://github.com/w3c/dwbp/commit/fc884022a635c9d4051466258e062112df256001 | |
31 | Multiple formats | Suggest that the intended outcome could be worded along the lines of:
"As many users as possible will be able to use the data without first having to transform it into their preferred format." I have many similar comments on intended outcomes. I think they should be statements of the specific benefit that is gained, so "to enable X" rather than "Doing X will enable Y." |
Phil | to review BP considering Phil's proposal | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/9712ddfaa5de2149d4d2cdc17836c3de46fbac1c |
32 | Multiple formats | I very much dislike the word 'intended' in the sentence: "Consider the data formats most likely to be needed by intended users, and consider alternatives that are likely to be useful in the future." The idea of making data available on the Web is that it's up to the user to decide what he/she intends to do with it, not the publisher.
Suggest simply making it "Consider the data formats most likely to be needed and consider alternatives that are likely to be useful in the future." |
Phil | Update approach to implementation to include: "Consider the data formats most likely to be needed and consider alternatives that are likely to be useful in the future." | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/9712ddfaa5de2149d4d2cdc17836c3de46fbac1c |
33 | Standardized terms | Suggest rewording the intended outcome | Phil | New intended outcome: Enhanced interoperability and consensus among data publishers and consumers. (Ask for Antoine's feedback) |
Resolved: Antoine will merge BP Use Standardized Terms and BP Reuse Vocabularies. Implementation: https://github.com/w3c/dwbp/commit/80451932600dfc4e26f96539753f3d3e9a919224 |
34 | Reuse vocabularies | Again, the intended outcome could be worded more succinctly I think. | Phil | follow Phil's proposal: To make datasets and metadata easier to compare and integrate by humans or machines. (I added 'and integrate', which I personally think is important but this is more than an editorial change). | Resolved: Antoine will merge BP Use Standardized Terms and BP Reuse Vocabularies.
Implementation: https://github.com/w3c/dwbp/commit/80451932600dfc4e26f96539753f3d3e9a919224 |
35 | Reuse vocabularies | Please also clarify 'vocabularies' versus 'code lists' as code lists are used in BP16. The list in the second para. does not explain the generally agreed distinctions. | Chris Little | Resolved: Antoine will merge BP Use Standardized Terms and BP Reuse Vocabularies.
Implementation: https://github.com/w3c/dwbp/commit/80451932600dfc4e26f96539753f3d3e9a919224 | |
36 | Right formalization level | I would word the intended outcome as:
The data supports a wide range of application cases but is not more complex to produce and reuse than necessary, or, to paraphrase Albert Einstein, "Everything should be made as simple as possible, but no simpler." The Einstein line is often quoted but, like so many quotations, is probably a misquote. And I'd say that the how to test line would be improved by using the word 'typical' rather than target: For formal knowledge representation languages, applying an inference engine on top of the data that uses a given vocabulary does not produce too many statements that are unnecessary for typical applications. |
Phil | Antoine's suggestion: Higher level of formalization make vocabularies and the data that uses them more difficult to produce and re-use. The data should support all application cases but should not be more complex to produce and reuse than necessary |
Resolved: Phil will rewrite the BP (How to test and Examples) Intended outcome: Higher level of formalization make vocabularies and the data that uses them more difficult to produce and re-use. The data should support all application cases but should not be more complex to produce and reuse than necessary Implementation: https://github.com/w3c/dwbp/commit/a1887532129f16b983835afe7f84fcd9bf062c5c |
37 | Sensitive data | Suggest rewording the intended outcome | Phil | follow Phil's proposal: "To enable data consumers to know that data that is referred to from the current dataset is unavailable or only available under different conditions." | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/8373f4da3d6032bd65973ec22f00156b52c79e03 |
38 | Sensitive data | Regarding Best Practice 18: Provide data unavailability reference. "Data unavailability reference" is awkward and unclear. | Annette | Annette's proposal: Could we say "Provide an explanation for data that is not available"? | Resolved: BP was updated according to Annette's proposal
Implementation: https://github.com/w3c/dwbp/commit/8373f4da3d6032bd65973ec22f00156b52c79e03 |
39 | Sensitive data | Best Practice 18: Provide data unavailability reference. Address testing machine-readability, saying that a legitimate HTTP response code in the 400 or 500 range should be returned. | Annette | Annette's proposal to How to test:
Where the dataset includes references to data that is no longer available or is not available to all users, check that an explanation of what is missing and instructions for obtaining access (if possible) are given. Check if a legitimate http response code in the 400 or 500 range is returned when trying to get unavailable data. |
Resolved: to follow Annette's proposal.
Implementation: https://github.com/w3c/dwbp/commit/6fa8b6bce48b87a44b42b986584d6676e6dcb00e |
40 | Bulk Access | I don't think this should only refer to cases where data is spread across multiple locations. I think it should also cover the simple case of making a file available, as opposed to only providing an API. This is in addition to, not instead of what is written about multiple locations - which I think is very good.
I'd phrase the intended outcome as: "Bulk download enables developers to access the complete dataset for local processing without the need for further calls to the Web." |
Phil | I propose to complement the Why section to include "the simple case of making a file available".
Intended outcome: To enable consumers to access the complete dataset for local processing with a single request. |
Resolved: Update BP Bulk Access as proposed below:
Intended outcome: To enable consumers to access the complete dataset for local processing with a single request. Implementation: https://github.com/w3c/dwbp/commit/b373bef216bc0da7be4418d64ca17c06d4a6186b |
41 | Subsets | The intended outcome section is too long IMO. All the content is valid, I just think some of it could be moved to the Why section.
Really not sure about including an example of making a set of PDFs available. |
Phil | Ask for Annette's feedback | Resolved: BP updated based on Annette's proposal.
Implementation: https://github.com/w3c/dwbp/commit/de513905142b813d3a7c5ae5a8d5305942136b7c |
42 | Conneg | In tidying up the language of this BP I pretty much rewrote it. I hope without changing your meaning significantly.
I suggest the intended outcome could be phrased as: "To enable different representations of the same resource to be served from the same URI according to the request made by the client." |
Phil | Resolved and Implemented. | |
43 | Access Real Time | Rewrite intended outcome | Phil | follow Phil's proposal: "To enable applications to access time-critical data in real time or near real time, where real-time means a range from milliseconds to a few seconds after the data creation, and near real time is a predetermined delay for expected data delivery." |
Resolved: BP22: to include a definition for near real time in the glossary (from wikipedia) and create a link in the outcome. Change the subtitle to use released instead of produced. Implementation: https://github.com/w3c/dwbp/commit/8168fc7a34be0fa665a97cb1d9f1f325f837e0c8 Implementation: https://github.com/w3c/dwbp/commit/a255b291331a98057033bd4abf0d2ca6212deebf |
44 | Access Up to Date | I think this sentence: "The international date format is recommended to avoid any ambiguity <a href="https://www.w3.org/International/questions/qa-date-format">https://www.w3.org/International/questions/qa-date-format</a>."
Would be better as: "Datestamps should be formatted using the XML Schema <a href="/TR/xmlschema11-2/#dateTimeStamp">dateTimeStamp</a> datatype xmlschema11-2." Although I note that the NOAA example uses the horrible "Mar, 3rd 2016 at 9:03:07 pm PST" format which breaks this advice :-( |
Phil | Berna's proposal: to rewrite this BP | Resolved: BP23: Annette and Bernadette will rewrite BP23 according to Phil's and Annette's suggestion.
Implementation: https://github.com/w3c/dwbp/commit/43ba83a77ecbcdd03d2e9066bedd5f7df482c2e4 |
45 | document your API | Rewrite intended outcome | Phil | follow Phil's proposal: "Developers can obtain detailed information about each call to the API, including the parameters it takes and what it is expected to return." | Implementation: http://w3c.github.io/dwbp/bp.html#documentYourAPI |
46 | document your API | This is very spatial, ideally we should have some non-spatial examples as well. I can tell this came from Linda and Jeremy et al :-) | Phil | Implementation: https://github.com/w3c/dwbp/commit/5879a077989310a409145bf476af57ff0a121342 | |
47 | Assess dataset coverage | Rewrite intended outcome | Phil | follow Phil's proposal: "To enable data consumers to appreciate the coverage and external dependencies of a given dataset." | Resolved: Phil will rewrite Data Preservation BPs
Implementation: https://github.com/w3c/dwbp/commit/a39037008231c32a6addb15c60dcac7638cd1560 |
48 | Use a trusted serialization format | Rewrite intended outcome | Phil | Phil's proposal: "To enable machines to process a dataset even if the original software that was used to create it is no longer available or supported." | Resolved: Phil will rewrite Data Preservation BPs
Implementation: https://github.com/w3c/dwbp/commit/a39037008231c32a6addb15c60dcac7638cd1560 |
49 | Provide structural metadata | I think the why section could be stronger:
Providing information about the internal structure of a distribution is essential for others wishing to explore or query the dataset. It also helps people to understand the meaning of the data. My intended outcome wording: To enable humans to interpret the schema of a dataset and software agents to automatically process distributions. NB, I removed the 2nd instance of the word schema in that sentence which I think was a mistake? [Re: partial review] |
Phil | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/19aaecd5567f2c7280fc27ff560a6e29c4383d7d | |
50 | Provide structural metadata | Possible Approach: how about adding a link to the RDF Data Cube? | Chris Little | Phil's proposal: I leave that to the editors although I'd be inclined not to. Yes, QB includes a lot of structural metadata but in the context of the BP, I'd say the examples given are sufficient. | We included a link to the RDF Data Cube in the Possible Approach to Implementation section of BP4.
Implementation: https://github.com/w3c/dwbp/commit/fc884022a635c9d4051466258e062112df256001 |
51 | Provenance | I think the first paragraph of the intro section can be removed and the
glossary link added to the 2nd, like: The Web brings together business, engineering, and scientific communities creating collaborative opportunities that were previously unimaginable. The challenge in publishing data on the Web is providing an appropriate level of detail about its origin. The <a href="#data_producer">data producer</a> may not necessarily be the data provider and so collecting and conveying this corresponding metadata is particularly important. Without <a href="#data_provenance">provenance</a>, consumers have no inherent way to trust the integrity and credibility of the data being shared. Data publishers in turn need to be aware of the needs of prospective consumer communities to know how much provenance detail is appropriate. |
Phil | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/0271027460fadf467e7175476387d969e132a103 | |
52 | Quality | Slight rewording of the intro paragraph:
The quality of a dataset can have a big impact on the quality of applications that use it. As a consequence, the inclusion of <a href="#data_quality">data quality</a> information in data publishing and consumption pipelines is of primary importance. Usually, the assessment of quality involves different kinds of quality dimensions, each representing groups of characteristics that are relevant to publishers and consumers. The Data Quality Vocabulary defines concepts such as measures and metrics to assess the quality for each quality dimension VOCAB-DQV. There are heuristics designed to fit specific assessment situations that rely on quality indicators, namely, pieces of data content, pieces of data meta-information, and human ratings that give indications about the suitability of data for some intended use. |
Phil | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/88d24996c7db6727552178652b660f846db75530 | |
53 | Versioning | Looking at the intro material I think I could probably find people to
argue that all three of those scenarios are simply corrections rather than new versions. But then, as you say, there is no consensus :-) I would phrase the intended outcome as: To enable humans and software agents to easily determine which version of a dataset they are working with. |
Phil | Resolved: BP was updated according to Phil's proposal
Implementation: https://github.com/w3c/dwbp/commit/12c52bf24c8bfad10045880e2df2cf01d74d8650 | |
54 | Assess dataset coverage | BP28: 'Assess dataset Web context' is better than 'Assess dataset coverage'. Coverage could be confused with the specialised geospatial meaning. Change over paras. 'Scope' may be a useful word.
Could you use an example without mentioning 'triples' - a term requiring specialised knowledge or too implementation specific? |
Chris Little | Resolved: Phil will rewrite Data Preservation BPs
Implementation: https://github.com/w3c/dwbp/commit/a39037008231c32a6addb15c60dcac7638cd1560 | |
55 | Bulk Access | BP 19: Provide bulk download
The intended outcome is focused on the wrong thing. It says "Bulk download will enable large file transfers (which would require more time than a typical user would consider reasonable) by dedicated file-transfer protocols." That's true, but it's not the point of the BP. The idea of allowing bulk download applies to datasets that are smaller as well as larger ones, and it need not involve alternative protocols. The outcome we are hoping for is that people will be able to easily download the data with a single request.
In the implementation section, the first bullet should clarify that it is about downloading. Making a request to one URI isn't unique to that bullet. (A bulk request to an API goes to one URI as well.) It should read "For datasets that exist initially as multiple files, preprocessing a copy of the data into a compressed archive format and making the data accessible for download from one URI." The test should be about whether the full dataset can be retrieved with a single request, not whether the data is preprocessed. That test works for APIs as well as file downloads by humans. |
Annette | Proposal:
Intended outcome: To enable consumers to access the complete dataset for local processing with a single request. Approach to implementation (1st bullet): For datasets that exist initially as multiple files, preprocessing a copy of the data into a single file and making the data accessible for download from one URI. For larger datasets, the file can also be compressed. How to test: Check if the full dataset can be retrieved with a single request. |
Resolved: Update BP Bulk Access as proposed below:
Intended outcome: To enable consumers to access the complete dataset for local processing with a single request. Approach to implementation (1st bullet): For datasets that exist initially as multiple files, preprocessing a copy of the data into a single file and making the data accessible for download from one URI. For larger datasets, the file can also be compressed. How to test: Check if the full dataset can be retrieved with a single request. Implementation: https://github.com/w3c/dwbp/commit/b373bef216bc0da7be4418d64ca17c06d4a6186b |
56 | Subsets for Large Datasets | BP 20: Provide Subsets for Large Datasets
Change "Static datasets that users in the domain would consider to be large will be downloadable in smaller pieces" to "Static datasets that take some time to download will be downloadable in smaller pieces". It's true that being large is dependent on what users in the domain consider to be large, but the issue here is time, not largeness. |
Annette | Annette's proposal: Both human users and applications should be able to access subsets of a dataset, rather than the entire thing, as needed. Available subsets should maximize the ratio of needed data to unneeded data in responses to consumer requests. Static file downloads should be kept to reasonable download times, and APIs should return results of appropriate granularity to suit the domain and Web application performance. | Resolved: to follow Annette's proposal.
Implementation: https://github.com/w3c/dwbp/commit/de513905142b813d3a7c5ae5a8d5305942136b7c |
57 | Content negotiation | BP 21: Content negotiation should be in the implementation section for BP 14, multiple formats, rather than its own BP. I think we've already agreed to change this, but I'll just reiterate that I'm not yet convinced that always using conneg is a best practice for serving multiple formats from an API. I like the use of file extensions, because they allow one to reference a resource as a URI instead of a URI plus required headers (plus a note explaining how to set headers). I also think it's good to allow tests of an API using a browser when possible. Since browsers don't let you set request headers, relying on conneg alone prevents that. Using both addresses most objections, but many people prefer conneg because it allows them to get file extensions out of URIs. Implementing both doesn't accomplish that. For file downloads, I think conneg is a worst practice, because browsers don't allow users to set headers. Anyway, we could argue a long time on this. There is still a lot of disagreement about this stuff. | Annette | Resolved: keep BP21 (Content negotiation) and make a link from BP 14 to BP 21.
Implementation: https://github.com/w3c/dwbp/commit/0c0852efdff04fd54b07d5eb2b2bfc3936018d21 | |
58 | access real-time | BP 22: Subtitle: I still don't know what it means for data to be "produced in real time". The other day I posted some log data from a supercomputing system. That data is produced constantly, and it appears in the logs immediately when an event happens. That feels to me like real time, but I don't think it is appropriate to publish on the web in real time, because the purpose of posting is detailed analysis, not monitoring. On the other hand, preparing the log data for publishing is slow, so maybe that's the real measure. Maybe it should be "When data is released in real time . . ."
The intended outcome defines near real time as with a predetermined delay. The U.S. Census has a predetermined delay of 10 years, and that is not near real time. See https://en.wikipedia.org/wiki/Real-time_computing#Near_real-time for some help. I don't understand the Push approach to implementation. I think the last word was intended to be publisher. "Disseminating" is vague and not particularly push-y, and making storage available is certainly not push-y. The last sentence of the implementation section is garbled. I think real-time data implementation is better broken into streaming or not streaming. It would be helpful to give some info about those alternatives. The example doesn't use the transport agency, and it doesn't show how to implement real-time data. It would be more appropriate as an example of an API. Mention of PROV-O in the test is unnecessary and off point. A more appropriate test might be to measure the refresh frequency and see that it matches the update frequency of the source data, and to measure the latency and see if it is in the real-time or near-real-time range. |
Annette | Resolved: BP22: to include a definition for near real time in the glossary (from wikipedia) and create a link in the outcome. Change the subtitle to use released instead of produced.
Implementation: https://github.com/w3c/dwbp/commit/8168fc7a34be0fa665a97cb1d9f1f325f837e0c8 Implementation: https://github.com/w3c/dwbp/commit/a255b291331a98057033bd4abf0d2ca6212deebf Resolved BP22: Change How to test section to use Annette’s proposal: A more appropriate test might be to measure the refresh frequency and see that it matches the update frequency of the source data, and to measure the latency and see if it is in the real-time or near-real-time range. Implementation: https://github.com/w3c/dwbp/pull/396/commits/a9779b4fa568d11354fce82a92b6202a5a34643b | |
59 | Access Up to Date | BP 22: up to date
The Why text is unclear as to what type of coincidence is desired and what should coincide with what. Similar to the real-time BP, I think the issue here is that the publication frequency should match the release frequency. The first sentence of the test reads like a note to ourselves to write a test. One step is "publish an updated version of data." That is not something one can do whenever a test is needed. More importantly, that test only determines whether there is a difference between two versions of the data. What it should be testing is the timeliness of the most recent data. |
Annette | Annette's suggestion: the first sentence is not about why. It belongs in the intended outcome. The intended outcome should say "Data on the web should be updated in a timely manner so that the most recent data available online reflects the most recent data released. When new data is released via any channel, it should be made available on the Web as soon as possible thereafter."
We could use a transit example about real-time bus arrival predictions. The test could be to check that the update frequency is stated and that the most recently published copy on the Web is no older than the stated update frequency. |
Resolved: BP23: Annette and Bernadette will rewrite BP23 according to Phil's and Annette's suggestion.
Implementation: https://github.com/w3c/dwbp/commit/43ba83a77ecbcdd03d2e9066bedd5f7df482c2e4 |
60 | Make Data Available through an API | Regarding the BP 24: Make Data Available through an API The test should say that a test client can simulate calls and the API returns the expected responses. (The test client doesn't simulate the responses.) | Annette | Proposal:
How to test: Check if a test client can simulate calls and the API returns the expected responses. |
Resolved: Update the BP as described below:
How to test: Check if a test client can simulate calls and the API returns the expected responses. Implementation: https://github.com/w3c/dwbp/commit/9e75bea228c7dd8261e1eb33991e07957905ca14 |
61 | document your API | Regarding the BP 26: Provide complete documentation for your API. The examples are all spatial data examples. None of them really makes sense in this context. We should probably offer examples for the transport agency.
Can we use a test like "time to first successful call"? That would require having volunteers to learn to use the API and timing them. |
Annette | Implementation: https://github.com/w3c/dwbp/commit/5879a077989310a409145bf476af57ff0a121342 | |
62 | Use a trusted serialization format | Regarding BP 28: Use a trusted serialization format for preserved data dumps.
If we keep this, it should at least offer JSON as an acceptable example. JSON is the current overwhelming standard for APIs. This talks about "sending data dumps for long-term preservation" and "data depositors". Where are the data being sent? Is it on the Web? The bad example would pass the How to Test. |
Annette | Resolved: Phil will rewrite Data Preservation BPs
Implementation: https://github.com/w3c/dwbp/commit/a39037008231c32a6addb15c60dcac7638cd1560
| |
63 | Update the status of identifiers | Regarding BP 29: Update the status of identifiers
It's not quite clear what we are suggesting get linked to what. The Why talks about linking preserved datasets with the original URI. Are we saying the original URI should continue to point to the preserved dataset? If that's the case, then what does preservation mean? There is also discussion of saving snapshots as versions, which seems to me is covered better under versioning. We say "A link is maintained between the URI of a resource, the most up-to-date description available for it, and preserved descriptions." One link can only join two resources. Should people preserve old descriptions? Maybe descriptions of older versions are what was meant? A 410 status only makes sense if there's nothing served at the URI, which isn't the case if the advice here is followed. 303 seems like a good option. |
Annette | Resolved: Phil will rewrite Data Preservation BPs
Implementation: https://github.com/w3c/dwbp/commit/a39037008231c32a6addb15c60dcac7638cd1560 | |
64 | Feedback | In the Introduction I disagree with this sentence: "In order to quantify and analyze usage feedback, it should be recorded in a machine-readable format." I think using automated tools to gather feedback and store it in a searchable way is a good idea, but saying the feedback should be machine readable is misleading and insufficiently specific. If you have succeeded in posting feedback on the web, it is machine readable by definition. It sounds like we are telling people to publish their feedback as another dataset. You may want to store it in a machine-readable way for the purpose of displaying it to other humans, but there's no reason to *publish* the feedback with machines in mind. | Annette | Resolved: Remove the sentence from the introduction: "In order to quantify and analyze usage feedback, it should be recorded in a machine-readable format."
Implementation: https://github.com/w3c/dwbp/commit/1ad8d3b312eea660fe420cb7aee9b4e2595e575a | |
65 | Feedback | Regarding the BP 31: Gather feedback from data consumers.
This BP includes recommendations about making feedback public, but that's handled in the next BP. We should keep this BP focused on enabling feedback. The first sentence of the Why needs rewriting. We should remove the word "providing" at the beginning. The BP is about collecting feedback, not providing it. It should address the value of setting up a specific way of collecting feedback (makes it easier for consumers to contribute). Remove the mention of machine-readable formats and using a vocabulary for capturing the semantics of the feedback information. Instead, suggest using an automated feedback system, such as a bug tracker. How to test, the first bullet is a note to us, I guess. The second is partially about the next BP. The third is again treating the feedback data as another published dataset. There's nothing wrong with publishing such a dataset, but that's not the idea here. A real test would be whether a consumer is able to find a way to provide feedback. |
Annette | Resolved: to update the BP as follows:
Why: Obtaining feedback helps publishers understand the needs of their data consumers and can help them improve the quality of their published data. It also enhances trust by showing consumers that the publisher cares about addressing their needs. Specifying a clear feedback mechanism removes the barrier of having to search for a way to provide feedback. Approach to implementation: Provide data consumers with one or more feedback mechanisms including, but not limited to, a contact form, point and click data quality rating buttons, or a comment box. In order to make the most of feedback received from consumers, it's a good idea to collect the feedback with a tracking system that captures each item in a database, enabling quantification and analysis. It is also a good idea to capture the type of each item of feedback, i.e., its motivation (editing, classifying [rating], commenting or questioning), so that each item can be expressed using the Dataset Usage Vocabulary [VOCAB-DUV]. How to test: Check that at least one feedback mechanism is provided and readily discoverable by data consumers. Implementation: https://github.com/w3c/dwbp/commit/c9d24d64701564ad40d74907d86e4eb22414a178 | |
66 | Feedback | Regarding the BP 32: Make feedback available.
The Why should mention avoiding duplication and being transparent about the quality of the data. The intended outcome is tautological. It should include the idea that consumers should be able to review issues already raised by others, saving them the trouble of filing duplicate bug reports. Publishing feedback also helps consumers understand any issues that may affect their ability to use the data. The implementation section needs to be changed. We should not be telling people that they need to present their feedback in machine readable form. The test is again about metadata for the feedback as a dataset. Publishing your feedback as a dataset is not a best practice. |
Annette | Resolved: Bernadette will update the BP according to Annette's proposal.
Implementation: https://github.com/w3c/dwbp/commit/e25d3263e24ba5e5ebd0202abf6a07c6884f1db4 | |
67 | Data Enrichment | Regarding the BP 33: Enrich data by generating new data
The Why needs a few caveats. "Under some circumstances, missing values can be filled in, and ..." "Publishing more complete datasets can enhance trust, if done properly and ethically." In the intended outcome, "should be enhanced if possible" is too strong. The first paragraph could be "Data that is unstructured should be given structure if possible. Additional derived measures or attributes should be added if they enhance utility. A dataset that has missing values can be enhanced to fill in those values if the addition does not distort analytical results, significance, or statistical power." |
Annette | Resolved: Update the BP as follows:
Why: Enrichment can greatly enhance processability, particularly for unstructured data. Under some circumstances, missing values can be filled in, and new attributes and measures can be added. Publishing more complete datasets can enhance trust, if done properly and ethically. Deriving additional values that are of general utility saves users time and encourages more kinds of reuse. There are many intelligent techniques that can be used to enrich data, making the dataset an even more valuable asset. Intended Outcome: Data that is unstructured should be given structure if possible. In structured data, missing values should be added if they enhance utility, but only if the addition does not distort analytical results, significance, or statistical power. Values generated by inference-based techniques should be labeled as such, and it should be possible to retrieve any original values replaced by enrichment. Whenever licensing permits, the code used to enrich the data should be made available along with the dataset. Implementation: https://github.com/w3c/dwbp/commit/66cf069a3b4d905aa79d9c50abd610898b4283bf | |
68 | Glossary | The definition of locale needs to mention geographic location.
The definition of machine readable data surprises me. I think proprietary formats are machine readable, too. If we want to steer people away from proprietary formats, we should do that explicitly. |
Annette | Resolved: Update the definition of locale as follows:
A set of parameters that clarifies aspects of the data that may be interpreted differently in different geographic locations, such as language and formatting used for numeric values or dates. Implementation: https://github.com/w3c/dwbp/commit/defff64772d0ee99c03589bb397d7747eb83485c | |
69 | licenses | We say "Data license information can be provided as a link to a human-readable license or as a link/embedded machine-readable license." Since licensing info is part of metadata, and we tell people to provide metadata for both humans and machines, we should also require licensing info for both humans and machines. | Annette | Updated according to Annette's proposal
Implementation: https://github.com/w3c/dwbp/pull/388/commits/2508c71de31a9010c8157a5d1b3079d9102c5bd1 | |
70 | machine-readable standardized data formats | In the possible approach to implementation for BP13, could we change NetCDF to HDF5? HDF5 is more general. NetCDF is based on HDF5, so using the latter covers both. | Annette | Resolved: to do the update proposed by Annette.
Implementation: https://github.com/w3c/dwbp/commit/46ac79a2b88a2f0021274214b8e378488cdbfdc5 | |
71 | Context | "Data is published in different distributions, which is a specific physical form of a dataset." should/can be replaced by "Data is published in different distributions, which are specific physical form of a dataset." | Riccardo Albertoni | Resolved: the phrase in Context section "Data is published in different distributions, which is a specific physical form of a dataset." will be replaced by "Data is published in different distributions, which are specific physical form of a dataset.".
Implementation: https://github.com/w3c/dwbp/commit/579a8a5e9bdb8e8a1b7479435e91737186120b1a | |
72 | Provide Metadata | how to test?
About the sentence "Check if all provided metadata are coherent with the described resource." I am not sure I understand what kind of coherence we are referring to. Perhaps we should specify it. Otherwise, I would opt for suggesting "Check if human readable metadata is available" |
Riccardo Albertoni | Resolved: The phrase in the BP Provide Metadata in the how to test "Check if all provided metadata are coherent with the described resource." will be replaced by "Check if human readable metadata is available"
Implemented: http://w3c.github.io/dwbp/bp.html#ProvideMetadata | |
73 | Provide descriptive Metadata | how to test?
About the sentence "Check if the descriptive metadata is available in a valid machine-readable" I would add "description" or format at the end of it. |
Riccardo Albertoni | Resolved: in the how to test of the BP Provide descriptive Metadata, the word "format" will be added at the end of the sentence, giving "Check if the descriptive metadata is available in a valid machine-readable format"
Implementation: https://github.com/w3c/dwbp/commit/f2f2eabba300ed5f07bd21c173b8b7d21b20c25c | |
74 | locale parameters | how to test?
there is an extra ")" at the end of the first sentence. |
Riccardo Albertoni | Resolved: in the how to test of Locale Parameters, the extra ")" at the end of the first sentence will be removed.
Implementation: https://github.com/w3c/dwbp/commit/62b32187b82b7b6f0fb0881707db927a7d6f4130 | |
75 | locale parameters | Use machine-readable standardized data formats: Possible Approach to implementation
"Make data available in a machine readable standardized data format that is easily parseable including but not limited to CSV, XML, Turtle, NetCDF, JSON and RDF." RDF is more a data model than a data format; it can be serialized in different serialization syntaxes such as turtle, JSON-LD and RDF/XML. I would replace the sentence above with "Make data available in a machine readable standardized data format that is easily parseable including but not limited to CSV, XML, Turtle, NetCDF, JSON and RDF/XML." or "Make data available in a machine readable standardized data format that is easily parseable including but not limited to CSV, XML, NetCDF, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD, turtle." |
Riccardo Albertoni | Resolved: In the Possible Approach to implementation of the BP Use machine-readable standardized data formats the phrase "Make data available in a machine readable standardized data format that is easily parseable including but not limited to CSV, XML, Turtle, NetCDF, JSON and RDF." will be replaced by "Make data available in a machine readable standardized data format that is easily parseable including but not limited to CSV, XML, NetCDF, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD, turtle."
Implementation: https://github.com/w3c/dwbp/commit/c888f0c9db2fd27b56c8037df9ea9781b2bdc29e | |
76 | subsets | How to test should say something about all the subsets adding up to the complete set. Didn't we have a test before that the entire dataset can be recovered by making a series of smaller requests? I think we had a note that coming up with use cases isn't deterministic enough. | Annette | Implementation:
https://github.com/w3c/dwbp/commit/5879a077989310a409145bf476af57ff0a121342 https://github.com/w3c/dwbp/commit/5fbf772df2f8f8e863afd6cb4ec98bc68a935316 | |
77 | identifiers |
The example is rather redundant. It is data.mycity..., and yet /dataset also appears in the path. The path also contains /bus as well as /bus-stops. It's unlikely that the agency has so many transit modes that they need to be split between road and rail and water. The same info is conveyed as well by the much shorter http://data.mycitytransit.example.org/bus/stops. I think we could go with something like this: http://data.mycity.example.org/transport/ is a base for all the example URIs. Probably a real link would need to identify the dataset somehow rather than just say that it's a dataset. What do you think about this? http://data.mycity.example.org/transport/timetables/bus/stops/ |
Implementation:
https://github.com/w3c/dwbp/commit/152af52fda80a0ac43faf80acf806c9d9c4670ef https://github.com/w3c/dwbp/commit/50b9a9c47d05adf0e26bc34cc4e6429d20526b2f | ||
78 | data vocabularies | The first paragraph seems to be suggesting that controlled vocabularies enable easy translation, but it's confusingly phrased. The last three sentences could be changed to read "Standardized vocabularies can also serve to improve the usability of datasets. Say a dataset contains a reference to a concept described in a vocabulary that has been translated into several languages. Such a reference allows applications to localize their display of the data depending on the language of the user."
The last paragraph refers to "the former kind of vocabulary". It's not clear what kind that is. It's not clear what the point of that paragraph is. |
Annette | Implementation:
First paragraph was removed. https://github.com/w3c/dwbp/commit/558e8d4c4555a65268fc6ac6b8d0c9c13a74bc93 | |
79 | multiple formats | The example says John decided to use XML, but it shows ttl, and it shows
metadata, not data. The trend lately is toward doing a single format (json). Do we want to go against that trend? I note that the W3C's own API is json only. |
Annette | Implementation:
https://github.com/w3c/dwbp/commit/b85fa424e9c0ca1f7c795a7f367652279e36ecfc | |
80 | standardized formats | "machine-readable" is used differently here than in the metadata section. Technically, nothing on the web is not machine-readable. I think we could remove that phrase.
"adequate for its intended or potential use" doesn't really help in choosing. That's like saying "data on the web must be good." The intended outcome should be more normative. "Data should be available in a standardized format that is easily parseable" belongs in the intended outcome. We could add that data should not be posted as an image unless the data itself encodes the image. (A jpeg file of a table is an image that encodes the data; RGB channel data from an imaging microscope is data that encodes an image.) The example is metadata, not data. This BP is about formats for the data. |
Annette | We made some updates on the examples and the intended outcome and the why sections were rewritten.
Implementation: https://github.com/w3c/dwbp/pull/386/commits/f033ee73eb869ba70fddc07001d2e4703aff5b0c https://github.com/w3c/dwbp/pull/386/commits/b52e383f7fd61076c15244522105623e2b4f259f https://github.com/w3c/dwbp/commit/f3ed1fe24b3bb8835ae062c06aa0454ae0c9f5c5 https://github.com/w3c/dwbp/commit/e0c2f70d52674d700d80c16cce1de6acc6bf9157 | |
81 | Sensitive data | the discussion of sensitive data still needs a disclaimer, and the text
should be more general rather than focused only on personal privacy. |
Annette | The sensitive data section was removed, a BP about data unavailability was included in the Data Access section, and text about sensitive data was included in the introduction and data enrichment section.
Implementation: https://github.com/w3c/dwbp/commit/dd752baa9227b587f981c71779fcc3a60206273b
| |
82 | Data Access | Data Access Introduction: We say the web uses http by default and then say that different
approaches can be adopted, including bulk download and APIs. Bulk download and APIs, of tar files or anything else, both use HTTP! Next there is discussion of packaging in bulk using non-proprietary file formats (e.g., tar files). This has nothing to do with being nonproprietary. The point, I think, is archiving a directory structure into a single file. Paragraph 3 is tautological. Data that is already streaming to the web is already published in a manner that allows immediate access. I think we mean to say "For data that is generated in real time or near real time, data publishers should use an automated system to enable immediate access to time-sensitive data, such as emergency information, weather forecasting data, or system monitoring metrics. In general, APIs should be available to allow third parties to automatically search and retrieve such data." If you want to then talk a little about APIs for other kinds of data, you could add a paragraph that goes like this: "Aside from helping to automate real-time data pipelines, APIs are suitable for all kinds of data on the Web. Though they generally require more work than posting files for download, publishers are increasingly finding that delivering a well documented, standards-based, stable API is worth the effort." |
Annette | Implementation: https://github.com/w3c/dwbp/pull/387/commits/b3c0bb30ffa66a5afc20e5d2aaa5ef9b756d1439
| |
84 | web standard APIs | We should have some references for REST.
Richardson, L. and Sam Ruby, RESTful Web Services, O'Reilly, 2007, http://restfulwebapis.org/rws.html. Fielding, Roy T., "Representational State Transfer (REST)", Chapter 5 of Architectural Styles and the Design of Network-based Software Architectures, Ph.D. Dissertation, University of California, Irvine, 2000, https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm. |
Annette | Implementation: https://github.com/w3c/dwbp/commit/4a7e0e00f8afdef9d68f000ba9f9c6adedc2a66b | |
85 | avoid breaking APIs | In the implementation section, "home resource URI" gets used as a
plural, but an API should only have one. Remove "home" in the first one and it makes more sense. "...by keeping resource URIs constant..." The bit about announcing changes should go in the outcome section. |
Annette |
Implementation: https://github.com/w3c/dwbp/commit/8ff10821cfe75f52bc084f155bf312c9bb35ab78 | |
86 | cite source | The first line of the example ("You can cite the original...") should
replace the text above it ("You can use the Dataset Usage...") The example citation should list the transit agency as the author. 'Data source: MyCity Transport Agency, "Bus Timetable of MyCity...' |
Annette |
Implementation: https://github.com/w3c/dwbp/commit/7ee799d6abbf7ba53616aa6c73a0d3f9cee572c6 | |
87 | challenges | In the diagram, the challenge texts should be similar, either all
statements or all questions. Suggestion for the reuse one: "How can I reuse responsibly?" (The current question sounds a little too self-serving.) |
Annette | Implementation: https://github.com/w3c/dwbp/commit/1aa0c50ab36b085006337ecc0629dafbab8b0d41 |