IRC log of dpub-arch on 2016-02-18
Timestamps are in UTC.
- 18:00:52 [RRSAgent]
- RRSAgent has joined #dpub-arch
- 18:00:52 [RRSAgent]
- logging to http://www.w3.org/2016/02/18-dpub-arch-irc
- 18:01:10 [dkaplan3]
- dkaplan3 has joined #dpub-arch
- 18:01:14 [liam]
- liam has joined #dpub-arch
- 18:01:18 [dkaplan3]
- Present+ Deborah_Kaplan
- 18:01:26 [liam]
- Present+ Liam_Quin
- 18:01:29 [Bill_Kasdorf]
- present+ Bill_Kasdorf
- 18:01:30 [astein]
- astein has joined #dpub-arch
- 18:01:38 [Zakim]
- Zakim has joined #dpub-arch
- 18:01:53 [TimCole]
- rrsagent, set log public
- 18:02:05 [TimCole]
- Meeting: DPUB Archival TF
- 18:03:09 [TimCole]
- Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Feb/0104.html
- 18:04:41 [TimCole]
- scribenick: astein
- 18:04:46 [astein]
- astein is giving scribing a try
- 18:05:18 [astein]
- TimCole: am I forgetting anything about scribing process that I should tell Ayla?
- 18:05:35 [lrosenth]
- lrosenth has joined #dpub-arch
- 18:06:00 [astein]
- dkaplan3: you say the person's name, what they say, and '...' if they continue
- 18:06:09 [TimCole]
- https://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm?content-type=text/html#Quick_Start_Guide
- 18:06:23 [astein]
- dkaplan: type '??' if you don't know what they say
- 18:06:29 [TimCole]
- Present+ TimCole
- 18:06:45 [astein]
- Present+ astein
- 18:06:51 [lrosenth]
- present+ Leonard
- 18:07:12 [TimCole]
- Topic: Prior Minutes
- 18:07:16 [astein]
- TimCole: first up, minutes
- 18:07:25 [TimCole]
- https://www.w3.org/2016/02/04-dpub-arch-minutes.html
- 18:07:41 [astein]
- TimCole: which are located at the link posted in IRC
- 18:08:19 [astein]
- TimCole: Leonard and Deborah had an email chat about the minutes
- 18:08:28 [astein]
- dkaplan: the minutes were correct, there was a misreading
- 18:08:48 [astein]
- the minutes were correct
- 18:08:57 [astein]
- TimCole: minutes were accepted
- 18:09:08 [astein]
- TimCole: Let's talk about meeting time
- 18:09:55 [astein]
- TimCole: ...which is how we ended up with today's time, but there weren't any times that were good for everyone. This time is kind of hard
- 18:10:17 [astein]
- someone: why do you think this is Europe unfriendly, early evening seems the perfect time
- 18:10:41 [lrosenth]
- no preference
- 18:10:42 [astein]
- Leonard actually, not someone
- 18:10:50 [astein]
- q+
- 18:11:37 [astein]
- dkaplan: small preference for Tuesday
- 18:11:43 [astein]
- astein: preference for Thursday
- 18:11:53 [astein]
- TimCole: 1st and 3rd Thursdays of each month through May
- 18:12:01 [astein]
- TimCole: Let's see if we can get done by then
- 18:12:25 [TimCole]
- Proposed Resolution TF will meet 1st and 3rd Thursdays at 1 PM Eastern (US) through end of May
- 18:12:26 [astein]
- TimCole: typing something in here...
- 18:12:55 [astein]
- TimCole: Resolution: TF will meet 1st and 3rd Thursdays at 1 PM Eastern (US) through end of May
- 18:13:07 [astein]
- Resolution: TF will meet 1st and 3rd Thursdays at 1 PM Eastern (US) through end of May
- 18:13:42 [TimCole]
- Topic: Portico
- 18:14:02 [astein]
- TimCole: Let me talk about my conversation with a couple people from Portico
- 18:14:19 [astein]
- Leonard: What is Portico?
- 18:14:36 [TimCole]
- http://www.portico.org/digital-preservation/
- 18:15:21 [astein]
- Bill_Kasdorf: Portico is one of the biggest dark archives in scholarly publishing...so if a publisher goes out of business, that's a triggering event, which will provide access to all the libraries that subscribed to it so they'll never lose access to it
- 18:15:44 [TimCole]
- http://www.ithaka.org/
- 18:15:48 [astein]
- Bill_Kasdorf: in the scholarly publishing world, the other big one is LOCKSS/CLOCKSS
- 18:16:21 [astein]
- TimCole: gave link to parent organization, ithaka, which also runs JSTOR
- 18:16:55 [astein]
- TimCole: skyped with Amy and Sheila about what we're doing and what they're doing that we might want to know about/keep track of
- 18:17:04 [TimCole]
- http://www.portico.org/digital-preservation/wp-content/uploads/2013/08/Porticopublishersbrochure.pdf
- 18:17:09 [dkaplan3]
- q+ after tim is done with this summary
- 18:17:23 [TimCole]
- http://www.portico.org/digital-preservation/services
- 18:17:25 [astein]
- TimCole: http://www.portico.org/digital-preservation/wp-content/uploads/2013/08/Porticopublishersbrochure.pdf for a brochure of what they do
- 18:17:25 [dkaplan3]
- q+ to hold until after tim is done with this summary
- 18:17:34 [astein]
- oh I am not on the queue
- 18:17:54 [TimCole]
- q?
- 18:17:57 [astein]
- thanks!
- 18:17:59 [lrosenth]
- use q- to remove yourself
- 18:18:00 [TimCole]
- ack astein
- 18:18:50 [astein]
- TimCole: As Bill said, they get originals from the publishers. What they get from the publishers varies quite a bit. Typically they get master files, what is actually published. Sometimes they get renditions
- 18:19:18 [astein]
- ... Sometimes what they get is publications in two and a half formats, XML, etc.
- 18:19:48 [astein]
- TimCole: sometimes they get just a zip with a bunch of folders, with one folder containing XML, one folder containing PDF, etc.
- 18:20:24 [astein]
- TimCole: for each publisher, Portico creates a profile, so they can normalize what they get
- 18:20:55 [astein]
- TimCole: They normalize against standards like JATS, which was created by the National Library of Medicine. They're starting to work with EPUB
- 18:21:33 [astein]
- TimCole: as Bill said, they're a dark archive...they do try to get a PDF, or XML transformed into HTML
- 18:22:19 [astein]
- TimCole: They try to render what they're given. They just started looking into EPUB. They're hoping that what they get in EPUB they won't have to normalize. They're very interested in what the EPUB group is talking about...
- 18:22:55 [astein]
- TimCole: The other thing they struggle with a little bit is metadata. Those of us who have been involved with EPUB...they have similar discussions
- 18:23:48 [astein]
- TimCole: They extract information from their archive and create simple Dublin Core metadata...this could allow a very simple discovery layer if the publisher goes away. They do try to have an HTML display of that metadata
- 18:24:21 [astein]
- TimCole: graphics are an interesting thing they deal with. They usually get them in several different resolutions, including thumbnails and high res which they save
- 18:25:04 [astein]
- TimCole: ...they do run into some issues with older PDFs breaking. ...they're trying to make sure that they pay attention to PDF so they can migrate to newer versions of PDF or PDF/A
- 18:25:19 [astein]
- TimCole: as of right now they don't automatically transform everything into PDF/A
- 18:25:22 [astein]
- ...
- 18:25:32 [dkaplan3]
- JHOVE
- 18:25:46 [lrosenth]
- JHOVE - blech :(. (poorly implemented and unsupported)
- 18:25:49 [astein]
- TimCole: They're doing some work with JHOVE, which is a service that identifies file formats
- 18:25:49 [dkaplan3]
- http://jhove.openpreservation.org/
- 18:26:22 [astein]
- TimCole: They're doing some work with interoperability of file formats, content, and metadata that publishers use
- 18:26:34 [astein]
- TimCole: they wish that publishers would try to use RDF more
- 18:26:52 [astein]
- TimCole: I learned a lot of things, including a few that surprised me,???
- 18:26:53 [TimCole]
- q?
- 18:26:59 [TimCole]
- ack dkaplan
- 18:26:59 [Zakim]
- dkaplan, you wanted to hold until after tim is done with this summary
- 18:27:20 [astein]
- dkaplan: this is a good place for me to jump in that I wanted to say and everything about what you said they did
- 18:27:31 [lrosenth]
- dkaplan - I know…doesn’t make it better
- 18:27:55 [astein]
- dkaplan: there's another thing that's vital. Ultimately, the job of the archivist is that you're going to give them some stuff and they're going to figure out what to do with it
- 18:28:03 [Bill_Kasdorf]
- q+
- 18:28:14 [astein]
- dkaplan: ...we want to make certain assumptions about fixity, workflows, PREMIS....
- 18:28:37 [astein]
- dkaplan: we want to say that PWPs can be described in a certain way, but they'll take what we give them
- 18:29:11 [astein]
- dkaplan: it would be great if we could say, if we put a bunch of metadata in the manifest, what could it be that you could extract, etc
- 18:29:59 [astein]
- dkaplan: these places are taking disparate and undescribed datasets...they're taking everything...JHOVE looks at file formats and says 'ARG this file format isn't going to be supported soon, you should do something about it'
- 18:30:26 [astein]
- dkaplan: ...what kinds of things would you extract from our manifest if we could put stuff in the manifest?
- 18:31:14 [TimCole]
- ack bill
- 18:31:27 [astein]
- TimCole: ...Yes they'll take content, but sometimes they'll do bit-level preservation as long as there is software to use, so you can get it back. Other times, they very actively transform file formats, etc. They prefer file formats that are easy to read
- 18:31:40 [lrosenth]
- q+
- 18:31:51 [astein]
- Bill_Kasdorf: +1 to everything Tim and Deborah said. ....
- 18:32:26 [astein]
- Bill_Kasdorf: Is PWP a format that publishers could easily provide to these archivists or is it something that archivists could transform?
- 18:33:18 [astein]
- Bill_Kasdorf: Portico's strategy was always to normalize things so they have a master format. They focused on scholarly journals that pretty much all used the same format. Used to be ??? and is now JATS. Books are bits?
- 18:33:47 [astein]
- Bill_Kasdorf: they started out with the big publishers, Springer, Elsevier, who all have great workflows
- 18:34:16 [astein]
- Bill_Kasdorf: but as they started to deal with smaller publishers, books, different kinds of content, this whole normalization plan starts to shake
- 18:34:45 [astein]
- Bill_Kasdorf: I was really interested to see that they said they're interested in EPUB and PWP
- 18:35:34 [astein]
- Bill_Kasdorf: ...ideally it would be great to see PWP be something that both the providers and the recipients of archival content could agree on...that would take enormous tension out of the process
- 18:36:14 [astein]
- TimCole: they wouldn't reduce their reliance on JATS, they'd take a PWP or EPUB publication as an additional format that they could archive
- 18:36:55 [astein]
- TimCole: But I don't think they were suggesting that they would stop receiving content that would be normalized into JATS...it depends on what they get
- 18:37:22 [TimCole]
- q?
- 18:37:22 [astein]
- TimCole: They don't have any content in EPUB yet. It might be something they can use internally
- 18:37:31 [TimCole]
- ack lros
- 18:37:54 [astein]
- leonard: Bill said something...'what are we trying to achieve in this group'
- 18:38:10 [astein]
- lrosenth: I sent out that info about PDF/A...
- 18:38:33 [dkaplan3]
- q+
- 18:38:45 [astein]
- lrosenth: there are still concepts that we do know about...we could talk about best practices for creating an archival document that will withstand the test of time
- 18:39:08 [astein]
- lrosenth: we look at the open web platform...
- 18:40:01 [astein]
- TimCole: Marcus and someone had an idea to bring up to the bigger group...???
- 18:40:10 [astein]
- lrosenth: what is metadata and what is it not?
- 18:40:17 [TimCole]
- ack dka
- 18:40:20 [astein]
- lrosenth: identifying use cases is perfect
- 18:40:47 [astein]
- dkaplan: I like the idea of one of our deliverables being best practices, considering PWP won't be done yet
- 18:40:58 [astein]
- dkaplan: I think use cases is another good deliverable
- 18:41:41 [astein]
- dkaplan: with outreach to as many organizations as possible, ask them 'in an ideal world, what are the things that they would want to extract from the PWP. What would they want to extract from PWP/A'
- 18:42:07 [lrosenth]
- q+
- 18:42:19 [astein]
- dkaplan: for those not in the library world, PREMIS is a data dictionary that is used to describe events on an object and who performed the action
- 18:42:47 [astein]
- dkaplan: maybe we would decide that a PWP/A would have some sort of space where you could put something like PREMIS or maybe not...
- 18:43:22 [astein]
- dkaplan: that would be decided by talking to these different organizations. Which of these are more related to PDF/A and which aren't?
- 18:43:22 [Bill_Kasdorf]
- q+
- 18:43:23 [astein]
- :P
- 18:43:42 [TimCole]
- ack lros
- 18:43:56 [astein]
- dkaplan: we could make a recommendation for the minimum that every manifest should have in a PWP
- 18:44:27 [astein]
- lrosenth: the idea of an audit trail is actually described in PWP since the beginning for the same reasons. Even though it was specified it was never used
- 18:44:38 [TimCole]
- ack bill
- 18:45:08 [astein]
- Bill_Kasdorf: this has been an excellent discussion. Two fundamental concerns for people who are archiving are versioning and migration
- 18:45:32 [astein]
- Bill_Kasdorf: one of the biggest problems is: this works now, how can I make sure it will work tomorrow?
- 18:45:57 [astein]
- Bill_Kasdorf: some sort of info about if you do it this way you will be able to get to it in the future.
- 18:46:13 [TimCole]
- Topic: Use Cases
- 18:46:24 [astein]
- ha!
- 18:46:55 [dkaplan3]
- q+
- 18:47:04 [TimCole]
- ack dka
- 18:47:08 [astein]
- TimCole: UseCases - how do we get started on our use case document - wiki, github, etc?
- 18:47:32 [astein]
- dkaplan: I would recommend either wiki or github because they're a little bit easier than email for multiple authors
- 18:47:33 [TimCole]
- github?
- 18:47:42 [dkaplan3]
- +0
- 18:47:44 [lrosenth]
- 0 (no preference - either is fine)
- 18:47:47 [TimCole]
- +1
- 18:47:47 [liam]
- 0
- 18:47:48 [Bill_Kasdorf]
- 0
- 18:47:55 [astein]
- dkaplan: I think we should use whichever one has the fewest 'not this' votes
- 18:47:58 [astein]
- +1 github
- 18:48:18 [astein]
- TimCole: I should talk to Ivan to make sure I set up the github page correctly
- 18:49:04 [astein]
- *debate between github or wiki*
- 18:49:23 [astein]
- TimCole: advantage to github is that you can create issues
- 18:49:43 [astein]
- Bill_Kasdorf: I just used the wrong voting indicator!
- 18:49:51 [astein]
- TimCole: we'll try github to see how it works
- 18:50:30 [astein]
- TimCole: we're talking about use cases that are driving recommendations or best practices for preserving PWP documents
- 18:50:43 [astein]
- TimCole: Does that definition work?
- 18:50:50 [TimCole]
- Topic: Who's doing what over the next 2 weeks
- 18:51:24 [astein]
- TimCole: we have our task force page, and we mentioned on it that we need to reach out to LOCKSS/CLOCKSS, Portico, NISO
- 18:51:40 [astein]
- TimCole: Who is going to do what? I volunteered to contact NISO and Portico
- 18:51:48 [astein]
- TimCole: ???
- 18:52:20 [astein]
- TimCole: do initial outreach, either an email or phone call, and see if they can join one of our calls
- 18:52:42 [astein]
- lrosenth: what about NARA or Library of Congress?
- 18:52:54 [astein]
- lrosenth: I'll be happy to reach out to them
- 18:53:13 [astein]
- Bill_Kasdorf: I could offer similar contacts at the British Library and the KB (Dutch National Library)
- 18:53:51 [astein]
- dkaplan: I'm trying to reach out through extended contacts to the National Library of Australia but I don't actually have contacts
- 18:53:58 [astein]
- Bill_Kasdorf: I could probably help with it
- 18:54:23 [astein]
- Bill_Kasdorf: DPLA or Europeana, actually I'll retract that
- 18:55:00 [astein]
- dkaplan: that being said, the DPLA is right in my backyard. They're really easy to reach out to. I can reach out to Mark Matienzo
- 18:55:10 [astein]
- dkaplan: he's worked on more than DPLA
- 18:55:25 [astein]
- Bill_Kasdorf: Boston Public has a big project with special collections
- 18:55:53 [astein]
- TimCole: do we have anyone interested in talking to LOCKSS/CLOCKSS
- 18:56:19 [astein]
- Bill_Kasdorf: I don't have a lot of bandwidth but I could get Vicki from LOCKSS/CLOCKSS
- 18:56:43 [astein]
- TimCole: could introduce Ayla to CLOCKSS/LOCKSS
- 18:57:01 [astein]
- TimCole: Ayla and Tim will contact Chris Prom and Bill Ingram at UIUC
- 18:57:26 [astein]
- TimCole: add your name and who you're talking to on the wiki. Reach out to contacts for our next call
- 18:57:38 [astein]
- TimCole: maybe we can talk a little more about timeline as well
- 18:58:23 [astein]
- lrosenth: can contact the GPO
- 18:58:43 [astein]
- dkaplan: I can talk to a lot of people who are very active in government depository libraries
- 18:59:44 [astein]
- TimCole: we're going to stay on Thursdays at 1PM EST
- 18:59:57 [astein]
- Bill_Kasdorf: will you send a calendar invite?
- 19:00:00 [astein]
- TimCole: yes
- 19:00:26 [TimCole]
- rssagent, draft minutes
- 19:00:47 [astein]
- thanks for doing that, I wasn't sure if I should
- 19:00:55 [astein]
- I have to run to another meeting!
- 19:02:06 [TimCole]
- RRSAgent, draft minutes
- 19:02:06 [RRSAgent]
- I have made the request to generate http://www.w3.org/2016/02/18-dpub-arch-minutes.html TimCole
- 19:02:50 [TimCole]
- RRSAgent, set log public
- 21:08:45 [Zakim]
- Zakim has left #dpub-arch
- 21:12:12 [liam]
- liam has left #dpub-arch