IRC log of dpub-arch on 2016-02-18

Timestamps are in UTC.

18:00:52 [RRSAgent]
RRSAgent has joined #dpub-arch
18:00:52 [RRSAgent]
logging to http://www.w3.org/2016/02/18-dpub-arch-irc
18:01:10 [dkaplan3]
dkaplan3 has joined #dpub-arch
18:01:14 [liam]
liam has joined #dpub-arch
18:01:18 [dkaplan3]
Present+ Deborah_Kaplan
18:01:26 [liam]
Present+ Liam_Quin
18:01:29 [Bill_Kasdorf]
present+ Bill_Kasdorf
18:01:30 [astein]
astein has joined #dpub-arch
18:01:38 [Zakim]
Zakim has joined #dpub-arch
18:01:53 [TimCole]
rrsagent, set log public
18:02:05 [TimCole]
Meeting: DPUB Archival TF
18:03:09 [TimCole]
Agenda: https://lists.w3.org/Archives/Public/public-digipub-ig/2016Feb/0104.html
18:04:41 [TimCole]
scribenick: astein
18:04:46 [astein]
astein is giving scribing a try
18:05:18 [astein]
TimCole: am I forgetting anything about scribing process that I should tell Ayla?
18:05:35 [lrosenth]
lrosenth has joined #dpub-arch
18:06:00 [astein]
dkaplan3: you say the person's name, what they say, and '...' if they continue
18:06:09 [TimCole]
https://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm?content-type=text/html#Quick_Start_Guide
18:06:23 [astein]
dkaplan: type '??' if you don't know what they say
18:06:29 [TimCole]
Present+ TimCole
18:06:45 [astein]
Present+ astein
18:06:51 [lrosenth]
present+ Leonard
18:07:12 [TimCole]
Topic: Prior Minutes
18:07:16 [astein]
TimCole: first up, minutes
18:07:25 [TimCole]
https://www.w3.org/2016/02/04-dpub-arch-minutes.html
18:07:41 [astein]
TimCole: which are located at the link posted in IRC
18:08:19 [astein]
TimCole: Leonard and Deborah had an email chat about the minutes
18:08:28 [astein]
dkaplan: the minutes were correct, there was a misreading
18:08:57 [astein]
TimCole: minutes were accepted
18:09:08 [astein]
TimCole: Let's talk about meeting time
18:09:55 [astein]
TimCole: ...which is how we ended up with today's time, but there weren't any times that were good for everyone. This time is kind of hard
18:10:17 [astein]
Leonard: why do you think this is Europe unfriendly? Early evening seems the perfect time
18:10:41 [lrosenth]
no preference
18:10:50 [astein]
q+
18:11:37 [astein]
dkaplan: small preference for Tuesday
18:11:43 [astein]
astein: preference for Thursday
18:11:53 [astein]
TimCole: 1st and 3rd Thursdays of each month through May
18:12:01 [astein]
TimCole: Let's see if we can get done by then
18:12:25 [TimCole]
Proposed Resolution: TF will meet 1st and 3rd Thursdays at 1 PM Eastern (US) through end of May
18:12:26 [astein]
TimCole: typing something in here...
18:12:55 [astein]
TimCole: Resolution: TF will meet 1st and 3rd Thursdays at 1 PM Eastern (US) through end of May
18:13:07 [astein]
Resolution: TF will meet 1st and 3rd Thursdays at 1 PM Eastern (US) through end of May
18:13:42 [TimCole]
Topic: Portico
18:14:02 [astein]
TimCole: Let me talk about my conversation with a couple people from Portico
18:14:19 [astein]
Leonard: What is Portico?
18:14:36 [TimCole]
http://www.portico.org/digital-preservation/
18:15:21 [astein]
Bill_Kasdorf: Portico is one of the biggest dark archives in scholarly publishing...so if a publisher goes out of business, that's a triggering event, which will provide access to all the libraries that subscribed to it so they'll never lose access to it
18:15:44 [TimCole]
http://www.ithaka.org/
18:15:48 [astein]
Bill_Kasdorf: in the scholarly publishing world, the other big one is LOCKSS/CLOCKSS
18:16:21 [astein]
TimCole: gave link to parent organization, Ithaka, which also runs JSTOR
18:16:55 [astein]
TimCole: skyped with Amy and Sheila about what we're doing and what they're doing that we might want to know about/keep track of
18:17:04 [TimCole]
http://www.portico.org/digital-preservation/wp-content/uploads/2013/08/Porticopublishersbrochure.pdf
18:17:09 [dkaplan3]
q+ after tim is done with this summary
18:17:23 [TimCole]
http://www.portico.org/digital-preservation/services
18:17:25 [astein]
TimCole: http://www.portico.org/digital-preservation/wp-content/uploads/2013/08/Porticopublishersbrochure.pdf for a brochure of what they do
18:17:25 [dkaplan3]
q+ to hold until after tim is done with this summary
18:17:34 [astein]
oh I am not on the queue
18:17:54 [TimCole]
q?
18:17:57 [astein]
thanks!
18:17:59 [lrosenth]
use q- to remove yourself
18:18:00 [TimCole]
ack astein
18:18:50 [astein]
TimCole: As Bill said, they get originals from the publishers. What they get from the publishers varies quite a bit. Typically they get master files, what is actually published. Sometimes they get renditions
18:19:18 [astein]
TimCole: They sometimes get publications in two and a half formats, XML, etc...
18:19:48 [astein]
TimCole: sometimes they get just a zip with a bunch of folders, with one folder containing XML, one folder containing PDF, etc.
18:20:24 [astein]
TimCole: for each publisher, Portico creates a profile, so they can normalize what they get
18:20:55 [astein]
TimCole: They normalize against standards like JATS, which was created by the National Library of Medicine. They're starting to work with EPUB
18:21:33 [astein]
TimCole: as Bill said, they're a dark archive...they do try to get a PDF, or XML transformed into HTML
18:22:19 [astein]
TimCole: They try to render what they're given. They just started looking into EPUB. They're hoping that what they get in EPUB they won't have to normalize. They're very interested in what the EPUB group is talking about...
18:22:55 [astein]
TimCole: The other thing they struggle with a little bit is metadata. Those of us who have been involved with EPUB...they have similar discussions
18:23:48 [astein]
TimCole: They extract information from their archive and create simple Dublin Core metadata...this could allow a very simple discovery layer if the publisher goes away. They do try to have an HTML display of that metadata
18:24:21 [astein]
TimCole: graphics are an interesting thing they deal with. They usually get them in several different resolutions, including thumbnails and high res which they save
18:25:04 [astein]
TimCole: ...they do run into some issues with older PDFs breaking. ...they're trying to make sure that they pay attention to PDF so they can migrate to newer versions of PDF or PDF/A
18:25:19 [astein]
TimCole: as of right now they don't automatically transform everything into PDF/A
18:25:22 [astein]
...
18:25:32 [dkaplan3]
JHOVE
18:25:46 [lrosenth]
JHOVE - blech :(. (poorly implemented and unsupported)
18:25:49 [astein]
TimCole: They're doing some work with JHOVE, which is a service that identifies file formats
18:25:49 [dkaplan3]
http://jhove.openpreservation.org/
18:26:22 [astein]
TimCole: They're doing some work with interoperability of file formats, content, and metadata that publishers use
18:26:34 [astein]
TimCole: they wish that publishers would try to use RDF more
18:26:52 [astein]
TimCole: I learned a lot of things, including a few that surprised me,???
18:26:53 [TimCole]
q?
18:26:59 [TimCole]
ack dkaplan
18:26:59 [Zakim]
dkaplan, you wanted to hold until after tim is done with this summary
18:27:20 [astein]
dkaplan: this is a good place for me to jump in that I wanted to say and everything about what you said they did
18:27:31 [lrosenth]
dkaplan - I know…doesn’t make it better
18:27:55 [astein]
dkaplan: there's another thing that's vital. Ultimately, the job of the archivist is that you're going to give them some stuff and they're going to figure out what to do with it
18:28:03 [Bill_Kasdorf]
q+
18:28:14 [astein]
dkaplan: ...we want to make certain assumptions about fixity, workflows, PREMIS....
18:28:37 [astein]
dkaplan: we want to say that PWPs can be described in a certain way, but they'll take what we give them
18:29:11 [astein]
dkaplan: it would be great if we could say, if we put a bunch of metadata in the manifest, what could it be that you could extract, etc
18:29:59 [astein]
dkaplan: these places are taking disparate and undescribed datasets...they're taking everything...JHOVE looks at file formats and says 'ARG this file format isn't going to be supported soon, you should do something about it'
18:30:26 [astein]
dkaplan: ...what kinds of things would you extract from our manifest if we could put stuff in the manifest?
18:31:14 [TimCole]
ack bill
18:31:27 [astein]
TimCole: ...Yes they'll take content, but sometimes they'll do bit-level preservation as long as there is software to use, so you can get it back. Other times, they very actively transform file formats, etc. They prefer file formats that are easy to read
18:31:40 [lrosenth]
q+
18:31:51 [astein]
Bill_Kasdorf: +1 to everything Tim and Deborah said. ....
18:32:26 [astein]
Bill_Kasdorf: Is PWP a format that publishers could easily provide to these archivists or is it something that archivists could transform?
18:33:18 [astein]
Bill_Kasdorf: Portico's strategy was always to normalize things so they have a master format. They focused on scholarly journals that pretty much all used the same format. Used to be ??? and is now JATS. Books are bits?
18:33:47 [astein]
Bill_Kasdorf: they started out with the big publishers, Springer, Elsevier, who all have great workflows
18:34:16 [astein]
Bill_Kasdorf: but as they started to deal with smaller publishers, books, different kinds of content, this whole normalization plan starts to shake
18:34:45 [astein]
Bill_Kasdorf: I was really interested to see that they said they're interested in EPUB and PWP
18:35:34 [astein]
Bill_Kasdorf: ...ideally it would be great to see PWP be something that both the providers and the recipients of archival content could agree on...that would take enormous tension out of the process
18:36:14 [astein]
TimCole: they wouldn't reduce their reliance on JATS, they'd take a PWP or EPUB publication as an additional format that they could archive
18:36:55 [astein]
TimCole: But I don't think they were suggesting that they would stop receiving content that would be normalized into JATS...it depends on what they get
18:37:22 [TimCole]
q?
18:37:22 [astein]
TimCole: They don't have any content in EPUB yet. It might be something they can use internally
18:37:31 [TimCole]
ack lros
18:37:54 [astein]
Leonard: Bill said something...'what are we trying to achieve in this group'
18:38:10 [astein]
lrosenth: I sent out that info about PDF/A...
18:38:33 [dkaplan3]
q+
18:38:45 [astein]
lrosenth: there are still concepts that we do know about...we could talk about best practices for creating an archival document that will withstand the test of time
18:39:08 [astein]
lrosenth: we look at the open web platform...
18:40:01 [astein]
TimCole: Marcus and someone had an idea to bring up to the bigger group...???
18:40:10 [astein]
lrosenth: what is metadata and what is it not?
18:40:17 [TimCole]
ack dka
18:40:20 [astein]
lrosenth: identifying use cases: perfect
18:40:47 [astein]
dkaplan: I like the idea of one of our deliverables being best practices, considering PWP won't be done yet
18:40:58 [astein]
dkaplan: I think use cases is another good deliverable
18:41:41 [astein]
dkaplan: with outreach to as many organizations as possible, ask them 'in an ideal world, what are the things that they would want to extract from the PWP? What would they want to extract from PWP/A?'
18:42:07 [lrosenth]
q+
18:42:19 [astein]
dkaplan: for those not in the library world, PREMIS is a data dictionary that is used to describe events on an object and who performed the action
18:42:47 [astein]
dkaplan: maybe we would decide that a PWP/A would have some sort of space where you could put something like PREMIS, or maybe not...
18:43:22 [astein]
dkaplan: that would be decided by talking to these different organizations. Which of these are more related to PDF/A and which aren't?
18:43:22 [Bill_Kasdorf]
q+
18:43:23 [astein]
:P
18:43:42 [TimCole]
ack lros
18:43:56 [astein]
dkaplan: we could make a recommendation for the minimum that every manifest should have in a PWP
18:44:27 [astein]
lrosenth: the idea of an audit trail is actually described in PWP since the beginning for the same reasons. Even though it was specified, it was never used
18:44:38 [TimCole]
ack bill
18:45:08 [astein]
Bill_Kasdorf: this has been an excellent discussion. 2 fundamental concerns for people who are archiving are versioning and migration
18:45:32 [astein]
Bill_Kasdorf: one of the biggest problems is: this works now, how can I make sure it will work tomorrow?
18:45:57 [astein]
Bill_Kasdorf: some sort of info about if you do it this way you will be able to get to it in the future.
18:46:13 [TimCole]
Topic: Use Cases
18:46:24 [astein]
ha!
18:46:55 [dkaplan3]
q+
18:47:04 [TimCole]
ack dka
18:47:08 [astein]
TimCole: UseCases - how do we get started on our use case document - wiki, github, etc?
18:47:32 [astein]
dkaplan: I would recommend either wiki or github because they're a little bit easier than email for multiple authors
18:47:33 [TimCole]
github?
18:47:42 [dkaplan3]
+0
18:47:44 [lrosenth]
0 (no preference - either is fine)
18:47:47 [TimCole]
+1
18:47:47 [liam]
0
18:47:48 [Bill_Kasdorf]
0
18:47:55 [astein]
dkaplan: I think we should use whichever one has the fewest 'not this' votes
18:47:58 [astein]
+1 github
18:48:18 [astein]
TimCole: I should talk to Ivan to make sure I set up the github page correctly
18:49:04 [astein]
*debate between github or wiki*
18:49:23 [astein]
TimCole: advantage to github is that you can create issues
18:49:43 [astein]
Bill_Kasdorf: I just used the wrong voting indicator!
18:49:51 [astein]
TimCole: we'll try github to see how it works
18:50:30 [astein]
TimCole: we're talking about use cases that are driving recommendations or best practices for preserving PWP documents
18:50:43 [astein]
TimCole: Does that definition work?
18:50:50 [TimCole]
Topic: Who's doing what over the next 2 weeks
18:51:24 [astein]
TimCole: we have our task force page, we mentioned on that that we need to reach out to LOCKSS/CLOCKSS, Portico, NISO
18:51:40 [astein]
TimCole: Who is going to do what? I volunteered to contact NISO and Portico
18:51:48 [astein]
TimCole: ???
18:52:20 [astein]
TimCole: do initial outreach, either an email or phone call, and see if they can join one of our calls
18:52:42 [astein]
lrosenth: what about NARA or Library of Congress?
18:52:54 [astein]
lrosenth: I'll be happy to reach out to them
18:53:13 [astein]
Bill_Kasdorf: I could offer similar contacts at the British Library and the KB (Dutch National Library)
18:53:51 [astein]
dkaplan: I'm trying to reach out through extended contacts to the National Library of Australia but I don't actually have contacts
18:53:58 [astein]
Bill_Kasdorf: I could probably help with it
18:54:23 [astein]
Bill_Kasdorf: DPLA or Europeana, actually I'll retract that
18:55:00 [astein]
dkaplan: that being said, the DPLA is right in my backyard. They're really easy to reach out to. I can reach out to Mark Matienzo
18:55:10 [astein]
dkaplan: he's worked on more than DPLA
18:55:25 [astein]
Bill_Kasdorf: Boston Public has a big project with special collections
18:55:53 [astein]
TimCole: do we have anyone interested in talking to LOCKSS/CLOCKSS
18:56:19 [astein]
Bill_Kasdorf: I don't have a lot of bandwidth but I could get Vicki from LOCKSS/CLOCKSS
18:56:43 [astein]
TimCole: could introduce Ayla to CLOCKSS/LOCKSS
18:57:01 [astein]
TimCole: Ayla and Tim will contact Chris Prom and Bill Ingram UIUC
18:57:26 [astein]
TimCole: add your name and who you're talking to to the wiki. Reach out to contacts for our next call
18:57:38 [astein]
TimCole: maybe we can talk a little more about timeline as well
18:58:23 [astein]
lrosenth: can contact the GPO
18:58:43 [astein]
dkaplan: I can talk to a lot of people who are very active in government depository libraries
18:59:44 [astein]
TimCole: we're going to stay on Thursdays at 1PM EST
18:59:57 [astein]
Bill_Kasdorf: will you send a calendar invite?
19:00:00 [astein]
TimCole: yes
19:00:26 [TimCole]
rssagent, draft minutes
19:00:47 [astein]
thanks for doing that, I wasn't sure if I should
19:00:55 [astein]
I have to run to another meeting!
19:02:06 [TimCole]
RRSAgent, draft minutes
19:02:06 [RRSAgent]
I have made the request to generate http://www.w3.org/2016/02/18-dpub-arch-minutes.html TimCole
19:02:50 [TimCole]
RRSAgent, set log public
21:08:45 [Zakim]
Zakim has left #dpub-arch
21:12:12 [liam]
liam has left #dpub-arch