About
Note
|
Abstract Currently, academic publishing is in flux, with several different narratives colliding. First, there is now a move toward increasing openness within science with open access, open data and open source. Open access means that academic articles are freely available for all to read, and is fast becoming a standard mechanism for publishing. I will describe the roots of this movement. Second, although the web has been around for a while, academic publishing has been largely untouched by it; meanwhile the web itself is increasingly being pushed as a mechanism for large-scale data integration as part of a linked-data environment. I will describe two pieces of my own work, knowledgeblog and greycite, which enable academics to publish natively to the web as linked data. I will also describe some of the emerging publishing models from elsewhere. Finally, I will consider ways in which we as a school might alter our current practice, in anticipation of these changes. If I remember, I will also explain why academic publishing is a Dutch tulip bulb. |
A Question
Who has an article in Lecture Notes in Computer Science?
Open Science
A broad definition would be:
-
Open Data
-
Open Source
-
Open Access
Note
|
Today I am going to talk about open access, because currently there are a lot of political changes happening here. But I want to talk about this more generically in the context of open science: the idea that, to quote a recent Royal Society report, Science is an open enterprise. As well as talking about the background in general, I am also going to talk about some of the work that we have been doing recently, looking at how the publication process might change, most of it enabled by or predicated on free access to the material. |
Some human interest
-
A biologist from UC Davis.
-
Wanted to collect his father’s (also a biologist) life work
-
But couldn’t.
Story by David Dobbs. Taken from: http://www.wired.com/wiredscience/2011/05/free-science-one-paper-at-a-time-2/
Note
|
Following a grand journalistic tradition, I thought I would start off with a human interest angle, with a story that I have borrowed from an article in Wired from last year. I do this both so I can warm your hearts, and so I can plug the fact that I got into the article as well. This is a picture of Jonathan Eisen. After his father’s untimely death in 1987, he wanted to commemorate his father’s work by collecting all his papers. But he couldn’t, because many of them were not available and most were behind paywalls. |
Harvard runs out of cash
-
"fiscally unsustainable"
-
"academically restrictive"
-
"online content from two providers have increased by about 145% over the past six years"
-
"exacerbated by […] publishers to acquire, bundle, and increase the pricing"
Taken from: http://www.guardian.co.uk/science/2012/apr/24/harvard-university-journal-publishers-prices
Note
|
This was followed up by this amusing story. These are quotes from a memo sent by the library at Harvard to its faculty. Yes, Harvard are running out of cash. The basic problem identified by Harvard library is two-fold. First, the publishers keep on increasing their prices. It’s actually hard to know for sure by how much, of course, because most publishers require an NDA for deals done with individual libraries. Second, there is bundling, also known as "big deals". This is the process whereby a library buys not one journal but many at a discount from a publisher; the problem is that libraries often find that the prices increase steadily over time. |
Open Access
-
Pioneered by BMC — an online journal
-
Originally took the bottom end of the market
-
Followed by PLoS
-
Originally aimed at the top end, and grant supported
Note
|
One potential solution to the problem is open access. This was pioneered by BioMedCentral from about 2002. This was an online publication house, with the end papers being free for everyone to read. They made their money by charging authors for publishing. Interestingly, this wasn’t seen as terribly novel because in biology, page charges were common anyway and often substantial (1000 a page for instance). They originally took the low end of the market, which is why, a couple of years later, PLoS was formed. It came about because a number of bioinformaticians were getting disgruntled at their inability to text mine articles; to continue with the human interest angle, one of these was Michael Eisen, brother of Jonathan mentioned earlier. Are your hearts all warm yet? Now, the first time that I saw Michael Eisen talk about this they had an interesting angle: they wanted to do nothing innovative. The publication process had to be as much like the existing one as possible because, they figured, one change at a time. Good idea. |
Open Access (10 years on)
-
BMC Bioinformatics is now high impact
-
PLoS has 6 main journals
-
And PLoS One
Note
|
So what has happened in the last 10 years? Well, first, open access has become accepted, particularly in some fields. BMC Bioinformatics, for example, is now a high impact journal. In biology, about 20% of papers are open access. PLoS has 6 main journals now (PLoS Biology is edited by Jonathan Eisen). And then there is PLoS One. It came later and, unlike the "main" PLoS journals, it is turning out to be revolutionary. |
PLoS One
-
Has peer-review
-
Judges on scientific rigour
-
Not on perceived importance
-
Now has impact factor 4.4
-
In 2010 > 6000 articles
-
the largest journal in the world
Note
|
PLoS One is online and open access. It charges for publication. It is peer-reviewed, and this peer-review judges the scientific rigour of the work. Here is the revolutionary part: it does not judge on the basis of the perceived importance of the work. Basically, it has removed itself from the last shackle of tree-based publication. The marginal cost of publication is now small. There are no issues and, anyway, these days people get to articles via Google. It now has an impact factor of 4.4, although, incidentally, PLoS has a publicly stated policy that IFs are nonsensical, which PLoS One shows clearly. In 2010 it published more than 6000 articles, which made it the largest journal in the world. In 2011, this number doubled. |
Open Access Mandates
-
Many funders now mandate OA
-
RCUK (sort of). NIH. Wellcome.
-
Welcomed by all!
-
Research Works Act
Note
|
Many funders, particularly those with public aims such as Wellcome, or those funded by the public, have started to jump on board. Why should the public pay for things twice, they argue; they pay for the research, so why should they pay to read about it? So there are now open access mandates from RCUK, although it’s a little vague what they actually mean. The toll access publishers, of course, have always had providing access to research materials as a key part of their mission statements, so they have been enthusiastic supporters of this. Which is where we come to the Research Works Act. |
RWA
No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that […] causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work
-
sponsored by legislators who have received multiple donations from Elsevier
-
"private-sector research work" would include papers
-
Because, after all a paper is produced by the private sector
Note
|
RWA was a US congressional bill which caused a lot of grief, especially coming on the back of SOPA and ACTA, to which it is only broadly related. Reading the quote above, you might wonder what the big deal is, but it was basically aimed at the NIH’s open access policy, since "private-sector research work" was seen to include anything published by a private sector publisher, even if all the work had been done within the public sector. |
Cost of Knowledge
-
From an original post by Tim Gowers
-
Fields medalist
-
polymath, crowd sourcing maths proofs.
-
Not the first boycott
Note
|
All of this led to the Cost of Knowledge. Although the website was not set up by him, it was in response to an article by Tim Gowers. He is famous, if you are a mathematician, partly because he won a Fields Medal, partly for using the web to stimulate collaborative mathematics. Basically, it’s a boycott of Elsevier. He will no longer review, author or edit for Elsevier. This is not the first such boycott; that was a decade ago. Will this one work? Hard to tell, but there is a lot more traction, and Elsevier has not helped itself by continually referring to papers as "their content". They are correct, of course, it is theirs. But many scientists are starting to ask why, given that they are the authors, reviewers, editors and readers. |
Arguments for Paywall Journals
-
Paywall promotes quality
-
Australasian Journal of Bone and Joint Medicine
-
Paid for by Merck
-
Positive stories about Merck products
OA is vanity publishing
-
Chaos, Solitons and Fractals
-
Author M. S. El Naschie published >300 single author articles
-
A big favorite of Editor-in-Chief
-
M. S. El Naschie
-
At one point, Chaos, Solitons and Fractals was highest IF maths journal
Paywall access prevents copying
|
|
A Question and an Answer
-
Who has an article in Lecture Notes in Computer Science?
-
None of you. If it’s in LNCS, the article is Springer’s
-
OA or OA/2
-
Open Access is, however, expensive
-
Between 800 and 3000(!) pounds
-
It’s "half open access"
-
Ultimately, do I care? It’s not my cash!
-
There is also a hidden cost
Note
|
Open Access is, however, expensive, at between 800 and 3000 pounds. This has led a friend of mine, Bijan Parsia, to coin the phrase "half open access". It’s only open from one side of the equation. But, ultimately, do I care? It’s not actually my cash, and besides, research is expensive. Publication costs are normally about 1% of the total cost of research at most. But I am going to argue that there are a number of hidden costs, and that these costs are ones that you will or should care about. |
The Costs
-
The Second Biggest Scientific Publisher
-
250,000 articles per year
-
240 million Downloads
-
Cost: 1.5 Billion Euro
-
Elsevier
Note
|
So, let’s consider the costs. First, let’s consider the second biggest scientific publisher in the world. Some basic stats. This is Elsevier. Actually, Elsevier is not that common in Computing, but has an enormous life sciences presence. Springer-Verlag is of a similar size. |
The Costs
-
The First Biggest Scientific Publisher
-
17 million articles
-
> 20 languages
-
365 million readers
-
Total Cost: 10 million dollars
-
Wikipedia
Note
|
And the biggest publisher of scientific literature in the world: Wikipedia. Now, of course, this isn’t an entirely fair comparison. Wikipedia also publishes a lot of non-scientific literature (although the figures from Elsevier also include this). But it’s not 2 orders of magnitude unfair either. So why is it so expensive? Elsevier’s profit margin? Well, no, because even its 40% profit margin isn’t enough to explain this. |
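The gap can be made concrete with a little arithmetic on the figures from the two "The Costs" slides. This is only a rough sketch: the currencies differ, and the Elsevier article count is per year while the Wikipedia count is a total, so read the result as an order-of-magnitude estimate and nothing more.

```python
import math

# Figures as quoted on the slides; no currency conversion is attempted.
elsevier_cost = 1.5e9         # euro
elsevier_articles = 250_000   # articles per year
wikipedia_cost = 10e6         # dollars
wikipedia_articles = 17e6     # articles in total

elsevier_per_article = elsevier_cost / elsevier_articles
wikipedia_per_article = wikipedia_cost / wikipedia_articles

# How many times more expensive, and how many powers of ten that is.
ratio = elsevier_per_article / wikipedia_per_article
orders_of_magnitude = math.log10(ratio)

print(f"Elsevier:  {elsevier_per_article:,.0f} per article")
print(f"Wikipedia: {wikipedia_per_article:.2f} per article")
print(f"Ratio: about {ratio:,.0f}x, roughly {orders_of_magnitude:.0f} orders of magnitude")
```

On these numbers the difference is around four orders of magnitude, so even if the comparison were unfair by a factor of 100, a large gap would remain.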
The process
Note
|
Let’s consider a typical publishing process. This process is, incidentally, not Elsevier’s; it is from PLoS, in fact PLoS One. It is not unlike that of other journals, however. I generally write my articles in LaTeX, because I happen to like it. You may use Word, but the point remains. |
The process
Note
|
I convert this into PDF on my machine and then upload it, along with all the TeX files in case they need them for some unspecified reason. |
The process
-
Again!
Note
|
My PDF gets converted into another PDF. As far as I can tell, this happens at the level of a PDF→PDF conversion. I still have no idea what this is hoping to achieve. My belief is that Word docs also get converted to PDF at this point. |
The process
Note
|
This PDF gets converted into a Word doc. Hmmm. Surely, you say, this makes no sense? It gets even worse when you discover that this conversion process involves someone copying the PDF into Word. |
The process
Taken from: http://en.wikipedia.org/wiki/File:XML.svg
The process
Taken from: http://www.w3c.org/html/logo
Note
|
Actually, it’s HTML4, but I couldn’t find an HTML4 logo. |
The process
-
Again, Again!
Hidden Cost
Compare to arXiv
-
The "physics" pre-print server
-
Cut and paste in standard metadata
-
Upload .tex and .bbl file (also images)
-
View PDF
-
Click "go"
-
Cost $7 per paper
-
Takes about 7 minutes (n=1) from LaTeX to published
Note
|
Now we can compare this to arXiv which was originally a preprints service for physics and now takes articles from a much wider set of disciplines including computing science. It includes a standard(ish) and stable identifier, allows metadata harvesting and sends out emails and stuff. On a short survey, n=1, this takes an average of 7 minutes to complete from start to finish. |
Compare to Wordpress
-
Open Lab Book
-
Write in Word
-
Click "go"
-
Publication time measured in ms
Note
|
As part of my commitment to open science, I also use Wordpress to host my open notebook. Note here that I have carefully phrased this to avoid the word "blog". For the most part, I do not write about my life, my hobbies or my cat here. Although I did use to talk about my dinner. Over time, my blog has evolved to become largely professional. And publishing there is very, very easy. Actually, I don’t use Word. But, regardless of what you use, the publication process, as opposed to the authoring process, is very quick, and is best measured in milliseconds. |
Ontogenesis
-
Wanted to publish tutorial information
-
Writing a book is painful
-
Open, public peer-review
-
We now have around 30 articles, published over 2 years
-
Many articles short, discrete
-
We have published unpublishable material
Note
|
So, where does this new-found rapidity leave us? Well, we started experimenting with this a couple of years ago. In my other life, I build ontologies. It isn’t easy, and is not helped by the total lack of tutorial information available. We thought about writing a book, but it never happened. It never happened because we had all got so irritated with publishers taking material and then, after a year of no communication, suddenly sending an email saying "here are the proofs, please correct within 5 days." Books also require either very significant pieces of writing, or are multi-multi-author. Either way it’s a pain. In some cases, we wanted to publish quite short material. So, I wrote a nice article on "do cyclists pay tax?" which turns out to be a good example describing the difference between universal and existential quantification, as well as roles and inheritance. It’s 2 pages, it’s complete. Similarly, "what is disjointness". In short, ontogenesis is full of unpublishable material. It’s unpublishable not because the material is bad, but because the publication process is bad. We now have 30 articles, and about 30000 page views. |
Linked Data
-
Now we have a simple process
-
Which we have extended
-
Knowledgeblog
-
We can publish linked, semantic articles
Note
|
But there’s more…. We now also have a simple process, with no human involvement. We can now start to push semantics through this process, we can make an article a part of a linked data environment. This allows us to do interesting things. |
References
-
Academics love in-text citations (Lord, 2012)
-
Will describe two tools, kcite and greycite
References
Lord, P (2012) Academics love Citations. J.Unsupport.Assert
References (author)
-
Authors insert primary identifiers
-
[cite]10.100/100.1[/cite]
-
This is a DOI.
-
We also do arXiv, pubmed IDs.
Note
|
We are now part of a linked data environment. The article is an active mashup. None of the metadata you see here is embedded. All of it is gathered from other sources. Moreover, every reference has a URL, and can be unambiguously identified (in the sense that we can be sure what we are pointing to; comparison of two references is a little harder). And the author has gained some advantage. They don’t need to type the metadata in, and if they get the reference wrong they will see it straight away. |
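As a rough illustration of the author's side of this, the sketch below pulls `[cite]…[/cite]` shortcodes out of a piece of text and turns each identifier into a URL that could be resolved for metadata. It is not kcite's code. Guessing the identifier type from its shape is an assumption made for the example, and the function names are hypothetical.

```python
import re

# The shortcode form shown on the slide: [cite]identifier[/cite]
CITE_RE = re.compile(r"\[cite\](.+?)\[/cite\]")

# Illustrative patterns only; a real tool may well ask the author to
# state the identifier type rather than guess it from the shape.
PATTERNS = [
    ("doi", re.compile(r"10\.\d+/\S+")),
    ("arxiv", re.compile(r"\d{4}\.\d{4,5}(v\d+)?")),
    ("pubmed", re.compile(r"\d+")),
]

# Public landing pages for each identifier type.
RESOLVERS = {
    "doi": "https://doi.org/{}",
    "arxiv": "https://arxiv.org/abs/{}",
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/{}/",
}


def classify(identifier):
    """Return 'doi', 'arxiv', 'pubmed' or 'unknown' for an identifier."""
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return kind
    return "unknown"


def find_citations(text):
    """Return (identifier, kind, url) for every shortcode in the text."""
    found = []
    for match in CITE_RE.finditer(text):
        identifier = match.group(1).strip()
        kind = classify(identifier)
        url = RESOLVERS[kind].format(identifier) if kind in RESOLVERS else None
        found.append((identifier, kind, url))
    return found


sample = "Academics love citations [cite]10.100/100.1[/cite], see also [cite]1206.1234[/cite]."
for citation in find_citations(sample):
    print(citation)
```

The point of the design is that the author types only the primary identifier; everything a reader sees in the bibliography would then be fetched from the resolved source.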
References (reader)
-
Readers can see citation or direct link
-
Readers can change citation style
Note
|
We also have an active document. The reference style is no longer handed down from on high. The reader can choose how they want to see things. |
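One way to picture a reader-selectable style is to keep the reference as structured metadata and apply the formatting only at display time. The sketch below assumes a hypothetical `Reference` record and three made-up style names; the URL is a placeholder, and none of this is taken from kcite itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    authors: List[str]  # "Surname, Initials" strings
    year: int
    title: str
    container: str
    url: str


def author_year(ref, index):
    # Surname of the first author plus the year.
    surname = ref.authors[0].split(",")[0]
    return f"({surname}, {ref.year})"


def numeric(ref, index):
    # Position of the reference in the bibliography.
    return f"[{index}]"


def direct_link(ref, index):
    # No citation marker at all, just the link.
    return ref.url


# The reader picks a key; the stored metadata never changes.
STYLES = {"author-year": author_year, "numeric": numeric, "link": direct_link}


def render(ref, index, style):
    return STYLES[style](ref, index)


ref = Reference(["Lord, P"], 2012, "Academics love Citations",
                "J.Unsupport.Assert", "https://example.org/lord-2012")
for style in STYLES:
    print(style, "->", render(ref, 1, style))
```

Because the style is just a function over the metadata, switching it is a display decision and does not need the author or publisher to be involved.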
References (kcite architecture)
The Problem
-
Inserting links is painful
-
Fortunately, we can use the bibliographic metadata to help
-
Will demonstrate this after a brief interlude
Note
|
Inserting references in this way is a pain. However, in most cases, we can do this in a metadata driven way. We can use the same metadata that is used to generate the bibliography for the authors. |
Greycite
-
Originally, kcite could not support URLs
-
As well as Publishing the Unpublishable
-
We want to Cite the Uncitable
-
We needed a source for metadata
-
Now we have http://greycite.knowledgeblog.org
-
Developed by Lindsay Marshall, Computing Science, Newcastle
Note
|
One substantial problem is the absence of an ability to do this for URIs. We wanted to address this with greycite, so I am going to show how this works. Greycite is a new tool which mines bibliographic metadata from URIs. It doesn’t work for all URIs. Mining is (deliberately) not too intelligent. We look mostly for things that are intended to be mined. We have also added tools to Wordpress to allow flexible insertion of metadata (including with shortcodes, or through a nice GUI). |
Greycite
-
This is greycite
Greycite
-
Add a URL
Note
|
Putting a URL into an article on my blog |
Greycite
-
URL results
Note
|
In this case, we’ve seen the URL before and greycite knows some basic metadata. |
Greycite
-
Scrolling to more detail
Note
|
Looking in more detail, we can see that in March, the article had a title "Why Not?", was authored by me and dates from 2010. |
Greycite
-
Provenance (coins)
Note
|
We have gathered this metadata from a variety of sources, and you can see the provenance. Coins is a dreadful standard which is now quite a few years old, but some people use it. It works by embedding a span tag into the body of the post. Although it is dreadful in every way, the fact that it is embedded in the body is its most useful feature, as many people don’t control their headers. |
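For a sense of what mining Coins involves, here is a minimal sketch using only the Python standard library. Coins records are `span` elements with the class `Z3988` whose `title` attribute holds URL-encoded key/value pairs. The sample HTML and its values are made up for the example, and this is not greycite's own implementation.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect the key/value pairs of every Coins span in a page."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # The title attribute is a URL-encoded query string.
            self.records.append(parse_qs(attributes.get("title") or ""))


def extract_coins(html):
    parser = CoinsParser()
    parser.feed(html)
    return parser.records


# Made-up sample post body with one embedded Coins span.
SAMPLE = (
    '<p>Post body <span class="Z3988" title="ctx_ver=Z39.88-2004'
    '&amp;rft.title=Why+Not%3F&amp;rft.au=Lord%2C+P&amp;rft.date=2010">'
    '</span></p>'
)

for record in extract_coins(SAMPLE):
    print(record)
```

Each value comes back as a list because a key such as `rft.au` may legitimately appear more than once.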
Greycite
-
Provenance (OGP)
Note
|
We also support the Open Graph Protocol. This is a much nicer "standard", partly developed by Facebook, who would appear to be much better at uncovering other people’s bibliographic metadata than, allegedly, they are at uncovering their own financial data. |
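Open Graph metadata lives in `meta` elements whose `property` attribute starts with `og:` (or a type-specific prefix such as `article:`). A minimal standard-library sketch of reading it follows; the sample page is invented and the helper names are hypothetical, so treat it as an illustration and not as greycite's code.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect Open Graph properties from meta elements."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith(("og:", "article:")):
            # Keep the first value seen for each property.
            self.metadata.setdefault(prop, attributes.get("content") or "")


def extract_open_graph(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.metadata


# Made-up page header for illustration.
PAGE = (
    '<html><head>'
    '<meta property="og:title" content="An example article" />'
    '<meta property="og:type" content="article" />'
    '<meta property="og:site_name" content="Example News" />'
    '<meta name="description" content="ignored" />'
    '</head></html>'
)

print(extract_open_graph(PAGE))
```

Unlike Coins, this needs access to the page header, which is why the two sources complement each other.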
Greycite
-
And elsewhere
Note
|
We have used somewhat established ad-hoc standards, so it works on sites outside of our control, and also sites which are not necessarily academic. This is an article from the BBC for instance. |
Greycite
-
And elsewhere
Greycite
-
Sadly, some websites have no semantics that we can find
Greycite
-
Preservation
Note
|
We can also link through to other resources, such as archive.org. We also support archive.org.uk, provided by the British Library, and are working on webcitations.org |
Greycite
Note
|
So, we can maintain links to the academic record even if the links break. Currently, this is not apparent in kcite generated bibliographies, but we will add redirection in soon for links which appear to have gone 404. |
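The redirection idea can be sketched as a small decision: keep the live link unless it appears to be dead and an archived copy is known. In the sketch below the HTTP status is passed in, not fetched, and the function names are hypothetical. The Wayback Machine address pattern is the publicly documented one, but which archive would actually be used is an assumption.

```python
from typing import Optional


def wayback_url(url: str, timestamp: str) -> str:
    """Address of the archive.org capture of `url` nearest to `timestamp`.

    `timestamp` uses the YYYYMMDDhhmmss form.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"


def choose_link(url: str, status: int, archived: Optional[str]) -> str:
    """Prefer the live URL; fall back to the archive only on a 404."""
    if status == 404 and archived:
        return archived
    return url


live = "http://example.org/post"          # placeholder URL
archived = wayback_url(live, "20120614000000")

print(choose_link(live, 200, archived))   # page is still there
print(choose_link(live, 404, archived))   # page has gone
```

Keeping the check separate from the fetching means the same rule can be applied whenever a bibliography is generated, whatever tool reports the status.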
Citing Links
-
How does citation work?
Note
|
This is the bio-ontologies website. Lots of papers on it. Many with complex metadata. For those of you interested in this sort of thing, I have now separated the metadata from the wordpress environment. Previously all the authors needed logins. PITA. |
Citing Links
-
We want to cite this page
Citing Links
-
In an article, I am writing
Note
|
For my own editing environment, I use asciidoc, bibtex and emacs. I acknowledge that this is a little niche, but it does work with other environments also. |
Citing Links
-
First we take the URL
Note
|
Actually, this is enough. It is all that you need; however, as an author I find myself citing the same URL repeatedly. Google is very good at getting you to where you want to go, but it is not perfect, particularly when there are a lot of articles on one topic. So, I wanted something quicker, searching over what I am interested in. |
Citing Links
-
Query Greycite
Citing Links
-
Get back bibtex
Note
|
The metadata here comes from the web page, so in one sense cannot be wrong, although it can be different from what the author wants it to be. Recently, I’ve added support for Wordpress to advertise its metadata on the page as well, so it’s visible. We can do similar things with DOIs, arXiv and the like. |
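The format shifting from mined metadata to bibtex is mostly string assembly. The sketch below assumes a plain dictionary with `authors`, `title`, `year` and `url` keys and emits a `@misc` entry. The dictionary shape, the entry type and the sample values are assumptions for illustration; greycite's real output may differ.

```python
def to_bibtex(key, meta):
    """Format a metadata dictionary as a BibTeX @misc entry."""
    fields = {
        "author": " and ".join(meta.get("authors", [])),
        "title": meta.get("title", ""),
        "year": str(meta.get("year", "")),
        "url": meta.get("url", ""),
    }
    lines = [f"@misc{{{key},"]
    for name, value in fields.items():
        if value:  # leave out anything the page did not provide
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Made-up record in the shape described above.
record = {
    "authors": ["Lord, P"],
    "title": "Why Not?",
    "year": 2010,
    "url": "http://example.org/why-not",
}
print(to_bibtex("lord2010", record))
```

Fields the page does not advertise are simply omitted, which matches the idea that the metadata reflects the page and not what the citing author would like it to say.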
Citing Links
-
So, we search
Citing Links
-
And select
Citing Links
-
And insert
Citing Links
-
And publish
Note
|
All of this is tied together with just a little semantic glue. We needed some heuristics, we needed some format shifting, but that is it. |
Accessibility
-
This is all fairly simple
-
And works because the content is OA
-
Greycite can get to the metadata because it is open
-
BL can archive, because the content is open
-
Archive also includes metadata because it is open
-
Much of it works outside academic publishing
-
Compare DOI, CrossRef, LOCKSS and so on.
Note
|
All of this works because we have an open resource, based on widely available standards. Compare this to CrossRef and DOI technology which is more complex. Compare this to LOCKSS which is more complex. And, in most cases, we are not using bespoke software specific to the academic publishing industry. Hence it works with BBC news. |
Ideas: Glossary
-
Short 140 word articles with a title
-
Word, not character!
-
Fully attributed
-
Linkable
-
Publish via email
-
Displayed inline
Note
|
So, further ideas. We have already pursued a mashup strategy. We want to push this further. In this publication environment we can do very small-scale publishing. While others are pushing nanopublications, this is more mini-publication. You publish a short article, 140 words long, to operate as a glossary entry. Links back will then appear in popups as a glossary, rather than as a hyperlink or in a reference list. The glossary will not have a single name space, so multiple definitions are possible. And all the things we have added so far will help. Greycite will provide bibtex to make the link insertion easier. We are probably going to investigate a publish-by-email protocol also, so that new articles are very easy to publish while still maintaining a moderation step. |
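A moderation step for such entries could start with a check as small as the one below, which counts words and not characters. The function names and the exact rule (a non-empty title and between 1 and 140 whitespace-separated words) are assumptions for the sketch.

```python
MAX_WORDS = 140


def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


def is_valid_entry(title, body):
    """A glossary entry needs a title and a body of at most 140 words."""
    return bool(title.strip()) and 0 < word_count(body) <= MAX_WORDS


print(is_valid_entry("Disjointness", "Two classes are disjoint if nothing can be an instance of both."))
print(is_valid_entry("Too long", "word " * 141))
```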
Ideas: Structured Knowledge
-
Open Disease Reports from David Shotton
-
Short summaries of disease information published elsewhere
-
Critical for third world
-
We want to structure the knowledge for mining.
Ideas: Enhanced Linking
Note
|
We want to link to other resources out there. We would like to do this intelligently. For instance, this is a tool called chemicalize which inserts implicit links by named-entity recognition. Very nice. But probably not something that you want on a cookery page. |
Ideas: Linking
Ideas: The NearCon
-
Nearly a conference
-
Cross between a workshop and a special issue
-
Publish papers and then talk about them!
-
Like a workshop
-
see papers you might miss
-
-
Asynchronous!
Note
|
Also wanted to pursue |
Publishing in Flux
-
eLife - modelled on PLoS One
-
F1000 - Cross between PLoS One and arXiv
-
PeerJ - $100 to register, free to publish
-
Dutch Tulip bulb
Note
|
We are not the only people playing in this environment. Publishing is in a lot of flux at the moment and there are a lot of new ideas coming out. So, there is eLife, for example, which is modelled on PLoS One and is led by Mark Patterson, who previously worked for PLoS Currents. Interestingly, it’s directly supported by the Wellcome Trust. F1000 are producing something new. This is rather an offshoot of their prepublication and poster publication service. It will probably be free to publish initially, but we are not sure yet. Finally, there is PeerJ. If we can sequence the human genome for $1000, why can we not publish a paper for $99? Basically, you register for $100, then can publish for life (1 paper a year, and you have to do 1 review a year, and all authors need to be registered). This is Peter Binfield, Jason Hoyt and, most interestingly, Tim O’Reilly. The Dutch tulip bulb? It was the first good example of a speculative economic bubble, where the cost of a resource increased totally out of proportion to its value. Currently, academic publishing comes with a lot of costs, but does it come with any value? |
What can we do?
-
Technical Report Series
-
Replace with arXiv
-
Recognised
-
Stable
-
Standard Metadata
-
Supplement with Kblog
-
More experimental
-
Web First
What can we do?
-
Grants
-
Publish all our grants on the web
-
Successful or not!
-
Knowledge blog would be a good framework
What can we do?
-
Online thesis
-
We can encourage students to publish their theses online
-
Allyson Lister has just done this
-
http://themindwobbles.wordpress.com/2012/06/14/thesis-abstract/
What can we do?
-
Open Notebook Science
-
all research active staff and students
-
Publish as we go!
What can we do?
-
Cash!
-
Q: Can I get cash for my PhD student to go to a conference?
-
A: Yes! But conference should come with publication.
-
All the suggested publication locations are toll access
What can we do?
-
Teaching
-
Open Education Resources
-
Release lecture notes online
-
Release all recap online
-
Try before you buy!
Elephant in the Room
-
REF
-
Promotion Committees
-
Lawyers
Note
|
REF is a problem: there is a tendency toward false metrics such as IF, and this is an undeniable problem. Promotion Committees tend to have a bias toward high quality journals, where "high quality" is defined as "the ones I published/publish in". The older a scientist is, the more likely they are to have the bulk of their work in TA journals. Lawyers: currently, RECAP is not going online and has, in fact, got more restrictive because of fears that we will uncover large scale copyright violation by staff. Also, the university does not allow staff to attach CC or OA licenses to their work without permission. Although, strangely, it does allow them to give their work away without compensation and without even retaining the right to use it themselves. Hmmm. |
Acknowledgements
-
Kblog/Ontogenesis
-
Robert Stevens, Georgina Moulton (Manchester).
-
Dan Swan, Simon Cockell (Newcastle)
-
-
Kcite
-
Simon Cockell
-
-
Greycite
-
Lindsay Marshall
-
Summary/Conclusion
-
Open Access == freely available papers
-
Part of the move to Open Science
-
Happening whether we like it or not
-
It should be seen as an opportunity not a risk