Currently, academic publishing is in flux, with several different narratives colliding. First, there is now a move toward increasing openness within science with open access, open data and open source. Open access means that academic articles are freely available for all to read, and is fast becoming a standard mechanism for publishing. I will describe the roots of this movement. Second, although the web has been around for a while, academic publishing has been largely untouched by it; meanwhile the web itself is increasingly being pushed as a mechanism for large scale data integration as part of a linked-data environment. I will describe two pieces of my own work, knowledgeblog and greycite, which enable academics to publish natively to the web as linked data. I will also describe some of the emerging publishing models from elsewhere. Finally, I will consider ways in which we as a school might alter our current practice, in anticipation of these changes.

If I remember, I will also explain why academic publishing is a Dutch tulip bulb.

A Question

Who has an article in Lecture Notes in Computer Science?

Open Science

A broad definition would be:

  • Open Data

  • Open Source

  • Open Access


Today I am going to talk about open access, because currently there are a lot of changes happening here politically. But I want to talk about this more generically in the context of open science; the idea that, to quote a recent Royal Society report, Science is an open enterprise.

As well as talking about the background in general, I am also going to talk about some of the work that we have been doing recently, looking at how the publication process might change; most of these changes are enabled by, or predicated on, free access to the material.

Some human interest

  • A biologist from UC Davis.

  • Wanted to collect his father’s (also a biologist) life work

  • But couldn’t.


Following a grand journalistic tradition, I thought I would start off with a human interest angle, with a story that I have borrowed from an article in Wired from last year. I do this both so I can warm your hearts, and so I can plug the fact that I got into the article as well.

This is a picture of Jonathan Eisen. After his father’s untimely death in 1987, he wanted to commemorate his father’s work by collecting all his papers. But he couldn’t, because many of them were not available and most had paywalls.

Harvard runs out of cash

  • "fiscally unsustainable"

  • "academically restrictive"

  • "online content from two providers have increased by about 145% over the past six years"

  • "exacerbated by […] publishers to acquire, bundle, and increase the pricing"


This was followed up by this amusing story. These are quotes from a memo sent by the library at Harvard to its faculty. Yes, Harvard are running out of cash.

The basic problem identified by Harvard library is two-fold. First, the publishers keep on increasing their prices. It’s actually hard to know for sure how much they are doing this, of course, because most publishers require an NDA for deals done with individual libraries. Second, there is bundling, also known as "big deals". This is the process whereby a library buys not one journal but many at a discount from a publisher; the problem is that libraries often find that the prices increase steadily over time.

Open Access

  • Pioneered by BMC — an online journal

  • Originally took the bottom end of the market

  • Followed by PLoS

  • Originally aimed at the top end, and grant supported


One potential solution to the problem is open access. This was pioneered by BioMedCentral from about 2002. This was an online publication house, with the end papers being free for everyone to read. They made their money by charging authors for publishing. Interestingly, this wasn’t seen as terribly novel because in biology, page charges were common anyway and often substantial (1000 a page for instance).

They originally took the low end of the market. Which is why a couple of years later, PLoS was formed. It came about because a number of bioinformaticians were getting disgruntled at their inability to text mine articles; to continue with the human interest angle, one of these was Michael Eisen, brother of Jonathan mentioned earlier. Are your hearts all warm yet?

Now, the first time that I saw Michael Eisen talk about this they had an interesting angle; they wanted to do nothing innovative. The publication process had to be as much like the existing one as possible because, they figured, one change at a time. Good idea.

Open Access (10 years on)

  • BMC Bioinformatics is now high impact

  • PLoS has 6 main journals

  • And PLoS One


So what has happened in the last 10 years? Well, first, open access has become accepted, particularly in some fields. BMC Bioinformatics, for example, is now a high impact journal. In biology, about 20% of papers are open access. PLoS has 6 main journals now (PLoS Biology is edited by Jonathan Eisen).

And PLoS One. It came later, and unlike the "main" journals in PLoS it is turning out to be revolutionary.

PLoS One

  • Has peer-review

  • Judges on scientific rigour

  • Not on perceived importance

  • Now has impact factor 4.4

  • In 2010 > 6000 articles

  • the largest journal in the world


PLoS One is online and open access. It charges for publication. It is peer-reviewed, and this peer-review judges the scientific rigour of the work. Here is the revolutionary part: it does not judge on the basis of the perceived importance of the work. Basically, it has removed itself from the last shackle of tree-based publication. The marginal cost of publication is now small. There are no issues, and anyway, these days people get to articles via Google.

It now has an impact factor of 4.4, although, incidentally, PLoS has a publicly stated policy that IFs are nonsensical, which PLoS One shows clearly. In 2010 it published more than 6000 articles, which made it the largest journal in the world. In 2011, this number doubled.

Open Access Mandates

  • Many funders now mandate OA

  • RCUK (sort of). NIH. Wellcome.

  • Welcomed by all!

  • Research Works Act


Many funders, particularly those with public aims such as Wellcome, or those funded by the public, have started to jump on board. Why should the public pay for things twice, they argue; they pay for the research, why should they pay to read about it. So there are now open access mandates from RCUK, although it’s a little vague what it actually means.

The toll access publishers, of course, have always had providing access to research materials as a key part of their mission statements, so they have been enthusiastic supporters of this. Which is where we come to the Research Works Act.


No Federal agency may adopt, implement, maintain, continue, or otherwise engage in any policy, program, or other activity that […] causes, permits, or authorizes network dissemination of any private-sector research work without the prior consent of the publisher of such work

  • sponsored by representatives who have received multiple donations from Elsevier

  • "private-sector research work" would include papers

  • Because, after all a paper is produced by the private sector


RWA was a bill in the US House of Representatives which caused a lot of grief, especially coming on the back of SOPA and ACTA, to which it is only broadly related. Reading the quote above, you might think what is the big deal, but it was basically aimed at NIH’s open access policy, since "private-sector research work" was seen to include anything published by a private sector publisher, even if all the work had been done within the public sector.

Cost of Knowledge


All of this led to the Cost of Knowledge. Although the website was not set up by him, it was in response to an article by Tim Gowers. He is famous, if you are a mathematician, partly because he won a Fields Medal, partly for using the web to stimulate collaborative mathematics.

Basically, it’s a boycott of Elsevier. He will no longer review, author or edit for Elsevier. This is not the first such boycott; that was a decade ago. Will this one work? Hard to tell, but there is a lot more traction, and Elsevier has not helped itself by continually referring to papers as "their content". They are correct, of course, it is theirs. But many scientists are starting to ask why, given that they are the authors, reviewers, editors and readers.

Arguments for Paywall Journals

  • Paywall promotes quality

  • Australasian Journal of Bone and Joint Medicine

  • Paid for by Merck

  • Positive stories about Merck products

OA is vanity publishing

  • Chaos, Solitons and Fractals


  • Author M. S. El Naschie published >300 single author articles

  • A big favorite of Editor-in-Chief

  • M. S. El Naschie

  • At one point, Chaos, Solitons and Fractals was highest IF maths journal

Paywall access prevents copying


  • An image of a molecule

  • Copyright Springer-Verlag

  • Actually, produced by Peter Murray-Rust

  • "A global resource for computational chemistry"

  • Journal of Molecular Modeling

A Question and an Answer

  • Who has an article in Lecture Notes in Computer Science?

    • None of you. If it’s in LNCS, the article is Springer’s

OA or OA/2

  • Open Access is, however, expensive

  • Between 800 and 3000(!) pounds

  • It’s "half open access"

  • Ultimately, do I care? It’s not my cash!

  • There is also a hidden cost


Open Access is, however, expensive, at between 800 and 3000 pounds. This has led a friend of mine, Bijan Parsia, to coin the phrase "half open access". It’s only open from one side of the equation. But, ultimately, do I care? It’s not actually my cash, and besides research is expensive. Publication costs are normally about 1% of the total cost of research at most.

But I am going to argue that there are a number of hidden costs, and that these costs are ones that you will or should care about.

The Costs

  • The Second Biggest Scientific Publisher

  • 250,000 articles per year

  • 240 million Downloads

  • Cost: 1.5 Billion Euro

  • Elsevier


So, let’s consider the costs. First, let’s consider the second biggest scientific publisher in the world. Some basic stats. This is Elsevier. Actually, Elsevier is not that common in Computing, but has an enormous life sciences presence. Springer-Verlag is of a similar size.

The Costs

  • The First Biggest Scientific Publisher

  • 17 million articles

  • > 20 languages

  • 365 million readers

  • Total Cost: 10 million dollars

  • Wikipedia


And the biggest publisher of scientific literature in the world. Wikipedia. Now, of course, this isn’t an entirely fair comparison: Wikipedia also publishes a lot of non-scientific literature (although the figures from Elsevier also include this). But it’s not 2 orders of magnitude unfair either.

So why is it so expensive? Elsevier’s profit margin? Well, no, because even its 40% profit margin isn’t enough to explain this.
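As a rough back-of-the-envelope check, the figures from the two slides can be put side by side. This is only a sketch: the variable names are mine, euros and dollars are treated as roughly equal, and the Wikipedia article count is a running total while the Elsevier count is per year, so the per-article numbers are indicative at best.

```python
import math

# Figures as quoted on the two "The Costs" slides.
# Assumption: euros and dollars are compared directly, with no conversion.
ELSEVIER_COST = 1.5e9            # euros
ELSEVIER_ARTICLES = 250_000      # articles per year
WIKIPEDIA_COST = 10e6            # dollars
WIKIPEDIA_ARTICLES = 17e6        # total articles, not per year


def orders_of_magnitude(ratio: float) -> float:
    """Return how many powers of ten separate two quantities."""
    return math.log10(ratio)


# Ratio of the total costs: the "2 orders of magnitude" in the text.
total_ratio = ELSEVIER_COST / WIKIPEDIA_COST

# Cost per article for each publisher, using the slide figures as-is.
elsevier_per_article = ELSEVIER_COST / ELSEVIER_ARTICLES
wikipedia_per_article = WIKIPEDIA_COST / WIKIPEDIA_ARTICLES
per_article_ratio = elsevier_per_article / wikipedia_per_article

print(f"Total cost ratio: {total_ratio:.0f}x "
      f"({orders_of_magnitude(total_ratio):.1f} orders of magnitude)")
print(f"Elsevier per article: {elsevier_per_article:.0f}")
print(f"Wikipedia per article: {wikipedia_per_article:.2f}")
print(f"Per-article ratio: {orders_of_magnitude(per_article_ratio):.1f} "
      f"orders of magnitude")
```

On total cost alone the gap is a factor of about 150, which is where the 2 orders of magnitude come from; per article the gap is wider still, even with these crude assumptions.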

The process


Let’s consider a typical publishing process. This process is, incidentally, not from Elsevier; it is from PLoS, in fact PLoS One. It is not dissimilar to that of other journals, however. I generally write my articles in LaTeX, because I happen to like it. You may use Word, but the point remains.

The process


I convert this into PDF on my machine and then upload it, along with all the TeX files in case they need them for some unspecified reason.

The process