Thursday, March 30. 2006
On Thu, 30 Mar 2006, Helen Hockx-Yu wrote in JISC-REPOSITORIES:
"I should be grateful if anyone can provide me some evidence to back the following statement: 'Concern of longevity has contributed to the lack of active engagement from many researchers [with institutional repositories]. Guarantee of long-term preservation helps enhance a repository's trustworthiness by giving authors confidence in the future accessibility and more incentives to deposit content' I guess longevity here also applies to the financial sustainability of the repository itself as a business operation, in addition to its content."
The statement is (1) not based on evidence at all, but pure speculation and (2) speculation not on the part of the content-providers (i.e., the authors, who are presently only spontaneously self-archiving their published articles at about the 15% level) but on the part of others, whose a priori concept of an institutional repository is that it is for long-term preservation (rather than for immediate access-provision and impact maximisation).
One pretty much gets out of such subjective speculations what one puts into them (including the requisite confirmatory moans from fellow-preservationists!).
JISC author surveys have given the empirical answer as to why only about 15% of papers are being self-archived spontaneously today (although 49% of authors have deposited at least once): Authors are too busy to do it until/unless their employers and/or funders make it a priority by mandating it -- and then 95% of them will duly do it:
Swan, A. (2005) Open access self-archiving: An Introduction. JISC/Key Perspectives Technical Report.
But it would be absolutely absurd of their employers and funders to mandate self-archiving for the sake of long-term preservation! Preservation of what, and why? Articles are published by journals. The preservation of the published version (PDF/XML) is the responsibility of the journals that publish it, the libraries that subscribe/license it, and the deposit libraries that archive it. None of that is the responsibility of the author or his institution, and never has been. Hence it is ridiculous to think that the reason authors are not self-archiving today is that they are fretting about preservation!
Nor is there the slightest evidence that the 15% of articles that has been self-archived spontaneously in central or institutional repositories has vanished or is at risk! Arxiv content is still there today, a decade and a half since its inception in 1991, under nonstop use. CogPrints contents likewise, since its inception nearly a decade ago. Ditto for the IRs that have been up since GNU Eprints was first released in 2000.
The pertinent feature of all of these archives (even the oldest and biggest) is the pathetically small proportion of their respective annual target contents that they are actually capturing: for Arxiv, all of physics+; for CogPrints, all of cognitive science; for PubMed Central, all of biomedical science; and for institutional IRs, all of each institution's own annual research article output.
But there are exceptions, and the biggest of them is CERN, which is far above the spontaneous 15% self-archiving baseline and rapidly approaching 100% for its current annual output (while making remarkable progress with its retroactive legacy output too, thanks to superb library activism).
So too are Southampton ECS, U. Minho, and QUT. And the reason is that these four institutions (3 institutions plus 1 institutional department) have self-archiving mandates for their own output (rather than no policy, or library activism alone). And the rationale for the mandates (although of course these archives, like all IRs, are duly attending to the preservation of what contents they have!) is not long-term preservation but immediate access-provision for the sake of maximising usage and impact before their authors' bones are in preservation.
So while preservationists lose themselves in speculation about the fact that maybe authors are not depositing because their secret yearnings for preservation are even more exacting than the preservationists', so they are abstaining until they can be absolutely guaranteed of immortality for their texts as well as their institutions, the reality is much simpler:
They have (and should have) no special interest in preservation for their authors' drafts. They do have an interest in citation, but not enough to bother self-archiving until/unless their institutions and funders require it. Silly, and short-sighted (sic) but there we are.
Let us hope that their institutions and funders will have the good sense to adopt policies that require their researchers to do (and reward them for doing) what is in their own best interests (as well as the best interests of their institutions and funders) -- just as they already require them to publish (or perish) and reward them for doing so.
Nor is the reward the imperishability of those authors' refereed final drafts that they will be self-archiving (not the publisher's proprietary PDF), but their own scientific immortality (which would slip away fast if they were to keep waiting to immortalise their publishers' PDFs instead, as the preservationists -- embalmers? -- are imagining they are doing).
(Do I sound like an archivangelist whose remaining reserves of patience have taken flight?)
In an unpublished study, Antelman et al. (2005) hand-tested the accuracy of the algorithm that Hajjem et al.'s (2005) software robot used to identify Open Access (OA) and Non-Open-Access (NOA) articles in the ISI database. Antelman et al. found much lower accuracy (d' 0.98, bias 0.78, true OA 77%, false OA 41%) with their larger sample of nearly 600 (half OA, half NOA) in Biology (and even lower, near-chance performance in Sociology: sample size 600, d' 0.11, bias 0.99, true OA 53%, false OA 49%) than Hajjem et al., who had found, with their smaller Biology sample of 200: d' 2.45, beta 0.52, true OA 93%, false OA 16%.
Summary: Antelman et al. (2005) hand-tested the accuracy of the algorithm that Hajjem et al.'s (2005) software robot used to trawl the web and automatically identify Open Access (OA) and Non-Open-Access (NOA) articles (references derived from the ISI database). Antelman et al. found much lower accuracy than Hajjem et al. had reported. Hajjem et al. have now re-done the hand-testing on a larger sample (1000) in Biology, and demonstrated that Hajjem et al.'s original estimate of the robot's accuracy was much closer to the correct one. The discrepancy was because both Antelman et al. and Hajjem et al. had hand-checked a sample other than the one the robot was sampling. Our present sample, identical with what the robot saw, yielded: d' 2.62, bias 0.68, true OA 93%, false OA 12%. We also checked whether the OA citation advantage (the ratio of the average citation counts for OA articles to the average citation counts for NOA articles in the same journal/issue) was an artifact of false OA: The robot-based OA citation Advantage of OA over NOA for this sample [(OA-NOA)/NOA x 100] was 70%. We partitioned this into the ratio of the citation counts for true (93%) OA articles to the NOA articles versus the ratio of the citation counts for the false (12%) "OA" articles. The "false OA" advantage for this 12% of the articles was 33%, so there is definitely a false OA Advantage bias component in our results. However, the true OA advantage, for 93% of the articles, was 77%. So in fact, we are underestimating the true OA advantage.
Previous AmSci Topic Thread:
Hajjem et al. have now re-done the hand-testing on a still larger sample (1000) in Biology, and we think we have identified the reason for the discrepancy, and demonstrated that Hajjem et al.'s original estimate of the robot's accuracy was closer to the correct one.
The discrepancy was because Antelman et al. were hand-checking a sample other than the one the robot was sampling: The templates are the ISI articles. The ISI bibliographic data (author, title, etc.) for each article is first used to automatically trawl the web with search engines looking for hits, and then the robot applies its algorithm to the first 60 hits, calling the article "OA" if the algorithm thinks it has found at least one OA full-text among the 60 hits sampled, and NOA if it does not find one.
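The decision rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: `looks_like_oa_fulltext` is a hypothetical stand-in for the robot's actual detection algorithm (which the post does not describe), and returning the examined hits alongside the verdict is our addition, so that the very same hits could later be hand-checked.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple

MAX_HITS = 60  # the post says the robot looks at the first 60 hits


def classify_article(
    hits: Iterable[str],
    looks_like_oa_fulltext: Callable[[str], bool],  # hypothetical detector
    max_hits: int = MAX_HITS,
) -> Tuple[str, List[str]]:
    """Return ("OA" or "NOA", the hits that were actually examined)."""
    examined = list(islice(hits, max_hits))
    # "OA" if at least one examined hit is judged an OA full text.
    verdict = "OA" if any(looks_like_oa_fulltext(h) for h in examined) else "NOA"
    return verdict, examined
```

Keeping `examined` is what allows a later accuracy check to be run on the sample the robot actually saw rather than on a fresh search.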
Antelman et al. did not hand-check these same 60 hits for accuracy, because the hits themselves were not saved; the only thing recorded was the robot's verdict on whether a given article was OA or NOA. So Antelman et al. generated another sample -- with different search engines, on a different occasion -- for about 300 articles that the robot had previously identified as having an OA version in its sample, and 300 for which it had not found an OA version in its sample; Antelman et al.'s hand-testing found much lower accuracy.
Hajjem et al.'s first test of the robot's accuracy made the very same mistake of hand-checking a new sample instead of saving the hits, and perhaps it yielded higher accuracy only because the time difference between the two samples was much smaller (but the search engines were again not the same ones used). Both accuracy hand-tests were based on incommensurable samples.
Testing the robot's accuracy in this way is analogous to testing the accuracy of an instant blood test for the presence of a disease in a vast number of villages by testing a sample of 60 villagers in each (and declaring the disease to be present in the village (OA) if a positive case is detected in the sample of 60, NOA otherwise) and then testing the accuracy of the instant test against a reliable incubated test, but doing this by picking another sample of 60 from 100 of the villages that had previously been identified as "OA" based on the instant test and 100 that had been identified as "NOA." Clearly, to test the accuracy of the first, instant test, the second test ought to have been performed on the very same individuals on which the first test had been performed, not on another sample based only on the overall outcome of the first test, at the whole-village level.
So when we hand-checked the actual hits (URLs) that the robot had identified as "OA" or "NOA" in our Biology sample of 1000, saving all the hits this time, the robot's accuracy was again much higher: d' 2.62, bias 0.68, true OA 93%, false OA 12%.
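For readers unfamiliar with the accuracy figures, here is a minimal sketch of how d' and bias are conventionally derived from a hit rate (true OA) and a false-alarm rate (false OA). It assumes the standard equal-variance Gaussian signal-detection model and takes "bias" to be the likelihood-ratio measure beta; the post does not say exactly how its figures were computed, so this is an assumption, not the authors' procedure.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity: distance between the z-transformed rates."""
    return _z(hit_rate) - _z(false_alarm_rate)


def beta(hit_rate: float, false_alarm_rate: float) -> float:
    """Likelihood-ratio bias under the equal-variance Gaussian model."""
    zh, zf = _z(hit_rate), _z(false_alarm_rate)
    return math.exp((zf ** 2 - zh ** 2) / 2)


if __name__ == "__main__":
    # Rounded rates quoted in the post: true OA 93%, false OA 12%.
    print(d_prime(0.93, 0.12), beta(0.93, 0.12))
```

With the rounded rates 0.93 and 0.12 this gives a d' of about 2.65 and a beta of about 0.67, close to the reported 2.62 and 0.68; the small gap is presumably due to rounding in the published percentages.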
All this merely concerned the robot's accuracy in detecting true OA. But our larger hand-checked sample now also allowed us to check whether the OA citation advantage (the ratio of the average citation counts for OA articles to the average citation counts for NOA articles in the same journal/issue) was an artifact of false OA:
We accordingly had the robot's estimate of the OA citation Advantage of OA over NOA for this sample [(OA-NOA)/NOA x 100 = 70%], and we could now partition this into the ratio of the citation counts for true (93%) OA articles to the NOA articles (false NOA was very low, and would have worked against an OA citation advantage) versus the ratio of the citation counts for the false (12%) "OA" articles. The "false OA" advantage for this 12% of the articles was 33%, so there is definitely a false OA Advantage bias component in our results. However, the true OA advantage, for 93% of the articles, was 77%. So in fact, we are underestimating the OA advantage.
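The advantage formula [(OA-NOA)/NOA x 100] can be written out as a small function. This sketch treats a single journal/issue group, and the citation counts in the usage example are invented purely for illustration; they are not data from the study.

```python
from statistics import mean
from typing import Sequence


def oa_advantage_percent(
    oa_citations: Sequence[float], noa_citations: Sequence[float]
) -> float:
    """(mean OA citations - mean NOA citations) / mean NOA citations x 100."""
    oa_mean = mean(oa_citations)
    noa_mean = mean(noa_citations)
    return (oa_mean - noa_mean) / noa_mean * 100


if __name__ == "__main__":
    # Hypothetical counts for one journal issue (not real data).
    oa = [20, 14, 17]   # mean 17
    noa = [10, 12, 8]   # mean 10
    print(oa_advantage_percent(oa, noa))
```

Applying the same function separately to the hand-verified true-OA articles and to the false-"OA" articles, each against the NOA articles, is one way to carry out the partition described above.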
As explained in previous postings on the American Scientist topic thread, the purpose of the robot studies is not to get the most accurate possible estimate of the current percentage of OA in each field we study, nor even to get the most accurate possible estimate of the size of the OA citation Advantage. The advantage of a robot over much more accurate hand-testing is that we can look at a much larger sample, and faster -- indeed, we can test all of the articles in all the journals in each field in the ISI database, across years. Our interest at this point is in nothing more accurate than a rank-ordering of %OA as well as %OA citation Advantage across fields and years. We will nevertheless tighten the algorithm a little; the trick is not to make the algorithm so exacting for OA as to make it start producing substantially more false NOA errors, thereby weakening its overall accuracy for %OA as well as %OA advantage.
Stevan Harnad & Chawki Hajjem
Thursday, March 23. 2006
As predicted, and long urged, the UK's wasteful, time-consuming Research Assessment Exercise (RAE) is to be replaced by metrics:
"Research exercise to be scrapped"
RAE outcome is most closely correlated (r = 0.98) with the metric of prior RCUK research funding (Figure 4.1) (this is no doubt in part a "Matthew Effect"), but research citation impact is another metric highly correlated with the RAE outcome, even though it is not explicitly counted. Now it can be explicitly counted (along with other powerful new performance metrics) and all the rest of the ritualistic time-wasting can be abandoned, without further ceremony.
This represents a great boost for institutional self-archiving in Open Access Institutional Repositories, not only because that is the obvious, optimal means of submission to the new metric RAE, but because it is also a powerful means of maximising research impact, i.e., maximising those metrics: (I hope Research Councils UK (RCUK) is listening!).
Harnad, S. (2001) Why I think that research access, impact and assessment are linked. Times Higher Education Supplement 1487: p. 16.
And this new metric RAE policy will help "unskew" it, by instead placing the weight on the individual author/article citation counts (and download counts, CiteRanks, authority counts, citation/download latency, citation/longevity, co-citation signature, and many, many new OA metrics waiting to be devised and validated, including full-text semantic-analysis and semantic-web-tag analyses too) rather than only, or primarily, on the blunter instrument (the journal impact factor).
This is not just about one number any more! The journal tag will still have some weight, but just one weight among many, in an OA scientometric multiple regression equation, customised for each discipline.
This is an occasion for rejoicing at progress, pluralism and openness, not digging up obsolescent concerns about over-reliance on the journal impact factor.
The document actually says:
"one or more metrics... could be used to assess research quality and allocate funding, for example research income, citations, publications, research student numbers etc."
You are quite right, though, that the default metric many have in mind is research income, but be patient! Now that the door has been opened to objective metrics (instead of amateurish in-house peer-re-review), this will spawn more and more candidates for enriching the metric equation. If RAE top-slicing wants to continue to be an independent funding source in the present "dual" funding system (RCUK/RAE), it will want to have some predictive metrics that are independent of prior funding. (If RAE instead just wants to redundantly echo research funding, it need merely scale up RCUK research grants to absorb what would have been the RAE top-slice and drop the RAE and dual funding altogether!)
The important thing is to scrap the useless, time-wasting RAE preparation/evaluation ritual we were all faithfully performing, when the outcome was already so predictable from other, cheaper, quantitative sources. Objective metrics are the natural, sensible way to conduct such an exercise, continuously, and once we are doing metrics, many powerful new predictive measures will emerge, over and above grant income and citations. The RAE ranking will not come from one variable, but from a multiple regression equation, with many weighted predictor metrics in an Open Access world, in which research full-texts in their own authors' Institutional Repositories are citation-linked, download-monitored and otherwise scientometrically assessed and analysed continuously.
Hitchcock, S., Brody, T., Gutteridge, C., Carr, L., Hall, W., Harnad, S., Bergmark, D. and Lagoze, C. (2002) Open Citation Linking: The Way Forward. D-Lib Magazine 8(10).
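The "multiple regression equation" with weighted predictor metrics mentioned in this post amounts to a weighted sum. The sketch below shows only the form of such an equation; the metric names, weights and values are hypothetical placeholders, since in practice the weights would have to be estimated and validated separately for each discipline.

```python
from typing import Dict


def predicted_score(
    metrics: Dict[str, float],
    weights: Dict[str, float],
    intercept: float = 0.0,
) -> float:
    """Linear predictor: intercept + sum of weight * metric value."""
    return intercept + sum(weights[name] * value for name, value in metrics.items())


if __name__ == "__main__":
    # Hypothetical, already-standardised metric values for one department.
    metrics = {"citations": 1.2, "downloads": 0.8, "research_income": 0.5}
    # Hypothetical discipline-specific weights (not estimated from any data).
    weights = {"citations": 0.5, "downloads": 0.25, "research_income": 0.25}
    print(predicted_score(metrics, weights))
```

The point of the form is that no single variable, whether journal impact factor or research income, carries the whole ranking: each is one weighted term among many.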
Wednesday, March 22. 2006
MIT has proposed two OA policy steps: compliance with the NIH Public Access Policy and seeking consensus on copyright retention.
In the interests of brevity, clarity, and comprehension, I will be (uncharacteristically) brief (8 points):
(1) The two steps taken by MIT are a very good thing, compared to taking no steps at all, but:
Posted by Peter Suber in Open Access News:
Two steps to support OA at MIT
Tuesday, March 21. 2006
Richard Poynder has done another penetrating and informative interview -- this time of Richard Stallman, founder of the GNU Project, the Free Software Foundation, and Copyleft.
Richard Stallman is a remarkable person and has made and continues to make invaluable contributions to freeing software to be creatively developed and used without proprietary restrictions.
It is important to understand what it is that Stallman stands for, in order to see that it is not the same thing as Open Access (OA) (although of course it is fully compatible with and in harmony with OA):
What Stallman means by "free" is free to use, develop and distribute. His main target is software code (though he has a more general view about all forms of property). Stallman opposes anything that prevents software from being further developed, improved upon, and distributed. (N.B. He does not oppose the selling of software; he opposes the hiding of the code, and the outlawing of its re-use and revision.)
Please note, though, that he states very clearly in the Interview that he understands that scholarly/scientific articles are not like computer code, meant to be modified and redistributed by others. This is a profound and fundamental difference, and if you don't grasp it, you invite all kinds of confusion and misunderstanding:
The right analogy between research findings and software is at the level of the content of the research findings, not the form (i.e., not the code, not the text). The text is proprietary, but the content is for everyone's use, and re-use (with proper citation to the source). Software code, in contrast, has no content. It is the code itself that Stallman is talking about modifying and redistributing.
The one small point of commonality (as opposed to mere analogy, at the content level) is the question of mirroring rights for OA texts: Stallman thinks it is not enough to put OA content in one's own IR; he thinks you have to make sure to formally grant explicit mirroring (and, presumably, caching and harvesting) rights with it too.
I don't agree with Stallman on this one tiny point; I think all the rest of the uses pretty much come with the web/OA territory right now; I'll start worrying about it if/when Google ever needs a license to harvest freely accessible web content. Right now, too much OA content is still missing, and worries about having to renegotiate rights are part of what keeps it missing. So let's forget about that for now.
The disanalogy between the OA movement and the Free Software movement is, of course, that whereas the publisher charging for access to the text is fine, the author also wants to provide toll-free access to his own final draft, in order to maximize its usage and impact: The authors of peer-reviewed journal articles are not interested in royalty revenue (whereas some authors of software code might be) because any toll-barrier at all preventing a would-be user from having access to their work costs the author in terms of lost research impact, research progress, and even further research grant income and other possible rewards.
I think this disanalogy is easy to understand, but it too needs to be made and kept quite explicit in everyone's mind.
I close with just a logical point on the question of "free" in the sense of free-of-charge and "free" in the GNU sense of free-to-revise/redistribute: Is it not a bug if a hacker (i.e., a programmer, in Stallman's good sense, the original meaning of "hacker") can write software code, sell it (in the hope of making an honest living), but the very first customer who buys it can make a trivial revision (or none at all) and then give the code away to one and all (or even make a tiny improvement, relative to the total work that went into the original) and start selling it at a competing cut-rate price?
I just pose this as a kind of koan for the putative free/free distinction (I'm sure others have thought of it too, and there may even be an answer, but I cannot intuit it offhand); and if the distinction does not survive it, then what has to go: the freedom to sell or the freedom to revise/redistribute?
I ask this only in a spirit of genuine puzzlement, because I really admire what Richard Stallman advocates and stands for.
One could also ask whether Richard Stallman's sense of "freedom" really scales up, beyond software, to all forms of human product, as he seems to believe. How many people could earn an honest living from their creative work that way?
Eprints of course has been GNU Eprints from the outset.
Richard Stallman AmSci Postings:
Monday, March 13. 2006
[Update: See new definition of "Gratis" and "Libre" OA, 27/8/2008]
Note to Peter Suber and the original formulators of the Budapest Open Access Initiative (Re-posted from AmSci Forum 13 March 2005 [last year]).
I would like to suggest that this is the right time, in light of recent developments, to update the BOAI definition of OA to make explicit what was already implicit in it: That OA must be now and must be permanent (not, for example, a feature that is provided for an instant, a century from now).
I think this was always perfectly obvious to anyone who read the BOAI definition of OA, but, as people will do, those with a vested interest in doing so found a loophole in the wording as it now stands. This is easily remediable by adding and announcing the obvious "immediate" (upon acceptance for publication) and "permanent" that should have been stated explicitly in the first place.
I think we overlooked this partly because we could not second-guess all conceivable self-serving construals by opponents of OA, but partly because we were trying to be as encouraging as possible about partial measures. Yet we were very careful, and should now be even more so, not to allow the notion of "partial-OA" -- which is on a direct slippery-slope in which TA (toll-access) too would become construable as just another form of partial-OA!
Delayed free-access and temporary free-access are forms of access, to be sure -- and some is generally better than none, more is generally better than less -- but OA itself is only complete free access, immediate and permanent, for everyone and anyone, anytime, anywhere webwide. Otherwise all access would be OA, and the rest would just be a matter of degree (or, in the words of the wag, we would have agreed on our profession and we would now be merely haggling about the price!)
The BOAI definition was not etched in stone. 3+ years of experience have now suggested ways in which it can be clarified and optimized. This is a good time to make explicit what was already implicit in it, which is: OA is a trait of an article, not an evanescent state. Just as an article is OA if it is freely accessible online, an article is not OA if it is not freely accessible online, and hence an article that is not immediately accessible freely online is not OA and an article that is no longer freely accessible online is not OA (and never was -- within the limits of inductive uncertainty and the impossibility of clairvoyance, i.e., if the obsolescence was planned).
Being accessible might be a transitory state, but being OA has to be an all-or-none trait. Researchers don't need access to research eventually, or temporarily or sometimes or somewhere: All researchers need OA to all research, immediately, permanently, at all times, and everywhere (webwide). I suggest that we announce the following update to the passage that starts:
"By "open access" to this literature, we mean its free availability on the public internet, permitting..."
to:
"By "open access" to this literature, we mean its free availability on the public internet, immediately and permanently, permitting..."
Those with an interest in blocking or minimizing non-toll-based access will of course scream that BOAI is "moving the goalposts!" but I think anyone who thinks clearly and honestly about the interests of the research community and of research itself, and what was the fundamental rationale and motivation for OA in the first place, will see that this is merely highlighting what the goal has been all along, not moving it.
Date: Sun, 13 Mar 2005 03:30:27 +0000 (GMT)
From: Stevan Harnad
To: Richard Poynder
Subject: Poynder's Blog-Point
One thing you missed: The "immediate" and "permanent" are and always were implicit in the BOAI definition of OA: An article is OA if and when it is freely accessible online. Obviously when it is not, it is not OA, so that excludes any embargo period, or any temporary "hook" period, withdrawn afterward!
The goal of OA is to make all articles OA: Not all articles OA after a while, or for a while. The answer to the question "Is this article OA?" has to be "yes", not "no". If an article can be OA some of the time, and not OA other times, then you may as well say an article can be OA to some people and not to other people (which is exactly what toll-access is: OA to those who can pay, non-OA to those who cannot).
Immediacy and permanence are as intrinsic to the fundamental rationale for OA as the full-text's being on-line and toll-free is. Researchers don't want to keep losing 6-12 months of research impact and progress, and call that Open Access.
Back Access is a cynical sop, any way you look at it, and a deplorable attempt to misuse both the principle of OA and the rationale underlying it.
I hope the Immediate Institutional Keystroke Policy as a default bottom line will put an end to any further inclination to try to use the Back-Access Ploy, for it immunizes institutions completely from any pressure for an embargo (the N-1 keystrokes to deposit the metadata and full text are required, for internal purposes; the Nth OA keystroke is strongly encouraged but up to the author), leaving the dominoes to fall naturally (and anarchically) of their own accord. Sensible institutions won't even bother formalizing the Nth keystroke as optional, but will deal with it, if need be, on a case by case basis.
The Wellcome Trust will have the eternal historical distinction of having been the first research funder to actually mandate Open Access (OA) self-archiving (May 2005): NIH Public Access Policy alas did not help advance OA, but rather missed an opportunity and inadvertently held things back for at least 2 years. But the hope now is that -- inspired in part by the far better model provided by the Wellcome Trust policy -- the NIH policy will be revised, becoming a self-archiving requirement instead of just a self-archiving request, no longer allowing a delay of up to 12 (or even 6) months.
It does not follow, however, that the current Wellcome Trust policy is unflawed, or that it provides the optimal model for others to follow. It was a great help at its historic time, as a counterweight to the far more flawed NIH policy, but at this historic point, the Wellcome Trust policy too risks becoming a retardant instead of a facilitator of OA, if it is imitated by others in its flaws instead of its strengths.
The strength of the Wellcome Policy is that (1) it is an exception-free requirement, not an optional request, and that (2) it does not allow a delay of longer than 6 months.
Its flaws are that (a) it allows any delay at all and that (b) it requires self-archiving in a central, 3rd-party repository (PubMed Central; PMC) instead of the author's own OA Institutional Repository (IR) (from which PMC could then harvest if/when it wishes).
The two flaws are linked. For the simple and natural way to rule out delays is to require immediate deposit of the accepted, final draft in the author's own institutional OA IR (immediately upon acceptance for publication), but merely request/encourage that access to the deposited draft should be immediately set to "Open Access." That leaves the author the option to provisionally set access instead as "Restricted Access" if need be (for up to 6 months).
How is this linked to the requirement to deposit in PMC instead of at home? Because PMC is neither the author nor the author's institution. It is not even the Wellcome Trust. It is a generic, 3rd-party repository, which publishers can (perhaps rightly) construe as a rival 3rd-party publisher. Publishers are certainly within their rights to block or embargo rival 3rd-party publishing. (Whether it makes any sense to try to treat a 3rd-party OA repository as a rival publisher in the OAI-interoperable age is another matter!)
But the author and the author's own institution certainly cannot be construed as a rival 3rd-party publisher: They are the party of the first part, the content-provider, and the publisher is only the party of the second part: the value-adder and vendor.
And that is why far more journals have given their green light to author self-archiving in their own respective institutional OA IRs, than to self-archiving in a central 3rd-party repository like PMC. And that is also why PMC-archiving is more vulnerable to a publisher embargo.
But there is an ultra-simple way to require immediate deposit while accommodating any publisher embargo at the same time: Require immediate deposit in the author's own OA IR -- immediately upon acceptance for publication -- and harvest the full-text into PMC after 6 months!
That way the deposit is, without exception, immediate, and for about 93% of articles, access too will be immediately OA. (Those articles, too, can be immediately harvested into PMC.) For the c. 7% of articles set to Restricted Access, the metadata will be immediately visible anyway, and emailed eprint-requests (facilitated and automatized with the help of the IR software) can fulfil the access-needs of would-be users who cannot afford access to the proprietary journal version during the embargo period.
Why not implement the deposit/access-setting distinction, but in PMC rather than in the author's own IR? Because it fails to generalise to all the rest of OA research output (in all fields of research, not just biomedical). The Wellcome Trust funds some of the world's biomedical research; NIH funds more; but there are vast amounts of further research -- in biology, medicine, physical sciences, engineering, social sciences and even the humanities -- that would all fail to benefit from a parochial PMC mandate for biomedical research. If, instead, funders like Wellcome and NIH mandated that their fundees self-archive in their own institutional OA IRs, that would "tile" all of OA space, effectively and completely, as universities cover all fields of research output. (Central OA repositories like PMC and others would still be available for any orphan works from unaffiliated researchers.)
In other words, funders are not helping world OA if they keep thinking of it as a go-it-alone operation. Funders only fund bits; central OA repositories don't exist for all disciplines and fields; and even if they did, they -- unlike the researchers' institutions -- do not have the clout to reinforce scattered funder mandates with institutional self-archiving mandates, to ensure that all their institutional research output is indeed self-archived.
So the simple and sensible way to update and optimise the pioneering Wellcome Trust self-archiving mandate would be to (1) require the self-archiving to be done in the fundee's own institutional OA IR (as the UK Select Committee proposed), (2) require it to be done immediately upon acceptance for publication, (3) encourage immediate access-setting to OA, (4) require access-setting at OA by 6 months at the latest, and (5) harvest the metadata into PMC immediately upon deposit -- and the full-text into PMC (if need be -- there's a case to be made for just linking to the IR version) within 6 months at the latest.
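The deposit/access-setting distinction in the five points above can be sketched as a tiny model. Two assumptions are ours, not the post's: the six-month clock is taken to start at deposit (i.e., at acceptance), and "6 months" is approximated as 183 days. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "6 months" approximated as 183 days, counted from deposit.
MAX_RESTRICTED_PERIOD = timedelta(days=183)


@dataclass
class Deposit:
    accepted_on: date        # deposit is required immediately upon acceptance
    author_set_open: bool    # immediate OA is encouraged, not required

    def access_on(self, day: date) -> str:
        """Access status of the full text in the author's own IR on a given day."""
        if day < self.accepted_on:
            return "not deposited"
        if self.author_set_open or day >= self.accepted_on + MAX_RESTRICTED_PERIOD:
            return "Open Access"
        # Metadata are visible; eprints can be requested by email meanwhile.
        return "Restricted Access"

    def metadata_harvestable(self, day: date) -> bool:
        """Metadata can be harvested (e.g., into PMC) as soon as the deposit exists."""
        return day >= self.accepted_on
```

The design point is that the deposit date never moves; only the access setting varies, which is what lets the mandate be exception-free while any embargo is absorbed by the access setting alone.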
Why is Wellcome Trust not making this simple and obvious update without even any need for prompting? I think it is because there are again green and gold wires crossed: Over and above its mission to ensure that all Wellcome-funded research (and, hopeably, all research) is made OA, the Wellcome Trust has the further worthy goal of encouraging a transition to the OA (gold) publishing model. This is all fine, but not if the slow, uncertain transition to gold OA is supported at the expense of a speedy, certain transition to 100% OA itself (green).
And that is what I think is happening: Wellcome is not doing everything it could to hasten OA itself, because it is not committed only to OA, but to publishing reform too.
My own view is that publishing reform will take care of itself, and that the urgent task is to get to 100% OA as soon as possible. (Indeed, that itself will probably prove the most important stimulant to publishing reform.) But to slow the immediately feasible and certain transition to OA in the service of far slower and less certain -- and more hypothetical -- measures to induce publishing reform, is not, I think, to help OA along the road to the optimal and inevitable (and already overdue) outcome.
On Mon, 13 Mar 2006, Robert Kiley (Wellcome Trust) [RK] wrote in the American Scientist Open Access Forum:
RK: "Please note the Wellcome Trust currently does NOT have any plans to reduce the 6 month time limit on its grant condition. The grant condition requires published research (original research papers in peer reviewed journals) arising in part or whole from Trust funding to be placed in Pubmed Central (or UK PMC when it exists) no later than 6 months after the date of publication."
No need to reduce the 6 months if Wellcome does not wish to. Just mandate immediate deposit (in the fundee's own OA IR) and let delayed access-setting bear the burden of the delay. Meanwhile, everyone gets into the habit of self-archiving at home, and emailing eprints can bridge the gap, universally and uniformly.
RK: "It is obvious that a potential delay of up to 6 months is not ideal in terms of the timing of access, but it is a realistic response to the very real concerns of publishers, large and small, that self archiving is a threat to their business model. Whether this is eventually shown to be the case is immaterial as it is this perception that we need to deal with."
Fine. As noted: Mandate immediate deposit and allow the option of delayed access-setting.
RK: "As the only funding organisation with a mandate in its grant condition to support open access through open access publishing and archiving in PMC we are very well aware how many journals are currently at odds with this policy."
Note the conflation of open access provision (through self-archiving, green) with open access publishing (gold)...
RK: "That is why, in conjunction with JISC, we are funding an extension of the Sherpa/Romeo project to identify, at the journal level, which journals will allow a copy of the published paper to be deposited into PMC/UKPMC so it is available no later than 6 months after the original publication date."
It is always good to extend Sherpa/Romeo's coverage, but Romeo already lists embargoes, if any. So surely what Romeo needs is more coverage of journal self-archiving policies, not a focus on 6-month embargoes!
RK: "In order to encourage experiments in alternative business models to the subscription model the Trust also explicitly supports open access publishing as part of the research funding process."
So far, so good. Funding authors' OA (gold) publishing charges is very constructive and helpful. But now this:
RK: "That is why we provided some assistance to OUP, Blackwell's and Springer in drafting the author licence for their various open access offerings so that they were explicitly compliant with publishing and depositing in an archive such as PMC."
This sort of thing simply encourages the locking in of a 6-month embargo instead of helping to phase it out!
If the Wellcome Trust instead simply mandated immediate deposit and let access-setting bear the weight of any embargoes, it would not need to get into the business of entrenching and canonizing embargoes instead of letting them die a quiet death of natural causes!
RK: "We see open access repositories and open access publishing as complimentary exercises and to us, and the publishers we talk to, there is a direct link between the impact of self archiving and the publishing process so it is a pragmatic response to deal with both issues in parallel."
What is complementary today is: (1) non-OA publishing, (2) OA publishing, and (3) OA repositories for the author self-archiving of both (1) and (2).
Self-archiving is not a form of OA publishing, and the immediate and reachable goal -- the one that justifies OA in the first place, namely, access to 100% of published research articles -- is a transition to 100% OA, not necessarily a transition to OA publishing.
RK: "In time the most likely scenario, and one the Trust is supporting, is that open access publishing, or another model yet to be invented, will become the norm and publishers will be able to operate without a reliance on subscriptions. As such the 6 month embargo period will be kept under review but at the moment the Trust has no plans to change it."
That's fine. Let the allowable 6-month delay stand, but let it be a delay in access-setting, not deposit. And let the immediate deposit be in the fundee's own institutional IR, with PMC harvesting it after the allowable delay -- rather than delaying the deposit itself, and insisting it be in PMC!
EXECUTIVE SUMMARY: Universities and research funders are both invited to use this document to help encourage the adoption of an Open Access Self-Archiving Mandate at their institution. Note that this recommended "Immediate-Deposit & Optional-Access" (IDOA) policy model (also called the "Dual Deposit/Release Strategy") has been specifically formulated to be immune from any delays or embargoes (based on publisher policy or copyright restrictions): The deposit -- of the author's final, peer-reviewed draft of all journal articles, in the author's own Institutional Repository (IR) -- is required immediately upon acceptance for publication, with no delays or exceptions. But whether access to that deposit is immediately set to Open Access or provisionally set to Closed Access (with only the metadata, but not the full-text, accessible webwide) is left up to the author, with only a strong recommendation to set access as Open Access as soon as possible (immediately wherever possible, and otherwise preferably with a maximal embargo cap at 6 months).
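The deposit/access-setting distinction at the heart of the IDOA model can be made concrete with a small sketch. What follows is only an illustration, not the behaviour of GNU Eprints or any other IR software: the class name `Deposit`, its fields, the same-day reading of "immediately upon acceptance", and the 183-day approximation of the 6-month cap are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: the recommended "maximal embargo cap at 6 months" is
# approximated as 183 days for the purposes of this sketch.
EMBARGO_CAP_DAYS = 183


@dataclass
class Deposit:
    """Hypothetical record of one self-archived article in an IR."""
    title: str
    accepted_on: date                 # date of acceptance for publication
    deposited_on: date                # date the final draft was deposited
    open_from: Optional[date] = None  # None: Open Access from the day of deposit

    def deposited_immediately(self) -> bool:
        # Assumption: "immediately upon acceptance" is read as same-day deposit.
        return self.deposited_on == self.accepted_on

    def metadata_visible(self, today: date) -> bool:
        # Metadata are visible webwide as soon as the deposit exists,
        # whatever the access setting of the full text.
        return today >= self.deposited_on

    def full_text_open(self, today: date) -> bool:
        # The full text is open once the (optional) access delay has elapsed.
        start = self.open_from or self.deposited_on
        return today >= start

    def within_recommended_cap(self) -> bool:
        # Checks the access delay against the recommended 6-month cap.
        start = self.open_from or self.deposited_on
        return start <= self.deposited_on + timedelta(days=EMBARGO_CAP_DAYS)


# Illustrative dates only: deposited on acceptance, access opened five months later.
paper = Deposit(
    title="Example article",
    accepted_on=date(2006, 3, 1),
    deposited_on=date(2006, 3, 1),
    open_from=date(2006, 8, 1),
)
print(paper.deposited_immediately(), paper.metadata_visible(date(2006, 3, 1)))
print(paper.full_text_open(date(2006, 3, 2)), paper.full_text_open(date(2006, 8, 1)))
```

The point of the sketch is that the deposit date never depends on the embargo: only `open_from` moves, which is exactly where the model places the burden of any publisher-imposed delay.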
1. Research Accessibility
1.1 There exist 24,000 peer-reviewed journals (and conference proceedings) publishing 2.5 million articles per year, across all disciplines, languages and nations.
2. Research Impact: Usage and Citations
2.1 This is confirmed by recent findings, independently replicated by many investigators, showing that articles for which their authors have supplemented subscription-based access to the publisher’s version by self-archiving their own final drafts free for all on the web are downloaded and cited twice as much across all 12 scientific, biological, social science and humanities disciplines analysed so far. (Note: there are no discipline differences in benefits of self-archiving, only in awareness.)
3. University Self-Archiving Mandates Maximise Research Impact
3.1 Only 15% of the 2.5 million articles published annually are being spontaneously self-archived worldwide today.
4. Action: This university should now mandate self-archiving university-wide
4.1 This university should now maximise its own research impact and set an example for the rest of the world by adopting a self-archiving mandate university-wide.
5. The Importance of Prompt Action
5.1 Self-archiving is effortless, taking only a few minutes and a few keystrokes; library help is available too (but hardly necessary).
Southampton University Resources for Supporting Open Access Worldwide
A1 U. Southampton ECS department was the first department or institution in the world to adopt a self-archiving mandate (2001).
Sunday, March 12. 2006
The Open Access (OA) guidelines of Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) are very, very welcome, but I hope that a few seemingly minor details (see below) can be revised to make them an effective model for others worldwide:
DFG Passes Open Access Guidelines
The first problem concerns this clause:
"recommended encouraging funded scientists to also digitally publish their results and make them available via open access"
On the one hand, this clause is too weak: It is specifically because the NIH only "recommended/encouraged" that its public access policy has failed and now needs to be strengthened to "required/mandated."
On the other hand, the present clause is far too vague and ambiguous:
(1) Virtually all journals today are hybrid paper/digital already, so recommending/encouraging that the publication should have a "digital version" is breaking down open doors.
(2) What needs to be brought out clearly is that what is actually being required is that a digital version of the publication should be made open access (OA) -- by self-archiving it (depositing it in an OA repository).
(3) What can also be recommended/encouraged (but not required) is to publish in an OA journal where possible.
(4) All ambiguity about "publishing" and "publication" should be eliminated, by saying (and meaning) that "publishing" means publishing in a peer-reviewed journal, whereas depositing a published article in an OA repository is not publishing but access-provision. A published article is already published! Self-archiving increases the access to that publication by making it available to those would-be users who cannot afford subscription access to the publisher's proprietary version.
"require funded scientists to also self-archive their published results in an online repository to make them available via open access"
(5) No rights renegotiation is necessary at all for the 93% of journals that already endorse immediate self-archiving.
(6) For the 7% of journals that do not yet endorse self-archiving, no rights renegotiation is needed for immediate depositing, but rights can be negotiated for setting Open Access.
NB: "OA Self-Archiving" means (i) depositing the full text and metadata in a web repository and (ii) setting access to the full-text as Open Access. The depositing itself (i) (where no one can see the full-text but the author) requires no permission from anyone! The only conceivable rights issue concerns access-setting.
"In order to put secondary publications (i.e. self-archived publications by which the authors provide their scientific work on the internet for free following conventional publication) on the proper legal footing, scientists involved in DFG-funded projects are also requested to reserve the exploitation rights."
(7) Please don't call providing OA to an already-published article "secondary publication"! In a formal sense self-archiving can indeed be construed that way, but that is not a construal that clarifies, it merely confuses. Leave publication to publishers. Authors don't publish their own articles, let alone publish their own already-published articles! They provide access to them, just as they did in paper days when they provided reprints or photocopies, none of which were called "secondary publication." Secondary publishers are publishers, 3rd parties (not the author, and not the primary publisher), that republish an entire published work; or they are indexers/abstracters, that republish parts of it. Self-archiving authors are not secondary publishers of their own published work.
(8) Whereas it is certainly useful and desirable to "reserve the exploitation rights" for authors' published articles, this is not a prerequisite for self-archiving their own drafts (rather than the publisher's PDF), and certainly not for the 68% of journals that are already "green," having given their official blessing to author self-archiving of postprints -- nor for the 25% more that have endorsed preprint self-archiving. Rights renegotiation is hence moot for all but 7% of the c. 8800 journals indexed in Romeo (and that includes virtually all the principal international journals).
(9) Most important: The rights negotiation is not about the depositing (which should be mandatory, and immediate upon acceptance for publication) but only about the access-setting -- i.e., whether access to the deposited full-text is set to "Open Access" or only "Restricted Access" (and if the latter, then for how long).
"For publications that they self-archive on the internet for free following publication, scientists involved in DFG-funded projects are also encouraged -- if the publisher has not already endorsed immediate author self-archiving -- to retain the immediate right to set access as 'Open Access'."
The guidelines continue:
Recommendations are currently being integrated into the usage guidelines, which form an integral part of every approval. They are worded as follows:
The last sentence is awkward and ambiguous, mixing up publishing and self-archiving, but it is easily clarified:
"To achieve this, all work should be published either in conventional journals or in recognised peer-reviewed open access journals; and in addition (the author's draft of) all publications should be self-archived in discipline-specific or institutional electronic archives (repositories)."
The guidelines continue:
"When entering into publishing contracts scientists participating in DFG-funded projects should, as far as possible, permanently reserve a non-exclusive right of exploitation for electronic publication of their research results for the purpose of open access. Here, discipline-specific delay periods of generally 6-12 months can be agreed upon, before which publication of previously published research results in discipline-specific or institutional electronic archives may be prohibited."
Recommended revision:
"When entering into publishing contracts with journals that do not already explicitly endorse immediate author self-archiving, scientists participating in DFG-funded projects should, as far as possible, permanently reserve a non-exclusive right to set access to their deposited draft as Open Access immediately upon deposit. An access-delay interval of 6-12 months is discouraged, but allowable under current DFG policy; during this interval the publication, always deposited immediately upon acceptance, may be placed under Restricted Access rather than Open Access."
Allowing any Restricted Access interval at all is the weaker form of OA mandate, but it is still sufficient. It is critically important, however, that:
(a) Depositing the full text is required, not just requested
(b) The depositing itself must always be done immediately upon acceptance for publication, not after the access-delay interval agreed with the publisher
(c) During any agreed access-delay interval (one year maximum) access to the full-text can be set as Restricted Access rather than Open Access
I would also recommend against permitting a delay as long as one year: NIH is now moving from a year to 4 months; Wellcome allows 6 months but is planning to reduce that. There is no need for DFG to be more permissive of access restriction.
The guidelines finish thus:
Please ensure that a note indicating support of the project by the DFG is included in the publication.
Friday, March 3. 2006
This is perhaps a good juncture at which to make it explicit that there is "small-p preservation" and "large-P Preservation." Of course GNU Eprints, like everyone else (including ArXiv since way back in 1991) is doing small-p preservation, and will continue to do so: Open Access is for the sake of immediate access, today, tomorrow, and into the future -- and this, in turn, is for the sake of maximising immediate usage and impact, today, tomorrow, and into the future. Hence small-p preservation is a necessary means to that end.
But big-P Preservation, in contrast, is Preservation as an end in itself: as the motivation for archiving in the first place; or as a pressing need for ephemeral or fragile "born-digital" contents; or as a responsibility for content-providers (journal-providers) or content-purchasers (subscribing libraries) or content-preservers (deposit/record libraries) who need to ensure the perennity of their sold/purchased product.
So it is absurd to imagine (and for that reason needs to be stated explicitly, again and again, even though it is patently obvious) that Eprints is either oblivious to small-p preservation or that its contents are one bit more or less likely to vanish tomorrow than any other digital contents that are being conscientiously preserved and migrated and upgraded today, keeping up with the ongoing developments in the means of preservation.
The difference between preservation and Preservation is that preservation is not an end in itself, it is a means to an end (which is immediate, ongoing access-provision and usage), whereas Preservation is an end in itself.
Why is it so important to make it crystal clear that Eprints and OA are not Preservation projects, and that their primary motivation is not to ensure the longevity of digital contents (even though Eprints and OA do provide longevity, and do keep up with whatever developments occur in the means of long-term preservation of their contents)?
Because OA's target contents are 85% missing! The pressing problem of absent content cannot be its Preservation! Eighty-five percent of the 2.5 million articles published annually in the world's 24,000 journals are not being self-archived today (and, a fortiori, were not self-archived yesterday, or the month/year/decade before). What has been -- and continues to be -- lost, as a consequence of this, is not the contents in question (for they are being Preserved in their proprietary-product version, by their producers [publishers] along with their purchasers [libraries]).
What has been (and continues to be) lost for the 85% of annual OA target content that has not been (and is not being) self-archived, is access, usage, and impact. That is the true motivation for Eprints and OA self-archiving. And (listen carefully, because this is the gist of it!): that content will never be self-archived by its authors for the sake of Preservation, because it need not be: its Preservation is already in other hands than its authors (or its authors' institutions), as it always was, and for the foreseeable future will continue to be. The mission of authors and their institutions was not, is not, and should not have to be the Preservation of their own published journal article output [but see Note below**].
Nor, by the same token, is it the mission or motivation of authors' institutions to create Institutional Repositories (IRs) for the Preservation of their own published journal article output. If there is no better reason for creating OA IRs today than the Preservation of one's own journal article output, then there is no reason for institutions to create OA IRs today, and no reason for their authors to self-archive. This is a logical, empirical and practical fact, stated (recall, again) at a historical moment when 85% of OA target content is still missing, even though it is overdue, even though its self-archiving has been feasible for years, and even though its continuing absence entails that 85% of maximised research usage and impact (i.e., impact from usage by all would-be users rather than only those whose institutions can afford journal access) continues to be lost.
To wrongly identify the mission or motivation of Eprints or OA self-archiving with the need to Preserve digital contents is to provide yet another (strong) reason for authors not to self-archive. Because Preservation is simply no reason at all (for OA self-archiving).
And to subsume the urgent mission of finding a way to generate that missing 85% of OA target content under the murky mission of the generic Preservation of generic digital content is simply to miss the point of OA self-archiving altogether, and to imagine that it is merely yet another instance of Preservation-Archiving -- whose mission and motivation, to repeat, yet again, is not immediate, urgent, long-overdue content-provision, access-provision, and usage/impact-maximisation, but long-term content-Preservation, as an end in itself.
So please, let us reassure those who might be fussed about it, that the contents of OA IRs like Eprints can and will continue to be preserved, but that to be Preserved is not their purpose, nor the purpose of self-archiving: immediate and ongoing access-provision and usage/impact-maximisation is their purpose. And that purpose is currently not being met -- not because the OA contents are at risk of not being preserved today, but because (85% of) the OA contents are at a certainty of not being provided today.
The OA problem, in other words, is not Preservation tomorrow, but Provision today. Hitching today's Provision problem to tomorrow's Preservation problem is yet another recipe for prolonging the non-Provision of 85% of OA's target content.
What is needed for the provision of the missing 85% of OA's target content is author motivation; and the empirical findings on how OA enhances usage and impact go only part of the way toward engaging author motivation. The critical missing bit to ensure the provision of the missing content is institutional OA self-archiving mandates, not the plugging in of OA as merely another plank in the institution's generic Preservation platform.
I sense I am repeating myself -- but it appears to be needed, for the conflation of the Preservation-archiving mission and the OA access-provision mission just keeps recurring, diverting time, energy and motivation from OA access-provision, which is Eprints' raison d'etre.
[**Note: One last, somewhat subtler point, almost need not be stated, but it's probably better to make it explicit too, even though it is highly premature and highly hypothetical: If and when it should ever transpire -- and there is as yet no sign at all that it will -- that 100% OA via 100% self-archiving, having been neared or reached, should cause radical changes in the journal publishing system, forcing publishers to down-size into becoming only peer-review service-providers and certifiers, rather than also being the analog and digital product access-providers, as they are now, thereby forcing them to off-load access-provision and archiving onto their authors' institutions, then, and only if/when "then" ever comes, authors' institutions will inherit the primary-content Preservation mission, and not just the supplementary-content preservation mission.]
Stevan Harnad
The American Scientist Open Access Forum has been chronicling and often directing the course of progress in providing Open Access to Universities' Peer-Reviewed Research Articles since its inception in the US in 1998 by the American Scientist, published by the Sigma Xi Society.
The Forum is largely for policy-makers at universities, research institutions and research funding agencies worldwide who are interested in institutional Open Access Provision policy. (It is not a general discussion group for serials, pricing or publishing issues: it is specifically focussed on institutional Open Access policy.)