Monday, November 13, 2006
Critique of Publishing Research Consortium Study
The following is a critique of:
Chris Beckett and Simon Inger, Self-Archiving and Journal Subscriptions: Co-existence or Competition? An international Survey of Librarians' Preferences. Commissioned by the Publishing Research Consortium from Scholarly Information Strategies Ltd (SIS), a scholarly publishing consultancy. October 2006.
Because there has so far been no detectable correlation between author self-archiving and journal cancellations, the Publishing Research Consortium commissioned a survey of acquisition librarians' preferences and attitudes about a number of hypothetical alternatives. From the responses, a theoretical model was constructed that predicted cancellations as more self-archived content becomes available. How did the study arrive at this prediction without any actual cancellation data?
The prediction was based on a rather simple methodological flaw: Librarians were given a series of hypothetical choices, each among three hypothetical "products," A, B and C, and were asked to pick which of the three they would prefer most and which least. Each hypothetical product option was a complicated combination of six properties, each of which could take one of 3-4 possible values.
Presenting this array of hypothetical product options as choices to acquisition librarians (apart from being highly complicated and highly hypothetical, with many hidden assumptions) is specious, for among the potential properties of the hypothetical "product" options was the property that some of the options were free.
But a free self-archived journal article is not a product: It is not something that an acquisitions librarian decides whether or not to acquire. Open Access (OA) is not a product-acquisition issue at all: At best (or worst) it's a product-cancellation issue.
Hence the only credible and direct hypothetical question one could have asked librarians about self-archived journal articles (and even then there would be no guarantee that librarians would actually do as they predicted they would do under the hypothetical conditions) would be about the circumstances under which they think they would cancel existing journals:
"Would you cancel journal X if 100% of its articles were accessible free online (80%? 60%? 40%?)? If they were accessible immediately (after 6 months? 12? 24?)?"
And even that question is laden with highly speculative and even indeterminate assumptions: How could librarians (or anyone) know what percentage of any particular journal's content was accessible for free in self-archived form?
And what about interactions between journal X and journal Y? (How to spend a given acquisitions budget -- what to acquire and what to cancel -- is presumably a comparative decision, and we are asking about the keep/cancel trade-offs.)
But what if 60% of all journals were free online (immediately? after 12 months?)? (Acquisition/cancellation decisions today are largely competitive ones: X gets cancelled in favour of Y. The rules of this trade-off game would presumably change if all journals were roughly on a par for their percentage of freely available online content or the length of the delay before it is freely available.)
Straightforward questions on what a librarian predicts they would cancel (in favour of what) under what hypothetical conditions (and how those conditions could be ascertained) might possibly have some weak predictive value. But such straightforward questions are not what this series of questions about preferences among hypothetical "product options" asked.
[Even straightforward hypothetical answers to straightforward hypothetical questions may not have any predictive value if the hypotheses are far-fetched or unfamiliar enough, if they have hidden or incoherent assumptions: I frankly don't believe there is a librarian alive who has a clue as to what they would keep or cancel if the self-archived versions of all journal articles were suddenly available free online today -- let alone what they would do as all journal contents gradually approached 100% availability, at various (uncertain) speeds, from a trajectory of increasing (but uncertain) free content (40% to 60% to 80%) and/or decreasing delay (24 months to 12 months to 6 months).]
And that's without mentioning intangibles such as any continuing demand for the paper edition, etc., nor how librarians could know the percentages available, how quickly the percentages would grow, and at what relative rate they would grow among more and less important journals, more and less expensive journals.
But it was not even these straightforward, if highly speculative, questions that were asked of librarians in this survey. Instead, they were asked to pick the most and least favoured option among three hypothetical "products," A, B and C, with a variety of complicated combinations of 6 hypothetical properties, which could each take 3-4 values:
1. ACCESS DELAY: 24-months, 12-months, 6-months, immediate access
2. PERCENTAGE OF JOURNAL'S CONTENT: 100%, 80%, 60%, 40%
3. COST: 100%, 50%, 25%, 0%
4. VERSION: preprint, refereed, refereed+copy-edited, published-PDF
5. ACCESS RELIABILITY: high, medium, low
6. JOURNAL QUALITY: high, medium, low
In each case, products A, B and C were given some combination of the values on properties 1-6, and the librarian had to choose which of the 3 combinations they most and least preferred.
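To make the scale of this design concrete, here is a short Python sketch that counts how many distinct hypothetical "products" the six listed properties can generate, and how many distinct A/B/C triplets could in principle be built from them. It is only arithmetic on the levels as listed above; it assumes every combination of levels is admissible, and it says nothing about which sample of profiles the survey actually showed each librarian.

```python
from itertools import product
from math import comb

# Attribute levels as listed in the survey description
# (assumption: every combination of levels is an admissible "product").
ATTRIBUTES = {
    "access_delay": ["24 months", "12 months", "6 months", "immediate"],
    "percent_content": ["100%", "80%", "60%", "40%"],
    "cost": ["100%", "50%", "25%", "0%"],
    "version": ["preprint", "refereed", "refereed+copy-edited", "published-PDF"],
    "access_reliability": ["high", "medium", "low"],
    "journal_quality": ["high", "medium", "low"],
}

# Every possible hypothetical "product": one level chosen per attribute.
profiles = list(product(*ATTRIBUTES.values()))
n_profiles = len(profiles)

# Number of distinct unordered A/B/C choice sets of three different profiles.
n_triplets = comb(n_profiles, 3)

print(f"{n_profiles} possible product profiles")
print(f"{n_triplets} possible A/B/C triplets")
```

With 2,304 possible profiles and over two billion possible triplets, any single librarian can only have seen a tiny fraction of the space, so any general conclusion has to be reached by sampling plus interpolation and extrapolation.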
From samples of these combinations (interpolated and extrapolated within and between librarians) the survey concludes that:
PRC: A major study of librarian purchasing preferences has shown that librarians will show a strong inclination towards the acquisition [sic] of Open Access (OA) materials as they discover that more and more learned material has become available in institutional repositories.
(1) OA materials are not "acquired" (and it is both misleading and absurd to cast either the questions or the responses in an acquisitions context). Non-OA products are acquired, and the availability of OA versions of them might or might not induce cancellation in favour of other non-OA products under various circumstances (that are not even touched upon by this study or its methodology).
Why would the model assume arbitrary differential rates of OA growth among journals rather than roughly uniform growth across all journals in each field (apart from random fluctuations)? And if there were systematic differential OA growth within a field, wouldn't librarians' decisions depend very much on the field, and on which journals' contents happen to become OA faster, rather than on any general predictions generated from this theoretical model?
(2) Nothing whatsoever was determined about what happens as more and more OA becomes available all round, nor about how availability would be ascertained, nor at what rate OA would grow and be ascertained. There were merely static questions about 3 hypothetical competing "products," some stipulated to be PP% OA within MM months.
PRC: Overall the survey shows that a significant number of librarians are likely to substitute OA materials for subscribed resources, given certain levels of reliability, peer review and currency of the information available. This last factor is a critical one -- resources become much less favoured if they are embargoed for a significant length of time.
The survey shows nothing whatsoever about libraries substituting OA material for anything, because free self-archived content is not something a subscriber institution (library) provides (by buying it in) but something an author institution provides, via its IR, by self-archiving it.
If the questions had been forthrightly put as pertaining to cancellation decisions under various hypothetical conditions, then at least we would have had librarians' speculations about what they think they would cancel under those hypothetical conditions. But instead we have inferences from a model based on least- and most-preferred "product" options having little or no bearing on any question other than the librarians' preferences for the hypothetical properties: They prefer journals with lower prices, whose content is higher quality, more reliable, more immediate, peer-reviewed, and preferably 100% of it. (Librarians don't much care whether the peer-reviewed article is the author's final draft or the publisher's PDF, as long as it's peer-reviewed: That is a genuine finding of this study!)
There is no way at all to interpolate or extrapolate from data like these to draw valid or even coherent conclusions about self-archiving and cancellations, with or without a "conjoint analysis" model.
PRC: One of the key benefits of the conjoint analysis approach used in this survey was the removal of bias by not referring, when testing different product configurations, to any named incarnations of content types, including subscription journals, licensed full-text (or aggregated) databases, or articles on OA repositories.
This "bias" was eliminated at the cost of making it a questionnaire about acquisitions among a variety of competing "products" when it should have been a questionnaire about cancellations under a variety of hypothetical OA conditions (many of them unascertainable, hence moot).
PRC: The survey tested librarians' preferences for a series of hypothetical and unnamed products frequently showing unfamiliar combinations of attributes -- such as a fully priced journal embargoed for 24 months, or content at 25% of the price but through an unreliable service. By taking this approach, the survey measured librarians' preferences for an abstract set of potential products thus avoiding any pre-conceived preferences for named products, such as journals, licensed full-text (aggregated) databases or content on OA repositories.
Indeed. But OA is not an alternative product for acquisition: it is a property that might or might not induce cancellation in favour of other products under certain hypothetical (and presumably competitive) conditions.
PRC: The data were abstracted into a "Share of Preference" model (or simulator) which has then been used to model real-life products and thus create predictions for librarians' real-life preferences for these products. It is therefore possible to go beyond the comparisons, in this work, of journals versus OA and to model other preferences, such as between OA and licensed full-text databases.
The "Share of Preference" model might be viable when the preference really concerns competing products for acquisition, with a variety of rival properties, but it fails completely when applied to free non-products, not for acquisition at all, but treated as if they were just another among the rival properties of products competing for acquisition.
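For readers unfamiliar with the technique, here is a minimal sketch of how a "Share of Preference" simulator typically works, assuming the common logit rule: each product's total utility is the sum of the part-worths of its attribute levels, and its share is exp(utility) divided by the sum of exp(utility) over all competing products. The part-worth values are invented for illustration and are not taken from the PRC study, which may well have used a different estimation or share rule. What the sketch shows is structural: "free" enters only as one more attribute level of a product competing for a share of preference.

```python
import math

# HYPOTHETICAL part-worth utilities, invented for illustration only
# (not the PRC survey's estimates). In a real conjoint study these are
# estimated from respondents' most/least preferred choices.
PART_WORTHS = {
    "delay": {"immediate": 1.0, "6 months": 0.5, "12 months": 0.0, "24 months": -1.0},
    "cost": {"0%": 1.5, "25%": 0.8, "50%": 0.2, "100%": -0.5},
    "quality": {"high": 1.2, "medium": 0.0, "low": -1.2},
}

def utility(profile):
    """Total utility of a product: the sum of its attribute-level part-worths."""
    return sum(PART_WORTHS[attr][level] for attr, level in profile.items())

def share_of_preference(products):
    """Logit share rule: share_i = exp(U_i) / sum_j exp(U_j) over the products offered."""
    exp_u = {name: math.exp(utility(p)) for name, p in products.items()}
    total = sum(exp_u.values())
    return {name: e / total for name, e in exp_u.items()}

# Two hypothetical competing "products".
products = {
    "subscribed journal": {"delay": "immediate", "cost": "100%", "quality": "high"},
    "free, delayed copy": {"delay": "12 months", "cost": "0%", "quality": "high"},
}

shares = share_of_preference(products)
for name, s in shares.items():
    print(f"{name}: {s:.2f}")
```

Whatever numbers are plugged in, the shares always sum to 1 across the products presented: the simulator divides preference among rival alternatives, and a decision to cancel a subscription appears nowhere in it unless "cancel" is itself modelled as one of the alternatives.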
We could have said a priori that librarians (like all consumers) will prefer a higher-quality product over a lower-quality product, 100% of a product over 60% of a product, an immediate product over a delayed product, a lower-priced product over a higher-priced product. A "Share of Preference" model could give some rough rank orders for those various combinations.
It seems natural to add to such a "Share of Preference model" that consumers will prefer a free product over a priced product, except that we are talking here about acquisitions librarians, who do not "acquire" free products but merely buy or cancel priced journals. This study simply does not and cannot indicate under what OA conditions they will cancel what for what.
The following (mild) conclusions are the only ones that can be drawn:
PRC: There is a strong preference for content that has undergone peer review.
Yes, and librarians don't much care whether the peer-reviewed content is the publisher's PDF version or the author's final version -- except that the publisher's PDF is for sale and the author's final draft is not! Nor does the model tell us under what conditions, if both versions are available for a journal X, librarians would cancel the publisher's PDF (and in favour of what journal Y?). The question is never even raised. That's the question the study was designed to answer, but the method could not answer it. The survey might as well have asked the librarians directly, for X/Y pairs of hypothetical or actual journals -- rather than A/B/C triplets of hypothetical "products" -- banal questions such as:
"If 100% of X were immediately available for free online and Y was not, and your users needed X and Y equally, and you could not afford both, and you currently subscribed to X and not to Y, would you cancel X for Y?"
I suspect that the survey resorted to "Share of Preference" modelling because, in the absence of any actual evidence of self-archiving causing cancellations, a survey on hypothetical cancellations of journal X in favour of journal Y (or no journal at all) under various %OA and months-delay conditions would not have been very convincing or informative. But I'm afraid the outcome is even less convincing.
PRC: How soon content is made available is a key determinant of content model preference in librarian's acquisition behaviour; delay in availability reduces the attractiveness of a product offering.
Yes, immediate access is preferable to delayed access. And, no doubt, if/when librarians are ever inclined to cancel a journal X because PP% of its articles are freely available, they are more likely to do so if that PP% is immediately available than if it is only available 24 months after publication. But we could have guessed that without this study. The question is: Under what circumstances are librarians going to cancel what, when? This study does not and cannot tell us. Relative preference models can only tell us that they are more likely to do it under these conditions than under those conditions (and we already knew all that).
Having said all this, it is important to state clearly that, although there is still no evidence at all of self-archiving causing cancellations, it is possible, indeed probable, that self-archiving will cause some cancellations, eventually. No one knows (1) how soon it will cause cancellations, nor (2) how many cancellations it will cause. That all depends on (a) how much demand there still is for the print edition and (b) for the journal's online edition at that time, (c) for how long that demand lasts, and (d) how quickly self-archiving grows and approaches 100%. (Perhaps someone should do a survey on people's predictions about those factors!)
But regardless of any of this -- and regardless also of the validity or invalidity of the present survey -- the possibility or probability of cancellation pressure is most definitely not the basis on which the research community should decide whether or not to self-archive and whether or not to mandate self-archiving. That decision must be based entirely on the benefits of OA self-archiving for research access, impact, productivity and progress -- definitely not on the basis of the possibility of revenue losses for publishers.
We do well to remind ourselves that these questions are not primarily about what is or is not good for the publishing industry. They are about what is and is not good for research, researchers, their institutions, their funders, and the tax-paying public that funds the funders. Research is supported and conducted and peer-reviewed and published for the sake of research progress and applications, not in order to support the publishing industry, or to protect it from risk.
And what is certain is that peer-reviewed research publishing can and will successfully adapt to Open Access: How can it fail to do so, when it is researchers who conduct the research, write the articles, perform the peer review, read, use, apply and cite the research, and, now, provide online access to it as well? Publishers are performing a valuable service (in implementing the peer review and in providing a paper and online edition) but it is publishing that must adapt to what is best for research in the online age, definitely not research that must adapt to what is best for publishing. And publishing can and will adapt.
(I might add that Dr. Alma Swan is not the super-ennuated (sic) Proustian personage repeatedly cited in this PRC survey, but the cygnine author of a number of landmark surveys, one of them reporting the only existing evidence -- negative -- for a causal connection between OA self-archiving and cancellations.)
Berners-Lee, T., De Roure, D., Harnad, S. and Shadbolt, N. (2005) Journal publishing and author self-archiving: Peaceful Co-Existence and Fruitful Collaboration.
Swan, A. (2005) Open access self-archiving: An Introduction. JISC Technical Report.
On Thu, 16 Nov 2006, Simon Inger and Chris Beckett replied:
1. The methodology deployed and the entire point of conducting a conjoint survey at all:
Simon and Chris are, I think, quite right that there is considerable danger of bias, in one direction or the other, when acquisitions librarians are asked to speculate about what they would do in hypothetical future scenarios.
But it is not at all clear that the method Simon and Chris used corrects for these biases, or merely changes the subject (from predicting cancellations under hypothetical conditions, to merely expressing product/property preferences under hypothetical conditions).
A survey that asks people if they like steak to eat, and then asks if they like chicken to eat, is not as powerful as a survey that asks them to choose between steak and chicken. Bring in another variable, such as, "how well done do you like your meat?" and you get a very different answer depending on whether the surveyee preferred steak or chicken in the first place. By combining these factors with others through a conjoint survey, you might just find out how bad the steak has to be before chicken tartare starts to command a market share! We hope this illustrates the whole purpose of the conjoint in applying it to the situation that publishing currently faces; it forces people to reveal the true underlying factors in their decision-making in a way that hasn't been done before.
The conjoint method is no doubt a good method for estimating or ranking relative product-property preferences in general. But in the particular case of library journal acquisitions/cancellations, OA and self-archiving, the method, as noted, not only fails to remedy the possibility of bias but bypasses the question of cancellations altogether: the very question that, I take it, the survey was trying to answer, for lack of actual cancellation data.
2. Whether or not OA can be considered a product in any meaningful sense:
I'm afraid I cannot agree with this reasoning: The mobile phone analogy (as well as the meat analogy) begs the question, because in both cases the product and the client are unambiguous, and it is a straightforward quid pro quo: Would the client rather buy steak or chicken? mobile phone or home phone? The choice is a direct trade-off between (two) competing products. And I also agree that if one of them were free, that would not change anything: It would still be this versus that.
But that's not at all how it is with paid journals vs. self-archived OA content.
Let's start with an easy example: Suppose we weren't talking about anarchically self-archived articles, but about OA vs. non-OA journals. And to make it even simpler, let us suppose (as is the case, for example, with BioMed Central's institutional "memberships") that a library has a choice between two journals that are equated, somehow, in terms of readership, quality, subject-matter and usage-needs of institutional users, that there is only enough money to afford one of them, and that they differ in that one is subscription-based and the other is based on institutional "membership" fees (for publishing institutional articles).
That's an odd choice situation for an acquisitions librarian (since in one case the librarian is buying in the journal's content, and in the other the librarian is paying for the institution's own outgoing content), but perhaps librarians would intuit that they get better value for their institutional money from the second journal (especially if they consult with their institutional users, and they agree -- a detail not mentioned by the survey, which seems to assume subscription/cancellation decisions are all or mostly in the hands of the librarians!).
But that would be a prima-facie plausible prediction by librarians, about what they would prefer and do under those conditions. Even more plausible would be a least/most choice involving three equivalent journals, when the library can afford only two journals, and the third is an OA journal for which someone else (other than the library) pays the institutional OA charges, making it effectively "free" to the library. Under those conditions the librarian could realistically say they'd prefer to "cancel" the free (OA) journal (i.e., just let users download it for themselves, free, from the web) so they can spend all the money saved on the other two journals.
(Of course, the tricky part is that a pure OA journal [e.g., BMC or PLoS] is not one that a library subscribes to anyway!) (Actually, most OA journals are available for subscription, and do not charge author-institutions for publication. Possibly, just possibly, the results of the PRC survey might have some predictive value as to whether that kind of OA journal is likely to be cancelled; but so far there is little actual evidence of that happening either, though it might! Keep your eyes on the longevity of the majority of the OA journals in DOAJ that do not charge for publication but make ends meet from subscriptions.)
But we have not yet come to the third option, the one that the survey was commissioned by PRC to test: author self-archiving, and whether it will cause cancellations.
It is for author self-archiving that the question of the extra properties of percentage content, and length of embargo had to be introduced and varied in this study. Length of embargo is not the problem, but percentage content very much is, and so is the fact that all self-archived content is free. Here we are square in the middle of the profound difference between OA journals (a complete, quid-pro-quo product) and OA self-archiving (an anarchic process, applying to only a portion of content, and an unknown proportion at that, growing -- but again at an unknown rate -- across time).
With journals (including OA journals), it's journal X vs journal Y ("product" X vs. "product" Y): Shall I purchase X and cancel Y, or vice versa? Shall I purchase X and Y and cancel Z? These are presumably familiar, hence realistic acquisitions librarian questions (in consultation with users -- who were not surveyed in this survey!).
But what is the question with journals vs. anarchic self-archived content? What is it that a librarian is contemplating buying versus cancelling when what they are really faced with is a choice between a journal and a distributed, anarchic and uncertain percentage of its contents (with no indication of how it is even knowable what that percentage is)?
But let's overlook that and agree that if it were a question of buying vs. cancelling journal X based on some estimate of the percentage of its contents that is available for free in self-archived form, librarians could dream up a hypothetical preference from a combination of properties such as journal quality, journal price, percentage free content, and embargo length.
But that would be journal X vs. not-X, or journal X vs. Y. What is the librarian's conjecture as to their preference when all journals have PP% of their content self-archived? That's not a journal vs. journal acquisition/cancellation question any more: It's asking librarians to second-guess the OA future: Are we to infer from the conjoint preference data that they would cancel all journals under those conditions (second-guessing their users on how long they might, for example, continue to value the paper edition?).
The analogy with chicken and steak would be whether conjoint chicken/steak or mobile/home-phone property preferences predict whether and when people would stop paying for food or phones altogether because they were somehow miraculously available free, with a certain probability (and/or delay), for a certain percentage of the potential calls and time. We know that if it were all free, immediately and with certainty, everyone would prefer that. But do conjoint preferences tell us one bit more than that? (And again we leave out the parties of the second part -- the institutional users -- as well as the paper edition and how they might feel about it, and for how long...)
That may be so, for now, but at the same time we are aware of organisations that are building products which combine the power of OAI-PMH (and the crawling power of Google); existing abstracting & indexing databases; publisher operated link servers; and library operated link servers: to build an organised route to OA materials - a route that would allow a non-subscriber of a journal article to be directed to the free OA repository version instead. Once these products exist we are sure our research indicates that some librarians at least will actually switch to OA versions for some of their information needs, while others will continue to purchase the journal product for a whole raft of reasons and others will provide, i.e. acquire, both options.
Let me quickly concede what I would not have contested from the very outset:
(1) Without the conjoint survey, I would already have agreed that everyone prefers to have something for free rather than paying for it.
(2) I also happen to believe, personally, that once 100% OA self-archiving has been reached -- but I don't know how soon it will be reached, nor how soon after it is reached this will happen -- there will be cancellation pressure that will lead to downsizing and a transition to OA publishing.
But it is still a fact that there is as yet no evidence of cancellation pressure, and I do not at all see how the conjoint preference study tells us any more than we already know (and don't know) about whether and when and how much cancellation pressure will ever be caused by self-archiving.
(I have to add that I profoundly doubt that in the OA world libraries and librarians will mediate in any way between users and the refereed journal article literature. Library mediation will be as supererogatory as it is with what users do with Google today.)
3. The issue of bias:
I think the attempt to avoid all of these emotional (and notional) biases was a commendable one, and it would have been successful too, if the conjoint-preference method had been amenable to analysing the anarchic phenomenon of author self-archiving and its likely effect on librarian acquisition/cancellation. But it is not, because anarchic, blanket self-archiving is simply not an acquisition/cancellation matter.
Acquisition/cancellation concerns what to buy, retain and cancel from among a finite set of products using a finite acquisitions budget. It is a competitive matter: competition between products. Anarchic self-archiving is gradual and uncertain, but it generates only an all-or-none cancellation question, and one that is in no way addressed by the conjoint preferences method.
(I am sure, by the way, that librarians could have been polled -- directly and unemotionally -- about how much journal content they thought would have to be self-archived before they would no longer need to purchase journals at all -- but I don't think their speculations on that would have been very informative.)
I do think, though, that one indirect finding on this question did emerge from the conjoint method (and it surprised me, considering how strident some librarians have been in the opposite direction in the past!): It does seem that librarians are surprisingly indifferent to the difference between an author's refereed final draft and the publisher's PDF. That's very interesting (and it's progress: in librarian awareness and understanding of what researchers really do and don't need!).
4. The statement of apparently obvious or banal findings:
Agreed. (But that's hardly very surprising either! Nor informative about whether and when self-archiving causes cancellations.)
Much more important, however, is how the decision becomes qualified by other factors - and to what extent they are qualified. (Would you like free raw chicken for dinner or paid-for cooked chicken?) Look closely and the results show that the lure of "free" has only so much pulling power, and a combination of other factors pull more potently against it. So in themselves the importance of each of the attributes has limited value - it is in combination that their true meaning comes through.
I think what you are saying here is that in varying the combinations of 6 properties, each with 3-4 possible values, you found a complex preferential structure. But it still doesn't tell us whether and when self-archiving will cause cancellations.
5. The validity of inferring cancellation behaviour from the findings:
For those (like me) who happen to think that 100% OA self-archiving is likely eventually to cause cancellations, downsizing, and a transition to OA cost-recovery, but that there is as yet no evidence of this, and that it is a matter of complete uncertainty how fast self-archiving will grow, how soon the cancellation pressure will be felt, and how strong the cancellation pressure will be -- this study did not provide any new information.
For those empiricists (with whom I have some sympathy too), who simply say there is no evidence at all yet that self-archiving causes cancellations -- and that even in the few fields where self-archiving has been at or near 100% for some years there is still no such evidence -- it is likewise true that this study has not provided any new evidence: neither about whether there will be cancellations, nor, if so, about when and how much.