Saturday, April 6. 2013

Paid-Gold OA, Free-Gold OA & Journal Quality Standards

Peter Suber has pointed out that "About 50% of articles published in peer-reviewed OA journals are published in fee-based journals" (as reported by Laakso & Björk 2012).

Laakso & Björk also report that "[12% of] articles published during 2011 and indexed in the most comprehensive article-level index of scholarly articles (Scopus) are available OA through journal publishers... immediately...".

That's 12% immediate Gold-OA for the (already selective) SCOPUS sample. The percentage is still smaller for the more selective Thomson-Reuters/ISI sample.

I think it cannot be left out of the reckoning about paid-Gold OA vs. free-Gold OA that:

(#1) most articles are not published as Gold OA at all today (neither paid-Gold nor free-Gold)

#2 and #3 are hypotheses, but I think they can be tested objectively.

A test for #2 would be to compare the download and citation counts (not the journal impact factors) for Gold OA (including hybrid Gold) articles vs non-Gold subscription journal articles (excluding the ones that have been made Green OA) within the same subject (and language!) area.

A test for #3 would be to compare the download and citation counts (not the journal impact factors) for paid-Gold (including hybrid Gold) vs free-Gold articles within the same subject (and language!) area.

I mention this because I think just comparing the number of paid-Gold vs. free-Gold journals without taking quality into account could be misleading.

Wednesday, October 24. 2012

Comparing Carrots and Lettuce

"The inexorable rise of open access scientific publishing".

Our (Gargouri, Lariviere, Gingras, Carr & Harnad) estimate (for publication years 2005-2010, measured in 2011, based on articles published in the c. 12,000 journals indexed by Thomson-Reuters ISI) is 35% total OA in the UK (10% above the worldwide total OA average of 25%): This is the sum of both Green and Gold OA.

Our sample yields a Gold OA estimate much lower than Laakso & Björk's. Our estimate of about 25% OA worldwide is composed of 22.5% Green plus 2.5% Gold. And the growth rate of neither Gold nor (unmandated) Green is exponential.

There are a number of reasons neither "carrots vs. lettuce" nor "UK vs. non-UK produce" nor L&B estimates vs. G et al estimates can be compared or combined in a straightforward way.

Please take the following as coming from a fervent supporter of OA, not an ill-wisher, but one who has been disappointed across the long years by far too many failures to seize the day -- amidst surges of "tipping-point" euphoria -- to be ready once again to tout triumph.

First, note that the hubbub is yet again about Gold OA (publishing), even though all estimates agree that there is far less of Gold OA than there is of Green OA (self-archiving), and even though it is Green OA that can be fast-forwarded to 100%: all it takes is effective Green OA mandates (I will return to this point at the end).

So Stephen Curry asks why there is a discrepancy between our (Gargouri et al) estimates of Gold OA -- in the UK and worldwide (c. <5%) -- and the estimates of Laakso & Björk (17%). Here are some of the multiple reasons (several of them already pointed out by Richard van Noorden in his comments too):

1. Thomson-Reuters ISI Subset: Our estimates are based solely on articles in the Thomson-Reuters ISI database of c. 12,000 journals. This database is more selective than the SCOPUS database on which L&B's sample is based. The more selective journals have higher quality standards and are hence the ones that both authors and users prefer.
(Without getting into the controversy about journal citation impact factors, another recent L&B study has shown that the higher the journal's impact factor, the less likely that the journal is Gold OA. -- But let me add that this is now likely to change, because of the perverse effects of the Finch Report and the RCUK OA Policy: Thanks to the UK's announced readiness to divert UK research funds to double-paying subscription journal publishers for hybrid Gold OA, most journals, including the top journals, will soon be offering hybrid Gold OA -- a very pricey way to add the UK's 6% of worldwide research output to the worldwide Gold OA total: The very same effect could be achieved free of extra cost if RCUK instead adopted a compliance-verification mechanism for its existing Green OA mandates.) 2. Embargoed "Gold OA": L&B included in their Gold OA estimates "OA" that was embargoed for a year. That's not OA, and certainly should not be credited to the total OA for any given year -- whence it is absent -- but to the next year. By that time, the Green OA embargoes of most journals have already expired. So, again, any OA purchased in this pricey way -- instead of for a few extra cost-free keystrokes by the author, for Green -- is more of a head-shaker than occasion for heady triumph. 3. 1% Annual Growth: The 1% annual growth of Gold OA is not much headway either, if you do the growth curves for the projected date they will reach 100%! (The more heady Gold OA growth percentages are not Gold OA growth as a percentage of all articles published, but Gold OA growth as a percentage of the preceding year's Gold OA articles.) 4. Green Achromatopsia: The relevant data for comparing Gold OA -- both its proportion and its growth rate -- with Green come from a source L&B do not study, namely, institutions with (effective) Green OA mandates. 
Here the proportions within two years of mandate adoption (60%+) and the subsequent growth rate toward 100% eclipse not only the worldwide Gold OA proportions and growth rate, but also the larger but still unimpressive worldwide Green OA proportions and growth rate for unmandated Green OA (which is still mostly all there is). 5. Mandate Effectiveness: Note also that RCUK's prior Green OA mandate was not an effective one (because it had no compliance verification mechanism), even though it may have increased UK OA (35%) by 10% over the global average (25%). Stephen Curry: "A cheaper green route is also available, whereby the author usually deposits an unformatted version of the paper in a university repository without incurring a publisher's charge, but it remains to be seen if this will be adopted in practice. Universities and research institutions are only now beginning to work out how to implement the new policy (recently clarified by the RCUK)."Well, actually RCUK has had Green OA mandates for over a half-decade now. But RCUK has failed to draw the obvious conclusion from its pioneering experiment -- which is that the RCUK mandates require an effective compliance-verification mechanism (of the kind that the effective university mandates have -- indeed, the universities themselves need to be recruited as the compliance-verifiers). Instead, taking their cue from the Finch Report -- which in turn took its cue from the publisher lobby -- RCUK is doing a U-turn from its existing Green OA mandate, and electing to double-pay publishers for Gold instead. A much more constructive strategy would be for RCUK to build on its belated grudging concession (that although Gold is RCUK's preference, RCUK fundees may still choose Green) by adopting an effective Green OA compliance verification mechanism. That (rather than the obsession with how to spend "block grants" for Gold) is what the fundees' institutions should be recruited to do for RCUK. 6. 
Discipline Differences: The main difference between the Gargouri, Lariviere, Gingras, Carr & Harnad estimates of average percent Gold in the ISI sample (2.5%) and the Laakso & Björk estimates (10.3% for 2010) probably arises because L&B's sample included all ISI articles per year for 12 years (2000-2011), whereas ours was a sample of 1300 articles per year, per discipline, separately, for each of 14 disciplines, for 6 years (2005-2010: a total of about 100,000 articles).

7. Biomedicine Preponderance? Our sample was much smaller than L&B's because L&B were just counting total Gold articles, using DOAJ, whereas we were sending out a robot to look for Green OA versions on the Web for each of the 100,000 articles in our sample. It may be this equal sampling across disciplines that leads to our lower estimates of Gold: L&B's higher estimate may reflect the fact that certain disciplines are both more Gold and publish more articles (in our sample, Biomed was 7.9% Gold). Note that both studies agree on the annual growth rate of Gold (about 1%).

8. Growth Spurts? Our projection does not assume a linear year-to-year growth rate (1%); it detects it. There have so far been no detectable annual growth spurts (of either Gold or Green). (I agree, however, that Finch/RCUK could herald one forthcoming annual spurt of 6% Gold (the UK's share of world research output) -- but that would be a rather pricey (and, I suspect, unscaleable and unsustainable) one-off growth spurt.)

9. RCUK Compliance Verification Mechanism for Green OA Deposits: I certainly hope Stephen Curry is right that I am overstating the ambiguity of the RCUK policy! But I was not at all reassured at the LSHTM meeting on Open Access by Ben Ryan's rather vague remarks about monitoring RCUK mandate compliance, especially compliance with Green. After all, that (and not the failure to prefer and fund Gold) was the main weakness of the prior RCUK OA mandate.

Stevan Harnad

Saturday, April 2. 2011

"The Sole Methodologically Sound Study of the Open Access Citation Advantage(!)"

It is true that downloads of research findings are important. They are being measured, and the evidence of the open-access download advantage is growing. See:

S. Hitchcock (2011) "The effect of open access and downloads ('hits') on citation impact: a bibliography of studies"

But the reason it is the open-access citation advantage that is especially important is that refereed research is conducted and published so it can be accessed, used, applied and built upon in further research: Research is done by researchers, for uptake by researchers, for the benefit of the public that funds the research. Both research progress and researchers' careers and funding depend on research uptake and impact.

The greatest growth potential for open access today is through open access self-archiving mandates adopted by the universal providers of research: the researchers' universities, institutions and funders (e.g., Harvard and MIT). See the ROARMAP registry of open-access mandates.

Universities adopt open access mandates in order to maximize their research impact. The large body of evidence, in field after field, that open access increases citation impact, helps motivate universities to mandate open access self-archiving of their research output, to make it accessible to all its potential users -- rather than just those whose universities can afford subscription access -- so that all can apply, build upon and cite it. (Universities can only afford subscription access to a fraction of research journals.)

The Davis study lacks the statistical power to show what it purports to show, which is that the open access citation advantage is not causal, but merely an artifact of authors self-selectively self-archiving their better (hence more citable) papers. Davis's sample size was smaller than that of many of the studies reporting the open access citation advantage.
Davis found no citation advantage for randomized open access. But that does not demonstrate that open access is a self-selection artifact -- in that study or any other study -- because Davis did not replicate the widely reported self-archiving advantage either, and that advantage is often based on far larger samples. So the Davis study is merely a small non-replication of a widely reported outcome. (There are a few other non-replications; but most of the studies to date replicate the citation advantage, especially those based on bigger samples.) Davis says he does not see why the inferences he attempts to make from his results -- that the reported open access citation advantage is an artifact, eliminated by randomization, that there is hence no citation advantage, which implies that there is no research access problem for researchers, and that researchers should just content themselves with the open access download advantage among lay users and forget about any citation advantage -- are not welcomed by researchers. These inferences are not welcomed because they are based on flawed methodology and insufficient statistical power and yet they are being widely touted -- particularly by the publishing industry lobby (see the spin FASEB is already trying to put on the Davis study: "Paid access to journal articles not a significant barrier for scientists"!) -- as being the sole methodologically sound test of the open access citation advantage! Ignore the many positive studies. They are all methodologically flawed. The definitive finding, from the sole methodologically sound study, is null. So there's no access problem, researchers have all the access they need -- and hence there's no need to mandate open access self-archiving. No, this string of inferences is not a "blow to open access" -- but it would be if it were taken seriously. What would be useful and opportune at this point would be meta-analysis. 
Stevan Harnad
American Scientist Open Access Forum
EnablingOpenScholarship

The Sound of One Hand Clapping

Suppose many studies report that cancer incidence is correlated with smoking and you want to demonstrate in a methodologically sounder way that this correlation is not caused by smoking itself, but just an artifact of the fact that the same people who self-select to smoke are also the ones who are more prone to cancer. So you test a small sample of people randomly assigned to smoke or not, and you find no difference in their cancer rates. How can you know that your sample was big enough to detect the repeatedly reported correlation at all unless you test whether it's big enough to show that cancer incidence is significantly higher for self-selected smoking than for randomized smoking?

Many studies have reported a statistically significant increase in citations for articles whose authors make them OA by self-archiving them. To show that this citation advantage is not caused by OA but just a self-selection artifact (because authors selectively self-archive their better, more citeable papers), you first have to replicate the advantage itself, for the self-archived OA articles in your sample, and then show that that advantage is absent for the articles made OA at random. But Davis showed only that the citation advantage was absent altogether in his sample. The most likely reason for that is that the sample was much too small (36 journals, 712 articles randomly OA, 65 self-archived OA, 2533 non-OA).

In a recent study (Gargouri et al 2010) we controlled for self-selection using mandated (obligatory) OA rather than random OA. The far larger sample (1984 journals, 3055 articles mandatorily OA, 3664 self-archived OA, 20,982 non-OA) revealed a statistically significant citation advantage of about the same size for both self-selected and mandated OA.
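The sample-size point can be made concrete with a rough power calculation. What follows is only a sketch, not an analysis from either study: it assumes a two-sided two-sample z-test on a standardized mean difference, and the effect size (`ASSUMED_D = 0.2`) is a purely hypothetical value chosen for illustration. Only the group sizes come from the figures quoted above. Citation counts are heavily skewed, so a z-test is at best a crude stand-in for the tests actually used.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_treated, n_control, effect_size, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is a standardized mean difference (Cohen's d); it is an
    assumed input, not a value reported by either study.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the standardized difference between two group means.
    se = sqrt(1 / n_treated + 1 / n_control)
    shift = effect_size / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


ASSUMED_D = 0.2  # hypothetical effect size, for illustration only

# Group sizes quoted in the text: self-archived OA vs. non-OA articles.
power_small = approx_power(65, 2533, ASSUMED_D)     # Davis's sample
power_large = approx_power(3664, 20982, ASSUMED_D)  # Gargouri et al's sample

print(f"65 vs 2533:    power ~ {power_small:.2f}")
print(f"3664 vs 20982: power ~ {power_large:.2f}")
```

Under this assumed effect size, the comparison with 65 self-archived articles would detect the advantage less than half the time, whereas the larger sample would detect it almost always. A different assumed effect size changes the numbers, but not the direction of the contrast.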
If and when Davis's requisite self-selected self-archiving control is ever tested, the outcome will either be (1) the usual significant OA citation advantage in the self-archiving control condition that most other published studies have reported -- in which case the absence of the citation advantage in Davis's randomized condition would indeed be evidence that the citation advantage had been a self-selection artifact that was then successfully eliminated by the randomization -- or (more likely, I should think) (2) no significant citation advantage will be found in the self-archiving control condition either, in which case the Davis study will prove to have been just one non-replication of the usual significant OA citation advantage (perhaps because of Davis's small sample size, the fields, or the fact that most of the non-OA articles become OA on the journal's website after a year). (There have been a few other non-replications; but most studies replicate the OA citation advantage, especially the ones based on larger samples.)

Until that requisite self-selected self-archiving control is done, this is just the sound of one hand clapping.

Readers can be trusted to draw their own conclusions as to whether Davis's study, tirelessly touted as the only methodologically sound one to date, is that -- or an exercise in advocacy.

Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research (2010) PLOS ONE 5 (10) (authors: Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S.)

Thursday, March 31. 2011

On Methodology and Advocacy: Davis's Randomization Study of the OA Advantage

Open access, readership, citations: a randomized controlled trial of scientific journal publishing doi:10.1096/fj.11-183988

Sorry to disappoint!
Nothing new to cut-and-paste or reply to: Still no self-selected self-archiving control, hence no basis for the conclusions drawn (to the effect that the widely reported OA citation advantage is merely an artifact of a self-selection bias toward self-archiving the better, hence more citeable articles -- a bias that the randomization eliminates).

The methodological flaw, still uncorrected, has been pointed out before.

If and when the requisite self-selected self-archiving control is ever tested, the outcome will either be (1) the usual significant OA citation advantage in the self-archiving control condition that most other published studies have reported -- in which case the absence of the citation advantage in Davis's randomized condition would indeed be evidence that the citation advantage had been a self-selection artifact that was then successfully eliminated by the randomization -- or (more likely, I should think) (2) there will be no significant citation advantage in the self-archiving control condition either, in which case the Davis study will prove to have been just a non-replication of the usual significant OA citation advantage (perhaps because of Davis's small sample size, the fields, or the fact that most of the non-OA articles become OA on the journal's website after a year).

Until the requisite self-selected self-archiving control is done, this is just the sound of one hand clapping.

Readers can be trusted to draw their own conclusions as to whether this study, tirelessly touted as the only methodologically sound one to date, is that -- or an exercise in advocacy.

Stevan Harnad
American Scientist Open Access Forum
EnablingOpenScholarship

Wednesday, October 20. 2010

Correlation, Causation, and the Weight of Evidence

Jennifer Howard ("Is there an Open-Access Advantage?," Chronicle of Higher Education, October 19 2010) seems to have missed the point of our article.
It is undisputed that study after study has found that Open Access (OA) is correlated with higher probability of citation. The question our study addressed was whether making an article OA causes the higher probability of citation, or the higher probability causes the article to be made OA. The latter is the "author self-selection bias" hypothesis, according to which the only reason OA articles are cited more is that authors do not make all articles OA: only the better ones, the ones that are also more likely to be cited. But almost no one finds that OA articles are cited more a year after publication. The OA citation advantage only becomes statistically detectable after citations have accumulated for 2-3 years. Even more important, Davis et al. did not test the obvious and essential control condition in their randomized OA experiment: They did not test whether there was a statistically detectable OA advantage for self-selected OA in the same journals and time-window. You cannot show that an effect is an artifact of self-selection unless you show that with self-selection the effect is there, whereas with randomization it is not. All Davis et al showed was that there is no detectable OA advantage at all in their one-year sample (247 articles from 11 Biology journals); randomness and self-selection have nothing to do with it. Davis et al released their results prematurely. We are waiting*,** to hear what Davis finds after 2-3 years, when he completes his doctoral dissertation. But if all he reports is that he has found no OA advantage at all in that sample of 11 biology journals, and that interval, rather than an OA advantage for the self-selected subset and no OA advantage for the randomized subset, then again, all we will have is a failure to replicate the positive effect that has now been reported by many other investigators, in field after field, often with far larger samples than Davis et al's. 
Meanwhile, our study was similar to Davis et al's, except that it was a much bigger sample, across many fields, and a much larger time window -- and, most important, we did have a self-selective matched-control subset, which did show the usual OA advantage. Instead of comparing self-selective OA with randomized OA, however, we compared it with mandated OA -- which amounts to much the same thing, because the point of the self-selection hypothesis is that the author picks and chooses what to make OA, whereas if the OA is mandatory (required), the author is not picking and choosing, just as the author is not picking and choosing when the OA is imposed randomly.

Davis's results are welcome and interesting, and include some good theoretical insights, but insofar as the OA Citation Advantage is concerned, the empirical findings turn out to be just a failure to replicate the OA Citation Advantage in that particular sample and time-span -- exactly as predicted above. The original 2008 sample of 247 OA and 1372 non-OA articles in 11 journals one year after publication has now been extended to 712 OA and 2533 non-OA articles in 36 journals two years after publication. The result is a significant download advantage for OA articles but no significant citation advantage.

And our finding is that the mandated OA advantage is just as big as the self-selective OA advantage.

As we discussed in our article, if someone really clings to the self-selection hypothesis, there are some remaining points of uncertainty in our study that self-selectionists can still hope will eventually bear them out: Compliance with the mandates was not 100%, but 60-70%. So the self-selection hypothesis has a chance of being resurrected if one argues that now it is no longer a case of positive selection for the stronger articles, but a refusal to comply with the mandate for the weaker ones.
One would have expected, however, that if this were true, the OA advantage would at least be weaker for mandated OA than for unmandated OA, since the percentage of total output that is self-archived under a mandate is almost three times the 5-25% that is self-archived self-selectively. Yet the OA advantage is undiminished with 60-70% mandate compliance in 2002-2006. We have since extended the window by three more years, to 2009; the compliance rate rises by another 10%, but the mandated OA advantage remains undiminished. Self-selectionists don't have to cede till the percentage is 100%, but their hypothesis gets more and more far-fetched... The other way of saving the self-selection hypothesis despite our findings is to argue that there was a "self-selection" bias in terms of which institutions do and do not mandate OA: Maybe it's the better ones that self-select to do so. There may be a plausible case to be made that one of our four mandated institutions -- CERN -- is an elite institution. (It is also physics-only.) But, as we reported, we re-did our analysis removing CERN, and we got the same outcome. Even if the objection of eliteness is extended to Southampton ECS, removing that second institution did not change the outcome either. We leave it to the reader to decide whether it is plausible to count our remaining two mandating institutions -- University of Minho in Portugal and Queensland University of Technology in Australia -- as elite institutions, compared to other universities. It is a historical fact, however, that these four institutions were the first in the world to elect to mandate OA. One can only speculate on the reasons why some might still wish to cling to the self-selection bias hypothesis in the face of all the evidence to date. 
It seems almost a matter of common sense that making articles more accessible to users also makes them more usable and citable -- especially in a world where most researchers are familiar with the frustration of arriving at a link to an article that they would like to read (but their institution does not subscribe), so they are asked to drop it into the shopping cart and pay $30 at the check-out counter.

The straightforward causal relationship is the default hypothesis, based on both plausibility and the cumulative weight of the evidence. Hence the burden of providing counter-evidence to refute it is now on the advocates of the alternative.

Davis, PN, Lewenstein, BV, Simon, DH, Booth, JG, & Connolly, MJL (2008) Open access publishing, article downloads, and citations: randomised controlled trial, British Medical Journal 337: a568

Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. PLOS ONE 5(10): e13636

Harnad, S. (2008) Davis et al's 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion. Open Access Archivangelism July 31 2008

Tuesday, October 19. 2010

Comparing OA and Non-OA: Some Methodological Supplements

(1) Yes, we cited the Davis et al study. That study does not show that the OA citation advantage is a result of self-selection bias. It simply shows (as many other studies have noted) that no OA advantage at all (whether randomized or self-selected) is detectable only a year after publication, especially in a small sample. It's since been over two years and we're still waiting to hear whether Davis et al's randomized sample still has no OA advantage while a self-selected control sample from the same journals and year does. That would be the way to show that the OA advantage is a self-selection bias.
Otherwise it's just the sound of one hand clapping.

Harnad, S (2008) Davis et al's 1-year Study of Self-Selection Bias: No Self-Archiving Control, No OA Effect, No Conclusion. Open Access Archivangelism. July 31 2008.

(2) No, we did not look only at self-archiving in institutional repositories. Our matched-control sample of self-selected self-archived articles came from institutional repositories, central repositories, and authors' websites. (All of that is "Green OA.") It was only the mandated sample that was exclusively from institutional repositories. (Someone else may wish to replicate our study using funder-mandated self-archiving in central repositories. The results are likely to be much the same, but the design and analysis would be rather more complicated.)

Swan, A. (2006) The culture of Open Access: researchers’ views and responses, in Jacobs, Neil, Eds. Open Access: Key Strategic, Technical and Economic Aspects. Chandos Publishing (Oxford) Limited.

Monday, February 8. 2010

Open Access: Self-Selected, Mandated & Random; Answers & Questions

Gargouri, Y., Hajjem, C., Lariviere, V., Gingras, Y., Brody, T., Carr, L. and Harnad, S. (2010) Self-Selected or Mandated, Open Access Increases Citation Impact for Higher Quality Research. (Submitted)

We are happy to have performed these further analyses, and we are very much in favor of this sort of open discussion and feedback on pre-refereeing preprints of papers that have been submitted and are undergoing peer review. They can only improve the quality of the eventual published version of articles.

However, having carefully responded to Phil's welcome questions, below, we will, at the end of this posting, ask Phil to respond in kind to a question that we have repeatedly raised about his own paper (Davis et al 2008), published a year and a half ago...
RESPONSES TO DAVIS'S QUESTIONS ABOUT OUR PAPER:

PD:

We are very appreciative of your concern and hope you will agree that we have not been interested only in what the referees might have to say. (We also hope you will now in turn be equally responsive to a longstanding question we have raised about your own paper on this same topic.)

PD:

Our article supports its conclusions with several different, convergent analyses. The logistic analysis with the odds ratio is one of them, and its results are fully corroborated by the other, simpler analyses we also reported, as well as the supplementary analyses we append here now.

[Yassine has since added that your confusion was our fault because by way of an illustration we had used the first model (0 citations vs. 1-5 citations), with its odds ratio of 0.957 ("For example, we can say for the first model that for a one unit increase in OA, the odds of receiving 1-5 citations (versus zero citations) increased by a factor of 0.957"). In the first model the value 0.957 is below 1, and too close to 1, to serve as a good illustration of the meaning of the odds ratio. We should have chosen a better example, one in which Exp(ß) is clearly greater than 1. We should have said: "For example, we can say for the second model that for a one unit increase in OA, the odds of receiving 5-10 citations (versus 1-5 citations) increased by a factor of 1.323." This clearer example will be used in the revised text of the paper. (See Figure 4S with a translation to display the deviations relative to an odds ratio of one rather than zero {although Excel here insists on labelling the baseline "0" instead of "1"! This too will be fixed in the revised text}.)]

PD:

Here is the analysis underlying Figure 4, re-done without CERN, and then again re-done without either CERN or Southampton. As will be seen, the outcome pattern, as well as its statistical significance, are the same whether or not we exclude these institutions.
(Moreover, I remind you that those are multiple regression analyses in which the Beta values reflect the independent contributions of each of the variables: That means the significant OA advantage, whether or not we exclude CERN, is the contribution of OA independent of the contribution of each institution.) PD:As noted in Yassine's reply to Phil, that formula was incorrectly stated in our text, once; in all the actual computations, results, figures and tables, however, the correct formula was used. PD:The log of the citation ratio was used only in displaying the means (Figure 2), presented for visual inspection. The paired-sample t-tests of significance (Table 2) were based on the raw citation counts, not on log ratios, hence had no leverage in our calculations or their interpretations. (The paired-sample t-tests were also based only on 2004-2006, because for 2002-2003 not all the institutional mandates were yet in effect.) Moreover, both the paired-sample t-test results (2004-2006) and the pattern of means (2002-2006) converged with the results of the (more complicated) logistical regression analyses and subdivisions into citation ranges. PD:As noted, the log ratios were only used in presenting the means, not in the significance testing, nor in the logistic regressions. However, we are happy to provide the additional information Phil requests, in order to help readers eyeball the means. Here are the means from Figure 2, recalculated by adding 1 to all citation counts. This restores all log ratios with zeroes in the numerator (sic); the probability of a zero in the denominator is vanishingly small, as it would require that all 10 same-issue control articles have no citations! The pattern is again much the same. (And, as noted, the significance tests are based on the raw citation counts, which were not affected by the log transformations that exclude numerator citation counts of zero.) 
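The effect of adding 1 to the citation counts can be illustrated with a small sketch. It assumes, as a reading of the description above rather than the paper's confirmed formula, that the ratio is an OA article's citation count divided by the mean count of its same-issue control articles. The function name and the counts are hypothetical.

```python
from math import log
from statistics import mean


def log_citation_ratio(oa_citations, control_citations, add_one=False):
    """Log of (OA citations / mean control citations).

    Returns None when the ratio is zero or undefined, i.e. when the article
    would drop out of a mean of log ratios.
    """
    offset = 1 if add_one else 0
    numerator = oa_citations + offset
    denominator = mean(c + offset for c in control_citations)
    if numerator <= 0 or denominator <= 0:
        return None
    return log(numerator / denominator)


# Hypothetical counts: 10 same-issue control articles.
controls = [0, 2, 1, 0, 3, 0, 1, 4, 0, 1]

print(log_citation_ratio(0, controls))                # None: log(0) is undefined
print(log_citation_ratio(0, controls, add_one=True))  # negative, but defined
print(log_citation_ratio(6, controls, add_one=True))  # positive
```

Without the offset, an uncited OA article has no log ratio and silently leaves the mean. With the offset it stays in, contributing a negative value.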
This exercise suggested a further heuristic analysis that we had not thought of doing in the paper, even though the results had clearly suggested that the OA advantage is not evenly distributed across the full range of article quality and citeability: The higher quality, more citeable articles gain more of the citation advantage from OA. In the following supplementary figure (S3), for exploratory and illustrative purposes only, we re-calculate the means in the paper's Figure 2 separately for OA articles in the citation range 0-4 and for OA articles in the citation range 5+. The overall OA advantage is clearly concentrated on articles in the higher citation range. There is even what looks like an OA DISadvantage for articles in the lower citation range. This may be mostly an artifact (from restricting the OA articles to 0-4 citations and not restricting the non-OA articles), although it may also be partly due to the fact that when unciteable articles are made OA, only one direction of outcome is possible, in the comparison with citation means for non-OA articles in the same journal and year: OA/non-OA citation ratios will always be unflattering for zero-citation OA articles. (This can be statistically controlled for, if we go on to investigate the distribution of the OA effect across citation brackets directly.) PD:We will be doing this in our next study, which extends the time base to 2002-2008. Meanwhile, a preview is possible from plotting the mean number of OA and non-OA articles for each citation count. Note that zero citations is the biggest category for both OA and non-OA articles, and that the proportion of articles at each citation level decreases faster for non-OA articles than for OA articles; this is another way of visualizing the OA advantage. 
At citation counts of 30 or more, the difference is quite striking, although of course there are few articles with so many citations:

REQUEST FOR RESPONSE TO QUESTION ABOUT DAVIS ET AL'S (2008) PAPER:

Davis, P.M., Lewenstein, B.V., Simon, D.H., Booth, J.G., & Connolly, M.J.L. (2008) Open access publishing, article downloads, and citations: randomised controlled trial. British Medical Journal 337:a568

Davis et al had taken a 1-year sample of biological journal articles and randomly made a subset of them OA, to control for author self-selection. (This is comparable to our mandated control for author self-selection.) They reported that after a year, they found no significant OA Advantage for the randomized OA for citations (although they did find an OA Advantage for downloads) and concluded that this showed that the OA citation Advantage is just an artifact of author self-selection, now eliminated by the randomization.

What Davis et al failed to do, however, was to demonstrate that -- in the same sample and time-span -- author self-selection does generate the OA citation Advantage. Without showing that, all they have shown is that in their sample and time-span, they found no significant OA citation Advantage. This is no great surprise, because their sample was small and their time-span was short, whereas many of the other studies that have reported finding an OA Advantage were based on much larger samples and much longer time spans.

The question raised was about controlling for self-selected OA. If one tests for the OA Advantage, whether self-selected or randomized, there is a great deal of variability, across articles and disciplines, especially for the first year or so after publication. In order to have a statistically reliable measure of OA effects, the sample has to be big enough, both in number of articles and in the time allowed for any citation advantage to build up to become detectable and statistically reliable.
Davis et al need to do with their randomization methodology what we have done with our mandating methodology, namely, to demonstrate the presence of a self-selected OA Advantage in the same journals and years. Then they can compare that with randomized OA in those same journals and years, and if there is a significant OA Advantage for self-selected OA and no OA Advantage for randomized OA then they will have evidence that -- contrary to our findings -- some or all of the OA Advantage is indeed just a side-effect of self-selection. Otherwise, all they have shown is that with their journals, sample size and time-span, there is no detectable OA Advantage at all. What Davis et al replied in their BMJ Authors' Response was instead this: PD:This is not an adequate response. If a control condition was needed in order to make an outcome meaningful, it is not sufficient to reply that "the publisher and sample allowed us to do the experimental condition but not the control condition." Nor is it an adequate response to reiterate that there was no significant self-selected self-archiving effect in the sample (as the regression analysis showed). That is in fact bad news for the hypothesis being tested. Nor is it an adequate response to say, as Phil did in a later posting, that even after another half year or more had gone by, there was still no significant OA Advantage. (That is just the sound of one hand clapping again, this time louder.) The only way to draw meaningful conclusions from Davis et al's methodology is to demonstrate the self-selected self-archiving citation advantage, for the same journals and time-span, and then to show that randomization wipes it out (or substantially reduces it). 
Until then, our own results, which do demonstrate the self-selected self-archiving citation advantage for the same journals and time-span (and on a much bigger and more diverse sample and a much longer time scale), show that mandating the self-archiving does not wipe out the citation advantage (nor does it substantially reduce it).

Meanwhile, Davis et al's finding that their randomized OA, although it did not generate a citation increase, did generate a download increase suggests that with a larger sample and time-span there may well be scope for a citation advantage as well: Our own prior work and that of others has shown that higher early download counts tend to lead to higher citation counts later.

Bollen, J., Van de Sompel, H., Hagberg, A. and Chute, R. (2009) A principal component analysis of 39 scientific impact measures. PLoS ONE 4(6): e6022

Brody, T., Harnad, S. and Carr, L. (2006) Earlier Web Usage Statistics as Predictors of Later Citation Impact. Journal of the American Society for Information Science and Technology (JASIST) 57(8) 1060-1072

Lokker, C., McKibbon, K. A., McKinlay, R.J., Wilczynski, N. L. and Haynes, R. B. (2008) Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: retrospective cohort study. BMJ 336:655-657

Moed, H. F. (2005) Statistical Relationships Between Downloads and Citations at the Level of Individual Documents Within a Single Journal. Journal of the American Society for Information Science and Technology 56(10): 1088-1097

O'Leary, D. E. (2008) The relationship between citations and number of downloads. Decision Support Systems 45(4): 972-980

Watson, A. B. (2009) Comparing citations and downloads for individual articles. Journal of Vision 9(4): 1-4

Sunday, January 17. 2010

Preference Surveys and Self-Fulfilling Prophecies: Do Users Prefer No Access To Postprint Access?

SM: "Stevan asserts that researchers who cannot afford access to the published version of articles are perfectly happy with the self-archived author's final version.

Sally does not always put her survey questions in the most transparent way. If you really want to find out whether or not researchers are "happy" with the author's refereed, accepted final draft when they lack access to the published version you have to ask them that:

(1) "How often do you encounter online, in a search or otherwise, the author's free refereed, accepted final draft of a potentially relevant article to which you (or your institution) cannot afford paid full-text access?"

That's the forthright, transparent way to put the exact contingencies we are addressing. No equivocation or ambiguity.

In contrast, I am sure that Sally's question about "How often do you use author drafts?" was just that: "How often do you use author drafts?" Not "How often do you encounter a potentially relevant article, but decline to use it because you only have access to the author draft and not the published version?"

Sally's responses -- which seem to say that 47% do use the author draft and 53% do not use the author draft -- fail to reveal why the 53% do not use it. Is it because, even though they have found a potentially relevant author draft free online, and lack access to the publisher draft, they prefer to ignore the potentially relevant author draft (this would be very interesting and relevant news if it were indeed true)? Or is it simply because they happen to be among those who had never encountered a potentially relevant author draft free online when they had no access to the publisher version?
(And could the 16% who did use the author draft "wherever possible" perhaps correspond to the well-known datum that only about 15% of all articles have freely accessible author drafts online?) Surveys that obscure these fundamental details under a cloud of ambiguity are not revealing researchers' preferences but their own.

Stevan Harnad
American Scientist Open Access Forum

Thursday, January 7. 2010

Log Ratios, Effect Size, and a Mandated OA Advantage?
Phil Davis: "An interesting bit of research, although I have some methodological concerns about how you treat the data, which may explain some inconsistent and counter-intuitive results, see: http://j.mp/8LK57u A technical response addressing the methodology is welcome."

Thanks for the feedback. We reply to the three points of substance, in order of importance:

(1) LOG RATIOS: We analyzed log citation ratios to adjust for departures from normality. Logs were used to normalize the citations and attenuate distortion from high values. Moed's (2007) point was about (non-log) ratios that were not used in this study. We used log citation ratios. This approach loses some values when the log transformation makes the denominator zero, but despite these lost data, the t-test results were significant, and were further confirmed by our second, logistic regression analysis. It is highly unlikely that any of this would introduce a systematic bias in favor of OA, but if the referees of the paper should call for a "simpler and more elegant" analysis to make sure, we will be glad to perform it.

(2) EFFECT SIZE: The size of the OA Advantage varies greatly from year to year and field to field. We reported this in Hajjem et al (2005), stressing that the important point is that there is virtually always a positive OA Advantage, absent only when the sample is too small or the effect is measured too early (as in Davis et al's 2008 study). The consistently bigger OA Advantage in physics (Brody & Harnad 2004) is almost certainly an effect of the Early Access factor, because in physics, unlike in most other disciplines (apart from computer science and economics), authors tend to make their unrefereed preprints OA well before publication. (This too might be a good practice to emulate, for authors desirous of greater research impact.)

(3) MANDATED OA ADVANTAGE?
Yes, the fact that the citation advantage of mandated OA was slightly greater than that of self-selected OA is surprising, and if it proves reliable, it is interesting and worthy of interpretation. We did not interpret it in our paper, because it was the smallest effect, and our focus was on testing the Self-Selection/Quality-Bias hypothesis, according to which mandated OA should have little or no citation advantage at all, if self-selection is a major contributor to the OA citation advantage. Our sample was 2002-2006. We are now analyzing 2007-2008. If there is still a statistically significant OA advantage for mandated OA over self-selected OA in this more recent sample too, a potential explanation is the inverse of the Self-Selection/Quality-Bias hypothesis (which, by the way, we do think is one of the several factors that contribute to the OA Advantage, alongside the other contributors: Early Advantage, Quality Advantage, Competitive Advantage, Download Advantage, Arxiv Advantage, and probably others). The Self-Selection/Quality-Bias (SSQB) consists of better authors being more likely to make their papers OA, and/or authors being more likely to make their better papers OA, because they are better, hence more citeable. The hypothesis we tested was that all or most of the widely reported OA Advantage across all fields and years is just due to SSQB. Our data show that it is not, because the OA Advantage is no smaller when it is mandated. If it turns out to be reliably bigger, the most likely explanation is a variant of the "Sitting Pretty" (SP) effect, whereby some of the more comfortable authors have said that the reason they do not make their articles OA is that they think they have enough access and impact already. Such authors do not self-archive spontaneously. But when OA is mandated, their papers reap the extra benefit of OA, with its Quality Advantage (for the better, more citeable papers). 
In other words, if SSQB is a bias in favor of OA on the part of some of the better authors, mandates reverse an SP bias against OA on the part of others of the better authors. Spontaneous, unmandated OA would be missing the papers of these SP authors.

There may be other explanations too. But we think any explanation at all is premature until it is confirmed that this new mandated OA advantage is indeed reliable and replicable. Phil further singles out the fact that the mandate advantage is present in the middle citation ranges and not the top and bottom. Again, it seems premature to interpret these minor effects whose reliability is unknown, but if forced to pick an interpretation now, we would say it was because the "Sitting Pretty" authors may be the middle-range authors rather than the top ones...

Yassine Gargouri, Chawki Hajjem, Vincent Lariviere, Yves Gingras, Les Carr, Tim Brody, Stevan Harnad

Brody, T. and Harnad, S. (2004) Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals. D-Lib Magazine 10(6).

Davis, P.M., Lewenstein, B.V., Simon, D.H., Booth, J.G., Connolly, M.J.L. (2008) Open access publishing, article downloads, and citations: randomised controlled trial. British Medical Journal 337:a568

Hajjem, C., Harnad, S. and Gingras, Y. (2005) Ten-Year Cross-Disciplinary Comparison of the Growth of Open Access and How it Increases Research Citation Impact. IEEE Data Engineering Bulletin 28(4) 39-47.

Moed, H. F. (2007) The effect of 'Open Access' upon citation impact: An analysis of ArXiv's Condensed Matter Section. Journal of the American Society for Information Science and Technology 58(13) 2145-2156