Write to Cite: Writing Style and Citation Counts

A pair of articles bookended the summer with bibliometric data that tells us something about the correlation between writing style and citation counts:

Shorter-title articles have more citations. (1)

Longer-abstract articles have more citations. (2)

At first these seemed contradictory to me. Surely either less is more, or more is more? I read the longer-abstracts piece (not an article, but an editorial that includes analysis) early in the summer and imagined a logical progression:

  • more words in the abstract would match
    • more search engine queries and more matches would lead to
      • more presence in search results which would lead to
        • more readers, which would lead to
          • more citations.

Indeed, the authors of the “articles with long abstracts have more citations” essay describe something like this as a hypothesis:

“We find that shorter abstracts (fewer words [R1a] and fewer sentences [R1b]) consistently lead to fewer citations, with short sentences (R2) being beneficial only in Mathematics and Physics. Similarly, using more (rather than fewer) adjectives and adverbs is beneficial (R5). Also, writing an abstract with fewer common (R3a) or easy (R3b) words results in more citations.” (2)

And

“Despite the fact that anybody in their right mind would prefer to read short, simple, and well-written prose with few abstruse terms, when building an argument and writing a paper, the limiting step is the ability to find the right article. For this, scientists rely heavily on search techniques, especially search engines, where longer and more specific abstracts are favored. Longer, more detailed, prolix prose is simply more available for search. This likely explains our results, and suggests the new landscape of linguistic fitness in 21st century science.” (2)

But then there’s that “articles-with-short-titles-have-more-citations” study, published in late August. (1) I couldn’t come up with a logical progression that would explain why short titles correlate with more citations. The opposite came to mind, in fact: in researcher interviews in 2013 and 2014, HighWire heard a preference among readers for article titles that read like a declarative sentence: “A catalyzes B in the presence of C.” These would not tend to be short titles. Many publishers have begun including such statements on their tables of contents, and researchers tell us they like this. HighWire calls these “annotated TOCs”; Science, for example, does these very well in its TOC. But the ‘annotations’ are typically longer than the authors’ article titles.

(It was unfortunate, perhaps, that these two articles were completely complementary: one looked only at titles, the other looked only at abstracts.)

Should editors counsel authors to shrink titles and expand abstracts? Neither study investigated or demonstrated causation. That is, neither shows that shrinking a title or bulking up an abstract would increase (cause) citations; they show only that short titles and long abstracts are attributes of (correlated with) more highly cited articles.

Neither article demonstrates an explanation for its findings, though each has a hypothesis. But the hypothesis about titles doesn’t work for abstracts, and vice versa. Here’s the explanation offered in the “short-titles-more-cites” paper:

“We propose three possible explanations for these results. One potential explanation is that high-impact journals might restrict the length of their papers’ titles. Similarly, incremental research might be published under longer titles in less prestigious journals. A third possible explanation is that shorter titles may be easier to understand, enabling wider readership and increasing the influence of a paper.”

The complementary long-abstracts-more-cites article looks at 15 guidelines commonly recommended to writers of scientific articles. It provides evidence that eight of the rules are wrong-headed, correlating negatively (in red) with citations:

“Fig 1. Effect of abstract features on citations. For each discipline (rows) and each abstract feature (columns), we measured whether a certain feature (e.g., having fewer words than the typical abstract published in the same journal [R1a]) led to a significant increase (blue) or decrease (red) in total citations. We considered an effect positive or negative only if the associated probability of being zero was smaller than 0.01/15 (i.e., we applied the Bonferroni correction to obtain an overall significance level of 1%). doi:10.1371/journal.pcbi.1004205.g001”
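
The significance rule quoted in the caption (an effect counts only if its probability of being zero is below 0.01/15) is the standard Bonferroni correction. A minimal sketch of how such a threshold is applied, with invented feature names and p-values purely for illustration:

```python
# Bonferroni correction, as in Fig 1: to keep an overall (family-wise)
# significance level of 1% across 15 feature tests, each individual
# test must clear 0.01 / 15.
ALPHA = 0.01
N_TESTS = 15
THRESHOLD = ALPHA / N_TESTS  # roughly 0.00067

def significant_features(p_values):
    """Return only the features whose p-value survives the Bonferroni cut."""
    return {feature: p for feature, p in p_values.items() if p < THRESHOLD}

# Hypothetical per-feature p-values (not from the paper).
p_values = {"R1a_fewer_words": 0.0001, "R2_short_sentences": 0.004}
print(significant_features(p_values))  # only R1a clears 0.01/15
```

Note that 0.004 would pass an uncorrected 1% test but fails the corrected one; that is the point of dividing by the number of tests.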

The short-titles-more-cites paper has a more complicated story to tell (despite its shorter title): the negative relationship between title length and citation count is clear for articles 2007-2011, but survives only at the journal level for 2012 and 2013. The authors summarize thus:

“Our analysis suggests that papers with shorter titles do receive greater numbers of citations. However, it is well known that papers published in certain journals attract more citations than papers published in others. When citation counts are adjusted for the journal in which the paper is published, we find that the strength of the evidence for the relationship between title length and citations received is reduced. Our results do however reveal that journals which publish papers with shorter titles tend to receive more citations per paper.”
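
“Adjusting for the journal” here can be pictured as comparing each paper to what its own journal typically earns, rather than to all papers at once. A rough sketch of one such adjustment (the median baseline, the journal names, and the numbers are my own invention, not the authors’ actual method):

```python
from statistics import median

# Invented (journal, citations) records: one high-citation journal
# and one low-citation journal.
papers = [
    ("J-Big", 120), ("J-Big", 80), ("J-Big", 100),
    ("J-Small", 12), ("J-Small", 8), ("J-Small", 10),
]

# Typical (median) citation count per journal.
by_journal = {}
for journal, cites in papers:
    by_journal.setdefault(journal, []).append(cites)
journal_typical = {j: median(c) for j, c in by_journal.items()}

# Journal-adjusted citations: how each paper does relative to its journal.
adjusted = [(j, cites - journal_typical[j]) for j, cites in papers]
print(journal_typical)  # {'J-Big': 100, 'J-Small': 10}
print(adjusted[0])      # ('J-Big', 20)
```

After this adjustment a 120-citation paper in the big journal and a 12-citation paper in the small one look similarly strong, which is why a raw title-length effect can shrink once the journal is accounted for.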

What to make of this?

Researchers tell us they find their way to articles using several routes:

  1. Personal recommendations
  2. Email tables of contents (for favorite or “followed” journals)
  3. Scholarly search engines (e.g., Google Scholar, PubMed, Scopus, SSRN)
  4. General search engines (e.g., Google Web Search)

Article titles are a primary UI for #2, #3, and #4, in that the title is what shows up, plus matching text appears in “snippets” for some of #3 and #4. But for search engines such as Google Scholar, matching a search query is not based solely on title or abstract, but on full text. So abstract words are not primarily there to match query terms in the case of search engines that index full text, like Google Scholar.

But those abstracts are important for the filtering steps that a researcher uses to decide whether to invest more time in reading. My colleague Anurag Acharya, who leads the Google Scholar team, offers this explanation:

“Researcher workflow … is often structured as multiple filtering steps — do a query, scan results list and pick some abstracts to read, read these abstracts, pick some fulltext articles to read. Longer/more detailed abstracts have the potential to help the paper make it through the second filtering step (read abstract -> pick fulltext).”

But what to make of the correlation with short titles? Sometimes a shorter title (assuming it isn’t fanciful with low “information scent”) can suggest something more comprehensive, and thus attract the reader who is looking for a foundational article to read – what Anurag calls the ‘name’ paper — and possibly to cite. Consider two titles: “Arabidopsis Mutagenesis” vs. “Genome-wide insertional mutagenesis of Arabidopsis thaliana.”

Think of the research-article equivalent of Wikipedia titles; or the titles in Annual Reviews articles, which are comprehensive pieces that are hugely cited. Anurag had a personal experience with this:

“As it happens, many years ago, I had submitted a paper on disk architectures titled “Active Disks”. The paper was accepted, but the program committee chair insisted I change the title to a longer one. So we went with “Active disks: Programming model, algorithms and evaluation”, and so the paper didn’t become the “name” paper for this architecture.”

(1) Letchford A, Moat HS, Preis T. 2015 The advantage of short paper titles. R. Soc. open sci. 2: 150266. http://dx.doi.org/10.1098/rsos.150266

(2) Weinberger CJ, Evans JA, Allesina S (2015) Ten Simple (Empirical) Rules for Writing Science. PLoS Comput Biol 11(4): e1004205. http://dx.doi.org/10.1371/journal.pcbi.1004205

Why terminology in access management needs to be more … accessible

Recently, I was introduced to something called Domain-Driven Design (DDD). There’s a great book on the subject by Abel Avram and Floyd Marinescu, available for free at infoq.com. One of the more intriguing DDD concepts described in the book is the ‘ubiquitous language’.

The Royal Society’s “Future of Scholarly Scientific Communication” Meeting, Part 1

In April & May 2015 the Royal Society held a two-part conference on scholarly scientific communication. Before the summer ends I want to write my impressions of the first part of the conference, which was largely about peer review. There is important material from this conference, for editors and societies who are considering editorial changes as they go into the fall cycle of board meetings.

The conference was notable in that the Royal Society invited delegates from all the types of stakeholders in the “ecosystem” of scientific communication. So this was not at all the typical “publishers-only meeting”. Of course there were publishers present, along with journal editors and researchers at various career stages. But there were also representatives from funders and institutions, and from technology and commercial organizations, along with experts in the history of science. The mix was cross-disciplinary as well: physics, biology, chemistry, etc. (The historian just mentioned is Aileen Fyfe of St. Andrews. She provided some commentary from outside the sciences. Prof. Fyfe could remind us how “modern” peer review came about and what its methods were designed to do – but also that complaints about the process are not new to the last 50 years.)

At the end of this post, I’ll provide pointers to the conference details, including a summary and audio recordings. But first, the highlights:

Going into this meeting, I had observed that some of the most interesting things happening in the publishing ecosystem are happening “upstream” from the published-journal website: they are happening in the peer-review workflow. There’s plenty of evidence for this: peer-review changes and experiments going on at BMJ, at eLife, at PLOS, at the Royal Society, at Cold Spring Harbor Labs, at Faculty of 1000, etc. “But wait, there’s more,” as they say: the Royal Society pointed attendees to a background paper written by the Research Information Network and commissioned by the Wellcome Trust: Scholarly Communication and Peer Review – The Current Landscape and Future Trends. This 30+ page paper points to a lot of the experiments and trends. If you or your editors are planning experiments this fall, the paper is worth a run-through.

My major takeaway from this meeting is that there was surprising consensus – perhaps even a sense of inevitability – that the practice of posting preprints would address many of the problems in science publishing, particularly in the biomedical sciences. (The practice has long been established in fields of physics.) Preprints (like arXiv in physics, and bioRxiv from CSHL in the life sciences) actually change the “upstream/downstream” dynamic that I mentioned above: in traditional review models, evaluation precedes distribution; but preprint availability lets distribution precede evaluation. So many of the problems with bias and delay are mitigated by distribution (availability) coming ahead of the review filter. This lets expert readers tap into an information stream, which they can filter for themselves.

Experts doing their own filtering has come up before in HighWire’s work. In researcher interviews that we conducted in 2014, we saw some conflicting commentary: readers were telling us that journal brand was important to identifying articles to read, but they also told us it was irrelevant – sometimes the same researcher told us both. When we pursued this, we found the key, handed to us by a neuroscience postdoc (to paraphrase): “When I’m reading an article in an area in which I’m expert, I don’t really care where it is published, and don’t need peer review – I can do my own review; for articles outside my expertise, I rely on other experts to review it first.”

This is a clear argument in favor of preprint servers: they get articles in front of all the potential expert readers fast. To borrow a phrase from the conference, preprints don’t “impede science”. They don’t polish it either.

This consensus for preprint servers emerged in the morning discussion on the second day of the conference. I don’t recall seeing Harold Varmus at the second day of the meeting (he was at the first day) – if he were there he might have been bemused recalling the horrified reaction to his “E-biomed” preprint server proposal in 1999!

For further reading:

The summary report from the conference is extensive and well-edited. Pages 8-10 are about the peer review discussion. The full four-day meeting agenda is also online, along with links to audio files for those who really want the play-by-play!

More on this topic:

Prof. Fyfe and I will be joined by colleagues — John Inglis, who heads Cold Spring Harbor Labs; Dr Simon Kerridge, Director of Research Services at University of Kent and Chair of the Association of Research Managers and Administrators;  and Dr Kirsty Edgar, Leverhulme Early Career Research Fellow, at the School of Earth Sciences, University of Bristol — at the upcoming ALPSP meeting at Heathrow for a panel discussion of Peer Review: Evolution, Experiment, and Debate, on Friday morning, 11 September 2015.

Aligning publishing technology with the funder view of impact

There are few topics in digital publishing that cause so much debate as that of research impact. A lot of this debate – within the publishing world, at least – has tended to focus on ways of improving (or improving on) existing mechanisms. How can we make Impact Factor work better? Should we put less emphasis on the journal and more on the article – or on the author?

Writing Headlines: Tell Readers Something Useful

If you write article titles, be clear, not cute.

Every journal that I work with has declared that the “online version is the journal of record.” But some still seem to write for print, not for online: they write article titles that work in print but don’t work in the nearly context-free online places where headlines appear.

Article titles are “headlines”, and headline-writers sometimes get clever with plays on words. I remember in graduate school (I was in English Literature) thinking that every dissertation title had to have some too-clever play on words to pique interest, followed by a colon, followed by something that actually communicated meaning and intent. Something like this:

Dangerous Grounds: Coffee Farming During the Civil War

This will work as long as the clear part after the colon never gets separated from the cute part by some errant software, or truncated in the small spaces of a cell phone screen or a sidebar.

We admire cleverness in the names of TV shows, book titles and boutique stores. But in the scholarly-publishing world of too-much-to-read, where we let search criteria, search engine result pages, and email tables of contents be our main decision point on what to read, we need to be clear, not cute; tell, not tease.

Journals today are including more and more “front matter” and editorials in their publishing, and need to have headline editors pay attention to the online context. Actually, it is the lack of context that is the challenge. Think about the places that headlines appear:

  • Google and Google Scholar search results
  • Online tables of contents
  • Email tables of contents
  • Mobile phone small screens
  • “Right rail” sidebars in journal-article pages
  • RSS feeds, and feed readers
  • In the Window Title of your web browser

None of these are going to carry along the context that appears on the printed page.

And sometimes our search criteria might prioritize words in a title, and the clever titles will not win that race to the top of the result page. This can particularly be a problem when we scan a result page looking only at the first two or three words in each result item: those words had better be signal, not noise.

When HighWire interviewed researchers asking what would help them work faster to take in the literature, one of the top suggestions was “better article titles.” The suggestion was that article titles should be more like declarative sentences than the “click bait” we see on a lot of blogs. (You’ve seen click bait all around: “The Three Reasons Your Spouse Is Going to Divorce You” or “911 Center Has Bed Bugs: Who do they Call?” That second one is not made up.)

A great blog post appeared recently from the Nielsen Norman Group – they do superb evidence-based usability studies – reminding us of their “5 Tips for Writing Headlines that Convert” browsers into readers. I recommend it as a refresher for those who write headlines, whether for the occasional editorial, a news piece, press releases or blog posts (yes, I did think more than twice about the headline for this post; several cute versions got the ax).

What has surprised you about scholarly publishing?

August is the “hinge month” in the academic calendar, a time just before we walk through the doorway to the next academic year. This might be a good time to take a look at what kinds of things have been surprising in our “industry” over the past years, before we have a new year of surprises!

I’m thinking about things that are surprising because

  • they happened faster than we would ever have guessed;
  • they happened slower than we had expected (or haven’t happened yet);
  • they turned out so differently from what the breathless hyperbole or naysayer comments had predicted.

The publishing industry can still surprise me. I come at it with a perspective from a university, a perspective as a platform supplier, and a perspective from Silicon Valley (where everything is transformational, disruptive and instant, of course – except when it dies on the vine). So I’ll list my surprises (plus some from colleagues), and hope that you will find this list thought-provoking enough to add some surprises of your own in the comments.

What has happened faster than (I) expected:

  • How quickly journals went online in the 1995-1999 era
  • How quickly “publish ahead of print” spread across science journals
  • How rapidly Google (web search and Scholar) have become a dominant discovery path
  • How quickly COUNTER usage measures took hold (and then slowed down changes in UX that would shift usage)
  • How quickly DOIs have become (nearly) universal
  • How quickly commercial publishers took up Open Access models (though largely for new journals).

What has happened slower than (I!) expected:

  • How long it took libraries to cancel duplicate subscriptions in an IP-based online world;
  • How long it took basic science journals – at least those without advertising – to go online only;
  • The persistence of the concept of articles bundled into issues — that nearly all long-established journals still rely on “issues” – though many of the new journals use “continuous publishing” models;
  • How slowly authors are taking up ORCIDs;
  • How slowly standards for data are arriving — they are coming along, but are challenging for researchers and publishers;
  • How long it took for a “biomedical ArXiv” preprint server to fly, 15 years after Harold Varmus proposed “e-biomed”;
  • That courseware, coursepacks and textbooks have not much integrated with the online scholarly monograph or article.

What hasn’t happened at all that I expected… my surprises:

  • That online only journals still have cover images for “issues”.
  • At how little the article “container” has changed, though we hang a lot more off of it now. As a colleague noted, “We are very slow to lose the structure of print. What makes us reluctant to consume knowledge that is unbound?”
  • At how much staying power the PDF has, given how many research objects don’t fit particularly well in it.

What has turned out differently from (others’) predictions, promises, or fears:

  • There was a prediction just a few years ago that PLOS One would expand – Pac-Man style – to consume the tier of journals below Science, Nature and Cell. That hasn’t happened.
  • The prediction that a thousand flowers would bloom, and that journal brands would be irrelevant seems to have run into the power of brands and corporate consolidation. Journals persist, and large publishers are even larger.
  • Our ‘enthusiasm about trends and startups’ (as the same colleague noted), though so few have staying- or change-inducing power.

So why do some things happen faster or slower than expected, or seem to stall? The reasons are interesting, and may get at something essential about our work that is somewhat hidden. For example, the spread of publish ahead of print has to do with editorial competition: it spread rapidly because it gives editors a competitive advantage in attracting authors, and once your competitor had it, you had to have it too!

Any surprises to add to my list? Do any of my surprises surprise you?