Evaluating an AI Tool Designed for Academic Researchers: Consensus

By now, we’ve all seen how poorly ChatGPT handles academic research, especially citations. ChatGPT is known to make up details that sound real but are false, even going so far as to create fake DOIs. But ChatGPT doesn’t represent all of AI, or even all of generative AI. New AI tools are emerging to support academic researchers; we’ve evaluated one such tool, Consensus, whose website claims it uses AI “to extract and distill findings directly from scientific research.”

Playing in Consensus

Consensus uses “language models to surface and synthesize claims from academic research papers.” Users pose queries in the Consensus search bar to generate results with tags like “Highly Cited” and “Very Rigorous Journal.” There is also a synthesize option, in which the AI “reads” several papers and provides a summary for you. You can also filter results by date and study type (e.g., Case Report, Systematic Review, Meta Analysis).

My PhD intern Andrea Kampen and I are currently doing some work on research-creation, so we started our inquiry by asking Consensus, “What are the information dissemination methods in research-creation?” Instead of directly answering this question, Consensus produced a series of results about dissemination and health information, earth and space science, and other irrelevant fields. It seems that Consensus has been designed for researchers in STEM and business fields, rather than for researchers in the fine arts or humanities.

Even posing a health-focused question, “Does Ozempic have negative health outcomes?”, delivered results that dealt more with probability than with that prescription medication.

Researchers in STEM and business disciplines may find Consensus helpful for answering questions that cover heavily trodden terrain, but the tool doesn’t yet seem capable of replacing a literature review, nor does it yet seem to search databases of humanities, fine arts, and social science research.

This is just my and Andrea’s subjective opinion, though; as researchers without expertise in AI, we wanted a more credible take on this tool. So we interviewed three librarians who specialize in digital scholarship, knowledge synthesis, and digital literacy: Ekatarina Grgurić, Vanessa Kitchin, and Alex Alisauskas. They offered considerations along with suggestions for how to integrate AI tools like Consensus into your research efforts (or not!).

Considerations

AI selections: The librarians noted that it isn’t clear how this tool makes its selections. The Consensus website states that the extraction model was trained on “tens of thousands of papers that have been annotated by PhDs,” though it isn’t clear what these annotations offered or who the PhDs were. AI algorithms should be explained by the product developer; without such explanations, the selection process is “black boxed.” Black boxes are particularly problematic when you’re trying to do rigorous, reproducible, and transparent research. Good scholarship requires that you be able to articulate how knowledge is built, and Consensus doesn’t enable you to do that well.

Corpus of articles: Consensus only searches a set collection of articles, which means the results draw from one particular pool. Consensus has partnerships with several key datasets (Semantic Scholar, Corpus, CORE, SciScore), but that doesn’t mean the journals within them are all of the same quality. The librarians pointed out that it is unclear whether results are drawn from extracts, just from the abstract, or from the article as a whole. Additionally, the tool will only be helpful to disciplines that are represented within these datasets.

Conformity: The librarians argued that one consequence of relying on AI algorithms could be that people start writing papers in whatever formula or structure gets their work selected by the algorithm. This may lead to increased conformity in how research findings are communicated. As Helen Sword argues, “academics who always plan, research, and write to a template risk thinking to a template as well” (2012: 125). Will tools like Consensus be good for how science is communicated in the long term? The answer isn’t yet clear.

Long-term thinking: Since Consensus is a proprietary tool, you can’t necessarily rely on it in the long term. At the time of writing, it isn’t clear whether Consensus offers export formats so that access to the data you generate is not tied to its platform.

Citation: Another interesting consideration the librarians raised was citation. Where other AI tools like ChatGPT generate text based on a prompt, Consensus draws on published literature. So when ChatGPT presents a citation that sounds like exactly what you’re looking for, it’s because it created that fake citation just for you; Consensus, by contrast, retrieves real, published results. Still, because a tool like Consensus generates results with an AI algorithm, those results will differ week to week and even moment to moment. If you want to cite a particular Consensus search, then, until APA and MLA and the like catch up with the times, you may have to figure out for yourself how to cite your search and its results. Do you take a screenshot, host it on your own website, and cite that? Should you include timestamps of your search results? Do you cite only Consensus, or also the corpus of articles on which the tool was trained? There’s not yet agreed-upon guidance, so you’ll need to use your best judgment for now.

Support, not Solution

Tools like Consensus can be used to get the lay of the research landscape, but they can’t be used to build on knowledge unless their workings are transparent. AI tools regurgitate existing language. From a knowledge synthesis perspective, researchers strive to eliminate bias; though elements of Consensus could be helpful, it will not put an end to bias.

However, Consensus can be helpful for efficiency. For example, it will generate a summary of papers for you by drawing out “snippets from papers related to your questions” (consensus.app). Doing so has a drawback, though: you’re not doing the work of making sense of the literature as you review it.

Consensus could be particularly helpful for brainstorming or thinking through early-stage research questions. If integrated into a broader workflow, it could support the generative process of surveying the terminology and topics within a larger research area. An aim of using Consensus is to make published scholarship consumable, but it’s still up to you to assess if the consumption will be nourishing.
