Think about the sheer amount of new information being created today. Researchers worldwide produce publications, findings, and new theories every minute. The old library model, however large, has morphed into an endless, continuously growing digital realm, and researchers now face a difficult question: how do you find a single droplet of information about your research topic in this ocean? The modern research problem is not how many more hours one can spend searching manually, but how to work together with artificial intelligence (AI) as a data and knowledge expert. Finding related web papers is no longer a simple search but a complex excavation of data, with AI acting as the researcher’s coach or mentor throughout the process. The goal is not to replace researchers’ curiosity but to give them tools that empower it, turning an overwhelming volume of irrelevant noise into one coherent signal.
Researchers rely on AI to identify specific web papers because it fundamentally alters our relationship with the concept of “information.” Where we once searched for documents (e.g., research articles), we now search for patterns, relationships, and insights hidden within the literature: insights that might escape even someone who spent a lifetime reading the titles and abstracts of that same literature.
The Web Papers Avalanche
It’s clear that traditional methods simply can’t manage the volume of information available online. A researcher in a niche area such as “CRISPR gene editing in non-model plants” will face thousands of potentially relevant web papers from a single year alone, scattered across publisher sites, institutional repositories, preprint servers like arXiv and bioRxiv, and specialized databases. A keyword search on one platform returns a mountain of results, many of them only loosely related to the topic at hand. The first “pillar” of trust in AI is its enormous capacity to sift through this volume of data. AI tools do not tire: they can scan millions of web papers in seconds, parsing not only keywords but also full text, references, and metadata. This is about depth as much as speed. Where a human might search five databases, a carefully designed AI research assistant may turn up as many as fifty additional sources the researcher would otherwise have missed, including potentially valuable web papers hidden in less-visited corners of the Internet. By searching methodically and examining far more sources than any individual could, AI lays the foundation of researchers’ confidence that their projects rest on the fullest possible support from the literature.
However, volume is not the whole story. The real power lies in understanding context. Old-style search engines treated web pages as bags of words, while AI, especially through natural language processing (NLP), can read a web page much as an experienced reader would, attending to how it is written, what it means, and what the author intends. This leads to the second, and perhaps more significant, reason researchers trust AI.
Intelligent Context, Not Just Keywords
I’m sure you know the feeling. You search for something like “machine learning in climate models” and get hundreds of thousands of matching web papers, most of which merely contain the words of your query. The relevant, conceptually aligned work you actually need is almost always buried at the bottom of page three or beyond. AI changes this completely. By modeling the semantic relationships in both the query and each of millions of web papers, modern discovery tools can tell from context whether “cell” refers to biology or to batteries. They also recognize that “deep learning” and “neural networks” are closely related topics and will return results for both, even if “neural networks” never appears in your search. This semantic knowledge lets researchers find web papers by conceptual fit rather than by exact word matches alone. It’s the difference between looking up a word in a dictionary and having a conversation with a trusted friend who knows the answer.
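To make the keyword-versus-meaning distinction concrete, here is a minimal Python sketch. It assumes every title has already been turned into a vector (an “embedding”) by some language model; the three-dimensional vectors below are hypothetical, hand-picked values, not output from any real model. A literal keyword match for “deep learning” finds nothing, while ranking by cosine similarity puts the “neural networks” paper first.

```python
import math

# Hypothetical toy "embeddings": a real system would obtain these vectors from a
# trained language model; these hand-picked 3-D values are for illustration only.
EMBEDDINGS = {
    "deep learning": [0.90, 0.10, 0.05],
    "neural networks": [0.85, 0.15, 0.10],
    "battery cell chemistry": [0.05, 0.90, 0.20],
    "cell biology": [0.10, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_match(query, title):
    """True only if every query word appears literally in the title."""
    title_words = title.lower().split()
    return all(word in title_words for word in query.lower().split())


def semantic_rank(query, titles):
    """Rank titles by cosine similarity between their vectors and the query's."""
    q = EMBEDDINGS[query]
    scored = [(cosine_similarity(q, EMBEDDINGS[t]), t) for t in titles]
    return sorted(scored, reverse=True)


titles = ["neural networks", "battery cell chemistry", "cell biology"]
print([t for t in titles if keyword_match("deep learning", t)])  # []
for score, title in semantic_rank("deep learning", titles):
    print(f"{score:.3f}  {title}")
```

In a production system the vectors would have hundreds of dimensions and be searched with an approximate nearest-neighbour index, but the ranking principle is the same.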
This smart system also learns what kind of researcher you are from your interactions with it, such as the web papers you read, save, or cite. From this data, the AI builds a profile of your interests and can surface content tailored to your current research needs, sometimes recommending relevant papers before many other researchers have recognized their value and begun citing them. This proactive, personalized curation changes your relationship with discovery: rather than a tool you operate, you gain a dedicated research assistant that understands your objectives and searches the global web of papers for high-quality candidates for you to review.
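The profile-building idea can be sketched as simple content-based filtering. This is an illustrative assumption, not a description of any particular product: each interaction adds a weight to the topic tags of the paper involved (the action weights, tags, and titles below are all hypothetical), and unseen papers are then scored by how strongly their topics appear in the resulting profile.

```python
from collections import Counter

# Hypothetical weights: citing a paper is treated as a stronger interest signal
# than saving it, and saving as stronger than reading. Real systems would tune these.
ACTION_WEIGHTS = {"read": 1.0, "save": 2.0, "cite": 3.0}


def build_profile(interactions):
    """Sum the action weight onto every topic tag of each paper the user touched."""
    profile = Counter()
    for action, topics in interactions:
        for topic in topics:
            profile[topic] += ACTION_WEIGHTS[action]
    return profile


def score_paper(profile, paper_topics):
    """Score a candidate by the profile weight of its topics (unknown topics add 0)."""
    return sum(profile[t] for t in paper_topics)


def recommend(profile, candidates, k=2):
    """Return the k candidate titles with the highest profile score."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: score_paper(profile, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]


interactions = [
    ("read", ["climate", "machine learning"]),
    ("save", ["climate", "downscaling"]),
    ("cite", ["machine learning", "downscaling"]),
]
candidates = {
    "Neural downscaling of rainfall": ["downscaling", "machine learning"],
    "Soil carbon survey": ["soil"],
    "ML emulators for climate models": ["climate", "machine learning"],
}
profile = build_profile(interactions)
print(recommend(profile, candidates))
```

Here the two climate and machine-learning papers outrank the soil survey because the user’s saves and citations concentrated on those topics.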
Mapping the Invisible Connections
Another exciting reason researchers trust AI is its potential to reveal how science fits together. Each paper on the web is a node of meaning, but much of the real meaning lies between the nodes: in citation networks, shared methods, and similar ideas that recur across many fields of study. Humans are skilled at deep, linear reading, but they often struggle to perceive the vast, non-linear web of relationships among all these nodes. This is where AI excels. By examining millions of documents and analysing how they are linked through their bibliographic references, AI can identify foundational papers, emerging trends, and unexpected paths between seemingly unrelated disciplines.
Imagine reading a groundbreaking article published online ten years ago. Within seconds, an AI-powered system can generate a graphical overview of every paper that has since cited it (its forward citations) and show how those citing papers are connected to one another. It can also group the citing papers by theme, showing how the original idea has branched into a range of sub-disciplines. Such a platform enables exploratory discovery. For example, starting from a familiar article and following an AI-generated link, a researcher might discover that a revolutionary computer science technique is being applied to problems in materials science, documented in literature they would never have searched themselves. Instead of a miner digging in one place, the researcher becomes an explorer whose map of knowledge is constantly updated through a dynamic interaction with AI. As trust grows, researchers use AI to expand their own intellectual horizons, revealing connections and paths that would previously have been invisible to them.
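Forward-citation mapping boils down to a graph traversal. The sketch below assumes a toy citation dataset in which the paper IDs, reference lists, and field labels are all hypothetical. It inverts “paper cites X” into “X is cited by paper,” walks outward from the seed paper with a breadth-first search, and then groups what it finds. The field labels stand in for the thematic clustering a real system would compute from text or citation patterns.

```python
from collections import defaultdict, deque

# Hypothetical toy citation data: each paper lists the papers it cites.
REFERENCES = {
    "A": [],          # the seed paper from ten years ago
    "B": ["A"],
    "C": ["A"],
    "D": ["B"],
    "E": ["C", "B"],
    "F": ["E"],
    "G": [],          # an unrelated paper
}
# Hypothetical field labels, standing in for a real thematic clustering step.
FIELD = {"A": "CS", "B": "CS", "C": "materials", "D": "CS",
         "E": "materials", "F": "materials", "G": "biology"}


def build_cited_by(references):
    """Invert 'paper -> papers it cites' into 'paper -> papers that cite it'."""
    cited_by = defaultdict(list)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].append(paper)
    return cited_by


def forward_citations(seed, references, max_depth=2):
    """Breadth-first search over citing papers, returning {paper: hops from seed}."""
    cited_by = build_cited_by(references)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        paper = queue.popleft()
        if depth[paper] == max_depth:
            continue  # do not expand beyond the requested number of hops
        for citer in cited_by[paper]:
            if citer not in depth:  # BFS visits each paper first at its shortest depth
                depth[citer] = depth[paper] + 1
                queue.append(citer)
    del depth[seed]
    return depth


def group_by_field(papers):
    """Group discovered papers by their (hypothetical) field label."""
    groups = defaultdict(list)
    for paper in sorted(papers):
        groups[FIELD[paper]].append(paper)
    return dict(groups)


found = forward_citations("A", REFERENCES)
print(found)
print(group_by_field(found))
```

Raising `max_depth` widens the map (paper F appears at three hops), and the grouping step is where a computer science idea would show up inside a materials science cluster.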
Overcoming the Filter Bubble Fear
A critical thinker might ask an important question: won’t this personalization create an academic filter bubble? If the AI shows me web papers consistent with my existing views, won’t I have fewer opportunities to discover new ideas, or research I might disagree with? The risk is real. The best AI tools try to mitigate this filtering effect through a more intelligent approach, and how well they do so becomes part of what researchers see as trustworthy. Researchers trust systems that balance accuracy with a measure of randomness in what they return. Advanced algorithms build in “controlled surprise,” derived from the patterns captured in your research behaviour.
For example, an algorithm may recommend a highly significant web paper from a related area that shares no keywords with your work but relies on identical mathematical models or conceptual frameworks. The aim is to build an intelligent, ever-expanding universe of literature relevant to the user, not an echo chamber. Trust comes from transparency and control. Researchers can adjust a “novelty” dial, explore the related maps themselves or through the app, and always see the rationale behind a recommendation (e.g., “recommended because it is related to 80% of your core references”). This gives users a sense of control, so that AI remains a device for broadening perspectives rather than narrowing them or undermining trust in what is discovered. Indeed, discovering a web paper that challenges a previous viewpoint through an AI-suggested connection may be the strongest affirmation a user could receive of AI-based recommendations.
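One simple way a “novelty” dial and a human-readable rationale could work is sketched below. It is an assumption for illustration, not how any particular tool implements it: each candidate’s score linearly blends its relevance with a novelty term (approximated here as 1 minus its overlap with the user’s core references), papers below a relevance floor are excluded so the dial cannot surface off-topic noise, and the overlap fraction doubles as the explanation shown to the user. All titles and numbers are hypothetical.

```python
# Hypothetical data: each candidate has a relevance score (how well it matches the
# user's profile) and the set of the user's core references it is related to.
CORE_REFERENCES = {"r1", "r2", "r3", "r4", "r5"}

CANDIDATES = {
    "Familiar follow-up": {"relevance": 0.95, "refs": {"r1", "r2", "r3", "r4"}},
    "Same maths, new field": {"relevance": 0.60, "refs": {"r5"}},
    "Off-topic": {"relevance": 0.10, "refs": set()},
}


def overlap(refs):
    """Fraction of the user's core references that a candidate is related to."""
    return len(refs & CORE_REFERENCES) / len(CORE_REFERENCES)


def blended_score(paper, novelty):
    """Linear blend of relevance and novelty; novelty=0.0 means pure relevance.

    Novelty is approximated as 1 - overlap (an assumption): the fewer core
    references a paper shares, the more surprising it is treated as being.
    """
    surprise = 1.0 - overlap(paper["refs"])
    return (1.0 - novelty) * paper["relevance"] + novelty * surprise


def recommend(novelty, min_relevance=0.3):
    """Rank candidates above a relevance floor and attach a rationale to each."""
    eligible = {t: p for t, p in CANDIDATES.items() if p["relevance"] >= min_relevance}
    ranked = sorted(eligible, key=lambda t: blended_score(eligible[t], novelty), reverse=True)
    return [
        (title, f"related to {overlap(eligible[title]['refs']):.0%} of your core references")
        for title in ranked
    ]


for dial in (0.0, 0.7):
    print(f"novelty={dial}")
    for title, reason in recommend(dial):
        print(f"  {title}: {reason}")
```

With the dial at 0.0 the familiar follow-up (related to 80% of the core references) ranks first; at 0.7 the paper that shares only one core reference moves to the top, while the off-topic paper never appears.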
The Human-AI Partnership in Discovery
Trust between researchers and AI does not mean highly trained professionals blindly accepting whatever results AI produces; it means developing a working relationship in which each side contributes what it does best, as complementary and equal partners. Researchers provide expertise, analytical ability, and the imagination to ask insightful or even playful questions. AI supports them by retrieving large amounts of information, finding patterns, and evaluating and analyzing multiple forms of data simultaneously.
This exchange between researcher and AI is conversational. The researcher develops a not-yet-fully-formed idea and asks the AI for literature that could help complete it; the AI responds with relevant scholarly literature and catalogued information. The researcher reviews the response for gaps and alternative angles, refines the original inquiry, and sends it back. The AI then searches its database again for results that match the researcher’s updated needs. This spiral cycle lets the research process move much faster, turning what might have taken several months into an ongoing, frequent interaction between researcher and AI. Yet it is still a human who critically evaluates each web paper’s merit, performs the final synthesis, and supplies the decisive creative spark behind a new hypothesis. AI’s role is to complement the human mind by ensuring it works from the best possible map of existing human knowledge, as captured in the totality of web papers available at any given time.
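The conversational spiral resembles a classic information-retrieval technique, Rocchio relevance feedback, which is swapped in here as an illustration; it is not necessarily what any given AI assistant does. After each round, the researcher marks results as useful or not, and the query vector is moved toward the useful ones and away from the rest. The document vectors and the feedback are hypothetical, and alpha, beta, and gamma are commonly used illustrative weights.

```python
import math

# Hypothetical document vectors standing in for real paper embeddings.
DOCS = {
    "P1": [1.0, 0.0, 0.0],
    "P2": [0.7, 0.7, 0.0],
    "P3": [0.0, 1.0, 0.0],
    "P4": [0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query):
    """Return document ids ranked by cosine similarity to the query vector."""
    return sorted(DOCS, key=lambda d: cosine(query, DOCS[d]), reverse=True)


def centroid(vectors, dims):
    """Component-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dims
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def rocchio(query, relevant_ids, nonrelevant_ids, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: move the query toward relevant docs, away from non-relevant ones."""
    dims = len(query)
    rel = centroid([DOCS[d] for d in relevant_ids], dims)
    non = centroid([DOCS[d] for d in nonrelevant_ids], dims)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


query = [1.0, 0.2, 0.0]
print("round 1:", search(query))
# The researcher reviews round 1 and (hypothetically) marks P2 useful and P1 not.
query = rocchio(query, relevant_ids=["P2"], nonrelevant_ids=["P1"])
print("round 2:", search(query))
```

In this toy run, P2 rises from second to first place once the researcher’s feedback is folded into the query; repeating the loop is the spiral described above.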
So why do scientists rely on AI to help find papers on the web? One reason is that it greatly reduces the burden of sifting through vast amounts of information. The most important reason, however, is that it makes the process of doing science more intellectual. AI helps researchers navigate complex information and see the relationships between concepts, so they spend less time searching for information and more time creating it. In the vast digital library of the web, AI has become an essential librarian, cartographer, and scout, transforming an unmanageable mass of information into an orderly stream of knowledge that guides researchers to the information they want and, perhaps more importantly, to the information they didn’t know they needed.
