Microsoft Research: Generative Retrieval For Ranking Answers





Microsoft Research published details on a new conversational question answering model that may point to the future of web search.

Microsoft announced a new conversational question answering model that outperforms other methods, answering questions faster and more accurately while using significantly fewer resources.

The proposal is a new way to rank passages from content, which the researchers call Generative Retrieval for Conversational Question Answering, or GCoQA.

The researchers write that the next direction to take is exploring how to use it for general web search.

Generative Retrieval For Conversational Question Answering

An autoregressive language model predicts what the next word or phrase is.

GCoQA uses autoregressive models to generate “identifier strings,” which in plain English are representations of passages in a document.

In this implementation, they use the page title (to identify what the page is about) and section titles (to identify what a passage of the text is about).

The experiment was carried out on Wikipedia data, where the page titles and section titles can be relied upon to be descriptive.

These titles are used to identify the topic of a document and the topic of the passages contained in each section of the document.

Applied to the real world, this would be like using the title element to learn what a webpage is about and the headings to understand what the sections of that webpage are about.

The “identifiers” are a way to encode all of that knowledge as a representation, which is mapped to the passages on the webpage and the titles.
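To make the idea concrete, here is a minimal sketch of how identifiers might be built from a page title and a section title and then mapped back to passages. The Passage class, the make_identifier and build_index helpers, and the " | " separator are all assumptions made for illustration; the article does not describe the exact format the researchers used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Passage:
    """A passage plus the two titles that describe it (hypothetical structure)."""
    page_title: str
    section_title: str
    text: str


def make_identifier(page_title: str, section_title: str, sep: str = " | ") -> str:
    """Join page title and section title into one identifier string.

    The separator is an illustrative assumption, not the paper's format.
    """
    return f"{page_title.strip()}{sep}{section_title.strip()}"


def build_index(passages: List[Passage]) -> Dict[str, List[Passage]]:
    """Map each identifier to the passages it represents."""
    index: Dict[str, List[Passage]] = {}
    for passage in passages:
        key = make_identifier(passage.page_title, passage.section_title)
        index.setdefault(key, []).append(passage)
    return index
```

With an index like this, any identifier a model generates can be looked up to recover the passage text it stands for.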

The passages that are retrieved are later put into another autoregressive model in order to generate the answers to questions.

Generative Retrieval

For the retrieval part, the research paper says the model uses a technique called “beam search” to generate identifiers (representations of passages from the webpage) that are then ranked in order of the likelihood of being the answer.

The researchers write:

“…we utilize beam search… a commonly-used technique, to generate multiple identifiers instead of just one.

Each generated identifier is assigned a language model score, enabling us to obtain a ranking list of generated identifiers based on these scores.

The ranking identifiers could naturally correspond to a ranking list of passages.”
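The ranking step described in the quote can be sketched with a toy beam search. This is an illustration under stated assumptions, not the researchers' code: TOY_MODEL is a hand-written probability table standing in for a real language model, identifiers are treated as short word sequences, and the function names and the "</s>" end marker are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]

# Hand-written stand-in for a language model: prefix -> next-token probabilities.
TOY_MODEL: Dict[Prefix, Dict[str, float]] = {
    (): {"Paris": 0.6, "France": 0.4},
    ("Paris",): {"History": 0.7, "Climate": 0.3},
    ("France",): {"Geography": 0.9, "Economy": 0.1},
    ("Paris", "History"): {"</s>": 1.0},
    ("Paris", "Climate"): {"</s>": 1.0},
    ("France", "Geography"): {"</s>": 1.0},
    ("France", "Economy"): {"</s>": 1.0},
}


def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    return TOY_MODEL.get(prefix, {})


def beam_search(
    next_token_probs: Callable[[Prefix], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Tuple[Prefix, float]]:
    """Return finished sequences ranked by summed log-probability."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Prefix, float]] = []
        for prefix, score in beams:
            for token, prob in next_token_probs(prefix).items():
                new_score = score + math.log(prob)
                if token == eos:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (token,), new_score))
        if not candidates:
            break
        # Keep only the beam_size highest-scoring partial identifiers.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished[:beam_size]


ranked = beam_search(toy_next_token_probs, beam_size=2, max_len=4)
```

In this toy setup, a beam size of 2 yields two ranked identifiers, while a beam size of 1 yields only one, which shows how the beam size caps the number of passages that can be retrieved.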

The research paper then goes on to say that the process could be seen as a “hierarchical search.”

Hierarchical, in this scenario, means ordering the results first by page topic and then by the passages within the page (using the section headings).
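A rough way to picture that two-level ordering is shown below. This is a simplified sketch, not the paper's algorithm: it assumes that page-level and section-level probabilities are already available as plain dictionaries, and the hierarchical_rank function and its parameters are hypothetical names.

```python
from typing import Dict, List, Tuple


def hierarchical_rank(
    page_probs: Dict[str, float],
    section_probs: Dict[str, Dict[str, float]],
    keep_pages: int,
    top_k: int,
) -> List[Tuple[Tuple[str, str], float]]:
    """Rank (page, section) pairs by first narrowing to the best pages.

    Stage 1 keeps the keep_pages most likely pages.
    Stage 2 scores each section of those pages by
    P(page) * P(section | page) and returns the top_k pairs.
    """
    best_pages = sorted(page_probs, key=page_probs.get, reverse=True)[:keep_pages]
    ranked: List[Tuple[Tuple[str, str], float]] = []
    for page in best_pages:
        for section, prob in section_probs.get(page, {}).items():
            ranked.append(((page, section), page_probs[page] * prob))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Sections of a page that was dropped in the first stage can never appear in the final list, no matter how well they match, which is the trade-off of searching hierarchically.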

Once those passages are retrieved, another autoregressive model generates the answer based on the retrieved passages.
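The overall two-stage flow, retrieval followed by answer generation, might be wired together as in the sketch below. Everything here is an assumption for illustration: the retrieve and generate_answer callables stand in for the two autoregressive models, and joining the conversation turns with " [SEP] " is a hypothetical way to represent the conversation context.

```python
from typing import Callable, Dict, List


def answer_question(
    history: List[str],
    question: str,
    retrieve: Callable[[str, int], List[str]],
    passages_by_identifier: Dict[str, str],
    generate_answer: Callable[[str, List[str]], str],
    top_k: int = 3,
) -> str:
    """Run retrieval, then answer generation, for one conversational turn."""
    # Fold earlier turns into the query so both stages see the conversation.
    query = " [SEP] ".join(history + [question])
    # Stage 1: the retriever returns ranked passage identifiers.
    identifiers = retrieve(query, top_k)
    # Map identifiers back to passage text, skipping any unknown identifier.
    passages = [
        passages_by_identifier[identifier]
        for identifier in identifiers
        if identifier in passages_by_identifier
    ]
    # Stage 2: a second model writes the answer from the retrieved passages.
    return generate_answer(query, passages)
```

Keeping the two stages behind separate callables mirrors the article's description of two models, and it leaves room for the single combined model that the researchers list as future work.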

Comparison With Other Methods

The researchers found that GCoQA outperformed many other commonly used methods that they compared it against.

It was useful for overcoming limitations (bottlenecks) in other methods.

In many ways, this new model promises to bring a profound change to conversational question answering.

For example, it uses 1/10th of the memory that current models use, which is a huge leap in efficiency, and it’s faster.

The researchers write:

“…it becomes more convenient and efficient to apply our method in practice.”

The Microsoft researchers later conclude:

“Benefiting from fine-grained cross-interactions in the decoder module, GCoQA could attend to the conversation context more effectively.

Additionally, GCoQA has lower memory consumption and higher inference efficiency in practice.”

Limitations Of GCoQA

However, several limitations need to be solved before this model can be applied in practice.

They found that GCoQA had limitations due to the use of the “beam search” technique, which limited the ability of GCoQA to recall “large-scale passages.”

Increasing the beam size didn’t help matters either, as it slowed the model down.

Another limitation is that while Wikipedia is reliable about using headings in a meaningful way, many other sites are not, so using the model on webpages outside of Wikipedia could cause it to run into a stumbling block.

Many webpages on the Internet do a poor job of using their section headings to accurately denote what a passage is about (which is what SEOs and publishers are supposed to be doing).

The research paper observes:

“The generalizability of GCoQA is a legitimate concern.

GCoQA heavily relies on the semantic relationship between the question and the passage identifiers for retrieving relevant passages.

While GCoQA has been evaluated using three academic datasets, its effectiveness in real-world scenarios, where questions are often ambiguous and challenging to match with the identifiers, remains uncertain and requires further investigation.”

GCoQA Is A Promising New Technology

Ultimately, the researchers stated that the performance gains are a strong win. The limitations still need to be worked through.

The research paper concludes that there are two promising areas to continue studying:

“(1) investigating the use of generative retrieval in more general Web search scenarios where identifiers are not directly available from titles; and (2) examining the integration of passage retrieval and answer prediction within a single, generative model in order to better understand their internal relationships.”

Value Of GCoQA

The research paper (Generative Retrieval for Conversational Question Answering) was published on GitHub by one of the research scientists.

Visit that GitHub page to find the link to the PDF.

As sometimes happens, research papers have a way of disappearing behind a paywall, so there’s no guarantee that it will still be available in the future.

GCoQA may not be coming soon to a search engine.

The value of GCoQA is that it shows how researchers are working to discover ways to use generative models to transform web search as we know it today.

This could be a preview of what the search engines of the relatively near future may look like.
