How to apply the Retrieval Mean Reciprocal Rank (MRR)? #989
-
cc: @lucadiliello
-
At the moment you cannot, because you don't have scores: your retrieval system is just embedding queries and documents, not classifying documents as relevant or not relevant. I don't know whether you encoded the queries and the documents together or separately, but one thing you can do is use the cosine similarity between the embeddings as the scores:

```python
scores = torch.nn.functional.cosine_similarity(output['query_rpr'], output['doc_rpr'], dim=1)
```

then compute a vector of relevant documents as:

```python
labels = torch.tensor([
    relevance_map[q_idx.item()] == d_idx
    for q_idx, d_idx in zip(output['query_idx'], output['doc_idx'])
])
```

Finally, you can compute the MRR with:

```python
mrr = RetrievalMRR()
result = mrr(scores, labels, output['query_idx'])
```
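Putting the pieces together, a minimal end-to-end sketch of the steps above; the `output` dict and `relevance_map` here are synthetic stand-ins for the real model outputs, with invented shapes and values:

```python
import torch
from torchmetrics import RetrievalMRR

# Synthetic stand-in: 2 queries, each paired with the same 3 documents.
output = {
    'query_idx': torch.tensor([0, 0, 0, 1, 1, 1]),
    'doc_idx':   torch.tensor([0, 1, 2, 0, 1, 2]),
    'query_rpr': torch.randn(6, 128),  # one query embedding per pair
    'doc_rpr':   torch.randn(6, 128),  # one document embedding per pair
}
relevance_map = {0: 2, 1: 0}  # hypothetical: query index -> relevant doc index

# Score each (query, document) pair by cosine similarity of the embeddings.
scores = torch.nn.functional.cosine_similarity(
    output['query_rpr'], output['doc_rpr'], dim=1
)

# True where the document is the relevant one for its query.
labels = torch.tensor([
    relevance_map[q_idx.item()] == d_idx
    for q_idx, d_idx in zip(output['query_idx'], output['doc_idx'])
])

# RetrievalMRR groups pairs by the query index, ranks each group by score,
# and averages the reciprocal rank of the first relevant document.
mrr = RetrievalMRR()
result = mrr(scores, labels, output['query_idx'])
print(result)
```

With three candidates per query, each query's reciprocal rank is 1, 1/2, or 1/3, so the printed value falls between 1/3 and 1.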
-
Is there an example where this MRR metric works with a large number of queries?
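The pattern above scales directly; a synthetic sketch with many queries (all sizes and scores invented, and exactly one relevant document assumed per query):

```python
import torch
from torchmetrics import RetrievalMRR

num_queries, docs_per_query = 10_000, 100  # illustrative sizes

# Random scores for every (query, document) pair, flattened to one vector.
scores = torch.randn(num_queries * docs_per_query)

# One relevant document per query, at a random position within its group.
labels = torch.zeros(num_queries, docs_per_query, dtype=torch.bool)
labels[torch.arange(num_queries), torch.randint(docs_per_query, (num_queries,))] = True
labels = labels.flatten()

# `indexes` maps every pair to its query so the metric can group and rank.
indexes = torch.arange(num_queries).repeat_interleave(docs_per_query)

mrr = RetrievalMRR()
print(mrr(scores, labels, indexes))  # scalar tensor
```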
-
Suppose that after a forward pass, the model outputs a prediction like the one sketched below, where:

- `query_idx` is the query index;
- `query_rpr` is the query embedding;
- `doc_idx` is the document index;
- `doc_rpr` is the document representation;

and the relevance map pairs each query index with the index of its relevant document.
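A hypothetical instance of these structures (field names from the question; shapes and values invented for illustration):

```python
import torch

# Hypothetical model output: 2 queries x 2 candidate documents = 4 pairs.
output = {
    'query_idx': torch.tensor([0, 0, 1, 1]),  # query index per pair
    'doc_idx':   torch.tensor([0, 1, 0, 1]),  # document index per pair
    'query_rpr': torch.randn(4, 128),         # query embeddings
    'doc_rpr':   torch.randn(4, 128),         # document representations
}

# Hypothetical relevance map: query index -> index of its relevant document.
relevance_map = {0: 1, 1: 0}
```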
Therefore, how can the MRR metric be applied in this scenario?