A Note on Independent Peer Review

The recently published article by Smalheiser and Bonifield, "Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation", vol. 7 (e1), received external, independent peer review. Elizabeth Workman, MLIS, PhD graciously agreed to serve as an expert Handling Editor, and arranged for two anonymous reviews. The reviews, and the authors' response to them, are appended below. These safeguards ensure that DISCO publishes articles of the highest quality and follows current standards of transparency and peer review. Similar steps will be followed whenever there is a real or apparent conflict of interest between the journal editors and a submitted article.

Two Similarity Metrics for Medical Subject Headings (MeSH): An Aid to Biomedical Text Mining and Author Name Disambiguation

Authors: Neil R. Smalheiser and Gary Bonifield

This file contains the anonymous reviews of the initially submitted article, as well as the authors' responses describing the revisions made. It is appended here with the permission of the reviewers and the Handling Editor.

Reviewer #1:

This paper computes similarity metrics on MEDLINE citations using the MeSH indexing terms assigned to them. The authors compute two metrics: the first is based on co-occurring terms in individual citations, while the second is based on the citations produced by a given author. The first metric is not entirely new, and the authors put it in context, indicating differences from previous work. The main reason for providing this metric would appear to be to contrast it with the author-based metric, which is novel and is motivated by previous work by the first author to disambiguate author names. The authors contrast the results from the first and second metrics and point out some interesting characteristics of the second metric, in addition to speculating on further research that it may underpin. The computation on which the metrics are based appears to be sound, and the authors present a well-written account of the subject they address. This paper is likely to be of interest to readers of this journal with similar interests. I recommend it be published as is.

Reviewer #2:

This paper discusses the application of metrics to measure the likelihood of co-occurrence of MeSH indexing terms within the same article (article-based metrics) or for authors publishing in one given journal (author-based metrics). This is an extension of a previous paper by one of the authors that applied (variations of) these metrics to differing articles and to biomedical journals, respectively.

The paper is topically relevant and shows adequate literary quality and scholarship. The metrics described are not novel, although their application is. The work needs some clarification, and some important concepts on which its premises and deductions are based need to be teased out, in order to have the expected scientific rigor. Additionally, the motivation for the second metric (the author-based metric) is rather weak.

Format/punctuation errors:

1. Page 4: At the end of first sentence under Methods, please substitute a comma for the period after MEDLINE.

2. There is no blank line in between references 2 and 3.

Content issues:

3. Authors typically build upon previous work when developing a theory or enlarging upon earlier findings. Thus, mention of previous work as a summarized narrative within the text is to be expected, along with the titles of earlier/related work included in bibliographic references. These additional mentions would cause a distortion in the number of times a pair of MeSH terms occurs within the articles by the same author. It could also have a bearing, at least in part, on the much higher number of pairs calculated in the author metric than in the article metric. Clarification on this is missing and would be helpful. Related to the above, it would seem relevant to state whether the bibliographic references in the full articles have been included in calculating the metrics. If they have, then it is quite possible that the co-occurrence measure may be artificially inflated. If references were included, what has been done to compensate for this distortion?

4. Page 7: It is stated that ‘the article-based metric corresponded well to human ratings of semantic relatedness’ based on mapping (as far as possible) the compiled list of 29 UMLS concept pairs presented by Pedersen et al., annotated for semantic similarity, to the corresponding MeSH term pairs. This mapping raises some questions. First, do all of the 29 UMLS concept pairs correspond to MeSH terms? Secondly, how were 29 concept pairs mapped to thousands of MeSH term pairs? Or is the claim on Page 7 based on how well the article-based metric did only on the 29 pairs? And considering that 7 of the 29 UMLS concept pairs had no co-occurrences within the same article (top of page 8), this claim is based on the mapping of only 22 concept pairs.

This requires more clarification, as Pedersen et al. measured semantic similarity, which may not be the same as semantic relatedness for one given topic based on a co-occurrence metric. Co-occurrence (which is what the article-based metric measures, as indicated on page 7) is not equivalent to semantic similarity. As many of this paper’s sources state, semantic relatedness is more general than semantic similarity. The latter would imply a likeness between two terms. This likeness may or may not be present when dealing with co-occurrence between two terms in a given article. The authors seem to use these concepts interchangeably although they’re quite different. This may have an effect on their claim on Page 7, quoted at the start of this point (point 4).

Further, since “only those MeSH terms appearing in at least 25 articles were considered in calculating term similarity measures” (Page 4), it is quite conceivable that some of those 7 UMLS concept pairs with no co-occurrence within the same article did not appear in at least 25 articles (but did appear in, say, 22 articles) and were thus not considered for the metric. So an article-based similarity score of 0 may not tell the whole story, especially since all of these seven ‘outliers’ had author-based co-occurrences with odds ratios greater than zero. The authors then state that “this may suggest that the author-based metric is more sensitive in detecting indirect similarities.” Wouldn’t this rather be “relatedness”? And why would this be ‘indirect’? Is it because these co-occurrences are not found in the same articles? With respect to the smoothing effect (Page 8), if an author has published 7 articles and each has 8 MeSH terms, there is potentially a pool of 56 MeSH terms to be considered pairwise, compared to only 8 MeSH terms for each article. In reality, this potential may never be realized, as many authors tend to publish on topics that are closely related, if not the same ones.

5. As mentioned, both this paper and D’Souza and Smalheiser’s (2014) use the author-based metric. So it would not be accurate to describe this metric as ‘novel,’ as is done in the abstract. It would seem that what is novel is the objective for which it was used, and this is brought up only towards the very end of the paper (page 12). Given that the two papers use many of the same resources, the difference in the use of the author-based metric should be mentioned earlier on, as well as the way in which the use of this metric is novel (though the metric itself is not).

In discussing the interpretation of the differing results between the two metrics (Page 9), the authors fail to mention that the two metrics do not cover exactly the same data: the author-based metric was applied to both MEDLINE and not-MEDLINE PubMed records (Page 4). Not-MEDLINE articles may focus less on (bio)medical issues than MEDLINE articles; thus articles on tennis talked more about tennis-related disorders, while authors who wrote about tennis also wrote about a variety of other sports. What confirms the validity of the metrics is that they measure these differences. However, the differences themselves are partly a function of the differing coverage of the data sets.

6. One would wonder about the usefulness of the author-based metric in modeling author disambiguation. Why would this feature be desirable? Could the authors provide one or two case scenarios in which this would be of value? The motivation for the author-based metric is rather weak.

7. The web query interface for users to retrieve related MeSH terms for any specified MeSH term can be a valuable tool for information retrieval and data mining.

 

Authors’ response and detailed description of revisions:

I appreciate the comments and suggestions, and have made the following clarifications and changes (see responses interspersed).

This paper discusses the application of metrics to measure the likelihood of co-occurrence of MeSH indexing terms within the same article (article-based metrics) or for authors publishing in one given journal (author-based metrics). This is an extension of a previous paper by one of the authors that applied (variations of) these metrics to differing articles and to biomedical journals, respectively.

The paper is topically relevant and shows adequate literary quality and scholarship. The metrics described are not novel, although their application is. The work needs some clarification, and some important concepts on which its premises and deductions are based need to be teased out, in order to have the expected scientific rigor. Additionally, the motivation for the second metric (the author-based metric) is rather weak.

Format/punctuation errors:

  1. Page 4: At the end of first sentence under Methods, please substitute a comma for the period after MEDLINE.

DONE.

  2. There is no blank line in between references 2 and 3.

DONE.

Content issues:

  3. Authors typically build upon previous work when developing a theory or enlarging upon earlier findings. Thus, mention of previous work as a summarized narrative within the text is to be expected, along with the titles of earlier/related work included in bibliographic references. These additional mentions would cause a distortion in the number of times a pair of MeSH terms occurs within the articles by the same author. It could also have a bearing, at least in part, on the much higher number of pairs calculated in the author metric than in the article metric. Clarification on this is missing and would be helpful. Related to the above, it would seem relevant to state whether the bibliographic references in the full articles have been included in calculating the metrics. If they have, then it is quite possible that the co-occurrence measure may be artificially inflated. If references were included, what has been done to compensate for this distortion?

These comments appear to be based on the notion that we mapped MeSH terms from text instances within the articles. We have now clarified in Methods that we simply extracted the MeSH terms that were indexed as part of the article’s MEDLINE record.

4. Page 7: It is stated that ‘the article-based metric corresponded well to human ratings of semantic relatedness’ based on mapping (as far as possible) the compiled list of 29 UMLS concept pairs presented by Pedersen et al., annotated for semantic similarity, to the corresponding MeSH term pairs. This mapping raises some questions. First, do all of the 29 UMLS concept pairs correspond to MeSH terms?

As shown in Table 1, 27 of the 29 concept pairs corresponded to MeSH terms.

Secondly, how were 29 concept pairs mapped to thousands of MeSH term pairs? Or is the claim on Page 7 based on how well the article-based metric did only on the 29 pairs?

We removed a sentence and added a sentence on p. 7, both to clarify that co-occurrence may indicate either semantic relatedness or semantic similarity, and to clarify that the claim relates to the Pedersen set shown in Table 1 (comprising 27 MeSH pairs).

And considering that 7 of the 29 UMLS concept pairs had no co-occurrences within the same article (top of page 8), this claim is based on the mapping of 22 concept pairs.

This is not correct. An article odds ratio of 0 contributes to the correlation; it is not missing data.
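The point can be illustrated with a toy rank correlation. The sketch below is purely illustrative (the ratings and odds ratios are invented, not taken from the paper or from Table 1); it shows that pairs scored 0 enter the correlation as tied low ranks rather than being dropped:

```python
# Toy illustration (hypothetical values): pairs scored 0 by the
# article-based metric still contribute ranks to the correlation.

def ranks(xs):
    # Average ranks, handling ties (e.g. several zero scores).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    # Pearson correlation of the rank vectors (Spearman with ties).
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical human similarity ratings vs. article-based odds ratios;
# the last three pairs have odds ratio 0 but are NOT dropped.
human = [4.0, 3.5, 3.0, 2.5, 1.5, 1.0, 1.0]
article_or = [8.2, 5.1, 3.3, 2.0, 0.0, 0.0, 0.0]
print(round(spearman(human, article_or), 3))
```

The three zero-scored pairs receive tied low ranks and thus shape the correlation; removing them would change both the sample size and the result.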

This requires more clarification, as Pedersen et al. measured semantic similarity, which may not be the same as semantic relatedness for one given topic based on a co-occurrence metric. Co-occurrence (which is what the article-based metric measures, as indicated on page 7) is not equivalent to semantic similarity. As many of this paper’s sources state, semantic relatedness is more general than semantic similarity. The latter would imply a likeness between two terms. This likeness may or may not be present when dealing with co-occurrence between two terms in a given article. The authors seem to use these concepts interchangeably although they’re quite different. This may have an effect on their claim on Page 7, quoted at the start of this point (point 4).

We agree with the reviewer on the meaning and distinction of semantic relatedness vs. similarity. Hopefully, the modified text is clearer and satisfactory now.

Further, since “only those MeSH terms appearing in at least 25 articles were considered in calculating term similarity measures,” (Page 4), it is quite conceivable that perhaps some of those 7 UMLS concept pairs with no occurrence within the same article did not appear in at least 25 articles (but did appear in, say, 22 articles) and were thus not considered for the metric.

All of the MeSH pairs in Table 1 appeared in at least 25 articles. We have added a phrase to Methods clarifying that calculating an odds ratio at all requires that both MeSH terms in the pair appeared in at least 25 articles.

So an article-based similarity score of 0 may not tell the whole story, especially since all of these seven ‘outliers’ had author-based co-occurrences with odds ratios greater than zero. The authors then state that “this may suggest that the author-based metric is more sensitive in detecting indirect similarities.” Wouldn’t this rather be “relatedness”? And why would this be ‘indirect’? Is it because these co-occurrences are not found in the same articles?

We rephrased the text to clarify this: Seven of the 29 MeSH pairs in Table 1 had no co-occurrences at all within the same article (and hence have article-based similarity scores of 0), yet all of these had author-based co-occurrences such that the odds ratios were greater than zero. This may suggest that the author-based metric is more sensitive in detecting weak relationships.

With respect to the smoothing effect (Page 8), if an author has published 7 articles and each has 8 MeSH terms, there is potentially a pool of 56 MeSH terms to be considered pairwise, compared to only 8 MeSH terms for each article. In reality, this potential may never be realized, as many authors tend to publish on topics that are closely related, if not the same ones.

The reviewer is correct. However, we do empirically detect a smoothing effect generally over the dataset comparing article odds ratio vs. author odds ratio, and that is partially due to the larger potential pool of terms, so the statement as written is not misleading.

5. As mentioned, both this paper and D’Souza and Smalheiser’s (2014) use the author-based metric. So it would not be accurate to describe this metric as ‘novel,’ as is done in the abstract. It would seem that what is novel is the objective for which it was used, and this is brought up only towards the very end of the paper (page 12). Given that the two papers use many of the same resources, the difference in the use of the author-based metric should be mentioned earlier on, as well as the way in which the use of this metric is novel (though the metric itself is not).

Thanks for the opportunity to clarify this critical point. The author-based metric described in the present paper relates any two MeSH TERMS according to how likely they are to appear in the articles written by the same author. The author-based metric in D’Souza and Smalheiser was totally different: it relates any two JOURNALS according to how likely they are to appear among the articles written by the same author. We have now clarified this in the Methods section.
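As a purely hypothetical sketch of the distinction being drawn, a term-level author-based metric could be computed along these lines; the mini-dataset, the pooling of MeSH terms across an author's articles, and the additive smoothing constant are all assumptions for illustration, not the paper's actual implementation:

```python
# Hypothetical sketch of an author-based MeSH pair odds ratio:
# two terms are "linked" for an author if they appear anywhere in
# that author's body of articles, not necessarily the same article.
# The data and the smoothing constant are illustrative only.

from itertools import chain

# author -> list of articles; each article is a set of MeSH terms
authors = {
    "a1": [{"Tennis", "Athletic Injuries"}, {"Tennis Elbow"}],
    "a2": [{"Tennis"}, {"Soccer"}, {"Running"}],
    "a3": [{"Neoplasms", "Mutation"}],
}

def author_odds_ratio(term_a, term_b, authors, smooth=0.5):
    # 2x2 contingency counts over authors, not over articles.
    n11 = n10 = n01 = n00 = 0
    for articles in authors.values():
        terms = set(chain.from_iterable(articles))  # pool across the author
        has_a, has_b = term_a in terms, term_b in terms
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n10 += 1
        elif has_b:
            n01 += 1
        else:
            n00 += 1
    # additive smoothing keeps the ratio finite when a cell is 0
    return ((n11 + smooth) * (n00 + smooth)) / ((n10 + smooth) * (n01 + smooth))

print(author_odds_ratio("Tennis", "Tennis Elbow", authors))
```

In this toy data, "Tennis" and "Tennis Elbow" never co-occur within the same article, yet the author-level pooling still links them, which is the sense in which an author-based metric can capture indirect relationships.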

In discussing the interpretation of the differing results between the two metrics (Page 9), the authors fail to mention that the two metrics do not cover exactly the same data: the author-based metric was applied to both MEDLINE and not-MEDLINE PubMed records (Page 4). Not-MEDLINE articles may focus less on (bio)medical issues than MEDLINE articles; thus articles on tennis talked more about tennis-related disorders, while authors who wrote about tennis also wrote about a variety of other sports. What confirms the validity of the metrics is that they measure these differences. However, the differences themselves are partly a function of the differing coverage of the data sets.

There are some slight differences in the author-based vs. article based datasets, because the former is based on Author-ity whose current version draws on 1966-2009, whereas the article dataset draws on 1966-2014. However, MeSH terms are only assigned to MEDLINE articles -- even if PubMed contains not-MEDLINE articles too, they do not contribute to the extracted MeSH terms at all.  We have now clarified this in the Methods section.

6. One would wonder about the usefulness of the author-based metric in modeling author disambiguation. Why would this feature be desirable? Could the authors provide one or two case scenarios in which this would be of value? The motivation for the author-based metric is rather weak.

We have now expanded the motivation and given a specific example in the text.

7. The web query interface for users to retrieve related MeSH terms for any specified MeSH term can be a valuable tool for information retrieval and data mining.

Thanks.