What are similarity scores?

A similarity search for substances results in a set of candidates that are most similar to your query structure, grouped by similarity score range. Similarity scores are based on a two-dimensional small molecule comparison using a Tanimoto similarity metric.

The Tanimoto metric assigns a score based on CAS structure descriptors as follows:

 

Score = (100 * C)/((QS + FS) - C)

 

Where:

C = number of descriptors that the query structure and the result set structure have in common

QS = number of descriptors in the query structure

FS = number of descriptors in the result set structure
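
As a rough illustration of the formula, the sketch below computes the score in Python. It assumes each structure's descriptors can be represented as a set of labels; the actual CAS descriptor encoding is not described here, and the function name `tanimoto_score` and the sample labels are hypothetical.

```python
def tanimoto_score(query_descriptors, result_descriptors):
    """Return Score = (100 * C) / ((QS + FS) - C) for two descriptor collections.

    Assumption: descriptors are hashable labels and duplicates are ignored.
    """
    query = set(query_descriptors)
    result = set(result_descriptors)
    common = len(query & result)                      # C
    denominator = len(query) + len(result) - common   # (QS + FS) - C
    if denominator == 0:
        # Neither structure has any descriptors; treat the score as 0 (assumption).
        return 0.0
    return 100 * common / denominator


# Made-up labels: QS = 4, FS = 5, C = 3, so Score = 300 / 6
query = {"d1", "d2", "d3", "d4"}
result = {"d2", "d3", "d4", "d5", "d6"}
print(tanimoto_score(query, result))  # 50.0
```

Under this formula, two structures with identical descriptor sets score 100 and two structures with no descriptors in common score 0.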

Structure Descriptors

Substance similarity scores are computed based on these kinds of structure descriptors:

  • Atom count
  • Ring count
  • Atom sequence
  • Bond sequence
  • Augmented atoms
  • Degree of connectivity
  • Element composition
  • Type of ring

Scoring of Related Structures

Structure descriptors do not include data on stereochemistry, isotopic labeling, hydrogen atoms (with the exception of charged hydrogen), or charges on non-hydrogen atoms. Similarity scores are therefore identical for structures that differ only in those features.

Multi-Component Substances

Each component in a multi-component substance is assigned a score when compared to the search query. The highest score assigned to any of the components is used as the substance score.
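
The rule above can be sketched in Python as follows. The function name `substance_score` and the sample scores are hypothetical, and the per-component scores are assumed to have been computed already.

```python
def substance_score(component_scores):
    """Return the substance score: the highest score among its components."""
    scores = list(component_scores)
    if not scores:
        raise ValueError("a substance needs at least one component score")
    return max(scores)


# Made-up scores for a three-component substance
print(substance_score([42.0, 87.5, 63.0]))  # 87.5
```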
