Selectivity and Cardinality

  • Selectivity(選択率)とCardinality(濃度)
    • Selectivity, Cardinality のそれぞれの意味はわかるが
    • "選択率が高い・低い"がよくわからなくなるのでメモ

Selectivity of index = cardinality/(number of rows) * 100%

  • 男女の性別を保存するカラム(Sex)を考えた時,その値は 'Male', 'Female' の二値しか取らないとする.レコードが一万行あるとき,Cardinality が '2' のとき,Selectivity は '0.02%'(極めて低い)

A low selectivity basically means there is not a lot of variation in the values in a column – that there is not a lot of possibilities for the values of a column. Suppose, using the example table that we discussed earlier, that we want to find the names of all the females in the table.

Database query optimizers have to make a decision about whether it would actually make sense to either use the index to find certain rows in a table or to not use the index. This is because there are times when using the index is actually less efficient than just directly scanning the table itself. This is something that you should remember: even if a column has an index created for it, that does not mean the index will always be used, because scanning the table directly without going through the index first could be a better, more efficient, option.

  • High / Low Selectivity
    • High: 少ない行しか選択されない => インデックスが有効
    • Low: 多くの行が選択される => テーブルスキャンが有効(インデックスがあまり働かない)(例:性別など)
  • High/ Low Cardinality
    • High: あるカラムについて,値のバリエーションが多い
    • Low: あるカラムについて,値のバリエーションが少ない(例: 性別など),Hash Join などが有効(ハッシュ値が衝突する可能性が低い)