Lemma
math, backwards

document frequency

ko · counterpart 문서빈도 (df)

`df(t)` is the number of documents in the corpus that contain term `t` at least once. Presence is binary per document: a document that contains the term ten times still counts as 1. This is distinct from _collection frequency_, the total number of occurrences of the term across the corpus. IDF is built on `df`, not collection frequency, because whether a term appears in a document at all is the signal a query cares about.
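To make the contrast with collection frequency concrete, here is a minimal sketch. It assumes documents are already tokenized into lists of strings; the function names `document_frequency` and `collection_frequency` and the toy corpus are illustrative, not taken from any particular library.

```python
from collections import Counter


def document_frequency(docs):
    """df(t): number of documents that contain t at least once."""
    df = Counter()
    for tokens in docs:
        # set() collapses repeats, so each document adds at most 1 per term
        df.update(set(tokens))
    return df


def collection_frequency(docs):
    """Total occurrences of each term across the whole corpus."""
    cf = Counter()
    for tokens in docs:
        cf.update(tokens)
    return cf


# Hypothetical toy corpus, whitespace-tokenized
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a dog barked".split(),
]

df = document_frequency(docs)
cf = collection_frequency(docs)

print(df["the"], cf["the"])  # 2 4
print(df["dog"], cf["dog"])  # 2 2
```

Here "the" occurs four times in total but in only two documents, so its `df` is 2 while its collection frequency is 4. For "dog", which appears once per document, the two counts coincide.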
