In statistics, the Hájek projection of a random variable \( T \) on a set of independent random vectors \( X_{1},\dots ,X_{n} \) is a particular measurable function of \( X_{1},\dots ,X_{n} \) that, loosely speaking, captures the variation of \( T \) in an optimal way: it is the best mean-square approximation of \( T \) by a sum of functions of the individual \( X_{i} \). It is named after the Czech statistician Jaroslav Hájek.


Given a random variable \( T \) and a set of independent random vectors \( X_{1},\dots ,X_{n} \), the Hájek projection \( \hat{T} \) of \( T \) onto \( \{X_{1},\dots ,X_{n}\} \) is given by[1]

\( \hat{T}=\operatorname{E}(T)+\sum_{i=1}^{n}\left[\operatorname{E}(T\mid X_{i})-\operatorname{E}(T)\right]=\sum_{i=1}^{n}\operatorname{E}(T\mid X_{i})-(n-1)\operatorname{E}(T). \)
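As an illustration (not part of the original definition), the formula can be evaluated exactly when each \( X_i \) takes finitely many values with known probabilities. The sketch below assumes such discrete distributions, supplied as lists of (value, probability) pairs; the function names `expectation`, `conditional_expectation` and `hajek_projection` are hypothetical. It relies on the fact that, by independence, \( \operatorname{E}(T\mid X_i=x) \) is obtained by fixing the \( i \)-th argument of \( T \) at \( x \) and averaging over the remaining coordinates.

```python
from itertools import product


def expectation(dists, T):
    """Exact E[T(X_1, ..., X_n)] for independent discrete X_i.

    dists[i] is a list of (value, probability) pairs describing X_i.
    """
    total = 0.0
    for combo in product(*dists):
        values = [v for v, _ in combo]
        prob = 1.0
        for _, p in combo:
            prob *= p  # independence: the joint probability is the product
        total += prob * T(*values)
    return total


def conditional_expectation(dists, T, i, x):
    """E[T | X_i = x]: by independence, fix X_i at x and average over the rest."""
    fixed = list(dists)
    fixed[i] = [(x, 1.0)]
    return expectation(fixed, T)


def hajek_projection(dists, T):
    """Return T_hat(x_1, ..., x_n) = sum_i E[T | X_i = x_i] - (n - 1) E[T]."""
    n = len(dists)
    mean_T = expectation(dists, T)

    def T_hat(*xs):
        return sum(conditional_expectation(dists, T, i, xs[i]) for i in range(n)) - (n - 1) * mean_T

    return T_hat


# Usage: T = X1 * X2 with two independent fair coin flips (values 0 and 1).
coin = [(0, 0.5), (1, 0.5)]
T_hat = hajek_projection([coin, coin], lambda x1, x2: x1 * x2)
print(T_hat(1, 1), T_hat(1, 0), T_hat(0, 0))  # 0.75 0.25 -0.25
```

For \( T=X_1X_2 \) with two fair coin flips this reproduces \( \hat{T}=X_1/2+X_2/2-1/4 \). If \( T \) is already a sum of functions of the individual \( X_i \), the projection returns \( T \) unchanged. Enumeration costs time proportional to the product of the support sizes, so this approach only suits small examples.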


The Hájek projection \( \hat{T} \) is the \( L^{2} \) projection of \( T \) onto the linear subspace of all random variables of the form \( \sum_{i=1}^{n}g_{i}(X_{i}) \), where \( g_{i}:\mathbb{R}^{d}\to\mathbb{R} \) are arbitrary measurable functions such that \( \operatorname{E}(g_{i}^{2}(X_{i}))<\infty \) for all \( i=1,\dots ,n \).
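To see why, assume \( \operatorname{E}(T^{2})<\infty \) and fix an index \( j \) and an admissible \( g_j \). For \( i\neq j \), \( \operatorname{E}(T\mid X_i) \) is a function of \( X_i \) alone and hence independent of \( X_j \), so \( \operatorname{E}[\operatorname{E}(T\mid X_i)\,g_j(X_j)]=\operatorname{E}(T)\operatorname{E}(g_j(X_j)) \). Together with the tower property in the last step, this shows that the residual \( T-\hat{T} \) is orthogonal to every \( g_j(X_j) \):

```latex
\begin{aligned}
\operatorname{E}\bigl[\hat{T}\,g_j(X_j)\bigr]
  &= \operatorname{E}\bigl[\operatorname{E}(T\mid X_j)\,g_j(X_j)\bigr]
     + \sum_{i\neq j}\operatorname{E}(T)\operatorname{E}\bigl(g_j(X_j)\bigr)
     - (n-1)\operatorname{E}(T)\operatorname{E}\bigl(g_j(X_j)\bigr) \\
  &= \operatorname{E}\bigl[\operatorname{E}(T\mid X_j)\,g_j(X_j)\bigr]
   = \operatorname{E}\bigl[T\,g_j(X_j)\bigr],
\qquad\text{so}\qquad
\operatorname{E}\bigl[(T-\hat{T})\,g_j(X_j)\bigr] = 0 .
\end{aligned}
```

Because \( \hat{T} \) is itself of the form \( \sum_i g_i(X_i) \), with \( g_i(x)=\operatorname{E}(T\mid X_i=x)-\tfrac{n-1}{n}\operatorname{E}(T) \), this orthogonality gives \( \operatorname{E}(T-S)^2=\operatorname{E}(T-\hat{T})^2+\operatorname{E}(\hat{T}-S)^2 \) for every such sum \( S \), so no other sum of this form approximates \( T \) better in mean square.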
The projection has the same conditional expectations given each \( X_{i} \) as \( T \): \( \operatorname{E}(\hat{T}\mid X_{i})=\operatorname{E}(T\mid X_{i}) \) for every \( i \), and hence \( \operatorname{E}(\hat{T})=\operatorname{E}(T) \).
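The first identity can be checked directly from the definition. For \( j\neq i \), \( \operatorname{E}(T\mid X_j) \) is a function of \( X_j \) alone and therefore independent of \( X_i \), so its conditional expectation given \( X_i \) is its unconditional mean \( \operatorname{E}(T) \):

```latex
\operatorname{E}(\hat{T}\mid X_i)
  = \operatorname{E}(T\mid X_i)
    + \sum_{j\neq i}\operatorname{E}\bigl[\operatorname{E}(T\mid X_j)\bigr]
    - (n-1)\operatorname{E}(T)
  = \operatorname{E}(T\mid X_i) + (n-1)\operatorname{E}(T) - (n-1)\operatorname{E}(T)
  = \operatorname{E}(T\mid X_i).
```

Taking expectations of both sides and applying the tower property then yields \( \operatorname{E}(\hat{T})=\operatorname{E}(T) \).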
Under a simple variance condition, the asymptotic distributions of the sequence of statistics \( T_{n}=T_{n}(X_{1},\dots ,X_{n}) \) and of the sequence of its Hájek projections \( \hat{T}_{n}=\hat{T}_{n}(X_{1},\dots ,X_{n}) \) coincide. Namely, if \( \operatorname{Var}(T_{n})/\operatorname{Var}(\hat{T}_{n})\to 1 \), then \( \frac{T_{n}-\operatorname{E}(T_{n})}{\sqrt{\operatorname{Var}(T_{n})}}-\frac{\hat{T}_{n}-\operatorname{E}(\hat{T}_{n})}{\sqrt{\operatorname{Var}(\hat{T}_{n})}} \) converges to zero in probability, so by Slutsky's lemma the standardized \( T_{n} \) and the standardized \( \hat{T}_{n} \) have the same limiting distribution whenever either one has a limit.
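The convergence can be watched in a small simulation. The sketch below rests on assumptions of our own, not taken from the source: \( T_n=\bar{X}_n^2 \) with \( X_1,\dots,X_n \) i.i.d. Exponential(1), so \( \mu=\sigma^2=1 \), and the helper names are hypothetical. Expanding the square and using independence gives \( \operatorname{E}(T_n\mid X_i)=[X_i^2+2(n-1)\mu X_i+(n-1)\sigma^2+(n-1)^2\mu^2]/n^2 \). For exponential data, \( \operatorname{Var}(T_n)=4/n+10/n^2+6/n^3 \) and \( \operatorname{Var}(\hat{T}_n)=4/n+8/n^2+8/n^3 \), so the variance ratio tends to 1. Writing \( D_n \) for the standardized difference above, the code estimates \( \operatorname{E}(D_n^2) \) by Monte Carlo and compares it with the exact value \( 2-2\sqrt{\operatorname{Var}(\hat{T}_n)/\operatorname{Var}(T_n)} \), which follows from \( \operatorname{Cov}(T_n,\hat{T}_n)=\operatorname{Var}(\hat{T}_n) \) (a consequence of the projection property).

```python
import math
import random
import statistics

# Assumed example: T_n = (sample mean)^2 with X_i i.i.d. Exponential(1),
# so mu = E(X) = 1 and sigma^2 = Var(X) = 1.
MU, SIGMA2 = 1.0, 1.0


def cond_exp_T(x, n):
    """E(T_n | X_i = x): expand (x + sum of the other n-1 terms)^2 and average them out."""
    return (x * x + 2 * (n - 1) * MU * x + (n - 1) * SIGMA2 + (n - 1) ** 2 * MU**2) / n**2


def hajek(xs):
    """Hajek projection of T_n: sum_i E(T_n | X_i) - (n - 1) E(T_n)."""
    n = len(xs)
    mean_T = MU**2 + SIGMA2 / n
    return sum(cond_exp_T(x, n) for x in xs) - (n - 1) * mean_T


def exact_moments(n):
    """Common mean and exact variances of T_n and of its projection (Exponential(1) only)."""
    mean = 1 + 1 / n
    var_T = (n + 1) * (n + 2) * (n + 3) / n**3 - (n + 1) ** 2 / n**2
    var_hat = (4 * n**2 + 8 * n + 8) / n**3
    return mean, var_T, var_hat


def mean_sq_gap(n, reps, rng):
    """Monte Carlo estimate of E(D_n^2), D_n = standardized T_n minus standardized projection."""
    mean, var_T, var_hat = exact_moments(n)
    total = 0.0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        t = statistics.fmean(xs) ** 2
        d = (t - mean) / math.sqrt(var_T) - (hajek(xs) - mean) / math.sqrt(var_hat)
        total += d * d
    return total / reps


def exact_mean_sq_gap(n):
    """E(D_n^2) = 2 - 2 sqrt(Var(projection) / Var(T_n)), since Cov(T_n, projection) = Var(projection)."""
    _, var_T, var_hat = exact_moments(n)
    return 2 - 2 * math.sqrt(var_hat / var_T)


rng = random.Random(0)
for n in (10, 100, 400):
    print(n, round(mean_sq_gap(n, 1000, rng), 5), round(exact_mean_sq_gap(n), 5))
```

Both columns should shrink as \( n \) grows, for large \( n \) roughly like \( 1/(2n) \). Since \( \operatorname{E}(D_n)=0 \) and \( \operatorname{E}(D_n^2)\to 0 \), Chebyshev's inequality gives \( D_n\to 0 \) in probability in this example.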


