1. Introduction
The entropy measure of a probability distribution, as introduced in the pioneering work of Shannon [Reference Shannon25], has found key applications in numerous fields. In information theory, it is used as a measure of uncertainty associated with a random phenomenon. If X is an unknown but observable quantity with a finite discrete range of possible values $\{x_1,\ldots,x_n\}$ with an associated probability mass function vector $\textbf{p}_n=(p_1,\ldots,p_n)$ , where $p_i=P(X=x_i)$ , $i=1,\ldots,n$ , such that $\sum_{i=1}^{n}p_i=1$ , the Shannon entropy measure, denoted by $H(X)=H(\mathbf{p}_n)$ , equals $-\sum_{i=1}^{n}p_i\log p_i$ , where $\log(\!\cdot\!)$ denotes the natural logarithm. As mentioned in [Reference Lad17], this measure has a complementary dual, termed extropy, which is also a very useful notion. The extropy measure, denoted by $J(X)=J(\mathbf{p}_n)$ , is defined as $-\sum_{i=1}^{n}(1-p_i)\log(1-p_i)$ in the discrete case. Just as for entropy, extropy can also be interpreted as a measure of the amount of uncertainty associated with the distribution of X. It can be seen that the entropy and extropy of a binary distribution are identical, but, for $n\geq3$ , the entropy is greater than the extropy; see, e.g., [Reference Lad17]. As with entropy, the maximum extropy distribution is also the uniform distribution, and both measures are invariant with respect to permutations of their mass functions, while they behave quite differently in their assessments of the refinement of a distribution.
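The discrete definitions above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the three example mass functions are illustrative assumptions, not taken from the cited references.

```python
import math

def entropy(p):
    """Shannon entropy -sum p_i log p_i (natural log), with 0 log 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def extropy(p):
    """Extropy -sum (1 - p_i) log(1 - p_i), with 0 log 0 taken as 0."""
    return -sum((1 - pi) * math.log(1 - pi) for pi in p if pi < 1)

# Illustrative mass functions (assumptions chosen for this sketch).
binary = [0.3, 0.7]
skewed = [0.7, 0.2, 0.1]
uniform3 = [1 / 3] * 3

print(entropy(binary), extropy(binary))      # the two measures agree when n = 2
print(entropy(skewed), extropy(skewed))      # entropy exceeds extropy when n = 3
print(extropy(uniform3) > extropy(skewed))   # the uniform mass function has larger extropy
```

Permuting the entries of any of these vectors leaves both values unchanged, in line with the permutation invariance noted above.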
If X is an absolutely continuous non-negative random variable having probability density function (PDF) f(x) with support $\mathcal{S}$ , the Shannon differential entropy is then defined as
$$H(X)=H(f)=-\int_{\mathcal{S}}f(x)\log f(x)\,\textrm{d} x, \tag{1.1}$$
provided the integral is finite. Throughout, $\mu=\mathbb{E}[X]$ and $\sigma^2(X)=\textrm{Var}(X)=\mathbb{E}[(X-\mu)^2]$ denote the mean and variance of X. It is known that $\sigma^2(X)<\infty$ implies that $\mathbb{E}(|X|)<\infty$ , so $H(X)<\infty$ , but the converse is not true. Also, H(X) can be finite when $\mathbb{E}(|X|)$ is not (e.g. the Cauchy distribution). It is important to mention that information-theoretic methodologies are useful in many problems and so have received considerable attention in the literature; see, for example, [Reference Asadi, Ebrahimi, Kharazmi and Soofi6, Reference Ebrahimi, Soofi and Soyer8, Reference Ebrahimi, Soofi and Zahedi9, Reference Kharazmi and Balakrishnan14, Reference Soofi, Ebrahimi and Habibullah26, Reference Yuan and Clarke29] and the references therein.
Shannon motivated the measure in (1.1) by arguing that refining the categories for a discrete quantity X, with diminishing probabilities in each, yields this analogous definition in the limit. This motivated [Reference Lad17] to introduce the dual notion of extropy for a continuous random variable. As pointed out in [Reference Lad17], for large n the extropy measure can be approximated by
where $\Delta x=(x_n-x_1)/(n-1)$ for any specific n. Thus, the measure of differential extropy for a continuous PDF can be well defined, via the limit of $J(\mathbf{p}_n)$ as n increases, in the following form:
$$J(X)=J(f)=-\frac{1}{2}\int_{\mathcal{S}}f^2(x)\,\textrm{d} x. \tag{1.2}$$
Another useful expression of it can be given in terms of the hazard rate and reversed hazard rate functions. For an absolutely continuous non-negative random variable X with survival function $\overline{F}(x)=1-F(x)$ , and hazard rate and reversed hazard rate functions $\lambda(x)=f(x)/\overline{F}(x)$ and $\tau(x)=f(x)/{F}(x)$ , respectively, the extropy can be expressed as
$$J(X)=-\frac{1}{4}\mathbb{E}_{12}[\lambda(X)]=-\frac{1}{4}\mathbb{E}_{22}[\tau(X)], \tag{1.3}$$
where $\mathbb{E}_{12}$ and $\mathbb{E}_{22}$ denote expectations with respect to the PDFs
$$f_{12}(x)=2f(x)\overline{F}(x),\quad x>0, \tag{1.4}$$
and
$$f_{22}(x)=2f(x)F(x),\quad x>0, \tag{1.5}$$
respectively. The densities in (1.4) and (1.5) are in fact the densities of minima and maxima of two independent and identically distributed (i.i.d.) random variables [Reference Arnold, Balakrishnan and Nagaraja1]. Several properties and statistical applications of the extropy in (1.2) were discussed in [Reference Lad17]. Moreover, [Reference Yang, Xia and Hu28] studied relations between extropy and variational distance, and determined the distribution that attains the minimum or maximum extropy among all distributions within a given variation distance from any given probability distribution. Qiu and Jia in [Reference Qiu and Jia22] explored the residual extropy properties of order statistics and record values, while [Reference Qiu and Jia21] proposed two estimators of extropy and used them to develop goodness-of-fit tests for the standard uniform distribution. In the present work, we carry out a detailed study of extropy and its various properties, including its dynamic versions.
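As a numerical illustration, the direct form $J(X)=-\frac{1}{2}\int f^2(x)\,\textrm{d} x$ and the hazard rate form $J(X)=-\frac{1}{4}\mathbb{E}_{12}[\lambda(X)]$ , with the expectation taken under the density $2f(x)\overline{F}(x)$ of the minimum of two i.i.d. copies, can be compared for an exponential distribution, for which both equal minus a quarter of the rate. This is a sketch under stated assumptions: the rate value, the truncation point, and the helper `integrate` are choices made here, not part of the original development.

```python
import math

def integrate(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

rate = 2.0    # illustrative exponential rate (assumption)
upper = 40.0  # truncation point; the neglected tail is negligible for this rate

def pdf(x):
    return rate * math.exp(-rate * x)

def sf(x):
    return math.exp(-rate * x)

def hazard(x):
    return pdf(x) / sf(x)

def pdf_min_of_two(x):
    """Density 2 f(x) S(x) of the minimum of two i.i.d. copies of X."""
    return 2 * pdf(x) * sf(x)

j_direct = -0.5 * integrate(lambda x: pdf(x) ** 2, 0.0, upper)
j_hazard = -0.25 * integrate(lambda x: hazard(x) * pdf_min_of_two(x), 0.0, upper)

print(j_direct, j_hazard, -rate / 4)  # all three should agree closely
```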
The rest of this paper is organized as follows. In Section 2 we describe some preliminary details on information divergence, and equilibrium and weighted distributions, and also mention some well-known distributional orders that are most pertinent for all the results developed here. In Section 3, some differences and similarities between entropy, extropy, and variance are pointed out, and they are then applied to a wide family of distributions. In Section 4, characterizations based on maximum extropy and minimum relative extropy criteria for probability models based on moment constraints are presented. In Section 5, monotonicity properties of the dynamic residual and past extropies are established. Finally, some concluding remarks are provided in Section 6.
2. Preliminaries
We briefly describe here information divergence, and equilibrium and weighted distributions, and then mention some well-known distributional orders that are essential for all the subsequent developments. Throughout, X and Y will denote non-negative random variables with absolutely continuous cumulative distribution functions (CDFs) F(x) and G(x), survival functions $\overline{F}(x)=1-F(x)$ and $\overline{G}(x)= 1-G(x)$ , and PDFs f(x) and g(x), respectively. When the considered random variables are not non-negative, it will be mentioned explicitly.
2.1. Information divergence
The Kullback–Leibler (KL) discrimination information between two densities f and g is defined as
$$d(f||g)=\int_{\mathcal{S}}f(x)\log\frac{f(x)}{g(x)}\,\textrm{d} x\geq 0, \tag{2.1}$$
provided the integral is finite, and it requires f to be absolutely continuous with respect to g. This condition is necessary, but not sufficient, for the finiteness of (2.1). The equality holds in (2.1) if and only if $f(x)=g(x)$ almost everywhere. The KL discrimination information between any distribution with a PDF f and the uniform PDF $f^{\star}$ on a common bounded support $\mathcal{S}$ is given by [Reference Ebrahimi, Soofi and Soyer8]
$$d(f||f^{\star})=H(f^{\star})-H(f), \tag{2.2}$$
where $H(f^{\star})=\log\|\mathcal{S}\|$ , with $\|\mathcal{S}\|$ denoting the size of the support $\mathcal{S}$ . We recall that X is smaller than Y in the entropy order (denoted by $X\leq_\textrm{e}Y$ ) if and only if $H(X)\leq H(Y)$ . From (2.2), for two distributions with PDFs f and g on a common bounded support $\mathcal{S}$ , $X\leq_\textrm{e}Y$ if and only if $d(f||f^{\star})\geq d(g||f^{\star})$ . The case of unbounded $\mathcal{S}$ can be interpreted similarly in terms of (2.2).
A natural problem of interest is to determine a distribution, within a class of probability distributions $\Omega=\{f\}$ , that minimizes $d(f||g)$ for a given g, referred to as the reference distribution. The classical minimum discrimination information (MDI) formulation is defined in terms of moment constraints: $\Omega=\Omega_{\theta}$ , defined by all distributions with $\mathbb{E}_f[T_j(X)]=\theta_j<\infty$ , $j=1,\ldots,J$ , where $\theta_j$ is a constant and $T_j(x)$ is a measurable statistic. For this problem, the MDI theorem [Reference Kullback16] gives the form of the MDI PDF $f^{\star}(x)\in\Omega_{\theta}$ , a formula for the MDI function $d(f||g)$ , and a formula for the recovery of moment constraint parameters. With a single moment constraint, for example, the MDI theorem concerning $\min_{f}d(f||g)$ subject to $\mathbb{E}_f[T(X)]=\theta$ , $\int_{0}^{\infty}f(x)\,\textrm{d} x=1$ , gives the solution as $f^{\star}(x)=g(x)C_{\lambda}\textrm{e}^{\tau T(x)}$ , where $\tau>0$ is the Lagrange multiplier and $C_{\lambda}$ is a normalizing constant. For further applications of the MDI model, see [Reference Asadi, Ebrahimi, Hamedani and Soofi4, Reference Asadi, Ebrahimi, Hamedani and Soofi5] and the references therein.
2.2. Equilibrium distribution
Recall that the limiting distribution of the excess time (or the forward recurrence time) in a renewal process (or in shock models) results in the so-called equilibrium distribution. Let $\{X_n\}_{n\in \mathbb{N}}$ be a sequence of independent non-negative random variables representing inter-arrival times between shocks. Further, suppose these random variables have an identical CDF F(t), with finite mean $\mu$ . Also, let $X_1$ have a possibly different CDF $F_1(t)$ , with finite mean $\mu_1=\mathbb{E}[X_1]$ . Both distribution functions $F_1(t)$ and F(t) are non-degenerate at $t=0$ , i.e. $F_1(0)= F(0)= 0$ . For $S_n=\sum_{i=1}^{n}X_i$ , $n\in \mathbb N$ , with $S_0\equiv0$ , let $N(t)=\max\{n\colon S_n\leq t\}$ represent the number of renewals during (0, t]. Let $\gamma(t)$ be the excess time in a stochastic process or residual lifetime at time t, i.e. $\gamma(t)=S_{N(t)+1}-t$ . From the elementary renewal theorem, the distribution of the equilibrium random variable $\widetilde{X}_\textrm{e}$ is known to be
and the corresponding PDF is $\widetilde{f}_\textrm{e}(x) = {\overline{F}(x)}/{\mu}$ , $x>0$ [Reference Nakagawa19]. The equilibrium distribution is the asymptotic distribution of the time since the last renewal at time t and the waiting time until the next renewal.
Weighted distributions have found many applications; see, for example, [Reference Nanda and Jain20] and the references therein. For a variable X with PDF f and a non-negative real function w, let
be the PDF of the associated weighted random variable $X^w$ , provided $\mathbb E[w(X)]$ is positive and finite. Note that the equilibrium random variable $\widetilde{X}_\textrm{e}$ is a weighted random variable obtained from X with $w(x)=1/\lambda(x)$ , where $\lambda$ is the failure rate function of X.
2.3. Stochastic orders
Aging notions and stochastic orders, as discussed in [Reference Shaked and Shanthikumar24], have found several important uses in many disciplines. We mention below some key aging concepts and stochastic orders, which are most pertinent for the developments here. Throughout, the terms ‘increasing’ and ‘decreasing’ are used in a non-strict sense.
Let X have the hazard rate and reversed hazard rate functions $\lambda_X(x)=f(x)/\overline{F}(x)$ and $\tau_X(x)=f(x)/F(x)$ , respectively. Similarly, let Y have the hazard rate and reversed hazard rate functions $\lambda_Y(x)=g(x)/\overline{G}(x)$ and $\tau_Y(x)=g(x)/G(x),$ respectively. Then, in the present work, we use the following notions: the decreasing reversed failure rate (DRFR) property; the increasing/decreasing failure rate (IFR/DFR) properties; the usual stochastic order (denoted by $X\leq_\textrm{st}Y$ ); hazard rate order (denoted by $X\leq_\textrm{hr}Y$ ); dispersive order (denoted by $X\leq_\textrm{d}Y$ ); convex order (denoted by $X\leq_\textrm{cx}Y$ ). For their formal definitions and properties, we refer the readers to [Reference Shaked and Shanthikumar24]. In Table 1, we present the implications of these orders in terms of random variables X and Y.
3. Results on extropy
Differential entropy is a measure of the disparity of the PDF f(x) from the uniform distribution. Indeed, it measures uncertainty in the sense of the utility of using f(x) in place of the ultimate uncertainty of the uniform distribution [Reference Good10]. Variance measures the average of distances of outcomes of a probability distribution from its mean. Because extropy is a complementary dual of entropy, it is also a measure of the disparity of the PDF f(x) from the uniform distribution. Even though entropy, extropy, and variance are all measures of dispersion and uncertainty, the lack of a simple relationship between orderings of a distribution by the three measures arises from some substantial and subtle differences. For example, the differential entropy of random variable X takes values in $[\!-\!\infty,\infty]$ , while extropy takes values in $[\!-\!\infty,0)$ . Moreover, $J(f)<H(f)$ due to the fact that $2x\log x<x^2$ for all $x>0$ .
In terms of mathematical properties, both entropy and extropy are non-negative in the discrete case. Moreover, in this case, $H(\mathbf{p})$ and $J(\mathbf{p})$ are invariant under one-to-one transformations of X. In the continuous case, neither the entropy nor the extropy is invariant under one-to-one transformations of X. Let $\phi(\!\cdot\!)\colon\mathbb{R}\mapsto\mathbb{R}$ be a one-to-one function and $Y=\phi(X)$ . It is known that $H(Y)=H(X)-\mathbb{E}[\!\log J_{\phi}(Y)]$ [Reference Ebrahimi, Soofi and Soyer8], where $J_{\phi}(Y) = |{\textrm{d}\phi^{-1}(Y)}/{\textrm{d} Y}|$ is the Jacobian of the transformation. As $f_Y(y)=f_X(\phi^{-1}(y))|{1}/({\phi'(\phi^{-1}(y))})|$ , we readily find that
However, there is no such direct relationship with the standard deviation. Furthermore, for any $a>0$ and $b\in \mathbb{R}$ ,
which means that they are all position-free but scale-dependent. The following theorem extends the impact of scale on the extropy of a random variable to more general transformations. This result is similar to [Reference Ebrahimi, Maasoumi and Soofi7, Theorem 1] for the differential entropy and variance, and we therefore do not present its proof. First, we recall that X is smaller than Y in the extropy order (denoted by $X\leq_\textrm{ex}Y$ ) if and only if $J(X)\leq J(Y)$ .
Theorem 3.1. Let X be a random variable with PDF f(x), and $Y=\phi(X)$ , where $\phi\colon (0,\infty) \to (0,\infty)$ is a function with a continuous derivative $\phi'(x)$ in the support of X such that $\mathbb{E}(Y^2) < \infty$ . If $|\phi'(x)|\geq 1$ for all x in the support of X, then $X\leq_\textrm{ex}Y$ .
It is known that the Shannon entropy of the sum of two independent random variables is larger than both their individual entropies. In a similar manner, the following theorem presents the corresponding result for extropy.
Theorem 3.2. If X and Y are two absolutely continuous independent random variables, then $J(X+Y) \geq \max\{J(X),J(Y)\}$ .
Proof. Let X and Y be two absolutely continuous independent random variables with CDFs F and G, and PDFs f and g, respectively. Then, using the convolution formula and setting $Z = X + Y$ , we immediately obtain
Now, applying Jensen’s inequality for the convex function $x^2$ to this result, we get $(\mathbb{E}_Y[f(z-Y)])^2 \leq \mathbb{E}_Y[f^2(z-Y)]$ , $z\in \mathbb{R}$ . Then, by integrating both sides of this inequality with respect to z from $-\infty$ to $\infty$ , we obtain
The proof is completed by using similar arguments for the random variable Y.
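A short worked check of Theorem 3.2 may help fix ideas. The choice of two i.i.d. standard exponential variables is an assumption made here for illustration; the computation uses only $J(f)=-\frac{1}{2}\int f^2(x)\,\textrm{d} x$ and the fact that the sum of two such variables has a gamma density.

```latex
% Illustrative check of Theorem 3.2 with X, Y i.i.d. standard exponential (assumed example)
\begin{align*}
J(X) = J(Y) &= -\tfrac{1}{2}\int_0^\infty \textrm{e}^{-2x}\,\textrm{d} x = -\tfrac{1}{4}, \\
f_{X+Y}(z) &= z\,\textrm{e}^{-z}, \qquad z>0, \\
J(X+Y) &= -\tfrac{1}{2}\int_0^\infty z^2\textrm{e}^{-2z}\,\textrm{d} z
        = -\tfrac{1}{2}\cdot\tfrac{1}{4} = -\tfrac{1}{8}
        \;\geq\; -\tfrac{1}{4} = \max\{J(X),J(Y)\}.
\end{align*}
```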
We recall that the two-dimensional version of the Shannon differential entropy in (1.1) is $H(X,Y)=-\mathbb{E}[\!\log f(X,Y)]$ . If X and Y are independent, then it is evident that $H(X,Y)=H(X)+H(Y)$ . However, extropy has a distinctly different property in this regard. Indeed, defining the two-dimensional version of the differential extropy in (1.2) as $J(X,Y) = -\frac{1}{2}\mathbb{E}[f(X,Y)]$ , if X and Y are independent, then $J(X,Y) = -2J(X)J(Y)$ .
In analogy to (2.1), the relative extropy in a density $f(\!\cdot\!)$ relative to $g(\!\cdot\!)$ is defined as [Reference Lad17]
$$d^\textrm{c}(f||g)=\frac{1}{2}\int_{\mathcal{S}}[f(x)-g(x)]^2\,\textrm{d} x\geq 0, \tag{3.1}$$
provided the integral is finite. The equality holds if and only if $f(x)=g(x)$ almost everywhere. The relative extropy can then be represented as
$$d^\textrm{c}(f||g)=2J(f,g)-J(f)-J(g),$$
where $J(f,g) = -\frac{1}{2}\mathbb{E}[f(Y)] = -\frac{1}{2}\mathbb{E}[g(X)]$ is the inaccuracy measure of f with respect to g or vice versa. As pointed out in [Reference Lad17], the relative extropy between any distribution with a PDF f and the uniform PDF $f^{\star}$ on a common bounded support $\mathcal{S}$ is given by $d^\textrm{c}(f||f^{\star}) = J(f^{\star}) - J(f) \geq 0$ , where $J(f^{\star}) = -{1}/({2\|\mathcal{S}\|})$ . So, by this result, for two distributions with PDFs f and g on a common bounded support $\mathcal{S}$ , we have $X\leq_\textrm{ex}Y$ if and only if $d^\textrm{c}(f||f^{\star}) \geq d^\textrm{c}(g||f^{\star})$ . The case of unbounded $\mathcal{S}$ can be interpreted in a similar manner. We now present some implications of the stochastic and convex orderings for two distributions by means of extropy.
Theorem 3.3. Let X and Y be two non-negative random variables with PDFs f(x) and g(x), respectively. If $X\leq_\textrm{st}Y$ and Y is DFR, then $X\leq_\textrm{ex}Y$ .
Proof. Let Y be DFR with $X\leq_\textrm{st}Y$ . Thus, we have
The first inequality in (3.2) is obtained by noting that $X\leq_\textrm{st}Y$ implies $\mathbb{E}[g(X)] \geq \mathbb{E}[g(Y)]$ , as g is a decreasing function because Y is DFR. The second inequality is obtained by using the Cauchy–Schwarz inequality. Making use of (1.2) and (3.2), we obtain the required result.
The following theorem gives implications of the convex order under some condition for the same ordering of the two models by extropy.
Theorem 3.4. Under the conditions of Theorem 3.3, if $X \leq_\textrm{cx} Y$ and g(x) is a concave function, then $X \leq_\textrm{ex}Y$ .
Proof. From the non-negativity of relative extropy in (3.1), we get
Let g(x) be a concave function, so $-g(x)$ is a convex function. By applying the definition of convex order, the assumption $X \leq_\textrm{cx} Y$ implies that
For convenience, we present in Table 2 the expressions for extropy, entropy, and standard deviation of some common distributions.
† $\gamma=0.5772\ldots$ is Euler's constant
* $\varphi_1(\alpha)=\psi(\alpha)-\psi(\alpha+\beta)$ , $\varphi_2(\beta)=\psi(\beta)-\psi(\alpha+\beta)$
3.1. Extropy of finite mixture distributions
The entropy of mixture distributions has been studied by many authors, including [Reference Hild, Pinto, Erdogmus and Principe12, Reference Rohde, Nichols, Bucholtz and Michalowicz23, Reference Tan, Tantum and Collins27]. Here, we derive a closed-form expression for the extropy of finite mixture distributions. Let $X_i$ , $i = 1,\ldots,n$ , be a collection of n absolutely continuous independent random variables. Further, let $f_i(\!\cdot\!)$ be the PDF of $X_i$ and $\mathbf{P}=(p_1,\ldots,p_n)$ be the mixing probabilities. Then, the PDF of a finite mixture random variable $X_p$ is given by
$$f_p(x)=\sum_{i=1}^{n}p_if_i(x), \tag{3.5}$$
where $\sum_{i=1}^{n}p_i=1$ , $p_i\geq 0$ . Using the algebraic identity
$$\Big(\sum_{i=1}^{n}a_i\Big)^2=\sum_{i=1}^{n}a_i^2+2\sum_{i<j}a_ia_j,$$
from (1.2) and (3.5) we readily get
$$J(f_p)=\sum_{i=1}^{n}p_i^2J(f_i)+2\sum_{i<j}p_ip_jJ(f_i,f_j), \tag{3.6}$$
where $J(f_i,f_j) = -\frac{1}{2}\mathbb{E}[f_i(X_j)]$ is the inaccuracy measure of $f_i$ with respect to $f_j$ . It is evident that the expression in (3.6) is easy to compute, but there is no such expression for the entropy. Thus, this seems to be one advantage of extropy over entropy. We now present the following example as an illustration of the above result.
Example 3.1. Let $f_m = 0.5[N(\mu,\sigma^2)+N(\!-\!\mu,\sigma^2)]$ denote a mixed Gaussian distribution. It is clear that this distribution is obtained by just splitting a Gaussian distribution $N(0,\sigma^2)$ into two parts, centering one half about $+\mu$ and the other half about $-\mu$ , and consequently has a mean of zero and variance $\sigma^2_{m} = \sigma^2+\mu^2$ . In this case, [Reference Michalowicz, Nichols and Bucholtz18] provided an analytical expression for signal entropy in situations when the corrupting noise source is mixed Gaussian, since a mixed Gaussian distribution is often considered as a noise model in many signal processing applications. From (3.6), $J(f_m) = 0.25\{J(f_1) + J(f_2)\} + 0.5J(f_1,f_2)$ , and in this case it can be shown that
Thus, we obtain
which can be seen as an increasing function of $\sigma^2$ . Figure 1 shows $J(f_m)$ as a function of $\sigma^2$ with respect to various values of $\mu$ . The plots show that $J(f_m)$ is increasing with respect to both $\sigma^2$ and $\mu$ .
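The mixed Gaussian case in Example 3.1 can be cross-checked numerically. The sketch below compares direct numerical integration of $-\frac{1}{2}\int f_m^2(x)\,\textrm{d} x$ with the decomposition $J(f_m) = 0.25\{J(f_1) + J(f_2)\} + 0.5J(f_1,f_2)$ . The component values used in the code, $J(f_1)=J(f_2)=-1/(4\sigma\sqrt{\pi})$ and $J(f_1,f_2)=J(f_1)\textrm{e}^{-\mu^2/\sigma^2}$ , are obtained here from standard Gaussian integrals and should be read as an assumption of this sketch; the function names and parameter values are likewise illustrative.

```python
import math

def integrate(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def mixture_extropy_numeric(mu, sd):
    """-(1/2) * integral of f_m^2 for f_m = 0.5 N(mu, sd^2) + 0.5 N(-mu, sd^2)."""
    def fm(x):
        return 0.5 * (normal_pdf(x, mu, sd) + normal_pdf(x, -mu, sd))
    half_width = abs(mu) + 12 * sd  # integration window; tails beyond it are negligible
    return -0.5 * integrate(lambda x: fm(x) ** 2, -half_width, half_width)

def mixture_extropy_decomposed(mu, sd):
    """0.25 * (J(f1) + J(f2)) + 0.5 * J(f1, f2), using Gaussian integrals (assumed forms)."""
    j_component = -1.0 / (4.0 * sd * math.sqrt(math.pi))
    j_cross = j_component * math.exp(-((mu / sd) ** 2))
    return 0.25 * (2 * j_component) + 0.5 * j_cross

for mu, sd in [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]:
    print(mu, sd, mixture_extropy_numeric(mu, sd), mixture_extropy_decomposed(mu, sd))
```

For these parameter values the extropy is larger for the larger standard deviation and for the larger $\mu$ , which agrees with the monotonicity described in the example.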
4. Characterizations based on maximum extropy
The maximum entropy (ME) principle is an extension of Laplace’s principle of insufficient reason for assigning probabilities. Both principles stipulate distributing the probability uniformly when the only available information is the support of the distribution $\mathcal{S}$ . When additional information is available, the ME principle stipulates distributing the probability close to the uniform distribution while preserving the relevant information. In a similar vein, the maximum extropy (MEX) can be regarded as an extension of Laplace’s principle of insufficient reason for assigning probabilities. Let us consider the moment class of distributions
$$\Omega_{\theta}=\{f\colon \mathbb{E}_f[T_j(X)]=\theta_j,\ j=0,1,\ldots,J\}, \tag{4.1}$$
where the $T_j(x)$ are integrable with respect to f and $T_0(x)=\theta_0=1$ is the normalizing factor. Then, the objective is to find $f^\star$ that maximizes J(f) subject to a set of moment constraints defined in (4.1).
Theorem 4.1. Let $\Omega_{\theta}$ be as defined in (4.1), with $T_j(x)$ , $j=1,\ldots,J$ , being integrable functions with respect to f and $T_0(x) = \theta_0 = 1$ being the normalizing factor. Then, MEX is attained by the distribution with PDF
$$f^{\star}(x)=\sum_{j=0}^{J}\lambda_jT_j(x),\quad x\in\mathcal{S}, \tag{4.2}$$
where $(\lambda_0,\lambda_1,\ldots,\lambda_J)$ are Lagrange multipliers such that $\lambda_0 = 0$ when $\mathcal{S}$ is unbounded.
Proof. The aim is to maximize $J(f) = -\frac{1}{2}\int_{\mathcal{S}} f^2(x) \,\textrm{d} x$ subject to the constraints $\int_{\mathcal{S}}T_j(x)f(x)\,\textrm{d} x = \theta_j$ , $j=0,1,\ldots,J$ , where $\mathcal{S}$ may be bounded or unbounded, the $T_j(x)$ are integrable with respect to f, and $T_0(x)=\theta_0=1$ is the normalizing factor. The requirement is then equivalent to maximizing
where $(\lambda_0,\lambda_1,\ldots,\lambda_J)$ are Lagrange multipliers such that $\lambda_0=0$ when $\mathcal{S}$ is unbounded. The Lagrangian is similar to the ME problem in terms of f, so taking derivatives gives the solution as in (4.2) [Reference Jaynes13]. Because the function $-\frac{1}{2}x^2$ , $x>0$ , is concave, the solution is unique.
Theorem 4.1 readily provides the following characterizations of some well-known distributions.
Corollary 4.1.
(i) The uniform distribution on [0, 1] is the MEX model in the class of distributions with no constraint.
(ii) A distribution with PDF $f(x)=2(2-3\theta)+6(2\theta-1)x$ , $0<x<1$ , is the MEX model in the class of distributions with finite expectation, $\mathbb{E}(X)=\theta$ .
(iii) The exponential distribution with PDF $f(x)=\lambda \textrm{e}^{-\lambda x}$ , $x\geq0$ , is the MEX model in the class of distributions with finite moment $\mathbb{E}(\textrm{e}^{-\lambda X})=1/2$ .
(iv) The Weibull distribution with PDF $f(x)=\alpha\lambda x^{\alpha-1} \textrm{e}^{-\lambda x^{\alpha}}$ , $x\geq0$ , $\alpha>0$ , is the MEX model in the class of distributions with finite moment $\mathbb{E}(\textrm{e}^{-\lambda X^{\alpha}})=1/2$ .
(v) A distribution with PDF $f(x)=\textrm{e}^{-x}(3-2x)$ , $x>0$ , is the MEX model in the class of distributions with finite moments $\mathbb{E}(\textrm{e}^{-X})=\mathbb{E}(X\textrm{e}^{-X})=1$ .
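To illustrate how such characterizations arise, part (ii) can be worked out by hand. The sketch below assumes that the MEX density is a linear combination of the constraint functions $T_0(x)=1$ and $T_1(x)=x$ , in line with the form $f^{\star}(x)=\lambda_1\overline{F}(x)$ used in the proof of Theorem 4.2, and then solves the two moment constraints for the multipliers.

```latex
% Worked derivation of Corollary 4.1(ii), assuming f* is linear in T_0(x) = 1 and T_1(x) = x
\begin{align*}
f^{\star}(x) &= \lambda_0+\lambda_1 x, \qquad 0<x<1, \\
\int_0^1 f^{\star}(x)\,\textrm{d} x &= \lambda_0+\tfrac{1}{2}\lambda_1 = 1, \\
\int_0^1 x f^{\star}(x)\,\textrm{d} x &= \tfrac{1}{2}\lambda_0+\tfrac{1}{3}\lambda_1 = \theta, \\
\Rightarrow\quad \lambda_1 &= 6(2\theta-1), \qquad \lambda_0 = 2(2-3\theta).
\end{align*}
```

The resulting linear function is non-negative on (0, 1) exactly when $1/3\leq\theta\leq 2/3$ , since $f^{\star}(0)=2(2-3\theta)$ and $f^{\star}(1)=2(3\theta-1)$ .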
Corollary 4.1 leads us to derive the equilibrium distribution model as a MEX model as follows.
Theorem 4.2. The solution to the constrained maximization problem
is the equilibrium distribution of the renewal time with PDF $f^{\star}(x)=\widetilde{f}_\textrm{e}(x)$ , $x>0$ .
Proof. For an unbounded random variable, Theorem 4.1 gives the solution to (4.3) as $f^{\star}(x) = \lambda_1\overline{F}(x)$ , where $\lambda_1^{-1} = \int_{0}^{\infty}\overline{F}(x)\,\textrm{d} x = \mu$ . Thus, $f^{\star}(x)$ is the PDF of the equilibrium distribution for the renewal time.
Let us assume that $\psi(x)$ is an increasing differentiable function with $\psi'(x)=\phi(x)\geq 0$ . Denote by $\widetilde{X}_{\phi}$ the weighted version of $\widetilde{X}_\textrm{e}$ with PDF
for $x>0$ , where $\mathbb{E}[\psi(X)] = \int_{\mathcal{S}}\phi(x)\overline{F}(x)\,\textrm{d} x$ , provided it exists. The following theorem then generalizes Theorem 4.2.
Theorem 4.3. The solution to the constrained maximization problem
is the distribution of the weighted random variable $\widetilde{X}_{\phi}$ with PDF $f^{\star}(x)=\widetilde{f}_{\phi}(x)$ , $x\in{\mathcal{S}}$ , in (4.4).
Proof. The proof is similar to that of Theorem 4.2 and is therefore omitted for the sake of brevity.
Note that when $\phi(x)\equiv1$ , Theorem 4.3 reduces to Theorem 4.2. Moreover, the PDF $f^{\star}(x)={2x\overline{F}(x)}/{\mathbb{E}[X^2]}$ , $x\in{\mathcal{S}}$ , is the MEX model in the class of distributions with $\phi(x)=x$ .
In an analogous manner, we can consider the minimum relative extropy (MREX). In this case, we seek the distribution in a class of probability distributions $\Omega=\{f\}$ that minimizes $d^\textrm{c}(f||g)$ for a given g, called the reference distribution, in terms of moment constraints. In this regard, we have the following theorem.
Theorem 4.4. Let $\Omega_{\theta}$ be as defined in (4.1), with $T_j(x)$ , $j=1,2,\ldots,J$ , being integrable functions with respect to f and $T_0(x)=\theta_0=1$ being the normalizing factor. Then, MREX is attained by the distribution with PDF
where $(\lambda_0,\lambda_1,\ldots,\lambda_J)$ are Lagrangian multipliers such that $\lambda_0=0$ when $\mathcal{S}$ is unbounded.
Proof. The aim is to minimize $d^\textrm{c}(f||g) = \frac{1}{2}\int_{\mathcal{S}}[f(x)-g(x)]^2\,\textrm{d} x$ subject to the constraints $\int_{\mathcal{S}}T_j(x)f(x)\,\textrm{d} x = \theta_j$ , $j=0,1,\ldots,J$ , where $\mathcal{S}$ may be bounded or unbounded, the $T_j(x)$ are integrable with respect to g, and $T_0(x) = \theta_0 = 1$ is the normalizing factor. The requirement is then equivalent to minimizing $\int_{\mathcal{S}}\psi(x)\,\textrm{d} x$ , where
such that $(\lambda_0,\lambda_1,\ldots,\lambda_J)$ are Lagrangian multipliers and $\lambda_0=0$ when $\mathcal{S}$ is unbounded. The Lagrangian is similar to the MEX problem in terms of f, so taking the derivatives gives the solution as in (4.5). Because the function $\frac{1}{2}x^2$ , $x>0$ , is convex, the solution is unique.
5. Results on residual and past extropy
Let X be a non-negative random variable representing the lifetime of a unit, and let $t\geq 0$ denote its current age. Our interest is then in the residual lifetime, with support $\mathcal{S}_t=\{x\colon x>t\}$ . At age t, the PDF of the residual lifetime, $X_t = [X-t \mid X \geq t]$ , is $f(x;\;t)=f(x)/\overline{F}(t)$ , $x\geq t>0$ . In this situation, the residual extropy is given by [Reference Qiu and Jia22]
$$J(X;\;t)=-\frac{1}{2\overline{F}^2(t)}\int_{t}^{\infty}f^2(x)\,\textrm{d} x=-\frac{1}{2}\mathbb{E}_t[f(X;\;t)], \tag{5.1}$$
where $\mathbb{E}_t$ is the expectation with respect to the residual density $f(x;\;t)$ . The residual extropy takes values in $[\!-\!\infty,0)$ and it identifies with the extropy of $[X\mid X>t]$ . Another useful expression can be given as
$$J(X;\;t)=-\frac{1}{4}\mathbb{E}_{12,t}[\lambda(X)], \tag{5.2}$$
where $\mathbb{E}_{12,t}$ is the expectation with respect to the residual density of $f_{12}(x)$ as defined in (1.4).
The question of whether $J(X;\;t)$ characterizes the lifetime distribution is answered partially in the following theorem.
Theorem 5.1. Let X be a non-negative random variable with CDF F which is differentiable and has a continuous PDF f over $\mathcal{S}_t$ . If f(x) is strictly decreasing over $\mathcal{S}_t$ , then $J(X;\;t)$ uniquely determines F.
Proof. As f(x) is strictly decreasing over $\mathcal{S}_t$ , we obtain
Now, let us consider two random variables X and Y and suppose that $J(X;\;t)=J(Y;\;t)$ , i.e.
Taking derivatives on both sides with respect to t we get
Now, suppose there exists a $t^\star\in\mathcal{S}_t$ such that $\lambda_Y(t^\star)\neq\lambda_X(t^\star)$ . Rearranging the terms and letting $\mathbb{E}_{t^\star}[f(X;\;t^\star)]=\mathbb{E}_{t^\star}[g(Y;\;t^\star)]$ , we obtain
or equivalently
Without loss of generality, let $\lambda_Y(t^\star)>\lambda_X(t^\star)$ . Then, using (5.4), we obtain $\mathbb{E}_{t^\star}[f_{t^\star}(X)]>\lambda_X(t^\star)$ , which is a contradiction to the condition in (5.3). For the case when $\lambda_Y(t^\star)<\lambda_X(t^\star)$ , the contradiction is obtained in terms of $\mathbb{E}_{t^\star}[g_{t^\star}(Y)]>\lambda_Y(t^\star)$ , which completes the proof of the theorem.
It is important to mention that the above theorem is applicable to a large class of distributions that include monotone densities [Reference Asadi, Ebrahimi, Hamedani and Soofi4]. The following theorem relates the dynamic extropy and hazard rate orderings.
Theorem 5.2. Let X and Y be two non-negative continuous random variables having CDFs F and G, PDFs f and g, and hazard rate functions $\lambda_X$ and $\lambda_Y$ , respectively. If $X\leq_\textrm{hr}Y$ and either X or Y is DFR, then $J(X;\;t)\leq J(Y;\;t)$ .
Proof. Let $X_t$ and $Y_t$ denote the residual lifetime variables with densities $f_t$ and $g_t$ , respectively. The condition $X\leq_\textrm{hr}Y$ implies that $X_{12,t}\leq Y_{12,t}$ in the usual stochastic order, where $X_{12}$ and $Y_{12}$ have survival functions $\overline{F}^2(x)$ and $\overline{G}^2(x)$ , respectively. If we assume that X is DFR, then $\mathbb{E}_{12,t}[\lambda_X(X)]\geq \mathbb{E}_{12,t}[\lambda_X(Y)]\geq\mathbb{E}_{12,t}[\lambda_Y(Y)]$ . From (5.2), we get $J(X;\;t)\leq J(Y;\;t)$ . If Y is DFR, then, using a similar argument, we again obtain $J(X;\;t)\leq J(Y;\;t)$ . Hence, the theorem.
Example 5.1. Let X be an absolutely continuous non-negative random variable with PDF f(x) and survival function $\overline{F}(x)$ . Further, let $0\equiv X_0 \le X_1 \le X_2 \le \cdots$ denote the epoch times of a non-homogeneous Poisson process (NHPP) with intensity function $\lambda(x)$ , $x\geq 0$ , where $X_1$ has the same distribution as X. Let $T_n=X_{n}-X_{n-1}$ , $n\in\mathbb{N}$ , denote the length of the nth inter-epoch interval (or inter-occurrence time). Denoting by $\overline{F}_{n}(x)$ the survival function of $X_{n}$ , $n\in\mathbb{N}$ , we have [Reference Arnold, Balakrishnan and Nagaraja2]
From [Reference Shaked and Shanthikumar24, Example 1.B.13], it is known that $T_n\leq_\textrm{hr}T_{n+1}$ . On the other hand, for all $n\geq1$ , if X is DFR then $T_n$ is DFR due to [Reference Gupta and Kirmani11, Theorem 5]. As a result, Theorem 5.2 implies that $J(T_n;\;t)\leq J(T_{n+1};t)$ .
We now propose two new classes of life distributions by combining the notions of extropy and aging.
Definition 5.1. Let X be an absolutely continuous random variable with PDF f. Then, we say that X has increasing/decreasing dynamic extropy (IDEX/DDEX) if $J(X;\;t)$ is increasing/decreasing.
Roughly speaking, if a unit has a CDF that belongs to the class of DDEX, then as the unit ages, the conditional probability density function becomes more informative. The following theorem gives a relationship between these classes and the well-known increasing (decreasing) failure rate classes of life distributions.
Theorem 5.3. For an absolutely continuous random variable X with PDF f, if X is IFR (DFR), then X is DDEX (IDEX).
Proof. We prove it for IFR; the DFR case can be handled in an analogous manner. Suppose X is IFR; then, from (5.2), for $t > 0$ we get
where $\lambda_{12}(t)$ is the hazard rate function of $X_{12}$ . From this, we see that $J(X;\;t)$ is decreasing in t, i.e. X is DDEX.
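Theorem 5.3 can be illustrated numerically with the residual extropy written as $J(X;\;t)=-\frac{1}{2}\int_t^{\infty}f^2(x)\,\textrm{d} x/\overline{F}^2(t)$ . The sketch below evaluates it for a Weibull distribution with shape 2 (an IFR model) and for an exponential distribution (constant failure rate). The distributions, the evaluation ages, the truncation length, and the helper names are all assumptions chosen for illustration.

```python
import math

def integrate(func, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def residual_extropy(pdf, sf, t, tail=30.0):
    """J(X; t) = -(1/2) * int_t^inf f(x)^2 dx / S(t)^2, truncated at t + tail."""
    return -0.5 * integrate(lambda x: pdf(x) ** 2, t, t + tail) / sf(t) ** 2

# Weibull with shape 2 and unit scale: increasing failure rate lambda(x) = 2x.
def weibull_pdf(x):
    return 2 * x * math.exp(-x * x)

def weibull_sf(x):
    return math.exp(-x * x)

# Exponential with an illustrative rate: constant failure rate.
rate = 1.5

def exp_pdf(x):
    return rate * math.exp(-rate * x)

def exp_sf(x):
    return math.exp(-rate * x)

ages = [0.0, 0.5, 1.0, 2.0]
weibull_values = [residual_extropy(weibull_pdf, weibull_sf, t) for t in ages]
exp_values = [residual_extropy(exp_pdf, exp_sf, t) for t in ages]

print(weibull_values)  # decreasing in t, as expected for an IFR model
print(exp_values)      # constant in t, equal to -rate / 4
```

The exponential case sits on the boundary between IFR and DFR, which is why its residual extropy does not change with age.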
Another important class of life distributions is the class of increasing failure rate in average (IFRA) distributions. Recall that X is IFRA if $H(x)/x$ is increasing in x, where $H(x)=-\log\overline{F}(x)$ denotes the cumulative hazard function. The following example shows that there is no relationship between the proposed class and the IFRA class of life distributions.
Example 5.2. Consider the random variable X with survival function
Figure 2 presents the residual extropy and the function $H(t)/t$ , from which we observe that X is not an IFRA distribution. Moreover, it is easy to verify that, in this case,
The plot of the residual extropy in Fig. 2 shows that the random variable X is DDEX. This example also shows that DDEX does not imply the IFR property.
The connection between the extropy residual life functions of two random variables and the proportional hazard model is explored in the following theorem.
Theorem 5.4. Let X and Y be two absolutely continuous non-negative random variables with survival functions $\overline{F}(t)$ and $\overline{G}(t)$ , and hazard rate functions $\lambda_X(t)$ and $\lambda_Y(t)$ , respectively. Further, let $\theta(t)$ be a non-negative increasing function such that $\lambda_Y(t)=\theta(t)\lambda_X(t)$ , $t>0$ , and $0\leq \theta(t)\leq 1$ . Then, if $J(X;\;t)$ is a decreasing function of t, $J(Y;\;t)$ is also decreasing in t, provided $\lim_{t\to\infty}({\overline{G}(t)}/{\overline{F}(t)}) < \infty$ .
Proof. From (5.2), $J(Y;\;t)$ is decreasing in t if and only if $\mathbb{E}_{12,t}[\lambda_Y(Y)]$ is increasing in t. Let us set $m_1(t)=\mathbb{E}_{12,t}[\lambda_X(X)]$ and $m_2(t)=\mathbb{E}_{12,t}[\lambda_Y(Y)]$ . Then, because $m'_{\!\!1}(t)=\lambda^X_{12}(t)[m_1(t)-\lambda_X(t)]$ and $m'_{\!\!2}(t)=\lambda^Y_{12}(t)[m_2(t)-\lambda_Y(t)]$ , $m_2(t)$ is increasing in t if $m_2(t)\geq\lambda_Y(t)=\theta(t)\lambda_X(t)$ , which holds if $m_2(t)\geq \theta(t)m_1(t)$ , $t>0$ . Define the function $\varphi(t)$ as
We now prove that $\varphi(t)\leq 0$ . Differentiating $\varphi(t)$ with respect to t and then performing some algebraic manipulations, we obtain
From the assumptions that $\theta(t)$ and $m_1(t)$ are increasing in t, we get $\varphi'(t) > 0$ , i.e. $\varphi(t)$ is increasing in t. Now, as $\lim_{t\to\infty}({\overline{G}(t)}/{\overline{F}(t)}) < \infty$ , we get
Hence, $\varphi(t)\leq0$ for any t, i.e. $m_1(t)\leq m_2(t)$ , which completes the proof of the theorem.
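The setting of Theorem 5.4 can be checked numerically on one concrete pair. The sketch below is only an illustration under assumed choices that do not appear in the text: $\lambda_X(t)=2t$ and $\theta(t)=1-\textrm{e}^{-t}$ , for which $\overline{G}(t)/\overline{F}(t)\to\textrm{e}^2<\infty$ . It again assumes the standard definition $J(X;\;t)=-\frac12\int_t^\infty [f(x)/\overline{F}(t)]^2\,\textrm{d}x$ , rewritten in terms of the hazard and cumulative hazard functions.

```python
import math

def theta(t):
    # Illustrative proportionality function: non-negative, increasing, bounded by 1.
    return 1.0 - math.exp(-t)

def hazard_x(t):
    return 2.0 * t

def cum_hazard_x(t):
    return t * t

def hazard_y(t):
    return theta(t) * hazard_x(t)

def cum_hazard_y(t):
    # int_0^t 2 s (1 - e^{-s}) ds = t^2 - 2 * (1 - (1 + t) e^{-t})
    return t * t - 2.0 * (1.0 - (1.0 + t) * math.exp(-t))

def survival_ratio(t):
    # G-bar(t) / F-bar(t) = exp(cum_hazard_x(t) - cum_hazard_y(t))
    return math.exp(cum_hazard_x(t) - cum_hazard_y(t))

def residual_extropy(hazard, cum_hazard, t, width=12.0, steps=20000):
    """Trapezoidal approximation of -1/2 * int_t^inf (f(x) / S(t))**2 dx,
    using f(x) / S(t) = hazard(x) * exp(-(cum_hazard(x) - cum_hazard(t))).
    Assumption: the tail beyond t + width is negligible."""
    h = width / steps
    base = cum_hazard(t)
    total = 0.0
    for i in range(steps + 1):
        x = t + i * h
        ratio = hazard(x) * math.exp(-(cum_hazard(x) - base))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * ratio * ratio
    return -0.5 * total * h

ages = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
jx_values = [residual_extropy(hazard_x, cum_hazard_x, t) for t in ages]
jy_values = [residual_extropy(hazard_y, cum_hazard_y, t) for t in ages]

for t, jx, jy in zip(ages, jx_values, jy_values):
    print(f"t={t:.1f}  J(X;t)={jx:.4f}  J(Y;t)={jy:.4f}")
print("survival ratio at t=50:", survival_ratio(50.0))
```

In this particular example $\lambda_Y$ is itself increasing, so the monotonicity of $J(Y;\;t)$ also follows from Theorem 5.3; the sketch is meant only to make the hypotheses of Theorem 5.4 concrete.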
Consider a parallel system with n units having lifetimes $X_1,\ldots,X_n$ , which are i.i.d. absolutely continuous random variables with CDF F(x). The corresponding system lifetime is $X_{n:n}=\max\{X_1,\ldots,X_n\}$ , whose CDF is $F_{n:n}(x)\;:\!=\;\mathbb{P}(X_{n:n}\leq x)=[F(x)]^n$ , $x\geq 0$ . Then, we have the following theorem, which gives the closure property of DDEX distributions under the formation of parallel systems. Its proof is similar to that of [Reference Asadi and Ebrahimi3, Theorem 2.3], so we do not present it here.
Theorem 5.5. Let $X_1,\ldots,X_n$ be a set of i.i.d. random variables with CDF F, PDF f, hazard rate function $\lambda$ , and decreasing residual extropy $J(X;\;t)$ . If $J(X_{n:n};\;t)$ denotes the residual extropy of the nth order statistic $X_{n:n}$ among $X_1,\ldots,X_n$ , then $J(X_{n:n};\;t)$ is also decreasing in t.
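A simple case in which Theorem 5.5 can be checked directly is that of uniform components on (0, 1), an illustrative choice not made in the text. Assuming the standard definition $J(X;\;t)=-\frac12\int_t^\infty [f(x)/\overline{F}(t)]^2\,\textrm{d}x$ , a direct integration with $F_{n:n}(x)=x^n$ gives $J(X_{n:n};\;t)=-n^2(1-t^{2n-1})/[2(2n-1)(1-t^n)^2]$ for $0<t<1$ . The sketch below compares this derived expression with a numerical integral and checks monotonicity; the function names are hypothetical.

```python
def parallel_residual_extropy_closed(n, t):
    """Closed form derived for uniform(0, 1) components (an assumption of this sketch)."""
    return -(n * n) * (1.0 - t ** (2 * n - 1)) / (2.0 * (2 * n - 1) * (1.0 - t ** n) ** 2)

def parallel_residual_extropy_numeric(n, t, steps=20000):
    """Trapezoidal approximation of -1/2 * int_t^1 (n x^(n-1) / (1 - t^n))**2 dx."""
    h = (1.0 - t) / steps
    norm = 1.0 - t ** n  # survival function of the maximum at age t
    total = 0.0
    for i in range(steps + 1):
        x = t + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (n * x ** (n - 1) / norm) ** 2
    return -0.5 * total * h

ages = [0.1, 0.3, 0.5, 0.7, 0.9]
system_sizes = [1, 2, 3, 5]
closed = {n: [parallel_residual_extropy_closed(n, t) for t in ages] for n in system_sizes}
numeric = {n: [parallel_residual_extropy_numeric(n, t) for t in ages] for n in system_sizes}

for n in system_sizes:
    print(n, [round(v, 4) for v in closed[n]])
```

For $n=1$ the expression reduces to $-1/[2(1-t)]$ , the residual extropy of a single uniform component, which is decreasing in t as the theorem requires of the components.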
Let X be a non-negative random variable representing the lifetime of a unit, and $t\geq 0$ denote its current age. It is then of interest to examine the inactivity time of the item with support $\mathcal{S}_{[t]}=\{x\colon x\le t\}$ . At age t, the PDF of the inactivity time, $X_{[t]} = [t-X\mid X \leq t]$ , is given by $f(x;\;[t])=f(x)/F(t)$ , $0<x\leq t$ . Then, the past extropy is defined as
where $\mathbb{E}_{[t]}$ is the expectation with respect to the inactivity density, $f(x;\;[t])$ . In analogy with (5.1), the past extropy also takes values in $[\!-\!\infty,0)$ and it coincides with the extropy of $[X\mid X\leq t]$ ; see, e.g., [Reference Krishnan, Sunoj and Unnikrishnan Nair15]. By using (1.3), another useful expression for it can be given as
where $\mathbb{E}_{22,t}$ is the expectation with respect to the inactivity density of $f_{22}(x)$ defined in (1.5). We now propose a new class of life distributions based on the notion of past extropy.
Definition 5.2. We say that X has increasing past extropy (IPEX) if $\widetilde{J}(X;\;[t])$ is increasing in $t>0$ .
The expression in (5.5) is useful in examining the behavior of past extropy in terms of the behavior of the reversed failure rate, as done in the following theorem.
Theorem 5.6. For an absolutely continuous non-negative random variable X with PDF f, if X is DRFR, then X is IPEX.
Proof. If X is DRFR, then $\tau(t)$ is decreasing in t, so
where $\tau_{22}(t)$ is the reversed hazard rate function of $X_{22}$ . Hence, the theorem.
The following example demonstrates the usefulness of Theorem 5.6 in recognizing some IPEX distributions.
Example 5.3.
(i) Let X be an exponential random variable with PDF $f(x)=\lambda\textrm{e}^{-\lambda x}$ for $x>0$ , $\lambda>0$ . The RFR function of X is $\tau(x)=f(x)/F(x)=\lambda[\textrm{e}^{\lambda x}-1]^{-1}$ . We can easily check that $\tau(x)$ is decreasing in x, and so, according to Theorem 5.6, X is IPEX.
(ii) Let X have an inverse Weibull distribution with CDF $F(x)=\exp[-(\sigma x)^{-\lambda}]$ , $x>0$ , $\sigma,\lambda>0$ . The RFR function is $\tau(x)=\lambda\sigma^{-\lambda}x^{-(1+\lambda)}$ , which is decreasing in x. Hence, X is IPEX.
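The two cases of Example 5.3 can also be examined numerically. The following minimal sketch assumes that the past extropy is $-\frac12\int_0^t [f(x)/F(t)]^2\,\textrm{d}x$ , i.e. the extropy of $[X\mid X\leq t]$ . For the exponential case, a direct integration under this assumption gives $-(\lambda/4)(1+\textrm{e}^{-\lambda t})/(1-\textrm{e}^{-\lambda t})$ , which is used as a cross-check. The parameter values and function names are illustrative.

```python
import math

RATE = 1.5               # illustrative exponential rate
SIGMA, SHAPE = 1.0, 2.0  # illustrative inverse Weibull parameters

def exp_pdf(x):
    return RATE * math.exp(-RATE * x)

def exp_cdf(x):
    return 1.0 - math.exp(-RATE * x)

def exp_rfr(x):
    # Reversed failure rate f(x) / F(x) = rate / (e^{rate * x} - 1)
    return RATE / math.expm1(RATE * x)

def inv_weibull_cdf(x):
    return math.exp(-((SIGMA * x) ** (-SHAPE))) if x > 0 else 0.0

def inv_weibull_rfr(x):
    return SHAPE * SIGMA ** (-SHAPE) * x ** (-(1.0 + SHAPE))

def inv_weibull_pdf(x):
    return inv_weibull_rfr(x) * inv_weibull_cdf(x) if x > 0 else 0.0

def past_extropy(pdf, cdf, t, steps=20000):
    """Trapezoidal approximation of -1/2 * int_0^t (pdf(x) / cdf(t))**2 dx."""
    h = t / steps
    norm = cdf(t)
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (pdf(i * h) / norm) ** 2
    return -0.5 * total * h

def exp_past_extropy_closed(t):
    # Closed form derived for the exponential case under the assumed definition.
    u = math.exp(-RATE * t)
    return -(RATE / 4.0) * (1.0 + u) / (1.0 - u)

ages = [0.5, 1.0, 2.0, 4.0]
exp_numeric = [past_extropy(exp_pdf, exp_cdf, t) for t in ages]
exp_closed = [exp_past_extropy_closed(t) for t in ages]
inv_weibull_values = [past_extropy(inv_weibull_pdf, inv_weibull_cdf, t) for t in ages]
exp_rfr_values = [exp_rfr(t) for t in ages]
inv_weibull_rfr_values = [inv_weibull_rfr(t) for t in ages]

for t, e, w in zip(ages, exp_numeric, inv_weibull_values):
    print(f"t={t:.1f}  exponential={e:.4f}  inverse Weibull={w:.4f}")
```

With these choices both reversed failure rates decrease and both past extropies increase over the chosen ages, as Theorem 5.6 predicts.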
Theorem 5.7. Let X and Y be two absolutely continuous non-negative random variables with reversed hazard rate functions $\tau_X(t)$ and $\tau_Y(t)$ , respectively. Further, let $\theta(t)$ be a non-negative increasing function such that $\tau_Y(t)=\theta(t)\tau_X(t)$ , $t>0$ , and $0\leq \theta(t)\leq 1$ . Then, if $\widetilde{J}(X;\;[t])$ is a decreasing function of t, $\widetilde{J}(Y;\;[t])$ is also decreasing in t, provided $\lim_{t\to0}({{G}(t)}/{{F}(t)}) < \infty$ .
Note that this theorem connects the past extropies of two random variables to the well-known proportional reversed hazard rate model.
6. Concluding remarks
Many information measures have been studied in the literature. For instance, entropy functions are used to measure the uncertainty in a random variable. If these entropy functions are applied to residual lifetime or past lifetime (or inactivity time) variables, then we obtain dynamic measures of uncertainty that can measure the aging process. We have provided several results on extropy, which is a complementary dual function of entropy. Some similarities between entropy, extropy, and variance have been discussed. In spite of some agreements between these measures, there are some notable differences as well. For example, many well-known families of distributions have been characterized as the unique maximum entropy and extropy solutions, while no such characterization is available in terms of variance. It should be noted that there is no universal relationship between entropy, extropy, and variance orderings of distributions. One advantage of extropy as compared to other measures is that it yields a closed-form expression for finite mixture distributions, while no such closed-form expression is available for the entropy and variance measures. We have shown that extropy information ranks uniformity of a wide variety of absolutely continuous distributions. We have then elaborated on some theoretical merits of extropy and presented several results about the associated characterizations and also its dynamic versions. The most important advantage of extropy is that it is easy to compute, and it will therefore be of great interest to explore its potential applications in developing goodness-of-fit tests and inferential methods.
Acknowledgements
A. Toomaj was partially supported by a grant from Gonbad Kavous University. We thank the Editor-in-Chief and anonymous reviewers for their useful comments and suggestions on an earlier version of this manuscript which led to this improved version.
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.