Estimating the statistical characteristics of random data. Analysis of the plausibility of distributions. Point statistical estimates

Suppose we need to study a quantitative feature of a general population. Suppose that theoretical considerations have established which distribution the feature follows. The task is then to estimate the parameters that determine this distribution. For example, if the studied feature is known to be normally distributed in the general population, it is necessary to estimate the mathematical expectation and the standard deviation, since these two parameters fully determine the normal distribution. If there is reason to believe that the feature has a Poisson distribution, it is necessary to estimate the parameter $\lambda$ that determines this distribution. Usually only sample data obtained from observations are available: $x_1, x_2, \ldots, x_n$. The estimated parameter is expressed through these data. Treating $x_1, x_2, \ldots, x_n$ as values of independent random variables $X_1, X_2, \ldots, X_n$, we can say that finding a statistical estimate of an unknown parameter of the theoretical distribution means finding a function of the observed random variables that gives an approximate value of the estimated parameter.

So, a statistical estimate of an unknown parameter of the theoretical distribution is a function of the observed random variables. A statistical estimate of an unknown parameter of the general population that is given by a single number is called a point estimate. The following point estimates are considered below: biased and unbiased, efficient and consistent.

In order for statistical estimates to give good approximations of the estimated parameters, they must satisfy certain requirements. We specify these requirements. Let $\theta^*$ be a statistical estimate of an unknown parameter $\theta$ of the theoretical distribution. Suppose that the estimate $\theta_1^*$ is found from a sample of size $n$. We repeat the experiment, that is, we draw another sample of the same size from the general population and use its data to find the estimate $\theta_2^*$, and so on. We obtain the numbers $\theta_1^*, \theta_2^*, \ldots, \theta_k^*$, which will differ from one another. Thus the estimate $\theta^*$ can be regarded as a random variable, and the numbers $\theta_1^*, \theta_2^*, \ldots, \theta_k^*$ as its possible values.

If the estimate $\theta^*$ gives an approximate value with an excess, then each number $\theta_i^*$ ($i = 1, 2, \ldots, k$) found from the sample data will be greater than the true value $\theta$. Consequently, the mathematical expectation (mean value) of the random variable $\theta^*$ will be greater than $\theta$, i.e. $M(\theta^*) > \theta$. If $\theta^*$ gives an approximate value with a deficiency, then $M(\theta^*) < \theta$.

Thus, using a statistical estimate whose mathematical expectation is not equal to the estimated parameter would lead to systematic errors. Therefore it is necessary to require that the mathematical expectation of the estimate $\theta^*$ be equal to the estimated parameter. Meeting the requirement $M(\theta^*) = \theta$ eliminates systematic errors.

An unbiased estimate is a statistical estimate whose mathematical expectation is equal to the estimated parameter, i.e. $M(\theta^*) = \theta$.

A biased estimate is a statistical estimate whose mathematical expectation is not equal to the estimated parameter.

However, it is a mistake to assume that an unbiased estimate always gives a good approximation of the estimated parameter. Indeed, the possible values of $\theta^*$ can be strongly scattered around its mean value, i.e. the variance $D(\theta^*)$ can be large. In this case the estimate found from one sample, for example $\theta_1^*$, may be very far from its mean value, and therefore from the estimated parameter $\theta$ itself. Taking $\theta_1^*$ as an approximate value of $\theta$, we would make a large error. If we require the variance of $\theta^*$ to be small, the possibility of a large error is practically excluded. Therefore a requirement of efficiency is imposed on a statistical estimate.

An efficient estimate is a statistical estimate that (for a given sample size $n$) has the smallest possible variance. When samples of large size are considered, a requirement of consistency is also imposed on statistical estimates.

A consistent estimate is a statistical estimate that converges in probability to the estimated parameter as $n \to \infty$. For example, if the variance of an unbiased estimate tends to zero as $n \to \infty$, then such an estimate is also consistent.

Let us consider which sample characteristics best estimate the general mean and the general variance in the sense of unbiasedness, efficiency and consistency.

Let a discrete general population be studied with respect to a quantitative feature $X$. The general mean $\bar{x}_G$ is the arithmetic mean of the values of the feature in the general population. It can be calculated by the formulas $\bar{x}_G = \frac{1}{N}\sum_{i=1}^{N} x_i$ or $\bar{x}_G = \frac{1}{N}\sum_{i=1}^{k} x_i N_i$, where $x_i$ are the values of the feature in the general population of size $N$, $N_i$ are the corresponding frequencies, and $\sum_{i=1}^{k} N_i = N$.

Suppose that a sample of size $n$ with feature values $x_1, x_2, \ldots, x_n$ has been drawn from the general population as a result of independent observations of the quantitative feature $X$. The sample mean $\bar{x}_S$ is the arithmetic mean of the sample. It can be calculated by the formulas $\bar{x}_S = \frac{1}{n}\sum_{i=1}^{n} x_i$ or $\bar{x}_S = \frac{1}{n}\sum_{i=1}^{k} x_i n_i$, where $x_i$ are the values of the feature in the sample of size $n$, $n_i$ are the corresponding frequencies, and $\sum_{i=1}^{k} n_i = n$.

If the general mean is unknown and must be estimated from the sample data, the sample mean is taken as the estimate of the general mean; it is an unbiased and consistent estimate. It follows that if sample means are found from several sufficiently large samples drawn from the same general population, they will be approximately equal to one another. This is the property of stability of sample means.

Note that if the variances of two populations are the same, then the closeness of the sample mean to the general mean does not depend on the ratio of the sample size to the size of the general population. It depends on the sample size: the larger the sample, the less the sample mean differs from the general mean.

To characterize the scatter of the values of a quantitative feature of the general population around its mean value, a summary characteristic is introduced: the general variance. The general variance $D_G$ is the arithmetic mean of the squared deviations of the values of the feature of the general population from their mean value $\bar{x}_G$. It is calculated by the formulas $D_G = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x}_G)^2$ or $D_G = \frac{1}{N}\sum_{i=1}^{k} N_i (x_i - \bar{x}_G)^2$.

To characterize the scatter of the observed values of a quantitative feature of the sample around its mean value, a summary characteristic is introduced: the sample variance. The sample variance $D_S$ is the arithmetic mean of the squared deviations of the observed values of the feature from their mean value $\bar{x}_S$. It is calculated by the formulas $D_S = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x}_S)^2$ or $D_S = \frac{1}{n}\sum_{i=1}^{k} n_i (x_i - \bar{x}_S)^2$.

In addition to the variance, another summary characteristic is used to describe the scatter of the values of the feature of the general population (or sample) around its mean value: the standard deviation. The general standard deviation is the square root of the general variance: $\sigma_G = \sqrt{D_G}$. The sample standard deviation is the square root of the sample variance: $\sigma_S = \sqrt{D_S}$.
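The frequency-weighted formulas for the mean, the variance and the standard deviation can be illustrated with a short sketch. The function name `grouped_stats` and the frequency table are hypothetical, chosen only for illustration; the same function applies to a sample or to a whole population, depending on which values and frequencies are passed in.

```python
import math

def grouped_stats(values, freqs):
    """Return size, mean, (uncorrected) variance and standard deviation
    for feature values given together with their frequencies."""
    n = sum(freqs)                                             # total number of observations
    mean = sum(x * f for x, f in zip(values, freqs)) / n       # frequency-weighted mean
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return n, mean, var, math.sqrt(var)

# Hypothetical frequency table: value 1 observed twice, value 2 three times, etc.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

n, mean, var, std = grouped_stats(values, freqs)
print(n, mean, var, std)
```

Note that the variance here divides by the number of observations, as in the definitions above, so it is the uncorrected variance.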

Suppose that a sample of size $n$ has been drawn from the general population as a result of independent observations of a quantitative feature. It is required to estimate the unknown general variance from the sample data. If the sample variance is taken as the estimate of the general variance, this estimate will lead to systematic errors, giving an understated value of the general variance. The reason is that the sample variance is a biased estimate; in other words, the mathematical expectation of the sample variance is not equal to the estimated general variance but is equal to $M(D_S) = \frac{n-1}{n} D_G$.

It is easy to correct the sample variance so that its mathematical expectation equals the general variance. It is enough to multiply $D_S$ by the fraction $\frac{n}{n-1}$. As a result we obtain the corrected variance, which is usually denoted $s^2$. The corrected variance is an unbiased estimate of the general variance: $s^2 = \frac{n}{n-1} D_S = \frac{1}{n-1}\sum_{i=1}^{k} n_i (x_i - \bar{x}_S)^2$.
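The bias of the sample variance can be checked exactly on a toy case. The sketch below assumes a tiny hypothetical population and samples drawn independently with replacement, so that every ordered sample is equally likely; it enumerates all such samples and averages the sample variance over them. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return Fraction(sum(values), len(values))

def uncorrected_variance(values):
    """Mean squared deviation from the mean (division by the number of values)."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

def expected_sample_variance(population, n):
    """Exact expectation of the uncorrected sample variance over all
    equally likely ordered samples of size n drawn with replacement."""
    samples = list(product(population, repeat=n))
    return sum(uncorrected_variance(s) for s in samples) / len(samples)

population = [1, 2, 3, 4]   # hypothetical feature values, each equally likely
n = 3                       # hypothetical sample size

general_var = uncorrected_variance(population)
expected_var = expected_sample_variance(population, n)
expected_corrected = expected_var * Fraction(n, n - 1)

print(general_var, expected_var, expected_corrected)
```

With exact fractions, the expectation of the uncorrected sample variance comes out as $(n-1)/n$ times the general variance, and multiplying by $n/(n-1)$ restores the general variance.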

2. Interval estimates.

Along with point estimation, the statistical theory of parameter estimation deals with interval estimation. The problem of interval estimation can be formulated as follows: from the sample data, construct a numerical interval about which it can be asserted, with a probability chosen in advance, that the estimated parameter lies within it. Interval estimation is especially necessary when the number of observations is small, because the point estimate is then largely random and therefore unreliable.

A confidence interval for a parameter $\theta$ is an interval $(\theta^* - \delta,\ \theta^* + \delta)$ about which it can be asserted, with a probability $\gamma$ chosen in advance and close to one, that it contains the unknown value of the parameter, i.e. $P(\theta^* - \delta < \theta < \theta^* + \delta) = \gamma$. The smaller the number $\delta$ for the chosen probability $\gamma$, the more accurate the estimate of the unknown parameter. Conversely, if $\delta$ is large, the estimate made with this interval is of little practical use. Since the ends of the confidence interval depend on the elements of the sample, their values can change from sample to sample. The probability $\gamma$ is called the confidence probability (reliability). Usually the reliability of the estimate is specified in advance, and a number close to one is taken as $\gamma$. The choice of the confidence probability is not a mathematical problem; it is determined by the specific problem being solved. The reliability is most often set to 0.95, 0.99 or 0.999.

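We give, without derivation, the confidence interval for the general mean $a$ when the standard deviation $\sigma$ is known, provided that the random variable (quantitative feature $X$) is normally distributed:

$P\left(\bar{x}_S - \frac{t\sigma}{\sqrt{n}} < a < \bar{x}_S + \frac{t\sigma}{\sqrt{n}}\right) = 2\Phi(t) = \gamma$,

where $\gamma$ is a specified number close to one, $\bar{x}_S$ is the sample mean, $n$ is the sample size, and the values of the Laplace function $\Phi(t)$ are given in Appendix 2.

The meaning of this relation is as follows: with reliability $\gamma$ it can be asserted that the confidence interval $\left(\bar{x}_S - \frac{t\sigma}{\sqrt{n}},\ \bar{x}_S + \frac{t\sigma}{\sqrt{n}}\right)$ covers the unknown parameter $a$; the accuracy of the estimate is $\delta = \frac{t\sigma}{\sqrt{n}}$. The number $t$ is determined from the equality $2\Phi(t) = \gamma$, or $\Phi(t) = \gamma/2$. From the table (Appendix 2) one finds the argument $t$ for which the Laplace function takes the value $\gamma/2$.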

Example 1. A random variable $X$ has a normal distribution with a known standard deviation $\sigma$. Find the confidence intervals for estimating the unknown general mean $a$ from the sample means $\bar{x}_S$, if the sample size $n$ and the reliability of the estimate $\gamma$ are specified.

Solution. We find $t$. From the relation $2\Phi(t) = \gamma$ we get $\Phi(t) = \gamma/2$. From the table (Appendix 2) we find $t$. We find the accuracy of the estimate: $\delta = \frac{t\sigma}{\sqrt{n}}$. The confidence intervals are $(\bar{x}_S - \delta,\ \bar{x}_S + \delta)$. For a particular value of the sample mean $\bar{x}_S$, the confidence interval has the confidence limits $\bar{x}_S - \delta$ and $\bar{x}_S + \delta$. Thus the values of the unknown parameter $a$ that are consistent with the sample data satisfy the inequality $\bar{x}_S - \delta < a < \bar{x}_S + \delta$.
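The calculation for a known standard deviation can be sketched in code instead of using a table. The numbers below (sample mean 4.1, standard deviation 3, sample size 36, reliability 0.95) are hypothetical and are not the values of Example 1. The sketch also assumes that the Laplace function is the integral of the standard normal density from 0 to $t$, so that the usual cumulative distribution function at $t$ equals $(1 + \gamma)/2$.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(sample_mean, sigma, n, gamma):
    """Confidence interval for the mean of a normal population
    when the standard deviation sigma is known."""
    # Laplace function value gamma/2 corresponds to a standard normal CDF of (1 + gamma) / 2.
    t = NormalDist().inv_cdf((1 + gamma) / 2)
    delta = t * sigma / sqrt(n)          # accuracy of the estimate
    return sample_mean - delta, sample_mean + delta, t, delta

# Hypothetical inputs, for illustration only.
low, high, t, delta = mean_ci_known_sigma(sample_mean=4.1, sigma=3.0, n=36, gamma=0.95)
print(round(t, 2), round(delta, 2), round(low, 2), round(high, 2))
```

For these inputs $t$ is about 1.96 and the accuracy is about 0.98, so the interval is roughly (3.12, 5.08).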

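The confidence interval for the general mean $a$ of a normally distributed feature when the standard deviation is unknown is given by the expression $P\left(\bar{x}_S - \frac{t_\gamma s}{\sqrt{n}} < a < \bar{x}_S + \frac{t_\gamma s}{\sqrt{n}}\right) = \gamma$, where $s$ is the corrected standard deviation and $n$ is the sample size.

It follows that with reliability $\gamma$ it can be asserted that the confidence interval $\left(\bar{x}_S - \frac{t_\gamma s}{\sqrt{n}},\ \bar{x}_S + \frac{t_\gamma s}{\sqrt{n}}\right)$ covers the unknown parameter $a$.

There are ready-made tables (Appendix 4) from which, given $n$ and $t_\gamma$, one can find the probability $\gamma$ and, conversely, given $n$ and $\gamma$, one can find $t_\gamma$.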

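Example 2. A quantitative feature $X$ of the general population is normally distributed. From a sample of size $n$, the sample mean $\bar{x}_S$ and the corrected standard deviation $s$ were found. Estimate the unknown general mean by a confidence interval with reliability $\gamma$.

Solution. We find $t_\gamma$: using the table (Appendix 4) with the given $\gamma$ and $n$, we find $t_\gamma$. We find the confidence limits: $\bar{x}_S - \frac{t_\gamma s}{\sqrt{n}}$ and $\bar{x}_S + \frac{t_\gamma s}{\sqrt{n}}$.

So, with reliability $\gamma$, the unknown parameter $a$ lies in the confidence interval $\left(\bar{x}_S - \frac{t_\gamma s}{\sqrt{n}},\ \bar{x}_S + \frac{t_\gamma s}{\sqrt{n}}\right)$.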
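When the standard deviation is unknown, the same steps can be sketched with the Student coefficient supplied from a table. The inputs below are hypothetical and are not the values of Example 2: sample size 16, sample mean 20.2, corrected standard deviation 0.8, and a coefficient of 2.13, which is the usual tabulated value for reliability 0.95 and 15 degrees of freedom. The function name is illustrative.

```python
from math import sqrt

def mean_ci_unknown_sigma(sample_mean, s, n, t_gamma):
    """Confidence interval for the mean of a normal population when only
    the corrected standard deviation s is known.
    t_gamma is the Student coefficient read from a table for the
    chosen reliability and sample size (it is not computed here)."""
    delta = t_gamma * s / sqrt(n)
    return sample_mean - delta, sample_mean + delta

# Hypothetical inputs; t_gamma = 2.13 is a table value (reliability 0.95, n = 16).
low, high = mean_ci_unknown_sigma(sample_mean=20.2, s=0.8, n=16, t_gamma=2.13)
print(round(low, 3), round(high, 3))
```

Passing the coefficient as a parameter mirrors the table lookup described in the text and keeps the sketch free of third-party libraries.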

3. The concept of a statistical hypothesis. General formulation of the hypothesis testing problem.

Testing statistical hypotheses is closely related to the theory of parameter estimation. In natural science, engineering and economics one often resorts to stating hypotheses that can be tested statistically, that is, on the basis of observations in a random sample. Statistical hypotheses are hypotheses that concern either the form or individual parameters of the distribution of a random variable. For example, the hypothesis that the labor productivity of workers who perform the same work under the same conditions is normally distributed is statistical. The hypothesis that the mean dimensions of parts produced on machines of the same type working in parallel do not differ is also statistical.

A statistical hypothesis is called simple if it uniquely determines the distribution of the random variable; otherwise the hypothesis is called composite. For example, a simple hypothesis is the assumption that the random variable is normally distributed with mathematical expectation zero and variance one. If it is assumed that the random variable is normally distributed with variance one and a mathematical expectation that is some number from a given segment, this is a composite hypothesis. Another example of a composite hypothesis is the assumption that a continuous random variable takes a value from a given interval with a given probability; in this case the distribution of the random variable can be any of the class of continuous distributions.

Often the form of the distribution is known, and the sample of observations must be used to test assumptions about the values of the parameters of this distribution. Such hypotheses are called parametric.

The hypothesis being tested is called the null hypothesis and is denoted $H_0$. Along with the hypothesis $H_0$ we consider one of the alternative (competing) hypotheses $H_1$. For example, if the hypothesis that the parameter $\theta$ equals some specified value $\theta_0$ is tested, i.e. $H_0\colon \theta = \theta_0$, then one of the following hypotheses can be taken as the alternative: $H_1\colon \theta > \theta_0$; $H_1\colon \theta < \theta_0$; $H_1\colon \theta \ne \theta_0$; $H_1\colon \theta = \theta_1$, where $\theta_1$ is a specified value, $\theta_1 \ne \theta_0$. The choice of the alternative hypothesis is determined by the specific formulation of the problem.

The rule by which the decision to accept or reject the hypothesis $H_0$ is made is called a criterion (test). Since the decision is based on a sample of observations of the random variable, a suitable statistic must be chosen, called in this case the test statistic $Z$. When testing a simple parametric hypothesis $H_0\colon \theta = \theta_0$, the statistic chosen as the test statistic is the same one that is used to estimate the parameter $\theta$.

Testing a statistical hypothesis is based on the principle that unlikely events are regarded as impossible and events with a high probability are regarded as certain. This principle can be implemented as follows. Before the sample is analyzed, a small probability $\alpha$, called the significance level, is fixed. Let $V$ be the set of values of the statistic $Z$, and let $V_k \subset V$ be a subset such that, if the hypothesis $H_0$ is true, the probability that the test statistic falls in $V_k$ equals $\alpha$, i.e. $P(Z \in V_k \mid H_0) = \alpha$.

Denote by $z_s$ the sample value of the statistic $Z$ computed from the sample of observations. The criterion is formulated as follows: reject the hypothesis $H_0$ if $z_s \in V_k$; accept the hypothesis $H_0$ if $z_s \in V \setminus V_k$. A criterion based on a significance level chosen in advance is called a significance test. The set $V_k$ of all values of the test statistic for which the decision is to reject the hypothesis $H_0$ is called the critical region; the region $V \setminus V_k$ is called the acceptance region of the hypothesis $H_0$.

The significance level determines the size of the critical region. The position of the critical region on the set of values of the statistic depends on the formulation of the alternative hypothesis. For example, if the hypothesis $H_0\colon \theta = \theta_0$ is tested and the alternative hypothesis is formulated as $H_1\colon \theta > \theta_0$ ($H_1\colon \theta < \theta_0$), the critical region is placed on the right (left) "tail" of the distribution of the statistic $Z$, i.e. it has the form of the inequality $Z > z_{1-\alpha}$ ($Z < z_\alpha$), where $z_{1-\alpha}$ and $z_\alpha$ are the quantiles of the statistic $Z$ of levels $1-\alpha$ and $\alpha$ respectively, computed under the condition that the hypothesis $H_0$ is true. In this case the criterion is called one-sided: right-sided and left-sided respectively. If the alternative hypothesis is formulated as $H_1\colon \theta \ne \theta_0$, the critical region is placed on both "tails" of the distribution of $Z$, i.e. it is defined by the pair of inequalities $Z < z_{\alpha/2}$ and $Z > z_{1-\alpha/2}$; in this case the criterion is called two-sided.

Fig. 30 shows the location of the critical region for various alternative hypotheses. Here $f_Z(z \mid H_0)$ is the density of the distribution of the test statistic under the condition that the hypothesis $H_0$ is true, $V \setminus V_k$ is the acceptance region of the hypothesis, and $P(Z \in V \setminus V_k) = 1 - \alpha$.

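Thus, testing a parametric statistical hypothesis with a significance test can be divided into the following steps:

1) formulate the hypothesis to be tested ($H_0$) and the alternative hypothesis ($H_1$);

2) fix the significance level $\alpha$;

3) choose the test statistic $Z$ for testing the hypothesis $H_0$;

4) determine the sampling distribution of the statistic $Z$ under the condition that the hypothesis $H_0$ is true;

5) depending on the formulation of the alternative hypothesis, define the critical region $V_k$ by one of the inequalities $Z > z_{1-\alpha}$ or $Z < z_\alpha$, or by the pair of inequalities $Z < z_{\alpha/2}$ and $Z > z_{1-\alpha/2}$;

6) obtain a sample of observations and compute the sample value $z_s$ of the test statistic;

7) make the statistical decision: if $z_s \in V_k$, reject the hypothesis $H_0$ as inconsistent with the results of observations; if $z_s \in V \setminus V_k$, accept the hypothesis $H_0$, i.e. consider that the hypothesis does not contradict the results of observations.

Usually, in steps 4 to 7, statistics whose quantiles are tabulated are used: statistics with a normal distribution, Student's statistic, Fisher's statistic.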
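The steps of a significance test can be sketched for one simple case: a hypothesis about the mean of a normal population with a known standard deviation, using a standardized statistic with a standard normal distribution. The function name `z_test`, the labels of the alternatives and the numbers in the usage lines are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, m0, sigma, n, alpha, alternative):
    """Significance test of H0: mean = m0 for a normal population with known sigma.
    alternative is 'less', 'greater' or 'two-sided'.
    Returns the sample value of the statistic and the decision to reject H0."""
    z = (sample_mean - m0) / (sigma / sqrt(n))   # standard normal if H0 is true
    std_normal = NormalDist()
    if alternative == "less":                    # critical region on the left tail
        reject = z < std_normal.inv_cdf(alpha)
    elif alternative == "greater":               # critical region on the right tail
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":             # critical region on both tails
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, reject

# Hypothetical inputs: H0 mean 10, sample mean 9.3, sigma 2, n = 25, alpha = 0.05.
z, reject = z_test(9.3, 10.0, 2.0, 25, 0.05, "less")
z2, reject2 = z_test(9.3, 10.0, 2.0, 25, 0.05, "two-sided")
print(z, reject, reject2)
```

With these inputs the left-sided test rejects the hypothesis while the two-sided test does not, which shows how the decision depends on the formulation of the alternative.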

Example 3. According to the data sheet of an automobile engine, fuel consumption per 100 km is 10 L. As a result of a change in the engine design, fuel consumption is expected to decrease. To check this, 25 randomly selected cars with the modernized engine are tested, and the sample mean fuel consumption per 100 km in the tests is 9.3 L. Suppose that the sample of fuel consumptions is drawn from a normally distributed general population with mean $m$ and variance $\sigma^2$. Suppose that the critical region for the test statistic $\bar{X}$ (the sample mean) is chosen so that, if the hypothesis $H_0\colon m = 10$ is true, the probability of falling into it equals the significance level $\alpha$. Find the probabilities of errors of the first and second kind for the criterion with this critical region, taking $H_1\colon m = 9$ as the alternative. An error of the first kind is the rejection of a true hypothesis $H_0$; its probability is $\alpha$. An error of the second kind is the acceptance of $H_0$ when $H_1$ is true. If $H_1$ is true, the statistic $\bar{X}$ has a normal distribution with mathematical expectation 9 and variance $\sigma^2/25$. The probability $\beta$ of an error of the second kind is found by formula (11.2), i.e. as the probability that $\bar{X}$ falls in the acceptance region when $H_1$ is true.

Consequently, under the adopted criterion, 13.6% of cars with a fuel consumption of 9 L per 100 km are classified as cars with a fuel consumption of 10 L.
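The error probabilities in this example can be sketched numerically. The variance and the boundary of the critical region did not survive in the text, so the sketch assumes a standard deviation of 2 L and a critical region "sample mean below 9.44 L"; both values are assumptions, chosen because they reproduce the 13.6% figure quoted above.

```python
from math import sqrt
from statistics import NormalDist

n = 25            # number of tested cars (from the example)
sigma = 2.0       # ASSUMPTION: standard deviation of fuel consumption, in liters
critical = 9.44   # ASSUMPTION: reject H0 when the sample mean is below this value

se = sigma / sqrt(n)                          # standard deviation of the sample mean
under_h0 = NormalDist(mu=10.0, sigma=se)      # distribution of the sample mean if m = 10
under_h1 = NormalDist(mu=9.0, sigma=se)       # distribution of the sample mean if m = 9

alpha = under_h0.cdf(critical)                # error of the first kind: reject a true H0
beta = 1 - under_h1.cdf(critical)             # error of the second kind: accept H0 when H1 is true

print(round(alpha, 4), round(beta, 4))
```

Under these assumptions the probability of an error of the first kind is about 0.081 and the probability of an error of the second kind is about 0.136. The observed sample mean of 9.3 L falls below the assumed boundary, so the hypothesis of unchanged consumption would be rejected.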

4. Theoretical and empirical frequencies. Goodness-of-fit criteria.

Empirical frequencies are the frequencies obtained from experience (observation). Theoretical frequencies are calculated by formulas. For the normal distribution law they can be found as follows:

, (11.3)

Questions of statistical estimation touch on such aspects of mathematical statistics as scientific methodology, random variables, statistical distributions and so on. Any sample is subject to errors caused by incomplete coverage of units, measurement errors and similar causes. In real life such errors give every hypothesis (in particular, one formulated on the basis of economic conclusions) a random, stochastic character. Regardless of the number of variables provided for by theoretical hypotheses, it is assumed that the influence of the various kinds of errors can be described accurately enough by a single component. This methodological approach makes it possible to restrict attention to a one-dimensional probability distribution while estimating several parameters.

Statistical estimation is one of the two types of statistical inference (the second is hypothesis testing). It is a particular method of judging the numerical values of the characteristics (parameters) of the distribution of the general population from sample data drawn from that population. That is, having the results of a sample observation, we try to estimate (as accurately as possible) the values of the parameters on which the distribution of the feature (variable) of interest in the general population depends. Since the sample includes only part of the units of the general population (sometimes a very small number of them), there is a risk of error. Although this risk decreases as the number of observed units increases, it is always present in sample observation. Hence a decision made from sample results is probabilistic in nature. But it would be incorrect to consider statistical inference only from the standpoint of probability. This approach is not always sufficient for constructing correct theoretical assumptions about the parameters of the general population. Often a number of additional judgments are needed to provide a deeper justification. For example, suppose we need to estimate, as closely as possible, the average number of skilled workers at the enterprises of a region. In this case the arithmetic mean of the variable $x$ of a normally distributed general population is estimated. Having obtained a sample of $n$ units for this feature, we must decide which quantity computed from the sample data should be taken as the closest to the mean of the general population. Several quantities whose mathematical expectation equals the desired parameter (or is close to it) can be named: a) the arithmetic mean; b) the mode; c) the median; d) the mean calculated from the range of variation, and so on.

From a probabilistic point of view, each of the quantities listed above can be regarded as giving the best approximation to the desired parameter of the general population ($\bar{x}$), since the mathematical expectation of each of these functions (especially for large samples) equals the general mean. This assumption rests on the fact that, when samples are drawn repeatedly from the same general population, the result is correct "on average".

The correctness "on average" is explained by the equal frequency of positive and negative deviations of the estimates from the general mean, that is, the average estimation error is zero.

In practice, as a rule, only one sample is taken, so the researcher is interested in the most accurate estimate of the desired parameter from the results of that particular sample. To solve this problem, in addition to the conclusions that follow directly from the abstract calculus of probability, additional rules are needed to justify the best approximation of the estimate to the desired parameter of the general population.

There are many ways to estimate constants from sample observations. Which of them is best for a specific research problem is the subject of the theory of statistical estimation. It studies the conditions that a given estimate must satisfy and points to the estimates that are preferable under given circumstances. The theory of estimation indicates when one estimate is superior to another.

As is known, information obtained from a sample is not categorical. If, for example, 99 of 100 animals have been examined for a disease and found healthy, there is still a possibility that the one animal left unexamined carries the virus of the suspected disease. Since this is unlikely, the conclusion is that the disease is absent. In most cases this conclusion is fully justified.

Guided by such conclusions in practice, the experimenter (researcher) relies not on the certainty of the information but only on its probability.

The other side of sample observation, as already noted, is the task of determining as objectively as possible the degree of reliability of the sample estimates obtained. One tries to give the solution of this task as precise a probabilistic expression as possible, that is, the aim is to determine the degree of accuracy of the estimate. Here the researcher determines the limits of the possible discrepancy between the estimate obtained from the sample and the actual value of the quantity in the general population.

The accuracy of the estimate is determined by the method by which it is calculated from the sample data and by the method of selecting units into the sample.

The method of obtaining estimates involves some computational procedure (method, rule, algebraic formula). This is the primary concern of the theory of statistical estimation. The methods of selection are a matter of the technique of conducting a sample study.

The above allows us to define the concept of a "statistical estimate".

A statistical estimate is an approximate value of the desired parameter of the general population that is obtained from the sample results and makes it possible to take informed decisions about the unknown parameters of the general population.

Suppose that $\theta^*$ is a statistical estimate of the unknown parameter $\theta$ of the theoretical distribution. From repeated samples of the same size drawn from the general population, the estimates $\theta_1^*, \theta_2^*, \ldots, \theta_n^*$ were found, which have different values. Therefore the estimate $\theta^*$ can be regarded as a random variable, and $\theta_1^*, \theta_2^*, \theta_3^*, \ldots, \theta_n^*$ as its possible values. As a random variable, it is characterized by a certain probability density function. Since this function is determined by the result of sample observation (experiment), it is called the sampling distribution. This function describes the probability density for each of the estimates, using a specific number of sample observations. Assuming that the statistical estimate $\theta^*$ is an algebraic function of a certain set of data, and that such a set is obtained when the sample observation is carried out, the estimate in general form is expressed as $\theta^* = f(x_1, x_2, x_3, \ldots, x_n)$.

Once the sample survey is completed, this function is no longer an estimate in general form but takes a specific value, that is, it becomes a quantitative estimate (a number). In other words, it follows from the expression above that any of the indicators characterizing the results of the sample observation can be considered an estimate. The sample mean is an estimate of the general mean. The variance calculated from the sample, or the standard deviation calculated from it, are estimates of the corresponding characteristics of the general population, and so on.

As already noted, calculating statistical estimates does not guarantee the absence of errors. The point is that the errors must not be systematic; they must be random. Let us consider the methodological side of this statement.

Suppose the estimate $\theta^*$ gives an inaccurate value of the parameter $\theta$ of the general population, with a deficiency. In this case each calculated value $\theta_i^*$ ($i = 1, 2, 3, \ldots, n$) will be less than the actual value of $\theta$.

For this reason the mathematical expectation (mean value) of the random variable $\theta^*$ will be less than $\theta$, that is, $M(\theta^*) < \theta$. Conversely, if it gives an estimate with an excess, the mathematical expectation of the random variable $\theta^*$ will be greater than $\theta$.

It follows that using a statistical estimate whose mathematical expectation is not equal to the estimated parameter leads to systematic errors, that is, to non-random errors that distort the measurement results in one direction.

A natural requirement arises: the mathematical expectation of the estimate $\theta^*$ must equal the estimated parameter. Meeting this requirement does not eliminate errors altogether, since the sample values of the estimate may be greater or less than the actual value of the parameter of the general population. But errors on either side of the value of $\theta$ will (according to probability theory) occur with the same frequency. Consequently, meeting this requirement, namely that the mathematical expectation of the sample estimate equal the estimated parameter, rules out systematic (non-random) errors, that is, $M(\theta^*) = \theta$.

Choosing the statistical estimate that gives the best approximation of the estimated parameter is an important problem in the theory of estimation. If the random variable under study is known to be normally distributed in the general population, then the mathematical expectation and the standard deviation must be estimated from the sample data. This is because these two characteristics fully determine the normal distribution. If the random variable under study follows the Poisson law, the parameter $\lambda$ is estimated, since it determines this distribution.

Mathematical statistics distinguishes the following methods of obtaining statistical estimates from sample data: the method of moments and the maximum likelihood method.

When estimates are obtained by the method of moments, the moments of the general population are replaced by the moments of the sample (relative frequencies are used in place of probabilities).
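A minimal sketch of the method of moments for the two distributions mentioned in this text: for the Poisson law the first moment equals the parameter, so the parameter is estimated by the sample mean; for the normal law the expectation is estimated by the sample mean and the standard deviation by the square root of the second central sample moment. The function names and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance

def poisson_moment_estimate(sample):
    """Method of moments for the Poisson law: the first moment equals
    the parameter, so it is estimated by the sample mean."""
    return fmean(sample)

def normal_moment_estimates(sample):
    """Method of moments for the normal law: mean from the first sample
    moment, standard deviation from the second central sample moment."""
    a = fmean(sample)
    sigma = sqrt(pvariance(sample, mu=a))   # uncorrected (division by n)
    return a, sigma

# Hypothetical data, for illustration only.
counts = [0, 1, 2, 1, 3, 2, 1, 2]
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

lam = poisson_moment_estimate(counts)
a, sigma = normal_moment_estimates(measurements)
print(lam, a, sigma)
```

The variance estimate obtained this way is the uncorrected sample variance, so it is a biased estimate of the general variance.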

For a statistical estimate to give the "best approximation" to the general characteristic, it must have a number of properties. These are discussed below.

The ability to choose the best estimate depends on knowing the basic properties of estimates and on being able to classify estimates by these properties. In the mathematical literature the "properties of estimates" are sometimes called "requirements for estimates" or "estimation criteria". The basic properties of statistical estimates include unbiasedness, efficiency, consistency and sufficiency.

If we accept that the sample mean ($\tilde{x}$) and the sample variance ($\sigma_S^2$) are estimates of the corresponding general characteristics ($\bar{x}$, $\sigma^2$), that is, of their mathematical expectations, we must take into account that with a large number of sample units these characteristics ($\tilde{x}$, $\sigma_S^2$) approach their mathematical expectations. If the number of sample units is small, these characteristics can differ significantly from the corresponding mathematical expectations.

If the mean value of the sample characteristic chosen as an estimate coincides with the value of the general characteristic, the estimate is called unbiased. The proof that the mathematical expectation of the sample mean equals the general mean ($M(\tilde{x}) = \bar{x}$) shows that $\tilde{x}$ is an unbiased estimate of the general mean. The situation is different for the sample variance ($\sigma_S^2$). Its mathematical expectation,

$M(\sigma_S^2) = \frac{n-1}{n}\,\sigma^2$,

is not equal to the general variance $\sigma^2$. So $\sigma_S^2$ is a biased estimate of $\sigma^2$. To eliminate the systematic error and obtain an unbiased estimate, the sample variance is multiplied by the correction $\frac{n}{n-1}$ (this follows from the equation above: $\sigma^2 = \frac{n}{n-1}\,M(\sigma_S^2)$).

Thus, for a small sample the variance is equal to:

$\sigma_S^2 \cdot \frac{n}{n-1} = \frac{\sum (x_i - \tilde{x})^2}{n} \cdot \frac{n}{n-1} = \frac{\sum (x_i - \tilde{x})^2}{n-1}$.

The fraction $\frac{n}{n-1}$ is called Bessel's correction. The mathematician Bessel established that the sample variance is a biased estimate of the general variance and applied this correction to adjust the estimates. For small samples the correction $\frac{n}{n-1}$ differs noticeably from 1. As the number of observation units grows, it quickly approaches 1. For $n > 50$ the difference between the two estimates practically disappears, that is, $\frac{n}{n-1}\,\sigma_S^2 \approx \sigma_S^2$. The following definition of the unbiasedness requirement follows from the above.

An unbiased estimate is a statistical estimate whose mathematical expectation, for any sample size, equals the value of the parameter of the general population, that is, $M(\theta^*) = \theta$; $M(\tilde{x}) = \bar{x}$.

The concept of mathematical expectation is studied in probability theory. It is a numerical characteristic of a random variable, approximately equal to the mean value of that variable. The mathematical expectation of a discrete random variable is the sum of the products of all its possible values and their probabilities. Suppose $n$ trials are carried out in which the random variable $X$ takes the value $x_1$ $w_1$ times, the value $x_2$ $w_2$ times, and so on up to the value $x_k$ $w_k$ times, where $w_1 + w_2 + w_3 + \ldots + w_k = n$. Then the sum of all values taken by $X$ equals

$x_1 w_1 + x_2 w_2 + x_3 w_3 + \ldots + x_k w_k$.

The arithmetic mean of these values is

$\bar{X} = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \ldots + x_k w_k}{n}$, or $\bar{X} = x_1 \frac{w_1}{n} + x_2 \frac{w_2}{n} + x_3 \frac{w_3}{n} + \ldots + x_k \frac{w_k}{n}$.

Since $\frac{w_1}{n}$ is the relative frequency $W_1$ of the value $x_1$, $\frac{w_2}{n}$ is the relative frequency $W_2$ of the value $x_2$, and so on, the equation above takes the form

$\bar{X} = x_1 W_1 + x_2 W_2 + x_3 W_3 + \ldots + x_k W_k$.

With a large number of sample observations, the relative frequency is approximately equal to the probability of the event, that is, $W_1 \approx p_1$; $W_2 \approx p_2$; ...; $W_k \approx p_k$, and therefore $\bar{X} \approx x_1 p_1 + x_2 p_2 + x_3 p_3 + \ldots + x_k p_k$. Then

$\bar{X} \approx M(X)$.

The probabilistic meaning of this result is that the mathematical expectation is approximately equal to the arithmetic mean of the observed values of the random variable, and the larger the sample, the more accurate the approximation.
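The passage from counts to relative frequencies can be reproduced in a few lines. This is a minimal sketch: the observation list and the fair-die distribution are hypothetical data chosen for illustration, and exact fractions are used so that the equalities hold without rounding.

```python
from collections import Counter
from fractions import Fraction

def mean_from_frequencies(observations):
    """Arithmetic mean written as the sum of value * relative frequency."""
    n = len(observations)
    counts = Counter(observations)  # how many times each value occurred
    return sum(Fraction(x) * Fraction(w, n) for x, w in counts.items())

def expectation(distribution):
    """M(X) = sum of x_i * p_i for a discrete distribution given as {x_i: p_i}."""
    return sum(Fraction(x) * p for x, p in distribution.items())

observations = [1, 1, 2, 2, 2, 3, 3, 3, 3, 6]         # hypothetical sample
die = {face: Fraction(1, 6) for face in range(1, 7)}  # hypothetical fair die

print(mean_from_frequencies(observations))  # same value as sum(observations) / n
print(expectation(die))
```

The first function gives exactly the ordinary arithmetic mean; replacing the relative frequencies by probabilities turns the same sum into the mathematical expectation.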

The criterion of unbiasedness guarantees the absence of systematic errors in estimating the parameters of the general population.

Note that a sample estimate $\hat\theta$ is a random variable whose value can vary from one sample to another. The extent of its variation (scatter) around its mathematical expectation, the population parameter $\theta$, is characterized by the variance $\sigma^2(\hat\theta)$.

Let $\hat\theta_1$ and $\hat\theta_2$ be two unbiased estimates of the parameter $\theta$, that is, $M(\hat\theta_1) = \theta$ and $M(\hat\theta_2) = \theta$, with variances $\sigma^2(\hat\theta_1)$ and $\sigma^2(\hat\theta_2)$. Of these two estimates, preference should be given to the one with the smaller variance around the estimated parameter. If the variance of $\hat\theta_1$ is smaller than the variance of $\hat\theta_2$, the first one, $\hat\theta_1$, is taken as the estimate.

An unbiased estimate $\hat\theta$ that has the smallest variance among all possible unbiased estimates of the parameter $\theta$ computed from samples of the same size is called an efficient estimate. This is the second property (requirement) of statistical estimates of population parameters. It should be remembered that an efficient estimate of a population parameter under one distribution law does not necessarily coincide with the efficient estimate of that parameter under another distribution law.

For samples of large size, statistical estimates should have the property of consistency. Consistency means that the larger the sample, the greater the probability that the estimation error does not exceed an arbitrarily small positive number $\varepsilon$. An estimate $\hat\theta$ of the parameter $\theta$ is called consistent if it obeys the law of large numbers, that is, if the following equality holds:

$\lim_{n \to \infty} P\{|\hat\theta - \theta| < \varepsilon\} = 1$.

As we can see, a statistical estimate is called consistent if, as $n \to \infty$, it converges in probability to the estimated parameter. In other words, the value of the indicator obtained from the sample approaches its expected value as the sample size increases, by virtue of the law of large numbers. For example, if the variance of an unbiased estimate tends to zero as $n \to \infty$, the estimate is also consistent.
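The closing example, an unbiased estimate whose variance tends to zero, can be justified with Chebyshev's inequality. In this short sketch the symbols are chosen for illustration: $\hat\theta_n$ is the estimate computed from a sample of size $n$, $\theta$ is the estimated parameter with $M(\hat\theta_n) = \theta$, and $D(\hat\theta_n)$ is the variance of the estimate.

```latex
P\{|\hat\theta_n - \theta| \ge \varepsilon\} \le \frac{D(\hat\theta_n)}{\varepsilon^{2}} \to 0 \quad (n \to \infty),
\qquad\text{hence}\qquad
\lim_{n \to \infty} P\{|\hat\theta_n - \theta| < \varepsilon\} = 1 .
```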

Consistent estimates include:

1) the share of a characteristic in the sample, that is, the relative frequency, as an estimate of the share of that characteristic in the general population;

2) the sample mean as an estimate of the general mean;

3) the sample variance as an estimate of the general variance;

4) the sample coefficients of skewness and kurtosis as estimates of the corresponding general coefficients.

For some reason, the literature on mathematical statistics does not always describe the fourth property of statistical estimates, sufficiency. A sufficient (or exhaustive) estimate is one that captures all the sample information about the unknown parameter of the general population. Thus, a sufficient estimate includes all the information contained in the sample about the population characteristic under study. None of the other estimates considered can supply additional information about the parameter beyond what a sufficient statistical estimate already provides.

Consequently, the sample arithmetic mean $\tilde{x}$ is an unbiased estimate of the general arithmetic mean. Unbiasedness of this estimate means the following: if a large number of random samples were drawn from the general population, their means would deviate from the general mean upward and downward equally. In other words, the unbiasedness of a good estimate also shows that the mean of an infinitely large number of sample means equals the general mean.

In symmetric distribution series, the median is an unbiased estimate of the general mean. Provided that the sample size approaches the size of the general population ($n \to N$), the median in such series can also be a consistent estimate of the general mean. As for the efficiency of the median as an estimate of the general arithmetic mean, it can be proved that in large samples from a normal population the standard error of the median ($\sigma_{Me}$) is 1.2533 times the standard error of the sample mean ($\sigma_{\tilde{x}}$), that is, $\sigma_{Me} \approx 1.2533\,\sigma_{\tilde{x}}$. Therefore the median cannot be an efficient estimate of the general arithmetic mean, since its standard error is larger than the standard error of the sample arithmetic mean. In addition, the arithmetic mean satisfies the conditions of unbiasedness and consistency and is consequently the best estimate.

The following question can also be posed. Can the sample arithmetic mean be an unbiased estimate of the median in symmetric populations, for which the mean and the median coincide? And will the sample mean be a consistent estimate of the median of the general population? In both cases the answer is yes. For the median of a general population with a symmetric distribution, the sample arithmetic mean is unbiased and consistent.

Recalling that $\sigma_{Me} \approx 1.2533\,\sigma_{\tilde{x}}$, we conclude that the sample arithmetic mean, and not the sample median, is the more efficient estimate of the median of the general population under study.
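The factor 1.2533 (which equals the square root of pi/2) can be checked by simulation. The sketch below rests on assumptions not stated in the text: the population is standard normal, each sample has 101 observations, and `error_ratio` is a name made up for this illustration.

```python
import random
import statistics

def error_ratio(n=101, repeats=2000, seed=7):
    """Estimate the ratio of the standard error of the median to the
    standard error of the mean from repeated standard normal samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(repeats):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Scatter of each estimator across the repeated samples.
    return statistics.stdev(medians) / statistics.stdev(means)

print(round(error_ratio(), 2))  # should land in the neighbourhood of 1.25
```

The ratio is a simulation result, so it fluctuates slightly with the seed and the number of repetitions; it only approaches 1.2533 for large samples.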

Not every sample characteristic is necessarily the best estimate of the corresponding characteristic of the general population. Knowing the properties of estimates makes it possible not only to choose among estimates but also to improve them. As an example, consider the case where calculations show that the standard deviations of several samples from one general population are in all cases smaller than the standard deviation of the general population, and the size of the difference depends on the sample size. Multiplying the sample standard deviation by a correction coefficient gives an improved estimate of the standard deviation of the general population. Bessel's correction $\sqrt{\frac{n}{n-1}}$ is used as this coefficient, that is, to reduce the bias of the estimate one takes $\sigma_s \sqrt{\frac{n}{n-1}}$. The numerical expression shows that the sample standard deviation, used as an estimate, understates the value of the population parameter.

As is known, the statistical characteristics of a sample are approximate estimates of the unknown parameters of the general population. An estimate may take the form of a single number, that is, a specific point. An estimate defined by one number is called a point estimate. Thus the sample mean ($\tilde{x}$) is an unbiased and the most efficient point estimate of the general mean ($\bar{x}$), while the sample variance ($\sigma_s^2$) is a biased point estimate of the general variance ($\sigma^2$). If the mean error of the sample mean is denoted $m_{\tilde{x}}$, the point estimate of the general mean can be written as $\tilde{x} \pm m_{\tilde{x}}$. This means that $\tilde{x}$ is an estimate of the general mean with an error equal to $m_{\tilde{x}}$. Clearly, the point estimates $\tilde{x}$ and $\sigma_s^2$ should not have a systematic error toward overstating or understating the estimated parameters $\bar{x}$ and $\sigma^2$. As mentioned earlier, estimates that satisfy this condition are called unbiased. What is the error $m_{\tilde{x}}$? It is the average of a large number of individual errors.

Point estimation of a population parameter consists of first choosing, among the possible sample estimates, the one with optimal properties, and then computing the value of that estimate. The computed value is regarded as the best approximation to the unknown true value of the population parameter. Additional calculations to determine the possible error of the estimate are not always mandatory (this depends on the purpose of the estimation), but as a rule they are almost always carried out.

Consider examples of point estimation for the mean of the characteristics under study and for their shares in the general population.

Example. The sown area of grain crops in a district is 20,000 ha. A 10% sample survey of the fields gave the following sample characteristics: mean yield 30 centners per hectare (c/ha), variance of yield 4, area of high-yielding crops 1,200 ha.

What can be said about the mean yield of grain crops in the district, and what is the numerical value of the share of high-yielding crops in the total grain area of the district? That is, the named parameters ($\bar{x}$, $p$) of the general population must be estimated. For the calculation we have:

$N = 20000$; $n = 20000 \times 0.1 = 2000$; $\tilde{x} = 30$; $\sigma = \sqrt{4}$; $w = \frac{1200}{2000} = 0.6$.

As is known, the sample arithmetic mean is an efficient estimate of the general arithmetic mean. Thus the best estimate of the general parameter ($\bar{x}$) can be taken to be 30. To determine the accuracy of the estimate, the mean (standard) error must be found, where $\sigma^2$ is the sample variance, $n$ the sample size and $N$ the size of the general population:

$m = \sqrt{\frac{\sigma^2}{n}\left(1 - \frac{n}{N}\right)} = \sqrt{\frac{4}{2000}\left(1 - \frac{2000}{20000}\right)} \approx 0.04$.

The resulting error indicates a high accuracy of the estimate. The value of $m$ here means that, if such samples were repeated many times, the error of the parameter estimate would average 0.04. That is, according to the point estimate, the mean yield on the farms of the district is $\bar{x} = 30 \pm 0.04$ c/ha.

To obtain a point estimate of the share of high-yielding grain crops in the total grain area, the sample share $w = 0.6$ can be taken as the best estimate. Thus, according to the results of the observations, the best estimate of the desired structural indicator is the number 0.6. To refine the calculation, the mean error of this estimate should be computed, where $n = 2000$ is the sample size and $N = 20000$ the size of the general population:

$m_w = \sqrt{\frac{w(1 - w)}{n}\left(1 - \frac{n}{N}\right)} = \sqrt{\frac{0.6\,(1 - 0.6)}{2000}\left(1 - \frac{2000}{20000}\right)} \approx 0.01$.

As we can see, the mean error of the estimate is 0.01.

This result means that, if a sample of 2,000 ha of grain crops were drawn repeatedly, the mean error of the adopted estimate of the share of high-yielding crops in the grain area of the district's enterprises would be $\pm 0.01$. In this case $p = 0.6 \pm 0.01$. In percentage terms, the share of high-yielding crops in the total grain area of the district is on average $60 \pm 1$.

The calculations show that in this specific case the best estimate of the desired structural indicator is the number 0.6, and the mean error of the estimate in either direction is approximately 0.01. As can be seen, the estimate is quite accurate.
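Both mean errors of this example can be reproduced with a short script. It is a sketch that assumes sampling without replacement, so the factor (1 - n/N) is applied; the function names `mean_error` and `share_error` are made up for this illustration.

```python
import math

def mean_error(variance, n, N):
    """Mean (standard) error of a sample mean, sampling without replacement."""
    return math.sqrt(variance / n * (1 - n / N))

def share_error(w, n, N):
    """Mean (standard) error of a sample share w, sampling without replacement."""
    return math.sqrt(w * (1 - w) / n * (1 - n / N))

N = 20000        # sown area of the district, ha
n = N // 10      # 10% sample, i.e. 2000 ha
w = 1200 / n     # share of high-yielding crops in the sample, 0.6

print(round(mean_error(4, n, N), 2))   # 0.04
print(round(share_error(w, n, N), 2))  # 0.01
```

Without rounding the two values are about 0.0424 and 0.0104, which the example reports as 0.04 and 0.01.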

Several methods of point estimation of the standard deviation are known for the case where the sample is drawn from a normally distributed general population and the parameter $\sigma$ is unknown. A simple estimate (the easiest to compute) is the range $R$ of the sample multiplied by a correction factor that is taken from standard tables and depends on the sample size (for small samples). The standard deviation of the general population can also be estimated from the computed sample variance, taking the number of degrees of freedom into account. The square root of this variance gives the value used as the estimate of the general standard deviation.

Using the estimated value of $\sigma$, the mean error of the estimate of the general mean is computed by the method discussed above.

As mentioned earlier, in accordance with the consistency requirement, confidence in the accuracy of a point estimate increases with the sample size. Demonstrating this theoretical proposition with a point estimate is somewhat difficult. The effect of sample size on the accuracy of an estimate is evident when interval estimates are computed. These are discussed below.

Table 39 shows the most frequently used point estimates of the parameters of the general population.

Table 39.

Basic point estimates

The values of estimates computed in different ways may differ in size. For this reason, practical calculations should not work through every possible variant in turn; instead, one estimate should be chosen on the basis of the properties of the various estimates.

With a small number of observation units, a point estimate is largely random and therefore not very reliable. In small samples it can differ considerably from the estimated characteristic of the general population. This leads to gross errors in the conclusions that are extended from the sample to the general population. For this reason, interval estimates are used with small samples.

Unlike a point estimate, an interval estimate gives a range of values within which the population parameter should lie. In addition, an interval estimate states the probability, which makes it important in statistical analysis.

An interval estimate is an estimate characterized by two numbers, the limits of an interval that covers the estimated parameter. Such an estimate is an interval that contains the desired parameter with a given probability. The sample point estimate is taken as the center of the interval.

Thus interval estimation is a further development of point estimation for cases where a point estimate is ineffective because the sample is small.

The task of interval estimation can be formulated in general as follows: from sample data, construct a numerical interval about which it can be asserted, with a probability chosen in advance, that the estimated parameter lies within it.

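If a sufficiently large number of sampling units is taken, then Lyapunov's theorem makes it possible to find the probability that the sampling error will not exceed some given value $\Delta$, that is,

$|\tilde{x} - \bar{x}| \le \Delta$ or $|w - p| \le \Delta$,

where $\tilde{x}$ and $w$ are the sample mean and sample share, and $\bar{x}$ and $p$ are the general mean and general share.

In particular, this theorem makes it possible to evaluate the errors of the approximate equalities

$w \approx p$ ($w$ is the relative frequency); $\tilde{x} \approx \bar{x}$.

If $x_1, x_2, x_3, \ldots, x_n$ are independent random variables and $n \to \infty$, then the probability that their mean ($\tilde{x}$) lies in the range from $a$ to $b$ can be determined from the equation

$P(a < \tilde{x} < b) \approx \frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-t^2/2}\,dt$,

where $t_1 = \frac{a - M(\tilde{x})}{\sigma_{\tilde{x}}}$; $t_2 = \frac{b - M(\tilde{x})}{\sigma_{\tilde{x}}}$.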

The probability $P$ here is called the confidence probability.

Thus the confidence probability (reliability) of an estimate of a general parameter by a sample estimate is the probability with which the following inequalities hold:

$|\tilde{x} - \bar{x}| < \Delta$; $|w - p| < \Delta$,

where $\Delta$ is the limit error of the estimate, for the mean and for the share respectively.

The limits within which the general characteristic may lie with this given probability are called the confidence interval, and the ends of this interval are called the confidence limits.

Confidence (or tolerance) limits are limits beyond which the characteristic falls, owing to random fluctuations, only with a small probability ($\alpha_1 < 0.05$; $\alpha_2 < 0.01$; $\alpha_3 < 0.001$). The concept of the "confidence interval" was introduced by J. Neyman and K. Pearson (1950). It is an interval established from sample data that, with a given probability (the confidence probability), covers the true but unknown value of the parameter. If the confidence probability is taken to be 0.95, this means that with frequent application of the given method of calculation the confidence interval will cover the parameter in approximately 95% of cases. The confidence intervals of the general mean and the general share are determined from the inequalities given above, from which

it follows that $\tilde{x} - \Delta \le \bar{x} \le \tilde{x} + \Delta$; $w - \Delta \le p \le w + \Delta$.

In mathematical statistics, the reliability of a parameter is assessed by one of the following three probability levels (sometimes called "probability thresholds"): $P_1 = 0.95$; $P_2 = 0.99$; $P_3 = 0.999$. The probabilities that it is decided to neglect, that is, $\alpha_1 = 0.05$; $\alpha_2 = 0.01$; $\alpha_3 = 0.001$, are called significance levels. Of these levels, the most reliable conclusions are provided by the probability $P_3 = 0.999$. Each level of confidence probability corresponds to a certain value of the normalized deviation $t$ (see Table 27). If standard tables of the probability integral are not at hand, this probability can be calculated approximately by the formula

$P(t) = \frac{2}{\sqrt{2\pi}} \int_0^t e^{-z^2/2}\,dz$.

Figure 11 shades the parts of the total area bounded by the normal curve and the abscissa axis that correspond to the values $t = \pm 1$; $t = \pm 2$; $t = \pm 3$, for which the probabilities equal 0.6827, 0.9545 and 0.9973. In point estimation, as already known, the mean sampling error is calculated; in interval estimation, the limit error.
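The probabilities attached to t = 1, 2 and 3 can be obtained without tables from the error function in Python's standard library. This is a minimal sketch that assumes the estimate is normally distributed; `confidence_probability` is a name chosen for this illustration.

```python
import math

def confidence_probability(t):
    """P(|Z| < t) for a standard normal Z: the area under the normal
    curve between -t and +t."""
    return math.erf(t / math.sqrt(2))

for t in (1, 2, 3):
    print(t, round(confidence_probability(t), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```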

Depending on the principle of selecting units (with replacement or without replacement), the formulas for calculating sampling errors differ by the correction factor $\left(1 - \frac{n}{N}\right)$.

Fig. 11. Curve of the normal distribution of probabilities

Table 40 shows the calculation formulas for errors of estimates of the general parameter.

Consider a specific case of interval estimation of the parameters of the general population from sample data.

Example. A sample survey of the farms of a district found that the mean daily milk yield of cows ($\tilde{x}$) is 10 kg. The share of purebred cattle in the total herd is 80%. The sampling error at the confidence probability $P = 0.954$ was 0.2 kg for the mean and 1% for the share of purebred cattle.

Thus the limits within which the general mean productivity may lie are $9.8 \le \bar{x} \le 10.2$; for the general share of purebred cattle, $79 \le p \le 81$.

Conclusion: with a probability of 0.954 it can be asserted that the difference between the sample mean productivity of cows and the general productivity does not exceed 0.2 kg. The mean daily milk yield lies between 9.8 and 10.2 kg. The share of purebred cattle in the enterprises of the district ranges from 79 to 81%, and the estimation error does not exceed 1%.
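The interval limits of this example follow directly from the sample estimate and the limit error. The sketch below also uses the relation "limit error = t times mean error"; that relation, and the value t = 2 for a probability close to 0.954, are assumptions of this illustration, since Table 40 is not reproduced here. The function names are made up.

```python
def limit_error(t, mean_error):
    """Limit (marginal) error as the normalized deviation t times the mean error."""
    return t * mean_error

def confidence_interval(estimate, delta):
    """Two-sided limits: estimate - delta <= parameter <= estimate + delta."""
    return estimate - delta, estimate + delta

low, high = confidence_interval(10.0, 0.2)  # mean daily milk yield, kg
print(round(low, 1), round(high, 1))        # 9.8 10.2
print(confidence_interval(80, 1))           # share of purebred cattle, %: (79, 81)

# Assumption: with t = 2, a limit error of 0.2 kg corresponds to a mean error of 0.1 kg.
print(limit_error(2, 0.1))                  # 0.2
```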

Table 40.

Calculation of point and interval sampling errors

When organizing a sample survey, it is important to determine the required sample size ($n$). It depends on the variation of the units in the population under study: the greater the variation, the larger the sample must be. There is an inverse relationship between the sample size and its limit error, so a smaller error requires a larger sample.

The required sample size is determined from the formulas for the limit sampling error ($\Delta$) at a specified probability level ($P$). Mathematical transformations of these formulas give the formulas for calculating the sample size (Table 41).

Table 41.

Calculation of the required sample size
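One way to obtain sample-size formulas is to solve the limit-error formula for the mean, Δ = t·sqrt(σ²/n · (1 - n/N)), for n. The sketch below shows that rearrangement; it is an assumption of this illustration that the formulas in Table 41 take this form, and the function names and the target error 0.25 are hypothetical.

```python
import math

def sample_size_with_replacement(t, variance, delta):
    """n = t^2 * sigma^2 / delta^2, rounded up to a whole unit."""
    return math.ceil(t ** 2 * variance / delta ** 2)

def sample_size_without_replacement(t, variance, delta, N):
    """n = t^2 * sigma^2 * N / (delta^2 * N + t^2 * sigma^2), rounded up."""
    numerator = t ** 2 * variance * N
    denominator = delta ** 2 * N + t ** 2 * variance
    return math.ceil(numerator / denominator)

# Hypothetical inputs: t = 2, variance 4, target limit error 0.25, N = 20000.
print(sample_size_with_replacement(2, 4, 0.25))            # 256
print(sample_size_without_replacement(2, 4, 0.25, 20000))  # 253
```

Halving the target error roughly quadruples the required sample size, which is the inverse relationship between sample size and limit error described above.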

It should be noted that all statistical estimates rest on the assumption that the sample used in the estimation was obtained by a selection method that ensures a probability (random) sample.

At the same time, when choosing the confidence probability of an estimate, one should be guided by the principle that the choice of its level is not a mathematical task but is determined by the specific problem being solved. Consider an example in confirmation.

Example. Suppose that at two enterprises the probability of producing sound (high-quality) products is $P = 0.999$, that is, the probability of producing a defective item is $\alpha = 0.001$. Can one decide, on purely mathematical grounds and without regard to the nature of the product, whether a defect probability of $\alpha = 0.001$ is high? Suppose one enterprise produces seeders and the other produces aircraft for treating crops. If one seeder in 1,000 is defective, this can be tolerated, because scrapping 0.1% of the seeders is cheaper than rebuilding the technological process. If one aircraft in 1,000 is defective, this will certainly lead to serious consequences during operation. So in the first case the defect probability $\alpha = 0.001$ can be accepted, and in the second case it cannot. For this reason, the confidence probability in calculations in general, and in calculating estimates in particular, should be chosen on the basis of the specific conditions of the problem.

Depending on the tasks of the study, it may be necessary to calculate one or two confidence limits. If the problem requires only one of the limits, the upper or the lower, the probability with which this limit is established will be higher than when both limits are specified for the same value of the confidence coefficient $t$.

Let the confidence limits be established with the probability $P = 0.95$, that is, in 95% of cases the general mean ($\bar{x}$) will be no less than the lower confidence limit $\tilde{x} - t m$ and no greater than the upper confidence limit $\tilde{x} + t m$, where $\tilde{x}$ is the sample mean and $m$ its mean error. In this case the general mean may fall outside the specified limits only with the probability $\alpha = 0.05$ (or 5%). Since the distribution of $\tilde{x}$ is symmetric, half of this probability, that is, 2.5%, falls on the case $\bar{x} < \tilde{x} - t m$ and the other half on the case $\bar{x} > \tilde{x} + t m$. It follows that the probability that the general mean is less than the upper confidence limit $\tilde{x} + t m$ equals 0.975 (that is, 0.95 + 0.025). Hence, with two confidence limits we neglect values of $\bar{x}$ both smaller than $\tilde{x} - t m$ and larger than $\tilde{x} + t m$. When only one confidence limit is specified, for example the upper one, we neglect only those values of $\bar{x}$ that exceed this limit. For the same value of the confidence coefficient $t$, the significance level $\alpha$ here turns out to be half as large.

If only those values of the characteristic are considered that exceed (or, conversely, do not exceed) the value of the desired parameter $\bar{x}$, the confidence interval is called one-sided. If the values under consideration are limited on both sides, the confidence interval is called two-sided. It follows from the above that hypotheses and a number of criteria, in particular Student's $t$ criterion, should be considered as one-sided and two-sided. With a two-sided hypothesis, the significance level for the same value of $t$ is twice as large as with a one-sided hypothesis. If we want to keep the same significance level (and confidence probability) for a one-sided hypothesis as for a two-sided one, the value of $t$ should be taken smaller. This feature is taken into account in the standard tables of Student's $t$ criterion (Appendix 1).

From the practical point of view, what is often of interest is not so much the confidence interval of the possible value of the general mean as the maximum and minimum values that, with a given (confidence) probability, the general mean cannot exceed or fall below. In mathematical statistics these are called the guaranteed maximum and the guaranteed minimum of the mean. Denoting these parameters by $\bar{x}_{\max}$ and $\bar{x}_{\min}$ respectively, we can write: $\bar{x}_{\max} = \tilde{x} + t m$; $\bar{x}_{\min} = \tilde{x} - t m$.

When the guaranteed maximum and minimum of the general mean are calculated as the limits of a one-sided confidence interval by the formulas above, the value of $t$ is taken for the one-sided criterion.

Example. On 20 plots the mean yield of sugar beet was found to be 300 c/ha. This sample mean characterizes the corresponding parameter of the general population ($\bar{x}$) with an error of 10 c/ha. Because the estimate is based on a sample, the general mean yield may be either greater or smaller than the sample mean $\tilde{x} = 300$. With the probability $P = 0.95$ it can be asserted that the desired parameter will not exceed $\bar{x}_{\max} = 300 + 1.73 \times 10 = 317.3$ c/ha.

The value of $t$ is taken for $k = 20 - 1 = 19$ degrees of freedom with a one-sided critical region and the significance level $\alpha = 0.05$ (Appendix 1). So, with the probability $P = 0.95$, the guaranteed maximum of the general mean yield is estimated at 317 c/ha; that is, even under favorable conditions the mean yield of sugar beet does not exceed this value.
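The guaranteed maximum and minimum of this example can be computed as follows. The sketch takes the one-sided Student value 1.73 quoted in the example as given rather than computing it; `guaranteed_bounds` is a name made up for this illustration.

```python
def guaranteed_bounds(sample_mean, mean_error, t_one_sided):
    """Guaranteed minimum and maximum of the general mean, treated as
    one-sided confidence limits."""
    delta = t_one_sided * mean_error
    return sample_mean - delta, sample_mean + delta

# t = 1.73: one-sided value for 19 degrees of freedom at significance
# level 0.05, as quoted in the example (taken from a table, not computed).
lower, upper = guaranteed_bounds(300, 10, 1.73)
print(round(lower, 1), round(upper, 1))  # 282.7 317.3
```

Each limit holds on its own with probability 0.95; taken together as a two-sided interval, the same pair of limits would correspond to a probability of 0.90.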

In some branches of knowledge (for example, in the natural sciences), the theory of estimation is less important than the theory of testing statistical hypotheses. In economics, statistical estimation methods play a very important role in verifying the reliability of research results, as well as in various kinds of practical calculations. This concerns first of all the point estimation of the statistical populations under study. Choosing the best possible estimate is the main problem of point estimation. This choice is made possible by knowledge of the basic properties (requirements) of statistical estimates.

Tasks of mathematical statistics

Suppose that there is a parametric family of probability distributions $P_\theta$ (for simplicity, we consider distributions of random variables and the case of one parameter). Here $\theta$ is a numeric parameter whose value is unknown. It is required to estimate it from the available sample $X_1, \ldots, X_n$ of values generated by this distribution.

Two main types of estimates are distinguished: point estimates and confidence intervals.

Point estimation

Point estimation is a form of statistical estimation in which the value of an unknown parameter is approximated by a single number. That is, one must specify a function of the sample (a statistic)

$\hat\theta = \hat\theta(X_1, \ldots, X_n)$,

whose value will be regarded as an approximation to the unknown true value $\theta$.

General methods for constructing point estimates include the maximum likelihood method, the method of moments and the method of quantiles.

Below are some properties that point estimates may or may not have.

Consistency

One of the most obvious requirements for a point estimate is that it should give a sufficiently good approximation to the true value of the parameter when the sample size is sufficiently large. This means that the estimate should converge to the true value as the sample size grows. This property of an estimate is called consistency. Since we are dealing with random variables, for which several types of convergence exist, the property can be formulated precisely in different ways, for example as convergence in probability (weak consistency) or as almost sure convergence (strong consistency).

When the term consistency is used without qualification, it usually means weak consistency, that is, convergence in probability.

The consistency condition is practically mandatory for all estimates used in practice. Inconsistent estimates are used extremely rarely.

Unbiasedness and asymptotic unbiasedness

An estimate $\hat\theta$ of a parameter $\theta$ is called unbiased if its mathematical expectation equals the true value of the estimated parameter:

$M\hat\theta = \theta$.

A weaker condition is asymptotic unbiasedness, which means that the mathematical expectation of the estimate converges to the true value of the parameter as the sample size $n$ grows:

$\lim_{n \to \infty} M\hat\theta_n = \theta$.

Unbiasedness is a desirable property of estimates, but its importance should not be overrated. Most often unbiased estimates of a parameter exist, and then one tries to consider only them. However, there are statistical problems in which no unbiased estimate exists. The best-known example is the following: consider the Poisson distribution with parameter $\lambda$ and pose the problem of estimating the parameter $\theta = 1/\lambda$. It can be proved that no unbiased estimate exists for this problem.

Comparison of estimates and efficiency

To compare different estimates of the same parameter, the following method is used: a risk function is chosen that measures the deviation of the estimate from the true value of the parameter, and the estimate for which this function takes the smaller value is considered the better one.

Most often, the mathematical expectation of the squared deviation of the estimate from the true value, $M(\hat\theta - \theta)^2$, is taken as the risk function.

For unbiased estimates this is simply the variance.

There is a lower bound on this risk function, given by the Cramér-Rao inequality.

(Unbiased) estimates for which this lower bound is attained (that is, which have the minimum possible variance) are called efficient. However, the existence of an efficient estimate is a rather strong requirement on the problem, and it is far from always satisfied.
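For reference, the bound can be written out in the simplest setting. This is a sketch under assumptions not stated above: the sample $X_1, \ldots, X_n$ consists of independent observations with density $f(x; \theta)$, the usual regularity conditions hold, $\hat\theta$ is unbiased, $D\hat\theta$ denotes its variance, and $I(\theta)$ denotes the Fisher information of a single observation.

```latex
D\hat\theta \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) = M\!\left[\left(\frac{\partial}{\partial\theta}\,\ln f(X;\theta)\right)^{2}\right].
```

For example, for the mean of a normal sample with known variance $\sigma^2$ one has $I(\theta) = 1/\sigma^2$, so the bound equals $\sigma^2/n$, which is exactly the variance of the sample mean.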

A weaker condition is asymptotic efficiency, which means that the ratio of the variance of the unbiased estimate to the Cramér-Rao lower bound tends to one as $n \to \infty$.

Note that, under fairly broad assumptions about the distribution under study, the maximum likelihood method gives an asymptotically efficient estimate of the parameter, and if an efficient estimate exists, the method yields it.

Sufficient statistics

A statistic $T = T(X)$ is called sufficient for the parameter $\theta$ if the conditional distribution of the sample $X$, given that $T(X) = t$, does not depend on the parameter $\theta$ for all $t$.

The importance of the concept of a sufficient statistic is determined by the following statement. If $T$ is a sufficient statistic and $\hat\theta$ is an unbiased estimate of the parameter $\theta$, then the conditional mathematical expectation $M(\hat\theta \mid T)$ is also an unbiased estimate of the parameter $\theta$, and its variance is less than or equal to the variance of the original estimate $\hat\theta$.

Recall that the conditional mathematical expectation $M(\hat\theta \mid T)$ is a random variable that is a function of $T$. Thus, within the class of unbiased estimates, it is enough to consider only those that are functions of a sufficient statistic (provided that one exists for the given problem).

An (unbiased) efficient estimate of a parameter is always a sufficient statistic.

It can be said that a sufficient statistic contains all the information about the estimated parameter that is contained in the sample.

Statistical estimates of the parameters of the general population. Statistical hypothesis

Lecture 16.

Let it take to explore the quantitative sign of the general population. Suppose that out of theoretical considerations it was possible to establish which distribution is a sign. From here there is a task of estimating the parameters that determine this distribution. For example, if it is known that the studied sign is distributed in the general population according to the normal law, then it is necessary to assess (approximately found) mathematical expectation and the standard deviation, since these two parameters fully determine the normal distribution. If there are reason to believe that the sign has the distribution of Poisson, then it is necessary to estimate the parameter that this distribution is determined.

Usually the researcher has only sample data at their disposal, for example the values $x_1, x_2, \dots, x_n$ of the quantitative feature obtained as a result of $n$ observations (here and below the observations are assumed to be independent). The estimated parameter is expressed through these data.

Regarding $x_1, x_2, \dots, x_n$ as values of independent random variables $X_1, X_2, \dots, X_n$, we can say that finding a statistical estimate of an unknown parameter of a theoretical distribution means finding a function of the observed random variables that gives an approximate value of the estimated parameter. For example, as will be shown below, the mathematical expectation of a normal distribution is estimated by the following function (the arithmetic mean of the observed values of the feature):

$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$.

So, a statistical estimate of an unknown parameter of a theoretical distribution is a function of the observed random variables. A statistical estimate of an unknown parameter of the general population that is given by a single number is called a point estimate. We consider the following kinds of point estimates: biased and unbiased, efficient and consistent.

In order for statistical estimates to give "good" approximations of the estimated parameters, they must meet certain requirements. We specify these requirements.

Let $\theta^*$ be a statistical estimate of an unknown parameter $\theta$ of a theoretical distribution. Suppose that an estimate $\theta^*_1$ has been found from a sample of size $n$. We repeat the experiment, that is, we draw another sample of the same size from the general population and use its data to find an estimate $\theta^*_2$, and so on. Repeating the experiment many times, we obtain the numbers $\theta^*_1, \theta^*_2, \dots, \theta^*_k$, which, generally speaking, differ from one another. Thus the estimate $\theta^*$ can be regarded as a random variable, and the numbers $\theta^*_1, \theta^*_2, \dots, \theta^*_k$ as its possible values.

Clearly, if the estimate $\theta^*$ gives an approximate value with an excess, then every number $\theta^*_i$ found from sample data is greater than the true value $\theta$. In this case the mathematical expectation (mean value) of the random variable $\theta^*$ is also greater than $\theta$, that is, $M(\theta^*) > \theta$. Likewise, if $\theta^*$ gives an approximate value with a deficiency, then $M(\theta^*) < \theta$.

Therefore, using a statistical estimate whose mathematical expectation is not equal to the estimated parameter leads to systematic errors (errors of one sign). For this reason it is natural to require that the mathematical expectation of the estimate $\theta^*$ be equal to the estimated parameter. Meeting this requirement does not, in general, eliminate errors (some values of $\theta^*$ are greater than $\theta$ and others are less), but it does ensure that the errors of opposite signs balance out on average. In other words, the requirement $M(\theta^*) = \theta$ eliminates systematic errors.

A statistical estimate $\theta^*$ is called unbiased if its mathematical expectation is equal to the estimated parameter $\theta$ for any sample size, that is, $M(\theta^*) = \theta$.

A statistical estimate $\theta^*$ is called biased if its mathematical expectation is not equal to the estimated parameter, that is, $M(\theta^*) \neq \theta$.
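The two definitions can be checked directly on a tiny example. The sketch below assumes a hypothetical general population of three values and samples of size 2 drawn with replacement, so that all nine ordered samples are equally likely and expectations can be computed exactly. It compares the sample mean with two estimates of the variance, one dividing by n and one dividing by n - 1; the variance estimates are brought in here only as an illustration of bias and are not taken from the text.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical general population
n = 2                    # sample size

N = len(population)
general_mean = Fraction(sum(population), N)
general_var = sum((x - general_mean) ** 2 for x in population) / N

def sample_mean(s):
    return Fraction(sum(s), len(s))

def var_div_n(s):
    # Sum of squared deviations from the sample mean, divided by n.
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / len(s)

def var_div_n_minus_1(s):
    # The same sum of squared deviations, divided by n - 1.
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / (len(s) - 1)

# All equally likely ordered samples drawn with replacement.
samples = list(product(population, repeat=n))

def expectation(estimate):
    # Exact mathematical expectation of an estimate over all samples.
    return sum(estimate(s) for s in samples) / len(samples)

print("general mean:", general_mean, " M(sample mean):", expectation(sample_mean))
print("general variance:", general_var)
print("M(variance, divisor n):    ", expectation(var_div_n))
print("M(variance, divisor n - 1):", expectation(var_div_n_minus_1))
```

Under these assumptions the expectation of the sample mean coincides with the general mean (an unbiased estimate), while the variance estimate with divisor n has an expectation smaller than the general variance (a biased estimate with a systematic error of one sign).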

However, it would be a mistake to assume that an unbiased estimate always gives a good approximation of the estimated parameter. Indeed, the possible values of $\theta^*$ may be widely scattered around their mean value, that is, the variance $D(\theta^*)$ may be large. In that case an estimate found from a single sample, for example $\theta^*_1$, may lie far from the mean value of $\theta^*$, and hence far from the estimated parameter $\theta$ itself. Taking $\theta^*_1$ as an approximate value of $\theta$, we would then make a large error. If we require the variance of $\theta^*$ to be small, a large error becomes unlikely. For this reason the requirement of efficiency is imposed on a statistical estimate.

A statistical estimate is called efficient if, for a given sample size $n$, it has the smallest possible variance.

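A statistical estimate $\theta^*$ is called consistent if, as $n \to \infty$, it converges in probability to the estimated parameter $\theta$, that is, for any $\varepsilon > 0$ the following equality holds:

$\lim_{n \to \infty} P(|\theta^* - \theta| < \varepsilon) = 1$.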

For example, if the variance of an unbiased estimate tends to zero as $n \to \infty$, then the estimate is also consistent.
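The reasoning behind this example can be sketched with Chebyshev's inequality. The notation $M(\cdot)$ for the mathematical expectation and $D(\cdot)$ for the variance is assumed here; the estimate $\theta^*$ is computed from a sample of size $n$ and is assumed to have a finite variance.

```latex
% Unbiasedness gives M(\theta^*) = \theta, so Chebyshev's inequality
% applied to the random variable \theta^* reads, for any \varepsilon > 0:
P\bigl(|\theta^* - \theta| < \varepsilon\bigr)
    \;\ge\; 1 - \frac{D(\theta^*)}{\varepsilon^{2}} .
% If D(\theta^*) \to 0 as n \to \infty, the right-hand side tends to 1.
% A probability cannot exceed 1, hence
\lim_{n \to \infty} P\bigl(|\theta^* - \theta| < \varepsilon\bigr) = 1 .
```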

Let us consider which sample characteristics best estimate the general mean and the general variance in the sense of unbiasedness, efficiency, and consistency.

Suppose a discrete general population is studied with respect to some quantitative feature $X$.

The general mean $\bar{x}_G$ is the arithmetic mean of the values of the feature in the general population. It is calculated by the formula:

§ $\bar{x}_G = \frac{x_1 + x_2 + \dots + x_N}{N}$, if all the values $x_1, x_2, \dots, x_N$ of the feature in a general population of size $N$ are different;

§ $\bar{x}_G = \frac{x_1 N_1 + x_2 N_2 + \dots + x_k N_k}{N}$, if the values $x_1, x_2, \dots, x_k$ of the feature have frequencies $N_1, N_2, \dots, N_k$ respectively, and $N_1 + N_2 + \dots + N_k = N$. That is, the general mean is the weighted mean of the values of the feature, with weights equal to the corresponding frequencies.

Comment: Suppose the general population of size $N$ contains objects with different values $x_1, x_2, \dots, x_N$ of the feature $X$. Imagine that one object is drawn at random from this population. The probability that an object with, for example, the value $x_1$ is drawn is obviously $1/N$. Any other object can be drawn with the same probability. Thus the value of the feature can be regarded as a random variable $X$ whose possible values $x_1, x_2, \dots, x_N$ all have the same probability $1/N$. In this case the mathematical expectation is easy to find:

$M(X) = x_1 \cdot \frac{1}{N} + x_2 \cdot \frac{1}{N} + \dots + x_N \cdot \frac{1}{N} = \frac{x_1 + x_2 + \dots + x_N}{N} = \bar{x}_G$.

So, if we regard the studied feature $X$ of the general population as a random variable, then the mathematical expectation of the feature is equal to the general mean of this feature: $M(X) = \bar{x}_G$. We reached this conclusion assuming that all objects of the general population have different values of the feature. The same result is obtained if the general population contains several objects with the same value of the feature.
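As a small numerical check of this conclusion, the sketch below computes the general mean in three ways: directly, from frequencies, and as the mathematical expectation of the value of one randomly drawn object. The data and the function names are hypothetical, and integer values are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def general_mean(values):
    """Arithmetic mean of all values of the feature in the general population."""
    return Fraction(sum(values), len(values))

def general_mean_from_frequencies(freq):
    """Weighted mean: freq maps each distinct value x_i to its frequency N_i."""
    total = sum(freq.values())                      # N = N_1 + ... + N_k
    return Fraction(sum(x * count for x, count in freq.items()), total)

def expectation_of_random_draw(values):
    """M(X) when each of the N objects is drawn with probability 1/N."""
    N = len(values)
    return sum(x * Fraction(1, N) for x in values)

# Hypothetical general population in which some values repeat.
values = [2, 4, 4, 4, 5, 5, 7, 9]
freq = {2: 1, 4: 3, 5: 2, 7: 1, 9: 1}

print(general_mean(values))
print(general_mean_from_frequencies(freq))
print(expectation_of_random_draw(values))
```

All three computations give the same number, including in the case where several objects share the same value of the feature.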

Extending this result to a general population with a continuous distribution of the feature $X$, we define the general mean as the mathematical expectation of the feature: $\bar{x}_G = M(X)$.

Suppose that, in order to study the general population with respect to the quantitative feature $X$, a sample of size $n$ is drawn.

The sample mean $\bar{x}_S$ is the arithmetic mean of the values of the feature in the sample. It is calculated by the formula:

§ $\bar{x}_S = \frac{x_1 + x_2 + \dots + x_n}{n}$, if all the values $x_1, x_2, \dots, x_n$ of the feature in a sample of size $n$ are different;

§ $\bar{x}_S = \frac{x_1 n_1 + x_2 n_2 + \dots + x_k n_k}{n}$, if the values $x_1, x_2, \dots, x_k$ of the feature in the sample have frequencies $n_1, n_2, \dots, n_k$ respectively, and $n_1 + n_2 + \dots + n_k = n$. That is, the sample mean is the weighted mean of the values of the feature, with weights equal to the corresponding frequencies.

Comment: The sample mean found from a single sample is obviously a definite number. If other samples of the same size are drawn from the same general population, the sample mean will change from sample to sample. Thus the sample mean can be regarded as a random variable, and we can therefore speak of the distributions (theoretical and empirical) of the sample mean and of the numerical characteristics of this distribution, in particular of the mathematical expectation and the variance of the sampling distribution.

Further, if the general mean is unknown and has to be estimated from sample data, the sample mean is taken as its estimate; it is an unbiased and consistent estimate (we leave the proof of this statement to the reader). It follows that if sample means are found from several samples of sufficiently large size drawn from the same general population, they will be approximately equal to one another. This is the property of stability of sample means.
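For readers who want to check the statement about unbiasedness and consistency, one possible derivation is sketched below. It assumes that the observations $X_1, \dots, X_n$ are independent and have the same distribution as the feature $X$, with $M(X) = \bar{x}_G$ and a finite variance $D(X) = \sigma^2$ (the symbol $\sigma^2$ is introduced here only for this sketch).

```latex
% Unbiasedness: the expectation of a sum is the sum of expectations.
M(\bar{X}) = M\!\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right)
           = \frac{1}{n}\sum_{i=1}^{n} M(X_i)
           = \frac{1}{n}\cdot n\,\bar{x}_G = \bar{x}_G .

% Variance: for independent observations the variances add.
D(\bar{X}) = \frac{1}{n^{2}}\sum_{i=1}^{n} D(X_i)
           = \frac{\sigma^{2}}{n} \;\longrightarrow\; 0
           \quad \text{as } n \to \infty .

% An unbiased estimate whose variance tends to zero is consistent,
% so the sample mean is a consistent estimate of the general mean.
```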

It should be noted that if the variances of two general populations are the same, the closeness of the sample mean to the general mean does not depend on the ratio of the sample size to the size of the general population. It depends on the sample size itself: the larger the sample, the less the sample mean differs from the general mean. For example, if 1% of the objects are selected from one population and 4% from another, and the first sample turns out to be larger than the second, then the first sample mean will differ less from its general mean than the second.
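This remark can be illustrated by a small simulation. The sketch below makes several assumptions that are not in the text: both general populations consist of the digits 0 to 9 repeated equally often (so their means and variances coincide exactly), the first has 100000 objects and the second 1000, samples are drawn without replacement, and the random seed is fixed for reproducibility. The function name `mean_abs_error` is hypothetical.

```python
import random

def mean_abs_error(population, n, trials, rng):
    """Average |sample mean - general mean| over repeated samples of size n."""
    general_mean = sum(population) / len(population)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, n)   # sampling without replacement
        total += abs(sum(sample) / n - general_mean)
    return total / trials

rng = random.Random(0)                       # fixed seed, an arbitrary choice
digits = list(range(10))
first = digits * 10_000                      # N = 100000
second = digits * 100                        # N = 1000, same mean and variance

# 1% of the first population is 1000 objects; 4% of the second is 40 objects.
err_first = mean_abs_error(first, n=1000, trials=200, rng=rng)
err_second = mean_abs_error(second, n=40, trials=200, rng=rng)

print("1% sample, n = 1000:", err_first)
print("4% sample, n = 40:  ", err_second)
```

Although the second sample covers a larger share of its population, the first sample is larger in absolute size, and its mean deviates less from the general mean on average.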