22 August 2023

Measures of central tendency

    A measure of central tendency is a single value, lying within the range of the data, that represents a group of individual values in a simple and concise manner. It gives a quick understanding of the general size of the individuals in the group.

Some definitions of measures of central tendency:

    " An average may be thought of as a measure of central value " - John I. Griffin

    " The inherent inability of the human mind to grasp in its entirety a large body of numerical data compels us to seek relatively few constants that will adequately describe the data " - R. A. Fisher

    " Averages are statistical constants which enable us to comprehend in a single effort the significance of the whole " - A. L. Bowley

Properties of central tendency:

  • It should be rigidly defined.
  • It should be simple and easy to calculate.
  • It should be based on all the observations.
  • It should be suitable for further mathematical treatment.
  • It should be easily located from the graph.
  • It should not be much affected by the extreme observations.

Mathematical averages:

  1. Arithmetic mean
  2. Geometric mean
  3. Harmonic mean

Positional averages: 

  1. Median
  2. Mode
  3. Quartiles
  4. Quintiles
  5. Octiles
  6. Deciles
  7. Percentiles.

Commercial averages: 

  1. Moving average
  2. Progressive average
  3. Composite average.

The five most commonly used measures of central tendency are:

  1. Arithmetic mean
  2. Median
  3. Mode
  4. Geometric mean
  5. Harmonic mean

Each is explained below.

    

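Arithmetic mean: The arithmetic mean of a set of observations is their sum divided by the number of observations. For n observations x_1, x_2, ..., x_n,

    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

    where i = 1, 2, 3, ..., n.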

Merits for mean:

  • It is rigidly defined.
  • It is easy to calculate.
  • It is based upon all observations.
  • It is amenable to algebraic treatment. 
  • Of all averages, the mean is affected least by fluctuations of sampling. This property is sometimes described by saying that the mean is a stable average.

Demerits for mean:

  • It cannot be determined by inspection, nor can it be located graphically.
  • The mean cannot be used if we are dealing with qualitative characteristics which cannot be measured quantitatively.
  • The mean cannot be obtained if a single observation is missing, lost or illegible, unless we drop it and compute the mean of the remaining values.
  • The mean cannot be calculated if the extreme class is open.
  • In extremely asymmetrical distributions, the mean is usually not a suitable measure of location.

Problems for mean:

1. Calculate the mean of the following data 50, 76, 44, 48, 57, 59, 63, 45, 48, 30.
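
Problem 1 can be worked directly from the definition: the ten observations sum to 520, so the mean is 520 / 10 = 52. A minimal Python sketch of the same arithmetic follows; the variable names are illustrative only.

```python
from statistics import mean

# Observations from Problem 1
data = [50, 76, 44, 48, 57, 59, 63, 45, 48, 30]

# Arithmetic mean: sum of the observations divided by their number
total = sum(data)
n = len(data)
arithmetic_mean = total / n

print(total, n, arithmetic_mean)  # 520 10 52.0

# The standard library gives the same value
assert mean(data) == arithmetic_mean
```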

2. Find the mean of the following data. 

    Given data, 



Median: Median of a distribution is the value of the variable which divides it into two equal parts. It is the value which exceeds and is exceeded by the same number of observations. The median is a positional average.

                        
    
Merits of median:

  • It is rigidly defined.
  • It is simple and easy to calculate.
  • It can be located from the graph.

Demerits of median:

  • It does not depend on all the observations.
  • It is not suitable for further mathematical treatment.
  • In some situations, median is affected by extreme observations.

Problems for median:

1. Find the median for the following frequency distribution.

     Given data,
                                            

      Here N/2 = 47.5. The value of x corresponding to the cumulative frequency just greater than 47.5 is 5, so the median is 5.

2. Find the median for the following distribution.

                   
    Given data,

             
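            The c.f. just greater than N/2 = 19 is 33, and the corresponding class is 500 - 600.

            $$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 500 + \frac{100}{15}(19 - 18) = 506.67$$

            where l = 500 is the lower limit of the median class, h = 100 its width, f = 15 its frequency and c = 18 the cumulative frequency of the preceding class.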
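
The calculation in Problem 2 can be sketched in Python using the usual formula for the median of a continuous frequency distribution, l + (h / f)(N/2 - c). The function name `grouped_median` is hypothetical. The values f = 15 and c = 18 are assumptions: they are the only pair consistent with the stated c.f. of 33, N/2 = 19 and the answer of about 506.67, since the frequency table itself is not reproduced here.

```python
def grouped_median(l, h, f, c, n):
    """Median of a continuous frequency distribution.

    l: lower limit of the median class
    h: width of the median class
    f: frequency of the median class
    c: cumulative frequency of the class preceding the median class
    n: total frequency N
    """
    return l + (h / f) * (n / 2 - c)


# Assumed values (see the note above): median class 500 - 600, N = 38
median = grouped_median(l=500, h=100, f=15, c=18, n=38)
print(round(median, 2))  # 506.67
```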

Mode: The mode is the value which occurs most frequently in a set of observations and around which the other items of the set cluster densely. In other words, the mode is the value of the variable which is predominant in the series.


      
    



Merits of mode:

  • It is simple and easy to calculate.
  • It can be located from the graph.
  • It is not affected by the extreme observations.

Demerits of mode:

  • It is not rigidly defined.
  • It does not depend on all the observations.
  • It is not suitable for further mathematical treatment.

Problems for mode:

1. Find the mode for the following distribution.
                                               
    Given data, 
    
    Here the maximum frequency is 14, thus the mode is 8.
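
For a discrete frequency distribution, the mode is simply the value of x with the largest frequency. The sketch below illustrates this in Python. The frequency table is hypothetical (the original table is not reproduced here); it was chosen only so that the maximum frequency is 14 at x = 8, as in the problem.

```python
# Hypothetical frequency table, NOT the original data:
# chosen so that the largest frequency (14) belongs to x = 8.
values = [6, 7, 8, 9, 10]
freqs = [5, 9, 14, 8, 3]

# The mode is the value whose frequency is the maximum.
max_freq = max(freqs)
mode = values[freqs.index(max_freq)]

print(max_freq, mode)  # 14 8
```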

2. Find the mode for the following distribution.

         Given data,
 
                                   

            

                                                                         = 46.67

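Geometric mean: The geometric mean of a set of n observations is the nth root of their product. For n positive observations x_1, x_2, ..., x_n,

    $$G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$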
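
A short Python sketch of the definition follows. The data are illustrative values, not taken from the problems in these notes. The logarithmic form is included because taking the nth root of a large product directly can overflow, so GM is often computed as the antilog of the mean of the logarithms.

```python
import math
from statistics import geometric_mean

data = [2, 4, 8]  # illustrative values only; all must be positive
n = len(data)

# Direct definition: nth root of the product of the observations
gm_direct = math.prod(data) ** (1 / n)

# Equivalent logarithmic form: exp of the mean of the logs
gm_log = math.exp(sum(math.log(x) for x in data) / n)

# Both agree with the standard library up to floating-point rounding
print(round(gm_direct, 6), round(gm_log, 6), round(geometric_mean(data), 6))
```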


                
Merits of GM:

  • It is rigidly defined.
  • It depends on all the observations.
  • It is not much affected by the extreme observations.

Demerits of GM:

  • The calculation of GM is not simple and easy.
  • It cannot be located from the graph.

Problem for geometric mean:

1. Find the GM of 
 
                    

   
               


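Harmonic mean: The harmonic mean of a number of observations, none of which is zero, is the reciprocal of the AM of the reciprocals of the given values. For n non-zero observations x_1, x_2, ..., x_n,

    $$H = \frac{1}{\dfrac{1}{n} \sum_{i=1}^{n} \dfrac{1}{x_i}}$$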
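
The definition translates directly into Python. The data below are illustrative values, not taken from the problem in these notes.

```python
import math
from statistics import harmonic_mean

data = [1, 2, 4]  # illustrative values only; none may be zero
n = len(data)

# AM of the reciprocals: (1/1 + 1/2 + 1/4) / 3 = 1.75 / 3
mean_of_reciprocals = sum(1 / x for x in data) / n

# HM is the reciprocal of that mean: 3 / 1.75 = 12 / 7
hm = 1 / mean_of_reciprocals

print(round(hm, 4))  # 1.7143
assert math.isclose(hm, harmonic_mean(data))
```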

            
Merits of HM:

  • It is rigidly defined.
  • It depends on all the observations.
  • It is not much affected by large extreme observations, since it gives greater weight to smaller values.

Demerits of HM:

  • It is not simple and easy to calculate.
  • It cannot be located from the graph.

Problem for the HM:

1. Calculate the HM for the following continuous frequency distribution.

    

Quartiles: The values of the variable x corresponding to the (N+1)/4th, (N+1)/2th and 3(N+1)/4th items of an ordered discrete series are the values of Q1, Q2 and Q3 respectively. The position of the required item can easily be found with the help of cumulative frequencies.

Deciles: Similar to quartiles, the value of the variable x corresponding to the i(N+1)/10th item, for i = 1, 2, ..., 9, of an ordered discrete series is Di. The position of the i(N+1)/10th item can be located with the help of cumulative frequencies.

Percentiles: Just like deciles, the value of the variable x corresponding to the i(N+1)/100th item, for i = 1, 2, ..., 99, of an ordered series is Pi. The i(N+1)/100th item in the series can easily be located with the help of cumulative frequencies.
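
The three definitions above share one rule: find the position i(N+1)/k (k = 4, 10 or 100) and read off the value of x whose cumulative frequency first reaches that position. A minimal Python sketch under that reading follows; the function name `partition_value` and the frequency table are hypothetical.

```python
from itertools import accumulate


def partition_value(values, freqs, i, k):
    """Value of the i(N+1)/k-th item of an ordered discrete series.

    k = 4 gives quartiles, k = 10 deciles, k = 100 percentiles.
    """
    n_total = sum(freqs)
    position = i * (n_total + 1) / k
    # Walk the cumulative frequencies until one reaches the position
    for value, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq >= position:
            return value
    return values[-1]


# Hypothetical ordered discrete series: N = 23, c.f. = 3, 8, 16, 20, 23
values = [1, 2, 3, 4, 5]
freqs = [3, 5, 8, 4, 3]

q1 = partition_value(values, freqs, 1, 4)     # position 6    -> x = 2
q2 = partition_value(values, freqs, 2, 4)     # position 12   -> x = 3
q3 = partition_value(values, freqs, 3, 4)     # position 18   -> x = 4
d9 = partition_value(values, freqs, 9, 10)    # position 21.6 -> x = 5
p10 = partition_value(values, freqs, 10, 100) # position 2.4  -> x = 1
print(q1, q2, q3, d9, p10)  # 2 3 4 5 1
```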

21 August 2023

Introduction to Statistics

1. What is statistics?

    Statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data. This definition is due to Sir R. A. Fisher.

2. What is the scope of statistics?

    The scope of statistics extends to planning, economics, mathematics, business, industry, biology, astronomy, medical science, psychology, education, research and war.

3. What are the limitations of statistics?

    a. It is not suited to the study of qualitative phenomena.

    b. The laws of statistics are not exact.

    c. It is liable to be misused.

    d. It does not study individuals.

4. Give some definitions of statistics

    a. " Statistics are the classified facts representing the conditions of the people in a state, specially those facts which can be stated in number or in tables of numbers or in any tabular or classified arrangement " - Webster.

    b. " Statistics are numerical statements of facts in any department of enquiry placed in relation to each other " - Bowley.

    c. " By Statistics we mean quantitative data affected to a marked extent by multiplicity of causes " - Yule and Kendall.

5. What are the main divisions of statistics?

    a. Theoretical statistics or Mathematical statistics

    b. Statistical functions

    c. Descriptive statistics

    d. Inferential statistics

    e. Applied statistics.

6. What are different types of statistical investigation?

    a. Census method

    b. Sample method

7. Define Census method

    It means to include each and every unit or object of the population under reference for enquiry or observation. For example, to know the national income, we have to include every individual or unit which contributes towards the national income.

8. Define Sample method

    It means an investigator has to select some units from the population about which conclusions have to be drawn and take observations on the selected units. The results obtained from sample values are applicable to the population as a whole.

9. What are the main four functions of statistics?

    a. Collection of data

    b. Presentation of data

    c. Analysis of data

    d. Interpretation of data

10. Different methods of collection of data

    a. Direct personal enquiry method

    b. Indirect oral investigation

    c. By filling in schedules

    d. By mailed questionnaire.

    e. By old records

11. Define primary data and secondary data

    a. Primary data are those which are collected directly from the units or individuals and which have never been used for any purpose earlier.

    b. Secondary data are those which have already been collected by some individual or agency and statistically treated to draw certain conclusions. When the same data are used again and analyzed to extract some other information, they are termed secondary data.

12. What are the requisites of reliable data?

    a. It should be consistent.

    b. It should be complete.

    c. It should be accurate.

    d. It should be homogeneous.

13. Name some precautions to take in the planning of a survey

    a. Purpose

    b. Scope of survey

    c. Definition of terms

    d. Stating the hypothesis.

14. What are the characteristics of good questionnaire?

    a. It should be brief.

    b. The questions should be mutually exclusive in nature.

    c. Personal questions should be avoided.

    d. It should not be very lengthy.

    e. It should not take much time to complete.

15. Different kinds of statistical investigation

    a. Survey

    b. Open enquiry

    c. Direct enquiry

    d. Indirect enquiry

    e. Original enquiry

    f. Repetitive enquiry

    g. Regular enquiry

    h. Ad hoc enquiry

    i. Limited enquiry

    j. Extensive enquiry

16. What is statistical regularity?

    The law of statistical regularity states that a reasonably large number of items selected at random from a large group of items will, on the average, be representative of the large group or population. This law is governed by the theory of probability.
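
    The law can be illustrated with a small simulation: the mean of a reasonably large random sample should fall close to the mean of the population it was drawn from. The sketch below is only an illustration under assumed conditions; the population (the integers 1 to 100,000), the sample size and the seed are all hypothetical choices.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: the integers 1 to 100,000
population = list(range(1, 100_001))
population_mean = mean(population)  # 50000.5

# A reasonably large sample selected at random, without replacement
sample = random.sample(population, 10_000)
sample_mean = mean(sample)

# Relative gap between the sample mean and the population mean
relative_error = abs(sample_mean - population_mean) / population_mean
print(population_mean, sample_mean, relative_error)
```

    With a sample of this size, the relative error is expected to be well under a few percent, which is the sense in which the sample is representative of the population.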

17. What are the different sources of statistical errors?

    a. Errors of origin

    b. Errors of inadequacy

    c. Errors of manipulation 

    d. Errors of interpretation

18. State the law of decreasing variation 

    Law of decreasing variation indicates that the variation in a sample tends to reduce as the sample size increases.

19. What is the purpose of statistics?

    Statistics deals with collection of data, classification of data, analysis of data and interpretation of data.

20. Briefly explain primary data

    The information collected by the interviewer, investigator or enumerator from the respondents for the first time is called primary data. Primary data can be collected by any one of the following three methods.

    a. Direct personal interview: the interviewer collects the information directly from the respondent by personal interview. This method is accurate and reliable, but it takes more time and is more expensive than the other methods.

    b. Indirect personal interview: the interviewer collects the information from the respondent through telephonic conversation. This method is used when the respondent is not available in person, for example in the case of an accident.

    c. Mailed questionnaire: the questionnaire is sent to the respondent through mail. The respondent should complete the questionnaire and return it within a fixed duration. The data collected by this method are not as accurate, but the method takes less time and is less expensive.

21. Briefly explain secondary data

    The data collected from already available records are known as secondary data. There are two sources of secondary data.

    a. If the information is collected from magazines or books released by CSO, NSSO, etc., the sources are called Published Sources.

    b. If the information is collected from the diaries or letters of private individuals, the sources are called Unpublished Sources.

 

Measures of Skewness and Measures of Kurtosis

  Measures of Skewness

    Skewness means 'lack of symmetry'. We study skewness to have an idea about the shape of the curve...