23 August 2023

Statistics important points


  1. Statistics are numerical statements of facts, but not all numerical statements are statistics.
  2. The statement "Statistics is both a science and an art" was given by Tippett.
  3. Statistics is a science and an art.
  4. Statistics is the arithmetic of human welfare.
  5. Before analysis, the data should be edited.
  6. The data collected from published reports is known as secondary data.
  7. Statistics is not applicable to a single observation.
  8. Statistics can prove anything.
  9. The word 'Statistics' is used in both the singular and the plural sense.
  10. Relative error may be positive or negative.
  11. A study based on complete enumeration is known as a census survey.
  12. If the actual value of a unit is 315 and its estimated value is 300, then the absolute error is 15.
  13. Statistical results are true on average.
  14. Classification is the grouping of facts that are distinguished by some significant attributes.
  15. Classification can be done according to attributes.
  16. Quantitative classification leads to frequency distribution.
  17. The difference between the upper and lower limit of a class is called class interval.
  18. The average of the upper and lower limits of a class is known as mid-value.
  19. The number of classes depends on the class interval.
  20. Tabulation follows classification.
  21. The headings of the columns of a table are called captions.
  22. A series arranged in accordance with each and every observation is known as individual series.
  23. The graphs of less than type and more than type distributions intersect at median.
  24. Percentage frequency is the relative frequency multiplied by 100.
  25. A series of data with exclusive classes along with the corresponding frequencies is called continuous frequency distribution.
  26. Numerical data presented in descriptive form are called textual presentation.
  27. If the number of students in a school is 200 and the maximum and minimum marks earned are 90 and 10 respectively, then by Sturges' rule the class interval for the distribution of marks is about 9.
  28. If the lower and upper limits of a class are 10 and 40 respectively, the mid-point of the class is 25.0.
  29. In grouped data, an adequate number of classes is preferred.
  30. Class interval is measured as the difference between upper and lower limit.
  31. A grouped frequency distribution with uncertain first or last classes is known as open end distribution.
  32. Data can be well displayed or presented by way of cross classification.
  33. A simple table represents only one factor or variable.
  34. A complex table represents two or more factors.
  35. The column headings of a table are known as captions.
  36. A frequency distribution can be discrete or continuous.
  37. In an individual series, each variate value has frequency one.
  38. An individual series is a particular case of a discrete series.
  39. Frequency of a variable is always an integer.
  40. The data given as 5, 7, 12, 17, 79, 84, 91 form an individual series.
  41. In an ordered series, the data are in ascending or descending order.
  42. Classification is applicable to both quantitative and qualitative data.
  43. Distribution of families according to their size will be classified as quantitative classification.
  44. Year-wise recording of data on food production is called chronological classification.
  45. The figure 16,318.7 rounded to the nearest ten is 16,320.
  46. The figure 35,627 approximated to the nearest hundred by the method of discarding figures is 35,600.
  47. The figure 13.75 rounded to one decimal place is 13.8.
  48. Government cannot do proper planning without the help of statistics.
  49. Statistics is liable to be misused.
  50. Use of statistical methods is most dangerous in the hands of the inexpert.
  51. Diagrams are another form of tabulation.
  52. Histogram and historigram are not the same.
  53. A histogram is not suitable for geographical classification.
  54. Frequency polygon can be drawn with the help of histogram.
  55. Histograms can be drawn only for continuous frequency distribution.
  56. Pie-chart is always circular.
  57. Squares are two-dimensional diagrams.
  58. Cylinders are three-dimensional diagrams.
  59. Pictograms are non-dimensional diagrams.
  60. A straight line in a graph indicates the trend.
  61. The Lorenz curve was initially given by M.O. Lorenz, and it indicates the inequality of distribution of two factors of a population. When the Lorenz curve turns out to be a straight line, it is concluded that the two factors have equal distribution.
  62. Column charts on circular base are just like bar diagrams.
  63. A smoothed frequency polygon is known as frequency curve.
  64. There is no zero-base line in a semi-logarithmic graph since log 0 = - infinity.
  65. Pictograms were originated by Dr. Otto Neurath.
  66. Pictograms are least satisfactory. 
  67. Ratio charts are also known as semi logarithmic charts.
  68. When more than one factor is to be displayed for comparison during various years, multiple bar diagram is suitable.
  69. Line diagram is suitable when there are too many variate values in the frequency distribution.
  70. Deviation bar diagram is suitable for showing the difference between budget provisions and actual expenditure of PWD in the last ten years.
  71. Charts and graphs are the presentation of numerical facts by means of area and other geometrical forms. They make it easier to see the relationship, the trend, and the comparison of values.
  72. The purpose served by diagrams and charts is simple presentation of data.
  73. The choice of a particular chart depends on the nature of the data, the purpose of the data and the type of audience.
  74. Rectilinear co-ordinate chart is also referred to as Cartesian co-ordinate graph and also rectangular graph.
  75. A trilinear chart is used to portray three variables simultaneously. It is shaped like an equilateral triangle.
  76. In an ogive, the points are plotted for the values and their cumulative frequencies. The ogives for the more-than type and less-than type distributions intersect at the median.
  77. When the values in a chronological series are large in magnitude and the variation amongst the values is small, the graph is better drawn with a false base line.
  78. In a bar diagram, the base line is vertical, and the bars are horizontal.
  79. In a column chart, the base line is horizontal, and the bars are vertical.
  80. In case of frequency distribution with classes of unequal widths, the heights of bars of a histogram are proportional to the frequency densities.
  81. Year wise production of rice, wheat and maize for the last ten years can be displayed by simple column chart.
  82. Profit and loss of a firm during various years can be displayed through deviation bar chart.
  83. When the magnitudes are small for some countries and very large for others, it is preferable to portray the data with a broken bar diagram.
  84. Historigram is suitable for time series data.
  85. When we have the number of court cases of different categories and information about number of cases settled, the information can be better portrayed through sliding bar diagram.
  86. To show the maximal and minimal values in a time series, the suitable chart is range curve.
  87. With the help of an ogive, one can determine the deciles, percentiles and median.
  88. Pictograms are the least accurate diagrams; they are generally used by dilettantes and are suitable for data in counts, which are shown by pictures.
  89. If there is an increase in a series at constant rate, the graph will be a straight line from left bottom to right top. If there is a decrease in a series, then a straight line from left top to right bottom.
  90. A semi-logarithmic graph of a series increasing by a constant amount will be a convex upward curve.
  91. The suitable chart to emphasize the difference between two time series, of which one is at a higher level, is band chart.
  92. When there is pronounced skewness, the desirable scale for plotting the frequency distribution is the logarithmic scale.
  93. When there are a large number of values in an individual series, preference for portraying the data goes to line chart.
  94. An alternative chart to pie-chart is rectangular chart.
  95. The graph of the successive points of a distribution joined by straight lines is known in statistical terminology as a frequency polygon.
  96. A deviational or bilateral chart with 100 percent component columns is also known as floating column chart.
  97. The immigration and outmigration of people in a number of countries and also the net migration can be better displayed by gross-deviation column chart.
  98. Common form of rectilinear co-ordinate graph is stratum chart.
  99. An arithmetic chart can have multiple amount scale.
  100. Proportion of males and females in India in different occupations in the year 2000 can most properly be represented by sliding bar diagram.
  101. Mean is a measure of location.
  102. If a constant value 50 is subtracted from each observation of a set, the mean of the set is decreased by 50.
  103. If a constant 5 is added to each observation of a set, the mean is increased by 5.
  104. If each observation of a set is multiplied by 10, the mean of the new set of observations is ten times the original mean.
  105. If each value of a series is multiplied by 10, the median of the coded values is 10 times the original median value.
  106. If each value of a series is multiplied by 10, the mode of the coded values is 10 times the original modal value.
  107. If each observation of a set is divided by 2, then the mean of the new values is half the original mean.
  108. The geometric mean of two observations can be calculated only if both the observations are positive. It is a good measure of central value, and better than other means, when the data are in ratios, proportions, or percentages.
  109. Expenditure during first five months of a year is Rs. 96 per month and during the last seven months is Rs. 120 per month. The average expenditure per month during whole year is Rs. 110 per month.
  110. Average strength of eleven members = 11.0. Average strength of the first six members = 10.5. Average strength of the last six members = 11.5. The strength of the sixth member is 11.0.
  111. The average of the 7 numbers 7, 9, 12, x, 5, 4, 11 is 9. The missing number x is 15.
  112. The mean proportional of 0.16 and 0.01 is 0.04.
  113. A train covered the first 5 km of its journey at a speed of 30 km/h and the next 15 km at a speed of 45 km/h. The average speed of the train was 40 km/h.
  114. The second of two samples has 50 items with mean 15. If the whole group has 150 items with mean 16, the mean of the first sample is 16.5.
  115. For a group of 100 candidates, the mean was found to be 40. Later on it was discovered that a value 45 was misread as 54. The correct mean is 39.91.
  116. A distribution consists of three groups having 40,50 and 60 items with means 20, 26 and 15 respectively. The mean of the distribution is 20.
  117. The average age of 50 students in a bus is 20 years. When the age of conductor is included, the average age is increased by one year. The age of the conductor is 71.
  118. The average of 5 numbers is 40 and the average of another 4 numbers is 50. The average of all numbers taken together is 44.44.
  119. The average temperature of two cities over the first six days of a week is the same. On the seventh day the temperature dropped suddenly in one city, and the average weekly temperatures of the two cities differed by 0.5 °C. If the other city's seventh-day temperature stayed at the six-day average, the seventh-day temperature in the first city was 7 × 0.5 = 3.5 °C below that average.
  120. There were 25 teachers in a school whose mean age was 30 years. A teacher retired at the age of 60 years and a new teacher was appointed in his place. The mean age of teachers in the school was reduced by one year. The age of the new teacher was 35 years.
  121. If the mean of a set of two observations is 9 and its geometric mean is 6, then the harmonic mean of the set is 4.
  122. The mean of two numbers is 6.5 and their geometric mean is 6. The two numbers are 9 and 4.
  123. In a factory there are 60 percent laborers, 30 percent scribes and 10 percent executives. On average, the salary of a laborer is Rs. 1600 p.m., of a scribe Rs. 3000 p.m. and of an executive Rs. 8000 p.m. The average salary of an employee in the factory is Rs. 2660 p.m.
  124. The mean of seven observations is 8. A new observation 16 is added. The mean of the eight observations is 9.
  125. If the sum of N observations is 630 and their mean is 42, then the value of N is 15.
  126. If the two observations are 10 and -10, then their harmonic mean is infinity.
  127. The percentage of values used in case of 10 percent trimmed mean is 80 percent.
  128. The average of 2n natural numbers from 1 to 2n is (2n + 1)/2.
  129. A man goes from his house to his office at the speed of 20 km/h and returns from his office to home at the speed of 30 km/h. His mean speed is 24 km/h.
  130. A frequency distribution having two modes is said to be bimodal.
  131. If modal value is not clear in a distribution, it can be ascertained by the method of grouping.
  132. The median of the variate values 11, 7, 6, 9, 12, 15, 19 is 11.
  133. The middle value of an ordered series is called the 50th percentile.
  134. Rs. 600 per day is paid on a research farm to its 50 daily-paid laborers. A worker gets five unpaid holidays in a month. Taking a 30-day month, the average income of a daily-paid laborer is Rs. 300 p.m.
  135. The variate values which divide a series into five equal parts are called quintiles. If they divide it into four equal parts, they are called quartiles, and if into ten equal parts, deciles.
  136. If we plot the more than type and less than type frequency distributions of the same set of data, their graphs intersect at the point which is known as median.
  137. In a class test, 40 students out of 50 passed with mean marks 6.0 and the overall average of class marks was 5.5. The average marks of students who failed were 3.5.
  138. The average marks of section A are 65 and that of section B are 70. The average of both the sections combined is 67. The ratio of number of students of section A to B is 3:2.
  139. Geometric mean can be used to find out the growth rate of GNP.
  140. Weighted mean gives a higher value than unweighted mean if larger items have higher weights and small items have lower weights.
  141. If for values of X, mean = 25, harmonic mean = 9, then the geometric mean is 15.
  142. The second quartile of the following set of data, 0, 1, -1, -2, 6, 4, 5, 8, 12, 10, 11 is 5.
  143. A person has been deputed to find the average income of factory employees. To give a correct picture of the average income, he should find the weighted mean.
  144. If y = 2x - 11 and median of y is 49, then the median of x is 30.
  145. If y = 3x + 6 and mode of y is 66, then the mode of x is 20.
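
Several of the numerical points above (109, 113, 115, 116, 121, 122 and 129) can be checked with a few lines of code. The following is only a minimal sketch: the function names are hypothetical helpers introduced here, and for point 129 a distance of 1 km each way is assumed, since any equal distance gives the same answer.

```python
from math import sqrt


def combined_mean(sizes, means):
    """Pooled mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def average_speed(distances, speeds):
    """Average speed = total distance / total time."""
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return sum(distances) / total_time


def corrected_mean(n, reported_mean, wrong_value, right_value):
    """Mean of n observations after replacing one misread value."""
    return (n * reported_mean - wrong_value + right_value) / n


def harmonic_from_am_gm(am, gm):
    """For two positive numbers AM * HM = GM**2, so HM = GM**2 / AM."""
    return gm ** 2 / am


def two_numbers_from_am_gm(am, gm):
    """Recover two positive numbers from their AM and GM.

    They are the roots of t**2 - (2 * AM) * t + GM**2 = 0.
    """
    total, product = 2 * am, gm ** 2
    root = sqrt(total ** 2 - 4 * product)
    return (total + root) / 2, (total - root) / 2


print(round(combined_mean([5, 7], [96, 120]), 2))           # point 109
print(round(average_speed([5, 15], [30, 45]), 2))           # point 113
print(round(corrected_mean(100, 40, 54, 45), 2))            # point 115
print(round(combined_mean([40, 50, 60], [20, 26, 15]), 2))  # point 116
print(round(harmonic_from_am_gm(9, 6), 2))                  # point 121
print(two_numbers_from_am_gm(6.5, 6))                       # point 122
print(round(average_speed([1, 1], [20, 30]), 2))            # point 129, 1 km each way assumed
```

The results are 110, 40, 39.91, 20 and 4 for the first five calls, the pair 9 and 4 (the only two numbers with mean 6.5 and geometric mean 6), and 24 for the return journey.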

22 August 2023

Measures of central tendency

    A measure of central tendency is a single value within the range of the data which represents a group of individual values in a simple and concise manner. Because it lies within the range of the data, it gives a quick understanding of the general size of the individuals in the group.

Some definitions for Measures of central tendency:

    " An average may be thought of as a measure of central value " - John I. Griffin

    " The inherent inability of the human mind to grasp in its entirety a large body of numerical data compels us to seek relatively few constants that will adequately describe the data " - R. A. Fisher

    " Averages are statistical constants which enable us to comprehend in a single effort the significance of the whole " - A. L. Bowley

Properties of a good measure of central tendency:

  • It should be rigidly defined.
  • It should be simple and easy to calculate.
  • It should be based on all the observations.
  • It should be suitable for further mathematical treatment.
  • It should be easily located from the graph.
  • It should not be much affected by the extreme observations.

Mathematical averages:

  1. Arithmetic mean
  2. Geometric mean
  3. Harmonic mean

Positional averages: 

  1. Median
  2. Mode
  3. Quartiles
  4. Quintiles
  5. Octiles
  6. Deciles
  7. Percentiles.

Commercial averages: 

  1. Moving average
  2. Progressive average
  3. Composite average.

The five most commonly used measures of central tendency are:

  1. Arithmetic mean 
  2. Median
  3. Mode
  4. Geometric mean
  5. Harmonic mean 
    Each of these is explained below.

    

Arithmetic mean: The arithmetic mean of a set of n observations is their sum divided by the number of observations.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

                   where i = 1, 2, 3, ..., n

Merits for mean:

  • It is rigidly defined.
  • It is easy to calculate.
  • It is based upon all observations.
  • It is amenable to algebraic treatment. 
  • Of all averages, mean is affected least by fluctuations of sampling. This property is sometimes described by saying that mean is a stable average.

Demerits for mean:

  • It cannot be determined by inspection, nor can it be located graphically.
  • Mean cannot be used if we are dealing with qualitative characteristics which cannot be measured quantitatively.
  • Mean cannot be obtained if a single observation is missing, lost, or illegible, unless we drop it and compute the mean of the remaining values.
  • Mean cannot be calculated if an extreme class is open.
  • In an extremely asymmetrical distribution, mean is usually not a suitable measure of location.

Problems for mean:

1. Calculate the mean of the following data 50, 76, 44, 48, 57, 59, 63, 45, 48, 30.
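
Problem 1 can be worked directly from the definition. The sketch below uses a hypothetical helper `arithmetic_mean` and also illustrates, on the same data, that subtracting a constant from every observation lowers the mean by that constant.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    if not values:
        raise ValueError("mean of an empty set is undefined")
    return sum(values) / len(values)


data = [50, 76, 44, 48, 57, 59, 63, 45, 48, 30]
print(arithmetic_mean(data))  # 520 / 10 = 52.0

# Subtracting 50 from every observation lowers the mean by 50.
shifted = [x - 50 for x in data]
print(arithmetic_mean(shifted))  # 20 / 10 = 2.0
```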

2. Find the mean of the following data. 

    Given data, 



Median: Median of a distribution is the value of the variable which divides it into two equal parts. It is the value which exceeds and is exceeded by the same number of observations. The median is a positional average.

                        
    
Merits of median:

  • It is rigidly defined.
  • It is simple and easy to calculate.
  • It can be located from the graph.

Demerits of median:

  • It does not depend on all the observations.
  • It is not suitable for further mathematical treatment.
  • In some situations, median is affected by extreme observations.

Problems for median:

1. Find the median for the following frequency distribution.

     Given data,
                                            

      Here N/2 = 47.5. The value of x corresponding to the cumulative frequency just greater than 47.5 is 5, so the median is 5.

2. Find the median for the following distribution.

                   
    Given data,

             
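            Here N/2 = 19. The c.f. just greater than 19 is 33, and the corresponding class is 500 - 600.

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 500 + \frac{100}{15}(19 - 18) \approx 506.67$$

            where l = 500 is the lower limit of the median class, h = 100 its width, f = 15 its frequency, and c = 33 - 15 = 18 the cumulative frequency of the preceding class.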
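
The grouped-data median in problem 2 can be sketched in code using the usual interpolation formula Median = l + (h/f)(N/2 - c). The frequency table is not shown above, so the table below is hypothetical: it is chosen only to agree with the figures that are stated (N/2 = 19, a c.f. of 33 for the class 500 - 600, and a result of about 506.67). The function name is also an assumption.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution with equal class widths.

    Uses Median = l + (h / f) * (N/2 - c), where l, h, f belong to the
    median class and c is the cumulative frequency before it.
    """
    total = sum(freqs)
    if total == 0:
        raise ValueError("the distribution has no observations")
    half = total / 2
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("lower_limits and freqs do not match")


# Hypothetical table: classes 300-400, 400-500, 500-600, 600-700.
lower_limits = [300, 400, 500, 600]
freqs = [6, 12, 15, 5]          # cumulative frequencies: 6, 18, 33, 38

print(round(grouped_median(lower_limits, 100, freqs), 2))  # 506.67
```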

Mode: Mode is the value which occurs most frequently in a set of observations and around which the other items of the set cluster densely. In other words, mode is the value of the variable which is predominant in the series.


      
    



Merits of mode:

  • It is simple and easy to calculate.
  • It can be located from the graph.
  • It is not affected by the extreme observations.

Demerits of mode:

  • It is not rigidly defined.
  • It does not depend on all the observations.
  • It is not suitable for further mathematical treatment.

Problems for mode:

1. Find the mode for the following distribution.
                                               
    Given data, 
    
    Here the maximum frequency is 14, thus the mode is 8.

2. Find the mode for the following distribution.

         Given data,
 
                                   

            

                                                                         = 46.67
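
For a continuous frequency distribution the mode is usually found by interpolation within the modal class, Mode = l + h(f1 - f0)/(2f1 - f0 - f2), where f1 is the frequency of the modal class and f0, f2 are the frequencies of the classes before and after it. The sketch below assumes that formula; the table in it is hypothetical and is constructed only so that the result equals the 46.67 quoted above, because the data for problem 2 are not shown.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of a continuous frequency distribution with equal class widths.

    Uses Mode = l + h * (f1 - f0) / (2*f1 - f0 - f2) for the modal class.
    """
    i = freqs.index(max(freqs))                      # first class with the highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0   # class after the modal class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode is not defined by this formula for a flat distribution")
    return lower_limits[i] + width * (f1 - f0) / denominator


# Hypothetical table: classes 20-30, 30-40, 40-50, 50-60, 60-70.
lower_limits = [20, 30, 40, 50, 60]
freqs = [5, 8, 12, 10, 4]

print(round(grouped_mode(lower_limits, 10, freqs), 2))  # 40 + 10 * 4 / 6 = 46.67
```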

Geometric mean: The geometric mean of a set of n observations is the nth root of their product, $G = (x_1 x_2 \cdots x_n)^{1/n}$.
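
As a small illustration of this definition, the sketch below computes the geometric mean in two ways: directly as the nth root of the product, and through logarithms, which is the form normally used in hand calculation. Both function names are hypothetical, and positive observations are assumed.

```python
from math import exp, log, prod


def geometric_mean(values):
    """n-th root of the product of n positive observations."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty set of positive values")
    return prod(values) ** (1 / len(values))


def geometric_mean_via_logs(values):
    """Same quantity computed as exp(mean of log x), avoiding a very large product."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty set of positive values")
    return exp(sum(log(v) for v in values) / len(values))


print(round(geometric_mean([4, 9]), 4))           # sqrt(36) = 6.0
print(round(geometric_mean_via_logs([4, 9]), 4))  # 6.0
print(round(geometric_mean([0.16, 0.01]), 4))     # sqrt(0.0016) = 0.04
```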


                
Merits of GM:

  • It is rigidly defined.
  • It depends on all the observations.
  • It is not much affected by the extreme observations.

Demerits of GM:

  • The calculation of GM is not simple and easy.
  • It cannot be located from the graph.

Problem for geometric mean:

1. Find the GM of 
 
                    

   
               


Harmonic mean: The harmonic mean of a number of observations, none of which is zero, is the reciprocal of the AM of the reciprocals of the given values, $H = n \big/ \sum_{i=1}^{n} (1/x_i)$.
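
The definition translates into code almost word for word. In the sketch below, `harmonic_mean` is a hypothetical helper; the optional `weights` argument is an addition made here so that the same function also covers a frequency distribution (weights = frequencies) or unequal distances in an average-speed problem.

```python
def harmonic_mean(values, weights=None):
    """Reciprocal of the (weighted) arithmetic mean of the reciprocals."""
    if not values or any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined for empty data or zero values")
    if weights is None:
        weights = [1] * len(values)
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


# Equal distances at 20 km/h and 30 km/h.
print(round(harmonic_mean([20, 30]), 2))                   # 24.0

# 5 km at 30 km/h and 15 km at 45 km/h: distances act as weights.
print(round(harmonic_mean([30, 45], weights=[5, 15]), 2))  # 40.0
```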

            
Merits of HM:

  • It is rigidly defined.
  • It depends on all the observations.
  • It is not much affected by large extreme observations.

Demerits of HM:

  • It is not simple or easy to calculate.
  • It cannot be located from the graph.

Problem for the HM:

1. Calculate the HM for the following continuous frequency distribution.

    

Quartiles: The values of the variable x corresponding to the (N+1)/4th, (N+1)/2th and 3(N+1)/4th items of an ordered discrete series are the values of Q1, Q2, and Q3 respectively. The position of the required item can easily be found with the help of cumulative frequencies.

Deciles: Similar to quartiles, the value of the variable x corresponding to the i(N+1)/10th item, for i = 1, 2, ..., 9, of an ordered discrete series is Di. The position of the i(N+1)/10th item can be located with the help of cumulative frequencies.

Percentiles: Just like deciles, the value of the variable x corresponding to the i(N+1)/100th item, for i = 1, 2, ..., 99, of an ordered series is Pi. The i(N+1)/100th item in the series can easily be located with the help of cumulative frequencies.
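
Quartiles, deciles and percentiles all follow one rule: find the i(N+1)/k-th item of the ordered series, with k = 4, 10 or 100. The sketch below applies that rule to a discrete series by walking through the cumulative frequencies. The function name is hypothetical, and two choices in it are assumptions rather than part of the definitions above: a fractional position is taken as the next item (no interpolation), and a position beyond the last item returns the largest value.

```python
from collections import Counter


def partition_value(data, i, parts):
    """Value at the i*(N+1)/parts-th item of the ordered discrete series."""
    if not data:
        raise ValueError("the series is empty")
    counts = Counter(data)                  # frequency of each distinct value
    position = i * (len(data) + 1) / parts
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]         # running cumulative frequency
        if cumulative >= position:
            return value
    return max(counts)                      # position beyond the last item (assumed clamp)


series = [0, 1, -1, -2, 6, 4, 5, 8, 12, 10, 11]   # N = 11

print(partition_value(series, 1, 4))     # Q1: 3rd item  -> 0
print(partition_value(series, 2, 4))     # Q2: 6th item  -> 5
print(partition_value(series, 3, 4))     # Q3: 9th item  -> 10
print(partition_value(series, 5, 10))    # D5: 6th item  -> 5
print(partition_value(series, 50, 100))  # P50: 6th item -> 5
```

The second quartile, fifth decile and fiftieth percentile coincide with the median, as expected.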

21 August 2023

Introduction to Statistics

1. What is statistics?

    Statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data. This definition was given by Sir R. A. Fisher.

2. What is the scope of statistics?

    Statistics is applied in planning, economics, mathematics, business, industry, biology, astronomy, medical science, psychology, education, research, and war.

3. What are the limitations of statistics?

    a. It is not suited to the study of qualitative phenomena.

    b. The laws of statistics are not exact.

    c. It is liable to be misused.

    d. It does not study individuals.

4. Give some definitions of statistics.

    a. " Statistics are the classified facts representing the conditions of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement " - Webster.

    b. " Statistics are numerical statements of facts in any department of enquiry placed in relation to each other " - Bowley.

    c. " By Statistics we mean quantitative data affected to a marked extent by multiplicity of causes" - Yule and Kendall.

5. What are the main divisions of statistics?

    a. Theoretical statistics or Mathematical statistics

    b. Statistical functions

    c. Descriptive statistics

    d. Inferential statistics

    e. Applied statistics.

6. What are different types of statistical investigation?

    a. Census method

    b. Sample method

7. Define Census method

    It means to include each and every unit or object of the population under reference for enquiry or observation. For example, to know the national income, we have to include every individual or unit which contributes towards the national income.

8. Define Sample method

    It means an investigator has to select some units from the population about which conclusions have to be drawn and take observations on the selected units. The results obtained from sample values are applicable to the population as a whole.

9. What are the four main functions of statistics?

    a. Collection of data

    b. Presentation of data

    c. Analysis of data

    d. Interpretation of data

10. Different methods of collection of data

    a. Direct personal enquiry method

    b. Indirect oral investigation

    c. By filling in schedules

    d. By mailed questionnaire.

    e. By old records

11. Define primary data and secondary data

    a. Primary data are those which are collected from the units or individuals directly and these data have never been used for any purpose earlier.

    b. Secondary data are those which have already been collected by some individual or agency and statistically treated to draw certain conclusions. When the same data are used again and analyzed to extract some other information, they are termed secondary data.

12. What are the requisites of reliable data?

    a. It should be consistent.

    b. It should be complete.

    c. It should be accurate.

    d. It should be homogeneous.

13. Name some precautions to take in the planning of a survey

    a. Purpose

    b. Scope of survey

    c. Definition of terms

    d. Stating the hypothesis.

14. What are the characteristics of good questionnaire?

    a. It should be brief.

    b. The questions should be mutually exclusive in nature.

    c. Personal questions should be avoided.

    d. It should not be very lengthy.

    e. It should not take much time to complete.

15. Different kinds of statistical investigation

    a. Survey

    b. Open enquiry

    c. Direct enquiry

    d. Indirect enquiry

    e. Original enquiry

    f. Repetitive enquiry

    g. Regular enquiry

    h. Ad hoc enquiry

    i. Limited enquiry

    j. Extensive enquiry

16. What is statistical regularity?

    The law states that a reasonably large number of items selected at random from a large group of items will, on the average, be representative of the large group or population. This law is governed by the theory of probability.
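
    The law can be illustrated with a small simulation. The sketch below is an assumption-laden example, not a proof: the population (the numbers 1 to 10000), the sample sizes and the fixed seed are all hypothetical choices. It draws random samples of increasing size and prints how far each sample mean falls from the population mean.

```python
import random
from statistics import mean


def sample_mean(population, n, rng):
    """Mean of n items drawn at random, without replacement, from the population."""
    return mean(rng.sample(population, n))


population = list(range(1, 10_001))   # hypothetical population: 1, 2, ..., 10000
population_mean = mean(population)    # 5000.5
rng = random.Random(2023)             # fixed seed so the run can be repeated

for n in (10, 100, 1000):
    estimate = sample_mean(population, n, rng)
    # Sample size, sample mean, and absolute distance from the population mean.
    print(n, estimate, round(abs(estimate - population_mean), 1))
```

    Larger random samples tend to give means closer to the population mean, although any single run can deviate from that pattern.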

17. What are the different sources of statistical errors?

    a. Errors of origin

    b. Errors of inadequacy

    c. Errors of manipulation 

    d. Errors of interpretation

18. State the law of decreasing variation 

    Law of decreasing variation indicates that the variation in a sample tends to reduce as the sample size increases.

19. What is the purpose of statistics?

    Statistics deals with collection of data, classification of data, analysis of data and interpretation of data.

20. Briefly explain primary data.

    The information collected by the interviewer, investigator, or enumerator from the respondents for the first time is called primary data. Primary data can be collected by any one of the following three methods.

    a. Direct personal interview: the interviewer collects the information directly from the respondent by personal interview. This method is accurate and reliable, but it takes more time and is more expensive than the other methods.

    b. Indirect personal interview: the interviewer collects the information from the respondent through telephonic conversation. This method is used when the respondent is not available in person, for example in the case of accidents.

    c. Mailed questionnaire: the questionnaire is sent to the respondent by mail. The respondent should complete the questionnaire and return it within a fixed duration. The data collected by this method are less accurate, but the method takes less time and is less expensive.

21. Briefly explain secondary data.

    The data which are collected from already available records are known as secondary data. There are two sources of secondary data.

    a. If the information is collected from the magazines or books released by CSO, NSSO, etc., these are called the Published Sources.

    b. If the information is collected from diaries, letters, or private individuals, these are called the Unpublished Sources.

 

Measures of Skewness and Measures of Kurtosis

  Measures of Skewness     To say, skewness means 'lack of symmetry'. We study skewness to have an idea about the shape of the curve...