14 September 2023

Problems on measures of central tendency


  1. The following numbers give the weights of 55 students of a class. Prepare a suitable frequency table:  42, 74, 40, 60, 82, 115, 41, 61, 75, 83, 63, 53, 110, 76, 84, 50, 67, 65, 78, 77, 56, 95, 68, 69, 104, 80, 79, 79, 54, 73, 59, 81, 100, 66, 49, 77, 90, 84, 76, 42, 64, 69, 70, 80, 72, 50, 79, 52, 103, 96, 51, 86, 78, 94, 71.
          (a) Draw the histogram and frequency polygon of the above data. 
          (b) For the above weights, prepare a cumulative frequency table and draw the less than ogive.

   Sol: Given the weights of 55 students, we can prepare a frequency table by categorizing the data into class intervals. Let's use intervals of size 10 to create the frequency table.

  • Firstly, find the minimum and maximum weight to determine the range. MIN = 40, MAX = 115.
  • Let's choose class intervals of size 10: 40-49, 50-59, 60-69, etc.
  • Count the number of students in each interval.
    Frequency table:
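    The tally described in the steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `weights`, `START`, `WIDTH` and `class_frequencies` are chosen here and are not part of the original solution. It assumes integer weights and inclusive classes 40-49, 50-59, and so on. The running totals of the frequencies are the cumulative frequencies needed in part (b).

```python
from itertools import accumulate

# The 55 weights exactly as listed in the problem statement.
weights = [
    42, 74, 40, 60, 82, 115, 41, 61, 75, 83, 63, 53, 110, 76, 84,
    50, 67, 65, 78, 77, 56, 95, 68, 69, 104, 80, 79, 79, 54, 73,
    59, 81, 100, 66, 49, 77, 90, 84, 76, 42, 64, 69, 70, 80, 72,
    50, 79, 52, 103, 96, 51, 86, 78, 94, 71,
]

START = 40   # lower limit of the first class (the minimum weight)
WIDTH = 10   # class width chosen in the solution


def class_frequencies(values, start, width):
    """Count integer values in the classes start..start+width-1, and so on."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return counts


freqs = class_frequencies(weights, START, WIDTH)
cum_freqs = list(accumulate(freqs))  # "less than" cumulative frequencies

for k, (f, cf) in enumerate(zip(freqs, cum_freqs)):
    low = START + k * WIDTH
    print(f"{low}-{low + WIDTH - 1}: f = {f}, c.f. = {cf}")
```

    The last cumulative frequency should equal the number of students, which is a quick check that no observation was missed.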
                                               
   

              (a) Histogram: 

    

              (a) Frequency polygon:

     

             (b)  Cumulative frequency table:

Interval     Frequency    Cumulative Frequency
40-49            5                 5
50-59            8                13
60-69           10                23
70-79           15                38
80-89            8                46
90-99            4                50
100-109          3                53
110-119          2                55

     (b) Ogive curve:


    2. What are the points to be borne in mind in the formation of a frequency table?
        Choosing appropriate class intervals, form a frequency table for the following data: 10.2, 0.5, 5.2, 6.1, 3.1, 6.7, 8.9, 5.4, 3.6, 9.2, 6.1, 7.3, 2.0, 1.3, 6.4, 8.0, 4.3, 4.7, 12.4, 8.6, 13.1, 3.2, 9.5, 7.6, 4.0, 5.1, 8.1, 1.1, 11.5, 3.1, 6.8, 7.0, 8.2, 2.0, 3.1, 6.5, 11.2, 12.0, 5.1, 10.9, 11.2, 8.5, 2.3, 3.4, 5.2, 10.7, 4.9, 6.2.


Sol:  When forming a frequency table, there are several points that should be borne in mind:
 
  •  Understand the purpose: The table should be suitable for the objective of the analysis. For instance, detailed classifications may be required for an in-depth study, while broader categories may suffice for an overview.
  • Minimum and Maximum values: Determine the range of the data by identifying the minimum and maximum values. This helps in defining the classes.
  • Class intervals: They should be uniform, that is, of equal width, unless the nature of the data necessitates variable widths. Choose a suitable class width based on the range and the number of observations; the width should be neither too small nor too large. The starting point of the first class interval is often a convenient number slightly less than or equal to the smallest observation.
  • The number of classes typically ranges from 5 to 20. The rule 2^k > n, where n is the number of observations and k is the smallest number of classes satisfying the inequality, can be a good starting point.
  • The class intervals should be distinct and should not overlap.
  • If there are exceptionally high or low values, consider using open-ended classes such as "less than" or "more than".
  • Determine the class mid-points, boundaries, and limits. This helps in plotting graphs and in further analysis.
  • Use tally marks on the raw data to count the frequency of each class.
  • If required, include a column for cumulative frequency.
  Frequency table (each class includes its lower limit and excludes its upper limit):

  Interval        Frequency
  0.5 - 3.0           6
  3.0 - 5.5          15
  5.5 - 8.0          10
  8.0 - 10.5          9
  10.5 - 13.0         7
  13.0 - 15.5         1
  Total              48
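  The 2^k > n rule and the tally for this problem can be checked with a short Python sketch. The variable names are illustrative, and the sketch assumes classes of width 2.5 starting at 0.5 that include the lower limit and exclude the upper limit. Values are converted to whole tenths so that class boundaries are not affected by floating-point rounding.

```python
# The 48 observations exactly as listed in the problem statement.
data = [
    10.2, 0.5, 5.2, 6.1, 3.1, 6.7, 8.9, 5.4, 3.6, 9.2, 6.1, 7.3,
    2.0, 1.3, 6.4, 8.0, 4.3, 4.7, 12.4, 8.6, 13.1, 3.2, 9.5, 7.6,
    4.0, 5.1, 8.1, 1.1, 11.5, 3.1, 6.8, 7.0, 8.2, 2.0, 3.1, 6.5,
    11.2, 12.0, 5.1, 10.9, 11.2, 8.5, 2.3, 3.4, 5.2, 10.7, 4.9, 6.2,
]
n = len(data)

# Smallest k such that 2**k > n: a starting point for the number of classes.
k = 1
while 2 ** k <= n:
    k += 1

# Width suggested by the rule; in practice it is rounded to a convenient value.
suggested_width = (max(data) - min(data)) / k

# Tally with the convenient choice start = 0.5, width = 2.5 (in tenths: 5 and 25).
START_TENTHS, WIDTH_TENTHS = 5, 25
tenths = [round(x * 10) for x in data]
n_classes = (max(tenths) - START_TENTHS) // WIDTH_TENTHS + 1
counts = [0] * n_classes
for t in tenths:
    counts[(t - START_TENTHS) // WIDTH_TENTHS] += 1

print("n =", n, "k =", k, "suggested width =", round(suggested_width, 2))
for i, c in enumerate(counts):
    low = (START_TENTHS + i * WIDTH_TENTHS) / 10
    print(f"{low} - {low + WIDTH_TENTHS / 10}: {c}")
```

  Comparing `sum(counts)` with `n` shows whether the chosen classes cover every observation, including the largest value 13.1.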

 

3. The following table shows the distribution of the number of students per teacher in 750 colleges:

  Students:  1, 4, 7, 10, 13, 16, 19, 22, 25, 28
  Frequency: 7, 46, 165, 195, 189, 89, 28, 19, 9, 3

Draw a histogram for the data.




4. Descriptive statistics for age-wise view on electric vehicles.

Row   Age         Frequency   Percentage
1     Below 25    243         68.07
2     25 - 35     55          15.41
3     35 - 45     34           9.52
4     Above 45    25           7.00
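The percentage column is the relative frequency multiplied by 100. A minimal Python sketch of that calculation for the age-wise table is given here; the dictionary name `age_counts` and the rounding to two decimals are choices made for this illustration.

```python
# Frequencies from the age-wise table above.
age_counts = {
    "Below 25": 243,
    "25 - 35": 55,
    "35 - 45": 34,
    "Above 45": 25,
}

total = sum(age_counts.values())  # total number of respondents

# Percentage frequency = (class frequency / total frequency) * 100
percentages = {
    group: round(100 * count / total, 2)
    for group, count in age_counts.items()
}

for group, pct in percentages.items():
    print(f"{group:<10} {age_counts[group]:>4} {pct:>7.2f}")
```

The same calculation applies to the other survey tables in this set, since each is based on the same total number of respondents.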






5. Descriptive statistics for occupation-wise view on electric vehicles

Row   Occupation            Frequency   Percentage
1     Student               203         56.86
2     Government Employee   27           7.56
3     Private Employee      91          25.49
4     Self-employed         24           6.72
5     Unemployed            12           3.36









6. Data visualization for global-warming-wise view on electric vehicles

Row   Global warming      Frequency   Percentage
1     Strongly Agree      135         37.82
2     Agree               182         50.98
3     Disagree            28           7.84
4     Strongly Disagree   12           3.36









7. Data visualization for knowledge about electric vehicles

Row   Knowledge about EV   Frequency   Percentage
1     Newspapers           46          12.89
2     Magazine             57          15.97
3     Television           55          15.41
4     Internet             199         55.74

 









8. Data visualization for charging points for electric vehicles

Row   Charging Points   Frequency   Percentage
1     Each 50 KM        129         36.13
2     Each 100 KM       141         39.50
3     Each 150 KM       54          15.13
4     Each 200 KM       33           9.24












23 August 2023

Statistics important points


  1. Statistics are numerical statements of facts, but all numerical statements are not statistics.
  2. The statement "Statistics is both a science and an art" was given by Tippett.
  3. Statistics is a science and an art.
  4. Statistics is the arithmetic of human welfare.
  5. Before analysis, the data should be edited.
  6. The data collected from published reports is known as secondary data.
  7. Statistics is not applicable to a single observation.
  8. Statistics can prove anything.
  9. The word 'Statistics' is used in both the singular and the plural sense.
  10. Relative error may be positive or negative.
  11. A study based on complete enumeration is known as census survey.
  12. If the actual value of a unit is 315 and its estimated value is 300, then the absolute error is 15.
  13. Statistical results are true on average.
  14. Classification is the grouping of facts that are distinguished by some significant attributes.
  15. Classification can be done according to attributes.
  16. Quantitative classification leads to frequency distribution.
  17. The difference between the upper and lower limit of a class is called class interval.
  18. The average of the upper and lower limits of a class is known as mid-value.
  19. The number of classes depends on the class interval.
  20. Tabulation follows classification.
  21. Headings of the columns of a table are called captions.
  22. A series arranged in accordance with each and every observation is known as individual series.
  23. The graphs of less than type and more than type distributions intersect at median.
  24. Percentage frequency is the relative frequency multiplied by 100.
  25. A series of data with exclusive classes along with the corresponding frequencies is called continuous frequency distribution.
  26. Numerical data presented in descriptive form are called textual presentation.
  27. If the number of students in a school is 200 and the maximum and minimum marks earned are 90 and 10 respectively, then for the distribution of marks the class interval is 9 (by Sturges' rule, range / (1 + 3.322 log10 N) = 80 / 8.64, which is approximately 9).
  28. If the lower and upper limits of a class are 10 and 40 respectively, the mid-point of the class is 25.0.
  29. In grouped data, an adequate number of classes is preferred.
  30. Class interval is measured as the difference between upper and lower limit.
  31. A grouped frequency distribution with uncertain first or last classes is known as open end distribution.
  32. Data can be well displayed or presented by way of cross classification.
  33. A simple table represents only one factor or variable.
  34. A complex table represents two or more factors.
  35. The column headings of a table are known as captions.
  36. A frequency distribution can be discrete or continuous.
  37. In an individual series, each variate value has frequency one.
  38. An individual series is a particular case of discrete series.
  39. Frequency of a variable is always an integer.
  40. The data given as 5, 7, 12, 17, 79, 84, 91 form an individual series.
  41. In an ordered series, the data are in ascending or descending order.
  42. Classification is applicable to both quantitative and qualitative data.
  43. Distribution of families according to their size will be classified as quantitative classification.
  44. Year wise recording of data of food production will be called Chronological classification.
  45. The figure 16,318.7 rounded to the nearest ten is 16,320.
  46. The figure 35,627 approximated to the thousands place by the method of discarding figures is 35,000.
  47. The figure 13.75 rounded to one decimal place is 13.8.
  48. Government cannot do proper planning without the help of statistics.
  49. Statistics is liable to be misused.
  50. Use of statistical methods is most dangerous in the hands of the inexpert.
  51. Diagrams are another form of tabulation.
  52. Histogram and historigram are not the same.
  53. A histogram is not suitable for geographical classification.
  54. Frequency polygon can be drawn with the help of histogram.
  55. Histograms can be drawn only for continuous frequency distribution.
  56. Pie-chart is always circular.
  57. Squares are two-dimensional diagrams.
  58. Cylinders are three-dimensional diagrams.
  59. Pictograms are non-dimensional diagrams.
  60. A straight line in a graph indicates the trend.
  61. The Lorenz curve was initially given by M.O. Lorenz, and it indicates the inequality of distributions of two factors of a population. When the Lorenz curve turns out to be a straight line, it is concluded that the two factors have equal distribution.
  62. Column charts on circular base are just like bar diagrams.
  63. A smoothed frequency polygon is known as frequency curve.
  64. There is no zero-base line in a semi-logarithmic graph since log 0 = - infinity.
  65. Pictograms were originated by Dr. Otto Neurath.
  66. Pictograms are least satisfactory. 
  67. Ratio charts are also known as semi logarithmic charts.
  68. When more than one factor is to be displayed for comparison during various years, multiple bar diagram is suitable.
  69. Line diagram is suitable when there are too many variate values in the frequency distribution.
  70. Deviation bar diagram is suitable for showing the difference between budget provisions and actual expenditure of PWD in the last ten years.
  71. Charts and graphs are the presentation of numerical facts by means of area and other geometrical forms. They make it easier to see relationships and trends and to compare values.
  72. The purpose served by diagrams and charts is simple presentation of data.
  73. The choice of a particular chart depends on the nature of the data, the purpose of the data and the type of audience.
  74. Rectilinear co-ordinate chart is also referred to as Cartesian co-ordinate graph and also rectangular graph.
  75. Trilinear chart is used to portray three variables simultaneously. It is shaped like an equilateral triangle.
  76. In an ogive curve, the points are plotted for the values and cumulative frequencies. Ogive for more than type and less than type distributions intersect at median.
  77. When the values are large in magnitude in a chronological series and the variation amongst values is small, a graph is better drawn by choosing a false base line.
  78. In a bar diagram, the base line is vertical, and the bars are horizontal.
  79. In a column chart, the base line is horizontal, and the bars are vertical.
  80. In case of frequency distribution with classes of unequal widths, the heights of bars of a histogram are proportional to the frequency densities.
  81. Year wise production of rice, wheat and maize for the last ten years can be displayed by simple column chart.
  82. Profit and loss of a firm during various years can be displayed through deviation bar chart.
  83. When for some countries, the magnitudes are small and for other, the magnitudes are very large, to portray the data, it is preferred to construct broken bar diagram.
  84. Historigram is suitable for time series data.
  85. When we have the number of court cases of different categories and information about number of cases settled, the information can be better portrayed through sliding bar diagram.
  86. To show the maximal and minimal values in a time series, the suitable chart is range curve.
  87. With help of ogive curve, one can determine the deciles, percentiles and median.
  88. Pictograms are the least accurate, are generally used by dilettantes, and are suitable for data in counts. The data are shown by pictures.
  89. If there is an increase in a series at constant rate, the graph will be a straight line from left bottom to right top. If there is a decrease in a series, then a straight line from left top to right bottom.
  90. A semi-logarithmic graph of a series increasing by a constant amount will be a convex upward curve.
  91. The suitable chart to emphasize the difference between two time series, of which one is at a higher level, is band chart.
  92. When there is a pronounced skewness, the desirable series to plot the frequency distribution is logarithmic scale.
  93. When there are a large number of values in an individual series, preference for portraying the data goes to line chart.
  94. An alternative chart to pie-chart is rectangular chart.
  95. The graph of the successive points of a distribution joined by straight lines in statistical terminology is known as frequency polygon.
  96. A deviational or bilateral chart with 100 percent component columns is also known as floating column chart.
  97. The immigration and outmigration of people in a number of countries and also the net migration can be better displayed by gross-deviation column chart.
  98. Common form of rectilinear co-ordinate graph is stratum chart.
  99. An arithmetic chart can have multiple amount scale.
  100. Proportion of males and females in India in different occupations in the year 2000 can most properly be represented by sliding bar diagram.
  101. Mean is a measure of location.
  102. If a constant value 50 is subtracted from each observation of a set, the mean of the set is decreased by 50.
  103. If a constant 5 is added to each observation of a set, the mean is increased by 5.
  104. If each observation of a set is multiplied by 10, the mean of the new set of observations is ten times the original mean.
  105. If each value of a series is multiplied by 10, the median of the coded values is 10 times the original median value.
  106. If each value of a series is multiplied by 10, the mode of the coded values is 10 times the original modal value.
  107. If each observation of a set is divided by 2, then the mean of the new values is half the original mean.
  108. Geometric mean of two observations can be calculated only if both the observations are positive. It is better than other means when the data are in ratios or percentages. It is a good measure of central value if the data are in ratios or proportions.
  109. Expenditure during first five months of a year is Rs. 96 per month and during the last seven months is Rs. 120 per month. The average expenditure per month during whole year is Rs. 110 per month.
  110. Average strength of eleven members = 11.0. Average strength of the first six members = 10.5. Average strength of the last six members = 11.5. The strength of the sixth member is 11.0.
  111. The average of the 7 number 7, 9, 12, x, 5, 4, 11 is 9. The missing number x is 15.
  112. The mean proportional of 0.16 and 0.01 is 0.04.
  113. A train covered the first 5 km of its journey at a speed of 30 km/h and the next 15 km at a speed of 45 km/h. The average speed of the train was 40 km/h.
  114. The second of the two samples have 50 items with mean 15. If the whole group has 150 items with mean 16, the mean of the first sample is 16.5.
  115. For a group of 100 candidates, the mean was found to be 40. Later on it was discovered that a value 45 was misread as 54. The correct mean is 39.91.
  116. A distribution consists of three groups having 40,50 and 60 items with means 20, 26 and 15 respectively. The mean of the distribution is 20.
  117. The average age of 50 students in a bus is 20 years. When the age of conductor is included, the average age is increased by one year. The age of the conductor is 71.
  118. The average of 5 numbers is 40 and the average of another 4 numbers is 50. The average of all numbers taken together is 44.44.
  119. The average temperature of two cities on first six days of a week is same. The temperature dropped in one city all of a sudden on the seventh day of the week. The average weekly temperature of the two cities differed by 0.5 Celsius. The difference between the six days average daily temperature of two cities and the seventh day temperature of the other city is 2.0.
  120. There were 25 teachers in a school whose mean age was 30 years. A teacher retired at the age of 60 years and a new teacher was appointed in his place. The mean age of teachers in the school was reduced by one year. The age of the new teacher was 35 years.
  121. If the mean of a set of two observations is 9 and its geometric mean is 6, then the harmonic mean of the set of observations is 4.
  122. The mean of two numbers is 6.5 and their geometric mean is 6. The two numbers are 9 and 4.
  123. In a factory there are 60 percent laborers. 30 percent scribes and 10 percent executives. On average, the salary of a laborer is Rs. 1600 p.m, of a scribe Rs.3000 p.m and that of an executive is Rs. 8000 p.m. The average salary of an employee in the factory is Rs. 2660 p.m.
  124. The mean of seven observations is 8. A new observation 16 is added. The mean of eight observation is 9.
  125. If the sum of N observations is 630 and their mean is 42, then the value of N is 15.
  126. If the two observations are 10 and -10, then their harmonic mean is infinity.
  127. The percentage of values used in case of 10 percent trimmed mean is 80 percent.
  128. The average of 2n natural numbers from 1 to 2n is (2n + 1)/2.
  129. A man goes from his house to his office at the speed of 20 km/h and returns from his office to home at the speed of 30 km/h. His mean speed is 24 km/h.
  130. A frequency distribution having two modes is said to be bimodal.
  131. If modal value is not clear in a distribution, it can be ascertained by the method of grouping.
  132. The median of the variate values 11, 7, 6, 9, 12, 15, 19 is 11.
  133. The middle value of an ordered series is called the 50th percentile.
  134. Rs. 600 per day is paid on a research farm to its 50 daily paid laborers. A worker gets five unpaid holidays in a month. The average income of a daily paid laborer is Rs. 300 p.m.
  135. The variate values which divide a series into five equal parts are called quintiles; those which divide it into four equal parts are called quartiles, and those which divide it into ten equal parts are called deciles.
  136. If we plot the more than type and less than type frequency distributions of the same set of data, their graphs intersect at the point which is known as median.
  137. In a class test, 40 students out of 50 passed with mean marks 6.0 and the overall average of class marks was 5.5. The average marks of students who failed were 3.5.
  138. The average marks of section A are 65 and that of section B are 70. The average of both the sections combined is 67. The ratio of number of students of section A to B is 3:2.
  139. Geometric mean can be used to find out the growth rate of GNP.
  140. Weighted mean gives a higher value than unweighted mean if larger items have higher weights and small items have lower weights.
  141. If for values of X, mean = 25, harmonic mean = 9, then the geometric mean is 15.
  142. The second quartile of the following set of data, 0, 1, -1, -2, 6, 4, 5, 8, 12, 10, 11 is 5.
  143. A person has been deputed to find the average income of factory employees. To provide a correct picture of average income, he should find out the weighted mean.
  144. If y = 2x - 11 and median of y is 49, then the median of x is 30.
  145. If y = 3x + 6 and mode of y is 66, then the mode of x is 20.
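
Several of the numerical statements above can be checked directly. The following Python sketch reworks items 113, 116, 121 and 129; the helper names `combined_mean` and `average_speed` are introduced here for illustration only and do not come from the notes.

```python
from math import isclose


def combined_mean(sizes, means):
    """Mean of pooled groups: sum of n_i * mean_i divided by sum of n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def average_speed(distances, speeds):
    """Total distance divided by total time (a distance-weighted harmonic mean)."""
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return sum(distances) / total_time


# Item 116: groups of 40, 50 and 60 items with means 20, 26 and 15.
item_116 = combined_mean([40, 50, 60], [20, 26, 15])

# Item 113: 5 km at 30 km/h, then 15 km at 45 km/h.
item_113 = average_speed([5, 15], [30, 45])

# Item 129: equal distances at 20 km/h and 30 km/h (the distance cancels out).
item_129 = average_speed([1, 1], [20, 30])

# Item 121: for two observations GM^2 = AM * HM, so HM = GM^2 / AM.
item_121 = 6 ** 2 / 9

print(item_116, item_113, item_129, item_121)
print(isclose(item_113, 40), isclose(item_129, 24))
```

The average-speed cases show why the harmonic mean, not the arithmetic mean, is the appropriate average for speeds over fixed distances.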

22 August 2023

Measures of central tendency

    A measure of central tendency is a single value within the range of the data which represents a group of individual values in a simple and concise manner, so that we get a quick understanding of the general size of the individuals in the group.

Some definitions of measures of central tendency:

    "An average may be thought of as a measure of central value" - John I. Griffin

    "The inherent inability of the human mind to grasp in its entirety a large body of numerical data compels us to seek relatively few constants that will adequately describe the data" - R. A. Fisher

    "Averages are statistical constants which enable us to comprehend in a single effort the significance of the whole" - A. L. Bowley

Properties of a good measure of central tendency:

  • It should be rigidly defined.
  • It should be simple and easy to calculate.
  • It should depend on all the observations.
  • It should be suitable for further mathematical treatment.
  • It should be easily located from the graph.
  • It should not be much affected by the extreme observations.
Mathematical averages:
  1.  Arithmetic mean
  2. Geometric mean
  3. Harmonic mean

Positional averages: 

  1. Median
  2. Mode
  3. Quartiles
  4. Quintiles
  5. Octiles
  6. Deciles
  7. Percentiles.

Commercial averages: 

  1. Moving average
  2. Progressive average
  3. Composite average.

The five measures of central tendency that are used most commonly are:

  1. Arithmetic mean
  2. Median
  3. Mode
  4. Geometric mean
  5. Harmonic mean
    Each of these is explained below.

    

Arithmetic Mean: The arithmetic mean of a set of observations is their sum divided by the number of observations.

                   $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

                   Where i = 1, 2, 3, ...... n

Merits for mean:

  • It is rigidly defined.
  • It is easy to calculate.
  • It is based upon all observations.
  • It is amenable to algebraic treatment. 
  • Of all averages, mean is affected least by fluctuations of sampling. This property is sometimes described by saying that mean is a stable average

    Demerits for mean:

  • It cannot be determined by inspection, nor can it be located graphically.
  • Mean cannot be used if we are dealing with qualitative characteristics which cannot be measured quantitatively.
  • Mean cannot be obtained if a single observation is missing or lost or illegible unless we drop it out and compute the mean of the remaining values.
  • Mean cannot be calculated if an extreme class is open-ended.
  • In extremely asymmetrical distribution, usually mean is not a suitable measure of location.
Problems for mean:

1. Calculate the mean of the following data 50, 76, 44, 48, 57, 59, 63, 45, 48, 30.
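
    As a check on Problem 1, the definition (sum of the observations divided by their number) can be applied directly. This is a minimal Python sketch; the variable names are illustrative, and the standard-library `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

# Data from Problem 1.
data = [50, 76, 44, 48, 57, 59, 63, 45, 48, 30]

total = sum(data)              # sum of the observations
n = len(data)                  # number of observations
arithmetic_mean = total / n    # definition of the arithmetic mean

print(total, n, arithmetic_mean)

# Cross-check against the standard library implementation.
assert arithmetic_mean == mean(data)
```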

2. Find the mean of the following data. 

    Given data, 



Median: Median of a distribution is the value of the variable which divides it into two equal parts. It is the value which exceeds and is exceeded by the same number of observations. The median is a positional average.

                        
    
   Merits of median: 
  • It is rigidly defined.
  • It is simple and easy to calculate.
  • It can be located from the graph.
    Demerits of median:
  • It does not depend on all the observations.
  • It is not suitable for further mathematical treatment.
  • In some situations, median is affected by extreme observations.
Problems for median:

1. Find the median for following frequency distribution.

     Given data,
                                            

      Here N/2 = 47.5. The value of x corresponding to the cumulative frequency just greater than 47.5 is 5, so the median is 5.

2. Find the median for following distribution

                   
    Given data,

             
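            Here N/2 = 19. The c.f. just greater than 19 is 33, and the corresponding class is 500 - 600.

            $\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 500 + \frac{100}{15}(19 - 18) \approx 506.67$

            Here l = 500 is the lower limit of the median class, h = 100 is its width, f = 15 is its frequency and c = 18 is the c.f. of the preceding class (the values of f and c are those consistent with the c.f. of 33 and the result above).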
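
            The usual grouped-data median formula, Median = l + (h/f)(N/2 - c), can be sketched in Python as below. The function name `grouped_median` is introduced here for illustration. Because the data table for this problem is not reproduced in these notes, the inputs f = 15 and c = 18 are assumptions: they are the values consistent with the stated c.f. of 33 for the class 500 - 600, N/2 = 19 and the stated result.

```python
def grouped_median(l, h, f, c, n):
    """Median of a grouped distribution.

    l: lower limit of the median class
    h: width of the median class
    f: frequency of the median class
    c: cumulative frequency of the class preceding the median class
    n: total frequency N
    """
    return l + (h / f) * (n / 2 - c)


# Assumed inputs (see the note above): N = 38 so that N/2 = 19.
median = grouped_median(l=500, h=100, f=15, c=18, n=38)
print(round(median, 2))
```

            The median class is always the class whose cumulative frequency is the first to exceed N/2, so c + f must equal that cumulative frequency (18 + 15 = 33 here).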

Mode: Mode is the value which occurs most frequently in a set of observations and around which the other items of the set cluster densely. In other words, mode is the value of the variable which is predominant in the series.


      
    



Merits of mode:
  • It is simple and easy to calculate.
  • It can be located from the graph.
  • It is not affected by the extreme observations.
Demerits of mode:
  • It is not rigidly defined.
  • It does not depend on all the observations.
  • It is not suitable for further mathematical treatment.
Problems for mode:

1. Find the mode for following
                                               
    Given data, 
    
    Here the maximum frequency is 14, thus the mode is 8.

2. Find the mode for the following distribution.

         Given data,
 
                                   

            

                                                                         = 46.67

Geometric mean: The geometric mean of a set of n observations is the nth root of their product.

                $G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$


                
    Merits for GM:
  • It is  rigidly defined.
  • It depends on all the observations.
  • It is not affected by the extreme observations.
    Demerits of GM:
  • The calculation of GM is not simple and easy.
  • It cannot be located from the graph
Problem of geometric mean:

1. Find the GM of 
 
                    

   
               


Harmonic mean: The harmonic mean of a number of observations, none of which is zero, is the reciprocal of the AM of the reciprocals of the given values.

            $H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
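
    To see how the three mathematical averages compare, the sketch below computes the arithmetic, geometric and harmonic means with Python's standard `statistics` module. The two values 4 and 9 are sample numbers chosen for this illustration. For positive values the three means always satisfy AM >= GM >= HM, and for exactly two values GM^2 = AM x HM.

```python
from math import isclose
from statistics import mean, geometric_mean, harmonic_mean

values = [4, 9]  # illustrative sample values

am = mean(values)            # sum of the values divided by n
gm = geometric_mean(values)  # nth root of the product of the values
hm = harmonic_mean(values)   # reciprocal of the mean of the reciprocals

print(am, round(gm, 4), round(hm, 4))

# Ordering of the three means for positive values (tolerance for float rounding).
print(am >= gm - 1e-9 and gm >= hm - 1e-9)

# For two observations only: GM squared equals AM times HM.
print(isclose(gm ** 2, am * hm))
```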

            
    Merits of HM:
  • It is rigidly defined.
  • It depends on all the observations.
  • It is not affected by the extreme observations.
Demerits of HM:
  • It is not simple or easy to calculate.
  • It cannot be located from the graph.
Problem for the HM:

1. Calculate the HM for the following continuous frequency distribution.

    

Quartiles: The values of the variable x corresponding to the (N+1)/4th, (N+1)/2th and 3(N+1)/4th items of an ordered discrete series are the values of Q1, Q2, and Q3 respectively. The position of the required item can easily be adjudged with the help of cumulative frequencies.

Deciles: Similar to quartiles, the values of the variable x corresponding to the i(N+1)/10th item for i = 1,2,.......9 of an ordered discrete series is Di. The position of i(N+1)/10th item can be located with the help of cumulative frequencies.

Percentiles: Just like deciles, the value of the variable x corresponding to the i(N+1)/100th item for i = 1, 2, ...., 99 of an ordered series is Pi. The i(N+1)/100th item in the series can easily be placed with the help of cumulative frequencies.
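
The position rules above for quartiles, deciles and percentiles share one pattern: the i-th of k partition values sits at item number i(N+1)/k of the ordered series. A minimal Python sketch follows, using the eleven values from item 142 of the important points. The function names are illustrative, and the linear interpolation used when the position is fractional is an assumption of this sketch, since the notes do not say how to handle that case.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in an ordered list.

    Fractional positions are interpolated linearly between neighbours
    (an assumption of this sketch); positions outside 1..N are clamped.
    """
    n = len(ordered)
    if position <= 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part of the position
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def partition_value(values, i, parts):
    """i-th partition value when the series is divided into `parts` equal parts."""
    ordered = sorted(values)
    return value_at_position(ordered, i * (len(ordered) + 1) / parts)


data = [0, 1, -1, -2, 6, 4, 5, 8, 12, 10, 11]

q1, q2, q3 = (partition_value(data, i, 4) for i in (1, 2, 3))  # quartiles
d1 = partition_value(data, 1, 10)                              # first decile
p50 = partition_value(data, 50, 100)                           # 50th percentile

print(q1, q2, q3, d1, p50)
```

With N = 11 the quartile positions are the 3rd, 6th and 9th items, so no interpolation is needed for them; the first decile falls at position 1.2 and is interpolated.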

Measures of Skewness and Measures of Kurtosis

  Measures of Skewness: Skewness means 'lack of symmetry'. We study skewness to have an idea about the shape of the curve...