
Saturday, November 23, 2013

Another Take on Averages (Tuklas Vol. 15, No. 3 - November 23, 2013)

ANOTHER TAKE ON AVERAGES

One day, Carlo and Hanz decided to perform an experiment. They each had an opaque box in which they placed a red ball and a white ball. They would then draw from their respective boxes, record the color of the drawn ball, then place the ball back. They could draw as many times as they preferred. The goal was to have the highest percentage of red balls drawn.

Carlo's strategy was to draw significantly more than Hanz, and it paid off. In their first session, Carlo drew the red ball 7 out of 19 times, for a success rate of 0.368. Hanz made only 11 attempts and drew the red ball 4 times, for a success rate of 0.364.

For the second session, Hanz made sure he would draw more times than Carlo, but things still did not go Hanz's way. Carlo drew the red ball 17 times out of 39 (43.6%), while Hanz drew it 32 times out of 74 (43.2%).

Hanz almost admitted defeat, but the statistician in him found hope by taking the aggregate draws as follows:


PLAYER   1ST SESSION      2ND SESSION      TOTAL
Carlo    7/19  (0.3684)   17/39 (0.4359)   24/58 (0.4138)
Hanz     4/11  (0.3636)   32/74 (0.4324)   36/85 (0.4235)

Using the total data, Hanz argued that he won.
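Hanz's bookkeeping can be reproduced with exact fractions. The following is a minimal sketch; the names `draws`, `rate`, and `aggregate` are illustrative choices, and the counts are those in the table above (4 out of 11 for Hanz's first session).

```python
from fractions import Fraction

# (red draws, total draws) per session, taken from the table above
draws = {
    "Carlo": [(7, 19), (17, 39)],
    "Hanz": [(4, 11), (32, 74)],
}

def rate(successes, trials):
    """Success rate of a single session as an exact fraction."""
    return Fraction(successes, trials)

def aggregate(sessions):
    """Pooled success rate: total successes over total trials."""
    return Fraction(sum(s for s, _ in sessions), sum(t for _, t in sessions))

for player, sessions in draws.items():
    per_session = [round(float(rate(s, t)), 4) for s, t in sessions]
    print(player, per_session, round(float(aggregate(sessions)), 4))
```

Carlo's rate is higher in each session, yet the pooled rate favors Hanz, because the pooled rate weights each session by its number of draws.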

We now ask two questions. First, who ACTUALLY won? And more importantly, how did this happen?

This story is an example of a phenomenon called Simpson's Paradox. It states that when data are separated into several categories, the group with the higher overall success rate may nevertheless have the lower success rate in every one of the categories. In other words, when comparing two groups of data, one group can be ahead when we look at the aggregate, even if that group is behind percentage-wise in all the specific categories. This phenomenon had been observed as early as the 1920's through medical and sociostatistical data. However, Edward H. Simpson was one of the first who explicitly pointed it out in 1951, and it was not until 1972 that a paper formalized its definition.

A CLASSIC EXAMPLE

Before we go deeper into the situations that make up the paradox, let us introduce one of the most classic examples. A popular scenario where Simpson's Paradox occurs comes from a sport: baseball. Although this was not the first occurrence of the paradox on record, it gained the interest of statisticians and baseball junkies alike, especially those who use Sabermetrics (the Math of Baseball). These people, called sabermetricians, use baseball data to analyze the potential of a new player, team chemistry, and even the use of performance-enhancing drugs. Sabermetricians may find this particular set of data interesting, since the presence of this phenomenon will make them rethink their traditional ways of looking at data.

For 1995-1996, the batting averages of Derek Jeter and David Justice (both of whom were very popular at the time) were compared, and the summary is in the next table.


PLAYER    1995              1996              TOTAL
Jeter     12/48   (0.250)   183/582 (0.314)   195/630 (0.310)
Justice   104/411 (0.253)   45/140  (0.321)   149/551 (0.270)

Similar to the Carlo-Hanz experiment, Justice had the better average in each season taken on its own. Yet when we take the aggregate, Jeter is the actual winner. Because of this, baseball junkies should look at both the yearly averages and the aggregate (more commonly called the "career" statistic), and not rely on one or the other alone.
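The reversal can be checked mechanically. The sketch below assumes the commonly cited hits and at-bats for the two seasons (Justice: 104/411 and 45/140, which add up to the 149/551 total; Jeter: 12/48 and 183/582); the helper name `is_reversal` is hypothetical.

```python
from fractions import Fraction

def is_reversal(group_a, group_b):
    """True if group_a has the higher rate in every category but the lower pooled rate."""
    wins_each = all(Fraction(*a) > Fraction(*b) for a, b in zip(group_a, group_b))

    def pooled(group):
        return Fraction(sum(h for h, _ in group), sum(n for _, n in group))

    return wins_each and pooled(group_a) < pooled(group_b)

# (hits, at-bats) for 1995 and 1996
justice = [(104, 411), (45, 140)]
jeter = [(12, 48), (183, 582)]

print(is_reversal(justice, jeter))
```

The same function applies unchanged to the Carlo-Hanz data, since it only needs a list of (successes, trials) pairs per group.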

REASONING

So after the two scenarios, you might be thinking, "Is there something wrong with the computation of the data?", or "Is there something that can be done to make the data make sense?"

To answer these questions, we go back to the data. One of the main reasons for the paradox is the discrepancy in the number of trials. To see this, let us try to visualize what is happening. We assign to the y-axis the number of successful trials per category, and to the x-axis the total number of trials per category. The success rate per category per group can be thought of as the ratio $m = p/q$, with $p$ successes out of $q$ trials done by a particular group. On the Cartesian plane, this can be represented as a vector from the origin to $(q, p)$, whose slope is equal to $m$. We assign trials $a$ and $b$ (in black) to Group A and trials $c$ and $d$ (in blue) to Group B. Trials $a$ and $c$ belong to the same category, as do trials $b$ and $d$. The total success rate for Groups A and B can then be represented by the slope of the vectors $a+b$ and $c+d$, respectively (see figure below).



Let us now compare the slopes. Note that $m_a > m_c$ and $m_b > m_d$. That is, Group A performed better than Group B under each individual category. But upon forming $a+b$ and $c+d$ (using the Parallelogram Law), we can see that $m_{a+b} < m_{c+d}$, despite the fact that $a+b$ is longer than $c+d$. This indicates that the category-by-category comparison looks only at the ratios and ignores the number of trials behind each ratio.
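As a worked instance of the vector picture, take the Carlo-Hanz data with Carlo as Group A and Hanz as Group B, writing each session as the vector (trials, successes). This is only an illustration with the story's numbers (using 4 out of 11 for Hanz's first session); it is not the configuration drawn in the figure, so the remark about vector lengths does not carry over.

```latex
\begin{aligned}
a &= (19, 7), \quad b = (39, 17), \quad a + b = (58, 24), \\
c &= (11, 4), \quad d = (74, 32), \quad c + d = (85, 36), \\
m_a &= \tfrac{7}{19} \approx 0.3684 \;>\; m_c = \tfrac{4}{11} \approx 0.3636, \\
m_b &= \tfrac{17}{39} \approx 0.4359 \;>\; m_d = \tfrac{32}{74} \approx 0.4324, \\
m_{a+b} &= \tfrac{24}{58} \approx 0.4138 \;<\; m_{c+d} = \tfrac{36}{85} \approx 0.4235.
\end{aligned}
```

The slope of a sum of vectors always lies between the slopes of its summands, but where it lands in that range depends on the horizontal lengths. Here $d$ is much longer than $c$, so $m_{c+d}$ is pulled toward the high slope $m_d$, while $m_{a+b}$ is pulled less strongly toward $m_b$.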

Given all of this, is there any way for us to make sense out of the separate data and try to make the conclusion consistent? To do so, we need to do what we call a "normalization process". For the Hanz-Carlo experiment, we will "normalize" the data by rescaling each player's results so that both players have the same number of trials in each session (19 in the first session and 74 in the second). That is, we give their trials weights so that the aggregate percentages reflect the same number of data points. This removes the paradox by removing the inconsistencies that it hinges on. In doing so, we have the following table:


PLAYER   1ST SESSION               2ND SESSION                TOTAL
Carlo    7/19 (0.3684)             (17/39)*(74/74) (0.4359)   39.2564/93 (0.4221)
Hanz     (4/11)*(19/19) (0.3636)   32/74 (0.4324)             38.9091/93 (0.4184)
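The normalized totals can be recomputed as follows. This is a sketch of one reading of the table, assuming each session is rescaled to the larger of the two players' trial counts (19 and 74); the function name `normalized_total` is illustrative.

```python
from fractions import Fraction

# (red draws, total draws) per session
carlo = [(7, 19), (17, 39)]
hanz = [(4, 11), (32, 74)]

def normalized_total(own, other):
    """Pooled rate after rescaling each session to the larger trial count of the two players."""
    successes = Fraction(0)
    trials = 0
    for (s, t), (_, t_other) in zip(own, other):
        common = max(t, t_other)              # assumed common number of trials
        successes += Fraction(s, t) * common  # rescaled (possibly fractional) successes
        trials += common
    return successes / trials

print(round(float(normalized_total(carlo, hanz)), 4))
print(round(float(normalized_total(hanz, carlo)), 4))
```

With equal weights in each session, the aggregate comparison agrees with the session-by-session comparison, and Carlo comes out ahead.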

CONCLUSION

On the practical side, the statistical anomaly shown by Simpson's Paradox is a good reminder to be mindful of the computed numbers. For instance, sabermetrics is a young field, and Simpson's Paradox is a good example of how current methods can be improved.

There is a certain viewpoint from psychology that goes by the mantra "the whole is more than the sum of its parts." Looking at things from this perspective, Simpson's Paradox would be more in line with the viewpoint "the whole is not the same as its parts." That is, we can also say that there are situations wherein the sum or aggregate says something different from the individual parts.

Now that we have a clearer picture of what has happened, we go back to the story of Carlo and Hanz. The question of who won between the two is still open to debate. What about you? Do you believe that the individual sessions have more weight, or that the overall performance is the best indicator?

ABOUT THE AUTHOR:
Victor Andrew Antonio is a Lecturer at the Ateneo de Manila University. He obtained his B.S. in Mathematics at the Ateneo de Manila University in 2012 and is currently taking his M.S. in Mathematics at the same university.

REFERENCES:
[1] Bogomolny, Alexander. Simpson Paradox. Retrieved from http://www.cut-the-knot.org/Curriculum/Algebra/SimpsonParadox.shtml.
[2] Pearl, Judea. Simpson's Paradox: An Anatomy. 1999.
[3] Wagner, Clifford H. Simpson's Paradox in Real Life. The American Statistician, Vol. 36 No. 1 (Feb 1982), 46-48.

OLYMPIAD CORNER
from the Asian Pacific Mathematics Olympiad, 2011

Problem: Let $a, b, c$ be positive integers. Prove that it is impossible for all three of the numbers $a^2+b+c$, $b^2+c+a$, $c^2+a+b$ to be perfect squares.

Solution:
Suppose on the contrary that all three numbers are perfect squares. Since $a^2+b+c$ is a perfect square larger than $a^2$, it follows that $a^2+b+c \ge (a+1)^2$, which is equivalent to $b+c \ge 2a+1$. Using the same argument, we also obtain $c+a \ge 2b+1$ and $a+b \ge 2c+1$. Combining the above inequalities, we have $2(a+b+c) \ge 2(a+b+c)+3$, which results in a contradiction.

This proves that it is impossible to have all of the three numbers to be perfect squares.

PROBLEMS
  1. Two circles have exactly two points A and B in common. Find a straight line L through A such that the circles cut chords of equal lengths out of L. How many solutions can the problem have?
  2. For nonnegative real numbers $a$, $b$, $c$ with $a+b+c=1$, prove that $\sqrt{a+\frac{(b-c)^2}{4}}+\sqrt{b}+\sqrt{c} \le \sqrt{3}$.
  3. If $2n-1$ is a prime number, then for any group of distinct positive integers $a_1, a_2, \ldots, a_n$ there exist $i, j \in \{1, 2, \ldots, n\}$ such that $\frac{a_i+a_j}{(a_i,a_j)} \ge 2n-1$.
We welcome readers to submit solutions to the problems posed above for publication consideration. Solutions may be submitted to the PEM facilitators on the deadline date or online via vantonio1992@gmail.com. Solutions must be preceded by the solver's name, school affiliation and year level. The deadline for submission is 12:30 PM December 7, 2013.

SOLUTIONS
(for November 16, 2013)
  1. Prove that if $a$ and $b$ are two sides of a triangle and $m_c$ is the median drawn to the third side, then $|m_c| \le \frac{|a|+|b|}{2}$. (Taken from Mathematics as Problem Solving by Alexander Soifer)
    (solved by Jan Kendrick Ong [Chang Kai Shek College] and Farrell Eldrian Wu [MGC New Life Academy]; partial credit for Jayson Dwight S. Catindig [Ateneo de Manila HS])

    SOLUTION: 
    Let $O$ be the midpoint of $AB$. Construct $D$ on line $OC$, on the opposite side of $O$ from $C$, such that $|OC|=|OD|$.


    Then the quadrilateral $ACBD$ is a parallelogram, since its diagonals $AB$ and $CD$ bisect each other, and $|BD|=|CA|=|b|$. In addition, by the Triangle Inequality, $|CD|<|CB|+|BD|$, which means $2|m_c|<|a|+|b| \Rightarrow |m_c| \le \frac{|a|+|b|}{2}$.
  2. Prove that, for all integers $n \ge 2$, $\frac{1}{1^2}+\frac{1}{2^2}+\cdots+\frac{1}{n^2} \le 2$.
    (solved by Farrell Eldrian Wu [MGC New Life Academy]; partial credit for Jayson Dwight S. Catindig [Ateneo de Manila HS] and Jan Kendrick Ong [Chang Kai Shek College])

    SOLUTION: 
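    This is equivalent to proving $\frac{1}{2^2}+\cdots+\frac{1}{n^2} \le 1$. Note that $\frac{1}{(i-1)i} \ge \frac{1}{i^2}$ for all $i \ge 2$. This means that $\frac{1}{2^2}+\cdots+\frac{1}{n^2} \le \frac{1}{1\cdot 2}+\frac{1}{2\cdot 3}+\cdots+\frac{1}{(n-1)n} = \left(\frac{1}{1}-\frac{1}{2}\right)+\left(\frac{1}{2}-\frac{1}{3}\right)+\cdots+\left(\frac{1}{n-1}-\frac{1}{n}\right) = 1-\frac{1}{n} = \frac{n-1}{n} \Rightarrow \frac{1}{2^2}+\cdots+\frac{1}{n^2} < 1$.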
  3. Find all solutions of $2^n+7=x^2$ where $n$ and $x$ are integers. (Taken from International Mathematics: Tournament of the Towns, Questions, and Solutions, Tournaments 6 to 10 (1984 to 1988) by P. J. Taylor)
    (solved by Jayson Dwight S. Catindig [Ateneo de Manila HS] and Farrell Eldrian Wu [MGC New Life Academy]; partial credit for Marielle Macasaet [St. Theresa's College] and Jan Kendrick Ong [Chang Kai Shek College])

    SOLUTION: 
    Since we are required to make $x$ an integer, then $2^n$ must be an integer, and so $n \ge 0$. We consider all possible cases when we rewrite the equation modulo 4.

    If $n>1$, then $2^n \equiv 0 \pmod 4$. On the other hand, if $n=1$, then $2^n \equiv 2 \pmod 4$, and finally, if $n=0$, then $2^n \equiv 1 \pmod 4$. We look at those three possibilities and deduce solutions for $2^n+7 \equiv x^2 \pmod 4$. Note that the only possible quadratic residues modulo 4 are 0 and 1.
    • CASE 1: $n>1$.
      We start with $x^2 \equiv 2^n+7 \equiv 3 \pmod 4$, which is impossible.
    • CASE 2: $n=1$. $x^2 \equiv 2^n+7 \equiv 1 \pmod 4$, which is acceptable. Going back to the original equation, it can be seen that $x=\pm 3$.
    • CASE 3: $n=0$. $x^2 \equiv 2^n+7 \equiv 0 \pmod 4$, which is also acceptable. But this means $x=\pm 2\sqrt{2}$, which are not integers.
    Therefore, the only solution is $n=1$, $x=\pm 3$.
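The case analysis for Problem 3 can be sanity-checked by brute force. The sketch below searches only nonnegative $n$ up to an arbitrary bound (200 here, an assumption made for illustration) and reports nonnegative $x$; it supports the modular argument but does not replace it.

```python
from math import isqrt

def solutions(max_n):
    """All (n, x) with 0 <= n <= max_n, x >= 0 and 2**n + 7 == x**2."""
    found = []
    for n in range(max_n + 1):
        value = 2**n + 7
        x = isqrt(value)          # integer square root, exact for big integers
        if x * x == value:
            found.append((n, x))
    return found

print(solutions(200))
```

Each reported pair $(n, x)$ also gives the solution $(n, -x)$.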
ERRATA
The right-hand side of Problem #11 was changed from "3" to "$\sqrt{3}$."

Sunday, November 17, 2013

Equations that Shook the World, Part I (Tuklas Vol. 15, No. 2 - November 16, 2013)

EQUATIONS THAT SHOOK THE WORLD, PART I: $1=0.999\ldots$

Counting may perhaps be our first ever encounter with mathematics. The anatomic "digitus" (i.e., fingers and toes) serves as our most basic counting tool. Perhaps that is the reason we refer to a number $x$ in the set $\{x \mid x \in \mathbb{N},\ 0 \le x \le 9\}$ as a "digit." In turn, that set serves as the basis of our decimal number system.

Part of being human is the desire to transcend limits imposed by our beliefs or by our ignorance. Counting is one of those activities wherein we love to exercise that desire. "Count as far as you possibly can" is perhaps a challenge we subconsciously tackle when we attempt to break world Olympic records, generate as much wealth as we can, or enumerate as many stars as possible on a clear night. The number just seems endless, without bound!

Interestingly enough, the equation $1=0.99999\ldots$ is an elegant way to express the notion of "endlessness" or "unboundedness" in an activity as finite as counting. Yet therein lies the mystery. For the longest time in antiquity, dating back to ancient India and the ancient Greeks (from where our present counting system originates), there had been no formal way to express the concept we now call "infinity." Hence, the infinite was discussed more from a philosophical viewpoint, in the sense of being an ideal one can only strive for. The ancient Greeks defined the circle as a perfect shape which can be made only by creating a polygon with an endless number of sides. In ancient India, infinity had been a monster of innumerable multitude.

Only between the late 19th and early 20th centuries did human society come to grips with the concept of infinity through the efforts of Georg Cantor in his set theory about cardinalities. But at an earlier time, around the 17th century, the proponents of analytic geometry had pieces of the puzzle laid out. That era also saw "infinity" taking on the symbol of a lemniscate "$\infty$" (which looks a lot like a horizontal digit 8). That symbol was formally introduced in 1655 by John Wallis, an English mathematician credited for his marked contributions to calculus. In fact, the German mathematician and co-inventor of calculus Gottfried Leibniz treated $\infty$ as an "ideal quantity" which is by nature different from but seems to resemble the properties of very large numbers.

Undoubtedly, the equation $1=0.99999\ldots$ can be proven in many different ways, none of which would make any sense in the absence of an understanding of the notion of $\infty$. For example, a very common proof starts with denoting $x=0.99999\ldots$. So $10x=9.9999\ldots=9+0.9999\ldots=9+x$. The interpretation made in the last part of the equation would of course only make sense if one considers that the digit 9 repeats itself endlessly towards the right. Finally, the proof is simply a matter of algebra with $10x=9+x$, which of course yields $x=1$. Therefore, one shows that $1=0.99999\ldots$. Truly it is one among a few equations that shook the human world by putting mathematical substance to what only used to be a philosophical ghost - the concept of infinity.
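To see why the endless repetition matters, one can look at what happens when the nines stop. The sketch below (the function name `truncated_nines` is illustrative) computes the exact value of a decimal with finitely many nines and its gap to 1.

```python
from fractions import Fraction

def truncated_nines(n):
    """Exact value of 0.99...9 with n nines after the decimal point."""
    return sum(Fraction(9, 10**k) for k in range(1, n + 1))

for n in (1, 5, 10):
    x = truncated_nines(n)
    print(n, 1 - x)  # the gap to 1 is exactly 1/10**n
```

For any finite number of nines the gap is positive, so the step $10x = 9 + x$ fails for a truncated decimal; the gap shrinks toward zero as more nines are appended, which is what the equation with endlessly many nines expresses.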

Now try typing in $0.99999\ldots$ in your home calculator until it spans from one end of the screen to the other; then, press the [ = ] or [ Enter ] button. What does it say?

ABOUT THE AUTHOR:
Dranreb Earl Juanico is an Assistant Professor at the Ateneo de Manila University. He obtained his B.S., M.S., and Ph.D. in Physics at the University of the Philippines in 2002, 2004 and 2007, respectively.

OLYMPIAD CORNER
from the Iberoamerican Math Olympiad, 1997

Problem: Let $H$ be the orthocenter of the non-equilateral triangle $ABC$ and $O$ its circumcenter. Let the lines $AH$ and $AO$ intersect the circumcircle at the points $M$ and $N$, respectively. Denote by $P$, $Q$, and $R$ the intersection points of the lines $BC$ and $HN$, $BC$ and $OM$, $HQ$ and $OP$, respectively. Prove that $AORH$ is a parallelogram.

Solution: 
We need to show that $AO \parallel HR$ and $OR \parallel AH$.

Let E be the intersection of the segments AM and BC. At this point, we will prove a property relating to the orthocenter.

Claim: $E$ is the midpoint of $HM$, that is, $HE=EM$.
Since $\angle CMA$ and $\angle ABC$ are inscribed angles intercepting the same arc, $m\angle CMA = m\angle ABC = \alpha$. Given that $CD$ and $AE$ are altitudes, then focusing on the quadrilateral $BEHD$, we have $m\angle EHD = 360^\circ-(90^\circ+90^\circ+\alpha) = 180^\circ-\alpha$. Hence it follows that $m\angle CHM = \alpha$. This implies that the two right triangles $\Delta CHE$ and $\Delta CME$ are congruent, and therefore $HE=EM$.
Since $E$ is the midpoint of $HM$, it follows that the right triangles $EQM$ and $EQH$ are congruent, so $m\angle EMQ = m\angle QHE$. Furthermore, $AO=OM$, which implies that $m\angle OAM = m\angle AMO$. Thus $m\angle QHE = m\angle OAM$, and therefore $AO \parallel HR$.
Next, we note that $m\angle AMN = m\angle HEQ = 90^\circ$, hence $MN \parallel EQ$. Moreover, since $E$ is the midpoint of $HM$, then $P$ is the midpoint of $HN$. Since $\Delta HNM$ is a right triangle, $P$ is the center of the circumcircle of $\Delta HNM$. This means that $PM=PN$ and $\Delta PNM$ is isosceles. As a result, $P$ lies on the perpendicular bisector of base $MN$. However, the vertex $O$ of the isosceles triangle $ONM$ also lies on this perpendicular bisector. Thus $OP \perp NM$, which implies $OR \parallel AH$. Hence we have established that $AORH$ is a parallelogram.

PROBLEMS
  1. Prove that if $a$ and $b$ are two sides of a triangle and $m_c$ is the median drawn to the third side, then $|m_c| \le \frac{|a|+|b|}{2}$.
  2. Prove that, for all integers $n \ge 2$, $\frac{1}{1^2}+\frac{1}{2^2}+\cdots+\frac{1}{n^2} \le 2$.
  3. Find all solutions of $2^n+7=x^2$ where $n$ and $x$ are integers.
We welcome readers to submit solutions to the problems posed above for publication consideration. Solutions may be submitted to the PEM facilitators on the deadline date or online via vantonio1992@gmail.com. Solutions must be preceded by the solver's name, school affiliation and year level. The deadline for submission is 12:30 PM November 23, 2013.

SOLUTIONS
(for October 19, 2013)
  1. Show that if n is a perfect square, then the sum of the divisors of n is odd.
    (solved by Jayson Dwight S. Catindig [Ateneo de Manila HS] and Farrell Eldrian Wu [MGC New Life Academy]; partial credit for Hans Jarrett Ong [Chang Kai Shek College])

    SOLUTION: 
    We prove this first for the case when $n$ is odd. If $n$ is odd, then all divisors of $n$ are odd. The pairs of divisors will add up to form even numbers, except for $\sqrt{n}$, an odd number which will come up in the list of divisors only once. Hence, $\sigma(n)=2k+\sqrt{n}$, where $2k$ is the sum of the divisors of $n$ excluding $\sqrt{n}$.

    Now, when $n$ is even, we can write $n$ as $4^k m^2$, where $m$ is odd. Then look at all pairings of divisors. There is an odd number of pairs whose sum is odd. They are $(4^k\cdot 1, m^2), \ldots, (4^k i, \frac{m^2}{i}), \ldots, (4^k m, m), \ldots, (4^k m^2, 1)$, where $i$ runs over the divisors of $m^2$. So, $\sigma(n)$ is odd since we are adding an odd number of odd numbers.
  2. Let $A$ be any set of 12 distinct integers chosen from the arithmetic progression $9, 13, 17, \ldots, 77$. Prove that there must be two distinct integers in $A$ whose sum is 90.
    (solved by Jayson Dwight S. Catindig [Ateneo de Manila HS], Luis Salvador R. Diy [Xavier], Hans Jarett Ong [Chang Kai Shek College] and Farrell Eldrian Wu [MGC New Life Academy])

    SOLUTION: 
    Consider the set $S=\{(9),(13,77),(17,73),\ldots,(41,49),(45)\}$. Note that $|S|=10$. Suppose we choose 9 and 45, so that we can still avoid having a sum of 90. We still have to choose 10 numbers. By the Pigeonhole Principle, there is at least one pair in $S$ wherein both numbers will be chosen, which means there are 2 numbers whose sum is 90.
  3. Let $p(x)$ be a polynomial with integer coefficients satisfying $p(0)=p(1)=2011$. Show that $p$ has no integer zeros.
    (solved by Hans Jarett Ong [Chang Kai Shek College]; partial credit for Farrell Eldrian Wu [MGC New Life Academy])

    SOLUTION: 
    We prove by contradiction.

    Let $p(x)=a_n x^n+a_{n-1}x^{n-1}+\cdots+a_1 x+a_0$, where $a_i \in \mathbb{Z}$. If $p(0)=2011$, then this means $a_0=2011$. At the same time, since $p(1)=2011$, then $a_n+a_{n-1}+\cdots+a_1+a_0=2011 \Rightarrow a_n+a_{n-1}+\cdots+a_1+2011=2011 \Rightarrow a_n+a_{n-1}+\cdots+a_1=0$. Suppose there is an integer $z$ such that $p(z)=0$.

    Then $a_n z^n+a_{n-1}z^{n-1}+\cdots+a_1 z+2011=0 \Rightarrow a_n z^n+a_{n-1}z^{n-1}+\cdots+a_1 z=-2011$. From this, we can see that $z$ cannot be even, because then each term in the sum would be even, while $-2011$ is odd. So we consider the case where $z$ is odd.

    Since we already know that $a_n+a_{n-1}+\cdots+a_1=0$ is even, and multiplying each of the terms by an odd number (a power of $z$) will not change its parity, $a_n z^n+a_{n-1}z^{n-1}+\cdots+a_1 z$ is even. This is a contradiction, since $-2011$ is odd. Hence, $p$ cannot have integer zeros.
  4. Evaluate $\sum_{n=1}^{\infty}\frac{1}{n}\left(\frac{1}{n+1}\right)\left(\frac{1}{n+2}\right)$.
    (solved by Hans Jarett Ong [Chang Kai Shek College] and Farrell Eldrian Wu [MGC New Life Academy])

    SOLUTION: 
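    It can be seen that $\frac{1}{n}\left(\frac{1}{n+1}\right)\left(\frac{1}{n+2}\right)=\frac{\frac{1}{2}}{n}+\frac{-1}{n+1}+\frac{\frac{1}{2}}{n+2}=\frac{1}{2n}-\frac{1}{n+1}+\frac{1}{2(n+2)}$. So $S=\sum_{n=1}^{\infty}\frac{1}{n}\left(\frac{1}{n+1}\right)\left(\frac{1}{n+2}\right)=\sum_{n=1}^{\infty}\left(\frac{1}{2n}-\frac{1}{n+1}+\frac{1}{2(n+2)}\right)=\sum_{n=1}^{\infty}\frac{1}{2n}-\sum_{n=1}^{\infty}\frac{1}{n+1}+\sum_{n=1}^{\infty}\frac{1}{2(n+2)}=\left(\frac{1}{2}+\frac{1}{4}+\cdots\right)-\left(\frac{1}{2}+\frac{1}{3}+\cdots\right)+\left(\frac{1}{6}+\frac{1}{8}+\cdots\right)$. Rearranging some of the addends, we have $S=\left(\frac{1}{2}+\frac{1}{4}+\frac{1}{6}+\frac{1}{8}+\cdots\right)+\left(\frac{1}{6}+\frac{1}{8}+\cdots\right)-\left(\frac{1}{2}+\frac{1}{3}+\cdots\right)=\left[\frac{1}{2}+\frac{1}{4}+\left(\frac{1}{3}+\frac{1}{4}+\cdots\right)\right]-\left(\frac{1}{2}+\frac{1}{3}+\cdots\right)=\frac{1}{4}$.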
  5. Let $ABC$ be an equilateral triangle and $P$ a point in its interior. Consider $XYZ$, the triangle with $XY=PC$, $YZ=PA$, and $ZX=PB$, and $M$ a point in its interior such that $\angle XMY=\angle YMZ=\angle ZMX=120^\circ$. Prove that $XM+YM+ZM=AB$. (Taken from Mathematical Olympiad Challenges by Andreescu and Gelca)
    (solved by Jayson Dwight S. Catindig [Ateneo de Manila HS], Hans Jarett Ong [Chang Kai Shek College] and Farrell Eldrian Wu [MGC New Life Academy])

    SOLUTION: 
    Rotate triangle $ZMY$ through $60^\circ$ counterclockwise about $Z$ to $ZNW$.
    First note that triangles $ZMN$ and $ZYW$ are equilateral. Hence $MN=ZM$ and $YW=YZ$. Now $\angle XMN$ and $\angle MNW$ are straight angles, both being $120^\circ+60^\circ$, so $XW=XM+YM+ZM$. On the other hand, when constructing backwards the triangle $ABC$ from triangle $XYZ$, we can choose $A=W$ and $C=X$. Then the side length of the equilateral triangle is $XW$, which is equal to $XM+YM+ZM$.
  6. An interior point $P$ is chosen in the rectangle $ABCD$ such that $\angle APD+\angle BPC=180^\circ$. Find the sum of the angles $DAP$ and $BCP$. (Taken from Mathematical Olympiad Challenges by Andreescu and Gelca)
    (solved by Marielle Macasaet [St. Theresa's College, QC], Hans Jarett Ong [Chang Kai Shek College] and Farrell Eldrian Wu [MGC New Life Academy])

    SOLUTION: 
    Translate triangle $DCP$ to triangle $ABP'$, as below.
    This way we obtain the quadrilateral $APBP'$, which is cyclic, since $\angle APB+\angle AP'B=360^\circ-\angle APD-\angle BPC=180^\circ$. Let $Q$ be the intersection of $AB$ and $PP'$. Then since the lines $AD$, $PP'$, and $BC$ are parallel, we have $\angle DAP+\angle BCP=\angle APQ+\angle QP'B$. The latter two angles have measures equal to half of the arcs $AP'$ and $BP$ of the circle circumscribed about the quadrilateral $APBP'$. On the other hand, the angle $BQP$, which is right, is measured by half the sum of these two arcs. Hence $\angle DAP+\angle BCP=\angle BQP=90^\circ$.
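Solutions 1, 2, and 4 above lend themselves to quick numerical sanity checks. The sketch below is illustrative only: the helper names are hypothetical, the range of squares tested is an arbitrary choice, and a finite check supports the proofs rather than replacing them.

```python
from fractions import Fraction
from itertools import combinations

def sigma(n):
    """Sum of the positive divisors of n (simple trial division)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

# Problem 1: the divisor sum of every perfect square tested is odd.
squares_ok = all(sigma(k * k) % 2 == 1 for k in range(1, 60))

# Problem 2: every 12-element subset of 9, 13, ..., 77 has two distinct members summing to 90.
progression = range(9, 78, 4)

def has_pair_90(subset):
    chosen = set(subset)
    return any(90 - a in chosen and 90 - a != a for a in chosen)

pigeonhole_ok = all(has_pair_90(s) for s in combinations(progression, 12))

# Problem 4: exact partial sums of 1/(n(n+1)(n+2)).
def partial_sum(N):
    return sum(Fraction(1, n * (n + 1) * (n + 2)) for n in range(1, N + 1))

print(squares_ok, pigeonhole_ok, float(partial_sum(1000)))
```

The partial sums in Problem 4 have the closed form $\frac{1}{4}-\frac{1}{2(N+1)(N+2)}$, which follows from writing each term as $\frac{1}{2}\left[\frac{1}{n(n+1)}-\frac{1}{(n+1)(n+2)}\right]$, so they approach $\frac{1}{4}$ from below.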

Friday, November 8, 2013

PEM session for November 9, 2013 cancelled

In light of supertyphoon Yolanda, tomorrow's PEM session (November 9, 2013) will be cancelled. The Tuklas problems deadline will also be extended accordingly.

We will be discussing when to hold possible make-up sessions next week.

Stay safe everyone! :]