
Saturday, October 24, 2015

Convex Functions and Jensen's Inequality (Tuklas Vol. 17, No. 5 - October 24, 2015)

CONVEX FUNCTIONS AND JENSEN'S INEQUALITY

J. L. W. V. Jensen had the following to say about the beauty of convex functions [4]:
It seems to me that the notion of convex functions is just as fundamental as positive function[s] or increasing function[s]. If [I] am not mistaken in this, the notion ought to find its place in elementary expositions of the theory of real functions.
Convex functions play numerous roles across pure and applied mathematics, most notably in geometry, real analysis, probability theory, and nonlinear optimization. The modern-day appreciation and depth of the theory of convex functions is heavily attributed to Jensen, known for his inequality on convex functions, which has become a staple of any material on the theory of convexity. Jensen's Inequality has since taken on various forms: a discrete form, a form which uses integrals, and a form using probabilities, among others. In relation to Jensen's statement on convex functions, Steele refers to convexity as the "third pillar" of mathematical inequalities, along with positivity and monotonicity [5].

The aim of this article is to provide the reader with initial insights into the theory of convex functions and into the intuition and applications of Jensen's Inequality. As we shall see later, convex functions and Jensen's Inequality have earned a special place as an indispensable technique for dissecting Olympiad-level inequalities. One of the most fundamental consequences of Jensen's Inequality is the generalized Arithmetic Mean-Geometric Mean Inequality. For a more detailed discussion of convex functions, the reader is encouraged to look at the references used for this article.

Before convex functions are introduced, we must first define what a convex set is. A set $S$ is convex if the line segment connecting any two points in $S$ is contained entirely in $S$. Mathematically, $S$ is convex if for any $x, y \in S$ and any $\theta \in [0,1]$, the point $\theta x + (1-\theta)y$, called a convex combination of $x$ and $y$, is a member of $S$.
Figure 1: The two-dimensional sets above are examples of convex and nonconvex sets. The leftmost figure is a convex set, while the center and rightmost figures are not.

A convex function, in turn, is defined on a convex set (an interval is a convex subset of the real line). A function $f$ is said to be convex on the interval $[a,b]$ if for any $x, y \in [a,b]$ (with $x < y$) and any $\theta \in [0,1]$, \[ f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta) f(y). \] Graphically, this definition means that if we take any two points $A$ and $B$ on the graph of a convex function $f$, the line segment $AB$ lies on or above the graph of $y = f(x)$ between them. This characterization also means that the graph of a convex function is U-shaped or bowl-shaped on the interval $[a,b]$. In the language of averages, for a convex function, the function value at a weighted average of $x$ and $y$ is less than or equal to the weighted average of the function values at $x$ and $y$.

Figure 2: The graph of a convex function.

Calculus can also be used to determine whether a function is convex. A differentiable function is convex on an interval $[a,b]$ if and only if every tangent line to the graph of $y = f(x)$ on the interval lies on or below the graph. Furthermore, a twice-differentiable function is convex on $[a,b]$ if and only if its second derivative is nonnegative, that is, $f''(x) \ge 0$ for all $x \in [a,b]$.

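Examples of convex functions include the exponential function $f(x) = e^{ax}$ for any $a \in \mathbb{R}$, powers of absolute values $f(x) = |x|^p$ for $p \ge 1$, and the maximum function $f(x_1, x_2, \ldots, x_n) = \max\{x_1, x_2, \ldots, x_n\}$, a convex function of several variables defined on the convex set $\mathbb{R}^n$.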

Convex functions have a very nice property that makes the entire class attractive in the field of optimization: if $f$ is convex on $[a,b]$, then any local minimum of $f$ on $[a,b]$ is also a global minimum. A strictly convex function (one for which the defining inequality is strict whenever $x \ne y$ and $0 < \theta < 1$) attains its minimum at no more than one point. In general optimization problems one may face several local minima, but with convex functions every local minimum is global, and with strictly convex functions the minimizer, if it exists, is unique. A convex function on $[a,b]$ also attains its maximum value at an endpoint of the interval; that is, the maximum value of $f$ on $[a,b]$ is either $f(a)$ or $f(b)$.

The most important result related to convex functions is Jensen's Inequality. This inequality states that if the real-valued function $f:[a,b] \to \mathbb{R}$ is convex, $x_1, x_2, \ldots, x_n$ are numbers in $[a,b]$, and $p_1, p_2, \ldots, p_n$ are numbers in $[0,1]$ whose sum is $1$ (i.e., $\sum_{i=1}^{n} p_i = 1$), then \[ f(p_1x_1 + p_2x_2 + \cdots + p_nx_n) \le p_1f(x_1) + p_2f(x_2) + \cdots + p_nf(x_n). \] In terms of weighted averages, Jensen's Inequality says that the function evaluated at the weighted average of the $x_i$'s is less than or equal to the weighted average of the function values at the $x_i$'s. Notice also that if $n = 2$, the inequality reduces to the definition of convexity introduced earlier. Equality holds when $x_1 = x_2 = \cdots = x_n$; if $f$ is strictly convex and every $p_i$ is positive, this is the only case of equality.
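To make the inequality concrete, here is a small numerical sketch in Python (our choice of language; the article itself uses no code). It computes the "Jensen gap" $\sum p_i f(x_i) - f(\sum p_i x_i)$ for randomly chosen points and weights, using $e^x$ and $x^2$ as convex examples. The helper names `jensen_gap` and `random_weights` are hypothetical. A random test like this only illustrates the inequality and is not a proof.

```python
import math
import random


def jensen_gap(f, xs, ps):
    """Return p_1 f(x_1) + ... + p_n f(x_n) - f(p_1 x_1 + ... + p_n x_n).

    Jensen's Inequality says this is >= 0 when f is convex and the
    weights p_i are nonnegative with sum 1."""
    weighted_mean = sum(p * x for p, x in zip(ps, xs))
    weighted_values = sum(p * f(x) for p, x in zip(ps, xs))
    return weighted_values - f(weighted_mean)


def random_weights(n, rng):
    """Hypothetical helper: n positive weights that sum to 1."""
    raw = [rng.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(2015)
for _ in range(1000):
    n = rng.randint(2, 6)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    ps = random_weights(n, rng)
    # A small tolerance absorbs floating-point rounding.
    assert jensen_gap(math.exp, xs, ps) >= -1e-9
    assert jensen_gap(lambda t: t * t, xs, ps) >= -1e-9
```

For a concave function such as $\sin$ on $[0, \pi]$, the same gap comes out nonpositive, which is the reversed form of the inequality.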

The power of convex functions and Jensen's Inequality lies in the problem solver's ability to spot a convex function in the given problem. Once the appropriate convex function has been identified, the results above may be used to complete the solution. This first step, however, is sometimes the most difficult part of solving an inequality problem.

To demonstrate the power of convex functions and Jensen's Inequality, we shall prove the generalized Arithmetic Mean-Geometric Mean (AM-GM) Inequality \[ y_1^{p_1} \times y_2^{p_2} \times \cdots \times y_n^{p_n} \le p_1y_1 + p_2y_2 + \cdots + p_ny_n, \] where the $y_i$'s are nonnegative numbers and the $p_i$'s are numbers in the interval $[0,1]$ that sum to $1$.

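It turns out that we only need the exponential function $f(x) = e^x$, which is convex on the entire real line. (We can verify its convexity from its graph, from the definition of convexity, or by noting that $f''(x) = e^x > 0$.) Thus, for any $x_1, x_2, \ldots, x_n \in \mathbb{R}$ and $p_i$'s satisfying the condition above, Jensen's Inequality gives \[ e^{p_1x_1 + p_2x_2 + \cdots + p_nx_n} \le p_1e^{x_1} + p_2e^{x_2} + \cdots + p_ne^{x_n}. \] Using the laws of exponents, we may rewrite the above inequality as \[ (e^{x_1})^{p_1} \times (e^{x_2})^{p_2} \times \cdots \times (e^{x_n})^{p_n} \le p_1e^{x_1} + p_2e^{x_2} + \cdots + p_ne^{x_n}. \] Now let $y_i = e^{x_i}$ for all $i = 1, 2, \ldots, n$. Since every positive real number $y$ can be written as $e^x$ for some real $x$ (namely $x = \ln y$), this covers all positive $y_i$'s, and the previous inequality becomes \[ y_1^{p_1} \times y_2^{p_2} \times \cdots \times y_n^{p_n} \le p_1y_1 + p_2y_2 + \cdots + p_ny_n. \] If some $y_i = 0$ with $p_i > 0$, the left side is $0$ and the inequality holds trivially. Thus, the generalized AM-GM Inequality has been proven.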
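The proof can be traced numerically. In this Python sketch the weighted geometric mean is computed exactly as in the proof, through the substitution $x_i = \ln y_i$, and it is compared with the weighted arithmetic mean for random positive inputs. The function names are our own and purely illustrative. The sketch assumes every $y_i > 0$, since the logarithm is undefined at $0$.

```python
import math
import random


def weighted_gm(ys, ps):
    """y_1^p_1 * ... * y_n^p_n, computed as exp(p_1 ln y_1 + ... + p_n ln y_n).

    This mirrors the substitution y_i = e^{x_i} (x_i = ln y_i) from the
    proof, so it assumes every y_i > 0."""
    return math.exp(sum(p * math.log(y) for p, y in zip(ps, ys)))


def weighted_am(ys, ps):
    """p_1 y_1 + ... + p_n y_n."""
    return sum(p * y for p, y in zip(ps, ys))


rng = random.Random(24)
for _ in range(1000):
    n = rng.randint(2, 6)
    ys = [rng.uniform(0.01, 100.0) for _ in range(n)]
    raw = [rng.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(raw)
    ps = [r / total for r in raw]  # nonnegative weights summing to 1
    # Generalized AM-GM, with a tiny relative tolerance for rounding.
    assert weighted_gm(ys, ps) <= weighted_am(ys, ps) * (1 + 1e-12)
```

For example, with $y = (1, 4)$ and equal weights, the geometric mean is $2$ (up to rounding) while the arithmetic mean is $2.5$.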

Another problem in which convexity and Jensen's Inequality can be used is the following [5]:
In an equilateral triangle with area $A$, the product of any two sides is equal to $4A/\sqrt{3}$. Show that in any triangle with area $A$, there exist two sides whose lengths have a product that is greater than or equal to $4A/\sqrt{3}$.
To solve this, one may find it useful to recall formulas that give the area of a triangle in terms of side lengths and interior angles. In particular, if $a$, $b$, and $c$ are the side lengths of a triangle and $\alpha$, $\beta$, and $\gamma$ are the angles opposite $a$, $b$, and $c$, respectively, then \[ A = \tfrac{1}{2}ab\sin\gamma = \tfrac{1}{2}ac\sin\beta = \tfrac{1}{2}bc\sin\alpha. \] The rest of the solution is left to the reader. As before, to use Jensen's Inequality, we must first identify a suitable convex function in the problem. Which convex function do you think we can use?
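Before attempting a proof, a reader may want to convince themselves that the claim is plausible. This Python sketch is a numerical sanity check, not a solution. It samples random triangles by their angles and uses the area formulas above: since $ab = 2A/\sin\gamma$ and similarly for the other pairs, the largest product of two sides is $2A$ divided by the smallest sine. The function name and the sampling ranges are our own assumptions.

```python
import math
import random


def max_two_side_product(alpha, beta, area=1.0):
    """Largest of ab, bc, ca for a triangle with angles alpha, beta,
    gamma = pi - alpha - beta and the given area.

    From A = (1/2) ab sin(gamma), we get ab = 2A / sin(gamma), and similarly
    for the other pairs, so the largest product uses the smallest sine."""
    gamma = math.pi - alpha - beta
    return 2 * area / min(math.sin(alpha), math.sin(beta), math.sin(gamma))


rng = random.Random(17)
bound = 4 / math.sqrt(3)  # 4A / sqrt(3) with A = 1
for _ in range(10_000):
    alpha = rng.uniform(0.01, math.pi - 0.02)
    beta = rng.uniform(0.005, math.pi - alpha - 0.005)  # keeps gamma > 0
    assert max_two_side_product(alpha, beta) >= bound * (1 - 1e-12)
```

The equilateral triangle ($\alpha = \beta = \pi/3$) meets the bound with equality, as the problem statement indicates.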

In the theory of inequalities, there are three pillars: positivity, monotonicity, and convexity. As we have shown in the foregoing exposition, convex functions recur throughout the study of mathematical relationships between numbers and functions. Their far-reaching beauty, with the aid of Jensen's Inequality, will continue to grace applications in the optimization of functions and in other fields that demand a mastery of mathematical inequalities.

ABOUT THE AUTHOR:
Len Patrick Garces is an Instructor at the Ateneo de Manila University. He obtained his Master of Applied Mathematics major in Mathematical Finance degree at the Ateneo de Manila University in 2015.

REFERENCES:
[1] Boyd, S. & L. Vandenberghe. (2009). Convex Optimization. Cambridge University Press.
[2] Bautista, E. P. & I. J. L. Garces. (2010). Mathematical Excursions: A Problem-Solving Primer for Trainers and Olympiad Enthusiasts. C&E Publishing, Inc.
[3] Manfrino, R. B., J. A. G. Ortega & R. V. Delgado. (2005). Inequalities: A Mathematical Olympiad Approach. Birkhäuser.
[4] Niculescu, C. P. & L. E. Persson. (2004). Convex Functions and Their Applications: A Contemporary Approach. Springer.
[5] Steele, J. M. (2004). The Cauchy-Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities. Cambridge University Press.

OLYMPIAD CORNER
Team Selection Test for the 55th IMO, Bulgaria

Problem: Find the least positive real number $\alpha$ with the following property: if a finite number of pumpkins have a total weight of 1 ton and every pumpkin weighs no more than $\alpha$ tons, then the pumpkins can be distributed in 50 boxes (some of the boxes may remain empty) so that no box contains more than $\alpha$ tons of pumpkins.

Solution: 
Claim: The real number $\alpha$ that satisfies the above conditions is $\alpha = \frac{2}{51}$.

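Suppose that $\alpha < \frac{2}{51}$, and let $k \ge 0$ be the integer such that \[ \frac{1}{51 \times 2^k} \le \alpha < \frac{1}{51 \times 2^{k-1}}. \] Consider $51 \times 2^k$ pumpkins, each weighing $\frac{1}{51 \times 2^k}$ tons; their total weight is 1 ton, and each weighs at most $\alpha$ tons. Since there are 50 boxes and $50 < 51 \times 2^k$ for every nonnegative integer $k$, the pigeonhole principle gives a box with at least two pumpkins, whose combined weight is at least $\frac{2}{51 \times 2^k} = \frac{1}{51 \times 2^{k-1}} > \alpha$. This is a contradiction, so $\alpha \ge \frac{2}{51}$.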
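The lower-bound argument can be replayed with exact rational arithmetic. Given any $\alpha < \frac{2}{51}$, this Python sketch finds the integer $k$ from the argument and the corresponding family of equal pumpkins, then checks the two facts the contradiction relies on. The function name `pigeonhole_counterexample` is hypothetical, and `fractions.Fraction` is used so that no rounding can blur the comparisons.

```python
from fractions import Fraction


def pigeonhole_counterexample(alpha):
    """For 0 < alpha < 2/51, return (k, count, weight) as in the argument:
    k >= 0 satisfies 1/(51*2^k) <= alpha < 1/(51*2^(k-1)), and `count`
    pumpkins of `weight` tons each have a total weight of 1 ton."""
    alpha = Fraction(alpha)
    if not 0 < alpha < Fraction(2, 51):
        raise ValueError("need 0 < alpha < 2/51")
    k = 0
    while Fraction(1, 51 * 2**k) > alpha:  # smallest k with 1/(51*2^k) <= alpha
        k += 1
    count = 51 * 2**k
    weight = Fraction(1, count)
    # More pumpkins than the 50 boxes, and any two together exceed alpha.
    assert count > 50 and weight <= alpha < 2 * weight
    return k, count, weight


print(pigeonhole_counterexample(Fraction(1, 100)))  # (1, 102, Fraction(1, 102))
```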

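We now show that $\alpha = \frac{2}{51}$ has the required property. Suppose that we have a total of $m$ pumpkins, each weighing at most $\frac{2}{51}$ tons, and start by placing one pumpkin in each of $m$ empty boxes. Take the two lightest boxes; if their combined weight is at most $\frac{2}{51}$, transfer all their pumpkins into one of these boxes and remove the other. Repeat this operation until it can no longer be applied, which must happen because each step removes a box. When the operation terminates, let $n$ be the number of boxes remaining, and let $x_1 \le x_2 \le \cdots \le x_n$ be the weights (in tons) of pumpkins in these $n$ boxes. By the procedure, $x_i \le \frac{2}{51}$ for all $i = 1, 2, \ldots, n$, so we have distributed the $m$ pumpkins into $n$ boxes with no more than $\frac{2}{51}$ tons of pumpkins in each box. Since the total weight is $1 > \frac{2}{51}$, at least two boxes remain, and since the procedure stopped, $x_1 + x_2 > \frac{2}{51}$. Together with $x_1 \le x_2$, this gives $x_2 > \frac{1}{51}$, and hence $x_i > \frac{1}{51}$ for every $i \ge 2$. Therefore \[ 1 = x_1 + x_2 + \cdots + x_n > \frac{2}{51} + (n-2) \cdot \frac{1}{51} = \frac{n}{51}, \] so $n < 51$. Hence we have distributed the pumpkins into at most 50 boxes (leaving any other boxes empty) so that the weight in each box does not exceed $\frac{2}{51}$ tons.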
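The merging procedure above is effectively a greedy algorithm, and a priority queue keyed on box weight makes "take the two lightest boxes" cheap. Here is a minimal Python sketch of it, assuming exact `Fraction` weights. The name `pack_pumpkins` is our own, and the sketch illustrates the argument rather than replacing it.

```python
import heapq
from fractions import Fraction

ALPHA = Fraction(2, 51)  # the claimed least value of alpha


def pack_pumpkins(weights, cap=ALPHA):
    """Greedy merging from the solution: start with one pumpkin per box,
    then repeatedly merge the two lightest boxes while their combined
    weight is at most `cap`. Returns the sorted list of box weights."""
    if any(w > cap for w in weights):
        raise ValueError("every pumpkin must weigh at most cap")
    boxes = list(weights)
    heapq.heapify(boxes)  # min-heap: the lightest box is always boxes[0]
    while len(boxes) >= 2:
        lightest = heapq.heappop(boxes)
        second = heapq.heappop(boxes)
        if lightest + second > cap:
            # The two lightest boxes cannot be merged, so the procedure stops.
            heapq.heappush(boxes, lightest)
            heapq.heappush(boxes, second)
            break
        heapq.heappush(boxes, lightest + second)
    return sorted(boxes)


# 51 pumpkins of 1/51 ton each end up in 26 boxes, well under 50.
print(len(pack_pumpkins([Fraction(1, 51)] * 51)))  # 26
```

Exact fractions matter here. With floating-point weights, a sum that should equal $\frac{2}{51}$ exactly could round to slightly more, and the procedure would then stop one merge too early.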

SOLUTIONS
(for October 10, 2015)
  1. Determine the values of $x$ such that $2^x + 3^x - 4^x + 6^x - 9^x \le 1$. (Taken from 101 Problems in Algebra by Andreescu and Feng)
    (Solved by Jarrett Ian G. Lim [Philippine Academy of Sakya] and Farrell Eldrian Wu [MGC New Life Christian Academy]; partial credit for Joyce Heidi Ong [Chiang Kai Shek College], Steven Reyes [Saint Jude Catholic School], and Madeline Tee [Jubilee Christian Academy])

    SOLUTION: 
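    Note that, for any value of $x$, \[ -2^x - 3^x + 4^x - 6^x + 9^x + 1 = \frac{1}{2}\left[(4^x - 2 \cdot 6^x + 9^x) + (4^x - 2 \cdot 2^x + 1) + (9^x - 2 \cdot 3^x + 1)\right] = \frac{1}{2}\left[(2^x - 3^x)^2 + (2^x - 1)^2 + (3^x - 1)^2\right] \ge 0, \] because the bracket is a sum of squares of real numbers. This means that \[ -2^x - 3^x + 4^x - 6^x + 9^x + 1 \ge 0 \iff 1 \ge 2^x + 3^x - 4^x + 6^x - 9^x \] for all $x$. Therefore, the solution set is $\mathbb{R}$.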
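The key step in Problem 1 is an algebraic identity, which can be spot-checked numerically. This Python sketch evaluates both $1 - (2^x + 3^x - 4^x + 6^x - 9^x)$ and half the sum of squares at a few sample points. The helper names `lhs` and `sos` are our own. A spot check like this only supports the identity; the expansion above is what proves it.

```python
import math


def lhs(x):
    """2^x + 3^x - 4^x + 6^x - 9^x, the left side of the inequality."""
    return 2**x + 3**x - 4**x + 6**x - 9**x


def sos(x):
    """Half the sum of squares from the solution; it should equal 1 - lhs(x)."""
    return 0.5 * ((2**x - 3**x) ** 2 + (2**x - 1) ** 2 + (3**x - 1) ** 2)


for x in (-3, -1, 0, 0.5, 1, 2.5, 4):
    assert math.isclose(1 - lhs(x), sos(x), rel_tol=1e-9, abs_tol=1e-9)
    assert lhs(x) <= 1 + 1e-9  # the inequality holds at every sample point
```

Equality holds only when all three squares vanish, that is, at $x = 0$.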
  2. Let $n$, $a$, and $b$ be positive integers. Prove that \[ \gcd(n^a - 1, n^b - 1) = n^{\gcd(a,b)} - 1. \] (First Stage, Moscow Mathematical Olympiad, 1995)
    (Solved by Jarrett Ian G. Lim [Philippine Academy of Sakya], Joyce Heidi Ong [Chiang Kai Shek College], and Farrell Eldrian Wu [MGC New Life Christian Academy]; partial credit for Madeline Tee [Jubilee Christian Academy])

    SOLUTION: 
    We will show that the expression on the left divides the expression on the right, and vice versa. Since both are nonnegative integers, this forces them to be equal.

    First note that $\gcd(a,b)$ divides both $a$ and $b$. Also, for positive integers $x$ and $y$ with $y \mid x$, \[ n^x - 1 = (n^y - 1)(n^{x-y} + n^{x-2y} + \cdots + n^y + 1). \] Since $\gcd(a,b)$ divides $a$ and $b$, it follows that $n^{\gcd(a,b)} - 1$ divides both $n^a - 1$ and $n^b - 1$; that is, $n^{\gcd(a,b)} - 1$ is a common divisor of $n^a - 1$ and $n^b - 1$. Consequently, $n^{\gcd(a,b)} - 1$ divides $\gcd(n^a - 1, n^b - 1)$.

    We now prove the second part, namely that $\gcd(n^a - 1, n^b - 1)$ divides $n^{\gcd(a,b)} - 1$. Recall that there exist integers $x$ and $y$ such that $ax - by = \gcd(a,b)$, and we can choose $x$ and $y$ to be positive. By the same argument as in the first part, $n^a - 1$ divides $n^{ax} - 1$, and $n^b - 1$ divides $n^{by} - 1$.

    This means that $\gcd(n^a - 1, n^b - 1)$ divides both $n^{ax} - 1$ and $n^{by} - 1$, so it divides their difference $(n^{ax} - 1) - (n^{by} - 1)$.

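    Note that \[ (n^{ax} - 1) - (n^{by} - 1) = n^{by}(n^{ax - by} - 1) = n^{by}(n^{\gcd(a,b)} - 1). \] Let $d = \gcd(n^a - 1, n^b - 1)$. Since $n^a - 1 \equiv -1 \pmod{n}$, the number $n^a - 1$ shares no prime factor with $n$, and hence none with $n^{by}$; therefore $\gcd(n^{by}, d) = 1$. Because $d$ divides $n^{by}(n^{\gcd(a,b)} - 1)$ and is coprime to $n^{by}$, it follows that $d$ divides $n^{\gcd(a,b)} - 1$.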
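Because Python integers have arbitrary precision, the identity in Problem 2 can be checked exactly over a range of small cases. This sketch does that with the standard `math.gcd`. The helper name `check_gcd_identity` is our own, and an exhaustive check over small values is evidence rather than proof.

```python
from math import gcd


def check_gcd_identity(n, a, b):
    """True if gcd(n^a - 1, n^b - 1) == n^gcd(a, b) - 1 (exact integer arithmetic)."""
    return gcd(n**a - 1, n**b - 1) == n ** gcd(a, b) - 1


assert all(
    check_gcd_identity(n, a, b)
    for n in range(2, 8)
    for a in range(1, 25)
    for b in range(1, 25)
)
```

For instance, $\gcd(10^6 - 1, 10^4 - 1) = 99 = 10^2 - 1$.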
  3. Let $a$ and $b$, with $a \ge b$, be the roots of $x^2 - 3x - 50 = 0$. Determine the value of $a^3 - 2014b^2 + 2015$.
    (Solved by Jarrett Ian G. Lim [Philippine Academy of Sakya], Joyce Heidi Ong [Chiang Kai Shek College], and Farrell Eldrian Wu [MGC New Life Christian Academy]; partial credit for Madeline Tee [Jubilee Christian Academy])

    SOLUTION: 
    This item has two solutions.

    The first, more tedious approach is to compute $a$ and $b$ with the quadratic formula and substitute them directly into $a^3 - 2014b^2 + 2015$.

    The second involves a more intricate approach. First, note that by Vieta's formulas, $a + b = 3$ and $ab = -50$.

    Let \[ A = a^3 - 2014b^2 + 2015 \quad\text{and}\quad B = b^3 - 2014a^2 + 2015. \] Adding $A$ and $B$, we get \[ A + B = (a^3 + b^3) - 2014(a^2 + b^2) + 4030. \] The first expression can be factored, so \[ \begin{aligned} A + B &= (a+b)(a^2 - ab + b^2) - 2014(a^2 + b^2) + 4030 \\ &= (a+b)(a^2 + 2ab + b^2 - 3ab) - 2014(a^2 + 2ab + b^2 - 2ab) + 4030 \\ &= (a+b)\left[(a+b)^2 - 3ab\right] - 2014\left[(a+b)^2 - 2ab\right] + 4030 \\ &= 3\left[3^2 + 150\right] - 2014\left(3^2 + 100\right) + 4030 \\ &= 477 - 219{,}526 + 4030 = -215{,}019. \end{aligned} \] Similarly, \[ A - B = (a^3 - b^3) + 2014(a^2 - b^2) = (a - b)\left[(a+b)^2 - ab + 2014(a+b)\right]. \] Now, where do we get the value of $a - b$? We go back to the original equation $x^2 - 3x - 50 = 0$.

    Its roots have sum $3$ and product $-50$. Since $a \ge b$, the difference $a - b$ is nonnegative, so \[ a - b = +\sqrt{(a-b)^2} = \sqrt{(a+b)^2 - 4ab}. \] With $a + b = 3$ and $ab = -50$, this gives $a - b = \sqrt{9 + 200} = \sqrt{209}$. Therefore \[ A - B = \sqrt{209}\left[9 + 50 + 2014(3)\right] = 6101\sqrt{209}. \] Solving for $A$ from $A + B = -215{,}019$ and $A - B = 6101\sqrt{209}$, we get \[ A = \frac{6101\sqrt{209} - 215{,}019}{2}. \]
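The first, "tedious" approach is easy to carry out by machine, so it can double-check the closed form from the second approach. This Python sketch computes $a$ and $b$ with the quadratic formula in floating point and compares the direct value of $a^3 - 2014b^2 + 2015$ with $\frac{6101\sqrt{209} - 215{,}019}{2}$. The variable names are our own.

```python
import math

# Roots of x^2 - 3x - 50 = 0 with a >= b, from the quadratic formula.
root = math.sqrt(3**2 + 4 * 50)  # sqrt(209)
a = (3 + root) / 2
b = (3 - root) / 2

direct = a**3 - 2014 * b**2 + 2015
closed_form = (6101 * math.sqrt(209) - 215_019) / 2

# Both are about -63408.93; they agree up to floating-point rounding.
assert math.isclose(direct, closed_form, rel_tol=1e-12)
```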
  4. Four spheres have radii 2, 2, 3, and 3, respectively. Each sphere is tangent to the other three. There is another sphere which is tangent to all four of these spheres. Determine the radius of this sphere. (China, 1995)
    (Solved by Jarrett Ian G. Lim [Philippine Academy of Sakya] and Farrell Eldrian Wu [MGC New Life Christian Academy]; partial credit for Madeline L. Tee [Jubilee Christian Academy])

    SOLUTION: 
    Let A and B be the centers of the two spheres of radius 2, and C and D the centers of the two spheres of radius 3. Let E and r be the center and radius of the sphere tangent to all others.

    In addition, let M and N be the midpoints of AB and CD, respectively.

    First, note that $AC = AD = 5$ (each is $2 + 3$) and $N$ is the midpoint of $CD$, so $AN$ is perpendicular to $CD$. Applying the Pythagorean Theorem to $\triangle ANC$, \[ AC^2 = AN^2 + NC^2 \implies 5^2 = AN^2 + 3^2 \implies AN = 4. \] By the same argument $BN = 4$, so $\triangle ABN$ is isosceles and $MN$ is perpendicular to $AB$. Applying the Pythagorean Theorem to $\triangle AMN$, \[ AN^2 = AM^2 + MN^2 \implies 4^2 = 2^2 + MN^2 \implies MN = \sqrt{12}. \] Now we turn our attention to the sphere with center $E$. For this sphere to be tangent to all four spheres, we need $EA = EB$ and $EC = ED$, so $E$ must lie on the perpendicular bisector planes of both $AB$ and $CD$. These planes intersect in the line containing segment $MN$, and $E$ lies on segment $MN$.

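    Moreover, $\triangle EMA$ is a right triangle with $AM = 2$ and $AE = r + 2$, so \[ AE^2 = AM^2 + ME^2 \implies (r+2)^2 = 2^2 + ME^2 \implies ME = \sqrt{r^2 + 4r}. \] Similarly, working with $\triangle ENC$ (where $CN = 3$ and $CE = r + 3$), we get $EN = \sqrt{r^2 + 6r}$. Since $E$ lies on segment $MN$, \[ \begin{aligned} MN = ME + EN &\implies \sqrt{12} = \sqrt{r^2 + 4r} + \sqrt{r^2 + 6r} \\ &\implies \left(\sqrt{12} - \sqrt{r^2 + 6r}\right)^2 = \left(\sqrt{r^2 + 4r}\right)^2 \\ &\implies 12 - 2\sqrt{12r^2 + 72r} + r^2 + 6r = r^2 + 4r \\ &\implies 2r + 12 = 2\sqrt{12r^2 + 72r} \\ &\implies (r + 6)^2 = \left(\sqrt{12r^2 + 72r}\right)^2 \\ &\implies r^2 + 12r + 36 = 12r^2 + 72r \\ &\implies 0 = 11r^2 + 60r - 36 \\ &\implies r = \tfrac{6}{11} \text{ or } r = -6. \end{aligned} \] Since $r > 0$, the radius of the sphere tangent to all four given spheres is $\frac{6}{11}$.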
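The answer to Problem 4 can be verified with explicit coordinates. In this Python sketch we assume, purely for illustration, that $M$ is the origin, $AB$ lies on the $x$-axis, $CD$ is parallel to the $y$-axis, and $N = (0, 0, \sqrt{12})$. We then place $E$ on $MN$ at distance $\sqrt{r^2 + 4r}$ from $M$ and check every tangency condition with `math.dist` (available in Python 3.8 and later).

```python
import math

r = 6 / 11
s12 = math.sqrt(12)  # MN

# Assumed coordinates: M at the origin, AB on the x-axis, CD parallel to the
# y-axis through N = (0, 0, sqrt(12)).
A, B = (-2.0, 0.0, 0.0), (2.0, 0.0, 0.0)
C, D = (0.0, -3.0, s12), (0.0, 3.0, s12)
E = (0.0, 0.0, math.sqrt(r * r + 4 * r))  # ME = sqrt(r^2 + 4r) along MN

# Radius 2 at A and B, radius 3 at C and D: every pair is externally tangent.
assert math.isclose(math.dist(A, B), 4)
assert math.isclose(math.dist(C, D), 6)
for P in (A, B):
    for Q in (C, D):
        assert math.isclose(math.dist(P, Q), 5)

# The sphere of radius r centered at E is externally tangent to all four.
for P in (A, B):
    assert math.isclose(math.dist(E, P), r + 2)
for Q in (C, D):
    assert math.isclose(math.dist(E, Q), r + 3)
```

In exact terms, $ME = \frac{10\sqrt{3}}{11}$ and $EN = \frac{12\sqrt{3}}{11}$, which add up to $2\sqrt{3} = \sqrt{12} = MN$.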
  5. Let $P_1P_2 \cdots P_{12}$ be a regular dodecagon. Prove that $P_1P_5$, $P_4P_8$, and $P_3P_6$ are concurrent (they intersect at the same point). (23rd Putnam, 1963)
    (Partial credit for Jarrett Ian G. Lim [Philippine Academy of Sakya], Joyce Heidi Ong [Chiang Kai Shek College], Madeline Tee [Jubilee Christian Academy], Farrell Eldrian Wu [MGC New Life Christian Academy])

    SOLUTION: 
    Consider the added points $Q_1, Q_2, \ldots, Q_6$ such that $Q_1Q_2 \cdots Q_6$ is a regular hexagon with the same side length as the original dodecagon. This can be done by erecting equilateral triangles $P_1P_2Q_1$, $P_3P_4Q_2$, and so on up to $P_{11}P_{12}Q_6$, on the inner side of the dodecagon.



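    Since each interior angle of a regular dodecagon measures $150^\circ$ and each equilateral triangle takes up $60^\circ$ of it, we get $\angle P_3P_2Q_1 = \angle P_5P_4Q_2 = \cdots = \angle P_1P_{12}Q_6 = 150^\circ - 60^\circ = 90^\circ$. As a consequence, we have formed six squares: $P_2P_3Q_2Q_1$, $P_4P_5Q_3Q_2$, and so on up to $P_{12}P_1Q_1Q_6$.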

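    We first focus on the isosceles triangle $P_1Q_1Q_2$, in which $Q_1P_1 = Q_1Q_2$. Its vertex angle is $\angle P_1Q_1Q_2 = 60^\circ + 90^\circ = 150^\circ$ (an angle of the equilateral triangle $P_1P_2Q_1$ plus an angle of the square $P_2P_3Q_2Q_1$). This means that $\angle Q_1Q_2P_1 = \angle Q_1P_1Q_2 = 15^\circ$.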

    Since $Q_2P_5$ is a diagonal of the square $P_4P_5Q_3Q_2$, we have $\angle P_5Q_2Q_3 = 45^\circ$. Of course, we also have $\angle Q_1Q_2Q_3 = 120^\circ$, an interior angle of the regular hexagon.

    This means that \[ \angle P_1Q_2Q_1 + \angle Q_1Q_2Q_3 + \angle Q_3Q_2P_5 = 15^\circ + 120^\circ + 45^\circ = 180^\circ, \] so $P_1$, $Q_2$, and $P_5$ are collinear. Similarly, $P_4$, $Q_3$, and $P_8$ are collinear.

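    Consequently, $P_1P_5$ contains the diagonal $Q_2P_5$ of square $P_4P_5Q_3Q_2$, and $P_4P_8$ contains its other diagonal $P_4Q_3$, so $P_1P_5$ and $P_4P_8$ intersect at the center of this square.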

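    By our construction, the equilateral triangles $P_3P_4Q_2$ and $P_6P_5Q_3$ are erected on the opposite sides $P_4Q_2$ and $P_5Q_3$ of square $P_4P_5Q_3Q_2$, so they are symmetric with respect to the center of the square. In particular, $P_3$ and $P_6$ both lie on the common perpendicular bisector of $P_4Q_2$ and $P_5Q_3$, and this line passes through the center of the square. This means that $P_3P_6$ passes through the center of square $P_4P_5Q_3Q_2$ as well.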

    So $P_1P_5$, $P_4P_8$, and $P_3P_6$ intersect at the same point.
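A coordinate check offers an independent confirmation of Problem 5. This Python sketch assumes the dodecagon is inscribed in the unit circle with $P_k$ at angle $30k^\circ$ plus an arbitrary rotation offset. It intersects lines $P_1P_5$ and $P_4P_8$ using the standard two-line intersection formula, then tests whether that point lies on $P_3P_6$ with a cross-product collinearity test. All helper names are our own.

```python
import math


def vertex(k, offset=0.0):
    """P_k of a regular dodecagon inscribed in the unit circle (orientation assumed)."""
    angle = math.radians(30 * k + offset)
    return (math.cos(angle), math.sin(angle))


def line_intersection(p1, p2, p3, p4):
    """Intersection of line p1p2 with line p3p4 (assumes they are not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def cross(o, p, q):
    """z-component of (p - o) x (q - o); it is zero iff o, p, q are collinear."""
    return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])


X = line_intersection(vertex(1), vertex(5), vertex(4), vertex(8))
assert abs(cross(vertex(3), vertex(6), X)) < 1e-12  # X also lies on P3P6

# Concurrency does not depend on how the dodecagon is rotated.
for offset in (7.0, 45.0, 123.4):
    P = [None] + [vertex(k, offset) for k in range(1, 13)]
    Y = line_intersection(P[1], P[5], P[4], P[8])
    assert abs(cross(P[3], P[6], Y)) < 1e-12
```

With no rotation, the common point is $(-\tfrac{1}{2}, \tfrac{1}{2})$: here $P_1P_5$ is the horizontal line $y = \tfrac{1}{2}$, and $P_4P_8$ is the vertical line $x = -\tfrac{1}{2}$.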
  6. For each permutation $a_1, a_2, \ldots, a_{2014}$ of the integers $1, 2, \ldots, 2014$, form the sum \[ \sum_{i=1}^{1007} |a_{2i-1} - a_{2i}|. \] Find the average value of all these sums. (American Invitational Mathematics Examination, 1996)
    (Solved by Jarrett Ian G. Lim [Philippine Academy of Sakya] and Farrell Eldrian Wu [MGC New Life Christian Academy]; partial credit for Madeline Tee [Jubilee Christian Academy])

    SOLUTION: 
    We first focus on the average value of $A = |a_1 - a_2|$. By symmetry, for every $i$, the pair $(a_{2i-1}, a_{2i})$ takes each ordered pair of distinct values equally often over all permutations, so $|a_{2i-1} - a_{2i}|$ has the same average value as $A$. Hence the average that we want is just $1007$ times the average value of $A$.

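    Now, if $a_1 = x$ for some $x \in \{1, 2, \ldots, 2014\}$, then $a_2$ is equally likely to be any of the other $2013$ numbers, so the average value of $A$ is \[ \frac{(x-1) + (x-2) + \cdots + 1 + 1 + 2 + \cdots + (2014-x)}{2013} = \frac{1}{2013}\left[\frac{x(x-1)}{2} + \frac{(2014-x)(2015-x)}{2}\right] = \frac{x^2 - 2015x + 1007(2015)}{2013}. \] Each value of $a_1$ is equally likely, so we average these values over all $x$: \[ \begin{aligned} \frac{1}{2014}\sum_{x=1}^{2014}\frac{x^2 - 2015x + 1007(2015)}{2013} &= \frac{1}{2014} \cdot \frac{1}{2013}\left[\frac{2014(2015)(4029)}{6} - \frac{2015(2014)(2015)}{2} + 1007(2014)(2015)\right] \\ &= \frac{1}{2014} \cdot \frac{1}{2013} \cdot \frac{2014(2015)}{6}\left[4029 - 3(2015) + 3(2014)\right] \\ &= \frac{2015 \cdot 4026}{6 \cdot 2013} = \frac{2015}{3}. \end{aligned} \] This means that the average value of all possible sums is $\frac{(1007)(2015)}{3}$.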
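Brute force over all $2014!$ permutations is out of the question, but the same counting works for any even length $n$. Replacing 2014 by $n$ in the computation above gives an average of $\frac{n}{2} \cdot \frac{n+1}{3}$; this generalization is our own extrapolation of the argument. The Python sketch below checks that formula by brute force for small $n$, using `itertools.permutations` and exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def average_pair_sum(n):
    """Average of |a_1 - a_2| + |a_3 - a_4| + ... + |a_{n-1} - a_n| over all
    permutations of 1..n (n even), computed by brute force."""
    total = 0
    count = 0
    for perm in permutations(range(1, n + 1)):
        total += sum(abs(perm[i] - perm[i + 1]) for i in range(0, n, 2))
        count += 1
    return Fraction(total, count)


def closed_form(n):
    """(n/2) * (n+1)/3, which gives (1007)(2015)/3 for n = 2014."""
    return Fraction(n // 2 * (n + 1), 3)


for n in (2, 4, 6, 8):
    assert average_pair_sum(n) == closed_form(n)
```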
