Homeschooling Ben and Lily this year, I’ve been teaching math, programming, and business so far, and helping my wife with the other subjects. One thing that strikes me is how much incidental material I have to teach, and that this might be the most important part of the teaching. We have Word Books where the children write down hard-to-understand or hard-to-spell words that come up, for example.
The median of a set of numbers is the middle value. In the set (1,2,3), the median is 2. But how about the set (1,2,3,4)? Most commonly, people define the median as 2.5. That is a good measure of central tendency, I guess, but it isn’t satisfactory because it mixes the ideas of mean and median. Also, then the median isn’t a member of the set.
Perhaps the best definition is that the median is X, where X is the lowest value such that at least 50% of the values are less than or equal to X.
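Here is a minimal sketch of that definition in Python. The function name lower_median is my own label, not a standard term from the post; Python's standard library calls the same quantity statistics.median_low.

```python
import math

def lower_median(values):
    """Smallest member X of the set such that at least half the values are <= X."""
    ordered = sorted(values)
    # The k-th smallest value has k values <= it, so we need k >= n/2.
    k = math.ceil(len(ordered) / 2)
    return ordered[k - 1]

print(lower_median([1, 2, 3]))     # middle value of an odd-sized set
print(lower_median([1, 2, 3, 4]))  # a member of the set, not 2.5
```

Under this definition the median of (1,2,3,4) is 2, so it is always a member of the set.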
From Wikipedia, Principal Components Analysis:
PCA is theoretically the optimal linear scheme, in terms of least mean square error, for compressing a set of high dimensional vectors into a set of lower dimensional vectors and then reconstructing the original set. It is a non-parametric analysis and the answer is unique and independent of any hypothesis about data probability distribution.
I just learned a useful math term: CODOMAIN. Consider the function f(x) = 3 + 5x as defined over the intervals of x in [0, 10] and f(x) in [0, \infty). The DOMAIN is [0,10]. The RANGE is [3, 53]. The CODOMAIN is [0, \infty). This mapping is one-to-one, but not onto, so the range and codomain are not identical.
My colleague Haizhen Lin found a neat trick from someone in the math department. Suppose you have a density f(x) and you want to construct a pointwise less risky function, as in my paper cited below. You can use this:
f(a, x) = (1/a) f( .5 – .5/a +x/a)
If a=1, f(a,x) = f(x).
If a is small, f(a,x) tends to get big because of the 1/a portion, and it gets very big near x=.5, where the argument equals .5 whatever a is. For x far from .5, f(a,x) becomes small, because the argument .5 + (x-.5)/a is then far from .5, out where f has little or no weight.
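A quick numerical sketch of the trick, under my own assumptions: the example density is a triangular density on [0,1] with its peak at .5 (my choice, not from the paper), and the helper names squeeze and integrate are hypothetical.

```python
def squeeze(f, a):
    """Return the function x -> (1/a) * f(.5 - .5/a + x/a)."""
    return lambda x: f(0.5 - 0.5 / a + x / a) / a

def integrate(g, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def triangular(x):
    """Example density (an assumption): triangle on [0,1] peaking at .5."""
    if 0.0 <= x <= 0.5:
        return 4.0 * x
    if 0.5 < x <= 1.0:
        return 4.0 * (1.0 - x)
    return 0.0

g = squeeze(triangular, 0.5)
print(integrate(g, 0.0, 1.0))                    # still integrates to about 1
print(integrate(lambda x: x * g(x), 0.0, 1.0))   # mean of this example stays near .5
print(triangular(0.5), g(0.5))                   # the peak at .5 is scaled up by 1/a
```

With a=1 the function is unchanged; with a=.5 the same probability mass is packed into [.25, .75], which is what makes the new density less risky in this example.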
“When Does Extra Risk Strictly Increase the Value of Options?” The Review of Financial Studies, 20(5): 1647-1667 (September 2007). It is well known that risk increases the value of options. This paper makes that precise in a new way. The conventional theorem says that the value of an option does not fall if the underlying option becomes riskier in the conventional sense of the mean-preserving spread. This paper uses two new definitions of “riskier” to show that the value of an option strictly increases (a) if the underlying asset becomes “pointwise riskier,” and (b) only if the underlying asset becomes “extremum riskier.” Paper in tex or pdf ( http://www.rasmusen.org/published/Rasmusen07-RFS-options.pdf).
National Review’s blog reports that the SAT is changing so that only a student’s MAXIMUM score out of all the times he takes the test will be reported to colleges. What amazing favoritism to rich, stupid applicants!
Or maybe not so amazing. This will be a bonanza for the SAT company, since their tests will be taken so many more times. This is especially true nowadays, when many colleges have merit-based scholarships and your $45 retest fee might have a 1/10 chance of yielding you $1000 extra in tuition breaks.
It also raises an interesting mathematical question. Suppose everyone ends up taking the test exactly 8 times. This will cost a lot more, of course, but will it yield more accurate evaluation of the applicants? Which provides more useful information:
1. A single test score.
2. The maximum of 8 test scores.
The answer depends on the distribution of an individual’s test scores for his given talent. If someone with ability X scores X on the test with probability .9 and X-y with probability .1, the Maximum is a better measure (in fact, then it is even better than the average of 8 test scores).
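If someone with ability X scores X on the test with probability .8, X-y with probability .1, and X+y with probability .1, which is better? With enough retakes it would be the maximum: person i would almost surely end up with a maximum of Xi+y, and we could simply subtract y and get a person’s ability. With only 8 tests, though, the maximum is Xi+y with probability 1-.9^8, about .57, so subtracting y gives the right answer only 57% of the time, whereas the Single score is right with probability .8. Here the Single score looks better.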
If someone with ability X scores X on the test with probability .999 and X+y with probability .001, then I think the Single reported score is better. It is right with probability .999, whereas the Maximum will be X+y with probability 1-.999^8, so it will be right with only probability .992. (I haven’t phrased that carefully: what we care about is not the percentage of “right” answers but the variance of the measure minus the true ability, but in this special case the two criteria give the same answer.)
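The three discrete cases can be checked exactly. This is a small sketch assuming the 8 test scores are independent draws; scores are written as offsets from ability in units of y, and the function name max_distribution is my own.

```python
def max_distribution(pmf, n):
    """Distribution of the maximum of n independent draws from pmf.

    pmf maps a score offset (score minus ability, in units of y) to its probability.
    Uses P(max <= v) = P(one draw <= v) ** n.
    """
    result = {}
    cumulative = 0.0
    previous = 0.0
    for value in sorted(pmf):
        cumulative += pmf[value]
        cdf_of_max = cumulative ** n
        result[value] = cdf_of_max - previous
        previous = cdf_of_max
    return result

cases = {
    "X w.p. .9, X-y w.p. .1": {0: 0.9, -1: 0.1},
    "X w.p. .8, X-y and X+y w.p. .1 each": {-1: 0.1, 0: 0.8, 1: 0.1},
    "X w.p. .999, X+y w.p. .001": {0: 0.999, 1: 0.001},
}
for label, pmf in cases.items():
    print(label, max_distribution(pmf, 8))
```

The printed probabilities show how often the maximum of 8 scores lands on each of X-y, X, and X+y, to be compared with the single-score probabilities in the text.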
What if the distribution of test scores around ability has a normal distribution? I don’t know. The answer might depend on the variance. I’ll ask our job candidate at lunch. He’s a couple of years out of grad school already, so he shouldn’t freak out at the question.
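One way to explore the normal case is by simulation. This is only a sketch under assumptions of mine: every applicant has the same noise standard deviation sigma, so the upward bias of the maximum is a constant shift that does not change rankings, and what matters is the variance of each measure around ability. The function name simulate is hypothetical.

```python
import random
import statistics

def simulate(n_tests=8, trials=20000, sigma=1.0, seed=0):
    """Variance of (single score, max of n scores, mean of n scores) around ability."""
    rng = random.Random(seed)
    singles, maxima, means = [], [], []
    for _ in range(trials):
        # Score minus ability is normal noise with standard deviation sigma.
        noise = [rng.gauss(0.0, sigma) for _ in range(n_tests)]
        singles.append(noise[0])
        maxima.append(max(noise))
        means.append(sum(noise) / n_tests)
    return (statistics.pvariance(singles),
            statistics.pvariance(maxima),
            statistics.pvariance(means))

var_single, var_max, var_mean = simulate()
print(var_single, var_max, var_mean)
```

In this setup the maximum should come out less noisy than a single score but noisier than the average of the 8 scores. If sigma differed across applicants, the bias of the maximum would differ too, and the comparison would need more care.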
From Wikipedia’s Smooth Functions:
“The class C0 consists of all continuous functions. The class C1 consists of all differentiable functions whose derivative is continuous; such functions are called continuously differentiable.”
A differentiable function might not be C1.
The function f(x) = x^2*sin(1/x) for x \neq 0 and f(x) =0 for x=0 is everywhere continuous and differentiable, but its derivative is f'(x) = -cos(1/x) + 2x*sin(1/x) for x \neq 0 and f'(x) =0 for x=0, which is discontinuous at x=0, so it is not C1.
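A small numerical illustration of this example, as a sketch rather than a proof: the difference quotient at 0 shrinks to 0, while the derivative evaluated along the points x = 1/(n*pi), which approach 0, keeps jumping between values near -1 and +1.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    # Derivative away from 0; the value at 0 comes from the limit definition.
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x) if x != 0 else 0.0

# Difference quotients at 0 are bounded by |h|, so f'(0) = 0.
for h in (1e-2, 1e-4, 1e-6):
    print(h, (f(h) - f(0.0)) / h)

# Along x_n = 1/(n*pi) the derivative does not settle down as x_n -> 0.
for n in (10, 11, 1000, 1001):
    print(n, fprime(1.0 / (n * math.pi)))
```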
|The Weierstrass Function|
From Wikipedia’s Weierstrass Function comes this good graphic of an everywhere continuous but nowhere differentiable function.
Nov. 9. I wondered about the following questions:
Do there exist monotonic functions that are everywhere continuous but nowhere differentiable?
Do there exist monotonic functions that are nowhere continuous?
No in either case, it seems. Here is an answer:
First, monotone functions only can have a countable number of discontinuities (since these must be jump discontinuities where the function makes progress upward/downward and all uncountable positive sums are infinite).
Moreover, for a more involved reason, the set of points where a monotone function is not differentiable must have Lebesgue measure 0. (I.e. they are differentiable almost everywhere.)
One way to see this is from the fact that for an increasing function the limit of the slope of the secant line between (x,f(x)) and (x+h,f(x+h)) for each fixed x as h varies must always exist (and be nonnegative), provided we allow it to also take on the value +infinity. Then one can show this cannot be infinity except on a measure 0 set…again, the function would make too much progress.
On the other hand, the derivative can fail to exist on an uncountable set (e.g. the Cantor staircase function). Moreover, there is a slightly more sophisticated example of a strictly increasing continuous function that goes from f(0)=0 to f(1)=1 which has a derivative equal to 0 almost everywhere, in fact whenever the derivative exists.
Since they are differentiable almost everywhere, the derivatives of monotone functions are Lebesgue integrable functions (extend to the nondifferentiable points however you want, it won’t affect the integral). So the previous example shows that the Fundamental Theorem of Calculus cannot be extended to even the class of derivatives of continuous monotone functions (even when the resulting derivative function is the constant function), since then we would have 0=\int_0^1 f'(x)dx=f(1)-f(0)=1. (The FTC does work, however, if f is continuous and the derivative exists except at a countable set).
From PlanetMath, here is Cantor’s Staircase (in a 20-iteration figure, instead of infinite iterations), which uses a Cantor Set to build a function which is continuous and monotonic (though not strictly, since it is flat on every interval removed in building the Cantor Set) but with f'(x) =0 almost everywhere.
|Graph of the cantor function using 20 iterations|
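For anyone who wants to redraw the staircase, here is a minimal sketch that approximates the Cantor function from the base-3 digits of x. The function name cantor and the depth parameter are my own; depth plays the role of the number of iterations in the figure.

```python
def cantor(x, depth=30):
    """Approximate the Cantor staircase on [0, 1] using `depth` ternary digits."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value = 0.0
    weight = 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)      # next base-3 digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x lies in a removed middle third, where the function is flat.
            return value + weight
        value += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2.0
    return value

for x in (0.0, 0.25, 0.4, 0.5, 0.6, 1.0):
    print(x, cantor(x))
```

The values at .4, .5, and .6 coincide because they all lie in the first removed middle third, which is one of the flat steps of the staircase.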
Martin Osborne has some good notes on quasiconcavity. I’m still not satisfied, though. It’s a basic enough idea that I wish I had better intuition for it, and lots and lots of pictures of functions that are or are not quasiconcave.
October 25: Here are some key features of a quasiconcave function f(x).
- It has convex upper level sets. The set of points x such that f(x) >= a is convex for any number a.
- It has convex indifference curves if it is a utility function. If f(x) is strictly monotonically increasing, the function g(x) such that f(x)=a is a convex function.
Every concave function is quasiconcave, but some quasiconcave functions are not concave. A key feature of quasiconcavity that concavity doesn’t have is that if you do an increasing transformation of a qc function, it is still qc. I wonder if the following is true:
Conjecture: Iff function f(.) is quasiconcave, there exists an increasing transformation g(.) such that g(f(.)) is concave.
I’d start to prove the conjecture this way. Let x and y be points in the upper level set of f(.), which means f(x)>=a and f(y)>=a. Since f(.) is quasiconcave, the upper level set is convex, which means that f(mx+ (1-m)y) >=a too. What we need to show first is that there exists some increasing function g() such that
g(f(mx+ (1-m)y)) >= mg(f(x)) + (1-m)g(f(y)). I think we need to start by assuming that f(x) \neq f(y), and that they are both on the boundary of that convex upper level set. Then we can see how g has to affect those two levels of f differently.
If the conjecture is true, then maybe we can think of quasiconcavity as being the equivalent of concavity for functions that are just defined on ordinal, not cardinal spaces.
October 26. Why, though, do we worry about quasi-concavity at all in economics? Why not just assume that utility functions are concave? The conventional answer would be that utility is ordinal, not cardinal. That is a bad answer for three reasons. First, even if it is ordinal, we could say, “It’s only the ordinal properties of a utility function that affect decisions. Therefore, for convenience, let’s say that whatever function you start with, you have to use a monotonic transformation to make it concave before we start working with it.” Second, we might say, “Since only ordinal properties matter, let’s assume utility is concave for convenience.” Third, we might accept cardinality. Everybody uses von Neumann-Morgenstern cardinal utility in their models anyway, making only a brief nod, if any, to ordinality. But a risk-averse agent has concave utility. For these reasons, I wonder why it’s worth making our graduate students learn about quasi-concavity. The opportunity cost is that they’re not learning about something more useful such as the CAPM or the Coase Theorem.
Maybe quasi-concavity comes up in enough other contexts to be important. I know Rick Harbaugh has a paper on comparative cheap talk where it comes up. In Varian, it comes up first in production functions, where it allows you to have convex input sets for a given output without requiring diminishing returns to scale, as true concavity would.
October 27. Yet another thought. Margherita Cigola has done work on defining quasiconcavity in ordinal spaces, on lattices. Convexity has to be defined specially there. She uses a different (equivalent in R space) definition of quasiconcavity:
f(mx + (1-m)y) >= min{f(x), f(y)}
I like that because it is closer to the definition of concavity.
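Since I keep wanting more examples of functions that are or are not quasiconcave, here is a rough grid check. It is only a sketch: it tests the standard min-form inequality f(mx + (1-m)y) >= min{f(x), f(y)} at finitely many points, so it can refute quasiconcavity but never prove it. The function names and the example functions are my own choices.

```python
def is_quasiconcave_on_grid(f, points, weights=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(m*x + (1-m)*y) >= min(f(x), f(y)) on a finite grid."""
    return all(f(m * x + (1 - m) * y) >= min(f(x), f(y)) - tol
               for x in points for y in points for m in weights)

def is_concave_on_grid(f, points, weights=(0.25, 0.5, 0.75), tol=1e-12):
    """Check f(m*x + (1-m)*y) >= m*f(x) + (1-m)*f(y) on a finite grid."""
    return all(f(m * x + (1 - m) * y) >= m * f(x) + (1 - m) * f(y) - tol
               for x in points for y in points for m in weights)

grid = [-2.0 + 0.25 * i for i in range(17)]   # points from -2 to 2

examples = {
    "x^3 (monotone)": lambda x: x ** 3,
    "-x^2 (concave)": lambda x: -x * x,
    "x^2 (convex bowl)": lambda x: x * x,
}
for name, f in examples.items():
    print(name, is_quasiconcave_on_grid(f, grid), is_concave_on_grid(f, grid))
```

The cube passes the quasiconcavity check but fails the concavity check, which is the kind of function that separates the two ideas.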
Or another, suitable when the function is differentiable: f is quasiconcave if whenever there is a maximum (i.e., the first derivatives are zero), the matrix of second derivatives is negative definite. MR suggested that, for the single-dimensional x case. I’m not sure it does generalize that way.
I haven’t used this idea since high school, really, but it comes up now and then, so I looked it up in Wikipedia. 100 has one significant figure, as do 20 and .0001, the article says; 23 has two. The number .00200, however, has three significant figures. The number 1.234 has 4 significant figures. Digits beyond accurate measurement don’t count as significant. There is ambiguity, however, in whether 100 feet really has just one significant figure. It may be that you have measured it to the nearest foot, in which case it really has three significant figures.
The real importance of significant figures comes in doing arithmetic. If you run 100 yards in 11.71 seconds, and the 100 has three significant figures, then the speed should be written with three significant figures as 8.54 yards per second, not as 8.53970965 yards per second.
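In Python the rounding can be done with the "g" format, which keeps a given number of significant figures. This is a small sketch; the helper name round_sig is my own.

```python
def round_sig(x, figures):
    """Round x to the given number of significant figures."""
    return float(f"{x:.{figures}g}")

speed = 100 / 11.71          # yards per second, before rounding
print(speed)
print(round_sig(speed, 3))   # three significant figures
```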
This Reuleaux Triangle from Wolfram/Mathematica is a nice idea for a shape. It is the shape a Wankel engine takes, perhaps because you can rotate this triangle inside a square as shown at the Wolfram site.
Dean Anton Sherwood has lots of good math graphics at http://www.ogre.nu/doodle/#chainmail. Here’s one.
An annulus is the region lying between two concentric circles in 2-space– a ring.
…Lipschitz continuity, named after Rudolf Lipschitz, is a smoothness condition for functions which is stronger than regular continuity. Intuitively, a Lipschitz continuous function is limited in how fast it can change; a line joining any two points on the graph of this function will never have a slope steeper than a certain number called the Lipschitz constant of the function….
* The function f(x) = x^2 with domain all real numbers is not Lipschitz continuous. This function becomes arbitrarily steep as x goes to infinity. It is however locally Lipschitz continuous.
* The function f(x) = x^2 defined on [−3, 7] is Lipschitz continuous, with Lipschitz constant K = 14.
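The constant can be checked numerically by taking the steepest secant line over a grid of points. This is a rough sketch with a hypothetical helper name, max_secant_slope; a finite grid only gives a lower bound on the true Lipschitz constant.

```python
def max_secant_slope(f, lo, hi, n=400):
    """Largest |f(b) - f(a)| / (b - a) over all pairs of grid points in [lo, hi]."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return max(abs(f(xs[j]) - f(xs[i])) / (xs[j] - xs[i])
               for i in range(n) for j in range(i + 1, n + 1))

def square(x):
    return x * x

print(max_secant_slope(square, -3.0, 7.0))     # approaches 14 from below
print(max_secant_slope(square, -3.0, 70.0))    # a wider domain needs a larger constant
```

For x^2 the secant slope between a and b is a + b, so on [−3, 7] it stays below 2*7 = 14, and it grows without bound as the domain widens.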
Elasticities in Regressions. (update of old post) Here is how to calculate elasticities from regression coefficients, a note possibly useful to economists who, like me, keep having to rederive this basic method:
- The elasticity is (%change in Y)/(%change in X) = (dy/dx)*(x/y).
- If y = beta*x then the elasticity is beta*(x/y).
- If y = beta* log(x) then the elasticity is (beta/x)*(x/y) = beta/y.
- If log(y) = beta* log(x) then the elasticity is (beta*y/x)*(x/y) = beta, which is a constant elasticity.
(reason: then y= exp(beta*log(x)), so dy/dx = beta*exp(beta*log(x))*(1/x) = beta*y/x.)
- If log(y) = beta*x then the elasticity is (beta* y )*(x/y) = beta*x.
(reason: then y = exp(beta*x), so dy/dx = beta*exp(beta*x) = beta*y.)
- If log(y) = alpha + beta*D, where D is a dummy variable, then we are interested in the finite jump from D=0 to D=1, not an infinitesimal elasticity. That percentage jump is
dy/y = exp(beta)-1,
because log(y,D=0) = alpha and log(y, D=1) = alpha + beta, so
(y,D=1)/(y, D=0) = exp(alpha+beta)/exp(alpha) = exp(beta)
dy/y = (y,D=1)/(y, D=0) -1 = exp(beta)-1
This is consistent, but not unbiased. We know that OLS is BLUE, unbiased, as an estimator of the impact of the dummy D on log(Y), but that does not imply that it is unbiased as an estimator of the impact of D on Y. That is because E(f(z)) does not equal f(E(z)) in general and the ultimate effect of D on y, exp(beta)-1, is a nonlinear function of beta. Alexander Borisov pointed out to me that Peter Kennedy (AER, 1981) suggests using exp(betahat-vhat(betahat)/2)-1 as an estimate of the effect of going from D=0 to D=1. It is still biased, but less biased, and also consistent.
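The formulas above can be sanity-checked numerically. The sketch below computes (dy/dx)*(x/y) with a finite-difference derivative and compares the naive dummy effect with Kennedy's adjustment. All function names, and the values beta = 0.7, betahat = 0.20, and vhat = 0.01, are made up for illustration.

```python
import math

def elasticity(f, x, h=1e-6):
    """Numerical (dy/dx) * (x/y), using a central difference for dy/dx."""
    slope = (f(x + h) - f(x - h)) / (2.0 * h)
    return slope * x / f(x)

def dummy_effect(beta):
    """Proportional jump in y when D goes from 0 to 1 in log(y) = alpha + beta*D."""
    return math.exp(beta) - 1.0

def kennedy_effect(beta_hat, var_hat):
    """Kennedy's adjusted estimate: exp(betahat - vhat(betahat)/2) - 1."""
    return math.exp(beta_hat - var_hat / 2.0) - 1.0

beta = 0.7  # hypothetical coefficient
print(elasticity(lambda x: beta * math.log(x), 3.0))            # beta / y
print(elasticity(lambda x: math.exp(beta * math.log(x)), 2.0))  # beta
print(elasticity(lambda x: math.exp(beta * x), 2.0))            # beta * x

# Hypothetical regression output: betahat = 0.20 with estimated variance 0.01.
print(dummy_effect(0.20), kennedy_effect(0.20, 0.01))
```

The adjusted estimate is always a little below the naive exp(betahat)-1, since the variance term is subtracted inside the exponent.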
I heard Adam Rosen give his paper, “Confidence sets for partially identified parameters that satisfy a finite number of moment inequalities.” It stimulated some thoughts.
A standard counterintuitive result in statistics is that if the
true model is logit, then it is okay to use a sample selected on the
Y’s, which is what the “case-control method” amounts to. You may select
1000 observations with Y=1 and 1000 observations with Y=0 and do
estimation of the effects of every variable but the constant in the
usual way, without any sort of weighting. This was shown in Prentice &
Pyke (1979). They also purport to show that the standard errors may be
computed in the usual way— that is, using the curvature (2nd
derivative) of the likelihood function.
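The point estimate part of this result is easy to see in a simulation. The sketch below is my own illustration, not the Prentice & Pyke argument: it draws a population from a logit model with made-up parameters (intercept -3, slope 1), keeps 1000 observations with Y=1 and 1000 with Y=0, and fits an unweighted logit by Newton-Raphson. The function names are hypothetical, and the sketch says nothing about the standard errors.

```python
import math
import random

def fit_logit(xs, ys, steps=25):
    """Newton-Raphson for P(y=1|x) = 1 / (1 + exp(-(a + b*x))). Returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score with respect to the constant
            g1 += (y - p) * x      # score with respect to the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def case_control_demo(true_a=-3.0, true_b=1.0, population=40000,
                      per_group=1000, seed=1):
    """Sample on Y from a simulated logit population, then fit without weights."""
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(population):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(true_a + true_b * x)))
        (cases if rng.random() < p else controls).append(x)
    # The population is large enough here to supply per_group of each outcome.
    xs = cases[:per_group] + controls[:per_group]
    ys = [1] * per_group + [0] * per_group
    return fit_logit(xs, ys)

a_hat, b_hat = case_control_demo()
print(a_hat, b_hat)
```

The slope estimate should land near the true slope of 1, while the constant is shifted away from -3 because Y=1 is over-sampled.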
At lunch at Nuffield I was just asking MM about some math notation I’d like: a symbol for “is not necessarily equal to”. For example, an economics paper might show the following:
Proposition: Stocks with equal risks might or might not have the same returns. In the model’s notation, x IS NOT NECESSARILY EQUAL TO y.