Use of Notation

For most people, a mathematical notation is like a language: you learn it and stick with it. For people doing mathematical research, however, this is not enough: they must design new notations for new problems. The design of good notation is both hard and worthwhile, since a bad initial notation can greatly retard a line of research.

Before we had mathematical notation, equations were written out in language. Since words have multiple meanings and variable precedences, long equations written out in language can be extraordinarily difficult to parse and are sometimes fundamentally ambiguous. A good representative example of this is the legalese in the tax code. Because we want greater precision and clarity, we adopt mathematical notation.

One fundamental thing to understand about mathematical notation is that humans are barely capable as logic verifiers. This is the fundamental reason why one notation can be much better than another. The observation is easier to miss than you might expect because, for any problem you are working on, you have already expended the effort to reach an understanding.

I don’t know of any systematic method for designing notation, but there is a set of heuristics, learned over time, which may be more widely helpful.

  1. Notation should be minimized. If there are two ways to express things, choose the objectively simpler one (by symbol count). If a piece of notation is used only once, it should probably be removed entirely (this often arises in presentations).
  2. Notation divergence should be minimized. If the people working on some problem have a standard notation, then sticking with it is easier. For example, in machine learning x is almost always a set of features from which predictions are made.
  3. A reasonable mechanism for notation design is to first name and define the quantities you are working with (for example, reward r and time t), and then make derived quantities by combination (for example, r_t is the reward at time t).
  4. Variables should be alliterated. Time is t, reward is r, cost is c, hypothesis is h.
  5. Name collisions (or near collisions) should be avoided. E and p are terrible variable names in some contexts.
  6. Sub-sub-scripts should be avoided. It is often possible to change a sub-sub-script into a sub-script by redefinition (see the sketch after this list).
  7. Superscripts are dangerous because of overloading with exponentiation.
  8. Inessential dependences should be suppressed in the definition. (For example, in reinforcement learning the MDP M you are working with can often be suppressed because it never changes.)
  9. A dependence must be either uniformly suppressed or uniformly explicit.
  10. Short theorem statements are very nice. There seem to be two styles of theorem statements: long, including all definitions, and short, with definitions made before the statement. As computer scientists, we should prefer “short” because the long style is nonmodular; as human readers, short statements are also easier to digest.
  11. It is very easy to forget the quantification of a variable (“for all” or “there exists”) when you are working on a theorem, and it is essential for readers that you specify it explicitly.
  12. Avoid strange alphabets. It is hard for people to think with unfamiliar symbols: English lowercase > English uppercase > Greek lowercase > Greek uppercase > Hebrew > other strange things.
  13. The definitions section of a paper often should not contain all of the paper’s definitions. Instead, it should cover the universally used definitions; others can be introduced just before they are used.
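
To make heuristics 3 and 6 concrete, here is a minimal LaTeX sketch (the symbols r, t, and the event index i are illustrative choices of mine, not canonical):

```latex
\documentclass{article}
\begin{document}
% Heuristic 3: name base quantities, then derive by combination.
Base quantities: reward $r$ and time $t$.
Derived quantity: $r_t$, the reward at time $t$.

% Heuristic 6: remove a sub-sub-script by redefinition.
Awkward: $r_{t_i}$, the reward at the time of the $i$-th event.
After redefinition (reindex over events): $r_i \equiv r_{t_i}$.
\end{document}
```

The redefinition costs one extra sentence up front, but every later formula gets shallower subscripts, which is usually a good trade.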

These heuristics often come into conflict, which can be hard to resolve. When resolving such a conflict, it’s important to understand that it’s easy to misjudge what a notation will feel like in use. Trying out different notations and comparing them is reasonable.

Are there other useful heuristics for notation design? (Or disagreements with the above heuristics?)

10 Replies to “Use of Notation”

  1. This is a helpful article. The heuristics are very useful for a paper writer like me. Notation usage is even more confusing for non-English researchers. As for me, I cannot differentiate some Hebrew symbols from Greek lowercase ones, and people often use them in the same contexts.

  2. I would also advise using bound variables properly. A programming language would never let you get away with the kind of abuse of “local variables” that mathematicians often commit. What I have in mind is things like writing f(x) for a function instead of just f.

    Do not be afraid to use proper notation for anonymous functions, i.e., instead of saying “x^2 + 3 is a function” (it isn’t!), say “x \mapsto x^2 + 3 is a function”, “\lambda x. x^2 + 3 is a function”, or “fun x -> x^2 + 3 is a function”.

    Another silly thing (which always confused me when I was a student) is insane notation for partial derivatives. For example, in calculus of variations one takes a partial derivative of L with respect to both x and \dot{x}. How on earth can you differentiate w.r.t. a derivative of x? Of course, you don’t really; you just use insane notation.

    Always indicate, either explicitly or by convention, what the type of any variable is. This is something most people do anyhow.

  3. oops… it was:
    Knuth’s “Mathematical Writing” [http://www-cs-faculty.stanford.edu/~knuth/klr.html]
    and its PDF [http://tex.loria.fr/typographie/mathwriting.pdf]

  4. I agree about the problem with “the function f(x)”. It is very disturbing, and people only notice it when f is actually a function that maps stuff (whatever it is) to functions. Then people become confused between the function f and the function f(x). This makes some papers very hard to read.
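
    A minimal Python sketch of the trap (the names here are illustrative, nothing more):

    ```python
    # f maps a number to a function, so "f" and "f(x)" are different objects.
    def f(x):
        # Return the function y -> x * y.
        return lambda y: x * y

    g = f       # g is the function f itself: number -> (number -> number)
    h = f(3)    # h is f's value at 3: the function y -> 3 * y

    print(h(4))     # 12
    print(g(3)(4))  # also 12: g(3) recovers h
    # Saying "the function f(x)" conflates g and h; here both happen to be
    # functions, which is exactly when the ambiguity bites.
    ```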

  5. “Another silly thing (which always confused me when I was a student) is insane notation for partial derivatives. For example, in calculus of variations one takes a partial derivative of L with respect to both x and \dot{x}. How on earth can you differentiate w.r.t. a derivative of x? Of course, you don’t really; you just use insane notation.”

    Don’t I know it! Worse, sometimes in COV you take what appear to be derivatives w.r.t. functions (dL/d\dot{q}), when you’re really taking scalar derivatives with respect to the third argument. The first time I read this notation I thought I’d *never* understand.
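
    One way to make the resolution precise (my own sketch, not anything from the original comment): treat L as a function of three independent slots and differentiate with respect to the slot, not the derivative.

    ```latex
    \documentclass{article}
    \begin{document}
    Let $L(t, q, v)$ be a function of three independent arguments.
    The Euler--Lagrange equation
    \[
      \frac{d}{dt}\,\frac{\partial L}{\partial \dot{q}}
      = \frac{\partial L}{\partial q}
    \]
    abbreviates
    \[
      \frac{d}{dt}\Bigl[(\partial_3 L)\bigl(t, q(t), \dot{q}(t)\bigr)\Bigr]
      = (\partial_2 L)\bigl(t, q(t), \dot{q}(t)\bigr),
    \]
    where $\partial_i L$ is the derivative of $L$ in its $i$-th argument.
    Nothing is differentiated ``with respect to a derivative'';
    $\dot{q}(t)$ is only the point where $\partial_3 L$ is evaluated.
    \end{document}
    ```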

  6. Thanks for the article – it’s a great list of things to keep in mind when doing any technical writing.

    One thing I’ve seen in papers is a section listing the notation used and each symbol’s definition. This is a good idea as an appendix. I don’t think it’s a good idea when used in the main body of the paper, though — better to introduce each piece of notation in context.

    The other thing I’d add is to be extra careful with your point 2 if your paper may have readers from different backgrounds. Quantum computing is a good example of this, where there is a choice of whether or not to use bra-ket notation for vectors.

  7. This is a timely article, because I’m looking for an example of machine learning models expressed in a format that is common in the social sciences. The format expresses inference by describing probability distributions using two equations (a stochastic component and a systematic component). An example for least squares would be: Y ~ f_N(y | mu, sigma^2), mu = X\beta.

    Any ideas?
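
    To be concrete, the format I mean would be typeset roughly like this (with f_N denoting the normal density; the rendering is my best guess at the convention):

    ```latex
    \documentclass{article}
    \begin{document}
    Stochastic component: $Y_i \sim f_N(y_i \mid \mu_i, \sigma^2)$ \\
    Systematic component: $\mu_i = X_i \beta$
    % With the normal stochastic component, maximizing the likelihood
    % over $\beta$ recovers ordinary least squares.
    \end{document}
    ```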

  8. I usually find Greek letters more suited to describing abstract objects in mathematics, and part of their charm is precisely that they look a bit more unusual to most readers, and hence underline that there is something a bit specific about the manipulated objects. I usually keep the Roman alphabet (English letters?? some kind of ethnocentrism?) for simpler entities, such as integers, real numbers, or simple functions. Finally, Greek letters look a bit more “static”, while Roman letters feel easier to manipulate. This might be related to programming, since we only use i, j, k, n, etc. in loops, and not \alpha, \beta, etc.
