A different kind of experiment at CERN

This article, as written by me, appeared in The Hindu on January 24, 2012.

At the Large Hadron Collider (LHC) at CERN, near Geneva, Switzerland, experiments are conducted by many scientists who don’t quite know what they will see, but know how to conduct the experiments that will yield answers to their questions. They accelerate beams of particles called protons to smash into each other, and study the fallout.

There are some other scientists at CERN who know approximately what they will see in experiments, but don’t know how to do the experiment itself. These scientists work with beams of antiparticles. According to the Standard Model, the dominant theoretical framework in particle physics, every particle has a corresponding particle with the same mass and opposite charge, called an anti-particle.

In fact, at the little-known AEgIS experiment, physicists will attempt to produce an entire beam composed of not just anti-particles but anti-atoms by mid-2014.

AEgIS is one of six antimatter experiments at CERN that create antiparticles and anti-atoms in the lab and then study their properties using special techniques. The hope, as Dr. Jeffrey Hangst, the spokesperson for the ALPHA experiment, stated in an email, is “to find out the truth: Do matter and antimatter obey the same laws of physics?”

Spectroscopic and gravitational techniques will be used to make these measurements. They will improve upon "precision measurements of antiprotons and anti-electrons" that "have been carried out in the past without seeing any difference between the particles and their antiparticles at very high sensitivity," as Dr. Michael Doser, AEgIS spokesperson, told this Correspondent via email.

The ALPHA and ATRAP experiments will achieve this by trapping anti-atoms and studying them, while ASACUSA and AEgIS will form beams of anti-atoms. All of them, anyway, will continue testing and upgrading through 2013.

Working principle

Specifically, AEgIS will attempt to measure the interaction between gravity and antimatter by shooting an anti-hydrogen beam horizontally through a vacuum tube and then measuring how much it sags due to the gravitational pull of the Earth, to a precision of 1 per cent.
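To get a sense of why this is hard, here is a minimal back-of-the-envelope sketch in Python. The beam speed and flight-path length below are illustrative assumptions, not AEgIS's actual parameters; the point is only that the vertical drop of a fast horizontal beam under Earth's gravity is measured in micrometres.

```python
# Back-of-the-envelope: how much a horizontal beam falls under gravity.
# The speed and path length are illustrative assumptions, not AEgIS's real parameters.

g = 9.81      # m/s^2, acceleration due to Earth's gravity
v = 500.0     # m/s, assumed horizontal speed of the anti-hydrogen beam
L = 1.0       # m, assumed length of the horizontal flight path

t = L / v                # time the beam spends in flight
sag = 0.5 * g * t ** 2   # vertical drop accumulated in that time

print(f"time of flight: {t * 1e3:.1f} ms")             # 2.0 ms
print(f"vertical sag:   {sag * 1e6:.1f} micrometres")  # ~19.6 micrometres
print(f"1% of the sag:  {sag * 1e4:.2f} micrometres")  # the precision target, ~0.2 micrometres
```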

The experiment is not so simple because preparing anti-hydrogen atoms is difficult. As Dr. Doser explained, “The experiments concentrate on anti-hydrogen because that should be the most sensitive system, as it is not much affected by magnetic or electric fields, contrary to charged anti-particles.”

First, antiprotons are derived from the Antiproton Decelerator (AD), a particle storage ring which "manufactures" the antiparticles at a low energy. At another location, a nanoporous plate is bombarded with anti-electrons, resulting in a short-lived bound state of an electron and an anti-electron called positronium (Ps).

The Ps is then excited to a specific energy state by exposure to a 205-nanometre laser, and then to an even higher energy state, called a Rydberg level, using a 1,670-nanometre laser. Last, the excited Ps traverses a special chamber called a recombination trap, where it mixes with antiprotons that are controlled by precisely tuned magnetic fields. With some probability, an antiproton will "trap" an anti-electron to form an anti-hydrogen atom.

Applications

Before a beam of such anti-hydrogen atoms can be generated, however, there are problems to be solved. They involve large electric and magnetic fields to control the speed of the beams and to collimate them, respectively, as well as powerful cryogenic systems and ultra-cold vacuums. Thus, Dr. Doser and his colleagues will spend many months making careful changes to the apparatus to ensure these systems work in tandem by 2014.

While the first antiparticles were discovered as far back as 1932, "until recently, it was impossible to measure anything about anti-hydrogen," Dr. Hangst wrote. Thus, the ALPHA and AEgIS experiments at CERN provide a seminal setting for exploring the world of antimatter.

Anti-particles have been used effectively in many diagnostic devices such as PET scanners. Consequently, improvements in our understanding of them feed immediately into medicine. To name an application: Antiprotons hold out the potential of treating tumors more effectively.

In fact, the feasibility of this application is being investigated by the ACE experiment at CERN.

In the words of Dr. Doser: “Without the motivation of attempting this experiment, the experts in the corresponding fields would most likely never have collaborated and might well never have been pushed to solve the related interdisciplinary problems.”

Aaron Swartz is dead.

This article, as written by me and a friend, appeared in The Hindu on January 16, 2013.

In July 2011, Aaron Swartz was indicted in the District of Massachusetts for allegedly stealing more than 4.8 million articles from the online academic literature repository JSTOR via the computer network at the Massachusetts Institute of Technology. He was charged with, among other counts, wire fraud, computer fraud, obtaining information from a protected computer, and criminal forfeiture.

After being released on a $100,000 bond, he was expected to stand trial in early 2013 to face the charges and, if found guilty, up to 35 years in prison and $1 million in fines. More than the likelihood of the sentence, however, what rankled him most was that he was labelled a "felon" by his government.

On Friday, January 11, Swartz's fight, against information localisation as well as the label given to him, ended when he hanged himself in his New York apartment. He was only 26. At the time of his death, JSTOR did not intend to press charges and had decided to release 4.5 million of its articles into the public domain. It seems as though this crime had no victims.

But, he was so much more than an alleged thief of intellectual property. His life was a perfect snapshot of the American Dream. But the nature of his demise shows that dreams are not always what they seem.

At the age of 14, Swartz became a co-author of the RSS (RDF Site Summary) 1.0 specification, now a widely used method for subscribing to web content. He went on to attend Stanford University, dropped out, founded a popular social news website and then sold it — leaving him a near-millionaire a few days short of his 20th birthday.

A recurring theme in his life and work, however, was the issue of internet freedom and public access to information, which led him to political activism. An activist organisation he founded campaigned heavily against the Stop Online Piracy Act (SOPA) bill, and eventually helped kill it. If passed, SOPA would have affected much of the world's browsing.

At a time rife with talk of American decline, Swartz's life reminds us that, for now, the United States remains the most innovative society on Earth. His death tells us that it is also a place where envelope-pushers discover, sometimes too late, that the line between what is acceptable and what is not is very thin.

The charges that he faced, in the last two years before his death, highlight the misunderstood nature of digital activism — an issue that has lessons for India. For instance, with Section 66A of the Indian IT Act in place, there is little chance of organising an online protest and blackout on par with the one that took place over the SOPA bill.

While civil disobedience and street protests usually carry light penalties, why should Swartz have faced long-term incarceration just because he used a computer instead? In an age of Twitter protests and online blackouts, his death sheds light on the disparities that digital activism is subjected to.

His act of trying to liberate millions of scholarly articles was undoubtedly political activism. But had he undertaken such an act in the physical world, he would have faced only light penalties for trespassing as part of a political protest. One could even argue that MIT encouraged such free exchange of information — it is no secret that its campus network has long been extraordinarily open with minimal security.

What, then, was the point of the public prosecutors highlighting his intent to profit from stolen property worth "millions of dollars", when Swartz's only aim was to make the articles public as a statement on the problems facing the academic publishing industry? After all, any academic would tell you that there is no way to profit off a hoard of scientific literature unless you dammed the flow and then released it only against payment.

In fact, JSTOR's decision to not press charges against him came only after they had reclaimed their "stolen" articles — even though Laura Brown, the managing director of JSTOR, had announced in September 2011 that journal content from 500,000 articles would be released for free public viewing and download. In the meantime, Swartz was made to face 13 charges anyway.

Assuming the charges were reasonable at all, his demise will then mean that the gap between those who hold onto information and those who would use it is spanned only by what the government thinks is criminal. That the hammer fell so heavily on someone who tried to bridge this gap is tragic. Worse, long-drawn, expensive court cases are becoming roadblocks on the path towards change, especially when they involve prosecutors incapable of judging the difference between innovation and damage on the digital frontier. It doesn't help that such a prosecution also neatly avoids the aura of illegitimacy that imprisoning peaceful activists would carry for any government.

Today, Aaron Swartz is dead. All that it took to push a brilliant mind over the edge was a case threatening to wipe out his fortunes and ruin the rest of his life. In the words of Lawrence Lessig, American academic and activist, and his former mentor at Harvard University's Edmond J. Safra Centre for Ethics: "Somehow, we need to get beyond the 'I'm right so I'm right to nuke you' ethics of our time. That begins with one word: Shame."

LHC to re-awaken in 2015 with doubled energy, luminosity

This article, as written by me, appeared in The Hindu on January 10, 2013.

After a successful three-year run that saw the discovery of a Higgs-boson-like particle in mid-2012, the Large Hadron Collider (LHC) at CERN, near Geneva, Switzerland, will shut down for 18 months for maintenance and upgrades.

This is the first of three long shutdowns, scheduled for 2013, 2017, and 2022. Physicists and engineers will use these breaks to ramp up one of the most sophisticated experiments in history even further.

According to Mirko Pojer, engineer-in-charge of LHC operations, most of these changes were planned in 2011. They will largely concern fixing known glitches on the ATLAS and CMS particle-detectors. The collider itself will receive upgrades to increase its collision energy and frequency.

Presently, the LHC smashes two beams, each composed of precisely spaced bunches of protons, at 3.5-4 tera-electron-volts (TeV) per beam.

By 2015, the beam energy will be pushed up to 6.5-7 TeV per beam. Moreover, the proton bunches, which were being collided at intervals of 50 nanoseconds, will collide every 25 nanoseconds.

After the upgrades, "in terms of performance, the LHC will deliver twice the luminosity," Dr. Pojer noted in an email to this Correspondent, with reference to the integrated luminosity. Precisely, it is the number of collisions that the LHC can deliver per unit area, which the detectors can then track.

The instantaneous luminosity, which is the luminosity per second, will be increased to 1×10³⁴ per square centimetre per second, ten times greater than before, and well on its way to peaking at 7.73×10³⁴ per square centimetre per second by 2022.
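For a sense of scale, instantaneous luminosity multiplied by a cross-section gives a collision rate. Here is a minimal sketch of that arithmetic in Python; the proton-proton inelastic cross-section used is an assumed, approximate figure (of the order of 80 millibarn at LHC energies), so the output is an order-of-magnitude estimate, not an official number.

```python
# Order-of-magnitude arithmetic: collision rate = instantaneous luminosity x cross-section.
# The cross-section is an assumed, approximate value.

inst_luminosity = 1e34       # cm^-2 s^-1, the post-upgrade instantaneous luminosity quoted above
sigma_inelastic_mb = 80.0    # millibarn, assumed inelastic proton-proton cross-section
mb_to_cm2 = 1e-27            # 1 millibarn = 1e-27 cm^2

rate = inst_luminosity * sigma_inelastic_mb * mb_to_cm2       # inelastic collisions per second
print(f"roughly {rate:.0e} inelastic collisions per second")  # ~8e+08
```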

As Steve Myers, CERN’s Director for Accelerators and Technology, announced in December 2012, “More intense beams mean more collisions and a better chance of observing rare phenomena.” One such phenomenon is the appearance of a Higgs-boson-like particle.

The CMS experiment, one of the detectors on the LHC-ring, will receive some new pixel sensors, a technology responsible for tracking the paths of colliding particles. To make use of the impending new luminosity-regime, an extra layer of these advanced sensors will be inserted around a smaller beam pipe.

If results from this layer are successful, CMS will receive the full unit in late 2016.

In the ATLAS experiment, unlike in CMS, which was built with greater luminosities in mind, the pixel sensors are expected to wear out within a year of the upgrades. As an intermediate solution, a new layer of sensors called the B-layer will be inserted within the detector to serve until 2018.

Because of the risk of radiation damage due to more numerous collisions, specific neutron shields will be fitted, according to Phil Allport, ATLAS Upgrade Coordinator.

Both ATLAS and CMS will also receive evaporative cooling systems and new superconducting cables to accommodate the higher performance that will be expected of them in 2015. The other experiments, LHCb and ALICE, will also undergo inspections and upgrades to cope with higher luminosity.

An improved failsafe system will be installed and the existing one upgraded to prevent accidents such as the one in 2008.

In that incident, an electrical failure damaged 29 magnets and leaked six tonnes of liquid helium into the tunnel, precipitating an eight-month shutdown.

Generally, as Martin Gastal, CMS Experimental Area Manager, explained via email, “All sub-systems will take the opportunity of this shutdown to replace failing parts and increase performance when possible.”

All these changes have been optimised to fulfil the LHC’s future agenda. This includes studying the properties of the newly discovered particle, and looking for signs of new theories of physics like supersymmetry and higher dimensions.

(Special thanks to Achintya Rao, CMS Experiment.)

There’s something wrong with this universe.

I've gone on about natural philosophy, the philosophy of representation, science history, and the importance of interdisciplinary perspectives when studying modern science. There's something that unifies all these ideas, and I wouldn't have thought of it at all had I not spoken to the renowned physicist Dr. George Sterman on January 3.

I was attending the Institute of Mathematical Sciences’ golden jubilee celebrations. A lot of my heroes were there, and believe me when I say my heroes are different from your heroes. I look up to people who are capable of thinking metaphysically, and physicists more than anyone I’ve come to meet are very insightful in that area.

One such physicist is Dr. Ashoke Sen, whose contributions to the controversial area of string theory are nothing short of seminal – if only for how differently it says we can think about our universe and what the math of that would look like. In particular, Sen's research into tachyon condensation and the phases of string theory is something I've been interested in for a while now.

Knowing that George Sterman was around came as a pleasant surprise. Sterman was Sen’s doctoral guide; while Sen’s a string theorist now, his doctoral thesis was in quantum chromodynamics, a field in which the name of Sterman is quite well-known.


– DR. GEORGE STERMAN (IMAGE: UC DAVIS)

When I finally got a chance to speak with Sterman, it was about 5 pm and there were a lot of mosquitoes around. We sat down in the middle of the lawn on a couple of old chairs, and with a perpetual smile on his face that made one of the greatest thinkers of our time look like a kid in a candy store, Sterman jumped right into answering my first question on what he felt about the discovery of a Higgs-like boson.

Where Sheldon Stone was obstinately practical, Sterman was courageously aesthetic. After the (now usual) bit about how the discovery of the boson was a tribute to mathematics and its ability to defy 50 years of staggering theoretical advancements by remaining so consistent, he said, “But let’s think about naturalness for a second…”

The moment he said "naturalness", I knew what he was getting at, but more than anything else, I was glad. Here was a physicist who was still looking at things aesthetically, especially in an era where a lack of money, and by extension a loss of practicality, could really put the brakes on scientific discovery. I mean, it's easy to jump up and down and be excited about having spotted the Higgs, but there are very few who feel free to still not be happy.

In Sterman's words, uttered while waving his arms about to swat away the swarming mosquitoes as he discussed supersymmetry:

There's a reason why so many people felt so confident about supersymmetry. It wasn't just that it's a beautiful theory – which it is – or that it engages and challenges the most mathematically oriented among physicists, but that there is another sense in which it appeared to be necessary. There's this subtle concept that goes by the name of naturalness. Naturalness as it appears in the Standard Model says that if we gave any reasonable estimate of what the mass of the Higgs particle should be, it should by all rights be huge! It should be as heavy as what we call the Planck mass [~10^19 GeV].

Or, as Martinus Veltman put it in an interview with Matthew Chalmers for Nature:

Since the energy of the Higgs is distributed all over the universe, it should contribute to the curvature of space; if you do the calculation, the universe would have to curve to the size of a football.

Naturalness is the idea, in particle physics specifically and in nature generally, that things don't desire to stand out in any way unless something's really messed up. For instance, consider the mass hierarchy problem in physics: Why is the gravitational force so much weaker than the electroweak force? If both of them are fundamental forces of nature, then where is this massive imbalance coming from?

Formulaically speaking, naturalness is represented by this equation:

$$h = c\,\Lambda^{4-d}$$

Here, lambda (the mountain symbol, Λ) is the cut-off scale: an energy scale at which the theory breaks down. Its influence over the naturalness of an entity h is determined by d, the number of dimensions associated with h, which can be at most 4. Last, c is a helpful scaling constant that keeps lambda from being too weak or too strong in a given setting.

In other words, a natural constant h must be comparable to other natural constants like it if they're all acting in the same setting.
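To put a number on how "unnatural" the measured situation looks, here is a tiny Python sketch comparing the Higgs mass to the Planck mass from the quote above. The 125 GeV figure is roughly the measured mass of the Higgs-like boson; the rest is just the ratio the naturalness argument worries about.

```python
# How far the observed Higgs mass sits below its "natural", Planck-scale estimate.

higgs_mass_gev = 125.0     # GeV, roughly the measured mass of the Higgs-like boson
planck_mass_gev = 1e19     # GeV, the Planck mass quoted above (order of magnitude)

ratio = higgs_mass_gev / planck_mass_gev
print(f"m_Higgs / m_Planck ~ {ratio:.0e}")           # ~1e-17
# Corrections to the Higgs mass-squared scale like the cut-off squared, so the
# cancellation required is of the order of this ratio squared:
print(f"(m_Higgs / m_Planck)^2 ~ {ratio ** 2:.0e}")  # ~2e-34
```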

However, given how the electroweak and gravitational forces – which do act in the same setting (also known as our universe) – differ so tremendously in strength, the values of these constants are, to put it bluntly, coincidental.

Problems such as this “violate” naturalness in a way that defies the phenomenological aesthetic of physics. Yes, I’m aware this sounds like hot air but bear with me. In a universe that contains one stupendously weak force and one stupendously strong force, one theory that’s capable of describing both forces would possess two disturbing characteristics:

1. It would be capable of angering one William of Ockham

2. It would require a dirty trick called fine-tuning

I’ll let you tackle the theories of that William of Ockham and go right on to fine-tuning. In an episode of ‘The Big Bang Theory’, Dr. Sheldon Cooper drinks coffee for what seems like the first time in his life and goes berserk. One of the things he touches upon in a caffeine-induced rant is a notion related to the anthropic principle.

The anthropic principle states that it's not odd that the values of the fundamental constants seem to engender the evolution of life and physical consciousness, because if those values weren't what they are, then a consciousness wouldn't be around to observe them. Starting with the development of the Standard Model of particle physics in the 1960s, it's become clear just how finely set the values of these constants are.

So, with the anthropic principle providing a philosophical cushioning, like some intellectual fodder to fall back on when thoughts run low, physicists set about trying to find out why the values are what they are. As the Standard Model predicted more particles – with annoying precision – physicists also realised that given the physical environment, the universe would’ve been drastically different even if the values were slightly off.

Now, as discoveries poured in and it became clear that the universe housed two drastically different forces in terms of their strength, researchers felt the need to fine-tune the values of the constants to fit experimental observations. This sometimes necessitated tweaking the constants in such a way that they’d support the coexistence of the gravitational and electroweak forces!

Scientifically speaking, this just sounds pragmatic. But just think aesthetically and you start to see why this practice smells bad: The universe is explicable only if you make extremely small changes to certain numbers, changes you wouldn’t have made if the universe wasn’t concealing something about why there was one malnourished kid and one obese kid.


Doesn’t the asymmetry bother you?

Put another way, as physicist Paul Davies did,

There is now broad agreement among physicists and cosmologists that the Universe is in several respects ‘fine-tuned’ for life. The conclusion is not so much that the Universe is fine-tuned for life; rather it is fine-tuned for the building blocks and environments that life requires.

(On a lighter note: If the universe includes both a plausible anthropic principle and a Paul Davies who is a physicist and is right, then multiple universes are a possibility. I’ll let you work this one out.)

Compare all of this to the desirable idea of naturalness and what Sterman was getting at and you’d see that the world around us isn’t natural in any sense. It’s made up of particles whose properties we’re sure of, of objects whose behaviour we’re sure of, but also of forces whose origins indicate an amount of unnaturalness… as if something outside this universe poked a finger in, stirred up the particulate pea-soup, and left before anyone evolved enough to get a look.

(This blog post first appeared at The Copernican on January 6, 2013.)

The case of the red-haired kids

This blog post first appeared, as written by me, on The Copernican science blog on December 30, 2012.

Seriously, shame on me for not noticing the release of a product named Correlate until December 2012. Correlate by Google was released in May last year and is a tool to see how two different search trends have panned out over a period of time. But instead of letting you pick out searches and compare them, Correlate saves a bit of time by letting you choose one trend and then automatically picking out trends similar to the one you've got your eye on.

For instance, I used the “Draw” option and drew a straight, gently climbing line from September 19, 2004, to July 24, 2011 (both randomly selected). Next, I chose “India” as the source of search queries for this line to be compared with, and hit “Correlate”. Voila! Google threw up 10 search trends that varied over time just as my line had.


Since I’ve picked only India, the space from which the queries originate remains fixed, making this a temporal trend – a time-based one. If I’d fixed the time – like a particular day, something short enough to not produce strong variations – then it’d have been a spatial trend, something plottable on a map.

Now, there were a lot of numbers on the results page. The 10 trends displayed in fact were ranked according to a particular number “r” displayed against them. The highest ranked result, “free english songs”, had r = 0.7962. The lowest ranked result, “to 3gp converter”, had r = 0.7653.


And as I moused over the chart itself, I saw two numbers, one each against the two trends being tracked. For example, on March 1, 2009, the “Drawn Series” line had a number +0.701, and the “free english songs” line had a number -0.008, against it.


What do these numbers mean?

This is what I really want to discuss, because these numbers have strong implications for how lay people interpret data that appears in the context of some scientific text, like a published paper. Each of these numbers is associated with a particular behaviour of some trend at a specific point. So, instead of looking at it as numbers and shapes on a piece of paper, look at it for what it represents and you'll see so many possibilities coming to life.

The numbers against the trends, +0.701 for “Drawn Series” (my line) and -0.008 for “free english songs” in March ‘09, are the deviations. The deviation is a lovely metric because it sort of presents the local picture in comparison to the global picture, and this perspective is made possible by the simple technique used to evaluate it.

Consider my line. Each of the points on the line has a certain value. Use this information to find their average value. Now, the deviation is how much a point’s value is away from the average value.

It's like if 11 red-haired kids were made to stand in a line ordered according to the redness of their hair. If the "average" colour around was a perfect orange, then the kid with the "reddest" hair and the kid with the palest-red hair will be the most deviating. Kids with some semblance of orange in their hair-colour will be progressively less deviating until they're past the perfect "orangeness", and the kid with perfectly orange hair will be completely non-deviating.

So, on March 1, 2009, "Drawn Series" was higher than its average value by 0.701 and "free english songs" was lower than its average value by 0.008. Now, if you're wondering what the units are to measure these numbers: Deviations are dimensionless fractions – which means they're just numbers whose highness or lowness are indications of intensity.

And what’re they fractions of? The value being measured along the trend being tracked.

Now, enter standard deviation. Remember how you found the average value of a point on my line? Well, the standard deviation is, roughly, the typical size of a deviation (formally, it's the root mean square of all the deviations). It's like saying the children fitting a particular demographic are, for instance, 25 per cent smarter on average than other normal kids: the standard deviation is 25 per cent and the individual deviations are similar percentages of the "smartness" being measured.

So, right now, if you took the bigger picture, you’d see the chart, the standard deviation (the individual deviations if you chose to mouse-over), the average, and that number “r”. The average will indicate the characteristic behaviour of the trend – let’s call it “orange” – the standard deviation will indicate how far off on average a point’s behaviour will be deviating in comparison to “orange” – say, “barely orange”, “bloody”, etc. – and the individual deviations will show how “orange” each point really is.

At this point I must mention that I conveniently oversimplified the example of the red-haired kids to avoid a specific problem. This problem has been quite single-handedly responsible for the news-media wrongly interpreting results from the LHC/CERN on the Higgs search.

In the case of the kids, we assumed that, going down the line, each kid’s hair would get progressively darker. What I left out was how much darker the hair would get with each step.

Let’s look at two different scenarios.

Scenario 1: The hair gets darker by a fixed amount each step.

Let's say the first kid's got hair that's 1 unit of orange, the fifth kid's got 5 units, and the 11th kid's got 11 units. This way, the average "amount of orange" in the lineup is going to be 6 units. The deviation on either side of kid #6 is going to increase/decrease in steps of 1. In fact, from the first to the last, it's going to be 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, and 5. Straight down and then straight up.


Scenario 2: The hair gets darker slowly and then rapidly, also from 1 to 11 units.

In this case, the average is not going to be 6 units. Let's say the "orangeness" this time is 1, 1.5, 2, 2.5, 3, 3.5, 4, 5.5, 7.5, 9.75, and 11 per kid, which brings the average to ~4.6591 units. In turn, the deviations are 3.6591, 3.1591, 2.6591, 2.1591, 1.6591, 1.1591, 0.6591, 0.8409, 2.8409, 5.0909, and 6.3409. In other words, slowly down and then quickly more up.
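Here is a small Python sketch that reproduces the numbers in both scenarios – the average "orangeness", each kid's (absolute) deviation from it, and the standard deviation discussed earlier. The hair-colour values are exactly the ones listed above.

```python
# Reproduce the averages and deviations from the two red-haired-kid scenarios.

def describe(values):
    mean = sum(values) / len(values)
    deviations = [abs(v - mean) for v in values]                      # size of each kid's deviation
    std_dev = (sum(d * d for d in deviations) / len(values)) ** 0.5   # root mean square of deviations
    return mean, deviations, std_dev

scenario_1 = list(range(1, 12))                                  # 1, 2, ..., 11 units of orange
scenario_2 = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5.5, 7.5, 9.75, 11]     # darkens slowly, then rapidly

for name, values in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    mean, deviations, std_dev = describe(values)
    print(f"{name}: average = {mean:.4f}, standard deviation = {std_dev:.4f}")
    print("  deviations:", [round(d, 4) for d in deviations])
```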


In the second scenario, we saw how the average got shifted to the left. This is because there were more less-orange kids than more-orange ones. What’s more important is that it didn’t matter if the kids on the right had more more-orange hair than before. That they were fewer in number shifted the weight of the argument away from them!

In much the same way, looking for the Higgs boson from a chart that shows different peaks (number of signature decay events) at different points (energy levels), with taller but fewer peaks to one side and shorter but many more peaks to the other, can be confusing. While more decays could’ve occurred at discrete energy levels, the Higgs boson is more likely (note: not definitely) to be found within the energy-level where decays occur more frequently (in the chart below, decays are seen to occur more frequently at 118-126 GeV/c² than at 128-138 GeV/c² or 110-117 GeV/c²).

[Chart of decay events at different energy levels. Idea from Prof. Matt Strassler's blog]

If there’s a tall peak where a Higgs isn’t likely to occur, then that’s an outlier, a weirdo who doesn’t fit into the data. It’s probably called an outlier because its deviation from the average could be well outside the permissible deviation from the average.

This also means it’s necessary to pick the average from the right area to identify the right outliers. In the case of the Higgs, if its associated energy-level (mass) is calculated as being an average of all the energy levels at which a decay occurs, then freak occurrences and statistical noise are going to interfere with the calculation. But knowing that some masses of the particle have been eliminated, we can constrain the data to between two energy levels, and then go after the average.
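Here is a toy Python illustration of that last point: averaging over every bin lets a far-off outlier drag the estimate around, while averaging only within a constrained window stays close to where the events actually bunch up. The (energy, event-count) pairs below are made up for the example, not real LHC data.

```python
# Toy example: estimate a peak's location as a weighted average of energy bins.
# The (energy in GeV, number of events) pairs are made up for illustration.

bins = [(110, 2), (113, 0), (118, 5), (121, 8), (124, 14), (126, 12), (131, 3), (145, 9)]

def weighted_average(bins, low, high):
    window = [(energy, count) for energy, count in bins if low <= energy <= high]
    total = sum(count for _, count in window)
    return sum(energy * count for energy, count in window) / total

print(round(weighted_average(bins, 110, 150), 1))   # all bins: the outlier at 145 drags it up (~126.9)
print(round(weighted_average(bins, 118, 128), 1))   # constrained window: ~123.2, near the bunched peaks
```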

So, when an uninformed journalist looks at the data, the taller peaks can catch the eye, even run away with the ball. But look out for the more closely occurring bunches – that’s where all the action is!

If you notice, you’ll also see that there are no events at some energy levels. This is where you should remember that uncertainty cuts both ways. When you’re looking at a peak and thinking “This can’t be it; there’s some frequency of decays to the bottom, too”, you’re acknowledging some uncertainty in your perspective. Why not acknowledge some uncertainty when you’re noticing absent data, too?

While there’s a peak at 126 GeV/c², the Higgs weighs between 124 and 125 GeV/c². We know this now, so when we look at the chart, we know we were right in having been uncertain about the mass of the Higgs being 126 GeV/c². Similarly, why not say “There are no decays at 113 GeV/c², but let me be uncertain and say there could’ve been a decay there that’s escaped this measurement”?

Maybe this idea’s better illustrated with this chart.

[Chart of decay events per energy level, showing a gap between 123 and 125 GeV/c²]

There’s a noticeable gap between 123 and 125 GeV/c². Just looking at this chart, you’re going to think that with peaks on either side of this valley, the Higgs isn’t going to be here… but that’s just where it is! So, make sure you address uncertainty when you’re determining presences as well as absences.

So, now, we’re finally ready to address “r”, the Pearson correlation coefficient. It’s got a formula, and I think you should see it. It’s pretty neat.

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\;\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

The equation says: “Let’s see what your Pearson correlation, r, is by seeing how much your deviations move together, keeping in mind both your standard deviations.”

The numerator is what’s called the covariance, and the denominator is basically the product of the standard deviations. X-bar, which is X with a bar atop, is the average value of X – my line – and the same goes for Y-bar, corresponding to Y – “free english songs”. Individual points on the lines are denoted with the subscript “i”, so the points would be X1, X2, X3, …, and Y1, Y2, Y3, … The “n” in the formula is the size of the sample – the number of days over which we’re comparing the two trends.
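If you'd like to see the formula do its thing, here is a direct Python translation of it. The two series below are made up purely for illustration – stand-ins for "Drawn Series" and "free english songs" – since the actual Correlate data isn't reproduced here.

```python
# A direct implementation of the Pearson correlation coefficient formula above.
import math

def pearson_r(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n                      # average of X
    y_bar = sum(ys) / n                      # average of Y
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Made-up weekly values standing in for the two trends.
drawn_series = [1.0, 1.2, 1.1, 1.5, 1.7, 1.9, 2.2, 2.4]
free_english_songs = [0.9, 1.1, 1.2, 1.4, 1.8, 1.7, 2.1, 2.5]

print(round(pearson_r(drawn_series, free_english_songs), 4))   # close to +1: the two climb together
```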

The Pearson correlation coefficient is not called the Pearson deviation coefficient, etc., because it normalises the covariance. Simply put, covariance is a measure of how much the two trends vary together; dividing it by the product of the two standard deviations scales it into a coefficient, r, that runs between -1 and +1. An r of 0 means one trend’s variation has nothing to do with the other’s, while an r of +1 or -1 means one trend’s variation is inescapably tied to the variation of the other’s. If r is positive, it means that if one trend climbs, the other would climb, too. If r is negative, then one trend’s climbing would mean the other’s descending (in the chart below, between Oct ’09 and Jan ’10, there’s a dip: even during the dive-down, the blue line is on an increasing note – here, the local relationship is a negative one).

[Chart: a sample Correlate comparison of the two trends]

Apart from being a conveniently defined number, covariance also records a trend’s linearity. In statistics, linearity is a notion that stands by its name: like a straight line, the rise or fall of a trend is uniform. If you divided up the line into thousands of tiny bits and called each one on the right the “cause” and the one on the left the “effect”, then you’d see that linearity means each effect for each cause is either an increase or a decrease by the same amount.

Just like that, if the covariance is a lower positive number, it means one trend’s growth is also the other trend’s growth, and in equal measure. If the covariance is a larger positive number, you’d have something like the butterfly effect: one trend moves up by an inch, the other shoots up by a mile. This you’ll notice is a break from linearity. So if you plotted the covariance at each point in a chart as a chart by itself, one look will tell you how the relationship between the two trends varies over time (or space).

Dr. Stone on the Higgs search

On December 10, 2012, I spoke to a bunch of physicists attending the Frontiers of High-energy Physics symposium at the Institute of Mathematical Sciences, Chennai. They included Rahul Sinha, G. Rajasekaran, Tom Kibble, Sheldon Stone, Marina Artuso, M.V.N. Murthy, Kajari Mazumdar, and Hai-Yang Cheng, amongst others.

All their talks, obviously, focused on either the search for the Higgs boson or the search for dark matter, with the former being assured and celebratory and the latter contemplative and cautious. There was nothing new left to be said – as a peg for a news story – given that the months of 2012 that had gone before that day had already produced hundreds of stories on the two searches.

Most of the memorable statements by physicists I remember from that day came from Dr. Sheldon Stone of Syracuse University, a member of the LHCb collaboration.

A word on the LHCb before I go any further: it's one of the seven detector-experiments situated on the Large Hadron Collider's (LHC's) ring. Unlike ATLAS and CMS, whose focus is on the Higgs boson, the LHCb collaboration is studying the decay of B-mesons and signs of CP-symmetry violations at high energies.

While he had a lot to say, he also best summed up what physicists worldwide might've felt as the theorised set of rules for particles called the Standard Model (SM) was having its predictions validated one after the other, leaving no room for a new theory to edge its way in. While very elegant by itself, the SM has no answers to some of the more puzzling questions, such as that of dark matter or the mass-hierarchy problem.

In other words, the more it stands validated, the fewer cracks there are for a new and better theory, like Supersymmetry, to show itself.

In Dr. Stone’s words, “It’s very depressing. The Standard Model has been right on target, and so far, nothing outside the model has been observed. It’s very surprising that everything works, but at the same time, we don’t know why it works! Everywhere, physicists are depressed and clueless, intent on digging deeper, or both. I’m depressed, too, but I also want to dig deeper.”

In answer to some of my questions on what the future held, Dr. Stone said, "Now that we know how things actually work, we're starting to play some tricks. But beyond that, moving ahead, with new equipment, etc., is going to cost a lot of money. We've to invest in the collider, in a lot of detector infrastructure, and computing accessories. In 2012, we had a tough time keeping up with the results the LHC was producing. For the future, we're counting on advancements in computer science and the LHC Grid."

One interesting thing that he mentioned in one of his answers was that the LHC costs less than one aircraft-carrier. I thought that’d put things in perspective – how much some amount of investment in science could achieve when compared to what the same amount could achieve in other areas. This is not to discourage the construction of aircraft carriers, but to rethink the opportunities science research has the potential to unravel.

(This blog post first appeared at The Copernican on December 22, 2012.)

Is there only one road to revolution?

Read this first.


Some of this connects, some of it doesn't. Most of all, I have discovered a fear in me that keeps me from disagreeing with people like Meena Kandasamy – great orators, no doubt, but what are they really capable of?

The piece speaks of revolution as being the sole goal of an Indian youth's life, that we must spend our lives stirring the muddied water, exposing the mud to light, and separating grime from guts and guts from glory. This is where I disagree. Revolution is not my cause. I don't want to stir the muddied water. I concede that I am afraid that I will fail.

And at this point, Meena Kandasamy would have me believe, I should either crawl back into my liberty-encrusted shell or lay down my life. Why should I when I know I will succeed in keeping aspirations alive? Why should I when, given the freedom to aspire, I can teach others how to go about believing the same? Why should I when I can just pour in more and more clean water and render the mud a minority?

Why is this never an option? Have we reached a head, that it’s either a corruption-free world or a bloodied one? India desperately needs a revolution, yes, but not one that welcomes a man liberated after pained struggles to a joyless world.

How big is your language?

This blog post first appeared, as written by me, on The Copernican science blog on December 20, 2012.


It all starts with Zipf’s law. Ever heard of it? It’s a devious little thing, especially when you apply it to languages.

Zipf's law states that the chances of finding a word of a language in all the texts written in that language are inversely proportional to the word's rank in the frequency table. In other words, the most frequent word is twice as likely to be found as the second most frequent word, thrice as likely as the third most frequent word, and so on.
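A quick way to see Zipf's law in action on any text you have handy is to count word frequencies and compare each word's count with (count of the most frequent word) divided by rank. The Python sketch below does that for a tiny sample string; the sample text is obviously a stand-in, so swap in a real corpus to watch the law hold up – and eventually break down.

```python
# Compare observed word frequencies with the 1/rank pattern Zipf's law predicts.
from collections import Counter

text = """the quick brown fox jumps over the lazy dog the dog barks
the fox runs and the dog sleeps while the quick fox watches"""   # tiny stand-in; use a real corpus

counts = Counter(text.lower().split())
ranked = counts.most_common()          # words sorted by frequency, most frequent first
top_count = ranked[0][1]

for rank, (word, count) in enumerate(ranked[:8], start=1):
    predicted = top_count / rank       # count Zipf's law would predict for this rank
    print(f"{rank:>2}. {word:<7} observed: {count:<2} Zipf-predicted: {predicted:.1f}")
```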

Unfortunately (only because I like how "Zipf" sounds), the law holds only until about the 1,000th most common word; after this point, a logarithmic plot of frequency against rank stops being linear and starts to curve.

The importance of this break is that if Zipf’s law fails to hold for a large corpus of words, then the language, at some point, must be making some sort of distinction between common and exotic words, and its need for new words must either be increasing or decreasing. This is because, if the need remained constant, then the distinction would be impossible to define except empirically and never conclusively – going against the behaviour of Zipf’s law.

Consequently, the chances of finding the 10,000th word won’t be 10,000 times less than the chances of finding the most frequently used word but a value much lesser or much greater.

A language’s diktat

Analysing each possibility, i.e., if the chances of finding the 10,000th-most-used word are NOT 10,000 times less than the chances of finding the most-used word but…

  • Greater (i.e., The Asymptote): The language must have a long tail, also called an asymptote. Think about it. If the rarer words are all used almost as frequently as each other, then they can all be bunched up into one set, and when plotted, they’d form a straight line almost parallel to the x-axis (chance), a sort of tail attached to the rest of the plot.
  • Lesser (i.e., The Cliff): After expanding to include a sufficiently large vocabulary, the language could be thought to “drop off” the edge of a statistical cliff. That is, at some point, there will be words that exist and mean something, but will almost never be used because syntactically simpler synonyms exist. In other words, in comparison to the usage of the first 1,000 words of the language, the (hypothetical) 10,000th word would be used negligibly.

The former possibility is more likely – that the chances of finding the 10,000th-most-used word would not be as low as 10,000-times less than the chances of encountering the most-used word.

As a language expands to include more words, it is likely that it issues a diktat to those words: "either be meaningful or go away". And as the length of the language's tail grows – as more exotic and infrequently used words accumulate – the need for the words farthest from Zipf's domain drops off fastest over time.

Another way to quantify this phenomenon is through semantics (and this is a far shorter route of argument): As the underlying correlations between different words become more networked – for instance, attain greater betweenness – the need for new words is reduced.

Of course, the counterargument here is that there is no evidence to establish if people are likelier to use existing syntax to encapsulate new meaning than they are to use new syntax. This apparent barrier can be resolved by what is called the principle of least effort.

Proof and consequence

While all of this has been theoretically laid out, there had to have been many proofs over the years because the object under observation is a language – a veritable projection of the right to expression as well as a living, changing entity. And in the pursuit of some proof, on December 12, I spotted a paper on arXiv that claims to have used an “unprecedented” corpus (Nature scientific report here).

Titled “Languages cool as they expand: Allometric scaling and the decreasing need for new words”, it was hard to miss in the midst of papers, for example, being called “Trivial symmetries in a 3D topological torsion model of gravity”.

The abstract of the paper, by Alexander Petersen from the IMT Lucca Institute for Advanced Studies, et al, has this line: “Using corpora of unprecedented size, we test the allometric scaling of growing languages to demonstrate a decreasing marginal need for new words…” This is what caught my eye.

While it's clear that Petersen's results have been established only empirically, the fact that their corpus includes all the words in books written in the English language between 1800 and 2008 indicates that the set of observables is almost as large as it can get.

Second: When speaking of corpuses, or corpora, the study has also factored in Heaps’ law (apart from Zipf’s law), and found that there are some words that obey neither Zipf nor Heaps but are distinct enough to constitute a class of their own. This is also why I underlined the word common earlier in this post. (How Petersen, et al, came to identify this is interesting: They observed deviations in the lexicon of individuals diagnosed with schizophrenia!)

Heaps' law, also called the Heaps-Herdan law, states that the chances of discovering a new word in one large instance-text, like one article or one book, become smaller as the size of the instance-text grows. It's like a combination of the sunk-cost fallacy and Zipf's law.

It’s a really simple law, too, and makes a lot of sense even intuitively, but the ease with which it’s been captured statistically is what makes the Heaps-Herdan law so wondrous.

The sub-linear Heaps’ law plot: Instance-text size on x-axis; Number of individual words on y-axis.
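Heaps' law is just as easy to watch in action: read a text word by word and record how many distinct words you've seen after every few words. The Python sketch below does exactly that for a made-up sample; with a book-length text the curve of distinct words against words read grows sub-linearly, roughly like K·n^β with β below 1.

```python
# Watch Heaps' law: count distinct words seen as a function of total words read.

def vocabulary_growth(words, step=5):
    seen = set()
    growth = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        if i % step == 0:
            growth.append((i, len(seen)))   # (words read so far, distinct words so far)
    return growth

# A made-up sample; substitute any long text to see the sub-linear curve properly.
sample = ("the cat sat on the mat and the dog sat on the rug while "
          "the cat watched the dog and the dog watched the cat sit").split()

for total, distinct in vocabulary_growth(sample):
    print(f"after {total:>3} words: {distinct:>2} distinct words")
```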

Falling empires

And Petersen and his team establish in the paper that, extending the consequences of Zipf's and Heaps' laws to massive corpora, the larger a language is in terms of the number of individual words it contains, the slower it will grow and the less cultural evolution it will engender. In the words of the authors: "… We find a scaling relation that indicates a decreasing 'marginal need' for new words which are the manifestations of cultural evolution and the seeds for language growth."

However, for the class of “distinguished” words, there seems to exist a power law – one that results in a non-linear graph unlike Zipf’s and Heaps’ laws. This means that as new exotic words are added to a language, the need for them, as such, is unpredictable and changes over time for as long as they are away from the Zipf’s law’s domain.

All in all, languages eventually seem an uncanny mirror of empires: The larger they get, the slower they grow, the more intricate the exchanges become within it, the fewer reasons there are to change, until some fluctuations are injected from the world outside (in the form of new words).

In fact, the mirroring is not so uncanny considering both empires and languages are strongly associated with cultural evolution. Ironically enough, it is the possibility of cultural evolution that very meaningfully justifies the creation and employment of languages, which means that at some point, languages become bloated enough to stop the germination of new ideas and instead start to suffocate such initiatives.

Does this mean the extent to which a culture centered on a language has developed and will develop depends on how much the language itself has developed and will develop? Not conclusively – as there are a host of other factors left to be integrated – but it seems a strong correlation exists between the two.

So… how big is your language?