Posted by: Chris Brew | April 12, 2012

Pandas is useful, when enchanted

If you are a Python devotee, and you write programs that munge words, the combination of and is easy and efficient for tokenization, word counting, and all that. Recommended.

Posted by: Chris Brew | June 30, 2011

New Job

I just moved to the Educational Testing Service, to work on their cRater project, which is like the famous eRater essay-grading project, except with more semantics, and for short answers instead of essays. I THINK the c stands for content, but it could stand for “constructed response”, which is a psychometrics and educational testing thing. Learning fast, having fun, working with a great group of people. Note that the cRater link describes cRater as it was in 2004, not as it is now.

Posted by: Chris Brew | April 11, 2011

“You can observe a lot by just watching” Yogi Berra

While it isn’t always easy, I can usually tell where people were raised as well as where they were born. Koreans raised in Los Angeles have a style completely different from those born in Seoul; the English, en masse, look different from the Scots, and it just takes one look at those wacky triangular eyeglasses for me to know that a young lady is either French or getting that way. In the same way, just by looking, I can pretty much diagnose the families waiting for the Newark to Gatwick flight. Here’s one: prematurely greying father with John Lennon glasses, slightly older mother with shoulder-length hair, three blond boys with backpacks and crew cuts. I’m like, O.K., he’s British, she’s American, all three boys born in the USA. Or, Asian looking father, fiftyish, no mother in the party, two young teen daughters, one classically Eurasian looking, the other blonder. Sure, I can do that: he’s born in Hong Kong, but doesn’t speak Cantonese well, one kid born in Shanghai, the other in canoe transit up the Amazon. Same father, I think, but the first girl’s mother is definitely working as a dogcatcher in Evansville, Indiana, and the second one’s mother once had that unfortunate accident with a hairnet and an avocado. Could these be the same person? Very likely, but I’m not infallible: while I know for sure that the father is a part-time seal tamer and computer science professor, I can’t be sure whether he’s a bigamist. It’s just a matter of assessing the evidence.

I can also tell what language people speak, because the patterns of vowels and consonants shape the face. Turkish oral surgeons spend 47% of their time unsticking the tongue tip from the roof of the mouth. “Who put the gluten in this agglutinative language?”, they cry. And did you know that Mick Jagger was raised Basque? His English accent is a fake: he stole it from a classmate at LSE, using 1960s recording technology and a hypnopaedic pillow. You don’t get those lips from an Indo-European language, let me tell ya! Angela Lansbury is Swedish, and Dick van Dyke really is a cockney. As a young man Rex Harrison sang Wagner’s Parsifal with Maria Callas in the Italian premiere at La Scala: the My Fair Lady thing is a front. Not many know that, but you can see all this in their faces.

Just by looking, I can tell whether your dog will develop cataracts (and whether your cat will develop doggeracts, should you care). Show me your friend’s wardrobe, and I can predict the mean rainfall over the Andes for the next two weeks. Two glances inside your purse and I can diagnose your psychological problems to eight decimal places AND predict your fashion preferences. Just from your diet, I can tell you not only your height, weight and hat size but your views on a wide range of social issues and the three last digits of your social security number. If you were raised by wolves, I can tell. If you were kept locked in a cupboard by your neglectful parents, I will spot it, and be able to offer career advice, speech therapy and a range of inexpensive after-care options. If your father married his half sister and you were raised by a vengeful dwarf in the forest, I will know, and be the first to offer you a place to lay your sword. And advise you on whether the local fire brigade is any use for your unknowingly genetically suspect purpose. But I’m not special, I think most people could do that, just by looking.

Posted by: Chris Brew | March 26, 2011

How to do comparisons between machine learning schemes

Nice paper comparing 16 model selection and weighting schemes on 58 benchmark datasets. The data analysis was done in the following way:

– For each dataset, rank the schemes. Then average the ranks over all datasets.
– Use the Friedman test to test whether the average ranks are all equal.
– If the ranks are not all equal, use the Nemenyi test to compare the schemes pairwise (covered in papers by Demsar, Garcia et al.).

Ying Yang, Geoffrey I. Webb, Jesús Cerquides, Kevin B. Korb, Janice R. Boughton, Kai Ming Ting: To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators. IEEE Trans. Knowl. Data Eng. 19(12): 1652-1665 (2007), ISSN: 1041-4347
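For the curious, here is a minimal sketch of the rank-then-test recipe in plain Python. It is not the paper's code: the function names, the toy accuracy table and the choice of "higher score is better" are all my own assumptions, and the formulas are the textbook ones for the Friedman statistic and the Nemenyi critical difference as given in Demsar's paper. The critical value q_alpha has to be looked up in a table; 2.343 is the usual 0.05 value for three schemes.

```python
from math import sqrt

def rank_row(scores):
    """Rank one dataset's scores: highest score gets rank 1, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of schemes tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def average_ranks(table):
    """table[d][s] is the score of scheme s on dataset d; returns one mean rank per scheme."""
    n, k = len(table), len(table[0])
    rows = [rank_row(row) for row in table]
    return [sum(row[s] for row in rows) / n for s in range(k)]

def friedman_statistic(avg_ranks, n):
    """Friedman chi-square statistic for k schemes over n datasets (k - 1 degrees of freedom)."""
    k = len(avg_ranks)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)

def nemenyi_cd(k, n, q_alpha):
    """Two schemes differ significantly if their mean ranks differ by more than this."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

# Toy data (made up): accuracies of 3 hypothetical schemes on 4 datasets.
table = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.70],
    [0.75, 0.80, 0.60],
    [0.92, 0.91, 0.85],
]
avg = average_ranks(table)                  # [1.25, 1.75, 3.0]
stat = friedman_statistic(avg, len(table))  # 6.5
cd = nemenyi_cd(3, len(table), 2.343)       # about 1.66
print(avg, stat, cd)
```

On this toy table the Friedman statistic of 6.5 is above 5.991, the 0.05 chi-square critical value for 2 degrees of freedom, so you would go on to Nemenyi; only the first and third schemes have mean ranks further apart than the critical difference. With four datasets the chi-square approximation is rough, so treat the numbers as illustration only.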

Posted by: Chris Brew | April 27, 2010

Why statisticians shouldn’t write movie titles

Never Give a Sucker an Asymptotically Even Flip
The Variational Enigma
The Fisher King
Independence Day
Return to Monte Casino
The Man Who Measured the Bank at Monte Carlo
Between 99 and 103 Dalmatians
8.5 +/- 0.2
The Metropolis Method (void where prohibited by law)
Improper Priors go wild on Cancun

Posted by: Chris Brew | April 6, 2010

Plain speaker’s guide to “any more” and “anymore”

First, here’s a rephrasing of what Huddleston and Pullum’s epic Cambridge Grammar of the English Language says about “any more” and similar adverbs. The main discussion is on p. 710 and following, with other bits on pp. 823 and 831.

  1. They are polarity sensitive: this means that there is a difference in acceptability between “She isn’t here any more” and “She is here any more”. For many speakers, the first is OK, the second not.
  2. The difference between “any more” and “anymore” is a British/American spelling difference.
  3. You can line up “anymore” with “still” and “no longer”. They differ in how they work with negation.

My own impressions follow. Most speakers can say:

“She is still here” (i.e. she is here and has been for a while),
“She is still not here” (i.e. we are waiting, and she still hasn’t arrived),
“She is not here anymore”, “She is no longer here” (in both cases, she was here, but now isn’t).

Many speakers find “She is not still here” and “She is here anymore” awkward. For the first one the intended meaning is the same as the one expressed by “She is no longer here”. Some speakers, including me, blow a fuse when confronted with the second one, and don’t even understand what it means. For others, “anymore” can be used anywhere that “nowadays” is, with much the same meaning, so “She is here anymore” could be used (if you are, say, in a bar) when the person in question used to avoid the bar but now hangs out there on a regular basis. Similarly “Ice cream is cheap anymore” works for many people, but in my natural dialects, I would have to either turn it round and say “Ice cream isn’t expensive anymore” or punt and say “Ice cream is cheap nowadays”.

Unfortunately, linguists have taken to confusing themselves and others by talking about “positive anymore”. If they had called it “nowadays anymore” there would have been no trouble. These adverbs are neither positive nor negative, just a little fussy about what kind of sentences they like to be wrapped up in. The “nowadays” translation helped me, and is from John Lawler. As he says:

Apparently, for users of positive “anymore”, “nowadays” doesn’t
cut it anymore. Anymore, they use “anymore” instead. Or perhaps
only in certain speech contexts; the definitive sociolinguistic
study remains to be done.

I guess I can forgive him for using the term “positive”, because he puts it in quotes and gives an amusing example.

By the way, in Columbus, Ohio, ice cream really is cheap and good at Graeter’s and Jeni’s. No ice creams were consumed in the creation of this post, but several area shops are on high alert.

Posted by: Chris Brew | April 6, 2010

Facebook’s de facto terms of use

If you are thinking of collecting and distributing data from social media sites, you should read Pete Warden’s account of how Facebook responded to his activities. Facebook appears to be keen to exert more control than one would think they are entitled to, and certainly more than is convenient for academics. Nobody knows how this would play out in court… Twitter is looking better than ever as a data source.

Posted by: Chris Brew | April 1, 2010

Genuinely funny April Fool article

This one actually made me laugh:

A would-be saboteur arrested today at the Large Hadron Collider in Switzerland made the bizarre claim that he was from the future. Eloi Cole, a strangely dressed young man, said that he had travelled back in time to prevent the LHC from destroying the world.

The LHC successfully collided particles at record force earlier this week, a milestone Mr Cole was attempting to disrupt by stopping supplies of Mountain Dew to the experiment’s vending machines.

Posted by: Chris Brew | April 1, 2010

Environmentally concerned spouse

“It felt so good to throw away that Martha Stewart dishwasher liquid”

Context: in our machine, Martha Stewart’s green dishwasher liquid may be green, but isn’t effective for washing.

Posted by: Chris Brew | March 22, 2010


The Society for the Promotion of Long Prepositions, Adverbs and Conjunctions wishes, henceforward, to exist, notwithstanding its lack of positive ontological status heretofore. Moreover, it regrets and plans to remedy its previous delinquencies in this area, but nevertheless accepts that its existence may not continue for long. Contrariwise, it sees itself as a lexical mayfly skittering over the surface of the language, and is OK with that. Anyone know where the nectar is?
