2006-07-07

Australia vote data

Consider the following data format, then shoot yourself:

The data come in blocks of rows 1…33 × columns 1…12.

Rows are candidates i; columns are vote counts v.

Some number g of row groups is required for each group of counts c: for each count group 1…c, the rows repeat g times.
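Reassembling those blocks into one row per candidate might look roughly like this. It is a minimal Python sketch, not my actual parser: it assumes the blocks have already been read into lists of (candidate, counts) rows, that they appear in file order (all g row groups for the first 12 counts, then all g row groups for the next 12, and so on), and that g is already known. `stitch_blocks` and `Block` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory shape: a block is a list of (candidate, counts) rows,
# holding at most `cols` counts per row (12 in the files described here).
Block = List[Tuple[str, List[int]]]


def stitch_blocks(blocks: List[Block], g: int, cols: int = 12) -> Dict[str, List[int]]:
    """Join column blocks into one full row of counts per candidate.

    Assumes file order: row groups 1..g for counts 1..cols, then row groups
    1..g for counts cols+1..2*cols, and so on.
    """
    if g <= 0 or len(blocks) % g != 0:
        raise ValueError("number of blocks must be a positive multiple of g")
    table: Dict[str, List[int]] = {}
    for k, block in enumerate(blocks):
        count_group = k // g  # 0 for counts 1..cols, 1 for the next cols, ...
        for name, counts in block:
            row = table.setdefault(name, [])
            # Every earlier count group should already have filled `cols` counts.
            if len(row) != count_group * cols:
                raise ValueError(f"{name}: unexpected row in count group {count_group + 1}")
            row.extend(counts)
    return table
```

Keying rows by candidate name rather than by position means a duplicated row, or one that only turns up in a later count group, raises an error instead of silently shifting the counts.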

Some rows contain summary information, and the number of linefeeds between groups b and g varies. Actual candidates can be identified by their last names in capitals. The length of c can be determined from an item giving the total number of counts required. No indication of g is given except the recurrence of a same-named candidate.
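Those two cues (capitalised surnames and the recurrence of a name) could be turned into a rough row classifier along these lines. This is a minimal Python sketch under an assumed label shape: a surname in capitals followed by a mixed-case given name, e.g. `SMITH, Jane`. The regular expression, `is_candidate` and `candidate_period` are hypothetical, and surnames such as `McDONALD` would need a looser pattern.

```python
import re
from typing import List, Optional

# Assumed label shape: capitalised surname (may contain spaces, hyphens or
# apostrophes), optional comma, then a given name in mixed case.
# Summary rows such as "Exhausted" or a bare "TOTAL" do not match.
CANDIDATE_RE = re.compile(r"^[A-Z][A-Z'\- ]*[A-Z],?\s+[A-Z][a-z]")


def is_candidate(label: str) -> bool:
    """True if the row label looks like a candidate name."""
    return CANDIDATE_RE.match(label.strip()) is not None


def candidate_period(labels: List[str]) -> Optional[int]:
    """Number of candidate rows seen before the first candidate's name recurs."""
    names = [s.strip() for s in labels if is_candidate(s)]
    if not names:
        return None
    first = names[0]
    for k in range(1, len(names)):
        if names[k] == first:
            return k
    return len(names)  # no recurrence: every count fits in one column block
```

Because it works on the filtered stream of candidate rows, the period it finds does not depend on how many linefeeds separate the groups.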

The naming scheme of the files containing these is the same for 1993, 1996 and 1998, changes in 2001, and changes again in 2004. The file formats appear to be the same, but once my parser works on a few, I'm sure it will choke a few dozen times. Here is an example file from 1993.

The goal is to parse this into an i × (v + x) matrix, where x is the stuff we’re actually interested in: who won, in what order, on what count, and for what party; party magnitude; thresholds and quotas; and eventually the nature of women’s representation in Single Transferable Vote (STV) systems. Australia is the largest of these systems, but Ireland, Malta, and Fiji also use STV in at least one house. The worst file is the 2004 New South Wales one.
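Once the counts sit in a matrix, some of the x columns can be derived from them. This is a minimal Python sketch that assumes each row holds a candidate's progressive total after each count (the files may instead give per-count transfers) and uses the Droop quota, which is the quota used in Australian Senate counts; the function names are hypothetical. It only flags candidates who reach the quota: in STV a candidate can also win the last seat without reaching it, which this sketch does not detect.

```python
from typing import Dict, List, Optional


def droop_quota(formal_votes: int, seats: int) -> int:
    """Droop quota: floor(votes / (seats + 1)) + 1."""
    return formal_votes // (seats + 1) + 1


def count_reaching_quota(progressive: Dict[str, List[int]], quota: int) -> Dict[str, Optional[int]]:
    """1-based count at which each candidate's progressive total first meets the quota."""
    return {
        name: next((k + 1 for k, total in enumerate(totals) if total >= quota), None)
        for name, totals in progressive.items()
    }


def election_order(progressive: Dict[str, List[int]], quota: int) -> List[str]:
    """Candidates who reached quota, earliest count first; ties go to the larger total on that count."""
    reached = count_reaching_quota(progressive, quota)
    elected = [name for name, count in reached.items() if count is not None]
    return sorted(elected, key=lambda n: (reached[n], -progressive[n][reached[n] - 1]))
```

For example, with 1,000 formal votes and 3 seats the quota is 1000 // 4 + 1 = 251.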
