2006-07-14

dealing with dumb formats

10:39:35 AM Ryan Black: i have 1900 files
10:39:43 AM Ryan Black: i want a frequency count of "QUESTION" in each of them
10:39:46 AM Ryan Black: saved into something nice
10:39:55 AM Ryan Black: 1900 Word files, that is.

Word doc files are awful. It could probably be VBScripted, but having Word actually open and handle 1900 files doesn't sound like the best idea. Any sane person's inclination would be to have a shell script do it. In addition, he wants some trailing text (a job for grep). The key is catdoc, which is exactly what it sounds like: it cats a Word doc! Its brother xls2csv is also a thoroughly good idea.
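Something along these lines would probably do it. This is only a rough sketch, not the script that was actually run: it assumes catdoc is installed, that the files end in .doc and sit in one directory, and that "something nice" means a CSV. The function name count_in_docs and the DOC_TO_TEXT override are made up for illustration.

```shell
#!/bin/sh
# Sketch: count occurrences of a word in every .doc file in a directory
# and print "file,count" rows as CSV.

# Assumption: catdoc is on the PATH. DOC_TO_TEXT is a hypothetical
# override so a different converter can be swapped in.
DOC_TO_TEXT="${DOC_TO_TEXT:-catdoc}"

# count_in_docs DIR WORD
count_in_docs() {
    dir=$1
    word=$2
    echo "file,count"
    for f in "$dir"/*.doc; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # grep -o prints every match on its own line, so wc -l counts
        # occurrences rather than matching lines. -F treats the word
        # as a fixed string, not a regex.
        n=$("$DOC_TO_TEXT" "$f" | grep -o -F -- "$word" | wc -l | tr -d ' ')
        printf '%s,%s\n' "$f" "$n"
    done
}
```

Saved as, say, count.sh, it could be used as `. ./count.sh; count_in_docs /path/to/docs QUESTION > counts.csv`. File names containing commas would need quoting before this is a proper CSV. If the trailing text means the lines after each match, grep's `-A N` option prints N lines of context following a match.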
