shell - Print out the distribution of words in multiple files

February 15, 2012

I'm trying to create an executable that takes in a number of text files and gives as output the distribution of words by number of occurrences. It's to be done in bash scripting, and what I have so far is:

    #!/bin/bash
    y=$(cat $* | wc -w)
    cat $* | tr ' ' '//' |
    tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' |
    grep -v '[^a-z]' | sort | uniq -c | sort -rn | head -$y

I get an error trying to set y, and I can't figure out how else to get head to print out every word.

Is there a better way to print it out?

Why run head at all? There's no guarantee that there are as many distinct words as there are words in the files; indeed, it's practically guaranteed that there won't be (since there'll be repeated words). And if you want to see all the data, show all the data; don't filter the output of sort -nr.
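To see why head -$y does not trim anything in practice, here is a minimal sketch comparing the two counts on a made-up sentence (the sample text and the variable names total and distinct are illustrative only). wc -w counts every word, while the number of lines uniq -c can emit is the number of distinct words, which is usually much smaller.

```shell
#!/bin/bash
# Made-up sample text: 11 words, but only 5 distinct ones.
text='the cat saw the dog and the dog saw the cat'

# Total number of words, which is what the question stores in y.
total=$(printf '%s\n' "$text" | wc -w)

# Number of distinct words, i.e. the most lines uniq -c could emit.
distinct=$(printf '%s\n' "$text" | tr -cs '[:alpha:]' '\n' | sort -u | wc -l)

echo "total words:    $total"
echo "distinct words: $distinct"
```

Since the count given to head is at least as large as the number of lines reaching it, head simply passes everything through.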

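The first tr needs only one slash, I think. Normally, you'd map blanks and punctuation to newlines (with -s, so that tr squeezes adjacent newlines into one). The slashes produced by the first tr count as punctuation in the third tr, which deletes them, so words that were separated by blanks end up glued together, and since no newlines are ever added, sort | uniq -c counts whole lines rather than words. It isn't obvious what you intended there. I think I'd expect to see something like: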
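A quick way to see the effect of the question's first and third tr steps is to feed them one sample line; the sample line and the variable names glued and split are made up for illustration. The sketch then runs the tr -cs alternative on the same line, which puts one word on each line.

```shell
#!/bin/bash
# Question's steps: blanks become '/', then all punctuation (including
# those slashes and the comma) is deleted, gluing the words together.
glued=$(echo 'Hello, big World' | tr ' ' '//' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')
echo "$glued"   # prints: hellobigworld

# Alternative: each run of non-letters becomes a single newline,
# giving one word per line, ready for sort | uniq -c.
split=$(echo 'Hello, big World' | tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]')
echo "$split"   # prints hello, big and world on separate lines
```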

    cat "$@" | tr -cs '[:alpha:]' '\n' |      # Convert non-alpha characters to newline
    tr '[:upper:]' '[:lower:]' |              # Case-convert to lower case
    sort | uniq -c | sort -nr
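One way to package the suggested pipeline for reuse is as a shell function. This is a sketch: the name wordfreq is hypothetical, and the grep -v '^$' step is an extra not in the answer, dropping the single empty line that tr -cs produces when the input begins with a non-letter (as a C file starting with #include does).

```shell
#!/bin/bash
# Hypothetical helper: print each word in the named files (or standard
# input) with its number of occurrences, most frequent first.
# A "word" is a run of letters, as in the pipeline above.
wordfreq() {
    cat "$@" |
    tr -cs '[:alpha:]' '\n' |      # each run of non-letters becomes one newline
    tr '[:upper:]' '[:lower:]' |   # fold everything to lower case
    grep -v '^$' |                 # extra step: drop a leading empty line
    sort | uniq -c | sort -nr
}
```

After sourcing the file, something like wordfreq *.c prints the counts; with no arguments it reads standard input, because "$@" then expands to nothing and cat falls back to stdin.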

Note the use of "$@" rather than $*. There's no difference when the file names you specify don't contain blanks (or newlines, tabs, etc.); when they do, the "$@" form is right and $* is not, so you may as well always use "$@". It is right far more often than $* is.
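The difference between $* and "$@" is easy to demonstrate with a file name containing a blank. This minimal sketch uses hypothetical helper functions (count_args, via_star, via_at) that only count the arguments they receive; with cat $*, the same word splitting would make cat look for files called my and notes.txt.

```shell
#!/bin/bash
# Report how many arguments a command actually receives.
count_args() { echo "$#"; }

# Pass the caller's arguments on in two ways: unquoted $* (subject to
# word splitting) and quoted "$@" (each argument kept intact).
via_star() { count_args $*; }
via_at()   { count_args "$@"; }

# A file name containing a blank, plus an ordinary one.
via_star 'my notes.txt' todo.txt   # prints 3: "my notes.txt" is split in two
via_at   'my notes.txt' todo.txt   # prints 2
```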

For some C source code I had lying around, the output of the script was:

     246 n
     217
     153 int
     141 list
     124 if
     118 t
     103 char
      99
      97 size
      90 buffer
      89 context
      82 d
      81 void
      79 include
      79 h
      78 s
      65
      62 j
      55 ptr
      54 r
      54 const
      53 static
      53 sem
      51 pthread
      49 z
      49 oldneedle
      49 err
      47
      47 return
      46 mutex
      44 printf
      43 error
      43 c

Note that the word 'h' appears as often as the word 'include'; there's a reason for that! The word 't' appears a lot, but that's because, for example, size_t is treated as two words by the filtering. Preserving underscores is possible: change the first tr to use '[:alpha:]_' (note the underscore). The filtering also eliminates digits, but you can keep them if you want.
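Here is a sketch of the underscore-preserving variant, assuming you also want to keep digits so that names like int32_t survive intact: the set becomes '[:alnum:]_' rather than '[:alpha:]_'. The helper name words is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split standard input into lower-case words, where a
# word is a run of letters, digits and underscores ([:alnum:]_ is an
# assumption here; [:alpha:]_ alone would still drop digits).
words() {
    tr -cs '[:alnum:]_' '\n' | tr '[:upper:]' '[:lower:]'
}

# size_t and int32_t each come out as a single word.
printf 'size_t n = sizeof(int32_t);\n' | words
```

With '[:alpha:]_' instead, int32_t would come out as the two words int and _t, because the digits are treated as separators.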
