Counting & Matching Text in Files

June 12, 2010

Since we are on the subject of finding leaking data, I thought covering something related would be fun.

In this issue of command line gyan, we’ll see how to find some text in files. Say we are looking for a pattern of text across multiple files. The same technique can also be used in a DLP (data loss prevention) setup to find leaking data.

Windows

Let's start with a simple attempt and see how good it is:

C:\windows> find /c "disk" *.log


The /c switch of the find command searches all the .log files in the current directory (C:\windows) for the string “disk” and prints, for each file, the number of lines that contain it.

To improve on this, we’ll first redirect all the errors to nul

C:\windows> find /c "disk" *.log 2>nul

OK, we haven’t achieved much yet. We are still getting a lot of lines with “0” occurrences. In a big directory this might not be very useful. So now we’ll remove all the lines showing 0 occurrences.

C:\windows> find /c "disk" *.log 2>nul | find /v ": 0"

This removes all the lines that show “: 0”. The /v switch negates the search string, so only the lines that do not contain “: 0” are shown.

So far so good, but there are two ugly things in the result:

The leading “----------” put by the find /c command, and unnecessary blank lines courtesy of our second find /v of “: 0”.

To solve this we’ll have to rely on our good old FOR loop.

C:\WINDOWS> for /f "delims=-" %i in ('"find /c "disk" *.log 2>nul | find /v ": 0" "') do @echo %i

The command makes it fairly clear that we have removed the unwanted “-” delimiters and the CRLFs. You might wonder how the CRLFs (blank lines) went away: FOR /F skips blank lines while parsing, so they never reach echo. Note that with "delims=-" only the text up to the next hyphen is kept, so a file name that itself contains “-” will be cut short.

So what do you say now? If you are happy with the output, you can use it as is in any report.

Linux

The same can be done in Linux in the following way.

If we have to search a single file, the command below will do the trick.

$ grep robot /var/log/httpd/access/access.log | wc -l
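As a variation on the pipeline above, grep can do the counting itself with -c. The sketch below wraps both flavours in two small helper functions; the names count_matching_lines and count_occurrences are made up for illustration and are not part of the original tip. It also shows the difference between counting matching lines and counting every occurrence, assuming a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper names, used only for illustration.

# Number of LINES in file $2 that contain pattern $1.
# grep -c reports the same number as `grep PATTERN FILE | wc -l`.
count_matching_lines() {
    grep -c -- "$1" "$2"
}

# Number of OCCURRENCES of pattern $1 in file $2.
# grep -o prints every match on its own line, so wc -l counts matches.
# tr strips the padding that some wc implementations add.
count_occurrences() {
    grep -o -- "$1" "$2" | wc -l | tr -d ' '
}
```

For example, count_matching_lines robot /var/log/httpd/access/access.log gives the same figure as the grep | wc -l command, while count_occurrences also counts a second "robot" on the same line.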

But our objective is to scan a complete directory and search for the string.

So let’s create a loop that does the main part of the work, and then pipe its output into awk for better reporting.

$ for f in *; do echo -n "$f "; grep disk "$f" | wc -l; done | awk '{t = t + $2; print $2 "\t" $1} END {print t "\tTOTAL"}'

Yeah, that becomes a big command to remember, but the idea is to make it a habit 😉

The for loop runs the same commands on every file in the directory, and awk adds up the counts and prints the report. Since awk splits each line on whitespace, this assumes file names without spaces.
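If you would rather mirror the Windows approach, one possible variant is to let grep -c produce one "file:count" line per file and then filter out the zeros. This is only a sketch: count_in_dir is a hypothetical helper name, not something from the original tip, and it assumes a grep with -c and a POSIX awk.

```shell
# count_in_dir PATTERN [DIR]
# Hypothetical helper: prints "count<TAB>file" for every file in DIR
# (default: current directory) with at least one matching line,
# followed by a TOTAL line.
count_in_dir() {
    # The extra /dev/null argument makes grep print the file name
    # even when DIR holds a single file; errors (for example from
    # subdirectories) are discarded.
    grep -c -- "$1" "${2:-.}"/* /dev/null 2>/dev/null |
        # Drop the files with zero matching lines.
        grep -v ':0$' |
        # The count is the last colon-separated field; strip it from
        # the line to get the file name back, then sum the counts.
        awk -F: '{ n = $NF; sub(/:[0-9]+$/, ""); t += n
                   printf "%d\t%s\n", n, $0 }
                 END { printf "%d\tTOTAL\n", t }'
}
```

Running count_in_dir disk /var/log would report only the files that mention "disk", much like the find /v ": 0" step on Windows.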

bio data - Rohit Srivastwa
