Counting & Matching Text in Files

June 12, 2010

Since we've been looking at finding leaking data, I thought writing something related would be fun.

In this issue of command line gyan, we'll see how to find some text in files. Say we are looking for a pattern of text across multiple files. The same trick can be used in a DLP (data leak prevention) exercise to find the leaking data.


Here we go with a simple first attempt; let's see how good it is.

C:\windows> find /c "disk" *.log


The /c switch of the find command searches all the .log files in the current directory (C:\windows) for the string "disk" and, for each file, prints the number of lines containing that string.
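For comparison, the closest Unix analogue of `find /c` is `grep -c`, which also prints a per-file count of matching lines (not total occurrences). A minimal sketch; the /tmp path and demo log files are made up:

```shell
# Unix analogue of `find /c "disk" *.log`: grep -c prints
# "file:count" for each file, counting LINES that match
# (a line with two hits still counts once).
mkdir -p /tmp/findc_demo && cd /tmp/findc_demo
printf 'disk error\nok\ndisk full\n' > app.log   # 2 matching lines
printf 'all good\n' > boot.log                   # 0 matching lines
grep -c disk *.log
```

With several files, grep prefixes each count with the file name, which mirrors the per-file report that find /c produces.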

To improve it further, we'll first redirect all the errors to nul (the Windows null device):

C:\windows> find /c "disk" *.log 2>nul

Ok, we haven't achieved much yet. We are still getting a lot of lines with "0" occurrences, which in a big directory isn't very useful. So now we'll remove all the lines showing 0 occurrences.

C:\windows> find /c "disk" *.log 2>nul | find /v ": 0"

This removes all the lines containing ": 0".

The /v switch here negates the search string ": 0" and shows only those lines which do not contain it.
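The same zero-filtering works with Unix tools: since `grep -c` emits `file:count`, a `grep -v ':0$'` drops the zero-count files, playing the role of `find /v ": 0"`. A sketch with made-up demo files:

```shell
# Drop files with zero matches -- the grep equivalent of
# piping through find /v ": 0".
mkdir -p /tmp/zerofilter_demo && cd /tmp/zerofilter_demo
printf 'disk error\n' > a.log
printf 'nothing here\n' > b.log
grep -c disk *.log | grep -v ':0$'   # only a.log:1 survives
```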

So far so good, but there are two ugly things in the result:

the leading "----------" put there by the find /c command, and the unnecessary blank lines, courtesy of our second find /v on ": 0".

To solve this we’ll have to rely on our good old FOR loop.

C:\WINDOWS>for /f "delims=-" %i in ('"find /c "disk" *.log 2>nul | find /v ": 0" "') do @echo %i

I think it's pretty clear from the command that we have removed all the unwanted delimiters like "-" and the CRLFs. You might wonder how we removed the CRLFs (the blank lines): basically, for /f skips lines that leave no tokens after parsing, so the blanks never reach echo.
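This clean-up step has a tidy Unix counterpart too: awk can split `grep -c` output on the colon and re-print only the non-zero entries in a report-friendly "count, tab, file" form. A sketch; the demo directory and files are hypothetical:

```shell
# Reformat grep -c output into "count<TAB>file", skipping zeros --
# the same clean-up the for /f loop does on Windows.
mkdir -p /tmp/cleanup_demo && cd /tmp/cleanup_demo
printf 'disk a\ndisk b\n' > sys.log
printf 'nope\n' > misc.log
grep -c disk *.log | awk -F: '$2 > 0 {print $2 "\t" $1}'
```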

So, what do you say now? Happy with the output? You can drop it into any report as-is.


The same can be done on Linux in the following way.

If we have to search in a single file, the command below will do the trick.

$ grep robot /var/log/httpd/access/access.log | wc -l
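As an aside, grep can count by itself with -c, so the wc -l pipe can be dropped; and if we want every occurrence rather than every matching line, `grep -o | wc -l` does that. A sketch with a made-up access.log standing in for the real one:

```shell
# grep -c counts matching lines; grep -o | wc -l counts
# every occurrence, even several on one line.
mkdir -p /tmp/grepc_demo && cd /tmp/grepc_demo
printf 'robot robot\nhuman\nrobot\n' > access.log
grep -c robot access.log            # 2 matching lines
grep -o robot access.log | wc -l    # 3 occurrences
```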

But our objective is to scan a complete directory and search for the string.

So let's create a loop that does the main part of the work, and then pipe the output into awk for better reporting.

$ for f in *; do echo -n "$f "; grep disk "$f" | wc -l; done | awk '{t = t + $2; print $2 "\t" $1} END {print t "\tTOTAL"}'
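The one-liner is easier to digest spread over several lines; the semantics are identical. A sketch with hypothetical demo files (`grep -c` stands in for `grep | wc -l`):

```shell
# Same loop-plus-awk pipeline, unrolled for readability.
# awk sums field 2 (each file's count) and appends a TOTAL row.
mkdir -p /tmp/loop_demo && cd /tmp/loop_demo
printf 'disk x\ndisk y\n' > one.log
printf 'disk z\n' > two.log
for f in *; do
    printf '%s ' "$f"
    grep -c disk "$f"
done | awk '{t += $2; print $2 "\t" $1} END {print t "\tTOTAL"}'
```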

Yeah, that's a big command to remember, but the idea is to make it a habit 😉

The for loop here runs the same command string over every file, and awk captures the results and prints them.

bio data - Rohit Srivastwa
