GREP - finding all occurrences of a string

I am tasked with white-labeling an application so that it contains no references to our company, website, etc. The problem I am running into is that I have many different patterns to look for and would like to guarantee that all of them are removed. Since the application was not developed entirely in-house, we cannot simply look for occurrences in messages.properties and be done. We must go through JSPs, Java code, and XML.

I am using grep to filter results like this:

grep -ir SOME_PATTERN . | grep -v import | grep -v '//' | grep -v '/\*' ...

The patterns are escaped when I'm using them on the command line; however, I don't feel this pattern matching is very robust. A real occurrence could sit on a line that also contains import (unlikely) or /* (the beginning of a javadoc comment), and it would be filtered out.

All of the text output to the screen must come from a string declaration somewhere or a constants file. So, I can assume I will find something like:

public static final String SOME_CONSTANT = "SOME_PATTERN is currently unavailable";

I would like to find that occurrence as well as:

public static final String SOME_CONSTANT = "
SOME_PATTERN blah blah blah";

Alternatively, if we had an internal crawler / automated tests, I could simply pull back the xhtml from each page and check the source to ensure it was clean.

Best answer

I would use sed, not grep! Sed performs basic text transformations on an input stream. Try its s/regexp/replacement/ command.

You can also try awk. Its -F option sets the field separator; pass it ; to split each line of your files at semicolons.

The best solution, however, would be a simple script in Perl or Python.
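
As a rough illustration of the script idea, here is a minimal Python sketch. It assumes the files are readable as UTF-8 text, and the names EXTENSIONS, build_regex, find_in_text and scan_tree are invented for this example. Each file is read whole, so a phrase that is split across a line break inside a string declaration is still found, and nothing is filtered out, so comments and imports are reported too.

```python
import os
import re

# Hypothetical set of extensions to scan; adjust to the project.
EXTENSIONS = (".java", ".jsp", ".xml", ".properties")


def build_regex(phrase):
    """Compile a case-insensitive regex in which every run of whitespace in
    `phrase` matches any run of whitespace, including line breaks."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)


def find_in_text(text, regex):
    """Return (line_number, matched_text) for each match in `text`."""
    hits = []
    for match in regex.finditer(text):
        # Line number of the first character of the match (1-based).
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(0)))
    return hits


def scan_tree(root, phrases):
    """Yield (path, line_number, matched_text) for every hit under `root`."""
    regexes = [build_regex(phrase) for phrase in phrases]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # Undecodable bytes are replaced rather than aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()
            for regex in regexes:
                for line_number, matched in find_in_text(text, regex):
                    yield path, line_number, matched
```

Looping over scan_tree(".", ["Acme Corp", "acme.example.com"]) (both placeholder patterns) and printing each tuple gives a list to review by hand. Reporting everything and reviewing afterwards trades extra noise for not losing real matches to a grep -v filter.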

Answers

To address your concern about missing some occurrences, why not filter progressively:

  1. Create a text file with all possible matches as a starting point.
  2. Use filter X (grep for ^import, for example) to dump probable false positives into a tmp file.
  3. Use filter X again to remove those matches from your working file (a copy of [1]).
  4. Do a quick visual pass of the tmp file and add any real matches back in.
  5. Repeat [2]-[4] with other filters.

This might take some time, of course, but it doesn't sound like this is something you want to get wrong...
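
The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the candidate lines are taken to be in grep -rn output form (path:line:content), and the function names and sample filters are invented for the example.

```python
import re


def split_by_filter(lines, filter_regex):
    """Split `lines` into (kept, set_aside) using one filter.

    Lines matching `filter_regex` are probable false positives; they are set
    aside for a visual pass rather than discarded."""
    compiled = re.compile(filter_regex)
    kept, set_aside = [], []
    for line in lines:
        (set_aside if compiled.search(line) else kept).append(line)
    return kept, set_aside


def filter_progressively(candidates, filters):
    """Apply each filter in turn. Return the working list plus, for each
    filter, the lines it set aside for manual review."""
    working = list(candidates)
    review = {}
    for filter_regex in filters:
        working, review[filter_regex] = split_by_filter(working, filter_regex)
    return working, review


# Hypothetical candidates, shaped like `grep -rn` output.
candidates = [
    "./A.java:1:import com.acme.util.Foo;",
    './B.java:9:String s = "Acme is down"; // import note',
    "./C.java:3:// Acme internal",
]
# Anchor each filter to the start of the content, after the path:line: prefix.
filters = [r"^[^:]+:\d+:\s*import\b", r"^[^:]+:\d+:\s*//"]

working, review = filter_progressively(candidates, filters)
```

Here working keeps only the B.java line, while the import line and the comment line land in review under their respective filters. Anything in review that turns out to be a real match after the visual pass can be appended back to working.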




