KDiff3 supports two preprocessor options.
- Preprocessor command:
When any file is read, it will be piped through this external command. The output of this command will be visible instead of the original file. You can write your own preprocessor that fulfills your specific needs. Use this to cut away disturbing parts of the file, or to automatically correct the indentation etc.
- Line-matching preprocessor command:
When any file is read, it will be piped through this external command. If a preprocessor-command (see above) is also specified, then the output of the preprocessor is the input of the line-matching preprocessor. The output will only be used during the line matching phase of the analysis. You can write your own preprocessor that fulfills your specific needs. Each input line must have a corresponding output line.
The idea is to allow the user greater flexibility while configuring the diff-result. But this requires an external program, and many users don't want to write one themselves. The good news is that very often sed or perl will do the job.
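Because a line-matching preprocessor must produce exactly one output line for each input line, it can be worth checking a candidate command before using it. The following is a minimal sketch for a POSIX shell; the helper name `same_line_count` and the sample data are made up for illustration and are not part of KDiff3.

```shell
# Hypothetical helper: succeeds if the command given as arguments
# emits exactly as many lines as it reads from a small sample input.
same_line_count() {
    sample=$(printf 'aa\nba\nca\n')
    in_lines=$(printf '%s\n' "$sample" | wc -l)
    out_lines=$(printf '%s\n' "$sample" | "$@" | wc -l)
    # Unquoted on purpose so that padding printed by wc is dropped.
    [ $in_lines -eq $out_lines ]
}

# A substitution keeps every line, so it is a suitable candidate.
same_line_count sed 's/a/o/' && echo "line count preserved"
# A filter that drops lines is not suitable.
same_line_count grep -v 'ba' || echo "line count changed"
```

A command that fails this check would break the one-to-one correspondence between input and output lines.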
Example: Simple testcase: Consider file a.txt (6 lines):
aa
ba
ca
da
ea
fa
And file b.txt (3 lines):
cg
dg
eg
Without a preprocessor the following lines would be placed next to each other:
aa - cg
ba - dg
ca - eg
da
ea
fa
This is probably not wanted, since the first letter carries the interesting information. To help the matching algorithm ignore the second letter, we can use a line-matching preprocessor command that replaces 'g' with 'a':
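The command itself does not appear in this text. A plausible candidate, assuming sed is available, is the substitute command sketched below. The `printf` calls merely stand in for a.txt and b.txt so that the effect can be checked in a console.

```shell
# Assumed line-matching preprocessor command: sed 's/g/a/'
# b.txt as the matching algorithm would see it (the display keeps cg, dg, eg):
printf 'cg\ndg\neg\n' | sed 's/g/a/'
# a.txt contains no 'g', so it passes through unchanged:
printf 'aa\nba\nca\nda\nea\nfa\n' | sed 's/g/a/'
```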
With this command the result of the comparison would be:
aa
ba
ca - cg
da - dg
ea - eg
fa
Internally the matching algorithm sees the files after running the line matching preprocessor, but on the screen the file is unchanged. (The normal preprocessor would change the data also on the screen.)
This section only introduces some very basic features of sed. For more
information see info:/sed or
A precompiled version for Windows® can be found at
Note that the following examples assume that the sed command is in some
folder in the
PATH environment variable. If this is not the case, you have to specify the full absolute
path for the command.
In this context only the sed substitute command is used:
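The general form is not shown in this text. As a reminder, the sed substitute command has the shape `s/REGEXP/REPLACEMENT/FLAGS`. A small sketch with made-up input:

```shell
# s/REGEXP/REPLACEMENT/FLAGS: replace what REGEXP matches with REPLACEMENT.
# Without flags, only the first match on each line is replaced.
echo 'one two two' | sed 's/two/2/'
# prints: one 2 two
```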
Before you use a new command within KDiff3, you should first test it in a console. Here the echo command is useful. Example:
echo abrakadabra | sed 's/a/o/' -> obrakadabra
This example shows a very simple sed-command that replaces the first occurrence of "a" with "o". If you want to replace all occurrences then you need the "g" flag:
echo abrakadabra | sed 's/a/o/g' -> obrokodobro
The "|"-symbol is the pipe-command that transfers the output of the previous command to the input of the following command. If you want to test with a longer file then you can use cat on UNIX® like systems or type on Windows® like systems. sed will do the substitution for each line.
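To try a command on a whole file, something like the following could be used on a UNIX-like system. The file name `sample.txt` and its contents are assumptions made for this sketch.

```shell
# Create a throwaway sample file (hypothetical name), then pipe it through sed.
tmpdir=$(mktemp -d)
printf 'abrakadabra\nbanana\n' > "$tmpdir/sample.txt"
cat "$tmpdir/sample.txt" | sed 's/a/o/g'
# prints:
# obrokodobro
# bonono
rm -r "$tmpdir"
```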
Currently KDiff3 understands only C/C++ comments. Using the Line-matching preprocessor command: option you can also ignore other types of comments, by converting them into C/C++-comments.
Example: To ignore comments starting with "#", you would like to convert them to "//". Note that you must also enable the Ignore C/C++ comments (treat as white space) option to get an effect. An appropriate Line-matching preprocessor command: would be:
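The command is missing from this text. One command that performs the described conversion is sketched below with a made-up input line. Note that it replaces the first "#" on each line wherever it occurs, not only at the start of a comment.

```shell
# Assumed line-matching preprocessor command: sed 's/#/\/\//'
echo 'x = 1  # set x' | sed 's/#/\/\//'
# prints: x = 1  // set x
```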
Since for sed the "/" character has a special meaning, it is necessary to place the "\" character before each "/" in the replacement-string. Sometimes the "\" is required to add or remove a special meaning of certain characters. The single quotation marks (') are only important when testing on the command shell as it will otherwise attempt to process some characters.
KDiff3 does not do this except for the escape sequences '\"' and '
Use the following Line-matching preprocessor command: to convert all input to uppercase:
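The command is not shown at this point in the text. The sketch below uses the uppercase substitution that appears in the later examples of this section. It assumes GNU sed, since "\U" is a GNU extension.

```shell
# Uppercase the whole line; \U is a GNU sed extension.
echo abrakadabra | sed 's/\(.*\)/\U\1/'
# prints: ABRAKADABRA
```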
Here the ".*" is a regular expression that matches any string and in this context matches all characters in the line. The "\1" in the replacement string refers to the matched text within the first pair of "\(" and "\)". The "\U" converts the inserted text to uppercase.
CVS and other version control systems use several keywords to insert automatically generated strings (info:/cvs/Keyword substitution). All of them follow the pattern "$KEYWORD generated text$". We now need a line-matching preprocessor command that removes only the generated text:
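The command is missing from this text. A possible form is sketched below. The keyword list (Revision, Author, Log, Header, Date) and the sample input are assumptions made for illustration, and "\|" inside a basic regular expression requires GNU sed.

```shell
# Assumed command: keep "$KEYWORD$" and drop the generated text in between.
echo '$Revision: 1.5 $ some text' \
  | sed 's/\$\(Revision\|Author\|Log\|Header\|Date\).*\$/\$\1\$/'
# prints: $Revision$ some text
```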
The "\|" separates the possible keywords. You might want to modify this list according to your needs. The "\" before the "$" is necessary because otherwise the "$" matches the end of the line.
While experimenting with sed you might come to understand and even like these regular expressions. They are useful because there are many other programs that also support similar things.
Ignoring numbers actually is a built-in option. But as another example, this is how it would look as a line-matching preprocessor command.
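The command is missing from this text. A sketch of one possibility follows. The exact character set (digits plus "." and "-") is an assumption and can be adapted.

```shell
# Assumed command: delete every digit, dot and minus sign.
echo 'x12y-3.5z' | sed 's/[0123456789.-]//g'
# prints: xyz
```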
Any character within '[' and ']' is a match and will be replaced with nothing.
Sometimes a text is very strictly formatted, and contains columns that you always want to ignore, while there are other columns you want to preserve for analysis. In the following example the first five columns (characters) are ignored, the next ten columns are preserved, then again five columns are ignored and the rest of the line is preserved.
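The command is missing from this text. The sketch below follows the description (skip five characters, keep ten, skip five, keep the rest). The sample input is made up.

```shell
# Assumed command: 5 dots skipped, 10 dots kept as \1, 5 dots skipped, rest kept as \2.
echo '11111AAAAAAAAAA22222rest' | sed 's/.....\(..........\).....\(.*\)/\1\2/'
# prints: AAAAAAAAAArest
```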
Each dot '.' matches any single character. The "\1" and "\2" in the replacement string refer to the matched text within the first and second pair of "\(" and "\)" denoting the text to be preserved.
Sometimes you want to apply several substitutions at once. You can then use the semicolon ';' to separate these from each other. Example:
echo abrakadabra | sed 's/a/o/g;s/\(.*\)/\U\1/' -> OBROKODOBRO
Instead of sed you might want to use something else like perl.
perl -p -e 's/
But some details are different in perl. Note that where sed needed "\(" and "\)", perl requires the simpler "(" and ")" without a preceding "\". Example:
sed 's/\(.*\)/\U\1/'
perl -p -e 's/(.*)/\U\1/'
The data is piped through all internal and external preprocessors in the following order:
Ignore case (treat as white space) (conversion to uppercase),
Detection of C/C++ comments,
Ignore numbers (treat as white space),
Ignore white space
The data after the normal preprocessor will be preserved for display and merging. The other operations only modify the data that the line-matching-diff-algorithm sees.
In the rare cases where you use a normal preprocessor note that the line-matching-preprocessor sees the output of the normal preprocessor as input.
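The chaining of the two external preprocessors can be imitated in a console by piping one command into the next. This is only a sketch: both sed commands are arbitrary examples chosen here, not KDiff3 defaults.

```shell
# Stage 1 stands in for the normal preprocessor (its output is what gets displayed).
# Stage 2 stands in for the line-matching preprocessor (used for matching only).
printf 'cg   \ndg\n' | sed 's/ *$//' | sed 's/g/a/'
# prints:
# ca
# da
```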
The preprocessor-commands are often very useful, but as with any option that modifies your texts or hides away certain differences automatically, you might accidentally overlook certain differences and in the worst case destroy important data.
For this reason, if a normal preprocessor-command is being used during a merge, KDiff3 will tell you so and ask you whether it should be disabled. But it won't warn you if a Line-matching preprocessor command: option is active. The merge will not complete until all conflicts are solved. If you disabled → menu item then the differences that were removed with the Line-matching preprocessor command: option will also be invisible. If the button remains disabled during a merge (because of remaining conflicts), make sure to enable → menu item. If you don't want to merge these less important differences manually you can select → menu item.