Trimming comments in HTML documents using Apache Ant

This short article explains how to trim unnecessary content (comments, empty lines) from HTML documents. It is a follow-up to an article published a couple of weeks ago on this blog: Building Web Applications With Apache Ant. The idea is to use Ant’s optional replaceregexp task, as shown below:

<target name="-trim.html.comments">
    <fileset id="html.fileset"
        dir="${build.dir}"
        includes="**/*.jsp, **/*.php, **/*.html"/>
    <!-- HTML Comments -->
    <replaceregexp replace="" flags="g"
        match="\&lt;![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)\&gt;">
        <fileset refid="html.fileset"/>
    </replaceregexp>
    <!-- Empty and whitespace-only lines -->
    <replaceregexp match="^[ \t]*\r?\n" replace="" flags="mg">
        <fileset refid="html.fileset"/>
    </replaceregexp>
</target>
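
To see what the first replaceregexp pass actually matches, the comment pattern can be tried outside of Ant. The following is a minimal sketch in Python, used here only as a stand-in for the regex engine Ant relies on (typically java.util.regex); the two should agree for this pattern, but that is an assumption. The names HTML_COMMENT and strip_comments, and the sample markup, are made up for illustration.

```python
import re

# Comment pattern from the Ant target, written as a plain regex.
# The backslashes before "<" and ">" are dropped; they do not change the match.
HTML_COMMENT = re.compile(r"<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)>")


def strip_comments(html: str) -> str:
    """Remove every match, mirroring replace="" with flags="g"."""
    return HTML_COMMENT.sub("", html)


# Hypothetical sample page with a single-line and a multi-line comment.
sample = (
    "<p>Hello</p><!-- single-line comment -->\n"
    "<!--\n"
    "  multi-line comment with a - hyphen\n"
    "-->\n"
    "<p>World</p>\n"
)

print(repr(strip_comments(sample)))
# → '<p>Hello</p>\n\n<p>World</p>\n'
```

Both comments disappear, but the line that held the multi-line comment is left behind empty. That kind of residue is what the second pass in the target is meant to clean up.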

Update: Use this code very carefully. Stripping markup with regular expressions is dangerous territory. (Thanks to my co-worker Ryan Grove for pointing out some of the shortcomings.)
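
The update does not say which shortcomings were pointed out, so the cases below are illustrative guesses, not the actual list. This sketch uses Python as a stand-in for Ant's regex engine (an assumption), with made-up names and sample markup, and shows three inputs where the comment pattern does something a page author may not want.

```python
import re

# Comment pattern from the Ant target, written as a plain regex.
HTML_COMMENT = re.compile(r"<![ \r\n\t]*(--([^\-]|[\r\n]|-[^\-])*--[ \r\n\t]*)>")


def strip_comments(html: str) -> str:
    """Remove every match, mirroring replace="" with flags="g"."""
    return HTML_COMMENT.sub("", html)


# 1. An IE conditional comment is an ordinary comment to the regex,
#    so the markup it guards is removed along with it.
conditional = '<!--[if lt IE 9]><script src="legacy.js"></script><![endif]-->'

# 2. A script body wrapped in a comment (an old trick for hiding scripts
#    from browsers that do not understand them) loses its code.
hidden_script = "<script><!--\nalert(1);\n//--></script>"

# 3. A comment containing "--" in its body is not matched at all,
#    so it stays in the output.
double_hyphen = "<p>a</p><!-- a -- b -->"

for label, html in [("conditional", conditional),
                    ("hidden_script", hidden_script),
                    ("double_hyphen", double_hyphen)]:
    print(label, "->", repr(strip_comments(html)))
```

Checking a handful of representative pages this way before running the target on ${build.dir} is one low-cost precaution.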

6 thoughts on “Trimming comments in HTML documents using Apache Ant”

  3. Steve

    How do you prevent the regex from removing JavaScript that is enclosed in HTML comments to hide it from browsers that have JavaScript disabled?
    I fiddled around a lot but never managed to get Ant to ignore comments that end with //-->. It always results in an infinite loop. :-(

  4. Jaime Bueza

    Here’s one for commenting out console.log calls in a combined, YUI-compressed file.

    <replaceregexp file="my_combined_file.js" match="(console\.log\(.*\))" flags="g" replace="\/\/\1"></replaceregexp>
