How to Select Everything Between HTML Comments With Regex
When it comes to working with HTML files in JavaScript, in most cases, you will want to use a parser for complex tasks. However, regex also has its own place. Regex can be used for simple tasks, such as selecting everything between HTML comments.
In such cases, a regex is a simpler solution, and often a faster one, than achieving the same thing with an HTML parser. To select everything between HTML comments using regex, we can use the following formula:
/<!-- COMMENT_START -->(.*?)<!-- COMMENT_END -->/gs
Everything that goes between the two forward slashes (/ /) will be interpreted as a regex. Let's break down the regex to better understand how it works:
<!-- COMMENT_START -->: Matches the start of a comment block. You can replace COMMENT_START with the word of your choice.
(): This is a capture group that groups multiple tokens together for extracting a substring or using a backreference. This is optional in our case.
.: This character will match any character except a newline.
*: This quantifier is used for matching 0 or more of the preceding token.
?: Placed after a quantifier, this makes the match lazy, so the regex matches as few characters as possible. Without it, a single match would stretch from the first start comment to the last end comment.
<!-- COMMENT_END -->: Matches the end of the comment block. This will be the end of the match.
gs: At the end of the regex, we can define expression flags.
g: Stands for "global", which means all matches will be returned, not just the first one.
s: Enables the "dotall" mode, which allows the . character to match newlines.
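To make the effect of the ? quantifier and the s flag more concrete, here is a minimal sketch that runs three variations of the regex against the same input. The variable names (html, lazy, greedy, noDotAll) and the sample content are only illustrative assumptions.

```javascript
// Illustrative input with two comment blocks spread over several lines
const html = [
  '<!-- COMMENT_START -->',
  'first',
  '<!-- COMMENT_END -->',
  '<!-- COMMENT_START -->',
  'second',
  '<!-- COMMENT_END -->'
].join('\n')

// Lazy (.*?): each block is matched on its own
const lazy = html.match(/<!-- COMMENT_START -->(.*?)<!-- COMMENT_END -->/gs) || []

// Greedy (.*): one match from the first start comment to the last end comment
const greedy = html.match(/<!-- COMMENT_START -->(.*)<!-- COMMENT_END -->/gs) || []

// No s flag: . cannot cross line breaks, so nothing matches here
const noDotAll = html.match(/<!-- COMMENT_START -->(.*?)<!-- COMMENT_END -->/g) || []

console.log(lazy.length)     // 2
console.log(greedy.length)   // 1
console.log(noDotAll.length) // 0
```

The || [] fallback is there because match returns null when nothing matches.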
How to use it in JavaScript
To use this regex in JavaScript and grab the content between the HTML comments, we can use the built-in test and match methods in the following way:
const html = '<!-- COMMENT_START -->...<!-- COMMENT_END -->'

// Only continue if the opening comment is present
if (/<!-- COMMENT_START -->/.test(html)) {
  const regex = /<!-- COMMENT_START -->(.*?)<!-- COMMENT_END -->/gs
  // match returns null when nothing matches, so fall back to an empty array
  const matches = html.match(regex) || []

  matches.forEach(match => {
    const content = match
      .replace(/<!-- COMMENT_START -->/g, '')
      .replace(/<!-- COMMENT_END -->/g, '')
      .trim() // Optionally, you can trim off excess whitespace

    console.log(content)
  })
}
First, we need to test if the HTML contains the comments that we are looking for. This can be done by using the test method on a regex. The method accepts a string as a parameter and returns a boolean (true if the regex matches anywhere in the string, otherwise false).
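As a quick illustration of how test behaves, here is a small sketch. The variable names and sample strings are assumptions made up for this example. It also shows one thing to watch out for: a regex with the g flag remembers where its last match ended (lastIndex), so reusing the same regex object with test can give a different result on the second call.

```javascript
// A regex without the g flag gives the same answer on every call
const startMarker = /<!-- COMMENT_START -->/

const withoutMarker = startMarker.test('<p>No markers here</p>')
const withMarker = startMarker.test('<!-- COMMENT_START --><p>Hi</p>')

console.log(withoutMarker) // false
console.log(withMarker)    // true

// A regex with the g flag resumes searching from lastIndex on the next call
const globalMarker = /<!-- COMMENT_START -->/g
const input = '<!-- COMMENT_START --><p>Hi</p>'

const firstCall = globalMarker.test(input)  // true
const secondCall = globalMarker.test(input) // false, the search started after the first match

console.log(firstCall, secondCall)
```

This is why the check in the if statement uses a separate regex without flags.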
Inside the if statement, we can use the match method on the string. With the g flag set, it returns an array of all matches (or null if there are none), which means we can grab all occurrences from an HTML file. To extract the content from between the comments, we can loop through the results and remove the HTML comments using the replace method.
<!-- COMMENT_START -->
<div class="content">HTML Content between HTML comments</div>
<!-- COMMENT_END -->
<!-- COMMENT_START -->
<div class="content">Another instance</div>
<!-- COMMENT_END -->
Given the above example, the provided JavaScript snippet will match both blocks and extract the HTML content in between, producing the following output:
<div class="content">HTML Content between HTML comments</div>
<div class="content">Another instance</div>
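As an alternative to stripping the comments with replace, the capture group in the regex can be read directly with the built-in matchAll method. The sketch below is one possible way to do this, not the only one. The helper names extractBetweenComments and escapeRegex, as well as the default marker names, are hypothetical and chosen for this example.

```javascript
// Hypothetical helper: returns the trimmed content of every block
// found between <!-- start --> and <!-- end --> comments
function extractBetweenComments(html, start = 'COMMENT_START', end = 'COMMENT_END') {
  // Escape regex metacharacters so custom marker names are matched literally
  const escapeRegex = text => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')

  const regex = new RegExp(
    `<!-- ${escapeRegex(start)} -->(.*?)<!-- ${escapeRegex(end)} -->`,
    'gs'
  )

  // matchAll yields one result per block; index 1 holds the capture group
  return Array.from(html.matchAll(regex), match => match[1].trim())
}

const page = `
<!-- COMMENT_START -->
<div class="content">HTML Content between HTML comments</div>
<!-- COMMENT_END -->
<!-- COMMENT_START -->
<div class="content">Another instance</div>
<!-- COMMENT_END -->
`

const blocks = extractBetweenComments(page)

blocks.forEach(block => console.log(block))
```

Because matchAll returns an empty iterator when nothing matches, this version needs no separate test call or null check. Passing the marker names as parameters also makes it easy to use a word of your choice in the comments.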
Summary
Regex is a versatile tool that can be used for a variety of tasks. If you would like to test out your regexes visually, you can use online tools such as RegExr or Regexper. If you have any questions about the above solution, make sure you leave them in the comments below.
Are you new to JavaScript? Make sure you take a look at our JavaScript roadmap to learn everything you need to know about one of the core parts of the web.