Rich Comment Formatting

Category Web Development
Posted July 8, 2013
by Jacob Emerick

Comments are an important part of a blog. Too often blogs are used as a soapbox for one person to state their ideas and opinions out into the open air of the internet without any sort of inline interaction. Comments allow an audience to give feedback, debate the points, or provide their own views back at the original author and to the general public. Without any sort of inline interaction a blog is just another semi-static website.

I added comments to my blog a long time ago, around 2009. Since then over 200 different people have added 600 thoughts, ideas, and pieces of feedback, all of which have been valuable additions to the original posts (aside from the occasional spam post). There was still one thing that bugged me: I had very strict rules in place about what was acceptable content in a comment. I didn't allow any HTML tags, not even bold or italics, and even stripped out new lines. It was about time that I spruced up the comments a bit and let a few more things filter through.

The first thing I did was look at how other popular blogging frameworks dealt with comments. Most of them allowed certain HTML tags and stripped out the rest.

Note: I did entertain the idea of using BBCode, or something similar, to handle markup. I decided against this as I figured that more users out there understand HTML than BBCode and hey, why force people to learn when you're trying to encourage them to share?

The commonly allowed tags were:

<a> -> clickable links, often with a rel=nofollow to avoid SEO issues
<b> -> bold words
<i> -> italic words

This seemed like a good start. I did want to add one more tag, <pre>, in case anyone wanted to add code to a comment. This tag preserves line breaks and spacing and (depending on your stylesheet) can also render the text in a monospaced font. So now I had a good set of tags to use, but there was one more thing to think about.

While some people understand what HTML tags are, plenty still don't use them on a day-to-day basis, and I didn't want to ignore them with this update. I decided to add two more rules. First, if a bare link gets passed in I wanted to autolink it, that is, wrap it in an <a> tag so it's clickable. Second, I wanted to add breaks whenever someone hit the 'enter' key (AKA made a new line). This would let users who don't know HTML still add rich, readable comments.
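As a rough illustration of the line break rule, here is a small sketch that compares PHP's built-in nl2br() with a pattern that collapses each run of newlines into a single break. The helper name breaks_collapsed and the sample comment are made up for this example, so treat it as one possible approach rather than the final code.

```php
<?php
// Hypothetical helper (not part of the blog's codebase): turn each run of
// newline characters into exactly one <br /> tag.
function breaks_collapsed($text)
{
    return preg_replace('@[\r\n]+@', '<br />', $text);
}

// Invented sample comment with a blank line between paragraphs.
$comment = "First line\nSecond line\n\nNew paragraph";

// nl2br() keeps the original newlines and inserts a <br /> before each one,
// so the blank line survives as two consecutive breaks.
$with_nl2br = nl2br($comment);

// The collapsing version emits one break per run of newlines, so the blank
// line between paragraphs is flattened into a single break.
$collapsed = breaks_collapsed($comment);

echo $with_nl2br, "\n---\n", $collapsed, "\n";
```

Which behavior is better depends on the layout: collapsing keeps comments compact, while nl2br() preserves the spacing the commenter typed.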

So, to sum things up, I wanted comments to be processed as follows:


Check to make sure that acceptable tags were in an acceptable format*
Strip out any additional tags
Add extra syntax (autolink and line breaks) to content

*acceptable format = no extra attributes
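The footnote is the reason a plain allow-list is not enough on its own. A quick sketch (the input string is invented for illustration) shows that PHP's strip_tags() keeps whatever attributes sit on the tags it allows:

```php
<?php
// Hypothetical input: an allowed tag carrying an unwanted attribute,
// followed by a tag that is not on the allow-list.
$input = '<b onclick="evil()">hello</b> <u>world</u>';

// strip_tags() with an allow-list removes <u> but leaves <b> untouched,
// attributes included.
$output = strip_tags($input, '<b>');

echo $output, "\n";
```

The onclick attribute comes through intact, so the allowed tags need to be rebuilt in a known-clean form rather than simply let through.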

My first thought was to just write an easy procedural script to do this. However, this didn't work. This type of processing is much easier if you use placeholders and remove the acceptable tags altogether (as you'll see in the code below). Creating a placeholder, using it, and saving the placeholder and its replacement is too much logic to stuff into a few functions. So I created a quick and dirty utility class that I'm still not totally excited about, even though it works just fine for my use cases.
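To make the placeholder idea concrete before the full class, here is a stripped-down sketch for a single tag. The function name format_bold_only is hypothetical and the logic is a simplification under my own assumptions, not the class itself: swap each allowed tag for a random token, strip everything else, then put clean tags back.

```php
<?php
// Simplified, hypothetical version of the placeholder idea for one tag (<b>).
function format_bold_only($content)
{
    $saved = array();

    // 1. Swap each <b ...>text</b> for a random token and remember a clean,
    //    attribute-free replacement for it.
    $content = preg_replace_callback(
        '@<b\b[^>]*>(.*?)</b>@i',
        function ($match) use (&$saved) {
            $token = md5($match[0] . mt_rand());
            $saved[$token] = '<b>' . strip_tags($match[1]) . '</b>';
            return $token;
        },
        $content
    );

    // 2. With the allowed tags hidden behind tokens, every remaining tag can go.
    $content = strip_tags($content);

    // 3. Put the cleaned tags back where the tokens are.
    return str_replace(array_keys($saved), array_values($saved), $content);
}

echo format_bold_only('<b class="x">Hi</b> <script>alert(1)</script>there'), "\n";
```

The class attribute and the script tags are dropped while the bold survives. Note that strip_tags() removes only the tags themselves, so the text between the script tags stays in the output.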


class CleanComment
{

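  // Non-greedy captures and [^>]* keep each match inside a single tag, so
  // "<b>one</b> and <b>two</b>" gives two matches instead of one that drops
  // "one and". The \b stops <b from also matching <br> or <blockquote>.
  // Links must start with http:// or https://, which keeps javascript: URLs
  // out of the href; any other <a> tag is stripped like an unknown tag.
  private static $LINK_PATTERN =
    '@<a\b[^>]*\bhref=["\'](https?://[^"\']*)["\'][^>]*>(.*?)</a>@i';
  private static $BOLD_PATTERN =
    '@<b\b[^>]*>(.*?)</b>@i';
  private static $ITALIC_PATTERN =
    '@<i\b[^>]*>(.*?)</i>@i';
  private static $CODE_PATTERN =
    '@<pre\b[^>]*>(.*?)</pre>@is';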

  private static $LINK_REPLACE =
    '<a href="%s" rel="nofollow" target="_blank">%s</a>';
  private static $BOLD_REPLACE =
    '<b>%s</b>';
  private static $ITALIC_REPLACE =
    '<i>%s</i>';
  private static $CODE_REPLACE =
    '<pre>%s</pre>';

  private static $URL_PATTERN =
    '@(https?://[a-z0-9\.-]+\.[a-z]{2,6}[^\s]*[^\.,\?\!;\s]+)@i';
  private static $LINE_BREAK_PATTERN = '@([\r\n]+)@';

  private static $LINE_BREAK_REPLACE = '<br />';

  private $replacement_array = array();

  public function __construct() {}

  public function activate($content)
  {
    $content = $this->process_element(
      $content, self::$CODE_PATTERN, self::$CODE_REPLACE);
    $content = $this->process_element(
      $content, self::$LINK_PATTERN, self::$LINK_REPLACE);
    $content = $this->process_element(
      $content, self::$ITALIC_PATTERN, self::$ITALIC_REPLACE);
    $content = $this->process_element(
      $content, self::$BOLD_PATTERN, self::$BOLD_REPLACE);
    
    $content = $this->strip_extra_tags($content);
    
    $content = $this->link_unlinked_urls(
      $content, self::$URL_PATTERN, self::$LINK_REPLACE);
    $content = $this->add_line_breaks($content);
    $content = $this->replace_element_patterns($content);
    
    return $content;
  }

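  private function process_element($content, $pattern, $replace)
  {
    $match_count = preg_match_all(
      $pattern, $content, $matches, PREG_SET_ORDER);
    
    if($match_count < 1)
      return $content;
    
    foreach($matches as $match)
    {
      // First entry is the full tag; the rest are the captured pieces.
      $full_match = array_shift($match);
      $placeholder = $this->create_placeholder($full_match);
      $full_match_pattern = $this->create_full_match_pattern($full_match);
      
      // Swap the tag for its placeholder so strip_tags() leaves it alone.
      $content = preg_replace(
        $full_match_pattern, $placeholder, $content, 1);
      // Captured text never passes through the later strip_tags() call,
      // so remove any tags hiding inside it here (their text is kept).
      $match = array_map('strip_tags', $match);
      $this->replacement_array[$placeholder] = vsprintf($replace, $match);
    }
    
    return $content;
  }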

  private function create_placeholder($text)
  {
    return md5($text . rand());
  }

  private function create_full_match_pattern($text)
  {
    $pattern = '';
    $pattern .= '@';
    $pattern .= preg_quote($text, '@');
    $pattern .= '@';
    $pattern .= 'i';
    
    return $pattern;
  }

  private function strip_extra_tags($content)
  {
    return strip_tags($content);
  }

  private function link_unlinked_urls($content, $pattern, $replace)
  {
    // Build each link from its own match. Reassigning $replace inside a
    // loop would make every later URL reuse the first URL's link.
    return preg_replace_callback(
      $pattern,
      function($match) use ($replace)
      {
        // Escape quotes so the URL cannot break out of the href attribute.
        $url = htmlspecialchars($match[1], ENT_QUOTES);
        return sprintf($replace, $url, $url);
      },
      $content);
  }

  private function add_line_breaks($content)
  {
    return preg_replace(
      self::$LINE_BREAK_PATTERN, self::$LINE_BREAK_REPLACE, $content);
  }

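  private function replace_element_patterns($content)
  {
    // Expand the newest placeholders first. A replacement can only contain
    // placeholders stored before it, so this order also resolves nested
    // tags such as <b><i>text</i></b>.
    $replacements = array_reverse($this->replacement_array, true);
    foreach($replacements as $key => $replace)
    {
      $content = str_replace($key, $replace, $content);
    }
    
    return $content;
  }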

}

I'm sure that some of the regular expressions will need to get modified as different URLs start coming through. This class worked just dandy for all my existing comments, though.
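One cheap way to find out which URLs need attention is to run the pattern against a handful of sample comments. The sketch below uses the URL pattern from the class above; the sample strings are hypothetical, and the expected values reflect my reading of the pattern.

```php
<?php
// The URL pattern copied from the class above.
$url_pattern = '@(https?://[a-z0-9\.-]+\.[a-z]{2,6}[^\s]*[^\.,\?\!;\s]+)@i';

// Hypothetical sample comments. The key is the comment text, the value is
// the URL expected to be picked up (null when the pattern finds nothing).
$samples = array(
    'Read http://example.com/trail.' => 'http://example.com/trail',
    'Short link: http://bit.ly/abc'  => 'http://bit.ly/abc',
    'Bare domain http://bit.ly'      => null,
);

foreach ($samples as $text => $expected) {
    $found = preg_match($url_pattern, $text, $match) ? $match[1] : null;
    printf("%-32s %s\n", $text, $found === $expected ? 'ok' : 'UNEXPECTED');
}
```

The first sample shows the trailing period being left out of the link. The last one shows a gap: the pattern wants at least one more character after the two-to-six letter ending, so a bare domain with a two-letter ending and no path is not autolinked.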

Using this code I went back and processed all of the comments, adding a new 'format' for each one so I could compare the raw user comment to the processed one. Everything looked and worked great, especially for some of the longer paragraphs that some of the fellow hikers have left describing different areas and experiences. It's also in the codebase now (modified a bit to fit in the stack) and I'm saving two versions of every comment, the raw and the processed, just in case something wonky happens.

So now you can read and add richer comments to the blog! Enjoy :)
