Kae Travis

Strip out style attributes in HTML

Posted in ASP.Net

This post describes the process I use to strip out style attributes in HTML code using a regular expression.

My website presents data from a field in SharePoint. This field uses HTML and CSS style attributes to construct the note. A user enters this data via a SharePoint website, and my .NET website presents it elsewhere. The trouble is, when my site presents this data the message can look like a right mess: different fonts, different sizes and different colours (you’ve met those idiots before who like to use Comic Sans in a professional environment, right?). So before I present the data in a Literal control, I decided to use regular expressions to strip out any style and class attributes, along with a few other bits of clutter. Here is the .NET function (which I have in a class):

    //function to strip CSS styles etc from SharePoint notes
    //(needs "using System.Text.RegularExpressions;" at the top of the file)
    public static string stripStyles(string message)
    {

        //replace non-ascii with empty string
        message = Regex.Replace(message, @"[^\u0000-\u007F]", string.Empty);

        //replace 3 or more BR with one BR
        message = Regex.Replace(message, "(?:\\s*<br[/\\s]*>\\s*){3,}", "<br />");

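        //remove any style attributes (the backreference makes the closing quote match the opening one,
        //so a value such as font-family: 'Arial' inside double quotes is removed whole)
        message = Regex.Replace(message, "style=(\"|').*?\\1", "");

        //remove any class attributes
        message = Regex.Replace(message, "class=(\"|').*?\\1", "");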

        //remove empty p tags
        message = Regex.Replace(message, "(<p>\\s*</p>|<p>\\s*​\\?</p>)", "");
        
        //remove font tags
        message = Regex.Replace(message, "</?(font)[^>]*>", "");

        return message;

    }

It won’t produce perfect results. There are also uses of the <font> tag scattered about in these messages, and I suspect they may be used to highlight (bold/colour) certain words (auto-generated from the WYSIWYG editor). The last replace in the function removes them; comment that line out if you would rather leave those alone.
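To sanity-check this kind of clean-up, it can help to run a sample fragment through it in a small console app. The sketch below is a trimmed-down stand-in rather than the function above: the `StripDemo` class, the `StripAttributes` helper, its `removeFontTags` switch and the sample HTML are all made up for illustration. It assumes attribute values are always quoted, and (like any regex approach) it does not understand HTML structure, so it would also hit text that merely looks like an attribute.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripDemo
{
    // Hypothetical helper: removes style/class attributes and, optionally, <font> tags.
    public static string StripAttributes(string html, bool removeFontTags)
    {
        // \s+ eats the whitespace in front of the attribute.
        // (["']) captures the opening quote and \1 requires the same quote to close the value,
        // so single quotes nested inside a double-quoted value do not end the match early.
        html = Regex.Replace(html, @"\s+(?:style|class)\s*=\s*([""']).*?\1", "",
                             RegexOptions.IgnoreCase | RegexOptions.Singleline);

        if (removeFontTags)
        {
            // Drops opening and closing <font> tags but keeps the text between them.
            html = Regex.Replace(html, @"</?font\b[^>]*>", "", RegexOptions.IgnoreCase);
        }

        return html;
    }

    public static void Main()
    {
        string sample = "<p class='note' style=\"font-family: 'Comic Sans MS'; color: red\">"
                      + "Hello <font color=\"red\">world</font></p>";

        // Keeps the <font> highlight: <p>Hello <font color="red">world</font></p>
        Console.WriteLine(StripAttributes(sample, false));

        // Strips it as well: <p>Hello world</p>
        Console.WriteLine(StripAttributes(sample, true));
    }
}
```

Making the <font> removal a switch keeps the decision about highlighted words in the caller's hands, which is handy while you are still working out what the WYSIWYG editor actually generates.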
