Improving WordPress Excerpt Extraction: Auto-Closing HTML Tags

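By default, WordPress does not keep HTML tags in excerpts, which makes them look rather ugly. Early last year I made a small tweak: where wp_strip_all_tags() calls strip_tags(), I kept the <p>, <a>, <em>, <strong>, <img>, and <embed> tags: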

    $allowed_tags = '<p><a><em><strong><img><embed>';
    $string = strip_tags($string, $allowed_tags);
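
As a quick check of what this call keeps and what it drops, here is a minimal standalone sketch. The sample string and the variable name $kept are made up for illustration; only strip_tags() and the allowed-tag list come from the change above.

```php
<?php
// Hypothetical sample content, not taken from a real post.
$string = '<div><p>Hello <strong>world</strong>, see <a href="/x">this</a>.</p><span>extra</span></div>';

// Tags listed here survive; any other tag is removed, but its inner text stays.
$allowed_tags = '<p><a><em><strong><img><embed>';

$kept = strip_tags($string, $allowed_tags);
echo $kept, "\n";
// <p>Hello <strong>world</strong>, see <a href="/x">this</a>.</p>extra
```

Note that the <div> and <span> wrappers disappear while their text ("extra") remains, and attributes on allowed tags are left untouched.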

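But this caused two new problems: (1) the length count includes the HTML tags, and (2) truncating the excerpt can cut an HTML tag in half. I put up with this quietly for a long time, but today I finally lost patience and fixed it. The code is below.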
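
Before getting to the fix, both problems can be reproduced with plain PHP. This is only an illustrative sketch: the sample HTML and the cut-off position of 20 are assumptions chosen to make the effect visible.

```php
<?php
// Hypothetical excerpt source, used only for illustration.
$html = '<p>Hello <a href="https://example.com/">world</a></p>';

// Problem 1: measuring the raw string counts markup as well as visible text.
$withTags = mb_strlen($html, 'UTF-8');
$textOnly = mb_strlen(strip_tags($html), 'UTF-8');

// Problem 2: cutting at a fixed character offset can land inside a tag.
$cut = mb_substr($html, 0, 20, 'UTF-8');

echo $withTags, ' vs ', $textOnly, "\n"; // 53 vs 11
echo $cut, "\n";                          // <p>Hello <a href="ht
```

The truncated result ends in the middle of the <a> tag and leaves <p> unclosed, which is exactly what breaks the page layout.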

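    /**
     * Truncate HTML to a given amount of text and close any tags left open.
     *
     * @param string $html   HTML to truncate
     * @param int    $length maximum number of text characters (tags are not counted)
     * @return string
     */
    function subHtml($html, $length)
    {
        // Void elements have no closing tag, so they must stay off the stack.
        // The original check covered br and img; hr and embed are added here.
        $voidTags = array('br', 'img', 'hr', 'embed');
        $result = '';
        $tagStack = array();
        $len = 0;
        // Split into tags and text runs, keeping each tag as its own item.
        $contents = preg_split('~(<[^>]+?>)~si', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
        foreach ($contents as $tag) {
            if (trim($tag) === '') {
                continue;
            }
            if (preg_match('~<([a-z0-9]+)[^>]*?/>~si', $tag)) {
                // Self-closing tag: copy as-is.
                $result .= $tag;
            } elseif (preg_match('~</([a-z0-9]+)[^/>]*?>~si', $tag, $match)) {
                // Closing tag: keep it only if it matches the innermost open tag.
                if (!empty($tagStack) && strcasecmp(end($tagStack), $match[1]) === 0) {
                    array_pop($tagStack);
                    $result .= $tag;
                }
            } elseif (preg_match('~<([a-z0-9]+)[^>]*?>~si', $tag, $match)) {
                // Opening tag: remember it unless it is a void element.
                if (!in_array(strtolower($match[1]), $voidTags, true)) {
                    array_push($tagStack, $match[1]);
                }
                $result .= $tag;
            } elseif (preg_match('~<!--.*?-->~si', $tag)) {
                // HTML comment: copy as-is.
                $result .= $tag;
            } else {
                // Text run: only this part counts toward $length.
                $tagLen = mb_strlen($tag, 'UTF-8');
                if ($len + $tagLen < $length) {
                    $result .= $tag;
                    $len += $tagLen;
                } else {
                    $result .= mb_substr($tag, 0, $length - $len, 'UTF-8');
                    break;
                }
            }
        }
        // Close whatever is still open, innermost tag first.
        while (!empty($tagStack)) {
            $result .= '</' . array_pop($tagStack) . '>';
        }
        return $result;
    }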
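
The tag-closing part of the function relies on a stack: opening tags are pushed, matching closing tags pop them, and anything left at the end gets closed in reverse order. The stripped-down sketch below isolates just that idea. closeOpenTags() and its list of void elements are hypothetical names introduced for illustration; it does no truncation and is not a replacement for subHtml().

```php
<?php
// Hypothetical helper for illustration only; not part of WordPress.
function closeOpenTags(string $html): string
{
    $void = array('br', 'img', 'hr', 'embed', 'input'); // assumed list of void elements
    $stack = array();

    // Collect every tag in document order: [1] = "/" if closing, [2] = name, [3] = "/" if self-closing.
    preg_match_all('~<(/?)([a-z0-9]+)[^>]*?(/?)>~i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $t) {
        $name = strtolower($t[2]);
        if ($t[3] === '/' || in_array($name, $void, true)) {
            continue;                  // nothing to balance
        }
        if ($t[1] === '/') {
            if (end($stack) === $name) {
                array_pop($stack);     // matched the innermost open tag
            }
        } else {
            $stack[] = $name;          // remember the open tag
        }
    }

    // Close what is still open, innermost first.
    while (!empty($stack)) {
        $html .= '</' . array_pop($stack) . '>';
    }
    return $html;
}

echo closeOpenTags('<p>Hello <strong>wor'), "\n";
// <p>Hello <strong>wor</strong></p>
```

Closing in reverse order of opening is what keeps the nesting valid: <strong> was opened last, so it is closed first.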

    /**
     * Custom excerpt trim
     */
    function better_trim_excerpt($text = '') {
        $raw_excerpt = $text;
        if ( '' == $text ) {
            $text = get_the_content('');
            $text = strip_shortcodes( $text );
            $text = apply_filters('the_content', $text);
            $text = str_replace(']]>', ']]&gt;', $text);
            $excerpt_length = apply_filters('excerpt_length', 55);
            $excerpt_more = apply_filters('excerpt_more', ' ' . '[...]');
            //$text = my_trim_words( $text, $excerpt_length, $excerpt_more );
            $text = subHtml($text, $excerpt_length) . $excerpt_more;
        }
        return apply_filters('wp_trim_excerpt', $text, $raw_excerpt);
    }

    remove_filter('get_the_excerpt', 'wp_trim_excerpt');
    add_filter('get_the_excerpt', 'better_trim_excerpt', 5);

Here the subHtml() function closes any HTML tags left open by the truncation, and it counts only the text outside the tags toward the length.
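
One thing to keep in mind: WordPress defines the default excerpt_length of 55 as a number of words, whereas subHtml() compares it against mb_strlen() character counts, so an unmodified setup yields only about 55 characters of text. A possible adjustment, sketched under assumptions: the callback name my_excerpt_length and the value 200 are hypothetical, and the code is meant for the theme's functions.php.

```php
<?php
// Hypothetical callback; 200 is an arbitrary example value.
function my_excerpt_length($length)
{
    return 200; // characters of visible text, as counted by subHtml()
}

// The guard only lets this sketch load outside WordPress as well.
if (function_exists('add_filter')) {
    add_filter('excerpt_length', 'my_excerpt_length', 999);
}
```

A late priority such as 999 is used so that this value wins over other callbacks attached to the same filter.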