Improving WordPress Excerpt Extraction: Auto-Closing HTML Tags

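By default, WordPress strips all HTML tags from the excerpt, which looks rather ugly. Early last year I made a tweak: where wp_strip_all_tags() calls strip_tags(), I kept the <p>, <a>, <em>, <strong>, <img>, and <embed> tags: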

    $allowed_tags = '<p><a><em><strong><img><embed>';
    $string = strip_tags($string, $allowed_tags);
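
To see what the allowlist does in isolation, here is a minimal standalone sketch. The sample string is made up for this example; only `strip_tags()` and the tag list come from the snippet above.

```php
<?php
// Hypothetical sample content, not taken from a real post.
$html = '<div><p>Hello <strong>world</strong></p></div>';

// Tags named in the allowlist survive; every other tag is removed,
// while the text inside the removed tags is kept.
$allowed_tags = '<p><a><em><strong><img><embed>';
$kept = strip_tags($html, $allowed_tags);

echo $kept, PHP_EOL; // <p>Hello <strong>world</strong></p>
```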

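But this caused two new problems: (1) the length count includes the HTML tags, and (2) truncating the excerpt can cut through HTML tags. I put up with it quietly for a long time, but today I finally ran out of patience and fixed it. The code follows.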
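
Both problems are easy to reproduce. The following is a minimal sketch with a made-up sample string, assuming the mbstring extension is available (the fix itself also relies on it).

```php
<?php
// Hypothetical sample; any HTML string would do.
$html = '<p>Hello <strong>world</strong></p>';

// Problem 1: the character count includes the markup.
$withTags = mb_strlen($html, 'UTF-8');             // 35
$textOnly = mb_strlen(strip_tags($html), 'UTF-8'); // 11

// Problem 2: a plain substring cut can land inside a tag,
// leaving broken and unclosed markup behind.
$cut = mb_substr($html, 0, 12, 'UTF-8');

echo $withTags, ' vs ', $textOnly, PHP_EOL;
echo $cut, PHP_EOL; // <p>Hello <st
```

The fix below avoids both by counting only the text and by tracking which tags are still open.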

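    /**
     * Truncate HTML and automatically close any tags left open.
     *
     * @param string $html   HTML to truncate
     * @param int    $length Maximum number of characters, not counting tags
     * @return string
     */
    function subHtml($html, $length)
    {
        // Void elements never take a closing tag, so they must not go on the stack.
        // (The original listing called an undefined startsWith() helper for br and img.)
        $voidTags = array('br', 'img', 'hr', 'embed', 'input');
        $result = '';
        $tagStack = array();
        $len = 0;
        // Split into tags and text runs; the capture group keeps the tags in the result.
        $contents = preg_split("~(<[^>]+?>)~si", $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
        foreach ($contents as $tag) {
            if (trim($tag) == "")
                continue;
            if (preg_match("~<([a-z0-9]+)[^>]*?/>~si", $tag)) {
                // Self-closing tag: copy through, nothing to track
                $result .= $tag;
            } else if (preg_match("~</([a-z0-9]+)[^/>]*?>~si", $tag, $match)) {
                // Closing tag: emit only if it matches the innermost open tag
                if (!empty($tagStack) && end($tagStack) == strtolower($match[1])) {
                    array_pop($tagStack);
                    $result .= $tag;
                }
            } else if (preg_match("~<([a-z0-9]+)[^>]*?>~si", $tag, $match)) {
                // Opening tag: remember it so it can be closed later
                if (!in_array(strtolower($match[1]), $voidTags)) {
                    array_push($tagStack, strtolower($match[1]));
                }
                $result .= $tag;
            } else if (preg_match("~<!--.*?-->~si", $tag)) {
                // Comment
                $result .= $tag;
            } else {
                // Text: the only part that counts toward $length
                if ($len + mb_strlen($tag) < $length) {
                    $result .= $tag;
                    $len += mb_strlen($tag);
                } else {
                    // Take only what is left of the budget (the original took one character too many)
                    $str = mb_substr($tag, 0, $length - $len);
                    $result .= $str;
                    break;
                }
            }
        }
        // Close whatever is still open, innermost first
        while (!empty($tagStack)) {
            $result .= '</' . array_pop($tagStack) . '>';
        }
        return $result;
    }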

    /**
     * Custom excerpt trimming that keeps HTML tags.
     */
    function better_trim_excerpt($text = '') {
        $raw_excerpt = $text;
        if ( '' == $text ) {
            $text = get_the_content('');
            $text = strip_shortcodes( $text );
            $text = apply_filters('the_content', $text);
            $text = str_replace(']]>', ']]&gt;', $text);
            // subHtml() counts characters, so excerpt_length is a character count here, not a word count
            $excerpt_length = apply_filters('excerpt_length', 55);
            $excerpt_more = apply_filters('excerpt_more', ' ' . '[...]');
            //$text = my_trim_words( $text, $excerpt_length, $excerpt_more );
            $text = subHtml($text, $excerpt_length) . $excerpt_more;
        }
        return apply_filters('wp_trim_excerpt', $text, $raw_excerpt);
    }

    remove_filter('get_the_excerpt', 'wp_trim_excerpt');
    add_filter('get_the_excerpt', 'better_trim_excerpt', 5);

The subHtml() function closes any HTML tags left open by truncation, and it counts only the text outside the tags.
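
The counting half of that idea can be looked at on its own. The sketch below reuses the `preg_split()` pattern from `subHtml()`; the helper name `visibleLength()` is hypothetical and exists only for this illustration, and the sample strings are made up.

```php
<?php
/**
 * Hypothetical helper, for illustration only: count the characters
 * that sit outside HTML tags.
 */
function visibleLength(string $html): int
{
    // Same split pattern as subHtml(): the capture group makes preg_split()
    // return the tags as their own elements, between the text runs.
    $parts = preg_split('~(<[^>]+?>)~s', $html, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

    $len = 0;
    foreach ($parts as $part) {
        if (preg_match('~^<[^>]+>$~s', $part)) {
            continue; // a tag or comment: contributes nothing to the count
        }
        $len += mb_strlen($part, 'UTF-8');
    }
    return $len;
}

echo visibleLength('<p>Hello <strong>world</strong></p>'), PHP_EOL; // 11
echo visibleLength('<p>你好<br/>世界</p>'), PHP_EOL;                 // 4
```

Unlike `subHtml()`, this sketch also counts whitespace-only runs between tags, so the two can differ slightly on heavily indented markup.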