Truncating Text and HTML
Posted by mkhairul - March 27, 2008 at 10:03:38 pm - Categories: blog
In nearly every application I have built, truncating text that contains HTML has been crucial. The last thing I need is an entry that takes up half of the page and makes it look absolutely ugly. (I could put a "read more" marker in my posts, but sometimes I just forget, and my posts are not that long.)
Here's a function that I use for truncating text with HTML (text without HTML also works). If I remember correctly, it's a snippet from CakePHP.
Update: Seems the Text Editor doesn't play nice. Here's the file.
PHP:
function truncate($text, $length = 100, $ending = '...', $exact = false, $considerHtml = true) {
    if ($considerHtml) {
        // if the plain text is shorter than the maximum length, return the whole text
        if (strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
            return $text;
        }
        // split the text into scannable lines, each an optional html-tag plus the text after it
        preg_match_all('/(<.+?>)?([^<>]*)/s', $text, $lines, PREG_SET_ORDER);
        $total_length = strlen($ending);
        $open_tags = array();
        $truncate = '';
        foreach ($lines as $line_matchings) {
            // if there is any html-tag in this line, handle it and add it (uncounted) to the output
            if (!empty($line_matchings[1])) {
                // if it's an "empty element" with or without xhtml-conform closing slash (e.g. <br/>)
                if (preg_match('/^<(\s*.+?\/\s*|\s*(img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param)(\s.+?)?)>$/is', $line_matchings[1])) {
                    // do nothing
                // if tag is a closing tag (e.g. </strong>)
                } else if (preg_match('/^<\s*\/([^\s]+?)\s*>$/s', $line_matchings[1], $tag_matchings)) {
                    // delete tag from $open_tags list
                    $pos = array_search($tag_matchings[1], $open_tags);
                    if ($pos !== false) {
                        unset($open_tags[$pos]);
                    }
                // if tag is an opening tag (e.g. <strong>)
                } else if (preg_match('/^<\s*([^\s>!]+).*?>$/s', $line_matchings[1], $tag_matchings)) {
                    // add tag to the beginning of $open_tags list
                    array_unshift($open_tags, strtolower($tag_matchings[1]));
                }
                // add html-tag to $truncate'd text
                $truncate .= $line_matchings[1];
            }

            // calculate the length of the plain text part of the line; handle entities as one character
            $content_length = strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $line_matchings[2]));
            if ($total_length + $content_length > $length) {
                // the number of characters which are left
                $left = $length - $total_length;
                $entities_length = 0;
                // search for html entities
                if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $line_matchings[2], $entities, PREG_OFFSET_CAPTURE)) {
                    // calculate the real length of all entities in the legal range
                    foreach ($entities[0] as $entity) {
                        if ($entity[1] + 1 - $entities_length <= $left) {
                            $left--;
                            $entities_length += strlen($entity[0]);
                        } else {
                            // no more characters left
                            break;
                        }
                    }
                }
                $truncate .= substr($line_matchings[2], 0, $left + $entities_length);
                // maximum length is reached, so get off the loop
                break;
            } else {
                $truncate .= $line_matchings[2];
                $total_length += $content_length;
            }

            // if the maximum length is reached, get off the loop
            if ($total_length >= $length) {
                break;
            }
        }
    } else {
        if (strlen($text) <= $length) {
            return $text;
        } else {
            $truncate = substr($text, 0, $length - strlen($ending));
        }
    }

    // if the words shouldn't be cut in the middle...
    if (!$exact) {
        // ...search the last occurrence of a space...
        $spacepos = strrpos($truncate, ' ');
        if ($spacepos !== false) {
            // ...and cut the text in this position
            $truncate = substr($truncate, 0, $spacepos);
        }
    }

    // add the defined ending to the text
    $truncate .= $ending;

    if ($considerHtml) {
        // close all unclosed html-tags
        foreach ($open_tags as $tag) {
            $truncate .= '</' . $tag . '>';
        }
    }

    return $truncate;
}
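
The comments in the function describe two ideas that are easier to see in isolation: the HTML is walked as a list of "tag plus following text" pairs, and each HTML entity counts as a single character. Below is a minimal sketch of both. The helper names split_html_segments() and visible_length() are hypothetical and not part of the snippet, and the splitting pattern is an assumption about how the tags are separated from the text. Only the entity pattern is copied from the function.

```php
<?php
// Hypothetical helper (not part of the original snippet): split HTML into
// [tag, text] pairs, the shape that the truncate() loop iterates over.
// The pattern is an assumption: an optional tag followed by tag-free text.
function split_html_segments($html) {
    preg_match_all('/(<.+?>)?([^<>]*)/s', $html, $segments, PREG_SET_ORDER);
    $pairs = array();
    foreach ($segments as $segment) {
        // Skip the empty match the pattern produces at the end of the string.
        if ($segment[0] === '') {
            continue;
        }
        $pairs[] = array($segment[1], $segment[2]);
    }
    return $pairs;
}

// Hypothetical helper: length of a text segment where every HTML entity
// (named, decimal or hexadecimal) counts as one character.
function visible_length($text) {
    $entity = '/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i';
    return strlen(preg_replace($entity, ' ', $text));
}

$pairs = split_html_segments('<p>Fish &amp; <strong>chips</strong></p>');
foreach ($pairs as $pair) {
    // Tags are carried along uncounted; only the text part adds to the length.
    printf("tag=%-10s text=%-14s visible=%d\n", $pair[0], '"' . $pair[1] . '"', visible_length($pair[1]));
}
```

Because tags never add to the running length, the limit applies to what the reader actually sees, and the list of still-open tags is all that is needed to close the markup again after the cut.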
3 Comments
It would be monumentally helpful if the less than signs and other parts of the code weren’t converted to &lt; etc.
Comment by Mike — April 28, 2008 #
Sorry about that dude, I’ve posted the file.
Comment by mkhairul — April 29, 2008 #
I do not believe this
Comment by fornetti — August 31, 2008 #