Sunday, February 11, 2018

php - Regex - Convert HTML to valid XML tag






I need help writing a regex function that converts an HTML string to a valid XML tag name. It takes a string and does the following:





  • If a letter or an underscore occurs in the string, it is kept.

  • If any other character occurs, it is removed from the output string.

  • If a run of other characters occurs between words or letters, it is replaced with a single underscore.




Ex:
Input: Date Created
Output: Date_Created


Input: Date
Created
Output: Date_Created

Input: Date\nCreated
Output: Date_Created

Input: Date 1 2 3 Created
Output: Date_Created




Basically, the regex function should convert the HTML string to a valid XML tag name.


Answer



A bit of regex combined with a couple of standard functions:



function mystrip($s)
{
    // add spaces around angle brackets to separate tag-like parts,
    // e.g. "Date<br/>Created" becomes "Date <br/> Created",
    // then let strip_tags take care of removing html tags
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // any sequence of characters that are not letters or underscores
    // gets replaced by a single underscore
    return preg_replace('/[^a-z_]+/i', '_', $s);
}
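
To check the approach against the inputs from the question, here is a minimal sketch. The name to_xml_tag_name and the final trim() step are assumptions added for illustration, not part of the answer above. The trim handles input wrapped in tags or spaces, which would otherwise leave an underscore at either end.

```php
<?php
// Hypothetical helper: the same two steps as mystrip(), plus a final trim.
function to_xml_tag_name(string $s): string
{
    // Pad angle brackets so text on both sides of a tag stays separated,
    // then let strip_tags() drop the tags themselves.
    $s = strip_tags(str_replace(array('<', '>'), array(' <', '> '), $s));

    // Collapse every run of characters other than letters or underscores
    // into a single underscore.
    $s = preg_replace('/[^a-z_]+/i', '_', $s);

    // Assumption: underscores left at either end are unwanted.
    // Note that this also removes underscores the input itself started or ended with.
    return trim($s, '_');
}

$inputs = array(
    "Date Created",
    "Date\nCreated",
    "Date 1 2 3 Created",
    "<p>Date<br/>Created</p>",
);

foreach ($inputs as $input) {
    echo to_xml_tag_name($input), PHP_EOL;
}
```

Each of the four inputs prints Date_Created. Without the trim() call, the last input would come out as _Date_Created_, because the spaces added around the outer tags turn into underscores. An input with no letters or underscores at all yields an empty string, so a caller would need to handle that case separately.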
