Extended Regular Expressions

Advanced Regular Expressions: No Witchcraft

Admittedly, they look strange at first, and I avoided them myself for a very long time. But for my new plugin “Divi – PageSpeed Booster” I finally had to face them. Regular expressions resemble nothing you already know. But once you understand the principle, you will realize how ingenious the idea behind them is and how many possibilities they open up, provided you make the effort to understand them.

I’m not going into the basics in this post; they are already well explained elsewhere. For example, there is the course by Jeffrey Way, and regexr.com is a good place to test your patterns. This article is about more advanced uses. I use PHP to explain the context, but in principle the regexes themselves should not differ too much between languages.

On regexr.com you will find descriptions of what the different characters mean and how to use them in context. Take a look at it at your leisure if you haven’t dealt with it yet, and then let’s get started.

Get image, iframe, source and audio tags in something like a foreach loop

  return preg_replace_callback
  (

      '/(?<media><(?<tag>img|iframe|source|audio)(?![^>]*(?:divilazy|nolazy))[^>]*>(?:\s*<\s*\/\s*iframe\s*>)?)/',

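      function ( $match )
      {

          $this->tag   = $match['tag'];
          $this->media = $this->hostToCdn( $match['media'] );

          // Fall back to the unchanged tag so $return is always defined,
          // e.g. for a <source> without a video or audio type.
          $return = $this->media;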

          switch ( $this->tag ) :

              case 'img':

                  $return = $this->image();

                  break;

              case 'source':

                  if ( DALL()->isIn( 'type="video', $this->media ) )
                      $return = $this->video();

                  if ( DALL()->isIn( 'type="audio', $this->media ) )
                      $return = $this->audio();

                  break;

              case 'audio':

                  $return = $this->audio();

                  break;

              case 'iframe':

                  $return = $this->iframe();

                  break;

          endswitch;

          return apply_filters( DALL()->prefix() . '_return_media', $return );

      },

      $this->output

  );

As you can see, we use the function “preg_replace_callback”. Its first parameter is the regex that finds the individual instances in the source code. Its second parameter is a callback that processes each instance and returns its replacement. In my case I use a switch to perform different operations depending on the tag.

But what exactly does the regex do? With “?<media>” we name the whole match and can access it via “$match['media']”. With “?<tag>” we capture the element name, available as “$match['tag']”, for further use in the callback. The elements in this case are “img”, “iframe”, “source” or “audio”. With the negative lookahead “(?![^>]*(?:divilazy|nolazy))” we skip every tag that contains one of these two strings, which are normally set as classes in the “class” attribute. With “[^>]*” we allow any characters except the closing angle bracket “>”, so we get the whole tag from the opening to the closing angle bracket. With “(?:\s*<\s*\/\s*iframe\s*>)?” we optionally capture the closing iframe tag, taking into account that it may contain whitespace. Pay attention to this, because there are indeed extensions that output closing tags with such whitespace. Fortunately this is not the rule, but we must be prepared for it. In any case, hold back on “.*” whenever possible: it matches every following character on the line and can easily lead to unwanted results.
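To see the named groups and the negative lookahead at work outside the plugin, here is a minimal standalone sketch. The pattern is the one from above; the sample markup and the bracketed marker that replaces each match are made up for illustration and stand in for the real processing.

```php
<?php
// Hypothetical sample markup; only the pattern comes from the article.
$html = '<img src="a.jpg"><img class="nolazy" src="b.jpg">'
      . '<iframe src="https://example.com/embed"> </iframe><p>text</p>';

$pattern = '/(?<media><(?<tag>img|iframe|source|audio)(?![^>]*(?:divilazy|nolazy))[^>]*>(?:\s*<\s*\/\s*iframe\s*>)?)/';

$seen = [];

$result = preg_replace_callback(
    $pattern,
    function (array $match) use (&$seen): string {
        // Remember which tag names were matched.
        $seen[] = $match['tag'];

        // Stand-in for the real processing: replace the match with a marker.
        return '[' . $match['tag'] . ']';
    },
    $html
);

echo $result, PHP_EOL;
// [img]<img class="nolazy" src="b.jpg">[iframe]<p>text</p>
```

The image with the class “nolazy” is left untouched, and the closing iframe tag is swallowed together with the space in front of it.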

Get YouTube and Vimeo IDs from the tags

'/https([^\"\'])*youtu[^\"\']*\/(?<id>\w{11})[^\"\']*?([\'\"])/'

'/https([^\"\'])*vimeo[^\"\']*\/(?<id>\d{7,12})(?:[^\"\']*?([\'\"]))/'

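YouTube IDs are strings of 11 characters. We capture them with “(?<id>\w{11})” and can use them further via “$match['id']”. Note that “\w” does not match the hyphen, which can also appear in YouTube IDs; “[\w-]{11}” covers that case.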

With Vimeo IDs it is harder to say how long they actually are. The ones I have seen are at least 7 digits long, sometimes 8, so the pattern allows 7 to 12 digits to leave some headroom. In any case they consist of digits only, so we capture them with “(?<id>\d{7,12})” and access them with “$match['id']” as well.
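The following sketch runs both patterns against embed-style tags with “preg_match”. The helper “extractVideoId” and all URLs are invented for the example, and the hyphen-aware variant “[\w-]{11}” is my own assumption, not part of the plugin.

```php
<?php
// Patterns taken from the article; the URLs below are made-up examples.
$youtube = '/https([^\"\'])*youtu[^\"\']*\/(?<id>\w{11})[^\"\']*?([\'\"])/';
$vimeo   = '/https([^\"\'])*vimeo[^\"\']*\/(?<id>\d{7,12})(?:[^\"\']*?([\'\"]))/';

// Assumption: a variant that also accepts hyphens inside the ID.
$youtubeHyphen = '/https([^\"\'])*youtu[^\"\']*\/(?<id>[\w-]{11})[^\"\']*?([\'\"])/';

// Hypothetical helper: returns the named group "id" or null if nothing matched.
function extractVideoId(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $match) === 1 ? $match['id'] : null;
}

$yt     = '<iframe src="https://www.youtube.com/embed/dQw4w9WgXcQ?rel=0"></iframe>';
$vm     = '<iframe src="https://player.vimeo.com/video/76979871"></iframe>';
$hyphen = '<iframe src="https://www.youtube.com/embed/abc-defghij"></iframe>';

var_dump(extractVideoId($youtube, $yt));           // string(11) "dQw4w9WgXcQ"
var_dump(extractVideoId($vimeo, $vm));             // string(8) "76979871"
var_dump(extractVideoId($youtube, $hyphen));       // NULL
var_dump(extractVideoId($youtubeHyphen, $hyphen)); // string(11) "abc-defghij"
```

Both patterns expect the ID directly after a slash, so they fit embed URLs as they appear in iframe tags rather than “watch?v=” links.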

Get background images

'/(?<start><(?<tag>\w{1,12})\s(?![^>]*(divilazy|nolazy|url\(\)))[^>]*(?<!lazy)style=[^>]*)((background-image:|background:)[^>]*(?<value>url\([^\)]*\))[\s;]?)(?<end>[^>]*>)/'

These are a bit more complicated because we don’t know which tag they belong to: in principle you can assign a background image to any tag using the “style” attribute. So we use the rule “<(?<tag>\w{1,12})\s”, which expects an opening angle bracket “<”, a tag name of at most 12 characters and a following whitespace. We can then retrieve the tag name with “$match['tag']”.

Here, too, a negative lookahead “(?![^>]*(divilazy|nolazy|url\(\)))” excludes the two classes as well as an empty url assignment “url()”. Then follows a negative lookbehind “(?<!lazy)” so that only “style="…"” is matched and “lazystyle="…"” is excluded.

Then we search with “(background-image:|background:)[^>]*(?<value>url\([^\)]*\))” for a background property that contains a url(...) value and, on a match, read it from “$match['value']”.
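A small sketch can make the lookahead and the lookbehind tangible. The pattern is copied from above; the constant name, the helper “findBackground” and the three sample tags are hypothetical.

```php
<?php
// Pattern copied from the article; the sample tags below are invented.
const BG_PATTERN = '/(?<start><(?<tag>\w{1,12})\s(?![^>]*(divilazy|nolazy|url\(\)))[^>]*(?<!lazy)style=[^>]*)((background-image:|background:)[^>]*(?<value>url\([^\)]*\))[\s;]?)(?<end>[^>]*>)/';

// Hypothetical helper: returns the tag name and the url(...) value, or null.
function findBackground(string $tag): ?array
{
    if (preg_match(BG_PATTERN, $tag, $match) !== 1) {
        return null;
    }

    return ['tag' => $match['tag'], 'value' => $match['value']];
}

// Regular inline background: matches.
var_dump(findBackground('<div class="hero" style="color:#fff;background-image:url(img/bg.jpg);">'));

// "lazystyle" is rejected by the negative lookbehind: NULL.
var_dump(findBackground('<div lazystyle="background:url(a.png)">'));

// The class "nolazy" is rejected by the negative lookahead: NULL.
var_dump(findBackground('<div class="nolazy" style="background:url(a.png)">'));
```

For the first tag the helper returns the tag name “div” and the value “url(img/bg.jpg)”; the other two tags yield null.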

Get noscript tags

/**
 * Remove noscript elements to prepare html for lazyLoad
 *
 * @since 1.0
 */
public function clearHtml( $html )
{

    return preg_replace_callback
    (

        '/(?<match>(?:<\s?noscript\s?>)(?:.|\n)*?(?:\/\s?noscript\s?>))/',

        function ( $match )
        {

            $this->counter++;

            $replace = "%%noscript{$this->counter}%%";

            $this->noscript[$replace] = $match['match'];

            return $replace;

        },

        $html

    );

} // end clearHtml

Here we have a crucial exception: a noscript tag can contain any content. Therefore we use “(?:.|\n)*?” to match all characters including line breaks; the alternation “.|\n” is responsible for that, because “.” alone does not match a newline by default. The trailing “?” makes the quantifier lazy, so the match stops at the first closing noscript tag “(?:\/\s?noscript\s?>)”. The whole tag is stored in “$match['match']”.

In my case I store the matches in the array “$this->noscript” and replace the noscript tags with “%%noscript{$this->counter}%%”. This way I can run further operations on the output and very easily bring the original tags back later with a foreach loop and “str_replace”. In this case this happens in an “ob_start” buffer, which closes automatically.
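The round trip of replacing and restoring could look roughly like the sketch below. The class “NoscriptGuard” and its method names are hypothetical, and the restore step passes arrays to “str_replace” instead of looping; the pattern and the placeholder format are the ones from above.

```php
<?php
// Hypothetical wrapper class; not the plugin's actual structure.
final class NoscriptGuard
{
    private int $counter = 0;

    /** @var array<string, string> placeholder => original noscript tag */
    private array $noscript = [];

    // Swap every noscript element for a numbered placeholder.
    public function protect(string $html): string
    {
        return preg_replace_callback(
            '/(?<match>(?:<\s?noscript\s?>)(?:.|\n)*?(?:\/\s?noscript\s?>))/',
            function (array $match): string {
                $this->counter++;
                $replace = "%%noscript{$this->counter}%%";
                $this->noscript[$replace] = $match['match'];

                return $replace;
            },
            $html
        );
    }

    // Put the stored elements back in place of their placeholders.
    public function restore(string $html): string
    {
        return str_replace(
            array_keys($this->noscript),
            array_values($this->noscript),
            $html
        );
    }
}

$html = "<p>a</p><noscript><img src=\"x.jpg\">\n</noscript><p>b</p><noscript>second</noscript>";

$guard     = new NoscriptGuard();
$protected = $guard->protect($html);

echo $protected, PHP_EOL; // <p>a</p>%%noscript1%%<p>b</p>%%noscript2%%
var_dump($guard->restore($protected) === $html); // bool(true)
```

Because every placeholder ends with “%%”, “%%noscript1%%” can never be mistaken for the beginning of “%%noscript10%%”.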

Do something with a video source

/**
 * Set video background attributes to corresponding elements
 *
 * @since 1.0
 */
public function setVideoBgAttributes( array $data, string $output )
{

    foreach ( $data as $key => $url ) :

        $suffix = DALL()->isIn( 'mp4', $url ) ? 'mp4' : 'webm';

        // preg_quote() escapes all regex metacharacters in the url (".", "?", "/" ...).
        $find = "/(<[^>]*source[^>]*)src=(['\"\s])" . preg_quote( $url, '/' ) . "([^>]*>)/";

        $repl = "$1class=\"divilazy bg\" src=\""
                . $this->hostToCdn( DALL()->videos() ) . DALL()->dummy()
                . ".{$suffix}\" data-lazyvideo=$2{$this->hostToCdn( $url )}$3";

        $output = preg_replace( $find, $repl, $output );

    endforeach;

    return $output;

} // end setVideoBgAttributes

In this case the video URL was already known, and the task was to convert the source tag as needed for the lazy load plugin. This is rather rare, but perhaps helpful in principle.

The regex searches for the source tag that contains the video link “$url”. With “(<[^>]*source[^>]*)” we capture everything before the “src” attribute and put it back later with “$1”. With “src=(['\"\s])” we capture the kind of opening quote and reinsert it via the 2nd capture group with “$2”. With “([^>]*>)” we capture everything that follows the matched url and reinsert it with “$3”.

In between we can do all the operations we need to output our source tag.
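Here is a standalone sketch of the same replacement with three capture groups. All URLs are invented, “$dummy” stands in for the value the plugin builds from its helper methods, and the use of “preg_quote” and “${2}” is my suggestion rather than the plugin's code.

```php
<?php
// Hypothetical values; in the plugin these come from helper methods.
$url    = 'https://example.com/media/clip.mp4?v=1';
$dummy  = 'https://cdn.example.com/dummy.mp4';
$output = '<video><source type="video/mp4" src="' . $url . '"></video>';

// preg_quote() escapes every regex metacharacter in the url, including the delimiter "/".
$find = "/(<[^>]*source[^>]*)src=(['\"\s])" . preg_quote($url, '/') . "([^>]*>)/";

// ${2} is the unambiguous form of $2, in case a digit follows directly.
$repl = '$1class="divilazy bg" src="' . $dummy . '" data-lazyvideo=${2}' . $url . '$3';

$result = preg_replace($find, $repl, $output);

echo $result, PHP_EOL;
// <video><source type="video/mp4" class="divilazy bg" src="https://cdn.example.com/dummy.mp4" data-lazyvideo="https://example.com/media/clip.mp4?v=1"></video>

// Escaping only the slashes is not enough: "?" then acts as a quantifier.
$naive = "/(<[^>]*source[^>]*)src=(['\"\s])" . str_replace('/', '\/', $url) . "([^>]*>)/";
var_dump(preg_match($naive, $output)); // int(0)
```

The last two lines show why escaping matters: as soon as the url carries a query string, the pattern with only escaped slashes no longer finds the tag.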

Final words for the extended use of Regular Expressions

Even if they scare you a bit at first (I haven’t heard of anyone who didn’t avoid them at first), regular expressions are a very valuable and powerful tool for getting a handle on really difficult tasks. From a performance standpoint, however, you should always try to work around them. Is there a safe way to solve something without regular expressions? Then use it, always and without exception. Otherwise, they are the way out when all other tools fail.
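As a small illustration of that advice: a fixed piece of text can be found with a plain string function instead of a pattern. The markup is made up, and I am assuming that a helper like “isIn” in the first example boils down to such a substring check.

```php
<?php
// Hypothetical markup, similar to what the media callback receives.
$media = '<source type="video/mp4" src="clip.mp4">';

// Plain substring check: no pattern to compile, nothing to escape.
$isVideoPlain = strpos($media, 'type="video') !== false;

// The same check as a regex; preg_quote() guards the literal text.
$isVideoRegex = preg_match('/' . preg_quote('type="video', '/') . '/', $media) === 1;

var_dump($isVideoPlain, $isVideoRegex); // bool(true) bool(true)
```

Both checks give the same answer here, but the first one needs neither delimiters nor escaping.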

Do you have suggestions for improving this article? Just use the comment area below. Do you want support with an implementation, or do you need help elsewhere? You can book us: simply use our contact form to get in touch.

Divi is a registered trademark of Elegant Themes, Inc. This website is not affiliated with nor endorsed by Elegant Themes.


Bruno Bouyajdad

Web development, AI, blog author

