Custom URL parsing and overlapping patterns

Updated 3 months ago by admin

The Web Security rules engine uses a category-based system, called the "Custom URL" module, for controlling access to specific websites. Each category contains patterns, and the rules engine compares those patterns with the requested URL to decide whether the category applies. Categories are a flexible way of controlling web access or overriding the base categorisation of a URL.

The rules engine optimises the categories and their patterns before any rules are executed in the account. It is important to understand how this may affect rule matching if there are overlapping patterns within multiple categories in the account.

First, consider how patterns are parsed.

Given the example pattern account.acme.com:

  • The pattern is split into parts, from left to right, at each period "." character. The parts in the example are account, acme and com.
  • There is an implicit wildcard to the left of the leftmost part of the pattern, i.e. account.acme.com is evaluated as *.account.acme.com. It therefore matches account.acme.com and all of its sub-domains, e.g. my.account.acme.com.
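
The two parsing steps above can be sketched in a few lines of Python. This is an illustration only, not the rules engine's actual code: the function names split_pattern and pattern_covers are hypothetical, and the sketch assumes patterns and hosts are compared exactly as written.

```python
def split_pattern(pattern: str) -> list[str]:
    """Split a domain pattern into its parts at each "." character."""
    return pattern.split(".")


def pattern_covers(pattern: str, host: str) -> bool:
    """Return True if host equals the pattern or is a sub-domain of it.

    Comparing only the trailing parts of the host models the implicit
    wildcard to the left of the leftmost part of the pattern.
    """
    pattern_parts = split_pattern(pattern)
    host_parts = split_pattern(host)
    if len(host_parts) < len(pattern_parts):
        return False
    return host_parts[-len(pattern_parts):] == pattern_parts


print(split_pattern("account.acme.com"))
print(pattern_covers("account.acme.com", "account.acme.com"))     # the pattern itself
print(pattern_covers("account.acme.com", "my.account.acme.com"))  # a sub-domain
print(pattern_covers("account.acme.com", "acme.com"))             # the parent domain
```

Because whole parts are compared rather than raw text, a host such as notaccount.acme.com is not treated as a sub-domain of account.acme.com in this sketch.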

The next step is for the rules engine to determine whether the pattern matches the requested URL, e.g. https://my.account.acme.com.

The rules engine first tries an exact match on the URL domain. If one or more URL categories contain the exact pattern my.account.acme.com, then those categories are selected for use with any filter rules that may apply. Any other sub-domain (or wildcard) matching is skipped and the associated categories will not be selected for rule processing.

If there is no exact match on the URL domain, the rules engine removes sub-domain parts from the URL domain, from left to right, and tries to match again after each removal. For example, it removes my and searches for account.acme.com; if there is no match, it removes account and searches for acme.com, and so on until there is nothing left to match. This is important because the most specific pattern matches first and all less specific patterns are ignored, even if they are in a different category.
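
The lookup order described above can be sketched as follows. This is a simplified model under stated assumptions, not the engine's implementation: categories are represented as a plain dictionary of pattern sets, the names "Category A" and "Category B" and the functions candidate_domains and select_categories are hypothetical, and paths are ignored.

```python
def candidate_domains(host: str) -> list[str]:
    """List the domains to try, most specific first.

    The first entry is the exact host; each later entry has one more
    sub-domain part removed from the left.
    """
    parts = host.split(".")
    return [".".join(parts[i:]) for i in range(len(parts))]


def select_categories(host: str, categories: dict[str, set[str]]):
    """Return (matched_pattern, category_names) for the most specific match.

    The search stops at the first candidate found in any category, so
    categories that only hold a less specific pattern are never selected.
    """
    for candidate in candidate_domains(host):
        matched = sorted(
            name for name, patterns in categories.items() if candidate in patterns
        )
        if matched:
            return candidate, matched
    return None, []


# Hypothetical categories, for illustration only.
categories = {
    "Category A": {"account.acme.com"},
    "Category B": {"acme.com"},
}

print(candidate_domains("my.account.acme.com"))
print(select_categories("my.account.acme.com", categories))
print(select_categories("www.acme.com", categories))
```

In this model, my.account.acme.com selects only "Category A" because the search ends at account.acme.com, while www.acme.com falls through to acme.com and selects "Category B".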

If the pattern includes a path, the path is evaluated only after the domain part has matched.

Overlapping pattern scenario

It is possible that two or more patterns could exist in different categories and the patterns could overlap each other.

For example, the category "My Blocked Sites" may contain account.acme.com and the category "My Allowed Sites" may contain acme.com.

When the rules engine begins to process the requested URL, e.g. https://my.account.acme.com, it determines that account.acme.com exists in the URL category "My Blocked Sites" and that it is more specific than any other entry. It therefore discards the more generic pattern acme.com, even though that pattern is in a different category, i.e. "My Allowed Sites".

Now consider that these categories are used in two rules, attempting to control the same site.

For example, a rule with priority 10 blocks "My Blocked Sites" for all users, and a rule with priority 1 allows "My Allowed Sites" for a specific group of users, thus trying to override the block for certain users. Remember that rules are processed in priority order (ascending) and the first rule that matches will win and end further rule processing.

In this case, the rules engine will not match "My Allowed Sites" because acme.com was disregarded in favour of the more specific account.acme.com, which only exists in the "My Blocked Sites" category. The allow rule therefore never applies and the block rule wins for all users. One solution would be to change account.acme.com to acme.com in "My Blocked Sites", so that acme.com becomes the most specific matching pattern in both categories and either category can match if required by the rules.
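
The scenario can be reproduced with a small model. Everything here is an assumption made for illustration: the Rule class, the evaluate function, the group name "Finance" and the None result when no rule matches are hypothetical, and only the matching and priority behaviour described in this article is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    priority: int
    action: str
    category: str
    group: Optional[str] = None  # None means the rule applies to all users


def select_categories(host: str, categories: dict[str, set[str]]) -> set[str]:
    """Select the categories holding the most specific matching pattern."""
    parts = host.split(".")
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        matched = {name for name, pats in categories.items() if candidate in pats}
        if matched:
            return matched  # less specific patterns are never considered
    return set()


def evaluate(host, user_groups, categories, rules) -> Optional[str]:
    """Apply rules in ascending priority order; the first match wins."""
    selected = select_categories(host, categories)
    for rule in sorted(rules, key=lambda r: r.priority):
        applies_to_user = rule.group is None or rule.group in user_groups
        if applies_to_user and rule.category in selected:
            return rule.action
    return None  # assumption: no rule matched


rules = [
    Rule(priority=10, action="block", category="My Blocked Sites"),
    Rule(priority=1, action="allow", category="My Allowed Sites", group="Finance"),
]

overlapping = {
    "My Blocked Sites": {"account.acme.com"},
    "My Allowed Sites": {"acme.com"},
}
fixed = {
    "My Blocked Sites": {"acme.com"},
    "My Allowed Sites": {"acme.com"},
}

host = "my.account.acme.com"
print(evaluate(host, {"Finance"}, overlapping, rules))  # override is never reached
print(evaluate(host, {"Finance"}, fixed, rules))        # override now applies
print(evaluate(host, set(), fixed, rules))              # other users stay blocked
```

With the overlapping patterns, the allow rule is skipped because its category was never selected. Once both categories use acme.com, the priority 1 rule is reached first for users in the group, and the priority 10 block still applies to everyone else.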

Strategies to avoid URL pattern overlap:

  • If you wish to control an entire web site (domain), then use the base domain e.g. acme.com in a URL category. Avoid sub-domains like www.acme.com.
  • Don't add more specific patterns to a category if the base domain is already in the category e.g. avoid a category containing patterns like www.acme.com and acme.com. Just use acme.com.
  • Use a single Category to control a specific site, if it warrants it. For example, create a new category called Acme Site and reference that in a rule, rather than using multiple patterns in multiple categories for the same site.
  • Use the Search options in the Custom URL section to find all patterns that match acme.com, and remove any duplicates or more specific patterns that could override the base domain match you require.
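
If you keep your own list of categories and patterns outside the product, a short script can flag overlaps before they cause surprises. This is a hypothetical helper, not a product feature: find_overlaps and the dictionary layout are assumptions, and the check covers domain-only patterns (no paths).

```python
def find_overlaps(categories: dict[str, set[str]]):
    """Find patterns that are sub-domains of another pattern.

    Returns sorted tuples of
    (specific_pattern, its_category, general_pattern, its_category).
    Overlaps inside a single category are reported as well.
    """
    entries = [
        (pattern, name)
        for name, patterns in categories.items()
        for pattern in patterns
    ]
    overlaps = []
    for specific, specific_category in entries:
        for general, general_category in entries:
            # "a.b.com" overlaps "b.com" when it ends with ".b.com"
            if specific != general and specific.endswith("." + general):
                overlaps.append(
                    (specific, specific_category, general, general_category)
                )
    return sorted(overlaps)


categories = {
    "My Blocked Sites": {"account.acme.com"},
    "My Allowed Sites": {"acme.com", "example.org"},
}

for specific, cat_s, general, cat_g in find_overlaps(categories):
    print(f"{specific} ({cat_s}) overrides {general} ({cat_g})")
```

Each reported pair is a case where the more specific pattern would be matched first and the more general one ignored for hosts under the specific pattern.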

