Entry Types in Lists

When adding an Entry to a List, you can choose the ‘Entry Type’.  The search order is important when using these Entry Types.  See ‘List Parsing Order of Precedence for Entry Types’ below.

Breakdown of a URL

scheme://username:password@domain:port/path?query_string#anchor
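As an illustration of the breakdown above, the following minimal sketch splits a URL into the same parts using Python's standard library. It is not part of the product; the numeric port 8080 is an assumption added only so the example parses, since the template above uses the placeholder word 'port'.

```python
from urllib.parse import urlsplit

# Hypothetical URL following the template above; 8080 stands in for "port".
url = "scheme://username:password@domain:8080/path?query_string#anchor"
parts = urlsplit(url)

print(parts.scheme)    # scheme
print(parts.username)  # username
print(parts.password)  # password
print(parts.hostname)  # domain
print(parts.port)      # 8080
print(parts.path)      # /path
print(parts.query)     # query_string
print(parts.fragment)  # anchor
```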

Entry Type Descriptions

The table below shows the available options and definitions for the Entry Types displayed when adding a New Entry.

Option

Description

URL

Use this option for a base URL entry.

Scheme

The ‘Scheme’ entry type only processes the scheme part of the URL.

 

Path

The ‘Path’ Entry Type searches in the path section of the URL and ignores the host.  All path entries begin at the ‘/’.  You can input path entries without the leading character; it will be added by the WebAdmin.

 

Query

The ‘Query’ entry type searches the CGI query of the URL and ignores the host and path. Query entries begin at the ‘?’.  You can input query types without the leading character.  They will be added by the WebAdmin.

 

Keyword

Keywords are strings of characters that can appear in a URL.  

Multiple Keywords

The Multiple Keywords type means that the entry is a set of Keyword entries separated by spaces. A matching URL must contain all of these words, in any order.

Whole Word

The Whole Word type means that the entry word matches only a whole substring between separators (./+&=) or the line end.

Multiple Words

The request URL must contain each word in any order to match this entry. This is like multiple Keywords but each part should match exactly a whole substring between separators.

File Extension

File extension support allows you to enter extensions such as CSS or PNG into any List.  File extension support can complement your filtering policy by preventing access to various file types that are suspect, or simply not allowed as part of your IT Acceptable Use Policy.

Regular Expression

Regular expressions are supported in All lists. However, there is a performance impact when they are used. It is recommended that the full URL be used where possible.  Using a Keyword instead of a regular expression can often resolve the issue.  A Keyword looks for the occurrence of that Keyword in the URL or URI.

Default Action

This action is used if nothing specific is found in the List. This can be used to categorize unknown entries such as unknown countries or unknown User Agents.

List Parsing Order of Precedence for Entry Types

This order of precedence is not adjustable and is based on the performance characteristics of each specific list type.  The search order in the list is shown in the image below.

Once a match is made, the remaining entry types are not processed.

How Lists Act

The following example shows how a list behaves as more entries of different types are added.  Which entry is matched depends on the processing precedence described above.  Each feature set also has its own processing precedence, which you must know to predict the exact result.

Remember that this per-feature precedence also applies when you add multiple entries of the same type to the same list.  Keep your lists simple and avoid adding similar entries.  Add the simplest entry, to the most suitable list type, that meets your requirements.

Scheme Entry Type in Lists

You can add a scheme to your Local and System-Wide Lists. In the example below, three Scheme types have been entered.  We want to deny access to ftp and lastfm sites but allow Skype. The ‘Scheme’ entry type only processes the scheme part of the URL. Protocols or schemes with ports specified can be entered as ‘Schemes’.

Lenovo NetFilter can help correct URL entries.  See 'List Suggestions' documentation.

Path Entry Type in Lists

You can add a path to your Local and System-wide Lists.  In the example below, only a path that begins with ‘video’ will be denied. The ‘Path’ Entry Type searches in the path section of the URL and ignores the host.  All path entries begin at the ‘/’.  You can input path entries without the leading character.

In the example below, only the first path entry, as in domain.com/video, is denied.  If the URL were domain.com/news/video, the /video would not be denied because it is not the first path entry.  The entry /*/video would deny the video path in that URL.
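The behavior described above can be sketched as follows. This is only an illustration, not the product's algorithm: the function name `path_entry_matches` is hypothetical, and the sketch assumes that each entry segment is compared to a whole URL path segment and that '*' stands for exactly one segment.

```python
from urllib.parse import urlsplit

def path_entry_matches(url, entry):
    """Compare a Path entry with the leading segments of the URL path.

    Assumptions (not confirmed by the documentation): segments are compared
    whole, case-insensitively, and '*' matches exactly one segment.
    """
    entry_segments = entry.lower().strip("/").split("/")
    url_segments = urlsplit(url).path.lower().strip("/").split("/")
    if len(url_segments) < len(entry_segments):
        return False
    return all(e == "*" or e == u for e, u in zip(entry_segments, url_segments))

print(path_entry_matches("http://domain.com/video", "/video"))         # True
print(path_entry_matches("http://domain.com/news/video", "/video"))    # False
print(path_entry_matches("http://domain.com/news/video", "/*/video"))  # True
```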

Query Entry Type in Lists

You can add a query to your Local and System-wide Lists.  In the example below, it will deny a query in any URL if it has the form ‘q=sex’. Any query with ‘sex’ in it will be denied.

For example, Google and Bing include the searched word in the CGI query as ‘q=word’. But Yahoo includes it as ‘p=word’ and YouTube does it as ‘search_query=word’. Therefore, in the example below, the query will block Google and Bing requests but will not block the same search in Yahoo or YouTube.

You can input query types without the leading character.  They will be added by the WebAdmin.
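To make the search engine example concrete, here is a minimal sketch of why an entry of the form 'q=word' catches one parameter name but not another. The function name `query_entry_matches` and the example URLs are hypothetical, and the sketch assumes an exact parameter name and a case-insensitive value comparison, which may differ from the product's matching.

```python
from urllib.parse import urlsplit, parse_qsl

def query_entry_matches(url, entry):
    """Return True if the URL query contains the key=value pair in entry.

    The leading '?' of the entry is optional, mirroring the text above.
    """
    key, _, value = entry.lstrip("?").partition("=")
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(k == key and v.lower() == value.lower() for k, v in pairs)

# Hypothetical hosts; only the parameter names follow the text above.
print(query_entry_matches("http://search-a.example/search?q=word", "q=word"))  # True
print(query_entry_matches("http://search-b.example/search?p=word", "q=word"))  # False
print(query_entry_matches("http://video.example/results?search_query=word", "?q=word"))  # False
```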

Regular Expressions Performance Impact

Regular expressions are supported in all lists; however, there is a performance impact when they are used. It is recommended that the full URL be used where possible.  Using a Keyword instead of a regular expression can often resolve the issue.  A Keyword looks for the occurrence of that keyword in the URL or URI.

The order of precedence for the types is URL, Keyword, extension, and lastly regular expression; this order is based on performance.  Once a match is made, then the processing stops.

When a ‘Regular Expression’ entry is added to a list, the Lists window displays a warning that regular expression entries are very slow to process.

File Extension Support in Lists

File extension support allows you to enter extensions like .EXE, .CSS, or .PNG, into any List.  This functionality is supported in all Lists.  File extension support can complement your filtering policy by preventing access to various file types that are suspect, or simply not allowed as part of your IT Acceptable Use Policy. 

It is important to note that this may make the enforcement of filtering policy more restrictive or permissive than without this feature.  You will want to consider overblocking and underblocking when developing your filtering policy.

Entries | Example URL | Meaning
ZIP | http://www.domain.com/compressed.zip | Matched because of the file extension.
ZIP | http://www.domain.com/ZIPformat/filelink | Not matched, since ZIP is not a file extension in this case, rather a keyword.
EXE | https://www.domain.com/executable.exe | Matched because of the file extension.
RAR | http://www.domain.com/file.rar/download | Not matched, because file.rar is not a filename, rather a part of the URL.
RAR | http://www.domain.com/file.rar?mirror=canada | Matched because .rar is the extension of the filename file.rar.  CGI parameters are stripped.
JPG | http://www.domain.com/test/file.jpg/?#/& | Matched because the trailing special characters /?#& are stripped.

Adding an extension that is a subset of another extension will result in an unknown processing order and output.  If you add the extension '.exe' as one category and the extension '.ex' in another category, the output will be unknown and must be tested for a better determination.
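The matching rules illustrated by the example URLs above can be approximated with the short sketch below. It is an assumption-laden illustration rather than the product's algorithm: the function name `extension_matches` is hypothetical, and stripping the query, the anchor, and trailing slashes is one plausible way to reproduce the documented results.

```python
from urllib.parse import urlsplit

def extension_matches(url, ext):
    """Return True if the last path segment of the URL ends with the extension.

    The query string and anchor are dropped by urlsplit, and trailing '/'
    characters are stripped before the filename is taken (an assumption).
    """
    path = urlsplit(url).path.rstrip("/")
    filename = path.rsplit("/", 1)[-1]
    return filename.lower().endswith("." + ext.lower().lstrip("."))

print(extension_matches("http://www.domain.com/compressed.zip", "ZIP"))          # True
print(extension_matches("http://www.domain.com/ZIPformat/filelink", "ZIP"))      # False
print(extension_matches("http://www.domain.com/file.rar/download", "RAR"))       # False
print(extension_matches("http://www.domain.com/file.rar?mirror=canada", "RAR"))  # True
print(extension_matches("http://www.domain.com/test/file.jpg/?#/&", "JPG"))      # True
```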

Extension Performance

Extension list processing is extremely fast due to the nature of the algorithm implemented.  In fact, since the extension is matched only at the end of a URL, it can exceed the performance of all other methods. 

Keyword Support in Lists

Keyword support allows all lists to contain Keywords for Categorization.  Keyword support removes the need, in some cases, for creating complex regular expressions to do Keyword matching in all lists.

Keywords are any substring that is searched in the whole URL string, from the scheme to the file name or the CGI Query. By default, they must contain three or more characters and no spaces (this default is configurable in WebAdmin Settings under 'Minimum Keyword length'). Keyword matching looks in any part of a URL, including the hostname, path, and query. Its main purpose is to block Search Engine queries.

In the WebAdmin, you always select the current entry type explicitly. Therefore, if you add http:// but the type is 'Keyword' it will be saved and then searched as a keyword.  In addition, if the type is 'URL' but you input an entry without the http:// prefix, it will still be interpreted and searched as a URL (rather than as a Scheme, even if the line doesn’t include dots, slashes, or a question mark).

Keywords can be very powerful, but they can easily cause the blocking of more sites than you intend (i.e. Overblocking). Therefore, always use keywords as a last resort. You can also use the Whole Word entry type instead of Keyword.

As noted above, always select the current entry type explicitly: if you add something with http:// but the type is Keyword, it will be saved and then searched as a keyword.  The Keyword type can be used in any list. Be aware, however, that the keyword 'sex' also blocks search engine queries for Essex, Wessex, sextant, and sexton.

Entry

Meaning

Notes

sex

*sex*

Anywhere in the URL that the string 'sex' is found, there will be a match, resulting in a policy decision.  For example, the following sites would match:

·       http://www.essex.gov.uk

·       http://www.sex.com

·       http://example.site.org/path/sussex.file.html

 

.org

*.org*

This will match the string .org if found in the URL.  The following sites would match:

·       http://www.sex.org/file/path

·       http://www.cool.com/something.org/something/file

·       http://example.com/something/?query=.org

 

Keyword processing can be used to block Search Engines requests that contain specified search terms. 

The list processing procedure always tries to find the longest matching word. It always stops the Keyword search when the first matching word is found. As an example, if a list has the keywords 'test' and 'tes', it means that if the URL has a substring 'test', it will never match the 'tes' entry. But this entry can match another URL where it is a part of some other word, e.g. 'quotes'.
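The 'test' and 'tes' example above can be sketched as follows. This is a simplified illustration that assumes a plain case-insensitive substring search and picks the longest keyword found; the function name `match_keyword` and the example URLs are hypothetical, and the product's actual algorithm is not documented here.

```python
def match_keyword(url, keywords):
    """Return the longest keyword that occurs anywhere in the URL, or None."""
    text = url.lower()
    hits = [k for k in keywords if k.lower() in text]
    return max(hits, key=len) if hits else None

keywords = ["tes", "test"]
print(match_keyword("http://example.com/test", keywords))    # test
print(match_keyword("http://example.com/quotes", keywords))  # tes
print(match_keyword("http://example.com/", keywords))        # None
```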

Singular and Multiple Keywords and Whole and Multiple Words

This section explains more about Keyword and Multiple Keywords Types as well as Whole Word and Multiple Words types.

A Keyword matches a whole or partial word; for example, 'key' would match both 'key' and 'keyword'.

A Whole Word matches only a whole word; for example, 'key' would match 'key' but not 'keyword'.

Either type can be added as a single (singular) entry or as a multiple entry.

Singular type entries will use the full text provided, including spaces, and match it as one word or keyword. This means that 'a key' will match 'a key' but not 'key a'. The restrictions of Keyword or Whole Word also apply, depending on which type of entry is used: for Keyword, 'a key' will match 'a keyword', whereas for Whole Word it will not.

Multiple type entries will separate the text at each space to create multiple words or keywords. This means that 'a key' will match both 'a key' and 'key a'. The restrictions of Keyword or Whole Word will also apply, depending on which type of entry is used. This means that for Multiple Keywords 'a key' will match 'a small monkey' where Multiple Words will not.  See the ‘Multiple Keywords Entry Type’ and the ‘Multiple Word Entry Type’ topics below for more information.

Whole Word Entry Type

The 'Whole Word' type means that the entry word matches only a whole substring between separators, which are ./+&= or the line end.

For example, if the word is 'sex' and the type is Whole Word it matches URLs:

·       http://sex.com/

·       http://www.sex.com/

·       http://com.sex

·       http://company.com/sex/

·       http://company.com/sex

·       http://company.com/sex.jpg

·       http://company.com/jpg.sex

·       http://company.com/?q=sex

·       http://company.com/?q=sex&a=b

·       http://company.com/?q=this+is+sex

It doesn't match URLs like:

·       http://company.com/msexplorer/

·       http://company.com/?q=essex

Note that the '=' separator can be a prefix but not a postfix for the word.
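The Whole Word rule, including the note that '=' may precede but not follow the word, can be expressed as a regular expression. The sketch below is an illustration only (the function name `whole_word_matches` is hypothetical, and the product does not necessarily use a regular expression internally); it reproduces the matching and non-matching URLs listed above.

```python
import re

def whole_word_matches(url, word):
    """Match word only between the separators ./+&= or at the line end.

    '=' is accepted before the word but not after it, as noted above.
    """
    pattern = r"(?:^|[./+&=])" + re.escape(word) + r"(?=[./+&]|$)"
    return re.search(pattern, url, re.IGNORECASE) is not None

print(whole_word_matches("http://company.com/sex.jpg", "sex"))         # True
print(whole_word_matches("http://company.com/?q=this+is+sex", "sex"))  # True
print(whole_word_matches("http://company.com/msexplorer/", "sex"))     # False
print(whole_word_matches("http://company.com/?q=essex", "sex"))        # False
```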

About the + sign in Entry Types

The "+" cannot be applied in Keywords when added to a List. As an example, when Google+ is entered, Google will be blocked while + will be treated as a space.  This relates to Keywords, Multiple Keywords, Multiple Words and Whole Word. 

Multiple Keywords Entry Type

The Multiple Keywords entry type is a set of a few words separated by a space. The matched URLs should have these words in any order.

As an example, the multi-keywords ‘we want cookie’ matches these example URLs:

·       http://company.com/?q=we+want+cookie

·       http://company.com/?q=cookie+we+want

·       http://company.com/we/cookie/want.gif

·       http://company.com/wewantcookie/

·       http://company.com/?q=ewe+wants+cookie

If a list has two ‘phrases’ that have a common part, the URL matches the phrase that is completed first. As an example, if the phrases are ‘we want cookie’ and ‘we want milk’, the URL:

http://company.com/?q=we+want+milk+and+cookie matches the ‘we want milk’ phrase. 

But if one phrase is a subset of the other, the one with the longest phrase is chosen even if the shorter phrase is completed first.

For example, if phrases are: ‘we want cookie’, or ‘we want chocolate cookie’, the URL:

http://company.com/?q=we+want+cookie+and+chocolate matches the ‘we want chocolate cookie’ phrase.

If the list contains single keywords and phrases with these keywords, the URL matches the complete phrase if it contains the phrase; otherwise it matches the first single word found.

As an example, if items are:

·       we

·       want

·       cookie

·       we want cookie

The URL:

http://company.com/?q=cookie+we+want matches the ‘we want cookie’ entry.

But the URL:

http://company.com/?q=cookie+we+like matches the entry ‘cookie’

Although Multiple Keywords should be separated by a single space, multiple spaces are also accepted as a separator.
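The basic Multiple Keywords rule (every word must appear somewhere in the URL, in any order) can be sketched in a few lines. This illustration uses a hypothetical function name, `multiple_keywords_match`, and deliberately leaves out the phrase-selection rules described above; it only decides whether a single entry matches.

```python
def multiple_keywords_match(url, entry):
    """Return True if every space-separated word of entry occurs in the URL.

    str.split() with no argument also tolerates multiple spaces.
    """
    text = url.lower()
    return all(word in text for word in entry.lower().split())

entry = "we want cookie"
print(multiple_keywords_match("http://company.com/?q=cookie+we+want", entry))    # True
print(multiple_keywords_match("http://company.com/?q=ewe+wants+cookie", entry))  # True
print(multiple_keywords_match("http://company.com/?q=we+want+milk", entry))      # False
```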

Multiple Words Entry Type

The Entry Type ‘Multiple Words’ combines Whole Word and Multiple Keywords' properties. For example, the URL:

http://company.com/?q=ewe+wants+cookie matches Multiple Keywords ‘we want cookie’ but doesn't match Multiple Words ‘we want cookie’ (because ‘ewe’ doesn't match the Whole Word ‘we’ and ‘wants’ doesn't match the Whole Word ‘want’).

Note that the multi-keyword entry ‘this is cookie’ matches a URL like

http://company.com/?q=this+cookie because the ‘is’ keyword is a part of ‘this’.  But the multi-whole-word ‘this is cookie’ doesn't match ‘this+cookie’ because ‘is’ does not appear there as a Whole Word.

Any list can mix Multiple Words and Multiple Keywords phrases with the same words. The processing function should distinguish the context phrase and use non-restricted Keywords only in Multiple Keyword phrases such as:

·       Multi Keyword ‘this is alex’

·       Multi Word ‘this is bob’

The first entry matches URLs:

http://company.com/?q=this+is+alex

http://company.com/?q=thisisalex

http://company.com/?q=this+alex

But the second entry matches:

http://company.com/?q=this+is+bob

and doesn't match:

http://company.com/?q=thisis+bob

http://company.com/?q=this+bob

Although the URL contains the substring ‘is’, it does not appear there as a restricted Whole Word, so it can satisfy the first entry but not the second.

A URL like:

http://company.com/?q=thisis+bob+and+his+friend+alex matches the first entry although the ‘bob’ appears first in the URL.

All new entry types are case insensitive (as is the existing Keyword type).

Multiple Words and Multiple Keywords with Negation Feature

Multiple Words and Multiple Keywords can work with a negation feature.  This means that the whole phrase (Multiple Keywords) does not match if the word is found. Two characters can be used in the search.  They are the bang ( ! ) and the caret ( ^ ).

Example | Definition
!word | This example means that the multi-keyword doesn't match the string if this word is found even within other words.
^longword | This word should be skipped, and other words should not be searched inside this substring. As an example, the entry sex ^msexplorer means search for 'sex' but only if it is not a part of the 'msexplorer' substring.

 

Example of 'not' Operator

Lenovo NetFilter can already work with 'Multiple Keywords' entries. This feature adds a not operator (!), which specifies that the entry does not match, so its deny/allow action is not applied, if the 'not' word is found.

Example:

You want to find all Request URLs that contain the words ‘games’ and ‘educational’, but not if they also contain ‘online’.

The URL list entry would be: games educational !online

The URL below would not match:

http://www.example.xy/games/educational/online/cargo.html

The URL below would match:

http://www.example.xy/games/educational/cargo.html

This feature can be used in cases where you want to scan the URL for certain keywords but exclude the URL from the results if another specified keyword is also found.
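The 'games educational !online' example can be sketched as below. It is a minimal illustration under the assumption of plain substring matching (as for Multiple Keywords); the function name `matches_with_negation` is hypothetical.

```python
def matches_with_negation(url, entry):
    """Multiple Keywords match in which '!word' vetoes the whole entry."""
    text = url.lower()
    required, forbidden = [], []
    for token in entry.lower().split():
        if token.startswith("!"):
            forbidden.append(token[1:])
        else:
            required.append(token)
    # A single forbidden word, even inside another word, rejects the URL.
    if any(word and word in text for word in forbidden):
        return False
    return all(word in text for word in required)

entry = "games educational !online"
print(matches_with_negation("http://www.example.xy/games/educational/online/cargo.html", entry))  # False
print(matches_with_negation("http://www.example.xy/games/educational/cargo.html", entry))         # True
```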

Whole Word feature with Restrictions

A second aspect to this is the caret operator (^).  Using the ^ is similar to using the whole word feature with restrictions.

Example:

You want to find the occurrence of a word in the URL but not if it is inside a particular word.

The URL list entry would be: quit ^quitter

The URLs below will match:

http://www.example.xy/games/educational/quit/cargo.html

http://www.example.xy/games/educational/quitting/cargo.html

The URL below will not match:

http://www.example.xy/games/educational/quitter/cargo.html
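One way to picture the caret operator is to blank out every occurrence of the excluded substring before searching for the remaining words. The sketch below takes that approach; it is an assumption about how the result could be reproduced, not a description of the product's implementation, and the function name `matches_with_caret` is hypothetical.

```python
def matches_with_caret(url, entry):
    """Keyword match in which '^longword' hides that substring from the search."""
    text = url.lower()
    tokens = entry.lower().split()
    for token in tokens:
        if token.startswith("^") and len(token) > 1:
            # Replace the excluded substring so nothing can match inside it.
            text = text.replace(token[1:], " ")
    words = [t for t in tokens if not t.startswith("^")]
    return all(word in text for word in words)

entry = "quit ^quitter"
print(matches_with_caret("http://www.example.xy/games/educational/quit/cargo.html", entry))      # True
print(matches_with_caret("http://www.example.xy/games/educational/quitting/cargo.html", entry))  # True
print(matches_with_caret("http://www.example.xy/games/educational/quitter/cargo.html", entry))   # False
```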

Keyword Performance

Keyword list processing is extremely fast due to the nature of the algorithm implemented.  In fact, because the keyword is matched anywhere in the URL, it can be faster than URL List matching, depending on the nature of the match.

It is important to note that URL matching applies automatic wildcarding and a form of regular expression analysis.  Keyword analysis may not be acceptable in most cases, but from a performance standpoint keyword matching is very fast.

Keyword vs Multiple Keywords vs Whole Word vs Multiple Words

The table below contains examples of different list entries. It’s important to understand in which deployment we would choose one method over another. In the following test, 'this is a test' has been used for Keyword, Multiple Keywords, Whole Word and Multiple Words.

Legend:

X: The tested URL matches the list entry and is denied.

O: The tested URL does not match and is allowed.

 

URL Used | Keyword | Multiple Keywords | Whole Word | Multiple Words
http://company.com/?q=this+is+a+test | X | X | X | X
http://company.com/?q=this+is+a+test+something | X | X | X | X
http://company.com/this+is+a+test/something.html | X | X | X | X
http://company.com/?q=this+is+a+testing | X | X | O | O
http://company.com/?q=readthis+is+a+test | X | X | O | O
http://company.com/?q=is+a+test+this | O | X | O | X
http://company.com/?q=this+is+a+something+test | O | X | O | X
http://thisisatest.com | O | X | O | O
http://company.com/this/is/a/test/something.html | O | X | O | X
http://company.com/test/a/is/this/something.html | O | X | O | X
http://company.com/sothis/his/a/test/something.html | O | X | O | O
http://company.com/this.is.a.test/something.html | O | X | O | X
http://company.com/this.test.is.a/something.html | O | X | O | X
http://company.com/this.his.as.test/something.html | O | X | O | O
http://company.com/this&is&a&test/something.html | O | X | O | X
http://company.com/this&his&a&tester/something.html | O | X | O | O
http://company.com/this&test&is&a/something.html | O | X | O | X
http://company.com/this=is=a=test/something.html | O | X | O | O
http://company.com/?this=is&a=test | O | X | O | O
http://company.com/?q=this+is+a+retest | O | X | O | O
http://company.com/?q=thisisatest | O | X | O | O
http://company.com/?q=thisisa | O | O | O | O
http://company.com/?q=this%is%a%test | O | X | O | O
http://company.com/?q=athistest | O | X | O | O
http://atest.com/?q=thisis | O | X | O | O
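The four entry types compared above can be modeled side by side in a short sketch. It is an approximation built only from the rules stated in this article (a '+' is treated as a space, Whole Word separators are ./+&= with '=' allowed only before the word, everything is case insensitive); the function names are hypothetical and the product's real implementation may differ.

```python
import re

def normalize(url):
    # The '+' sign is treated as a space, as described in this article.
    return url.lower().replace("+", " ")

def keyword(url, entry):
    # Singular Keyword: the full entry text must appear as one substring.
    return entry.lower() in normalize(url)

def whole_word(url, entry):
    # Singular Whole Word: the entry must sit between separators.
    pattern = r"(?:^|[./&= ])" + re.escape(entry.lower()) + r"(?=[./& ]|$)"
    return re.search(pattern, normalize(url)) is not None

def multiple_keywords(url, entry):
    # Every word must appear somewhere, in any order.
    return all(keyword(url, word) for word in entry.split())

def multiple_words(url, entry):
    # Every word must appear as a Whole Word, in any order.
    return all(whole_word(url, word) for word in entry.split())

def classify(url, entry="this is a test"):
    checks = (keyword, multiple_keywords, whole_word, multiple_words)
    return tuple("X" if check(url, entry) else "O" for check in checks)

print(classify("http://company.com/?q=this+is+a+testing"))  # ('X', 'X', 'O', 'O')
print(classify("http://company.com/?q=is+a+test+this"))     # ('O', 'X', 'O', 'X')
```

Each tuple lists the result in the column order Keyword, Multiple Keywords, Whole Word, Multiple Words.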

 

 

