When adding an Entry to a List, you can choose the ‘Entry Type’. The search order is important when using these Entry Types. See 'Order of Precedence for Entry Types' below.
scheme://username:password@domain:port/path?query_string#anchor
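To show how the parts named in the template above line up with a concrete URL, here is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. The example URL and the helper name `url_parts` are illustrative assumptions; this is not how Lenovo NetFilter itself parses URLs.

```python
from urllib.parse import urlsplit

def url_parts(url):
    """Split a URL into the parts named in the template above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "domain": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query_string": parts.query,
        "anchor": parts.fragment,
    }

# Hypothetical URL, used only for illustration.
example = url_parts("https://user:secret@example.com:8080/video/list?q=word#top")
for name, value in example.items():
    print(f"{name}: {value}")
```

The Scheme, Path and Query entry types described below each look at only one of these parts.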
The table below shows the available options and definitions for the Entry Types displayed when adding a New Entry.
Option | Description |
URL | Use this option for a base URL entry. |
Scheme | The ‘Scheme’ entry type only processes the scheme part of the URL. |
Path | The ‘Path’ Entry Type searches in the path section of the URL and ignores the host. All path entries begin at the ‘/’. You can input path entries without the leading character; the WebAdmin adds it. |
Query | The ‘Query’ entry type searches the CGI query of the URL and ignores the host and path. Query entries begin at the ‘?’. You can input query entries without the leading character; the WebAdmin adds it. |
Keyword | Keywords are strings of characters that can appear in a URL. |
Multiple Keywords | The entry is a set of Keyword entries separated by spaces. A matching URL must contain all of these words, in any order. |
Whole Word | The Whole Word type means that the entry word matches only a whole substring between separators (./+&=) or the line end. |
Multiple Words | The request URL must contain each word in any order to match this entry. This is like multiple Keywords but each part should match exactly a whole substring between separators. |
File Extension | File extension support allows you to enter extensions such as CSS or PNG into any List. This functionality is supported in all Lists. File extension support can complement your filtering policy by preventing access to file types that are suspect, or simply not allowed as part of your IT Acceptable Use Policy. |
Regular Expression | Regular expressions are supported in All lists. However, there is a performance impact when they are used. It is recommended that the full URL be used where possible. Using a Keyword instead of a regular expression can often resolve the issue. A Keyword looks for the occurrence of that Keyword in the URL or URI. |
Default Action | This action is used if nothing specific is found in the List. This can be used to categorize unknown entries such as unknown countries or unknown User Agents. |
This order of precedence is not adjustable and is based on the performance characteristics of each specific list type. The search order in the list is shown in the image below.
Once a match is made, processing stops and the remaining entry types are not evaluated.
The following example shows how a list behaves as more entries of different types are added. Which entry is matched depends on the processing precedence described above. Each feature set also has its own processing precedence, which you need to know to predict the exact result.
Remember that each list feature also has a processing precedence that must be considered when adding multiple entries of the same type to the same list. Keep your lists simple and avoid adding similar entries. Add the simplest entry to the list type that best meets your requirements.
You can add a scheme to your Local and System-Wide Lists. In the example below, three Scheme types have been entered. We want to deny access to ftp and lastfm sites but allow Skype. The ‘Scheme’ entry type only processes the scheme part of the URL. Protocols or schemes with ports specified can be entered as ‘Schemes’.
Lenovo NetFilter can help correct URL entries. See 'List Suggestions' documentation.
You can add a path to your Local and System-Wide Lists. In the example below, only a path that begins with ‘video’ will be denied. The ‘Path’ Entry Type searches in the path section of the URL and ignores the host. All path entries begin at the ‘/’. You can input path entries without the leading character.
In the example below, only a URL whose first path segment is video, such as domain.com/video, is denied. A URL such as domain.com/news/video is not denied, because video is not the first path segment. The entry /*/video would deny that path.
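The behaviour described above can be sketched as a comparison of leading path segments. This is only an illustration under stated assumptions: the function name `path_entry_matches` is hypothetical, segments are compared for exact equality, and `*` is taken to stand for exactly one segment. The product's real matching rules may differ.

```python
from urllib.parse import urlsplit

def path_entry_matches(entry, url):
    """Return True if the leading path segments of `url` match `entry`.

    Assumption: '*' in the entry stands for exactly one path segment.
    """
    if not entry.startswith("/"):
        entry = "/" + entry  # mimic the WebAdmin adding the leading character
    entry_segments = [s.lower() for s in entry.split("/") if s]
    url_segments = [s.lower() for s in urlsplit(url).path.split("/") if s]
    if len(url_segments) < len(entry_segments):
        return False
    # zip() stops at the shorter list, so only the leading segments are compared.
    return all(e == "*" or e == u for e, u in zip(entry_segments, url_segments))

print(path_entry_matches("video", "http://domain.com/video"))          # first segment
print(path_entry_matches("/video", "http://domain.com/news/video"))    # not first
print(path_entry_matches("/*/video", "http://domain.com/news/video"))  # wildcard
```

Because only `urlsplit(url).path` is inspected, the host is ignored, which mirrors the description of the Path entry type.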
You can add a query to your Local and System-wide Lists. In the example below, it will deny a query in any URL if it has the form ‘q=sex’. Any query with ‘sex’ in it will be denied.
For example, Google and Bing include the searched word in the CGI query as ‘q=word’. But Yahoo includes it as ‘p=word’ and YouTube does it as ‘search_query=word’. Therefore, in the example below, the query will block Google and Bing requests but will not block the same search in Yahoo or YouTube.
You can input query types without the leading character. They will be added by the WebAdmin.
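The search engine example above can be sketched as follows. The helper names are hypothetical, the search URLs are illustrative, and the rule used here (same parameter name, value containing the word) is an assumption made for the sketch rather than the documented matching rule.

```python
from urllib.parse import urlsplit, parse_qsl

def normalize_query_entry(entry):
    """Add the leading '?' if the entry was typed without it."""
    return entry if entry.startswith("?") else "?" + entry

def query_entry_matches(entry, url):
    """Assumed rule: the parameter name must be equal and the value must contain the word."""
    name, _, word = normalize_query_entry(entry)[1:].partition("=")
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(
        key.lower() == name.lower() and word.lower() in value.lower()
        for key, value in params
    )

# Illustrative search URLs using the parameter names given above.
searches = {
    "Google": "https://www.google.com/search?q=sex",
    "Bing": "https://www.bing.com/search?q=sex",
    "Yahoo": "https://search.yahoo.com/search?p=sex",
    "YouTube": "https://www.youtube.com/results?search_query=sex",
}
results = {site: query_entry_matches("q=sex", url) for site, url in searches.items()}
print(results)
```

Covering Yahoo and YouTube as well would need separate entries for `p=` and `search_query=`.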
Regular expressions are supported in all lists; however, there is a performance impact when they are used. It is recommended that the full URL be used where possible. Using a Keyword instead of a regular expression can often resolve the issue. A Keyword looks for the occurrence of that keyword in the URL or URI.
The order of precedence for the types is URL, Keyword, extension, and lastly regular expression; this order is based on performance. Once a match is made, then the processing stops.
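The first-match behaviour described above can be sketched like this. The list entries, the action names and the deliberately simplified matchers are all hypothetical; only the order (URL, Keyword, extension, regular expression) and the rule that processing stops at the first match come from the text.

```python
import re

# Hypothetical entries grouped by type, each mapped to an action.
ENTRIES = {
    "url": {"example.com/downloads": "allow"},
    "keyword": {"torrent": "deny"},
    "extension": {"exe": "deny"},
    "regex": {r"/v\d+/": "review"},
}

def strip_scheme(url):
    return url.split("://", 1)[-1]

# Simplified matchers, for illustration only.
MATCHERS = {
    "url": lambda entry, url: strip_scheme(url).lower().startswith(entry),
    "keyword": lambda entry, url: entry in url.lower(),
    "extension": lambda entry, url: url.lower().split("?", 1)[0].endswith("." + entry),
    "regex": lambda entry, url: re.search(entry, url) is not None,
}

PRECEDENCE = ["url", "keyword", "extension", "regex"]

def classify(url):
    """Return (entry type, entry, action) for the first match, or None."""
    for entry_type in PRECEDENCE:
        for entry, action in ENTRIES[entry_type].items():
            if MATCHERS[entry_type](entry, url):
                return entry_type, entry, action  # stop at the first match
    return None

print(classify("http://example.com/downloads/torrent-tool.exe"))
print(classify("http://other.org/files/setup.exe"))
```

In the first call the URL entry wins even though the keyword and the extension would also match, because it is checked first.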
When a ‘Regular Expression’ entry is added to a list, the Lists window displays a warning that regular expression entries are very slow to process.
File extension support allows you to enter extensions like .EXE, .CSS, or .PNG, into any List. This functionality is supported in all Lists. File extension support can complement your filtering policy by preventing access to various file types that are suspect, or simply not allowed as part of your IT Acceptable Use Policy.
It is important to note that this may make the enforcement of filtering policy more restrictive or permissive than without this feature. You will want to consider over blocking and/or under blocking when developing your filtering policy.
Entries | Example URL | Meaning |
ZIP | Matched because of the file extension. | |
ZIP | Not matched, since ZIP is not a file extension in this case, rather a keyword. | |
EXE | Matched because of the file extension. | |
RAR | Not matched, because file.rar is not a filename, rather a part of the URL. | |
RAR | Matched because the file.rar is the extension of the filename. CGI parameters are stripped. | |
JPG | Matched because special characters, /?#& will be stripped. |
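The rules in the Meaning column can be sketched as below. The example URLs are not present in this copy of the table, so the URLs in the sketch are hypothetical stand-ins chosen to mirror each row, and `has_extension` is an illustrative helper, not the product's implementation.

```python
from urllib.parse import urlsplit

def has_extension(url, extension):
    """Check whether the file name at the end of the URL path has the given extension.

    urlsplit() drops the query and fragment; trailing '/' and '&' are stripped
    before the last path segment is taken as the file name.
    """
    path = urlsplit(url).path.rstrip("/&")
    filename = path.rsplit("/", 1)[-1]
    return filename.lower().endswith("." + extension.lower().lstrip("."))

# Hypothetical URLs mirroring the rows above.
checks = [
    ("ZIP", "http://example.com/files/archive.zip"),    # file extension
    ("ZIP", "http://example.com/zip/index.html"),       # 'zip' is only a path word
    ("RAR", "http://example.com/file.rar/readme.txt"),  # file.rar is not the file name
    ("RAR", "http://example.com/file.rar?session=1"),   # CGI parameters are stripped
    ("JPG", "http://example.com/photo.jpg/#top"),       # special characters are stripped
]
outcomes = [has_extension(url, ext) for ext, url in checks]
print(outcomes)
```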
Adding an extension that is a subset of another extension will result in an unknown processing order and output. If you add the extension '.exe' as one category and the extension '.ex' in another category, the output will be unknown and must be tested for a better determination.
Extension list processing is extremely fast due to the nature of the algorithm implemented. In fact, since the extension is matched only at the end of a URL, it can exceed the performance of all other methods.
Keyword support allows all lists to contain Keywords for Categorization. Keyword support removes the need, in some cases, for creating complex regular expressions to do Keyword matching in all lists.
A Keyword is any substring that is searched for in the whole URL string, from the scheme to the file name or the CGI query. By default, a Keyword must contain three or more characters and no spaces (this default is configurable in WebAdmin Settings under 'Minimum Keyword length'). Keywords are matched in any part of the URL, including the hostname, path, and query. Their main purpose is to block search engine queries.
In the WebAdmin, you always select the current entry type explicitly. Therefore, if you add http:// but the type is 'Keyword', it will be saved and then searched as a keyword. In addition, if the type is 'URL' but you input an entry without the http:// prefix, it will be interpreted as a URL and searched as a URL (rather than as a Scheme, if the line doesn’t include dots, slashes, and a question mark).
Keywords can be very powerful, but they can easily cause the blocking of more sites than you intend (i.e. Overblocking). Therefore, always use keywords as a last resort. You can also use the Whole Word entry type instead of Keyword.
As noted above, the entry type is always selected explicitly, so an entry containing http:// that is saved with the type Keyword is searched as a keyword. The Keyword type can be used in any list. Be aware, however, that the keyword sex also blocks search engine queries for Essex, Wessex, sextant, and sexton.
Entry | Meaning | Notes |
sex | *sex* | Anywhere in the URL that the string 'sex' is found, there will be a match, resulting in a policy decision. For example, the following sites would match: · http://example.site.org/path/sussex.file.html |
.org | *.org* | This will match the string .org if found in the URL. The following sites would match: · http://www.sex.org/file/path · http://www.cool.com/something.org/something/file · http://example.com/something/?query=.org |
Keyword processing can be used to block Search Engines requests that contain specified search terms.
The list processing procedure always tries to find the longest matching word. It always stops the Keyword search when the first matching word is found. As an example, if a list has the keywords 'test' and 'tes', it means that if the URL has a substring 'test', it will never match the 'tes' entry. But this entry can match another URL where it is a part of some other word, e.g. 'quotes'.
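One way to picture the 'test' and 'tes' example is a left-to-right scan that, at each position, tries the longest keywords first and stops at the first hit. This is only a sketch that reproduces the documented outcome; the function name is hypothetical and the actual algorithm is not described in this document.

```python
def first_keyword_match(keywords, url):
    """Scan the URL from left to right, preferring the longest keyword at each position."""
    url = url.lower()
    by_length = sorted((k.lower() for k in keywords), key=len, reverse=True)
    for start in range(len(url)):
        for keyword in by_length:
            if url.startswith(keyword, start):
                return keyword  # stop at the first matching word
    return None

keywords = ["tes", "test"]
print(first_keyword_match(keywords, "http://example.com/test/page"))  # 'test' wins over 'tes'
print(first_keyword_match(keywords, "http://example.com/quotes"))     # 'tes' inside 'quotes'
```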
This section explains more about Keyword and Multiple Keywords Types as well as Whole Word and Multiple Words types.
A Keyword matches a whole or partial word. For example, 'key' matches both 'key' and 'keyword'.
A Whole Word matches only a whole word. For example, 'key' matches '-key-' but not 'keyword'.
Both can be added either as a single entry or as a multiple entry.
Single Keyword entries use the full text provided, including spaces, and match it as one word or keyword. This means that 'a key' will match 'a key' but not 'key a'. The restrictions of Keyword or Whole Word also apply, depending on which entry type is used. For Keywords, 'a key' will match 'a keyword', whereas for Whole Word it will not.
Multiple type entries will separate the text at each space to create multiple words or keywords. This means that 'a key' will match both 'a key' and 'key a'. The restrictions of Keyword or Whole Word will also apply, depending on which type of entry is used. This means that for Multiple Keywords 'a key' will match 'a small monkey' where Multiple Words will not. See the ‘Multiple Keywords Entry Type’ and the ‘Multiple Word Entry Type’ topics below for more information.
The 'Whole Word' type means that the entry word matches only a whole substring between separators (./+&=) or the line end.
For example, if the word is 'sex' and the type is Whole Word it matches URLs:
· http://company.com/?q=sex&a=b
· http://company.com/?q=this+is+sex
It doesn't match URLs like:
· http://company.com/msexplorer/
Note that the '=' separator can be a prefix but not a postfix for the word.
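The separator rule above can be approximated with a regular expression. This is a sketch under assumptions: the word must be preceded by one of `./+&=` and followed by one of `./+&` or the end of the URL (so '=' is accepted only as a prefix). The function name is hypothetical and the product may treat edge cases differently.

```python
import re

def whole_word_matches(word, url):
    """Match `word` only as a whole substring between separators or before the line end."""
    pattern = r"[./+&=]" + re.escape(word.lower()) + r"(?=[./+&]|$)"
    return re.search(pattern, url.lower()) is not None

for url in (
    "http://company.com/?q=sex&a=b",
    "http://company.com/?q=this+is+sex",
    "http://company.com/msexplorer/",
):
    print(url, whole_word_matches("sex", url))
```

The first two URLs match and the third does not, as in the examples above.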
The '+' character cannot be used in Keywords added to a List. For example, when Google+ is entered, Google is blocked and the + is treated as a space. This applies to Keywords, Multiple Keywords, Multiple Words and Whole Word.
The Multiple Keywords entry type is a set of a few words separated by a space. The matched URLs should have these words in any order.
As an example, the multi-keywords ‘we want cookie’ matches these example URLs:
· http://company.com/?q=we+want+cookie
· http://company.com/?q=cookie+we+want
· http://company.com/we/cookie/want.gif
· http://company.com/wewantcookie/
· http://company.com/?q=ewe+wants+cookie
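The matching rule for the example URLs above amounts to checking that every keyword occurs somewhere in the URL. A minimal sketch, with a hypothetical function name and plain case-insensitive substring matching:

```python
def multiple_keywords_match(entry, url):
    """True if every space-separated keyword occurs somewhere in the URL, in any order."""
    url = url.lower()
    # split() with no argument also tolerates repeated spaces between keywords.
    return all(keyword in url for keyword in entry.lower().split())

urls = [
    "http://company.com/?q=we+want+cookie",
    "http://company.com/?q=cookie+we+want",
    "http://company.com/we/cookie/want.gif",
    "http://company.com/wewantcookie/",
    "http://company.com/?q=ewe+wants+cookie",
]
matches = [multiple_keywords_match("we want cookie", url) for url in urls]
print(matches)
```

The last URL matches because 'we' is found inside 'ewe' and 'want' inside 'wants'.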
If a list has two ‘phrases’ that have a common part, the phrase that is completed first in the URL is matched. As an example, if the phrases are ‘we want cookie’ and ‘we want milk’, the URL:
http://company.com/?q=we+want+milk+and+cookie matches the ‘we want milk’ phrase.
But if one phrase is a subset of the other, the longer phrase is chosen even if the shorter phrase is completed first.
For example, if the phrases are ‘we want cookie’ and ‘we want chocolate cookie’, the URL:
http://company.com/?q=we+want+cookie+and+chocolate matches the ‘we want chocolate cookie’ phrase.
If the list contains single keywords and phrases made of these keywords, a URL matches the complete phrase if it contains every word of the phrase; otherwise it matches the first single keyword found.
As an example, if items are:
· we
· want
· cookie
· we want cookie
The URL:
http://company.com/?q=cookie+we+want matches the ‘we want cookie’ entry.
But the URL:
http://company.com/?q=cookie+we+like matches the entry ‘cookie’
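The selection rules above (first completed phrase, longer phrase over its subset, phrase over single words) can be reproduced with the sketch below. It is written only to agree with the documented examples; `pick_entry` is a hypothetical name and the product's internal algorithm is not described here.

```python
def pick_entry(entries, url):
    """Choose which Multiple Keywords entry a URL is reported against."""
    url = url.lower()
    matched = []
    for entry in entries:
        words = entry.lower().split()
        positions = [url.find(word) for word in words]
        if -1 in positions:
            continue  # at least one keyword is missing
        # Position in the URL at which the last keyword of the entry ends.
        completed_at = max(p + len(w) for p, w in zip(positions, words))
        matched.append((entry, set(words), completed_at))
    if not matched:
        return None
    # Drop entries whose words are a strict subset of another matched entry.
    candidates = [m for m in matched if not any(m[1] < other[1] for other in matched)]
    # Among the rest, the entry that is completed first wins.
    return min(candidates, key=lambda m: m[2])[0]

print(pick_entry(["we want cookie", "we want milk"],
                 "http://company.com/?q=we+want+milk+and+cookie"))
print(pick_entry(["we want cookie", "we want chocolate cookie"],
                 "http://company.com/?q=we+want+cookie+and+chocolate"))
print(pick_entry(["we", "want", "cookie", "we want cookie"],
                 "http://company.com/?q=cookie+we+like"))
```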
Although Multiple Keywords should be separated by a single space, multiple spaces are also accepted as a separator.
The Entry Type ‘Multiple Words’ combines Whole Word and Multiple Keywords' properties. For example, the URL:
http://company.com/?q=ewe+wants+cookie matches Multiple Keywords ‘we want cookie’ but doesn't match Multiple Words ‘we want cookie’ (because ‘ewe’ doesn't match the Whole Word ‘we’ and ‘wants’ doesn't match the Whole Word ‘want’).
Note that the multi-keyword entry ‘this is cookie’ matches a URL like
http://company.com/?q=this+cookie because the ‘is’ keyword is a part of ‘this’. But the multi-whole-word ‘this is cookie’ doesn't match ‘this+cookie’ because ‘is’ is not a Whole Word.
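The difference between the two entry types can be sketched by applying either a substring test or a whole-word test to every word of the entry. The helper names are hypothetical, and the whole-word rule assumes the separators `./+&=` before the word and `./+&` or the line end after it.

```python
import re

def whole_word_matches(word, url):
    """Assumed Whole Word rule: separator before, separator or line end after."""
    pattern = r"[./+&=]" + re.escape(word.lower()) + r"(?=[./+&]|$)"
    return re.search(pattern, url.lower()) is not None

def multiple_keywords_match(entry, url):
    """Every word must occur as a substring."""
    return all(word in url.lower() for word in entry.lower().split())

def multiple_words_match(entry, url):
    """Every word must occur as a Whole Word."""
    return all(whole_word_matches(word, url) for word in entry.split())

url = "http://company.com/?q=ewe+wants+cookie"
print(multiple_keywords_match("we want cookie", url))  # substrings are enough
print(multiple_words_match("we want cookie", url))     # 'ewe' and 'wants' are not whole words
```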
Any list can mix Multiple Words and Multiple Keywords phrases that share the same words. Processing distinguishes the phrase each word belongs to and uses unrestricted Keyword matching only for Multiple Keywords phrases. For example:
· Multi Keyword ‘this is alex’
· Multi Word ‘this is bob’
The first entry matches URLs:
http://company.com/?q=this+is+alex
http://company.com/?q=thisisalex
http://company.com/?q=this+alex
But the second entry matches:
http://company.com/?q=this+is+bob
and doesn't match:
http://company.com/?q=thisis+bob
http://company.com/?q=this+bob
Although the URL contains the substring ‘is’, it is not a Whole Word there, so it can satisfy the first entry but not the second.
A URL like:
http://company.com/?q=thisis+bob+and+his+friend+alex matches the first entry although the ‘bob’ appears first in the URL.
All new entry types are case insensitive (as is the existing Keyword type).
Multiple Words and Multiple Keywords support negation: the whole phrase does not match if a negated word is found. Two characters can be used for this, the bang ( ! ) and the caret ( ^ ).
Example | Definition |
!word | This example means that the multi-keyword doesn't match the string if this word is found even within other words. |
^longword | This word should be skipped, and other words should not be searched inside this substring. As an example, the entry sex ^msexplorer means search for 'sex' but only if it is not a part of 'msexplorer' substring. |
Lenovo NetFilter already supports 'Multiple Keywords' entries; this feature adds a not operator (!), so that if the “not” word is found, the entry does not match and its deny/allow action is not applied.
Example:
You want to find all Request URLs that contain the words ‘games’ and ‘educational’, but not those that also contain ‘online’.
The URL list entry would be: games educational !online
The URL below would not match:
http://www.example.xy/games/educational/online/cargo.html
The URL below would match:
http://www.example.xy/games/educational/cargo.html
This feature can be used when you want to scan the URL for certain keywords but exclude a URL from the results if a particular keyword is also found.
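The ‘games educational !online’ example can be sketched as follows. The function name is hypothetical, and plain substring matching is assumed for both the required words and the negated word.

```python
def matches_with_negation(entry, url):
    """Multiple Keywords match in which words prefixed with '!' must be absent."""
    url = url.lower()
    required, forbidden = [], []
    for token in entry.lower().split():
        if token.startswith("!"):
            forbidden.append(token[1:])
        else:
            required.append(token)
    if any(word in url for word in forbidden):
        return False  # a negated word cancels the whole phrase
    return all(word in url for word in required)

entry = "games educational !online"
print(matches_with_negation(entry, "http://www.example.xy/games/educational/online/cargo.html"))
print(matches_with_negation(entry, "http://www.example.xy/games/educational/cargo.html"))
```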
A second aspect to this is the caret operator (^). Using the ^ is similar to using the whole word feature with restrictions.
Example:
You want to find the occurrence of a word in the URL but not if it is inside a particular word.
The URL list entry would be: quit ^quitter
The URLs below will match:
http://www.example.xy/games/educational/quit/cargo.html
http://www.example.xy/games/educational/quitting/cargo.html
The URL below will not match:
http://www.example.xy/games/educational/quitter/cargo.html
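The caret behaviour can be pictured as masking the excluded substring before searching for the remaining keywords. This is a sketch of that idea only; the function name is hypothetical and the masking approach is an assumption chosen to reproduce the examples above.

```python
def matches_with_caret(entry, url):
    """Keyword match that ignores occurrences inside substrings marked with '^'."""
    url = url.lower()
    tokens = entry.lower().split()
    skipped = [t[1:] for t in tokens if t.startswith("^")]
    wanted = [t for t in tokens if not t.startswith("^")]
    for word in skipped:
        # Blank out the excluded substring so keywords cannot be found inside it.
        url = url.replace(word, " ")
    return all(word in url for word in wanted)

for url in (
    "http://www.example.xy/games/educational/quit/cargo.html",
    "http://www.example.xy/games/educational/quitting/cargo.html",
    "http://www.example.xy/games/educational/quitter/cargo.html",
):
    print(url, matches_with_caret("quit ^quitter", url))
```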
Keyword list processing is extremely fast due to the nature of the algorithm implemented. In fact, since the keyword is matched anywhere in the URL, it can be faster than URL List matching depending on the nature of the match.
It is important to note that URL matching applies automatic wildcarding and a form of regular expression analysis. Keyword analysis may not be acceptable in most cases. From a performance standpoint, however, keyword matching is very fast.
The table below contains examples of different list entries. It is important to understand in which deployments you would choose one method over another. In the following test, 'this is a test' has been used for Keyword, Multiple Keywords, Whole Word and Multiple Words.
Legend: |
X | The tested URL matches the list entry and is denied. |
O | The tested URL does not match and is allowed. |
URL Used: | Keyword | Multiple Keywords | Whole Word | Multiple Words |
X | X | X | X | |
X | X | X | X | |
X | X | X | X | |
X | X | O | O | |
X | X | O | O | |
O | X | O | X | |
O | X | O | X | |
O | X | O | O | |
O | X | O | X | |
O | X | O | X | |
O | X | O | O | |
O | X | O | X | |
O | X | O | X | |
O | X | O | O | |
O | X | O | X | |
O | X | O | O | |
O | X | O | X | |
O | X | O | O | |
O | X | O | O | |
O | X | O | O | |
O | X | O | O | |
O | O | O | O | |
O | X | O | O | |
O | X | O | O | |
O | X | O | O |