A.1 Basic Search Query

A basic query is a search for a value on a field. The syntax is as follows:

msg:<value>

The field name (msg) is separated from the value by a colon.

For example, to search for events whose message includes the word “authentication,” you can specify the search query as follows:

msg:authentication

Or, to search for events of severity 5, you can specify the search query as follows:

sev:5

If the value contains spaces or other delimiters, enclose it in quotation marks. For example:

msg:"value with spaces"

Sentinel classifies event fields as either tokenized fields or non-tokenized fields. A tokenized field is indexed and is searched differently than a non-tokenized field.

A.1.1 Case Insensitivity

Indexing and searching in Sentinel are not case-sensitive, unless you use wildcards in a search query (see Section A.1.5, Wildcards in Search Queries). For example, the following queries are all equivalent:

msg:AdMin
msg:admin
msg:ADMIN

A.1.2 Special Characters

If you include special characters as part of a search, the special characters must be escaped. These characters are as follows:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

Use the “\” character before the character you want to escape. For example, to search for ISO/IEC_27002:2005 in the rv145 (Tag) field, use the following query:

rv145:ISO\/IEC_27002\:2005

You can also use quotation marks around the query:

rv145:"ISO/IEC_27002:2005"

If the value itself contains quotation marks, enclose the value in quotation marks and escape each embedded quotation mark with the “\” character. For example, to search for the value system "mail" service in the initiatorservicename field, you must specify the query as follows:

sp:"system \"mail\" service"
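The two escaping styles described above can be sketched in Python. This is an illustration only and is not part of Sentinel: the helper names escape_value and quote_value are hypothetical, and escaping each & and | character individually (rather than only the && and || pairs) is an assumption of this sketch, as is escaping backslashes inside a quoted value.

```python
# Characters listed in the text above as needing a backslash escape.
# Assumption: escaping every "&" and "|" individually also covers "&&" and "||".
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_value(value):
    """Prefix every special character in an unquoted value with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in value)


def quote_value(value):
    """Wrap a value in quotation marks, escaping embedded quotes.

    Assumption: embedded backslashes are doubled so they stay literal.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print("rv145:" + escape_value("ISO/IEC_27002:2005"))
print("rv145:" + quote_value("ISO/IEC_27002:2005"))
print("sp:" + quote_value('system "mail" service'))
```

The three printed queries correspond to the three example queries in this section.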

A.1.3 Operators

Lucene supports the AND, OR, and NOT Boolean operators, which allow clauses to be combined. Boolean operators must always be capitalized.

OR Operator

The OR operator is the default conjunction operator. If there is no Boolean operator between two clauses, the OR operator is used. The OR operator links two clauses and finds a matching event if either of the clauses is satisfied. The symbol || can be used in place of the word OR. For example, consider the following query:

sun:admin OR dun:admin

This query finds events whose initiator username or target username is “admin.” The following query produces the same result because OR is used by default:

sun:admin dun:admin

AND Operator

The AND operator links two clauses and finds a matching event only if both clauses are satisfied. The symbol && can be used in place of the word AND. For example, consider the following query:

sun:admin AND dun:tester

This query finds events whose initiator username is admin and whose target username is tester.

NOT Operator

The NOT operator excludes events that match the clause after the NOT. The symbol ! can be used in place of the word NOT. For example, consider the following query:

sev:[0 TO 5] NOT st:I NOT st:A NOT st:P

This query matches all events whose severity is between 0 and 5, but excludes those whose sensor type is I (internal), A (audit), or P (performance); that is, it excludes Sentinel internal events.

The NOT operator cannot be used by itself because it is a way to exclude events from a set that has been found by other search terms. For example, consider the following query:

NOT st:I NOT st:A NOT st:P

This query might seem like it should return all events where the sensor type is not I, A, or P. However, it is an invalid query because a query cannot begin with the NOT operator.
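The rule that NOT can only exclude events from a set found by other terms can be sketched as a small query-building helper in Python. The function names exclude and begins_with_not are hypothetical and exist only for this illustration; they build and inspect query strings and do not call Sentinel or Lucene.

```python
def exclude(base_clause, *excluded):
    """Append one NOT clause per excluded term to a positive base clause."""
    if not base_clause.strip():
        # NOT cannot stand alone: there must be a set of events to exclude from.
        raise ValueError("NOT needs a positive clause to exclude events from")
    return base_clause + "".join(f" NOT {clause}" for clause in excluded)


def begins_with_not(query):
    """Return True if the query starts with the NOT operator or its ! symbol."""
    stripped = query.lstrip()
    return stripped.startswith("NOT ") or stripped.startswith("!")


query = exclude("sev:[0 TO 5]", "st:I", "st:A", "st:P")
print(query)
print(begins_with_not("NOT st:I NOT st:A NOT st:P"))
```

A caller could use begins_with_not to reject a query such as the invalid example above before submitting it.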

Operator Precedence

Parentheses can be used in the usual way to change operator precedence. They can be nested to any depth, as shown in the following examples:

(sun:admin OR dun:admin) AND (sip:10.0.0.1 OR sip:10.0.0.2) 
((sun:admin OR dun:admin) AND (sip:10.0.0.1 OR sip:10.0.0.2)) OR (msg:user AND evt:authentication) 
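Nested grouping can also be assembled programmatically. The following Python sketch uses a hypothetical helper named group that joins clauses with a capitalized operator and wraps the result in parentheses; it only builds strings and is not a Sentinel API.

```python
def group(operator, *clauses):
    """Join clauses with AND or OR and wrap the result in parentheses."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be the capitalized word AND or OR")
    return "(" + f" {operator} ".join(clauses) + ")"


users = group("OR", "sun:admin", "dun:admin")
hosts = group("OR", "sip:10.0.0.1", "sip:10.0.0.2")

# First example: two groups combined with AND.
first = f"{users} AND {hosts}"

# Second example: the first query becomes a nested group that is ORed with another group.
second = f"{group('AND', users, hosts)} OR {group('AND', 'msg:user', 'evt:authentication')}"

print(first)
print(second)
```

The two printed strings reproduce the two example queries above.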

A.1.4 The Default Search Field

Lucene uses a default search field, which is the field that is searched if no field is specified. In Sentinel, _data is the default search field. By default, the default search field is a concatenation of the following event fields:

evt,msg,sun,iuid,dun,tuid,sip,sp,dip,dp,rv42,shn,rv35,rv41,dhn,rv45,obsip,sn,obsdom,obssvcname,ttd,ttn,rv36,fn,ei,rt1,rv43,rv40,isvcc

The default search field is indexed and searched as a tokenized field. The result is that you can search for words that might appear in any event field.

You can also customize the set of event fields that are concatenated in the default search field by adding the indexedlog.datafield.ids property in the configuration.properties file. For more information, see Customizing the Default Search Field in the Sentinel Administration Guide.

For example, suppose you have two non-tokenized fields in an event, sun (initiatorusername) and dun (targetusername). The sun field has the following value:

report-administrator

The dun field has the following value:

system-tester

The _data field contains the concatenation of these fields separated by a single space character:

report-administrator system-tester

Because the _data field is a tokenized field, the words “report,” “administrator,” “system,” and “tester” are indexed and searchable. The following queries would find this event:

report
_data:report
report-administrator
_data:report-administrator
report tester

In addition, the following queries also find this event:

sun:report-administrator
dun:system-tester
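The way the _data field is derived in this example can be modeled in a few lines of Python. This is a simplified sketch: the function names are hypothetical, and the tokenizer simply splits on non-alphanumeric characters, which is an assumption that approximates, but does not implement, the word-break rules Lucene uses.

```python
import re


def build_data_field(event, field_ids):
    """Concatenate the listed event fields, separated by a single space."""
    return " ".join(event[name] for name in field_ids if event.get(name))


def tokenize(value):
    """Assumed tokenizer: lower-case the value and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())


event = {"sun": "report-administrator", "dun": "system-tester"}
data = build_data_field(event, ["sun", "dun"])
tokens = tokenize(data)

print(data)
print(tokens)
```

Each word in the printed token list is individually searchable in the default search field, which is why a query such as report finds the event.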

A.1.5 Wildcards in Search Queries

Sentinel supports wildcards in search values, but it does not support regular expressions:

  • The asterisk (*) matches zero or more characters.

  • The question mark (?) matches any one character.

For example:

  • adm*test: Matches admtest, ADMTEST, admintest, adMINtEst (note the lack of case sensitivity).

  • adm?test: Matches adm1test and AdMatest. Does not match admtest or ADMINTEST because it must have exactly one character between "adm" and "test."

NOTE: The use of wildcards differs for tokenized and non-tokenized fields. For more information, see Section A.1.7, Tokenized Fields and Section A.1.8, Non-Tokenized Fields.
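The behavior of the two wildcard characters can be sketched in Python by translating a pattern into a regular expression. The helper names are hypothetical. The sketch assumes that indexed values are lower-cased and that the wildcard pattern is compared exactly as typed, which is one way to model why a lowercase pattern matches values of any case.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (zero or more characters) and ? (exactly one) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def wildcard_match(pattern, word):
    """Match the pattern against the lower-cased word (assumed index form)."""
    return wildcard_to_regex(pattern).fullmatch(word.lower()) is not None


for word in ["admtest", "ADMTEST", "admintest", "adMINtEst"]:
    print("adm*test", word, wildcard_match("adm*test", word))
for word in ["adm1test", "AdMatest", "admtest", "ADMINTEST"]:
    print("adm?test", word, wildcard_match("adm?test", word))
```

Under these assumptions, an uppercase pattern such as ADM*TEST would not match, because the pattern itself is not lower-cased.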

A.1.6 Leading Wildcards

Leading wildcards are not valid in searches because Lucene does not allow the * or ? characters to be the first character of a search value. For example, the following queries are invalid:

  • sun:*adm*: The semantic is “find any event whose initiator username value contains the letters a, d, and m in sequence.”

  • sun:*tester: The semantic is “find any event whose initiator username value ends with tester.”

  • sun:*: The semantic is “find any event whose initiator username field is non-empty.”

    Because this is an important type of query, Sentinel provides an alternative way to accomplish this. For more information, see The notnull Query.
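A query-building tool could detect leading wildcards before a query is submitted. The following Python sketch shows one way to do that for a single field:value clause; the function name has_leading_wildcard is hypothetical, and the sketch assumes the field name itself contains no colon.

```python
def has_leading_wildcard(clause):
    """Return True if the value of a field:value clause starts with * or ?."""
    field, separator, value = clause.partition(":")
    if not separator:
        # No field name was given, so the whole clause is the search value.
        value = clause
    return value.startswith(("*", "?"))


for clause in ["sun:*adm*", "sun:*tester", "sun:*", "sun:adm*"]:
    print(clause, has_leading_wildcard(clause))
```

Only the last clause, sun:adm*, passes the check, because its wildcard is not the first character of the value.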

A.1.7 Tokenized Fields

Fields that are classified as tokenized fields are parsed into individual words for indexing. Therefore, a search occurs only on words within the field value. Characters that are considered to be word delimiters are not searchable, nor are words that are considered to be stop words. Lucene removes extremely common words to save disk space and speed up searching. These words are ignored in search filters. Currently, the following stop words are removed:

  • a

  • an

  • and

  • are

  • as

  • at

  • be

  • but

  • by

  • for

  • if

  • in

  • into

  • is

  • it

  • no

  • not

  • of

  • on

  • or

  • such

  • that

  • the

  • their

  • then

  • there

  • these

  • they

  • this

  • to

  • was

  • will

  • with
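The effect of word parsing and stop-word removal can be sketched in Python. The stop-word set below is copied from the list above; the tokenizer, which splits on non-alphanumeric characters, and the function name searchable_words are assumptions made for illustration and do not reproduce Lucene's actual analyzer.

```python
import re

# Stop words taken from the list above.
STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def searchable_words(value):
    """Return the lower-cased words of a value with stop words removed."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return [word for word in words if word not in STOP_WORDS]


print(searchable_words("The user authentication has failed on the server."))
# → ['user', 'authentication', 'has', 'failed', 'server']
```

The words “the” and “on” are dropped, while “has” remains searchable because it is not in the stop-word list.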

With Wildcards

Wildcards are applied differently to tokenized fields and non-tokenized fields. Wildcards for tokenized fields match only words that were parsed from the value and not the entire value.

Quoted Queries

For tokenized fields, wildcards inside a quoted value are not treated as wildcards but as word delimiters. For example, consider the following query:

msg:"user* fail*"

The search value "user* fail*" is parsed into two words, “user” and “fail.” The semantic is “find any event where the msg field contains the words user and fail in that order, with no intervening words between them.” Thus, it does not match the following value:

The user authentication has failed on the server.

This is because the wildcard is not treated as a wildcard but as a word delimiter.

Non Quoted Queries

For tokenized fields, a non-quoted wildcard search is performed on the individual words within the value and not on the full value.

For example, the query msg:authentication*failed does not return events with the message The user authentication has failed on the server. This is because “*” matches characters only within a single word; it does not span the separate words “authentication” and “failed.” The query matches only single words that begin with “authentication” and end with “failed,” such as “authenticationhasfailed,” “authenticationuserfailed,” and “authenticationserverfailed.”
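The word-by-word matching described above can be modeled in Python. In this sketch the wildcard pattern is tested against each parsed word separately, never against the whole message. The helper names and the simple tokenizer (split on non-alphanumeric characters) are assumptions for illustration.

```python
import re


def tokenize(value):
    """Assumed tokenizer: lower-case the value and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())


def wildcard_to_regex(pattern):
    """Translate * and ? into their regular-expression equivalents."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def tokenized_wildcard_match(pattern, value):
    """Return True if any single word of the value matches the pattern."""
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(word) for word in tokenize(value))


message = "The user authentication has failed on the server"
print(tokenized_wildcard_match("authentication*failed", message))
print(tokenized_wildcard_match("authentication*failed", "authenticationhasfailed"))
print(tokenized_wildcard_match("fail*", message))
```

The first call reports no match because no single word both begins with “authentication” and ends with “failed”; the other two calls report a match.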

Without Wildcards

For tokenized fields, Lucene also parses the search value into words when the query contains no wildcards. This applies to both quoted and non-quoted queries.

Quoted Queries

When it does a search, Lucene examines all of the words in a field and tries to match words in the search value. For example, suppose that you specify a search for messages containing the following value:

msg:"user-authentication failed on the server"

The words that are parsed within this value are “user,” “authentication,” “failed,” and “server.” These are the only search words that would match this value. “On” and “the” are omitted because they are stop words.

The value has the hyphen character (-) between some words. Hyphens are treated as word delimiters, so Lucene does not search for hyphens. Consider the following query:

msg:"user-authentication"

The results might not be exactly what you expect. The query search value matches the value, but not because it is matching the hyphen. It matches because Lucene first parses the words in the search value and identifies the words “user” and “authentication.” Lucene then matches those words against values that have the words “user” and “authentication” with no intervening words in between. This query would also match the following value, even though there is no hyphen between “user” and “authentication”:

user authentication has failed on the server

Consider the following query:

msg:"failed on server"

This query has the stop word, "on," which is ignored. However, the stop word does affect the relative positioning that is expected to be between words when evaluating a value to see if it matches. The “failed on server” search matches any phrase where the words failed and server are separated by exactly one word. It does not matter what the word is because the separating word is a stop word and is ignored. Thus, the above query would match all of the following:

failed on server
failed-on server
failed a server
failed-a-server

Proximity indicators, which are created by using the ~ character followed by a value, make this more complicated. The query dictates an expected distance between words. In the “failed on server” query, the expected distance between “failed” and “server” is one word. The proximity indicator specifies how much variance there can be from the expected distance. For example, consider the following query, where a proximity indicator of one (~1) is specified:

msg:"failed on server"~1

This query indicates that the distance between “failed” and “server” could be plus or minus one from the expected distance, which is one because of the stop word “on.” Thus, the distance could be 1, 1-1 (0), or 1+1 (2). Thus, all of the following would match:

failed on server
failed on the server
failed finance server
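The positional matching described in this section can be approximated with a short Python model. It is deliberately limited to phrases with exactly two searchable words and follows the description in the text (expected distance plus or minus the proximity value); it is not Lucene's actual scoring or slop algorithm. The function names and the simple tokenizer are assumptions.

```python
import re

STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def positions(text):
    """Return (word, position) pairs; stop words are dropped but keep their slot."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, i) for i, word in enumerate(words) if word not in STOP_WORDS]


def phrase_match(phrase, value, proximity=0):
    """Two-word model: compare the word distance in the value with the phrase."""
    query = positions(phrase)
    if len(query) != 2:
        raise ValueError("this sketch handles exactly two searchable words")
    (first, p1), (second, p2) = query
    expected = p2 - p1
    indexed = positions(value)
    firsts = [pos for word, pos in indexed if word == first]
    seconds = [pos for word, pos in indexed if word == second]
    return any(
        later > earlier and abs((later - earlier) - expected) <= proximity
        for earlier in firsts
        for later in seconds
    )


print(phrase_match("failed on server", "failed-a-server"))
print(phrase_match("failed on server", "failed on the server"))
print(phrase_match("failed on server", "failed on the server", proximity=1))
```

In this model, the value failed on the server matches only when the proximity value of 1 is supplied, because its two searchable words are separated by two stop words instead of one.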

As of Lucene version 3.1, word parsing is done according to word break rules outlined in the Unicode Text Segmentation algorithm. For more information, see Unicode Text Segmentation.

For information on tokenized fields in Sentinel, click Tips in the top right corner of the Sentinel Main interface. A table is displayed that lists all the event fields and indicates whether each event field is searchable.

Non Quoted Queries

For tokenized fields, the following non-quoted query returns events with the message user authentication has failed on the server, because each word in the query is considered in the search:

msg:user authentication

The above query matches the events because Lucene first parses the words in the search value and identifies the words “user” and “authentication.”

A.1.8 Non-Tokenized Fields

Fields that are classified as non-tokenized fields are parsed fully for indexing. Thus, a search occurs on full field values.

With Wildcards

Wildcards for non-tokenized fields are matched against the entire field value and not against individual words within the value.

Quoted Queries

If a wildcard query on a non-tokenized field is quoted, the wildcard is treated as a delimiter and not as a wildcard, so the query does not return the expected results.

For example, the following query does not return events where the field value is Firewall:

rv31:"fire*"

Non Quoted Queries

If a wildcard query on a non-tokenized field is not quoted, the search value must be in lowercase and must begin with the initial characters or word of the expected field value.

For example, if the expected result is DNS Proxy, for the field rv150, then the search query must be as follows:

rv150:dns*

However, the following query does not fetch the expected search result:

rv150:proxy*

This is because the search term proxy is not the initial word of the expected search result.
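The difference from tokenized fields can be modeled in Python by matching the wildcard pattern against the whole field value instead of against individual words. The sketch assumes that the complete value is indexed in lowercase and that the pattern is compared exactly as typed; the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into their regular-expression equivalents."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def non_tokenized_wildcard_match(pattern, field_value):
    """Match the pattern against the complete, lower-cased field value."""
    indexed = field_value.lower()  # assumption: stored as one lower-cased term
    return wildcard_to_regex(pattern).fullmatch(indexed) is not None


print(non_tokenized_wildcard_match("dns*", "DNS Proxy"))
print(non_tokenized_wildcard_match("proxy*", "DNS Proxy"))
print(non_tokenized_wildcard_match("DNS*", "DNS Proxy"))
```

Only the first call reports a match: proxy* fails because the value does not begin with “proxy,” and DNS* fails because the pattern is not in lowercase.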

Without Wildcards

For non-tokenized fields, the search occurs on full field values.

Quoted Queries

Because a non-tokenized field is searched on its full value, a quoted query must contain the entire field value. For example, to search for events whose initiatoruserfullname (iufname) field has the value “Bob White,” you must specify the query as follows:

iufname:"Bob White"

However, the following queries do not return the event:

iufname:"Bob"
iufname:"White"

Non Quoted Queries

For non-quoted queries, the search query must contain the entire value of the expected result.

The following are example queries for non-quoted, non-tokenized fields without wildcards:

rv150:DNS Proxy

The above query returns events where the rv150 value is DNS Proxy.

rv31:firewall

The above query returns events where the rv31 value is Firewall.

However, the following queries do not return the expected search results, because a non-tokenized field is indexed in lowercase as the complete field value:

rv150:DNS
rv150:dns
rv31:Fire
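The full-value comparison described above can be summarized with a minimal Python model. It assumes that both the indexed value and a non-wildcard search value are lower-cased before they are compared; the function name non_tokenized_match is hypothetical.

```python
def non_tokenized_match(query_value, field_value):
    """Compare the complete query value with the complete field value.

    Assumption: both sides are lower-cased, as for other non-wildcard searches.
    """
    return query_value.lower() == field_value.lower()


print(non_tokenized_match("DNS Proxy", "DNS Proxy"))
print(non_tokenized_match("firewall", "Firewall"))
print(non_tokenized_match("dns", "DNS Proxy"))
print(non_tokenized_match("Fire", "Firewall"))
```

The first two calls report a match because the query supplies the entire value; the last two do not, because a partial value never equals the complete indexed value.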