Searching

In the MD Link Monitor, using the controls labelled "Search" near the top of the window, you can perform searches in the History Log, Input Queue, or Output Messages tabs. Search does not apply to the Summary tab. MD Link provides four types of search, which you can choose from the drop-down list next to the search text field. The types of search are: "Full Syntax", "Literal string", "Raw literal string", and "HL7".

Full Syntax

Full Syntax Search - Introduction

"Full syntax" searches are based on entire words as they appear in your messages in the MD Link service's database. MD Link considers a word to be any string of letters and/or numbers. Words do not include punctuation or other characters, so characters such as _ \ / - . ^ ~ & are considered word-break characters. For example, an HL7 segment such as "PID||575^^^2^ID_1|" contains five words: PID, 575, 2, ID, and 1.
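
The word-splitting rule described above can be sketched in Python. This is only an illustration of the rule as stated, not MD Link's actual tokenizer. The function name split_into_words is hypothetical, and the sketch assumes that "letters" means ASCII letters.

```python
import re

def split_into_words(text: str) -> list[str]:
    """Return the runs of letters and digits in text, in order."""
    # Anything that is not an ASCII letter or digit acts as a word break.
    return re.findall(r"[A-Za-z0-9]+", text)

print(split_into_words("PID||575^^^2^ID_1|"))  # ['PID', '575', '2', 'ID', '1']
```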

This breaking up of messages into words is how MD Link implements its "fast search" feature. The reduced flexibility of word-based search (compared to an arbitrary string search) is what gives the better performance. This increase in performance is significant: it gives "Full syntax" searches the ability to search an MD Link database containing millions of messages in a few seconds. Without it, the same search would take minutes or hours.

With the full syntax, you can search for ADT or A01 but there is no way to search for ADT^A01 exactly, because the "^" character is neither a letter nor a number. (You can however search for both ADT and A01 - more discussion on that is further down this page.)

If you use full syntax and you try to search for non-letter, non-number characters such as ADT^A01, OB/GYN, or 24.00, the results will often be undesirable, because some of those characters represent options in the full syntax, unrelated to the non-word character that you were trying to search for. In other cases those characters will be ignored by the search engine and you might get results deceptively close to what you intended. Either way, the results can be surprising and undesirable. Therefore we recommend that you omit non-letter, non-number characters from your full syntax search, unless you intend to use any special syntax as described in this document.

Full Syntax Search - Wildcards

Wildcard patterns are supported with * and ?. Below is a table of examples, showing which wildcard queries match which message strings.

Message string   Query: ID0*   Query: ID0?   Query: ID0*0   Query: ID0?0
ID01             Yes           Yes           No             No
ID010            Yes           No            Yes            Yes
ID0110           Yes           No            Yes            No
ID0111           Yes           No            No             No

The * and ? operators do not work at the start of a wildcard pattern - only in the middle or at the end. For example, a search for *01 is not supported by MD Link and will show no results, however reasonable it looks. This means that you need to know the first few characters of the words you are searching for in order for the wildcard syntax to work. For an alternative, see the section on Regular Expressions below.
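
The wildcard rules can be approximated with Python's standard fnmatch module, where * matches any run of characters and ? matches exactly one. This is a sketch for checking your expectations, not MD Link's implementation. The function name wildcard_matches is hypothetical, and fnmatch also gives special meaning to square brackets, which this page does not describe for wildcard queries.

```python
import fnmatch

def wildcard_matches(query: str, word: str) -> bool:
    """Return True if word matches a wildcard query using * and ?."""
    # A pattern that starts with a wildcard is unsupported and finds nothing.
    if query.startswith(("*", "?")):
        return False
    # fnmatchcase compares the whole word and does not fold case.
    return fnmatch.fnmatchcase(word, query)

queries = ["ID0*", "ID0?", "ID0*0", "ID0?0"]
for word in ["ID01", "ID010", "ID0110", "ID0111"]:
    print(word, [wildcard_matches(q, word) for q in queries])
```

Each printed row corresponds to one row of the table of examples in this section.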

Full Syntax Search - Multiple search words

If you use more than one word in your query - for example, MFN M07 - then all messages which contain either MFN or M07 will be shown - for example, both MFN^M07 and MFR^M07 messages. Put another way: by default, multi-word searches are implicitly joined by an "or" operator, not an "and" operator. This can lead to a surprising result: as you add more words to your search, you will "widen" the search and tend to get more messages matching. You might have expected that adding more words would narrow the search instead.

To narrow your search to show only messages that contain both MFN and M07, use the plus sign: +MFN +M07.

You can combine terms in brackets to do some boolean logic. For example (+A04 +McDOUGALL) (A03) will show all messages which either contain both A04 and McDOUGALL, or A03.

You can negate a search term using the minus sign. For example -A03 will show all messages that do not contain A03. -A03 -ADMIT will show all messages which contain neither A03 nor ADMIT.

Combining a negative term with a positive term requires care. Since the default behaviour of adding more terms is to widen the search, a search for McDOUGALL -A03 will match all messages which either contain McDOUGALL or don't contain A03. If you want to match all messages that contain McDOUGALL and don't contain A03, search for McDOUGALL +(-A03)

Another result of this default widening behaviour is that if your search contains one word with a plus sign and several without - for example +McDOUGALL A03 A04 A05 - the A03, A04, and A05 parts won't affect your search results. Only the +McDOUGALL part does. This might be surprising, but it is logically consistent. The plus sign means "must match", so each message must match McDOUGALL. If a message matches McDOUGALL then it is shown, otherwise it isn't. A03, A04, and A05 don't affect the list of matching messages that appear, but they still affect the visual highlighting of the text when you select one of those matching messages.
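
The interaction between plain words and "+" words can be sketched as below. The sketch deliberately covers only flat queries made of plain and "+"-prefixed words. Minus signs and brackets are left out, and the function name matches is hypothetical. It models the behaviour described in this section rather than the real query engine.

```python
def matches(query: str, message_words: set[str]) -> bool:
    """Evaluate a flat query of plain and +prefixed words against a message."""
    present = {w.lower() for w in message_words}
    terms = query.lower().split()
    required = [t[1:] for t in terms if t.startswith("+")]
    optional = [t for t in terms if not t.startswith("+")]
    if required:
        # Any "+" term makes the plain terms irrelevant to matching.
        return all(t in present for t in required)
    # With no "+" terms, one matching word is enough (implicit "or").
    return any(t in present for t in optional)

print(matches("MFN M07", {"MFR", "M07"}))                    # True
print(matches("+MFN +M07", {"MFR", "M07"}))                  # False
print(matches("+McDOUGALL A03 A04 A05", {"McDOUGALL", "A08"}))  # True
```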

Full Syntax Search - Boolean Operators

Some boolean operators are supported: AND OR and NOT. However, the rules for these operators are not intuitive, and they don't provide any clear benefit compared to the "+" and "-" syntax described above. We recommend against using AND OR and NOT.

Full Syntax Search - Fuzzy Searches

Using the tilde operator, you can do a "fuzzy search", which means an approximate search. This is useful for searching for misspellings. For example, if you search for McDOUGALL~ that will match McDOUGALL as well as McDOUGAL and MacDOUGALL. Furthermore, there is an optional parameter called the "edit distance" which controls the level of fuzziness - that is, how many characters are allowed to be different between the word in your query and the word in the message. For example McDOUGALL~1 will match McDOUGAL (one L removed) but not McDOUGA (2 Ls removed). McDOUGALL~2 will match both McDOUGAL and McDOUGA. The maximum edit distance is limited. The parameter is silently capped at 2, so searching for McDOUGALL~3, McDOUGALL~4, and so on is the same as searching for McDOUGALL~2. The default parameter value is 2 - so searching for McDOUGALL~ is the same as searching for McDOUGALL~2.
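
To get a feel for edit distance, here is a small sketch that computes the plain Levenshtein distance (single-character inserts, deletes, and substitutions) and applies the cap of 2 described above. Whether MD Link counts edits in exactly this way is an assumption, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]

def fuzzy_matches(query_word: str, word: str, max_edits: int = 2) -> bool:
    # Values above 2 are capped at 2; the default is also 2.
    limit = min(max_edits, 2)
    return edit_distance(query_word.lower(), word.lower()) <= limit

print(fuzzy_matches("McDOUGALL", "McDOUGAL", 1))  # True
print(fuzzy_matches("McDOUGALL", "McDOUGA", 1))   # False
print(fuzzy_matches("McDOUGALL", "McDOUGA", 2))   # True
```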

Full Syntax Search - Phrase Searches

You can search for several words in a row using something called a "phrase search". For example, consider five messages containing these pieces of text:

PATIENT REPORTED SUDDEN ONSET CHEST PAIN
PATIENT REPORTED SUDDEN ONSET OF CHEST PAIN
PATIENT REPORTED SUDDEN ONSET OF SEVERE CHEST PAIN
NO CHEST COMPLAINTS - HOWEVER PATIENT REPORTED SUDDEN ONSET STOMACH PAIN
KNEE MOBILITY ISSUE, NO PAIN

Say your goal is to match the first three messages, and not match the last two. The options described on this page so far offer no obvious way to do that:

If you search for sudden onset chest pain that will match all five messages.

If you search for +sudden +onset +chest +pain that will match the first four messages.

You need a search option that takes the word order into account. This is known as a "phrase search" and you enable it by enclosing your search words in double quotes.

If you search for "sudden onset chest pain" (including the double quotes) that will match message #1 only.

In order to accommodate the words "of" and "severe", you need an additional option of "phrase search" that allows you to skip words.

If you search for "sudden onset chest pain"~1 that will match messages #1 and #2. The "1" after the tilde is the "proximity" parameter, and allows 1 word to appear between "onset" and "chest" (or between any other pair of words for that matter).

If you search for "sudden onset chest pain"~2 that will match messages #1, #2, and #3.

There is no maximum for the "proximity" parameter. This is unlike Fuzzy Searches, which have a maximum of 2. Fuzzy Searches can be mistaken for Phrase Searches because they both use the tilde character.
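
The phrase examples above can be reproduced with a simplified model: the query words must appear in order, and the total number of extra words skipped between them must not exceed the proximity parameter. This is a sketch of that reading only. The real engine's proximity rules may be more permissive, and the names words and phrase_matches are hypothetical.

```python
import re

def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_matches(phrase: str, message: str, slop: int = 0) -> bool:
    """True if the phrase words occur in order with at most slop skipped words."""
    query, tokens = words(phrase), words(message)
    if not query:
        return False

    def search(q_index: int, prev_pos: int, skipped: int) -> bool:
        if q_index == len(query):
            return True
        start = prev_pos + 1
        # The next word may sit up to (slop - skipped) positions further on.
        stop = min(len(tokens), start + (slop - skipped) + 1)
        for pos in range(start, stop):
            if tokens[pos] == query[q_index] and search(q_index + 1, pos, skipped + pos - start):
                return True
        return False

    return any(tok == query[0] and search(1, i, 0) for i, tok in enumerate(tokens))

messages = [
    "PATIENT REPORTED SUDDEN ONSET CHEST PAIN",
    "PATIENT REPORTED SUDDEN ONSET OF CHEST PAIN",
    "PATIENT REPORTED SUDDEN ONSET OF SEVERE CHEST PAIN",
    "NO CHEST COMPLAINTS - HOWEVER PATIENT REPORTED SUDDEN ONSET STOMACH PAIN",
    "KNEE MOBILITY ISSUE, NO PAIN",
]
for slop in (0, 1, 2):
    hits = [n for n, m in enumerate(messages, start=1)
            if phrase_matches("sudden onset chest pain", m, slop)]
    print(slop, hits)  # 0 [1], then 1 [1, 2], then 2 [1, 2, 3]
```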

Full Syntax Search - Regular Expressions

Regular expression searches are supported, with a limited syntax. This is distinct from the "wildcard syntax" explained above, and more flexible. To specify a regular expression search, begin and end your query with forward slash ("/") characters. For example:

/.*9.*/	Matches all words that contain the number 9 anywhere in the word.
/ID[01].*/	Matches all words that start with ID, followed by either a 0 or a 1.
/[a-z]+[0-9]+.*/	Matches all words that start with one or more letters from a-z, followed by one or more numerical digits.
/[0-9]{5,}.*/	Matches all words that start with five or more numerical digits.

For a thorough explanation of this regular expression syntax, see this page.

Regular expression searches are sometimes slow. If you encounter undesirable slowness, convert your regular expression query into one that uses wildcard syntax instead, if you can.

Note that regular expressions allow you to do a search without knowing the start of the word. This is more flexible than the wildcard syntax. It is also the reason why regular expression searches are sometimes slow.

Regular expression searches can be combined with other, non-regular-expression search terms. For example, +/.*9.*/ +ID1 would match all messages that both a) contain one or more words that contain the number 9, and b) contain the exact word ID1. (The "+" signs accomplish the "both" part of this query: they function as an "and" operator. Without them, this search - like all multi-term searches - would act as though it had an "or" operator between the terms.)
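
You can try out a pattern before using it with a short sketch like the one below. It assumes that a regular expression must match a whole word, which is what the trailing .* in the examples on this page suggests. Python's re module is used as a stand-in, and its dialect is not identical to the one MD Link supports. The function name regex_term_matches is hypothetical.

```python
import re

def regex_term_matches(query: str, word: str) -> bool:
    """Match a /pattern/ query against one whole word."""
    if not (len(query) >= 2 and query.startswith("/") and query.endswith("/")):
        raise ValueError("a regular expression query must be wrapped in / characters")
    pattern = query[1:-1]
    # fullmatch: the pattern must cover the entire word, not just part of it.
    return re.fullmatch(pattern, word) is not None

print(regex_term_matches("/.*9.*/", "a9b"))           # True
print(regex_term_matches("/[0-9]{5,}.*/", "1234x"))   # False
```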

Full Syntax Search - Case sensitivity

Simple full syntax searches are case-insensitive. A search for assessment will match all messages containing assessment, ASSESSMENT, Assessment, and so on.

If your full syntax search contains wildcards or regular expressions, those parts are case-sensitive. But MD Link stores only a lower-case copy of the messages in its search index. Therefore your search for ASSESS* will return no results. To avoid this, it is safest to always use lower case in your search string - for example assess*.
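
The effect of the lower-case index can be illustrated with a toy model. The index contents and the function names simple_search and wildcard_search are hypothetical. The only behaviour taken from this page is that the index holds lower-case words, that simple terms ignore case, and that wildcard patterns are compared as typed.

```python
import fnmatch

# Toy index: lower-cased copies of the words in the stored messages.
index = {w.lower() for w in ["Assessment", "ASSESSMENT", "Plan"]}

def simple_search(term: str) -> bool:
    # Simple terms are lower-cased before lookup, so case does not matter.
    return term.lower() in index

def wildcard_search(pattern: str) -> bool:
    # Wildcard patterns are compared as typed, with no case folding.
    return any(fnmatch.fnmatchcase(word, pattern) for word in index)

print(simple_search("ASSESSMENT"))   # True
print(wildcard_search("ASSESS*"))    # False
print(wildcard_search("assess*"))    # True
```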

Literal String Search - Introduction

Literal String search exists to address a limitation of the Full Syntax search. That limitation is the inability to search for delimiter characters. For example, if you want to search your messages for 2.5, there is no way to do that with the full syntax. There is a partial workaround: do a full syntax phrase search for "2 5" (with quotes). However this would also match any message containing "2/5" or "2_5".

With literal string search, you can simply search for 2.5 and you will get the correct results.

All literal string searches are case-insensitive.

Literal string search does not support searching for a string that is entirely delimiter characters. For example, searching for ^ or ... will return no results.

Literal String Search - Performance

In most cases, a literal string search will be as fast as a similar full syntax phrase search. The exception is if you are unlucky in terms of the statistical distribution of your messages, in that you have many messages that match on the word parts of your search, but not the delimiter parts. For example, if you do a literal string search for 2.5 and your database contains 1000 messages that contain "2_5" for every one message that contains "2.5", then your literal string search will take minutes instead of seconds.
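
One way to picture why this happens is a two-stage search: a fast word-based stage that picks candidate messages, followed by a slower exact check of each candidate. The sketch below is a guess at that idea based on the behaviour described above, not a description of MD Link's internals, and all function names are hypothetical.

```python
import re

def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_word_sequence(message: str, query: str) -> bool:
    """Stage 1: do the query's words appear consecutively in the message?"""
    q, m = words(query), words(message)
    return bool(q) and any(m[i:i + len(q)] == q for i in range(len(m) - len(q) + 1))

def literal_search(query: str, messages: list[str]) -> list[str]:
    # Stage 1 uses words only, so "2_5" and "2/5" survive as candidates for "2.5".
    candidates = [m for m in messages if contains_word_sequence(m, query)]
    # Stage 2 checks the exact characters, ignoring case.
    return [m for m in candidates if query.lower() in m.lower()]

print(literal_search("2.5", ["version 2.5", "ratio 2_5", "2/5 done"]))  # ['version 2.5']
```

In this model, every "2_5" message has to be examined and rejected in stage 2, which is where the extra time would go. A query made only of delimiter characters produces no words for stage 1, so it finds nothing.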

Raw Literal String Search - Introduction

The Raw Literal String search option is almost identical to the Literal String search option. It differs in how it deals with three characters: the ampersand, less-than, and greater-than characters (& < and >). This special handling follows from the rules of XML serialization, which state that whenever one of these characters appears in XML text content, it must be written as an "XML entity": an escape sequence that appears in the text instead of the character itself. All messages in MD Link - including HL7 messages - are contained in XML documents, so these XML escaping rules apply to all messages in MD Link, and MD Link's search options need to address them.

Raw Literal String Search - Example: "&" character in HL7 messages

Consider the following output for an HL7 Socket Event, as it would appear in the MD Link Monitor:

<?xml version="1.0" encoding="UTF-8"?>
  <HL7SocketEvent_data>
    <message>MSH|^~\&amp;|EPIC|EPICADT|SMS|SMSADT|199912271408||ADT^A04|1817457||2.5|
...
UB2||||||||71&amp;Prior Stay Date&amp;NUBC^20190106^20190107|
...

Note the XML elements enclosing the HL7 message, and the three instances of "&amp;". "&amp;" is the XML entity escape sequence here. Each instance of "&amp;" represents one ampersand. The regular (non-raw) literal string search option handles these so that if you searched for 71&prior stay date that would match the "71&amp;Prior Stay Date" part of the message. Similarly, if you did a regular literal string search for MSH|^~\&|EPIC that would match the "MSH|^~\&amp;|EPIC" part of the message.

This behaviour of the regular (non-raw) literal string option is convenient for those times when you are typing your search while thinking in terms of HL7. It is inconvenient when you want to copy a string from the Monitor's message display and paste it in the Monitor's search text field. If you copied MSH|^~\&amp;|EPIC and did a regular literal string search for that, it would not match the above message. But a raw literal string search would.

Raw Literal String Search - Example: including both XML markup and text content in query

Another use case for raw literal string search is searching for XML markup and text content at the same time. For example, consider this output message for an HL7 Parser Task:

<?xml version="1.0" encoding="UTF-8"?>
<ADT_A04>
  <MSH>
...
    <MSH.3-Sending_Application>
      <HD.1-Namespace_ID>EPIC</HD.1-Namespace_ID>
    </MSH.3-Sending_Application>
...

A raw literal string search for <HD.1-Namespace_ID>EPIC</HD.1-Namespace_ID> will match that message. A regular literal string search for the same query string will not match it, because behind the scenes MD Link will convert all instances of < to &lt; and > to &gt;, effectively converting that regular literal string search into a raw literal string search for &lt;HD.1-Namespace_ID&gt;EPIC&lt;/HD.1-Namespace_ID&gt;, which does not exist in the message.
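
The difference between the two options can be sketched with the escape function from Python's standard xml.sax.saxutils module, which turns & into &amp;, < into &lt;, and > into &gt;. In this model, a regular literal string search escapes the query and then behaves like a raw search. The function names are hypothetical, and treating raw searches as case-insensitive is an assumption.

```python
from xml.sax.saxutils import escape

def raw_literal_search(query: str, stored_xml: str) -> bool:
    # The query is compared with the stored XML text exactly as typed.
    return query.lower() in stored_xml.lower()

def literal_search(query: str, stored_xml: str) -> bool:
    # The query is XML-escaped first, then searched for as a raw string.
    return raw_literal_search(escape(query), stored_xml)

hl7_line = "UB2||||||||71&amp;Prior Stay Date&amp;NUBC^20190106^20190107|"
print(literal_search("71&prior stay date", hl7_line))          # True
print(raw_literal_search("71&amp;Prior Stay Date", hl7_line))  # True
print(literal_search("71&amp;Prior Stay Date", hl7_line))      # False

markup = "<HD.1-Namespace_ID>EPIC</HD.1-Namespace_ID>"
print(raw_literal_search(markup, markup))  # True
print(literal_search(markup, markup))      # False
```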

HL7 Search - Introduction

With HL7 search, you can search for messages that have a certain HL7 field or component equal to a certain value. For example, you can search for PID-5 = DOE^JOHN or PID-5-2 = JOHN.

Only pipe-delimited HL7 v2 messages are searched - not XML-formatted HL7.

HL7 search - like literal string search - can handle delimiter characters in the value. For example, you can search for ORC-3 = H50966_20181101113700 or MSH-9 = ORM^O01.

HL7 search looks for an exact match on the entire HL7 value - not a substring match. For example, if you search for MSH-10 = 2019, you should not expect to match all messages with an MSH-10 value that merely starts with "2019". You will match all messages with an MSH-10 value of exactly "2019" (which are unlikely to exist with real-world HL7 messages). Also, a search for PID-5 = DOE^JOHN will not match if the message contains "DOE^JOHN^^^^".

HL7 search does not support patterns or logic, like Full Syntax search does.

HL7 search is case-sensitive on the HL7 field or component (specifically: on the segment ID within it), and case-insensitive on the value.

HL7 search doesn't support searching on a value that is entirely delimiter characters, or empty. There needs to be at least one letter or number in the value.

In summary, an HL7 search is like a literal string search, but narrows down the results more through the specified HL7 field or component.
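
The exact-match rule can be illustrated with a small sketch that pulls one field or component out of a pipe-delimited message and compares it with the search value. It assumes the default | and ^ delimiters, ignores repetitions and MSH-1, and is not MD Link's implementation. The function name hl7_field_equals is hypothetical.

```python
from typing import Optional

def hl7_field_equals(message: str, segment_id: str, field: int,
                     value: str, component: Optional[int] = None) -> bool:
    """True if any segment_id segment has the field (or component) equal to value."""
    for segment in message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] != segment_id:          # segment ID is case-sensitive
            continue
        # In MSH the first "|" is itself MSH-1, so indexes shift by one.
        index = field - 1 if segment_id == "MSH" else field
        if index >= len(fields):
            continue
        actual = fields[index]
        if component is not None:
            parts = actual.split("^")
            if component > len(parts):
                continue
            actual = parts[component - 1]
        if actual.lower() == value.lower():  # whole value, case-insensitive
            return True
    return False

msg = ("MSH|^~\\&|EPIC|EPICADT|SMS|SMSADT|199912271408||ADT^A04|1817457||2.5|\r"
       "PID|||575^^^2^ID_1||DOE^JOHN|")
print(hl7_field_equals(msg, "PID", 5, "doe^john"))           # True
print(hl7_field_equals(msg, "PID", 5, "JOHN", component=2))  # True
print(hl7_field_equals(msg, "MSH", 10, "1817"))              # False
```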

HL7 Search - Repetitions

HL7 search has some options for searching on segment and field repetitions.

PID-3[2] will search on the second repetition of PID-3, on all components.

PID-3[2]-1 will search on the second repetition of PID-3, on the first component only.

OBX[5]-3 will search on the fifth OBX segment, field 3, and on all field repetitions and components within field 3.

OBX[|1=8]-3 will search for an OBX segment with a field 1 equal to 8, and then on field 3 of that OBX segment.

PID-3[^4=hosp]-4 will search for a field repetition of PID-3 such that the fourth component of that repetition equals "hosp", and then on component 4 of that field repetition.

OBX[|3=Score^^ScoringSystem2]-5 will search for an OBX segment with a field 3 equal to "Score^^ScoringSystem2", and then on field 5 of that OBX segment.

HL7 search uses the same syntax to specify fields and components that the MD Link Jython interface calls the "alternate syntax". More examples of this syntax in the Jython context are here. All of the examples on that page will work with HL7 search.
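
The two field-repetition forms, a numeric index such as [2] and a component test such as [^4=hosp], can be pictured with the sketch below. It works on the text of a single field, assumes the default ~ and ^ separators, and compares the component test exactly. All of these are assumptions, as are the function names and the sample PID-3 value.

```python
def select_repetitions(field_value: str, index=None, where=None) -> list[str]:
    """Pick repetitions of a field by 1-based index or by a component test.

    where is a (component_number, expected_value) pair, as in [^4=hosp].
    """
    repetitions = field_value.split("~")
    if index is not None:
        return repetitions[index - 1:index]
    if where is not None:
        number, expected = where
        selected = []
        for rep in repetitions:
            parts = rep.split("^")
            if number <= len(parts) and parts[number - 1] == expected:
                selected.append(rep)
        return selected
    return repetitions

def component(repetition: str, number: int) -> str:
    parts = repetition.split("^")
    return parts[number - 1] if number <= len(parts) else ""

pid3 = "575^^^2^ID_1~9911^^^hosp^MR"  # made-up PID-3 value with two repetitions
print(component(select_repetitions(pid3, index=2)[0], 1))            # 9911
print(component(select_repetitions(pid3, where=(4, "hosp"))[0], 4))  # hosp
```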

HL7 Search - Supported and unsupported use cases

HL7 v2 pipe-encoded messages are supported. HL7 XML formats - such as v2 XML or v3 - are not supported.

Non-standard HL7 encoding characters (for example, the "broken pipe" character "¦") are supported, as long as those encoding characters are not letters or numbers.

HL7 messages contained in a single XML element are supported. They can be surrounded by whitespace. HL7 messages surrounded by non-whitespace characters, or with XML markup in the middle of the message, are not supported.

Complete HL7 messages - starting with an "MSH" segment - are supported. Text fragments consisting of only an HL7 segment or field are not supported.

HL7 search is technically supported for MD Link logs, but given the above limitations (in particular, the requirement that the HL7 message is contained in an XML element) it is unlikely that you will have logs that can be searched with HL7 search, in a real-world situation. HL7 search is designed for searching MD Link output messages and input queues - not logs.

HL7 Search - Performance

In general, an HL7 search will perform as though the HL7 field or component was ignored, and you did a Literal String search on the same value. That is: in most cases an HL7 search is roughly as fast as a typical Full Syntax search, and will return in seconds. There are some exceptions to this good performance.

For example, if you did an HL7 search for OBX-5 = >50.0 and your database contained 1000 messages with an OBX-5 of "<50.0" for every one message with an OBX-5 of ">50.0", then your HL7 search would take minutes instead of seconds.

Another example is if you did an HL7 search for OBX-5 = 2.5 and 100% of your messages both contain at least one OBX segment and have an MSH-12 of "2.5". Then your HL7 search would take minutes or even hours instead of seconds.

All search types - Performance and the "fast search" feature

In order to make your searches in the Monitor's Output Messages tab fast, you need to do two things:

Enable the "Index component output for fast searching" solution-level property, for all of your solutions.
Enable the "Use Fast Search Index" checkbox under the Monitor's "View" menu.

Both of these steps are necessary regardless of which of the four search types you use.

You can perform a search with any of the search types described above no matter which kind of node you have selected in the Monitor's tree - service, solution, or event/task. Following the two steps above will make your searches in the Output Messages tab faster regardless of which kind of tree node you have selected, but the gains will be most significant if you have a solution or event/task node selected (not a service node). Therefore we recommend that you do searches with a solution or event/task node selected in the tree whenever possible.

This high-performance search is available only for the Output Messages tab, not the History Log or Input Queue tabs. The same search syntax works on all three tabs.

All search types - Implementation details

MD Link's "fast search" feature - for all four search types (Full syntax, Literal string, Raw literal string, and HL7) - uses the Apache Lucene library as its back-end. The syntax described in this document under Full Syntax is a subset of Lucene's syntax.

If you are using Full Syntax and your syntax is incorrect, your search will fail silently and return no matches. There will be no error message that helps you diagnose what is wrong with your search.

MD Link does not sort search results by best match. They are always sorted in descending order of the message timestamp. MD Link does not show you any match ranking either. Each message in the service's database either matches your query, or it doesn't.

You can only search on the message XML content field in the search text field. You can't search on the timestamp or other fields. If you happen to look up Apache Lucene search syntax yourself, independently of this MD Link documentation, you might find some examples of multi-field searches. They won't work with MD Link.

All search types - Limitations

The search feature has a word length limit of 255 characters. Words longer than this are still searched, but they are searched as though they were split up to meet the 255-character limit. This may lead to some messages matching your searches that you wouldn't expect, if you have words longer than 255 characters in your messages, because MD Link will see the 256th character as the start of a new word.
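
The splitting behaviour can be pictured as cutting a long word into consecutive pieces of at most 255 characters. The sketch below shows that reading only. The function name split_long_word is hypothetical.

```python
MAX_WORD_LENGTH = 255

def split_long_word(word: str) -> list[str]:
    """Break a word into consecutive pieces of at most 255 characters."""
    return [word[i:i + MAX_WORD_LENGTH] for i in range(0, len(word), MAX_WORD_LENGTH)]

pieces = split_long_word("a" * 255 + "b" * 45)
print([len(p) for p in pieces])  # [255, 45]
```

In this example the 45-character tail is treated as a word of its own, so a search for it could match even though it never appears as a separate word in the message.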