For a recent project I was involved in building some rather open-ended reporting functionality. Although my preference is to identify clear requirements to develop a targeted solution and avoid over-engineering, in this case it was not possible.

One of the huge strengths of Domino is full text searching. In the post-8.5.3 world there is even less reason to avoid them, because full text indexes can be stored on an external drive using the FTBasePath notes.ini variable and FTSearchSorted can be used to sort the search results. For Windows I also found code to write the search string to the Search Bar and process the search. And in IBM Notes Social Edition the combobox to change the search results sorting is no longer hidden away in the More… section (hopefully that will not be removed prior to GA).

But there are a few gotchas which may not be well known. Certainly I was not aware of some of them.

UNK Table – Be Careful With Fields

I have done virtually no development on relational databases. But from what I have heard, one of the significant differences is the ease of adding / changing fields in Domino. If you want to change a field from Text to Number, you can. It won’t corrupt the database, the most immediate impact will be an error if a user tries to save a pre-existing document that has a non-numeric value in that field. You can use the same field name on different forms with different data types. It doesn’t throw an error and the database will function perfectly happily in many respects.

Except full-text searching.

Fields are stored in the UNK (UNique Key) table in a database, along with the datatype. That is the datatype of the first field encountered when the UNK table was built. So say you have an existing database with a pre-existing Full Text Index and you change a Text field to a Date/Time field, for example. If you select that field in the search bar you will still have comparators of contains and does not contain, not the date/time comparators.

The only way to rebuild the UNK table is to change the datatype in all Forms or Subforms in the database, delete the Full Text Index, ensure all users are out of the database, and do an offline compact (thanks to Duffbert for that one!)

If you have the same field with different datatypes (fortunately not a scenario I’ve encountered), it gets more complex. You’ll need another field name. So be careful when creating fields or changing field data types.

Readers Fields

If the database has Readers fields and you do NotesDatabase.FTSearch(), it searches all documents and returns all documents that match. It doesn’t return all documents that match that the user has access to. The FTSearch appears to be against a textual dump of the database, not against the documents themselves, respecting access rights. NotesView.FTSearch(), however, does respect Readers fields. The only reference I’ve found to that was a comment on a blog post by Andre Guirard, the comment from Sjef Bosman.

There is not a security hole here, because the document is available in the collection, but no fields are accessible and nothing can be output from it. The Size NotesItem has an integer value 0, which is the check that needs to be made in any code iterating over the documents.

Strings Containing Double Quotes

If you’re trying to match against an exact string, it needs to be enclosed in double quotes. The challenge then is how to match against an exact string that already contains double quotes. LotusScript experts will know that the editor accepts curly braces ({…}) or pipes  (|…|) as an alternative to double quotes. Full-Text searches do not.

If you remove the double quotes, you won’t get a match (because it doesn’t match the value that does have the double quotes). You can’t use a wildcard operator (*) instead of the internal double quotes. You can’t use ? instead of the double quotes. The best option I’ve found (though by no means ideal) is to replace the internal double quotes with a pipe. You get more matches than required, but it’s the only way to get the matches to include the documents you are looking for.

Length Restrictions

Occasionally there are dialog lists that could hold long values. If you’re matching a text field there is a restriction around 128 characters (on one occasion I was able to pass in 129 characters, but on all others I couldn’t, so I’m considering 128 as the maximum). It’s probably a rare scenario and even rarer that two possible values have the same first 128 characters. The workaround I used was to break off at the end of the last word prior to 128 characters.

There is also a restriction on the overall Full-Text search criteria. It’s a big one, but one we added a check in, just for completeness. If the search criteria are too long, nothing gets pasted into the search bar and the back-end search fails. The maximum we used was 15900 characters, as a nice round number.

Summary

Most of what I’ve found here is not new, but it’s worth highlighting what I’ve found, if only so I can find all the references if I have to search for the references in a year or so!

Update – Indexing, Updating the Full Text Index and Latency

I was aware that even the Full Text Index setting immediate is not necessarily immediate, particularly on a busy server. I’m by no means an admin, but it appears you can use allocate full-text indexing to a separate tab by using the notes.ini variable UPDATE_FULLTEXT_THREAD=1 on the server.

You can force an update using UpdateFTIndex, but herein lies another thing I learned amidst the bulk of research for this project. If Readers fields are used and UpdateFTIndex is triggered by a user without access to all documents in the database, the only documents added to the Full-Text Index will be those the current user has access to. This is a good reason for using sessionAsSigner in Server-Side JavaScript or Java.

18 thoughts on “Full Text Search Musings”

  1. Hi,

    Domino Development here.. to you points:

    UNKs – matching field names to internal numbers has ALWAYS been the practice of FTI in Domino, and it’s sensitive to changes in the UNK table, absolutely (we advise people to rebuild FT indexes periodically anyway). And mixed types with the same field name? That’s just bad design.

    Reader Lists – we have to look into that.

    Double quotes – try prepending your query with a backslash (you can’t prepend only part of a query)

    Length – I had thought the limit was 64k, the Domino per-NRPC limit, but perhaps there are shorter limits somewhere along the way.

    Thanks for the write-up and keep on searching!

    John Curtis

    1. Thanks for taking the time to respond.

      I agree completely that changing data types or duplicate field names with different data types is definitely bad design. I don’t think (i.e. desperately hope!) I’ve never done it and would hope I never come across it.

      The backslash doesn’t seem to work. After a bit more re-investigation, passing as [Field_Name] = “My “quoted” Value” seems to return documents matching the string either with or without quotes, so is probably the best.

      1. Right .. the presence of [Fieldname] definitely won’t work with the backslash. The backslash is to send the whole query, verbatim, to GTR.

        Latency in FTIs is 15 minutes from the last database close or completed scheduled replication. It’s settable but only server-wide. But all the APIs support some way of forcing the index to be updated.

        -John

  2. It’s useful to have all these issues documented in one place – I think I will bookmark this page on Delicious!

    That UNK issue has been tripping me up for years – it would be nice if there was an easier way to update the table.

  3. Paul, really useful post.

    I would love to use FTI more as the means of filtering “view” data. Our users like the idea of drop down filters above each column in XPages but we have not found an efficient way of doing this on large views.

    I never quite trust FTIs though. My experience is that sometimes it takes a while for the FTI to update and other times it seems inconsistent.

    I have always worked to the scheme that if it is absolutely important that all of the correct documents are returned then FTI cannot be trusted.

    For field names we always add a suffix that relates to the type e.g author_nm or address1_tx or body_rtx. This was originally used to stop us from comparing numbers with strings.

  4. Sean

    Philippe Riand’s JDBC Access project on OpenNTF is probably going to be the best resource for view filtering on any column. I only investigated it for a little while earlier in the year and had problems getting it working, but my understanding is that once the external index of the database is created, any updates are not channelled through view indexes but written directly from note updates to the external index.

    Full text indexes are not immediate to update, but for reporting purposes I’ve usually found it sufficient. When educating users about full-text searching I always explain that the results may not include documents just updated. Thankfully there are alternate options if users need to find documents based on specific criteria.

  5. Nathan T. Freeman

    That “bypass” of readers fields also exists with a Database.getAllDocuments(). I’m sure there’s lots more places where the behavior exists. I think that’s why Document.isValid() is there.

    1. IsValid() just checks it’s not a deletion stub. I’m sure both IsValid and IsDeleted were true for docs the user didn’t have access to. Fortunately doc.Size=0 identified whether or not the user had access to it.

  6. Nathan T. Freeman

    “And mixed types with the same field name? That’s just bad design.”

    John, then give us a means to prevent it at the API level. “Bad design” is blaming the customer. There is no means to enforce a schema on an NSF.

  7. I’ve been experiencing a strange error lately when performing a FTSearch (through the REST service control) using the wildcard character (*). The error I get is : Notes error : Not enough memory for Full Text Indexing or Search. I only get this error when using the following searchstring syntax : ([Ansvarlig] contains “*johnsen*”) AND ([Prioritet] contains “*3*”).
    If I remove the wildcard from the Prioritet field part : ([Ansvarlig] contains “*johnsen*”) AND ([Prioritet] contains “3”) I don’t the the error…
    Both fields, Ansvarlig + Prioritet, are text fields according to the UNK table (checked using Notespeek). Server is Win2008 R2.
    Any idea to why I experience this strange error ?

    1. From my experience and from what John Curtis was saying, wildcard operators can’t be used within a Full Text search against a single field. If your search terms are complete words (i.e. you’re not trying a fuzzy search, but the field may still contain additional content), you should be able to use the “Contains” comparator and still return matching documents. I have successfully done that where the comparison string for the field was too long.

      1. Are you sure ? When I use : ([Ansvarlig] contains “ohn”), I get no results.. Also tried : ([Ansvarlig] contains ohn), also with empty results.. But, if I use : ([Ansvarlig] contains “*ohn*”), I get the correct result..

          1. Yes, that will work, as long as the text you’re searching for doesn’t contain one of the keyword like “and” or “or”. Then they need to be escaped or the whole thing surrounded with double quotes. Single words are fine though.

  8. ok, thanks for your feedback ! Btw, ([Ansvarlig] contains “*ohn*”) returns documents where the field Ansvarlig e.g contains Johnson or John or Johnny

    To my first question : Did you ever experience the error : Notes error : Not enough memory for Full Text Indexing or Search ?

    Thanks !

    1. No, I’ve not seen that one. I know if you programmatically use FTSearch on a database that’s not Full-Text indexed it falls back to a Db.Search, in which case it uses a temporary index. That can have size limitations, but it usually throws a different error.

  9. @Petter – Yes, we’ve seen “not enough memory for full text indexing or search” but it was a long time ago. If you have something like:

    A or B

    -or-

    A or (B and C)

    Then any one piece of that might work OK, but the combination may give the error. Something about the GTR engine needs lots of memory when boolean logic is needed. 64-bit Domino probably won’t have the problem, but if you’re stuck on 32-bit or the Notes client then you may need to contact support. They may have an INI tweak that can help.

  10. Old thread, but still relevant, this just caught me out with Domino Volt 1.0.2.9.

    I’d created a Volt form, imported data from a spreadsheet and was trying to use Filters to select documents from within a date range. You guessed it, the date UNK field type must have been TEXT.

    Deleted FTIndex, Copy Style Compact, ReCreate FTIndex and I’m working.

    Unfortunately this doesn’t seem to work for the numeric fields I have on the form.

    Any ideas anyone?

    Kev

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scroll to Top