I was in a debate today with a user who will remain nameless... but what is the ideal distance operator you use...
1.2, 1.6, 2.5?
No right or wrong, just what you use.
Rick- are you asking what the ideal distance operator is for a "phrase search" or [close to search]?
If so, my answer is that it depends on the number of words you are looking for within that "phrase search" or [close to search]. It also depends on which approach you want to take. If you want to be really strict and reduce your opportunity for false positives (but in turn increase the opportunity for misses), you'd start with a small distance, say the default of half a second. If you want to cast a wider net (which is how we train), we'd suggest you start with a larger time frame to allow for additional words to be spoken in between those words (say a distance operator of 1.5 seconds). That larger time frame will decrease your chance of misses, but in turn increase the chance of false positives. We suggest this approach, however, because we feel it is easier to identify and refine away false positives than it is to identify and add back misses. (You don't know what you don't know.)
Let's show an example:
If I wanted to find examples at a bank where people are calling in to close out their accounts, I might start out by searching for "cancel account":0.5
With this example the system would only bring back instances when these two words are spoken one right after the other. By conducting this type of strict search I may be missing out on other conversations that are contextually accurate for what I want. What if a customer said "I'm calling to cancel my checking account"?
My search would NOT have brought that transcript back, because I was too strict and did not allow enough time for the caller to speak all 4 words ("cancel my checking account"). By increasing it to a 1.5 second time frame it WOULD bring that transcript back (4 words * 0.3 seconds per word = 1.2 seconds). So my suggestion would be to start with: "cancel account":1.5
Now, on the flip side of that, if I didn't want that type of conversation (meaning, I truly only wanted the system to return instances where they said "cancel account" with NO opportunity for other words to be spoken in between), then opening it up to that 1.5 seconds would have been a mistake.
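To make the trade-off concrete, here is a rough Python sketch of how a time-window match might behave. It is only an illustration, not how any particular speech analytics product works: the `matches` function, the word timestamps, and the choice to measure the window from the start of the first word to the start of the last word are all assumptions made up for this example.

```python
# Hypothetical transcript: (word, start time in seconds). Timings are invented.
transcript = [
    ("i'm", 0.0), ("calling", 0.3), ("to", 0.6), ("cancel", 0.8),
    ("my", 1.1), ("checking", 1.3), ("account", 1.8),
]

def matches(transcript, phrase, distance):
    """Return True if the phrase words occur in order, with the last word
    starting no more than `distance` seconds after the first word starts.
    (Assumed semantics, for illustration only.)"""
    words = phrase.lower().split()
    for i, (token, first_start) in enumerate(transcript):
        if token != words[0]:
            continue
        pos, last_start, found = i, first_start, True
        for word in words[1:]:
            # Take the earliest later occurrence of the next search word.
            nxt = next((j for j in range(pos + 1, len(transcript))
                        if transcript[j][0] == word), None)
            if nxt is None:
                found = False
                break
            pos, last_start = nxt, transcript[nxt][1]
        if found and last_start - first_start <= distance:
            return True
    return False

strict = matches(transcript, "cancel account", 0.5)  # too tight for this call
wide = matches(transcript, "cancel account", 1.5)    # leaves room for "my checking"
```

With these made-up timings the strict 0.5 second search misses the call and the 1.5 second search finds it, which is the miss versus false positive trade-off described above.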
At the end of the day, my opinion is that it all depends on context. There is no cut-and-dried answer to your question. From a training perspective we usually suggest that people start out with the larger distance operator and refine down from there if too many false positives are being returned.
Thanks Anne, that's awesome.
Back when beginning my speech analyst career, I actually used to say the phrase I was searching for out loud and time myself using a stopwatch. I can't imagine what my non-speech analyst neighbors thought...
Now I begin with 0.3 seconds per word, and modify while testing. How common the words I am searching for are determines the amount of time to a large degree. For example, a phrase containing common words, "I can offer a payment plan of":1.5, would have a much shorter time frame than a phrase containing less common words, "Assist you to the best of my ability":2.4.
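For anyone who wants a starting number without the stopwatch, the 0.3 seconds per word rule of thumb is easy to script. This is just a sketch: the function names `estimate_distance` and `build_query` are made up here, and the result is only a first guess to adjust while testing (tighter for phrases made of common, quickly spoken words).

```python
def estimate_distance(phrase, seconds_per_word=0.3):
    """Starting distance in seconds: word count times a per-word allowance."""
    return round(len(phrase.split()) * seconds_per_word, 1)

def build_query(phrase, distance):
    """Format a search in the "phrase":distance style used in this thread."""
    return f'"{phrase}":{distance}'

spoken = "cancel my checking account"           # what the caller actually says
window = estimate_distance(spoken)              # 4 words * 0.3 = 1.2
query = build_query("cancel account", window)   # search words plus that window

long_phrase = estimate_distance("Assist you to the best of my ability")  # 8 words
```

Note that the estimate is based on the words you expect to be spoken between and including your search words, not just the search words themselves.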