Perhaps you have noticed that there are no truly successful text analytics systems in general use on people's desktops. It is fair to ask why this is the case.
It isn't that people don't need to absorb large amounts of text. In fact, I would guess that the basic approach taken by the makers and vendors who preceded us doesn't match what people actually want to get from text data.
By contrast, Leximancer is built to analyze bodies of unstructured text from just about any source: big, medium or small; English, Greek or Malay; medical, CPG or high-tech; long or short. What we are accomplishing is new, and so is our way of making customers and partners successful.
The purpose of this post is to examine what text offers most people, compare that with what previous attempts at text analysis software have tried and failed to do, and arm you with questions to ask when evaluating your options.
Text tells the story.
A good story lays out the ideas and characters with their attributes. We read the text to set the scene – to understand the situation we have dropped in on, like the first episode of a TV series. After that, we read on to see how the characters and ideas interact, and how their relationships change.
A survey, a report, or a set of online product reviews is no different. We need to see which issues, products or services are front-of-mind for the authors or respondents, what attributes they assign to those issues and products, and how they see the relationships. We then move on to answering questions and fixing problems. This is how we apply the knowledge gained.
In concrete terms:
1. We discover the concepts of the situation from the text.
2. We discover the explanations, or insights, from the text.
3. We can then act on these insights to alter the system.
Step 1 is important but widely neglected. You cannot understand the situation without understanding the background ideas.
You cannot understand an IT textbook using the concepts of political science. You would struggle to paint a seascape with a palette suited to a child's cartoon. Unfortunately, this problem is insidious and leads to mistakes we fail to notice. Why? Because if we naively analyze data with a set of ideas we know well, and fondly expect will apply to the data, we may never see that we are missing a quite different perspective.
Most text analysis systems will not automatically extract a clear set of the concepts and actors that characterize the text. Systems that come with predefined sets of categories, dictionaries and entity lists are a menace. You cannot risk interpreting your data filtered through an understanding created by someone who is not familiar with your data and your situation, even if the answer looks simple and neat. This leads to:
Question 1: Does the system’s set of categories, entities, and concepts reflect a real understanding of my data and my situation?
Some systems use predefined categories that are manually tuned by the vendor during pre-sales. The vendor's consultants will sift through your data and construct extensive lists of terms, pattern matchers and possibly rules. The analysis will look fine at the time, but things change. New issues will arise in your business, and the terms and entities will shift over time. This leads to Question 2:
Question 2: How much time and effort did the vendor invest in tuning the category dictionaries, rules, and entity lists before go-live? When your data inevitably changes, can you feasibly afford to repeat this process to maintain the fidelity of your analysis?
If the analytics system does not use predefined categories, it may use document or word clustering. Many such systems do not produce clear or validated concepts. Remember that for easy, regular use, the discovered patterns of meaning need to be stable and clear. Don't be fooled by claims that this sort of system works just because its output looks attractive, even compelling. There are ways to check whether discovered term clusters are real measures of meaning or whether they are wasting your time. Here are some questions for vendors who offer term or document clustering or other concept map solutions:
Question 3: If the product uses document clustering: how does the system scale with vast numbers of documents? If a document contains several different ideas, can it be in two topics at once? If I cut up the same documents into different chunks, would the pattern of clusters be similar? Text content isn’t always organized in predictable ways, so this is an important set of questions.
Question 4: If I take two comparable sets of documents – by different authors, or in different languages – would the discovered patterns of meaning look similar across the two? Multinationals, think about this if you want a consistent, true view of your customer comments.
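The re-chunking test behind Question 3 is easy to try yourself. The sketch below is a toy illustration, not Leximancer's algorithm: it stands in "document frequency of terms across chunks" for a real concept-discovery method, cuts the same text into chunks of two different sizes, and reports the Jaccard overlap of the top discovered terms. A score near 1.0 suggests the discovered terms do not depend on how the text happened to be cut up; a low score is a warning sign. The stopword list and chunk sizes are arbitrary choices for the example.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "for", "but", "about"}

def top_terms(chunks, k=5):
    """Toy 'concept discovery': the k terms that appear in the most chunks."""
    doc_freq = Counter()
    for chunk in chunks:
        terms = set(re.findall(r"[a-z]+", chunk.lower())) - STOPWORDS
        doc_freq.update(terms)
    return {term for term, _ in doc_freq.most_common(k)}

def make_chunks(words, size):
    """Cut a word list into fixed-size chunks of text."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def rechunk_stability(text, size_a=20, size_b=35, k=5):
    """Jaccard overlap between top terms discovered under two chunkings."""
    words = text.split()
    a = top_terms(make_chunks(words, size_a), k)
    b = top_terms(make_chunks(words, size_b), k)
    return len(a & b) / len(a | b)
```

A real evaluation would swap in the vendor's actual clustering output for `top_terms` and a proper cluster-agreement measure (such as the adjusted Rand index) for the Jaccard overlap, but even this crude check exposes methods whose "concepts" are artifacts of document segmentation.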
Step 2 is almost totally ignored. Text can tell you a story you can use to improve business performance – with customers and in marketing. What else would you really want to do with it?
Quantitative, categorical, and numerical data mining is really good for establishing metrics and testing to see if pre-defined metrics change. Great. Do this. It is really good for predicting whether a pre-selected situation is matched, such as customer churn probability.
Analyzing text comments from customers or reviews of competitors' products, on the other hand, excels at telling you what is happening. Text is human communication – that is what it is for. So why waste this extremely valuable and rich source of intelligence?
Think of it this way. If your metrics show your sales are rising, everyone feels great. But if your metrics show your results are falling off a cliff, how do you work out how to fix the system? This is the feedback you need to control a system. Your text data will tell you how to turn things around faster and more accurately than almost any other source of management information.
Unfortunately, this is where most text analytics systems fail or don’t even bother. Here are some other questions:
Question 5: Does the system suggest chains of meaning which are well supported by the data, and which I can understand and explain to a manager? In other words, is it an explanatory model?
Question 6: Can I test hypotheses (educated guesses) based on the perspective of the customer?
Question 7: How can a simple list of terms tell me much about the reasons for what is happening, without a lot of guessing, or without my having to read large amounts of text after all?
Step 3: Set your bar high. Expect an automatic, systematic and scalable system that can turn unstructured textual information into a real enterprise asset – good for uncovering new customer insights, new product ideas, and business process improvements that were previously unachievable. And then act on what you find!
I hope this helps. People are still doing a whole lot of writing and talking trying to tell you things. I think we need to listen more carefully, understand what they are saying and then act thoughtfully.
By Andrew E. Smith
Roger Levy’s line of work allowed him to explore many tools and technologies in the world of data and text mining. He gets it. And he needs it.
In his line of work – forensic and special investigations technology – the amount of data he works with can be immense. He has to sift through hundreds of depositions involved with these investigations and find key concepts, as well as major differences in recollections in the depositions.
When he was first introduced to Leximancer, he saw the potential. He first used Leximancer in his previous position as group general manager at Telstra, and he has since adopted it at his own company, Forensic Technology Pty Ltd. After working with nearly every text analytics platform rolled out in the last decade, Roger Levy has found Leximancer to be the best for his work.
“Leximancer’s key benefits are identifying the focal points and key elements of large text documents and identifying trends and differences,” Levy said. “Currently we use Leximancer to analyze depositions during investigations and legal proceedings, looking for key focuses, differences in recollections and summarizing experiences. We also are evaluating the platform in several other areas including evidence analysis.”
Forensic investigations often involve the services of a medical examiner, crime laboratory analyst, crime scene examiner, forensic engineer, psychological profiler, statistician, computer analyst and a polygraph expert. The amount of written reports and depositions accompanying a forensic investigation can be massive, and an automated data analysis platform like Leximancer serves to bring speed and interactive visual mapping tools to the laborious process of analyzing the body of collected data.
Roger feels he can definitely recommend Leximancer to others working in the forensic sciences and feels that this is the beginning of a long-term relationship in a highly technical industry that can benefit from the rapid, thorough analysis of complex documents.
Leximancer’s powerful technology has many applications across a variety of industries, from rapid analysis of commentary on social networks, to bringing speed to forensic and eDiscovery investigations, to providing “hidden” customer insight to marketers.