Groupon has filed a patent for systems and methods to classify and tag textual data. The patent describes a method to access a collection of documents with labels indicating services offered by a merchant, generate queries based on extracted features, and assign each query a precision score that is used to select a subset of the queries. These selected queries can then be used to indicate labels for machine-readable text. The patent also includes a method to assign labels to unknown textual portions based on query results. GlobalData’s report on Groupon gives a 360-degree view of the company including its patenting strategy.
According to GlobalData’s company profile on Groupon, dynamic premium pricing was a key innovation area identified from patents. Groupon's grant share as of September 2023 was 68%. Grant share is based on the ratio of the number of grants to the total number of patents.
Classification and tagging of textual data
A recently filed patent (Publication Number: US20230315772A1) describes a method and apparatus for tagging machine-readable text recovered from electronic sources. The method involves applying queries to the text, which are automatically generated from a corpus of documents with labels indicating services offered by a merchant. Each query has an associated weight and is based on an extracted feature set and a precision score. The queries are used to assign labels to different parts of the text, and the merchant is classified based on these labels.
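The tagging step described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the query format (a tuple of required terms, a label, and a weight), the rule of summing weights per label, and the majority vote used to classify the merchant are all assumptions made for illustration, as are the names `QUERIES`, `label_portion`, and `classify_merchant`.

```python
from collections import Counter, defaultdict

# Hypothetical weighted queries: (required terms, label, weight).
QUERIES = [
    (("massage", "therapy"), "massage", 1.0),
    (("stone",), "massage", 0.5),
    (("pizza",), "restaurant", 0.9),
]

def label_portion(portion, queries):
    """Return the label with the highest summed weight of matching queries."""
    tokens = set(portion.lower().split())
    scores = defaultdict(float)
    for terms, label, weight in queries:
        # A query matches when every one of its terms occurs in the portion.
        if all(term in tokens for term in terms):
            scores[label] += weight
    return max(scores, key=scores.get) if scores else None

def classify_merchant(portions, queries):
    """Assumed aggregation: the most frequent portion label wins."""
    labels = [label_portion(p, queries) for p in portions]
    counts = Counter(label for label in labels if label is not None)
    return counts.most_common(1)[0][0] if counts else None

portions = [
    "hot stone massage therapy sessions",
    "swedish massage therapy",
    "gift cards available",
    "wood fired pizza on fridays",
]
print([label_portion(p, QUERIES) for p in portions])
print(classify_merchant(portions, QUERIES))
```

Portions that match no query are left unlabeled here; how the patent treats such portions is not stated in this summary.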
The method also includes generating a score for each query, indicating its ability to return relevant results. This is done by accessing the corpus of documents and generating queries based on extracted features and the documents. A precision score is calculated for each query, based on the number of true positive documents returned divided by the total number of documents returned. A query subset is then selected based on a precision score threshold, which provides an indication of the labels to be applied to the text.
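The precision calculation and threshold selection described above can be sketched as follows. The tiny corpus, the all-terms-must-match query semantics, and the names `precision` and `select_queries` are hypothetical; only the formula (true positive documents returned divided by total documents returned) comes from the description.

```python
# Hypothetical labeled corpus: (document text, set of service labels).
CORPUS = [
    ("deep tissue massage and spa", {"massage"}),
    ("relaxing massage chair for sale", {"retail"}),
    ("hot stone massage therapy", {"massage"}),
    ("pizza and pasta", {"restaurant"}),
]

def precision(query_terms, label, corpus):
    """True positive documents returned / total documents returned."""
    returned = [
        labels for text, labels in corpus
        if all(term in text.lower().split() for term in query_terms)
    ]
    if not returned:
        return 0.0  # assumption: a query that returns nothing scores zero
    true_positives = sum(1 for labels in returned if label in labels)
    return true_positives / len(returned)

def select_queries(candidates, corpus, threshold):
    """Keep only the (terms, label) queries whose precision meets the threshold."""
    return [
        (terms, label) for terms, label in candidates
        if precision(terms, label, corpus) >= threshold
    ]

candidates = [
    (("massage",), "massage"),            # returns 3 documents, 2 correct
    (("massage", "therapy"), "massage"),  # returns 1 document, 1 correct
]
print(select_queries(candidates, CORPUS, threshold=0.8))
```

With a threshold of 0.8, the broad single-word query is dropped (precision 2/3) and the narrower two-word query is kept (precision 1.0).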
The generation of queries involves creating an array of feature index pairs, which includes features and their positions in a sentence. The queries are generated based on combinations of these feature index pairs. Additionally, a distance measure is calculated between features in the queries, and the distance is rounded to the next highest multiple of a predetermined number.
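A rough sketch of this query-generation step is shown below. It assumes that queries are built from pairs of features, that distance is the difference in token positions, and that a distance already equal to a multiple of the predetermined number is left unchanged; the summary does not confirm any of these details, and the function names are illustrative.

```python
from itertools import combinations

def feature_index_pairs(sentence, features):
    """List (feature, position) pairs for the features found in a sentence."""
    tokens = sentence.lower().split()
    return [(tok, i) for i, tok in enumerate(tokens) if tok in features]

def round_up_to_multiple(distance, step):
    """Round up to a multiple of step (exact multiples are kept, by assumption)."""
    return -(-distance // step) * step

def generate_queries(sentence, features, step):
    """Build one (feature, feature, rounded distance) query per pair of features."""
    pairs = feature_index_pairs(sentence, features)
    queries = []
    for (f1, i1), (f2, i2) in combinations(pairs, 2):
        distance = abs(i2 - i1)
        queries.append((f1, f2, round_up_to_multiple(distance, step)))
    return queries

sentence = "we offer deep tissue massage and hot stone therapy"
features = {"massage", "stone", "therapy"}
print(feature_index_pairs(sentence, features))
# [('massage', 4), ('stone', 7), ('therapy', 8)]
print(generate_queries(sentence, features, step=2))
# [('massage', 'stone', 4), ('massage', 'therapy', 4), ('stone', 'therapy', 2)]
```

Rounding the distance groups near-identical word spacings into the same query, so "massage" three words before "stone" and four words before "stone" produce one query rather than two.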
The method also includes ignoring a subset of words in the corpus, such as rare words or stop words, and scoring another subset of words based on their relationship to the labels. Features are then extracted from the scored words whose scores satisfy a predetermined threshold.
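The filtering and scoring described here might look like the sketch below. The scoring function is an assumption: each word is scored by the fraction of documents containing it that carry the target label, which is a simple stand-in because the summary does not say how the relationship to the labels is measured. The stop-word list, the rarity cutoff `min_doc_count`, and the function name are likewise hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "and", "a", "of", "for", "in"}

def extract_features(corpus, label, min_doc_count=2, score_threshold=0.6):
    """Ignore stop words and rare words, score the rest, keep those above threshold."""
    doc_count = Counter()    # documents containing each word
    label_count = Counter()  # of those, documents that carry the label
    for text, labels in corpus:
        for word in set(text.lower().split()):
            doc_count[word] += 1
            if label in labels:
                label_count[word] += 1

    features = {}
    for word, n in doc_count.items():
        if word in STOP_WORDS or n < min_doc_count:
            continue  # ignored subset: stop words and rare words
        score = label_count[word] / n  # assumed word-to-label score
        if score >= score_threshold:
            features[word] = score
    return features

corpus = [
    ("deep tissue massage and spa", {"massage"}),
    ("hot stone massage and therapy", {"massage"}),
    ("massage chair for sale", {"retail"}),
    ("deep dish pizza and pasta", {"restaurant"}),
]
print(extract_features(corpus, "massage"))
```

In this toy corpus only "massage" survives: "and" is a stop word, most other words appear in a single document and count as rare, and "deep" scores 0.5, which falls below the threshold.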
The apparatus described in the patent includes at least one processor and memory with computer program code. It applies the same method as described above, including generating queries, assigning labels, and classifying merchants based on the labels.
Overall, this patent presents a method and apparatus for tagging machine-readable text using automatically generated queries and extracted features. It offers a systematic approach to classifying merchants based on the content of the text, providing a tool for analyzing large volumes of text from electronic sources.