A Data Science Central Community
I was reading about the "Automated Reduced Error Predictive Analytics" patent secured by Rice Analytics (see below) and my first question is:
How can you successfully sue competitors for using a mathematical technology? After all, most vendors offer error and variance reduction as well as dimension reduction and automated model selection (based on optimizing goodness-of-fit) in their software. All statistical and data mining consultants, including myself, also use similar techniques to help solve their clients' business problems. For instance, I have developed a methodology that achieves the same goal, and my methodology (hidden forests, see http://www.analyticbridge.com/forum/topics/hidden-decision-trees-vs) is in the public domain, non-patented, and free for everybody to use.
Any claim of patent infringement would most likely fail, the defendant's argument being: "my algorithm is different; the only thing that our technology shares with the plaintiff's system is a methodology, well known and used by analytic professionals for decades, to reduce dimensionality, automate model selection and reduce error".
What about the recently published algorithm for random number generation based on the decimals of numbers similar to Pi (see http://www.analyticbridge.com/profiles/blogs/new-state-of-the-art-r...)? It is also in the public domain and non-patented. Could such a methodology be patented (assuming it had never been published)? I don't think so, but I would like to have your opinion on this.
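The post does not describe how that generator works, so the following is only a generic illustration of drawing digits from an irrational number; it is not the published algorithm. It assumes √2 as the source number, uses Python's math.isqrt to extract its decimal digits exactly, and groups the digits into blocks to produce values in [0, 1). The helper names sqrt_digits and digit_stream_uniforms are hypothetical. Whether the digits of √2 are uniformly distributed (normality) is conjectured but not proven, so a stream like this should be tested statistically before use.

```python
from __future__ import annotations

import math


def sqrt_digits(n: int, count: int) -> list[int]:
    """Return the first `count` decimal digits of sqrt(n) after the decimal point.

    isqrt(n * 10**(2*count)) equals floor(sqrt(n) * 10**count), so removing the
    integer part leaves exactly the fractional digits. Assumes n is a positive
    integer that is not a perfect square.
    """
    scaled = math.isqrt(n * 10 ** (2 * count))
    fractional = scaled - math.isqrt(n) * 10 ** count
    # zfill keeps leading zeros of the fractional part (e.g. sqrt(101) = 10.0498...).
    return [int(d) for d in str(fractional).zfill(count)]


def digit_stream_uniforms(n: int, count: int, block: int = 5) -> list[float]:
    """Hypothetical helper: read `block` digits at a time and scale each block to [0, 1)."""
    digits = sqrt_digits(n, count * block)
    values = []
    for i in range(count):
        chunk = digits[i * block:(i + 1) * block]
        values.append(int("".join(map(str, chunk))) / 10 ** block)
    return values
```

Because the digits are computed with exact integer arithmetic, the stream is fully reproducible from the choice of n, which is the property that makes such digit-based generators attractive; the cost grows with the number of digits requested.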
The Rice Analytics Patent
Rice Analytics Issued Fundamental Patent on RELR Method
This Patent Covers RELR Error Modeling and Related Dimension Reduction
St. Louis, MO (USA), October 4, 2011 – Rice Analytics, the pioneers in automated reduced error regression, announced today that the US Patent Office has issued it a patent for fundamental aspects of its Reduced Error Logistic Regression (RELR) technology. This patent covers important error modeling and dimension reduction aspects of RELR. Dan Rice, the inventor of RELR and President of Rice Analytics, described the significance of this RELR patent as follows:
“While large numbers of patents are important in many technology applications, it is also clear that just one fundamental patent can lead to the breakthrough commercialization of an entire industry. The MRI patent in the early 1970s had such an effect and by the 1990s had resulted in billions of dollars in licensing fees and enormous practical applications in medicine. We believe that this RELR patent could have a similar effect in the field of Big Data analytics because RELR completely avoids the problematic and risky issues related to error and arbitrary model building choices that plague all other Big Data high dimensional regression algorithms. RELR finally allows Big Data machine learning to be completely automated and interpretable. Just as the MRI allowed the physician to work at a much higher level and avoid arbitrary diagnostic choices where two physicians would come to completely different and inaccurate diagnoses, RELR allows analytic professionals to work at a much higher level and completely avoid arbitrary guesses in model building. Thus, different modelers will no longer either build completely different models with the very same data or have to rely upon pre-filled parameters that are the arbitrary choices of others. Most modelers would spend significant time testing arbitrary parameters because they are worried about the large risk associated with such parameters, but then it is very hard for them to find the time to be creative. The complete automation that is the basis of RELR frees analytic professionals to work at a much higher and more creative level, so they can pose better modeling problems and develop insightful model interpretations. Most importantly, unlike parsimonious variable selection in all other algorithms, RELR’s Parsed variable selection models actually can be interpreted because these models are not built with arbitrary choices and because they are consistent with maximum probability statistical theory.”
This US patent, number 8,032,473, describes a method of modeling and reducing error in logistic regression that can be applied quite generally in machine learning applications. Logistic regression is one of the more general advanced analytics methods because it can be used to model the probability of outcomes in all classic regression problems without regard to the form of the dependent variable. The most common application of logistic regression is in modeling categorical outcomes, such as binary or ordinal outcomes. Yet any continuous dependent variable can also be categorized into intervals and modeled with logistic regression, as in forecasting and survival analysis problems. Logistic regression remains one of the most widely used advanced analytics methods in business, government, medicine, and science applications. The reason for its popularity is that it offers the possibility of insight into the key putative drivers of the predicted regression outcome, but problems related to error and dimensionality are major limiting factors that prevent such insight with non-experimental data. This patented RELR method overcomes these problems.
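A minimal sketch of the categorize-then-model idea described above, assuming a single predictor, one cut point on the continuous outcome, and plain maximum likelihood fitted by batch gradient ascent. This is ordinary logistic regression, not the patented RELR error model; the function names, the threshold, and the sample data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def categorize(y, threshold):
    """Turn a continuous outcome into a binary one: 1 if y > threshold, else 0."""
    return [1 if v > threshold else 0 for v in y]


def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood.

    No regularization and no special error model: this is standard maximum likelihood.
    """
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(b0 + b1 * xi)  # gradient contribution of one case
            g0 += residual
            g1 += residual * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_proba(b0, b1, xi):
    """Predicted probability that the categorized outcome is 1."""
    return sigmoid(b0 + b1 * xi)


if __name__ == "__main__":
    x = list(range(10))
    y_continuous = [1.0, 2.5, 5.8, 4.2, 3.9, 4.1, 5.2, 7.8, 4.7, 9.0]
    y_binary = categorize(y_continuous, 5.0)  # hypothetical cut point
    b0, b1 = fit_logistic(x, y_binary)
    print(f"b0={b0:.3f}, b1={b1:.3f}")
```

With several cut points instead of one, the same idea extends to ordinal or multinomial logistic regression; a single threshold keeps the sketch to the binary case.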
Read more about this patent at http://www.riceanalytics.com/_wsn/page9.html
Daniel,
As a publisher with revenue generated from ads, I can only write positive things, for obvious business reasons. Capri has more flexibility and can be a bit more critical, and thus add a different value to AB :-)
Vincent
Vincent,
Thanks for the response. I know that it is a challenge, and I also know that you and Capri (whether Capri is only you or other people as well) develop very creative ideas that are a joy to read and think about. Social media like this has a large benefit, though, because now you can simply ask the inventors about the limits of their claims, and their answers are out there for all to see.
Dan
@Daniel: While I developed a few patents back in 2006 (related to Internet traffic scoring and decision trees), I have since moved from being a corporate scientist to becoming a publisher. As a result, I no longer want to patent new techniques; instead, my interest is in making my new analytic inventions freely available to a large audience of analytic professionals, in order to attract new members and thus advertisers. This could indeed create problems, as I might publish patented material without even knowing it. Since I am not paid by a university or any organisation to do my own research (you could call me an independent data scientist), I need to do my research at the speed of light. It took me 10 minutes to produce my new random number generator based on decimals of interesting numbers (similar to Pi), while it would take a PhD student 3 years to develop the same ideas. In 10 minutes, there is no way I can check whether the idea is new or whether it is already patented. In my quest to provide high-quality content to my readers, I might one day inadvertently reinvent the wheel and publish techniques very similar to what other people have already patented. As a publisher with little money, what would be the outcome, should this issue arise? It would be very easy to prove that what I published is not plagiarism.
And to answer your other question: yes, Capri is closely affiliated with us.
I am not sure who Capri is and the link that references his/her decision tree methodology just goes to where Vincent is talking about how his own decision tree method is proprietary and how he was thinking about changing it in December 2010. I am not sure if Capri is working on Vincent’s method or if Capri’s method is similar to Vincent’s or what that connection would be, but I see no reference to any details in Vincent’s decision tree method. So, it is a bit hard to respond to an anonymous person and discuss an anonymous decision tree method. However, the RELR method has nothing to do with decision trees or standard logistic regression. Our patent claims are very specific to the patented RELR error model and the dimension reduction specifically based upon that.
Clearly, based upon how the RELR patent claims are structured, one could use the same dimension reduction method that the RELR method uses (dimension reduction based upon t values) and not violate these claims if one does not also use the patented RELR error model. For scientific modeling reasons, though, I would not recommend that. We compared using t values for dimension reduction in Ridge Regression vs. the patented RELR error model in a JSM 2008 paper, and the Ridge Regression model was atrocious. Its highest-magnitude regression coefficients tended to come from complicated interaction effects or nonlinear effects, whereas RELR’s top-magnitude regression coefficients came almost entirely from simple main effects. Also, the Ridge Regression coefficients were greatly inflated in magnitude and had very wide standard errors in comparison to RELR. There is no statistical theory to support the use of t values in Ridge Regression, standard logistic regression, or other methods, but the patent reviews the algebra behind why the RELR error model leads to the use of t values for dimension reduction in RELR. Again, legally you could use t values for dimension reduction with other methods, but statistically it would likely lead to poor results compared to what RELR gives unless you can also derive statistical theory to support your use of t values. In general, I have found throughout my career that arbitrary methods that are not based upon statistical theory do not perform well.
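To make "dimension reduction based upon t values" concrete, here is a minimal sketch of generic t-value screening: score each candidate feature by the t statistic of its Pearson correlation with the outcome, t = r * sqrt((n - 2) / (1 - r^2)), and keep the k features with the largest |t|. The choice of this particular t statistic, the helper names, and the sample data are assumptions for illustration. This is not the RELR error model, and the comment above argues that using t values this way, without such a model, lacks statistical support.

```python
import math


def correlation_t(xs, ys):
    """t statistic of the Pearson correlation between one feature and the outcome.

    t = r * sqrt((n - 2) / (1 - r^2)). Assumes n > 2, non-constant inputs, and |r| < 1.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def screen_by_t(features, y, k):
    """Keep the k feature names with the largest |t|.

    features: dict mapping a (hypothetical) feature name to its list of values.
    Returns the kept names (largest |t| first) and the full score table.
    """
    scores = {name: correlation_t(col, y) for name, col in features.items()}
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:k], scores
```

Ranking by |t| rather than by |r| makes no difference when every feature has the same n, but it matters once missing values leave features with different numbers of usable cases.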
There has been a great increase in the issuance of analytics patents over the past few years. Much of that has been driven by large analytics software companies that are patenting their own proprietary regression and decision tree methods. However, I noticed that Vincent also has a recently submitted patent pending, so it is not unheard of to patent an analytics method. Below is a link to a KDnuggets piece published on Friday about our newly issued RELR patent. It also links to this US RELR patent document and to a page on our web site that reviews recent papers and presentations on the RELR method, including the Ridge Regression comparison that I described:
http://www.kdnuggets.com/2011/10/rice-analytics-patent-on-relr-meth...