A Data Science Central Community
We are investigating a metric that measures the presence or absence of structure or pattern in data sets. Its purpose is to measure the strength of the association between two variables, and it generalizes the modern correlation coefficient in a few ways:
Curious pattern: 3-D waves created by 2-D circular motions of each dot
The structuredness coefficient, which we denote as w, is not yet fully defined; we are still working on it. You are welcome to help us come up with a great, robust, simple metric that is easy to compute, understand, and interpret. In a nutshell, we are working under the following framework:
Note that this type of structuredness coefficient makes no assumption about the shape of the underlying domains where the n points are located. These domains could be smooth, bumpy, made up of lines, made up of dual points, etc. They might not even be numeric domains at all (e.g. if the data consists of keywords).
Comment
You might want to check out Topological Data Analysis techniques used by Ayasdi (http://www.ayasdi.com/) for comparative purposes.
Your notation is a bit confusing. An n-dimensional vector is not something that can be represented by (x,y) where x and y are real numbers. Also, you have not come up with a mathematical definition for "behavior uniquely characterizing the absence of structure". Do you mean white noise? Do you mean a flat trend line? It sounds like an interesting idea, but I'm not sure how to implement it from your description.
Given a set of n points in Euclidean space of dimension d, compute for each point the distance to its nearest neighbor. Then compute the variance of these distances, call it v. Multiply v by n^(2/d) to produce the parameter p; nearest-neighbor distances shrink like n^(-1/d), so their variance shrinks like n^(-2/d), and this factor keeps p from vanishing. As n goes to infinity, p tends to a limit which intuitively measures how evenly distributed the point set is. For instance, for d=1 and uniformly distributed points, p approaches 1/4 in the limit. For the plane, use d=2.
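The nearest-neighbor idea in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the structuredness coefficient w from the article: the function names are made up for the example, the points are assumed to lie in a unit-volume region, and the variance is scaled by n^(2/d), which is the scaling under which the uniform d=1 case gives the 1/4 limit mentioned above. The neighbor search is brute force, so it is only suitable for small n.

```python
import math
import random
import statistics


def nearest_neighbor_distances(points):
    """Distance from each point to its nearest other point (brute force, O(n^2))."""
    dists = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        dists.append(nearest)
    return dists


def evenness_parameter(points):
    """Variance of nearest-neighbor distances, rescaled by n^(2/d).

    Assumption: points are tuples of equal length d lying in a region of
    unit volume; the n^(2/d) factor is chosen for this sketch so that the
    result stays bounded as n grows.
    """
    n = len(points)
    d = len(points[0])
    v = statistics.pvariance(nearest_neighbor_distances(points))
    return v * n ** (2 / d)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 800
    uniform_line = [(rng.random(),) for _ in range(n)]
    even_grid = [(i / n,) for i in range(n)]
    uniform_plane = [(rng.random(), rng.random()) for _ in range(n)]
    print("uniform on [0, 1]:   ", evenness_parameter(uniform_line))
    print("evenly spaced grid:  ", evenness_parameter(even_grid))
    print("uniform on the plane:", evenness_parameter(uniform_plane))
```

A perfectly even grid gives a value near zero, since every nearest-neighbor distance is the same, while random uniform points on a line should land in the neighborhood of 1/4 for large n. For larger data sets, a spatial index such as a k-d tree would replace the brute-force search.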
© 2021 TechTarget, Inc.