Previously, we saw how unsupervised learning actually has built-in supervision, albeit hidden from the user.

In this post we will see that supervised and unsupervised learning algorithms have more in common than the textbooks would suggest. As a matter of fact, both classes can use identical equations for creating mathematical models of the data, and both can use identical learning algorithms to find optimal parameter values for those models.

The consequence of this relation is that one can easily transform a supervised learning method into an unsupervised one, and vice versa. The only change you need to make is to determine how Y will be computed; that is, you have to decide how your error for learning (training) will be defined.

You may not have noticed so far, but the general linear model (GLM) has served as a versatile model, paired with a versatile set of learning methods, for creating various supervised and unsupervised learning methods.

When one thinks of GLM, probably the first methods that come to mind are regression and inferential statistics (e.g., ANOVA), both of which fall into the category of supervised learning. However, GLM has been used just as extensively in unsupervised setups. This relates to dimensionality reduction techniques in which the algorithm is not told which dimensions particular data points load on. Rather, the algorithm is left to “discover” those dimensions on its own. Principal component analysis (PCA) and various forms of factor analysis are all examples of unsupervised applications of GLM.

This easy jump from supervised to unsupervised is not just a property of simple models such as GLM. Exactly the same applies to computationally elaborate methods such as deep learning neural networks.
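
The GLM point can be made concrete with a small sketch. It is only an illustration under stated assumptions: the synthetic data, the helper name `fit_linear` and the choice of a one-dimensional code are all invented here, not taken from any particular library or from the original post. The same least-squares routine is used twice, once with a Y supplied from outside (regression) and once with the input X itself as the target, reached through the first principal axis (PCA).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (an assumption, not from the post): 200 points in 3-D
# that vary mostly along a single hidden direction.
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, -1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))
X -= X.mean(axis=0)  # centre the data, as PCA assumes


def fit_linear(inputs, targets):
    """Least-squares fit of the linear model targets' = inputs @ W."""
    W, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return W


# Supervised: Y comes from outside the input data.
Y_external = X @ np.array([[1.0], [0.0], [-2.0]]) + 0.05 * rng.normal(size=(200, 1))
W_supervised = fit_linear(X, Y_external)
supervised_error = np.mean((X @ W_supervised - Y_external) ** 2)

# Unsupervised: the target is X itself, reached through a 1-D bottleneck.
# The first principal axis gives the 1-D code with the smallest
# reconstruction error among linear projections.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
first_axis = Vt[:1].T              # shape (3, 1)
scores = X @ first_axis            # 1-D code for every point
decoder = fit_linear(scores, X)    # same fitting routine, target = X
unsupervised_error = np.mean((scores @ decoder - X) ** 2)

print(f"supervised error:   {supervised_error:.4f}")
print(f"unsupervised error: {unsupervised_error:.4f}")
```

In both halves the model is linear and the fit is least squares; the only thing that changes is where the target comes from.
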
A neural network can easily be set up to operate with or without supervision. The best-known applications are supervised, such as image recognition, in which humans first provide labels for the category to which each image belongs. The network then learns that assignment and, if everything is done right, is capable of correctly classifying new images from the trained categories (e.g., distinguishing human faces from houses, from tools, etc.).

Neural networks can be used just as efficiently in an unsupervised learning setup. Perhaps the most common examples are auto-encoders, which are capable of detecting anomalies in data. Here, the network is trained to produce an output that has exactly the same values as the inputs it receives. The difference between what it has generated and what it should have generated, i.e., the error, is used for adjusting its synaptic weights. The training continues until the network can do the job satisfactorily well on data that have not been used for training (i.e., a test data set).

What makes this learning non-trivial is that the topology of the neural network is such that at least one of the hidden layers has fewer units than the input (and output) layers. This forces the network to find a representation of the data with reduced dimensionality, similar to that found by PCA and factor analysis.

Such networks are useful for applications in which labels do not exist or would be impractically difficult to obtain. They can also be very useful for applications in which collecting labels may take years, such as fraud detection and predictive maintenance.

A piece of advice to data scientists: don’t be afraid to turn your supervised learning method into an unsupervised one, or vice versa, if you see that this fits your problem.
You will need some creative thinking and more coding than usual, but as a result you may end up with exactly the solution that your task requires.

Here is one general rule to keep in mind: supervised learning methods will always be capable of solving a wider range of real-life problems than unsupervised ones. This is because unsupervised methods are much more specialized: their error computation is already determined by the algorithm. In addition, their error computation is limited to whatever can be extracted from the input data. In contrast, supervised methods, being open to error data coming from the outside world, can basically take advantage of the errors “computed” by the entire external universe, including the physical events underlying the actual phenomenon that these methods are trying to model (e.g., the real physical event of a machine breaking down provides the training information for a model that predicts whether a machine will soon break down).

All other things being equal, supervised methods will require less data and computational power to achieve a similar result. Unsupervised algorithms can learn to classify objects, for example cats. But this comes at the expense of far more resources than a supervised equivalent would need. In the case of Google’s algorithm that discovered cats in images, 10 million images, 1 billion connections, 16,000 computer cores, three days of computation and a team of eight scientists from Google and Stanford were required. That’s a lot of resources.

In conclusion, we now know that the terms ‘supervised’ and ‘unsupervised’ may be misleading, as there is quite a bit of supervision in unsupervised learning. Maybe better names would be ‘micro-managed learning’ for supervised learning and ‘macro-managed learning’ for unsupervised learning.
These two would probably better describe what is actually happening in the background of the respective algorithms.

Knowing that supervised and unsupervised methods can be seen as two different applications of the same general set of tools can be quite useful for creative problem solving in data science. With a bit of an inventive attitude, one can convert an existing method from one form to the other with relatively little effort, as circumstances require.
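
The auto-encoder setup described earlier in this post can be sketched in a few lines. This is a deliberately tiny, linear auto-encoder written with plain NumPy, under assumptions made up for illustration: the data are synthetic (8 variables driven by 2 hidden factors), and the layer sizes, learning rate and number of iterations are arbitrary choices, not recommendations. It shows the three ingredients mentioned above: the target is the input itself, a hidden layer is narrower than the input, and reconstruction error on unseen data is used to judge training and to flag anomalies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 8 measured variables driven by only 2 hidden factors.
mixing = rng.normal(size=(2, 8))


def sample(n):
    return rng.normal(size=(n, 2)) @ mixing + 0.05 * rng.normal(size=(n, 8))


X_train, X_test = sample(500), sample(200)

# 8 inputs -> 2 hidden units -> 8 outputs: the hidden layer is the bottleneck.
W_enc = 0.1 * rng.normal(size=(8, 2))
W_dec = 0.1 * rng.normal(size=(2, 8))


def reconstruct(X):
    return (X @ W_enc) @ W_dec


learning_rate = 0.01
for _ in range(3000):
    hidden = X_train @ W_enc
    error = hidden @ W_dec - X_train  # the target is the input itself
    grad_dec = hidden.T @ error / len(X_train)
    grad_enc = X_train.T @ (error @ W_dec.T) / len(X_train)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc


def reconstruction_error(X):
    """Mean squared reconstruction error of each row."""
    return np.mean((reconstruct(X) - X) ** 2, axis=1)


# Judge training on data the network has not seen.
normal_scores = reconstruction_error(X_test)
test_error = normal_scores.mean()

# A point that ignores the 2-factor structure reconstructs poorly.
anomaly = 3.0 * rng.normal(size=(1, 8))
anomaly_score = reconstruction_error(anomaly)[0]

print(f"mean test error: {test_error:.4f}")
print(f"anomaly error:   {anomaly_score:.4f}")
```

Replacing the line that computes `error` with a comparison against externally supplied labels would turn the same network and the same gradient steps into a supervised model, which is the conversion this post argues for.
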

One of the first lessons you’ll receive in machine learning is that there are two broad categories: supervised and unsupervised learning. Supervised learning is usually explained as the kind in which you provide the correct answers as training data, and the machine learns the patterns to apply to new data. Unsupervised learning is (apparently) where the machine figures out the correct answer on its own.

Supposedly, unsupervised learning can discover something new that has not been found in the data before. Supervised learning cannot do that.

The problem with definitions

It’s true that there are two classes of machine learning algorithm, and each is applied to different types of problems, but is unsupervised learning really free of supervision?

In fact, this type of learning also involves a whole lot of supervision, but the supervision steps are hidden from the user. This is because the supervision is not explicitly presented in the data; you can only find it within the algorithm.

To understand this, let us first consider the use of supervised learning. A prototypical method for supervised learning is regression. Here, the input and the output values, named X and Y respectively, are provided to the algorithm. The learning algorithm then adjusts the model’s parameters so that the model predicts the outputs (Y) for new inputs (X) as accurately as possible.

In other words, supervised learning finds a function:

Y’ = f(X)

Supervised learning success

Supervised learning success is assessed by seeing how close Y’ is to Y, i.e. by computing an error function.

This general principle of supervision in learning is the basic principle for logistic regression, support vector machines, decision trees, deep learning networks and many other techniques.

In contrast, unsupervised learning does not provide Y to the algorithm; only X is provided. Thus, for each given input we do not explicitly provide a correct output.
The machine’s task is to “discover” Y on its own.

A common example is cluster (or clustering) analysis. Before a clustering analysis, there are no known clusters for the data points within the inputs, and yet the machine finds those clusters after the analysis. It’s almost as if the machine is creative, discovering something new in the data.

Nothing new

In fact, there is nothing new; the machine discovers only what it has been told to discover. Every unsupervised algorithm specifies what needs to be found in the data.

There must be a criterion that says what success is. We don’t let algorithms do whatever they want, or ask machines to perform random analyses. There is always a goal to be accomplished, and that goal is carefully formulated as a constraint within the algorithms.

For example, in a clustering algorithm, you may require the distances between cluster centroids to be maximized, while the distances between data points belonging to the same cluster are minimized. In addition, for each data set there is an implicit Y, which may, for example, be the requirement to maximize the ratio of between-cluster distance to within-cluster distance.

Therefore, the lack of supervision in these algorithms is nothing like the metaphorical “unsupervised child in a porcelain shop”, as this would not give us particularly useful machine learning. Instead, what we have is more akin to letting adults enter a porcelain shop without having to send a nanny too. The reason for our trust in adults is that they have already been supervised during childhood and have since (hopefully) internalized some of the rules.

Something similar happens with unsupervised machine learning algorithms; supervision has been internalized, as these methods come equipped with algorithms that specify what counts as good or bad model behaviour. Just as (most) adults have an internal voice telling them not to smash every item in the shop, unsupervised machine learning methods possess internal machinery that dictates what constitutes good behaviour.

Supervised vs. unsupervised

Fundamentally, the difference between supervised and unsupervised learning boils down to whether the computation of error utilizes an externally provided Y, or whether Y is internally computed from input data (X).

In both cases there is a form of supervision.

As all unsupervised learning is actually supervised, the main differentiator becomes the frequency at which intervention takes place. For example, do we intervene for each data point or just once, when the algorithm for computing Y out of X is designed?

Hence, within the so-called unsupervised methods, supervision is present but hidden (it is disguised), because no special effort is required from the end user to supply supervision data. The algorithm seems to be magically supervised without an apparent supervisor. However, this does not mean that someone hasn’t gone through the pain of setting up the proper equations to implement an internal supervisor.

Consequently, unsupervised learning methods don’t truly discover anything new in any way that would overshadow the “discoveries” of supervised methods.

This blog entry is reposted from my original blog entry at www.teradata.com
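
The “implicit Y” of clustering discussed above can be illustrated with a short sketch. Everything in it is an assumption chosen for illustration: the two synthetic blobs, the helper name `separation_ratio`, the particular ratio used as the criterion, and the bare-bones two-cluster k-means loop. The point is only that the success criterion is computed from X alone, and that the algorithm finds the grouping that the criterion was written to reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: two well-separated blobs, 50 points each. No labels.
X = np.vstack([
    rng.normal(loc=-3.0, size=(50, 2)),
    rng.normal(loc=3.0, size=(50, 2)),
])


def separation_ratio(X, labels):
    """Distance between the two centroids divided by the mean distance
    of each point to its own centroid. Computed from X and labels only."""
    centroids = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    between = np.linalg.norm(centroids[0] - centroids[1])
    within = np.linalg.norm(X - centroids[labels], axis=1).mean()
    return between / within


# A bare-bones k-means loop for two clusters: the "internal supervisor"
# repeatedly assigns each point to its nearest centroid.
centroids = X[[0, -1]].copy()
for _ in range(10):
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    found_labels = distances.argmin(axis=1)
    centroids = np.array([X[found_labels == k].mean(axis=0) for k in (0, 1)])

# An arbitrary split of the same points, for comparison.
arbitrary_labels = np.arange(len(X)) % 2

found_ratio = separation_ratio(X, found_labels)
arbitrary_ratio = separation_ratio(X, arbitrary_labels)

print(f"ratio for clusters found by k-means: {found_ratio:.2f}")
print(f"ratio for an arbitrary split:        {arbitrary_ratio:.2f}")
```

No Y was supplied at any point, yet the grouping that scores well is exactly the kind of grouping the criterion was designed to favour.
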
