irfan toor
engineer . c|eh . data scientist (ai)
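The notebook's k-fold recipe relies on `sklearn.cross_validation` and `pandas.Panel`. Both have since been removed: the module in scikit-learn 0.20 and `Panel` in pandas 1.0.

Below is a minimal sketch of the same idea against `sklearn.model_selection`. It assumes scikit-learn 0.22 or later and pandas 1.0 or later. Setting `max_iter=1000` is an assumption of this sketch, not the notebook's, so that the default `lbfgs` solver converges on the unscaled features.

The sketch also checks that summing the per-fold confusion matrices equals a single confusion matrix over the out-of-fold predictions from `cross_val_predict`. This holds because every sample lands in exactly one test fold.

```python
# Sketch assuming scikit-learn >= 0.22 and pandas >= 1.0.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

dataset = load_iris()
X, y, labels = dataset.data, dataset.target, dataset.target_names
class_ids = list(range(len(labels)))  # integer class codes 0..2

# max_iter=1000 is an assumption of this sketch, added so lbfgs converges
clf = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=10)  # no shuffling, so the splits are deterministic

cms = {}
scores = []
for i, (train, test) in enumerate(cv.split(X, y)):
    # fit on the training folds, then predict the held-out test fold
    y_pred = clf.fit(X[train], y[train]).predict(X[test])
    # labels=class_ids keeps every matrix 3 x 3 even if a fold lacks a class
    cms[i] = pd.DataFrame(confusion_matrix(y[test], y_pred, labels=class_ids),
                          index=labels, columns=labels)
    scores.append(accuracy_score(y[test], y_pred))

# Summing the per-fold DataFrames replaces pandas.Panel(cms).sum(axis=0)
cm = sum(cms.values())

# Cross-check: one confusion matrix over all out-of-fold predictions
oof_pred = cross_val_predict(clf, X, y, cv=cv)
same = (cm.values == confusion_matrix(y, oof_pred, labels=class_ids)).all()

print(cm)
print("mean fold accuracy: %.3f" % (sum(scores) / len(scores)))
print("matches cross_val_predict:", same)
```

If you only need the summed matrix, the `cross_val_predict` route is shorter. The explicit loop is kept here because it also yields the per-fold accuracies.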
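The notebook's PCA recipe runs the eigendecomposition on the covariance of the projected data `X_pca`. That matrix is already diagonal, which is why its eigenvectors print as the identity matrix.

A minimal sketch, assuming NumPy and scikit-learn are available, decomposes the covariance of the original features instead. It then confirms two things:

- the leading eigenvectors match `pca.components_` up to sign;
- the eigenvalues reproduce `explained_variance_` and `explained_variance_ratio_`.

It uses `np.linalg.eigh`, the routine meant for symmetric matrices, in place of the notebook's `np.linalg.eig`.

```python
# Sketch assuming NumPy and scikit-learn are installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Covariance of the original features (4 x 4); np.cov treats rows as variables, hence .T
cov_mat = np.cov(X.T)
# eigh returns eigenvalues in ascending order, so re-sort them descending
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

pca = PCA(n_components=2).fit(X)

# Top eigenvalues match PCA's explained variance (both use the n - 1 denominator)
print(eig_vals[:2], pca.explained_variance_)
# Eigenvectors (columns) match the principal axes (rows of components_) up to sign
for k in range(2):
    v, c = eig_vecs[:, k], pca.components_[k]
    print(k, np.allclose(v, c) or np.allclose(v, -c))
# Each eigenvalue over their total is the share of variance per component
print(eig_vals[:2] / eig_vals.sum(), pca.explained_variance_ratio_)
```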
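Inertia, the quantity the notebook's K-means recipe prints and plots, is the sum of squared distances from each sample to its nearest cluster center.

The sketch below assumes NumPy and scikit-learn are available. It recomputes inertia by hand from `cluster_centers_` and compares it with `KMeans.inertia_` for each `k`. The helper `within_cluster_sse` is a hypothetical name introduced here. `n_init=10` and `random_state=0` are choices of this sketch, made for reproducibility.

```python
# Sketch assuming NumPy and scikit-learn are installed.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

def within_cluster_sse(X, centers):
    """Hypothetical helper: sum of squared distances from each sample to its nearest center."""
    # squared Euclidean distances, shape (n_samples, n_clusters)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

n = 6  # as in the recipe, k runs from 1 to n - 1
results = []
for k in range(1, n):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sse = within_cluster_sse(X, km.cluster_centers_)
    results.append((k, km.inertia_, sse))
    print("k = %d, inertia_ = %.4f, recomputed = %.4f" % (k, km.inertia_, sse))

# With k = 1 the only center is the feature mean, so inertia equals the total sum of squares
total_ss = ((X - X.mean(axis=0)) ** 2).sum()
print("total sum of squares: %.4f" % total_ss)
```

Reading the printed values as a curve over `k` gives the same "elbow" view that the recipe plots.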
{"nbformat_minor": 0, "cells": [{"source": "# scikit-learn Cookbook\n\nThis cookbook contains recipes for some common applications of machine learning. You'll need a working knowledge of [pandas](http://pandas.pydata.org/), [matplotlib](http://matplotlib.org/), [numpy](http://www.numpy.org/), and, of course, [scikit-learn](http://scikit-learn.org/stable/) to benefit from it.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 1, "cell_type": "code", "source": "#
\n%matplotlib inline", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Training with k-Fold Cross-Validation\nThis recipe repeatedly trains a [logistic regression](http://en.wikipedia.org/wiki/Logistic_regression) classifier on different subsets (folds) of the sample data. It uses [stratification](http://en.wikipedia.org/wiki/Stratified_sampling), so each fold keeps roughly the same class proportions as the overall dataset. Each model is evaluated on its held-out test fold, and the per-fold confusion matrices are collected in a dictionary of `pandas.DataFrame`s and summed into a single matrix.\n\nThis recipe defaults to using the [Iris data set](http://en.wikipedia.org/wiki/Iris_flower_data_set). To use your own data, set `X` to your instance feature vectors, `y` to the instance classes as integer labels, and `labels` to human-readable names of the classes.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 2, "cell_type": "code", "source": "#
\nimport warnings\nwarnings.filterwarnings('ignore') # silence library warnings in the notebook output\nimport pandas\nimport sklearn.datasets\nimport sklearn.metrics as metrics\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import StratifiedKFold # replaces sklearn.cross_validation, removed in scikit-learn 0.20\n\n# load the iris dataset\ndataset = sklearn.datasets.load_iris()\n\n# define feature vectors (X) and target (y)\nX = dataset.data\ny = dataset.target\nlabels = dataset.target_names\nlabels", "outputs": [{"execution_count": 2, "output_type": "execute_result", "data": {"text/plain": "array(['setosa', 'versicolor', 'virginica'], \n dtype='|S10')"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 3, "cell_type": "code", "source": "#
\n# use a logistic regression classifier\nclf = LogisticRegression()\n\ncms = {}\nscores = []\ncv = StratifiedKFold(n_splits=10)\nfor i, (train, test) in enumerate(cv.split(X, y)):\n    # train on the training folds, then predict the held-out test fold\n    y_pred = clf.fit(X[train], y[train]).predict(X[test])\n    # compute this fold's confusion matrix and store it as a labeled DataFrame\n    cms[i] = pandas.DataFrame(metrics.confusion_matrix(y[test], y_pred), columns=labels, index=labels)\n    # also store the fold's overall accuracy on its test set\n    scores.append(metrics.accuracy_score(y[test], y_pred))\n\n# pandas.Panel was removed in pandas 1.0; summing the per-fold DataFrames gives one view of how well the classifiers perform\ncm = sum(cms.values())\ncm", "outputs": [{"execution_count": 3, "output_type": "execute_result", "data": {"text/plain": " setosa versicolor virginica\nsetosa 50 0 0\nversicolor 0 45 5\nvirginica 0 1 49", "text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>setosa</th>\n      <th>versicolor</th>\n      <th>virginica</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>setosa</th>\n      <td>50</td>\n      <td>0</td>\n      <td>0</td>\n    </tr>\n    <tr>\n      <th>versicolor</th>\n      <td>0</td>\n      <td>45</td>\n      <td>5</td>\n    </tr>\n    <tr>\n      <th>virginica</th>\n      <td>0</td>\n      <td>1</td>\n      <td>49</td>\n    </tr>\n  </tbody>\n</table>\n</div>"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 4, "cell_type": "code", "source": "#
\n# accuracy on the held-out test set of each fold\nscores", "outputs": [{"execution_count": 4, "output_type": "execute_result", "data": {"text/plain": "[0.93333333333333335,\n 0.93333333333333335,\n 0.8666666666666667,\n 1.0,\n 1.0,\n 0.93333333333333335,\n 0.93333333333333335,\n 1.0,\n 1.0,\n 1.0]"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Principal Component Analysis Plots\nThis recipe performs a [PCA](http://en.wikipedia.org/wiki/Principal_component_analysis) and plots the data against the first two principal components in a scatter plot. It then prints the [eigenvalues and eigenvectors of the covariance matrix](http://www.quora.com/What-is-an-eigenvector-of-a-covariance-matrix) of the projected data, and finally prints the fraction of total variance explained by each component.\n\nThis recipe defaults to using the [Iris data set](http://en.wikipedia.org/wiki/Iris_flower_data_set). To use your own data, set `X` to your instance feature vectors, `y` to the instance classes as integer labels, and `labels` to human-readable names of the classes.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 5, "cell_type": "code", "source": "#
\nimport warnings\nwarnings.filterwarnings('ignore') # silence library warnings in the notebook output\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport sklearn.datasets\nfrom sklearn.decomposition import PCA\n\n# load the iris dataset\ndataset = sklearn.datasets.load_iris()\n# define feature vectors (X) and target (y)\nX = dataset.data\ny = dataset.target\nlabels = dataset.target_names", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 6, "cell_type": "code", "source": "#
\n# define the number of components to compute, recommend n_components < y_features\npca = PCA(n_components=2) \nX_pca = pca.fit_transform(X)\n\n# plot the first two principal components\nfig, ax = plt.subplots()\nplt.scatter(X_pca[:,0], X_pca[:,1])\nplt.grid()\nplt.title('PCA of the dataset')\nax.set_xlabel('Component #1') \nax.set_ylabel('Component #2')\nplt.show()", "outputs": [{"output_type": "display_data", "data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAEZCAYAAABiu9n+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnXu8HVV597+/cAlJIM0FXxAFoghegYiKqfjKUctFVChe\na6uC7YtXGjURKRFeqCKVWiJFaVGsBOuLaKtBUq4RidJSuYSrQJBUAygIQogBgUSS5/1jrZ0zZ5/Z\ne8/ee/aetc9+vp/P+pw9M2tmfnM565n1rLWeJTPDcRzHcfKYVLUAx3EcJ13cSDiO4zgNcSPhOI7j\nNMSNhOM4jtMQNxKO4zhOQ9xIOI7jOA1xI+E4GSQdIOkeSY9LOrxA/jmSNkvqyf9SPPbze3FsxymC\nGwmnciStkfRkLJh/I+k8SdMy2w+R9BNJ6yU9LGmFpLfWHWMkFqif7lLOZ4GzzGwHM7u4gdY3dHmO\n0um1ser3eZx08AftpIABbzGzHYD9gFcCJwJIegfwXWAJ8Bwz+1/A/wXeWneMo4CfAe/vUstuwJ0t\ntKrLc/SSfmlL+R44JeJGwkkKM3sAuBx4aVy1GPismX3DzB6PeX5iZh+s7RNrHW8HPgzsJukVzc4h\n6ZjoUnpU0g8kPTuu/x/g+cCyWGvZpm6/fyUYkWWx1vOpzOb3SrpX0m8lLcrsI0l/I2m1pEckfUfS\nzCbajpP0gKRfSfrLum1vlnSzpN9Juk/SyZnNP4l/10Vtr5a0h6QfxfP+VtK3JP1R5njHx/Osl7Sq\nVkNqoXnceZrda2cCYGaePFWagF8Cb4y/dyXUCP4WeBGwGdi9xf7vA+6Jv/8fwV3UKO8bgN8Cc4Ft\ngbOAH9dpeUMLrW/ILM+JGr8KTAb2AZ4GXhi3fxy4FtgF2AY4B7igwbEPBX4DvASYClwQj/38uP1A\n4KXx994x7xFxefeYd1LmeHsAb4zn3RH4MfCluO2FwH3AznF5t8x5GmrOO4+niZ0qF+DJE7AGeBx4\nLP7+SixwD4gF0rYt9v8hcFr8/afAw8DWDfL+C/CFzPI0YCOwW1zu1Ejskll3HfCu+PuuuvzPjucb\nV8gC36hdR1zeM2skcvKfCSyu09Gw8I735qb4+wXAQzUjUpfvzkaai5zH08RK7m5yUsAIX8QzzWyO\nmR1rZhuAR+P2ZzfaUdKuwAjwb3HV5cB2wJsb7PJs4N4tJzb7fTzPc7q6gvBVX+NJYPv4e3dgqaTH\nJD1GKICfAXZqoO3+zPJ92Y3RhXR1bLxfB3wImN1IkKSdJF0YXUq/A/61lt/MVgOfAE4BHpL07Zrb\njWAIimp2JjhuJJyUuZtQaL6jSZ73Ed7jSyU9SPjS347QkJ3HA4RCENjSnjEb+HVBTe2GTb4PODQa\nwFqaamYP5uR9kOD2qbFb3fYLgIuA55rZDIIbqPY/nKfrNGAT8DIz+yNG71XYwezbZva/CYbMgNML\naPaw0UOGGwknWczMgAXASZKOljRd0iRJr5X01ZjtKMLX8L6Z9HbgMEmzcg77beADkvaVNJlQkP7U\nz
O7LyZvHQwRff1HOAU6TtBuApGc1GX/xXeBoSS+WNBU4uW779sBjZrZR0v7AnzNaaP+W4Abaoy7/\n74H1kp4DHFfbIGkvSW+I92ADoR1lUwHNeedxJjBuJJykMbPvAe8G/pLwtf8bwliGiyTNIzR0n21m\nD2fSMmA18Gc5x7sKOAn4HqFW8by8fE34O+DE6IpZUDtsk/z/CFwMXClpPfDfwP4NrvVyQjvDj4Cf\nA1fVHfujwGfjcU4CvpPZ90ng88B/SVobjcjfEroU/w5YFq+5drzJ8Vp+S6jB7Aic0Epz3Xkei+dx\nJjAKH2sVnVz6BsF3/LCZ7Z2zfQT4AfCLuOp7ZnZq/xQ6juMMN1tXfP7zgC8D32yS58dm1jI8guM4\njlM+lbqbzOwaQrfHZvjITsdxnIpIvU3CgNdIulXSpZJeUrUgx3GcYaJqd1MrbgJ2NbMnJb2J0P1v\nr4o1OY7jDA1JGwmLsXri78sk/ZOkWWa2NptPkvfddhzH6QAza+rST9pISNqJ0PPJYlc71RuIGq0u\ntN9IOsXMTqlaRxbXVJwUdbmmYrim4hT5wK7USEj6NiFo2Y6S7icMHtoGwMy+Shhp+xFJzxBCHbTT\nn71q5lQtIIc5VQvIYU7VAhowp2oBOcypWkAOc6oWkMOcqgXkMKdqAZ1SqZEws/e02H42cHaf5DiO\n4zh1pN67aZBZUrWAHJZULSCHJVULaMCSqgXksKRqATksqVpADkuqFpDDkqoFdEqlI67LQpKl1ibh\nOI6TOkXKTq9J9IgYUiQpXFNxUtTlmorhmsrFjYTjOI7TEHc3OY7jDCnubnIcx3G6wo1Ej0jRB+ma\nipOiLtdUDNdULm4kHMdxnIZ4m4Qz8Eg6BGYtDEtrzzCzK6pV5DiDQZGy042EM9AEAzF9KZw1JayZ\n/xSsP9INheO0xhuuKyRFH+TE1DRrYTAQRxHSWVNGaxVV6iof11QM11QubiQcx3Gchri7yRloUnY3\neVuJkzreJuEMBSkWxikbL8ep4W0SFZKiD3KiajKzK8wePTikcgrhFNtKJurzKxvXVC5uJBzHcZyG\nuLvJcXqAu5ucQcDbJBynQlJsK3GcLN4mUSEp+iBdU3FSbCtJ8V65pmKkqKkobiQcx3Gchri7yXEc\nZ0hxd5PjOI7TFW4kekSKPkjXVJwUdbmmYrimcqnUSEj6hqSHJN3eJM9Zku6RdKukl/dTn+M4zrBT\naZuEpP8NPAF808z2ztl+GHCsmR0m6dXAP5rZvJx83iYxQQndSKedBpN3h833wrpF3pXUccoh+TYJ\nM7sGeKxJlsOB82Pe64AZknbqhzaneoKBmPoDmLIfLJ4NZ+4H038Q1juO0w9Sb5N4DnB/ZvlXwHMr\n0tIWKfogB0/TrIWwz2T4BzIxkCaXMV9Ed7qqwTUVwzWVy9ZVCyhAfVUo1z8maQmwJi6uA24xsxVx\n2whAn5fnAlWef9xy5l4loae13llR8V2EW1m7jI0zJY3480vreaWyDMyVlIyelN6n+PvoeJ/WUIDK\nx0lImgMsa9AmcQ6wwswujMurgAPN7KG6fN4mMQEZdTdNjbUJgPkbYP0R3i7hON2TfJtEAS4G3g8g\naR6wrt5ADAKSDpFmXxmS+9OLEgzBk0fAUzfBgkfhEze5gXCc/lJ176ZvAwcCOwIPAScD2wCY2Vdj\nnq8AhwK/Bz5gZjflHCe5mkTNHaKEooFmXTSpkKImSFOXayqGaypOkbKz0jYJM3tPgTzH9kNL75i1\nEBbHyWcAmAILFgL+New4TvKk7m4aWFL8anBNxUlRl2sqhmsql0Ho3TTgrD0D5r8WyLqbzqhUkuM4\nTkG8JtEjMt3OroD1R8KC5SFVNztZin21U9QEaepyTcVwTeXiNYk+EI2Ct0E4jjNwVD5OogxS7N3k\nTCzkU5E6E5AiZacbCcdpQUrdmB2nTCbCYLqBJUUfpGsqzlhdsxY
GA7ElftSUfsSPaq4pDVxTMVLU\nVBQ3Eo4zAfFR/k5ZuLvJSYKUff6D5G4KWmecBsyFv5oEe5OyXqdavE3CGQgGoRBO2YjVGH8fjydM\nx/IbYMFys0cPrlCekyDeJlEhKfog09WUjM8/66I5LrvNzK4we/TgkFId51J/H08Hvlaxpv7jmsrF\nx0k4Q8+oi2bGXPhAdNF89HWSbkuxxtAeD+Cj/J1ucHeTUzlVupsmkosm5z5uhk23wO99XnAnl+Sj\nwDoOBFeOpCNjdFxgfc98/vVtCzlRegkumsOb7pdiodvP++gMEWY28ClcRvU66jSNVK3BNY071yEw\n/UlYYiFNfxKmrQy/LaYlBvMMpj4NHNJ4v7Ct8XlmXRlS43wp3yvXNPE1RV3WKo/XJJwhIm9uj08Q\n3FvUuWievNC2fIUXnxNk1OWzuHa810pKqqeW47SDG4keYQnGj3dNeUx6FNaV6KLp3SRT1d+r8bim\nYqSoqShuJJwhIn9uD2sZpdfnBHGGF+/d1CNSnNPWNRVvgK7X1cZ+PeuplXevqm5Q93eqGClqAu/d\n5DjjaF1rqBW8258qzX6sVvAW2a92/P721PL2D6e3eE2ix1T9pee0xyCECKkhzb4SFh802v5xPvVj\nOyQtglkLwtLaxWZ2Wv+VOqniNYmK8S+9QaR1w/OgGP5gIKZ/HhbHNfM/Lwk3FE47eOymHpFSTKKx\nmtIiRU2BFblrM4b/oJCmL+1VKO76cN/j79XaM0JN53xCmv9UWFdj1gI4i8z7R61WUVYo8RSfn2sq\nl0prEpIOBc4EtgK+bman120fAX4A/CKu+p6ZndpXkc6QUevJ9JEpcC/jezL1rotrlrxaKKw/mYz1\n6rT9w2u4TltUONJvK2A1MAfYBrgFeHFdnhHg4jJGDZaou/BoWtocqespjdTsGYd19SO0p60se4R1\n/nlmXdnmdSyC6ZZ5/yysa33sdt5zT4ObipSdVdYk9gdWm9kaAEkXAkcAd9XlS6ZBut0vMPNYOgOJ\nNe3JVD9m4qMbYOuXwuLJYXn8O1FVG4aZnSYJWBAbrtcvDutmjzTbz2sazhgqtGDvAM7NLL8X+HJd\nngOBR4FbgUuBl3RqDcvRXOzrDjgEtr8hta8wEowfk6KmVroY85U9Iyf20+g7QYe1yQb7HVfStTXV\n1E4tJsXn55ra0mWt8lRZkyjS9/YmYFcze1LSm4CLgL3yMkpaAqyJi+uAWywOXqk1GnW7DLPi4VfU\nn3tL/vAVNvUH8NbJcAjxK+xk4Iay9bSvf7xeX264PJf4oOu3Axtg7Wnhec++MlR+VxC8owAbZ44O\nnpq1EI6ZArvXtk+BY0+VtKHF+TfA+lgL3TgTnvgucEMZ1xePfTIsOCgsrl8e1tXYOHNshf6uuC6Q\nyPNpdn1zYy+uJPS0ep/6uRx/Hx3v0xqKUKEFmwdcnlk+ATi+xT6/BGZ1Yg1L0tzyq7AMX7KnwUmt\n3olBfB+KvOeeJkYqUnZWWZO4EdhT0hzC9FnvBt6TzSBpJ+BhMzNJ+xMG/63tt9Aa5m0MTh2t34nq\n4j512hbi77kzhoqt2JuAuwm9nE6I6z4EfCj+/hjwM0LPp2uBeZ1awz5eU/wKOz65rzAS9IumqKls\nXWxpw5ixcrQnVPvvRDua6FNtIMXn55ra0mWt8lQ6TsLMLgMuq1v31czvs4Gz+62rG2zLV9jZp8K2\nj/lX2MSik6/z+E4Qegyd3aceQ/0Zz+FMfDwsRw+wgsHg+o0lGIUyRU2Qr6u7rqHdF9pl3Kuyu+Om\n+PxcU7m4kXAmNOUWir37Oi9/LEVuW8gKH//gtE3VPrF++dUq0DRStYZh10QXfvk8XeN7Ki00mPVI\ne6PvF1qYQ3vGJmBROzrbvVfUjZruRU+rYXunJpKmqMta5fEAfyVTC5wG23+xV4HfnKKUHWBx7Rkw\nf0MIpvcp4Fxg8ewigf7M7Ap
Yfyqcuxk+DJw5CaafOFqDKEdnNnBfOO+jB4fktQWnM9zdVCI5Puul\nedX5CsM0rOjHedohRU0wVtfo85oxG56aBOcAv2I0wipQyPU0awQWT8rZp21NebRuMym/O26Kz881\nlYsbiVIpOheB+4X7Q/eF4vjn9SngFOBrSekMNH//zMc/OJ1QtU+sX361/ujI+nyvzvX5VjkClwT9\nor3WRIfRTGu68p/X2wwuN9jRMu0Im8IYiMbnoEnbQxGdre5VN20mqT4/19RzXdYqT9OahKQXA7sA\n15nZE5n1h5rZ5V3apwlI9ovwLuCf+za61snHetId+U5CbeKpzTD/NzBpZzhmEuy9XyMXY01Loy/5\ncnRm37/bCW0mZ80GDvIaq9MxTSzMfMJo6IsIs6/8aWbbzVVbwHatYR+1NP0ixOPiVJ5aPaPmz2vq\nRpi+eXR5xqaUYjNlru2RlHR5SjMVKTub1SQ+CLzCzJ6I8ZX+XdIcMzuzTCM10bAWX4S25WvyY6fB\n5N1h0r19lDf0dD8nyLaz4cz9Rv3+5yTVQ7D2/sXeTQdVradGVZ01nBJoYmHuqFvenlD4fYkQhrty\nK9iONaxA00jdciZ+z+R7gg+7v7WJek0ppH5ramNOkFxd+X7//jzLdu4VJdRY6aCdJG+fMrSk/E4N\nqqaoy1rlaVaTeFjSXDO7JR7pCUlvAf4F2KdTozSMjP96/QTwV3hcnUGkvifSP28A7ocFM2HzvbB+\nkSXwlWyhBnRq3ax0hXW1W+MK+aedBtPnhm6+o/t4HKkBp4mF2RXYOWe9gNdWbQHbtYbV6sv7ep1X\nt7wlQugj4be3U/ToXSnxC3vaSpj+dIrtS91eZ5uz08VzzbO8fQZxTo1hSUXKzoY1CTO7H0DSvmZ2\na2a9Af9Zop0aUu4hjNwFmG+wYR/YYWv4B4DZMP8Hko6wBL5KJxJWwlgBG+P3Xzw5hVhO9fn7GWdq\n9FwXN9ijujk1nBJoYmF2in9vzqw7vWrL16k1rEDTSOZ33VfdjgZvN5hh8DKDF1mjr7BeaUolpaip\nqK4yv5AZ68tflFcLqNfEmHauqXU1mmlN594upidXQ31vr6dH5/m+3GAn63QcyER+p1LUFHVZyzxN\ndl4K/BR4FPgIcABwW9UX1emFVvVSMMY1MWPlWLfSwvgP/LZGRqLUgVApvqgpaiqqq1FB2sG56o+z\nafTd2PIuXNn6w+PyOvdlbftCC11123Nj5hXs4w3j8Rbdbl2da6K/UylqirqsZZ4WBxBhVNj7gTOA\n3wE/Sa1GkaKRiLqajLDN/rNdbjDd6kbwxn+4tHzdnvKecXdfyMXarGY9kj1+45HgW/LHyK/TVpbZ\n+6pR7amfNQVP5aVuaxLXAt8E7gNeDkwGbga2xRuuC+pq7I4YNSAHROMw02Dr38WG6/V5X5JVX4+n\nfr4nMzaNrSUsbPKRUdtn3rgv+bIbjcuqPXlKIxUpOxsOBDKz1wCfBYzQX/NK4AXAF4GdG+3nBCSN\nNNtuIXT0NSF8wj8Qhp9MnQ5rl8Gkn8LefddUBSlqgnJ0ZcN2Nw8bv/aM0Jh7PiHNfwrWnQQLHg0R\nZ79FeEc+kgkhPm6fDXDz6hCK/MxJYcDf9KWweXa315ElvrdHwoLlIa0/2RLrXJHiO5WipsIUsDS3\nZH8DrwQWVm0B27WGFWgaoWEjX616PnPj+K+8mRvH71fO1xoJ+kVT1FSGrnafIYX9/1nXTq2dq9mk\nQtn2gvK//FN8fq6pLV3WMk+Bg0zJ/D6p6ovq9EIr1NakT/3MZ/KMxNj93MebeipWwLfv5mlgaHJ7\nPjU7p79LnhqlUozEIKSUjcSoxrx/4K1/HRqos43VnFe1Vk/tPNdGXUXLaQuoL+CLtXN5e4GnYqlI\n2ZlUcLKJRDEf5PQ7YP0S+OQfQlq/xMw+0OK4Bf3cnWrqLylqgnZ0NZp6NK+dYW2hAWSNpiAFN
jTb\nz8a3F/Q8NHg7z6+bd7dXmvpFipoKU7EVOxRYRRh+fHyDPGfF7bcCL+/UGlZwbSN1yyWFg+gm1EJ6\nftEUNbWjq/WXfXtunmbPmNx2rmprC8XvU/90p/hOpagp6rKWeQocZFx3V+CAEsRtBawG5gDbEBrF\nX1yX5zDg0vj71cBPO73QFFInhcbY/T0GTmqp7MKvyDPu9j2q5j41v65BvKaJkIqUnUXmuP4yYZxE\nlq/krGuX/YHVZrYGQNKFwBGEwXs1DicGODKz6yTNkLSTmT3U5bkrwXoyS5pTJdZGLKiy5lSYaO+R\nz/ueOE0szB8DC4FfAQvi74WEeRtvLcGCvQM4N7P8XuDLdXmWAa/JLP+QMBFS29awAgs90mb+IrH7\n3d00oLqKPrtm+VK8V0U1Nb+u0gf8Dex9qkCXtcrTrCaxLbADwS20Q2b9+ljAd4sVzKci+0laAqyJ\ni+sI4ztWxG0jAH1engsUzX8cTP1ciCgK8NHXSTrJzL5Ylz9+tR57asj3xIlxXSF9mXtVxf0YtOV2\nnl+B5e1PHW3gXkEYGHfuQmDM84vP82Q49l2w7WMxWuqG7DOs6H68CmYdFAbnPbU9bLMenjgxq63F\n/htio/pC2DgTnviubakpbJw51oFwV1zX2fUCcyVV/f70+H3qbDn+PjrepzUUoYClmdMjCzYPuDyz\nfAJ1jdeE4aZ/llleRYxO2641TDl5W8PET4P8jMkNJjg2TEi5x/euu318ttYqT5E2icmSziU0MNfy\nm5m9oZAVasyNwJ5x/uwHgHcD76nLczFwLHChpHnAOhvA9gif39cpa06FfrxLBeamIPxrnlXKHBVW\nwhwfTg8pYGluI4QKfzUhJMcryWkX6NCKvQm4m9DL6YS47kPAhzJ5vhK33wrs16k1rMBCj8S/Lb+S\niuQpU1NKKUVNvdJF173bOK7X70n+u5g3N8Xb4t/tb6j6WQ3CO5WipqjLWuYpcJCVVV9IGRdagaaR\n8LeYm6HdAqSTAifFFzVFTanqgu1v6LXLqkFE2pVN3E3HVX1fBuPZpacp6rJWeYq4m5ZJ+hjwfTIj\nPs1sbYF9hxYbbURryNhqPWfEUbUt6bTLYBFN/SZFTZCqrm0fq+a8kx6Fdd+BT/4FMAl+/xCcd0eq\nbqEUn12KmgpTwNKsAX5Zn6q2gO1awxLP1cEXf5FpIJt1iSw/eJynwUtF35kenOO8nBhji/L39QFx\ng5SKlJ2Vi+zXhZZ0nsL/pIybarL9gr6xgenMSJBglTdFTf3Q1UmBypawHL0tiOvPESbCGve+PZK9\nT/0wYKk8u4miKeqyVnlaupskTSMMptvNzI6RtCfwQjP7j1b7TjzG9fIo1LvDio+Q3S8Edav1Wml0\nvnJ6yjjV0M0I4zbepY6pP4dUZN6izv43nPQp0iZxHrASeE1cfgD4d2AIjURxrKUPclxBDxwzG/Y+\nqFZowKxGx+6oy2BrTf0nRU3Qa10df2z0UFMz1i6G+Z8fXZ4PPPOYpEOKvHfQ327gKb5TKWoqTIHq\nyMr49+bMuq7DcvS7ylTSeUoO5ralWv9I3pzWY883du7iqu+5p27eo0YzyKXrzwcWhbnXZ8R3cVzI\nkGbhRJJxRXka91ytZZ4CB7mW8LV7c1zeA7i+6otr90JLPFchnzBt+CBbh5uethKmb+r2n6wdTX28\nn8lp6rWu8YXm1KfHz1qY24mh0nuV/56OjpMI1zVjZfjombblY6bfHS2qvk+DoinqslZ5iribTgEu\nB54r6QLgAEZjfwwd1hOfcM31dPsU+C9g1WZYt6J2Pmn2Qlg8yf29EwMb5y7cdjacuV+z5xvcNduf\nKs1+LO1R+5tfDGdOAWbD/KXN3KbOgFDQ2uwIvCWmHau2fp1Yw9QTYe7i3NqCd3md2KnYXAvF3TX0\naGBmKx3N59ieuO6mdu93SqlI2Vn0QM8h1CAOBF4HvK7qi
2v3QlNPPnfx8KZOC99OjjWaZ0uhtqgs\nA1T2LH2DkAb9f7MUIwGcThhQdylhfodlwLKqL67dC61A00iD9bn/LP2YuauRphTvU9Wp37qKFb5X\nZ9+NR/Leg/ZrJTM2dVNLZdx4oK7mOynFkPTz2bURdqev71Mb98pa5SnSJnEkYVxE00nYndY07x/f\nfOyDTbDZyJyxNH++tXfjI1PgXuBTwAdmw7lL25/Brb777TmTulM+inURzdVnp0uYApbmMmCHqi1e\nt9YwheTz/HrqNLFl5PM8g8sbfrXStutqoZXRc67D68mO6h7IdrdW9zv1VKTsLFKTeAq4RdJVjAb4\nMzOb372JcmDzbGm7e2Dq7jDzaVj7BTM7rWpVTnWEr+oZp8Gk3WHDvfD7RRZ6ud0EHz4ohP7Kx1p+\nzdfXWM99CtafCgtG8vOXT16tATbc1XyvNGl9vycABSzN0TEdFdPRwFFVW8B2rWEFmkZy1tV/dWyG\nbTYVCZ7WK01VpxQ1VakrviOZMRM7WhhHwSGUNJ8EJdZYW92nvHMVC0c+fVN2rMUgPLtB0xR1Wcs8\nBQ80Gdg7pm2qvrBOLjSVl4ItA45mbApV/edao+Bp/dKU4n2qOlVnJPIK0HkWC9mRMgv4Xt+nnI+i\npgEqx/9vFDOE9fckxXcqRU1Rl7XMU+TiCK1lP4lpDXBg1RfX7oWmlMb+k+yRYyRmbkylEPBU5bth\nY4xE1dqa6y4e6Xh8bWn60632aXKutrrwehr33KxVniJtEouBg83sbgBJewEXAvsV2NdpyQcIAdNq\nzAeO2SYb6M8mmo/TacLaM2D+6wi1d0JPpic3wJPJRvlt1DOp+UjrZ4BzMr87Ptcb4RiPRtBLClia\n24qsS90aVqBppMm2umr45E0w85mQxgf6a3wMn750IuqiYQyk9sbe9EZXfW2BkXZHWnczcLRxTSu7\n7PNut6HLWuUpUpNYKenrwLcAAX8B3Ni+OXJq2LgeERvOMHv6ijCXxN4H5e1TF2p5BUw/0fuUT0ys\njTEx/Rpf0Li2QMPxU+Pf89DzJ8Qia2+f5upWbYbz43iP+U/BE98tfGFOawpYmu2AhYQ5rr8PfBKY\nXLUFbNcaDkJi/FfUJmBR/vpsjWOhha9Ob8cYttSv8QVlho1pN3+BfRel1Jg/SKlI2Vn0QJOBfYF9\ngG2rvrBOLnRQUnjhZ2wKVeiF8Z9gxsrGVezLLXST9Ia7YUwpGImwvTdBBcveN6VzpJBKMRLAm4H7\ngR/HdD9wWNUX1+6FVqBppLP9cv8Zc+YYrsXcmWdtBH/rSFOK92mi68orpPI0dfNV3r6evPaF9u5T\nnwr4rp5dL+5p1e9TE13WKk/R3k2vN7PVAJL2IAT7u7TAvrlImgV8B9id0KX2XWa2LiffGmA9sAn4\ng5nt3+k5B5sN98L8qYyN61QbJbsfUGQSYmdAaMf/b30a8dvoPJJGih5jcOIz+XzdYyhgaW6oW1b9\nug6s198Dn46/jwe+0CDfL4FZZVjDQUk0/mJrED12sGPHeMp7B8p3ITV6fwb9uoZZZznXirXKU7R3\n06VArcfAO4EbJb0tnuH7HdimwwlzUwCcD6wA/qZBXnVw/IFktAfT738Nn9wJtBHWL7bRL61xXzI2\nDLFjnK4YnC/4VGgekXnoKGBplsR0XkzZ3+d1aL0ey/xWdrku3y+Amwldbo/pxhpWYKFH2swfawQL\ne9YQ3a4GL4cWAAASbklEQVSmFO/TMOiiJP//6PF692XcjqZG15Xis6PkmlfC77m1ytOyJmFmR7dl\ndSKSlgM752z6TN3xTZI1OMwBZvagpGcByyWtMrNrGpxvCaF9A2AdcIuZrYjbRuK5+rk8l1BDKph/\n+1PhrClwMfBXhOaaEYApcOypkjZ0qy9zr6q4H4O23ObzK28Z2ADrT4YFcczM+uVk2iPaP97GmZAN\nsnpXXNfZ8bpYjrXeY
08NZ37iRMu0a5R4/+ZKanf/V8GseL/XLgduMHv04Np2SSOD+j5ll+Pvo+N9\nWkMBFK1J4wzS84G/BubAFqNiZnZ4kRM0OOYqgmX9jaRnA1eb2Yta7HMy8ISZjav2STIzG2i3VBhI\nt/igYCQOZ7TR7HxgwfLaC+s47TLqbjor6z5J1t1UN3C0j6HLB+P+lEmRsrNIm8RFwNcJ05Zujuua\nW5bWXEwoBU+Pfy+qzyBpKrCVmT0uaRpwMPC3XZ43YWp+0GOmhHg9NYbcH+p0jQ1Qu1U17Sfem6kp\nBXxW15fl/8occxbwQ+DnwJXAjLh+F+CS+Pv5wC0x/Qw4ocnxrGyNJVzjSAf7RD/o5Htg5vo4j3Fp\nc0t0oinF+zSsuoZBUxntJ+1q6kdvphSfXdRlrfIUqUl8WdIpBKu6xS9qZje1Z45GMbO1wJ/krH+A\nMHgPM/sFwY83FIxWsTfMhsm7wpdiFND5J0paaYl++TnO4OO9mZpRpE3iC8D7gNWMupsws9f3Vlpx\nBr1NYqxP9Bzgw3ibhDOMVNU+0O92kFQoq03incDzzGxjObKc8WR9ohdXLcZxKsMqaj+xNiLvDhuT\nCuS5HZjZMpczhnbCFYzlg4SG6/Njmr85hAavUlPvSFETpKlrWDSZ2RVmjx4cUvsGYljuU78oUpOY\nCaySdAOjbRJmXXSBdeqp94k++Qf4+Fbw4klh1q1zvV3CcZxKKNImMRJ/1jKKYCR+3ENdbTHobRJQ\n7xPdMBvO3s/bJZx2GFa/eh5+L4pRSpuEhVF6OwOvIhiK683s4ZI0OpGsTzQMrHOc4nh8plH8XpRL\nyzYJSe8CriM0YL8LuF7SO3stbNDpzge59ozQq2NLu8RTYV2VmnpDipogTV3NNc1aGHoEHUVIZ00Z\n/ZKuSlNV1ELc9PdeNCPN+1SMIm0SJwKvqtUeFOIoXQX8Wy+FDTNV9fBwHMepp0ibxO3APhYzSpoE\n3Gpme/dBXyEmQptEJ7jf1akxzPGH6vF7UZwiZWcRI/FFwvzWFxAard8N3GZmny5LaLcMo5HwfwSn\nHv9oGMXvRTFKMRLxQG8HDoiL15jZ0hL0lUaKRiIbWrg3x69FjS3eA6rXmjohRU2Qpi7XlHv+ccag\nak15pKgJuuzdJGlPYCcz+08z+x7wvbj+tZL2MLP/KVeu4zhOcRr1YiJnLnCncxrWJCRdQoi8elvd\n+n2Az5vZW/ugrxAp1iTapd3qsbubnGGnk9q0M5Zux0nsVG8gAMzsNknP61qdswVJi2DG52CvScGr\nd27Lft3eA8pxnH7QbJzEjCbbtitbyESjaL/oWCP4HJw5KUR//RZh4qHW/brbjXGTYl/tFDVBmrpc\nUz3544n8PpVLs5rEjZI+aGZfy66UdAywsreyholZC2HxpMysWIRw4Y7jNKNRbXqQC+QUadYmsTOw\nFNjIqFF4BTAZONLMHuyLwgIMcptEvl/1E5th3WHtuI+8y5/jOO3SdRdYSQJeD7yMELfpDjP7Uakq\nS2CwjcS4BujNsP4kMzuti2N4I7bjOC0pbZxE6qRoJNrpF91tLaBoL48U+2qnqAnS1OWaiuGailPW\nzHROj/FZsRzHSRWvSUwA3N3kOE4nuLtpiPCGa8dx2qVI2VlkjmunA/rdDa/ImIkUuwamqAnS1OWa\niuGayqUSIyHpnZLukLRJ0n5N8h0qaZWkeyQd30+NjuM4TkXuJkkvAjYDXwUWmtlNOXm2Au4G/gT4\nNXAD8B4zuysn79C7mxzHcdol2d5NZrYKIAzDaMj+wGozWxPzXggcAYwzEo7jOE5vSLlN4jnA/Znl\nX8V1A0GKPkjXVJwUdbmmYrimculZTULScmDnnE2LzGxZgUO05QeTtARYExfXAbfUBq/UHlCfl+cC\nVZ5/3HLmXiWhJ/Flf34DugzMlZSMnpTep/j76Hif1lCASrvASrqaxm0S84BTzOzQuHw
CsNnMTs/J\n620SjuM4bTIoXWAbCbwR2FPSHEnbEubWvrh/shzHcZyqusAeKel+YB5wiaTL4vpdFGbEw8yeAY4l\nhKu4E/hOXs+mVOnEBynpEGn2lSHpkBQ09ZoUNUGaulxTMVxTuVTVu2kpIQx5/foHgDdnli8DLuuj\ntMpQg/l6feS04zhV4mE5EsHn63Ucp98MSpuE4ziOkyhuJHpE+z7I/Pl6q9XUe1LUBGnqck3FcE3l\n4vNJJII1mK+3WlWOM3zIIyqPwdskEsBfSsdJAw3Z3CxFyk6vSVSM92pynJSYtTD8L9Y6kDAl1u6H\n9v/R2yR6RHEf5KyF4avlKEI6a8poraIqTf0jRU2Qpi7XVAzXVC5ek3Acx9nC2jNg/muBrLup1A4k\ng4a3SVTMsPlAHSd1hqmNsEjZ6UYiAYbppXQcJx18MF2FtOODtALzU/dbU79IUROkqcs1FcM1lYsb\nCcdxHKch7m5yHMcZUtzd5DiO43SFG4ke0QsfZLfzTaToF01RE6SpyzUVwzWVi4+TGBB8ZLbjOFXg\nbRIDgs834Tjd493Nx+KxmxzHcSJeG+8Mb5PoEeX7ILufbyJFv2iKmiBNXa6pGI019S9OWnFN6eM1\niQHB55twHKcKvE3CcZyhwOOkjcdjNzmO42Twhuux+GC6CknRB+maipOiLtdUjGaa+hUnrR1NqVOJ\nkZD0Tkl3SNokab8m+dZIuk3SzZKu76dGx3EcpyJ3k6QXAZuBrwILzeymBvl+CbzCzNa2OJ67mxzH\ncdok2XESZrYKQCpUrnvh7ziOUxGpt0kY8ENJN0o6pmox7ZCiD9I1FSdFXa6pGK6pXHpWk5C0HNg5\nZ9MiM1tW8DAHmNmDkp4FLJe0ysyuaXC+JcCauLgOuMXMVsRtIwB9Xp4LVHn+ccuZe5WEnsSX/fkN\n6DIwV1IyelJ6n+Lvo+N9WkMBKu0CK+lqmrRJ1OU9GXjCzMaNMvY2CcdxnPYZlC6wuQIlTZW0Q/w9\nDTgYuL2fwhzHcYadqrrAHinpfmAecImky+L6XSRdErPtDFwj6RbgOuA/zOzKKvR2Qoo+SNdUnBR1\nuaZiuKZyqap301Jgac76B4A3x9+/IPjxHMdxnIrwsByO4zhDyqC0STiO4ziJ4kaiR6Tog3RNxUlR\nl2sqhmsqFzcSjuM4TkO8TcJxHGdI8TYJx3EcpyvcSPSIFH2Qrqk4KepyTcVwTeXiRsJxHMdpiLdJ\nOI7jDCneJuE4juN0hRuJHpGiD9I1FSdFXa6pGK6pXNxIOI7jOA3xNgnHcZwhxdskHMdxnK5wI9Ej\nUvRBuqbipKjLNRXDNZWLGwnHcRynId4m4TiOM6R4m4TjOI7TFW4kekSKPkjXVJwUdbmmYrimcnEj\n4TiO4zTE2yQcx3GGFG+TcBzHcbqiEiMh6YuS7pJ0q6TvS/qjBvkOlbRK0j2Sju+3zm5I0QfpmoqT\noi7XVAzXVC5V1SSuBF5qZvsCPwdOqM8gaSvgK8ChwEuA90h6cV9VdsfcqgXk4JqKk6Iu11QM11Qi\nlRgJM1tuZpvj4nXAc3Oy7Q+sNrM1ZvYH4ELgiH5pLIEZVQvIwTUVJ0VdrqkYrqlEUmiT+Evg0pz1\nzwHuzyz/Kq5zHMdx+sTWvTqwpOXAzjmbFpnZspjnM8BGM7sgJ9+gd7uaU7WAHOZULSCHOVULaMCc\nqgXkMKdqATnMqVpADnOqFpDDnKoFdEplXWAlHQ0cA7zRzJ7O2T4POMXMDo3LJwCbzez0nLyDblAc\nx3EqoVUX2J7VJJoh6VDgOODAPAMRuRHYU9Ic4AHg3cB78jL6GAnHcZzeUFWbxJeB7YHlkm6W9E8A\nknaRdAmAmT0DHAtcAdwJfMfM7qpIr+M4zlAyIUZcO47jOL0hhd5NpSFpoaTNkmZVrQVA0ufigMFb\nJF0ladcENBUayNhnTe+UdIekTZL2q1hLcgM4JX1
D0kOSbq9aSw1Ju0q6Oj63n0man4Cm7SRdF//f\n7pT0d1VrqiFpq+g1WVa1FgBJayTdFjVd3yzvhDESsQA+CLi3ai0Z/t7M9jWzucBFwMlVC6LAQMYK\nuB04EvhJlSISHsB5HkFTSvwB+KSZvRSYB3ys6nsV2zdfH//f9gFeL+m1VWrK8HGC2zwV140BI2b2\ncjPbv1nGCWMkgMXAp6sWkcXMHs8sbg88UpWWGgUHMvYVM1tlZj+vWgeJDuA0s2uAx6rWkcXMfmNm\nt8TfTwB3AbtUqwrM7Mn4c1tgK2BthXIAkPRc4DDg60BKnWwKaZkQRkLSEcCvzOy2qrXUI+nzku4D\njgK+ULWeOhoNZBxWfABnB8QeiC8nfHRUiqRJkm4BHgKuNrM7q9YEfInQm3Nzq4x9xIAfSrpR0jHN\nMlbSBbYTmgzO+wzBZXJwNntfRNF60KCZfQb4jKS/IbwsH6haU8zTbCBjJZoSIBVXwMAgaXvg34GP\nxxpFpcRa8tzY1naFpBEzW1GVHklvAR42s5sTC/J3gJk9KOlZhF6mq2KNdRwDYyTM7KC89ZJeBjwP\nuFUSBPfJSkn7m9nDVenK4QL69NXeSlMcyHgY8MZ+6IG27lOV/BrIdi7YlVCbcHKQtA3wPeBbZnZR\n1XqymNnvYnf6VwIrKpTyGuBwSYcB2wHTJX3TzN5foSbM7MH497eSlhJcrblGYuDdTWb2MzPbycye\nZ2bPI/xT79cPA9EKSXtmFo8Abq5KS43MQMYjmgxkrJIqfbZbBnBK2pYwgPPiCvUki8IX2b8Ad5rZ\nmVXrAZC0o6QZ8fcUQkeWSv/nzGyRme0ay6Y/A35UtYGQNFXSDvH3NIIXpmHPuYE3Ejmk5DL4O0m3\nRx/pCLCwYj3QYCBjlUg6UtL9hF4yl0i6rAodqQ7glPRt4FpgL0n3S+q5y7IABwDvJfQgujmmqntg\nPRv4Ufx/uw5YZmZXVaypnhTKp52AazL36T/M7MpGmX0wneM4jtOQiViTcBzHcUrCjYTjOI7TEDcS\njuM4TkPcSDiO4zgNcSPhOI7jNMSNhOM4jtMQNxLOhELSzpIulLQ6xqW5pG5Q48Ah6UBJf1wg37Xx\n71JJO2XWJxdq3Bkc3Eg4E4Y4CngpYVTrC8zslYS4Xjs13zN5Xk8I79AQSS8AVsd78GwzeyizOcVQ\n486A4EbCmUi8nhC08Gu1FWZ2m5n9J2yZcOn2ONnKu+K6EUk/lnSRpP+R9AVJ75N0fcz3/JhviaRz\nJN0g6W5Jb47rt5N0Xsx7Uy2Im6SjFSZ1ukzSzyWdXtMk6WBJ10paKem7MTRCbSKYU+L62yS9MEZY\n/RDwyTiqecz8CJKmxJGzVxFG9d9JCC1ys6R94z1ILtS4MzgMTIA/xynAy4CVeRskvR3YlzAZzbOA\nGyTVJjnaB3gRoSD9JXCume2vMNvaXwOfjPl2M7NXxa/2q+PfjwGbzGwfSS8ErpS0V8y/LzAX2Ajc\nLeksYAMhcvEbzewphdnvFgCfI4Rs+K2ZvULSR4BPmdkxks4BHjezxfXXZWZPEaKefoUQS2lvYJqZ\n/XMnN9Bx6vGahDORaBZj5gDgAgs8DPwYeFXc5wYze8jMNgKrCbGbAH4GzMkc+7sAZrYa+AXBsBwA\nfCuuv5swM+JeMf9VZva4mW0gfOHPIcSneglwraSbgfcDu2V0fj/+vSlzbmgd+HDveI59geTmVXEG\nF69JOBOJO4B3NNleX9DWjMqGzLrNmeXNNP8fqe3fqADPHndT5ljLzezPW+yTzd8QSScBbwf2AH4K\nPB84SNJlZpbEHN3OYOM1CWfCYGY/AiYrM9OWpH2iH/8a4N0KM5c9C3gdcD3FQ5MLeKcCexAK41Xx\nuH8Rz7UXoVawqsFxjVCQHxCPgaRpBXpfPQ7s0OCaPwf8H+AbwKuBW81sHzcQTlm4kXAmGkcCfxK7\nwP4M+DzwoJk
tJbhhbiU08h4X3U5GYzdVdpsB9xEMy6XAh6J76p+ASZJuI8yJfVScHzv3uGb2CHA0\n8G1JtxJCgL+wxbmXAUfGxugDcvIeSDBW+wP/Xb8x0VDjzoDgocIdpwCSziPMT/D9lpkdZwLhNQnH\ncRynIV6TcBzHcRriNQnHcRynIW4kHMdxnIa4kXAcx3Ea4kbCcRzHaYgbCcdxHKchbiQcx3Gchvx/\ncMoIdtMS+BwAAAAASUVORK5CYII=\n", "text/plain": "
"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 7, "cell_type": "code", "source": "#
\n# eigendecomposition of the covariance matrix of the projected data; it is diagonal, so the eigenvectors are the unit axes and the eigenvalues are the variances along each component\ncov_mat = np.cov(X_pca.T)\neig_vals, eig_vecs = np.linalg.eig(cov_mat)\nprint('Eigenvectors \\n%s' %eig_vecs)\nprint('\\nEigenvalues \\n%s' %eig_vals)", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Eigenvectors \n[[ 1.00000000e+00 2.27318015e-17]\n [ 0.00000000e+00 1.00000000e+00]]\n\nEigenvalues \n[ 4.22484077 0.24224357]\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 8, "cell_type": "code", "source": "#
\n# prints the fraction of overall variance explained by each component\nprint(pca.explained_variance_ratio_)", "outputs": [{"output_type": "stream", "name": "stdout", "text": "[ 0.92461621 0.05301557]\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## K-Means Clustering Plots\n\nThis recipe runs [K-means clustering](http://en.wikipedia.org/wiki/K-means_clustering) once for each `k` from 1 to `n - 1`. It prints and plots the within-cluster sum of squared errors (the inertia) for each `k`. The point where adding clusters stops reducing the inertia sharply (the elbow of the curve) indicates a value of `k` that may suit the dataset.\n\nThis recipe defaults to using the [Iris data set](http://en.wikipedia.org/wiki/Iris_flower_data_set). To use your own data, set `X` to your instance feature vectors; the clustering itself does not use `y`. To change the range of cluster counts tried, modify `n`.", "cell_type": "markdown", "metadata": {}}, {"execution_count": 9, "cell_type": "code", "source": "#
\nimport warnings\nwarnings.filterwarnings('ignore') # silence library warnings in the notebook output\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport sklearn.datasets\nfrom sklearn.cluster import KMeans\n\n# load the iris dataset\ndataset = sklearn.datasets.load_iris()\n# define feature vectors (X) and target (y)\nX = dataset.data\ny = dataset.target\n\n# k runs from 1 to n - 1, so n must be >= 2\nn = 6\n# index 0 holds a NaN placeholder so that list position i lines up with k = i on the plot\ninertia = [np.nan] # np.NaN was removed in NumPy 2.0\n\n# perform k-means clustering for k = 1 ... n - 1\nfor k in range(1, n):\n    k_means_ = KMeans(n_clusters=k, n_init=10)\n    k_means_.fit(X)\n    print('k = %d, inertia= %f' % (k, k_means_.inertia_))\n    inertia.append(k_means_.inertia_)\n\n# plot the within-cluster SSE (inertia) for each value of k\nax = plt.subplot(111)\nax.plot(inertia, '-o')\nplt.xticks(range(n))\nplt.title(\"Inertia\")\nax.set_ylabel('Inertia')\nax.set_xlabel('# Clusters')\nplt.show()", "outputs": [{"output_type": "stream", "name": "stdout", "text": "k = 1, inertia= 680.824400\nk = 2, inertia= 152.368706\nk = 3, inertia= 78.940841\nk = 4, inertia= 57.317873\nk = 5, inertia= 46.535582\n"}, {"output_type": "display_data", "data": {"image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYQAAAEZCAYAAACXRVJOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm0XXV99/H3hyEyJBCmJmRijhJBBZQ6lmgBgSpg7UKw\nVaqAtbTAaqmVuPqUtD51wD6K2GqbgIID0WgLghAgDLcFFaMSZIgBAgaTQG5AZIhMCfk8f+x9ycnN\nnXPP2Wf4vNa66+79O3vv872XcL73N8s2ERERW1UdQERENIckhIiIAJIQIiKilIQQERFAEkJERJSS\nECIiAkhCiGgoSV+R9A9VxxHRF2UeQnQ6ScuB02zfNMrP/fPyuW8bzedG1EtqCBHg8mvUSNpmNJ8X\n0QhJCBEFSfpzSbdJ+pykJyQ9JOmYmgt2lnSJpEckrZT0SUlbla/9uaQfSvq8pMeBbwNfAd4k6RlJ\nT5TXXSrpk+XxLpJ+IGlN+X5XS5pcwc8eASQhRPToqSEcDiwFdgMuAC6pueZS4EVgP+AQ4Gjg9JrX\nDwceBH4P+DPgo8CPbY+zvWvN+/S8l8rnTyu/ngP+bTR/qIjhSEKI2NTDti9x0bn2dWBPSb8naQJw\nLPA3tp+z/RhwIXByzb2P2P532xtsP0/xgd8XAdh+wvYVtp+3vRb4FHBE3X6yiEGknTNiU6t7Dmw/\nKwlgLLA7sC3waFkGxR9Uv665d8Vw3kjSDsAXgHcCu5TFYyXJGe0RFUhCiBiaFcALwG62N/RzTe8P\n8f4+1HvKzwWmA4fbXiPpdcAdFDWIJIRouDQZRRT6a94BwPajwA3A5yWNk7SVpP0k/cEAt60Gpkja\nttf79LzXWIp+g6ck7QqcP/LwI7ZcEkJEwfQ9/LT2/IPAGGAJ8ATwXWBir/tr3QzcC6yWtKaP6y4E\ntgceB34ELOjjGRENU9eJaZJeSTH8rse+wP8Bvgl8B9gLWA6cZPvJ8p5ZwIeBl4Czbd9QtwAjIuJl\nDZupXI7XXkUxNO8s4HHbF0j6OLCL7fMkzQAuB94ATAZuBKYP0GYbERGjpJFNRkcCy2yvAI4HLivL\nLwNOLI9PAObZXmd7ObCMIoFERESdNTIhnAzMK48n2O4uj7uBCeXxJGBlzT0rKWoKERFRZw1JCJLG\nAO+m6ITbRDneeqB2q3SyRUQ0QKPmIRwL/Lyc3QnQLWmi7dWS9gR6RmCsAqbW3DelLHuZpCSIiIgR\nsD3g8OpGJYRT2NhcBHAVcCrw2fL7lTXll0v6PEVT0QHAot4PG+yH6hSSZtueXXUczSC/i43yu9go\nv4uNhvLHdN0TgqQdKTqUz6gp/gwwX9JplMNOAWwvkTSfYpz3euDMTOGPiGiMuicE27+jWAemtuwJ\niiTR1/WfoljkK/ohHXQcTD0b9n+VdOwbYcVF9j3XVh1XRLS2rGXUYopk8KYvwtz9oQuYuRecsZ90\nEB2eFLqqDqCJdFUdQBPpqjqAVtJyW2iWC0F2bB+CdOx1sOCdm79y3HX2tcc2PqKIaAVD+ezMWkYt\nZ9x2fZeP3b6xcUREu0lCaDnPPN93+drnGhtHRLSbJISWs+IiOGPZpmWnPwi//lI18UREu0incoux\n77lWOgg47iwYtwPs+0bY8JkO71COiFGQTuUWJ3EBYJuPVx1LRDSvoXx2JiG0OInpwK3AVJsXq44n\nIppTRhl1AJv7KWZ2n1B1LBHR2pIQ2sMc4CNVBxERrS1NRm1AYjtgBfD7Ng9VHU9ENJ80GXUIm+eB\nbwCnVx1LRLSu1BDahMSBwM3ANJt1VccTEc0lNYQOYvNL4AHgXVXHEhGtKQmhvaRzOSJGLE1GbURi\ne4rO5dfbLK84nIhoImky6jA2zwHfAk6rOpaIaD2pIbQZiYOA64G9bNZXHU9ENIfUEDqQzT0U+1Qf\nV3EoEdFikhDa01zSuRwRw5QmozYksQNF5/LrbFZUHU9EVC9NR
h3K5llgHvDhqmOJiNaRGkKbkngt\n8ANgb5uXqo4nIqrVFDUESeMlfU/SLyUtkfT7knaVtFDS/ZJukDS+5vpZkh6QtFTS0fWOr13Z/AJ4\nBDim6lgiojU0osnoi8C1tg8EXgMsBc4DFtqeDtxUniNpBvA+YAbFB9mXJaVZa+TmAGdUHUREtIa6\nNhlJ2hlYbHvfXuVLgSNsd0uaCHTZfpWkWcAG258tr7sOmG379pp702Q0RBJjgV8DB9usqjqeiKhO\nMzQZ7QM8Julrku6QNFfSjsAE293lNd3AhPJ4ErCy5v6VwOQ6x9i2bNYC84EPVR1LRDS/bRrw/EOB\nv7b9U0kXUjYP9bBtSQNVUzZ7TdLsmtMu212jEGu7mgP8t8SnbDZUHUxENIakmcDM4dxT74SwElhp\n+6fl+feAWcBqSRNtr5a0J7CmfH0VMLXm/ill2SZsz65fyO3F5g6Jx4GjKJa0iIgOUP6h3NVzLun8\nwe6pa5OR7dXACknTy6IjgXuBq4FTy7JTgSvL46uAkyWNkbQPcACwqJ4xdogsix0Rg6r7PARJrwUu\nBsYAD1K0Z29N0bY9jWLdnZNsP1le/wmKCVXrgXNsX9/reelUHiaJcRSdyzNsHq06nohovKF8dmZi\nWoeQmAs8ZPPpqmOJiMZLQoiXSbwB+DZwQDqXIzpPMww7jebxM+AZ4B1VBxIRzSkJoUPYmHQuR8QA\n0mTUQSR2pujEf6X98lDfiOgAaTKKTdg8BVzBxiG/EREvS0LoPHOBj0iklhURm0hC6Dy3A88zzCnt\nEdH+khA6TE3ncpbFjohNpFO5A0nsAvwK2N/m8arjiYj6S6dy9MnmtxTrRn2w6lgionkkIXSuOaRz\nOSJqJCF0rh8CG4C3Vh1IRDSHJIQOVXYuzyUzlyOilE7lDiaxG8WS5PvaPFF1PBFRP+lUjgHZ/Aa4\nBvhA1bFERPWSEGIOcEY6lyMiCSH+F9gWeFPVgUREtZIQOlw6lyOiRzqVA4k9gPuBfWyerDqeiBh9\n6VSOIbF5DLge+NOqY4mI6iQhRI+5wF+kczmicyUhRI9bgB2Aw6sOJCKqkYQQANhsoKglZFnsiA5V\n94QgabmkuyQtlrSoLNtV0kJJ90u6QdL4mutnSXpA0lJJR9c7vtjEpcB7JXaqOpCIaLxG1BAMzLR9\niO2e5ojzgIW2pwM3ledImgG8D5gBHAN8WVJqMQ1i003x3+P9VccSEY3XqA/b3h2VxwOXlceXASeW\nxycA82yvs70cWEbatBttDpmTENGRGlVDuFHSzyT1tE9PsN1dHncDE8rjScDKmntXApMbEGNsdCOw\ni8RhVQcSEY21TQPe4y22H5W0B7BQ0tLaF21b0kCz4zZ7TdLsmtMu212jEmlgs0HiYopawl9UHU9E\njIykmcDM4dxT94Rg+9Hy+2OSrqBoAuqWNNH2akl7AmvKy1cBU2tun1KW9X7m7PpG3fG+Btwrca7N\n2qqDiYjhK/9Q7uo5l3T+YPfUtclI0g6SxpXHOwJHA3dT7Od7annZqcCV5fFVwMmSxkjaBzgAWFTP\nGGNzNo9Q/EM6ueJQIqKB6l1DmABcIannvb5l+wZJPwPmSzoNWA6cBGB7iaT5wBJgPXCmW22xpfYx\nB/gn4OKqA4mIxsjidtEnia2Bh4ATbRZXHU9EbJksbhcjZvMScAmZuRzRMVJDiH5JTAHuAqba/K7q\neCJi5FJDiC1isxL4IWUfT0S0tySEGExmLkd0iCSEGMwCYKrEwVUHEhH1lYQQA7JZTzqXIzpCOpVj\nUBJ7AXdQdC4/W3U8ETF86VSOUWHzMPAT4E+qjiUi6icJIYZqDmk2imhrSQgxVNcA+0nMqDqQiKiP\nJIQYEpt1FKugppYQ0abSqRxDJrEPxeqzU22erzqeiBi6dCrHqLL5FcVooz+uOpaIGH1JCDFcmbkc\n0aaSEGK4rgZeJfHKqgOJi
", "text/plain": "
"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## SVM Classifier Hyperparameter Tuning with Grid Search\n\nThis recipe performs a [grid search](http://en.wikipedia.org/wiki/Hyperparameter_optimization) for the best hyperparameter settings of a [support vector machine](http://en.wikipedia.org/wiki/Support_vector_machine) that predicts the class of each flower in the dataset. It splits the dataset into training and test sets once, then scores every candidate setting with 5-fold cross-validation on the training set.\n\nThis recipe defaults to using the [Iris data set](http://en.wikipedia.org/wiki/Iris_flower_data_set). To use your own data, set `X` to your instance feature vectors, `y` to the instance classes as a factor, and `labels` to human-readable names of the classes. Modify `parameters` to change the grid search space, or change the `scoring='accuracy'` value to optimize a different metric for the classifier (e.g., `'precision_macro'` or `'recall_macro'` for multiclass data).", "cell_type": "markdown", "metadata": {}}, {"execution_count": 10, "cell_type": "code", "source": "#
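# Sketch (illustrative, not part of the original recipe): GridSearchCV fits every
# combination in the parameter grid. ParameterGrid (sklearn.model_selection,
# scikit-learn >= 0.18) enumerates them; demo_grid is a hypothetical copy of the grid used below.
from sklearn.model_selection import ParameterGrid
demo_grid = [
    {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4, 1e-2], 'C': [1, 10, 100, 1000]},
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},
    {'kernel': ['poly'], 'degree': [1, 3, 5], 'C': [1, 10, 100, 1000]},
]
demo_n_candidates = len(ParameterGrid(demo_grid))  # 12 rbf + 4 linear + 12 poly = 28
# with cv=5 each candidate is fitted once per fold, plus one refit of the best candidate
demo_n_fits = demo_n_candidates * 5 + 1
print('candidates: {0}, fits: {1}'.format(demo_n_candidates, demo_n_fits))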
\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport sklearn.datasets\nimport sklearn.metrics as metrics\nfrom sklearn.svm import SVC\n# GridSearchCV and train_test_split live in sklearn.model_selection (the old\n# sklearn.grid_search and sklearn.cross_validation modules were removed in 0.20)\nfrom sklearn.model_selection import GridSearchCV, train_test_split\nfrom sklearn.metrics import classification_report\nfrom sklearn.preprocessing import label_binarize\n\n# load the iris dataset\ndataset = sklearn.datasets.load_iris()\n# define feature vectors (X) and target (y)\nX = dataset.data\ny = dataset.target\nlabels = dataset.target_names\n\n# split the data into training and test sets once (no outer folding)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 11, "cell_type": "code", "source": "#
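# Sketch (illustrative, not part of the original recipe): the split above is not
# stratified. Passing stratify= keeps the class proportions of the full dataset in
# both splits; random_state=0 is an arbitrary seed and the demo_ names are hypothetical.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
demo_X, demo_y = load_iris(return_X_y=True)
demo_X_tr, demo_X_te, demo_y_tr, demo_y_te = train_test_split(
    demo_X, demo_y, test_size=0.2, stratify=demo_y, random_state=0)
print(np.bincount(demo_y_te))  # [10 10 10]: 30 test instances, 10 from each iris class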
\n# define the parameter grid: one dictionary of candidate values per SVC kernel\nparameters = [\n {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4, 1e-2], 'C': [1, 10, 100, 1000]},\n {'kernel': ['linear'], 'C': [1, 10, 100, 1000]},\n {'kernel': ['poly'], 'degree': [1, 3, 5], 'C': [1, 10, 100, 1000]}\n]\n\n# search for the parameters that maximize mean cross-validated accuracy\nsvc_clf = SVC(probability=True)\nclf = GridSearchCV(svc_clf, parameters, cv=5, scoring='accuracy') # 5 folds\nclf.fit(X_train, y_train) # fit one model per candidate and fold, then refit the best\nprint(\"Best parameters found by grid search:\")\nprint(clf.best_params_)\nprint(\"Best mean cross-validated accuracy:\")\nprint(clf.best_score_)", "outputs": [{"output_type": "stream", "name": "stdout", "text": "Best parameters found by grid search:\n{'kernel': 'linear', 'C': 1}\nBest mean cross-validated accuracy:\n0.983333333333\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Plot ROC Curves\nThis recipe plots the [receiver operating characteristic (ROC) curve](http://en.wikipedia.org/wiki/Receiver_operating_characteristic) for an [SVM classifier](http://en.wikipedia.org/wiki/Support_vector_machine) trained over the given dataset.\n\nThis recipe defaults to using the [Iris data set](http://en.wikipedia.org/wiki/Iris_flower_data_set), which has three classes. Because an ROC curve is defined for a binary classifier, the recipe uses a [one-vs-the-rest strategy](http://scikit-learn.org/stable/modules/multiclass.html#one-vs-the-rest) to turn the problem into one [binary classification](http://en.wikipedia.org/wiki/Binary_classification) per class and plots one curve for each. To use your own data, set `X` to your instance feature vectors, `y` to the instance classes as a factor (the recipe binarizes them with `label_binarize`, so update its `classes` argument to match), and `labels` to human-readable names of the classes.\n\nNote that the recipe appends random noise features to the iris data to make the ROC plots more realistic. Otherwise, the classification is nearly perfect and the plot is hard to study. **Remove the noise generator if you use your own data!**", "cell_type": "markdown", "metadata": {}}, {"execution_count": 12, "cell_type": "code", "source": "#
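# Sketch (illustrative, not part of the original recipe): an ROC curve sweeps a
# decision threshold over a binary classifier's scores, and metrics.auc integrates
# the curve with the trapezoidal rule. The toy labels and scores are made up.
import numpy as np
from sklearn import metrics
demo_true = np.array([0, 0, 1, 1])             # binary ground truth
demo_scores = np.array([0.1, 0.4, 0.35, 0.8])  # higher score = more confident positive
demo_fpr, demo_tpr, _ = metrics.roc_curve(demo_true, demo_scores)
demo_auc = metrics.auc(demo_fpr, demo_tpr)
print(demo_auc)  # 0.75, the same value metrics.roc_auc_score returns for this data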
\nimport warnings\nwarnings.filterwarnings('ignore') # suppress warnings in the notebook output\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport sklearn.datasets\nimport sklearn.metrics as metrics\nfrom sklearn.svm import SVC\nfrom sklearn.multiclass import OneVsRestClassifier\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.preprocessing import label_binarize\n\n# load the iris dataset\ndataset = sklearn.datasets.load_iris()\nX = dataset.data\n# binarize the labels: one 0/1 indicator column per class (one-vs-rest)\ny = label_binarize(dataset.target, classes=[0, 1, 2])\nlabels = dataset.target_names", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 13, "cell_type": "code", "source": "#
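# Sketch (illustrative, not part of the original recipe): label_binarize turns class
# ids into one indicator column per class. This is the one-vs-rest encoding behind the
# per-class ROC curves. The toy labels are made up.
from sklearn.preprocessing import label_binarize
demo_bin = label_binarize([0, 2, 1, 2], classes=[0, 1, 2])
print(demo_bin)  # column i is 1 exactly where the label equals class i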
\n# append random noise features so the classification (and the plot) is less ideal:\n# 200 * n_features extra columns of standard normal noise, with a fixed seed\n# REMOVE ME if you use your own dataset!\nrandom_state = np.random.RandomState(0)\nn_samples, n_features = X.shape\nX = np.c_[X, random_state.randn(n_samples, 200 * n_features)]", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 14, "cell_type": "code", "source": "#
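# Sketch (illustrative, not part of the original recipe): np.c_ concatenates column-wise,
# so appending 200 * n_features noise columns grows iris from 4 to 804 features.
# Only the first 4 columns carry signal. demo_base is a hypothetical all-zero stand-in with the iris shape.
import numpy as np
demo_rng = np.random.RandomState(0)
demo_base = np.zeros((150, 4))
demo_noisy = np.c_[demo_base, demo_rng.randn(150, 200 * 4)]
print(demo_noisy.shape)  # (150, 804)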
\n# split the data into training and test sets (a single hold-out split)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)\n\n# one-vs-rest: fit one binary linear SVC per class\n# probability=True enables predict_proba; the ROC curves below use decision_function scores instead\nclf = OneVsRestClassifier(SVC(kernel='linear', probability=True))\n\n# fit the estimators and return each test sample's signed distance from every class's decision boundary\ny_score = clf.fit(X_train, y_train).decision_function(X_test)", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 15, "cell_type": "code", "source": "#
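# Sketch (illustrative, not part of the original recipe): OneVsRestClassifier fits one
# binary SVC per indicator column, and decision_function returns one signed distance per
# class for each sample. Here it runs on the plain iris data; the demo_ names are hypothetical.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC
demo_iris_X, demo_iris_y = load_iris(return_X_y=True)
demo_Y = label_binarize(demo_iris_y, classes=[0, 1, 2])
demo_ovr = OneVsRestClassifier(SVC(kernel='linear')).fit(demo_iris_X, demo_Y)
print(len(demo_ovr.estimators_))  # 3 binary classifiers, one per class
print(demo_ovr.decision_function(demo_iris_X[:5]).shape)  # (5, 3)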
\n# plot one ROC curve per class; curves that bend toward the top-left corner indicate better classifiers\nplt.figure(figsize=(10,5))\nplt.plot([0, 1], [0, 1], 'k--') # dashed diagonal: the expected curve of a random classifier\nfor i, label in enumerate(labels):\n # false positive and true positive rates for each class\n fpr, tpr, _ = metrics.roc_curve(y_test[:, i], y_score[:, i])\n # area under the curve (AUC) for each class\n roc_auc = metrics.auc(fpr, tpr)\n plt.plot(fpr, tpr, label='ROC curve of {0} (area = {1:0.2f})'.format(label, roc_auc))\nplt.xlim([0.0, 1.0])\nplt.ylim([0.0, 1.05])\nplt.title('Receiver Operating Characteristic for Iris data set')\nplt.xlabel('False Positive Rate') # 1 - specificity\nplt.ylabel('True Positive Rate') # sensitivity\nplt.legend(loc=\"lower right\")\nplt.show()", "outputs": [{"output_type": "display_data", "data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAmUAAAFRCAYAAAA1jNoBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3XmcHFW9///XO4GAmIUEuIIhrLJGVjEsooRFWRXQHyCg\ngF4FF8CrsokgiYCAXgWEq/J1A1mVHZFNhCiyiCibQNgkJkBAQkJYJUA+vz/OaVLpdM/0zHRPdc+8\nn49HPaZr/1RVT/enzzl1ShGBmZmZmZVrSNkBmJmZmZmTMjMzM7O24KTMzMzMrA04KTMzMzNrA07K\nzMzMzNqAkzIzMzOzNuCkzKwLkv4h6UNlx9EuJH1D0k9L2vfZko4vY9/NJmlfSdf3ct1evScl7S5p\nhqSXJG3Qm333cH8flDS1Cds5QNItzYjJrN05KbOOIWmapFfzl8ozks6VNLKV+4yI90bEn1q5jwpJ\nS0g6SdK/8nE+Iumw/th3nXgmSppRnBYRJ0XE51u0P0k6VNL9kl7OCcRvJL23svs8lErSJEnn9mUb\nEXF+RGzfwL4WSUT78J78X+BLETEiIu7txfrVsU2R9N/15kfELRGxdl/308OY+nxternfAfODwcrl\npMw6SQC7RMQIYANgPeCYckPqOUmL1Zl1MbA1sCMwHPg0cKCk01sQgySp2dvto9OBQ4FDgNHAmsAV\nwE7N3pGkoc3eZjvvO1/rlYAHe7l+re+KuklyF+9xM+tKRHjw0BED8ASwTWH8u8DvCuObAbcBc4B7\ngK0K88YAvwSeAmYDlxfm7ZKXnwPcCqxXmDcN2AZ4N/AqMLowbyPgOWBoHv8s6UtvNnAdsFJh2fnA\nl4BHgcdrHNu2wGvA2KrpE4A3gdXy+BTgJOAvwFxS0jK6wXMwBTghH+OrwOrAZ3LMLwKPAwfmZd+Z\n43kLeCnPXwGYBJybl1klH9d+wL/yuTi6sL93AOfk8/EgcAQwo61XSMf5yZdXP9fAmcCV+d47qic\nlzz/dGB6Pi93AVsW5k0CLgHOzfM/C7wfuD2fq6eBM4DFC+uMB34PPA88A3wD2B54HZiXz8vdedlR\nwM/zdp4EjgeG5HkH5HP+A2BWnncAcEueL+BU4Nkc23153wfm/bye93Vl4T25bX49FDgaeCyfk7uA\nFavO2xLA
y/lavQw8mqevk98Tc4B/AB8trHM28GPgmrzONjWux83AZ/Prifm4jwBm5us+sXi9gSPz\nMi8CU2ttMy+3DHBVPhd/yefrlu6uM7BDnWtT8z1eZ9/vAf4IvEB6P19UmLd24f0wFdgjT695nTx4\n6M1QegAePDQ6kJKyypfRivnL61t5fGz+wtshj2+Xx5fJ478DLiR9eS4GfDBP3yh/Gb6f9OW4X97P\n4oV9bpNf/wH4XCGe7wE/yq93JSVca5FKoL8J3FpYdj5wPbA0sESNYzsZuLnOcU8DPp9fT8lfbOsC\nS5ETjQbPwZS8rXVyjIuRSqFWzfM/BLwCbJTHt6IqiQKOY9Gk7CzSF//6wH+AtYrHlM/52Hy9ptc5\nxi8AT3Rz/c/Ox7MJKRk5D7iwMH9fUgnbEOBrpORgWJ43ifTF+bE8viSwMSnpHQKsTPri/kqePyKv\n/1VgGKnkckLhHPyqKrbLSUnMO4DlSMlEJcE9AHgD+HLe15IsnJRtT0ouRubxtYDl8+tfAt+u8X9Q\neU8ens/rGnl8PWBMnfM3nwXJ/eKkRO6o/D7YmpS0rFk41y8Am+fxWu/Z6qTsDdIPhsXzMU6svH/y\nMU0vHNdKFBLqqu1elId3kJLTJ4E/NXida12buu/xGvu+EPhGfj0M2CK/ficwA9g/73dDUtK2Tr3r\n5MFDbwZXX1onEXCFpBdJH/CPk0p+AD4FXBMR1wFExI2kL7qdJa1A+hX9hYiYGxFvRkSl4fCBwFkR\n8ddIfkX6xbtZjf1fAOwNb1cH7ZWnQUoqToqIhyNiPunLaUNJ4wrrnxQRL0TE6zW2vSypNKaWmXk+\npOqiX0XEgxHxKnAssGeuXqp7Dgrrnh0RD0XE/HweromIJ/LyfwJuAD6Yl69VvVlr2uSIeD0i7gPu\nJVUtA+wBfCef86dIJRz1qkyX6eL4KwK4LCLuioi3gPNJX47k+M+PiDn52H5AShTXKqx/W0RclZf9\nT0T8PSLuzMv/C/h/pEQUUunp0xFxakTMi4iXI+LOwjl4+zgkvYtU5fzViHgtIp4DTgM+Wdj30xHx\nf3lf/6k6rjdISeA6kobk91DxXHRVzfw54JsR8Wg+rvsjYnYXy1dsBrwzIk7O74ObSSWQexeWuSIi\nbs/brfWerTYfOC4i3qhxjG+Rrsd4SYtHxPSI+Gf1BnLV7sdJP7Zei4gHSKVub5+Dbq7zQtcmL9/V\ne7zaPGAVSWPzdb8tT9+F9KPhnLzfe4DLSO/xmvs16w0nZdZJAtg1IkaSfoVvQyo1gVTSsYekOZUB\n+ACwPDAOmB0Rc2tsc2Xg61XrrUiqrqx2GbC5pOVJv7jnR8SfC9s5vbCN5/P0sYX1F2o0X+U5UvVg\nLe8mlRDV2s50UsnEsnR9DmrGIGlHSXdIej4vvxMpQeqJYgLxKqlUqRJ3cX9PdrGN56l//EXPFl6/\nVtgXkg6T9KCkF/KxjGJBMrvI/iWtKelqSTMlzQVOZMGxjwMWSRrqWJl0DWYWzvtPSCVmFXWvfUTc\nRKqW/T/gWUlnSRrR4L5XJP046anqawOpCrryvo8a87vzXETMqzUjIh4D/odUYvmspAvzj6Vqy5FK\n7qrf429r4DpTtXxP3uNHkJKrO/Ndrp/J01cGNq3639oHeFflEOvt36wnnJRZR8q/eM8ATsmTppOq\n1UYXhhER8V3SB/wYSaNqbGo6cGLVesMj4tc19jmH9Ct7L9IH8oVV2zmwajvvjIg7ipvo4pBuJH3o\nr1icKGlT0hfvTYXJK1W9foOU1HV1DhaJQdISwKWktnn/FRGjSW2IVL1sg8dQbSYpuakYV29BUtXw\nipLe14Ptv03SB0lVeXtExNL5WOaycOlFdew/JlVZviciRpGqnCufidOB1ersbn7V+AxS6eoyhfM+\nKiLW62LfC4mIMyJiE1K19Jr5WLpdL+/7Pd0sU8vTwLiqmz1WJrW57K3ujv
HCiPhg3k+w4H+36DlS\n28Lq9zjQ0HVeKIYG3uPVMT4bEQdGxFjgIOBHklYnvR/+WON/68uNHLtZo5yUWSc7DZiQE5fzgI9K\n+oikoZKWzF06jI2ImcC1pA/YpSUtrgX9PP0U+IKkCfmGxHdK2lnS8Dr7vIDUruQTLKi6hFQycrSk\ndQEkjZK0R431a4qIP5ASk0slrZuPYTNSw/QfRUSlNETApyStI2kp4NvAxRERXZ2Dwq6KX0bD8jAL\nmC9pR+AjhfnPAsto4W5HelJF8xvgG/mcjwUOps6XV65++xFwoaStJA3L8X9S0pEN7HsE6ct8Vl73\nW0B33aUMJzXMflXS2sAXC/N+B6wg6StKXZWMkDQhz3uWVMWlHPtMUrL+g7zcEEmrq8G+xCRtImlT\nSYuTShr/Q6ruq+yrXnII8DPgeEnvye/f9SWNaWC3d+R9HZH/HyaSquguqoTVSOyNyqWS2+Qk6XUW\nPsa35Wrpy4BJkt6R/5/2Z8H7prvr/AyFa0P37/HqOPco/DB6Ie/3LVLV7pqSPpXP1+KS3p/fN9D9\ndTJriJMy61gRMYvU3uTIiHiS1Nj+aODfpF+2X2fBe/zTpBKlqaQP0EPzNv4GfJ5UfTSb1Fh/P+r/\n8r2KVDIxMyLuL8RyBemX/0W5Kux+UgPutxdp4JA+QWo8fR0pWTgX+FlEHFK1nXNJDbFnkr5wKsdS\n7xzULC2KiJfyur/Jx743cGVh/lRSaeA/Jc3O1U1RdSxdHde3SVWGT5CSlotJbXZqiohDWVCNN4fU\nEH1X0jmv7Kt6f5Xx6/LwCOlmhtdYuNqr1rqHkUo8XyS1J7uoskw+Nx8GPko6z4+QqszJxwHwvKS7\n8uv9SNeicvftxSyoNq4Xd2XayLz/2Tn2WaSbSCDd0blurjK7jEX9gHT9biCVGP2U1Mi+luK1fyMf\n246k0qkzgU9HxCNdxNydrkpWlyC1s3yOBW0kv1FnOweTEuZngF/koaK767zQtenuPV7DJsAdkl7K\nyx0aEdMi4mVSMvdJUmnizHw8w/J63V0ns4Yo/cA2s04g6WZSFeUvul24zUj6IrBnRGxddixmZu3I\nJWVmnacj7vKStLykD+TqvLVI3RdcXnZcZmbtyr0um3WeTineHkZqa7cqqX3OhaR2Y2ZmVoOrL83M\nzMzagKsvzczMzNpAR1RfSnJxnpmZmXWMiOhx+9+OSMqgdwdn7UHSpIiYVHYc1nO+dp3N169z+dp1\ntt4WJrn60szMzKwNOCkzMzMzawNOyqw/TCk7AOu1KWUHYH0ypewArNemlB2A9b+O6BJDUrhNmZmZ\nmXWC3uYtLikzMzMzawNOyszMzMzaQEuTMkm/kPSspPu7WOaHkh6VdK+kjVoZj5mZmVm7anVJ2S+B\nHerNlLQT8J6IWAM4EPhxi+MxMzMza0stTcoi4hZgTheLfAw4Jy/7F2BpSe9qZUxmZmZm7ajsNmVj\ngRmF8SeBFUuKxczMzKw07fCYpepbRtu/jw4zMzOzKpIErN3b9ctOyp4CxhXGV8zTFiFpUmF0SkRM\naV1YZmYLSMwGRpcdR395njGM6bLliZkVTcnDcXl8ci+3U3ZSdhVwMHCRpM2AFyLi2VoL+sGsZlai\n0RGLlOoPXJoTuMNuawFNVsRxA+e9JWkxYE/gG8AbwHeAy4E3e7O9liZlki4EtgKWlTSDlEQuDhAR\nZ0XENZJ2kvQY8ArwmVbGY2ZmZtZE3wa2BA4Hro/8mKRUi9lzfsySmVk3JGJwlZTJJWXWEgOwpGxY\nRMyrMd2PWTIzMzNrNkkjak2vlZD1hZMyMzMzsxokrSDpe8A0SeO6XaGPnJSZmZmZFUhaTdKPgQeA\nYcCGETGjm9X6zEmZmZmZWSbp48BfgdnA2hHxlf5IyKD8LjHMzMzM2smNwGoRMbe/d+ykzMzMzAad\n3Ps+UdUNRUS8WE5Err40MzOzQUTSEE
kfA24Dti07niKXlJmZmdmAV6f3/ZtLDaqKkzIzMzMb0CSt\nA1wNPAkcBtxQXW3ZDpyUmZmZ2UD3BLBfRNxadiBdcVJmZmZmA1pE/Ado64QM3NDfzMzMBoBK7/uS\n9i47lt5yUmZmZmYdq0bv+38uOaRec/WlmfWaxGxgdNlx9IM5ZQdgVhZNVjP/z5v2vyRpFHAmsCPw\nE2CtiHiuWdsvg5MyM+uL0RGo7CDMrKVGx3HRjv/nLwG3AweX0ft+KzgpMzMzs44TEfOBH5UdRzO5\nTZmZmZm1pUrv+5L2LTuW/uCSMjPrTGpqOxdbmNvQWalq9L5/bLkR9Q8nZWbWqUYTbdnOxcx6KT8k\n/PPAkcBTwOHA9e3Y+34rOCkzMzOzthARIWlV4ICIuKXsePqbkzIzMzNrGxHxjbJjKIsb+puZmVm/\nyr3vf6zsONqNkzIzMzPrF1W9729edjztxkmZmZmZtZSk90o6D/grMJvU+/6graasx23KzMzMrNW+\nSCod+/JA6X2/FZyUmZmZWUtFxJfLjqETuPrSzMzM+iz3vv++suPoZE7KzMzMrNckLSZpH+Be4MeS\nhpUdU6dyUmZmZmY9JmkJSQcCDwMHAYcBm0bEvHIj61xuU2YdTcLPPyyXn5FoNngdB2wA7B8Rfy47\nmIHASZl1utER+PmHZmb979iIeKvsIAYSV1+amZlZjzkhaz4nZWZmZraIQu/7SBpXdjyDgZMyMzMz\ne5uk8ZLOZUHv+0TEjHKjGhyclJmZmRkAkvYE/kDqfX+1iPhmySENKm7ob2ZmZhW/A66OiFfLDmQw\nclJmZmY2yEgaAkRERHF6RLxSUkiGqy/NzMwGjdz7/r6k3ve3LTseW5hLyszMzAY4SUsABwBHAE+S\net//Q5kx2aKclJmZmQ1gktYDrgPuwb3vtzUnZWZmZgPbI8DOEXFP2YFY15yUmZmZDWAR8TqplMza\nnBv6m5mZdbhK7/uS9i87Fus9J2VmZmYdqkbv+9eUHJL1gZMyMzOzDiNpaUlXADcBD5J734+I50oO\nzfqgpUmZpB0kTZX0qKQja8xfVtJ1ku6R9A9JB7QyHjMzswHiReByYNWIOCki5pYdkPVdy5IySUOB\nM4EdgHWBvSWtU7XYwcDdEbEhMBH4viTffGBmZtaFiJgfEef4cUgDSytLyiYAj0XEtIh4A7gI2LVq\nmZnAyPx6JPB8RLzZwpjMzMw6Qu59fx9Jnyk7FusfrUzKxgIzCuNP5mlFPwXGS3qa9MiHr7QwHjMz\ns7YnaQlJBwIPAwcB/yo5JOsnrawqjO4X4WjgnoiYKGl14PeSNoiIl6oXlDSpMDolIqY0J0wzM7Py\n5YeEfxX4Gu59v6NImkhqhtUnrUzKngLGFcbHkUrLirYATgSIiMclPQGsBdxVvbGImNSaMM3MzMoX\nEfMlLYV73+84uaBoSmVc0nG92U4rqy/vAtaQtIqkYcBewFVVy0wFtgOQ9C5SQvbPFsZkZmbWtiLi\neCdkg1fLkrLcYP9g4HpSHyq/joiHJB0k6aC82HeATSTdC9wIHBERs1sVk5mZWdly7/t7lh2HtZ+W\ndj8REdcC11ZNO6vwehbw0VbGYGZm1g4kjQeOAnYCTis5HGtD7tHfzMyshSS9X9LlwB+AB0i97x9f\ncljWhtxRq5mZWWvtSUrI9nVnr9YVJ2VmZmYtFBGHlx2DdQYnZWbWOGk2MLrsMLI5ZQdgVpEfEbh5\nRNxSdizWuZyUmVlPjCZCZQdh1i4kLQEcABwBzJD04fxoQbMec0N/MzOzHpI0XNLXSH1r7gocEBET\nnZBZX7ikzMzMrOeOBlYHdomIu8sOxgYGJ2VmZmY9982IaOQZz2YNc/WlmZlZHZLG1pruhMxawUmZ\ndUtitkS044DvwDOzFpD0XknnAffWS8zMms1JmTVidARq02FM2SfHzAYOSRNy7/s3Av8AVo+Ip0oO\nyw
YJJ2VmZmaApH2BS4CbSI9COjki5pYclg0ibuhvZmaWXAZcHBHzyg7EBicnZWZmNqjk3vfnR8T8\n4vSIeK2kkMwAV1+amdkgIWkJSQcCDwMTSw7HbBFOyszMbEDLve9/nQW97+8fETeVHJbZIlx9aWZm\nA5akDUh3Ut6Ee9+3NuekzMzMBrIHgQ9ExCNlB2LWHSdlZmY2YOUHhDshs47gNmVmZtbRJI2XdK6k\nz5Udi1lfOCkzM7OOVOh9/w/AA8DFJYdk1ieuvjQzs44iaQzwG2AN4HvAvhHxarlRmfWdkzIzM+s0\nc4CfAFe5930bSJyUmZlZR4mIID2j0mxAcVJmNhhIs4HRTdjSnCZsw6xbkpYA9geGRsSPy47HrD80\nnJRJWsp19guTaNYXXbvzF3HnG02Eyg7CrDuShgMHAl8H7gVOaOr2JzftB8pg4u+AftJtUiZpC+Bn\nwAhgnKQNgQMj4kutDq4DjI7AX3RmZn0kaQhwDHAwcDOt631/dBznHyjWnhopKTsN2AG4EiAi7pG0\nVUujMjOzQSUi5kt6GfhgRDxcdjxmZWio+jIipksL/bB4szXhmJnZYBURPyg7BrMyNdJ57HRJHwCQ\nNEzSYcBDrQ3LzMwGotz7/v5lx2HWjhpJyr4IfBkYCzwFbJTHzczMGlLV+/4yZcdj1o4aqb5cMyL2\nKU7IJWe3tiYkMzMbKCRNBL4JrIl73zfrUiMlZWc2OM3MzKzatsD5wBoRcaYTMrP66paUSdoc2AJY\nTtLX4O2uH0bgB5mbmVkDIuLYsmMw6xRdJVfDSAnY0Px3eB5eBP6/1odmZmadQNISkrYvOw6zTle3\npCwi/gj8UdLZETGt/0IyM7NOUNX7/t2SboqIN0oOy6xjNdLQ/1VJ/wusC7wjT4uI2KZ1YZmZWbuS\nNAY4hHQnfit73zcbVBpJys4Hfg3sAhwEHAA818KYzMysvX0FWBHYMiIeKTsYs4GikaRsmYj4maRD\nC1Wad7U6MDMza08RcVzZMZgNRI3cRTkv/31G0i6SNgZGtzAmMzNrA5JWV9Uz9sysdRopKTtR0tKk\nhpxnACOBr7Y0KjMzK42kCcA3gM2BjYGny43IbHDotqQsIn4bES9ExP0RMTEiNgae6YfYzMysnyjZ\nRtKNwMXATcBqEeGEzKyfdNV57BBgd2B14B8RcY2kTYDvAP8FbNg/ITaXxGyaV/06p0nbMTMr236k\n0rGTgQsiYl43y/cbTZY/t21QUETUniH9DFgVuBPYCpgJrE16htmVUW/FVgQpRUQ0pV2DRETgNhI2\nuEhBk/6HbGCSNAx4KyLeKjuWapqsiOP8/rXO0du8pas2ZZsB60fEfElLkqosV4+I53sQ1A7AaaSn\nAvwsIk6pscxE4FRgcWBWRExsPHyzNqam/rrvK5cOGAD58/yN6uSrnUrGzAarrpKyNyJiPkBE/EfS\nEz1MyIaSHly+HfAU8FdJV0XEQ4Vllgb+D9g+Ip6UtGyvjsKsPY126ZS1i9z7/kHA14B9gD+WG5GZ\nVesqKVtb0v2F8dUL4xER63ez7QnAY5VHNEm6CNgVeKiwzD7ApRHxZN7orJ4Eb2ZmXSv0vn8wqfG+\ne983a1NdJWXr9HHbY4EZhfEngU2rllkDWFzSzaSHnp8eEef2cb9mZgbkfiVvBC4n9b7/cMkhmVkX\nunog+bQ+bruRGwEWJ/WBsy2wFHC7pDsi4tHqBSVNKoxOiYgpfYzPzGygux/YICJmdLukmfVabh8/\nsa/baaTz2N56ChhXGB9HKi0rmkFq3P8a8JqkPwEbAIskZRExqUVxmpl1POXbvYrTIuINFq6xMLMW\nyAVFUyrjknr1KLJGHrPUW3cBa0haJd9qvRdwVdUyVwJbShoqaSlS9eaDLYzJzGxAkTRB0uWkRvxm\n1sEaSsokLSVprZ5sOCLeJDUsvZ6UaP06Ih6SdJCkg/IyU4HrgPuA
vwA/jQgnZWZmXci9728t6fek\n3vf/APyq5LDMrI/qdh779gLSx4DvAUtExCqSNgImR8TH+iPAHIM7j7XO4w5brQUkLQNcTeoDr+16\n328Fdx5rnaYVncdWTCJVK94MEBF3S1qtpzsyM7OmmA18G7ihHXvfN7PeayQpeyMiXpAWSvjmtyge\nMzPrQm7Mf23ZcZhZ8zWSlD0gaV9gMUlrAIcCt7U2LDOzwavQ+/5bEXFa2fGYWf9opKH/IcB44HXg\nQuBF4H9aGZSZ2WAkaUy+lf4J0lNRppQbkZn1p0ZKytaKiKOBo1sdjJnZYCRpCHAK8FngCuADEfFI\nuVGZWX9rJCn7gaTlSbdd/zoi/tHimMzMBpWImC/pcWBD975vNnh1W30ZEROBrYFZwFmS7pd0bKsD\nMzMbTCLiJ07IzAa3hjqPjYiZEXE68AXgXuBbLY3KzGwAyr3vu+d9M6up26RM0rqSJkn6B3Am6c7L\nsS2PzMxsAMi9728j6UZSMxAzs5oaaVP2C+AiYPuIeKrF8ZiZDRiSdgaOBZZmkPS+b2a9121SFhGb\n9UcgZmYD0PrA/wKXu/d9M+tO3aRM0sURsYek+2vMjohYv4VxmZl1vIg4qewYzKxzdFVS9pX8dxdY\n5AHeXT/F3KzTSbNJD3zuiznNCMXaW+59/yMRcVnZsZhZZ6vb0D8ins4vvxQR04oD8KV+ic6sPKOJ\nUB+HMWUfhLVOVe/7e0lqpI2umVldjXSJ8ZEa03ZqdiBmZp1A0gqSvgc8BqwEbBkRe0XEmyWHZmYd\nrqs2ZV8klYitXtWubARwa6sDMzNrU/8NDCP1vj+97GDMbODoqrj9AuBa0m3cR7KgXdlLEfF8qwMz\nM2tHEXFC2TGY2cDUVfVl5PZjXwZeAl7MQ0hyWxkzG9AkrS+p+iYnM7OW6aqk7EJgZ+Bv1L7bctWW\nRGRmVpKchG0NHA2sAWwGzCw1KDMbNOomZRGxc/67Sr9FY2ZWAklDSN3/HI173zezkjTy7MsP5H54\nkPRpST+QtHLrQzMz6zf7A5NIve+Pj4iznZCZWX9rpEuMnwCvStoA+BrwT+BXLY3KzKx/nQu8LyIu\n8eOQzKwsjXR2+GZEzJe0G/B/EfEzSZ9tdWBmZs2WS/3/U92nmPsYA01uylMsWsVPx7BBoZGk7CVJ\nRwOfAj4oaSiweGvDMjNrnnzH+CGku8k/AdxSbkRtaXQcF77b1KxEjVRf7gW8Dnw2Ip4BxgLfa2lU\nZmZNUOh9/1EW9L7vhMzM2lK3SVlEzATOB5aWtAup6N9tysysrUnaBHiAVLK/YUT8d0Q8UnJYZmZ1\nNXL35Z7AX4A9gD2BOyXt0erAzMz66G5g7Yj4n4iYUXYwZmbdaaRN2THA+yPi3wCSlgP+AFzcysDM\nzBolSRGxUCfX+S7Kf5cUkplZjzXSpkzAc4Xx51nwHEwzs1Io2UbSjcAXyo7HzKyvGikpuw64XtIF\npGRsL9KDys3M+l293vdLDcrMrAm6Tcoi4nBJHwe2zJPOiojLWxuWmdmiJC0L3Ay8AXwHuNydvZrZ\nQFE3KZO0Jqnri/cA9wGHR8ST/RWYmVkNzwNfBG6tbkNmZtbpuiop+wVwDqmTxY8CPwQ+3h9BtTW1\nda/X1jzuQbwN5UTsz2XHYWbWCl0lZcMj4qf59VRJd/dHQB1gNOFer81apdD7/usRcXLZ8ZiZ9Zeu\nkrIlJW2cXwt4Rx4X6Qfr31senZkNGpJWAL4G/DdwBXBKuRGZmfWvrpKyZ4DvdzG+dUsiMrNBJd9N\neSbwSeBcUu/708uNysys/9VNyiJiYj/GYWaDVETMl3QHMKnSSbWZ2WDUSD9lZmYt5efpmpk11qO/\nmVmfFHrf/2rZsZiZtSuXlJlZy9Toff+EciMaWDS5qV30uBsYs5J1m5TlD9V9gVUj4tuSVgKWj4g7\nWx6dmXUsSXsCx+Le91tpdBzn
LnrMBopGqi9/BGwO7JPHX87TzMy6MhY4HHhfRFzihMzMrGuNVF9u\nGhEbVTqPjYjZkhZvcVxm1uEi4tSyYzAz6ySNlJTNkzS0MiJpOWB+60Iys04haYykT5Udh5nZQNBI\nUnYGcDkn4LFZAAAgAElEQVTwX5K+A9wKnNTIxiXtIGmqpEclHdnFcu+X9KYkP1vTrANIWkHS94BH\ngYmSfNOQmVkfdftBGhHnSfobsG2etGtEPNTderl07UxgO+Ap4K+SrqpeNy93CnAd6RFOZtamJK1G\naie2F/ArUu/7M8qNysxsYGjk7suVgFeA3+ZJIWmlBh6DMgF4LCKm5e1cBOwKVCd0hwCXAO/vQdxm\nVo5PAM8Da0XEc2UHY2Y2kDRS5XANEPn1ksCqwMPA+G7WGwsUf0E/CWxaXEDSWFKitg0pKQvMrG1F\nxPfKjsHMbKBqpPryvcVxSRsDX25g240kWKcBR0VESBKuvjQrXf5f3Ay4IyL8Q8nMrJ/0uHFuRPxd\n0qbdL8lTwLjC+DhSaVnR+4CL0ncAywI7SnojIq6q3pikSYXRKRExpSdxm1nXCr3vf4PUS/xE4Jky\nYzIz6wSSJpI+M/u2ne5+CEv6emF0CLAxMCYitu9mvcVI1ZzbAk8DdwJ717tJQNIvgd9GxGU15kVE\nc3qtloiIPpTISUGTYjFrB/l/dU9SMube9zuIJivco79Z++lt3tJISdnwwus3gauBS7tbKSLelHQw\ncD0wFPh5RDwk6aA8/6yeBmtmLbE/sB9wGHCDqyzNzMrRZUlZ7q7iuxHx9boL9QOXlJm1jqQhEeEO\noTuQS8rM2lNv85a6ncdKWixXX3wgN/w1sw6We99f5BFpTsjMzNpDVz3635n/3gNcKenTkj6RB/e8\nb9YhCr3vP4b7AzQza1tdtSmrlI4tSeoscpuq+Ys0yG8lqWl9mM1p0nbM2lpV7/vnknrf767T59Jo\nsmaT7vq0xvnzzGwA6SopW07S14D7+yuYrvSpHZjZIJO7rfkd8BM6p/f90W4fZWaDWVdJ2VBgRH8F\nYmZNdRewekTMLTsQMzNrTFdJ2TMRMbnfIjGzHss34QyNiDeL0/NNOk7IzMw6SFcN/c2sTUkaIulj\nwO3A58qOx8zM+q6rkrLt+i0KM2tIvd73Sw3KzMyaom5SFhHP92cgZtY1ScsBd5CeK3s4cL173zcz\nGzh6/EByMyvNLGCPiPh72YGYmVnzOSkz6xC5VMwJmZnZAOWG/mZtpNL7vqRJZcdiZmb9y0mZWRuQ\ntJqkHwMPAMOAX5QckpmZ9TNXX5qVSNJQ4JfAznRW7/tmZtZkTsrMShQRb0m6CjjEve+bmQ1uTsrM\nShYRl5Qdg5mZlc9tysxarNL7vqSjy47FzMzal5MysxaRtJikfYB7gUnA1HIjMjOzdubqS7MWkHQA\ncCzufd/MzBrkpMysNZYA9o+IP5cdSCtpsmYDo5u0uTlN2o6ZWUdyUmbWAhFxVtkx9JPRcVyo7CDM\nzAYCtykz66Xc+/6BZcdhZmYDg5Mysx6q6n1/vCSXOJuZWZ85KTNrkKR1JZ0L/BWYDawdEV+JiDdL\nDs3MzAYA/8I3a9yHgQeBg937vpmZNZuTMrMGRcTpZcdgZmYDl6svzQpy7/sfluQ7Cs3MrF85KTNj\nkd73TwaWKzkkMzMbZFx9aYOapCWA/YEjgSeBw4Ab3Pu+mZn1NydlNth9CtiVQdD7vpmZtTcnZTbY\n/SIifl52EGZmZm5TZoNC7n1/WPV0V1OamVm7cFJmA1pV7/sblx2PmZlZPU7KbECSNL6q9/21IuKO\nksMyMzOry0mZDTiSNgf+QOp9f7WI+GZEPFdyWGZmZl1yQ38biP5CSsZeLTuQdqTJmg2MbtLm5jRp\nO21DktsZmlnDIqJpnY07KbOOJWkIsFhEzCtOj4j5gBOy+kbHcc37EBmImvkha2YDV7N/xLn60j
pO\nVe/7B5QcjpmZWVO4pMw6Rr3e90sNyszMrEmclFlHkPRfwN3APbj3fTMzG4AGT/WlNBsp+jwMwIbN\nnSAi/g1sExE7OyEzs+5I+qKkZyW9KKlZN7aURtL2ki4vO45OIOlgSSeXHUdvDJ6kDEYToSYMY8o+\nkMEqIh4uOwazMkmaJulVSS9JekbSuZJGVi2zhaSbcjLygqSrJK1TtcxISadJ+lfe1mOSTpW0TP8e\nUWtIWhz4PrBtRIyMiKb8mJZ0gKRbmrGtXjgROKmkfTeFpA0l/U3SK5LukrRBF8uOlHSepOfycJ6k\nEYX5H5X0j/z+vbXqPf5TYF9Jy7XyeFphMCVl1uYqve9LOrHsWMzaVAC7RMQIYANgPeCYyszcR9/1\nwOXACsCqpBtibpW0al5mGKkfv3WA7fO2NgdmARNaFbik/mwuszywJPBQP+6zZSS9HxgZEXf2cv2h\nTQ6pNzEMA64EfgUsDZwDXJkT6FomAcuS3sOrA+/K05C0BnAecCAwCvgtcFXlOCPideBaYL/WHE3r\nOCmz0tXoff/0kkMya3sR8SzpRpfxhcnfBc6JiDMi4pWImBMRxwJ3kL/QSF9U44DdI2Jq3tZzEXFi\nRFxba1/5f/T3kp7PJXRH5elnSzq+sNxESTMK49MkHSHpPuDl/Priqm2fLun0/HqUpJ9LelrSk5KO\nz13f1IppiVza91QeTpU0TNKaLEjGXpB0Y411l8wlL7MkzZF0Z263WjeGXBLzY2DzXDozu7D8ryT9\nOx/vNyUpz3uPpD/mEsvnJF1UddzTJc3NpUZb1jrObEdgSo3zVnN9SZMkXZJLUucC+3d1biWtnktX\nZxVKpUZ1EU9vTASGRsTpEfFGRJwBCNimzvLjgSsi4uWIeBG4ggXv9e2BWyLittwF0inAWGCrwvpT\ngJ2bfAwt1/KkTNIOkqZKelTSkTXm7yvpXkn35SLI9Vsdk7UHSUMlXUL61f4AC3rf/3fJoZm1s8oX\n/orADqTOkpG0FKnE6+Ia6/wG+HB+vR1wbaOdK+cqoxuBa0ilb+8h/c9CKrnrrp+mT5KSilHARcBO\nkobnbQ8F9gDOz8ueDcwjlYxsBHwE+Fyd7X6TVLK3QR4mAMdExCMs+PIeFRHb1Vh3f2AksCIwBjgI\neK2rGCLiIeALwO0RMSIWNGU5AxhBKtHZipT0fibPOx64LiKWJiUNPyzEcGeOezRwAXBxLk2q5b1A\ndfON7tb/GHBxRIzK82seV2H5E0nXdx1S0j6pTizk7+s5dYYz66w2Hrivatq9LPyjouh64BOSllZq\nE/gJ0nsQ0nuu2JfgkDxe3NZU0vnpLBHRsgEYCjwGrAIsTrpzbp2qZTYn/eNA+oC5o8Z2os/xNGMb\nHlrxHtkRWKrsOAbTwCSi7BjaeWjK503rYpsGvAS8CMwnVVMOyfNWzNPWrLHeDsC8/Pr3wHd6sM+9\ngb/VmfdL4PjC+ERgRmH8CeCAqnVuAT6dX38YeCy/fhfwH2DJqn3fVGffjwE7FMY/AjyRX6+Sz8WQ\nOut+BrgVWK9qepcxkPpFvKUwbyjwOrB2YdqBwM359TnAWcDYBs7z7Op4CvNuAA5sdH1SQjWl0eOq\nsa3dgL83+b17LHBh1bTzgOPqLL9Efq++lYfrgcXzvLWAl0lJ8LC87beAIwvrrwG82cxjqBNn9GR6\nd0OrS8omkP7hpkXEG6RfSbsWF4iI2yNibh79C+mDxQaJiGj4F7tZO5CIZgy93H0Au0bESFICtA2w\nSZ43h5SIrFBjvRWAyvNfZwHv7sE+xwH/7E2w2Yyq8QtICQHAPiwoJVuZ9ON9ZqXUBfgJUK+x9ruB\nfxXGp9P4cZ1L+pK/KFd9npLbvPU0hmXz8tVxjM2vjyCV4Nyp1Ci9UoKGpMMkPZirNueQShKXrbOf\nOaSSvbc1sP6ThdddHpekd0m6KFdrzs3np9k3fbxUfQw55h
frLH8+qXRweF7vn6Qkjkg3fe0PnAk8\nnWN9kIWPeQQwlw7T6qRsLAv/Qz7JgjdrLf/NguJJGwCUe9+XdELZsZg1QwRqxtD3OOJPpKqzU/L4\nK8DtwJ41Ft+TBVWONwLb5+rORkwHVqsz7xWguJ3la4VaNX4JMFHSWFKJzAV5+gxSqdMyETE6D6Mi\nYr06+36aVCJWsVKe1q2IeDMivh0R44EtgF1I1Y7Tu4mh+lhmAW/UiOPJvJ9nI+LAiBhLqiL9kdIN\nTR8EDgf2iIilI2I0KYGo9764D1izMtLg+sVYuzu33yGVNL03UnXnp+kiP5D0QG5XV2v4UZ3VHgCq\nmyetn6fXsgNwVkS8lt/bZwE7vX1wEZdGxHoRsSypZHAVUrvkinVItXMdpdVJWcO/BiVtDXyW1Fu7\ndTilRrgHkn7pHAT8qeSQzAai04AJkjbN40eRGnUfImmEpNH5B9GmwOS8zLmkL+lLJa2VG7EvI+lo\nSTvW2MfVwAqSvpL/r0dIqtyleQ+pjdhoScsD/9NdwBHxHKkR9tnAP3OpBxExk1RN94O8jyG5AfqH\n6mzqQuAYSctKWhb4Vj62bindkLBebtP2EimxeisinukmhmeBFZXvGIyIt0jt9U6UNFzSysBXySU6\nkvbIbf8AXiB9J84nleK8CcxSujnhWyxailR0DQs3Yu/R+g2c2+GkBPvFnCwf3tX5i4jxkdrV1Rq+\nVGe1KcBbkg7N76ND87m4qc7y9wGfV7op4x2kauF7KzMlvS+3S14O+H/AlZHaE1ZsRboDs6O0Oil7\nilT0XTGOhYsXAciN+38KfCzq9CejdDdJZZjYimCtOSR9iVTUvCup9/2tIsKPQzJrsoiYRWq3dGQe\nv5V0Z9rHSaVG00iNnbeMiMfzMvNIjf2nktrszCU1HRlDukuzeh8vk9p+fRSYCTxCqjqFlATdm/dz\nHamJSiM/xi8AtmVBKVnFfqQ2Qg+S2khdTO3SN4ATgLtIX9735dfFEvmu4lg+b3tu3tcUFiR0XcVQ\nuSnpGUmVG5IOISU0/yS1lzs/In6R520C3CHpJVJ3EIdGxDTSubqOdC6nkW4ymF4v2Ii4G5hbSIa7\nW7/WDRhdHddkYON8Pn4LXFpj/T7JTZh2y3HMyX93i4g34e2b/v5RWOUAUungU6S8YRVSlWXFaXk7\nU4Hngc9XZkhaktRe+ZxmHkNXcqL/dp7S6+3kBmktkevoHyb98z1Nultk70h3sVSWWYmUKX8qIhb5\nQMjLRET0rbhfCvq6DWuIpP2B+/IHibUZTVbEcf5fqKcpnzdmTSbpw8CXImL3smNpd5IOBlaMiKP6\nYV81Py96+znS0qQMIBeHn0a6S+XnEXGSpIMAIuIsST8DdmdBlv9GREyo2oaTMrMmcVLWNSdlZtao\njkvKmsFJWfuRtBrpLrBTe7TeZM0m9atj5ZkTx/lxYfU4KTOzRjU7KevPx17YACDpvaTGxDsCP5E0\nNDd2bdRol9KYmZktyo9ZsoZI2kTSFSza+35PEjIzMzOrwyVl1qhNSAnZPu7s1czMrPmclFlDIuIn\nZcdgZmY2kLn60t6We9/fTZLfF2ZmZv3MX75W3fv+12j+M8/MzMysG07KBrH8WJCvs3Dv+x/Kj0Ex\nM+tYkr4o6VlJL0oqpRseSR+UNLWP21hF0vxm1mBI2l7S5c3a3kAm6WBJJ/fX/pyUDW6fACYAu0TE\nzhHx57IDMrP6JE2T9Gp+8PMzks6VNLJqmS0k3ZSTkRckXSVpnaplRko6TdK/8rYek3SqpAFRSp6f\nTfl9YNuIGFnv8X2tFhG3RMTaZey7GycCJ5UdRF9I2lDS3yS9IukuSRs0sM4YSc9JuqUwbU1JV0r6\nt6TnJV0nac3Caj8F9s3P2Gw5J2WDWEScExF7+XFIZh0jSD+iRpCeabkecExlpqTNgeuBy4EVgFVJ\nz6a8VdKqeZlhpDup1w
G2z9vaHJhF+pHWEvmxe/1leWBJ4KHuFuyrdm6Dq/TA9epp7wdGRsSdzdpm\nf8vv4SuBXwFLk55xeWXlQfFdOIX07M9ir/mjgCtIz9l8F+lxkFdWZkbE66QHm+/XrPi70rZvJmse\nSatKWqLsOMyseSLiWeAGYHxh8neBcyLijIh4JSLmRMSxpAeNT8rL7AeMA3aPiKl5W89FxIkRcW2t\nfUkaL+n3uSThGUlH5elnSzq+sNxESTMK49MkHSHpPuDl/Priqm2fLun0/HqUpJ9LelrSk5KOr5f0\n5Lawp0l6Kg+nShqWSzkqydgLkm6sse61kr5cNe1eSbvl12sXjneqpD0Ky50t6ceSrpH0MjBR0k6S\nHsylk0/mZiG1zsc4SZflUplZks7I04dIOiafr2clnVNdAlrYxrtz6efzkh6V9LnCvEmSLsklqHNZ\n+AHeFTuSHsBefQ2mS5qbS5227GqbXV0nSasrldTOyqVS50kaVetY+mAiMDQiTo+INyLiDEDANvVW\nkLQF6X/ll3lZACLirxHxy4h4IT8c/TRgLS1c5T0F2LnJx1CTk7IBLH+Qngv8lfSL2sw6nwAkrQjs\nAPwljy9FKvG6uMY6vwE+nF9vB1zbaH+DkkYANwLXkErf3kMqaYNU4tDds/o+SUoERgEXATtJGp63\nPRTYAzg/L3s2MA9YHdgI+AjwOWr7Jqlkb4M8TACOiYhHWJCojoqI7WqsewGwd+EY1wVWAn4n6Z3A\n74HzgOVy/D/SwlXAewPHR8Rw4Dbg58DnI2Jk3vdN1TvMx3o18ASwMjAWuDDPPoCUQE0EVgOGA2fW\nOe6LSM+KXgH4/4DvSNq6MP9jwMURMSofZ7X3km7qKrqTdA5H53UuzqVR9bZ5Nl1fpxNzfOuQfgBM\nqnMsSLpP0pw6Q71zMB64r2ravSz8A6W4j6HAGcCXa82v8iFgZlWV91TS+Wk5J2UDkKQJSo04K73v\nrx4Rd5Uclpn1nYArJL1I+mJ+HDghzxtD+kyfWWO9Z4Bl8+tl6ixTzy7A0xFxakTMi4iXI+KvVTHV\nE8API+KpiHg9IqYDfwd2z/O3AV6NiDslvYuUvH01Il7LNxydRkqKatkH+HZEzIqIWcBk4NMNxASp\numpDSePy+L7ApRHxRj7eJ3LzjvkRcQ9wGSl5fHv9iLgdICL+Q0pQxksaGRFz6zQJmUBKVA7Px/d6\nRNxW2P/3I2JaRLwCfAP4ZHUpYY53C+DIfC3uBX7GwlVrt0XEVYXYqi0NvFScEBHn51LV+RHxA2AJ\nYK1a2yQl13WvU0Q8HhF/yCVYs4BTga1qxFHZ9/oRMbrOcHCd1YYDc6umvQiMqLP8ocAd3TXVyT90\nziT1QlD0Eum4W86dxw4wkj5A+iXzPWBf975v1lyarO5KhhrSy2fABrBrRNwk6UPAb0lP27gTmAPM\nJ33xP1K13gpA5a7qWcC7e7DPcaQ7tHtrRtV4pZTqXFJiVSklWxlYHJgpvX1qhpCSz1reDfyrMD6d\nBo8rIl6S9Lscx3dJCUWlpGdlYFNJxZKSxUjtlyBdgyerNvkJUtu+k3NV7VERcUfVMuOAf0XE/Boh\nrVDjWBYjtXEqejcwOyduxWU3KYxXx1ZtDlB9c8hhwGfz9iPPX7awSHGbXV6nnFyfDmxJSpKGALO7\niamnXqo+BlLS9GL1gpLeDRwCvK+rDSo15L8B+L+I+HXV7BEsmgS2hJOygec2YI2ImFd2IGYDUS+T\nqaaLiD/lNkmnAFtHxCuSbgf2BP5YtfieLKhyvBE4QdJSDf5omw7sVWfeK8BShfHla4VaNX4J8H1J\nY4HdgM3y9BnA68AydRKXak8Dq7Cg/dhKeVqjLgSOU7oTb8mIuDlPnw78MSI+0uiGck3Ebrma7BBS\ndfFKVYvNAFaSNLTGM4Mrx1KxEvAm8GzVdp4GxkgaHhEvF5YtJk3d/Wi4j9SoHUjddgCH
", "text/plain": "
"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "## Build a Transformation and Classification Pipeline\n\nThis recipe builds a [transformation and training pipeline](http://scikit-learn.org/stable/modules/pipeline.html) for a model that classifies a snippet of text as belonging to one of 20 [USENET](http://en.wikipedia.org/wiki/Usenet) [newsgroups](http://en.wikipedia.org/wiki/Usenet_newsgroup). It then prints the [precision, recall, and F1-score](http://en.wikipedia.org/wiki/Precision_and_recall) for predictions on a held-out test set, followed by the confusion matrix.\n\nThis recipe defaults to using the [20 USENET newsgroups](http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.html) dataset. To use your own data, set `X` to your raw text documents, `y` to the integer class of each document, and `labels` to human-readable names of the classes. Then modify the pipeline components to perform appropriate transformations for your data.\n\n
**Warning:** Running this recipe with the sample data may consume a significant amount of memory.
", "cell_type": "markdown", "metadata": {}}, {"execution_count": 16, "cell_type": "code", "source": "#
\nimport pandas\nimport sklearn.metrics as metrics\nfrom sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer\nfrom sklearn.feature_extraction.text import HashingVectorizer\nfrom sklearn.linear_model import Perceptron\nfrom sklearn.naive_bayes import MultinomialNB\nfrom sklearn.linear_model import SGDClassifier\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import fetch_20newsgroups\n\n# download the newsgroup dataset\n# the first positional parameter of fetch_20newsgroups is data_home, so subset must be passed by keyword;\n# subset='train' (11,314 posts) matches the outputs below, subset='all' loads all 18,846 posts\ndataset = fetch_20newsgroups(subset='train')\n\n# define the raw documents (X), their class indices (y), and the class names\nX = dataset.data\ny = dataset.target\nlabels = dataset.target_names\nlabels", "outputs": [{"execution_count": 16, "output_type": "execute_result", "data": {"text/plain": "['alt.atheism',\n 'comp.graphics',\n 'comp.os.ms-windows.misc',\n 'comp.sys.ibm.pc.hardware',\n 'comp.sys.mac.hardware',\n 'comp.windows.x',\n 'misc.forsale',\n 'rec.autos',\n 'rec.motorcycles',\n 'rec.sport.baseball',\n 'rec.sport.hockey',\n 'sci.crypt',\n 'sci.electronics',\n 'sci.med',\n 'sci.space',\n 'soc.religion.christian',\n 'talk.politics.guns',\n 'talk.politics.mideast',\n 'talk.politics.misc',\n 'talk.religion.misc']"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 17, "cell_type": "code", "source": "#
\n# split data holding out 30% for testing the classifier\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)\n\n# a pipeline chains steps serially: the output of each step becomes the input of the next\nclf = Pipeline([\n ('vect', HashingVectorizer(analyzer='word', ngram_range=(1,3))), # hash word 1- to 3-grams into a fixed-size sparse vector (hashing trick)\n ('tfidf', TfidfTransformer()), # reweight the hashed values by inverse document frequency (tf-idf)\n ('clf', SGDClassifier(loss='hinge', penalty='l2', alpha=1e-3, max_iter=5, tol=None)) # linear SVM trained by SGD for exactly 5 epochs\n])", "outputs": [], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 18, "cell_type": "code", "source": "#
\n# train the model and predict the test set\ny_pred = clf.fit(X_train, y_train).predict(X_test)\n\n# standard information retrieval metrics\nprint(metrics.classification_report(y_test, y_pred, target_names=labels))", "outputs": [{"output_type": "stream", "name": "stdout", "text": " precision recall f1-score support\n\n alt.atheism 0.86 0.95 0.90 137\n comp.graphics 0.89 0.84 0.86 178\n comp.os.ms-windows.misc 0.85 0.90 0.88 168\ncomp.sys.ibm.pc.hardware 0.85 0.78 0.81 180\n comp.sys.mac.hardware 0.95 0.83 0.89 174\n comp.windows.x 0.88 0.91 0.90 193\n misc.forsale 0.78 0.92 0.84 157\n rec.autos 0.93 0.89 0.91 188\n rec.motorcycles 0.96 0.95 0.95 185\n rec.sport.baseball 0.94 0.95 0.95 183\n rec.sport.hockey 0.92 0.98 0.95 180\n sci.crypt 0.95 0.97 0.96 181\n sci.electronics 0.93 0.80 0.86 180\n sci.med 0.95 0.94 0.95 159\n sci.space 0.91 0.98 0.95 185\n soc.religion.christian 0.88 0.92 0.90 176\n talk.politics.guns 0.93 0.98 0.95 176\n talk.politics.mideast 0.92 0.98 0.95 180\n talk.politics.misc 0.97 0.91 0.94 127\n talk.religion.misc 0.99 0.69 0.81 108\n\n avg / total 0.91 0.91 0.91 3395\n\n"}], "metadata": {"collapsed": false, "trusted": true}}, {"execution_count": 19, "cell_type": "code", "source": "#
\n# show the confusion matrix as a labeled DataFrame (rows: true class, columns: predicted class)\nindex_labels = ['{} {}'.format(i, l) for i, l in enumerate(labels)]\npandas.DataFrame(metrics.confusion_matrix(y_test, y_pred), index=index_labels)", "outputs": [{"execution_count": 19, "output_type": "execute_result", "data": {"text/plain": " 0 1 2 3 4 5 6 7 8 9 \\\n0 alt.atheism 130 0 0 0 0 0 0 0 0 0 \n1 comp.graphics 1 150 5 4 1 7 3 0 0 0 \n2 comp.os.ms-windows.misc 0 1 152 3 0 6 2 0 0 0 \n3 comp.sys.ibm.pc.hardware 0 3 10 140 4 3 7 1 0 1 \n4 comp.sys.mac.hardware 1 2 2 4 145 4 4 0 1 3 \n5 comp.windows.x 0 5 5 2 0 176 1 1 0 0 \n6 misc.forsale 0 2 0 3 0 0 144 2 0 1 \n7 rec.autos 1 1 0 0 1 1 5 167 3 3 \n8 rec.motorcycles 2 0 0 0 0 0 4 2 175 0 \n9 rec.sport.baseball 0 1 0 0 0 0 3 1 1 174 \n10 rec.sport.hockey 0 1 0 0 0 0 1 0 0 1 \n11 sci.crypt 0 1 1 0 0 1 0 0 0 0 \n12 sci.electronics 2 1 3 8 1 2 5 3 0 1 \n13 sci.med 0 0 0 0 0 0 1 1 0 0 \n14 sci.space 0 0 0 0 0 0 0 0 0 0 \n15 soc.religion.christian 2 1 0 1 0 0 4 1 0 0 \n16 talk.politics.guns 1 0 1 0 0 0 0 0 0 1 \n17 talk.politics.mideast 0 0 0 0 0 0 0 1 0 0 \n18 talk.politics.misc 1 0 0 0 0 0 0 0 2 0 \n19 talk.religion.misc 11 0 0 0 0 0 0 0 0 0 \n\n 10 11 12 13 14 15 16 17 18 19 \n0 alt.atheism 0 0 0 0 0 3 1 2 0 1 \n1 comp.graphics 2 1 1 1 1 0 1 0 0 0 \n2 comp.os.ms-windows.misc 1 0 1 0 1 1 0 0 0 0 \n3 comp.sys.ibm.pc.hardware 2 3 2 0 3 1 0 0 0 0 \n4 comp.sys.mac.hardware 0 1 2 2 1 0 0 2 0 0 \n5 comp.windows.x 1 1 0 0 1 0 0 0 0 0 \n6 misc.forsale 2 1 1 0 0 1 0 0 0 0 \n7 rec.autos 1 0 2 0 1 0 1 0 1 0 \n8 rec.motorcycles 1 0 0 0 1 0 0 0 0 0 \n9 rec.sport.baseball 2 0 0 1 0 0 0 0 0 0 \n10 rec.sport.hockey 177 0 0 0 0 0 0 0 0 0 \n11 sci.crypt 0 176 1 0 0 0 0 1 0 0 \n12 sci.electronics 1 2 144 1 5 0 0 0 1 0 \n13 sci.med 0 1 0 150 2 1 2 0 1 0 \n14 sci.space 1 0 1 0 182 0 0 1 0 0 \n15 soc.religion.christian 0 0 0 2 0 162 0 3 0 0 \n16 talk.politics.guns 0 0 0 0 1 0 172 0 0 0 \n17 talk.politics.mideast 1 0 0 0 1 0 0 177 0 0 \n18 talk.politics.misc 0 0 0 
0 0 1 4 4 115 0 \n19 talk.religion.misc 0 0 0 1 0 15 4 3 0 74 
"}, "metadata": {}}], "metadata": {"collapsed": false, "trusted": true}}, {"source": "
This notebook was created using IBM Knowledge Anyhow Workbench. To learn more, visit us at https://knowledgeanyhow.org.
", "cell_type": "markdown", "metadata": {}}], "nbformat": 4, "metadata": {"kernelspec": {"display_name": "Python 2", "name": "python2", "language": "python"}, "language_info": {"mimetype": "text/x-python", "nbconvert_exporter": "python", "version": "2.7.6", "name": "python", "file_extension": ".py", "pygments_lexer": "ipython2", "codemirror_mode": {"version": 2, "name": "ipython"}}}}
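The `classification_report` printed above can be recomputed by hand from the confusion matrix: for class k, precision is the diagonal entry divided by its column sum, recall is the diagonal entry divided by its row sum (the support), and F1 is their harmonic mean. The pure-Python sketch below assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the helper `per_class_metrics` is a hypothetical name for illustration, not part of scikit-learn.

```python
from typing import List, Tuple


def per_class_metrics(cm: List[List[int]]) -> List[Tuple[float, float, float, int]]:
    """Return (precision, recall, f1, support) for every class in a confusion matrix.

    Assumes the scikit-learn convention: cm[i][j] counts samples whose true
    class is i and whose predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]                                # correctly predicted as class k
        support = sum(cm[k])                         # row sum: samples truly in class k
        predicted = sum(cm[i][k] for i in range(n))  # column sum: samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1, support))
    return results


# talk.religion.misc vs. every other class, collapsed from the recorded matrix:
# [[true positives, false negatives], [false positives, true negatives]]
one_vs_rest = [[74, 34], [1, 3286]]
p, r, f1, support = per_class_metrics(one_vs_rest)[0]
print(round(p, 2), round(r, 2), round(f1, 2), support)  # 0.99 0.69 0.81 108
```

Collapsing the recorded matrix to talk.religion.misc versus all other classes gives 74 true positives, 34 false negatives, 1 false positive and 3,286 true negatives, which reproduces that row of the report. Its low recall comes mostly from posts the model assigned to alt.atheism (11) and soc.religion.christian (15).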