diff --git a/decision tree classification.ipynb b/decision tree classification.ipynb
index 72186d6..a366e17 100644
--- a/decision tree classification.ipynb
+++ b/decision tree classification.ipynb
@@ -24,7 +24,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Constructing a model"
+    "# Constructing a model (algorithm)"
    ]
   },
   {
@@ -309,10 +309,17 @@
     "\n",
     "This dataset became a typical test case for many statistical classification techniques in machine learning such as support vector machines\n",
     "\n",
+    "**Reference**\n",
+    "\n",
+    " R. A. Fisher (1936). \"The use of multiple measurements in taxonomic problems\". Annals of Eugenics. 7 (2): 179–188.\n",
+    " \n",
+    " https://en.wikipedia.org/wiki/Iris_flower_data_set\n",
+    "\n",
     "**Content**\n",
     "\n",
     "The dataset contains a set of 150 records under 5 attributes - Petal Length, Petal Width, Sepal Length, Sepal width and Class(Species).\n",
     "\n",
+    "---\n",
     "So, our objective here is to predict the class that is the specie of the iris flower, given it's features which are:\n",
     "1. sepal_length\n",
     "2. sepal width\n",
@@ -562,7 +569,7 @@
    "source": [
     "## Model visualization\n",
     "\n",
-    "We will use the methon print.tree() to visualize our tree."
+    "We will use the method print.tree() to visualize our tree."
    ]
   },
   {
@@ -598,12 +605,12 @@
    "source": [
     "## Testing the model\n",
     "\n",
-    "We are using the definded method predict() to determine the classes of the Test dataset - those will be stored in the Y_pred which we will than compare to Y_test with the help of sklearn library function called accuracy_score"
+    "We are using definded method predict() to determine the classes of the Test dataset - those will be stored in the Y_pred which we will then compare to Y_test with the help of sklearn library function called accuracy_score"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 41,
    "metadata": {},
    "outputs": [
     {
@@ -612,7 +619,7 @@
        "0.9333333333333333"
       ]
      },
-     "execution_count": 32,
+     "execution_count": 41,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -639,6 +646,16 @@
     "Our objective here is to predict if the customer will purchase the iPhone or not given their gender, age and salary."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### About the data \n",
+    "\n",
+    "Despite all the effort I couldn't find the origin of this data thus it shouldn't be used for any other purposes. \n",
+    "The dataset contains a set of 400 records under 4 attributes - Gender, Age, Salary and Class( whether the person made a purchase or not)."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},