The Fruit Dataset for Clustering Practice - Metadata

Each field in the fruit dataset is briefly described below.

ID: a unique integer identifier for each datapoint.

image_filename: text field containing the file name of the image.

image: a .png image pasted into the cell of the Excel file.

colour1: a categorical variable with 6 values describing the colour of the fruit in the images.

colour2: a categorical variable with 4 values describing the colour of the fruit in the images.

leaf: a binary value with 1 indicating presence of a leaf in the picture and 0 indicating no leaf.

shadow-size: an ordinal variable with three values (0, 1, 2). 0 = no shadow, 1 = a small shadow, 2 = a large shadow.

width: an integer giving the width of the fruit, in pixels, in some version of the image.

height: an integer giving the height of the fruit, in pixels, in the same version of the image that was used to obtain the pixel width measurement. The point at which the stem enters the fruit in the image was used as the top of the fruit for this measurement.

widthtoheight: the ratio of the width measurement to the height measurement (width divided by height).
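
As a quick illustration of how this field relates to width and height, the following minimal sketch recomputes the ratio from two pixel measurements. The function name and the sample values are hypothetical and are not taken from the dataset.

```python
def width_to_height(width: int, height: int) -> float:
    """Return width divided by height for one fruit measurement."""
    if height <= 0:
        raise ValueError("height must be a positive pixel count")
    return width / height


# Hypothetical (width, height) pixel measurements, not taken from the dataset.
samples = [(200, 250), (180, 180)]
ratios = [width_to_height(w, h) for w, h in samples]
print(ratios)  # [0.8, 1.0]
```

A ratio below 1 means the fruit is taller than it is wide in the image; a ratio of 1 means the two measurements are equal.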

weight: continuous variable with simulated values. For each fruit type, weight is normally distributed. The mean is based on the weight listed for a medium fruit of that type at https://nutritiondata.self.com, and the standard deviation is based roughly on the difference between this weight and the weights listed for the small and large sizes of that fruit.
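
The simulation described for weight could be sketched roughly as follows. This is only an illustration under stated assumptions: the reference weights are placeholder numbers rather than the values listed on the website, and averaging the two gaps to get the standard deviation is one possible reading of "based roughly on the difference", not necessarily the rule used to build the dataset.

```python
import random

# Placeholder (small, medium, large) weights in grams. These are illustrative
# numbers only, NOT the values listed at nutritiondata.self.com.
REFERENCE_WEIGHTS = {
    "apple": (150.0, 180.0, 220.0),
    "pear": (150.0, 180.0, 230.0),
}


def simulate_weight(fruit_type: str, rng: random.Random) -> float:
    """Draw one weight from a normal distribution centred on the medium weight."""
    small, medium, large = REFERENCE_WEIGHTS[fruit_type]
    # Assumption: the standard deviation is the average of the gaps between
    # the medium weight and the small and large weights.
    sd = ((medium - small) + (large - medium)) / 2
    return rng.gauss(medium, sd)


rng = random.Random(0)  # fixed seed so the draws are reproducible
weights = [simulate_weight("apple", rng) for _ in range(5)]
print(weights)
```

Because the values are simulated per fruit type, weight carries information about typeoffruit only through the chosen means and standard deviations, not through the images themselves.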

typeoffruit: categorical (binary) variable indicating whether the fruit is an apple or a pear.