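[Kaggle] Classifying a Bunch of Fruits

Machine learning

I came across this one among the Kaggle datasets: a nutrition table for fruits and vegetables.
It has vegetables too, but for now I'll try grouping just the fruits.
I looked into it a little, and apparently there is no clear-cut definition that separates fruits from vegetables.
So I'll simply work with whatever this dataset calls a fruit.
The data has no labels, so this is unsupervised learning. I'll go with k-means, which I also touched on
briefly in "Looking at the decision surfaces of various algorithms".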

Checking and editing the data

Right, let's import the usual suspects and read the CSV.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")

df_s=pd.read_csv("fruits.csv")
df_s
name | energy (kcal/kJ) | water (g) | protein (g) | total fat (g) | carbohydrates (g) | fiber (g) | sugars (g) | calcium (mg) | iron (mg) | magnessium (mg) | phosphorus (mg) | potassium (mg) | sodium (g) | vitamin A (IU) | vitamin C (mg) | vitamin B1 (mg) | vitamin B2 (mg) | viatmin B3 (mg) | vitamin B5 (mg) | vitamin B6 (mg) | vitamin E (mg)
0Apple nutrition facts48/20086.70.270.1312.71.310.150.074119003840.0190.0280.0910.0710.0370.05
1Apricot nutrition facts48/20186.41.40.3911.1229.24130.39102325911926100.030.040.60.240.0540.89
2Avocado nutrition160/67073.23214.78.536.70.66120.5529524857146100.0670.131.7381.3890.2572.07
3Banana nutrition facts89/37174.911.090.3322.842.612.2350.2627223581648.70.0310.0730.6650.3340.3670.1
4Blackberries nutrition43/18188.151.390.499.615.34.88290.6220221621214210.020.0260.6460.2760.031.17
5Blueberry nutrition facts57/24084.210.740.3314.492.49.9660.28612771549.70.0370.0410.4180.1240.0520.57
6Carambola or Starfruit31/12891.381.040.336.732.83.9830.08101213326134.40.0140.0160.3670.3910.0170.15
7Cherimoya fruit nutrition75/31379.391.570.6817.71312.87100.2717262877512.60.1010.1310.6440.3450.2570.27
8Cherry fruit nutrition facts63/26382.251.060.216.012.112.82130.36112122206470.0270.0330.1540.1990.0490.07
9Clementine47/19886.580.850.1512.021.79.18300.141021177148.80.0860.030.6360.1510.0750.2
10Cranberries46/19487.130.390.1312.24.64.0480.256138526013.30.0120.020.1010.2950.0571.2
11Currants Black63/26481.961.40.4115.38551.54245932222301810.050.050.30.3980.066
12Currants, Red and White56/23483.951.40.213.84.37.373311344275142410.040.050.10.0640.070.1
13Date nutrition: Dates, deglet noor282/117820.532.450.3975.03863.35391.0243626562100.40.0520.0661.2740.5890.1650.05
14Dates, medjool277/116021.321.810.1574.976.766.47640.95462696114900.050.061.610.8050.249
15Durian nutrition147/61564.991.475.3327.093.860.43303943624419.70.3740.21.0740.230.316
16Fig nutrition: Figs, raw74/31079.110.750.319.182.916.26350.371714232114220.060.050.40.30.1130.11
17Grapefruit, pink and red42/17688.060.770.1410.661.66.89220.089181350115031.20.0430.0310.2040.2620.0530.13
18Grapefruit, white33/13890.480.690.18.411.17.31120.069814803333.30.0370.020.2690.2830.0430.13
19Grape nutrition: Grapes, red or green69/28880.540.720.1618.10.915.48100.3672019126610.80.0690.070.1880.050.0860.19
20Groundcherries or Cape Gooseberries53/22285.41.90.711.29140720110.110.042.8
21Guava nutrition facts68/28580.82.550.9514.325.48.92180.2622404172624228.30.0670.041.0840.4510.110.73
22Gooseberries44/18487.870.880.5810.184.3250.311027198129027.70.040.030.30.2860.080.37
23Jackfruit nutrition94/39373.231.470.324.011.6340.6373630332976.70.030.110.40.108
24Kiwi nutriion: Kiwifruit, gold60/25183.221.230.5614.23210.98200.291429316372105.40.0240.0460.280.50.0571.49
25Kiwi nutriion: Kiwifruit, green61/25583.071.140.5214.6638.99340.31173431238792.70.0270.0250.3410.1830.0631.46
26Kumquat nutrition71/29680.851.880.8615.96.59.36620.8620191861029043.90.0370.090.4290.2080.0360.15
27Lemon nutrition: Lemon with peel20/8487.41.20.310.74.7610.71215145330770.050.040.20.2320.109
28Lime nutrition30/12688.260.70.210.542.81.69330.661810225029.10.030.020.20.2170.0430.22
29Litchis or Lychees nutrition66/27681.760.830.4416.531.315.2350.3110311711071.50.0110.0650.6030.10.07
30Mandarin-Clementine nutrition47/19886.580.850.1512.021.79.18300.141021177148.80.0860.030.6360.1510.0750.2
31Mandarin-Tangerine nutrition53/22385.170.810.3113.341.810.58370.151220166268126.70.0580.0360.3760.2160.0780.2
32Mango nutrition facts65/27281.710.510.27171.814.8100.13911156276527.70.0580.0570.5840.160.1341.12
33Melons, cantaloupe34/14190.150.840.198.160.97.8690.21121526716338236.70.0410.0190.7340.1050.0720.05
34Melons, casaba28/11891.851.110.16.580.95.69110.341151829021.80.0150.0310.2320.0840.1630.05
35Melons, honeydew36/15089.820.540.149.090.88.1260.1710112281850180.0380.0120.4180.1550.0880.02
36Mulberries, raw43/18087.681.440.399.81.78.1391.851838194102536.40.0290.1010.620.050.87
37Nectarine nutrition44/18587.591.060.3210.551.77.8960.2892620103325.40.0340.0271.1250.1850.0250.77
38Oranges nutrition facts47/19786.750.940.1211.752.49.35400.11014181022553.20.0870.040.2820.250.060.18
39Papaya nutrition39/16388.830.610.149.811.85.9240.11052573109461.80.0270.0320.3380.2180.0190.73
40Passion fruit nutrition97/40672.932.20.723.3810.411.2121.629683482812723000.131.50.10.02
41Peaches nutrition39/16588.870.910.259.541.58.3960.2592019003266.60.0240.0310.8060.1530.0250.73
42Pear nutrition58/24283.710.380.1215.463.19.890.177111191234.20.0120.0250.1570.0480.0280.12
43Persimmon nutrition: Persimmons, native127/53164.40.80.433.5272.526310166
44Pineapple nutrition50/209860.540.1213.121.49.85130.2912810915847.80.0790.0320.50.2130.1120.02
45Plum nutrition46/19287.230.70.2811.421.49.9260.1771615703459.50.0280.0260.4170.1350.0290.26
46Pomegranate nutrition facts83/34677.931.671.1718.7413.67100.312362363010.20.0670.0530.2930.3770.0750.6
47Quince fruit57/23883.80.40.115.31.9110.7817197440150.020.030.20.0810.04
48Raspberry nutrition52/22085.751.20.6511.946.54.42250.69222915113326.20.0320.0380.5980.3290.0550.87
49Strawberry nutrition facts32/13690.950.670.37.6824.89160.41132415311258.80.0240.0220.3860.1250.0470.29
50Tangerine nutrition53/22385.170.810.3113.341.810.58370.151220166268126.70.0580.0360.3760.2160.0780.2
51Watermelon nutrition facts30/12791.450.610.157.550.46.270.24101111215698.10.0330.0210.1780.2210.0450.05

Here's what info() has to say.

df_s.info()

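There are 52 fruits, with 22 columns each.
info() claims there are no missing values, but some cells hold a "-" placeholder and several numeric columns are stored as object,
so the data needs tidying up. First up is energy (kcal/kJ), which packs two numbers into a single object cell,
so let's deal with that.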
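Why does info() report no missing values when the table clearly has holes? A minimal sketch on toy data (the frame `toy` and its values are made up here, not taken from fruits.csv) suggests the reason: "-" is an ordinary string, so pandas does not treat it as missing.

```python
import pandas as pd

# Toy stand-in for fruits.csv; names and values are made up for illustration.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "fiber (g)": ["1.3", "-", "2.6"],
    "sugars (g)": ["10.1", "9.2", "-"],
    "water (g)": [86.7, 86.4, 74.9],
})

# "-" is just text, so isna() finds nothing to report.
print(toy.isna().sum().sum())

# Counting the placeholder directly shows where the holes are.
dash_counts = (toy == "-").sum()
print(dash_counts[dash_counts > 0])

# Columns that hold numbers as text do not show up as float64.
print(toy.dtypes)
```

Running the same count on the real frame would show which columns need the "-" treatment before any conversion to float.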

Before that, I'll set the name column aside.

df_name=df_s.iloc[:,:1]
df_name

Then, for the column in question,

df_en=df_s["energy (kcal/kJ)"].str.split("/", expand=True)
df_en.columns=["energy(kcal)","energy(kJ)"]
df_en.head(10)

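split it, and you get this:

index | energy(kcal) | energy(kJ)
0 | 48 | 200
1 | 48 | 201
2 | 160 | 670
3 | 89 | 371
4 | 43 | 181
5 | 57 | 240
6 | 31 | 128
7 | 75 | 313
8 | 63 | 263
9 | 47 | 198

Both columns are still object, so I convert kcal, the one I'll actually use, to numbers.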

df_en["energy(kcal)"]=df_en["energy(kcal)"].astype(float,errors="raise")
df_en.info()
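A quick sanity check on dropping the kJ half: both numbers should describe the same energy in two units, since 1 kcal = 4.184 kJ. The sketch below reuses the first four values shown above; the names `energy` and `parts` are made up for the example.

```python
import pandas as pd

# First four values of the "energy (kcal/kJ)" column as shown in the split output.
energy = pd.Series(["48/200", "48/201", "160/670", "89/371"])

parts = energy.str.split("/", expand=True).astype(float)
parts.columns = ["kcal", "kJ"]

# 1 kcal = 4.184 kJ, so the ratio should sit close to that constant
# (not exactly, because both numbers were rounded in the source table).
parts["ratio"] = parts["kJ"] / parts["kcal"]
print(parts)
```

If the ratio stays near 4.18 for every row, keeping only kcal loses no information.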

And since I'll use water and everything to its right,

df_data=df_s.iloc[:,2:]

I slice like this,

df_data=pd.concat([df_en["energy(kcal)"],df_data],axis=1)

stick the energy column on,

df_data=pd.concat([df_name,df_data],axis=1)

and stick name back on too.

name | energy(kcal) | water (g) | protein (g) | total fat (g) | carbohydrates (g) | fiber (g) | sugars (g) | calcium (mg) | iron (mg) | magnessium (mg) | phosphorus (mg) | potassium (mg) | sodium (g) | vitamin A (IU) | vitamin C (mg) | vitamin B1 (mg) | vitamin B2 (mg) | viatmin B3 (mg) | vitamin B5 (mg) | vitamin B6 (mg) | vitamin E (mg)
0Apple nutrition facts4886.70.270.1312.71.310.150.074119003840.0190.0280.0910.0710.0370.05
1Apricot nutrition facts4886.41.40.3911.1229.24130.39102325911926100.030.040.60.240.0540.89
2Avocado nutrition16073.23214.78.536.70.66120.5529524857146100.0670.131.7381.3890.2572.07
3Banana nutrition facts8974.911.090.3322.842.612.2350.2627223581648.70.0310.0730.6650.3340.3670.1
4Blackberries nutrition4388.151.390.499.615.34.88290.6220221621214210.020.0260.6460.2760.031.17
5Blueberry nutrition facts5784.210.740.3314.492.49.9660.28612771549.70.0370.0410.4180.1240.0520.57
6Carambola or Starfruit3191.381.040.336.732.83.9830.08101213326134.40.0140.0160.3670.3910.0170.15
7Cherimoya fruit nutrition7579.391.570.6817.71312.87100.2717262877512.60.1010.1310.6440.3450.2570.27
8Cherry fruit nutrition facts6382.251.060.216.012.112.82130.36112122206470.0270.0330.1540.1990.0490.07
9Clementine4786.580.850.1512.021.79.18300.141021177148.80.0860.030.6360.1510.0750.2
10Cranberries4687.130.390.1312.24.64.0480.256138526013.30.0120.020.1010.2950.0571.2
11Currants Black6381.961.40.4115.38551.54245932222301810.050.050.30.3980.066
12Currants, Red and White5683.951.40.213.84.37.373311344275142410.040.050.10.0640.070.1
13Date nutrition: Dates, deglet noor28220.532.450.3975.03863.35391.0243626562100.40.0520.0661.2740.5890.1650.05
14Dates, medjool27721.321.810.1574.976.766.47640.95462696114900.050.061.610.8050.249
15Durian nutrition14764.991.475.3327.093.860.43303943624419.70.3740.21.0740.230.316
16Fig nutrition: Figs, raw7479.110.750.319.182.916.26350.371714232114220.060.050.40.30.1130.11
17Grapefruit, pink and red4288.060.770.1410.661.66.89220.089181350115031.20.0430.0310.2040.2620.0530.13
18Grapefruit, white3390.480.690.18.411.17.31120.069814803333.30.0370.020.2690.2830.0430.13
19Grape nutrition: Grapes, red or green6980.540.720.1618.10.915.48100.3672019126610.80.0690.070.1880.050.0860.19
20Groundcherries or Cape Gooseberries5385.41.90.711.29140720110.110.042.8
21Guava nutrition facts6880.82.550.9514.325.48.92180.2622404172624228.30.0670.041.0840.4510.110.73
22Gooseberries4487.870.880.5810.184.3250.311027198129027.70.040.030.30.2860.080.37
23Jackfruit nutrition9473.231.470.324.011.6340.6373630332976.70.030.110.40.108
24Kiwi nutriion: Kiwifruit, gold6083.221.230.5614.23210.98200.291429316372105.40.0240.0460.280.50.0571.49
25Kiwi nutriion: Kiwifruit, green6183.071.140.5214.6638.99340.31173431238792.70.0270.0250.3410.1830.0631.46
26Kumquat nutrition7180.851.880.8615.96.59.36620.8620191861029043.90.0370.090.4290.2080.0360.15
27Lemon nutrition: Lemon with peel2087.41.20.310.74.7610.71215145330770.050.040.20.2320.109
28Lime nutrition3088.260.70.210.542.81.69330.661810225029.10.030.020.20.2170.0430.22
29Litchis or Lychees nutrition6681.760.830.4416.531.315.2350.3110311711071.50.0110.0650.6030.10.07
30Mandarin-Clementine nutrition4786.580.850.1512.021.79.18300.141021177148.80.0860.030.6360.1510.0750.2
31Mandarin-Tangerine nutrition5385.170.810.3113.341.810.58370.151220166268126.70.0580.0360.3760.2160.0780.2
32Mango nutrition facts6581.710.510.27171.814.8100.13911156276527.70.0580.0570.5840.160.1341.12
33Melons, cantaloupe3490.150.840.198.160.97.8690.21121526716338236.70.0410.0190.7340.1050.0720.05
34Melons, casaba2891.851.110.16.580.95.69110.341151829021.80.0150.0310.2320.0840.1630.05
35Melons, honeydew3689.820.540.149.090.88.1260.1710112281850180.0380.0120.4180.1550.0880.02
36Mulberries, raw4387.681.440.399.81.78.1391.851838194102536.40.0290.1010.620.050.87
37Nectarine nutrition4487.591.060.3210.551.77.8960.2892620103325.40.0340.0271.1250.1850.0250.77
38Oranges nutrition facts4786.750.940.1211.752.49.35400.11014181022553.20.0870.040.2820.250.060.18
39Papaya nutrition3988.830.610.149.811.85.9240.11052573109461.80.0270.0320.3380.2180.0190.73
40Passion fruit nutrition9772.932.20.723.3810.411.2121.629683482812723000.131.50.10.02
41Peaches nutrition3988.870.910.259.541.58.3960.2592019003266.60.0240.0310.8060.1530.0250.73
42Pear nutrition5883.710.380.1215.463.19.890.177111191234.20.0120.0250.1570.0480.0280.12
43Persimmon nutrition: Persimmons, native12764.40.80.433.5272.526310166
44Pineapple nutrition50860.540.1213.121.49.85130.2912810915847.80.0790.0320.50.2130.1120.02
45Plum nutrition4687.230.70.2811.421.49.9260.1771615703459.50.0280.0260.4170.1350.0290.26
46Pomegranate nutrition facts8377.931.671.1718.7413.67100.312362363010.20.0670.0530.2930.3770.0750.6
47Quince fruit5783.80.40.115.31.9110.7817197440150.020.030.20.0810.04
48Raspberry nutrition5285.751.20.6511.946.54.42250.69222915113326.20.0320.0380.5980.3290.0550.87
49Strawberry nutrition facts3290.950.670.37.6824.89160.41132415311258.80.0240.0220.3860.1250.0470.29
50Tangerine nutrition5385.170.810.3113.341.810.58370.151220166268126.70.0580.0360.3760.2160.0780.2
51Watermelon nutrition facts3091.450.610.157.550.46.270.24101111215698.10.0330.0210.1780.2210.0450.05

So that's what we have for now.
Next, turn the "-" entries into NaN, delete those rows with dropna, and check with info().

df_data=df_data.replace("-",np.nan) 
df_data=df_data.dropna()

df_data.info()

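That took us from 52 fruits down to 38. I did consider filling the gaps with column means,
but that felt a bit dodgy, so I dropped every row with a missing value instead.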
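To make the trade-off concrete, here is a small sketch on a made-up table (not the fruit data) that puts the two options side by side: dropping incomplete rows versus filling with the column mean.

```python
import numpy as np
import pandas as pd

# Toy table with holes; names and values are made up for illustration.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "fiber": [1.3, np.nan, 2.6, 5.3],
    "sugars": [10.1, 9.2, np.nan, 4.9],
    "vit_e": [0.05, 0.89, np.nan, 1.17],
})

# Strategy used in this post: drop every row that has any hole.
dropped = toy.dropna()

# Alternative: fill each hole with the column mean. All rows survive,
# but the filled cells are values nobody ever measured.
filled = toy.fillna(toy.mean(numeric_only=True))

print(len(toy), len(dropped), len(filled))
print(filled)
```

Dropping costs rows, while mean filling drags the filled fruits toward the average of every gap, which could blur clusters that depend on those features.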
There are still object columns all over the place, so I convert them to numbers.

# Column names keep the dataset's own spelling ("magnessium", "viatmin").
num_cols=["fiber (g)","sugars (g)","magnessium (mg)","potassium (mg)","sodium (g)",
          "vitamin A (IU)","vitamin B1 (mg)","vitamin B2 (mg)","viatmin B3 (mg)",
          "vitamin B5 (mg)","vitamin B6 (mg)","vitamin E (mg)"]
for col in num_cols:
    df_data[col]=df_data[col].astype(float,errors="raise")

Python (well, pandas) kindly advised me to use .loc here. I'll study up on that later.
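The post does not show the warning itself, but it is most likely pandas' SettingWithCopyWarning. One common way around it, sketched here on made-up data, is to take an explicit .copy() after filtering and convert all numeric columns in one assignment. The names `raw`, `clean` and `num_cols` are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the state before cleaning; values are made up.
raw = pd.DataFrame({
    "name": ["A", "B", "C"],
    "fiber (g)": ["1.3", "-", "2.6"],
    "sugars (g)": ["10.1", "9.2", "4.9"],
})

# .copy() makes the result an independent frame, so later assignments
# are not writes into a slice of something else.
clean = raw.replace("-", np.nan).dropna().copy()

# Convert every column except name in one go.
num_cols = clean.columns.drop("name")
clean[num_cols] = clean[num_cols].apply(pd.to_numeric)

print(clean.dtypes)
```

pd.to_numeric raises on text it cannot parse, which matches the errors="raise" behaviour used in the post.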
We're down to 38 fruits now, so I set name aside once more.

df_name=df_data.iloc[:,:1].reset_index(drop=True)

Everything except name is the actual data, so I slice it out again like this.

df_data_=df_data.iloc[:,1:]

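And that's the data prep done!

On to k-means

First we have to standardize. k-means works on distances, so without it the features with big numbers would dominate.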

from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()

df_data_std=scaler.fit_transform(df_data_)
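For reference, this is what the scaler does under the hood: subtract each column's mean and divide by its standard deviation. The sketch below checks that on a made-up array `X` (two features on very different scales), not on the fruit data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales (think grams vs IU).
X = np.array([[86.7, 38.0], [73.2, 146.0], [20.5, 10.0], [88.9, 1094.0]])

scaled = StandardScaler().fit_transform(X)

# Same thing by hand: subtract the column mean, divide by the
# population standard deviation (ddof=0, NumPy's default).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

After scaling, every feature has mean 0 and standard deviation 1, so each one gets an equal say in the distance calculation.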

Import k-means.

from sklearn.cluster import KMeans

How many clusters to split into is something a human has to decide. I'll use the elbow method to look for a sensible value.

inertias = []  # within-cluster sum of squares for each k (renamed so the built-in sum is not shadowed)

for i in range(1,11):
    km = KMeans(n_clusters=i)
    km.fit(df_data_std)
    inertias.append(km.inertia_)

plt.figure(figsize=(10,10))
plt.plot(range(1,11),inertias,marker="o",color="blue",ms=15)
plt.xlabel("cluster")
plt.ylabel("inertia")
plt.show()
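What the loop collects at each step is inertia_: the sum of squared distances from every point to the centre of its own cluster. A minimal sketch on synthetic data (make_blobs stands in for the fruit table; random_state and n_init are set only so reruns give the same numbers) recomputes it by hand.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the standardized fruit data.
X, _ = make_blobs(n_samples=60, centers=3, n_features=4, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# inertia_ is the sum of squared distances from each point
# to the centre of the cluster it was assigned to.
assigned_centres = km.cluster_centers_[km.labels_]
manual_inertia = ((X - assigned_centres) ** 2).sum()

print(km.inertia_, manual_inertia)
```

Inertia can only go down as clusters are added, which is why the elbow method looks for the point where the drop flattens out instead of hunting for the minimum.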

The elbow shows up fairly cleanly. It's at 4. So:

k_means=KMeans(n_clusters=4).fit(df_data_std)
k_means.labels_

That's what it gave me. Now to tidy up the data.

df_data_std=pd.DataFrame(df_data_std)
df_data_std.columns=df_data_.columns
df_data_std["labels"]=k_means.labels_
df_results=pd.concat([df_name,df_data_std],axis=1)

df_results.to_csv("fruits_results.csv")

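Standardizing turned df_data_std into an ndarray, so I convert it back to a DataFrame,
restore the column names, store the k-means cluster assignment in a labels column,
attach the names, and call the result df_results.
I write this out as CSV, edit it, and then interpret the clusters.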
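As an alternative to editing the CSV by hand, the cluster profiles can also be read straight from pandas. This is a sketch on a made-up frame `toy_results` shaped like df_results, not the post's actual workflow.

```python
import pandas as pd

# Toy stand-in for df_results; names and values are made up for illustration.
toy_results = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "water (g)": [0.6, 0.5, -5.6, -0.9, -0.3],
    "energy(kcal)": [-0.6, -0.5, 5.1, 2.3, 0.3],
    "labels": [0, 0, 1, 2, 3],
})

# How many fruits landed in each cluster.
print(toy_results["labels"].value_counts().sort_index())

# Mean standardized value of every feature per cluster:
# positive = above the overall average, negative = below.
profile = toy_results.drop(columns="name").groupby("labels").mean()
print(profile.round(2))
```

Because the values are standardized, a row of `profile` reads directly as "this cluster is above or below average on this feature".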

Interpreting the k-means clusters

First, the members of cluster 0.

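index | name | labels | energy(kcal) | water (g) | protein (g) | total fat (g) | carbohydrates (g) | fiber (g) | sugars (g) | calcium (mg) | iron (mg) | magnessium (mg) | phosphorus (mg) | potassium (mg) | sodium (g) | vitamin A (IU) | vitamin C (mg) | vitamin B1 (mg) | vitamin B2 (mg) | viatmin B3 (mg) | vitamin B5 (mg) | vitamin B6 (mg) | vitamin E (mg)
7 | Apricot nutrition facts | 0 | -0.29 | 0.26 | 0.70 | -0.14 | -0.29 | -0.42 | -0.10 | -0.39 | 0.26 | -0.43 | 0.15 | 0.38 | -0.45 | 2.45 | -0.54 | -0.51 | -0.06 | 0.36 | -0.10 | -0.40 | 0.85
12 | Melons, casaba | 0 | -0.75 | 0.74 | 0.15 | -0.26 | -0.72 | -1.01 | -0.47 | -0.54 | 0.05 | -0.30 | -1.29 | -0.28 | 1.51 | -0.57 | -0.24 | -1.21 | -0.40 | -0.69 | -0.80 | 1.10 | -0.82
14 | Cherry fruit nutrition facts | 0 | 0.06 | -0.11 | 0.06 | -0.22 | 0.16 | -0.36 | 0.28 | -0.39 | 0.13 | -0.30 | -0.01 | 0.06 | -0.70 | -0.47 | -0.61 | -0.65 | -0.32 | -0.92 | -0.29 | -0.47 | -0.78
15 | Nectarine nutrition | 0 | -0.38 | 0.37 | 0.06 | -0.17 | -0.35 | -0.58 | -0.24 | -0.91 | -0.20 | -0.57 | 0.39 | -0.12 | -0.70 | -0.05 | -0.65 | -0.33 | -0.55 | 1.87 | -0.35 | -0.80 | 0.61
16 | Carambola or Starfruit | 0 | -0.68 | 0.70 | 0.02 | -0.16 | -0.71 | 0.01 | -0.65 | -1.13 | -1.04 | -0.43 | -0.73 | -0.71 | -0.21 | -0.47 | 0.07 | -1.26 | -0.96 | -0.31 | 0.58 | -0.91 | -0.62
17 | Oranges nutrition facts | 0 | -0.31 | 0.29 | -0.17 | -0.26 | -0.23 | -0.20 | -0.08 | 1.61 | -0.96 | -0.43 | -0.57 | -0.29 | -0.70 | -0.22 | 0.54 | 2.14 | -0.06 | -0.55 | -0.05 | -0.31 | -0.56
18 | Peaches nutrition | 0 | -0.50 | 0.48 | -0.23 | -0.20 | -0.44 | -0.69 | -0.19 | -0.91 | -0.33 | -0.57 | -0.09 | -0.21 | -0.70 | -0.06 | -0.62 | -0.79 | -0.40 | 0.96 | -0.49 | -0.80 | 0.53
19 | Melons, cantaloupe | 0 | -0.61 | 0.59 | -0.36 | -0.23 | -0.57 | -1.01 | -0.24 | -0.69 | -0.50 | -0.17 | -0.49 | 0.45 | 3.22 | 4.73 | 0.13 | -0.00 | -0.84 | 0.75 | -0.71 | -0.15 | -0.82
20 | Mandarin-Tangerine nutrition | 0 | -0.17 | 0.15 | -0.42 | -0.17 | -0.09 | -0.53 | 0.05 | 1.39 | -0.75 | -0.17 | -0.09 | -0.42 | -0.21 | 0.50 | -0.12 | 0.79 | -0.21 | -0.28 | -0.21 | -0.07 | -0.52
21 | Tangerine nutrition | 0 | -0.17 | 0.15 | -0.42 | -0.17 | -0.09 | -0.53 | 0.05 | 1.39 | -0.75 | -0.17 | -0.09 | -0.42 | -0.21 | 0.50 | -0.12 | 0.79 | -0.21 | -0.28 | -0.21 | -0.07 | -0.52
22 | Grapefruit, pink and red | 0 | -0.43 | 0.41 | -0.49 | -0.25 | -0.34 | -0.63 | -0.34 | 0.28 | -1.04 | -0.57 | -0.25 | -0.69 | -0.70 | 1.23 | -0.01 | 0.09 | -0.40 | -0.77 | -0.00 | -0.41 | -0.66
24 | Blueberry nutrition facts | 0 | -0.08 | 0.07 | -0.55 | -0.16 | 0.02 | -0.20 | -0.02 | -0.91 | -0.20 | -0.97 | -0.73 | -1.20 | -0.45 | -0.48 | -0.55 | -0.19 | -0.03 | -0.16 | -0.62 | -0.42 | 0.21
25 | Grape nutrition: Grapes, red or green | 0 | 0.20 | -0.26 | -0.59 | -0.24 | 0.36 | -1.01 | 0.57 | -0.61 | 0.13 | -0.83 | -0.09 | -0.21 | -0.21 | -0.46 | -0.52 | 1.30 | 1.05 | -0.82 | -0.96 | 0.04 | -0.54
26 | Plum nutrition | 0 | -0.33 | 0.33 | -0.62 | -0.19 | -0.27 | -0.74 | -0.02 | -0.91 | -0.66 | -0.83 | -0.41 | -0.50 | -0.70 | -0.03 | -0.55 | -0.61 | -0.58 | -0.16 | -0.57 | -0.74 | -0.40
27 | Lime nutrition | 0 | -0.70 | 0.43 | -0.62 | -0.22 | -0.35 | 0.01 | -0.90 | 1.09 | 1.14 | -0.97 | -0.25 | -0.98 | -0.21 | -0.49 | -0.06 | -0.51 | -0.81 | -0.79 | -0.20 | -0.55 | -0.48
28 | Grapefruit, white | 0 | -0.64 | 0.62 | -0.64 | -0.26 | -0.55 | -0.90 | -0.30 | -0.46 | -1.13 | -0.57 | -1.05 | -0.58 | -0.70 | -0.52 | 0.04 | -0.19 | -0.81 | -0.59 | 0.09 | -0.55 | -0.66
29 | Strawberry nutrition facts | 0 | -0.66 | 0.66 | -0.68 | -0.18 | -0.62 | -0.42 | -0.56 | -0.17 | 0.34 | -0.04 | 0.23 | -0.54 | -0.45 | -0.55 | 0.68 | -0.79 | -0.73 | -0.25 | -0.62 | -0.49 | -0.34
30 | Papaya nutrition | 0 | -0.50 | 0.48 | -0.79 | -0.25 | -0.42 | -0.53 | -0.45 | 0.43 | -0.96 | -0.43 | -1.29 | 0.37 | 0.04 | 1.15 | 0.76 | -0.65 | -0.36 | -0.39 | -0.20 | -0.88 | 0.53
31 | Watermelon nutrition facts | 0 | -0.70 | 0.71 | -0.79 | -0.24 | -0.63 | -1.28 | -0.42 | -0.83 | -0.37 | -0.43 | -0.81 | -0.89 | -0.45 | 0.32 | -0.59 | -0.37 | -0.77 | -0.85 | -0.19 | -0.52 | -0.82
32 | Pineapple nutrition | 0 | -0.24 | 0.22 | -0.93 | -0.26 | -0.11 | -0.74 | -0.03 | -0.39 | -0.16 | -0.17 | -1.05 | -0.92 | -0.45 | -0.48 | 0.41 | 1.77 | -0.36 | 0.08 | -0.22 | 0.40 | -0.88
33 | Melons, honeydew | 0 | -0.57 | 0.56 | -0.93 | -0.25 | -0.48 | -1.07 | -0.21 | -0.91 | -0.66 | -0.43 | -0.81 | 0.12 | 3.71 | -0.49 | -0.34 | -0.14 | -1.10 | -0.16 | -0.48 | 0.07 | -0.88
34 | Mango nutrition facts | 0 | 0.11 | -0.16 | -0.98 | -0.19 | 0.26 | -0.53 | 0.49 | -0.61 | -0.83 | -0.57 | -0.81 | -0.51 | -0.21 | 0.63 | -0.10 | 0.79 | 0.57 | 0.32 | -0.46 | 0.70 | 1.31
35 | Cranberries | 0 | -0.33 | 0.33 | -1.21 | -0.25 | -0.19 | 0.98 | -0.65 | -0.76 | -0.33 | -0.97 | -0.65 | -1.13 | -0.21 | -0.47 | -0.46 | -1.35 | -0.81 | -1.07 | 0.15 | -0.36 | 1.47
36 | Pear nutrition | 0 | -0.06 | 0.02 | -1.23 | -0.26 | 0.11 | 0.17 | -0.04 | -0.69 | -0.66 | -0.83 | -0.81 | -0.83 | -0.45 | -0.53 | -0.68 | -1.35 | -0.62 | -0.91 | -0.97 | -0.75 | -0.68
37 | Apple nutrition facts | 0 | -0.29 | 0.29 | -1.44 | -0.25 | -0.15 | -0.80 | -0.00 | -0.98 | -1.08 | -1.23 | -0.81 | -1.08 | -0.70 | -0.51 | -0.69 | -1.02 | -0.51 | -1.10 | -0.86 | -0.63 | -0.82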

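25 of the 38 fruits ended up in here. It's hard to tell when you look at the data this way, but it feels like
【1. The mostly-water, not-much-nutrition group】.
Next, the members of cluster 1.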

index | name | labels | energy(kcal) | water (g) | protein (g) | total fat (g) | carbohydrates (g) | fiber (g) | sugars (g) | calcium (mg) | iron (mg) | magnessium (mg) | phosphorus (mg) | potassium (mg) | sodium (g) | vitamin A (IU) | vitamin C (mg) | vitamin B1 (mg) | vitamin B2 (mg) | viatmin B3 (mg) | vitamin B5 (mg) | vitamin B6 (mg) | vitamin E (mg)
1 | Date nutrition: Dates, deglet noor | 1 | 5.14 | -5.58 | 2.68 | -0.14 | 5.69 | 2.82 | 5.64 | 1.54 | 2.90 | 3.96 | 3.29 | 3.83 | -0.21 | -0.55 | -0.78 | 0.51 | 0.91 | 2.30 | 1.48 | 1.13 | -0.82

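This cluster has just one member: dates (the fruit of the date palm). Of the 21 features,
it holds the maximum in 8 (energy, carbohydrates, fiber, sugars, iron, magnesium, phosphorus and potassium). Truly a
【2. The dates group】
if ever there was one.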
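Counting "how many features does each fruit top" does not have to be done by eye. A small sketch with a made-up standardized table (`toy`, with invented values and only four features) shows one way to do it using idxmax.

```python
import pandas as pd

# Toy standardized table; values are made up for illustration.
toy = pd.DataFrame(
    {
        "energy": [5.1, 2.3, -0.3],
        "total fat": [-0.1, 6.0, -0.1],
        "sugars": [5.6, -1.0, -0.1],
        "vitamin C": [-0.8, -0.5, 4.9],
    },
    index=["Dates", "Avocado", "Guava"],
)

# idxmax() returns, for each column, the row label holding the maximum;
# value_counts() then tallies how many columns each row "wins".
wins = toy.idxmax().value_counts()
print(wins)
```

On the real table the same two calls would be applied to the feature columns with name set as the index.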
Next, the members of cluster 2.

index | name | labels | energy(kcal) | water (g) | protein (g) | total fat (g) | carbohydrates (g) | fiber (g) | sugars (g) | calcium (mg) | iron (mg) | magnessium (mg) | phosphorus (mg) | potassium (mg) | sodium (g) | vitamin A (IU) | vitamin C (mg) | vitamin B1 (mg) | vitamin B2 (mg) | viatmin B3 (mg) | vitamin B5 (mg) | vitamin B6 (mg) | vitamin E (mg)
2 | Avocado nutrition | 2 | 2.31 | -0.91 | 1.83 | 6.05 | -0.54 | 2.12 | -1.01 | -0.46 | 0.93 | 2.10 | 2.48 | 2.35 | 1.02 | -0.34 | -0.54 | 1.21 | 3.29 | 3.64 | 5.09 | 2.39 | 3.20

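I wrote "members", but believe it or not this one also has only a single member.
It's avocado. This one holds the maximum in 4 features too (total fat and vitamins B3, B5 and E).
【3. The avocado group】
it is.
Finally, the members of cluster 3.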

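index | name | labels | energy(kcal) | water (g) | protein (g) | total fat (g) | carbohydrates (g) | fiber (g) | sugars (g) | calcium (mg) | iron (mg) | magnessium (mg) | phosphorus (mg) | potassium (mg) | sodium (g) | vitamin A (IU) | vitamin C (mg) | vitamin B1 (mg) | vitamin B2 (mg) | viatmin B3 (mg) | vitamin B5 (mg) | vitamin B6 (mg) | vitamin E (mg)
0 | Guava nutrition facts | 3 | 0.18 | -0.24 | 2.87 | 0.10 | 0.01 | 1.42 | -0.13 | -0.02 | -0.29 | 1.16 | 1.52 | 1.76 | -0.21 | 0.41 | 4.92 | 1.21 | -0.06 | 1.76 | 0.85 | 0.37 | 0.53
3 | Kumquat nutrition | 3 | 0.25 | -0.23 | 1.61 | 0.06 | 0.15 | 2.01 | -0.08 | 3.24 | 2.23 | 0.90 | -0.17 | -0.25 | 1.75 | -0.11 | 0.31 | -0.19 | 1.80 | -0.13 | -0.24 | -0.64 | -0.62
4 | Pomegranate nutrition facts | 3 | 0.52 | -0.49 | 1.21 | 0.20 | 0.42 | 0.66 | 0.37 | -0.61 | -0.12 | -0.17 | 1.20 | 0.18 | 0.04 | -0.57 | -0.53 | 1.21 | 0.42 | -0.52 | 0.52 | -0.11 | 0.27
5 | Cherimoya fruit nutrition | 3 | 0.34 | -0.36 | 1.02 | -0.01 | 0.32 | 0.12 | 0.29 | -0.61 | -0.25 | 0.50 | 0.39 | 0.63 | 1.02 | -0.56 | -0.47 | 2.79 | 3.32 | 0.49 | 0.37 | 2.39 | -0.38
6 | Currants, Red and White | 3 | -0.10 | 0.04 | 0.70 | -0.22 | -0.04 | 0.82 | -0.29 | 1.09 | 2.81 | -0.04 | 1.84 | 0.52 | -0.45 | -0.50 | 0.24 | -0.05 | 0.31 | -1.07 | -0.90 | -0.18 | -0.72
8 | Blackberries nutrition | 3 | -0.40 | 0.42 | 0.68 | -0.10 | -0.44 | 1.36 | -0.56 | 0.80 | 1.22 | 0.90 | 0.07 | -0.46 | -0.45 | -0.23 | -0.26 | -0.98 | -0.58 | 0.50 | 0.06 | -0.73 | 1.41
9 | Kiwi nutriion: Kiwifruit, gold | 3 | -0.01 | -0.02 | 0.38 | -0.07 | -0.00 | -0.42 | 0.09 | 0.13 | -0.16 | 0.10 | 0.63 | 0.88 | 0.04 | -0.45 | 1.85 | -0.79 | 0.16 | -0.56 | 1.07 | -0.36 | 2.05
10 | Raspberry nutrition | 3 | -0.19 | 0.20 | 0.32 | -0.03 | -0.22 | 2.01 | -0.61 | 0.50 | 1.51 | 1.16 | 0.63 | -0.55 | -0.45 | -0.52 | -0.13 | -0.42 | -0.14 | 0.36 | 0.30 | -0.38 | 0.81
11 | Kiwi nutriion: Kiwifruit, green | 3 | 0.01 | -0.03 | 0.21 | -0.08 | 0.04 | 0.12 | -0.12 | 1.17 | -0.08 | 0.50 | 1.04 | 0.84 | 0.04 | -0.43 | 1.53 | -0.65 | -0.62 | -0.38 | -0.36 | -0.27 | 1.99
13 | Banana nutrition facts | 3 | 0.66 | -0.76 | 0.11 | -0.16 | 0.80 | -0.10 | 0.22 | -0.98 | -0.29 | 1.83 | 0.07 | 1.24 | -0.45 | -0.47 | -0.57 | -0.47 | 1.17 | 0.55 | 0.32 | 3.91 | -0.72
23 | Fig nutrition: Figs, raw | 3 | 0.32 | -0.39 | -0.53 | -0.18 | 0.46 | 0.07 | 0.65 | 1.24 | 0.17 | 0.50 | -0.57 | 0.15 | -0.45 | -0.35 | -0.74 | 0.88 | 0.31 | -0.21 | 0.17 | 0.41 | -0.70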

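Again, it's hard to tell just from these numbers, but the remaining 11 fruits belong to
【4. The not-so-watery, somewhat-nutritious group】.
Guava, cherimoya, kiwi, banana and so on are in here, which feels about right.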

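There are surely smarter ways to do this, but one way or another I got there.
I can't picture it at all, but in 21-dimensional space (one dimension per feature),
【1. The mostly-water, not-much-nutrition group】
【4. The not-so-watery, somewhat-nutritious group】
form loose clumps, and far, far away,
【2. The dates group】
【3. The avocado group】
each sit all on their own. That's the feel of it.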
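One way to get at least a rough picture of that space is to squash it down to two dimensions with PCA. This is not something the post itself does, so take it as a sketch under assumptions: make_blobs provides synthetic 38 x 21 data in place of the fruit table, and the variable names are made up.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 38 points in 21 dimensions.
X, _ = make_blobs(n_samples=38, centers=4, n_features=21, random_state=0)
X_std = StandardScaler().fit_transform(X)

# Project onto the two directions that carry the most variance.
pca = PCA(n_components=2)
coords = pca.fit_transform(X_std)

print(coords.shape)
# Share of the total variance each of the two axes keeps.
print(pca.explained_variance_ratio_)
```

Plotting `coords` as a scatter coloured by the k-means labels would show whether the lone clusters really sit far from the rest, with the caveat that two axes only keep part of the variance.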

I was thinking, the king of fruits is durian, right? Turns out it had missing values, so I'd gone and deleted it.