Gaussian Naive Bayes

(2 intermediate revisions by 2 users not shown)

Line 1:

~~==under construction==~~

#REDIRECTION[[Classification naïve bayésienne]]

~~== Definition ==~~

[[Catégorie:ENGLISH]]

~~XXXXXXXXX~~

~~== French ==~~

~~''' XXXXXXXXX '''~~

~~== English ==~~

~~''' Gaussian naïve Bayes'''~~

When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a normal (or Gaussian) distribution. For example, suppose the training data contains a continuous attribute, <math>x</math>. We first segment the data by class, and then compute the mean and variance of <math>x</math> in each class. Let <math>\mu_k</math> be the mean of the values of <math>x</math> associated with class <math>C_k</math>, and let <math>\sigma_k^2</math> be the Bessel-corrected variance of the values of <math>x</math> associated with class <math>C_k</math>. Suppose we have collected some observation value <math>v</math>. Then the probability density of <math>v</math> given a class <math>C_k</math>, <math>p(x=v\mid C_k)</math>, can be computed by plugging <math>v</math> into the equation for a normal distribution parameterized by <math>\mu_k</math> and <math>\sigma_k^2</math>. That is,

<math>p(x=v\mid C_k)={\frac {1}{\sqrt {2\pi \sigma_k^2}}}\,e^{-{\frac {(v-\mu_k)^2}{2\sigma_k^2}}}</math>
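The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_gaussian_by_class` and `gaussian_density` and the toy height data are made up for this example and do not come from the article. The standard-library `statistics.variance` is used because it applies the Bessel correction (division by n - 1).

```python
import math
from statistics import mean, variance  # variance() divides by n - 1 (Bessel-corrected)


def fit_gaussian_by_class(values, labels):
    """Segment one continuous attribute by class; return {class: (mu_k, sigma_k^2)}."""
    by_class = {}
    for v, c in zip(values, labels):
        by_class.setdefault(c, []).append(v)
    return {c: (mean(xs), variance(xs)) for c, xs in by_class.items()}


def gaussian_density(v, mu, var):
    """Normal density with mean mu and variance var, evaluated at v."""
    return math.exp(-((v - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical toy data: one continuous attribute (height) and two classes.
heights = [6.0, 5.92, 5.58, 5.92, 5.0, 5.5, 5.42, 5.75]
labels = ["male"] * 4 + ["female"] * 4

params = fit_gaussian_by_class(heights, labels)
mu, var = params["male"]

# p(x = 6.0 | male); this is a density, so a value above 1 is possible.
density = gaussian_density(6.0, mu, var)
print(mu, var, density)
```

In a full classifier this density would be multiplied by the class prior and by the densities of the other attributes; that step is outside the scope of this sketch.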

Another common technique for handling continuous values is to use binning to discretize the feature values, to obtain a new set of Bernoulli-distributed features; some literature in fact suggests that this is necessary to apply naive Bayes, but it is not, and the discretization may throw away discriminative information.[5]
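As a rough illustration of the binning approach, the sketch below discretizes a continuous value into equal-width bins and encodes the bin as a set of 0/1 indicator features. Equal-width binning, the bin count, and the helper names `equal_width_edges` and `one_hot` are assumptions chosen for the example; the text does not prescribe a particular binning scheme.

```python
from bisect import bisect_right


def equal_width_edges(values, n_bins):
    """Interior bin edges for n_bins equal-width bins spanning min..max of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def one_hot(v, edges):
    """Encode v as 0/1 indicators, one per bin (exactly one indicator is 1)."""
    idx = bisect_right(edges, v)  # number of interior edges <= v
    return [1 if i == idx else 0 for i in range(len(edges) + 1)]


# Hypothetical training values for one continuous feature.
values = [0.0, 2.5, 5.0, 7.5, 10.0]
edges = equal_width_edges(values, n_bins=4)

print(edges)
print(one_hot(6.0, edges))

# Two clearly different values fall into the same bin and become
# indistinguishable, which is the information loss mentioned above.
print(one_hot(5.1, edges) == one_hot(7.4, edges))
```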

Sometimes the class-conditional marginal densities are far from normal. In these cases, kernel density estimation can be used for a more realistic estimate of the marginal densities of each class. This method, which was introduced by John and Langley,[12] can boost the accuracy of the classifier considerably.[13][14]
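To make the contrast concrete, here is a minimal sketch of a kernel density estimate for one class, compared with a single fitted Gaussian. The Gaussian kernel, the fixed bandwidth of 0.3, the function names, and the bimodal toy sample are all assumptions made for illustration; this is not a reproduction of John and Langley's exact procedure.

```python
import math
from statistics import mean, variance


def kde_density(v, samples, bandwidth):
    """Average of Gaussian kernels centred on each training value of the class."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((v - s) / bandwidth) ** 2) for s in samples)


def gaussian_density(v, mu, var):
    """Single normal density with mean mu and variance var, evaluated at v."""
    return math.exp(-((v - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical values of one attribute within one class: two separated clusters.
samples = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
mu, var = mean(samples), variance(samples)

# The single Gaussian is centred on the empty gap between the clusters,
# so it assigns more density to the gap (3.0) than to a cluster centre (1.0).
gauss_at_gap = gaussian_density(3.0, mu, var)
gauss_at_mode = gaussian_density(1.0, mu, var)

# The kernel estimate follows the data and does the opposite.
kde_at_gap = kde_density(3.0, samples, bandwidth=0.3)
kde_at_mode = kde_density(1.0, samples, bandwidth=0.3)

print(gauss_at_gap, gauss_at_mode)
print(kde_at_gap, kde_at_mode)
```

The bandwidth controls how smooth the estimate is, and in practice it would need to be chosen with some care; the value used here is arbitrary.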

~~<small>~~

~~[https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Gaussian_na%C3%AFve_Bayes Source : Wikipedia Machine Learning ]~~

~~[[Catégorie:vocabulary]]~~

[[Catégorie:~~Wikipedia-IA‎~~]]

« Gaussian Naive Bayes »: difference between revisions

Latest revision as of 1 August 2022, 09:31

@@ Line 1: / Line 1: @@
-==under construction==
+#REDIRECTION[[Classification naïve bayésienne]]
-== Definition ==
+[[Catégorie:ENGLISH]]
-XXXXXXXXX
-== French ==
-''' XXXXXXXXX '''
-== English ==
-''' Gaussian naïve Bayes'''
-When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a normal (or Gaussian) distribution. For example, suppose the training data contains a continuous attribute, <math>x</math>. We first segment the data by class, and then compute the mean and variance of <math>x</math> in each class. Let <math>\mu_k</math> be the mean of the values of <math>x</math> associated with class <math>C_k</math>, and let <math>\sigma_k^2</math> be the Bessel-corrected variance of the values of <math>x</math> associated with class <math>C_k</math>. Suppose we have collected some observation value <math>v</math>. Then the probability density of <math>v</math> given a class <math>C_k</math>, <math>p(x=v\mid C_k)</math>, can be computed by plugging <math>v</math> into the equation for a normal distribution parameterized by <math>\mu_k</math> and <math>\sigma_k^2</math>. That is,
-<math>p(x=v\mid C_k)={\frac {1}{\sqrt {2\pi \sigma_k^2}}}\,e^{-{\frac {(v-\mu_k)^2}{2\sigma_k^2}}}</math>
-Another common technique for handling continuous values is to use binning to discretize the feature values, to obtain a new set of Bernoulli-distributed features; some literature in fact suggests that this is necessary to apply naive Bayes, but it is not, and the discretization may throw away discriminative information.[5]
-Sometimes the class-conditional marginal densities are far from normal. In these cases, kernel density estimation can be used for a more realistic estimate of the marginal densities of each class. This method, which was introduced by John and Langley,[12] can boost the accuracy of the classifier considerably.[13][14]
-<small>
-[https://en.wikipedia.org/wiki/Naive_Bayes_classifier#Gaussian_na%C3%AFve_Bayes  Source : Wikipedia  Machine Learning ]
-[[Catégorie:vocabulary]]
-[[Catégorie:Wikipedia-IA‎]]
