"Data mining": difference between revisions

Revision as of 28 February 2018, 16:58

Domain

Definition

Data mining covers the set of tools and methods used to extract knowledge from large databases. In French it is also called fouille, forage, or prospection de données, or extraction de connaissances à partir de données (knowledge extraction from data).

It is a preliminary analysis in which one explores the data, seeks to confirm intuitions, and lets concepts (insights) emerge. It is a way of producing knowledge, but this step cannot be automated. Some authors also include the transformation of data into useful information by establishing relationships and correlations among the data (also called patterns, motifs, or criteria) in order to categorize them. Data mining extends data analysis and exploratory statistics, which have been practised for more than 30 years. It incorporates, or serves as a prelude to, analysis techniques drawn from machine learning, pattern recognition, and databases of various kinds, including data warehouses.
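As an illustration of what establishing relationships among data can look like in practice, here is a minimal sketch that counts which pairs of items occur together often enough to be called a pattern. The function name `frequent_pairs`, the `min_support` parameter, and the shopping baskets are assumptions made for this example; the entry itself does not prescribe any particular method.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs found together in at least `min_support` of the transactions.

    `transactions` is an iterable of item collections and `min_support` is a
    fraction between 0 and 1. Both names are illustrative assumptions.
    """
    transactions = [set(t) for t in transactions]
    if not transactions:
        return {}
    counts = Counter()
    for items in transactions:
        # Sort so that each pair has a single canonical ordering.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
]
patterns = frequent_pairs(baskets, min_support=0.5)
print(patterns)  # {('bread', 'butter'): 0.5}
```

Only the bread and butter pair reaches the 50% threshold here. Real tools use more scalable algorithms, but the idea of keeping only sufficiently frequent co-occurrences is the same.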

Data mining, also known as Knowledge Discovery in Data, relies on sophisticated algorithms that segment the data and estimate the likelihood of future outcomes, such as market trends.
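Segmentation can be sketched with a basic one-dimensional k-means loop. This is only one possible technique, chosen here for brevity; the entry does not name a specific algorithm, and the function `kmeans_1d`, its parameters, and the spending figures are hypothetical. The sketch covers the segmentation part only, not the estimation of future outcomes.

```python
def kmeans_1d(values, k, iterations=20):
    """Segment numbers into k groups with a basic k-means loop (illustrative)."""
    if k < 1 or k > len(values):
        raise ValueError("k must be between 1 and the number of values")
    ordered = sorted(values)
    # Deterministic start: pick k values spread evenly across the sorted data.
    if k == 1:
        centers = [ordered[0]]
    else:
        centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # Centers stopped moving, so the segmentation is stable.
        centers = new_centers
    return centers, clusters


# Hypothetical monthly spending per customer, invented for illustration only.
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = kmeans_1d(spend, k=2)
print(centers)
print(clusters)
```

With these made-up figures the loop separates the low spenders from the high spenders after two passes. Production libraries add better initialization and support for many dimensions.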

Preferred terms

English

Data Mining

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.[1] It is an essential process where intelligent methods are applied to extract data patterns.[1][2] It is an interdisciplinary subfield of computer science.[1][3][4] The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.[1] Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[1] Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.[5]

The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself.[6] It is also a buzzword[7] and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support systems, including artificial intelligence, machine learning, and business intelligence. The book Data mining: Practical machine learning tools and techniques with Java[8] (which covers mostly machine learning material) was originally to be named just Practical machine learning, and the term data mining was only added for marketing reasons.[9] Often the more general terms (large scale) data analysis and analytics – or, when referring to actual methods, artificial intelligence and machine learning – are more appropriate.
