Data Science Books: Books in English

Telling Stories with Data

With Applications in R

by Rohan Alexander

Target audience: Beginner

Publisher's summary

The book equips students with the end-to-end skills needed to do data science. That means gathering, cleaning, preparing, and sharing data, then using statistical models to analyse data, writing about the results of those models, drawing conclusions from them, and finally, using the cloud to put a model into production, all done in a reproducible way.

At the moment, there are a lot of books that teach data science, but most of them assume that you already have the data. This book fills that gap by detailing how to go about gathering datasets, cleaning and preparing them, before analysing them. There are also a lot of books that teach statistical modelling, but few of them teach how to communicate the results of the models and how they help us learn about the world. Very few data science textbooks cover ethics, and most of those that do, have a token ethics chapter. Finally, reproducibility is not often emphasised in data science books. This book is based around a straight-forward workflow conducted in an ethical and reproducible way: gather data, prepare data, analyse data, and communicate those findings. This book will achieve the goals by working through extensive case studies in terms of gathering and preparing data, and integrating ethics throughout. It is specifically designed around teaching how to write about the data and models, so aspects such as writing are explicitly covered. And finally, the use of GitHub and the open-source statistical language R are built in throughout the book.

Key Features:

Extensive code examples.
Ethics integrated throughout.
Reproducibility integrated throughout.
Focus on data gathering, messy data, and cleaning data.
Extensive formative assessment throughout.

Publisher: CRC - 598 pages, 1st edition, July 27, 2023

ISBN10 : 1032134771 - ISBN13 : 9781032134772

Order from www.amazon.fr:

€95.28 incl. tax (publisher's price: €95.28 incl. tax)

Telling stories with data
Drinking from a fire hose
Reproducible workflows

Foundations

Writing research
Static communication

Communication

Farm data
Gather data
Hunt data

Preparation

Clean and prepare
Store and share

Modeling

Exploratory data analysis
Linear models
Generalized linear models

Applications

Causality from observational data
Multilevel regression with post-stratification
Text as data
Concluding remarks

Book review by the editorial team: Thibaut Cuvelier, December 30, 2023

A growing number of people are interested in data and want to do something with it, but are held back by a lack of knowledge of the field. This book offers an introduction that assumes no particular background in computing (and only a little in statistics). It covers many important topics, such as statistics, data processing, visualization, and reproducibility of results, but does not really address machine learning, favoring Bayesian approaches instead. For instance, the author devotes an entire chapter to causality inferred from observational data (rather than from experiments), as well as a section to differential privacy.

The book stands out in several respects, chief among them its treatment of ethics. This topic is rarely addressed in the data science literature, but the author gives it particular weight: for example, he analyzes many situations in which data was used without regard for ethics and led to avoidable disasters. He also examines the potential effects of data transformations from an ethical angle. The author insists on current best practices in software development, notably writing tests and using version control (topics rarely highlighted outside conventional software development).

Despite its title, the book does not set out to teach R, only the essentials of the language. All the source code used for the book (tables and figures) is included, along with explanations of the underlying principles, so the book remains useful with other programming languages.

Machine Learning for Engineers

by Osvaldo Simeone

Target audience: Beginner

Publisher's summary

This self-contained introduction to machine learning, designed from the start with engineers in mind, will equip students with everything they need to start applying machine learning principles and algorithms to real-world engineering problems. With a consistent emphasis on the connections between estimation, detection, information theory, and optimization, it includes: an accessible overview of the relationships between machine learning and signal processing, providing a solid foundation for further study; clear explanations of the differences between state-of-the-art techniques and more classical methods, equipping students with all the understanding they need to make informed technique choices; demonstration of the links between information-theoretical concepts and their practical engineering relevance; reproducible examples using Matlab, enabling hands-on student experimentation. Assuming only a basic understanding of probability and linear algebra, and accompanied by lecture slides and solutions for instructors, this is the ideal introduction to machine learning for engineering students of all disciplines.

Publisher: Cambridge - 450 pages, 1st edition, December 3, 2022

ISBN10 : 1316512827 - ISBN13 : 9781316512821

Order from www.amazon.fr:

€67.19 incl. tax (publisher's price: €67.19 incl. tax)

Introduction and Background

When and How to Use Machine Learning
Background

Fundamental Concepts and Algorithms

Inference, or Model-Driven Prediction
Supervised Learning: Getting Started
Optimization for Machine Learning
Supervised Learning: Beyond Least Squares
Unsupervised Learning

Advanced Tools and Algorithms

Statistical Learning Theory
Exponential Family of Distributions
Variational Inference and Variational Expectation Maximization
Information-Theoretic Inference and Learning
Bayesian Learning

Beyond Centralized Single-Task Learning

Transfer Learning, Multi-task Learning, Continual Learning, and Meta-learning
Federated Learning

Epilogue

Beyond This Book

Book review by the editorial team: Thibaut Cuvelier, February 4, 2023

Introductory books on machine learning tend to focus on the available algorithms rather than on a target audience. This book is aimed mainly at engineers and offers an information-theoretic view of the field, a theory that serves as a unifying vocabulary. Despite its title, the author provides a wealth of mathematical detail on the algorithms, their principles, and learning theory; the most advanced parts are relegated to the appendices.

In a narrative style, the author moves from elementary probability to a Bayesian or frequentist approach to learning, depending on the context. All the classical topics are covered, including deep neural networks, but without algorithms taking up most of the text. The book starts with general principles such as loss functions or maximizing the posterior probability before reaching basic algorithms such as linear regression, which gives the impression of an abstract text.

The book closes with recent scientific advances, exploring federated and transfer learning. For readers who want to go deeper, each chapter ends with a set of exercises without solutions and a short bibliography made up mostly of more advanced works.

At the very beginning, the author provides a list of notations and acronyms. The code used to generate the figures is most often available in both Python and MATLAB.

A First Course in Random Matrix Theory

For physicists, engineers and data scientists

by Jean-Philippe Bouchaud and Marc Potters

Target audience: Intermediate

Publisher's summary

The real world is perceived and broken down as data, models and algorithms in the eyes of physicists and engineers. Data is noisy by nature and classical statistical tools have so far been successful in dealing with relatively smaller levels of randomness. The recent emergence of Big Data and the required computing power to analyse them have rendered classical tools outdated and insufficient. Tools such as random matrix theory and the study of large sample covariance matrices can efficiently process these big data sets and help make sense of modern, deep learning algorithms. Presenting an introductory calculus course for random matrices, the book focusses on modern concepts in matrix theory, generalising the standard concept of probabilistic independence to non-commuting random variables. Concretely worked out examples and applications to financial engineering and portfolio construction make this unique book an essential tool for physicists, engineers, data analysts, and economists.

Publisher: Cambridge - 372 pages, 1st edition, March 12, 2020

ISBN10 : 1108488080 - ISBN13 : 9781108488082

Order from www.amazon.fr:

€69.42 incl. tax (publisher's price: €69.42 incl. tax)

Classical Random Matrix Theory

Deterministic Matrices
Wigner Ensemble and Semi-Circle Law
More on Gaussian Matrices*
Wishart Ensemble and Marčenko–Pastur Distribution
Joint Distribution of Eigenvalues
Eigenvalues and Orthogonal Polynomials*
The Jacobi Ensemble*

Sums and Products of Random Matrices

Addition of Random Variables and Brownian Motion
Dyson Brownian Motion
Addition of Large Random Matrices
Free Probabilities
Free Random Matrices
The Replica Method*
Edge Eigenvalues and Outliers

Applications

Addition and Multiplication: Recipes and Examples
Products of Many Random Matrices
Sample Covariance Matrices
Bayesian Estimation
Eigenvector Overlaps and Rotationally Invariant Estimators
Applications to Finance

Book review by the editorial team: Thibaut Cuvelier, January 12, 2023

In data science in particular, some tools are regularly overlooked, for instance because they demand too much mathematical background. Random matrices are undoubtedly among them. This book develops the underlying theory while leaving room for uses outside that theory, whether more or less applied (data analysis, economics, physics, etc.). The authors intended it to be accessible, introductory, and pedagogical, yet it goes well beyond that scope (the more advanced chapters and sections are marked with an asterisk or set in a smaller font).

The book opens with a list of notations, something one would like to see more often. The authors appeal to intuition, for example by recalling classical results on scalar values, more often than to heavy mathematical developments, although these are not omitted. The choice of topics was guided by applications, which is why only real-valued matrices are emphasized (complex numbers and quaternions are also mentioned, but mostly in footnotes) and why the study of covariance matrices is pursued in depth.

Foundations of Data Science

by Avrim Blum, John Hopcroft, Ravindran Kannan

Target audience: Beginner

Publisher's summary

This book provides an introduction to the mathematical and algorithmic foundations of data science, including machine learning, high-dimensional geometry, and analysis of large networks. Topics include the counterintuitive nature of data in high dimensions, important linear algebraic techniques such as singular value decomposition, the theory of random walks and Markov chains, the fundamentals of and important algorithms for machine learning, algorithms and analysis for clustering, probabilistic models for large networks, representation learning including topic modelling and non-negative matrix factorization, wavelets and compressed sensing. Important probabilistic techniques are developed including the law of large numbers, tail inequalities, analysis of random projections, generalization guarantees in machine learning, and moment methods for analysis of phase transitions in large random graphs. Additionally, important structural and complexity measures are discussed such as matrix norms and VC-dimension. This book is suitable for both undergraduate and graduate courses in the design and analysis of algorithms for data.

Publisher: Cambridge - 432 pages, 1st edition, January 23, 2020

ISBN10 : 1108485065 - ISBN13 : 9781108485067

Order from www.amazon.fr:

€52.04 incl. tax (publisher's price: €52.04 incl. tax)

Introduction
High-Dimensional Space
Best-Fit Subspaces and Singular Value Decomposition (SVD)
Random Walks and Markov Chains
Machine Learning
Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling
Clustering
Random Graphs
Topic Models, Nonnegative Matrix Factorization, Hidden Markov Models, and Graphical Models
Other Topics
Wavelets
Background Material

Book review by the editorial team: Thibaut Cuvelier, December 29, 2022

Many people forget that data science has its roots in mathematics, statistics in particular. This book will surely serve as a reminder, since it provides an introduction to the underlying mathematical principles. These topics are not treated from a purely theoretical angle but truly as applied mathematics: for example, the theorems presented are often convergence proofs for widely used algorithms, and the theoretical tools are useful in the specific field of data science (such as the VC dimension). This allows the authors to build the algorithms up from first principles.

Contrary to what one might expect, however, the prerequisites are fairly light (basics of statistics, linear algebra, and programming). Most of the very useful basic results from these fields are in fact proved. The authors present the principles used in data science rigorously and clearly, helped by a consistent choice of notation. They do not limit themselves to the most basic topics (Markov chains, SVD, etc.): they also explore data streams and the challenges they raise, and present less ubiquitous techniques such as compressed sensing and wavelets.

Each chapter offers a set of exercises, without solutions, as well as a list of academic references, mostly historical. It is regrettable that the mathematical analyses are fully developed for every topic, with the exception of machine learning.

Optimization for Data Analysis

by Steven J. Wright, Benjamin Recht

Publisher's summary

Optimization techniques are at the core of data science, including data analysis and machine learning. An understanding of basic optimization techniques and their fundamental properties provides important grounding for students, researchers, and practitioners in these areas. This text covers the fundamentals of optimization algorithms in a compact, self-contained way, focusing on the techniques most relevant to data science.

An introductory chapter demonstrates that many standard problems in data science can be formulated as optimization problems. Next, many fundamental methods in optimization are described and analyzed, including: gradient and accelerated gradient methods for unconstrained optimization of smooth (especially convex) functions; the stochastic gradient method, a workhorse algorithm in machine learning; the coordinate descent approach; several key algorithms for constrained optimization problems; algorithms for minimizing nonsmooth functions arising in data science; foundations of the analysis of nonsmooth functions and optimization duality; and the back-propagation approach, relevant to neural networks.

Publisher: Cambridge - 238 pages, 1st edition, April 28, 2022

ISBN10 : 1316518981 - ISBN13 : 9781316518984

Order from www.amazon.fr:

€50.74 incl. tax (publisher's price: €50.74 incl. tax)

Introduction
Foundations of Smooth Optimization
Descent Methods
Gradient Methods Using Momentum
Stochastic Gradient
Coordinate Descent
First-Order Methods for Constrained Optimization
Nonsmooth Functions and Subgradients
Nonsmooth Optimization Methods
Duality and Algorithms
Differentiation and Adjoints

Book review by the editorial team: Thibaut Cuvelier, December 2, 2022

Optimization and data science are two closely related fields that would nonetheless benefit from working hand in hand. This book offers an introduction to optimization techniques specifically in the context of data science, focusing on the first-order methods used to train models. The authors include finite-time convergence proofs (rather than asymptotic ones), which provide guarantees of good theoretical performance; however, they always contrast it with practical performance on data science problems.

The list of topics covered is long, especially given the size of the book: convex or nonconvex problems, differentiable or not, constrained or not, along with dual methods. However, discrete optimization is left out, despite its usefulness, notably for variable selection. The last chapter addresses the very important topic of automatic differentiation, an essential building block of current developments in artificial intelligence.

The content is highly mathematical, and few proofs are left to the reader. The chapters tend to follow a similar structure, which makes it easier to look up information later. Each chapter ends with a list of bibliographic references as well as a few exercises (given without solutions).

Clustering

Theoretical and Practical Aspects

by Dan A. Simovici

Target audience: Intermediate

Publisher's summary

This unique compendium gives an updated presentation of clustering, one of the most challenging tasks in machine learning. The book provides a unitary presentation of classical and contemporary algorithms ranging from partitional and hierarchical clustering up to density-based clustering, clustering of categorical data, and spectral clustering.

Most of the mathematical background is provided in appendices, highlighting algebraic and complexity theory, in order to make this volume as self-contained as possible. A substantial number of exercises and supplements makes this a useful reference textbook for researchers and students.

Publisher: World Scientific - 884 pages, 1st edition, August 12, 2021

ISBN10 : 9811241198 - ISBN13 : 9789811241192

Order from www.amazon.fr:

€239.51 incl. tax (publisher's price: €239.51 incl. tax)

Introduction
Set-Theoretical Preliminaries
Dissimilarities, Metrics, and Ultrametrics
Convexity
Graphs and Hypergraphs
Partitional Clustering
Statistical Approaches to Clustering
Hierarchical Clustering
Density-based Clustering
Categorical Data Clustering
Spectral Clustering
Correlation and Consensus Clustering
Clustering Quality
Clustering Axiomatization
Biclustering
Semi-supervised Clustering
Special Functions and Applications
Linear Algebra
Linear Programming
NP Completeness

Book review by the editorial team: Thibaut Cuvelier, March 15, 2022

Clustering is the main approach for extracting information from a dataset when one does not know exactly what should be predicted. It is one of the areas of data science seeing the most new developments, notably because such data is abundant, but also one of the oldest.

This book is intended as a true reference in the field: every major area of clustering gets its own chapter, which explains its principles, the main mathematical results, and some of the most important algorithms, from the most classical to the most modern (presenting, where applicable, convergence proofs or approximation ratios relative to exact algorithms), with references to the scientific literature. While most books on clustering restrict themselves to continuous data, the author also ventures into categorical data (for example, clustering with association rules). Wherever possible, the book points out links with complexity theory, to show why approximations are needed to obtain polynomial time complexity.

However, the mathematical background required is very high. Almost all the necessary elements are explained in the first chapters (three hundred pages) or in the appendices (one hundred pages), with the exception of the notions of derivative, matrix, and complex number. Nevertheless, this material is presented very concisely and is hard to follow for those who do not already know the basic principles and formalization of the fields in question (set theory, graph theory, convexity, similarity and distance measures, etc.). Readers who have already had good exposure to them can, however, benefit from the proofs presented, which deepen one's understanding from perspectives specific to clustering.

The title indicates that the practice of clustering is covered, but this is limited to implementing a few algorithms in R and Python, with no significant case study.

Operations Research

Introduction to Models and Methods

by Richard J Boucherie, Aleida Braaksma, Henk Tijms

Target audience: Beginner

Publisher's summary

This attractive textbook with its easy-to-follow presentation provides a down-to-earth introduction to operations research for students in a wide range of fields such as engineering, business analytics, mathematics and statistics, computer science, and econometrics. It is the result of many years of teaching and collective feedback from students.

The book covers the basic models in both deterministic and stochastic operations research and is a springboard to more specialized texts, either practical or theoretical. The emphasis is on useful models and interpreting the solutions in the context of concrete applications.

The text is divided into several parts. The first three chapters deal exclusively with deterministic models, including linear programming with sensitivity analysis, integer programming and heuristics, and network analysis. The next three chapters primarily cover basic stochastic models and techniques, including decision trees, dynamic programming, optimal stopping, production planning, and inventory control. The final five chapters contain more advanced material, such as discrete-time and continuous-time Markov chains, Markov decision processes, queueing models, and discrete-event simulation.

Each chapter contains numerous exercises, and a large selection of exercises includes solutions.

Publisher: World Scientific - 512 pages, 1st edition, 17 December 2021

ISBN-10: 9811239347 - ISBN-13: 9789811239342

Order on www.amazon.fr:

147.94 € incl. tax (publisher's price 147.94 € incl. tax)

Linear Programming
Integer Programming
Network Analysis
Decision Trees
Dynamic Programming
Inventory Management
Discrete-Time Markov Chains
Continuous-Time Markov Chains
Queueing Theory
Markov Decision Processes
Simulation

Book review by the editorial team: Thibaut Cuvelier, 18 January 2022

Operations research is a vast field with many varied applications, which sometimes call for very different techniques. This book offers an introduction to all the main specialties of this branch of applied mathematics while remaining accessible to a very broad audience. Some books attempt to cover the subject but keep a very high level of formalism and go deep into areas whose value is not obvious to a beginner; by contrast, the book by Boucherie, Braaksma, and Tijms is exemplary in its pedagogy, introducing algorithms and models along with their real-world usefulness. The authors always prefer an explanation that may be very long but is understandable over a concise text.

The contents include mathematical optimization techniques (with deterministic models only), notably for network problems, as well as probability (probability trees, Markov chains, queues) and decision-making under uncertainty (dynamic programming, the basics of reinforcement learning). The authors also keep in mind that the same problem can be approached in several ways, and do not hesitate to demonstrate this in some applications.

Each chapter ends with exercises, with brief solutions in the appendix. It is regrettable that the authors use their own tools rather than the software commonly used in research (both academic and industrial) or in applications.


Stellar7 - Enlightened member

18/01/2022 at 18:40

From what I can read, aiming to speak to "everyone" is a very good approach. That is often what is missing when trying to understand the different approaches and choose one to study in depth.
What worries me a little, for this book as for others, is the listed price. It risks becoming a book that people consult at the library (say, the university library) rather than buy.

[Off-topic]
In the same vein, slightly off-topic: some books that have long been out of print, and that never sold much, are often very expensive (e.g. on lambda/functional programming, Scheme/Lisp/Prolog). Conversely, some are almost impossible to find and yet sold off cheaply, despite containing "basic" reasoning and algorithms (how to draw a line, a circular arc, or an elliptical arc pixel by pixel in an optimized way, e.g. Hégron).
[/Off-topic]

Happy reading, everyone!

dourouc05 - Qt & Books lead

20/01/2022 at 17:20

Posted by Stellar7

What worries me a little, for this book as for others, is the listed price. It risks becoming a book that people consult at the library (say, the university library) rather than buy.

The price of the paperback on Amazon is more or less affordable: 56 €. More than 140 €, like the hardcover, would be completely unaffordable…


Interactive Dashboards and Data Apps with Plotly and Dash

Harness the power of a fully fledged frontend web framework in Python — no JavaScript required

de Elias Dabbas

Target audience: Beginner

Publisher's summary

With Plotly's Dash framework, it is now easier than ever for Python programmers to develop complete data apps and interactive dashboards. Dash apps can be used by a non-technical audience, and this will make data analysis accessible to a much wider group of people. This book will help you to explore the functionalities of Dash for visualizing data in different ways and getting the most out of it.

The book starts with an overview of the Dash ecosystem, its main packages, and the third-party packages crucial for structuring and building different parts of your apps. You'll learn how to create a basic Dash app and add different features to it. Next, you’ll integrate controls such as dropdowns, checkboxes, sliders, date pickers, and more in the app and then link them to charts and other outputs. Depending on the data you are visualizing, you'll also add several types of charts, including scatter plots, line plots, bar charts, histograms, and maps, as well as explore the options available for customizing them.

By the end of this book, you'll have developed the skills you need to create and deploy an interactive dashboard, handle complexities and code refactoring, and understand the process of improving your application.

Publisher: Packt - 364 pages, 1st edition, 21 May 2021

ISBN-10: 1800568916 - ISBN-13: 9781800568914

Order on www.amazon.fr:

47.21 € incl. tax (publisher's price 47.21 € incl. tax)

Building a Dash App

Overview of the Dash Ecosystem
Exploring the Structure of a Dash App
Working with Plotly's Figure Objects
Data Manipulation and Preparation, Paving the Way to Plotly Express

Adding Functionality to Your App with Real Data

Interactively Comparing Values with Bar Charts and Dropdown Menus
Exploring Variables with Scatter Plots and Filtering Subsets with Sliders
Exploring Map Plots and Enriching Your Dashboards with Markdown
Calculating the Frequency of Your Data with Histograms and Building Interactive Tables

Taking Your App to the Next Level

Letting Your Data Speak for Itself with Machine Learning
Turbo-charge Your Apps with Advanced Callbacks
URLs and Multi-Page Apps
Deploying Your App
Next Steps

Book review by the editorial team: Thibaut Cuvelier, 27 December 2021

To get the most out of available data, a visualization phase is often necessary, whether to understand the data or to show it to others. This book uses Plotly and Dash to build dashboards, which serve mainly the second need (showing data). While Plotly handles the rendering of charts, Dash integrates them into a web application.

This book, very pleasant to read, has no particular prerequisites apart from knowledge of Python: it is not even necessary to have some experience working with data, as the basics are explained (as are HTML, CSS, and Bootstrap). Throughout the book, the author builds an application for visualizing a UN dataset, with increasingly advanced charts and more and more interaction (up to and including animations).

The code is always explained well and in detail, without frills and following current Python best practices; at times, several ways of proceeding that exploit different facets of Plotly are presented and compared. However, it is regrettable that the author does not really make the connection with visualization best practices (even though the examples follow them).

The chapter on machine learning is somewhat disappointing, since only one clustering technique is presented (which can hardly be called learning). It is also regrettable that there is no implementation of the brushing and linking principle, a very effective form of interactivity. All figures are printed in black and white, whereas the text regularly refers to the colors in these images: fortunately, they are available online.


Hands-On Unsupervised Learning Using Python

How to Build Applied Machine Learning Solutions from Unlabeled Data

de Ankur A. Patel

Target audience: Beginner

Publisher's summary

Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to general artificial intelligence. Since the majority of the world's data is unlabeled, conventional supervised learning cannot be applied. Unsupervised learning, on the other hand, can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that may be near impossible for humans to uncover.

Author Ankur Patel shows you how to apply unsupervised learning using two simple, production-ready Python frameworks: Scikit-learn and TensorFlow using Keras. With code and hands-on examples, data scientists will identify difficult-to-find patterns in data and gain deeper business insight, detect anomalies, perform automatic feature engineering and selection, and generate synthetic datasets. All you need is programming and some machine learning experience to get started.

Compare the strengths and weaknesses of the different machine learning approaches: supervised, unsupervised, and reinforcement learning
Set up and manage machine learning projects end-to-end
Build an anomaly detection system to catch credit card fraud
Cluster users into distinct and homogeneous groups
Perform semisupervised learning
Develop movie recommender systems using restricted Boltzmann machines
Generate synthetic images using generative adversarial networks

Publisher: O'Reilly - 400 pages, 1st edition, 18 March 2019

ISBN-10: 1492035645 - ISBN-13: 9781492035640

Order on www.amazon.fr:

43.01 € incl. tax (publisher's price 43.01 € incl. tax)

Fundamentals of Unsupervised Learning

Unsupervised Learning in the Machine Learning Ecosystem
End-to-End Machine Learning Project

Unsupervised Learning Using Scikit-Learn

Dimensionality Reduction
Anomaly Detection
Clustering
Group Segmentation

Unsupervised Learning Using TensorFlow and Keras

Autoencoders
Hands-On Autoencoder
Semisupervised Learning

Deep Unsupervised Learning Using TensorFlow and Keras

Recommender Systems Using Restricted Boltzmann Machines
Feature Detection Using Deep Belief Networks
Generative Adversarial Networks
Time Series Clustering

Book review by the editorial team: Thibaut Cuvelier, 2 July 2019

The title of this book promises a substantial applied component, and that is indeed what one finds on reading it: there are countless lines of code showing what the author does, notably for his charts (the code that generates them appears in the book in full). All the code is written in Python 3, using the latest versions of the libraries, so as to remain usable for as long as possible. This applied side is present throughout the book: the author always seeks to show a use for the algorithms he covers rather than settling for a laundry list, and the link with realistic applications is always there.

The book is built progressively, with increasingly advanced techniques, first briefly presenting the theoretical concepts (without mathematics, since that is not the book's aim) and the algorithms, then diving into practice. The approaches are very often compared on the same example to bring out their advantages and drawbacks. However, unsupervised learning is seen from only one angle: exploiting unlabeled data in order to make predictions, that is, as an entirely supervised approach. As a result, all aspects of data analysis are neglected: it would have been nice, for example, to see clustering applied to understand what the data contains (such as determining, with no prior assumptions, the different ways of playing a game). Instead, in the clustering examples, the number of classes sought is known in advance.

In terms of presentation, a large amount of code, and sometimes of images, is redundant. In the first examples, which show several supervised learning algorithms, cross-validation is presented each time instead of focusing on the differences between the algorithms. Each chapter begins with a good page of Python module imports (including modules that are not used in that chapter!). Some parts present a large number of images laid out so that they take up as much space as possible (six reasonably sized images spread over three pages, when shrinking them slightly would have let everything fit on a single page…). Moreover, all the images are in black and white but were designed in color: it is often hard to find one's way, because color information is used heavily (notably to show several curves: they surely have very different colors, but the gray levels are too similar to tell the curves apart).

The technical side really disappointed me. The algorithms are presented very quickly, and their parameters are treated somewhat as black boxes or simply ignored: how can one understand their impact on the solution? The chapter on anomaly detection treats it only as an application of dimensionality reduction; there is no discussion of algorithms specifically designed for this task (isolation forests, one-class SVMs, etc.), which is rather reductive. There is no mention of embeddings (such as word2vec for word representation) in the section on autoencoders, even though they are a very important application.

The target audience seems to have only limited experience in machine learning. The book will be most useful to those who want a quick and not too deep introduction to unsupervised learning, an overview of the field covering all its main facets. Those wondering what unsupervised learning might be good for will be well served, but will not see all of its possibilities.


Natural Language Processing with PyTorch

Build Intelligent Language Applications Using Deep Learning

de Delip Rao, Brian McMahan

Target audience: Beginner

Publisher's summary

Natural Language Processing (NLP) provides boundless opportunities for solving problems in artificial intelligence, making products such as Amazon Alexa and Google Translate possible. If you’re a developer or data scientist new to NLP and deep learning, this practical guide shows you how to apply these methods using PyTorch, a Python-based deep learning library.

Authors Delip Rao and Brian McMahan provide you with a solid grounding in NLP and deep learning algorithms and demonstrate how to use PyTorch to build applications involving rich representations of text specific to the problems you face. Each chapter includes several code examples and illustrations.

Explore computational graphs and the supervised learning paradigm
Master the basics of the PyTorch optimized tensor manipulation library
Get an overview of traditional NLP concepts and methods
Learn the basic ideas involved in building neural networks
Use embeddings to represent words, sentences, documents, and other features
Explore sequence prediction and generate sequence-to-sequence models
Learn design patterns for building production NLP systems

Publisher: O'Reilly - 256 pages, 1st edition, 5 February 2019

ISBN-10: 1491978236 - ISBN-13: 9781491978238

Order on www.amazon.fr:

52.25 € incl. tax (publisher's price 52.25 € incl. tax)

Chapter 1. Introduction
Chapter 2. A Quick Tour of Traditional NLP
Chapter 3. Foundational Components of Neural Networks
Chapter 4. Feed-Forward Networks for Natural Language Processing
Chapter 5. Embedding Words and Types
Chapter 6. Sequence Modeling for Natural Language Processing
Chapter 7. Intermediate Sequence Modeling for Natural Language Processing
Chapter 8. Advanced Sequence Modeling for Natural Language Processing
Chapter 9. Classics, Frontiers, and Next Steps

Book review by the editorial team: Vincent PETIT, 14 August 2019

This book shows how to put PyTorch, a Python machine learning library, to work through natural language processing. It is aimed at a fairly broad audience looking for an introduction, but a solid knowledge of Python is required.

The book consists of 9 chapters organized so that the reading is progressive.

A good half of the book is an overview of the structure of natural language and then of learning with neural networks (perceptron, multilayer perceptron, convolutional neural network). The underlying mathematical theory is not covered; the point here is to show how to use PyTorch.

Then come the chapters on vectorizing the words in a sentence, optimization methods for supervised or unsupervised learning, the sequencing of data, their prediction and labeling, and finally the outlook and limits of the topic.

The subject is well explained, and the code examples are numerous enough to let the reader absorb the principles properly. One really feels guided all the way to the end. I found the parallel drawn with natural language interesting and, in summary, I would say the book is a good complement to the tutorials found on the internet and that it adds something extra, notably because it lets you draw inspiration from the method the authors use.

All the code excerpts published in the book are available in a GitHub repository.


Data Visualization

Charts, Maps, and Interactive Graphics

de Robert Grant

Target audience: Intermediate

Publisher's summary

This is the age of data. There are more innovations and more opportunities for interesting work with data than ever before, but there is also an overwhelming amount of quantitative information being published every day. Data visualisation has become big business, because communication is the difference between success and failure, no matter how clever the analysis may have been. The ability to visualize data is now a skill in demand across business, government, NGOs and academia.

Data Visualization: Charts, Maps, and Interactive Graphics gives an overview of a wide range of techniques and challenges, while staying accessible to anyone interested in working with and understanding data.

Features:

Focusses on concepts and ways of thinking about data rather than algebra or computer code.
Features 17 short chapters that can be read in one sitting.
Includes chapters on big data, statistical and machine learning models, visual perception, high-dimensional data, and maps and geographic data.
Contains more than 125 visualizations, most created by the author.
Supported by a website with all code for creating the visualizations, further reading, datasets and practical advice on crafting the images.

Whether you are a student considering a career in data science, an analyst who wants to learn more about visualization, or the manager of a team working with data, this book will introduce you to a broad range of data visualization methods.

Publisher: CRC Press - 218 pages, 1st edition, 4 December 2018

ISBN-10: 113855359X - ISBN-13: 9781138553590

Order on www.amazon.fr:

26.07 € incl. tax (publisher's price 24.67 € incl. tax)

Why visualise?
Translating numbers to images
Continuous and discrete numbers
Percentages and risks
Showing data or statistics
Differences, ratios, correlations
Visual perception and the brain
Showing uncertainty
Time trends
Statistical predictive models
Machine learning techniques
Many variables
Maps and networks
Interactivity
Big data
Visualisation as part of a bigger package
Some overarching ideas

Book review by the editorial team: Thibaut Cuvelier, 18 June 2019

Data visualization is an increasingly pressing need, particularly in a big data context: having data is good, managing to exploit it properly is better. Visualization is a very useful tool for this, but only when it is applied wisely. That is what this book offers: visualization techniques, a link with statistics, design principles for a good visualization, and a whole series of examples. The author's goal is to teach how to make charts that have an impact.

The author is a statistician, and it shows in the way he approaches topics: there is no question, for example, of displaying error bars without stating what they represent (standard deviation, standard error, confidence interval?). That is no reason to bludgeon the reader with mathematics, since the book contains no formulas, truly none. A few statistical tools are presented, but rather briefly, explaining only the general principles (references are provided to fill in the rest). This choice is sometimes limiting: for the bootstrap, in particular, the author repeats the usefulness of the technique many times but does not really explain it.

Case studies form the backbone of the book, in the sense that each chapter has one or more realistic visualizations, sometimes compared: what are the advantages of a given way of representing the data, which interpretations are easier to make on a given chart, which visualization cannot work (not forgetting the reason why, whether statistical or visual). This approach makes the book very readable and appealing.

For hands-on practice, the author makes available on his website the source code of every chart he made for the book (mostly in R), even though the book itself does not show a single line of code: it is not an R or Stata tutorial.


Elements of Causal Inference

Foundations and Learning Algorithms

de Jonas Peters, Dominik Janzing et Bernhard Schölkopf

Target audience: Expert

Publisher's summary

A concise and self-contained introduction to causal inference, increasingly important in data science and machine learning.

The mathematization of causality is a relatively recent development, and has become increasingly important in data science and machine learning. This book offers a self-contained and concise introduction to causal models and how to learn them from data. After explaining the need for causal models and discussing some of the principles underlying causal inference, the book teaches readers how to use causal models: how to compute intervention distributions, how to infer causal models from observational and interventional data, and how causal ideas could be exploited for classical machine learning problems. All of these topics are discussed first in terms of two variables and then in the more general multivariate case. The bivariate case turns out to be a particularly hard problem for causal learning because there are no conditional independences as used by classical methods for solving multivariate cases. The authors consider analyzing statistical asymmetries between cause and effect to be highly instructive, and they report on their decade of intensive research into this problem.

The book is accessible to readers with a background in machine learning or statistics, and can be used in graduate courses or as a reference for researchers. The text includes code snippets that can be copied and pasted, exercises, and an appendix with a summary of the most important technical concepts.

Publisher: MIT Press - 288 pages, 1st edition, December 22, 2017

ISBN10 : 0262037319 - ISBN13 : 9780262037310

Order from www.amazon.fr:

€42.03 incl. tax (list price €42.03 incl. tax)

Statistical and Causal Models
Assumptions for Causal Inference
Cause-Effect Models
Learning Cause-Effect Models
Connections with Machine Learning, I
Multivariate Causal Models
Learning Multivariate Causal Models
Connections with Machine Learning, II
Hidden Variables
Time Series

Editorial review by Thibaut Cuvelier, April 21, 2019

Machine learning is an extremely well-developed field, but only for discovering correlations between variables. It is sometimes easy to infer a link between symptoms and a pathology, but learning algorithms cannot determine which implies which: is the disease present because of the symptoms, or is it the other way around? Causal learning seeks to answer this kind of question, and it is the book's sole subject.

The three authors present the essentials of this too-little-known area of data science progressively: the first explanations are informal and hand-waving, and the formalization follows (with two random variables to begin with, then in the general case). In particular, counterfactual analyses are covered in detail. The authors readily sprinkle their text with code snippets to aid understanding of the concepts and practical application of the algorithms. On that point, they assume the reader knows the basics of R and of a few libraries needed to follow these snippets, as no explanation of syntax is given. Wherever possible, the links between the concepts presented and machine learning are made explicit.

The style is austere and academic. References to scientific articles (including those of the authors themselves, whose modesty does not seem to suffer) are made repeatedly for readers who want to go deeper: the presentation of an algorithm is very often limited to the main underlying ideas, the rest being available in the literature. Overall, the book is not always as easy to follow as one would hope: it is aimed rather at people who already know the basics of causal inference but want to deepen their knowledge or discover other lines of research in the field.

Note: the book is also available free of charge in PDF format.



Machine Learning for Data Streams

With Practical Examples in MOA

by Albert Bifet, Ricard Gavaldà, Geoff Holmes, and Bernhard Pfahringer

Target audience: Expert

Publisher's summary

A hands-on approach to tasks and techniques in data stream mining and real-time analytics, with examples in MOA, a popular freely available open-source software framework.

Today many information sources—including sensor networks, financial markets, social networks, and healthcare monitoring—are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.

The book first offers a brief introduction to the topic, covering big data mining, basic methodologies for mining data streams, and a simple example of MOA. More detailed discussions follow, with chapters on sketching techniques, change, classification, ensemble methods, regression, clustering, and frequent pattern mining. Most of these chapters include exercises, an MOA-based lab session, or both. Finally, the book discusses the MOA software, covering the MOA graphical user interface, the command line, use of its API, and the development of new methods within MOA. The book will be an essential reference for readers who want to use data stream mining as a tool, researchers in innovation or data stream mining, and programmers who want to create new algorithms for MOA.

Publisher: MIT Press - 288 pages, 1st edition, March 2, 2018

ISBN10 : 0262037793 - ISBN13 : 9780262037792

Order from www.amazon.fr:

€46.84 incl. tax (list price €46.84 incl. tax)

Introduction

Introduction
Big Data Stream Mining
Hands-on Introduction to MOA

Stream Mining

Streams and Sketches
Dealing with Change
Classification
Ensemble Methods
Regression
Clustering
Frequent Pattern Mining

The MOA Software

Introduction to MOA and Its Ecosystem
The Graphical User Interface
Using the Command Line
Using the API
Developing New Methods in MOA

Editorial review by Thibaut Cuvelier, April 13, 2019

Machine learning is a many-faceted field. This book dusts off one facet that is explored far too little in the literature: the study of data streams, where algorithms must make predictions but, above all, adapt in real time to data that arrive drop by drop (even if the dropper can have a very high flow rate!). The authors give pride of place to the specifics of this paradigm: computations must be performed very quickly, with almost no time available per sample, nor memory for that matter.

Structurally, there are three clearly distinct parts:

a very general introduction to the field, which nevertheless shows the essentials of MOA, a software package dedicated to learning tasks on streams;
a more detailed presentation of the algorithms applicable to streams, whether to summarize them, to derive predictive models from them, or to explore the data. This part should appeal to students, practitioners, and researchers who want to get started in the field, notably thanks to its many references (for the details of some less interesting or overly advanced algorithms: one senses a real link between the book and current research in the field). The algorithms are detailed with a certain level of mathematical formalism, so that the reader understands what they do (and why they guarantee a certain approximation of reality);
finally, a fairly succinct user guide to MOA, with a good number of screenshots of the software (printed in color!), which details the various tabs of the graphical interface (using lists that are very descriptive but tied to the other chapters of the book) and moves quickly through the command-line and programming interfaces (these last two chapters are brief and need to be supplemented by the one on the graphical interface, which contains the essential elements).

One can nevertheless fault the book for a few forward references (section 4.6.2 sometimes treats the content of 4.9.2 as already assimilated, for example), and also for the omnipresence of MOA: one gets the impression that the authors focused on the algorithms available in this toolbox rather than presenting the most interesting algorithms in general. This criticism is fairly minor, however, given how comprehensive MOA is.

Note: the book is also available free of charge in HTML format, and the authors respond to comments left for them.

Malick - Community Manager

April 22, 2019, 0:24

Hello, dear Club members,

I invite you to read the review that Dourouc05 has written for you of the book Machine Learning for Data Streams: With Practical Examples in MOA.

Happy reading



Natural Language Processing with Python

by Steven Bird, Ewan Klein, and Edward Loper

Target audience: Intermediate

Publisher's summary

This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication.

Packed with examples and exercises, Natural Language Processing with Python will help you:

Extract information from unstructured text, either to guess the topic or identify "named entities"
Analyze linguistic structure in text, including parsing and semantic analysis
Access popular linguistic databases, including WordNet and treebanks
Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence

This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.

Publisher: O'Reilly - 512 pages, 1st edition, July 7, 2009

ISBN10 : 0596516495 - ISBN13 : 9780596516499

Order from www.amazon.fr:

€34.47 incl. tax (list price €36.06 incl. tax)

Chapter 1. Language Processing and Python
Chapter 2. Accessing Text Corpora and Lexical Resources
Chapter 3. Processing Raw Text
Chapter 4. Writing Structured Programs
Chapter 5. Categorizing and Tagging Words
Chapter 6. Learning to Classify Text
Chapter 7. Extracting Information from Text
Chapter 8. Analyzing Sentence Structure
Chapter 9. Building Feature-Based Grammars
Chapter 10. Analyzing the Meaning of Sentences
Chapter 11. Managing Linguistic Data

Editorial review by Franck Dernoncourt, February 1, 2012

Used in more than a hundred courses around the world and available free online at http://www.nltk.org/book (CC BY-NC-ND license), this book offers an excellent introduction to natural language processing, explaining the theory through concrete implementation examples. It is therefore meant as a practical introduction to the field, as opposed to a purely theoretical one. Each chapter ends with a series of exercises ranked by difficulty, but unfortunately without solutions.

The book's main distinguishing feature is its many code examples, based on NLTK (http://www.nltk.org), a free, open-source library written in Python by, among others, the authors of this book. Very well documented, NLTK offers many language processing features (lexical analysis, part-of-speech tagging, parsing, etc.) while interfacing with databases such as WordNet as well as third-party libraries and software such as the Stanford Tagger part-of-speech tagger and the Prover9 automated theorem prover. A large number of corpora are also available through NLTK, which is very welcome for setting up training processes and for running tests, performance tests in particular. Because the book presents the many facets of natural language processing, its examples cover a large part of NLTK's functionality.

The main limitation of NLTK is Python's performance in terms of computation speed. Using Python does, however, spare the reader much of the language barrier, Python being to date unquestionably one of the most approachable languages. For those with little or no Python experience, some sections of the book are devoted solely to explaining the Python language, which makes the book accessible to any audience.

Nevertheless, although it gives an excellent and concrete overview of natural language processing as a whole, the book's focus on Python examples mechanically leaves less room for theoretical considerations. In this sense, it is an ideal complement to the reference book Speech and Language Processing (by Daniel Jurafsky and James H. Martin), whose approach is much more theoretical.

Editorial review by Julien Plu, May 1, 2012

This book on NLTK is really well written. No prior experience in natural language processing is needed to tackle it; it will teach you everything you need to understand each chapter. The only requirement is a knowledge of the Python language.
The examples are not only simple but also very useful, since they are things one might need in an application. I particularly liked the chapters on named entity extraction, learning to build a classifier, and analyzing the meaning of a sentence, which are especially well done and well explained.
My only remark is the lack of detail on all the possibilities for creating and using a grammar via regular expressions, whether NLTK's or not.


Djug - Distinguished Senior Expert

February 6, 2012, 8:11

Hello,

The DVP editorial team has read the following book for you: Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper.


Have you read it? Do you plan to read it soon?

What do you think of it?

Speak up! We want to hear your opinion.

Franck Dernoncourt - Emeritus Member

February 6, 2012, 9:00

Here is a list of definitions I found interesting in this book (page references are given as book page number / page number in my PDF):

hypernym/hyponym relation, i.e., the relation between superordinate and subordinate concepts (p69 / 90)
Another important way to navigate the WordNet network is from items to their components (meronyms) or to the things they are contained in (holonyms) (p70 / 91)
the same dictionary word (or lemma) (p104 / 125)
strip off any affixes, a task known as stemming. (p107 / 128)
Tokenization is the task of cutting a string into identifiable linguistic units that constitute a piece of language data (p109 / 130)
Tokenization is an instance of a more general problem of segmentation. (p112 / 133)
The %s and %d symbols are called conversion specifiers (p118 / 139)
The process of classifying words into their parts-of-speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset. Our emphasis in this chapter is on exploiting tags, and tagging text automatically. (p179 / 200)
As n gets larger, the specificity of the contexts increases, as does the chance that the data we wish to tag contains contexts that were not present in the training data. This is known as the sparse data problem, and is quite pervasive in NLP. As a consequence, there is a trade-off between the accuracy and the coverage of our results (and this is related to the precision/recall trade-off in information retrieval) (p205 / 226)
A convenient way to look at tagging errors is the confusion matrix. It charts expected tags (the gold standard) against actual tags generated by a tagger (p207 / 228)
All languages acquire new lexical items. A list of words recently added to the Oxford Dictionary of English includes cyberslacker, fatoush, blamestorm, SARS, cantopop, bupkis, noughties, muggle, and robata. Notice that all these new words are nouns, and this is reflected in calling nouns an open class. By contrast, prepositions are regarded as a closed class. That is, there is a limited set of words belonging to the class. (p211 / 232)
Common tagsets often capture some morphosyntactic information, that is, information about the kind of morphological markings that words receive by virtue of their syntactic role. (p212 / 233)
Classification is the task of choosing the correct class label for a given input. (p221 / 242)
The first step in creating a classifier is deciding what features of the input are relevant, and how to encode those features. For this example, we’ll start by just looking at the final letter of a given name. The following feature extractor function builds a dictionary containing relevant information about a given name. (p223 / 244)
Recognizing the dialogue acts underlying the utterances in a dialogue can be an important first step in understanding the conversation. The NPS Chat Corpus, which was demonstrated in Section 2.1, consists of over 10,000 posts from instant messaging sessions. These posts have all been labeled with one of 15 dialogue act types, such as “Statement,” “Emotion,” “y/n Question,” and “Continuer.” (p235 / 256)
Recognizing textual entailment (RTE) is the task of determining whether a given piece of text T entails another text called the “hypothesis”. (p235 / 256)
A confusion matrix is a table where each cell [i,j] indicates how often label j was predicted when the correct label was i. (p240 / 261)
Numeric features can be converted to binary features by binning, which replaces them with features such as “4<x<6.” (p249 / 270)
Named entities are definite noun phrases that refer to specific types of individuals, such as organizations, persons, dates, and so on. The goal of a named entity recognition (NER) system is to identify all textual mentions of the named entities. This can be broken down into two subtasks: identifying the boundaries of the NE, and identifying its type. (p281 / 302)
Since our grammar licenses two trees for this sentence, the sentence is said to be structurally ambiguous. The ambiguity in question is called a prepositional phrase attachment ambiguity. (p299 / 320)
A grammar is said to be recursive if a category occurring on the left hand side of a production also appears on the righthand side of a production. (p301 / 322)
A parser processes input sentences according to the productions of a grammar, and builds one or more constituent structures that conform to the grammar. A grammar is a declarative specification of well-formedness—it is actually just a string, not a program. A parser is a procedural interpretation of the grammar. It searches through the space of trees licensed by a grammar to find one that has the required sentence along its fringe. (p302 / 323)
Phrase structure grammar is concerned with how words and sequences of words combine to form constituents. A distinct and complementary approach, dependency grammar, focuses instead on how words relate to other words. (p310 / 331)
A dependency graph is projective if, when all the words are written in linear order, the edges can be drawn above the words without crossing. (p311 / 332)
In the tradition of dependency grammar, the verbs in Table 8-3 (whose dependents Adj, NP, S and PP, which are often called complements of the respective verbs, differ) are said to have different valencies. (p313 / 335)
This ambiguity is unavoidable, and leads to horrendous inefficiency in parsing seemingly innocuous sentences. The solution to these problems is provided by probabilistic parsing, which allows us to rank the parses of an ambiguous sentence on the basis of evidence from corpora. (p318 / 339)
A probabilistic context-free grammar (or PCFG) is a context-free grammar that associates a probability with each of its productions. It generates the same set of parses for a text that the corresponding context-free grammar does, and assigns a probability to each parse. The probability of a parse generated by a PCFG is simply the product of the probabilities of the productions used to generate it. (p320 / 341)
We can see that morphological properties of the verb co-vary with syntactic properties of the subject noun phrase. This co-variance is called agreement. (p329 / 350)
A feature path is a sequence of arcs that can be followed from the root node (p339 / 360)
A more general feature structure subsumes a less general one. (p341 / 362)
Merging information from two feature structures is called unification. (p342 / 363)
The two sentences in (5) can be both true, whereas those in (6) and (7) cannot be. In other words, the sentences in (5) are consistent, whereas those in (6) and (7) are inconsistent. (p365 / 386)
A model for a set W of sentences is a formal representation of a situation in which all the sentences in W are true. (p367 / 388)
An argument is valid if there is no possible situation in which its premises are all true and its conclusion is not true. (p369 / 390)
In the sentences "Cyril is tall. He likes maths.", we say that he is coreferential with the noun phrase Cyril. (p373 / 394)
In the sentence "Angus had a dog but he disappeared.", "he" is bound by the indefinite NP "a dog", and this is a different relationship than coreference. If we replace the pronoun he by a dog, the result "Angus had a dog but a dog disappeared" is not semantically equivalent to the original sentence "Angus had a dog but he disappeared." (p374 / 395)
In general, an occurrence of a variable x in a formula F is free in F if that occurrence doesn’t fall within the scope of all x or some x in F. Conversely, if x is free in formula F, then it is bound in all x.F and exists x.F. If all variable occurrences in a formula are bound, the formula is said to be closed. (p375 / 396)
The general process of determining truth or falsity of a formula in a model is called model checking. (p379 / 400)
Principle of Compositionality: the meaning of a whole is a function of the meanings of the parts and of the way they are syntactically combined. (p385 / 406)
λ is a binding operator, just as the first-order logic quantifiers are. (p387 / 408)
A discourse representation structure (DRS) presents the meaning of discourse in terms of a list of discourse referents and a list of conditions. The discourse referents are the things under discussion in the discourse, and they correspond to the individual variables of first-order logic. The DRS conditions apply to those discourse referents, and correspond to atomic open formulas of first-order logic. (p397 / 418)
Inline annotation modifies the original document by inserting special symbols or control sequences that carry the annotated information. For example, when part-of-speech tagging a document, the string "fly" might be replaced with the string "fly/NN", to indicate that the word fly is a noun in this context. In contrast, standoff annotation does not modify the original document, but instead creates a new file that adds annotation information using pointers that reference the original document. For example, this new document might contain the string "<token id=8 pos='NN'/>", to indicate that token 8 is a noun. (p421 / 442)
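
To make one of these definitions concrete, here is a minimal sketch of the confusion matrix described above (cell [i,j] counts how often label j was predicted when the correct label was i). It uses only the Python standard library rather than NLTK; the function name `confusion_matrix` and the toy tag sequences are assumptions made up for illustration, not code from the book.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Return (labels, matrix) where matrix[i][j] counts how often
    labels[j] was predicted when the correct label was labels[i]."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Use every label seen in either sequence, in a stable sorted order.
    labels = sorted(set(gold) | set(predicted))
    # Count each (correct label, predicted label) pair once.
    counts = Counter(zip(gold, predicted))
    matrix = [[counts[(i, j)] for j in labels] for i in labels]
    return labels, matrix


# Hypothetical toy data: gold-standard tags versus a tagger's output.
gold = ["NN", "VB", "NN", "JJ", "NN", "VB"]
predicted = ["NN", "NN", "NN", "JJ", "VB", "VB"]

labels, matrix = confusion_matrix(gold, predicted)
print("      " + "  ".join(f"{label:>3}" for label in labels))
for label, row in zip(labels, matrix):
    print(f"{label:>3} | " + "  ".join(f"{count:>3}" for count in row))
```

Correct predictions fall on the diagonal, so off-diagonal cells show directly which tags get confused with which.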

Another NLP dictionary available online: http://www.cse.unsw.edu.au/~billw/nlpdict.html

Franck Dernoncourt - Emeritus Member

February 6, 2012, 20:23

Also, for those interested in the subject, Stanford is launching an introductory course on natural language processing: http://www.nlp-class.org/