Introduction to Programming Big Data with R (pbdR)
Setting up your environment to use pbdR
Scope and tools available in pbdR
Packages commonly used with Big Data alongside pbdR
Message Passing Interface (MPI)
Using pbdR MPI
Summing Matrices with Reduce
Scatter / Gather
Other MPI communications
Creating a distributed diagonal matrix
SVD of a distributed matrix
Building a distributed matrix in parallel
Monte Carlo Integration
Reading on all processes
Broadcasting from one process
Reading partitioned data
Developers involved in projects that use machine learning with Apache Mahout.
Hands-on introduction to machine learning. The course is delivered in a lab format based on real-world practical use cases.
Implementing Recommendation Systems with Mahout
Introduction to recommender systems
Representing recommender data
Basics of clustering
Clustering quality improvements
Optimizing clustering implementation
Applications of clustering in the real world
Basics of classification
Classifier quality improvements
This training course is for people who would like to apply Machine Learning in practical applications.
This course is for data scientists and statisticians who have some familiarity with statistics and know how to program in R (or Python or another chosen language). The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization.
The purpose is to give participants practical applications of Machine Learning that they can use at work.
Sector-specific examples are used to make the training relevant to the audience.
Bayesian categorical data analysis
Bayesian Graphical Models
Factor Analysis (FA)
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Support Vector Machines (SVM) for regression and classification
Hidden Markov Models (HMM)
State Space Models
If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter or LinkedIn), this course is for you.
It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing.
It is not aimed at people configuring the solution, although they will still benefit from the big picture.
During the course delegates will be presented with working examples of mostly open source technologies.
Short lectures will be followed by presentations and simple exercises by the participants.
Content and Software used
All software used is updated each time the course is run, so we work with the newest versions available.
The course covers the whole process, from obtaining, formatting, processing and analysing the data to automating decision making with machine learning.
Structured vs unstructured
Static vs streamed
Attitudinal, behavioural and demographic data
Data-driven vs user-driven analytics
Volume, velocity and variety of data
kGroups, k-means, nearest neighbours
Ant colonies, birds flocking
Support vector machine
Naive Bayes classification
Cost of software
Cost of development
Data Preparation (MapReduce)
Model deployment and integration
Overview of Open Source and commercial software
Selection of R-project packages
Hadoop and Mahout
Selected Apache projects related to Big Data and Analytics
Selected commercial solutions
Integration with existing software and data sources