HBase for Developers Training Course
This course introduces HBase – a NoSQL store on top of Hadoop. The course is intended for developers who will be using HBase to develop applications, and administrators who will manage HBase clusters.
The course walks developers through HBase architecture, data modelling, and application development on HBase. It also covers using MapReduce with HBase and some administration topics related to performance optimization. The course is very hands-on, with many lab exercises.
Duration : 3 days
Audience : Developers & Administrators
Course Outline
- Section 1: Introduction to Big Data & NoSQL
- Big Data ecosystem
- NoSQL overview
- CAP theorem
- When is NoSQL appropriate
- Columnar storage
- HBase and NoSQL
- Section 2 : HBase Intro
- Concepts and Design
- Architecture (HMaster and Region Server)
- Data integrity
- HBase ecosystem
- Lab : Exploring HBase
- Section 3 : HBase Data model
- Namespaces, Tables and Regions
- Rows, columns, column families, versions
- HBase Shell and Admin commands
- Lab : HBase Shell
- Section 3 : Accessing HBase using Java API
- Introduction to Java API
- Read / Write path
- Time Series data
- Scans
- MapReduce
- Filters
- Counters
- Coprocessors
- Labs (multiple) : Using HBase Java API to implement time series, MapReduce, filters and counters.
- Section 4 : HBase schema Design : Group session
- students are presented with real world use cases
- students work in groups to come up with design solutions
- discuss / critique and learn from multiple designs
- Labs : implement a scenario in HBase
- Section 5 : HBase Internals
- Understanding HBase under the hood
- MemStore / HFile / WAL
- HDFS storage
- Compactions
- Splits
- Bloom Filters
- Caches
- Diagnostics
- Section 6 : HBase installation and configuration
- hardware selection
- install methods
- common configurations
- Lab : installing HBase
- Section 7 : HBase ecosystem
- developing applications using HBase
- interacting with the rest of the Hadoop stack (MapReduce, Pig, Hive)
- frameworks around HBase
- advanced concepts (coprocessors)
- Labs : writing HBase applications
- Section 8 : Monitoring And Best Practices
- monitoring tools and practices
- optimizing HBase
- HBase in the cloud
- real world use cases of HBase
- Labs : checking HBase vitals
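The data model topics in the outline (rows, column families, versions, scans, and time series row keys) can be pictured as a sorted map of maps. The following is a minimal sketch in plain Java collections, not the HBase client API: the class `HBaseModelSketch` and all of its methods are hypothetical names chosen for illustration, and the row key layout in `timeSeriesKey` is one common convention rather than the scheme taught in the course.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative model only: mimics HBase's logical layout with sorted maps. */
public class HBaseModelSketch {

    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    /** Stores one cell version; older versions of the same cell are kept. */
    public void put(String rowKey, String family, String qualifier, long timestamp, String value) {
        NavigableMap<String, NavigableMap<Long, String>> row =
                rows.computeIfAbsent(rowKey, k -> new TreeMap<String, NavigableMap<Long, String>>());
        NavigableMap<Long, String> versions =
                row.computeIfAbsent(family + ":" + qualifier,
                        k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()));
        versions.put(timestamp, value);
    }

    /** Returns the value with the highest timestamp, or null if the cell is absent. */
    public String getLatest(String rowKey, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowKey);
        if (row == null) {
            return null;
        }
        NavigableMap<Long, String> versions = row.get(family + ":" + qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    /** Range scan over sorted row keys: start inclusive, stop exclusive. */
    public NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> scan(
            String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    /**
     * Hypothetical time series row key: sensor id plus a zero-padded reversed
     * timestamp, so that newer readings sort before older ones for a sensor.
     */
    public static String timeSeriesKey(String sensorId, long epochMillis) {
        return sensorId + "#" + String.format("%019d", Long.MAX_VALUE - epochMillis);
    }

    public static void main(String[] args) {
        HBaseModelSketch table = new HBaseModelSketch();
        table.put(timeSeriesKey("sensor-1", 1000L), "d", "temp", 1000L, "20.5");
        table.put(timeSeriesKey("sensor-1", 2000L), "d", "temp", 2000L, "21.0");
        // Scanning the "sensor-1#" prefix visits the newest reading first.
        table.scan("sensor-1#", "sensor-1$").forEach((key, cells) ->
                System.out.println(key + " -> " + cells));
    }
}
```

Because row keys are kept sorted, a scan is just a range over the key space, which is why row key design matters so much for time series workloads.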
Requirements
- comfortable with the Java programming language
- comfortable in a Linux environment (navigate Linux command line, edit files with vi / nano)
- A Java IDE like Eclipse or IntelliJ
Lab environment:
A working HBase cluster will be provided for students. Students will need an SSH client and a browser to access the cluster.
Zero Install : There is no need to install HBase software on students’ machines!
Need help picking the right course?
china@nobleprog.com or 400 6116 540
Testimonials (5)
Interesting presentation and exercises
Szymon - Agora SA
Course - Scylla Database
Trainer's preparation & organization, and quality of materials provided on github.
Mateusz Rek - MicroStrategy Poland Sp. z o.o.
Course - Impala for Business Intelligence
It gives me an insight on Redis, and also guide me to the right path if I want to know more about Redis
Ameer Fiqri Barahim - Sarawak Information Systems Sdn Bhd
Course - Redis for High Availability and Performance Training Course
The VM I liked very much. The Teacher was very knowledgeable regarding the topic as well as other topics, he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
Liked very much the interactive way of learning.
Luigi Loiacono
Course - Data Analysis with Hive/HiveQL
Related Courses
Administrator Training for Apache Hadoop
35 Hours
Audience:
The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment.
Goal:
Deep knowledge on Hadoop cluster administration.
Big Data Analytics in Health
21 Hours
Big data analytics involves the process of examining large amounts of varied data sets in order to uncover correlations, hidden patterns, and other useful insights.
The health industry has massive amounts of complex heterogeneous medical and clinical data. Applying big data analytics on health data presents huge potential in deriving insights for improving delivery of healthcare. However, the enormity of these datasets poses great challenges in analyses and practical applications to a clinical environment.
In this instructor-led, live training (remote), participants will learn how to perform big data analytics in health as they step through a series of hands-on live-lab exercises.
By the end of this training, participants will be able to:
- Install and configure big data analytics tools such as Hadoop MapReduce and Spark
- Understand the characteristics of medical data
- Apply big data techniques to deal with medical data
- Study big data systems and algorithms in the context of health applications
Audience
- Developers
- Data Scientists
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice.
Note
- To request a customized training for this course, please contact us to arrange.
Big Data Storage Solution - NoSQL
14 Hours
When traditional storage technologies can't handle the amount of data you need to store, there are hundreds of alternatives. This course guides participants through the alternatives for storing and analyzing Big Data, along with their pros and cons.
This course is mostly focused on discussion and presentation of solutions, though hands-on exercises are available on demand.
Big Data & Database Systems Fundamentals
14 Hours
The course is part of the Data Scientist skill set (Domain: Data and Technology).
NoSQL Database with Microsoft Azure Cosmos DB
14 Hours
This instructor-led, live training in China (online or onsite) is aimed at database administrators or developers who wish to use Microsoft Azure Cosmos DB to develop and manage highly responsive and low latency applications.
By the end of this training, participants will be able to:
- Provision the necessary Cosmos DB resources to start building databases and applications.
- Scale application performance and storage by utilizing APIs in Cosmos DB.
- Manage database operations and reduce cost by optimizing Cosmos DB resources.
Hadoop Administration
21 Hours
The course is intended for IT specialists who are looking for a solution to store and process large data sets in a distributed system environment.
Course goal:
Getting knowledge regarding Hadoop cluster administration
Hadoop For Administrators
21 Hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three-day (optionally four-day) course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice cluster bulk data load, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes with a discussion of securing a cluster with Kerberos.
“…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising
Audience
Hadoop administrators
Format
Lectures and hands-on labs, approximate balance 60% lectures, 40% labs.
Hadoop for Developers (4 days)
28 Hours
Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course introduces developers to various components of the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive and HBase).
Advanced Hadoop for Developers
21 Hours
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase. These advanced programming techniques will be beneficial to experienced Hadoop developers.
Audience: developers
Duration: three days
Format: lectures (50%) and hands-on labs (50%).
Hortonworks Data Platform (HDP) for Administrators
21 Hours
This instructor-led, live training in China (online or onsite) introduces Hortonworks Data Platform (HDP) and walks participants through the deployment of a Spark + Hadoop solution.
By the end of this training, participants will be able to:
- Use Hortonworks to reliably run Hadoop at a large scale.
- Unify Hadoop's security, governance, and operations capabilities with Spark's agile analytic workflows.
- Use Hortonworks to investigate, validate, certify and support each of the components in a Spark project.
- Process different types of data, including structured, unstructured, in-motion, and at-rest.
Data Analysis with Hive/HiveQL
7 Hours
This course covers how to use the Hive SQL language (AKA: Hive HQL, SQL on Hive, HiveQL) and is aimed at people who extract data from Hive.
Impala for Business Intelligence
21 Hours
Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for Apache Hadoop clusters.
Impala enables users to issue low-latency SQL queries to data stored in Hadoop Distributed File System and Apache HBase without requiring data movement or transformation.
Audience
This course is aimed at analysts and data scientists performing analysis on data stored in Hadoop via Business Intelligence or SQL tools.
After this course delegates will be able to
- Extract meaningful information from Hadoop clusters with Impala.
- Write specific programs to facilitate Business Intelligence in Impala SQL Dialect.
- Troubleshoot Impala.
A Practical Introduction to NoSQL Databases
28 Hours
Relational databases have been the technology of choice for storing, retrieving and querying data. Relational databases allow users to organize their data using a structured, well-defined set of patterns (model). While this approach works well for storing data that is standardized and well-understood in advance (think of a hospital check-in application that holds patient records with the same consistent set of pre-defined fields...patient id, first name, last name, date of last visit, etc.), there are limitations to this model. For organizations whose incoming data is not well-defined (think of an online inquiry form for a startup that is still in the process of experimenting with different fields for collecting visitor data, removing and adding fields as they go to accommodate the changing nature of the business), any established definitions for how the data should fit into an existing database would need to be re-defined regularly. This would require recreating the data model (schema) that dictates the structure of the data and its allowed data types to support different types of data inputs, etc., before any new data could be saved to the database.
Enter NoSQL (Not Only SQL) databases. NoSQL databases free users from having to predefine the structure of the incoming data, allowing them to insert and update new data on the fly. NoSQL databases are often faster than relational databases and can handle very large amounts of data with ease. NoSQL databases also scale better than relational databases, due to their ability to efficiently partition data across many servers (cluster) and load balance the access to this data. NoSQL databases integrate particularly well with applications that support real-time analytics, site personalization, IoT, and mobile apps.
In this instructor-led, live training, participants will understand the architecture, design principles and functionality of the most popular NoSQL databases as they set up, operate and assess a number of NoSQL databases in a live lab environment. The goal of this training is to enable participants to intelligently evaluate, propose and implement a suitable NoSQL database solution within their organization.
By the end of this training, participants will be able to:
- Install and configure different types of NoSQL databases, including MongoDB, Cassandra, Redis and Neo4j
- Understand the benefits and disadvantages of NoSQL databases vs relational databases
- Understand the underlying data formats used by NoSQL databases and how these formats can be used to an advantage when developing modern applications (desktop, mobile, cloud, IoT)
- Perform create, insert, update, delete operations in a NoSQL database
- Set up a mixed environment with both a relational database and NoSQL working in tandem
- Configure a cluster of NoSQL databases to distribute the processing of very large datasets
- Understand the security implications of using a NoSQL database
- Deploy and scale a NoSQL database in a production environment
Audience
- Database professionals
- Data architects
- Data strategists
- Project managers
- Application developers wishing to integrate a flexible database solution in their application
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange.
Scylla Database
21 Hours
Scylla is an open-source distributed NoSQL data store. It is compatible with Apache Cassandra but performs at significantly higher throughputs and lower latencies.
In this course, participants will learn about Scylla's features and architecture while obtaining practical experience with setting up, administering, monitoring, and troubleshooting Scylla.
Audience
- Database administrators
- Developers
- System Engineers
Format of the course
- The course is interactive and includes discussions of the principles and approaches for deploying and managing Scylla distributed databases and clusters.
- The course includes a heavy component of hands-on exercises and practice.