Hadoop for Developers (4 days) Training

Course code

hadoopdev

Duration

28 hours (usually 4 days, including breaks)

Requirements

  • Comfortable with the Java programming language (most programming exercises are in Java)
  • Comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)

Lab environment

Zero install: there is no need to install Hadoop software on students' machines. A working Hadoop cluster will be provided.

Students will need the following:

  • An SSH client (Linux and Mac already include one; for Windows, PuTTY is recommended)
  • A browser to access the cluster (we recommend Firefox)

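Course overview

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course introduces developers to the components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Hive and HBase.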


    Course outline

    Section 1: Introduction to Hadoop

    • Hadoop history and concepts
    • ecosystem
    • distributions
    • high-level architecture
    • Hadoop myths
    • Hadoop challenges
    • hardware / software
    • lab: first look at Hadoop

    Section 2: HDFS

    • design and architecture
    • concepts (horizontal scaling, replication, data locality, rack awareness)
    • daemons: NameNode, Secondary NameNode, DataNode
    • communications / heartbeats
    • data integrity
    • read / write path
    • NameNode High Availability (HA), Federation
    • labs: interacting with HDFS

    Section 3: MapReduce

    • concepts and architecture
    • daemons (MRv1): JobTracker / TaskTracker
    • phases: driver, mapper, shuffle/sort, reducer
    • MapReduce version 1 (MRv1) and version 2 (YARN)
    • MapReduce internals
    • introduction to writing a MapReduce program in Java
    • labs: running a sample MapReduce program
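    The driver, mapper, shuffle/sort and reducer phases listed above can be sketched in plain Java, using word count as the example. This is a minimal, single-process illustration, not Hadoop code: the class and method names (`WordCountPhases`, `map`, `shuffleAndSort`, `reduce`, `run`) are hypothetical, and a `TreeMap` stands in for the framework's distributed shuffle and sort. A real Hadoop job would instead subclass `Mapper` and `Reducer` and configure a `Job` in the driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Single-process simulation of the MapReduce phases for word count.
 * Hypothetical names; does NOT use the Hadoop API.
 */
public class WordCountPhases {

    // Intermediate key/value pair emitted by the map phase.
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    // Map phase: one input line -> a (word, 1) pair per word.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(new Pair(token, 1));
            }
        }
        return out;
    }

    // Shuffle/sort phase: group all values by key; the TreeMap keeps keys
    // sorted, mimicking the sorted key order each reducer receives.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<Pair> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Reduce phase: combine all values for one key (here, a sum).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: wires the phases together and collects the final output.
    static Map<String, Integer> run(List<String> lines) {
        List<Pair> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffleAndSort(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the quick fox", "The lazy dog", "the end")));
    }
}
```

    Running `main` prints `{dog=1, end=1, fox=1, lazy=1, quick=1, the=3}`. In Hadoop the same three steps run in parallel across the cluster, with map tasks scheduled close to their HDFS blocks.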

    Section 4: Pig

    • Pig vs. Java MapReduce
    • Pig job flow
    • the Pig Latin language
    • ETL with Pig
    • transformations & joins
    • user-defined functions (UDFs)
    • labs: writing Pig scripts to analyze data

    Section 5: Hive

    • architecture and design
    • data types
    • SQL support in Hive
    • Creating Hive tables and querying
    • partitions
    • joins
    • text processing
    • labs: processing data with Hive

    Section 6: HBase

    • concepts and architecture
    • HBase vs. RDBMS vs. Cassandra
    • HBase Java API
    • time-series data on HBase
    • schema design
    • labs: interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
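    One common approach to the time-series and schema-design topics above is to build row keys from a series identifier followed by a reversed timestamp. Because HBase keeps rows sorted lexicographically by row key, storing `Long.MAX_VALUE - timestamp` as a fixed-width, zero-padded value makes the newest readings of a series sort first. The sketch below is plain Java with no HBase dependency; the `TimeSeriesRowKey` class, the `#` separator and the 19-digit string padding are illustrative assumptions, not necessarily the layout used in the course labs (real schemas often use fixed-width binary keys instead).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical time-series row-key layout: "<seriesId>#<reversed timestamp>".
 * Plain Java only; sorting Strings here stands in for HBase's
 * lexicographic row ordering (equivalent for ASCII keys).
 */
public class TimeSeriesRowKey {

    static String rowKey(String seriesId, long timestampMillis) {
        if (timestampMillis < 0) {
            throw new IllegalArgumentException("timestamp must be non-negative");
        }
        // Newer timestamps give smaller values, so they sort first.
        long reversed = Long.MAX_VALUE - timestampMillis;
        // 19 digits fits any non-negative long, so all keys share one width
        // and lexicographic order matches numeric order.
        return seriesId + "#" + String.format("%019d", reversed);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(rowKey("cpu.load", 1_000L));
        keys.add(rowKey("cpu.load", 3_000L));
        keys.add(rowKey("cpu.load", 2_000L));
        Collections.sort(keys); // simulate HBase row order
        keys.forEach(System.out::println);
    }
}
```

    Running `main` prints the key for timestamp 3000 first, then 2000, then 1000, so a scan starting at the `cpu.load#` prefix would return the most recent readings first.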
