Data Mining Training


Data Mining with R

very tailored to needs

Yashan Wang - MoneyGram International


Code Name Duration Overview
kdd Knowledge Discovery in Databases (KDD) 21 hours Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications of this data mining technique include marketing, fraud detection, telecommunications and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice implementing them. Audience: Data analysts or anyone interested in learning how to interpret data to solve problems. Format of the course: After a theoretical discussion of KDD, the instructor will present real-life cases that call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations. Introduction; KDD vs data mining; Establishing the application domain; Establishing relevant prior knowledge; Understanding the goal of the investigation; Creating a target data set; Data cleaning and preprocessing; Data reduction and projection; Choosing the data mining task; Choosing the data mining algorithms; Interpreting the mined patterns
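The KDD steps listed in the outline above (target data set, cleaning, reduction and projection, mining, interpretation) can be sketched as a chain of small functions. This is only an illustrative sketch: the customer records, the field names and the "above mean spend" mining task are all made-up assumptions, not course material.

```python
from statistics import mean

# Hypothetical records: (customer_id, age, monthly_spend); None marks a missing value.
RAW_RECORDS = [
    (1, 34, 120.0),
    (2, None, 80.0),
    (3, 51, 310.0),
    (4, 29, None),
    (5, 45, 290.0),
    (6, 38, 95.0),
]

def create_target_set(records):
    """Creating a target data set: keep only the fields under investigation."""
    return [(age, spend) for _id, age, spend in records]

def clean(rows):
    """Data cleaning: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]

def project(rows):
    """Data reduction and projection: keep only the spend column."""
    return [spend for _age, spend in rows]

def mine(spends):
    """Data mining task (illustrative): split customers around the mean spend."""
    threshold = mean(spends)
    return {
        "threshold": threshold,
        "high": [s for s in spends if s > threshold],
        "low": [s for s in spends if s <= threshold],
    }

def interpret(pattern):
    """Interpreting the mined pattern: turn it into a readable statement."""
    total = len(pattern["high"]) + len(pattern["low"])
    share = len(pattern["high"]) / total
    return f"{share:.0%} of customers spend above the mean of {pattern['threshold']:.2f}"

pattern = mine(project(clean(create_target_set(RAW_RECORDS))))
print(interpret(pattern))
```

Each function maps to one step, so a step can be swapped (for example, imputing missing values instead of dropping rows) without touching the rest of the chain.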
druid Druid: Build a fast, real-time data analysis system 21 hours Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo. In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment. Audience: Application developers; Software engineers; Technical consultants; DevOps professionals; Architecture engineers. Format of the course: Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding. Introduction; Installing and starting Druid; Druid architecture and design; Real-time ingestion of event data; Sharding and indexing; Loading data; Querying data; Visualizing data; Running a distributed cluster; Druid + Apache Hive; Druid + Apache Kafka; Druid + others; Troubleshooting; Administrative tasks
BigData_ A practical introduction to Data Analysis and Big Data 28 hours Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability. Audience: Developers / programmers; IT consultants. Format of the course: Part lecture, part discussion, heavy hands-on practice and implementation, occasional quizzes to measure progress. Introduction to Data Analysis and Big Data What makes Big Data "big"? Velocity, Volume, Variety, Veracity (VVVV) Limits to traditional Data Processing Distributed Processing Statistical Analysis Types of Machine Learning Analysis Data Visualization Distributed Processing MapReduce Languages used for Data Analysis R language (crash course) Python (crash course) Approaches to Data Analysis Statistical Analysis Time Series analysis Forecasting with Correlation and Regression models Inferential Statistics (estimating) Descriptive Statistics in Big Data sets (e.g. calculating mean) Machine Learning Supervised vs unsupervised learning Classification and clustering Estimating cost of specific methods Filter Natural Language Processing Processing text Understanding meaning of the text Automatic text generation Sentiment/Topic Analysis Computer Vision Big Data infrastructure Data Storage Relational databases (SQL) MySQL Postgres Oracle Non-relational databases (NoSQL) Cassandra MongoDB Neo4j Understanding the nuances: hierarchical, object-oriented, document-oriented, graph-oriented, etc.
Distributed File Systems HDFS Search Engines ElasticSearch Distributed Processing Spark Machine Learning libraries: MLlib Spark SQL Scalability Public cloud AWS, Google, Aliyun, etc. Private cloud OpenStack, Cloud Foundry, etc. Auto-scalability Choosing right solution for the problem  
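The "Forecasting with Correlation and Regression models" topic in this outline can be illustrated with an ordinary least-squares line and a Pearson correlation coefficient. The sketch below uses only the standard library; the monthly figures are hypothetical and deliberately exactly linear so the result is easy to check by hand.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: month number vs. units sold.
months = [1, 2, 3, 4, 5]
units = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(months, units)
forecast_month_6 = slope * 6 + intercept
correlation = pearson(months, units)
```

With real data the correlation would be below 1, and it is worth checking it before trusting a straight-line forecast.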
mdlmrah Model MapReduce and Apache Hadoop 14 hours The course is intended for IT specialists who work with the distributed processing of large data sets across clusters of computers. Data Mining and Business Intelligence Introduction Area of application Capabilities Basics of data exploration Big data What does Big data stand for? Big data and Data mining MapReduce Model basics Example application Stats Cluster model Hadoop What is Hadoop Installation Configuration Cluster settings Architecture and configuration of Hadoop Distributed File System Console tools DistCp tool MapReduce and Hadoop Streaming Administration and configuration of Hadoop On Demand Alternatives
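As a rough illustration of the "MapReduce: Model basics / Example application" items above, the classic word count can be simulated in a single process. This is a sketch of the programming model only (map, shuffle, reduce); it does not use Hadoop, and the sample documents are made up.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))

# Hypothetical input splits.
documents = ["big data big clusters", "data mining"]
counts = word_count(documents)
```

On a real cluster the map and reduce functions run on different machines and the shuffle moves data over the network; the logic of the two user-supplied functions stays the same.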
sspsspas Statistics with SPSS Predictive Analytics Software 14 hours Goal: Learning to work with SPSS independently. Audience: Analysts, researchers, scientists, students and all those who want to acquire the ability to use the SPSS package and learn popular data mining techniques. Using the program The dialog boxes Input / downloading data The concept of variable and measuring scales Preparing a database Generating tables and graphs Formatting of the report Command language syntax Automated analysis Storage and modification procedures Creating your own analytical procedures Data Analysis Descriptive statistics Key terms: e.g. variable, hypothesis, statistical significance Measures of central tendency Measures of dispersion Standardization Introduction to research on the relationships between variables Correlational and experimental methods Summary: Case study and discussion
bdbiga Big Data Business Intelligence for Govt. Agencies 35 hours Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to successfully accomplish their mission, they are laying the groundwork to correlate dependencies across events, people, processes, and information. High-value government solutions will be created from a mashup of the most disruptive technologies: mobile devices and applications, cloud services, social business technologies and networking, and Big Data and analytics. IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data, whether related or unrelated, structured or unstructured.
But accomplishing these feats takes far more than simply accumulating massive quantities of data. “Making sense of these volumes of Big Data requires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog. The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it. The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge. Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.) Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.
Each session is 2 hours.
Day 1, Session 1: Business Overview of Why Big Data Business Intelligence in Govt. Case Studies from NIH, DoE Big Data adoption rate in Govt. Agencies and how they are aligning their future operation around Big Data Predictive Analytics Broad Scale Application Area in DoD, NSA, IRS, USDA etc. Interfacing Big Data with Legacy data Basic understanding of enabling technologies in predictive analytics Data Integration & Dashboard visualization Fraud management Business Rule/ Fraud detection generation Threat detection and profiling Cost benefit analysis for Big Data implementation
Day 1, Session 2: Introduction to Big Data-1 Main characteristics of Big Data: volume, variety, velocity and veracity. MPP architecture for volume. Data Warehouses – static schema, slowly evolving dataset MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc. Hadoop Based Solutions – no conditions on structure of dataset. Typical pattern: HDFS, MapReduce (crunch), retrieve from HDFS Batch - suited for analytical/non-interactive Volume: CEP streaming data Typical choices – CEP products (e.g.
Infostreams, Apama, MarkLogic etc) Less production ready – Storm/S4 NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database
Day 1, Session 3: Introduction to Big Data-2 NoSQL solutions KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB) KV Store - Dynamo, Voldemort, Dynomite, SubRecord, Mo8onDb, DovetailDB KV Store (Hierarchical) - GT.m, Cache KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord KV Cache - Memcached, Repcached, Coherence, Infinispan, EXtremeScale, JBossCache, Velocity, Terracotta Tuple Store - Gigaspaces, Coord, Apache River Object Database - ZopeDB, db4o, Shoal Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML-Databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI Varieties of Data: Introduction to Data Cleaning issue in Big Data RDBMS – static structure/schema, doesn’t promote agile, exploratory environment. NoSQL – semi structured, enough structure to store data without exact schema before storing data Data cleaning issues
Day 1, Session 4: Big Data Introduction-3: Hadoop When to select Hadoop?
STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration) SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB) Warehousing data = HUGE effort and static even after implementation For variety & volume of data, crunched on commodity hardware – HADOOP Commodity H/W needed to create a Hadoop Cluster Introduction to MapReduce/HDFS MapReduce – distribute computing over multiple servers HDFS – make data available locally for the computing process (with redundancy) Data – can be unstructured/schema-less (unlike RDBMS) Developer responsibility to make sense of data Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS
Day 2, Session 1: Big Data Ecosystem - Building Big Data ETL: universe of Big Data Tools - which one to use and when? Hadoop vs. Other NoSQL solutions For interactive, random access to data HBase (column oriented database) on top of Hadoop Random access to data but restrictions imposed (max 1 PB) Not good for ad-hoc analytics, good for logging, counting, time-series Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access) Flume – Stream data (e.g.
log data) into HDFS
Day 2, Session 2: Big Data Management System Moving parts, compute nodes start/fail: ZooKeeper - For configuration/coordination/naming services Complex pipeline/workflow: Oozie – manage workflow, dependencies, daisy chain Deploy, configure, cluster management, upgrade etc (sys admin): Ambari In Cloud: Whirr
Day 2, Session 3: Predictive analytics in Business Intelligence-1: Fundamental Techniques & Machine learning based BI: Introduction to Machine learning Learning classification techniques Bayesian Prediction - preparing training file Support Vector Machine KNN p-Tree Algebra & vertical mining Neural Network Big Data large variable problem - Random forest (RF) Big Data Automation problem – Multi-model ensemble RF Automation through Soft10-M Text analytic tool - Treeminer Agile learning Agent based learning Distributed learning Introduction to Open source Tools for predictive analytics: R, RapidMiner, Mahout
Day 2, Session 4: Predictive analytics eco-system-2: Common predictive analytic problems in Govt. Insight analytic Visualization analytic Structured predictive analytic Unstructured predictive analytic Threat/fraudster/vendor profiling Recommendation Engine Pattern detection Rule/Scenario discovery – failure, fraud, optimization Root cause discovery Sentiment analysis CRM analytic Network analytic Text Analytics Technology assisted review Fraud analytic Real Time Analytic
Day 3, Session 1: Real Time and Scalable Analytic Over Hadoop Why common analytic algorithms fail in Hadoop/HDFS Apache Hama - for Bulk Synchronous distributed computing Apache Spark - for cluster computing for real time analytic CMU GraphLab - Graph based asynchronous approach to distributed computing KNN p-Algebra based approach from Treeminer for reduced hardware cost of operation
Day 3, Session 2: Tools for eDiscovery and Forensics eDiscovery over Big Data vs.
Legacy data – a comparison of cost and performance Predictive coding and technology assisted review (TAR) Live demo of a TAR product (vMiner) to understand how TAR works for faster discovery Faster indexing through HDFS – velocity of data NLP or Natural Language Processing – various techniques and open source products eDiscovery in foreign languages - technology for foreign language processing
Day 3, Session 3: Big Data BI for Cyber Security – Understanding whole 360 degree views of speedy data collection to threat identification Understanding basics of security analytics - attack surface, security misconfiguration, host defenses Network infrastructure/ Large datapipe / Response ETL for real time analytic Prescriptive vs predictive – Fixed rule based vs auto-discovery of threat rules from Meta data
Day 3, Session 4: Big Data in USDA: Application in Agriculture Introduction to IoT (Internet of Things) for agriculture - sensor based Big Data and control Introduction to Satellite imaging and its application in agriculture Integrating sensor and image data for fertility of soil, cultivation recommendation and forecasting Agriculture insurance and Big Data Crop Loss forecasting
Day 4, Session 1: Fraud prevention BI from Big Data in Govt - Fraud analytic: Basic classification of Fraud analytics - rule based vs predictive analytics Supervised vs unsupervised Machine learning for Fraud pattern detection Vendor fraud/over charging for projects Medicare and Medicaid fraud - fraud detection techniques for claim processing Travel reimbursement frauds IRS refund frauds Case studies and live demo will be given wherever data is available.
Day 4, Session 2: Social Media Analytic - Intelligence gathering and analysis Big Data ETL API for extracting social media data Text, image, meta data and video Sentiment analysis from social media feed Contextual and non-contextual filtering of social media feed Social Media Dashboard to integrate diverse social media Automated profiling of social media profile Live demo of each analytic will be given through Treeminer Tool.
Day 4, Session 3: Big Data Analytic in image processing and video feeds Image Storage techniques in Big Data - Storage solution for data exceeding petabytes LTFS and LTO GPFS-LTFS (Layered storage solution for Big image data) Fundamentals of image analytics Object recognition Image segmentation Motion tracking 3-D image reconstruction
Day 4, Session 4: Big Data applications in NIH: Emerging areas of Bio-informatics Meta-genomics and Big Data mining issues Big Data Predictive analytic for Pharmacogenomics, Metabolomics and Proteomics Big Data in downstream Genomics process Application of Big data predictive analytics in Public health Big Data Dashboard for quick accessibility of diverse data and display: Integration of existing application platform with Big Data Dashboard Big Data management Case Study of Big Data Dashboard: Tableau and Pentaho Use Big Data app to push location based services in Govt. Tracking system and management
Day 5, Session 1: How to justify Big Data BI implementation within an organization: Defining ROI for Big Data implementation Case studies for saving Analyst Time for collection and preparation of Data – increase in productivity gain Case studies of revenue gain from saving the licensed database cost Revenue gain from location based services Saving from fraud prevention An integrated spreadsheet approach to calculate approx. expense vs. Revenue gain/savings from Big Data implementation.
Day 5, Session 2: Step by Step procedure to replace a legacy data system with a Big Data System: Understanding practical Big Data Migration Roadmap What important information is needed before architecting a Big Data implementation What are the different ways of calculating volume, velocity, variety and veracity of data How to estimate data growth Case studies
Day 5, Session 4: Review of Big Data Vendors and review of their products. Q/A session: Accenture, APTEAN (Formerly CDC Software), Cisco Systems, Cloudera, Dell, EMC, GoodData Corporation, Guavus, Hitachi Data Systems, Hortonworks, HP, IBM, Informatica, Intel, Jaspersoft, Microsoft, MongoDB (Formerly 10Gen), MU Sigma, Netapp, Opera Solutions, Oracle, Pentaho, Platfora, Qliktech, Quantum, Rackspace, Revolution Analytics, Salesforce, SAP, SAS Institute, Sisense, Software AG/Terracotta, Soft10 Automation, Splunk, Sqrrl, Supermicro, Tableau Software, Teradata, Think Big Analytics, Tidemark Systems, Treeminer, VMware (Part of EMC)
datamin Data Mining 21 hours Course can be provided with any tools, including free open-source data mining software and applications. Introduction Data mining as the analysis step of the KDD process ("Knowledge Discovery in Databases") Subfield of computer science Discovering patterns in large data sets Sources of methods Artificial intelligence Machine learning Statistics Database systems What is involved? Database and data management aspects Data pre-processing Model and inference considerations Interestingness metrics Complexity considerations Post-processing of discovered structures Visualization Online updating Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Use and applications Able Danger Behavioral analytics Business analytics Cross Industry Standard Process for Data Mining Customer analytics Data mining in agriculture Data mining in meteorology Educational data mining Human genetic clustering Inference attack Java Data Mining Open-source intelligence Path analysis (computing) Reactive business intelligence Data dredging, data fishing, data snooping
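Of the tasks listed above, association rule mining (dependencies) is easy to make concrete: a rule such as "bread implies milk" is judged by its support and confidence. The sketch below computes both measures directly from a tiny, made-up list of shopping baskets; it is not a full rule-mining algorithm.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) from the transaction list."""
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)

bread_and_milk = support({"bread", "milk"}, transactions)
bread_implies_milk = confidence({"bread"}, {"milk"}, transactions)
```

Here two of the four baskets contain both items, and two of the three bread baskets also contain milk, so the rule has support 0.5 and confidence of about 0.67.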
d2dbdpa From Data to Decision with Big Data and Predictive Analytics 21 hours Audience: If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter, LinkedIn, etc.), this course is for you. It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing. It is not aimed at people configuring the solution, although they will benefit from the big picture. Delivery Mode: During the course delegates will be presented with working examples of mostly open source technologies. Short lectures will be followed by presentations and simple exercises by the participants. Content and Software used: All software used is updated each time the course is run so that we use the newest versions possible. The course covers the process from obtaining, formatting, processing and analysing the data to automating the decision-making process with machine learning. Quick Overview Data Sources Mining Data Recommender systems Target Marketing Datatypes Structured vs unstructured Static vs streamed Attitudinal, behavioural and demographic data Data-driven vs user-driven analytics data validity Volume, velocity and variety of data Models Building models Statistical Models Machine learning Data Classification Clustering kGroups, k-means, nearest neighbours Ant colonies, birds flocking Predictive Models Decision trees Support vector machine Naive Bayes classification Neural networks Markov Model Regression Ensemble methods ROI Benefit/Cost ratio Cost of software Cost of development Potential benefits Building Models Data Preparation (MapReduce) Data cleansing Choosing methods Developing model Testing Model Model evaluation Model deployment and integration Overview of Open Source and commercial software Selection of R-project package Python libraries Hadoop and Mahout Selected Apache projects related to Big Data and Analytics Selected commercial solution Integration
with existing software and data sources
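The clustering topics in this outline include k-means, which alternates between assigning points to the nearest centroid and recomputing each centroid as the mean of its cluster. The following is a minimal sketch under stated assumptions: the points are hypothetical, and the first k points are used as starting centroids to keep the run deterministic (real implementations usually pick starting centroids at random or with k-means++).

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters

def kmeans(points, k, max_iterations=20):
    centroids = list(points[:k])  # assumption: deterministic start
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # An empty cluster keeps its previous centroid.
        updated = [centroid(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: nothing moved
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

Even with a poor start (both initial centroids sit in the same group), the two groups separate after a couple of iterations on this data.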
pmml Predictive Models with PMML 7 hours The course is created for scientists, developers, analysts or anyone else who wants to standardize or exchange their models using the Predictive Model Markup Language (PMML) file format. Predictive Models: Intro to predictive models; Predictive models supported by PMML. PMML Elements: Header; Data Dictionary; Data Transformations; Model; Mining Schema; Targets; Output. API: Overview of API providers for PMML; Executing your model in a cloud
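To show how the PMML elements named above fit together, the sketch below builds a small PMML document for a linear regression using only Python's standard `xml.etree.ElementTree`. The field names, coefficients and the helper `build_pmml` are hypothetical, and the output is not validated against the PMML schema; treat it as an illustration of the nesting (Header, DataDictionary, model, MiningSchema), not as a complete exporter.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"

def build_pmml(intercept, coefficients, target="y"):
    """Return a PMML string for y = intercept + sum(coefficient * field)."""
    pmml = ET.Element("PMML", {"version": "4.4", "xmlns": PMML_NS})
    ET.SubElement(pmml, "Header", {"description": "Illustrative linear model"})

    # Data Dictionary: declares every field the model reads or predicts.
    dictionary = ET.SubElement(pmml, "DataDictionary")
    for name in [*coefficients, target]:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    # Model element with its Mining Schema (which fields are inputs, which is the target).
    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})

    table = ET.SubElement(model, "RegressionTable", {"intercept": str(intercept)})
    for name, value in coefficients.items():
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": str(value)})
    return ET.tostring(pmml, encoding="unicode")

# Hypothetical model: price = 10.0 + 2.5 * rooms + 0.1 * area
pmml_text = build_pmml(10.0, {"rooms": 2.5, "area": 0.1}, target="price")
```

Because the result is plain XML, any tool that understands PMML can in principle read the same file, which is the point of using the format for model exchange.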
dataminr Data Mining with R 14 hours Sources of methods Artificial intelligence Machine learning Statistics Sources of data Pre-processing of data Data Import/Export Data Exploration and Visualization Dimensionality Reduction Dealing with missing values R Packages Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Frequent Pattern Mining Text Mining Decision Trees Regression Neural Networks Sequence Mining Data dredging, data fishing, data snooping
datashrinkgov Data Shrinkage for Government 14 hours Why shrink data Relational databases Introduction Aggregation and disaggregation Normalisation and denormalisation Null values and zeroes Joining data Complex joins Cluster analysis Applications Strengths and weaknesses Measuring distance Hierarchical clustering K-means and derivatives Applications in Government Factor analysis Concepts Exploratory factor analysis Confirmatory factor analysis Principal component analysis Correspondence analysis Software Applications in Government Predictive analytics Timelines and naming conventions Holdout samples Weights of evidence Information value Scorecard building demonstration using a spreadsheet Regression in predictive analytics Logistic regression in predictive analytics Decision Trees in predictive analytics Neural networks Measuring accuracy Applications in Government
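The "Weights of evidence" and "Information value" items in this outline have standard scorecard definitions: for each bin, WoE is the natural log of the share of goods divided by the share of bads, and IV sums (share of goods minus share of bads) times WoE over all bins. A small sketch with made-up counts follows; it assumes every bin contains at least one good and one bad case, since a zero count would make the logarithm undefined.

```python
import math

def weight_of_evidence(bins):
    """bins maps a bin label to (good_count, bad_count).

    Returns (woe_by_bin, information_value).
    """
    total_good = sum(good for good, _bad in bins.values())
    total_bad = sum(bad for _good, bad in bins.values())
    woe_by_bin = {}
    information_value = 0.0
    for label, (good, bad) in bins.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        woe_by_bin[label] = woe
        information_value += (good_share - bad_share) * woe
    return woe_by_bin, information_value

# Hypothetical counts of good and bad outcomes per income band.
income_bins = {"low": (100, 50), "high": (300, 50)}
woe, iv = weight_of_evidence(income_bins)
```

A negative WoE (the "low" band here) marks a bin with proportionally more bads than the population as a whole; the IV summarizes how well the variable separates the two outcomes.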
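matlab2 MATLAB Fundamentals 21 hours Introduction to MATLAB MATLAB, short for MATrix LABoratory, is commercial scientific computing and simulation software from The MathWorks, a US company. MATLAB provides a high-level technical computing language and an interactive environment for algorithm development, data visualization, data analysis and numerical computation. Besides common tasks such as matrix operations, solving systems of linear equations and plotting functions and data, MATLAB can also be used to create user interfaces and to call programs written in other languages (including C, C++, Java, Python and FORTRAN). Although MATLAB was originally used mainly for scientific computing, its growing collection of add-on toolboxes (nearly 100 to date) makes it suitable for applications in many fields and industries, such as control system design and analysis, biomedicine, image processing, signal processing and communications, financial modeling and analysis, automotive and aerospace. In addition, Simulink, a graphical simulation package based on Model-Based Design (MBD), is used for system simulation, code generation and dynamic/embedded system development. Training objectives This course gives a comprehensive introduction to the MATLAB technical computing environment and aims to help beginners quickly grasp MATLAB fundamentals. After the course, participants will be able to: find their way around the MATLAB interface and look up help; enter commands and perform basic operations on variables, vectors and matrices; visualize data in a variety of ways; work with data files and different data types; write scripts and functions that include the necessary logic and branching control; read and write text and binary files. Course features This course uses MATLAB 2014a for demonstrations. It progresses from simple to advanced, emphasizes practice and repeatedly reinforces key points; rather than sticking rigidly to the slides, it uses worked examples wherever possible. Course outline 1. MATLAB product introduction 1.1 An example: C vs MATLAB 1.2 MATLAB product overview 1.3 MATLAB application areas 1.4 What MATLAB can do for you 1.5 MATLAB fundamentals course outline 2. Using the MATLAB interface Objective: Introduce the main features of the MATLAB integrated development interface and some basic data, file and graphics visualization operations 2.1 Introduction to the MATLAB interface 2.2 Reading data from files 2.3 Saving and loading variables 2.4 Plotting data 2.5 Plotting tools 2.6 Basic data analysis and fitting tools 2.7 Exporting data for other applications 3. MATLAB variables and expressions Objective: Enter MATLAB commands, with emphasis on how to create and access data in variables 3.1 Entering commands 3.2 Creating variables 3.3 Getting help 3.4 Accessing and modifying variable values 3.5 Creating character variables 4. MATLAB vectors Objective: Perform mathematical and statistical calculations on vectors and visualize them 4.1 Vector calculations 4.2 Plotting vectors 4.3 Basic plotting options 4.4 Annotating vector plots 5. MATLAB matrices Objective: Use matrices as mathematical objects or as collections of vectors, and understand how to use the appropriate MATLAB syntax in different applications 5.1 Matrix size and dimensions 5.2 Matrix calculations 5.3 Matrix statistics 5.4 Plotting multi-column data 5.5 Rearranging matrix data 5.6 Multidimensional matrices 6. MATLAB scripts Objective: Combine multiple MATLAB commands into a script for repeated use 6.1 A modeling example 6.2 Recalling command history 6.3 Creating scripts 6.4 Running scripts 6.5 Comments and code cells 6.6 Publishing code 7. Working with data files Objective: Import data from files and apply mixed data types such as cell arrays to the different data formats in a file 7.1 Importing data 7.2 Mixed data types 7.3 Cell arrays 7.4 Converting between numbers, strings and cell arrays 7.5 Exporting data 8. Plotting multiple vectors Objective: Plot more complex vectors and formulas and annotate them using MATLAB commands 8.1 Figure structure 8.2 Drawing subplots 8.3 Plotting formulas 8.4 Using color 8.5 Modifying figure properties 9. MATLAB logic and flow control Objective: Use logical operations, variables and indexing techniques to create code that adapts to different conditions 9.1 Logical operations and variables 9.2 Logical indexing 9.3 Programming constructs 9.4 Flow control 9.5 Loops 10. Visualizing matrices and images in MATLAB Objective: Visualize 2-D or 3-D images and matrix data 10.1 Scattered data interpolation 10.2 Visualizing 3-D matrices 10.3 Visualizing 2-D matrices 10.4 Indexed images and colormaps 10.5 True-color images 11. MATLAB data analysis Objective: Perform typical data analysis tasks in MATLAB, including fitting theoretical models
11.1 Handling missing data 11.2 Computing correlations 11.3 Smoothing data 11.4 Frequency-domain analysis and the Fourier transform 12. MATLAB functions Objective: Turn scripts into functions to increase the degree of task automation 12.1 Why use functions 12.2 Creating functions 12.3 Adding comments 12.4 Calling functions and subfunctions 12.5 Improving programming efficiency 12.6 The MATLAB workspace 12.7 The MATLAB path and calling precedence 13. MATLAB data types Objective: Explore MATLAB data structures and data type conversion in more depth 13.1 Overview of data types 13.2 Integers 13.3 Structures 13.4 Data type conversion 14. MATLAB file handling Objective: Learn how to import, export and control low-level data, and how to read and write text and binary files 14.1 Opening and closing files 14.2 Reading and writing text files 14.3 Reading and writing binary files 15. MATLAB fundamentals course summary Objective: Summarize the MATLAB fundamentals course and review some important basic MATLAB operations 15.1 Course summary 15.2 Other courses Please note that the actual course may differ slightly from the outline above.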
datavis1 Data Visualization 28 hours This course is intended for engineers and decision makers working in data mining and knowledge discovery. You will learn how to create effective plots and how to present and represent your data in a way that will appeal to decision makers and help them understand hidden information. Day 1: what is data visualization why it is important data visualization vs data mining human cognition HMI common pitfalls Day 2: different types of curves drill down curves categorical data plotting multi variable plots data glyph and icon representation Day 3: plotting KPIs with data R and X charts examples what if dashboards parallel axes mixing categorical data with numeric data Day 4: different hats of data visualization how can data visualization lie disguised and hidden trends a case study of student data visual queries and region selection
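Assuming the "R and X charts" item on Day 3 refers to the usual X-bar and R control charts for KPIs, their centre lines and control limits can be computed as below. The constants A2, D3 and D4 are the standard tabulated values for subgroups of five measurements; the KPI samples themselves are hypothetical.

```python
# Standard control-chart constants for subgroups of size 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return (lower limit, centre line, upper limit) for the X-bar and R charts."""
    if any(len(group) != 5 for group in subgroups):
        raise ValueError("the constants used here apply only to subgroups of size 5")
    means = [sum(group) / len(group) for group in subgroups]
    ranges = [max(group) - min(group) for group in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean, grand_mean + A2 * mean_range),
        "r": (D3 * mean_range, mean_range, D4 * mean_range),
    }

# Hypothetical KPI measurements, five per sampling period.
samples = [
    [10, 11, 9, 10, 10],
    [10, 12, 10, 9, 9],
    [11, 10, 10, 10, 9],
]
limits = xbar_r_limits(samples)
```

Plotting each subgroup mean against the "xbar" limits and each subgroup range against the "r" limits gives the two charts; points outside the limits are the ones worth investigating.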
dsbda Data Science for Big Data Analytics 35 hours Introduction to Data Science for Big Data Analytics Data Science Overview Big Data Overview Data Structures Drivers and complexities of Big Data Big Data ecosystem and a new approach to analytics Key technologies in Big Data Data Mining process and problems Association Pattern Mining Data Clustering Outlier Detection Data Classification Introduction to Data Analytics lifecycle Discovery Data preparation Model planning Model building Presentation/Communication of results Operationalization Exercise: Case study From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology. Getting started with R Installing R and RStudio Features of R language Objects in R Data in R Data manipulation Big data issues Exercises Getting started with Hadoop Installing Hadoop Understanding Hadoop modes HDFS MapReduce architecture Hadoop related projects overview Writing programs in Hadoop MapReduce Exercises Integrating R and Hadoop with RHadoop Components of RHadoop Installing RHadoop and connecting with Hadoop The architecture of RHadoop Hadoop streaming with R Data analytics problem solving with RHadoop Exercises Pre-processing and preparing data Data preparation steps Feature extraction Data cleaning Data integration and transformation Data reduction – sampling, feature subset selection, Dimensionality reduction Discretization and binning Exercises and Case study Exploratory data analytic methods in R Descriptive statistics Exploratory data analysis Visualization – preliminary steps Visualizing single variable Examining multiple variables Statistical methods for evaluation Hypothesis testing Exercises and Case study Data Visualizations Basic visualizations in R Packages for data visualization ggplot2, lattice, plotly Formatting plots in R Advanced graphs Exercises Regression (Estimating future values) Linear regression Use cases Model description Diagnostics Problems
with linear regression Shrinkage methods, ridge regression, the lasso Generalizations and nonlinearity Regression splines Local polynomial regression Generalized additive models Regression with RHadoop Exercises and Case study Classification The classification related problems Bayesian refresher Naïve Bayes Logistic regression K-nearest neighbors Decision trees algorithm Neural networks Support vector machines Diagnostics of classifiers Comparison of classification methods Scalable classification algorithms Exercises and Case study Assessing model performance and selection Bias, Variance and model complexity Accuracy vs Interpretability Evaluating classifiers Measures of model/algorithm performance Hold-out method of validation Cross-validation Tuning machine learning algorithms with caret package Visualizing model performance with Profit, ROC and Lift curves Ensemble Methods Bagging Random Forests Boosting Gradient boosting Exercises and Case study Support vector machines for classification and regression Maximal Margin classifiers Support vector classifiers Support vector machines SVMs for classification problems SVMs for regression problems Exercises and Case study Identifying unknown groupings within a data set Feature Selection for Clustering Representative based algorithms: k-means, k-medoids Hierarchical algorithms: agglomerative and divisive methods Probabilistic based algorithms: EM Density based algorithms: DBSCAN, DENCLUE Cluster validation Advanced clustering concepts Clustering with RHadoop Exercises and Case study Discovering connections with Link Analysis Link analysis concepts Metrics for analyzing networks The PageRank algorithm Hyperlink-Induced Topic Search Link Prediction Exercises and Case study Association Pattern Mining Frequent Pattern Mining Model Scalability issues in frequent pattern mining Brute Force algorithms Apriori algorithm The FP growth approach Evaluation of Candidate Rules Applications of Association Rules Validation and Testing
Diagnostics Association rules with R and Hadoop Exercises and Case study Constructing recommendation engines Understanding recommender systems Data mining techniques used in recommender systems Recommender systems with the recommenderlab package Evaluating recommender systems Recommendations with RHadoop Exercise: Building a recommendation engine Text analysis Text analysis steps Collecting raw text Bag of words Term Frequency-Inverse Document Frequency (TF-IDF) Determining sentiments Exercises and Case study
neo4j Beyond the relational database: neo4j 21 hours Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly complex queries on the data. Imagine replacing rows-and-columns-based data storage with object-based data storage, whereby entities (e.g., a person) could be stored as data nodes, then easily queried on the basis of their vast, multi-linear relationships with other nodes. And imagine querying these connections and their associated objects and properties using a compact syntax, up to 20 times lighter than SQL. This is what graph databases, such as neo4j, offer. In this hands-on course, we will set up a live project and put into practice the skills to model, manage and access your data. We compare and contrast graph databases with SQL-based databases as well as other NoSQL databases and clarify when and where it makes sense to implement each within your infrastructure. Audience Database administrators (DBAs) Data analysts Developers System administrators DevOps engineers Business analysts CTOs CIOs Format of the course Heavy emphasis on hands-on practice. Most of the concepts are learned through samples, exercises and hands-on development. Getting started with neo4j neo4j vs relational databases neo4j vs other NoSQL databases Using neo4j to solve real-world problems Installing neo4j Data modeling with neo4j Mapping white-board diagrams and mind maps to neo4j Working with nodes Creating, changing and deleting nodes Defining node properties Node relationships Creating and deleting relationships Bi-directional relationships Querying your data with Cypher Querying your data based on relationships MATCH, RETURN, WHERE, REMOVE, MERGE, etc. 
Setting indexes and constraints Working with the REST API REST operations on nodes REST operations on relationships REST operations on indexes and constraints Accessing the core API for application development Working with .NET, Java, JavaScript, Python APIs Closing remarks
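The node-and-relationship model described in the neo4j overview above can be illustrated with a minimal sketch. This is plain Python, not the neo4j API or Cypher; the `nodes` and `relationships` structures, the `KNOWS` relationship type, and the sample names are all hypothetical and exist only to show how data can be queried by following relationships rather than joining tables.

```python
# Plain-Python illustration of the property-graph idea; NOT the neo4j API.
# All names and data below are hypothetical.

# Each node has an id and a dictionary of properties.
nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Person", "name": "Carol"},
}

# Each relationship is (start node id, relationship type, end node id).
relationships = [
    (1, "KNOWS", 2),
    (2, "KNOWS", 3),
]


def neighbours(node_id, rel_type):
    """Return ids of nodes reached from node_id via outgoing rel_type edges."""
    return [end for start, rel, end in relationships
            if start == node_id and rel == rel_type]


def friends_of_friends(node_id):
    """Follow two KNOWS hops and return the names found, sorted."""
    found = set()
    for friend in neighbours(node_id, "KNOWS"):
        for fof in neighbours(friend, "KNOWS"):
            if fof != node_id:
                found.add(nodes[fof]["name"])
    return sorted(found)


if __name__ == "__main__":
    print(friends_of_friends(1))
```

In a graph database this two-hop traversal would be expressed as a pattern over relationships, whereas a relational schema would typically need a self-join for each hop.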
processmining Process Mining 21 hours Process mining, or Automated Business Process Discovery (ABPD), is a technique that applies algorithms to event logs for the purpose of analyzing business processes. Process mining goes beyond data storage and data analysis; it bridges data with processes and provides insights into the trends and patterns that affect process efficiency. Format of the course     The course starts with an overview of the most commonly used techniques for process mining. We discuss the various process discovery algorithms and tools used for discovering and modeling processes based on raw event data. Real-life case studies are examined and data sets are analyzed using the ProM open-source framework. Audience     Data science professionals     Anyone interested in understanding and applying process modeling and data mining Overview     Discovering, analyzing and re-thinking your processes Types of process mining     Discovery, conformance and enhancement Process mining workflow     From log data analysis to response and action Other tools for process mining     PMLAB, Apromore     Commercial offerings Closing remarks
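To make "applying algorithms to event logs" concrete, here is a minimal sketch of one common first step in process discovery: counting the directly-follows relation between activities within each case. It is written in plain Python rather than with ProM, and the event log, its `(case id, activity, timestamp)` layout, and the function name are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, timestamp).
EVENT_LOG = [
    ("order-1", "receive", 1),
    ("order-1", "check", 2),
    ("order-1", "ship", 3),
    ("order-2", "receive", 1),
    ("order-2", "check", 2),
    ("order-2", "reject", 3),
]


def directly_follows(event_log):
    """Count how often activity a is immediately followed by b in a case."""
    traces = defaultdict(list)
    # Group events into one trace per case, ordered by timestamp.
    for case_id, activity, _ts in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its successor in the same trace.
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(EVENT_LOG).items()):
        print(f"{a} -> {b}: {n}")
```

The resulting counts can be read as the edges of a directly-follows graph, which discovery algorithms typically refine into a process model.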


Course Date Price [Remote / Classroom]
From Data to Decision with Big Data and Predictive Analytics - Beijing - 侨福芳草地 Monday, 2017-07-10 09:30 ¥51650 / ¥55850
Big Data Business Intelligence for Govt. Agencies - Suzhou - 晋合广场 Monday, 2017-07-17 09:30 ¥71210 / ¥77410
Data Shrinkage for Government - Haidian - 创而新大厦 Monday, 2017-07-17 09:30 ¥23840 / ¥27840
Model MapReduce and Apache Hadoop - Xiamen - 国际银行大厦 Tuesday, 2017-07-18 09:30 ¥18180 / ¥21380

