Hadoop Application Architectures (Facsimile Edition)
- Price: ¥89 (RMB)
- Authors: Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
- Publication date: 2017/2/1
- ISBN: 9787564170011
- Publisher: Southeast University Press
- CLC classification: TP274
- Pages:
- Paper: offset paper
- Edition: 1
- Format: 16mo (16開)

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many other sources stop at explaining how to use the various components of the Hadoop ecosystem, this practice-focused book leads you to think from an overall architectural perspective, tying all the components together into the complete, targeted application your particular use case demands.

To reinforce those lessons, the second part of the book provides detailed architecture case studies covering some of the most common Hadoop application scenarios.

Whether you're designing a new Hadoop application or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures (facsimile edition, in English) by Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira will skillfully guide you through the entire process. Topics covered include:

- Factors to consider when using Hadoop to store and model data
- Best practices for moving data into and out of the system
- Data processing frameworks, including MapReduce, Spark, and Hive
- Common Hadoop processing patterns, such as removing duplicate records and using windowing analytics (a minimal sketch follows this list)
- Giraph, GraphX, and other tools for large-scale graph processing on Hadoop
- Using workflow orchestration and scheduling tools such as Apache Oozie
- Near-real-time stream processing with Apache Storm, Apache Spark Streaming, and Apache Flume
- Architecture examples for clickstream analysis, fraud detection, and data warehousing
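
As a taste of the processing patterns covered in Chapter 4, here is a minimal sketch of primary-key deduplication with Spark in Scala. This is not the book's own example code: the input path, the "id,timestamp,payload" record layout, and the field positions are assumptions made purely for illustration.

    // Minimal sketch: keep only the newest record per primary key.
    // Assumes hypothetical input lines of the form "id,timestamp,payload".
    import org.apache.spark.{SparkConf, SparkContext}

    object DedupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("dedup-sketch"))

        // Parse each line into (primaryKey, (timestamp, payload)).
        val records = sc.textFile("hdfs:///data/records")
          .map(_.split(",", 3))
          .map(f => (f(0), (f(1).toLong, f(2))))

        // For each key, keep the record with the latest timestamp.
        val deduped = records.reduceByKey((a, b) => if (a._1 >= b._1) a else b)

        deduped.map { case (id, (ts, payload)) => s"$id,$ts,$payload" }
          .saveAsTextFile("hdfs:///data/records_deduped")

        sc.stop()
      }
    }

The book presents this pattern in both Spark and SQL form; either way, the idea is to group records by primary key and retain only the most recent version of each.
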
Foreword
Preface
Part I. Architectural Considerations for Hadoop Applications
1. Data Modeling in Hadoop
  Data Storage Options
  Standard File Formats
  Hadoop File Types
  Serialization Formats
  Columnar Formats
  Compression
  HDFS Schema Design
  Location of HDFS Files
  Advanced HDFS Schema Design
  HDFS Schema Design Summary
  HBase Schema Design
  Row Key
  Timestamp
  Hops
  Tables and Regions
  Using Columns
  Using Column Families
  Time-to-Live
  Managing Metadata
  What Is Metadata?
  Why Care About Metadata?
  Where to Store Metadata?
  Examples of Managing Metadata
  Limitations of the Hive Metastore and HCatalog
  Other Ways of Storing Metadata
  Conclusion
2. Data Movement
  Data Ingestion Considerations
  Timeliness of Data Ingestion
  Incremental Updates
  Access Patterns
  Original Source System and Data Structure
  Transformations
  Network Bottlenecks
  Network Security
  Push or Pull
  Failure Handling
  Level of Complexity
  Data Ingestion Options
  File Transfers
  Considerations for File Transfers versus Other Ingest Methods
  Sqoop: Batch Transfer Between Hadoop and Relational Databases
  Flume: Event-Based Data Collection and Processing
  Kafka
  Data Extraction
  Conclusion
3. Processing Data in Hadoop
  MapReduce
  MapReduce Overview
  Example for MapReduce
  When to Use MapReduce
  Spark
  Spark Overview
  Overview of Spark Components
  Basic Spark Concepts
  Benefits of Using Spark
  Spark Example
  When to Use Spark
  Abstractions
  Pig
  Pig Example
  When to Use Pig
  Crunch
  Crunch Example
  When to Use Crunch
  Cascading
  Cascading Example
  When to Use Cascading
  Hive
  Hive Overview
  Example of Hive Code
  When to Use Hive
  Impala
  Impala Overview
  Speed-Oriented Design
  Impala Example
  When to Use Impala
  Conclusion
4. Common Hadoop Processing Patterns
  Pattern: Removing Duplicate Records by Primary Key
  Data Generation for Deduplication Example
  Code Example: Spark Deduplication in Scala
  Code Example: Deduplication in SQL
  Pattern: Windowing Analysis
  Data Generation for Windowing Analysis Example
  Code Example: Peaks and Valleys in Spark
  Code Example: Peaks and Valleys in SQL
  Pattern: Time Series Modifications
  Use HBase and Versioning
  Use HBase with a RowKey of RecordKey and StartTime
  Use HDFS and Rewrite the Whole Table
  Use Partitions on HDFS for Current and Historical Records
  Data Generation for Time Series Example
  Code Example: Time Series in Spark
  Code Example: Time Series in SQL
  Conclusion
5. Graph Processing on Hadoop
  What Is a Graph?
  What Is Graph Processing?
  How Do You Process a Graph in a Distributed System?
  The Bulk Synchronous Parallel Model
  BSP by Example
  Giraph
  Read and Partition the Data
  Batch Process the Graph with BSP
  Write the Graph Back to Disk
  Putting It All Together
  When Should You Use Giraph?
  GraphX
  Just Another RDD
  GraphX Pregel Interface
  vprog()
  sendMessage()
  mergeMessage()
  Which Tool to Use?
  Conclusion
6. Orchestration
  Why We Need Workflow Orchestration
  The Limits of Scripting
  The Enterprise Job Scheduler and Hadoop
  Orchestration Frameworks in the Hadoop Ecosystem
  Oozie Terminology
  Oozie Overview
  Oozie Workflow
  Workflow Patterns
  Point-to-Point Workflow
  Fan-Out Workflow
  Capture-and-Decide Workflow
  Parameterizing Workflows
  Classpath Definition
  Scheduling Patterns
  Frequency Scheduling
  Time and Data Triggers
  Executing Workflows
  Conclusion
7. Near-Real-Time Processing with Hadoop
  Stream Processing
  Apache Storm
  Storm High-Level Architecture
  Storm Topologies
  Tuples and Streams
  Spouts and Bolts
  Stream Groupings
  Reliability of Storm Applications
  Exactly-Once Processing
  Fault Tolerance
  Integrating Storm with HDFS
  Integrating Storm with HBase
  Storm Example: Simple Moving Average
  Evaluating Storm
  Trident
  Trident Example: Simple Moving Average
  Evaluating Trident
  Spark Streaming
  Overview of Spark Streaming
  Spark Streaming Example: Simple Count
  Spark Streaming Example: Multiple Inputs
  Spark Streaming Example: Maintaining State
  Spark Streaming Example: Windowing
  Spark Streaming Example: Streaming versus ETL Code
  Evaluating Spark Streaming
  Flume Interceptors
  Which Tool to Use?
  Low-Latency Enrichment, Validation, Alerting, and Ingestion
  NRT Counting, Rolling Averages, and Iterative Processing
  Complex Data Pipelines
  Conclusion
Part II. Case Studies
8. Clickstream Analysis
  Defining the Use Case
  Using Hadoop for Clickstream Analysis
  Design Overview
  Storage
  Ingestion
  The Client Tier
  The Collector Tier
  Processing
  Data Deduplication
  Sessionization
  Analyzing
  Orchestration
  Conclusion
9. Fraud Detection
  Continuous Improvement
  Taking Action
  Architectural Requirements of Fraud Detection Systems
  Introducing Our Use Case
  High-Level Design
  Client Architecture
  Profile Storage and Retrieval
  Caching
  HBase Data Definition
  Delivering Transaction Status: Approved or Denied?
  Ingest
  Path Between the Client and Flume
  Near-Real-Time and Exploratory Analytics
  Near-Real-Time Processing
  Exploratory Analytics
  What About Other Architectures?
  Flume Interceptors
  Kafka to Storm or Spark Streaming
  External Business Rules Engine
  Conclusion
10. Data Warehouse
  Using Hadoop for Data Warehousing
  Defining the Use Case
  OLTP Schema
  Data Warehouse: Introduction and Terminology
  Data Warehousing with Hadoop
  High-Level Design
  Data Modeling and Storage
  Ingestion
  Data Processing and Access
  Aggregations
  Data Export
  Orchestration
  Conclusion
A. Joins in Impala
Index