The Google File System

Posted on May 31, 2022   1 minute read ∼ Filed in  : 

Introduction

The paper presents file system interface design to support distributed applications, and also discusses many aspects of the design.

The files system cluster has been used in:

  1. Hundreds of terabytes of storage
  2. Across thousands of disks on thousand of machines.
  3. Accessed by hundreds of clients.

The GFS has good performance, scalability, reliability, and availability. And it is also designed based on the following points

  1. Component failures are the norm. Such as failure caused by application bugs, operating system bugs, human errors, etc.
  2. Files are huge by traditional standards. Multi-GB files are common.
  3. Most are append-only data rather than overwriting existing data.
  4. Co-design the application and file system API to increase flexibility. Eg.
    • It relaxed GFS’ consistency model to simplify the file system without imposing a burden on the application.
    • Introduce atomic append operation. Multiple clients can append concurrently without extra synchronization.

Design Overview

Assumptions

  1. The system is built on inexpensive machines which often fail.
  2. Should store a modest number of large files, A few million files with 100 MB.
  3. Supports workloads include
    • Large streaming reads: Each request reads hundreds of KBs
    • Small random reads: Few KBs at the arbitrary offset.
  4. The system only focuses on sequential writes. While small writing is less efficient.
  5. The system must efficiently support multiple clients concurrently append to the same file.
  6. High sustained bandwidth is more important than low latency. Most applications require processing data in bulk at a high rate. And they don’t require a fast response time.

Interface

Supports create, delete, open, close, read, and write operations.

Also has snapshot, and record appends operations.

  1. Snapshot: creates a copy of a file or a directory tree at a low cost.
  2. Record appends: Allows multiple clients to append data atomicity.

Architecture





END OF POST




Tags Cloud


Categories Cloud




It's the niceties that make the difference fate gives us the hand, and we play the cards.