Jeremy Cole, DBA team lead / database architect at Twitter, gave a talk at the O'Reilly MySQL conference: Big and Small Data at @Twitter




https://github.com/twitter/mysql


One of the more interesting parts of the talk was the move away from the old approach of temporal sharding toward a new, more distributed store called T-bird. T-bird is built on top of Gizzard and uses MySQL.

Twitter's original Tweet store:

  • Temporally sharded tweets were a good-idea-at-the-time architecture. Temporal sharding simply means that tweets from the same time period are stored together on the same shard (see the sketch after this list).
  • The problem is that tweets fill up one machine, then a second, then a third, and so on, spreading from machine to machine.
  • This is a very common approach, but it has some real flaws:
    -  Load balancing. Most of the older machines get hardly any traffic, because people mostly care about what is happening on Twitter right now.
    -  Expensive. Filling up one machine, and its replication slaves, every three weeks is an expensive setup.
    -  Logistically complicated. Building an entirely new cluster every three weeks is a pain for the DBA team.
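To make the load imbalance concrete, here is a minimal sketch (not Twitter's actual code; the three-week window and shard names are assumptions) of routing tweets to shards by time bucket. Every new tweet, and most reads, land on the newest shard while the older ones sit nearly idle.

```scala
// Sketch of temporal sharding: tweets are routed to a shard by time bucket,
// so the newest shard absorbs almost all traffic (the load-balancing flaw above).
object TemporalSharding {
  case class Shard(name: String, startMillis: Long, endMillis: Long)

  // Hypothetical layout: one shard per ~3-week window.
  val threeWeeksMillis: Long = 21L * 24 * 60 * 60 * 1000
  val shards: Seq[Shard] = (0 until 4).map { i =>
    Shard(s"tweets_$i", i * threeWeeksMillis, (i + 1) * threeWeeksMillis)
  }

  // All tweets created "now" map to the last shard -> hotspot.
  def shardFor(createdAtMillis: Long): Option[Shard] =
    shards.find(s => createdAtMillis >= s.startMillis && createdAtMillis < s.endMillis)

  def main(args: Array[String]): Unit = {
    val now = 3L * threeWeeksMillis + 1000 // a timestamp in the newest window
    println(shardFor(now)) // Some(Shard(tweets_3, ...)) -- the only shard doing real work
  }
}
```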

Twitter's new Tweet store:

  • When you tweet, it is stored in an internal system called T-bird, which is built on top of Gizzard. Secondary indexes are stored in a separate system called T-flock, which is also Gizzard based.
  • Unique IDs for each tweet are generated by Snowflake, which lets them be sharded more evenly across the cluster. FlockDB is used for ID-to-ID mapping, storing the relationships between IDs (using Gizzard).
  • Gizzard is Twitter's distributed data storage framework built on top of MySQL (InnoDB).
    -  InnoDB was chosen because it doesn't lose data. Gizzard is used just as a datastore: data goes in, data is stored, data comes back out.
    -  To get more performance out of individual nodes, features like binary logging and replication are turned off. Gizzard handles sharding, replicating the data N times, and job scheduling.
    -  Gizzard is used at Twitter as a building block for other storage systems.
  • Gizzard implements a tweet store that isn't perfect in terms of load balancing, but it allows:
    -  Growing slowly over time, without worrying about machines filling up or having to make a hard cutover at exactly the right moment.
    -  DBAs getting some sleep. They don't have to make those decisions as often, and they don't have to pay the cost of scaling inefficiently.
  • MySQL works well enough to be used most of the time. Twitter values stability over features, so they stay on older releases.
  • MySQL isn't used for ID generation or graph storage.
  • MySQL is used for datasets smaller than 1.5TB, which is the size of their RAID array, and as a backing store for larger datasets.
  • Typical database server setup: HP DL380, 72GB RAM, 24-disk RAID10, a good balance of memory and disk.
MySQL isn't used for everything:
  • Cassandra is used for high-velocity writes and lower-velocity reads. The advantages: Cassandra can run on cheaper hardware than MySQL, it can be expanded more easily, and they like the schemaless design.
  • Hadoop is used to process unstructured and large datasets, such as those with hundreds of billions of rows.
  • Vertica is used for analytics and for large aggregations and joins, so they don't have to write MapReduce jobs.


Gizzard: a library for creating distributed datastores

https://github.com/twitter/gizzard


Gizzard is middleware

[diagram: Gizzard as a middleware layer between clients and the many partitions and replicas of data]

Gizzard operates as a middleware networking service. It sits “in the middle” between clients (typically, web front-ends like PHP and Ruby on Rails applications) and the many partitions and replicas of data. Sitting in the middle, all data querying and manipulation flow through Gizzard. Gizzard instances are stateless, so run as many gizzards as are necessary to sustain throughput or manage TCP connection limits. Gizzard, in part because it runs on the JVM, is quite efficient. One of Twitter’s Gizzard applications (FlockDB, our distributed graph database) can serve 10,000 queries per second per commodity machine. But your mileage may vary.

Gizzard supports any data storage backend

Gizzard is designed to replicate data across any network-available data storage service. This could be a relational database, Lucene, Redis, or anything you can imagine. As a general rule, Gizzard requires that all write operations be idempotent and commutative (see the section on Fault Tolerance and Migrations), so this places some constraints on how you may use the back-end store. In particular, Gizzard does not guarantee that write operations are applied in order. It is therefore imperative that the system is designed to reach a consistent state regardless of the order in which writes are applied.
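As a concrete illustration of that constraint (a sketch with assumed types, not Gizzard's API): a write that stores an absolute value tagged with a timestamp and keeps the newest one is idempotent (applying it twice equals applying it once) and commutative (application order doesn't change the final state), whereas, say, a blind increment is neither.

```scala
// Sketch: an idempotent, commutative write against an in-memory "store".
// Each write carries an absolute value and a timestamp; the store keeps the
// value with the highest timestamp ("last writer wins"), so applying writes
// twice or out of order converges to the same state (assuming distinct
// timestamps per key).
import scala.collection.mutable

object IdempotentWrites {
  case class Write(key: String, value: String, timestampMillis: Long)

  val store = mutable.Map.empty[String, Write]

  def apply(w: Write): Unit = store.get(w.key) match {
    case Some(existing) if existing.timestampMillis >= w.timestampMillis => () // stale or duplicate; ignore
    case _ => store(w.key) = w
  }

  def main(args: Array[String]): Unit = {
    val a = Write("user:1:name", "alice", 100L)
    val b = Write("user:1:name", "bob", 200L)
    // Out of order and with a duplicate -- the final state is still "bob".
    apply(b); apply(a); apply(b)
    println(store("user:1:name").value) // bob
  }
}
```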

Gizzard handles partitioning through a forwarding table

Gizzard handles partitioning (i.e., dividing exclusive ranges of data across many hosts) by mapping ranges of data to particular shards. These mappings are stored in a forwarding table that specifies the lower bound of a numerical range and which shard the data in that range belongs to.

[figure: an example forwarding table mapping numeric ranges to shards]

To be precise, you provide Gizzard a hash function that, given a key for your data (and this key can be application specific), produces a number that belongs to one of the ranges in the forwarding table. These functions are programmable so you can optimize for locality or balance depending on your needs.

This range-based approach differs from the "consistent hashing" technique used in many other distributed data stores. It allows for heterogeneously sized partitions, so you can easily manage hotspots, segments of data that are extremely popular. In fact, Gizzard does allow you to implement completely custom forwarding strategies like consistent hashing, but this isn't the recommended approach. For more detail on partitioning schemes, see the Wikipedia article on database partitioning.
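A rough sketch of the idea (the class and method names are assumptions, not Gizzard's actual code): a programmable hash maps the application key into a numeric space, and the shard is the forwarding-table entry with the greatest lower bound not exceeding that number.

```scala
// Sketch of a range-based forwarding table: each entry maps the lower bound of
// a numeric range to a shard id. A programmable hash of the application key
// picks a point in the space; lookup finds the entry whose lower bound is the
// greatest one <= that point.
import scala.collection.immutable.TreeMap

object ForwardingTable {
  // Hypothetical layout: four ranges over a 2^16 key space, uneven on purpose
  // (range-based partitioning lets hot ranges be carved out more finely).
  val table: TreeMap[Long, String] = TreeMap(
    0L     -> "shard-a",
    16384L -> "shard-b",
    32768L -> "shard-c",
    40960L -> "shard-d" // a smaller range, e.g. to isolate a hotspot
  )

  // Application-specific hash function; here just a stable hash folded into the space.
  def hash(key: String): Long = (key.hashCode.toLong & 0x7fffffffL) % 65536L

  def shardFor(key: String): String = {
    val h = hash(key)
    // entries iterate in sorted order; take the last one whose bound is <= h
    table.takeWhile { case (lowerBound, _) => lowerBound <= h }.last._2
  }

  def main(args: Array[String]): Unit = {
    println(shardFor("tweet:123456789"))
    println(shardFor("tweet:987654321"))
  }
}
```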

Gizzard handles replication through a replication tree

Each shard referenced in the forwarding table can be either a physical shard or a logical shard. A physical shard is a reference to a particular data storage back-end, such as a SQL database. In contrast, a logical shard is just a tree of other shards, where each branch in the tree represents some logical transformation on the data, and each node is a data-storage back-end. These logical transformations at the branches are usually rules about how to propagate read and write operations to the children of that branch. For example, here is a two-level replication tree. Note that this represents just ONE partition (as referenced in the forwarding table):

[figure: a two-level replication tree for a single partition]

The “Replicate” branches in the figure are simple strategies to repeat write operations to all children and to balance reads across the children according to health and a weighting function. You can create custom branching/logical shards for your particular data storage needs, such as to add additional transaction/coordination primitives or quorum strategies. But Gizzard ships with a few standard strategies of broad utility such as Replicating, Write-Only, Read-Only, and Blocked (allowing neither reads nor writes). The utility of some of the more obscure shard types is discussed in the section on Migrations.
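A toy sketch of a replicating branch (assumed interfaces, not Gizzard's shard classes): writes fan out to all children, and each read goes to one healthy child chosen by weight.

```scala
// Sketch of a replication tree: a logical "replicating" shard fans writes out
// to all of its children and routes each read to one healthy child, picked by
// weight. Physical shards at the leaves just hold data (here, in memory).
import scala.util.Random

object ReplicationTree {
  trait Shard {
    def write(key: String, value: String): Unit
    def read(key: String): Option[String]
  }

  // Leaf: a physical shard backed by an in-memory map (stand-in for e.g. MySQL).
  class PhysicalShard(val name: String) extends Shard {
    private val data = scala.collection.mutable.Map.empty[String, String]
    var healthy: Boolean = true
    def write(key: String, value: String): Unit = data(key) = value
    def read(key: String): Option[String] = data.get(key)
  }

  // Branch: replicate writes to all children, read from one healthy weighted child.
  class ReplicatingShard(children: Seq[(PhysicalShard, Int)]) extends Shard {
    def write(key: String, value: String): Unit =
      children.foreach { case (child, _) => child.write(key, value) }

    def read(key: String): Option[String] = {
      val healthyChildren = children.filter(_._1.healthy)
      if (healthyChildren.isEmpty) None
      else {
        // Weighted choice: expand by weight and pick uniformly (fine for a sketch).
        val pool = healthyChildren.flatMap { case (child, weight) => Seq.fill(weight)(child) }
        pool(Random.nextInt(pool.size)).read(key)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val a = new PhysicalShard("replica-a")
    val b = new PhysicalShard("replica-b")
    val partition = new ReplicatingShard(Seq(a -> 2, b -> 1))
    partition.write("tweet:1", "hello")
    a.healthy = false                  // reads fail over to the remaining replica
    println(partition.read("tweet:1")) // Some(hello)
  }
}
```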

The exact nature of the replication topologies can vary per partition. This means you can have a higher replication level for a “hotter” partition and a lower replication level for a “cooler” one. This makes the system highly configurable. For instance, you can specify that the back-ends mirror one another in a primary-secondary-tertiary-etc. configuration for simplicity. Alternatively, for better fault tolerance (but higher complexity) you can “stripe” partitions across machines so that no machine is a mirror of any other.

Gizzard is fault-tolerant

Fault-tolerance is one of the biggest concerns of distributed systems. Because such systems involve many computers, there is some likelihood that one (or many) are malfunctioning at any moment. Gizzard is designed to avoid any single points of failure. If a certain replica in a partition has crashed, Gizzard routes requests to the remaining healthy replicas, bearing in mind the weighting function. If all replicas in a partition are unavailable, Gizzard will be unable to serve read requests to that shard, but all other shards will be unaffected. Writes to an unavailable shard are buffered until the shard again becomes available.

In fact, if any number of replicas in a shard are unavailable, Gizzard will try to write to all healthy replicas as quickly as possible and buffer the writes to the unavailable shard, to try again later when the unhealthy shard returns to life. The basic strategy is that all writes are materialized to a durable, transactional journal. Writes are then performed asynchronously (but with manageably low latency) to all replicas in a shard. If a shard is unavailable, the write operation goes into an error queue and is retried later.

In order to achieve “eventual consistency”, this “retry later” strategy requires that your write operations are idempotent and commutative. This is because a retry later strategy can apply operations out-of-order (as, for instance, when newer jobs are applied before older failed jobs are retried). In most cases this is an easy requirement. A demonstration of commutative, idempotent writes is given in the Gizzard demo app, Rowz.
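A compressed sketch of the retry-later strategy (assumed types; real Gizzard uses a durable, transactional journal rather than an in-memory queue): failed writes go onto an error queue and are replayed when the replica recovers, and because the write operation is idempotent and commutative, the resulting out-of-order application still converges.

```scala
// Sketch of "write to healthy replicas now, retry the rest later". Failed
// writes land on an error queue; a retry pass re-applies them when the replica
// comes back. Because the write (last-writer-wins, as in the earlier example)
// is idempotent and commutative, out-of-order retries still converge.
import scala.collection.mutable

object RetryLater {
  case class Write(key: String, value: String, timestampMillis: Long)

  class Replica(val name: String) {
    var available: Boolean = true
    val data = mutable.Map.empty[String, Write]
    def apply(w: Write): Unit = {
      if (!available) throw new RuntimeException(s"$name is down")
      if (data.get(w.key).forall(_.timestampMillis < w.timestampMillis)) data(w.key) = w
    }
  }

  val errorQueue = mutable.Queue.empty[(Replica, Write)]

  def write(replicas: Seq[Replica], w: Write): Unit =
    replicas.foreach { r =>
      try r.apply(w)
      catch { case _: RuntimeException => errorQueue.enqueue((r, w)) } // retry later
    }

  def retryPass(): Unit = {
    val pending = errorQueue.toList
    errorQueue.clear()
    pending.foreach { case (r, w) =>
      try r.apply(w)
      catch { case _: RuntimeException => errorQueue.enqueue((r, w)) }
    }
  }

  def main(args: Array[String]): Unit = {
    val (a, b) = (new Replica("a"), new Replica("b"))
    b.available = false
    write(Seq(a, b), Write("tweet:1", "hi", 100L))    // b misses this write; it is queued
    b.available = true
    write(Seq(a, b), Write("tweet:1", "hi v2", 200L)) // the newer write reaches b first
    retryPass()                                       // the older write replays and is ignored as stale
    println(b.data("tweet:1").value)                  // hi v2
  }
}
```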

Winged migrations

It’s sometimes convenient to copy or move data from shards from one computer to another. You might do this to balance load across more or fewer machines, or to deal with hardware failures. It’s interesting to explain some aspect of how migrations work just to illustrate some of the more obscure logical shard types. When migrating from Datastore A to Datastore A', a Replicating shard is set up between them but a WriteOnly shard is placed in front of Datastore A'. Then data is copied from the old shard to the new shard. The WriteOnly shard ensures that while the new Shard is bootstrapping, no data is read from it (because it has an incomplete picture of the corpus).

[figure: migration setup, with a Replicating shard over Datastore A and a WriteOnly shard in front of Datastore A']

Because writes will happen out of order (new writes occur before older ones and some writes may happen twice), all writes must be idempotent and commutative to ensure data consistency.
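A sketch of the wiring described above (the shard classes are assumed stand-ins, not Gizzard's API): during the migration the partition is served by a Replicating shard whose children are the old datastore and a WriteOnly wrapper around the new one, so new writes reach both while reads are still answered only from the old copy until the backfill finishes.

```scala
// Sketch of a winged migration: while Datastore A' is being bootstrapped it sits
// behind a WriteOnly shard, so it receives new writes but serves no reads; a
// Replicating shard keeps the old store answering reads. Once the backfill copy
// finishes, the WriteOnly wrapper is removed and the forwarding entry swapped.
object WingedMigration {
  trait Shard {
    def write(key: String, value: String): Unit
    def read(key: String): Option[String]
  }

  class InMemoryShard(val name: String) extends Shard {
    val data = scala.collection.mutable.Map.empty[String, String]
    def write(key: String, value: String): Unit = data(key) = value
    def read(key: String): Option[String] = data.get(key)
  }

  // Passes writes through, refuses reads: the new store's picture is incomplete.
  class WriteOnlyShard(inner: Shard) extends Shard {
    def write(key: String, value: String): Unit = inner.write(key, value)
    def read(key: String): Option[String] =
      throw new UnsupportedOperationException("shard is write-only during migration")
  }

  // Writes go to every child; reads are served by the first child (the old store).
  class ReplicatingShard(children: Seq[Shard]) extends Shard {
    def write(key: String, value: String): Unit = children.foreach(_.write(key, value))
    def read(key: String): Option[String] = children.head.read(key)
  }

  def main(args: Array[String]): Unit = {
    val oldStore = new InMemoryShard("datastore-A")
    val newStore = new InMemoryShard("datastore-A-prime")
    oldStore.write("tweet:1", "hello")                            // pre-existing data

    val migrating = new ReplicatingShard(Seq(oldStore, new WriteOnlyShard(newStore)))
    migrating.write("tweet:2", "world")                           // new writes land in both stores
    oldStore.data.foreach { case (k, v) => newStore.write(k, v) } // backfill copy (idempotent)

    println(migrating.read("tweet:1")) // Some(hello) -- still served by the old store
    println(newStore.read("tweet:2"))  // Some(world) -- new store kept up to date
  }
}
```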


Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees

https://github.com/twitter/snowflake
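A minimal sketch of the commonly described Snowflake ID layout, assuming the usual 41-bit millisecond timestamp, 10-bit worker id, and 12-bit sequence packed into a 64-bit integer; the epoch constant and helper names are for illustration, and real Snowflake also handles clock drift and sequence exhaustion.

```scala
// Sketch of the commonly described Snowflake ID layout: a 64-bit id packing a
// millisecond timestamp (since a custom epoch), a worker id, and a per-millisecond
// sequence. IDs are unique without coordination and roughly sortable by creation
// time; this omits clock-drift and sequence-overflow handling.
object SnowflakeSketch {
  val customEpochMillis = 1288834974657L // assumed epoch, for illustration only
  val workerIdBits = 10
  val sequenceBits = 12

  def makeId(timestampMillis: Long, workerId: Long, sequence: Long): Long =
    ((timestampMillis - customEpochMillis) << (workerIdBits + sequenceBits)) |
      (workerId << sequenceBits) |
      sequence

  def decode(id: Long): (Long, Long, Long) = {
    val sequence = id & ((1L << sequenceBits) - 1)
    val workerId = (id >> sequenceBits) & ((1L << workerIdBits) - 1)
    val timestamp = (id >> (workerIdBits + sequenceBits)) + customEpochMillis
    (timestamp, workerId, sequence)
  }

  def main(args: Array[String]): Unit = {
    val id = makeId(System.currentTimeMillis(), workerId = 7, sequence = 0)
    println(id)
    println(decode(id)) // (timestamp, 7, 0)
  }
}
```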




Reference: http://ds5apn.wordpress.com/2011/12/21/twitter-%EB%8A%94-%EC%96%B4%EB%96%BB%EA%B2%8C-mysql-%EC%9D%84-%EC%9D%B4%EC%9A%A9%ED%95%B4-%ED%95%98%EB%A3%A8-2%EC%96%B55%EC%B2%9C%EB%A7%8C-tweets-%EC%9D%84-%EC%A0%80%EC%9E%A5%ED%95%98%EB%8A%94/
