Saturday, March 24, 2012

RPC Library comparison for Big Data

There are various flavors of RPC implementations available in the open source arena, and each library has its own pros and cons. Ideally, we should select the RPC library according to the specific requirements of the enterprise solution being built.
Some of the features that any RPC implementation aspires to are:
  • Cross Platform communication
  • Multiple Programming Languages
  • Support for Fast protocols (local, binary, zipped, etc.)
  • Support for Multiple transports
  • Flexible Server (configuration for non-blocking, multithreading, etc.)
  • Standard server and client implementations
  • Compatibility with other RPC libraries
  • Support for different data types and containers
  • Support for Asynchronous communication
  • Inherent support in Hadoop, NoSQL
  • Support for dynamic typing (no schema compilation)
  • Fast serialization
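To make the "fast protocols" and "fast serialization" items above more concrete, here is a minimal sketch using only the Python standard library. It does not use any of the libraries compared in this post. The record, its field names and the binary layout are all hypothetical, chosen only to show how a text encoding, a fixed binary layout and a zipped payload differ in size.

```python
import json
import struct
import zlib

# Hypothetical record, invented purely for illustration.
record = {"user_id": 12345, "score": 98.5, "active": True}

# Text protocol: self-describing, the field names travel with every message.
as_json = json.dumps(record).encode("utf-8")

# Binary protocol: both sides agree on the layout in advance (a schema).
# Assumed layout: big-endian unsigned 32-bit int, 64-bit float, bool.
LAYOUT = ">Id?"
as_binary = struct.pack(LAYOUT, record["user_id"], record["score"], record["active"])

# Zipped protocol: compress a batch of text messages before sending them.
batch = json.dumps([record] * 100).encode("utf-8")
as_zipped = zlib.compress(batch)


def decode_binary(payload: bytes) -> dict:
    """Rebuild the record from the fixed binary layout."""
    user_id, score, active = struct.unpack(LAYOUT, payload)
    return {"user_id": user_id, "score": score, "active": active}


print("json bytes:", len(as_json), "binary bytes:", len(as_binary))
print("batch bytes:", len(batch), "zipped bytes:", len(as_zipped))
```

The binary form is smaller because the schema is agreed up front instead of being sent with each message, which is roughly the trade-off behind the "no schema compilation" requirement in the list.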
Focusing on the Big Data stack, I compare a number of RPC libraries below:
 
| Support for | Avro | Thrift | MessagePack | Protocol Buffers | BSON | Fast Infoset | Woodstox |
|---|---|---|---|---|---|---|---|
| Cross Platform | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| Multiple Languages | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| Fast Protocols | 10 | 10 | 3 | 3 | 3 | 10 | 3 |
| Flexible Server (configurable thread pool, non-blocking) | 7 | 10 | 7 | 0 | 3 | 3 | 0 |
| Simple IDL | 7 | 10 | 10 | 7 | 3 | 3 | 3 |
| Standard Server and Client | 10 | 10 | 10 | 3 | 3 | 3 | 3 |
| Fast and Compact Serialization | 5 | 7 | 7 | 6 | 6 | 6 | 7 |
| Multiple transports and protocols | 7 | 10 | 3 | 0 | 0 | 0 | 0 |
| Inherent support in Hadoop | 10 | 3 | 0 | 0 | 0 | 0 | 0 |
| Compatibility with other RPC Libraries | 5 | 5 | 0 | 0 | 0 | 0 | 0 |
| Data types, containers | 10 | 7 | 10 | 7 | 3 | 3 | 3 |
| No Schema compilation (dynamic typing) | 5 | 0 | 5 | 0 | 5 | 5 | 5 |
| Asynchronous calls/Callback | 0 | 5 | 5 | 2 | 0 | 0 | 0 |
| Score (out of 115) | 96 | 97 | 82 | 48 | 46 | 53 | 44 |

Scoring: Critical Requirement <= 10, Not so Critical Requirement <= 5.
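The totals in the table can be re-derived with a short script. The sketch below copies the scores from the table and sums each column. The `weighted_totals` helper and its weights are my own hypothetical addition, meant to show how the ranking could be adapted to a project's specific requirements. Summing the listed cells reproduces six of the seven totals; the MessagePack cells add up to 80 rather than the 82 shown, so either one cell or that total appears to be off by two.

```python
LIBRARIES = ["Avro", "Thrift", "MessagePack", "Protocol Buffers",
             "BSON", "Fast Infoset", "Woodstox"]

# Scores copied row by row from the table; values follow the LIBRARIES order.
SCORES = {
    "Cross Platform":                         [10, 10, 10, 10, 10, 10, 10],
    "Multiple Languages":                     [10, 10, 10, 10, 10, 10, 10],
    "Fast Protocols":                         [10, 10, 3, 3, 3, 10, 3],
    "Flexible Server":                        [7, 10, 7, 0, 3, 3, 0],
    "Simple IDL":                             [7, 10, 10, 7, 3, 3, 3],
    "Standard Server and Client":             [10, 10, 10, 3, 3, 3, 3],
    "Fast and Compact Serialization":         [5, 7, 7, 6, 6, 6, 7],
    "Multiple transports and protocols":      [7, 10, 3, 0, 0, 0, 0],
    "Inherent support in Hadoop":             [10, 3, 0, 0, 0, 0, 0],
    "Compatibility with other RPC Libraries": [5, 5, 0, 0, 0, 0, 0],
    "Data types, containers":                 [10, 7, 10, 7, 3, 3, 3],
    "No Schema compilation":                  [5, 0, 5, 0, 5, 5, 5],
    "Asynchronous calls/Callback":            [0, 5, 5, 2, 0, 0, 0],
}


def totals(scores):
    """Sum every column of the table, one total per library."""
    columns = zip(*scores.values())
    return {lib: sum(col) for lib, col in zip(LIBRARIES, columns)}


def weighted_totals(scores, weights):
    """Hypothetical variant: multiply each row by a project-specific weight.

    Requirements missing from `weights` keep a weight of 1.0.
    """
    result = dict.fromkeys(LIBRARIES, 0.0)
    for requirement, row in scores.items():
        weight = weights.get(requirement, 1.0)
        for lib, value in zip(LIBRARIES, row):
            result[lib] += weight * value
    return result


ranking = sorted(totals(SCORES).items(), key=lambda item: item[1], reverse=True)
for lib, total in ranking:
    print(f"{lib:<17}{total:>4}")

# Example: a Hadoop-centric project that counts Hadoop support three times.
hadoop_first = weighted_totals(SCORES, {"Inherent support in Hadoop": 3.0})
```

With the assumed weight of 3 on Hadoop support, Avro moves ahead of Thrift, which shows how sensitive the ranking is to the requirements of a particular project.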

Thrift, Avro and MessagePack look really impressive to me. Thrift and Avro support most of the requirements listed above and are well battle-tested.
Another way to classify the libraries is by the kind of conversation between server and client:
  • for JSON-based conversations, MessagePack is the best of them all,
  • for binary data conversations, BSON should be considered,
  • for XML-based conversations, Fast Infoset and Woodstox should be considered.
