File types

Parquet

  • Efficient column selection (only the columns a query needs are read)
  • Columnar format
  • Binary format
  • Encoded & compressed
  • Supports schema evolution (built into the format)
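
A toy Python model of the columnar ideas listed above, assuming an in-memory table. Rows are pivoted into one list per column, so selecting a column touches only that list, and a low-cardinality column shrinks under dictionary plus run-length encoding. Parquet's real on-disk encodings differ in detail; `to_columns`, `select`, `dictionary_encode` and `run_length_encode` are hypothetical helpers for illustration only.

```python
from __future__ import annotations

from typing import Any

# A toy table as a list of row records (row-oriented layout).
rows = [
    {"id": 1, "country": "IL", "amount": 10.0},
    {"id": 2, "country": "US", "amount": 25.5},
    {"id": 3, "country": "IL", "amount": 7.25},
    {"id": 4, "country": "IL", "amount": 3.0},
]


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row records into one contiguous list per column (columnar layout)."""
    columns: dict[str, list[Any]] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def select(columns: dict[str, list[Any]], names: list[str]) -> dict[str, list[Any]]:
    """Column pruning: only the requested columns are touched."""
    return {name: columns[name] for name in names}


def dictionary_encode(values: list[Any]) -> tuple[list[Any], list[int]]:
    """Replace repeated values with small integer codes into a dictionary."""
    dictionary: list[Any] = []
    index: dict[Any, int] = {}
    codes: list[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes: list[int]) -> list[tuple[int, int]]:
    """Collapse runs of equal codes into (code, run_length) pairs."""
    runs: list[tuple[int, int]] = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1] = (code, runs[-1][1] + 1)
        else:
            runs.append((code, 1))
    return runs
```

For example, `select(to_columns(rows), ["amount"])` returns only `{"amount": [10.0, 25.5, 7.25, 3.0]}`; `dictionary_encode` on the `country` column gives the dictionary `["IL", "US"]` with codes `[0, 1, 0, 0]`, which `run_length_encode` turns into `[(0, 1), (1, 1), (0, 2)]`.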

Limitations:

  • Predicate pushdown filters don't work on String / Binary columns (source)
  • Write speed trade-off: encoding and compression make writes slower

Workaround(s):

  • For immutability (existing files cannot be modified in place):
    • Write using partitioning
    • Combine with a database (e.g. Cassandra) and, after a while, split the data out to Parquet files
    • Write in append mode; each appended file carries its own embedded schema
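
A sketch of the partitioning + append workaround, assuming Hive-style `key=value` directories (the layout Spark's `partitionBy` produces). `AppendOnlyDataset` and `partition_path` are hypothetical names that keep rows in memory; the `.parquet` suffix is only a file name here, not a real Parquet writer. Existing files are never modified: each append adds new files, and readers prune by directory.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import count
from typing import Any


def partition_path(record: dict[str, Any], partition_cols: list[str]) -> str:
    """Build a Hive-style directory path such as 'year=2018/month=11'."""
    return "/".join(f"{col}={record[col]}" for col in partition_cols)


class AppendOnlyDataset:
    """Hypothetical in-memory model of an immutable, partitioned dataset.

    Files are never rewritten: every append() creates new files.
    """

    def __init__(self, partition_cols: list[str]) -> None:
        self.partition_cols = list(partition_cols)
        self.files: dict[str, list[dict[str, Any]]] = {}
        self._file_ids = count()

    def append(self, records: list[dict[str, Any]]) -> list[str]:
        # Group the incoming batch by its partition directory.
        groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
        for record in records:
            groups[partition_path(record, self.partition_cols)].append(record)
        new_paths = []
        for directory, group in groups.items():
            path = f"{directory}/part-{next(self._file_ids):05d}.parquet"
            # Partition values live in the path, so they are dropped from the rows.
            self.files[path] = [
                {k: v for k, v in r.items() if k not in self.partition_cols}
                for r in group
            ]
            new_paths.append(path)
        return new_paths

    def read_partition(self, **filters: Any) -> list[dict[str, Any]]:
        """Partition pruning: only files whose path matches every filter are read."""
        rows: list[dict[str, Any]] = []
        for path, file_rows in self.files.items():
            segments = path.split("/")[:-1]  # drop the file name
            values = dict(segment.split("=", 1) for segment in segments)
            if all(values.get(k) == str(v) for k, v in filters.items()):
                rows.extend(file_rows)
        return rows
```

Appending `{"year": 2018, "month": 11, "id": 1}` and later `{"year": 2018, "month": 11, "id": 3}` yields two separate files under `year=2018/month=11/`; `read_partition(year=2018, month=11)` reads both and skips every other directory.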

vs ORC:

  • Indexed
  • Doesn't handle nested data

ORC

  • Nested Data
  • Columnar format
  • Predicate pushdown (min/max statistics + Bloom filters)
  • ACID support / cannot add
  • Suggested for streaming (source)
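
A toy model of ORC-style predicate pushdown, assuming each stripe stores min/max statistics and a Bloom filter for one integer column (real ORC keeps statistics at several levels, such as file, stripe, and row group, and Bloom filters are optional per column). `Stripe`, `BloomFilter` and the `stripes_for_*` helpers are hypothetical names. The reader checks the cheap statistics first and skips stripes that cannot match.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # the bit array, stored as a Python int

    def _positions(self, value: object):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: object) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: object) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class Stripe:
    """A block of column values plus the statistics a reader can check first."""

    values: list[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)
    bloom: BloomFilter = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self.min_value = min(self.values)
        self.max_value = max(self.values)
        self.bloom = BloomFilter()
        for v in self.values:
            self.bloom.add(v)


def stripes_for_range(stripes: list[Stripe], low: int, high: int) -> list[Stripe]:
    """Min/max pushdown: keep only stripes whose [min, max] overlaps [low, high]."""
    return [s for s in stripes if s.max_value >= low and s.min_value <= high]


def stripes_for_equals(stripes: list[Stripe], target: int) -> list[Stripe]:
    """Equality pushdown: cheap min/max check first, then the Bloom filter."""
    return [
        s
        for s in stripes
        if s.min_value <= target <= s.max_value and s.bloom.might_contain(target)
    ]
```

Statistics can only rule stripes out: with stripes `[1, 5, 9]`, `[10, 20, 30]` and `[2, 4, 40]`, a range filter 11 to 19 skips the first stripe but still reads the third, because its min/max range 2 to 40 overlaps even though no value matches. For an equality lookup such as `= 20`, the Bloom filter will usually drop that third stripe as well.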

Avro

kb/bigdata/file_types.txt · Last modified: 2018/11/22 12:24 by yehuda