It covers the full life cycle of time series data management, including collection, storage, query, analysis, and visualization.
Tsfile is a columnar storage file format optimized for time series data. It reduces the hardware resources required for data storage and improves query performance.
The file format adopts the idea of column storage: it divides a time series data set into multiple subsets and stores each subset series by series.
A Tsfile includes a data area and an index area. The data area is composed of one or more ChunkGroups. The index area records the metadata and the related query indexes of the Tsfile. In each ChunkGroup, the data is divided into multiple Chunks, one per time series. Each Chunk divides its time series data into multiple Pages, and each Page stores the timestamps and the values as separate columns.
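The hierarchy described above can be pictured with a small in-memory model. This is only a minimal sketch: the class names mirror the terms used in the text, but the fields, the `build_chunk` helper, and the fixed `page_size` are illustrative assumptions, not the actual on-disk layout of Tsfile.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    # Time and value are kept as two separate columns.
    times: List[int] = field(default_factory=list)
    values: List[float] = field(default_factory=list)


@dataclass
class Chunk:
    # One chunk holds the pages of a single time series.
    series: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class ChunkGroup:
    chunks: List[Chunk] = field(default_factory=list)


def build_chunk(series: str, points: List[Tuple[int, float]], page_size: int) -> Chunk:
    """Hypothetical helper: split (time, value) points into fixed-size pages."""
    chunk = Chunk(series)
    for i in range(0, len(points), page_size):
        batch = points[i:i + page_size]
        chunk.pages.append(Page([t for t, _ in batch], [v for _, v in batch]))
    return chunk


points = [(1, 20.5), (2, 20.7), (3, 21.0), (4, 21.2), (5, 21.1)]
group = ChunkGroup([build_chunk("temperature", points, page_size=2)])
page_counts = [len(c.pages) for c in group.chunks]
```

With five points and a page size of two, the single chunk ends up with three pages, the last one holding only the final point.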
Tsfile stores the timestamps and values of each time series in columns, and encodes and compresses them column by column. This makes effective use of data locality, improves the compression ratio during storage, and can save as much as 90% of storage space.
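The effect of column-wise encoding can be illustrated with a small experiment. The sketch below uses delta encoding for timestamps and `zlib` for compression purely as stand-ins; it does not claim that these are the encodings or codecs Tsfile uses, and the sample data is made up. It assumes non-empty input.

```python
import struct
import zlib


def delta_encode(xs):
    """Keep the first value, then store differences between neighbours."""
    return [xs[0]] + [b - a for a, b in zip(xs, xs[1:])]


def delta_decode(ds):
    """Rebuild the original values by accumulating the differences."""
    out = [ds[0]]
    for d in ds[1:]:
        out.append(out[-1] + d)
    return out


def pack(ints):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(ints)}q", *ints)


# Made-up sample: one point per second, slowly varying integer readings.
times = [1_700_000_000_000 + i * 1000 for i in range(1000)]
values = [20 + (i % 5) for i in range(1000)]

# Row layout: (time, value) pairs interleaved, no per-column encoding.
row_bytes = pack([x for pair in zip(times, values) for x in pair])
# Column layout: delta-encoded time column followed by the value column.
col_bytes = pack(delta_encode(times)) + pack(values)

row_size = len(zlib.compress(row_bytes))
col_size = len(zlib.compress(col_bytes))
```

Both layouts hold the same 16,000 raw bytes, but in the column layout the regular timestamps collapse into a run of identical deltas, so the compressor has far less to encode.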
Tsfile stores time series in columns, which reduces the amount of data that must be read during a query and the number of disk I/O operations, and so improves query speed. It can complete queries over terabyte-scale data in milliseconds.
Tsfile organizes time series data into a hierarchy of "ChunkGroup", "Chunk" and "Page", pre-aggregates the data at the "Chunk" and "Page" levels, and builds aggregation index information over the data points. This lets Tsfile support aggregate queries natively and execute them efficiently.
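To show why pre-aggregation helps, the following minimal sketch answers a range-sum query from per-page statistics and only reads raw points for pages that the range covers partially. The `PageStats` fields and the `range_sum` function are hypothetical and chosen for illustration; the statistics Tsfile actually records may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    start: int     # first timestamp in the page
    end: int       # last timestamp in the page
    count: int     # number of points
    total: float   # pre-computed sum of the values


@dataclass
class Page:
    times: List[int]
    values: List[float]
    stats: PageStats


def make_page(times, values):
    """Build a page and compute its statistics once, at write time."""
    return Page(times, values, PageStats(times[0], times[-1], len(values), sum(values)))


def range_sum(pages, t_min, t_max):
    """Sum values with t_min <= time <= t_max; returns (sum, pages_scanned)."""
    total, scanned = 0.0, 0
    for p in pages:
        s = p.stats
        if s.end < t_min or s.start > t_max:
            continue                  # outside the range: skip the page
        if t_min <= s.start and s.end <= t_max:
            total += s.total          # fully covered: use the pre-aggregate
        else:
            scanned += 1              # partially covered: read the points
            total += sum(v for t, v in zip(p.times, p.values) if t_min <= t <= t_max)
    return total, scanned


pages = [make_page([1, 2, 3], [1.0, 2.0, 3.0]),
         make_page([4, 5, 6], [4.0, 5.0, 6.0]),
         make_page([7, 8, 9], [7.0, 8.0, 9.0])]
result = range_sum(pages, 2, 9)
```

For the range 2 to 9, only the first page has to be scanned; the other two are answered from their stored sums, giving a total of 44.0.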
Tsfile stores data blocks, index blocks and metadata in the same file, so the file is self-describing: it can be parsed using only the metadata it contains. This makes such files usable on Hive, Spark and other data analysis platforms.
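The idea of a self-describing file can be sketched with a toy layout: data blocks first, then an index, then a fixed-size footer that points at the index. Everything here (the JSON index, the 8-byte footer, the function names) is an assumption made for illustration and is not the real Tsfile layout.

```python
import io
import json
import struct


def write_file(series):
    """series: dict of name -> list of float values. Returns the file bytes."""
    buf = io.BytesIO()
    index = {}
    for name, values in series.items():
        # Record where each data block starts and how many values it holds.
        index[name] = {"offset": buf.tell(), "count": len(values)}
        buf.write(struct.pack(f"<{len(values)}d", *values))   # data block
    index_offset = buf.tell()
    buf.write(json.dumps(index).encode("utf-8"))               # index block
    buf.write(struct.pack("<q", index_offset))                 # footer
    return buf.getvalue()


def read_series(data, name):
    """Locate and decode one series using only information inside the file."""
    (index_offset,) = struct.unpack("<q", data[-8:])
    index = json.loads(data[index_offset:-8].decode("utf-8"))
    entry = index[name]
    return list(struct.unpack_from(f"<{entry['count']}d", data, entry["offset"]))


blob = write_file({"temperature": [20.5, 21.0], "humidity": [0.4, 0.5, 0.6]})
humidity = read_series(blob, "humidity")
```

A reader needs no external catalog: it reads the footer, loads the index, and jumps straight to the block of the requested series, which is the property that lets external engines consume such files directly.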