Fastparquet is a python implementation of the parquet format, used by the projects Dask, Pandas and intake-parquet. We offer a high degree of support for the features of the parquet format, and very competitive performance, in a small install size and codebase. Details of this project, how to use it and comparisons to other work can be found in the documentation.

Requirements (all development is against recent versions in the default anaconda channels):

- Cython >= 0.29.23 (if building from pyx files)

Installation

Install using conda, to get the latest compiled version:

```
conda install -c conda-forge fastparquet
```

Or install from PyPI:

```
pip install fastparquet
```

This may install an appropriate wheel, or compile from source. You may wish to install numpy first, to help pip's resolver.

You can also install the latest version from github: `pip install git+`. You will need a suitable C compiler toolchain on your system, in which case you should also have cython to be able to rebuild the C files.

Reading

```python
from fastparquet import ParquetFile

pf = ParquetFile('myfile.parq')
df = pf.to_pandas()
```

You may specify which columns to load and which of those to keep as categoricals. The file-path can be a single file, a metadata file pointing to other data files, or a directory (tree) containing data files.

Writing

```python
from fastparquet import write

write('outfile.parq', df)
write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
      compression='GZIP', file_scheme='hive')
```

The latter, "hive" file scheme is what is typically output by hive/spark. The default is to produce a single output file with a single row-group (i.e., logical segment) and no compression. At the moment, only simple data-types and plain encoding are supported.