docci

docci is a package that provides document management utilities to simplify working with files in Python applications (mostly web applications).

Features

  • File abstraction via the FileAttachment class, which holds a file name and its content as bytes and provides the following features:

    • base64 string creation for transferring files through JSON APIs

    • Content-Disposition header generation so web apps can pass the file name to the client

    • file name handling, such as extension extraction and MIME type detection

    • saving the file to disk, which is useful when you receive binary data from the web and need to inspect it as a file

  • Specific file utilities based on FileAttachment manipulation:

    • directory exploring: list the files in a directory as a list of FileAttachment objects

    • zip file exploring: list the contents of a zip file as a list of FileAttachment objects

    • zip file creation: create a zip archive from a list of FileAttachment objects

    • openpyxl-based xlsx utils like converting xlsx to FileAttachment, xlsx creation from dicts

Usage

First, create a FileAttachment:

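# Creation from a PDF rendered by pdfkit (from_file expects an HTML input)
import pdfkit
from docci.file import FileAttachment

# output_path=False makes pdfkit return the PDF as bytes instead of writing a file
pdf_data: bytes = pdfkit.from_file("sample.html", output_path=False)
file = FileAttachment("sample.pdf", pdf_data)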

# Creation from xlsx
from openpyxl import load_workbook
from docci.file import FileAttachment
from docci.xlsx import xlsx_to_bytes

xlsx = load_workbook("sample.xlsx")
xlsx_data = xlsx_to_bytes(xlsx)
file = FileAttachment("sample.xlsx", xlsx_data)

# Creation from file on disk
from docci.file import FileAttachment

file = FileAttachment.load("path/to/file")

# Creation from base64 str
from docci.file import FileAttachment

file = FileAttachment.load_from_base64("base64-string", "filename")

Now you can use the FileAttachment features:

# To get base64 file representation
file.content_base64

# To generate Content-Disposition header with file name
file.content_disposition

# To get file extension
file.extension

# To get file mimetype
file.mimetype

# To save file to disk
file.save("path/to/file")
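To make the header and MIME type features above more concrete, here is a minimal stdlib-only sketch of roughly equivalent logic. It is not docci's implementation: the function names and the application/octet-stream fallback are assumptions made for illustration.

```python
import mimetypes
from urllib.parse import quote


def content_disposition_header(file_name: str) -> dict:
    """Build an attachment header with a percent-encoded file name (hypothetical helper)."""
    # quote() keeps letters, digits and "_.-~" as they are and percent-encodes
    # the rest, so a space becomes %20.
    return {"Content-Disposition": f"attachment; filename={quote(file_name)}"}


def guess_mimetype(file_name: str) -> str:
    """Guess a MIME type from the extension (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(file_name)
    # Assumption: fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


print(content_disposition_header("98 - February 2019.zip"))
print(guess_mimetype("sample.pdf"))
```

In a web handler, the returned dict can usually be merged into the response headers alongside a Content-Type built from the guessed MIME type.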

Specific file utilities are just functions:

# To get directory files (returns the directory name and its files)
from docci.file import list_dir_files

dir_name, files = list_dir_files("path/to/dir")

# To list the files inside a zip archive
from docci.zip import list_zip_files

files = list_zip_files("path/to/zip")

# To create a zip archive
from docci.zip import zip_files

zip_file = zip_files([file], "sample.zip")

# To convert xlsx to FileAttachment
from openpyxl import load_workbook
from docci.xlsx import xlsx_to_file

xlsx_file = xlsx_to_file(load_workbook("path/to/xlsx"), "filename.xlsx")

# To create xlsx from dicts
from docci.xlsx import dicts_to_xlsx

xlsx = dicts_to_xlsx([
  {"col1": 1, "col2": 2},
  {"col1": 3, "col2": 4}
])

More features can be found in the API reference below.

API reference

docci.file

Utils for file manipulations like extracting file name from path

class docci.file.FileAttachment(name: str, content: bytes)

Class for file abstraction

Parameters
  • name – file name. Restricted characters (such as *, / and :) and any directory path are removed from the file name (for example, /opt/data/test.txt becomes test.txt).

  • content – binary file content

property content_base64

Convert content to base64 binary string

property content_disposition

Build a Content-Disposition header with the URL-encoded file name

>>> FileAttachment("sample.py", b"").content_disposition
{'Content-Disposition': 'attachment; filename=sample.py'}
>>> FileAttachment("98 - February 2019.zip", b"").content_disposition
{'Content-Disposition': 'attachment; filename=98%20-%20February%202019.zip'}

property content_json

Return content as dict with base64 content
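As an illustration of how binary content can travel through a JSON API as base64, here is a small round-trip sketch using only the standard library. The payload keys "name" and "content" are assumptions chosen for the example; the actual shape returned by content_json may differ.

```python
import base64
import json


def to_json_payload(name: str, content: bytes) -> str:
    """Serialize a file to a JSON string (hypothetical payload shape)."""
    # JSON cannot carry raw bytes, so encode them as an ASCII base64 string.
    encoded = base64.b64encode(content).decode("ascii")
    return json.dumps({"name": name, "content": encoded})


def from_json_payload(payload: str) -> tuple:
    """Restore the file name and bytes from the JSON string."""
    data = json.loads(payload)
    # b64decode accepts both str and bytes input.
    return data["name"], base64.b64decode(data["content"])


payload = to_json_payload("sample.txt", b"hello")
print(payload)
print(from_json_payload(payload))
```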

property content_stream

Return file attachment content as bytes stream

property extension

>>> FileAttachment("sample.py", b"").extension
'py'

classmethod load(path: str) → docci.file.FileAttachment

Load file from disk

classmethod load_from_base64(base64_str: Union[str, bytes], name: str) → docci.file.FileAttachment

Load file from base64 string

property mimetype

Guess mimetype by extension.

property name_without_extension

>>> FileAttachment("sample.py", b"").name_without_extension
'sample'

save(path: Optional[str] = None) → None

Save file to disk

docci.file.extract_file_name(path: str) → str

Extract file name from path; works for directories too

>>> extract_file_name("tests/test_api.py")
'test_api.py'
>>> extract_file_name("tests/test")
'test'

docci.file.list_dir_files(directory: str) → Tuple[str, Iterable[docci.file.FileAttachment]]

List the files in a directory. Returns a Directory: a tuple of the directory name and the list of files in it.
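The (directory name, files) shape can be pictured with a stdlib-only sketch. It uses plain (name, bytes) tuples in place of FileAttachment, and it assumes a non-recursive listing sorted by name; neither detail is confirmed by the reference above.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def list_dir_sketch(directory: str) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Return the directory name and its files as (name, content) pairs."""
    root = Path(directory)
    # Assumption: only direct children are listed and subdirectories are skipped.
    files = [(p.name, p.read_bytes()) for p in sorted(root.iterdir()) if p.is_file()]
    return root.name, files


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    reports = Path(tmp) / "reports"
    reports.mkdir()
    (reports / "a.txt").write_bytes(b"1")
    print(list_dir_sketch(str(reports)))
```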

docci.file.normalize_name(raw_name: str, with_file_name_extract: bool = True) → str

Extract file name, remove restricted chars

>>> normalize_name('op/"oppa".txt')
'oppa.txt'
>>> normalize_name('op/"oppa".txt', with_file_name_extract=False)
'opoppa.txt'
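The two doctests above can be reproduced with a short stdlib sketch. The set of restricted characters used here is an assumption (the characters commonly rejected in file names); docci's actual list may differ.

```python
import posixpath
import re

# Assumed set of restricted characters; docci's actual list may differ.
RESTRICTED = re.compile(r'[\\/:*?"<>|]')


def normalize_name_sketch(raw_name: str, with_file_name_extract: bool = True) -> str:
    """Optionally keep only the last path component, then drop restricted characters."""
    name = posixpath.basename(raw_name) if with_file_name_extract else raw_name
    return RESTRICTED.sub("", name)


print(normalize_name_sketch('op/"oppa".txt'))
print(normalize_name_sketch('op/"oppa".txt', with_file_name_extract=False))
```

Extracting the file name first matters: without it, the path separator is simply deleted and the directory name is glued onto the file name.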

docci.xlsx

Utils for working with openpyxl.Workbook

docci.xlsx.dicts_to_xlsx(dicts: Sequence[Dict], headers: Sequence[str] = None) → openpyxl.workbook.workbook.Workbook

Create openpyxl.Workbook with rows of {dicts} values.

Parameters
  • dicts – List of dicts to insert

  • headers – List of headers. If None, the dict keys are used.

Returns

openpyxl.Workbook
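To show how headers and dict values could map onto sheet rows, here is a pure-Python sketch that stops short of creating a workbook. The function name, the header row as the first row, and the first-seen key order are all assumptions; the reference above only states that dict keys are used when headers is None.

```python
from typing import Any, Dict, List, Optional, Sequence


def dicts_to_rows(
    dicts: Sequence[Dict[str, Any]],
    headers: Optional[Sequence[str]] = None,
) -> List[List[Any]]:
    """Turn dicts into a header row followed by one row of values per dict."""
    if headers is None:
        # Assumption: collect keys in first-seen order across all dicts.
        headers = list(dict.fromkeys(key for d in dicts for key in d))
    rows: List[List[Any]] = [list(headers)]
    # Missing keys become None, which a spreadsheet would show as an empty cell.
    rows.extend([d.get(h) for h in headers] for d in dicts)
    return rows


print(dicts_to_rows([{"col1": 1, "col2": 2}, {"col1": 3, "col2": 4}]))
```

Each resulting row could then be appended to an openpyxl worksheet, for example with its append method.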

docci.xlsx.xlsx_from_bytes(bytes_: bytes) → openpyxl.workbook.workbook.Workbook

Create xlsx from bytes.

docci.xlsx.xlsx_from_file(file: docci.file.FileAttachment) → openpyxl.workbook.workbook.Workbook

Create xlsx from FileAttachment

docci.xlsx.xlsx_to_bytes(xlsx: openpyxl.workbook.workbook.Workbook) → bytes

Convert openpyxl.Workbook to bytes

docci.xlsx.xlsx_to_file(xlsx: openpyxl.workbook.workbook.Workbook, name: str) → docci.file.FileAttachment

Convert openpyxl.Workbook to FileAttachment

docci.zip

Utils for working with zip archives

docci.zip.list_zip_files(raw_zip_file: Union[str, bytes, _io.BytesIO, zipfile.ZipFile, docci.file.FileAttachment]) → Sequence[docci.file.FileAttachment]

List zip archive files

docci.zip.raw_to_zip(raw_zip_file: Union[str, bytes, _io.BytesIO, zipfile.ZipFile, docci.file.FileAttachment]) → zipfile.ZipFile

Convert path, bytes, stream, FileAttachment to ZipFile.
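Accepting several input types usually comes down to a small type dispatch. The sketch below shows one way to do it with the standard zipfile module; the function name is hypothetical and the FileAttachment case is left out to keep the example self-contained.

```python
import io
import zipfile
from typing import Union

RawZip = Union[str, bytes, io.BytesIO, zipfile.ZipFile]


def open_zip(raw: RawZip) -> zipfile.ZipFile:
    """Return a ZipFile for a path, raw bytes, a stream or an existing ZipFile."""
    if isinstance(raw, zipfile.ZipFile):
        return raw  # already a ZipFile, pass it through unchanged
    if isinstance(raw, bytes):
        return zipfile.ZipFile(io.BytesIO(raw))  # wrap raw bytes in a stream
    # ZipFile itself accepts either a path string or a file-like object.
    return zipfile.ZipFile(raw)


# Build a tiny archive in memory to try the dispatch.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.txt", b"x")
print(open_zip(buffer.getvalue()).namelist())
```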

docci.zip.zip_dirs(dirs: Iterable[Tuple[str, Iterable[FileAttachment]]], zip_name: str) → docci.file.FileAttachment

Zip folders into single zip archive with {zip_name}

docci.zip.zip_files(files: Iterable[docci.file.FileAttachment], zip_name: str) → docci.file.FileAttachment

Zip files to archive with {zip_name}
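For readers curious about what zipping in memory involves, here is a stdlib-only round-trip sketch. It works on (name, bytes) tuples rather than FileAttachment objects, and the helper names are hypothetical.

```python
import io
import zipfile
from typing import Iterable, List, Tuple

NamedBytes = Tuple[str, bytes]


def zip_named_bytes(files: Iterable[NamedBytes]) -> bytes:
    """Pack (name, content) pairs into a zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files:
            archive.writestr(name, content)
    # The archive is finalized when the with-block exits; the buffer stays open.
    return buffer.getvalue()


def unzip_named_bytes(zip_bytes: bytes) -> List[NamedBytes]:
    """Read every file entry of an in-memory archive back into pairs."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, archive.read(info))
            for info in archive.infolist()
            if not info.is_dir()
        ]


packed = zip_named_bytes([("a.txt", b"alpha"), ("dir/b.bin", b"\x00\x01")])
print(unzip_named_bytes(packed))
```

Keeping the archive in memory avoids temporary files, which is convenient when the zip is sent straight back in an HTTP response.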

Development & contribution

Publishing to PyPI

Bump version:

poetry version major/minor/patch

Build and publish package:

poetry publish --build

The published package is available at https://pypi.org/project/docci/