hup.io.csv module

File I/O for text files containing delimiter-separated values.

The delimiter-separated values format is a family of file formats used for the storage of tabular data. Its most common variant, the comma-separated values format, was in use for many years before its standardization was attempted in RFC 4180, so subtle differences often exist in the data produced and consumed by different applications. This circumstance has largely been addressed by PEP 305 and the standard library module csv. The current module extends the capabilities of the standard library with I/O handling of file references, support for non-standard CSV headers as used in CSV exports of the R programming language, and automated detection of CSV parameters and row names.
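To illustrate what parameter detection can look like, here is a minimal sketch built only on the standard library class csv.Sniffer. The function name detect_delimiter and the candidate list are assumptions made for this example; whether hup.io.csv detects delimiters this way is not stated here.

```python
import csv


def detect_delimiter(text, candidates=",;\t|"):
    """Guess the delimiter of CSV text, ignoring blank and '#' comment lines.

    Hypothetical helper for illustration; returns None if nothing is detected.
    """
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    sample = "\n".join(lines[:20])
    try:
        # Restrict the sniffer to a small set of plausible delimiters
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        return None


text = "# exported data\nname;x;y\na;1;2\nb;3;4\n"
print(detect_delimiter(text))
```

Comment lines are removed before sniffing because free text in them could otherwise skew the character statistics the sniffer relies on.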

class File(file: Union[IO[Any], os.PathLike, hup.io.abc.Connector], header: Optional[Iterable[str]] = None, comment: Optional[str] = None, dialect: Union[str, csv.Dialect, None] = None, delimiter: Optional[str] = None, namecol: Optional[str] = None, hformat: Optional[int] = None)

Bases: hup.base.attrib.Group

File Class for text files containing delimiter-separated values.

Parameters:
  • file – File reference to a file object. The reference can either be given as a string or path-like object that points to a valid entry in the file system, an instance of the class Connector, or an opened file object in reading or writing mode.
  • header – Optional list (or arbitrary iterable) of strings, that specify the column names within the CSV file. For an existing file, the header by default is extracted from the first content line (not blank and not starting with #). For a new file the header is required and an error is raised if the header is not given.
  • comment – Optional string, which precedes the header and the rows of the CSV file, e.g. to include metadata within the file. For an existing file, the string by default is extracted from the initial comment lines (starting with #). For a new file the comment by default is empty.
  • dialect – Optional parameter, that indicates the used CSV dialect. The parameter can be given as a dialect name from the list returned by the function csv.list_dialects(), or an instance of the class csv.Dialect.
  • delimiter – Single character, which is used to separate the column values within the CSV file. For an existing file, the delimiter by default is detected from its appearance within the file. For a new file the default value is ,.
  • namecol – Optional name of a column that contains row names. If a valid column is given, the read-only attribute rownames returns this column as a list. By default the column is inferred from the header format in use. For an RFC 4180 compliant header, no row names are used by default. For a header as used in exports of the R programming language, the first column is used by default to store row names.
  • hformat – CSV header format. The following formats are supported:
    0: RFC 4180:
    The column header represents the structure of the rows.
    1: R programming language:
    The column header does not include the first column of the rows. This follows the convention that, in the R programming language, the CSV export adds an extra column with row names as the first column, which is omitted from the CSV header.
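The difference between the two header formats can be sketched by comparing the number of fields in the header with the number of fields in the first data row. The constant names RFC4180 and RLANG and the function guess_hformat are hypothetical; they only mirror the codes 0 and 1 described above and are not taken from the module.

```python
import csv
import io

# Hypothetical names mirroring the documented hformat codes
RFC4180 = 0
RLANG = 1


def guess_hformat(text, delimiter=","):
    """Guess the header format from the header line and the first data row."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(rows, None)
    first = next(rows, None)
    if header is None or first is None:
        return RFC4180  # nothing to compare, fall back to the standard format
    # An R-style export has one more value per row than column names
    return RLANG if len(first) == len(header) + 1 else RFC4180


print(guess_hformat("x,y\n1,2\n"))     # header matches the rows
print(guess_hformat("x,y\nr1,1,2\n"))  # rows carry an extra leading column
```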
close() → None

Close all opened file handlers of CSV File.

comment

String containing the initial ‘#’ lines of the CSV file or an empty string, if no initial comment lines could be detected.
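As a rough sketch of how initial comment lines can be separated from the CSV content, the following standard-library example collects the leading '#' lines. The helper split_comment is hypothetical, and stripping the '#' marker from the returned comment is an assumption of this example, not documented behavior of the attribute.

```python
def split_comment(text):
    """Split text into (comment, body), where comment holds the leading '#' lines."""
    lines = text.splitlines()
    count = 0
    # Count the uninterrupted run of comment lines at the top of the file
    while count < len(lines) and lines[count].lstrip().startswith("#"):
        count += 1
    comment = "\n".join(line.lstrip()[1:].strip() for line in lines[:count])
    body = "\n".join(lines[count:])
    return comment, body


comment, body = split_comment("# first\n# second\nx,y\n1,2\n")
print(repr(comment))
print(repr(body))
```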

delimiter

Delimiter character of the CSV file or None, if for an existing file the delimiter could not be detected.

dialect = None
fields = None
header

List of strings containing the column names from the first non-comment, non-empty line of the CSV file.

hformat

CSV header format. The following formats are supported:
0: RFC 4180:
The column header represents the structure of the rows.
1: R programming language:
The column header does not include the first column of the rows. This follows the convention that, in the R programming language, the CSV export adds an extra column with row names as the first column, which is omitted from the CSV header.

name = None
namecol

Read-only name of the column that contains the row names. By default the column is inferred from the header format in use. For an RFC 4180 compliant header, no row names are used by default. For a header as used in exports of the R programming language, the name of the first column is returned by default.

open(mode: str = 'r', columns: Optional[Tuple[str, ...]] = None) → hup.io.csv.HandlerBase

Open CSV file in reading or writing mode.

Parameters:
  • mode – String whose characters specify the mode in which the file is to be opened. The default mode is reading mode, which is indicated by the character r. The character w indicates writing mode. Reading and writing mode are mutually exclusive and cannot be used together.
  • columns – Has no effect in writing mode. In reading mode it specifies, by their respective column names, the columns that are returned from the CSV file. By default all columns are returned.
Returns:

In reading mode (if mode contains the character r) an instance of the class Reader is returned, and in writing mode (if mode contains the character w) an instance of the class Writer is returned.
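The mode rule from the parameter description (r selects reading, w selects writing, and the two are mutually exclusive) can be sketched as a small dispatch function. The name handler_kind and the choice of ValueError are assumptions for illustration; which exception the module itself raises for an invalid mode is not documented here.

```python
def handler_kind(mode="r"):
    """Return the name of the handler class that a mode string selects."""
    reading = "r" in mode
    writing = "w" in mode
    if reading and writing:
        raise ValueError("reading and writing mode are mutually exclusive")
    if not (reading or writing):
        raise ValueError(f"mode {mode!r} contains neither 'r' nor 'w'")
    return "Writer" if writing else "Reader"


print(handler_kind())     # default mode is reading
print(handler_kind("w"))  # writing mode
```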

read(columns: Optional[Tuple[str, ...]] = None) → List[tuple]

Read all rows from current CSV file.

Parameters:columns – Specifies, by their respective column names, the columns that are returned from the CSV file. By default all columns are returned.
Returns:List of tuples, which contain the values of the specified columns.
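Selecting columns by name and returning rows as tuples can be sketched with the standard library alone. The function read_columns below is a hypothetical stand-in that operates on a string; it returns all values as strings, whereas the module may convert values according to its field types.

```python
import csv
import io


def read_columns(text, columns=None, delimiter=","):
    """Return rows as tuples, restricted to the named columns in the given order."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    names = tuple(columns) if columns else tuple(header)
    # Translate column names into positions within each row
    positions = [header.index(name) for name in names]
    return [tuple(row[pos] for pos in positions) for row in reader]


text = "a,b,c\n1,2,3\n4,5,6\n"
print(read_columns(text))
print(read_columns(text, columns=("c", "a")))
```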
rownames = None
write(rows: List[tuple]) → None

Write rows to current CSV file.

Parameters:rows – List of tuples (or arbitrary iterables), which respectively contain the values of a single row.
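A file layout with an initial comment, a header, and rows can be sketched with csv.writer as follows. The helper dump_csv is hypothetical and writes to a string; prefixing each comment line with "# " is an assumption of this example.

```python
import csv
import io


def dump_csv(header, rows, comment="", delimiter=","):
    """Serialize an optional comment, a header, and rows into CSV text."""
    buffer = io.StringIO()
    for line in comment.splitlines():
        buffer.write(f"# {line}\n")
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


print(dump_csv(["x", "y"], [(1, 2), (3, 4)], comment="demo"))
```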
class HandlerBase(file: Union[IO[Any], os.PathLike, hup.io.abc.Connector], mode: str = 'r')

Bases: abc.ABC

CSV file I/O Handler Base Class.

Parameters:
  • file – File reference to a file object. The reference can either be given as a string or path-like object that points to a valid entry in the file system, an instance of the class Connector, or an opened file object in reading mode.
  • mode – String whose characters specify the mode in which the file is to be opened. The default mode is reading mode, which is indicated by the character r. The character w indicates writing mode. Reading and writing mode are mutually exclusive and cannot be used together.
close() → None

Close the CSV file handler.

read_row() → tuple

Read a single row from the referenced file as a tuple.

read_rows() → List[tuple]

Read multiple rows from the referenced file as a list of tuples.

write_row(row: Iterable[T_co]) → None

Write a single row to the referenced file from a tuple.

write_rows(rows: Iterable[tuple]) → None

Write multiple rows to the referenced file from a list of tuples.

class Reader(file: Union[IO[Any], os.PathLike, hup.io.abc.Connector], skiprows: int, usecols: Optional[Tuple[int, ...]], fields: List[Tuple[str, type]], **kwds)

Bases: hup.io.csv.HandlerBase

CSV file I/O Reader Class.

Parameters:
  • file – File reference to a file object. The reference can either be given as a string or path-like object that points to a valid entry in the file system, an instance of the class Connector, or an opened file object in reading mode.
  • skiprows – Number of initial lines within the given CSV file before the CSV Header. By default no lines are skipped.
  • usecols – Tuple with column IDs of the columns, which are imported from the given CSV file. By default all columns are imported.
  • fields – List (or arbitrary iterable) of field descriptors, respectively given by a tuple, containing a column name and a column type.
  • **kwds – Formatting parameters used by csv. See also Dialects and formatting parameters
read_row() → tuple

Read a single row from the referenced file as a tuple.

read_rows() → List[tuple]

Read multiple rows from the referenced file as a list of tuples.

write_row(row: Iterable[T_co]) → None

Write a single row to the referenced file from a tuple.

write_rows(rows: Iterable[tuple]) → None

Write multiple rows to the referenced file from a list of tuples.

class Writer(file: Union[IO[Any], os.PathLike, hup.io.abc.Connector], header: Iterable[str], comment: str = '', **kwds)

Bases: hup.io.csv.HandlerBase

CSV file I/O Writer Class.

Parameters:
  • file – File reference to a file object. The reference can either be given as a string or path-like object that points to a valid entry in the file system, an instance of the class Connector, or an opened file object in writing mode.
  • header – List (or arbitrary iterable) of column names, that specify the header of the CSV file.
  • comment – Initial comment of the CSV file.
  • **kwds – Formatting parameters used by csv. See also Dialects and formatting parameters
read_row() → tuple

Read a single row from the referenced file as a tuple.

read_rows() → List[tuple]

Read multiple rows from the referenced file as a list of tuples.

write_row(row: Iterable[T_co]) → None

Write a single row to the referenced file from a tuple.

write_rows(rows: Iterable[tuple]) → None

Write multiple rows to the referenced file from a list of tuples.

load(file: Union[IO[Any], os.PathLike, hup.io.abc.Connector], delimiter: Optional[str] = None, hformat: Optional[int] = None) → hup.io.csv.File

Load CSV file.

Parameters:
  • file – File reference to a file object. The reference can either be given as a string or path-like object that points to a valid entry in the file system, an instance of the class Connector, or an opened file object in reading or writing mode.
  • delimiter – Single character, which is used to separate the column values within the CSV file. By default the delimiter is detected from its appearance within the file.
  • hformat – CSV header format: 0 for RFC 4180, 1 for the R programming language. See the class File for details.
Returns:

Instance of class hup.io.csv.File

save(file: Union[IO[Any], os.PathLike, hup.io.abc.Connector], header: Iterable[str], values: List[tuple], comment: Optional[str] = None, delimiter: Optional[str] = None, hformat: Optional[int] = None) → None

Save data to CSV file.

Parameters:
  • file – File reference to a file object. The reference can either be given as a string or path-like object that points to a valid entry in the file system, an instance of the class Connector, or an opened file object in reading or writing mode.
  • header – Optional list (or arbitrary iterable) of strings, that specify the column names within the CSV file. For an existing file, the header by default is extracted from the first content line (not blank and not starting with #). For a new file the header is required and an error is raised if the header is not given.
  • comment – Optional string, which precedes the header and the rows of the CSV file, e.g. to include metadata within the file. For an existing file, the string by default is extracted from the initial comment lines (starting with #). For a new file the comment by default is empty.
  • delimiter – Single character, which is used to separate the column values within the CSV file. For an existing file, the delimiter by default is detected from its appearance within the file. For a new file the default value is ,.
  • hformat – CSV header format: 0 for RFC 4180, 1 for the R programming language. See the class File for details.