sdata documentation

sdata.git

https://zenodo.org/badge/DOI/10.5281/zenodo.4311397.svg https://img.shields.io/pypi/v/sdata.svg?style=flat-square

Motivation

Dealing with experimental and simulation data is very often a pain, because no standard data format exists.

Every lab, and even worse, every staff member uses a different data format to store the data measured during an experimental study. Sometimes the raw data is given as undocumented csv files, sometimes a bunch of Excel files with strange table layouts is all that is available from the labs. The result is often an incomplete data set.

To predict a system's behavior, the necessary simulation tasks depend on these experimental results in order to calibrate the simulation models. Very often the link to the raw experimental data is broken, because only a few key results from the experiments are carried over.

The aim of this project is to fill this gap by providing an open, self-describing data structure to store results from experiments and simulations in the same manner.

It should be easy to define a standard data format for a particular experimental test setup based on the sdata environment within a specific project. Furthermore, the data should remain readable in the future.

https://gitpitch.com/lepy/sdata/master?grs=github&t=beige#

Design goals

  • open data format for open science projects

  • self describing data

  • flexible data structure layout

    • hierarchical data structure (nesting groups, dictionaries)

    • (posix path syntax support?)

  • extendable data structure

  • platform independent

  • simple object model

  • support of standard metadata formats (key/value, …)

  • support of standard dataset formats (hdf5, netcdf, csv, …)

  • support of standard dataset types (datacubes, tables, series, …)

  • support of physical units (conversion of units)

  • transparent, optional data compression (zlib, blosc, …)

  • support of (de-)serialization of every dataset type (group, data, metadata)

  • easy defineable (project) standards, e.g. for a uniaxial tension test (UT)

  • (optional data encryption (gpg, …))

  • change management support?

  • Enable use of data structures from existing tensor libraries transparently

  • (single writer/ multiple reader (swmr) support)

  • (nested table support)

Data types

Metadata

Metadata describes every object within the data structure.

Attributes

  • name .. Name of an attribute (str)

  • value .. Value of an attribute

  • dtype .. data type of the attribute (default: str)

  • unit .. physical unit of an attribute (optional)

  • description .. a description of an attribute (optional)

  • label .. a fancy label of an attribute, e.g. for plotting (optional)

  • required .. a boolean attribute for attribute checks (optional)

import sdata
attribute1 = sdata.Attribute("color", "blue")
attribute1
(Attr'color':blue(str))
attribute2 = sdata.Attribute(name="answer",
                             value=42,
                             dtype="int",
                             unit="-",
                             description="""The Answer to the Ultimate Question of Life, The Universe, and Everything""",
                             label="Die Antwort")
attribute2.to_dict()
{'name': 'answer',
 'value': 42,
 'unit': '-',
 'dtype': 'int',
 'description': 'The Answer to the Ultimate Question of Life, The Universe, and Everything',
 'label': 'Die Antwort'}

dtypes for attributes

  • int, (int64)

  • float, (float32, float64, float128)

  • str, (unicode)

  • bool

  • timestamp (datetime.isoformat with timezone)

  • (uuid planned)
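The timestamp dtype stores a timezone-aware datetime.isoformat string; a minimal standard-library sketch of producing one (independent of sdata):

```python
from datetime import datetime, timezone

# Timezone-aware ISO 8601 string, as expected by the "timestamp" dtype
ts = datetime(2021, 3, 19, 12, 0, tzinfo=timezone.utc).isoformat()
print(ts)  # 2021-03-19T12:00:00+00:00
```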

sdata.metadata

metadata = sdata.Metadata()
metadata.add(attribute1)
metadata.add(attribute2)
print(metadata)
metadata.df
                        name  value dtype unit                                        description        label
key
sdata_version  sdata_version  0.8.4   str    -
color                  color   blue   str    -
answer                answer     42   int    -  The Answer to the Ultimate Question of Life, T...  Die Antwort
data = sdata.Data(name="basic example", uuid="38b26864e7794f5182d38459bab85842")
data.metadata.add("Temperatur",
                  value=25.4,
                  dtype="float",
                  unit="degC",
                  description="Temperatur",
                  label="Temperatur T [°C]")
data.metadata.df
                        name                             value  dtype  unit  description              label
key
sdata_version  sdata_version                             0.8.4    str     -
name                    name                     basic example    str     -
uuid                    uuid  38b26864e7794f5182d38459bab85842    str     -
Temperatur        Temperatur                              25.4  float  degC   Temperatur  Temperatur T [°C]

Core data types

Data

The Data class is the base class for all classes within the sdata family. It provides a uuid, a name, and the metadata functionality. It can group other Data objects, and a Data object can store one pandas.DataFrame.

import sdata
data = sdata.Data(name="my data name", table=df, description="my data description")
import pandas as pd

df = pd.DataFrame({"time": [1.1, 2.1, 3.5],
                   "temperature": [2.4, 5.2, 2.2]})

data_name = "Temperaturmessung-001"
data = sdata.Data(name=data_name,
                  uuid=sdata.uuid_from_str(data_name),
                  table=df,
                  description="temperature measurement result")
data.metadata.add("time",
                  value=None,
                  dtype="float",
                  unit="s",
                  description="time axis",
                  label="time $t$")
data.metadata.add("temperature",
                  value=None,
                  dtype="float",
                  unit="°C",
                  description="temperature axis",
                  label="temperature $T$")
data.describe()
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

x_var = "time"
y_var = "temperature"

x_attr = data.metadata.get(x_var)
y_attr = data.metadata.get(y_var)

ax.plot(data.df[x_var], data.df[y_var], label=data.name)
ax.legend(loc="best")
ax.set_xlabel("{0.label} [{0.unit}]".format(x_attr))
ax.set_ylabel("{0.label} [{0.unit}]".format(y_attr))
plt.show()

Paper

https://zenodo.org/badge/DOI/10.5281/zenodo.4311323.svg

Examples

Creative Commons Lizenz

Usage examples

Dump and load a pandas dataframe

The dump/load round trip: a pandas.DataFrame is wrapped in an sdata.Data object (Metadata, Dataframe, Description), which can be dumped to hdf5, json, xlsx, csv, or html and loaded back into an equivalent sdata.Data object. The csv round trip loses the description.

import logging
logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.WARNING, datefmt='%I:%M:%S')

import os
import sys
import numpy as np
import pandas as pd
import sdata
print("sdata v{}".format(sdata.__version__))

# ## create a dataframe
df = pd.DataFrame({"a":[1.,2.,3.], "b":[4.,6.,1.]})

# ## create a Data object
data = sdata.Data(name="df",
                  uuid=sdata.uuid_from_str("df"),
                  table=df,
                  description="a pandas dataframe",)
# ## dump the data
# ### Excel IO
data.to_xlsx(filepath="/tmp/data1.xlsx")
data_xlsx = sdata.Data.from_xlsx(filepath="/tmp/data1.xlsx")
assert data.name==data_xlsx.name
assert data.uuid==data_xlsx.uuid
assert data.description==data_xlsx.description
print(data_xlsx)
data_xlsx.df

# ### Hdf5 IO
data.to_hdf5(filepath="/tmp/data1.hdf")
data_hdf5 = sdata.Data.from_hdf5(filepath="/tmp/data1.hdf")
assert data.name==data_hdf5.name
assert data.uuid==data_hdf5.uuid
assert data.description==data_hdf5.description
print(data_hdf5)
data_hdf5.df

# ### Json IO
data.to_json(filepath="/tmp/data1.json")
data_json = sdata.Data.from_json(filepath="/tmp/data1.json")
assert data.name==data_json.name
assert data.uuid==data_json.uuid
assert data.description==data_json.description
print(data_json)
data_json.df

# ### csv IO
data.to_csv(filepath="/tmp/data1.csv")
data_csv = sdata.Data.from_csv(filepath="/tmp/data1.csv")
assert data.name==data_csv.name
assert data.uuid==data_csv.uuid
# assert data.description==data_csv.description
assert data.df.shape == data_csv.df.shape
print(data_csv)
data_csv.df

# ### html export
data.to_html(filepath="/tmp/data1.html")

Components of sdata.Data

An sdata.Data object bundles three components: Metadata, a Dataframe, and a Description.

# ## create a Data object
data = sdata.Data(name="df",
                  table=pd.DataFrame({"a":[1.,2.,3.], "b":[4.,6.,1.]}),
                  description="a pandas dataframe",)

# Metadata
data.metadata

# Dataframe
data.df

# Description
data.description

sdata api

class sdata.Blob(**kwargs)[source]

Bases: Data

Binary Large Object as reference

Warning

highly experimental

ATTR_NAMES = []
SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
SDATA_CLASS = '!sdata_class'
SDATA_CTIME = '!sdata_ctime'
SDATA_MTIME = '!sdata_mtime'
SDATA_NAME = '!sdata_name'
SDATA_PARENT = '!sdata_parent'
SDATA_PROJECT = '!sdata_project'
SDATA_UUID = '!sdata_uuid'
SDATA_VERSION = '!sdata_version'
VAULT_TYPES = ['filesystem', 'hdf5', 'db', 'www']
add_data(data)

add data, if data.name is unique

property asciiname
static clear_folder(path)

delete subfolder in export folder

Parameters

path – path

Returns

None

clear_group()

clear group dict

copy(**kwargs)

create a copy of the Data object

data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data  {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["!sdata_parent"]))
data  38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
Returns

Data

describe()

Generate descriptive info of the data

df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name',
            uuid='38b26864e7794f5182d38459bab85842',
            table=df,
            description="A remarkable description")
data.describe()
                0
metadata        3
table_rows      3
table_columns   1
description    24
Returns

pd.DataFrame

property description

description of the object

description_from_df(df)

set description from DataFrame of lines

Returns

description_to_df()

get description as DataFrame

Returns

DataFrame of description lines

property df

table object(pandas.DataFrame)

dir()

returns a nested list of all child objects

Returns

list of sdata.Data objects

exists(vault='filesystem')[source]

Test whether an object under blob.url exists.

Parameters

vault

Returns

property filename
classmethod from_csv(s=None, filepath=None, sep=';')

import sdata.Data from csv

Parameters
  • s – csv str

  • filepath

  • sep – separator (default=”;”)

Returns

sdata.Data

classmethod from_folder(path)

create an sdata object instance from a folder

Parameters

path

Returns

classmethod from_hdf5(filepath, **kwargs)

import sdata.Data from hdf5

Parameters

filepath

Returns

sdata.Data

classmethod from_json(s=None, filepath=None)

create Data from json str or file

Parameters
  • s – json str

  • filepath

Returns

sdata.Data

classmethod from_sqlite(filepath, **kwargs)

import sdata.Data from sqlite

Parameters
  • filepath

  • kwargs

Returns

sdata.Data

classmethod from_url(url=None, stype=None)

create Data from a url

Parameters
  • url – url

  • stype – “json” (“xlsx”, “csv”)

Returns

sdata.Data

classmethod from_xlsx(filepath)

import sdata.Data from xlsx

Parameters

filepath

Returns

gen_uuid()

generate new uuid string

Returns

str, e.g. ‘5fa04a3738e4431dbc34eccea5e795c4’
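These uuid strings are plain 32-character hex digests; independent of sdata, an equivalent value can be generated with the standard library:

```python
import uuid

# A random uuid4, rendered as the bare 32-character hex string sdata uses
uid = uuid.uuid4().hex
print(uid)
```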

gen_uuid_from_state()

generate the same uuid for the same data

Returns

uuid

get_data_by_name(name)

get object by name

get_data_by_uuid(uid)

get data by uuid


get_group()
property group

get group

items()

get all child objects

Returns

[(child uuid, child objects), ]

keys()

get all child objects uuids

Returns

list of uuid’s

property md5

calculate the md5 hash of the blob

Returns

md5

classmethod metadata_from_hdf5(filepath, **kwargs)

import sdata.Data.Metadata from hdf5

Parameters

filepath

Returns

sdata.Data

property name

name of the object

property osname
Returns

os compatible name (ascii?)

property prefix

prefix of the object name

property project

name of the project

refactor(fix_columns=True, add_table_metadata=True)

helper function

  • to cleanup dataframe column name

  • to define Attributes for all dataframe columns

property sha1

calculate the sha1 hash of the blob

Returns

sha1

property sha3_256

Return a SHA3 hash of the sData object with a hashbit length of 32 bytes.

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()
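The digest is a plain hashlib.sha3_256 hexdigest (64 hex characters for 32 bytes); a standalone illustration, hashing an arbitrary byte string rather than sdata's actual serialized state:

```python
import hashlib

# SHA3-256 hexdigest of some bytes; sdata hashes its own serialized content
digest = hashlib.sha3_256(b"serialized sdata content").hexdigest()
print(digest)
```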

property sha3_256_table

Return a SHA3 hash of the sData.table object with a hashbit length of 32 bytes.

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property table

table object(pandas.DataFrame)

to_csv(filepath=None)

export sdata.Data to csv

Parameters

filepath

Returns

to_folder(path, dtype='csv')

export data to folder

Parameters
  • path

  • dtype

Returns

to_hdf5(filepath, **kwargs)

export sdata.Data to hdf5

Parameters
  • filepath

  • complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]

  • complevel – default=9 [0-9]

Returns

to_html(filepath, xlsx=True, style=None)

export Data to html

Parameters
  • filepath

  • xlsx

  • style

Returns

to_json(filepath=None)

export Data in json format

Parameters

filepath – export file path (default:None)

Returns

json str

to_sqlite(filepath, **kwargs)

export sdata.Data to sqlite

Parameters
  • filepath

  • kwargs

Returns

to_xlsx(filepath=None)

export attributes and data to excel

Parameters

filepath

Returns

to_xlsx_base64()

get xlsx as byteio base64 encoded

Returns

base64

to_xlsx_byteio()

get xlsx as byteio

Returns

BytesIO

tree_folder(dir, padding='  ', print_files=True, hidden_files=False, last=True)

print tree folder structure

update_hash(fh, hashobject, buffer_size=65536)[source]

Update the given hash object with the content read from the file handle, producing a checksum of the blob.

hashobject = hashlib.md5()
df = pd.DataFrame([1,2,3])
url = "/tmp/blob.csv"
df.to_csv(url)
blob = sdata.Blob(url=url)
fh = open(url, "rb")
blob.update_hash(fh, hashobject)
hashobject.hexdigest()
Parameters
  • fh – file handle

  • hashobject – hash object, e.g. hashlib.sha1()

  • buffer_size – buffer size (default buffer_size=65536)

Returns

hashobject
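The chunked-update pattern behind update_hash can be sketched without sdata (assuming the documented fh/hashobject/buffer_size semantics):

```python
import hashlib
import io

def update_hash(fh, hashobject, buffer_size=65536):
    # Feed the file content to the hash object in fixed-size chunks,
    # so large blobs never need to fit in memory at once.
    while True:
        chunk = fh.read(buffer_size)
        if not chunk:
            break
        hashobject.update(chunk)
    return hashobject

h = update_hash(io.BytesIO(b"0,1\n1,2\n2,3\n"), hashlib.md5())
print(h.hexdigest())
```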

update_mtime()

update modification time

Returns

property url

url of the blob

property uuid

uuid of the object

values()

get all child objects

Returns

list of child objects

verify_attributes()

check mandatory attributes

class sdata.Data(**kwargs)[source]

Bases: object

Base sdata object

ATTR_NAMES = []
SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
SDATA_CLASS = '!sdata_class'
SDATA_CTIME = '!sdata_ctime'
SDATA_MTIME = '!sdata_mtime'
SDATA_NAME = '!sdata_name'
SDATA_PARENT = '!sdata_parent'
SDATA_PROJECT = '!sdata_project'
SDATA_UUID = '!sdata_uuid'
SDATA_VERSION = '!sdata_version'
add_data(data)[source]

add data, if data.name is unique

property asciiname
static clear_folder(path)[source]

delete subfolder in export folder

Parameters

path – path

Returns

None

clear_group()[source]

clear group dict

copy(**kwargs)[source]

create a copy of the Data object

data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data  {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["!sdata_parent"]))
data  38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
Returns

Data

describe()[source]

Generate descriptive info of the data

df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name',
            uuid='38b26864e7794f5182d38459bab85842',
            table=df,
            description="A remarkable description")
data.describe()
                0
metadata        3
table_rows      3
table_columns   1
description    24
Returns

pd.DataFrame

property description

description of the object

description_from_df(df)[source]

set description from DataFrame of lines

Returns

description_to_df()[source]

get description as DataFrame

Returns

DataFrame of description lines

property df

table object(pandas.DataFrame)

dir()[source]

returns a nested list of all child objects

Returns

list of sdata.Data objects

property filename
classmethod from_csv(s=None, filepath=None, sep=';')[source]

import sdata.Data from csv

Parameters
  • s – csv str

  • filepath

  • sep – separator (default=”;”)

Returns

sdata.Data

classmethod from_folder(path)[source]

create an sdata object instance from a folder

Parameters

path

Returns

classmethod from_hdf5(filepath, **kwargs)[source]

import sdata.Data from hdf5

Parameters

filepath

Returns

sdata.Data

classmethod from_json(s=None, filepath=None)[source]

create Data from json str or file

Parameters
  • s – json str

  • filepath

Returns

sdata.Data

classmethod from_sqlite(filepath, **kwargs)[source]

import sdata.Data from sqlite

Parameters
  • filepath

  • kwargs

Returns

sdata.Data

classmethod from_url(url=None, stype=None)[source]

create Data from a url

Parameters
  • url – url

  • stype – “json” (“xlsx”, “csv”)

Returns

sdata.Data

classmethod from_xlsx(filepath)[source]

import sdata.Data from xlsx

Parameters

filepath

Returns

gen_uuid()[source]

generate new uuid string

Returns

str, e.g. ‘5fa04a3738e4431dbc34eccea5e795c4’

gen_uuid_from_state()[source]

generate the same uuid for the same data

Returns

uuid

get_data_by_name(name)[source]

get object by name

get_data_by_uuid(uid)[source]

get data by uuid


get_group()[source]
property group

get group

items()[source]

get all child objects

Returns

[(child uuid, child objects), ]

keys()[source]

get all child objects uuids

Returns

list of uuid’s

classmethod metadata_from_hdf5(filepath, **kwargs)[source]

import sdata.Data.Metadata from hdf5

Parameters

filepath

Returns

sdata.Data

property name

name of the object

property osname
Returns

os compatible name (ascii?)

property prefix

prefix of the object name

property project

name of the project

refactor(fix_columns=True, add_table_metadata=True)[source]

helper function

  • to cleanup dataframe column name

  • to define Attributes for all dataframe columns

property sha3_256

Return a SHA3 hash of the sData object with a hashbit length of 32 bytes.

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property sha3_256_table

Return a SHA3 hash of the sData.table object with a hashbit length of 32 bytes.

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property table

table object(pandas.DataFrame)

to_csv(filepath=None)[source]

export sdata.Data to csv

Parameters

filepath

Returns

to_folder(path, dtype='csv')[source]

export data to folder

Parameters
  • path

  • dtype

Returns

to_hdf5(filepath, **kwargs)[source]

export sdata.Data to hdf5

Parameters
  • filepath

  • complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]

  • complevel – default=9 [0-9]

Returns

to_html(filepath, xlsx=True, style=None)[source]

export Data to html

Parameters
  • filepath

  • xlsx

  • style

Returns

to_json(filepath=None)[source]

export Data in json format

Parameters

filepath – export file path (default:None)

Returns

json str

to_sqlite(filepath, **kwargs)[source]

export sdata.Data to sqlite

Parameters
  • filepath

  • kwargs

Returns

to_xlsx(filepath=None)[source]

export attributes and data to excel

Parameters

filepath

Returns

to_xlsx_base64()[source]

get xlsx as byteio base64 encoded

Returns

base64

to_xlsx_byteio()[source]

get xlsx as byteio

Returns

BytesIO

tree_folder(dir, padding='  ', print_files=True, hidden_files=False, last=True)[source]

print tree folder structure

update_hash(hashobject)[source]

Update the given hash object with the serialized content of the Data object, producing a checksum.

data = sdata.Data()

md5 = hashlib.md5()
data.update_hash(md5)
md5.hexdigest()
'bbf323bdcb0bf961803b5504a8a60d69'

sha1 = hashlib.sha1()
data.update_hash(sha1)
sha1.hexdigest()
'3c59368c7735c1ecaf03ebd4c595bb6e73e90f0c'

hashobject = hashlib.sha3_256()
data.update_hash(hashobject).hexdigest()
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'

data.update_hash(hashobject).digest()
b'M8...'
Parameters

hash – hash object, e.g. hashlib.sha1()

Returns

hash

update_mtime()[source]

update modification time

Returns

property uuid

uuid of the object

values()[source]

get all child objects

Returns

list of child objects

verify_attributes()[source]

check mandatory attributes

class sdata.metadata.Attribute(name, value, **kwargs)[source]

Bases: object

Attribute class

DTYPES = {'bool': <class 'bool'>, 'float': <class 'float'>, 'int': <class 'int'>, 'str': <class 'str'>, 'timestamp': <class 'sdata.timestamp.TimeStamp'>}
property description

Attribute description

property dtype

Attribute type str

static guess_dtype(value)[source]

returns dtype class

Parameters

value

Returns

__class__

property label

Attribute label

property name

Attribute name

property required

Attribute required

to_csv(prefix='', sep=',', quote=None)[source]

export Attribute to csv

Parameters
  • prefix

  • sep

  • quote

Returns

to_dict()[source]

:returns dict of attribute items

to_list()[source]
property unit

Attribute unit

property value

Attribute value

class sdata.metadata.Metadata(**kwargs)[source]

Bases: object

Metadata container class

each Metadata entry has a
  • name (256)

  • value

  • unit

  • description

  • type (int, str, float, bool, timestamp)

ATTRIBUTEKEYS = ['name', 'value', 'dtype', 'unit', 'description', 'label', 'required']
add(name, value=None, **kwargs)[source]

add Attribute

Parameters
  • name

  • value

  • kwargs

Returns

property attributes

returns Attributes

copy()[source]

returns a deep copy

property df

create dataframe

classmethod from_csv(filepath)[source]

create metadata from dataframe

classmethod from_dataframe(df)[source]

create metadata from dataframe

classmethod from_dict(d)[source]

setup metadata from dict

classmethod from_json(jsonstr=None, filepath=None)[source]

create metadata from json file

Parameters
  • jsonstr – json str

  • filepath – filepath to json file

Returns

Metadata

classmethod from_list(mlist)[source]

create metadata from a list of Attribute values

[['force_x', 1.2, 'float', 'kN', 'force in x-direction'],
 ['force_y', 3.1, 'float', 'N', 'force in y-direction', 'label', True]]

get(name, default=None)[source]
get_attr(name)[source]

get Attribute by name

get_sdict()[source]

get sdata attribute as dict

get_udict()[source]

get user attribute as dict

static guess_dtype_from_value(value)[source]

guess dtype from value, e.g. '1.23' -> 'float', 'otto1.23' -> 'str', 1 -> 'int', False -> 'bool'

Parameters

value

Returns

dtype(value), dtype [‘int’, ‘float’, ‘bool’, ‘str’]
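A standalone sketch of this guessing logic (an approximation, not sdata's actual implementation):

```python
def guess_dtype_from_value(value):
    # Try the narrowest cast first and fall back to str.
    # bool is checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return value, "bool"
    if isinstance(value, float):
        return value, "float"
    try:
        return int(value), "int"
    except (TypeError, ValueError):
        pass
    try:
        return float(value), "float"
    except (TypeError, ValueError):
        pass
    return str(value), "str"

print(guess_dtype_from_value("1.23"))      # (1.23, 'float')
print(guess_dtype_from_value("otto1.23"))  # ('otto1.23', 'str')
```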

guess_value_dtype()[source]

try to cast the Attribute values, e.g. str -> float

Returns

is_complete()[source]

check all required attributes

items()[source]
Returns

list of Attribute items (keys, values)

keys()[source]
Returns

list of Attribute names

property name

Name of the Metadata

relabel(name, newname)[source]

relabel Attribute

Parameters
  • name – old attribute name

  • newname – new attribute name

Returns

None

property required_attributes
property sdata_attributes
property sdf

create dataframe for sdata attributes

property sdft

create transposed dataframe for sdata attributes

set_attr(name='N.N.', value=None, **kwargs)[source]

set Attribute

set_unit_from_name(add_description=True, fix_name=True)[source]

try to extract unit from attribute name

Returns

property sha3_256

Return a new SHA3 hash object with a hashbit length of 32 bytes.

Returns

hashlib.sha3_256.hexdigest()

property size

return number of Attributes

to_csv(filepath=None, sep=',', header=False)[source]

serialize to csv

to_csv_header(prefix='#', sep=',', filepath=None)[source]

serialize to csv

to_dataframe()[source]

create dataframe

to_dict()[source]

serialize attributes to dict

to_json(filepath=None)[source]

create a json

Parameters

filepath – default None

Returns

json str

to_list()[source]

create a nested list of Attribute values

Returns

list

property udf

create dataframe for user attributes

update_from_dict(d)[source]

set attributes from dict

Parameters

d – dict

Returns

update_from_usermetadata(metadata)[source]

update user metadata from metadata

update_hash(hashobject)[source]

A hash represents the object used to calculate a checksum of a string of information.

hashobject = hashlib.sha3_256()
metadata = Metadata()
metadata.update_hash(hashobject)
hashobject.hexdigest()
Parameters

hash – hash object

Returns

hash_function().hexdigest()

property user_attributes
values()[source]
Returns

list of Attribute values

sdata.metadata.extract_name_unit(value)[source]

extract name and unit from a combined string

value: 'Target Strain Rate (1/s) '
name : 'Target Strain Rate'
unit : '1/s'

value: 'Gauge Length [mm] monkey '
name : 'Gauge Length'
unit : 'mm'

value: 'Gauge Length <mm> whatever '
name : 'Gauge Length'
unit : 'mm'
Parameters

value – string, e.g. ‘Length <mm> whatever’

Returns

name, unit
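One way such extraction could be implemented (a regular-expression sketch, not sdata's code): the first (), [], or <> group supplies the unit, and anything after it is dropped.

```python
import re

def extract_name_unit(value):
    # Name is everything before the first unit group in (), [] or <>;
    # trailing text after the unit group ("monkey", "whatever") is discarded.
    m = re.search(r"^\s*(.*?)\s*[(\[<]([^)\]>]+)[)\]>]", value)
    if m:
        return m.group(1), m.group(2)
    return value.strip(), None

print(extract_name_unit("Gauge Length [mm] monkey "))  # ('Gauge Length', 'mm')
```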

Further Reading

sdata package

Subpackages

sdata.experiments package

Submodules
sdata.experiments.ks2 module
sdata.experiments.lapshear module
sdata.experiments.material module
Module contents

sdata.io package

Submodules
sdata.io.hdf module
class sdata.io.hdf.FlatHDFDataStore(filepath, **kwargs)[source]

Bases: object

Flat HDF5 Store

store = FlatHDFDataStore(filepath="/tmp/mystore.h5")

data = sdata.Data(name="otto",
                  uuid="d4e97cedca6238bea16732ce88c1922f",
                  table=pd.DataFrame({"a": [1, 2, 3]}),
                  description="Hallo Spencer")

store.put(data)

loaded_data = store.get_data_by_uuid("d4e97cedca6238bea16732ce88c1922f")
assert data.sha3_256 == loaded_data.sha3_256

close()[source]

close hdf store

Returns

get_all_metadata()[source]
get_data_by_uuid(uuid)[source]

get table by uuid

Parameters

uuid

Returns

get_dict()[source]
keys()[source]
put(data)[source]

store data in a pandas hdf5 store

sdata.io.pgp module
sdata.io.pud module
class sdata.io.pud.Pud(**kwargs)[source]

Bases: Data

run object, e.g. single tension test simulation

ATTRIBUTES = ['material_norm_name', 'material_number_norm', 'material_name', 'test', 'sample_ident_number', 'sample_geometry', 'sample_direction', 'nominal_pre_deformation <%>', 'actual_pre_deformation <%>', 'direction_of_pre_deformation', 'heat_treatment', 'actual_sample_width_<mm>', 'actual_sample_thickness_<mm>', 'actual_gauge_length_<mm>', 'nominal_testing_temperature_<K>', 'nominal_testing_speed_<m/s>', 'order', 'date_of_test_<dd.mm.yyyy>', 'tester', 'place_of_test', 'remark', 'data']
classmethod from_file(filepath)[source]

read pud file

WERKSTOFF_NORM_NAME                    = HC340LA
WERKSTOFFNUMMER_NORM                   =
MATERIALNAME                           = HC340LA
PRUEFUNG                               = FLIESSKURVE
PROBENIDENTNUMMER                      = id0815
PROBENGEOMETRIE                        =
ENTNAHMERICHTUNG                       = Quer (90deg)
VORVERFORMUNG_SOLL <%>                 = 0
VORVERFORMUNG_IST  <%>                 =
VORVERFORMUNGSRICHTUNG                 = Unverformt
WAERMEBEHANDLUNG                       = O
PROBENBREITE_IST <mm>                  = 20.014
PROBENDICKE_IST <mm>                   = 0.751
MESSLAENGE_IST <mm>                    = 80.0
MESSLAENGE_IST_FD <mm>                 = 80.0
PRUEFTEMPERATUR_SOLL <K>               = 293
PRUEFGESCHWINDIGKEIT_SOLL  <mm/s>      = 0.32
PRUEFGESCHWINDIGKEIT_IST   <mm/s>      = 0.32
DEHNRATE_SOLL <1/s>                    = 0.004
DEHNRATE_IST  <1/s>                    = 0.004
AUFTRAG                                =
PRUEFDATUM  <tt.mm.jjjj>               = 19.03.2017
PRUEFER                                = Otto
PRUEFSTELLE                            = SALZGITTER AG
BEMERKUNG                              = ASL 2009-056
DATEN                                  =
ZEIT <s>; KRAFT <N>; WEG <mm>; BREITENAENDERUNG <mm>; WEG_FD <mm>
1.2372;192.181;-0.0235;0.0012;-0.0235
1.2772;198.325;-0.0231;0.0012;-0.0231
1.2972;201.397;-0.0227;0.0012;-0.0227
1.3172;205.152;-0.0224;0.0013;-0.0224
1.3572;211.638;-0.022;0.0013;-0.022
1.3972;218.123;-0.0213;0.0013;-0.0213
sdata.io.vault module
class sdata.io.vault.FileSystemVault(rootpath, **kwargs)[source]

Bases: Vault

data vault on the filesystem

dump_blob(blob)[source]

store blob in vault

get_blob_by_name(name)[source]

get blob by name

Parameters

name

Returns

property index

get vault index

Returns

keys()[source]
list()[source]

get index from vault

Returns

load_blob(blob_uuid)[source]

get blob from vault

load_blob_metadata(blob_uuid)[source]

get blob.metadata from vault

load_blob_metadata_value_df(blob_uuid)[source]

get blob.metadata attribute.value(s) from vault

reindex()[source]

create index from vault

Returns

reindex_hfd5()[source]

get index from vault

Returns

df

exception sdata.io.vault.FilesystemVaultException[source]

Bases: VaultException

class sdata.io.vault.Hdf5Vault(rootpath, **kwargs)[source]

Bases: Vault

data vault in an hdf5 file

dump_blob(data)[source]

store blob in vault

initialize(**kwargs)[source]
iteritems()[source]
keys()[source]

get all blob keys

>>> keys = vault.keys()
['b5d2d05638db48d69d044a34e83aaa41', '21b83703d98e38a7be2e50e38326d0ce']
Returns

list of keys

load_blob(data_uuid, ignore_errors=True)[source]

get blob from vault

exception sdata.io.vault.Hdf5VaultException[source]

Bases: VaultException

class sdata.io.vault.Vault(rootpath, **kwargs)[source]

Bases: object

data vault

INDEXFILENAME = 'index'
OBJECTPATH = 'objects'
dump_blob(blob)[source]

store blob in vault

find_blobs(name=None, **kwargs)[source]

find blobs in vault

property index

get vault index

Returns

items()[source]

get vault items

Returns

(key, blob)

iteritems()[source]
itervalues()[source]
keys()[source]
load_blob(blob_uuid)[source]

get blob from vault

reindex()[source]

create index from vault

Returns

property rootpath
values()[source]

get vault values

Returns

list of blob objects

exception sdata.io.vault.VaultException[source]

Bases: Exception

class sdata.io.vault.VaultIndex[source]

Bases: object

Index of a Vault

INDEXDATAFRAME = 'indexdataframe'
property df

index dataframe

classmethod from_hdf5(filepath)[source]

read index dataframe from hdf5

Parameters

filepath

Returns

VaultIndex

get_names()[source]

get all data names from index

property name

index dataframe

to_hdf5(filepath, **kwargs)[source]

store Vault Index in hdf5 store

Parameters
  • filepath

  • kwargs

Returns

update_from_sdft(sdft)[source]
class sdata.io.vault.VaultSqliteIndex(db_file)[source]

Bases: object

Index of a Vault

property df

index dataframe

Returns

pd.DataFrame

drop_db()[source]

drop and re-create the database

get_all_metadata()[source]

get all blob metadata

Returns

initialize()[source]

initialize index db

Returns

update_from_metadata(metadata)[source]

store sdata metadata

Module contents
class sdata.io.PID(*args, **kwargs)[source]

Bases: object

Process object, which has an uuid and metadata

add_attr(key, value)[source]
get_attr(key, default=None)[source]
get_metadata()[source]
get_name()[source]
get_uuid()[source]
property metadata
property name
set_attr(key, value)[source]
set_metadata(metadata)[source]
set_name(name)[source]
set_uuid(uuid)[source]
property uuid

Submodules

sdata.blob module

class sdata.blob.Blob(**kwargs)[source]

Bases: Data

Binary Large Object as reference

Warning

highly experimental

VAULT_TYPES = ['filesystem', 'hdf5', 'db', 'www']
exists(vault='filesystem')[source]

Test whether an object under the blob.url exists.

Parameters

vault

Returns

property md5

calculate the md5 hash of the blob

Returns

md5

property sha1

calculate the sha1 hash of the blob

Returns

sha1

update_hash(fh, hashobject, buffer_size=65536)[source]

Update the given hash object with the blob content, read from the file handle in chunks of buffer_size bytes.

hashobject = hashlib.md5()
df = pd.DataFrame([1,2,3])
url = "/tmp/blob.csv"
df.to_csv(url)
blob = sdata.Blob(url=url)
fh = open(url, "rb")
blob.update_hash(fh, hashobject)
hashobject.hexdigest()
Parameters
  • fh – file handle

  • hashobject – hash object, e.g. hashlib.sha1()

  • buffer_size – buffer size (default buffer_size=65536)

Returns

hashobject
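The chunked update that update_hash performs can be sketched with the stdlib alone; the loop below illustrates the documented behaviour (buffer_size chunking), not sdata's actual code:

```python
import hashlib
import io

def update_hash(fh, hashobject, buffer_size=65536):
    """Feed a binary file handle into a hashlib object in fixed-size chunks."""
    while True:
        chunk = fh.read(buffer_size)
        if not chunk:  # empty bytes -> end of file
            break
        hashobject.update(chunk)
    return hashobject

# usage with an in-memory file instead of a csv on disk
fh = io.BytesIO(b"0,0\n1,1\n2,2\n3,3\n")
digest = update_hash(fh, hashlib.md5()).hexdigest()
```

Reading in fixed-size chunks keeps memory use constant even for large blobs.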

property url

url of the blob

sdata.data module

class sdata.data.Data(**kwargs)[source]

Bases: object

Base sdata object

ATTR_NAMES = []
SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
SDATA_CLASS = '!sdata_class'
SDATA_CTIME = '!sdata_ctime'
SDATA_MTIME = '!sdata_mtime'
SDATA_NAME = '!sdata_name'
SDATA_PARENT = '!sdata_parent'
SDATA_PROJECT = '!sdata_project'
SDATA_UUID = '!sdata_uuid'
SDATA_VERSION = '!sdata_version'
add_data(data)[source]

add data, if data.name is unique

property asciiname
static clear_folder(path)[source]

delete subfolder in export folder

Parameters

path – path

Returns

None

clear_group()[source]

clear group dict

copy(**kwargs)[source]

create a copy of the Data object

data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data  {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["sdata_parent"]))
data  38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
Returns

Data

describe()[source]

Generate descriptive info of the data

df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name',
            uuid='38b26864e7794f5182d38459bab85842',
            table=df,
            description="A remarkable description")
data.describe()
                0
metadata        3
table_rows      3
table_columns   1
description    24
Returns

pd.DataFrame

property description

description of the object

description_from_df(df)[source]

set description from DataFrame of lines

Returns

description_to_df()[source]

get description as DataFrame

Returns

DataFrame of description lines

property df

table object (pandas.DataFrame)

dir()[source]

returns a nested list of all child objects

Returns

list of sdata.Data objects

property filename
classmethod from_csv(s=None, filepath=None, sep=';')[source]

import sdata.Data from csv

Parameters
  • s – csv str

  • filepath

  • sep – separator (default=”;”)

Returns

sdata.Data

classmethod from_folder(path)[source]

create a sdata.Data instance from a folder

Parameters

path

Returns

classmethod from_hdf5(filepath, **kwargs)[source]

import sdata.Data from hdf5

Parameters

filepath

Returns

sdata.Data

classmethod from_json(s=None, filepath=None)[source]

create Data from json str or file

Parameters
  • s – json str

  • filepath

Returns

sdata.Data

classmethod from_sqlite(filepath, **kwargs)[source]

import sdata.Data from sqlite

Parameters
  • filepath

  • kwargs

Returns

sdata.Data

classmethod from_url(url=None, stype=None)[source]

create Data from a url

Parameters
  • url – url

  • stype – “json” (“xlsx”, “csv”)

Returns

sdata.Data

classmethod from_xlsx(filepath)[source]

import sdata.Data from xlsx

Parameters

filepath

Returns

gen_uuid()[source]

generate new uuid string

Returns

str, e.g. ‘5fa04a3738e4431dbc34eccea5e795c4’
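The undashed 32-character uuid strings shown above can be produced with the stdlib alone; a sketch of what gen_uuid() presumably does (the uuid4 call is an assumption, sdata may generate them differently):

```python
import uuid

# .hex gives the undashed 32-character hexadecimal form of a random uuid
new_uuid = uuid.uuid4().hex
```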

gen_uuid_from_state()[source]

generate the same uuid for the same data

Returns

uuid

get_data_by_name(name)[source]

get data object by name

get_data_by_uuid(uid)[source]

get data by uuid


get_group()[source]
property group

get group

items()[source]

get all child objects

Returns

[(child uuid, child objects), ]

keys()[source]

get all child objects uuids

Returns

list of uuid’s

classmethod metadata_from_hdf5(filepath, **kwargs)[source]

import sdata.Data.Metadata from hdf5

Parameters

filepath

Returns

sdata.Data

property name

name of the object

property osname
Returns

os compatible name (ascii?)

property prefix

prefix of the object name

property project

name of the project

refactor(fix_columns=True, add_table_metadata=True)[source]

helper function

  • to cleanup dataframe column name

  • to define Attributes for all dataframe columns

property sha3_256

Return the SHA3-256 hash of the sData object (32 byte digest).

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property sha3_256_table

Return the SHA3-256 hash of the sData.table object (32 byte digest).

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property table

table object (pandas.DataFrame)

to_csv(filepath=None)[source]

export sdata.Data to csv

Parameters

filepath

Returns

to_folder(path, dtype='csv')[source]

export data to folder

Parameters
  • path

  • dtype

Returns

to_hdf5(filepath, **kwargs)[source]

export sdata.Data to hdf5

Parameters
  • filepath

  • complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]

  • complevel – default=9 [0-9]

Returns

to_html(filepath, xlsx=True, style=None)[source]

export Data to html

Parameters
  • filepath

  • xlsx

  • style

Returns

to_json(filepath=None)[source]

export Data in json format

Parameters

filepath – export file path (default:None)

Returns

json str

to_sqlite(filepath, **kwargs)[source]

export sdata.Data to sqlite

Parameters
  • filepath

  • kwargs

Returns

to_xlsx(filepath=None)[source]

export attributes and data to excel

Parameters

filepath

Returns

to_xlsx_base64()[source]

get xlsx as byteio base64 encoded

Returns

base64

to_xlsx_byteio()[source]

get xlsx as byteio

Returns

BytesIO

tree_folder(dir, padding='  ', print_files=True, hidden_files=False, last=True)[source]

print tree folder structure

update_hash(hashobject)[source]

Update the given hash object with the state of the Data object.

data = sdata.Data()

md5 = hashlib.md5()
data.update_hash(md5)
md5.hexdigest()
'bbf323bdcb0bf961803b5504a8a60d69'

sha1 = hashlib.sha1()
data.update_hash(sha1)
sha1.hexdigest()
'3c59368c7735c1ecaf03ebd4c595bb6e73e90f0c'

hashobject = hashlib.sha3_256()
data.update_hash(hashobject).hexdigest()
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'

data.update_hash(hashobject).digest()
b'M8...'
Parameters

hash – hash object, e.g. hashlib.sha1()

Returns

hash

update_mtime()[source]

update modification time

Returns

property uuid

uuid of the object

values()[source]

get all child objects

Returns

list of child objects

verify_attributes()[source]

check mandatory attributes

exception sdata.data.Sdata_Name_Exeption[source]

Bases: Exception

exception sdata.data.Sdata_Uuid_Exeption[source]

Bases: Exception

sdata.data.print_classes()[source]
sdata.data.uuid_from_str(name)[source]
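uuid_from_str(name) maps equal strings to equal uuids; a deterministic uuid5-based sketch (the NAMESPACE_OID choice is an assumption, sdata may use a different namespace):

```python
import uuid

def uuid_from_str(name, namespace=uuid.NAMESPACE_OID):
    """Derive a deterministic uuid hex string from a name (uuid5 sketch)."""
    return uuid.uuid5(namespace, name).hex

# the same name always yields the same uuid string
a = uuid_from_str("1")
b = uuid_from_str("1")
```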

sdata.metadata module

class sdata.metadata.Attribute(name, value, **kwargs)[source]

Bases: object

Attribute class

DTYPES = {'bool': <class 'bool'>, 'float': <class 'float'>, 'int': <class 'int'>, 'str': <class 'str'>, 'timestamp': <class 'sdata.timestamp.TimeStamp'>}
property description

Attribute description

property dtype

Attribute type str

static guess_dtype(value)[source]

returns dtype class

Parameters

value

Returns

__class__

property label

Attribute label

property name

Attribute name

property required

Attribute required

to_csv(prefix='', sep=',', quote=None)[source]

export Attribute to csv

Parameters
  • prefix

  • sep

  • quote

Returns

to_dict()[source]

returns a dict of attribute items

to_list()[source]
property unit

Attribute unit

property value

Attribute value

class sdata.metadata.Metadata(**kwargs)[source]

Bases: object

Metadata container class

each Metadata entry has a
  • name (max. 256 characters)

  • value

  • unit

  • description

  • type (int, str, float, bool, timestamp)

ATTRIBUTEKEYS = ['name', 'value', 'dtype', 'unit', 'description', 'label', 'required']
add(name, value=None, **kwargs)[source]

add Attribute

Parameters
  • name

  • value

  • kwargs

Returns

property attributes

returns Attributes

copy()[source]

returns a deep copy

property df

create dataframe

classmethod from_csv(filepath)[source]

create metadata from a csv file

classmethod from_dataframe(df)[source]

create metadata from dataframe

classmethod from_dict(d)[source]

setup metadata from dict

classmethod from_json(jsonstr=None, filepath=None)[source]

create metadata from json file

Parameters
  • jsonstr – json str

  • filepath – filepath to json file

Returns

Metadata

classmethod from_list(mlist)[source]

create metadata from a list of Attribute values

[[‘force_x’, 1.2, ‘float’, ‘kN’, ‘force in x-direction’],

[‘force_y’, 3.1, ‘float’, ‘N’, ‘force in y-direction’, ‘label’, True]]
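Each row follows the ATTRIBUTEKEYS order (name, value, dtype, unit, description, label, required); shorter rows simply omit the trailing fields. A sketch of that row mapping (rows_to_attribute_dicts is a hypothetical helper, not part of sdata):

```python
ATTRIBUTEKEYS = ["name", "value", "dtype", "unit", "description", "label", "required"]

def rows_to_attribute_dicts(mlist):
    """Map each row onto ATTRIBUTEKEYS; zip truncates, so missing trailing fields stay absent."""
    return [dict(zip(ATTRIBUTEKEYS, row)) for row in mlist]

rows = [["force_x", 1.2, "float", "kN", "force in x-direction"],
        ["force_y", 3.1, "float", "N", "force in y-direction", "label", True]]
attrs = rows_to_attribute_dicts(rows)
```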

get(name, default=None)[source]
get_attr(name)[source]

get Attribute by name

get_sdict()[source]

get sdata attribute as dict

get_udict()[source]

get user attribute as dict

static guess_dtype_from_value(value)[source]

guess dtype from value, e.g. ‘1.23’ -> ‘float’, ‘otto1.23’ -> ‘str’, 1 -> ‘int’, False -> ‘bool’

Parameters

value

Returns

dtype(value), dtype [‘int’, ‘float’, ‘bool’, ‘str’]
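The casting rules above can be sketched as follows; this illustrates the documented behaviour, not sdata's implementation (the handling of integer-valued strings is an assumption):

```python
def guess_dtype_from_value(value):
    """Return (cast value, dtype name) following the documented examples."""
    if isinstance(value, bool):      # bool first: isinstance(True, int) is also True
        return value, "bool"
    if isinstance(value, int):
        return value, "int"
    if isinstance(value, float):
        return value, "float"
    try:
        f = float(value)             # '1.23' -> 1.23
        if f.is_integer() and "." not in str(value):
            return int(f), "int"     # assumption: '2' -> int
        return f, "float"
    except (TypeError, ValueError):
        return str(value), "str"     # 'otto1.23' stays a string
```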

guess_value_dtype()[source]

try to cast the Attribute values, e.g. str -> float

Returns

is_complete()[source]

check all required attributes

items()[source]
Returns

list of Attribute items (keys, values)

keys()[source]
Returns

list of Attribute names

property name

Name of the Metadata

relabel(name, newname)[source]

relabel Attribute

Parameters
  • name – old attribute name

  • newname – new attribute name

Returns

None

property required_attributes
property sdata_attributes
property sdf

create dataframe for sdata attributes

property sdft

create transposed dataframe for sdata attributes

set_attr(name='N.N.', value=None, **kwargs)[source]

set Attribute

set_unit_from_name(add_description=True, fix_name=True)[source]

try to extract unit from attribute name

Returns

property sha3_256

Return the SHA3-256 hex digest of the Metadata (32 byte digest).

Returns

hashlib.sha3_256.hexdigest()

property size

return the number of Attributes

to_csv(filepath=None, sep=',', header=False)[source]

serialize to csv

to_csv_header(prefix='#', sep=',', filepath=None)[source]

serialize to csv

to_dataframe()[source]

create dataframe

to_dict()[source]

serialize attributes to dict

to_json(filepath=None)[source]

create a json

Parameters

filepath – default None

Returns

json str

to_list()[source]

create a nested list of Attribute values

Returns

list

property udf

create dataframe for user attributes

update_from_dict(d)[source]

set attributes from dict

Parameters

d – dict

Returns

update_from_usermetadata(metadata)[source]

update user metadata from metadata

update_hash(hashobject)[source]

Update the given hash object with the metadata content.

hashobject = hashlib.sha3_256()
metadata = Metadata()
metadata.update_hash(hashobject)
hashobject.hexdigest()
Parameters

hash – hash object

Returns

hash_function().hexdigest()

property user_attributes
values()[source]
Returns

list of Attribute values

sdata.metadata.extract_name_unit(value)[source]

extract name and unit from a combined string

value: 'Target Strain Rate (1/s) '
name : 'Target Strain Rate'
unit : '1/s'

value: 'Gauge Length [mm] monkey '
name : 'Gauge Length'
unit : 'mm'

value: 'Gauge Length <mm> whatever '
name : 'Gauge Length'
unit : 'mm'
Parameters

value – string, e.g. ‘Length <mm> whatever’

Returns

name, unit
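A minimal regex sketch of this extraction, matching the three documented examples; the None fallback for strings without a unit is an assumption:

```python
import re

def extract_name_unit(value):
    """Split 'Name (unit) trailing' into ('Name', 'unit'); (), [] and <> all delimit units."""
    m = re.match(r"\s*(?P<name>[^(\[<]+)[(\[<](?P<unit>[^)\]>]+)[)\]>]", str(value))
    if m:
        return m.group("name").strip(), m.group("unit").strip()
    return str(value).strip(), None  # assumption: no unit found
```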

sdata.timestamp module

ISO 8601 date time string parsing

Basic usage:

>>> parse_date("2007-01-25T12:00:00Z")

datetime.datetime(2007, 1, 25, 12, 0, tzinfo=<iso8601.Utc …>)

MIT License

Copyright (c) 2007 - 2015 Michael Twomey

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

sdata.timestamp.FixedOffset(offset_hours, offset_minutes, name)[source]
exception sdata.timestamp.ParseError[source]

Bases: Exception

Raised when there is a problem parsing a date string

class sdata.timestamp.TimeStamp(datetimestr=None)[source]

Bases: object

2017-04-26T09:04:00.660000+00:00

property local

returns the datetime isoformat string for the local timezone

Returns

str

property utc

returns the utc isoformat string

Returns

str

sdata.timestamp.parse_date(datestring, default_timezone=datetime.timezone.utc)[source]

Parses ISO 8601 dates into datetime objects

The timezone is parsed from the date string. However it is quite common to have dates without a timezone (not strictly correct). In this case the default timezone specified in default_timezone is used. This is UTC by default.

Parameters
  • datestring – The date to parse as a string

  • default_timezone – A datetime tzinfo instance to use when no timezone is specified in the datestring. If this is set to None then a naive datetime object is returned.

Returns

A datetime.datetime instance

Raises

ParseError when there is a problem parsing the date or constructing the datetime instance.
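For many inputs the stdlib covers the same ground as the bundled iso8601 parser; a rough stand-in, not the bundled code (the 'Z' replacement works around datetime.fromisoformat versions that reject the suffix):

```python
from datetime import datetime, timezone

def parse_date(datestring, default_timezone=timezone.utc):
    """Parse an ISO 8601 string; attach default_timezone when none is given."""
    dt = datetime.fromisoformat(datestring.replace("Z", "+00:00"))
    if dt.tzinfo is None and default_timezone is not None:
        dt = dt.replace(tzinfo=default_timezone)
    return dt

dt = parse_date("2007-01-25T12:00:00Z")
```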

sdata.tools module

sdata.tools.col_to_idx(col)[source]

convert xls column to index

Parameters

col – column, e.g. ‘AA’

Returns

index, e.g. 26
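With ‘AA’ -> 26 the mapping is a zero-based base-26 conversion; a sketch consistent with that example:

```python
def col_to_idx(col):
    """Convert an Excel column label to a zero-based index: 'A' -> 0, 'Z' -> 25, 'AA' -> 26."""
    idx = 0
    for char in col.upper():
        idx = idx * 26 + (ord(char) - ord("A") + 1)  # bijective base-26 digit
    return idx - 1
```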

sdata.workbook module

class sdata.workbook.Workbook(**kwargs)[source]

Bases: Data

Workbook with Sheets

Warning

highly experimental

add_sheet(data)[source]

add a sheet to the Workbook

Parameters

data

Returns

sdata.Data

create_sheet(name)[source]

create a new sheet in the Workbook

Parameters

name – sheet name

Returns

sdata.Data

classmethod from_hdf5(filepath, **kwargs)[source]

import sdata.Data from hdf5

Parameters

filepath

Returns

sdata.Data

get(name, default=None)[source]
get_sheet(name, default=None)[source]

get Sheet from Workbook

Parameters

name

Returns

sheet data

items()[source]
Returns

list of Attribute items (keys, values)

keys()[source]
Returns

list of Attribute names

classmethod metadata_from_hdf5(filepath, **kwargs)[source]

import sdata.Data.Metadata from hdf5

Parameters

filepath

Returns

sdata.Data

property sheetnames

return sheet names of the workbook

property sheets

all sheets of the workbook

to_hdf5(filepath, **kwargs)[source]

export sdata.Data to hdf5

Parameters
  • filepath

  • complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]

  • complevel – default=9 [0-9]

Returns

values()[source]
Returns

list of Attribute values

Module contents

class sdata.Blob(**kwargs)[source]

Bases: Data

Binary Large Object as reference

Warning

highly experimental

VAULT_TYPES = ['filesystem', 'hdf5', 'db', 'www']
exists(vault='filesystem')[source]

Test whether an object under the blob.url exists.

Parameters

vault

Returns

property md5

calculate the md5 hash of the blob

Returns

md5

property sha1

calculate the sha1 hash of the blob

Returns

sha1

update_hash(fh, hashobject, buffer_size=65536)[source]

Update the given hash object with the blob content, read from the file handle in chunks of buffer_size bytes.

hashobject = hashlib.md5()
df = pd.DataFrame([1,2,3])
url = "/tmp/blob.csv"
df.to_csv(url)
blob = sdata.Blob(url=url)
fh = open(url, "rb")
blob.update_hash(fh, hashobject)
hashobject.hexdigest()
Parameters
  • fh – file handle

  • hashobject – hash object, e.g. hashlib.sha1()

  • buffer_size – buffer size (default buffer_size=65536)

Returns

hashobject

property url

url of the blob

class sdata.Data(**kwargs)[source]

Bases: object

Base sdata object

ATTR_NAMES = []
SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
SDATA_CLASS = '!sdata_class'
SDATA_CTIME = '!sdata_ctime'
SDATA_MTIME = '!sdata_mtime'
SDATA_NAME = '!sdata_name'
SDATA_PARENT = '!sdata_parent'
SDATA_PROJECT = '!sdata_project'
SDATA_UUID = '!sdata_uuid'
SDATA_VERSION = '!sdata_version'
add_data(data)[source]

add data, if data.name is unique

property asciiname
static clear_folder(path)[source]

delete subfolder in export folder

Parameters

path – path

Returns

None

clear_group()[source]

clear group dict

copy(**kwargs)[source]

create a copy of the Data object

data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data  {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["sdata_parent"]))
data  38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
Returns

Data

describe()[source]

Generate descriptive info of the data

df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name',
            uuid='38b26864e7794f5182d38459bab85842',
            table=df,
            description="A remarkable description")
data.describe()
                0
metadata        3
table_rows      3
table_columns   1
description    24
Returns

pd.DataFrame

property description

description of the object

description_from_df(df)[source]

set description from DataFrame of lines

Returns

description_to_df()[source]

get description as DataFrame

Returns

DataFrame of description lines

property df

table object (pandas.DataFrame)

dir()[source]

returns a nested list of all child objects

Returns

list of sdata.Data objects

property filename
classmethod from_csv(s=None, filepath=None, sep=';')[source]

import sdata.Data from csv

Parameters
  • s – csv str

  • filepath

  • sep – separator (default=”;”)

Returns

sdata.Data

classmethod from_folder(path)[source]

create a sdata.Data instance from a folder

Parameters

path

Returns

classmethod from_hdf5(filepath, **kwargs)[source]

import sdata.Data from hdf5

Parameters

filepath

Returns

sdata.Data

classmethod from_json(s=None, filepath=None)[source]

create Data from json str or file

Parameters
  • s – json str

  • filepath

Returns

sdata.Data

classmethod from_sqlite(filepath, **kwargs)[source]

import sdata.Data from sqlite

Parameters
  • filepath

  • kwargs

Returns

sdata.Data

classmethod from_url(url=None, stype=None)[source]

create Data from a url

Parameters
  • url – url

  • stype – “json” (“xlsx”, “csv”)

Returns

sdata.Data

classmethod from_xlsx(filepath)[source]

import sdata.Data from xlsx

Parameters

filepath

Returns

gen_uuid()[source]

generate new uuid string

Returns

str, e.g. ‘5fa04a3738e4431dbc34eccea5e795c4’

gen_uuid_from_state()[source]

generate the same uuid for the same data

Returns

uuid

get_data_by_name(name)[source]

get data object by name

get_data_by_uuid(uid)[source]

get data by uuid


get_group()[source]
property group

get group

items()[source]

get all child objects

Returns

[(child uuid, child objects), ]

keys()[source]

get all child objects uuids

Returns

list of uuid’s

classmethod metadata_from_hdf5(filepath, **kwargs)[source]

import sdata.Data.Metadata from hdf5

Parameters

filepath

Returns

sdata.Data

property name

name of the object

property osname
Returns

os compatible name (ascii?)

property prefix

prefix of the object name

property project

name of the project

refactor(fix_columns=True, add_table_metadata=True)[source]

helper function

  • to cleanup dataframe column name

  • to define Attributes for all dataframe columns

property sha3_256

Return the SHA3-256 hash of the sData object (32 byte digest).

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property sha3_256_table

Return the SHA3-256 hash of the sData.table object (32 byte digest).

sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table

'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
Returns

hashlib.sha3_256.hexdigest()

property table

table object (pandas.DataFrame)

to_csv(filepath=None)[source]

export sdata.Data to csv

Parameters

filepath

Returns

to_folder(path, dtype='csv')[source]

export data to folder

Parameters
  • path

  • dtype

Returns

to_hdf5(filepath, **kwargs)[source]

export sdata.Data to hdf5

Parameters
  • filepath

  • complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]

  • complevel – default=9 [0-9]

Returns

to_html(filepath, xlsx=True, style=None)[source]

export Data to html

Parameters
  • filepath

  • xlsx

  • style

Returns

to_json(filepath=None)[source]

export Data in json format

Parameters

filepath – export file path (default:None)

Returns

json str

to_sqlite(filepath, **kwargs)[source]

export sdata.Data to sqlite

Parameters
  • filepath

  • kwargs

Returns

to_xlsx(filepath=None)[source]

export attributes and data to excel

Parameters

filepath

Returns

to_xlsx_base64()[source]

get xlsx as byteio base64 encoded

Returns

base64

to_xlsx_byteio()[source]

get xlsx as byteio

Returns

BytesIO

tree_folder(dir, padding='  ', print_files=True, hidden_files=False, last=True)[source]

print tree folder structure

update_hash(hashobject)[source]

Update the given hash object with the state of the Data object.

data = sdata.Data()

md5 = hashlib.md5()
data.update_hash(md5)
md5.hexdigest()
'bbf323bdcb0bf961803b5504a8a60d69'

sha1 = hashlib.sha1()
data.update_hash(sha1)
sha1.hexdigest()
'3c59368c7735c1ecaf03ebd4c595bb6e73e90f0c'

hashobject = hashlib.sha3_256()
data.update_hash(hashobject).hexdigest()
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'

data.update_hash(hashobject).digest()
b'M8...'
Parameters

hash – hash object, e.g. hashlib.sha1()

Returns

hash

update_mtime()[source]

update modification time

Returns

property uuid

uuid of the object

values()[source]

get all child objects

Returns

list of child objects

verify_attributes()[source]

check mandatory attributes

Indices and tables