sdata documentation
Motivation
Dealing with experimental and simulation data is very often a pain because no standard data format exists.
Every lab, and even worse, every staff member uses a different data format to store the data measured during an experimental study. Sometimes the raw data is given as undocumented csv files, sometimes the labs provide a bunch of Excel files with strange table layouts. This often results in an incomplete data set.
To predict a system's behavior, the necessary simulation tasks depend on these experimental results in order to calibrate the simulation models. Very often the link to the raw experimental data is broken because only some key results from the experiments are used.
The aim of this project is to fill this gap by providing an open, self-describing data structure to store results from experiments and simulations in the same manner.
It should be easy to define a standard data format for a particular experimental test setup based on the sdata environment within a specific project. Furthermore, the data should remain readable in the future.
https://gitpitch.com/lepy/sdata/master?grs=github&t=beige#
Design goals
open data format for open science projects
self describing data
flexible data structure layout
hierarchical data structure (nesting groups, dictionaries)
(posix path syntax support?)
extendable data structure
platform independent
simple object model
support of standard metadata formats (key/value, …)
support of standard dataset formats (hdf5, netcdf, csv, …)
support of standard dataset types (datacubes, tables, series, …)
support of physical units (conversion of units)
transparent, optional data compression (zlib, blosc, …)
support of (de-)serialization of every dataset type (group, data, metadata)
easily definable (project) standards, e.g. for a uniaxial tension test (UT)
(optional data encryption (gpg, …))
change management support?
Enable use of data structures from existing tensor libraries transparently
(single writer/ multiple reader (swmr) support)
(nested table support)
Data types
Metadata
Metadata describes EVERY object within the data structure.
Attributes
name
.. name of the attribute (str)
value
.. value of the attribute
dtype
.. data type of the attribute (default: str)
unit
.. physical unit of the attribute (optional)
description
.. a description of the attribute (optional)
label
.. a fancy label of the attribute, e.g. for plotting (optional)
required
.. a boolean flag for attribute checks (optional)
import sdata
attribute1 = sdata.Attribute("color", "blue")
attribute1
(Attr'color':blue(str))
attribute2 = sdata.Attribute(name="answer",
                             value=42,
                             dtype="int",
                             unit="-",
                             description="""The Answer to the Ultimate Question of Life, The Universe, and Everything""",
                             label="Die Antwort")
attribute2.to_dict()
{'name': 'answer',
'value': 42,
'unit': '-',
'dtype': 'int',
'description': 'The Answer to the Ultimate Question of Life, The Universe, and Everything',
'label': 'Die Antwort'}
dtypes for attributes
int (int64)
float (float32, float64, float128)
str (unicode)
bool
timestamp (datetime.isoformat with timezone)
uuid (planned)
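As a rough sketch, these dtype strings map onto Python types as follows (the timestamp entry is simplified to datetime.fromisoformat here; sdata actually uses its own TimeStamp class):

```python
from datetime import datetime

# Mirror of the dtype table above; "timestamp" is simplified to the
# stdlib ISO 8601 parser for illustration.
DTYPES = {
    "int": int,
    "float": float,
    "str": str,
    "bool": bool,
    "timestamp": datetime.fromisoformat,
}

def coerce(value, dtype="str"):
    """Cast a raw attribute value to its declared dtype."""
    return DTYPES[dtype](value)

print(coerce("42", "int"))     # 42
print(coerce("3.5", "float"))  # 3.5
print(coerce("2020-12-08T12:00:00+01:00", "timestamp").year)  # 2020
```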
sdata.metadata
metadata = sdata.Metadata()
metadata.add(attribute1)
metadata.add(attribute2)
print(metadata)
metadata.df
name value dtype unit description label
key
sdata_version sdata_version 0.8.4 str -
color color blue str -
answer answer 42 int - The Answer to the Ultimate Question of Life, T... Die Antwort
data = sdata.Data(name="basic example", uuid="38b26864e7794f5182d38459bab85842", table=df)
data.metadata.add("Temperatur",
                  value=25.4,
                  dtype="float",
                  unit="degC",
                  description="Temperatur",
                  label="Temperatur T [°C]")
data.metadata.df
name value dtype unit description label
key
sdata_version sdata_version 0.8.4 str -
name name basic example str -
uuid uuid 38b26864e7794f5182d38459bab85842 str -
Temperatur Temperatur 25.4 float degC Temperatur Temperatur T [°C]
Core data types
Data
The Data class is the base class for all classes within the sdata family. It provides a uuid, a name, and the metadata functionality.
It can group other Data objects. A Data object can store one pandas.DataFrame.
import sdata
data = sdata.Data(name="my data name", table=df, description="my data description")
import pandas as pd

df = pd.DataFrame({"time": [1.1, 2.1, 3.5],
                   "temperature": [2.4, 5.2, 2.2]})

data_name = "Temperaturmessung-001"
data = sdata.Data(name=data_name,
                  uuid=sdata.uuid_from_str(data_name),
                  table=df,
                  description="Messergebnis Temperatur")
data.metadata.add("time",
                  value=None,
                  dtype="float",
                  unit="s",
                  description="Zeitachse",
                  label="time $t$")
data.metadata.add("temperature",
                  value=None,
                  dtype="float",
                  unit="°C",
                  description="Temperaturachse",
                  label="temperature $T$")
data.describe()
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

x_var = "time"
y_var = "temperature"

x_attr = data.metadata.get(x_var)
y_attr = data.metadata.get(y_var)

ax.plot(data.df[x_var], data.df[y_var], label=data.name)
ax.legend(loc="best")
ax.set_xlabel("{0.label} [{0.unit}]".format(x_attr))
ax.set_ylabel("{0.label} [{0.unit}]".format(y_attr))
print("plot")
Paper
Ingolf Lepenies. (2020). Das sdata-Format (Version 0.8.4). http://doi.org/10.5281/zenodo.4311323
Examples
Tensile Test Data: [High strain rate data for Medium Mn 2 (10 Wt. % Mn) Steel](https://gist.github.com/lepy/fdbdce805b206322d8013a1375da0eb9),
Creative Commons license
Usage examples
Dump and load a pandas dataframe
import logging
logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.WARNING, datefmt='%I:%M:%S')
import os
import sys
import numpy as np
import pandas as pd
import sdata
print("sdata v{}".format(sdata.__version__))
# ## create a dataframe
df = pd.DataFrame({"a":[1.,2.,3.], "b":[4.,6.,1.]})
# ## create a Data object
data = sdata.Data(name="df",
uuid=sdata.uuid_from_str("df"),
table=df,
description="a pandas dataframe",)
# ## dump the data
# ### Excel IO
data.to_xlsx(filepath="/tmp/data1.xlsx")
data_xlsx = sdata.Data.from_xlsx(filepath="/tmp/data1.xlsx")
assert data.name==data_xlsx.name
assert data.uuid==data_xlsx.uuid
assert data.description==data_xlsx.description
print(data_xlsx)
data_xlsx.df
# ### Hdf5 IO
data.to_hdf5(filepath="/tmp/data1.hdf")
data_hdf5 = sdata.Data.from_hdf5(filepath="/tmp/data1.hdf")
assert data.name==data_hdf5.name
assert data.uuid==data_hdf5.uuid
assert data.description==data_hdf5.description
print(data_hdf5)
data_hdf5.df
# ### Json IO
data.to_json(filepath="/tmp/data1.json")
data_json = sdata.Data.from_json(filepath="/tmp/data1.json")
assert data.name==data_json.name
assert data.uuid==data_json.uuid
assert data.description==data_json.description
print(data_json)
data_json.df
# ### csv IO
data.to_csv(filepath="/tmp/data1.csv")
data_csv = sdata.Data.from_csv(filepath="/tmp/data1.csv")
assert data.name==data_csv.name
assert data.uuid==data_csv.uuid
# assert data.description==data_csv.description
assert data.df.shape == data_csv.df.shape
print(data_csv)
data_csv.df
# ### html export
data.to_html(filepath="/tmp/data1.html")
Components of sdata.Data
# ## create a Data object
data = sdata.Data(name="df",
table=pd.DataFrame({"a":[1.,2.,3.], "b":[4.,6.,1.]}),
description="a pandas dataframe",)
# Metadata
data.metadata
# Dataframe
data.df
# Description
data.description
sdata api
- class sdata.Blob(**kwargs)[source]
Bases:
Data
Binary Large Object as reference
Warning
highly experimental
- ATTR_NAMES = []
- SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
- SDATA_CLASS = '!sdata_class'
- SDATA_CTIME = '!sdata_ctime'
- SDATA_MTIME = '!sdata_mtime'
- SDATA_NAME = '!sdata_name'
- SDATA_PARENT = '!sdata_parent'
- SDATA_PROJECT = '!sdata_project'
- SDATA_UUID = '!sdata_uuid'
- SDATA_VERSION = '!sdata_version'
- VAULT_TYPES = ['filesystem', 'hdf5', 'db', 'www']
- add_data(data)
add data, if data.name is unique
- property asciiname
- static clear_folder(path)
delete subfolder in export folder
- Parameters
path – path
- Returns
None
- clear_group()
clear group dict
- copy(**kwargs)
create a copy of the Data object
data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["!sdata_parent"]))
data 38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
- Returns
Data
- describe()
Generate descriptive info of the data
df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name', uuid='38b26864e7794f5182d38459bab85842', table=df, description="A remarkable description")
data.describe()
0 metadata 3 table_rows 3 table_columns 1 description 24
- Returns
pd.DataFrame
- property description
description of the object
- description_from_df(df)
set description from DataFrame of lines
- Returns
- description_to_df()
get description as DataFrame
- Returns
DataFrame of description lines
- property df
table object(pandas.DataFrame)
- dir()
returns a nested list of all child objects
- Returns
list of sdata.Data objects
- exists(vault='filesystem')[source]
Test whether an object under the blob.url exists.
- Parameters
vault –
- Returns
- property filename
- classmethod from_csv(s=None, filepath=None, sep=';')
import sdata.Data from csv
- Parameters
s – csv str
filepath –
sep – separator (default=”;”)
- Returns
sdata.Data
- classmethod from_folder(path)
create an sdata object instance from a folder
- Parameters
path –
- Returns
- classmethod from_hdf5(filepath, **kwargs)
import sdata.Data from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- classmethod from_json(s=None, filepath=None)
create Data from json str or file
- Parameters
s – json str
filepath –
- Returns
sdata.Data
- classmethod from_sqlite(filepath, **kwargs)
import sdata.Data from sqlite
- Parameters
filepath –
kwargs –
- Returns
sdata.Data
- classmethod from_url(url=None, stype=None)
create Data from a url (json, xlsx, or csv)
- Parameters
url – url
stype – “json” (“xlsx”, “csv”)
- Returns
sdata.Data
- classmethod from_xlsx(filepath)
import sdata.Data from xlsx
- Parameters
filepath –
- Returns
- gen_uuid()
generate new uuid string
- Returns
str, e.g. ‘5fa04a3738e4431dbc34eccea5e795c4’
- gen_uuid_from_state()
generate the same uuid for the same data
- Returns
uuid
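sdata.uuid_from_str appears throughout the examples; a name-based, reproducible uuid of this kind can be sketched with the stdlib uuid5 (the namespace constant below is an arbitrary assumption for illustration, not sdata's actual namespace):

```python
import uuid

# Hypothetical namespace; sdata's real uuid_from_str may use a different one.
SDATA_NAMESPACE = uuid.UUID("38b26864-e779-4f51-82d3-8459bab85842")

def uuid_from_str(name):
    """Derive a stable 32-character hex uuid from a string (uuid5, SHA-1 based)."""
    return uuid.uuid5(SDATA_NAMESPACE, name).hex

# the same name always yields the same uuid
assert uuid_from_str("df") == uuid_from_str("df")
assert uuid_from_str("df") != uuid_from_str("df2")
print(uuid_from_str("df"))
```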
- get_data_by_name(name)
get data by name
- get_data_by_uuid(uid)
get data by uuid
- get_download_link()
Generate a link allowing the data in a given pandas dataframe to be downloaded. in: dataframe, out: href string
- get_group()
- property group
get group
- items()
get all child objects
- Returns
[(child uuid, child objects), ]
- keys()
get all child objects uuids
- Returns
list of uuid’s
- property md5
calculate the md5 hash of the blob
- Returns
md5
- classmethod metadata_from_hdf5(filepath, **kwargs)
import sdata.Data.Metadata from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- property name
name of the object
- property osname
- Returns
os compatible name (ascii?)
- property prefix
prefix of the object name
- property project
name of the project
- refactor(fix_columns=True, add_table_metadata=True)
helper function
to cleanup dataframe column name
to define Attributes for all dataframe columns
- property sha1
calculate the sha1 hash of the blob
- Returns
sha1
- property sha3_256
Return a SHA3 hash of the sData object with a hashbit length of 32 bytes.
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256 'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property sha3_256_table
Return a SHA3 hash of the sData.table object with a hashbit length of 32 bytes.
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table 'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
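For illustration, a table hash of this kind can be sketched by feeding a canonical serialization of the dataframe into hashlib (the exact canonical form sdata hashes internally is an assumption here):

```python
import hashlib
import pandas as pd

def sha3_256_table(df):
    # Hash a canonical CSV serialization of the table content;
    # sdata may canonicalize differently internally.
    return hashlib.sha3_256(df.to_csv(index=False).encode("utf-8")).hexdigest()

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
print(sha3_256_table(df))  # 64 hex characters, stable for identical content
assert sha3_256_table(df) == sha3_256_table(df.copy())
```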
- property table
table object(pandas.DataFrame)
- to_csv(filepath=None)
export sdata.Data to csv
- Parameters
filepath –
- Returns
- to_folder(path, dtype='csv')
export data to folder
- Parameters
path –
dtype –
- Returns
- to_hdf5(filepath, **kwargs)
export sdata.Data to hdf5
- Parameters
filepath –
complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]
complevel – default=9 [0-9]
- Returns
- to_html(filepath, xlsx=True, style=None)
export Data to html
- Parameters
filepath –
xlsx –
style –
- Returns
- to_json(filepath=None)
export Data in json format
- Parameters
filepath – export file path (default:None)
- Returns
json str
- to_sqlite(filepath, **kwargs)
export sdata.Data to sqlite
- Parameters
filepath –
kwargs –
- Returns
- to_xlsx(filepath=None)
export attributes and data to excel
- Parameters
filepath –
- Returns
- to_xlsx_base64()
get xlsx as byteio base64 encoded
- Returns
base64
- to_xlsx_byteio()
get xlsx as byteio
- Returns
BytesIO
- tree_folder(dir, padding=' ', print_files=True, hidden_files=False, last=True)
print tree folder structure
- update_hash(fh, hashobject, buffer_size=65536)[source]
A hash represents the object used to calculate a checksum of a string of information.
hashobject = hashlib.md5()
df = pd.DataFrame([1,2,3])
url = "/tmp/blob.csv"
df.to_csv(url)
blob = sdata.Blob(url=url)
fh = open(url, "rb")
blob.update_hash(fh, hashobject)
hashobject.hexdigest()
- Parameters
fh – file handle
hashobject – hash object, e.g. hashlib.sha1()
buffer_size – buffer size (default buffer_size=65536)
- Returns
hashobject
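The chunked update pattern behind this method can be sketched with the stdlib alone; reading in fixed-size buffers means arbitrarily large blobs never have to fit in memory:

```python
import hashlib
import io

def update_hash(fh, hashobject, buffer_size=65536):
    """Feed a binary file handle into a hash object chunk by chunk."""
    while True:
        chunk = fh.read(buffer_size)
        if not chunk:
            break
        hashobject.update(chunk)
    return hashobject

# works the same for on-disk files and in-memory streams
digest = update_hash(io.BytesIO(b"x" * 200000), hashlib.sha1()).hexdigest()
print(digest)
```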
- update_mtime()
update modification time
- Returns
- property url
url of the blob
- property uuid
uuid of the object
- values()
get all child objects
- Returns
list of child objects
- verify_attributes()
check mandatory attributes
- class sdata.Data(**kwargs)[source]
Bases:
object
Base sdata object
- ATTR_NAMES = []
- SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
- SDATA_CLASS = '!sdata_class'
- SDATA_CTIME = '!sdata_ctime'
- SDATA_MTIME = '!sdata_mtime'
- SDATA_NAME = '!sdata_name'
- SDATA_PARENT = '!sdata_parent'
- SDATA_PROJECT = '!sdata_project'
- SDATA_UUID = '!sdata_uuid'
- SDATA_VERSION = '!sdata_version'
- property asciiname
- static clear_folder(path)[source]
delete subfolder in export folder
- Parameters
path – path
- Returns
None
- copy(**kwargs)[source]
create a copy of the Data object
data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["!sdata_parent"]))
data 38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
- Returns
Data
- describe()[source]
Generate descriptive info of the data
df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name', uuid='38b26864e7794f5182d38459bab85842', table=df, description="A remarkable description")
data.describe()
0 metadata 3 table_rows 3 table_columns 1 description 24
- Returns
pd.DataFrame
- property description
description of the object
- property df
table object(pandas.DataFrame)
- property filename
- classmethod from_csv(s=None, filepath=None, sep=';')[source]
import sdata.Data from csv
- Parameters
s – csv str
filepath –
sep – separator (default=”;”)
- Returns
sdata.Data
- classmethod from_hdf5(filepath, **kwargs)[source]
import sdata.Data from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- classmethod from_json(s=None, filepath=None)[source]
create Data from json str or file
- Parameters
s – json str
filepath –
- Returns
sdata.Data
- classmethod from_sqlite(filepath, **kwargs)[source]
import sdata.Data from sqlite
- Parameters
filepath –
kwargs –
- Returns
sdata.Data
- classmethod from_url(url=None, stype=None)[source]
create Data from a url (json, xlsx, or csv)
- Parameters
url – url
stype – “json” (“xlsx”, “csv”)
- Returns
sdata.Data
- get_download_link()[source]
Generate a link allowing the data in a given pandas dataframe to be downloaded. in: dataframe, out: href string
- property group
get group
- classmethod metadata_from_hdf5(filepath, **kwargs)[source]
import sdata.Data.Metadata from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- property name
name of the object
- property osname
- Returns
os compatible name (ascii?)
- property prefix
prefix of the object name
- property project
name of the project
- refactor(fix_columns=True, add_table_metadata=True)[source]
helper function
to cleanup dataframe column name
to define Attributes for all dataframe columns
- property sha3_256
Return a SHA3 hash of the sData object with a hashbit length of 32 bytes.
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256 'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property sha3_256_table
Return a SHA3 hash of the sData.table object with a hashbit length of 32 bytes.
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table 'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property table
table object(pandas.DataFrame)
- to_hdf5(filepath, **kwargs)[source]
export sdata.Data to hdf5
- Parameters
filepath –
complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]
complevel – default=9 [0-9]
- Returns
- to_html(filepath, xlsx=True, style=None)[source]
export Data to html
- Parameters
filepath –
xlsx –
style –
- Returns
- to_json(filepath=None)[source]
export Data in json format
- Parameters
filepath – export file path (default:None)
- Returns
json str
- to_sqlite(filepath, **kwargs)[source]
export sdata.Data to sqlite
- Parameters
filepath –
kwargs –
- Returns
- tree_folder(dir, padding=' ', print_files=True, hidden_files=False, last=True)[source]
print tree folder structure
- update_hash(hashobject)[source]
A hash represents the object used to calculate a checksum of a string of information.
data = sdata.Data()
md5 = hashlib.md5()
data.update_hash(md5)
md5.hexdigest()
'bbf323bdcb0bf961803b5504a8a60d69'
sha1 = hashlib.sha1()
data.update_hash(sha1)
sha1.hexdigest()
'3c59368c7735c1ecaf03ebd4c595bb6e73e90f0c'
hashobject = hashlib.sha3_256()
data.update_hash(hashobject).hexdigest()
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
data.update_hash(hashobject).digest()
b'M8...'
- Parameters
hash – hash object, e.g. hashlib.sha1()
- Returns
hash
- property uuid
uuid of the object
- class sdata.metadata.Attribute(name, value, **kwargs)[source]
Bases:
object
Attribute class
- DTYPES = {'bool': <class 'bool'>, 'float': <class 'float'>, 'int': <class 'int'>, 'str': <class 'str'>, 'timestamp': <class 'sdata.timestamp.TimeStamp'>}
- property description
Attribute description
- property dtype
Attribute type str
- property label
Attribute label
- property name
Attribute name
- property required
Attribute required
- to_csv(prefix='', sep=',', quote=None)[source]
export Attribute to csv
- Parameters
prefix –
sep –
quote –
- Returns
- property unit
Attribute unit
- property value
Attribute value
- class sdata.metadata.Metadata(**kwargs)[source]
Bases:
object
Metadata container class
- each Metadata entry has a
name (256)
value
unit
description
type (int, str, float, bool, timestamp)
- ATTRIBUTEKEYS = ['name', 'value', 'dtype', 'unit', 'description', 'label', 'required']
- property attributes
returns Attributes
- property df
create dataframe
- classmethod from_json(jsonstr=None, filepath=None)[source]
create metadata from json file
- Parameters
jsonstr – json str
filepath – filepath to json file
- Returns
Metadata
- classmethod from_list(mlist)[source]
create metadata from a list of Attribute values
- [[‘force_x’, 1.2, ‘float’, ‘kN’, ‘force in x-direction’],
[‘force_y’, 3.1, ‘float’, ‘N’, ‘force in y-direction’, ‘label’, True]]
- static guess_dtype_from_value(value)[source]
guess dtype from value, e.g. ‘1.23’ -> ‘float’ ‘otto1.23’ -> ‘str’ 1 -> ‘int’ False -> ‘bool’
- Parameters
value –
- Returns
dtype(value), dtype [‘int’, ‘float’, ‘bool’, ‘str’]
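A minimal re-implementation of this guessing logic, for illustration only (sdata's own code may differ in details):

```python
def guess_dtype_from_value(value):
    """Return (converted_value, dtype_str), trying bool/int/float before str."""
    if isinstance(value, bool):
        return value, "bool"
    if isinstance(value, int):
        return value, "int"
    if isinstance(value, float):
        return value, "float"
    # strings: try numeric parsing before falling back to str
    try:
        return int(value), "int"
    except (TypeError, ValueError):
        pass
    try:
        return float(value), "float"
    except (TypeError, ValueError):
        pass
    return value, "str"

print(guess_dtype_from_value("1.23"))      # (1.23, 'float')
print(guess_dtype_from_value("otto1.23"))  # ('otto1.23', 'str')
print(guess_dtype_from_value(1))           # (1, 'int')
print(guess_dtype_from_value(False))       # (False, 'bool')
```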
- property name
Name of the Metadata
- relabel(name, newname)[source]
relabel Attribute
- Parameters
name – old attribute name
newname – new attribute name
- Returns
None
- property required_attributes
- property sdata_attributes
- property sdf
create dataframe for sdata attributes
- property sdft
create transposed dataframe for sdata attributes
- set_unit_from_name(add_description=True, fix_name=True)[source]
try to extract unit from attribute name
- Returns
- property sha3_256
Return a new SHA3 hash object with a hashbit length of 32 bytes.
- Returns
hashlib.sha3_256.hexdigest()
- property size
return the number of Attributes
- property udf
create dataframe for user attributes
- update_hash(hashobject)[source]
A hash represents the object used to calculate a checksum of a string of information.
hashobject = hashlib.sha3_256()
metadata = Metadata()
metadata.update_hash(hashobject)
hashobject.hexdigest()
- Parameters
hash – hash object
- Returns
hash_function().hexdigest()
- property user_attributes
- sdata.metadata.extract_name_unit(value)[source]
extract name and unit from a combined string
value: 'Target Strain Rate (1/s) ' -> name: 'Target Strain Rate', unit: '1/s'
value: 'Gauge Length [mm] monkey ' -> name: 'Gauge Length', unit: 'mm'
value: 'Gauge Length <mm> whatever ' -> name: 'Gauge Length', unit: 'mm'
- Parameters
value – string, e.g. ‘Length <mm> whatever’
- Returns
name, unit
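The extraction rule can be sketched with a regular expression; this is an illustrative re-implementation, not sdata's actual code:

```python
import re

def extract_name_unit(value):
    """Split a combined header like 'Gauge Length [mm] whatever' into
    (name, unit); units may be wrapped in (), [] or <> brackets."""
    match = re.search(r"^([^([<]+)[\(\[<]([^)\]>]+)[\)\]>]", value)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return value.strip(), None

print(extract_name_unit("Target Strain Rate (1/s) "))    # ('Target Strain Rate', '1/s')
print(extract_name_unit("Gauge Length [mm] monkey "))    # ('Gauge Length', 'mm')
print(extract_name_unit("Gauge Length <mm> whatever "))  # ('Gauge Length', 'mm')
```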
Further Reading
sdata package
Subpackages
sdata.experiments package
Submodules
sdata.experiments.ks2 module
sdata.experiments.lapshear module
sdata.experiments.material module
Module contents
sdata.io package
Submodules
sdata.io.hdf module
- class sdata.io.hdf.FlatHDFDataStore(filepath, **kwargs)[source]
Bases:
object
Flat HDF5 Store
store = FlatHDFDataStore(filepath="/tmp/mystore.h5")
data = sdata.Data(name="otto",
                  uuid="d4e97cedca6238bea16732ce88c1922f",
                  table=pd.DataFrame({"a": [1, 2, 3]}),
                  description="Hallo Spencer")
store.put(data)
loaded_data = store.get_data_by_uuid("d4e97cedca6238bea16732ce88c1922f")
assert data.sha3_256 == loaded_data.sha3_256
sdata.io.pgp module
sdata.io.pud module
- class sdata.io.pud.Pud(**kwargs)[source]
Bases:
Data
run object, e.g. single tension test simulation
- ATTRIBUTES = ['material_norm_name', 'material_number_norm', 'material_name', 'test', 'sample_ident_number', 'sample_geometry', 'sample_direction', 'nominal_pre_deformation <%>', 'actual_pre_deformation <%>', 'direction_of_pre_deformation', 'heat_treatment', 'actual_sample_width_<mm>', 'actual_sample_thickness_<mm>', 'actual_gauge_length_<mm>', 'nominal_testing_temperature_<K>', 'nominal_testing_speed_<m/s>', 'order', 'date_of_test_<dd.mm.yyyy>', 'tester', 'place_of_test', 'remark', 'data']
- classmethod from_file(filepath)[source]
read pud file
WERKSTOFF_NORM_NAME = HC340LA
WERKSTOFFNUMMER_NORM =
MATERIALNAME = HC340LA
PRUEFUNG = FLIESSKURVE
PROBENIDENTNUMMER = id0815
PROBENGEOMETRIE =
ENTNAHMERICHTUNG = Quer (90deg)
VORVERFORMUNG_SOLL <%> = 0
VORVERFORMUNG_IST <%> =
VORVERFORMUNGSRICHTUNG = Unverformt
WAERMEBEHANDLUNG = O
PROBENBREITE_IST <mm> = 20.014
PROBENDICKE_IST <mm> = 0.751
MESSLAENGE_IST <mm> = 80.0
MESSLAENGE_IST_FD <mm> = 80.0
PRUEFTEMPERATUR_SOLL <K> = 293
PRUEFGESCHWINDIGKEIT_SOLL <mm/s> = 0.32
PRUEFGESCHWINDIGKEIT_IST <mm/s> = 0.32
DEHNRATE_SOLL <1/s> = 0.004
DEHNRATE_IST <1/s> = 0.004
AUFTRAG =
PRUEFDATUM <tt.mm.jjjj> = 19.03.2017
PRUEFER = Otto
PRUEFSTELLE = SALZGITTER AG
BEMERKUNG = ASL 2009-056
DATEN =
ZEIT <s>; KRAFT <N>; WEG <mm>; BREITENAENDERUNG <mm>; WEG_FD <mm>
1.2372;192.181;-0.0235;0.0012;-0.0235
1.2772;198.325;-0.0231;0.0012;-0.0231
1.2972;201.397;-0.0227;0.0012;-0.0227
1.3172;205.152;-0.0224;0.0013;-0.0224
1.3572;211.638;-0.022;0.0013;-0.022
1.3972;218.123;-0.0213;0.0013;-0.0213
sdata.io.vault module
- class sdata.io.vault.FileSystemVault(rootpath, **kwargs)[source]
Bases:
Vault
data vault on the filesystem
- property index
get vault index
- Returns
- exception sdata.io.vault.FilesystemVaultException[source]
Bases:
VaultException
- class sdata.io.vault.Hdf5Vault(rootpath, **kwargs)[source]
Bases:
Vault
data vault in an hdf5 file
- exception sdata.io.vault.Hdf5VaultException[source]
Bases:
VaultException
- class sdata.io.vault.Vault(rootpath, **kwargs)[source]
Bases:
object
data vault
- INDEXFILENAME = 'index'
- OBJECTPATH = 'objects'
- property index
get vault index
- Returns
- property rootpath
- class sdata.io.vault.VaultIndex[source]
Bases:
object
Index of a Vault
- INDEXDATAFRAME = 'indexdataframe'
- property df
index dataframe
- classmethod from_hdf5(filepath)[source]
read index dataframe from hdf5
- Parameters
filepath –
- Returns
VaultIndex
- property name
index dataframe
Module contents
Submodules
sdata.blob module
- class sdata.blob.Blob(**kwargs)[source]
Bases:
Data
Binary Large Object as reference
Warning
highly experimental
- VAULT_TYPES = ['filesystem', 'hdf5', 'db', 'www']
- exists(vault='filesystem')[source]
Test whether an object under the blob.url exists.
- Parameters
vault –
- Returns
- property md5
calculate the md5 hash of the blob
- Returns
md5
- property sha1
calculate the sha1 hash of the blob
- Returns
sha1
- update_hash(fh, hashobject, buffer_size=65536)[source]
A hash represents the object used to calculate a checksum of a string of information.
hashobject = hashlib.md5()
df = pd.DataFrame([1,2,3])
url = "/tmp/blob.csv"
df.to_csv(url)
blob = sdata.Blob(url=url)
fh = open(url, "rb")
blob.update_hash(fh, hashobject)
hashobject.hexdigest()
- Parameters
fh – file handle
hashobject – hash object, e.g. hashlib.sha1()
buffer_size – buffer size (default buffer_size=65536)
- Returns
hashobject
- property url
url of the blob
sdata.data module
- class sdata.data.Data(**kwargs)[source]
Bases:
object
Base sdata object
- ATTR_NAMES = []
- SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
- SDATA_CLASS = '!sdata_class'
- SDATA_CTIME = '!sdata_ctime'
- SDATA_MTIME = '!sdata_mtime'
- SDATA_NAME = '!sdata_name'
- SDATA_PARENT = '!sdata_parent'
- SDATA_PROJECT = '!sdata_project'
- SDATA_UUID = '!sdata_uuid'
- SDATA_VERSION = '!sdata_version'
- property asciiname
- static clear_folder(path)[source]
delete subfolder in export folder
- Parameters
path – path
- Returns
None
- copy(**kwargs)[source]
create a copy of the Data object
data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["!sdata_parent"]))
data 38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
- Returns
Data
- describe()[source]
Generate descriptive info of the data
df = pd.DataFrame([1,2,3])
data = sdata.Data(name='my name', uuid='38b26864e7794f5182d38459bab85842', table=df, description="A remarkable description")
data.describe()
0 metadata 3 table_rows 3 table_columns 1 description 24
- Returns
pd.DataFrame
- property description
description of the object
- property df
table object(pandas.DataFrame)
- property filename
- classmethod from_csv(s=None, filepath=None, sep=';')[source]
import sdata.Data from csv
- Parameters
s – csv str
filepath –
sep – separator (default=”;”)
- Returns
sdata.Data
- classmethod from_hdf5(filepath, **kwargs)[source]
import sdata.Data from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- classmethod from_json(s=None, filepath=None)[source]
create Data from json str or file
- Parameters
s – json str
filepath –
- Returns
sdata.Data
- classmethod from_sqlite(filepath, **kwargs)[source]
import sdata.Data from sqlite
- Parameters
filepath –
kwargs –
- Returns
sdata.Data
- classmethod from_url(url=None, stype=None)[source]
create Data from a url (json, xlsx, or csv)
- Parameters
url – url
stype – “json” (“xlsx”, “csv”)
- Returns
sdata.Data
- get_download_link()[source]
Generate a link allowing the data in a given pandas dataframe to be downloaded. in: dataframe, out: href string
- property group
get group
- classmethod metadata_from_hdf5(filepath, **kwargs)[source]
import sdata.Data.Metadata from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- property name
name of the object
- property osname
- Returns
os compatible name (ascii?)
- property prefix
prefix of the object name
- property project
name of the project
- refactor(fix_columns=True, add_table_metadata=True)[source]
helper function
to cleanup dataframe column name
to define Attributes for all dataframe columns
- property sha3_256
Return a SHA3 hash of the sData object with a hashbit length of 32 bytes.
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256 'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property sha3_256_table
Return a SHA3 hash of the sData.table object with a hashbit length of 32 bytes.
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table 'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property table
table object(pandas.DataFrame)
- to_hdf5(filepath, **kwargs)[source]
export sdata.Data to hdf5
- Parameters
filepath –
complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]
complevel – default=9 [0-9]
- Returns
- to_html(filepath, xlsx=True, style=None)[source]
export Data to html
- Parameters
filepath –
xlsx –
style –
- Returns
- to_json(filepath=None)[source]
export Data in json format
- Parameters
filepath – export file path (default:None)
- Returns
json str
- to_sqlite(filepath, **kwargs)[source]
export sdata.Data to sqlite
- Parameters
filepath –
kwargs –
- Returns
- tree_folder(dir, padding=' ', print_files=True, hidden_files=False, last=True)[source]
print tree folder structure
- update_hash(hashobject)[source]
A hash represents the object used to calculate a checksum of a string of information.
data = sdata.Data()
md5 = hashlib.md5()
data.update_hash(md5)
md5.hexdigest()
'bbf323bdcb0bf961803b5504a8a60d69'
sha1 = hashlib.sha1()
data.update_hash(sha1)
sha1.hexdigest()
'3c59368c7735c1ecaf03ebd4c595bb6e73e90f0c'
hashobject = hashlib.sha3_256()
data.update_hash(hashobject).hexdigest()
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
data.update_hash(hashobject).digest()
b'M8...'
- Parameters
hash – hash object, e.g. hashlib.sha1()
- Returns
hash
- property uuid
uuid of the object
sdata.metadata module
- class sdata.metadata.Attribute(name, value, **kwargs)[source]
Bases:
object
Attribute class
- DTYPES = {'bool': <class 'bool'>, 'float': <class 'float'>, 'int': <class 'int'>, 'str': <class 'str'>, 'timestamp': <class 'sdata.timestamp.TimeStamp'>}
- property description
Attribute description
- property dtype
Attribute type str
- property label
Attribute label
- property name
Attribute name
- property required
Attribute required
- to_csv(prefix='', sep=',', quote=None)[source]
export Attribute to csv
- Parameters
prefix –
sep –
quote –
- Returns
- property unit
Attribute unit
- property value
Attribute value
- class sdata.metadata.Metadata(**kwargs)[source]
Bases:
object
Metadata container class
- each Metadata entry has a
name (256)
value
unit
description
type (int, str, float, bool, timestamp)
- ATTRIBUTEKEYS = ['name', 'value', 'dtype', 'unit', 'description', 'label', 'required']
- property attributes
returns Attributes
- property df
create dataframe
- classmethod from_json(jsonstr=None, filepath=None)[source]
create metadata from json file
- Parameters
jsonstr – json str
filepath – filepath to json file
- Returns
Metadata
- classmethod from_list(mlist)[source]
create metadata from a list of Attribute values
- [['force_x', 1.2, 'float', 'kN', 'force in x-direction'],
  ['force_y', 3.1, 'float', 'N', 'force in y-direction', 'label', True]]
- static guess_dtype_from_value(value)[source]
guess dtype from value, e.g.
'1.23' -> 'float'
'otto1.23' -> 'str'
1 -> 'int'
False -> 'bool'
- Parameters
value –
- Returns
dtype(value), dtype [‘int’, ‘float’, ‘bool’, ‘str’]
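A minimal reimplementation of this guessing logic (an illustrative sketch, not the actual sdata source) could look like:

```python
def guess_dtype(value):
    """Guess the sdata dtype string for a raw value; return (cast value, dtype)."""
    if isinstance(value, bool):           # check bool first: bool is a subclass of int
        return value, "bool"
    if isinstance(value, int):
        return value, "int"
    if isinstance(value, float):
        return value, "float"
    if isinstance(value, str):
        try:
            return int(value), "int"      # '42' -> 42, 'int'
        except ValueError:
            pass
        try:
            return float(value), "float"  # '1.23' -> 1.23, 'float'
        except ValueError:
            return value, "str"           # 'otto1.23' stays a string
    return value, "str"

print(guess_dtype("1.23"))     # (1.23, 'float')
print(guess_dtype("otto1.23")) # ('otto1.23', 'str')
print(guess_dtype(1))          # (1, 'int')
print(guess_dtype(False))      # (False, 'bool')
```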
- property name
Name of the Metadata
- relabel(name, newname)[source]
relabel Attribute
- Parameters
name – old attribute name
newname – new attribute name
- Returns
None
- property required_attributes
- property sdata_attributes
- property sdf
create dataframe for sdata attributes
- property sdft
create transposed dataframe for sdata attributes
- set_unit_from_name(add_description=True, fix_name=True)[source]
try to extract unit from attribute name
- Returns
- property sha3_256
Return the SHA3-256 hex digest of the Metadata (digest size 32 bytes).
- Returns
hashlib.sha3_256.hexdigest()
- property size
return the number of Attributes
- property udf
create dataframe for user attributes
- update_hash(hashobject)[source]
Update a hash object with the content of the Metadata; the hash can then be used to calculate a checksum.
hashobject = hashlib.sha3_256()
metadata = Metadata()
metadata.update_hash(hashobject)
hashobject.hexdigest()
- Parameters
hashobject – hash object, e.g. hashlib.sha3_256()
- Returns
hash_function().hexdigest()
- property user_attributes
- sdata.metadata.extract_name_unit(value)[source]
extract name and unit from a combined string
value: 'Target Strain Rate (1/s) '   -> name: 'Target Strain Rate', unit: '1/s'
value: 'Gauge Length [mm] monkey '   -> name: 'Gauge Length', unit: 'mm'
value: 'Gauge Length <mm> whatever ' -> name: 'Gauge Length', unit: 'mm'
- Parameters
value – string, e.g. ‘Length <mm> whatever’
- Returns
name, unit
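A regex-based sketch of such an extraction (hypothetical, assuming the unit is delimited by (), [] or <> and any trailing text is discarded):

```python
import re

# Unit may be wrapped in (), [] or <>; everything after the closing bracket is dropped.
_UNIT_RE = re.compile(r"^\s*(?P<name>[^([<]+?)\s*[([<]\s*(?P<unit>[^)\]>]+?)\s*[)\]>]")

def extract_name_unit(value):
    """Split 'Gauge Length [mm] whatever' into ('Gauge Length', 'mm')."""
    match = _UNIT_RE.match(value)
    if match:
        return match.group("name"), match.group("unit")
    return value.strip(), None  # no unit found

print(extract_name_unit("Target Strain Rate (1/s) "))   # ('Target Strain Rate', '1/s')
print(extract_name_unit("Gauge Length <mm> whatever ")) # ('Gauge Length', 'mm')
```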
sdata.timestamp module
ISO 8601 date time string parsing
Basic usage:
>>> parse_date("2007-01-25T12:00:00Z")
datetime.datetime(2007, 1, 25, 12, 0, tzinfo=<iso8601.Utc …>)
MIT License
Copyright (c) 2007 - 2015 Michael Twomey
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
- exception sdata.timestamp.ParseError[source]
Bases:
Exception
Raised when there is a problem parsing a date string
- class sdata.timestamp.TimeStamp(datetimestr=None)[source]
Bases:
object
2017-04-26T09:04:00.660000+00:00
- property local
returns the datetime isoformat string for the local timezone
- Returns
str
- property utc
returns the UTC isoformat string
- Returns
str
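Under the assumption that TimeStamp wraps a timezone-aware datetime, the utc/local pair can be mimicked with the standard library:

```python
from datetime import datetime, timezone

# The class docstring's example instant, as a timezone-aware datetime.
dt = datetime(2017, 4, 26, 9, 4, 0, 660000, tzinfo=timezone.utc)

print(dt.isoformat())               # utc: 2017-04-26T09:04:00.660000+00:00
print(dt.astimezone().isoformat())  # local: same instant, rendered with the local offset
```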
- sdata.timestamp.parse_date(datestring, default_timezone=datetime.timezone.utc)[source]
Parses ISO 8601 dates into datetime objects
The timezone is parsed from the date string. However, it is quite common to encounter dates without a timezone (not strictly correct). In this case the timezone specified in default_timezone is used; this is UTC by default.
- Parameters
datestring – The date to parse as a string
default_timezone – A datetime tzinfo instance to use when no timezone is specified in the datestring. If this is set to None then a naive datetime object is returned.
- Returns
A datetime.datetime instance
- Raises
ParseError when there is a problem parsing the date or constructing the datetime instance.
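A stripped-down sketch of this behaviour using only the standard library (the real parser handles far more ISO 8601 variants):

```python
from datetime import datetime, timezone

def parse_date(datestring, default_timezone=timezone.utc):
    """Parse an ISO 8601 string; attach default_timezone if none is given."""
    # fromisoformat() only accepts the 'Z' suffix from Python 3.11 on,
    # so normalize it to an explicit offset first.
    dt = datetime.fromisoformat(datestring.replace("Z", "+00:00"))
    if dt.tzinfo is None and default_timezone is not None:
        dt = dt.replace(tzinfo=default_timezone)  # naive date: apply the default
    return dt

print(parse_date("2007-01-25T12:00:00Z"))  # 2007-01-25 12:00:00+00:00
print(parse_date("2007-01-25T12:00:00"))   # no timezone given -> default (UTC) applied
```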
sdata.tools module
sdata.workbook module
- class sdata.workbook.Workbook(**kwargs)[source]
Bases:
Data
Workbook with Sheets
Warning
highly experimental
- classmethod from_hdf5(filepath, **kwargs)[source]
import sdata.Data from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- classmethod metadata_from_hdf5(filepath, **kwargs)[source]
import sdata.Data.Metadata from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- property sheetnames
return sheet names of the workbook
- property sheets
all sheets of the workbook
Module contents
- class sdata.Blob(**kwargs)[source]
Bases:
Data
Binary Large Object as reference
Warning
highly experimental
- VAULT_TYPES = ['filesystem', 'hdf5', 'db', 'www']
- exists(vault='filesystem')[source]
Test whether an object exists under the blob.url.
- Parameters
vault –
- Returns
- property md5
calculate the md5 hash of the blob
- Returns
md5
- property sha1
calculate the sha1 hash of the blob
- Returns
sha1
- update_hash(fh, hashobject, buffer_size=65536)[source]
Update a hash object with the content of the blob, read in chunks; the hash can then be used to calculate a checksum.
hashobject = hashlib.md5()
df = pd.DataFrame([1, 2, 3])
url = "/tmp/blob.csv"
df.to_csv(url)
blob = sdata.Blob(url=url)
fh = open(url, "rb")
blob.update_hash(fh, hashobject)
hashobject.hexdigest()
- Parameters
fh – file handle
hashobject – hash object, e.g. hashlib.sha1()
buffer_size – buffer size (default buffer_size=65536)
- Returns
hashobject
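Reading in chunks avoids loading a large blob into memory at once. A self-contained sketch of the same pattern (io.BytesIO stands in for the real file handle):

```python
import hashlib
import io

def update_hash(fh, hashobject, buffer_size=65536):
    """Feed a file handle into a hash object in buffer_size chunks."""
    while True:
        chunk = fh.read(buffer_size)
        if not chunk:  # empty bytes -> end of file
            break
        hashobject.update(chunk)
    return hashobject

fh = io.BytesIO(b"x" * 200000)  # stand-in for open(url, "rb")
md5 = update_hash(fh, hashlib.md5())
print(md5.hexdigest())
```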
- property url
url of the blob
- class sdata.Data(**kwargs)[source]
Bases:
object
Base sdata object
- ATTR_NAMES = []
- SDATA_ATTRIBUTES = ['!sdata_version', '!sdata_name', '!sdata_uuid', '!sdata_class', '!sdata_parent', '!sdata_project', '!sdata_ctime', '!sdata_mtime']
- SDATA_CLASS = '!sdata_class'
- SDATA_CTIME = '!sdata_ctime'
- SDATA_MTIME = '!sdata_mtime'
- SDATA_NAME = '!sdata_name'
- SDATA_PARENT = '!sdata_parent'
- SDATA_PROJECT = '!sdata_project'
- SDATA_UUID = '!sdata_uuid'
- SDATA_VERSION = '!sdata_version'
- property asciiname
- static clear_folder(path)[source]
delete subfolder in export folder
- Parameters
path – path
- Returns
None
- copy(**kwargs)[source]
create a copy of the Data object
data = sdata.Data(name="data", uuid="38b26864e7794f5182d38459bab85842", description="this is remarkable")
datac = data.copy()
print("data  {0.uuid}".format(data))
print("datac {0.uuid}".format(datac))
print("datac.metadata['!sdata_parent'] {0.value}".format(datac.metadata["sdata_parent"]))

data  38b26864e7794f5182d38459bab85842
datac 2c4eb15900af435d8cd9c8573ca777e2
datac.metadata['!sdata_parent'] 38b26864e7794f5182d38459bab85842
- Returns
Data
- describe()[source]
Generate descriptive info of the data
df = pd.DataFrame([1, 2, 3])
data = sdata.Data(name='my name', uuid='38b26864e7794f5182d38459bab85842', table=df, description="A remarkable description")
data.describe()

               0
metadata       3
table_rows     3
table_columns  1
description   24
- Returns
pd.DataFrame
- property description
description of the object
- property df
table object(pandas.DataFrame)
- property filename
- classmethod from_csv(s=None, filepath=None, sep=';')[source]
import sdata.Data from csv
- Parameters
s – csv str
filepath –
sep – separator (default=”;”)
- Returns
sdata.Data
- classmethod from_hdf5(filepath, **kwargs)[source]
import sdata.Data from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- classmethod from_json(s=None, filepath=None)[source]
create Data from json str or file
- Parameters
s – json str
filepath –
- Returns
sdata.Data
- classmethod from_sqlite(filepath, **kwargs)[source]
import sdata.Data from sqlite
- Parameters
filepath –
kwargs –
- Returns
sdata.Data
- classmethod from_url(url=None, stype=None)[source]
create Data from json str or file
- Parameters
url – url
stype – “json” (“xlsx”, “csv”)
- Returns
sdata.Data
- get_download_link()[source]
Generate a download link (an href string) for the data in a given pandas DataFrame.
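A common way to build such a link is a base64-encoded data URI; the helper below is a hypothetical sketch, not necessarily how sdata implements it:

```python
import base64

def download_link(csv_text, filename="data.csv"):
    """Build an HTML href string offering csv_text as a file download."""
    # Encode the CSV payload so it can be embedded directly in the href.
    b64 = base64.b64encode(csv_text.encode()).decode()
    return f'<a href="data:file/csv;base64,{b64}" download="{filename}">download</a>'

print(download_link("a;b\n1;2\n"))
```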
- property group
get group
- classmethod metadata_from_hdf5(filepath, **kwargs)[source]
import sdata.Data.Metadata from hdf5
- Parameters
filepath –
- Returns
sdata.Data
- property name
name of the object
- property osname
- Returns
os compatible name (ascii?)
- property prefix
prefix of the object name
- property project
name of the project
- refactor(fix_columns=True, add_table_metadata=True)[source]
helper function
to clean up the dataframe column names
to define Attributes for all dataframe columns
- property sha3_256
Return the SHA3-256 hex digest of the sData object (digest size 32 bytes).
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property sha3_256_table
Return the SHA3-256 hex digest of the sData.table object (digest size 32 bytes).
sdata.Data(name="1", uuid=sdata.uuid_from_str("1")).sha3_256_table
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
- Returns
hashlib.sha3_256.hexdigest()
- property table
table object(pandas.DataFrame)
- to_hdf5(filepath, **kwargs)[source]
export sdata.Data to hdf5
- Parameters
filepath –
complib – default=’zlib’ [‘zlib’, ‘lzo’, ‘bzip2’, ‘blosc’, ‘blosc:blosclz’, ‘blosc:lz4’, ‘blosc:lz4hc’, ‘blosc:snappy’, ‘blosc:zlib’, ‘blosc:zstd’]
complevel – default=9 [0-9]
- Returns
- to_html(filepath, xlsx=True, style=None)[source]
export Data to html
- Parameters
filepath –
xlsx –
style –
- Returns
- to_json(filepath=None)[source]
export Data in json format
- Parameters
filepath – export file path (default:None)
- Returns
json str
- to_sqlite(filepath, **kwargs)[source]
export sdata.Data to sqlite
- Parameters
filepath –
kwargs –
- Returns
- tree_folder(dir, padding=' ', print_files=True, hidden_files=False, last=True)[source]
print tree folder structure
- update_hash(hashobject)[source]
Update a hash object with the content of the Data object; the hash can then be used to calculate a checksum.
data = sdata.Data()
md5 = hashlib.md5()
data.update_hash(md5)
md5.hexdigest()
'bbf323bdcb0bf961803b5504a8a60d69'
sha1 = hashlib.sha1()
data.update_hash(sha1)
sha1.hexdigest()
'3c59368c7735c1ecaf03ebd4c595bb6e73e90f0c'
hashobject = hashlib.sha3_256()
data.update_hash(hashobject).hexdigest()
'c468e659891eb5dea6eb6baf73f51ca0688792bf9ad723209dc22730903f6efa'
data.update_hash(hashobject).digest()
b'M8...'
- Parameters
hashobject – hash object, e.g. hashlib.sha1()
- Returns
the updated hash object
- property uuid
uuid of the object