Wrangler
Submodules
Data Parser module
In: scintillometry.wrangler.data_parser.py
Parses raw data, creates datasets.
Writing functions for parsing yet another variation of yet another data format is often excruciatingly finicky. To add support for new data sources:
Create a new parsing function in its most relevant class. Possible classes include WranglerScintillometer, WranglerWeather, WranglerTransect, WranglerEddy, WranglerVertical.
Add an elif statement to the class’ parse_<ClassName> function (the only function in the class that includes a source argument).
Call the new parsing function e.g.:
parse_vertical(file_path, source="new_format")
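The steps above can be sketched as follows. This is a minimal illustration, not the package's implementation: the class WranglerExample and the methods parse_hatpro_like and parse_new_format are hypothetical stand-ins, and only the elif dispatch and the NotImplementedError fallback mirror what this page documents.

```python
class WranglerExample:
    """Hypothetical stand-in for a Wrangler class; not part of the package."""

    def parse_hatpro_like(self, file_path):
        # Placeholder for a parser that already exists in the class.
        return {"source": "hatpro", "path": file_path}

    def parse_new_format(self, file_path):
        # Step 1: the new parsing function for the new data source.
        return {"source": "new_format", "path": file_path}

    def parse_vertical(self, file_path, source="hatpro"):
        # Step 2: extend the dispatch on <source> with an elif branch.
        if source.lower() == "hatpro":
            data = self.parse_hatpro_like(file_path)
        elif source.lower() == "new_format":
            data = self.parse_new_format(file_path)
        else:
            raise NotImplementedError(
                f"{source} measurements are not supported. Use 'hatpro'."
            )
        return data


# Step 3: call the dispatcher with the new source name.
parsed = WranglerExample().parse_vertical("./path/to/file", source="new_format")
```

Keeping the source argument on a single dispatch function means callers never need to know which format-specific parser handles their file.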
- class scintillometry.wrangler.data_parser.WranglerEddy[source]
Bases:
WranglerIO
Parses eddy covariance data.
- transform
Inherited methods for transforming dataframe parameters.
- Type:
- parse_eddy_covariance(file_path, source='innflux', tzone=None)[source]
Parses eddy covariance measurements.
Currently only supports innFLUX.
- Parameters:
file_path (str) – Path to eddy covariance measurements.
source (str) – Data source of eddy covariance measurements. Only supports innFLUX. Default “innflux”.
tzone (str) – Local timezone of the measurement period. Default None.
- Returns:
Parsed eddy covariance measurements.
- Return type:
dict[pd.DataFrame, pd.DataFrame]
- Raises:
NotImplementedError – <source> measurements are not supported. Use “innflux”.
- parse_innflux(file_name, timezone=None, headers=None)[source]
Parses innFLUX eddy covariance data.
The input data should be pre-processed from raw eddy covariance measurements using the innFLUX Eddy Covariance code.[1]
The input file should contain data for the sensible heat flux, wind speed, and Obukhov length. It may optionally contain data for the stability parameter.
If parsing a .csv file, optionally pass a list of column headers using the <headers> argument. If no list is passed, a default list of headers is used. This argument is ignored for non-csv files.
- Parameters:
file_name (str) – Path to innFLUX data.
timezone (str) – Local timezone during the scintillometer’s operation. Default None.
headers (list) – List of column headers for data. Default None.
- Returns:
Parsed and localised innFLUX data.
- Return type:
pd.DataFrame
- parse_innflux_csv(file_path, header_list=None)[source]
Parse pre-processed innFLUX data from .csv files.
If innFLUX data was provided as a pre-processed .csv file (i.e. you are not licensed to use raw data), it may only contain data for a limited number of variables with no headers present.
Optionally pass a list of column headers using the <header_list> argument. If no list is passed, a default list of headers is used.
- Parameters:
file_path (str) – Path to .csv file.
header_list (list) – List of column headers for data. Default None.
- Returns:
Contains innFLUX measurements.
- Return type:
pd.DataFrame
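As an illustration of the optional header list, the sketch below reads a headerless .csv using only the standard library. The function name read_headerless_csv and the column names in DEFAULT_HEADERS are assumptions made for this example; the default headers actually used by the package are not listed on this page.

```python
import csv
import io

# Hypothetical default column names, for illustration only.
DEFAULT_HEADERS = ["time", "shf", "wind_speed", "obukhov"]


def read_headerless_csv(text, header_list=None):
    """Read headerless CSV text into a list of row dictionaries."""
    # Fall back to the default headers when no list is passed.
    headers = header_list if header_list is not None else DEFAULT_HEADERS
    reader = csv.DictReader(io.StringIO(text), fieldnames=headers)
    return [dict(row) for row in reader]


rows = read_headerless_csv("0,1.5,2.0,-30\n")
custom = read_headerless_csv("0,1.5,2.0,-30\n", header_list=["a", "b", "c", "d"])
```

Values are returned as strings here; converting them to numeric types is left out to keep the sketch short.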
- parse_innflux_mat(file_path)[source]
Parse MATLAB® data structures generated by innFLUX.
Supports MATLAB® array version 7 with MATLAB® serial dates. Systematic errors in time conversion are on the order of 20 ms, so timestamps are rounded to the nearest second.
- Parameters:
file_path (str) – Path to .mat file.
- Returns:
Contains innFLUX measurements.
- Return type:
pd.DataFrame
- Raises:
ValueError – File does not have a .mat extension.
KeyError – InnFLUX data does not contain any values for <key>.
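The conversion of MATLAB® serial dates with rounding to the nearest second could look like the sketch below. It relies on the fact that the serial date 719529 corresponds to 1970-01-01 00:00:00; the function name matlab_datenum_to_datetime is hypothetical, and the package may perform this step differently (for example on whole dataframe columns).

```python
from datetime import datetime, timedelta

# MATLAB serial date number of 1970-01-01 00:00:00.
MATLAB_UNIX_EPOCH = 719529


def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB serial date to a naive datetime, rounded to 1 s."""
    raw = datetime(1970, 1, 1) + timedelta(days=datenum - MATLAB_UNIX_EPOCH)
    # Round to the nearest second: add half a second, then truncate.
    shifted = raw + timedelta(microseconds=500_000)
    return shifted.replace(microsecond=0)


noon = matlab_datenum_to_datetime(719529.5)
```

Rounding absorbs the small floating-point error that accumulates when a fractional day count is converted to a timestamp.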
- class scintillometry.wrangler.data_parser.WranglerIO[source]
Bases:
object
Performs file operations on data.
- class scintillometry.wrangler.data_parser.WranglerParsing[source]
Bases:
WranglerIO
Wrapper class for parsing data.
- scintillometer
Inherited methods for parsing scintillometry data.
- Type:
- weather
Inherited methods for parsing weather data.
- Type:
- transect
Inherited methods for parsing topographical data.
- Type:
- eddy
Inherited methods for parsing eddy covariance data.
- Type:
- vertical
Inherited methods for parsing vertical measurements.
- Type:
- stitch
Inherited methods for merging parsed data into unified datasets.
- Type:
- wrangle_data(bls_path, transect_path, calibrate, weather_dir='./ext/data/raw/ZAMG/', station_id='11803', tzone='CET', weather_source='zamg')[source]
Wrangle BLS, ZAMG, and transect datasets.
- Parameters:
bls_path (str) – Path to a raw .mnd data file using FORMAT-1.
transect_path (str) – Path to processed transect. The data must be formatted as: <path_height>, <normalised_path_position>. The normalised path position maps to: [0: receiver location, 1: transmitter location].
calibrate (list) – Contains the incorrect and correct path lengths. Format as [incorrect, correct].
weather_dir (str) – Path to directory with local weather data. Default “./ext/data/raw/ZAMG/”.
station_id (str) – ZAMG weather station ID (Klima-ID). Default “11803”.
tzone (str) – Local timezone during the scintillometer’s operation. Default “CET”.
weather_source (str) – Data source of weather data. Currently supports ZAMG. Default “zamg”.
- Returns:
BLS, ZAMG, and transect dataframes, an interpolated dataframe at 60s resolution containing BLS and ZAMG data, and a pd.Timestamp object of the scintillometer’s recorded start time of data collection. All returned objects are localised to the timezone selected by the user.
- Return type:
dict
- class scintillometry.wrangler.data_parser.WranglerScintillometer[source]
Bases:
WranglerIO
Parses scintillometry data.
- transform
Inherited methods for transforming dataframe parameters.
- Type:
- calibrate_data(data, path_lengths)[source]
Calibrates data if the wrong path length was set.
Recalibrate data if the wrong path length was set in SRun or on the dip switches in the scintillometer. Use the argument:
--c, --calibrate <wrong_path_length> <correct_path_length>
- Parameters:
data (pd.DataFrame) – Parsed and localised scintillometry dataframe.
path_lengths (list) – Contains the incorrect and correct path lengths, [m]. Format as [incorrect, correct].
- Returns:
Recalibrated dataframe.
- Return type:
pd.DataFrame
- Raises:
ValueError – Calibration path lengths must be formatted as: <wrong_path_length> <correct_path_length>.
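A minimal sketch of the path length check and the resulting correction factor follows. It assumes the standard large-aperture scintillometer relation in which \(C_{n}^{2}\) is proportional to the path length raised to the power -3, so a value computed with the wrong path length is rescaled by (incorrect / correct) cubed. The function name calibration_factor is hypothetical, and which dataframe columns the package rescales is not stated on this page.

```python
def calibration_factor(path_lengths):
    """Return the factor that rescales values computed with a wrong path length."""
    if not isinstance(path_lengths, (list, tuple)) or len(path_lengths) != 2:
        raise ValueError(
            "Calibration path lengths must be formatted as: "
            "<wrong_path_length> <correct_path_length>."
        )
    wrong, correct = (float(length) for length in path_lengths)
    # Assumption: the quantity scales with path length to the power -3,
    # so corrected = recorded * (wrong / correct) ** 3.
    return (wrong / correct) ** 3


factor = calibration_factor([2000, 1000])
```

Multiplying a recorded value by this factor undoes the effect of the wrong path length under the stated assumption.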
- parse_mnd_lines(line_list)[source]
Parses data and variable names from a list of .mnd lines.
- Parameters:
line_list (list) – Lines read from .mnd file in FORMAT-1.
- Returns:
Contains a list of lines of parsed BLS data, an ordered list of variable names, the file timestamp, and any additional header parameters in the file header.
- Return type:
dict
- Raises:
Warning – The input file does not follow FORMAT-1.
- parse_scintillometer(file_path, timezone='CET', calibration=None)[source]
Parses .mnd files into dataframes.
- Parameters:
file_path (str) – Path to a raw .mnd data file using FORMAT-1.
timezone (str) – Local timezone during the scintillometer’s operation. Default “CET”.
calibration (list) – Contains the incorrect and correct path lengths, [m]. Format as [incorrect, correct]. Default None.
- Returns:
Parsed and localised scintillometry data.
- Return type:
pd.DataFrame
- class scintillometry.wrangler.data_parser.WranglerStitch[source]
Bases:
object
Merges parsed data into unified datasets.
- merge_scintillometry_weather(scintillometry, weather)[source]
Merges parsed scintillometry and weather dataframes.
This replaces any weather data collected by the scintillometer with external weather data. It only preserves \(C_{n}^{2}\) and SHF data from the scintillometer.
Temperature data in Celsius is automatically converted to Kelvin, and pressure data in Pa is automatically converted to hPa. All subsequent calculations assume Kelvin and hPa.
- Parameters:
scintillometry (pd.DataFrame) – Parsed and localised scintillometry data.
weather (pd.DataFrame) – Parsed and localised weather data.
- Returns:
Merged dataframe containing both scintillometry data, and interpolated weather data.
- Return type:
pd.DataFrame
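The unit conversions can be illustrated with the sketch below. How the package detects the input units is not described on this page, so the thresholds used here (temperatures below 100 are treated as Celsius, pressures above 2000 as Pa) are purely hypothetical, as are the function names to_kelvin and to_hectopascal.

```python
def to_kelvin(temperature):
    """Return temperature in Kelvin, converting values that look like Celsius."""
    # Hypothetical heuristic: atmospheric temperatures in Kelvin exceed 100.
    if temperature < 100:
        return temperature + 273.15
    return temperature


def to_hectopascal(pressure):
    """Return pressure in hPa, converting values that look like Pa."""
    # Hypothetical heuristic: surface pressure in hPa stays below 2000.
    if pressure > 2000:
        return pressure / 100
    return pressure


kelvin = to_kelvin(20.0)
hectopascal = to_hectopascal(101325.0)
```

Values that already appear to be in Kelvin or hPa pass through unchanged.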
- class scintillometry.wrangler.data_parser.WranglerTransect[source]
Bases:
WranglerIO
Parses topographical data.
- transform
Inherited methods for transforming dataframe parameters.
- Type:
- parse_dgm_processed(file_path)[source]
Parses path transect from pre-processed DGM data.
The pre-processed data is a .csv file formatted as: <path_height>, <normalised_path_position>. The normalised path position maps to: [0: receiver location, 1: transmitter location].
- Parameters:
file_path (str) – Path to processed transect. The data must be formatted as: <path_height>, <normalised_path_position>. The normalised path position maps to: [0: receiver location, 1: transmitter location].
- Returns:
Parsed path transect data.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – No file found with path: <file_path>.
ValueError – Normalised position is not between 0 and 1.
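The expected file layout and the range check on the normalised position can be sketched with the standard library alone. The function name parse_transect_rows is hypothetical; the package returns a pd.DataFrame, whereas this sketch returns a list of tuples to stay dependency-free.

```python
import csv
import io


def parse_transect_rows(text):
    """Parse '<path_height>, <normalised_path_position>' rows from CSV text."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # Skip blank lines.
        height, position = float(row[0]), float(row[1])
        # 0 is the receiver location, 1 is the transmitter location.
        if not 0.0 <= position <= 1.0:
            raise ValueError("Normalised position is not between 0 and 1.")
        rows.append((height, position))
    return rows


transect = parse_transect_rows("10.0,0.0\n12.5,0.5\n9.0,1.0\n")
```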
- parse_transect(file_path, source='dgm_processed')[source]
Parses scintillometer path transect.
- Parameters:
file_path (str) – Path to topographical data.
source (str) – Data source of topographical data. Currently supports pre-processed DGM 5m. Default “dgm_processed”.
- Returns:
Parsed path transect data.
- Return type:
pd.DataFrame
- Raises:
NotImplementedError – <source> measurements are not supported. Use “dgm_processed”.
- class scintillometry.wrangler.data_parser.WranglerTransform[source]
Bases:
object
Transforms data labelling and indexing.
- change_index_frequency(data, frequency='60S')[source]
Change frequency of time index.
- Parameters:
data (pd.DataFrame or pd.Series) – An object with a time or datetime index.
frequency (str) – Reindexing frequency. Default “60S”.
- Returns:
Object with new index frequency.
- Return type:
pd.DataFrame or pd.Series
- convert_time_index(data, tzone=None)[source]
Make tz-naive dataframe tz-aware.
- Parameters:
data (pd.DataFrame) – Tz-naive dataframe.
tzone (str) – Local timezone. Default None.
- Returns:
Tz-aware dataframe in local timezone or UTC.
- Return type:
pd.DataFrame
- parse_iso_date(x, date=True)[source]
Parses timestamp with mixed ISO-8601 duration and date.
Uses integer properties of bool to act as index for partition.
- Parameters:
x (str) – Timestamp containing ISO-8601 duration and date, i.e. “<ISO-8601 duration>/<ISO-8601 date>”.
date (bool) – If True, returns date. Otherwise, returns duration. Default True.
- Returns:
ISO-8601 string representing either a duration or a date.
- Return type:
str
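One way a bool can act as an index is sketched below: str.partition returns a three-element tuple (before, separator, after), and negating the bool gives -1 for True (the last element) or 0 for False (the first element). The function name split_iso_timestamp and the sample timestamp are made up for illustration, and the package's own implementation may differ.

```python
def split_iso_timestamp(x, date=True):
    """Return the date or duration part of '<ISO-8601 duration>/<ISO-8601 date>'."""
    # -True == -1 selects the text after "/" (the date);
    # -False == 0 selects the text before "/" (the duration).
    return x.partition("/")[-date]


stamp = "PT10M/2020-06-03T03:23:00Z"
date_part = split_iso_timestamp(stamp)
duration_part = split_iso_timestamp(stamp, date=False)
```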
- class scintillometry.wrangler.data_parser.WranglerVertical[source]
Bases:
WranglerIO
Parses vertical measurements.
- transform
Inherited methods for transforming dataframe parameters.
- Type:
- construct_hatpro_levels(levels=None)[source]
Construct HATPRO scanning levels.
Hardcoded scan levels specifically for HATPRO Retrieval data from HATPRO UIBK Met (612m). Scan levels are integer measurement heights relative to the station’s elevation.
- Parameters:
levels (list[int]) – HATPRO measurement heights, \(z_{scan}\) [m]. Default None.
- Returns:
HATPRO measurement heights, \(z_{scan}\) [m].
- Return type:
list
- Raises:
TypeError – Input levels must be a list or tuple of integers.
- load_hatpro(file_name, levels, tzone='CET', station_elevation=612.0)[source]
Load raw HATPRO data into dataframe.
- Parameters:
file_name (str) – Path to raw HATPRO data.
levels (list[int]) – Height of HATPRO scan level, \(z_{scan}\) [m].
tzone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.
station_elevation (float) – Station elevation, \(z_{stn}\) [m]. Default 612.0.
- Returns:
Contains tz-aware and pre-processed HATPRO data.
- Return type:
pd.DataFrame
- parse_hatpro(file_prefix, timezone='CET', scan_heights=None, elevation=612.0)[source]
Parses HATPRO Retrieval data.
- Parameters:
file_prefix (str) –
Path prefix for HATPRO Retrieval data. There should be two HATPRO files ending with “humidity” and “temp”. The path prefix should be identical for both files, e.g.:
./path/to/file_humidity.csv
./path/to/file_temp.csv
would require file_prefix = “./path/to/file_”.
timezone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.
scan_heights (list[int]) – Heights of HATPRO measurement levels, \(z_{scan}\) [m]. Default None.
elevation (float) – Station elevation, \(z_{stn}\) [m]. Default 612.0.
- Returns:
Vertical measurements from HATPRO for temperature \(T\) [K], and absolute humidity \(\rho_{v}\) [g \(\cdot\) m\(^{-3}\)].
- Return type:
dict[str, pd.DataFrame]
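The file prefix convention can be illustrated with a short sketch that builds both file paths from one prefix. The function name hatpro_file_paths, the dictionary keys, and the assumption that both files share a .csv extension are choices made for this example only.

```python
def hatpro_file_paths(file_prefix, extension=".csv"):
    """Build the humidity and temperature file paths from a shared prefix."""
    return {
        "humidity": f"{file_prefix}humidity{extension}",
        "temperature": f"{file_prefix}temp{extension}",
    }


paths = hatpro_file_paths("./path/to/file_")
```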
- parse_vertical(file_path, source='hatpro', tzone='CET', levels=None, station_elevation=612.0)[source]
Parses vertical measurements.
Currently only supports HATPRO.
- Parameters:
file_path (str) –
Path to vertical measurements. For HATPRO Retrieval data there should be two HATPRO files ending with “humidity” and “temp”. The path prefix should be identical for both files, e.g.:
./path/to/file_humidity.csv
./path/to/file_temp.csv
would require file_path = “./path/to/file_”.
source (str) – Instrument used for vertical measurements. Only supports HATPRO. Default “hatpro”.
levels (list[int]) – Heights of HATPRO measurements, \(z_{scan}\) [m]. Default None.
tzone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.
station_elevation (float) – Station elevation, \(z_{stn}\) [m]. Default 612.0.
- Returns:
Vertical measurements for temperature \(T\) [K], and absolute humidity \(\rho_{v}\) [g \(\cdot\) m\(^{-3}\)].
- Return type:
dict[pd.DataFrame, pd.DataFrame]
- Raises:
NotImplementedError – <source> measurements are not supported. Use “hatpro”.
- class scintillometry.wrangler.data_parser.WranglerWeather[source]
Bases:
WranglerIO
Parses meteorological data.
- transform
Inherited methods for transforming dataframe parameters.
- Type:
- parse_weather(timestamp, source='zamg', data_dir='./ext/data/raw/ZAMG/', station_id='11803', timezone='CET')[source]
Parses weather data. Only supports ZAMG files.
- Parameters:
timestamp (pd.Timestamp) – Start time of climate record.
source (str) – Data source of weather data. Currently supports ZAMG. Default “zamg”.
data_dir (str) – Location of weather data files. Default “./ext/data/raw/ZAMG/”.
station_id (str) – Weather station ID (e.g. ZAMG Klima-ID). Default “11803”.
timezone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.
- Returns:
Parsed weather data.
- Return type:
pd.DataFrame
- Raises:
NotImplementedError – <source> measurements are not supported. Use “zamg”.
- parse_zamg_data(timestamp, klima_id='11803', data_dir='./ext/data/raw/ZAMG/', timezone='CET')[source]
Parses ZAMG climate records.
- Parameters:
timestamp (pd.Timestamp) – Start time of climate record.
klima_id (str) – ZAMG weather station ID (Klima-ID). Default “11803”.
data_dir (str) – Location of ZAMG data files. Default “./ext/data/raw/ZAMG/”.
timezone (str) – Local timezone during the scintillometer’s operation. Default “CET”.
- Returns:
Parsed ZAMG records.
- Return type:
pd.DataFrame