Wrangler

Submodules

Data Parser module

In: scintillometry.wrangler.data_parser.py

Parses raw data, creates datasets.

Writing functions for parsing yet another variation of yet another data format is often excruciatingly finicky. To add support for new data sources:

  • Create a new parsing function in its most relevant class. Possible classes include WranglerScintillometer, WranglerWeather, WranglerTransect, WranglerEddy, WranglerVertical.

  • Add an elif statement to the class’ parse_<ClassName> function (the only function in the class that includes a source argument).

  • Call the new parsing function e.g.:

    parse_vertical(file_path, source="new_format")
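
The three steps above can be sketched in isolation as follows. This is only an illustration of the dispatch pattern: the class `ExampleWrangler`, the method `parse_new_format`, and the source name "new_format" are hypothetical placeholders and are not part of the package.

```python
class ExampleWrangler:
    """Hypothetical stand-in for a Wrangler class; not part of the package."""

    def parse_hatpro(self, file_path):
        # Placeholder for an existing parsing function.
        return {"source": "hatpro", "path": file_path}

    def parse_new_format(self, file_path):
        # Step 1: the new parsing function (hypothetical).
        return {"source": "new_format", "path": file_path}

    def parse_vertical(self, file_path, source="hatpro"):
        # Step 2: the dispatcher gains an elif branch for the new source.
        if source.lower() == "hatpro":
            return self.parse_hatpro(file_path)
        elif source.lower() == "new_format":
            return self.parse_new_format(file_path)
        else:
            raise NotImplementedError(
                f"{source} measurements are not supported. Use 'hatpro'."
            )


# Step 3: call the dispatcher with the new source name.
parsed = ExampleWrangler().parse_vertical("./path/to/file", source="new_format")
```

Keeping the source argument in a single dispatcher means callers never need to know which format-specific function does the work.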
    
class scintillometry.wrangler.data_parser.WranglerEddy[source]

Bases: WranglerIO

Parses eddy covariance data.

transform

Inherited methods for transforming dataframe parameters.

Type:

WranglerTransform

parse_eddy_covariance(file_path, source='innflux', tzone=None)[source]

Parses eddy covariance measurements.

Currently only supports innFLUX.

Parameters:
  • file_path (str) – Path to eddy covariance measurements.

  • source (str) – Data source of eddy covariance measurements. Only supports innFLUX. Default “innflux”.

  • tzone (str) – Local timezone of the measurement period. Default None.

Returns:

Parsed eddy covariance measurements.

Return type:

dict[pd.DataFrame, pd.DataFrame]

Raises:

NotImplementedError – <source> measurements are not supported. Use “innflux”.

parse_innflux(file_name, timezone=None, headers=None)[source]

Parses innFLUX eddy covariance data.

The input data should be pre-processed from raw eddy covariance measurements using the innFLUX Eddy Covariance code.[1]

The input file should contain data for the sensible heat flux, wind speed, and Obukhov length. It may optionally contain data for the stability parameter.

If parsing a .csv file, optionally pass a list of column headers using the <headers> argument. If no list is passed, a default list of headers is used. This argument is ignored for non-csv files.

Parameters:
  • file_name (str) – Path to innFLUX data.

  • timezone (str) – Local timezone during the scintillometer’s operation. Default None.

  • headers (list) – List of column headers for data. Default None.

Returns:

Parsed and localised innFLUX data.

Return type:

pd.DataFrame

parse_innflux_csv(file_path, header_list=None)[source]

Parse pre-processed innFLUX data from .csv files.

If innFLUX data was provided as a pre-processed .csv file (i.e. you are not licensed to use raw data), it may only contain data for a limited number of variables with no headers present.

Optionally pass a list of column headers using the <header_list> argument. If no list is passed, a default list of headers is used.

Parameters:
  • file_path (str) – Path to .csv file.

  • header_list (list) – List of column headers for data. Default None.

Returns:

Contains innFLUX measurements.

Return type:

pd.DataFrame
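
A minimal sketch of reading a headerless .csv with an optional header list, assuming pandas is used for parsing. The default header names below are hypothetical and chosen only for illustration; the package's actual default list may differ.

```python
import io

import pandas as pd

# Hypothetical default headers, for illustration only.
DEFAULT_HEADERS = ["time", "shf", "wind_speed", "obukhov"]


def read_headerless_csv(buffer, header_list=None):
    """Read a headerless .csv, naming columns from header_list or a default."""
    if header_list is None:
        header_list = DEFAULT_HEADERS
    # header=None tells pandas the first row is data, not column names.
    return pd.read_csv(buffer, header=None, names=header_list)


sample = io.StringIO("2020-06-03 03:00:00,12.5,1.8,-40.0\n")
frame = read_headerless_csv(sample)
```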

parse_innflux_mat(file_path)[source]

Parse MATLAB® data structures generated by innFLUX.

Supports MATLAB® array version 7 with MATLAB® serial dates. Systematic errors in time conversion are on the order of 20 ms, so timestamps are rounded to the nearest second.

Parameters:

file_path (str) – Path to .mat file.

Returns:

Contains innFLUX measurements.

Return type:

pd.DataFrame

Raises:
  • ValueError – File does not have a .mat extension.

  • KeyError – InnFLUX data does not contain any values for <key>.
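
The date handling described for parse_innflux_mat can be sketched as below. This is one possible conversion, not necessarily the one the package uses. It relies on the fact that the MATLAB® serial date number of 1970-01-01 is 719529; the function name is hypothetical.

```python
import pandas as pd

# MATLAB serial date number corresponding to the Unix epoch (1970-01-01).
MATLAB_UNIX_EPOCH = 719529


def matlab_datenum_to_timestamp(datenum):
    """Convert MATLAB serial dates (days) to timestamps rounded to 1 s."""
    days_since_epoch = pd.Series(datenum, dtype="float64") - MATLAB_UNIX_EPOCH
    stamps = pd.to_datetime(days_since_epoch, unit="D")
    # Rounding absorbs millisecond-scale conversion errors.
    return stamps.dt.round("s")


converted = matlab_datenum_to_timestamp([719529.0, 719530.5])
```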

class scintillometry.wrangler.data_parser.WranglerIO[source]

Bases: object

Performs file operations on data.

check_file_exists(fname)[source]

Check file exists.

Parameters:

fname (str) – Path to a file.

Raises:

FileNotFoundError – No file found with path: <fname>.

file_handler(filename)[source]

Opens file as read-only and appends each line to a list.

Parameters:

filename (str) – Path to file.

Returns:

List of lines read from file up to EOF.

Return type:

list
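
A minimal sketch of the two WranglerIO operations using only the standard library. Whether file_handler checks for the file first, and which encoding it uses, are assumptions made for this example.

```python
import os
import tempfile


def check_file_exists(fname):
    """Raise FileNotFoundError if no file exists at fname."""
    if not os.path.isfile(fname):
        raise FileNotFoundError(f"No file found with path: {fname}")


def file_handler(filename):
    """Open a file read-only and return its lines as a list."""
    check_file_exists(filename)  # assumption: the check runs before reading
    with open(filename, mode="r", encoding="utf-8") as file:
        return file.readlines()


# Write a small temporary file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".mnd", delete=False) as tmp:
    tmp.write("line one\nline two\n")
lines = file_handler(tmp.name)
os.remove(tmp.name)
```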

class scintillometry.wrangler.data_parser.WranglerParsing[source]

Bases: WranglerIO

Wrapper class for parsing data.

scintillometer

Inherited methods for parsing scintillometry data.

Type:

WranglerScintillometer

weather

Inherited methods for parsing weather data.

Type:

WranglerWeather

transect

Inherited methods for parsing topographical data.

Type:

WranglerTransect

eddy

Inherited methods for parsing eddy covariance data.

Type:

WranglerEddy

vertical

Inherited methods for parsing vertical measurements.

Type:

WranglerVertical

stitch

Inherited methods for merging parsed data into unified datasets.

Type:

WranglerStitch

wrangle_data(bls_path, transect_path, calibrate, weather_dir='./ext/data/raw/ZAMG/', station_id='11803', tzone='CET', weather_source='zamg')[source]

Wrangle BLS, ZAMG, and transect datasets.

Parameters:
  • bls_path (str) – Path to a raw .mnd data file using FORMAT-1.

  • transect_path (str) – Path to processed transect. The data must be formatted as: <path_height>, <normalised_path_position>. The normalised path position maps to: [0: receiver location, 1: transmitter location].

  • calibrate (list) – Contains the incorrect and correct path lengths. Format as [incorrect, correct].

  • weather_dir (str) – Path to directory with local weather data. Default “./ext/data/raw/ZAMG/”.

  • station_id (str) – ZAMG weather station ID (Klima-ID). Default “11803”.

  • tzone (str) – Local timezone during the scintillometer’s operation. Default “CET”.

  • weather_source (str) – Data source of weather data. Currently supports ZAMG. Default “zamg”.

Returns:

BLS, ZAMG, and transect dataframes, an interpolated dataframe at 60s resolution containing BLS and ZAMG data, and a pd.Timestamp object of the scintillometer’s recorded start time of data collection. All returned objects are localised to the timezone selected by the user.

Return type:

dict

class scintillometry.wrangler.data_parser.WranglerScintillometer[source]

Bases: WranglerIO

Parses scintillometry data.

transform

Inherited methods for transforming dataframe parameters.

Type:

WranglerTransform

calibrate_data(data, path_lengths)[source]

Calibrates data if the wrong path length was set.

Recalibrate data if the wrong path length was set in SRun or on the dip switches in the scintillometer. Use the argument:

    --c, --calibrate <wrong_path_length> <correct_path_length>

Parameters:
  • data (pd.DataFrame) – Parsed and localised scintillometry dataframe.

  • path_lengths (list) – Contains the incorrect and correct path lengths, [m]. Format as [incorrect, correct].

Returns:

Recalibrated dataframe.

Return type:

pd.DataFrame

Raises:

ValueError – Calibration path lengths must be formatted as: <wrong_path_length> <correct_path_length>.

parse_mnd_lines(line_list)[source]

Parses data and variable names from a list of .mnd lines.

Parameters:

line_list (list) – Lines read from .mnd file in FORMAT-1.

Returns:

Contains a list of lines of parsed BLS data, an ordered list of variable names, the file timestamp, and any additional header parameters in the file header.

Return type:

dict

Raises:

Warning – The input file does not follow FORMAT-1.

parse_scintillometer(file_path, timezone='CET', calibration=None)[source]

Parses .mnd files into dataframes.

Parameters:
  • file_path (str) – Path to a raw .mnd data file using FORMAT-1.

  • timezone (str) – Local timezone during the scintillometer’s operation. Default “CET”.

  • calibration (list) – Contains the incorrect and correct path lengths, [m]. Format as [incorrect, correct]. Default None.

Returns:

Parsed and localised scintillometry data.

Return type:

pd.DataFrame

class scintillometry.wrangler.data_parser.WranglerStitch[source]

Bases: object

Merges parsed data into unified datasets.

merge_scintillometry_weather(scintillometry, weather)[source]

Merges parsed scintillometry and weather dataframes.

This replaces any weather data collected by the scintillometer with external weather data. It only preserves \(C_{n}^{2}\) and SHF data from the scintillometer.

If temperature data is in Celsius or pressure data is in Pa, it is automatically converted to Kelvin or hPa, respectively. Any subsequent maths assumes these units.

Parameters:
  • scintillometry (pd.DataFrame) – Parsed and localised scintillometry data.

  • weather (pd.DataFrame) – Parsed and localised weather data.

Returns:

Merged dataframe containing both scintillometry data and interpolated weather data.

Return type:

pd.DataFrame
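
The merge can be sketched roughly as below, assuming pandas time interpolation. The column names "Cn2", "H_convection", and "temperature" and the function name are hypothetical, and unit conversion is left out; the package's own merging logic may differ.

```python
import pandas as pd


def merge_sketch(scintillometry, weather, keep=("Cn2", "H_convection")):
    """Keep selected scintillometer columns and join interpolated weather."""
    # Drop any weather-like columns recorded by the scintillometer itself.
    kept = scintillometry[list(keep)]
    # Interpolate weather in time, then sample it on the scintillometer index.
    combined_index = weather.index.union(kept.index)
    weather_interp = (
        weather.reindex(combined_index)
        .interpolate(method="time")
        .reindex(kept.index)
    )
    return kept.join(weather_interp)


bls_index = pd.date_range("2020-06-03 00:00", periods=11, freq="min", tz="CET")
bls = pd.DataFrame(
    {"Cn2": 1e-15, "H_convection": 50.0, "temperature": 999.0}, index=bls_index
)
weather = pd.DataFrame({"temperature": [280.0, 290.0]}, index=bls_index[[0, -1]])
merged = merge_sketch(bls, weather)
```

In this example the scintillometer's own "temperature" column is discarded and replaced by the external record interpolated to one-minute resolution.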

class scintillometry.wrangler.data_parser.WranglerTransect[source]

Bases: WranglerIO

Parses topographical data.

transform

Inherited methods for transforming dataframe parameters.

Type:

WranglerTransform

parse_dgm_processed(file_path)[source]

Parses path transect from pre-processed DGM data.

The pre-processed data is a .csv file formatted as: <path_height>, <normalised_path_position>. The normalised path position maps to: [0: receiver location, 1: transmitter location].

Parameters:

file_path (str) – Path to processed transect. The data must be formatted as: <path_height>, <normalised_path_position>. The normalised path position maps to: [0: receiver location, 1: transmitter location].

Returns:

Parsed path transect data.

Return type:

pd.DataFrame

Raises:
  • FileNotFoundError – No file found with path: <file_path>.

  • ValueError – Normalised position is not between 0 and 1.
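
A minimal sketch of reading and validating a processed transect, assuming a headerless two-column .csv in the documented order. The column names and the function name are hypothetical.

```python
import io

import pandas as pd


def parse_transect_sketch(buffer):
    """Read <path_height>, <normalised_path_position> rows and validate them."""
    transect = pd.read_csv(
        buffer, header=None, names=["path_height", "norm_position"]
    )
    # 0 is the receiver location, 1 is the transmitter location.
    if not transect["norm_position"].between(0, 1).all():
        raise ValueError("Normalised position is not between 0 and 1.")
    return transect


transect = parse_transect_sketch(io.StringIO("600.0,0.0\n650.5,0.5\n700.0,1.0\n"))
```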

parse_transect(file_path, source='dgm_processed')[source]

Parses scintillometer path transect.

Parameters:
  • file_path (str) – Path to topographical data.

  • source (str) – Data source of topographical data. Currently supports pre-processed DGM 5m. Default “dgm_processed”.

Returns:

Parsed path transect data.

Return type:

pd.DataFrame

Raises:

NotImplementedError – <source> measurements are not supported. Use “dgm_processed”.

class scintillometry.wrangler.data_parser.WranglerTransform[source]

Bases: object

Transforms data labelling and indexing.

change_index_frequency(data, frequency='60S')[source]

Change frequency of time index.

Parameters:
  • data (pd.DataFrame or pd.Series) – An object with a time or datetime index.

  • frequency (str) – Reindexing frequency. Default “60S”.

Returns:

Object with new index frequency.

Return type:

pd.DataFrame or pd.Series
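
One way to change the frequency of a time index is pandas' asfreq, shown below on a 30-second series. Whether the package uses asfreq internally is an assumption. The lowercase alias "60s" is used here because recent pandas versions deprecate the uppercase "S" alias.

```python
import pandas as pd

# A series sampled every 30 seconds.
index = pd.date_range("2020-06-03 03:00", periods=4, freq="30s")
series = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Conform the series to a 60-second index; values at matching times are kept.
resampled = series.asfreq("60s")
```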

convert_time_index(data, tzone=None)[source]

Make tz-naive dataframe tz-aware.

Parameters:
  • data (pd.DataFrame) – Tz-naive dataframe.

  • tzone (str) – Local timezone. Default None.

Returns:

Tz-aware dataframe in local timezone or UTC.

Return type:

pd.DataFrame
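
A minimal sketch of making a tz-naive dataframe tz-aware, assuming the naive timestamps are in UTC and that a missing tzone leaves the data in UTC. Both assumptions, and the function name, are illustrative only.

```python
import pandas as pd


def localise_sketch(data, tzone=None):
    """Treat naive timestamps as UTC, then convert to tzone if given."""
    aware = data.tz_localize("UTC")
    if tzone:
        aware = aware.tz_convert(tzone)
    return aware


naive = pd.DataFrame({"value": [1.0]}, index=pd.to_datetime(["2020-01-15 12:00"]))
local = localise_sketch(naive, tzone="CET")
utc = localise_sketch(naive)
```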

parse_iso_date(x, date=True)[source]

Parses timestamp with mixed ISO-8601 duration and date.

Uses integer properties of bool to act as index for partition.

Parameters:
  • x (str) – Timestamp containing ISO-8601 duration and date, i.e. “<ISO-8601 duration>/<ISO-8601 date>”.

  • date (bool) – If True, returns date. Otherwise, returns duration. Default True.

Returns:

ISO-8601 string representing either a duration or a date.

Return type:

str
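
The bool-as-index idea can be sketched as follows, assuming the separator is "/" and that negating the flag selects the element. The function name and sample timestamp are hypothetical.

```python
def parse_iso_date_sketch(x, date=True):
    """Return the date or the duration from '<duration>/<date>'."""
    # str.partition returns (before, separator, after).
    # -True == -1 selects the last item (the date);
    # -False == 0 selects the first item (the duration).
    return x.partition("/")[-date]


stamp = "PT10M/2020-06-03T03:10:00"
date_part = parse_iso_date_sketch(stamp, date=True)
duration_part = parse_iso_date_sketch(stamp, date=False)
```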

class scintillometry.wrangler.data_parser.WranglerVertical[source]

Bases: WranglerIO

Parses vertical measurements.

transform

Inherited methods for transforming dataframe parameters.

Type:

WranglerTransform

construct_hatpro_levels(levels=None)[source]

Construct HATPRO scanning levels.

Hardcoded scan levels specifically for HATPRO Retrieval data from HATPRO UIBK Met (612m). Scan levels are integer measurement heights relative to the station’s elevation.

Parameters:

levels (list[int]) – HATPRO measurement heights, \(z_{scan}\) [m]. Default None.

Returns:

HATPRO measurement heights, \(z_{scan}\) [m].

Return type:

list

Raises:

TypeError – Input levels must be a list or tuple of integers.

load_hatpro(file_name, levels, tzone='CET', station_elevation=612.0)[source]

Load raw HATPRO data into dataframe.

Parameters:
  • file_name (str) – Path to raw HATPRO data.

  • levels (list[int]) – Height of HATPRO scan level, \(z_{scan}\) [m].

  • tzone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.

  • station_elevation (float) – Station elevation, \(z_{stn}\) [m]. Default 612.0.

Returns:

Contains tz-aware and pre-processed HATPRO data.

Return type:

pd.DataFrame

parse_hatpro(file_prefix, timezone='CET', scan_heights=None, elevation=612.0)[source]

Parses HATPRO Retrieval data.

Parameters:
  • file_prefix (str) –

    Path prefix for HATPRO Retrieval data. There should be two HATPRO files ending with “humidity” and “temp”. The path prefix should be identical for both files, e.g.:

    ./path/to/file_humidity.csv
    ./path/to/file_temp.csv
    

    would require file_prefix = “./path/to/file_”.

  • timezone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.

  • scan_heights (list[int]) – Heights of HATPRO measurement levels, \(z_{scan}\) [m]. Default None.

  • elevation (float) – Station elevation, \(z_{stn}\) [m]. Default 612.0.

Returns:

Vertical measurements from HATPRO for temperature \(T\) [K] and absolute humidity \(\rho_{v}\) [g \(\cdot\) m\(^{-3}\)].

Return type:

dict[str, pd.DataFrame]
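
The file_prefix convention can be illustrated with a short sketch that builds both file paths from one prefix. The helper name is hypothetical, and the ".csv" extension is taken from the example paths above.

```python
def hatpro_paths(file_prefix, extension=".csv"):
    """Build the humidity and temperature file paths from a shared prefix."""
    return {
        name: f"{file_prefix}{name}{extension}" for name in ("humidity", "temp")
    }


paths = hatpro_paths("./path/to/file_")
```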

parse_vertical(file_path, source='hatpro', tzone='CET', levels=None, station_elevation=612.0)[source]

Parses vertical measurements.

Currently only supports HATPRO.

Parameters:
  • file_path (str) –

    Path to vertical measurements. For HATPRO Retrieval data there should be two HATPRO files ending with “humidity” and “temp”. The path should be identical for both files, e.g.:

    ./path/to/file_humidity.csv
    ./path/to/file_temp.csv
    

    would require file_path = “./path/to/file_”.

  • source (str) – Instrument used for vertical measurements. Only supports HATPRO. Default “hatpro”.

  • levels (list[int]) – Heights of HATPRO measurements, \(z_{scan}\) [m]. Default None.

  • tzone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.

  • station_elevation (float) – Station elevation, \(z_{stn}\) [m]. Default 612.0.

Returns:

Vertical measurements for temperature \(T\) [K] and absolute humidity \(\rho_{v}\) [g \(\cdot\) m\(^{-3}\)].

Return type:

dict[str, pd.DataFrame]

Raises:

NotImplementedError – <source> measurements are not supported. Use “hatpro”.

class scintillometry.wrangler.data_parser.WranglerWeather[source]

Bases: WranglerIO

Parses meteorological data.

transform

Inherited methods for transforming dataframe parameters.

Type:

WranglerTransform

parse_weather(timestamp, source='zamg', data_dir='./ext/data/raw/ZAMG/', station_id='11803', timezone='CET')[source]

Parses weather data. Only supports ZAMG files.

Parameters:
  • timestamp (pd.Timestamp) – Start time of climate record.

  • source (str) – Data source of weather data. Currently supports ZAMG. Default “zamg”.

  • data_dir (str) – Location of weather data files. Default “./ext/data/raw/ZAMG/”.

  • station_id (str) – Weather station ID (e.g. ZAMG Klima-ID). Default “11803”.

  • timezone (str) – Local timezone of the scintillometer’s measurement period. Default “CET”.

Returns:

Parsed weather data.

Return type:

pd.DataFrame

Raises:

NotImplementedError – <source> measurements are not supported. Use “zamg”.

parse_zamg_data(timestamp, klima_id='11803', data_dir='./ext/data/raw/ZAMG/', timezone='CET')[source]

Parses ZAMG climate records.

Parameters:
  • timestamp (pd.Timestamp) – Start time of climate record.

  • klima_id (str) – ZAMG weather station ID (Klima-ID). Default “11803”.

  • data_dir (str) – Location of ZAMG data files. Default “./ext/data/raw/ZAMG/”.

  • timezone (str) – Local timezone during the scintillometer’s operation. Default “CET”.

Returns:

Parsed ZAMG records.

Return type:

pd.DataFrame

References

Module contents