blueetl_core.utils

Core utilities.

Functions

`compare`(obj, value)	Return the result of the comparison between obj and value.
`concat_tuples`(iterable, *args, **kwargs)	Build and return a Series from an iterable of tuples (value, conditions).
`ensure_list`(x)	Always return a list from the given argument.
`is_subfilter`(left, right[, strict])	Return True if `left` is a subfilter of `right`, False otherwise.
`longest_match_count`(iter1, iter2)	Return the number of matching elements from the beginning of the given iterables.
`query_frame`(df, query_list)	Given a query dictionary, return the DataFrame filtered by columns and index.
`query_series`(series, query_list)	Given a query dictionary, return the Series filtered by index.
`smart_concat`(iterable, *[, keys, copy, ...])	Build and return a Series or a DataFrame from an iterable of objects with the same index.

Classes

`CachedDataFrame`(df)	DataFrame wrapper to cache partial queries.
`CachedItem`(df, key, value)	Item of CachedDataFrame.

class blueetl_core.utils.CachedDataFrame(df: DataFrame)

Bases: object

DataFrame wrapper to cache partial queries.

Initialize the object with the base DataFrame.

The internal stack will contain CachedItems, each one containing a DataFrame filtered by the corresponding key and value, and by all the previous keys and values in the stack.

Examples

self._stack = [
    CachedItem(df=df0, key="simulation_id", value=1),
    CachedItem(df=df1, key="circuit_id", value=0),
    CachedItem(df=df2, key="window", value="w1"),
    CachedItem(df=df3, key="trial", value=0),
]

where:

df0 is self._df filtered by simulation_id=1
df1 is df0 filtered by circuit_id=0
df2 is df1 filtered by window="w1"
df3 is df2 filtered by trial=0

query(query: dict[str, Any], ignore_unknown_keys: bool = False) → DataFrame

Return the DataFrame filtered by query, using cached DataFrames if possible.

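The order of the keys in the query dict matters: a cached DataFrame is reused only if its key, and all the keys before it, appear in the query in the same order and with the same values.
The cache is therefore also reused when only some of the keys and values match; the remaining keys are then applied to the cached DataFrame.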

Parameters:

query – dict to be passed to etl.q.
ignore_unknown_keys – if True, ignore keys specified in the query but not present in the DataFrame columns or in the index level names. If False, unknown keys raise an error.
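
The reuse rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the library's implementation: `ToyItem` and `ToyCachedRows` are hypothetical names, the data is a plain list of dicts instead of a DataFrame, only equality filters are handled, and `filter_calls` is added just to make the reuse visible.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToyItem:
    """Hypothetical counterpart of CachedItem: rows filtered by this and all previous items."""

    rows: list
    key: str
    value: Any


class ToyCachedRows:
    """Toy stand-in for a stack-based query cache (assumption: equality filters only)."""

    def __init__(self, rows):
        self._rows = rows
        self._stack = []
        self.filter_calls = 0  # number of filtering passes actually executed

    def query(self, query):
        # Count the leading (key, value) pairs that match the cached stack, in order.
        matched = 0
        for item, (key, value) in zip(self._stack, query.items()):
            if (item.key, item.value) != (key, value):
                break
            matched += 1
        # Drop the stale tail of the stack and resume from the last valid item.
        del self._stack[matched:]
        rows = self._stack[-1].rows if self._stack else self._rows
        for key, value in list(query.items())[matched:]:
            rows = [row for row in rows if row[key] == value]
            self.filter_calls += 1
            self._stack.append(ToyItem(rows=rows, key=key, value=value))
        return rows


rows = [
    {"simulation_id": 0, "circuit_id": 0, "gid": 1},
    {"simulation_id": 1, "circuit_id": 0, "gid": 2},
    {"simulation_id": 1, "circuit_id": 1, "gid": 3},
]
cache = ToyCachedRows(rows)
first = cache.query({"simulation_id": 1, "circuit_id": 0})   # two filtering passes
second = cache.query({"simulation_id": 1, "circuit_id": 1})  # reuses simulation_id=1
third = cache.query({"circuit_id": 1, "simulation_id": 1})   # different order: no reuse
```

In this toy model the second query runs a single extra filtering pass, while the third one starts again from the full data because its first key does not match the cached stack.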

class blueetl_core.utils.CachedItem(df: DataFrame, key: str, value: Any)

Bases: object

Item of CachedDataFrame.

blueetl_core.utils.compare(obj: Series | Index, value: Any) → ndarray

Return the result of the comparison between obj and value.

Parameters:

obj – Series, or Index.
value –
value used for comparison:
- if scalar, use equality
- if list-like, use isin
- if dict, any supported operators can be specified, and they will be AND-ed together

Examples

>>> df = pd.DataFrame({"gid": [0, 2, 3, 7, 8]})
>>> compare(df["gid"], 3)
    array([False, False,  True, False, False])
>>> compare(df["gid"], [3, 5, 8])
    array([False, False,  True, False,  True])
>>> compare(df["gid"], {"ge": 3, "lt": 8})
    array([False, False,  True,  True, False])
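
The dispatch on the type of `value` can be sketched in pure Python. This is a simplified stand-in, not the library's code: `compare_sketch` is a hypothetical name, it returns a list of booleans instead of an ndarray, and the operator names other than `ge`, `lt` and `isin` (which appear in this page) are assumptions.

```python
import operator
from collections.abc import Mapping

# Assumption: only "ge", "lt" and "isin" are confirmed by the documentation above;
# the other names are included for illustration.
_OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "le": operator.le,
    "lt": operator.lt,
    "ge": operator.ge,
    "gt": operator.gt,
    "isin": lambda x, allowed: x in allowed,
}


def compare_sketch(values, value):
    """Return one boolean per element of `values`."""
    if isinstance(value, Mapping):
        # dict: apply every operator and AND the results together
        return [all(_OPERATORS[op](x, arg) for op, arg in value.items()) for x in values]
    if isinstance(value, (list, tuple, set, frozenset)):
        # list-like: membership test
        return [x in value for x in values]
    # scalar: equality
    return [x == value for x in values]


gids = [0, 2, 3, 7, 8]
by_scalar = compare_sketch(gids, 3)
by_list = compare_sketch(gids, [3, 5, 8])
by_dict = compare_sketch(gids, {"ge": 3, "lt": 8})
```

With the same `gid` values as in the examples above, the three results reproduce the documented masks.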

blueetl_core.utils.concat_tuples(iterable, *args, **kwargs)

Build and return a Series from an iterable of tuples (value, conditions).

Parameters:

iterable –
iterable of tuples (value, conditions), where
- value is a single value that will be added to the Series
- conditions is a dict containing the conditions to be used for the MultiIndex. The keys of the conditions must be the same for each tuple of the iterable, or an exception is raised.
args – positional arguments to be passed to pd.concat
kwargs – keyword arguments to be passed to pd.concat

Returns:

(pd.Series) result of the concatenation.

blueetl_core.utils.ensure_list(x: Any) → list

Always return a list from the given argument.
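
A minimal sketch of this kind of normalization is shown below. It is not the library's implementation: `ensure_list_sketch` is a hypothetical name, and treating strings and bytes as single values is an assumption of this sketch.

```python
def ensure_list_sketch(x):
    """Wrap a single value in a list, or convert an iterable to a list."""
    # Assumption: strings and bytes are iterable, but are treated as single values here.
    if isinstance(x, (str, bytes)):
        return [x]
    try:
        return list(x)
    except TypeError:
        # not iterable: wrap the value
        return [x]


from_scalar = ensure_list_sketch(3)
from_tuple = ensure_list_sketch((1, 2))
from_string = ensure_list_sketch("w1")
```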

blueetl_core.utils.is_subfilter(left: dict, right: dict, strict: bool = False) → bool

Return True if left is a subfilter of right, False otherwise.

Parameters:

left – left filter dict.
right – right filter dict.
strict – if False, left is a subfilter of right if it’s equal or more specific; if True, left is a subfilter of right only if it’s more specific.

Examples

>>> print(is_subfilter({}, {}))
True
>>> print(is_subfilter({}, {}, strict=True))
False
>>> print(is_subfilter({}, {"key": 1}))
False
>>> print(is_subfilter({"key": 1}, {}))
True
>>> print(is_subfilter({"key": 1}, {"key": 1}))
True
>>> print(is_subfilter({"key": 1}, {"key": 1}, strict=True))
False
>>> print(is_subfilter({"key": 1}, {"key": [1]}))
True
>>> print(is_subfilter({"key": 1}, {"key": [1]}, strict=True))
False
>>> print(is_subfilter({"key": 1}, {"key": [1, 2]}))
True
>>> print(is_subfilter({"key": 1}, {"key": {"isin": [1, 2]}}))
True
>>> print(is_subfilter({"key": 1}, {"key": 2}))
False
>>> print(is_subfilter({"key": 1}, {"key": [2, 3]}))
False
>>> print(is_subfilter({"key": 1}, {"key": {"isin": [2, 3]}}))
False
>>> print(is_subfilter({"key1": 1, "key2": 2}, {"key1": 1}))
True
>>> print(is_subfilter({"key1": 1}, {"key1": 1, "key2": 2}))
False
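
One way to reason about the examples above is to treat each filter value as the set of values it accepts. The sketch below follows that idea under explicit assumptions: `is_subfilter_sketch` and `_allowed_values` are hypothetical names, only scalars, list-likes and the `isin` operator are handled, and the values must be hashable. The library's function may support more operators.

```python
from typing import Any


def _allowed_values(value: Any) -> frozenset:
    """Normalize a filter value to the set of values it accepts."""
    if isinstance(value, dict):
        if set(value) != {"isin"}:
            raise NotImplementedError("this sketch only understands the 'isin' operator")
        return frozenset(value["isin"])
    if isinstance(value, (list, tuple, set, frozenset)):
        return frozenset(value)
    return frozenset([value])


def is_subfilter_sketch(left: dict, right: dict, strict: bool = False) -> bool:
    norm_left = {key: _allowed_values(value) for key, value in left.items()}
    norm_right = {key: _allowed_values(value) for key, value in right.items()}
    # Every key constrained by `right` must be constrained at least as tightly by `left`.
    for key, allowed in norm_right.items():
        if key not in norm_left or not norm_left[key] <= allowed:
            return False
    # In strict mode, equivalent filters do not count as subfilters.
    return norm_left != norm_right if strict else True


more_specific = is_subfilter_sketch({"key1": 1, "key2": 2}, {"key1": 1})
less_specific = is_subfilter_sketch({"key1": 1}, {"key1": 1, "key2": 2})
```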

blueetl_core.utils.longest_match_count(iter1, iter2) → int

Return the number of matching elements from the beginning of the given iterables.
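
As an illustration of the behaviour described, counting the common prefix of two iterables can be written as follows. This is a sketch with a hypothetical name, `longest_match_count_sketch`, and is not necessarily how the library implements it.

```python
def longest_match_count_sketch(iter1, iter2) -> int:
    """Count how many leading elements of the two iterables are equal."""
    count = 0
    for item1, item2 in zip(iter1, iter2):
        if item1 != item2:
            break
        count += 1
    return count


common = longest_match_count_sketch(
    ["simulation_id", "circuit_id", "window"],
    ["simulation_id", "circuit_id", "trial"],
)
```

Here `common` is 2, because the two lists share their first two elements and differ at the third.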

blueetl_core.utils.query_frame(df: DataFrame, query_list: list[dict[str, Any]]) → DataFrame

Given a query dictionary, return the DataFrame filtered by columns and index.

blueetl_core.utils.query_series(series: Series, query_list: list[dict[str, Any]]) → Series

Given a query dictionary, return the Series filtered by index.

blueetl_core.utils.smart_concat(iterable, *, keys=None, copy=False, skip_empty=True, **kwargs)

Build and return a Series or a DataFrame from an iterable of objects with the same index.

This is similar to pd.concat, but the result is consistent even when the levels of the indexes are ordered differently, while pd.concat would blindly concatenate the indexes, ignoring and removing the names of the levels.

Moreover, it uses copy=False by default. This is safe only if the original data isn’t going to change, but it’s more efficient, especially when concatenating a single item.

Parameters:

iterable – iterable or mapping of Series or DataFrames. All the objects must be of the same type, and they must have the same index, or an exception is raised.
keys – passed to pd.concat to construct a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, it should contain tuples.
copy – passed to pd.concat. If the original data can be used without making a copy, then it can be set to False.
skip_empty –
if True, empty objects are skipped, unless they are all empty. If False, they are all passed to pd.concat, and the result may depend on the Pandas version. Note that in the latter case, you may see a FutureWarning with Pandas 2:

FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
kwargs – other keyword arguments to be passed to pd.concat

Returns:

(pd.Series|pd.DataFrame) result of the concatenation, of the same type as the input elements.

Examples

>>> idx1 = pd.MultiIndex.from_tuples([(10, 11), (20, 21)], names=["i1", "i2"])
>>> idx2 = pd.MultiIndex.from_tuples([(11, 10), (31, 30)], names=["i2", "i1"])
>>> df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=idx1)
>>> df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=idx2)
>>> pd.concat([df1, df2])  # index levels are lost
       A  B
10 11  1  3
20 21  2  4
11 10  5  7
31 30  6  8
>>> smart_concat([df1, df2])  # index levels are preserved
       A  B
i1 i2
10 11  1  3
20 21  2  4
10 11  5  7
30 31  6  8
>>> pd.concat([df1, df2], axis=1)  # index levels are lost
         A    B    A    B
10 11  1.0  3.0  NaN  NaN
20 21  2.0  4.0  NaN  NaN
11 10  NaN  NaN  5.0  7.0
31 30  NaN  NaN  6.0  8.0
>>> smart_concat([df1, df2], axis=1)  # index levels are preserved
         A    B    A    B
i1 i2
10 11  1.0  3.0  5.0  7.0
20 21  2.0  4.0  NaN  NaN
30 31  NaN  NaN  6.0  8.0