blueetl_core.utils
Core utilities.
Functions
compare | Return the result of the comparison between obj and value.
concat_tuples | Build and return a Series from an iterable of tuples (value, conditions).
ensure_list | Always return a list from the given argument.
is_subfilter | Return True if left is a subfilter of right, False otherwise.
longest_match_count | Return the number of matching elements from the beginning of the given iterables.
query_frame | Given a query dictionary, return the DataFrame filtered by columns and index.
query_series | Given a query dictionary, return the Series filtered by index.
smart_concat | Build and return a Series or a DataFrame from an iterable of objects with the same index.
Classes
CachedDataFrame | DataFrame wrapper to cache partial queries.
CachedItem | Item of CachedDataFrame.
- class blueetl_core.utils.CachedDataFrame(df: DataFrame)
Bases: object
DataFrame wrapper to cache partial queries.
Initialize the object with the base DataFrame.
The internal stack will contain CachedItems, each one containing a DataFrame filtered by the corresponding key and value, and by all the previous keys and values in the stack.
Examples
self._stack = [
    CachedItem(df=df0, key="simulation_id", value=1),
    CachedItem(df=df1, key="circuit_id", value=0),
    CachedItem(df=df2, key="window", value="w1"),
    CachedItem(df=df3, key="trial", value=0),
]
where:
- df0 is self._df filtered by simulation_id=1
- df1 is df0 filtered by circuit_id=0
- df2 is df1 filtered by window="w1"
- df3 is df2 filtered by trial=0
- query(query: dict[str, Any], ignore_unknown_keys: bool = False) -> DataFrame
Return the DataFrame filtered by query, using cached DataFrames if possible.
The order of the keys in the query dict is important: the cache is reused only when the keys appear in the same order.
A partial match is reused as well, when only some of the keys and their values match.
- Parameters:
query – dict to be passed to etl.q.
ignore_unknown_keys – if True, ignore keys specified in the query but not present in the DataFrame columns or in the index level names. If False, unknown keys raise an error.
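The stack-based caching described above can be illustrated with a minimal sketch. It is not the library's implementation: `PrefixCacheSketch`, `CachedItemSketch`, and the `filter_calls` counter are hypothetical names, and the sketch assumes that every query value is a scalar compared by equality against a column.

```python
from dataclasses import dataclass
from typing import Any

import pandas as pd


@dataclass
class CachedItemSketch:
    """Hypothetical stand-in for CachedItem: one filtered DataFrame per key."""

    df: pd.DataFrame
    key: str
    value: Any


class PrefixCacheSketch:
    """Hypothetical sketch of a cache that reuses the leading part of a query."""

    def __init__(self, df: pd.DataFrame) -> None:
        self._df = df
        self._stack: list[CachedItemSketch] = []
        self.filter_calls = 0  # instrumentation for this example only

    def query(self, query: dict[str, Any]) -> pd.DataFrame:
        # Count the leading (key, value) pairs that match the cached stack.
        matched = 0
        for item, (key, value) in zip(self._stack, query.items()):
            if item.key != key or item.value != value:
                break
            matched += 1
        # Drop the cached items that no longer apply.
        del self._stack[matched:]
        df = self._stack[-1].df if self._stack else self._df
        # Filter only by the remaining keys, caching each intermediate result.
        for key, value in list(query.items())[matched:]:
            df = df[df[key] == value]
            self.filter_calls += 1
            self._stack.append(CachedItemSketch(df=df, key=key, value=value))
        return df


df = pd.DataFrame(
    {
        "simulation_id": [0, 0, 1, 1],
        "circuit_id": [0, 0, 0, 1],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)
cache = PrefixCacheSketch(df)
first = cache.query({"simulation_id": 1, "circuit_id": 0})   # two filters applied
second = cache.query({"simulation_id": 1, "circuit_id": 1})  # simulation_id reused
```

In this sketch the second call applies a single new filter, because the `simulation_id=1` result is still at the bottom of the stack. Passing the same keys in a different order would invalidate the whole stack.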
- class blueetl_core.utils.CachedItem(df: DataFrame, key: str, value: Any)
Bases: object
Item of CachedDataFrame.
- blueetl_core.utils.compare(obj: Series | Index, value: Any) -> ndarray
Return the result of the comparison between obj and value.
- Parameters:
obj – Series or Index.
value – value used for comparison:
- if scalar, use equality
- if list-like, use isin
- if dict, any supported operators can be specified, and they will be AND-ed together
Examples
>>> df = pd.DataFrame({"gid": [0, 2, 3, 7, 8]})
>>> compare(df["gid"], 3)
array([False, False,  True, False, False])
>>> compare(df["gid"], [3, 5, 8])
array([False, False,  True, False,  True])
>>> compare(df["gid"], {"ge": 3, "lt": 8})
array([False, False,  True,  True, False])
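The three dispatch rules (scalar, list-like, dict) can be sketched as follows. This is an illustration under assumptions, not the library's code: `compare_sketch` and the `_OPERATORS` table are hypothetical, and the set of operator names actually supported by `compare` is not listed on this page beyond `ge`, `lt`, and `isin`, which appear in the examples.

```python
import operator

import numpy as np
import pandas as pd

# Hypothetical operator table: the names below are an assumption for this sketch.
_OPERATORS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
    "isin": lambda obj, values: obj.isin(values),
}


def compare_sketch(obj, value):
    """Return a boolean mask for obj (Series or Index) compared with value."""
    if isinstance(value, dict):
        # Each operator produces a mask; the masks are AND-ed together.
        mask = np.ones(len(obj), dtype=bool)
        for name, operand in value.items():
            mask &= np.asarray(_OPERATORS[name](obj, operand))
        return mask
    if isinstance(value, (list, tuple, set, np.ndarray)):
        # List-like values select any of the listed elements.
        return np.asarray(obj.isin(value))
    # Scalars are compared by equality.
    return np.asarray(obj == value)


gids = pd.Series([0, 2, 3, 7, 8])
in_range = compare_sketch(gids, {"ge": 3, "lt": 8})
```

The resulting mask can be used directly for boolean indexing, for example `gids[in_range]`.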
- blueetl_core.utils.concat_tuples(iterable, *args, **kwargs)
Build and return a Series from an iterable of tuples (value, conditions).
- Parameters:
iterable – iterable of tuples (value, conditions), where:
- value is a single value that will be added to the Series
- conditions is a dict containing the conditions to be used for the MultiIndex. The keys of the conditions must be the same for each tuple of the iterable, or an exception is raised.
args – positional arguments to be passed to pd.concat
kwargs – keyword arguments to be passed to pd.concat
- Returns:
(pd.Series) result of the concatenation.
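To make the expected input and output shapes concrete, the sketch below builds an equivalent Series directly with `pd.MultiIndex.from_tuples`. It is only an illustration: `series_from_tuples_sketch` is a hypothetical name, it does not call `pd.concat` or accept the extra arguments, it does not handle an empty iterable, and it assumes that the condition keys must also appear in the same order.

```python
import pandas as pd


def series_from_tuples_sketch(iterable):
    """Build a Series indexed by the conditions of each (value, conditions) tuple."""
    values, index_tuples, names = [], [], None
    for value, conditions in iterable:
        keys = list(conditions)
        if names is None:
            names = keys
        elif keys != names:
            # Assumption of this sketch: same keys, in the same order.
            raise ValueError(f"Inconsistent condition keys: {keys} != {names}")
        values.append(value)
        index_tuples.append(tuple(conditions.values()))
    index = pd.MultiIndex.from_tuples(index_tuples, names=names)
    return pd.Series(values, index=index)


result = series_from_tuples_sketch(
    [
        (1.5, {"simulation_id": 0, "window": "w1"}),
        (2.5, {"simulation_id": 1, "window": "w2"}),
    ]
)
```

Each condition key becomes a level of the MultiIndex, so a single value can be looked up with `result.loc[(1, "w2")]`.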
- blueetl_core.utils.ensure_list(x: Any) -> list
Always return a list from the given argument.
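The page does not say how each input type is converted, so the following is only a plausible sketch. It assumes that lists are returned unchanged, tuples are converted to lists, and any other object (including a string) is wrapped in a one-element list. `ensure_list_sketch` is a hypothetical name and the real function may behave differently.

```python
from typing import Any


def ensure_list_sketch(x: Any) -> list:
    """Return a list from x, under the assumptions stated above."""
    if isinstance(x, list):
        return x  # already a list: returned as-is, without copying
    if isinstance(x, tuple):
        return list(x)  # assumption: tuples are unpacked into a list
    return [x]  # assumption: anything else is treated as a single item


wrapped = ensure_list_sketch("window")
```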
- blueetl_core.utils.is_subfilter(left: dict, right: dict, strict: bool = False) -> bool
Return True if left is a subfilter of right, False otherwise.
- Parameters:
left – left filter dict.
right – right filter dict.
strict – if False, left is a subfilter of right if it’s equal or more specific; if True, left is a subfilter of right only if it’s more specific.
Examples
>>> print(is_subfilter({}, {}))
True
>>> print(is_subfilter({}, {}, strict=True))
False
>>> print(is_subfilter({}, {"key": 1}))
False
>>> print(is_subfilter({"key": 1}, {}))
True
>>> print(is_subfilter({"key": 1}, {"key": 1}))
True
>>> print(is_subfilter({"key": 1}, {"key": 1}, strict=True))
False
>>> print(is_subfilter({"key": 1}, {"key": [1]}))
True
>>> print(is_subfilter({"key": 1}, {"key": [1]}, strict=True))
False
>>> print(is_subfilter({"key": 1}, {"key": [1, 2]}))
True
>>> print(is_subfilter({"key": 1}, {"key": {"isin": [1, 2]}}))
True
>>> print(is_subfilter({"key": 1}, {"key": 2}))
False
>>> print(is_subfilter({"key": 1}, {"key": [2, 3]}))
False
>>> print(is_subfilter({"key": 1}, {"key": {"isin": [2, 3]}}))
False
>>> print(is_subfilter({"key1": 1, "key2": 2}, {"key1": 1}))
True
>>> print(is_subfilter({"key1": 1}, {"key1": 1, "key2": 2}))
False
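One way to reason about these examples is to treat each filter value as the set of values it accepts. The sketch below follows that idea and reproduces the results listed above, but it is a simplification and not the library's implementation: `is_subfilter_sketch` and `_allowed_values` are hypothetical names, only scalars, list-likes, and `{"isin": [...]}` are handled, and the values must be hashable.

```python
def _allowed_values(value):
    """Normalize a filter value to the set of values it accepts."""
    if isinstance(value, dict):
        # Only {"isin": [...]} is understood by this sketch.
        if set(value) != {"isin"}:
            raise NotImplementedError("only the 'isin' operator is handled here")
        return set(value["isin"])
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def is_subfilter_sketch(left, right, strict=False):
    """Return True if left is equal to or more specific than right."""
    # Every key constrained by right must also be constrained by left.
    if not set(right) <= set(left):
        return False
    left_sets = {key: _allowed_values(value) for key, value in left.items()}
    right_sets = {key: _allowed_values(value) for key, value in right.items()}
    # On the shared keys, left must not accept any value that right rejects.
    if any(not left_sets[key] <= right_sets[key] for key in right_sets):
        return False
    # In strict mode, equivalent filters do not count as subfilters.
    return not (strict and left_sets == right_sets)


more_specific = is_subfilter_sketch({"key1": 1, "key2": 2}, {"key1": 1})
```

With this reading, a filter with more keys or fewer accepted values selects a subset of the rows selected by the other filter.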
- blueetl_core.utils.longest_match_count(iter1, iter2) -> int
Return the number of matching elements from the beginning of the given iterables.
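A minimal sketch of this prefix count, assuming that elements are compared with `==` and that counting stops at the first mismatch or when the shorter iterable is exhausted. The name `longest_match_count_sketch` is hypothetical.

```python
def longest_match_count_sketch(iter1, iter2) -> int:
    """Count the leading positions where both iterables hold equal elements."""
    count = 0
    for a, b in zip(iter1, iter2):
        if a != b:
            break  # stop at the first mismatch
        count += 1
    return count


cached_keys = ["simulation_id", "circuit_id", "window"]
query_keys = ["simulation_id", "circuit_id", "trial"]
shared = longest_match_count_sketch(cached_keys, query_keys)
```

Here `shared` is 2: the first two keys match and the third differs, so later matches (if any) would not be counted.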
- blueetl_core.utils.query_frame(df: DataFrame, query_list: list[dict[str, Any]]) -> DataFrame
Given a query dictionary, return the DataFrame filtered by columns and index.
- blueetl_core.utils.query_series(series: Series, query_list: list[dict[str, Any]]) -> Series
Given a query dictionary, return the Series filtered by index.
- blueetl_core.utils.smart_concat(iterable, *, keys=None, copy=False, skip_empty=True, **kwargs)
Build and return a Series or a DataFrame from an iterable of objects with the same index.
This is similar to pd.concat, but the result is consistent even when the levels of the indexes are ordered differently, while pd.concat would blindly concatenate the indexes, ignoring and removing the names of the levels.
Moreover, it uses copy=False by default, which is safe only if the original data isn’t going to change, but is more efficient, especially when concatenating a single item.
- Parameters:
iterable – iterable or mapping of Series or DataFrames. All the objects must be of the same type, and they must have the same index, or an exception is raised.
keys – passed to pd.concat to construct a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, it should contain tuples.
copy – passed to pd.concat. If the original data can be used without making a copy, then it can be set to False.
skip_empty – if True, empty objects are skipped, unless they are all empty. If False, they are all passed to pd.concat, and the result may depend on the Pandas version. Note that in the latter case, you may see a FutureWarning with Pandas 2:
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
kwargs – other keyword arguments to be passed to pd.concat
- Returns:
(pd.Series|pd.DataFrame) result of the concatenation, same type as the input elements.
Examples
>>> idx1 = pd.MultiIndex.from_tuples([(10, 11), (20, 21)], names=["i1", "i2"])
>>> idx2 = pd.MultiIndex.from_tuples([(11, 10), (31, 30)], names=["i2", "i1"])
>>> df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=idx1)
>>> df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=idx2)
>>> pd.concat([df1, df2])  # index levels are lost
       A  B
10 11  1  3
20 21  2  4
11 10  5  7
31 30  6  8
>>> smart_concat([df1, df2])  # index levels are preserved
       A  B
i1 i2
10 11  1  3
20 21  2  4
10 11  5  7
30 31  6  8
>>> pd.concat([df1, df2], axis=1)  # index levels are lost
         A    B    A    B
10 11  1.0  3.0  NaN  NaN
20 21  2.0  4.0  NaN  NaN
11 10  NaN  NaN  5.0  7.0
31 30  NaN  NaN  6.0  8.0
>>> smart_concat([df1, df2], axis=1)  # index levels are preserved
         A    B    A    B
i1 i2
10 11  1.0  3.0  5.0  7.0
20 21  2.0  4.0  NaN  NaN
30 31  NaN  NaN  6.0  8.0
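The level-alignment idea can be approximated with plain pandas by reordering the index levels of every object to match the first one before concatenating. This is a sketch under assumptions, not the implementation of `smart_concat`: `aligned_concat_sketch` is a hypothetical name, it requires MultiIndex inputs with the same level names, and it ignores `keys`, `copy`, and `skip_empty`.

```python
import pandas as pd


def aligned_concat_sketch(objs, **kwargs):
    """Concatenate objects after reordering their index levels consistently."""
    objs = list(objs)
    # Use the level order of the first object as the reference.
    names = list(objs[0].index.names)
    aligned = [obj.reorder_levels(names) for obj in objs]
    return pd.concat(aligned, **kwargs)


idx1 = pd.MultiIndex.from_tuples([(10, 11), (20, 21)], names=["i1", "i2"])
idx2 = pd.MultiIndex.from_tuples([(11, 10), (31, 30)], names=["i2", "i1"])
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=idx1)
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]}, index=idx2)

combined = aligned_concat_sketch([df1, df2])
```

Because both indexes share the same level names in the same order after `reorder_levels`, `pd.concat` keeps the names `i1` and `i2` in the result.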