ml4cps.automata.learn module
The module provides learning algorithms for creating different kinds of automata.
Authors:
- Nemanja Hranisavljevic, hranisan@hsu-hh.de, nemanja@ai4cps.com
- Tom Westermann, tom.westermann@hsu-hh.de, tom@ai4cps.com
- ml4cps.automata.learn.FnDetectChangePoints(xout, udout, xout_shifts)
- Detects change points in output and input variables, filters them, and constructs a trace structure. This function analyzes the provided output (xout) and input (udout) data to detect change points using the findChangePoints function. It processes both output and input variables, aggregates and filters the detected change points, and returns a structured trace dictionary containing the processed data and change point information.
- Parameters:
- xout (np.ndarray) – Output data array of shape (n_samples, n_outputs). 
- udout (np.ndarray) – Input data array of shape (n_samples, n_inputs). 
- xout_shifts (np.ndarray) – Shifted output data array, used for further processing. 
- Returns:
- A dictionary with the following keys:
- 'x': Filtered output data array. 
- 'xs': Filtered shifted output data array. 
- 'chpoints': Array of global change points detected across all variables. 
- 'chpoints_per_var': List of arrays, each containing change points for a specific variable. 
- 'ud': Filtered input data array. 
- 'labels_num': Empty list (reserved for numeric labels). 
- 'labels_trace': Empty list (reserved for trace labels). 
 
 
- Return type:
- dict 
- Notes:
- Relies on global variables: num_var, num_ud, max_deriv, and chp_depths. 
- Uses external functions: findChangePoints and filterChangePoints. 
- The function is intended for use in time-series or sequential data analysis where detecting significant changes in variable values is required. 
 
- ml4cps.automata.learn.FnShiftAndDiff(xout, udout, norm_coeff, num_var, num_ud, max_deriv, Ts)
- Normalizes and processes state and input data by generating shifted versions and computing derivatives. This function performs the following operations:
  1. Normalizes the state output xout using the provided normalization coefficients.
  2. Generates shifted duplicates of xout up to the specified maximum derivative order.
  3. Computes numerical derivatives of xout up to max_deriv order and appends them to the state data.
  4. Strips the initial entries from xout and the shifted data to account for the derivative computation.
  5. Normalizes the input data udout (if present) and strips initial entries to match the processed state data.
- Parameters:
- xout (np.ndarray) – State output data of shape (n_samples, num_var). 
- udout (np.ndarray) – Input data of shape (n_samples, num_ud). 
- norm_coeff (np.ndarray) – Normalization coefficients of shape (num_var + num_ud, 1). 
- num_var (int) – Number of state variables. 
- num_ud (int) – Number of input variables. 
- max_deriv (int) – Maximum order of derivatives to compute. 
- Ts (float) – Sampling time interval. 
- Returns:
- xout (np.ndarray): Processed and augmented state data with derivatives, shape (n_samples - max_deriv, …). 
- udout (np.ndarray): Normalized input data, shape (n_samples - max_deriv, num_ud). 
- xout_shifts (pd.DataFrame): Shifted duplicates of the normalized state data, shape (n_samples - max_deriv, …). 
 
- Return type:
- Tuple[np.ndarray, np.ndarray, pd.DataFrame] 
- Notes:
- The function assumes that xout and udout are NumPy arrays and that xout has at least max_deriv rows. 
- The normalization coefficients should be provided for both state and input variables. 
- The function uses zero-padding for shifted and derivative computations. 
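The normalize, differentiate, and strip steps listed for FnShiftAndDiff can be illustrated with a minimal sketch. It is not the library's implementation: the helper name `shift_and_diff_sketch` is hypothetical, normalization is assumed to be a column-wise division by the coefficient, derivatives are assumed to be zero-padded backward differences divided by Ts, and the shifted duplicates and the input data udout are left out for brevity.

```python
import numpy as np


def shift_and_diff_sketch(xout, norm_coeff, max_deriv, Ts):
    """Hypothetical helper: normalize, append derivatives, strip padded rows."""
    num_var = xout.shape[1]
    # Assumption: normalization divides each state column by its coefficient.
    x = xout / norm_coeff[:num_var, 0]
    blocks = [x]
    current = x
    for _ in range(max_deriv):
        # Backward difference with a zero-padded first row, scaled by 1 / Ts.
        padded = np.vstack([np.zeros((1, num_var)), np.diff(current, axis=0)])
        current = padded / Ts
        blocks.append(current)
    augmented = np.hstack(blocks)
    # The first max_deriv rows are affected by the zero padding, so drop them.
    return augmented[max_deriv:]


samples = np.array([[0.0], [1.0], [4.0], [9.0], [16.0]])  # x = t**2, Ts = 1
augmented = shift_and_diff_sketch(samples, np.array([[1.0]]), max_deriv=2, Ts=1.0)
```

For this quadratic signal the result has shape (n_samples - max_deriv, 3), and the last column (the second difference) is constant.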
 
- ml4cps.automata.learn.FnTraceToTrainingData(trace, num_var, num_ud, useTime)
- Converts a trace dictionary into training data suitable for machine learning models.
- Parameters:
- trace (dict) – A dictionary containing the following keys:
  - 'x' (np.ndarray): Array of system variables over time (shape: [timesteps, num_var]). 
  - 'ud' (np.ndarray): Array of user-defined variables over time (shape: [timesteps, num_ud]). 
  - 'labels_trace' (np.ndarray): Array of state labels for each segment. 
  - 'chpoints' (list or np.ndarray): Indices indicating change points (state switches) in the trace. 
- num_var (int) – Number of system variables to include in the feature vector. 
- num_ud (int) – Number of user-defined variables to include in the feature vector. 
- useTime (bool) – If True, includes the time since the last state switch as a feature. 
 
- Returns:
- X (np.ndarray): Feature matrix where each row corresponds to a time step and contains:
  - Current state label 
  - System variables 
  - User-defined variables (if num_ud > 0) 
  - Time since last state switch (if useTime is True) 
- Y (np.ndarray): Array of next state labels (class labels) for each feature vector in X. 
- states (np.ndarray): Array of state labels for each time step in the trace. 
 
- Return type:
- Tuple[np.ndarray, np.ndarray, np.ndarray] 
- Notes:
- The function skips the last time step in the trace for feature construction, as it cannot form a (X, Y) pair. 
- The function assumes that the trace data is properly aligned and that 'chpoints' and 'labels_trace' are consistent. 
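As an illustration of the feature layout described for FnTraceToTrainingData, the following sketch builds (X, Y, states) from a small trace. The helper name `trace_to_training_data_sketch` is hypothetical, and it assumes that segment i spans chpoints[i] up to (but excluding) chpoints[i + 1] and carries the label labels_trace[i]; the library may align segments differently.

```python
import numpy as np


def trace_to_training_data_sketch(trace, num_var, num_ud, use_time):
    """Hypothetical helper: one feature row per time step except the last."""
    x, ud = trace["x"], trace["ud"]
    chpoints = np.asarray(trace["chpoints"])
    labels = np.asarray(trace["labels_trace"])
    n = x.shape[0]
    states = np.zeros(n, dtype=int)
    since_switch = np.zeros(n, dtype=int)
    for i, label in enumerate(labels):
        # Assumption: segment i covers chpoints[i] .. chpoints[i + 1] - 1.
        start = chpoints[i]
        end = chpoints[i + 1] if i + 1 < len(chpoints) else n
        states[start:end] = label
        since_switch[start:end] = np.arange(end - start)
    parts = [states[:-1, None], x[:-1, :num_var]]
    if num_ud > 0:
        parts.append(ud[:-1, :num_ud])
    if use_time:
        parts.append(since_switch[:-1, None])
    X = np.hstack(parts)
    Y = states[1:]  # label of the next time step
    return X, Y, states


trace = {
    "x": np.arange(6, dtype=float).reshape(6, 1),
    "ud": np.empty((6, 0)),
    "chpoints": [0, 3, 6],
    "labels_trace": [1, 2],
}
X, Y, states = trace_to_training_data_sketch(trace, num_var=1, num_ud=0, use_time=True)
```

Each row of X here is [current state, x value, steps since last switch], and Y holds the state label one step later.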
 
- ml4cps.automata.learn.build_pta(data, event_col='event', boundaries=1)
- Builds a Prefix Tree Acceptor (PTA) from a collection of event sequences. This function constructs a PTA by iterating through each sequence of events in the provided data. It adds states and transitions to the automaton based on the observed event sequences, and sets the depth, in-degree, and out-degree of the states. The PTA is useful for learning automata from positive examples.
- Parameters:
- data (iterable) – An iterable of event sequences. Each sequence can be a pandas DataFrame, pandas Series, or string. If a DataFrame, it should contain at least a time column and an event column. 
- event_col (str, optional) – The name of the column containing event labels in the input DataFrame or Series. Defaults to 'event'. 
- boundaries (int or dict, optional) – Not currently used in the function, but intended for handling event boundaries or timing constraints. Defaults to 1. 
 
- Returns:
- The constructed Prefix Tree Acceptor representing the input event sequences. 
- Return type:
- Automaton 
- Notes:
- The function expects the presence of an Automaton class with methods for adding states, transitions, and final states. 
- If a sequence is a string, it is converted to a pandas Series of characters. 
- Timing information (dt) is calculated as the difference between consecutive time steps. 
- The function skips empty sequences. 
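The prefix-tree idea behind build_pta can be sketched without the Automaton class. The function name `build_pta_sketch` and the plain dict/set representation below are illustrative assumptions; timing information and state degrees are left out.

```python
def build_pta_sketch(sequences):
    """Hypothetical stand-in: returns (transitions, finals); state 0 is the root."""
    transitions = {}  # (state, event) -> next state
    finals = set()
    next_id = 1
    for seq in sequences:
        if len(seq) == 0:
            continue  # empty sequences are skipped
        state = 0
        for event in seq:
            key = (state, event)
            if key not in transitions:
                # First time this prefix is seen: create a new state.
                transitions[key] = next_id
                next_id += 1
            state = transitions[key]
        finals.add(state)  # the state reached by a whole sequence is accepting
    return transitions, finals


pta_transitions, pta_finals = build_pta_sketch(["ab", "ac", "a", ""])
```

Sequences sharing a prefix share the corresponding path, so "ab", "ac", and "a" produce three transitions in total.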
 
- ml4cps.automata.learn.computeDistance(der)
- Computes a distance metric over a sliding window for a given derivative array. For each position in the input array der, the function calculates the sum of absolute differences between two windows of size windowSize before and after the current position, after normalizing each window by subtracting its first element. The result is an array of distances.
- Parameters:
- der (np.ndarray) – The input array (e.g., derivative values) over which to compute the distance. 
- Returns:
- An array containing the computed distances for each valid position. 
- Return type:
- np.ndarray 
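The sliding-window distance can be sketched as follows. This is an illustration under assumptions, not the library code: the name `compute_distance_sketch` is hypothetical, the window size is passed as an argument rather than read from the global windowSize, and only positions with a full window on both sides are treated as valid.

```python
import numpy as np


def compute_distance_sketch(der, window_size):
    """Hypothetical helper: compare the window before and after each position."""
    der = np.asarray(der, dtype=float)
    distances = []
    for i in range(window_size, len(der) - window_size + 1):
        before = der[i - window_size:i]
        after = der[i:i + window_size]
        # Subtract each window's first element so only the shape is compared.
        diff = (before - before[0]) - (after - after[0])
        distances.append(np.sum(np.abs(diff)))
    return np.array(distances)


dist = compute_distance_sketch([0, 0, 0, 0, 1, 2, 3], window_size=2)
```

The distance is zero where the signal has the same shape on both sides (flat or steadily rising) and non-zero around the point where the slope changes.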
 
- ml4cps.automata.learn.filterChangePoints(xout, udout, xout_shifts, chpoints, chp_var)
- Filters and synchronizes detected changepoints in time series data across multiple variables. This function processes global and local changepoint indices to ensure consistency and remove redundant or closely spaced changepoints. It updates the provided data arrays and changepoint lists accordingly.
- Parameters:
- xout (np.ndarray) – Output variable time series data (samples x variables). 
- udout (np.ndarray) – Input variable time series data (samples x variables), or an empty array if not used. 
- xout_shifts (np.ndarray) – Shifted output variable data (samples x variables). 
- chpoints (list or np.ndarray) – List of global changepoint indices. 
- chp_var (list of np.ndarray) – List containing arrays of changepoint indices for each variable. 
- Returns:
- xout (np.ndarray): Filtered output variable data. 
- udout (np.ndarray): Filtered input variable data. 
- xout_shifts (np.ndarray): Filtered shifted output variable data. 
- chpoints (np.ndarray): Filtered and synchronized global changepoint indices. 
- chp_var (list of np.ndarray): Updated list of changepoint indices for each variable. 
 
- Return type:
- tuple 
- Notes:
- Uses global variables: windowSize, num_var, num_ud. 
- Assumes that changepoints are sorted and that there are at least two changepoints. 
- The function modifies chp_var in place. 
 
- ml4cps.automata.learn.filterindx(indx, windw)
- Filters out indices from the input array that are within a specified window of each other. Given a sorted array of indices, this function removes any index that is within windw distance from its predecessor, keeping only the first occurrence in each window.
- Parameters:
- indx (array-like) – A sorted array or list of integer indices. 
- windw (int) – The minimum allowed distance between consecutive indices. 
- Returns:
- The filtered array of indices, where no two indices are within windw of each other. 
- Return type:
- numpy.ndarray 
- Example:
  >>> filterindx(np.array([1, 2, 3, 10, 12]), 2)
  array([ 1, 10, 12])
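One way to obtain the filtering behaviour of filterindx is sketched below. The name `filter_close_indices` is hypothetical, and two details are assumptions chosen because they reproduce the documented example: each index is compared with its predecessor in the original array, and an index is dropped only when the gap is strictly smaller than windw.

```python
import numpy as np


def filter_close_indices(indx, windw):
    """Hypothetical helper: drop indices closer than windw to their predecessor."""
    indx = np.asarray(indx)
    if indx.size == 0:
        return indx
    # Always keep the first index; keep later ones whose gap is at least windw.
    keep = np.concatenate(([True], np.diff(indx) >= windw))
    return indx[keep]


filtered = filter_close_indices(np.array([1, 2, 3, 10, 12]), 2)
```

With these assumptions the indices 2 and 3 are removed (gap of 1), while 10 and 12 are both kept (gaps of 7 and 2).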
- ml4cps.automata.learn.findChangePoints(xout, depth, starting, ending, max_depth)
- Recursively detects change points in a multi-dimensional signal using a hierarchical approach. This function analyzes a segment of the input array xout at a given depth (dimension), computes a distance metric to identify potential change points (peaks), and then recursively searches for further change points in subsegments at deeper levels. The recursion stops when the maximum depth is reached or the segment is too small.
- Parameters:
- xout (np.ndarray) – The input array containing the signal or features to analyze. Expected shape is (n_samples, n_features). 
- depth (int) – The current depth (dimension) being analyzed. 
- starting (int) – The starting index of the segment to analyze. 
- ending (int) – The ending index (exclusive) of the segment to analyze. 
- max_depth (int) – The maximum depth (dimension) to analyze. 
- Returns:
- An array of indices representing detected change points within the specified segment. 
- Return type:
- np.ndarray 
- Notes:
- Uses global variables windowSize and chp_depths for windowing and tracking change points per depth. 
- Utilizes computeDistance, find_peaks, and filterindx helper functions. 
- At the top level (depth == 0), prepends and appends boundary indices to the result. 
 
- ml4cps.automata.learn.rpni(positive_samples, negative_samples)
- Implements the RPNI (Regular Positive and Negative Inference) algorithm for learning a DFA from positive and negative samples.
- Parameters:
- positive_samples (Iterable[str]) – A collection of strings that should be accepted by the learned DFA. 
- negative_samples (Iterable[str]) – A collection of strings that should be rejected by the learned DFA. 
- Returns:
- A deterministic finite automaton (DFA) that accepts all positive samples and rejects all negative samples. 
- Return type:
- DFA 
- Notes:
- The function first constructs a Prefix Tree Acceptor (PTA) from the positive samples. 
- It then attempts to merge states in the DFA, ensuring that no negative sample is accepted after each merge. 
- The merging process is guided by the constraint that all negative samples must be rejected. 
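The merge-and-check loop from the notes can be illustrated with a simplified, self-contained sketch. It is an ordered-merge variant written for illustration only: the names `rpni_sketch` and `accepts` are hypothetical, the DFA is returned as a plain (transitions, accepting states) pair instead of a DFA object, and states are numbered in PTA insertion order rather than in strict canonical order.

```python
def accepts(delta, accepting, word):
    """Run word through the DFA; a missing transition means rejection."""
    state = 0
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting


def rpni_sketch(positive, negative):
    # Step 1: build the PTA from the positive samples (state 0 is the root).
    trans, finals, n = {}, set(), 1
    for word in sorted(positive, key=lambda w: (len(w), w)):
        state = 0
        for symbol in word:
            if (state, symbol) not in trans:
                trans[(state, symbol)] = n
                n += 1
            state = trans[(state, symbol)]
        finals.add(state)

    def find(parent, s):
        while parent[s] != s:
            s = parent[s]
        return s

    def quotient(parent):
        # Collapse every PTA state onto the representative of its block.
        delta = {(find(parent, s), a): find(parent, t) for (s, a), t in trans.items()}
        return delta, {find(parent, f) for f in finals}

    def merge(parent, p, q):
        parent = list(parent)
        parent[q] = p
        changed = True
        while changed:  # fold: merge targets until the result is deterministic
            changed = False
            seen = {}
            for (s, a), t in trans.items():
                key, target = (find(parent, s), a), find(parent, t)
                if key in seen and seen[key] != target:
                    low, high = sorted((seen[key], target))
                    parent[high] = low
                    changed = True
                    break
                seen[key] = target
        return parent

    # Step 2: try to merge each state into an earlier one; keep a merge only
    # if no negative sample becomes accepted.
    parent = list(range(n))
    for q in range(1, n):
        if find(parent, q) != q:
            continue
        for p in range(q):
            if find(parent, p) != p:
                continue
            candidate = merge(parent, p, q)
            delta, accepting = quotient(candidate)
            if not any(accepts(delta, accepting, w) for w in negative):
                parent = candidate
                break
    return quotient(parent)


dfa_delta, dfa_accepting = rpni_sketch(["", "aa", "aaaa"], ["a", "aaa"])
```

For these samples the five-state PTA collapses to a two-state DFA that accepts strings of a with even length, which generalizes beyond the given positive samples while still rejecting both negative ones.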
 
- ml4cps.automata.learn.simple_learn_from_event_logs(data, initial=True, count_repetition=True, verbose=False)
- Simple algorithm to learn a timed automaton from event log data. This function constructs a timed automaton by iterating over sequences of timestamped events. Each event sequence is treated as a trace, and transitions are created between states based on event occurrences and their timing. States are determined by the emitted events, optionally including repetition counts. The automaton can be initialized with an explicit initial state, and transitions can account for repeated events.
- Parameters:
- data (list or pandas.Series) – A list of event sequences, where each sequence is a pandas Series with timestamps as indices and event labels as values. 
- initial (bool, optional) – If True, adds an explicit 'initial' state to the automaton. Defaults to True. 
- count_repetition (bool, optional) – If True, distinguishes states and transitions by counting consecutive repetitions of the same event. Defaults to True. 
- verbose (bool, optional) – If True, prints detailed information about the learning process. Defaults to False. 
- Returns:
- The learned timed automaton object. 
- Return type:
- Automaton 
- Notes:
- Each sequence in data should be a pandas Series indexed by timestamps. 
- If a sequence contains fewer than two events, it is skipped. 
- The function assumes the existence of an Automaton class with add_initial_state and add_single_transition methods. 
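The idea of deriving states from emitted events can be sketched in plain Python. Everything below is an illustrative assumption rather than the library's behaviour: sequences are lists of (timestamp, event) tuples instead of pandas Series, the hypothetical `learn_from_event_logs_sketch` returns a dict instead of an Automaton, repeated events are named with a "#count" suffix, and the explicit initial state is omitted.

```python
from collections import defaultdict


def learn_from_event_logs_sketch(sequences, count_repetition=True):
    """Hypothetical stand-in: maps (source, event, destination) to time deltas."""
    transitions = defaultdict(list)
    for seq in sequences:
        if len(seq) < 2:
            continue  # sequences with fewer than two events are skipped
        prev_time, prev_event = seq[0]
        state, repeats = prev_event, 1
        for time, event in seq[1:]:
            repeats = repeats + 1 if event == prev_event else 1
            if count_repetition and repeats > 1:
                dest = f"{event}#{repeats}"  # assumed naming for repeated events
            else:
                dest = event
            transitions[(state, event, dest)].append(time - prev_time)
            state, prev_time, prev_event = dest, time, event
    return dict(transitions)


logs = [
    [(0.0, "a"), (1.0, "b"), (3.0, "b"), (4.0, "a")],
    [(0.0, "a"), (2.0, "b")],
    [(0.0, "x")],
]
learned = learn_from_event_logs_sketch(logs)
```

Collecting the time deltas per transition keeps the timing information that a timed automaton needs, for example to estimate delay distributions later.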
 
- ml4cps.automata.learn.simple_learn_from_signal_updates(data, sig_names, initial=True, verbose=False)
- Learns a timed automaton from sequences of signal updates. This function processes a list of dataframes, each representing a sequence of signal updates over time. For each sequence, it constructs states based on the values of the specified signals and adds transitions to an Automaton object whenever a signal value changes.
- Parameters:
- data (list) – List of dataframes, each containing time-stamped signal updates. The first column is assumed to be the time column, and subsequent columns correspond to signal names. 
- sig_names (list of str) – List of signal names to track and use for state construction. 
- initial (bool, optional) – If True, adds an initial state to the automaton for each sequence. Defaults to True. 
- verbose (bool, optional) – If True, prints detailed information about the learning process. Defaults to False. 
 
- Returns:
- The learned automaton with states and transitions based on the observed signal updates. 
- Return type:
- Automaton 
- Notes:
- Each state is represented as a dictionary mapping signal names to their current values. 
- Transitions are added only when all signal values are set (i.e., not None). 
- The event label for each transition is formatted as '<signal_name><-<value>'. 
- The transition is annotated with the time difference (delta_t) between consecutive events. 
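The notes above can be illustrated with a small sketch. It makes several assumptions that are not taken from the library: updates arrive as (time, signal_name, value) tuples instead of dataframes, the hypothetical `learn_from_signal_updates_sketch` returns a list of tuples instead of an Automaton, and updates that do not change a value are ignored.

```python
def learn_from_signal_updates_sketch(updates, sig_names):
    """Hypothetical stand-in: returns (source, event, destination, delta_t) tuples."""
    state = {name: None for name in sig_names}
    transitions = []
    last_time = None
    for time, name, value in updates:
        if name not in state or state[name] == value:
            continue  # untracked signal or unchanged value
        source = dict(state)
        state[name] = value
        # Add a transition only once every tracked signal has a value.
        if None not in source.values():
            event = f"{name}<-{value}"
            transitions.append(
                (tuple(source.items()), event, tuple(state.items()), time - last_time)
            )
        last_time = time
    return transitions


stream = [(0, "p", 0), (1, "v", 1), (3, "p", 1), (4, "p", 1), (6, "v", 0)]
learned_updates = learn_from_signal_updates_sketch(stream, ["p", "v"])
```

The first two updates only initialize the state, so the first transition appears at time 3 with the event label "p<-1".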
 
- ml4cps.automata.learn.simple_learn_from_signal_vectors(data, drop_no_changes=False, verbose=False)
- Learns a timed automaton from a list of signal vector dataframes. This function processes sequences of signal vectors (as pandas DataFrames), detects changes in the specified signal columns, and constructs a timed automaton by adding transitions for each detected event.
- Parameters:
- data (list) – List of DataFrames, each representing a sequence of signal vectors. The first column is assumed to be the time column, and the remaining columns are signal values. 
- sig_names (list of str) – List of column names in the DataFrame that correspond to the signals to be considered for state transitions. 
- drop_no_changes (bool, optional) – If True, rows where no signal changes occur are dropped before processing. Default is False. 
- verbose (bool, optional) – If True, prints detailed information about the learning process. Default is False. 
 
- Returns:
- An Automaton object constructed from the observed transitions in the input data. 
- Return type:
- Automaton 
- Notes:
- Each transition in the automaton corresponds to a change in the signal vector, with the event label representing the difference between consecutive signal vectors and the transition time as the time delta. 
- The function assumes that the Automaton class and its add_single_transition method are defined elsewhere.
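A minimal sketch of the vector-difference idea follows. It is an illustration under assumptions: rows are (time, vector) tuples rather than DataFrame rows, the hypothetical `learn_from_signal_vectors_sketch` returns a list of tuples instead of an Automaton, and the event label is the element-wise numeric difference of consecutive vectors.

```python
def learn_from_signal_vectors_sketch(rows, drop_no_changes=False):
    """Hypothetical stand-in: returns (source, event, destination, delta_t) tuples."""
    if drop_no_changes:
        # Drop rows whose vector equals the previously kept one before processing.
        kept = rows[:1]
        for row in rows[1:]:
            if row[1] != kept[-1][1]:
                kept.append(row)
        rows = kept
    transitions = []
    for (t0, v0), (t1, v1) in zip(rows, rows[1:]):
        event = tuple(b - a for a, b in zip(v0, v1))  # difference of the vectors
        transitions.append((tuple(v0), event, tuple(v1), t1 - t0))
    return transitions


vectors = [(0, (0, 0)), (1, (0, 0)), (2, (1, 0)), (5, (1, 1))]
with_all_rows = learn_from_signal_vectors_sketch(vectors)
without_repeats = learn_from_signal_vectors_sketch(vectors, drop_no_changes=True)
```

Dropping unchanged rows first means the time delta of the next transition is measured from the last row that was kept, which is why the first transition in `without_repeats` spans two time units.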