psykoda.preprocess package
Module contents
Preprocessing
- class psykoda.preprocess.ScreeningConfig(min: int, max: int = 100000000)
Bases: object
Log screening settings.
- max: int = 100000000
- min: int
- psykoda.preprocess.addr_in_subnets(sub_networks: list) → Callable[[str], bool]
Build a filter that tests whether an IP address belongs to at least one of the given subnets.
- Returns
predicate for IP addresses
- Return type
in_subnets(addr)
Warning
Optimized for IPv4. Does not support IPv6.
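The predicate-builder pattern described for addr_in_subnets can be illustrated with the standard ipaddress module. This is a minimal sketch, not psykoda's implementation: make_in_subnets is a hypothetical name, and it assumes the subnets are given as CIDR strings. Like the documented function, it handles IPv4 only.

```python
import ipaddress
from typing import Callable, List


def make_in_subnets(sub_networks: List[str]) -> Callable[[str], bool]:
    """Hypothetical stand-in: return a predicate that is True when an
    IPv4 address lies in at least one of the given CIDR subnets."""
    # Parse each CIDR string once, up front, so the predicate stays cheap.
    networks = [ipaddress.IPv4Network(s) for s in sub_networks]

    def in_subnets(addr: str) -> bool:
        ip = ipaddress.IPv4Address(addr)
        return any(ip in net for net in networks)

    return in_subnets


in_subnets = make_in_subnets(["10.25.148.0/24", "192.168.0.0/16"])
print(in_subnets("10.25.148.7"))  # True
print(in_subnets("172.16.0.1"))   # False
```

Parsing the subnets when the predicate is built, rather than on every call, keeps repeated checks over a large log inexpensive.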
- psykoda.preprocess.drop_null(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
- psykoda.preprocess.exclude_log(log: pandas.core.frame.DataFrame, exclusion: Iterable[dict]) → pandas.core.frame.DataFrame
- psykoda.preprocess.extract_log(log: pandas.core.frame.DataFrame, subnets: Optional[List[str]], include_ports: Optional[List[int]] = None, exclude_ports: Optional[List[int]] = None) → pandas.core.frame.DataFrame
Extract log entries by subnet and service_dport.
- Parameters
subnets – List of subnets (CIDR format) that the IP addresses to extract must belong to, e.g. [“10.25.148.0/24”, “192.168.0.0/16”]. None extracts all IP addresses.
include_ports – List of port numbers to extract, e.g. [22, 3389]. None extracts all port numbers.
exclude_ports – List of port numbers not to extract, e.g. [22, 3389]. Empty or None excludes no port numbers. Exclusion takes precedence over inclusion.
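How the three parameters interact, in particular that exclusion takes precedence over inclusion, can be sketched on plain Python records. This is an illustration only and does not operate on a DataFrame as extract_log does: extract_records and the field names "ip" and "port" are hypothetical, and the treatment of an empty include_ports list is an assumption.

```python
import ipaddress
from typing import Dict, List, Optional


def extract_records(
    records: List[Dict],
    subnets: Optional[List[str]],
    include_ports: Optional[List[int]] = None,
    exclude_ports: Optional[List[int]] = None,
) -> List[Dict]:
    """Hypothetical stand-in that mirrors the documented parameter semantics."""
    networks = None if subnets is None else [ipaddress.ip_network(s) for s in subnets]
    # Assumption: only None means "all ports"; an empty list selects nothing.
    include = None if include_ports is None else set(include_ports)
    exclude = set(exclude_ports or [])

    kept = []
    for rec in records:
        port = rec["port"]
        if port in exclude:
            continue  # exclusion is checked first, so it wins over inclusion
        if include is not None and port not in include:
            continue
        if networks is not None:
            ip = ipaddress.ip_address(rec["ip"])
            if not any(ip in net for net in networks):
                continue
        kept.append(rec)
    return kept


records = [
    {"ip": "10.25.148.7", "port": 22},
    {"ip": "10.25.148.8", "port": 3389},
    {"ip": "172.16.0.1", "port": 22},
]
result = extract_records(
    records, ["10.25.148.0/24"], include_ports=[22, 3389], exclude_ports=[3389]
)
print(result)  # [{'ip': '10.25.148.7', 'port': 22}]
```

Port 3389 appears in both lists, and the record using it is dropped because the exclusion check runs first.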
- psykoda.preprocess.filter_out(log: pandas.core.frame.DataFrame, column_name: str, filter_patterns: pandas.core.indexes.base.Index) → pandas.core.frame.DataFrame
Filter out rows according to patterns of column values.
- Parameters
log –
column_name – Name of the data or index column to match patterns against.
filter_patterns – Patterns identifying the rows to filter out. If column_name is col.SRC_IP or col.DEST_IP, each pattern is in CIDR notation (as accepted by ipaddress.ip_network()). Otherwise, each pattern is a string that must match the value exactly.
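The two matching modes described for filter_patterns (CIDR containment for IP columns, exact string equality otherwise) can be sketched as follows. This is a simplified illustration on plain records, not the library's code: filter_out_records, matches, and the column names "src_ip" and "dest_ip" (standing in for col.SRC_IP and col.DEST_IP) are assumptions.

```python
import ipaddress
from typing import Dict, Iterable, List

# Hypothetical stand-ins for col.SRC_IP and col.DEST_IP.
IP_COLUMNS = {"src_ip", "dest_ip"}


def matches(column_name: str, value: str, pattern: str) -> bool:
    """CIDR containment for IP columns, exact equality for everything else."""
    if column_name in IP_COLUMNS:
        return ipaddress.ip_address(value) in ipaddress.ip_network(pattern)
    return value == pattern


def filter_out_records(
    records: List[Dict], column_name: str, filter_patterns: Iterable[str]
) -> List[Dict]:
    """Drop every record whose column value matches at least one pattern."""
    patterns = list(filter_patterns)
    return [
        rec
        for rec in records
        if not any(matches(column_name, rec[column_name], p) for p in patterns)
    ]


records = [
    {"src_ip": "192.168.1.5", "user": "admin"},
    {"src_ip": "10.0.0.9", "user": "administrator"},
]
by_ip = filter_out_records(records, "src_ip", ["192.168.0.0/16"])
by_user = filter_out_records(records, "user", ["admin"])
print(by_ip)    # [{'src_ip': '10.0.0.9', 'user': 'administrator'}]
print(by_user)  # [{'src_ip': '10.0.0.9', 'user': 'administrator'}]
```

In the second call "administrator" is kept, because non-IP patterns must equal the value exactly rather than match a prefix or substring.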
- psykoda.preprocess.screening_numlog(log: pandas.core.frame.DataFrame, config: psykoda.preprocess.ScreeningConfig) → pandas.core.frame.DataFrame
Exclude IP addresses whose number of log entries falls outside the range [config.min, config.max].
- Parameters
log – Source log.
config – Settings for screening.
- Returns
Screened log.
- Return type
log
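The screening rule can be sketched with a per-address count. This is a minimal illustration on plain records, not the library's DataFrame-based code: ScreeningSettings is a hypothetical stand-in for ScreeningConfig, and both the "ip" field and the choice of which address is counted are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScreeningSettings:
    """Hypothetical stand-in mirroring the documented ScreeningConfig fields."""
    min: int
    max: int = 100_000_000


def screen_by_count(records: List[Dict], config: ScreeningSettings) -> List[Dict]:
    """Keep only records whose IP address has a log count within [min, max]."""
    counts = Counter(rec["ip"] for rec in records)
    return [rec for rec in records if config.min <= counts[rec["ip"]] <= config.max]


records = (
    [{"ip": "10.0.0.1"}] * 3    # three entries
    + [{"ip": "10.0.0.2"}] * 1  # one entry, below min
    + [{"ip": "10.0.0.3"}] * 2  # two entries
)
screened = screen_by_count(records, ScreeningSettings(min=2))
print(sorted({rec["ip"] for rec in screened}))  # ['10.0.0.1', '10.0.0.3']
```

Both bounds are inclusive in this sketch, matching the closed interval in the description above.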