intelmq.lib package¶
Subpackages¶
Submodules¶
intelmq.lib.bot module¶
- The bot library has the base classes for all bots (a usage sketch follows this list).
- Bot: generic base class for all kind of bots
- CollectorBot: base class for collectors
- ParserBot: base class for parsers
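As an illustration of how these base classes are typically used, here is a minimal, hedged sketch of an expert-style bot (receive, modify, send, acknowledge); the bot name and the 'comment' value are examples and not part of the documented API:

from intelmq.lib.bot import ExpertBot


class ExampleCommentExpertBot(ExpertBot):
    """Adds a static comment to every event passing through."""

    comment_text: str = 'checked'  # hypothetical runtime parameter

    def process(self):
        event = self.receive_message()
        # add() validates against the harmonization; overwrite keeps this idempotent
        event.add('comment', self.comment_text, overwrite=True)
        self.send_message(event)
        self.acknowledge_message()


if __name__ == '__main__':
    ExampleCommentExpertBot.run()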
-
class
intelmq.lib.bot.
Bot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None)¶ Bases:
object
Not to be reset when initialized again on reload.
-
_Bot__disconnect_pipelines
()¶ Disconnecting pipelines.
-
_Bot__handle_sighup
()¶ Handle SIGHUP.
-
_Bot__handle_sighup_signal
(signum: int, stack: object)¶ Called when the signal is received; handling is postponed.
-
_Bot__handle_sigterm_signal
(signum: int, stack: object)¶ Called when a SIGTERM is received. Stops the bot.
-
_Bot__init_logger
()¶ Initialize the logger.
-
_Bot__sleep
(remaining: typing.Union[float, NoneType] = None, log: bool = True)¶ Sleep handles interrupts and changed rate_limit-parameter.
time.sleep is stopped by signals such as SIGHUP. As rate_limit could have been changed, we initialize again and continue to sleep, if necessary at all.
Parameters: - remaining – Time to sleep. ‘rate_limit’ parameter by default if None
- log – Log the remaining sleep time, default: True
-
_Bot__stats
(force: bool = False)¶ Flush stats to Redis.
Only every self.__message_counter_delay (2 seconds), or with force=True.
-
classmethod
_create_argparser
()¶ see https://github.com/certtools/intelmq/pull/1524/files#r464606370 why this code is not in the constructor
-
_parse_common_parameters
()¶ Parses and sanitizes commonly used parameters:
- extract_files
-
_parse_extract_file_parameter
(parameter_name: str = 'extract_files')¶ Parses and sanitizes commonly used parameters:
- extract_files
-
accuracy
= 100¶
-
acknowledge_message
()¶ Acknowledges that the last message has been processed, if any.
For bots without source pipeline (collectors), this is a no-op.
-
static
check
(parameters: dict) → typing.Union[typing.List[typing.List[str]], NoneType]¶ The bot’s own check function can perform individual checks on its parameters (a sketch follows below). init() is not called beforehand; this is a staticmethod which does not require class initialization.
Parameters: parameters – Bot’s parameters, defaults and runtime merged together
Returns: None, or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
Return type: output
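A hedged sketch of such a check() implementation, following the contract described above; the 'api_key' parameter name is hypothetical, error_max_retries is a documented parameter:

@staticmethod
def check(parameters: dict):
    results = []
    if not parameters.get('api_key'):
        results.append(['error', "Parameter 'api_key' is missing."])
    if parameters.get('error_max_retries', 0) < 0:
        results.append(['warning', "'error_max_retries' is negative."])
    return results or None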
-
description
= None¶
-
destination_pipeline_broker
= 'redis'¶
-
destination_pipeline_db
= 2¶
-
destination_pipeline_host
= '127.0.0.1'¶
-
destination_pipeline_password
= None¶
-
destination_pipeline_port
= 6379¶
-
destination_queues
= {}¶
-
enabled
= True¶
-
error_dump_message
= True¶
-
error_log_exception
= True¶
-
error_log_message
= False¶
-
error_max_retries
= 3¶
-
error_procedure
= 'pass'¶
-
error_retry_delay
= 15¶
-
group
= None¶
-
harmonization
¶
-
http_proxy
= None¶
-
http_timeout_max_tries
= 3¶
-
http_timeout_sec
= 30¶
-
http_user_agent
= 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'¶
-
http_verify_cert
= True¶
-
https_proxy
= None¶
-
init
()¶
-
instances_threads
= 0¶
-
is_multithreaded
= False¶
-
load_balance
= False¶
-
log_processed_messages_count
= 500¶
-
log_processed_messages_seconds
= 900¶
-
logging_handler
= 'file'¶
-
logging_level
= 'INFO'¶
-
logging_path
= '/opt/intelmq/var/log/'¶
-
logging_syslog
= '/dev/log'¶
-
module
= None¶
-
name
= None¶
-
new_event
(*args, **kwargs)¶
-
process_manager
= 'intelmq'¶
-
rate_limit
= 0¶
-
receive_message
() → intelmq.lib.message.Message¶ If the bot is reloaded while waiting for an incoming message, the received message is first rejected back to the pipeline to reach a clean state. Then, after reloading, the message will be retrieved again.
-
classmethod
run
(parsed_args=None)¶
-
run_mode
= 'continuous'¶
-
send_message
(*messages, path: str = '_default', auto_add=None, path_permissive: bool = False)¶ Parameters: - messages – Instances of intelmq.lib.message.Message class
- auto_add – ignored
- path_permissive – If true, do not raise an error if the path is not configured
-
set_request_parameters
()¶
-
shutdown
()¶
-
source_pipeline_broker
= 'redis'¶
-
source_pipeline_db
= 2¶
-
source_pipeline_host
= '127.0.0.1'¶
-
source_pipeline_password
= None¶
-
source_pipeline_port
= 6379¶
-
source_queue
= None¶
-
ssl_ca_certificate
= None¶
-
start
(starting: bool = True, error_on_pipeline: bool = True, error_on_message: bool = False, source_pipeline: typing.Union[str, NoneType] = None, destination_pipeline: typing.Union[str, NoneType] = None)¶
-
statistics_database
= 3¶
-
statistics_host
= '127.0.0.1'¶
-
statistics_password
= None¶
-
statistics_port
= 6379¶
-
stop
(exitcode: int = 1)¶
-
-
class
intelmq.lib.bot.
CollectorBot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None)¶ Bases:
intelmq.lib.bot.Bot
Base class for collectors.
Does some sanity checks on message sending.
-
_CollectorBot__add_report_fields
(report: intelmq.lib.message.Report)¶ Adds the configured feed parameters to the report, if they are set (!= None). The following parameters are set to these report fields (a usage sketch follows this list):
- name -> feed.name
- code -> feed.code
- documentation -> feed.documentation
- provider -> feed.provider
- accuracy -> feed.accuracy
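A minimal sketch of a collector’s process() using new_report() and send_message(); the feed fields listed above are then filled in automatically when auto_add is active. The fetch_data() helper is hypothetical:

def process(self):
    raw_data = self.fetch_data()       # hypothetical retrieval step
    report = self.new_report()
    report.add('raw', raw_data)        # sanitized to base64 by the Base64 type
    self.send_message(report)          # feed.* fields are added via auto_add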
-
accuracy
= 100¶
-
bottype
= 'Collector'¶
-
code
= None¶
-
documentation
= None¶
-
name
= None¶
-
new_report
()¶
-
provider
= None¶
-
send_message
(*messages, path: str = '_default', auto_add: bool = True)¶ :param messages: Instances of intelmq.lib.message.Message class :param path: Named queue the message will be sent to :param auto_add: Add some default report fields from parameters
-
-
class
intelmq.lib.bot.
ParserBot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None)¶ Bases:
intelmq.lib.bot.Bot
-
bottype
= 'Parser'¶
-
parse
(report: intelmq.lib.message.Report)¶ A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
You should do that for recovering lines too:
recover_line = ParserBot.recover_line_csv
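A hedged sketch of a ParserBot subclass following this pattern; the column layout (IP address in the first column) and the classification value are assumptions:

from intelmq.lib.bot import ParserBot


class ExampleCSVParserBot(ParserBot):
    parse = ParserBot.parse_csv
    recover_line = ParserBot.recover_line_csv

    def parse_line(self, line, report):
        event = self.new_event(report)
        event.add('source.ip', line[0])
        event.add('classification.type', 'scanner')
        event.add('raw', self.recover_line(line))
        yield event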
-
parse_csv
(report: intelmq.lib.message.Report)¶ A basic CSV parser. The resulting lines are lists.
-
parse_csv_dict
(report: intelmq.lib.message.Report)¶ A basic CSV Dictionary parser. The resulting lines are dictionaries with the column names as keys.
-
parse_json
(report: intelmq.lib.message.Report)¶ A basic JSON parser. Assumes a list of objects as input, which are yielded one by one.
-
parse_json_stream
(report: intelmq.lib.message.Report)¶ A JSON Stream parser (one JSON data structure per line).
-
parse_line
(line: typing.Any, report: intelmq.lib.message.Report)¶ A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
-
process
()¶
-
recover_line
(line: typing.Union[str, NoneType] = None) → str¶ Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
Parameters: line (Optional[str], optional) – The currently processed line which should be transformed back into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
Raises: ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
Returns: The reconstructed raw data.
Return type: str
-
recover_line_csv
(line: typing.Union[list, NoneType]) → str¶ - Parameter:
- line: Optional line as list. If absent, the current line is used as string.
-
recover_line_csv_dict
(line: typing.Union[dict, str, NoneType] = None) → str¶ Converts dictionaries to CSV. self.csv_fieldnames must be a list of fields.
-
recover_line_json
(line: dict) → str¶ Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
Parameters: line – The line as dict. Returns: The JSON-encoded line as string. Return type: str
-
recover_line_json_stream
(line: typing.Union[str, NoneType] = None) → str¶ recover_line for JSON streams (one JSON element per line, no outer structure), just returns the current line, unparsed.
Parameters: line – The line itself as dict, if available, falls back to original current line Returns: unparsed JSON line. Return type: str
-
-
class
intelmq.lib.bot.
OutputBot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None)¶ Bases:
intelmq.lib.bot.Bot
Base class for outputs.
-
bottype
= 'Output'¶
-
export_event
(event: intelmq.lib.message.Event, return_type: typing.Union[type, NoneType] = None) → typing.Union[str, dict]¶ - exports an event according to the following parameters:
- message_hierarchical
- message_with_type
- message_jsondict_as_string
- single_key
- keep_raw_field
Parameters: return_type – Ensure that the returned value is of the given type. Optional. For example: str. If the resulting value is not an instance of this type, the given type is called with the value as parameter, e.g. str(retval).
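A minimal sketch of an output bot’s process() built around export_event(); the file destination is hypothetical:

def process(self):
    event = self.receive_message()
    line = self.export_event(event, return_type=str)  # honours the parameters above
    with open('/tmp/events.txt', 'a') as handle:      # hypothetical destination
        handle.write(line + '\n')
    self.acknowledge_message()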
-
-
class
intelmq.lib.bot.
ExpertBot
(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None)¶ Bases:
intelmq.lib.bot.Bot
Base class for expert bots.
-
bottype
= 'Expert'¶
-
intelmq.lib.bot_debugger module¶
Utilities for debugging intelmq bots.
BotDebugger is called via intelmqctl. It starts a live running bot instance, raises the logging level to DEBUG, and permits even a less experienced programmer, who may find themselves puzzled by Python nuances and server deployment twists, to see what is happening in the bot and where the error is.
- Depending on the subcommand received, the class either
- starts the bot as is (default)
- processes single message, either injected or from default pipeline (process subcommand)
- reads a message from the input pipeline or sends a message to the output pipeline (message subcommand)
-
class
intelmq.lib.bot_debugger.
BotDebugger
(runtime_configuration, bot_id, run_subcommand=None, console_type=None, message_kind=None, dryrun=None, msg=None, show=None, loglevel=None)¶ Bases:
object
-
EXAMPLE
= '\nThe message may look like:\n \'{"source.network": "178.72.192.0/18", "time.observation": "2017-05-12T05:23:06+00:00"}\' '¶
-
arg2msg
(msg)¶
-
instance
= None¶
-
leverageLogger
(level)¶
-
load_configuration
(configuration_filepath: str) → dict¶ Load JSON or YAML configuration file.
Parameters: configuration_filepath – Path to file to load. Returns: Parsed configuration Return type: config Raises: ValueError
– If the file is not found.
-
static
load_configuration_patch
(configuration_filepath: str, *args, **kwargs) → dict¶ Mock function for utils.load_configuration which ensures the logging level parameter is set to the value we want. If a runtime configuration is detected, the logging_level parameter is
- inserted in all bots’ parameters (bot_id is not accessible here, hence we add it everywhere)
- inserted in the global parameters (ex-defaults).
Maybe not everything is necessary, but we can make sure the logging_level is set everywhere it might be relevant, also in the future.
-
logging_level
= None¶
-
messageWizzard
(msg)¶
-
output
= []¶
-
outputappend
(msg)¶
-
static
pprint
(msg) → str¶ We can’t use standard pprint as JSON standard asks for double quotes.
-
run
() → str¶
-
intelmq.lib.cache module¶
Cache is a set with information already seen by the system. This provides a way, for example, to remove duplicated events and reports in the system, or to cache results from experts like Cymru Whois. It is possible to define a TTL value for each piece of information inserted into the cache. This TTL defines how long the system will keep that information in the cache.
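A hedged sketch of using the Cache class below for deduplication; the connection values and the TTL are examples only:

from intelmq.lib.cache import Cache

cache = Cache(host='127.0.0.1', port=6379, db=4, ttl=86400, password=None)

event_hash = 'c0ffee'               # e.g. the result of Message.hash()
if not cache.exists(event_hash):
    cache.set(event_hash, 'seen')   # remembered for ttl seconds
    # ... process the event, it has not been seen before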
-
class
intelmq.lib.cache.
Cache
(host: str, port: int, db: str, ttl: int, password: typing.Union[str, NoneType] = None)¶ Bases:
object
-
exists
(key: str)¶
-
flush
()¶ Flushes the currently opened database by calling FLUSHDB.
-
get
(key: str)¶
-
set
(key: str, value: typing.Any, ttl: typing.Union[int, NoneType] = None)¶
-
intelmq.lib.datatypes module¶
-
class
intelmq.lib.datatypes.
BotType
¶ Bases:
str
,enum.Enum
An enumeration.
-
COLLECTOR
= 'Collector'¶
-
EXPERT
= 'Expert'¶
-
OUTPUT
= 'Output'¶
-
PARSER
= 'Parser'¶
-
toJson
()¶
-
intelmq.lib.exceptions module¶
IntelMQ Exception Class
-
exception
intelmq.lib.exceptions.
InvalidArgument
(argument: typing.Any, got: typing.Any = None, expected=None, docs: str = None)¶
-
exception
intelmq.lib.exceptions.
ConfigurationError
(config: str, argument: str)¶
-
exception
intelmq.lib.exceptions.
IntelMQException
(message)¶ Bases:
Exception
-
exception
intelmq.lib.exceptions.
IntelMQHarmonizationException
(message)¶
-
exception
intelmq.lib.exceptions.
InvalidKey
(key: str)¶ Bases:
intelmq.lib.exceptions.IntelMQHarmonizationException
,KeyError
-
exception
intelmq.lib.exceptions.
InvalidValue
(key: str, value: str, reason: typing.Any = None, object: bytes = None)¶
-
exception
intelmq.lib.exceptions.
KeyExists
(key: str)¶
-
exception
intelmq.lib.exceptions.
KeyNotExists
(key: str)¶
-
exception
intelmq.lib.exceptions.
PipelineError
(argument: typing.Union[str, Exception])¶
-
exception
intelmq.lib.exceptions.
MissingDependencyError
(dependency: str, version: typing.Union[str, NoneType] = None, installed: typing.Union[str, NoneType] = None, additional_text: typing.Union[str, NoneType] = None)¶ Bases:
intelmq.lib.exceptions.IntelMQException
A missing dependency was detected. Log instructions on installation.
-
__init__
(dependency: str, version: typing.Union[str, NoneType] = None, installed: typing.Union[str, NoneType] = None, additional_text: typing.Union[str, NoneType] = None)¶ Parameters: - dependency (str) – The dependency name.
- version (Optional[str], optional) – The required version. The default is None.
- installed (Optional[str], optional) – The currently installed version. Requires ‘version’ to be given. The default is None.
- additional_text (Optional[str], optional) – Arbitrary additional text to show. The default is None.
Returns: The exception with the prepared text.
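A hedged sketch of the usual import-guard pattern this exception is intended for; the 'pika' dependency and the version bound are examples only:

from intelmq.lib.bot import Bot
from intelmq.lib.exceptions import MissingDependencyError

try:
    import pika
except ImportError:
    pika = None


class ExampleAMQPBot(Bot):
    def init(self):
        if pika is None:
            # logs installation instructions for the missing library
            raise MissingDependencyError('pika', version='>=1.0')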
-
intelmq.lib.harmonization module¶
The following types are implemented with sanitize() and is_valid() functions (a usage sketch follows this list):
- Base64
- Boolean
- ClassificationTaxonomy
- ClassificationType
- DateTime
- FQDN
- Float
- Accuracy
- GenericType
- IPAddress
- IPNetwork
- Integer
- JSON
- JSONDict
- LowercaseString
- Registry
- String
- URL
- ASN
- UppercaseString
- TLP
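A short sketch of the shared is_valid()/sanitize() call pattern of the types listed above; the concrete values are examples only:

from intelmq.lib import harmonization

harmonization.Boolean.is_valid(True)            # True
harmonization.Boolean.is_valid('true')          # False without sanitation
harmonization.Boolean.sanitize('true')          # True (a Python bool)
harmonization.IPAddress.is_valid('192.0.2.1')   # True
harmonization.DateTime.sanitize('2017-05-12T05:23:06+02:00')  # normalized to UTC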
-
class
intelmq.lib.harmonization.
Base64
¶ Bases:
intelmq.lib.harmonization.String
Base64 type. Always gives unicode strings.
Sanitation encodes to base64 and accepts binary and unicode strings.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
Boolean
¶ Bases:
intelmq.lib.harmonization.GenericType
Boolean type. Without sanitation only python bool is accepted.
Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.
-
static
is_valid
(value: bool, sanitize: bool = False) → bool¶
-
static
sanitize
(value: bool) → typing.Union[bool, NoneType]¶
-
-
class
intelmq.lib.harmonization.
ClassificationType
¶ Bases:
intelmq.lib.harmonization.String
classification.type type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.
- These old values are automatically mapped to the new ones (a sanitation sketch follows at the end of this class entry):
- ‘botnet drone’ -> ‘infected-system’
- ‘ids alert’ -> ‘ids-alert’
- ‘c&c’ -> ‘c2-server’
- ‘c2server’ -> ‘c2-server’
- ‘infected system’ -> ‘infected-system’
- ‘malware configuration’ -> ‘malware-configuration’
- ‘Unauthorised-information-access’ -> ‘unauthorised-information-access’
- ‘leak’ -> ‘data-leak’
- ‘vulnerable client’ -> ‘vulnerable-system’
- ‘vulnerable service’ -> ‘vulnerable-system’
- ‘ransomware’ -> ‘infected-system’
- ‘unknown’ -> ‘undetermined’
- These values changed their taxonomy:
- ‘malware’: in terms of the taxonomy ‘malicious-code’ it can be either ‘infected-system’ or ‘malware-distribution’, but in terms of actual malware it now belongs to the taxonomy ‘other’
- Allowed values are:
- application-compromise
- blacklist
- brute-force
- burglary
- c2-server
- copyright
- data-leak
- data-loss
- ddos
- ddos-amplifier
- dga-domain
- dos
- exploit
- harmful-speech
- ids-alert
- infected-system
- information-disclosure
- malware
- malware-configuration
- malware-distribution
- masquerade
- misconfiguration
- other
- outage
- phishing
- potentially-unwanted-accessible
- privileged-account-compromise
- proxy
- sabotage
- scanner
- sniffing
- social-engineering
- spam
- system-compromise
- test
- tor
- unauthorised-information-access
- unauthorised-information-modification
- unauthorized-use-of-resources
- undetermined
- unprivileged-account-compromise
- violence
- vulnerable-system
- weak-crypto
-
allowed_values
= ('application-compromise', 'blacklist', 'brute-force', 'burglary', 'c2-server', 'copyright', 'data-leak', 'data-loss', 'ddos', 'ddos-amplifier', 'dga-domain', 'dos', 'exploit', 'harmful-speech', 'ids-alert', 'infected-system', 'information-disclosure', 'malware', 'malware-configuration', 'malware-distribution', 'masquerade', 'misconfiguration', 'other', 'outage', 'phishing', 'potentially-unwanted-accessible', 'privileged-account-compromise', 'proxy', 'sabotage', 'scanner', 'sniffing', 'social-engineering', 'spam', 'system-compromise', 'test', 'tor', 'unauthorised-information-access', 'unauthorised-information-modification', 'unauthorized-use-of-resources', 'undetermined', 'unprivileged-account-compromise', 'violence', 'vulnerable-system', 'weak-crypto')¶
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
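A hedged sketch of the deprecated-value mapping described in this class’s docstring:

from intelmq.lib.harmonization import ClassificationType

ClassificationType.is_valid('botnet drone')      # False, deprecated value
ClassificationType.sanitize('botnet drone')      # 'infected-system'
ClassificationType.sanitize('c&c')               # 'c2-server'
ClassificationType.is_valid('infected-system')   # True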
-
class
intelmq.lib.harmonization.
DateTime
¶ Bases:
intelmq.lib.harmonization.String
Date and time type for timestamps.
Valid values are timestamps with time zone and in the format ‘%Y-%m-%dT%H:%M:%S+00:00’. Values with a missing time or missing timezone information (UTC) are invalid. Microseconds are also allowed.
Sanitation normalizes the timezone to UTC, which is the only allowed timezone.
The following additional conversions are available with the convert function:
- timestamp
- windows_nt: From Windows NT / AD / LDAP
- epoch_millis: From Milliseconds since Epoch
- from_format: From a given format, eg. ‘from_format|%H %M %S %m %d %Y %Z’
- from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
- utc_isoformat: Parse date generated by datetime.isoformat()
- fuzzy (or None): Use dateutils’ fuzzy parser, default if no specific parser is given
-
TIME_CONVERSIONS
= {'timestamp': <function DateTime.from_timestamp>, 'windows_nt': <function DateTime.from_windows_nt>, 'epoch_millis': <function DateTime.from_epoch_millis>, 'from_format': <function DateTime.convert_from_format>, 'from_format_midnight': <function DateTime.convert_from_format_midnight>, 'utc_isoformat': <function DateTime.parse_utc_isoformat>, 'fuzzy': <function DateTime.convert_fuzzy>, None: <function DateTime.convert_fuzzy>}¶
-
static
convert
(value, format='fuzzy') → str¶ Converts date time strings according to the given format. If the timezone is not given or clear, the local time zone is assumed!
- timestamp
- windows_nt: From Windows NT / AD / LDAP
- epoch_millis: From Milliseconds since Epoch
- from_format: From a given format, eg. ‘from_format|%H %M %S %m %d %Y %Z’
- from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
- utc_isoformat: Parse date generated by datetime.isoformat()
- fuzzy (or None): Use dateutils’ fuzzy parser, default if no specific parser is given
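A short sketch of calling the conversions above; the input values are examples only:

from intelmq.lib.harmonization import DateTime

DateTime.convert(1640995200, format='timestamp')               # epoch seconds
DateTime.convert('12 05 2017', format='from_format|%d %m %Y')  # explicit format
DateTime.convert('2017-05-12T05:23:06+00:00', format='utc_isoformat')
DateTime.convert('Fri, 12 May 2017 05:23:06 +0000')            # fuzzy (default)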
-
static
convert_from_format
(value: str, format: str) → str¶ Converts a datetime with the given format.
-
static
convert_from_format_midnight
(value: str, format: str) → str¶ Converts a date with the given format and adds time 00:00:00 to it.
-
static
convert_fuzzy
(value) → str¶
-
static
from_epoch_millis
(tstamp: str, tzone='UTC') → datetime.datetime¶ Returns ISO formatted datetime from given epoch timestamp with milliseconds. It ignores the milliseconds, converts the value into a normal timestamp and processes it.
-
static
from_timestamp
(tstamp: int, tzone='UTC') → str¶ Returns ISO formatted datetime from given timestamp. You can give timezone for given timestamp, UTC by default.
-
static
from_windows_nt
(tstamp: int) → str¶ Converts the Windows NT / LDAP / Active Directory format to ISO format.
The format is: 100 nanoseconds (10^-7s) since 1601-01-01. UTC is assumed.
Parameters: tstamp – Time in LDAP format as integer or string. Will be converted if necessary. Returns: Converted ISO format string
-
static
generate_datetime_now
() → str¶
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
midnight
= datetime.time(0, 0)¶
-
static
parse_utc_isoformat
(value: str, return_datetime: bool = False) → typing.Union[datetime.datetime, str]¶ Parse format generated by datetime.isoformat() method with UTC timezone. It is much faster than universal dateutil parser. Can be used for parsing DateTime fields which are already parsed.
Returns a string with ISO format. If return_datetime is True, the return value is a datetime.datetime object.
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
class
intelmq.lib.harmonization.
FQDN
¶ Bases:
intelmq.lib.harmonization.String
Fully qualified domain name type.
All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.
To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
static
to_ip
(value: str) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
Float
¶ Bases:
intelmq.lib.harmonization.GenericType
Float type. Without sanitation only python float/integer/long is accepted. Boolean is explicitly denied.
Sanitation accepts strings and everything float() accepts.
-
static
is_valid
(value: float, sanitize: bool = False) → bool¶
-
static
sanitize
(value: float) → typing.Union[float, NoneType]¶
-
-
class
intelmq.lib.harmonization.
Accuracy
¶ Bases:
intelmq.lib.harmonization.Float
Accuracy type. A Float between 0 and 100.
-
static
is_valid
(value: float, sanitize: bool = False) → bool¶
-
static
sanitize
(value: float) → typing.Union[float, NoneType]¶
-
-
class
intelmq.lib.harmonization.
GenericType
¶ Bases:
object
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
IPAddress
¶ Bases:
intelmq.lib.harmonization.String
Type for IP addresses, all families. Uses the ipaddress module.
Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.
Valid values are only strings. 0.0.0.0 is explicitly not allowed.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: typing.Union[int, str]) → typing.Union[str, NoneType]¶
-
static
to_int
(value: str) → typing.Union[int, NoneType]¶
-
static
to_reverse
(ip_addr: str) → str¶
-
static
version
(value: str) → int¶
-
-
class
intelmq.lib.harmonization.
IPNetwork
¶ Bases:
intelmq.lib.harmonization.String
Type for IP networks, all families. Uses the ipaddress module.
Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g 127.0.0.1/32).
Valid values are only strings.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
static
version
(value: str) → int¶
-
-
class
intelmq.lib.harmonization.
Integer
¶ Bases:
intelmq.lib.harmonization.GenericType
Integer type. Without sanitation only python integer/long is accepted. Bool is explicitly denied.
Sanitation accepts strings and everything int() accepts.
-
static
is_valid
(value: int, sanitize: bool = False) → bool¶
-
static
sanitize
(value: int) → typing.Union[int, NoneType]¶
-
-
class
intelmq.lib.harmonization.
JSON
¶ Bases:
intelmq.lib.harmonization.String
JSON type.
Sanitation accepts any valid JSON objects.
Valid values are only unicode strings with JSON objects.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
JSONDict
¶ Bases:
intelmq.lib.harmonization.JSON
JSONDict type.
Sanitation accepts pythons dictionaries and JSON strings.
Valid values are only unicode strings with JSON dictionaries.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
is_valid_subitem
(value: str) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
static
sanitize_subitem
(value: str) → str¶
-
-
class
intelmq.lib.harmonization.
LowercaseString
¶ Bases:
intelmq.lib.harmonization.String
Like string, but only allows lower case characters.
Sanitation lowers all characters.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[bool, NoneType]¶
-
-
class
intelmq.lib.harmonization.
Registry
¶ Bases:
intelmq.lib.harmonization.UppercaseString
Registry type. Derived from UppercaseString.
Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.
-
ENUM
= ['AFRINIC', 'APNIC', 'ARIN', 'LACNIC', 'RIPE']¶
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → str¶
-
-
class
intelmq.lib.harmonization.
String
¶ Bases:
intelmq.lib.harmonization.GenericType
Any non-empty string without leading or trailing whitespace.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
-
class
intelmq.lib.harmonization.
URL
¶ Bases:
intelmq.lib.harmonization.String
URI type. Local and remote.
Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.
Valid values must have the host (network location part).
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
static
to_domain_name
(url: str) → typing.Union[str, NoneType]¶
-
static
to_ip
(url: str) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
ASN
¶ Bases:
intelmq.lib.harmonization.Integer
ASN type. Derived from Integer with forbidden values.
Only valid are: 0 < asn <= 4294967295. See https://en.wikipedia.org/wiki/Autonomous_system_(Internet): “The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295 are reserved and should not be used by operators.”
-
static
check_asn
(value: int) → bool¶
-
static
is_valid
(value: int, sanitize: bool = False) → bool¶
-
static
sanitize
(value: int) → typing.Union[int, NoneType]¶
-
-
class
intelmq.lib.harmonization.
UppercaseString
¶ Bases:
intelmq.lib.harmonization.String
Like string, but only allows upper case characters.
Sanitation uppers all characters.
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
TLP
¶ Bases:
intelmq.lib.harmonization.UppercaseString
TLP level type. Derived from UppercaseString.
Only valid values: WHITE, GREEN, AMBER, RED.
Accepted for sanitation are different cases and the prefix ‘tlp:’.
-
enum
= ['WHITE', 'GREEN', 'AMBER', 'RED']¶
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
prefix_pattern
= re.compile('^(TLP:?)?\\s*')¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
-
-
class
intelmq.lib.harmonization.
ClassificationTaxonomy
¶ Bases:
intelmq.lib.harmonization.String
classification.taxonomy type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/
- These old values are automatically mapped to the new ones:
- ‘abusive content’ -> ‘abusive-content’ ‘information gathering’ -> ‘information-gathering’ ‘intrusion attempts’ -> ‘intrusion-attempts’ ‘malicious code’ -> ‘malicious-code’
- Allowed values are:
- abusive-content
- availability
- fraud
- information-content-security
- information-gathering
- intrusion-attempts
- intrusions
- malicious-code
- other
- test
- vulnerable
-
allowed_values
= ['abusive-content', 'availability', 'fraud', 'information-content-security', 'information-gathering', 'intrusion-attempts', 'intrusions', 'malicious-code', 'other', 'test', 'vulnerable']¶
-
static
is_valid
(value: str, sanitize: bool = False) → bool¶
-
static
sanitize
(value: str) → typing.Union[str, NoneType]¶
intelmq.lib.message module¶
Messages are the information packages in pipelines.
Use MessageFactory to get a Message object (types Report and Event).
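A hedged sketch of the usual message handling, assuming a default IntelMQ installation so the harmonization configuration can be loaded; the values are examples only:

from intelmq.lib.message import Event, MessageFactory, Report

report = Report()
report.add('feed.name', 'Example Feed')
report.add('raw', 'some raw data')        # sanitized to base64

event = Event(report)                     # copies feed.* and time.observation
event.add('source.ip', '192.0.2.1')

wire = MessageFactory.serialize(event)    # JSON string including __type
restored = MessageFactory.unserialize(wire)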
-
class
intelmq.lib.message.
Event
(message: typing.Union[dict, tuple] = (), auto: bool = False, harmonization: typing.Union[dict, NoneType] = None) → None¶ Bases:
intelmq.lib.message.Message
-
__init__
(message: typing.Union[dict, tuple] = (), auto: bool = False, harmonization: typing.Union[dict, NoneType] = None) → None¶ Parameters: - message – Give a report and feed.name, feed.url and time.observation will be used to construct the Event if given. If it’s another type, the value is given to dict’s init
- auto – unused here
- harmonization – Harmonization definition to use
-
-
class
intelmq.lib.message.
Message
(message: typing.Union[dict, tuple] = (), auto: bool = False, harmonization: dict = None) → None¶ Bases:
dict
-
add
(key: str, value: str, sanitize: bool = True, overwrite: typing.Union[bool, NoneType] = None, ignore: typing.Sequence = (), raise_failure: bool = True) → typing.Union[bool, NoneType]¶ Add a value for the key (after sanitation).
Parameters: - key – Key as defined in the harmonization
- value – A valid value as defined in the harmonization If the value is None or in _IGNORED_VALUES the value will be ignored. If the value is ignored, the key exists and overwrite is True, the key is deleted.
- sanitize – Sanitation of harmonization type will be called before validation (default: True)
- overwrite – Overwrite an existing value if it already exists (default: None) If True, overwrite an existing value If False, do not overwrite an existing value If None, raise intelmq.exceptions.KeyExists for an existing value
- raise_failure – If a intelmq.lib.exceptions.InvalidValue should be raised for invalid values (default: True). If false, the return parameter will be False in case of invalid values.
Returns: - True if the value has been added.
- False if the value is invalid and raise_failure is False or the value existed
- and has not been overwritten.
- None if the value has been ignored.
Raises: intelmq.lib.exceptions.KeyExists
– If key exists and won’t be overwritten explicitly.intelmq.lib.exceptions.InvalidKey
– if key is invalid.intelmq.lib.exceptions.InvalidArgument
– if ignore is not list or tuple.intelmq.lib.exceptions.InvalidValue
– If value is not valid for the given key and raise_failure is True.
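A short sketch of the overwrite/raise_failure/ignore semantics described above, again assuming a default installation; the values are examples only:

from intelmq.lib.exceptions import KeyExists
from intelmq.lib.message import Event

event = Event()
event.add('source.ip', '192.0.2.1')                    # returns True
try:
    event.add('source.ip', '192.0.2.2')                # overwrite=None -> KeyExists
except KeyExists:
    pass
event.add('source.ip', '192.0.2.2', overwrite=True)    # replaces the value
event.add('source.ip', 'no ip', overwrite=True, raise_failure=False)  # returns False
event.add('source.fqdn', None)                         # ignored, returns None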
-
change
(key: str, value: str, sanitize: bool = True)¶
-
copy
()¶
-
deep_copy
()¶
-
finditems
(keyword: str)¶
-
get
(key, default=None)¶
-
hash
(*, filter_keys: typing.Iterable = frozenset(), filter_type: str = 'blacklist')¶ Return a SHA256 hash of the message as a hexadecimal string. The hash is computed over almost all key/value pairs. Depending on the filter_type parameter (blacklist or whitelist), the keys defined in the filter_keys parameter will be considered as the keys to ignore or the only ones to consider. If given, the filter_keys parameter should be a set.
‘time.observation’ will always be ignored.
-
is_valid
(key: str, value: str, sanitize: bool = True) → bool¶ Checks if a value is valid for the key (after sanitation).
Parameters: - key – Key of the field
- value – Value of the field
- sanitize – Sanitation of harmonization type will be called before validation (default: True)
Returns: True if the value is valid, otherwise False
Raises: intelmq.lib.exceptions.InvalidKey
– if given key is invalid.
-
serialize
()¶
-
set_default_value
(value: typing.Any = None)¶ Sets a default value for items.
-
to_dict
(hierarchical: bool = False, with_type: bool = False, jsondict_as_string: bool = False) → dict¶ Returns a copy of self, only based on a dict class.
Parameters: - hierarchical – Split all keys at a dot and save these subitems in dictionaries.
- with_type – Add a value named __type containing the message type
- jsondict_as_string – If False (default) treat values in JSONDict fields just as normal ones If True, save such fields as JSON-encoded string. This is the old behavior before version 1.1.
Returns: - A dictionary as copy of itself modified according
to the given parameters
Return type: new_dict
-
to_json
(hierarchical=False, with_type=False, jsondict_as_string=False)¶
-
static
unserialize
(message_string: str)¶
-
update
(other: dict)¶
-
-
class
intelmq.lib.message.
MessageFactory
¶ Bases:
object
unserialize: JSON encoded message to object serialize: object to JSON encoded object
-
static
from_dict
(message: dict, harmonization=None, default_type: typing.Union[str, NoneType] = None) → dict¶ Takes dictionary Message object, returns instance of correct class.
Parameters: - message – the message which should be converted to a Message object
- harmonization – a dictionary holding the used harmonization
- default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.unserialize MessageFactory.serialize
-
static
serialize
(message)¶ Takes instance of message-derived class and makes JSON-encoded Message.
The class is saved in __type attribute.
-
static
unserialize
(raw_message: str, harmonization: dict = None, default_type: typing.Union[str, NoneType] = None) → dict¶ Takes JSON-encoded Message object, returns instance of correct class.
Parameters: - message – the message which should be converted to a Message object
- harmonization – a dictionary holding the used harmonization
- default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.from_dict MessageFactory.serialize
-
-
class
intelmq.lib.message.
Report
(message: typing.Union[dict, tuple] = (), auto: bool = False, harmonization: typing.Union[dict, NoneType] = None) → None¶ Bases:
intelmq.lib.message.Message
-
__init__
(message: typing.Union[dict, tuple] = (), auto: bool = False, harmonization: typing.Union[dict, NoneType] = None) → None¶ Parameters: - message – Passed along to Message’s and dict’s init. If this is an instance of the Event class, the resulting Report instance has only the fields which are possible in Report, all others are stripped.
- auto – if False (default), time.observation is automatically added.
- harmonization – Harmonization definition to use
-
copy
()¶
-
intelmq.lib.pipeline module¶
-
class
intelmq.lib.pipeline.
Pipeline
(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶ Bases:
object
-
acknowledge
()¶ Acknowledge/delete the current message from the source queue
Raises: exceptions.PipelineError – If no message is held.
Returns: None
-
clear_queue
(queue)¶
-
connect
()¶
-
disconnect
()¶
-
has_internal_queues
= False¶
-
nonempty_queues
() → set¶
-
receive
() → str¶
-
reject_message
()¶
-
send
(message: str, path: str = '_default', path_permissive: bool = False)¶
-
set_queues
(queues: typing.Union[str, NoneType], queues_type: str)¶ Parameters: - queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
- queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It doesn’t ensure there is a ‘_default’ key.
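A small sketch of the queue structures accepted for the two directions, matching the description above; the queue names are examples only:

source_queue = 'example-parser-queue'         # source: a plain string

destination_queues = {                        # destination: dict of lists
    '_default': ['example-expert-queue'],
    'other-path': ['file-output-queue'],
}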
-
-
class
intelmq.lib.pipeline.
PipelineFactory
¶ Bases:
object
-
static
create
(logger, broker=None, direction=None, queues=None, pipeline_args: typing.Union[dict, NoneType] = None, load_balance=False, is_multithreaded=False)¶
direction: “source” or “destination”, optional, needed for queues
queues: needs direction to be set, calls set_queues
bot: Bot instance
-
-
class
intelmq.lib.pipeline.
Redis
(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶ Bases:
intelmq.lib.pipeline.Pipeline
-
_reject_message
()¶ Rejecting is a no-op as the message is in the internal queue anyway.
-
clear_queue
(queue)¶ Clears a queue by removing (deleting) the key, which is the same as an empty list in Redis
-
connect
()¶
-
count_queued_messages
(*queues) → dict¶
-
destination_pipeline_db
= 2¶
-
destination_pipeline_host
= '127.0.0.1'¶
-
destination_pipeline_password
= None¶
-
disconnect
()¶
-
has_internal_queues
= True¶
-
load_configurations
(queues_type)¶
-
nonempty_queues
() → set¶ Returns a list of all currently non-empty queues.
-
pipe
= None¶
-
send
(message: str, path: str = '_default', path_permissive: bool = False)¶
-
set_queues
(queues, queues_type)¶
-
source_pipeline_db
= 2¶
-
source_pipeline_host
= '127.0.0.1'¶
-
source_pipeline_password
= None¶
-
-
class
intelmq.lib.pipeline.
Pythonlist
(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶ Bases:
intelmq.lib.pipeline.Pipeline
This pipeline uses simple lists and is only for testing purpose.
It behaves in most ways like a normal pipeline would do, but works entirely without external modules and programs. Data is saved as it comes (no conversion) and it is not blocking.
-
_acknowledge
()¶ Removes a message from the internal queue and returns it
-
_receive
() → bytes¶ Receives the last not yet acknowledged message.
Does not block unlike the other pipelines.
-
_reject_message
()¶ No-op because of the internal queue
-
clear_queue
(queue)¶ Empties given queue.
-
connect
()¶
-
count_queued_messages
(*queues) → dict¶ Returns the amount of queued messages over all given queue names.
-
disconnect
()¶
-
send
(message: str, path: str = '_default', path_permissive: bool = False)¶ Sends a message to the destination queues
-
set_queues
(queues, queues_type)¶
-
state
= {}¶
-
-
class
intelmq.lib.pipeline.
Amqp
(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶ Bases:
intelmq.lib.pipeline.Pipeline
-
check_connection
()¶
-
clear_queue
(queue: str) → bool¶
-
connect
()¶
-
count_queued_messages
(*queues) → dict¶
-
destination_pipeline_amqp_exchange
= ''¶
-
destination_pipeline_amqp_virtual_host
= '/'¶
-
destination_pipeline_db
= 2¶
-
destination_pipeline_host
= '127.0.0.1'¶
-
destination_pipeline_password
= None¶
-
destination_pipeline_socket_timeout
= None¶
-
destination_pipeline_ssl
= False¶
-
destination_pipeline_username
= None¶
-
disconnect
()¶
-
intelmqctl_rabbitmq_monitoring_url
= None¶
-
load_configurations
(queues_type)¶
-
nonempty_queues
() → set¶
-
queue_args
= {'x-queue-mode': 'lazy'}¶
-
send
(message: str, path: str = '_default', path_permissive: bool = False)¶ In principle we could use AMQP’s exchanges here, but that architecture is incompatible with the format of our pipeline configuration.
-
set_queues
(queues: dict, queues_type: str)¶
-
setup_channel
()¶
-
source_pipeline_amqp_exchange
= ''¶
-
source_pipeline_amqp_virtual_host
= '/'¶
-
source_pipeline_db
= 2¶
-
source_pipeline_host
= '127.0.0.1'¶
-
source_pipeline_password
= None¶
-
source_pipeline_socket_timeout
= None¶
-
source_pipeline_ssl
= False¶
-
source_pipeline_username
= None¶
-
intelmq.lib.processmanager module¶
-
class
intelmq.lib.processmanager.
IntelMQProcessManager
(*args, **kwargs)¶ Bases:
intelmq.lib.processmanager.ProcessManagerInterface
-
PIDDIR
= '/var/run/intelmq/'¶
-
PIDFILE
= '/var/run/intelmq/{}.pid'¶
-
static
_interpret_commandline
(pid: int, cmdline: typing.Iterable[str], module: str, bot_id: str) → typing.Union[bool, str]¶ Separate function to allow easy testing
- pid : int – Process ID, used for return values (error messages) only.
- cmdline : Iterable[str] – The command line of the process.
- module : str – The module of the bot.
- bot_id : str – The ID of the bot.
Returns: Union[bool, str] – DESCRIPTION.
-
bot_reload
(bot_id, getstatus=True)¶
-
bot_run
(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
-
bot_start
(bot_id, getstatus=True)¶
-
bot_status
(bot_id, *, proc=None)¶
-
bot_stop
(bot_id, getstatus=True)¶
-
-
class
intelmq.lib.processmanager.
ProcessManagerInterface
(interactive: bool, runtime_configuration: dict, logger: logging.Logger, returntype: intelmq.lib.datatypes.ReturnType, quiet: bool) → None¶ Bases:
object
Defines an interface all process managers must adhere to.
-
bot_reload
(bot_id: str, getstatus=True)¶
-
bot_run
(bot_id: str, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
-
bot_start
(bot_id: str, getstatus=True)¶
-
bot_status
(bot_id: str) → str¶
-
bot_stop
(bot_id: str, getstatus=True)¶
-
-
class
intelmq.lib.processmanager.
SupervisorProcessManager
(interactive: bool, runtime_configuration: dict, logger: logging.Logger, returntype: intelmq.lib.datatypes.ReturnType, quiet: bool) → None¶ Bases:
intelmq.lib.processmanager.ProcessManagerInterface
-
DEFAULT_SOCKET_PATH
= '/var/run/supervisor.sock'¶
-
class
ProcessState
¶ Bases:
object
-
BACKOFF
= 30¶
-
EXITED
= 100¶
-
FATAL
= 200¶
-
RUNNING
= 20¶
-
STARTING
= 10¶
-
STOPPED
= 0¶
-
STOPPING
= 40¶
-
UNKNOWN
= 1000¶
-
static
is_running
(state: int) → bool¶
-
-
class
RpcFaults
¶ Bases:
object
-
ABNORMAL_TERMINATION
= 40¶
-
ALREADY_ADDED
= 90¶
-
ALREADY_STARTED
= 60¶
-
BAD_ARGUMENTS
= 3¶
-
BAD_NAME
= 10¶
-
BAD_SIGNAL
= 11¶
-
CANT_REREAD
= 92¶
-
FAILED
= 30¶
-
INCORRECT_PARAMETERS
= 2¶
-
NOT_EXECUTABLE
= 21¶
-
NOT_RUNNING
= 70¶
-
NO_FILE
= 20¶
-
SHUTDOWN_STATE
= 6¶
-
SIGNATURE_UNSUPPORTED
= 4¶
-
SPAWN_ERROR
= 50¶
-
STILL_RUNNING
= 91¶
-
SUCCESS
= 80¶
-
UNKNOWN_METHOD
= 1¶
-
-
SUPERVISOR_GROUP
= 'intelmq'¶
-
bot_reload
(bot_id: str, getstatus: bool = True)¶
-
bot_run
(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
-
bot_start
(bot_id: str, getstatus: bool = True)¶
-
bot_status
(bot_id: str) → str¶
-
bot_stop
(bot_id: str, getstatus: bool = True)¶
-
-
intelmq.lib.processmanager.
process_managers
()¶ Creates a list of process managers in this class that implement the ProcessManagerInterface. Returns a dict with a short identifier of the process manager as key and the class name as value: {‘intelmq’: intelmq.lib.processmanager.IntelMQProcessManager, ‘supervisor’: intelmq.lib.processmanager.SupervisorProcessManager}
intelmq.lib.splitreports module¶
Support for splitting large raw reports into smaller ones.
The main intention of this module is to help work around limitations in Redis which limits strings to 512MB. Collector bots can use the functions in this module to split the incoming data into smaller pieces which can be sent as separate reports.
Collectors usually don’t really know anything about the data they collect, so the data cannot be reliably split into pieces in all cases. This module can be used for those cases, though, where users know that the data is actually a line-based format and can easily be split into pieces at newline characters. For this to work, some assumptions are made:
The data can be split at any newline character
This would not work for e.g. CSV-based formats which allow newlines in values as long as they’re within quotes.
The lines are much shorter than the maximum chunk size
Obviously, if this condition does not hold, it may not be possible to split the data into small enough chunks at newline characters.
Other considerations:
- To accommodate CSV formats, the code can optionally replicate the first line of the file at the start of all chunks.
- The Redis limit applies to the entire IntelMQ report, not just the raw data. The report has some meta data in addition to the raw data, and the raw data is encoded as base64 in the report. The maximum chunk size must take this into account, e.g. by multiplying the actual limit by 3/4 and subtracting a generous amount for the meta data.
-
intelmq.lib.splitreports.
generate_reports
(report_template: intelmq.lib.message.Report, infile: typing.BinaryIO, chunk_size: typing.Union[int, NoneType], copy_header_line: bool) → typing.Generator[[intelmq.lib.message.Report, NoneType], NoneType]¶ Generate reports from a template and input file, optionally split into chunks.
If chunk_size is None, a single report is generated with the entire contents of infile as the raw data. Otherwise chunk_size should be an integer giving the maximum number of bytes in a chunk. The data read from infile is then split into chunks of this size at newline characters (see read_delimited_chunks). For each of the chunks, this function yields a copy of the report_template with that chunk as the value of the raw attribute.
When splitting the data into chunks, if copy_header_line is true, the first line of the file is read before chunking and then prepended to each of the chunks. This is particularly useful when splitting CSV files.
The infile should be a file-like object. generate_reports uses only two methods, readline and read, with readline only called once and only if copy_header_line is true. Both methods should return bytes objects.
- Params:
- report_template: report used as template for all yielded copies
- infile: stream to read from
- chunk_size: maximum size of each chunk
- copy_header_line: copy the first line of the infile to each chunk
Yields: report – a Report object holding the chunk in the raw field
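A hedged sketch of chunked report generation as described above; the file path and the 10 MB chunk size are examples only:

from intelmq.lib.message import Report
from intelmq.lib.splitreports import generate_reports

template = Report()
template.add('feed.name', 'Example CSV Feed')

with open('/tmp/feed.csv', 'rb') as infile:
    for report in generate_reports(template, infile,
                                   chunk_size=10 * 1024 * 1024,
                                   copy_header_line=True):
        pass   # in a collector bot: self.send_message(report)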
-
intelmq.lib.splitreports.
read_delimited_chunks
(infile: typing.BinaryIO, chunk_size: int) → typing.Generator[[bytes, NoneType], NoneType]¶ Yield the contents of infile in chunk_size pieces ending at newlines. The individual pieces, except for the last one, end in newlines and are smaller than chunk_size if possible.
- Params:
- infile: stream to read from
- chunk_size: maximum size of each chunk
Yields: chunk – chunk with maximum size of chunk_size if possible
-
intelmq.lib.splitreports.
split_chunks
(chunk: bytes, chunk_size: int) → typing.List[bytes]¶ Split a bytestring into chunk_size pieces at ASCII newlines characters.
The return value is a list of bytestring objects. Appending all of them yields a bytestring equal to the input string. All items in the list except the last item end in newline. The items are shorter than chunk_size if possible, but may be longer if the input data has places where the distance between two newline characters is too long.
Note in particular, that the last item may not end in a newline!
- Params:
- chunk: The string to be split
- chunk_size: maximum size of each chunk
Returns: List of resulting chunks Return type: chunks
intelmq.lib.test module¶
Utilities for testing intelmq bots.
The BotTestCase can be used as base class for unittests on bots. It includes some basic generic tests (logged errors, correct pipeline setup).
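A hedged sketch of a bot test following the common IntelMQ pattern; the parser import is hypothetical, the fixture dicts are illustrative, and set_bot()/run_bot() are assumed from the upstream developer documentation:

import unittest

import intelmq.lib.test as test
from intelmq.bots.parsers.example.parser import ExampleParserBot  # hypothetical

EXAMPLE_REPORT = {'__type': 'Report',
                  'feed.name': 'Example Feed',
                  'raw': 'MTkyLjAuMi4xLHNjYW5uZXI='}   # base64 of '192.0.2.1,scanner'
EXAMPLE_EVENT = {'__type': 'Event',
                 'feed.name': 'Example Feed',
                 'source.ip': '192.0.2.1',
                 'classification.type': 'scanner',
                 'raw': 'MTkyLjAuMi4xLHNjYW5uZXI='}


class TestExampleParserBot(test.BotTestCase, unittest.TestCase):

    @classmethod
    def set_bot(cls):
        cls.bot_reference = ExampleParserBot
        cls.default_input_message = EXAMPLE_REPORT

    def test_event(self):
        self.run_bot()
        self.assertMessageEqual(0, EXAMPLE_EVENT)


if __name__ == '__main__':
    unittest.main()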
-
class
intelmq.lib.test.
BotTestCase
¶ Bases:
object
Provides common tests and assert methods for bot testing.
-
assertAnyLoglineEqual
(message: str, levelname: str = 'ERROR')¶ Asserts if any logline matches a specific requirement.
Parameters: - message – Message text which is compared
- type – Type of logline which is asserted
Raises: ValueError
– if logline message has not been found
-
assertLogMatches
(pattern: str, levelname: str = 'ERROR')¶ Asserts if any logline matches a specific requirement.
Parameters: - pattern – Message text which is compared, regular expression.
- levelname – Log level of the logline which is asserted, upper case.
-
assertLoglineEqual
(line_no: int, message: str, levelname: str = 'ERROR')¶ Asserts if a logline matches a specific requirement.
Parameters: - line_no – Number of the logline which is asserted
- message – Message text which is compared
- levelname – Log level of logline which is asserted
-
assertLoglineMatches
(line_no: int, pattern: str, levelname: str = 'ERROR')¶ Asserts if a logline matches a specific requirement.
Parameters: - line_no – Number of the logline which is asserted
- pattern – Message text which is compared
- type – Type of logline which is asserted
-
assertMessageEqual
(queue_pos, expected_msg, compare_raw=True, path='_default')¶ Asserts that the given expected_message is contained in the generated event with given queue position.
-
assertNotRegexpMatchesLog
(pattern)¶ Asserts that pattern doesn’t match against log.
-
assertOutputQueueLen
(queue_len=0, path='_default')¶ Asserts that the output queue has the expected length.
-
assertRegexpMatchesLog
(pattern)¶ Asserts that pattern matches against log.
-
bot_types
= {'collector': 'CollectorBot', 'expert': 'ExpertBot', 'output': 'OutputBot', 'parser': 'ParserBot'}¶
-
get_input_internal_queue
()¶ Returns the internal input queue of this bot which can be filled with fixture data in setUp()
-
get_input_queue
()¶ Returns the input queue of this bot which can be filled with fixture data in setUp()
-
get_mocked_logger
(logger)¶
-
get_output_queue
(path='_default')¶ Getter for items in the output queues of this bot. Use in TestCase scenarios. If there are multiple queues in the named queue group, we return all the items chained.
-
harmonization
= {'event': {'classification.identifier': {'description': 'The lowercase identifier defines the actual software or service (e.g. ``heartbleed`` or ``ntp_version``) or standardized malware name (e.g. ``zeus``). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.', 'type': 'String'}, 'classification.taxonomy': {'description': 'We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check `ENISA taxonomies <http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies>`_.', 'length': 100, 'type': 'ClassificationTaxonomy'}, 'classification.type': {'description': 'The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid *type explosion*, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.', 'type': 'ClassificationType'}, 'comment': {'description': 'Free text commentary about the abuse event inserted by an analyst.', 'type': 'String'}, 'destination.abuse_contact': {'description': 'Abuse contact for destination address. A comma separated list.', 'type': 'LowercaseString'}, 'destination.account': {'description': 'An account name or email address, which has been identified to relate to the destination of an abuse event.', 'type': 'String'}, 'destination.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'destination.as_name': {'description': 'The autonomous system name to which the connection headed.', 'type': 'String'}, 'destination.asn': {'description': 'The autonomous system number to which the connection headed.', 'type': 'ASN'}, 'destination.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'destination.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the destination IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'destination.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'destination.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'destination.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'destination.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'destination.ip': {'description': 'The IP which is the target of the observed connections.', 'type': 'IPAddress'}, 'destination.local_hostname': {'description': 'Some sources report an internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'destination.local_ip': {'description': 'Some sources report an internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'destination.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'destination.port': {'description': 'The port to which the connection headed.', 'type': 'Integer'}, 'destination.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'destination.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.tor_node': {'description': 'If the destination IP was a known tor node.', 'type': 'Boolean'}, 'destination.url': {'description': 'A URL denotes on IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'destination.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'event_description.target': {'description': 'Some sources denominate the target (organization) of a an attack.', 'type': 'String'}, 'event_description.text': {'description': 'A free-form textual description of an abuse event.', 'type': 'String'}, 'event_description.url': {'description': 'A description URL is a link to a further description of the the abuse event in question.', 'type': 'URL'}, 'event_hash': {'description': 'Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. 
Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes.', 'length': 40, 'regex': '^[A-F0-9./]+$', 'type': 'UppercaseString'}, 'extra': {'description': 'All anecdotal information, which cannot be parsed into the data harmonization elements. E.g. os.name, os.version, etc. **Note**: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'malware.hash.md5': {'description': 'A string depicting an MD5 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha1': {'description': 'A string depicting a SHA1 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha256': {'description': 'A string depicting a SHA256 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.name': {'description': 'The malware name in lower case.', 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'malware.version': {'description': 'A version string for an identified artifact generation, e.g. a crime-ware kit.', 'regex': '^[ -~]+$', 'type': 'String'}, 'misp.attribute_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}$', 'type': 'LowercaseString'}, 'misp.event_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[0-9a-z]{12}$', 'type': 'LowercaseString'}, 'output': {'description': 'Event data converted into foreign format, intended to be exported by output plugin.', 'type': 'JSON'}, 'protocol.application': {'description': 'e.g. vnc, ssh, sip, irc, http or smtp.', 'length': 100, 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'protocol.transport': {'description': 'e.g. 
tcp, udp, icmp.', 'iregex': '^(ip|icmp|igmp|ggp|ipencap|st2|tcp|cbt|egp|igp|bbn-rcc|nvp(-ii)?|pup|argus|emcon|xnet|chaos|udp|mux|dcn|hmp|prm|xns-idp|trunk-1|trunk-2|leaf-1|leaf-2|rdp|irtp|iso-tp4|netblt|mfe-nsp|merit-inp|sep|3pc|idpr|xtp|ddp|idpr-cmtp|tp\\+\\+|il|ipv6|sdrp|ipv6-route|ipv6-frag|idrp|rsvp|gre|mhrp|bna|esp|ah|i-nlsp|swipe|narp|mobile|tlsp|skip|ipv6-icmp|ipv6-nonxt|ipv6-opts|cftp|sat-expak|kryptolan|rvd|ippc|sat-mon|visa|ipcv|cpnx|cphb|wsn|pvp|br-sat-mon|sun-nd|wb-mon|wb-expak|iso-ip|vmtp|secure-vmtp|vines|ttp|nsfnet-igp|dgp|tcf|eigrp|ospf|sprite-rpc|larp|mtp|ax.25|ipip|micp|scc-sp|etherip|encap|gmtp|ifmp|pnni|pim|aris|scps|qnx|a/n|ipcomp|snp|compaq-peer|ipx-in-ip|vrrp|pgm|l2tp|ddx|iatp|st|srp|uti|smp|sm|ptp|isis|fire|crtp|crdup|sscopmce|iplt|sps|pipe|sctp|fc|divert)$', 'length': 11, 'type': 'LowercaseString'}, 'raw': {'description': 'The original line of the event from encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'screenshot_url': {'description': 'Some source may report URLs related to a an image generated of a resource without any metadata. Or an URL pointing to resource, which has been rendered into a webshot, e.g. a PNG image and the relevant metadata related to its retrieval/generation.', 'type': 'URL'}, 'source.abuse_contact': {'description': 'Abuse contact for source address. A comma separated list.', 'type': 'LowercaseString'}, 'source.account': {'description': 'An account name or email address, which has been identified to relate to the source of an abuse event.', 'type': 'String'}, 'source.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'source.as_name': {'description': 'The autonomous system name from which the connection originated.', 'type': 'String'}, 'source.asn': {'description': 'The autonomous system number from which originated the connection.', 'type': 'ASN'}, 'source.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'source.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the source IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'source.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'source.geolocation.cymru_cc': {'description': 'The country code denoted for the ip by the Team Cymru asn to ip mapping service.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.geoip_cc': {'description': 'MaxMind Country Code (ISO3166-1 alpha-2).', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'source.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'source.ip': {'description': 'The ip observed to initiate the connection', 'type': 'IPAddress'}, 'source.local_hostname': {'description': 'Some sources report a internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'source.local_ip': {'description': 'Some sources report a internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'source.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'source.port': {'description': 'The port from which the connection originated.', 'length': 5, 'type': 'Integer'}, 'source.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'source.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.tor_node': {'description': 'If the source IP was a known tor node.', 'type': 'Boolean'}, 'source.url': {'description': 'A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'source.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'status': {'description': 'Status of the malicious resource (phishing, dropzone, etc), e.g. 
online, offline.', 'type': 'String'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}, 'time.source': {'description': 'The time of occurrence of the event as reported the feed (source).', 'type': 'DateTime'}, 'tlp': {'description': 'Traffic Light Protocol level of the event.', 'type': 'TLP'}}, 'report': {'extra': {'description': 'All anecdotal information of the report, which cannot be parsed into the data harmonization elements. E.g. subject of mails, etc. This is data is not automatically propagated to the events.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'raw': {'description': 'The original raw and unparsed data encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}}}¶
-
input_queue
¶ Returns the input queue of this bot, which can be filled with fixture data in setUp().
-
new_event
()¶
-
new_report
(auto=False, examples=False)¶
-
prepare_bot
(parameters={}, destination_queues=None, prepare_source_queue: bool = True)¶ Reconfigures the bot with the changed attributes.
Parameters: - parameters – optional bot parameters for this run, as dict
- destination_queues – optional definition of the destination queues, default: {“_default”: “{}-output”.format(self.bot_id)}
-
prepare_source_queue
()¶
-
run_bot
(iterations: int = 1, error_on_pipeline: bool = False, prepare=True, parameters={}, allowed_error_count=0, allowed_warning_count=0, stop_bot: bool = True)¶ Call this method to actually perform a test run of the specified bot (a usage sketch follows this class listing).
Parameters: - iterations – Bot instance will be run the given times, defaults to 1.
- parameters – passed to prepare_bot
- allowed_error_count – maximum number of allowed errors in the logs
- allowed_warning_count – maximum number of allowed warnings in the logs
- stop_bot – Whether the bot should be stopped/shut down after running it. Set to False if you are calling this method again afterwards, as the bot shutdown destroys structures (pipeline, etc.)
-
classmethod
setUpClass
()¶ Set default values and save original functions.
-
set_input_queue
(seq)¶ Setter for the input queue of this bot
-
tearDown
()¶ Check if the bot did consume all messages.
Executed after every test run.
-
classmethod
tearDownClass
()¶
-
test_bot_name
(*args, **kwargs)¶ Test if Bot has a valid name. Must be CamelCase and end with CollectorBot etc.
Accept arbitrary arguments in case the test methods get mocked and get some additional arguments. All arguments are ignored.
-
test_static_bot_check_method
(*args, **kwargs)¶ Check if the bot’s static check() method completes without errors (exceptions). The return value (errors) is not checked.
The arbitrary parameters for this test function are needed because if a mocker mocks the test class, parameters can be added. See for example intelmq.tests.bots.collectors.http.test_collector.
-
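The following is a minimal sketch of how these helpers are typically combined in a test case. It assumes the class documented here is intelmq.lib.test.BotTestCase; set_bot() and assertMessageEqual() are not part of the excerpt above and are assumed to be provided by the test helper. The bot, report and event used below are placeholders.

import base64
import unittest

import intelmq.lib.test as test
from intelmq.bots.parsers.example.parser import ExampleParserBot  # hypothetical bot

RAW = base64.b64encode(b'192.0.2.1,scanner\n').decode()

EXAMPLE_REPORT = {'__type': 'Report', 'feed.name': 'Example Feed', 'raw': RAW}
EXAMPLE_EVENT = {'__type': 'Event', 'feed.name': 'Example Feed',
                 'source.ip': '192.0.2.1', 'classification.type': 'scanner',
                 'raw': RAW}


class TestExampleParserBot(test.BotTestCase, unittest.TestCase):

    @classmethod
    def set_bot(cls):
        # set_bot() is assumed to be the hook for selecting the bot class
        # and the default input message.
        cls.bot_reference = ExampleParserBot
        cls.default_input_message = EXAMPLE_REPORT

    def test_event(self):
        # run_bot() performs one iteration by default; 'parameters'
        # is passed on to prepare_bot().
        self.run_bot(parameters={'delimiter': ','})  # hypothetical parameter
        self.assertMessageEqual(0, EXAMPLE_EVENT)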
intelmq.lib.upgrades module¶
© 2020 Sebastian Wagner <wagner@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
-
intelmq.lib.upgrades.
v100_dev7_modify_syntax
(configuration, harmonization, dry_run, **kwargs)¶ Migrate modify bot configuration format
-
intelmq.lib.upgrades.
v110_shadowserver_feednames
(configuration, harmonization, dry_run, **kwargs)¶ Replace deprecated Shadowserver feednames
-
intelmq.lib.upgrades.
v110_deprecations
(configuration, harmonization, dry_run, **kwargs)¶ Checking for deprecated runtime configurations (stomp collector, cymru parser, ripe expert, collector feed parameter)
-
intelmq.lib.upgrades.
v200_defaults_statistics
(configuration, harmonization, dry_run, **kwargs)¶ Inserting statistics_* parameters into defaults configuration file
-
intelmq.lib.upgrades.
v200_defaults_broker
(configuration, harmonization, dry_run, **kwargs)¶ Inserting *_pipeline_broker and deleting broker into/from defaults configuration
-
intelmq.lib.upgrades.
v112_feodo_tracker_ips
(configuration, harmonization, dry_run, **kwargs)¶ Fix URL of feodotracker IPs feed in runtime configuration
-
intelmq.lib.upgrades.
v112_feodo_tracker_domains
(configuration, harmonization, dry_run, **kwargs)¶ Search for discontinued feodotracker domains feed
-
intelmq.lib.upgrades.
v200_defaults_ssl_ca_certificate
(configuration, harmonization, dry_run, **kwargs)¶ Add ssl_ca_certificate to defaults
-
intelmq.lib.upgrades.
v111_defaults_process_manager
(configuration, harmonization, dry_run, **kwargs)¶ Fix typo in proccess_manager parameter
-
intelmq.lib.upgrades.
v202_fixes
(configuration, harmonization, dry_run, **kwargs)¶ Migrate Collector parameter feed to name. RIPE expert set query_ripe_stat_ip with query_ripe_stat_asn as default. Set cymru whois expert overwrite to true.
-
intelmq.lib.upgrades.
v210_deprecations
(configuration, harmonization, dry_run, **kwargs)¶ Migrating configuration
-
intelmq.lib.upgrades.
v213_deprecations
(configuration, harmonization, dry_run, **kwargs)¶ Migrate attach_unzip to extract_files for the mail attachment collector
-
intelmq.lib.upgrades.
v213_feed_changes
(configuration, harmonization, dry_run, **kwargs)¶ Migrates feed configuration for changed feed parameters.
-
intelmq.lib.upgrades.
v220_configuration
(configuration, harmonization, dry_run, **kwargs)¶ Migrating configuration
-
intelmq.lib.upgrades.
v220_azure_collector
(configuration, harmonization, dry_run, **kwargs)¶ Checking for the Microsoft Azure collector
-
intelmq.lib.upgrades.
v220_feed_changes
(configuration, harmonization, dry_run, **kwargs)¶ Migrates feed configuration for changed feed parameters.
-
intelmq.lib.upgrades.
v221_feed_changes
(configuration, harmonization, dry_run, **kwargs)¶ Migrates feeds’ configuration for changed/fixed parameters. Deprecation of HP Hosts file feed & parser.
-
intelmq.lib.upgrades.
v222_feed_changes
(configuration, harmonization, dry_run, **kwargs)¶ Migrate Shadowserver feed name
-
intelmq.lib.upgrades.
v230_csv_parser_parameter_fix
(configuration, harmonization, dry_run, **kwargs)¶ Fix CSV parser parameter misspelling
-
intelmq.lib.upgrades.
v230_deprecations
(configuration, harmonization, dry_run, **kwargs)¶ Deprecate malwaredomainlist parser
-
intelmq.lib.upgrades.
v230_feed_changes
(configuration, harmonization, dry_run, **kwargs)¶ Migrates feeds’ configuration for changed/fixed parameters
-
intelmq.lib.upgrades.
v233_feodotracker_browse
(configuration, harmonization, dry_run, **kwargs)¶ Migrate Abuse.ch Feodotracker Browser feed parsing parameters
-
intelmq.lib.upgrades.
v300_bots_file_removal
(configuration, harmonization, dry_run, **kwargs)¶ Remove BOTS file
-
intelmq.lib.upgrades.
v300_defaults_file_removal
(configuration, harmonization, dry_run, **kwargs)¶ Remove the defaults.conf file
-
intelmq.lib.upgrades.
v300_pipeline_file_removal
(configuration, harmonization, dry_run, **kwargs)¶ Remove the pipeline.conf file
-
intelmq.lib.upgrades.
v301_deprecations
(configuration, harmonization, dry_run, **kwargs)¶ Deprecate malwaredomains parser and collector
-
intelmq.lib.upgrades.
v310_feed_changes
(configuration, harmonization, dry_run, **kwargs)¶ Migrates feeds’ configuration for changed/fixed parameters
-
intelmq.lib.upgrades.
v310_shadowserver_feednames
(configuration, harmonization, dry_run, **kwargs)¶ Remove legacy Shadowserver feednames
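All upgrade functions in this module share the signature shown above. The sketch below illustrates the common pattern, assuming (as in the existing functions) that each upgrade returns a tuple of (changed, configuration, harmonization), where changed is None if nothing had to be migrated; the function name and the parameter names are hypothetical.

def v999_example_rename(configuration, harmonization, dry_run, **kwargs):
    """Rename a hypothetical deprecated bot parameter."""
    changed = None
    for bot_id, bot in configuration.items():
        if bot_id == 'global':  # skip the global section, if present
            continue
        parameters = bot.get('parameters', {})
        if 'old_parameter' in parameters:  # hypothetical parameter name
            parameters['new_parameter'] = parameters.pop('old_parameter')
            changed = True
    return changed, configuration, harmonization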
intelmq.lib.utils module¶
Common utility functions for intelmq.
decode, encode, base64_decode, base64_encode, load_configuration, log, reverse_readline, parse_logline
-
intelmq.lib.utils.
base64_decode
(value: typing.Union[bytes, str]) → str¶ Parameters: value – base64 encoded string Returns: decoded string Return type: retval Notes
Possible bytes/unicode conversion problems are ignored.
-
intelmq.lib.utils.
base64_encode
(value: typing.Union[bytes, str]) → str¶ Parameters: value – string to be encoded Returns: base64 representation of value Return type: retval Notes
Possible bytes/unicode conversion problems are ignored.
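A round-trip example based on the signatures above:

>>> from intelmq.lib.utils import base64_encode, base64_decode
>>> base64_encode('foobar')
'Zm9vYmFy'
>>> base64_decode('Zm9vYmFy')
'foobar'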
-
intelmq.lib.utils.
decode
(text: typing.Union[bytes, str], encodings: typing.Sequence[str] = ('utf-8',), force: bool = False) → str¶ Decode the given string to unicode, trying the given encodings (UTF-8 by default).
Parameters: - text – if unicode string is given, same object is returned
- encodings – list/tuple of encodings to use
- force – Ignore invalid characters
Returns: converted unicode string
Raises: ValueError
– if decoding failed
-
intelmq.lib.utils.
encode
(text: typing.Union[bytes, str], encodings: typing.Sequence[str] = ('utf-8',), force: bool = False) → bytes¶ Encode the given string to bytes, using UTF-8 by default.
Parameters: - text – if bytes string is given, same object is returned
- encodings – list/tuple of encodings to use
- force – Ignore invalid characters
Returns: converted bytes string
Raises: ValueError
– if encoding failed
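For illustration, decoding a UTF-8 byte string with a fallback encoding and encoding it back (values chosen for the example):

>>> from intelmq.lib.utils import decode, encode
>>> decode(b'caf\xc3\xa9', encodings=('utf-8', 'latin-1'))
'café'
>>> encode('café')
b'caf\xc3\xa9'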
-
intelmq.lib.utils.
load_configuration
(configuration_filepath: str) → dict¶ Load JSON or YAML configuration file.
Parameters: configuration_filepath – Path to file to load. Returns: Parsed configuration Return type: config Raises: ValueError
– if file not found
-
intelmq.lib.utils.
load_parameters
(*configs: dict) → intelmq.lib.utils.Parameters¶ Load dictionaries into new Parameters() instance.
Parameters: *configs – Arbitrary number of dictionaries to load. Returns: class instance with items of configs as attributes Return type: parameters
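A short sketch combining both helpers; the file path is an example only.

from intelmq.lib.utils import load_configuration, load_parameters

runtime = load_configuration('/opt/intelmq/etc/runtime.yaml')  # example path
parameters = load_parameters({'http_timeout_sec': 30}, {'logging_level': 'DEBUG'})
parameters.logging_level  # 'DEBUG' – the dict items become attributes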
-
intelmq.lib.utils.
log
(name: str, log_path: typing.Union[str, bool] = '/var/log/intelmq/', log_level: str = 'INFO', stream: object = None, syslog: typing.Union[bool, str, list, tuple] = None, log_format_stream: str = '%(name)s: %(message)s', logging_level_stream: typing.Union[str, NoneType] = None, log_max_size: typing.Union[int, NoneType] = 0, log_max_copies: typing.Union[int, NoneType] = None)¶
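A minimal sketch of creating a logger with this function; the logger name and path are placeholders, and the call is assumed to return a standard logging.Logger.

from intelmq.lib.utils import log

logger = log('example-bot', log_path='/opt/intelmq/var/log/', log_level='DEBUG')
logger.info('Logger initialized.')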
-
intelmq.lib.utils.
parse_logline
(logline: str, regex: str = '^(?P<date>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d+) - (?P<bot_id>([-\\w]+|py\\.warnings))(?P<thread_id>\\.[0-9]+)? - (?P<log_level>[A-Z]+) - (?P<message>.+)$') → typing.Union[dict, str]¶ Parses the given logline string into its components.
Parameters: - logline – logline to be parsed
- regex – The regular expression used to parse the line
Returns: - dictionary with keys: [‘date’, ‘bot_id’, ‘log_level’, ‘message’]
or string if the line can’t be parsed
Return type: result
See also
LOG_REGEX: Regular expression for default log format of file handler SYSLOG_REGEX: Regular expression for log format of syslog
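A defensive usage sketch; the log line is made up to resemble the default file-handler format.

from intelmq.lib.utils import parse_logline

line = '2023-05-17 10:31:23,512 - example-bot - INFO - Bot initialized.'
fields = parse_logline(line)
if isinstance(fields, dict):  # the raw string is returned if the line cannot be parsed
    print(fields['bot_id'], fields['log_level'], fields['message'])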
-
intelmq.lib.utils.
reverse_readline
(filename: str, buf_size=100000) → typing.Generator[[str, NoneType], NoneType]¶
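A usage sketch, assuming (as the name and return type suggest) that the generator yields the lines of the file starting from its end; the path is a placeholder.

from intelmq.lib.utils import reverse_readline

for line in reverse_readline('/opt/intelmq/var/log/example-bot.log'):
    print(line)  # most recent line first
    break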
-
intelmq.lib.utils.
error_message_from_exc
(exc: Exception) → str¶
>>> exc = IndexError('This is a test')
>>> error_message_from_exc(exc)
'This is a test'
Parameters: exc – Returns: The error message of exc Return type: result
-
intelmq.lib.utils.
parse_relative
(relative_time: str) → int¶ Parse relative time attributes and returns the corresponding minutes.
>>> parse_relative('4 hours')
240
Parameters: relative_time – a string holding a relative time specification Returns: Minutes Return type: result Raises: ValueError
– If relative_time is not parseable
See also
TIMESPANS: Defines the conversion of verbal timespans to minutes
-
class
intelmq.lib.utils.
RewindableFileHandle
(f)¶ Bases:
object
Can be used for easy retrieval of the last input line, in order to populate the raw field during CSV parsing.
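A sketch of the intended use during CSV parsing; the current_line attribute is assumed to hold the last line handed out by the wrapped file object.

import csv
import io

from intelmq.lib.utils import RewindableFileHandle

data = io.StringIO('2023-05-17T10:31:23+00:00,192.0.2.1,scanner\n')
handle = RewindableFileHandle(data)
for row in csv.reader(handle):
    raw_line = handle.current_line  # original text, e.g. for the 'raw' field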
-
intelmq.lib.utils.
file_name_from_response
(response: requests.models.Response) → str¶ Extract the file name from the Content-Disposition header of the Response object or, as a fallback, from the URL.
Parameters: response – a Response object retrieved from a call with the requests library Returns: The file name Return type: file_name
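Example with the requests library; the URL is a placeholder.

import requests

from intelmq.lib.utils import file_name_from_response

response = requests.get('https://example.com/feeds/latest.csv')
# 'latest.csv', unless a Content-Disposition header specifies another name
file_name = file_name_from_response(response)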
-
intelmq.lib.utils.
list_all_bots
() → dict¶ Compile a dictionary with all bots and their parameters.
Includes:
- the bots’ names
- the description from the docstring
- parameters including default values.
Parameters that the Bot base class already defines with the same value are excluded.
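A small introspection sketch; the exact structure of the returned dictionary is not spelled out above and is assumed here to be nested by bot group and bot name.

from intelmq.lib.utils import list_all_bots

bots = list_all_bots()
for group, entries in bots.items():  # assumed layout: group -> bot name -> details
    print(group, len(entries))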
-
intelmq.lib.utils.
get_global_settings
() → dict¶