Hairgap’s documentation

Introduction

A basic protocol to send files using the [hairgap binary](https://github.com/cea-sec/hairgap). The goal is to send arbitrary files through a unidirectional data diode using UDP connections.


By default, hairgap can only send a single file, without its name. This library implements a basic protocol to send complete directories and to checksum the transferred files.

This protocol is customizable and the sender side can add some attributes to each transfer.

  • We assume that the hairgap binary is installed and in the PATH environment variable.
  • The MAC address of the destination must be known to the sender machine. You can inject this information into the ARP cache of the sender machine:
DESTINATION_IP="the IP address of the destination machine"
DESTINATION_MAC="the MAC address of the destination machine"
arp -s ${DESTINATION_IP} ${DESTINATION_MAC}

Command-line usage

First, you must start the receiver on the destination side:

pip3 install hairgap
pyhairgap receive ${DESTINATION_IP} directory/

Then you can send directories:

pip3 install hairgap
pyhairgap send ${DESTINATION_IP} directory/

Due to UDP limitations, you must ensure that only one transfer to a given port occurs at a time.

How does it work?

  • First, an index file is created beside the directory to send, listing the relative path, size and SHA256 of each file.

    If a file is empty, its content is replaced by a magic value (since hairgap cannot send empty files). If a file starts with one of the magic values, it is rewritten to escape them. The content of the directory to send is therefore modified in place.

  • When all files are checked and the index file is ready, the index file is transferred first, then each file is sent.

    There is a 3-second sleep between two successive transfers.

  • On the receiver side, there are two infinite loops:

    • one for the reception thread, which receives a file and passes it to the processing thread
    • one for the processing thread, which processes each file (if it is an index, the thread reads it to learn how to rename the next files)

Unexpected files (for example, if the index file has been sent before the start of the receive process) are deleted.
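The index-building step can be sketched with the standard library alone. This is a minimal sketch, not the library's implementation: the helper names `sha256_of` and `build_index` are hypothetical, the function returns tuples instead of writing hairgap's actual index format, and it skips the magic-value substitution for empty files.

```python
import hashlib
import os
from typing import List, Tuple


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fd:
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root: str) -> List[Tuple[str, int, str]]:
    """Return (relative path, size in bytes, SHA256) for every file under root.

    Hypothetical helper: the list order is the order in which files would be sent.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            abspath = os.path.join(dirpath, name)
            relpath = os.path.relpath(abspath, root)
            entries.append((relpath, os.path.getsize(abspath), sha256_of(abspath)))
    return entries
```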

Transfer modes

Three transfer modes are available:

  • an index file is sent, followed by all files, one by one,
  • all files (including the index one) are sent as a single tar archive (created on the fly),
  • all files are gathered in a single tar.gz archive that is split into chunks; an index file is sent, followed by the chunks.

The first mode does not require extra storage (but remember that files may be modified in place, since empty files cannot be sent), but it can be very slow when many files are sent, due to the 3-second sleep after each transfer. The second mode is the most efficient but may require sending very large files. The third mode is a trade-off between the two, limiting both the number of files to transfer and their size.
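To put numbers on this trade-off, the sketch below estimates only the time spent sleeping between transfers, assuming one delay per transfer and the 3-second default of `Config.end_delay_s`. The function name and the mode labels `"files"`, `"tar"` and `"split"` are hypothetical; transmission time itself is ignored.

```python
import math


def sleep_overhead_s(mode: str, n_files: int, archive_size: int = 0,
                     split_size: int = 1, end_delay_s: float = 3.0) -> float:
    """Upper-bound estimate of the sleeping time, counting one delay per transfer."""
    if mode == "files":    # index file + one transfer per file
        n_transfers = n_files + 1
    elif mode == "tar":    # a single tar stream, index included
        n_transfers = 1
    elif mode == "split":  # index file + one transfer per chunk
        n_transfers = 1 + math.ceil(archive_size / split_size)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return n_transfers * end_delay_s
```

For instance, 10,000 files in the first mode spend about 30,003 seconds (over 8 hours) sleeping, whereas a 10 GiB archive split into 1 GiB chunks needs 11 transfers, i.e. 33 seconds.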

Customize transfers

Receiver side

Receive data from hairgap using a custom protocol. The algorithm is simple:

receive_file launches the hairgapr command, which waits for a transfer and exits when a file is received. process_file reads the first bytes of the file:

  • if they match HAIRGAP_MAGIC_NUMBER_INDEX, then this is an index file, with:
    • the transfer identifier
    • the previous transfer identifier
    • the list of following files and their sha256 (in the transfer order)
  • otherwise, this is the next expected file, as listed in the index file

Empty files cannot be sent by hairgap, so they are replaced by the HAIRGAP_MAGIC_NUMBER_EMPTY constant. If a new index file is read before the last expected file of the previous index, then we start a new index: we assume that the sender has been interrupted and has restarted the whole process.
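The dispatch rule above (an index resets the list of expected names, other files take the next expected name, leftovers are unexpected) can be sketched as a small state machine. The magic byte strings, the one-name-per-line index body and the class name `ExpectedFiles` are all hypothetical: the real constants and index format are defined in the hairgap package and also carry SHA256s and attributes.

```python
from typing import List, Optional

# Hypothetical magic values: only their "# *-* HAIRGAP-" prefix appears in this
# documentation; the real constants are defined by the hairgap package.
MAGIC_INDEX = b"# *-* HAIRGAP-INDEX *-*\n"
MAGIC_EMPTY = b"# *-* HAIRGAP-EMPTY *-*\n"


def restore_content(content: bytes) -> bytes:
    """Turn the placeholder sent for an empty file back into an empty payload."""
    return b"" if content == MAGIC_EMPTY else content


class ExpectedFiles:
    """Map each received payload to the name announced by the latest index."""

    def __init__(self) -> None:
        self.pending: List[str] = []

    def on_received(self, content: bytes) -> Optional[str]:
        """Return the destination name of a data file, or None otherwise."""
        if content.startswith(MAGIC_INDEX):
            # A new index always restarts the sequence, even if files of the
            # previous index are missing (the sender was interrupted).
            body = content[len(MAGIC_INDEX):].decode("utf-8")
            self.pending = [line for line in body.splitlines() if line]
            return None
        if not self.pending:
            return None  # unexpected file: the caller should delete it
        return self.pending.pop(0)  # files arrive in index order
```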

A 5-second sleep (HAIRGAP_END_DELAY_S) is performed by the sender after each send.

Both functions can run sequentially in a single thread (provided that process_file takes less than 5 seconds), or in separate threads to handle large files.
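A minimal sketch of the threaded variant, assuming a plain `queue.Queue` between a reception thread and a processing thread. `run_pipeline` is a hypothetical helper; unlike the receiver's infinite loops, it stops on a sentinel so that it can be demonstrated and tested.

```python
import queue
import threading
from typing import Callable, Iterable


def run_pipeline(received: Iterable[str], process: Callable[[str], None]) -> None:
    """Run a reception thread and a processing thread connected by a queue."""
    pending: queue.Queue = queue.Queue()
    done = object()  # sentinel marking the end of reception

    def reception() -> None:
        # In a real receiver, each item would come from one hairgapr run.
        for path in received:
            pending.put(path)
        pending.put(done)

    def processing() -> None:
        while True:
            item = pending.get()
            if item is done:
                break
            process(item)  # may be slow (e.g. SHA256 of a large file)

    threads = [threading.Thread(target=reception), threading.Thread(target=processing)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The queue decouples the two loops: reception can start waiting for the next transfer while a large file is still being hashed.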

class hairgap.receiver.Receiver(config: hairgap.utils.Config, threading: bool = False, port: int = None)

define the reception process. It can be split into two threads, or its operations can be serialized when files are small enough. You just have to call the loop method to start the reception process. Basically, the algorithm is:

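import os

expected_filenames = []
while True:
    receive_file(temporary_filepath)
    if is_index_file(temporary_filepath):
        expected_filenames = read_list_of_expected_filenames(temporary_filepath)
    else:
        filename = expected_filenames.pop(0)  # files arrive in index order
        os.rename(temporary_filepath, filename)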

get_current_transfer_directory() → Optional[str]

return a folder name where all files of a transfer can be moved to.

Called once the index file has been read and the attributes are set. This folder is created automatically. If None is returned, all received files are deleted.

process_received_file(tmp_abspath: str, valid: bool = True)

process a received file. The execution time of this method must be small when threading is False (5 seconds between two communications); threading is therefore required when large files are processed, since their SHA256 is computed.

Parameters:
  • tmp_abspath – the temporary absolute path
  • valid – the file has been correctly received by hairgap

process_received_file_tar(tmp_abspath: str, valid: bool = True)

process a tar.gz archive. A single file and a single directory are expected at the root of the received archive.

Parameters:
  • tmp_abspath – the temporary absolute path
  • valid – the file has been correctly received by hairgap

receive_file(tmp_path) → Optional[bool]

receive a single file and return:
  • True if hairgap did not raise an error,
  • False if hairgap raised an error other than Ctrl-C,
  • None if hairgap was terminated by Ctrl-C.
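The three-valued result can be sketched around `subprocess.run`. This is an assumption about how such a wrapper might look, not the library's code: `run_receiver` is hypothetical and takes the full command line (for example a `hairgapr` invocation) as a list.

```python
import subprocess
from typing import List, Optional


def run_receiver(cmd: List[str]) -> Optional[bool]:
    """Run a receiver command and map its outcome to True / False / None.

    True: the command exited with status 0; False: it exited with an error;
    None: it was interrupted by Ctrl-C (KeyboardInterrupt in Python).
    """
    try:
        completed = subprocess.run(cmd, check=False)
    except KeyboardInterrupt:
        return None
    return completed.returncode == 0
```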

transfer_complete()

called when all files of a transfer are received.

the execution time of this method must be small when threading is False (5 seconds between two communications). You can read current_attributes to retrieve the attributes defined by the sender (None by default).

transfer_file_received(tmp_abspath, file_relpath, actual_sha256: Optional[str] = None, expected_sha256: Optional[str] = None, tmp_fd: _io.BytesIO = None)

called when a file is received

the execution time of this method must be small if threading is False (5 seconds between two communications)

Parameters:
  • tmp_abspath – the path of the received file
  • file_relpath – the destination path of the received file
  • actual_sha256 – actual SHA256 (not provided in case of tar archives)
  • expected_sha256 – expected SHA256 (not provided in case of tar archives)
  • tmp_fd – provided when tmp_abspath is not given
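A possible body for such a hook, assuming received files should be moved under a destination root and discarded on checksum mismatch. `accept_received_file` is a hypothetical standalone function rather than an override of `transfer_file_received`; it also rejects relative paths that would escape the destination root, a precaution not mentioned above.

```python
import os
import shutil
from typing import Optional


def accept_received_file(tmp_abspath: str, dest_root: str, file_relpath: str,
                         actual_sha256: Optional[str] = None,
                         expected_sha256: Optional[str] = None) -> bool:
    """Move a received file under dest_root, or delete it on checksum mismatch.

    Both checksums are None for tar archives; the file is then accepted as is.
    Returns True if the file was kept, False if it was discarded.
    """
    if expected_sha256 is not None and actual_sha256 != expected_sha256:
        os.remove(tmp_abspath)  # corrupted or truncated transfer
        return False
    root = os.path.abspath(dest_root)
    dest = os.path.abspath(os.path.join(root, file_relpath))
    # Refuse names such as "../../etc/passwd" that would escape dest_root.
    if os.path.commonpath([root, dest]) != root:
        raise ValueError(f"unsafe relative path: {file_relpath!r}")
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(tmp_abspath, dest)
    return True
```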

transfer_file_unexpected(tmp_abspath: str, prefix: bytes = None)

called when an unexpected file has been received. Probably an interrupted transfer…

Parameters:
  • tmp_abspath – absolute path of the received file
  • prefix – the first bytes of the received file.

transfer_start()

called before the first file of a transfer

the execution time of this method must be small when threading is False (5 seconds between two communications)

Sender side

class hairgap.sender.DirectorySender(config: hairgap.utils.Config)

Send the content of a directory. Must be subclassed to implement transfer_abspath and index_abspath.

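from hairgap.sender import DirectorySender
from hairgap.utils import Config

# in practice, use your subclass defining transfer_abspath and index_abspath
sender = DirectorySender(Config())
# generate the index file (this may modify the data directory in place!)
sender.prepare_directory()
sender.send_directory()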

get_attributes() → Dict[str, str]

return a dict of attributes to add to the index file (such as unique IDs to track transfers on the receiver side). Keys and values must be simple strings (no new-line characters, and not containing the " = " substring). Available keys must be added to the available_attributes of the Receiver in use (Receiver.available_attributes).
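The constraints on attributes can be checked before sending. The sketch below is hypothetical (`check_attributes` is not part of the library), and the list of available keys stands in for `Receiver.available_attributes`.

```python
from typing import Dict, Iterable


def check_attributes(attributes: Dict[str, str], available: Iterable[str]) -> None:
    """Raise ValueError if an attribute cannot be written safely to the index file."""
    allowed = set(available)
    for key, value in attributes.items():
        if key not in allowed:
            raise ValueError(f"{key!r} is not declared on the receiver side")
        for text in (key, value):
            if not isinstance(text, str):
                raise ValueError(f"{text!r} is not a string")
            # new lines and " = " would break the "key = value" lines of the index
            if "\n" in text or "\r" in text or " = " in text:
                raise ValueError(f"{text!r} contains a new line or ' = '")
```

For example, with a hypothetical `transfer_id` key, `check_attributes({"transfer_id": "42"}, ["transfer_id"])` passes silently.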

index_abspath

returns the absolute path of the index file to create

prepare_directory() → Tuple[int, int]

create an index file and return the number of files and the total size (including the index file).

can modify some files in place (those that are empty or begin with `# *-* HAIRGAP-`) when config.use_tar_archives is not set

to speed up preparation, the result is always (1, 0) when config.use_tar_archives is set and config.always_compute_size is not
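Which files `prepare_directory` may rewrite can be detected as follows. This sketch only reports the condition; how the library actually escapes such files is not shown. `needs_rewrite` is a hypothetical name, and the prefix is the one quoted above.

```python
import os

# Prefix quoted in the documentation for files that may be rewritten in place.
MAGIC_PREFIX = b"# *-* HAIRGAP-"


def needs_rewrite(path: str) -> bool:
    """True if the file is empty or starts with the reserved magic prefix."""
    if os.path.getsize(path) == 0:
        return True  # hairgap cannot send empty files
    with open(path, "rb") as fd:
        return fd.read(len(MAGIC_PREFIX)) == MAGIC_PREFIX
```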

send_directory(port: Optional[int] = None)

send all files using hairgap.

Parameters: port – the port to send to, overriding the default config

raises ValueError in case of an error with the index or the directory to send

send_directory_no_tar(port: Optional[int] = None)

send all files using hairgap

send_directory_tar(port: Optional[int] = None)

send all files using hairgap, using the tar method.

Parameters: port – the port to send to, overriding the default config

split_source_files(dir_abspath: str, split_size: int)

transform some files into a single archive, split into chunks

The steps are:
  • move the content of the source folder into a first subfolder,
  • create a second subfolder in the same source folder,
  • create a tar.gz file from the first subfolder and split it into chunks in the second subfolder,
  • remove the first subfolder,
  • move the content of the second subfolder to its parent,
  • remove the second subfolder.
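These steps can be sketched in pure Python, using `tarfile` instead of the external `tar` and `split` binaries the library may call. The function name `split_into_archive`, the `content` archive root and the `chunk_NNNNN` naming are hypothetical, and each chunk is read fully into memory, which is acceptable only for moderate `split_size` values.

```python
import os
import shutil
import tarfile
import tempfile
from typing import List


def split_into_archive(dir_abspath: str, split_size: int) -> List[str]:
    """Replace the content of dir_abspath by chunks of a tar.gz of that content."""
    # 1. move the content of the source folder into a first subfolder
    originals = tempfile.mkdtemp(dir=dir_abspath)
    for name in os.listdir(dir_abspath):
        if name != os.path.basename(originals):
            shutil.move(os.path.join(dir_abspath, name), originals)
    # 2. create a second subfolder in the same source folder
    chunks_dir = tempfile.mkdtemp(dir=dir_abspath)
    # 3. archive the first subfolder and split the archive into the second one
    archive = os.path.join(chunks_dir, "archive.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(originals, arcname="content")
    chunk_names: List[str] = []
    with open(archive, "rb") as src:
        while True:
            data = src.read(split_size)  # a whole chunk is held in memory
            if not data:
                break
            name = f"chunk_{len(chunk_names):05d}"
            with open(os.path.join(chunks_dir, name), "wb") as dst:
                dst.write(data)
            chunk_names.append(name)
    os.remove(archive)
    # 4. remove the first subfolder
    shutil.rmtree(originals)
    # 5. move the chunks to the source folder, then 6. remove the second subfolder
    for name in chunk_names:
        shutil.move(os.path.join(chunks_dir, name), dir_abspath)
    os.rmdir(chunks_dir)
    return chunk_names
```

Concatenating the chunks in name order gives back the tar.gz archive on the receiving side.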

transfer_abspath

returns the absolute path of the directory to send

Configuration

class hairgap.utils.Config(destination_ip=None, destination_port: int = 15124, destination_path=None, end_delay_s: Optional[float] = 3.0, error_chunk_size: Optional[int] = None, keepalive_ms: Optional[int] = 500, max_rate_mbps: Optional[int] = None, mem_limit_mb: Optional[int] = None, mtu_b: Optional[int] = None, timeout_s: float = 3.0, redundancy: float = 3.0, hairgapr: str = 'hairgapr', hairgaps: str = 'hairgaps', tar: str = None, split: str = None, cat: str = None, use_tar_archives: Optional[bool] = None, always_compute_size: bool = True, split_size: Optional[int] = None)

Stores hairgap command-line options, the delay between successive sends, and the temporary directory. Every parameter is accessed through a property decorator, so it can easily be overridden. You should check https://github.com/cea-sec/hairgap for hairgap options.
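The property-based design means a setting can be changed by subclassing rather than by passing arguments everywhere. The classes below are hypothetical stand-ins that mimic this pattern with two of the parameters listed above; they are not hairgap's `Config`.

```python
class SketchConfig:
    """Hypothetical stand-in illustrating the property pattern, not hairgap's Config."""

    def __init__(self, destination_port: int = 15124, end_delay_s: float = 3.0):
        self._destination_port = destination_port
        self._end_delay_s = end_delay_s

    @property
    def destination_port(self) -> int:
        return self._destination_port

    @property
    def end_delay_s(self) -> float:
        return self._end_delay_s


class SlowLinkConfig(SketchConfig):
    """Override a single setting in one place instead of at every call site."""

    @property
    def end_delay_s(self) -> float:
        return 10.0
```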

static get_bin_prefix(name, path: Optional[str] = None)

search for a binary in standard paths. $PATH may not be set, but this is only used for basic UNIX tools (tar/cat/split).
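A minimal lookup with the same intent can rely on `shutil.which`, falling back to a fixed list of directories when `$PATH` is unset. `find_binary` and `STANDARD_DIRS` are hypothetical; the actual search order of `get_bin_prefix` may differ.

```python
import os
import shutil
from typing import Optional

# Assumed list of standard locations, used only when $PATH is unset.
STANDARD_DIRS = ("/usr/local/bin", "/usr/bin", "/bin", "/usr/local/sbin", "/usr/sbin", "/sbin")


def find_binary(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the path of an executable named `name`, or None if it is not found."""
    search = path or os.environ.get("PATH") or os.pathsep.join(STANDARD_DIRS)
    return shutil.which(name, path=search)
```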

hairgap.utils.ensure_dir(path, parent=True)

Ensure that the given directory exists

Parameters:
  • path – the path to check
  • parent – only ensure the existence of the parent directory
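The described behaviour maps onto `os.makedirs` with `exist_ok=True`. `ensure_dir_sketch` is a hypothetical re-implementation, assuming that `parent=True` means the last path component is a file name that should not itself be created.

```python
import os


def ensure_dir_sketch(path: str, parent: bool = True) -> None:
    """Create path (or only its parent directory when parent is True) if missing."""
    target = os.path.dirname(path) if parent else path
    if target:  # dirname("file.txt") is "": nothing to create
        os.makedirs(target, exist_ok=True)
```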