summaryrefslogtreecommitdiff
path: root/src/lib/frct.c
Commit message (Collapse)AuthorAgeFilesLines
* lib: Set a timeout on FRCT control packetsDimitri Staessens41 hours1-15/+122
| | | | | | | | | | | | | Time out frct_tx for control packets at 250us so a full tx ring cannot stall the timer wheel (and with it KA, TLP, RXM fires). DATA frames (fresh, RXM, TLP, FIN) keep blocking - dropping them would lose recovery progress. Add inact_drop, drf_rebase, rq_released, tlp_snd, sdu_snd_alloc, sdu_snd_tx, sdu_sole, rxm_tx_dead, and per-type tx_drop counters. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add tail loss probe (TLP) to FRCPDimitri Staessens41 hours1-21/+162
| | | | | | | | | | | | | | | | | | | | | | | | The previous bugfixes that fixed a false-positive inactivity timer check unmasked the need for the tail loss probe (TLP) to quickly recover from loss of the HoL. When the sender finishes its app data with packets still inflight, RACK fast-rxm cannot fire (no ACK arrives), pre-DRF NACK is gated on rcv_cr.inact, and rto_mul climbs into seconds before the next RXM fire — recv-side deadline expires first. This adds the RFC 8985 §7.2 TLP: PTO = 2*SRTT + max_ack_delay (t_a), capped by the HoL slot's current RTO deadline. Probe fires from frcti_snd and after frcti_ack_rcv; per §7.3 at most one TLP per recovery episode, tracked via tlp_high_seq. On fire, re-emit HoL via fast_rxm_send and hand off to RTO — no PTO self re-arm. A new SND_TLP slot flag is Karn-skipped for RTT sampling, cleared by RTO retransmit or NACK-driven fast-rxm, and clears rto_mul on its TLP-ACK (rto_mul carries no CC meaning in FRCP). Also caps MAX_RTO_MUL at 8 instead of 20. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix underrun in activity timerDimitri Staessens41 hours1-22/+50
| | | | | | | | | | | The inactivity timers are updated under rdlock with atomics on the time stamp. Inactivity is measured by comparison between the act stamp against the thread's own current time (now_ns). A different thread could already have updated the act stamp, causing undderun in the signed comparison and causing a false-positive on the inactive test. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Skip DRF rebase on same-epoch retransmitDimitri Staessens4 days1-2/+24
| | | | | | | | | | | | | | | | | | | | On some stalls the receiver can go inactive and NACK-driven HoL retransmits and brief snd-window drains can deliver a DRF-flagged packet that's still part of the current receive epoch (rxm_pkt_prepare preserves the original flags). This treats in-window DRF and RXM-flagged DRF arrivals as same-epoch once the window is seeded and skip the release_rq + lwe/rwe rebase that would otherwise wipe valid OOO fragments from rq[]. Bootstrap (lwe == rwe) still rebases unconditionally so a NACK-driven retransmit of a lost initial DRF can seed the receiver. Harden seqno_rotate to redraw the ISN until it falls outside the peer's current rcv window, closing the ~RQ_SIZE / 2^32 collision where a true new epoch would be misclassified as same-epoch. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Update FRCP implementationDimitri Staessens4 days1-500/+3339
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Flow and Retransmission Control Protocol (FRCP) runs end-to-end between two peers over a flow. It provides reliability, in-order delivery, flow control, and liveness. Note that congestion avoidance is orthogonal to FRCP and handled in the IPCP. A fixed 16-octet header, network byte order, is prefixed to every FRCP packet: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | flags | hcs | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | window | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | seqno | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ackno | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | payload (variable) ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ hcs is a CRC-16-CCITT-FALSE checksum over the PCI (and the stream extension when present), verified before any flag-driven dispatch. A single packet can simultaneously carry DATA + ACK + FC + RXM by OR-ing flag bits. An optional CRC trailer covers the body on DATA when qs.ber == 0, and on every SACK packet; an optional AEAD wrap (per-flow keys) sits outermost. Flag bits (MSB-first; bits 13..15 reserved, MUST be zero): +------+--------+--------+----------------------------------------+ | Bit | Mask | Name | Meaning | +------+--------+--------+----------------------------------------+ | 0 | 0x8000 | DATA | Carries caller payload | | 1 | 0x4000 | DRF | Start of a fresh data run | | 2 | 0x2000 | ACK | ackno field valid | | 3 | 0x1000 | NACK | Pre-DRF nudge (seqno informational) | | 4 | 0x0800 | FC | window field valid (rwe advertisement) | | 5 | 0x0400 | RDVS | Rendezvous probe (window-closed) | | 6 | 0x0200 | FFGM | First Fragment of a multi-fragment SDU | | 7 | 0x0100 | LFGM | Last Fragment of a multi-fragment SDU | | 8 | 0x0080 | RXM | Retransmission | | 9 | 0x0040 | SACK | Block list follows in payload | | 10 | 0x0020 | RTTP | RTT probe / echo (payload follows) | | 11 | 0x0010 | KA | Keepalive | | 12 | 0x0008 | FIN | End of stream marker | | 13-15| -- | -- | Reserved (MUST be zero) | +------+--------+--------+----------------------------------------+ (FFGM, LFGM) encodes the fragment role of a DATA packet (SCTP-style B/E): 11=SOLE, 10=FIRST, 00=MID, 01=LAST. Each fragment carries its own seqno; Retransmission recovers fragments individually, reassembly runs at consume time. In stream mode FFGM/LFGM are unused; per-byte position is carried by the stream extension below and end-of-stream is signalled by FIN on a 0-byte DATA packet. SACK payload (FRCT_ACK | FRCT_FC | FRCT_SACK): 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | n_blocks | padding (2 octets) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | start[0] | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | end[0] | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... n_blocks pairs total ... Each block describes a *present* (received) range strictly above the cumulative ACK in the PCI ackno. D-SACK (RFC 2883) is signalled in-band as block[0] - no flag bit, no extra framing - and consumed by the RACK reo_wnd_mult scaler (RFC 8985 sec. 7.2). RTTP payload (FRCT_RTTP only; 24 octets): 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | probe_id | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | echo_id | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | + nonce (16 octets, echoed verbatim) + | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Stream PCI extension (in_order == STREAM only; 8 octets after the base PCI on every DATA packet): 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | start | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | end | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ start, end are monotonic 32-bit byte offsets; end - start equals the on-wire payload length. Stream mode is negotiated at flow allocation; the extension is present iff stream mode is in use, never on a per-packet basis. Service modes are an orthogonal (in_order, loss, ber) vector selected at flow_alloc; the cubes above map to the axes: +----------------+---------+------+-----+-----------------------+ | Cube | in_order| loss | ber | Engaged | +----------------+---------+------+-----+-----------------------+ | qos_raw | 0 | 1 | 1 | Raw passthrough | | qos_raw_safe | 0 | 1 | 0 | Raw + CRC trailer | | qos_rt | 1 | 1 | 1 | FRCP, no FRTX, no CRC | | qos_rt_safe | 1 | 1 | 0 | FRCP, no FRTX, CRC | | qos_msg | 1 | 0 | 0 | FRCP + FRTX | | qos_stream | 2 | 0 | 0 | FRCP + FRTX, stream | +----------------+---------+------+-----+-----------------------+ in_order=0 sends raw datagrams with no PCI (UDP-equivalent); in_order=1 engages FRCP with SDU framing; in_order=2 (stream) requires loss=0 and is rejected otherwise. loss=0 engages the FRTX retransmit machinery. ber=0 appends the CRC-32 trailer; QOS_DISABLE_CRC at build time forces ber=1 for development. Encryption is a separate per-flow attribute layered as an AEAD wrap outside the FRCP packet. Heritage: delta-t (Watson 1981) supplies timer-based connection management - no SYN/FIN handshake, the DRF marker, the t_mpl / t_a / t_r timers. RINA (Day 2008) supplies the unified flow_alloc(name, qos, ...) primitive and the orthogonal QoS-cube axes. Loss detection follows TCP/QUIC practice (RFCs 2018, 2883, 6582, 6298, 8985); RTT probing is nonce-authenticated like QUIC PATH_CHALLENGE. Adds oftp, a minimal file-transfer tool over an FRCP stream flow. The client reads from stdin or --in FILE and writes through a flow_alloc(qos_stream); the server (--listen) calls flow_accept and writes to stdout or --out FILE. Both sides compute a CRC-64/NVMe over the bytes they handle and print the result. The server rejects flows whose negotiated qs.in_order != STREAM. Two FRCP knobs are exposed via env vars on either side: OFTP_FRCT_RTO_MIN fccntl FRCTSRTOMIN (ns) OFTP_FRCT_STREAM_RING_SZ fccntl FRCTSRRINGSZ (octets) The ocbr_client gains an OCBR_QOS env var to pick the cube the client uses for flow_alloc; recognised values are raw, safe, rt, rt_safe, msg, stream. Unknown values fall back to raw with a warning on stderr. Without the env set behaviour is unchanged. Removes the deprecated lib/timerwheel.c Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Use push/pop for ssm_pk_buff opsDimitri Staessens4 days1-2/+2
| | | | | | | | | Renames the allocation for head/tail to push/pop instead of alloc/release as it's simpler and shorter. Took this approach insted of adopting the kernel's push/pull/put/trim. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Rename ssm_pk_buff_get_idx to ssm_pk_buff_get_offDimitri Staessens2026-05-061-1/+1
| | | | | | | | | | The shared memory pool is now offset based instead of block index-based like the old shm_rdrbuff allocator. This renames the API more consistently. Also changes variables names to off instead of idx for consistency. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* build: Update copyright to 2026Dimitri Staessens2026-02-181-1/+1
| | | | | Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add per-user packet poolsDimitri Staessens2026-02-131-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IRMd will now check the user UID and GID for privileged access, avoiding unprivileged users being able to disrupt all IPC (e.g. by shm_open the single pool and corrupting its metadata). Non-privileged users are now limited to a PUP (per-user pool) for sending/receiving packets. It is still created by the IRMd, but owned by the user (uid) with 600 permissions. It does not add additional copies for local IPC between their own processes (i.e. over the local IPCP), but packets between processes owned by a different user or destined over the network (other IPCPs) will incur a copy when crossing the PUP / PUP or the PUP / GSPP boundary. Privileged users and users in the ouroboros group still have direct access to the GSPP (globally shared private pool) for packet transfer that will avoid additional copies when processing packets between processes owned by different users and to the network. This aligns the security model with UNIX trust domains defined by UID and GID by leveraging file permission on the pools in shared memory. ┌─────────────────────────────────────────────────────────────┐ │ Source Pool │ Dest Pool │ Operation │ Copies │ ├─────────────────────────────────────────────────────────────┤ │ GSPP │ GSPP │ Zero-copy │ 0 │ │ PUP.uid │ PUP.uid │ Zero-copy │ 0 │ │ PUP.uid1 │ PUP.uid2 │ memcpy() │ 1 │ │ PUP.uid │ GSPP │ memcpy() │ 1 │ │ GSPP │ PUP.uid │ memcpy() │ 1 │ └─────────────────────────────────────────────────────────────┘ This also renames the struct ai ("application instance") in dev.c to struct proc (process). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Replace rdrbuff with a proper slab allocatorDimitri Staessens2026-01-261-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a first step towards the Secure Shared Memory (SSM) infrastructure for Ouroboros, which will allow proper resource separation for non-privileged processes. This replaces the rdrbuff (random-deletion ring buffer) PoC allocator with a sharded slab allocator for the packet buffer pool to avoid the head-of-line blocking behaviour of the rdrb and reduce lock contention in multi-process scenarios. Each size class contains multiple independent shards, allowing parallel allocations without blocking. - Configurable shard count per size class (default: 4, set via SSM_POOL_SHARDS in CMake). The configured number of blocks are spread over the number of shards. As an example: SSM_POOL_512_BLOCKS = 768 blocks total These 768 blocks are shared among 4 shards (not 768 × 4 = 3072 blocks) - Lazy block distribution: all blocks initially reside in shard 0 and naturally migrate to process-local shards upon first allocation and subsequent free operations - Fallback with work stealing: processes attempt allocation from their local shard (pid % SSM_POOL_SHARDS) first, then steal from other shards if local is exhausted, eliminating fragmentation while maintaining low contention - Round-robin condvar signaling: blocking allocations cycle through all shard condition variables to ensure fairness - Blocks freed to allocator's shard: uses allocator_pid to determine target shard, enabling natural load balancing as process allocation patterns stabilize over time Maintains existing robust mutex semantics including EOWNERDEAD handling for dead process recovery. Internal structures exposed in ssm.h for testing purposes. Adds some tests (pool_test, pool_sharding_test.c. etc) verifying lazy distribution, migration, fallback stealing, and multiprocess behavior. Updates the ring buffer (rbuff) to use relaxed/acquire/release ordering on atomic indices. The ring buffer requires the (robust) mutex to ensure cross-structure synchronization between pool buffer writes and ring buffer index publication. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* ipcpd: Update DHT for unicast layerDimitri Staessens2025-08-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a rewrite of the DHT for name-to-address resolution in the unicast layer. It is now integrated as a proper directory policy. The dir_wait_running function is removed, instead the a DHT peer is passed on during IPCP enrolment. Each DHT request/response gets a random 64-bit ID ('cookie'). DHT messages to the same peer are deduped, except in the case when the DHT is low on contacts. In that case, it will contact the per it received at enrolment for more contacts. To combat packet loss, these messages are not deduped by means of a 'magic cookie', chosen at random when the DHT starts. The DHT parameters (Kademlia) can be set using the configfile or the IRM command line tools: if DIRECTORY_POLICY == DHT [dht_alpha <search factor> (default: 3)] [dht_k <replication factor> (default: 8)] [dht_t_expire <expiration (s)> (default: 86400)] [dht_t_refresh <contact refresh (s)> (default: 900)] [dht_t_replicate <replication (s)> (default: 900)] This commit also adds support for a protocol debug level (PP). Protocol debugging for the DHT can be enabled using the DEBUG_PROTO_DHT build flag. The DHT has the following message types: DHT_STORE, sent to k peers. Not acknowledged. DHT_STORE --> [2861814146dbf9b5|ed:d9:e2:c4]. key: bcc236ab6ec69e65 [32 bytes] val: 00000000c4e2d9ed [8 bytes] exp: 2025-08-03 17:29:44 (UTC). DHT_FIND_NODE_REQ, sent to 'alpha' peers, with a corresponding response. This is used to update the peer routing table to iteratively look for the nodes with IDs closest to the requested key. DHT_FIND_NODE_REQ --> [a62f92abffb451c4|ed:d9:e2:c4]. cookie: 2d4b7acef8308210 key: a62f92abffb451c4 [32 bytes] DHT_FIND_NODE_RSP <-- [2861814146dbf9b5|ed:d9:e2:c4]. cookie: 2d4b7acef8308210 key: a62f92abffb451c4 [32 bytes] contacts: [1] [a62f92abffb451c4|9f:0d:c1:fb] DHT_FIND_VALUE_REQ, sent to 'k' peers, with a corresponding response. Used to find a value for a key. Will also send its closest known peers in the response. DHT_FIND_VALUE_REQ --> [2861814146dbf9b5|ed:d9:e2:c4]. cookie: 80a1adcb09a2ff0a key: 42dee3b0415b4f69 [32 bytes] DHT_FIND_VALUE_RSP <-- [2861814146dbf9b5|ed:d9:e2:c4]. cookie: 80a1adcb09a2ff0a key: 42dee3b0415b4f69 [32 bytes] values: [1] 00000000c4e2d9ed [8 bytes] contacts: [1] [a62f92abffb451c4|9f:0d:c1:fb] Also removes ubuntu 20 from appveyor config as it is not supported anymore. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* irmd: Initial Flow Allocation Protocol HeaderDimitri Staessens2025-07-231-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the initial version for the flow allocation protocol header between IRMd instances. This is a step towards flow authentication. The header supports secure and authenticated flow allocation, supporting certificate-based authentication and ephemeral key exchange for end-to-end encryption. id: 128-bit identifier for the entity. timestamp: 64-bit timestamp (replay protection). certificate: Certificate for authentication. public key: ECDHE public key for key exchange. data: Application data. signature: Signature for integrity/authenticity. Authentication and encryption require OpenSSL to be installed. The IRMd compares the allocation request delay with the MPL of the Layer over which the flow allocation was sent. MPL is now reported by the Layer in ms instead of seconds. Time functions revised for consistency and adds some tests. The TPM can now print thread running times in Debug builds (TPM_DEBUG_REPORT_INTERVAL) and abort processes with hung threads (TPM_DEBUG_ABORT_TIMEOUT). Long running threads waiting for input should call tpm_wait_work() to avoid trigger a process abort. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add authentication functionsDimitri Staessens2025-07-041-1/+1
| | | | | | | | | | Adds functions needed for authentication using X509 certificates, implemented using OpenSSL. Refactors some library internals, and adds some unit tests for them. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Revise app flow allocationDimitri Staessens2024-02-231-2/+2
| | | | | | | | | | | | | This revises the application flow allocator to use the flow_info struct/message between the components. Revises the messaging to move the use protocol buffers to its own source (serdes-irm). Adds a timeout to the IRMd flow allocator to make sure flow allocations don't hang forever (this was previously taken care of by the sanitize thread). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* build: Update licenses to 2024Dimitri Staessens2024-01-131-1/+1
| | | | | | | Slow but steady. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Wrap pthread_cond_timedwait for NULL abstimeDimitri Staessens2023-10-251-6/+1
| | | | | | | | | We often have the pattern where we NULL-check abstime for pthread_cond_timedwait to call pthread_cond_wait if it is. Added a __timedwait function to wrap this. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Make crypt.c independent source fileDimitri Staessens2023-10-251-1/+1
| | | | | | | | | | | The cryptography functions were in a C source that was directly imported into dev.c, enabling ECDHE+AES256 symmetric key encryption on flows. Now crypt.c is an independent source file with associated crypt.h header, to prepare for security management and configuration in the IRMd. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* include: Revise printing hashesDimitri Staessens2023-08-231-0/+2
| | | | | | | | | | | The code was a bit convoluted to print hashes as hex strings. Renamed to HASH_FMT32 and HASH_VAL32 to make clear we are printing the first 32 bits only, and added options to print 64 up to 512 bits as well. This doesn't depend on endianness anymore. Adds a small test for the hash (printing) functions. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix signed/unsigned mismatches on raspbianDimitri Staessens2023-08-231-2/+2
| | | | | | | | Compilation on raspberry pi revealed some previously undetected signed/unsigned comparisons in the library. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* build: Update copyright to 2023Dimitri Staessens2023-02-131-1/+1
| | | | | | | 2022 was a rather slow year... Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Rename timerwheel_ack timerwheel_delayed_ack0.19.2Dimitri Staessens2022-04-131-4/+2
| | | | | | | | | This makes it clear that we are scheduling a potential delayed acknowledgment instead of acknowledging a packet scheduled for retransmission. Also some small cosmetic fixes. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* build: Update copyright to 2022Dimitri Staessens2022-04-031-1/+1
| | | | | | | Growing pains. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix delayed ACK under high loadDimitri Staessens2022-04-031-3/+9
| | | | | | | | | | The delayed ACK was wrongly measuring the delay against the receiver activity instead of the sender activity. Also fixed receiver activity not being updated for non-data packets (and duplicates and other dropped traffic). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add support for Linux RTT estimatorDimitri Staessens2022-04-031-2/+7
| | | | | | | | | | | | | This adds the option to use the Round-Trip-Time (RTT) estimation algorithm as it is implemented in the TCP implementation in Linux. It looks like it outperforms the TCP default algorithm, so I enabled this one by default. Also adds the option to change the RTO timeout calculation to include more (or less) than 4 times the mdev (specified as a power of 2. Left the default value to 2 (so, 4 mdevs), but 3 (8 mdevs) gives better results in my tests. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix timing of delayed ACKsDimitri Staessens2022-04-011-5/+25
| | | | | | | | | | Delayed ACKs are now sent after twice the internal tick time. Fixes initial ACK record (rcv_cr.seqno) being uninitialized (0) when the first ACK was to be sent. Adds some FRCT metrics for number of received delayed (bare) ACKs and the RTT estimator. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix unidirectional FRCT traffic handlingDimitri Staessens2022-03-301-7/+9
| | | | | | | | | | | | | Unidirectional traffic has one of the peers only send bare FRCT packets. These never set a DRF, since they have no sequence number. At the receiver, all these ACKs and window updates were always dropped as the receiver connection record was timed out. Also fixes a SEGV if flow control kicks in (passing NULL timeout to pthread_cond_timedwait). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix filtering encrypted packetsDimitri Staessens2022-03-301-52/+0
| | | | | | | | | | | | The frcti_filter was reading raw data from the buffers, causing the frcti_rcv to operate directly on encrypted packets. It decrypt and filter for invalid packets. I moved the function from frct to the fqueue implementation and renamed it fqueue_filter as it filters fqueues. Should be extended to filter out keepalives on non-FRCT flows, as these will now still cause spurious wakeups. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Move timerwheel processing to its own threadDimitri Staessens2022-03-301-8/+0
| | | | | | | | | | | | | | | This is the first step moving away from scheduling the FRCT and flow monitoring functions as part of the IPC calls (flow_read / flow_write / fevent) and towards the more scalable (and far less complicated) implementation to take care of these functions in separate threads. If a process creates the first flow that requires FRCT, it will spin up a thread to process events on the timerwheel (retransmissions and delayed ACKs). This single thread lives until the last flow with FRCT is deallocated. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Refactor reading packet from rbuffDimitri Staessens2022-03-301-1/+1
| | | | | | | | | | | | Reading packets from the rbuff and checking their validity (non-zero size, pass crc check, pass decryption) is now extracted into a function. Also adds a function to get the length of an sdu_du_buff instead of subtracting the tail and head pointers. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Non-configurable delayed acks in FRCPDimitri Staessens2022-03-301-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It doesn't really make sense to manually and one-sidedly configure the timeout of delayed acknowledgements, as setting it too high upsets the peer's sRTT estimates. Even worse, it also causes a lot of spurious retransmissions if it exceeds the sRTT mean deviation calculated by the receiver. Compensating on bare acknowledgment for the ack delay could improve the RTT estimate deviation, but not the spurious retransmissions if it was set too high. This sets the delayed ack to wait for a single RTT mean deviation. Probably needs more tweaking to further reduce differences between the RTT estimates at the sender and receiver, e.g. compensate the RTT estimate for delayed acks, or increase the RTO to add 8 mdevs to sRTT instead of 4. However, it looks like the mdev estimate is the trickiest one to get to sync, not the RTT average. Linux reduces the sample weight for mdev from 1/4 to 1/32 in some cases, will give that a shot some day too to see if that further align sRTT estimates. In any case, this patch already improves things a lot. Also fixes a bug where the sender was sending acknowlegments on the first packets in flight for the 0 sequence number. The receiver activity was measured in seconds but compared to a timeout value in nanoseconds. There's still a lot of spurious retransmissions that start after actual packet loss occurs, I'm still investigating what causes it. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Expose flow control metrics to RIB0.19.1Dimitri Staessens2022-03-161-12/+46
| | | | | | | | | | This exposes some additional metrics relating to FRCT / Flow control: the number of duplicate packets received, number of packets received out of the flow control window and / or reordering queue, and the number of rendez-vous messages sent. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix buffer allocation when retransmitting0.19.0Dimitri Staessens2022-03-111-5/+15
| | | | | | | | | | | | | | | The timerwheel was retransmitting packets and the error check for negative values of the rbuff allocation was instead checking for non-zero values, causing a buffer allocation to succeed but the program to continue down the unhappy path leaving that packet stuck in the buffer unattended. Also fixes wrongly scheduled retransmissions that cause packet storms. FRCP is much more stable now. Still needs some work for high bandwidth-delay products (fast-retransmit). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Pass Delta-t params to frcti_create()Dimitri Staessens2022-03-081-7/+7
| | | | | | | | | The parameters were set directly from the build configs. A first step to making FRCP configurable at runtime, is to pass the parameters to the frcti_create() function. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix RTT estimator invocation in FRCTDimitri Staessens2022-03-031-1/+1
| | | | | | | The notorious off-by-one hit again. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix lock reversal in timerwheelDimitri Staessens2022-03-031-5/+0
| | | | | | | | | There was a lock reversal in the timerwheel. There still is a thorough revision needed of the locking in dev.c after the FRCP logic is completed and tuned. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Encrypt bare FRCP messages on encrypted flowsDimitri Staessens2022-03-031-28/+19
| | | | | | | | Bare FRCP messages (ACKs without data, Rendez-vous packets) were not encrypted on encrypted flows, causing the receiver to fail decryption. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add initial flow liveness monitoringDimitri Staessens2022-02-241-0/+1
| | | | | | | | | | | | | | | | | | This adds flow liveness monitoring for flows, with a fixed timeout of 120s. I will make it configurable at flow allocation later on (timeout needs to be communicated to the peer). If one peer dies, or doesn't call any IPC calls (flow_write/flow_read/fevent) it will stop sending keepalives and the other peer's read/writes will error on an -EFLOWDOWN after the timeout expires. Packets without a payload (0 length packets) are interpreted as keepalive packets for the flow. They can be sent from any application, but they will not trigger a message read at the receiver side (0 as a return value on flow_read indicates a previous partial read has completed at exactly the buffer size). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* ipcpd: Fix free in fail path of readdirDimitri Staessens2022-02-171-0/+9
| | | | | | | | | The free of the buffer in the failure path of the readdir RIB functions was taking the wrong pointer in a couple of places. The FRCT RIB readdir was missing error handling for malloc and strdup. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add missing rwlock unlock in FRCTDimitri Staessens2021-12-221-2/+4
| | | | | | | There was a missing unlock in FRCT. Also fixes some indentation. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Fix flow dealloc after expired FRCT timeoutDimitri Staessens2021-12-221-0/+1
| | | | | | | | | | If the timeout is already expired, the wait variable would be negative and return a negative value for the __frcti_dealloc function, thinking that the timeout was not expired causing an unnecessary wait even if all packets are acknowledged. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Set initial sender rwe to sender seqnoDimitri Staessens2021-12-221-1/+1
| | | | | | | | | | | The initial sender right window edge (indicating acknowledged packet sequence number) was initialized to seqno - 1. This should be the same as seqno, since we acknowledge with the next expected sequence number. It also indicates that a flow without traffic has no outstanding acknowledgements. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Application RIB with FRCT statisticsDimitri Staessens2021-06-301-12/+129
| | | | | | | | | | Application flows can now be monitored from the RIB, exposing FRCT statistics (window edges, retransmission timeout, rtt estimate, etc). Application RIB requires user permissions to be able to access /dev/fuse. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib, ipcpd, irmd: Wrap pthread unlocks for cleanupDimitri Staessens2021-06-231-2/+1
| | | | | | | | | | | | This add an ouroboros/pthread.h header that wraps the pthread_..._unlock() functions for cleanup using pthread_cleanup_push() as this casting is not safe (and there were definitely bad casts in the code). The close() function is now also wrapped for cleanup in ouroboros/sockets.h. This allows enabling more compiler checks. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* build: Update email addressesDimitri Staessens2021-01-031-2/+2
| | | | | | | | The ugent email addresses are shut down, updated to Ouroboros mail addresses. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* build: Update copyright to 2021Dimitri Staessens2021-01-031-1/+1
| | | | | | | Happy New Year, Ouroboros! Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add Rendez-Vous mechanism for flow controlDimitri Staessens2020-10-111-28/+128
| | | | | | | | | | | | | This adds the rendez-vous mechanism to handle the case where the sending window is closed and window updates get lost. If the sending window is closed, the sender side will send an RDVS every DELT_RDV time (100ms), and give up after MAX_RDV time (1 second). Upon reception of a RDVS packet, a window update is sent immediately. We can make this much more configurable later on (build options for defaults, fccntl for runtime tuning). Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Block on closed flow control windowDimitri Staessens2020-10-111-5/+104
| | | | | | | | | | | If the sending window for flow control is closed, the sending application will now block until the window opens. Beware that until the rendez-vous mechanism is implemented, shutting down a server while the client is sending (with non-timed-out blocking write) will cause the client to hang indefinitely because its window will close. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Send and receive window updatesDimitri Staessens2020-10-111-9/+35
| | | | | | | | | This adds sending and receiving window updates for flow control. I used the 8 pad bits as part of the window update field, so it's 24 bits, allowing for ~16 million packets in flight. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Add compiler configuration for FRCPDimitri Staessens2020-10-111-16/+16
| | | | | | | | | | | | | This allows configuring some parameters for FRCP at compile time, such as default values for Delta-t and configuration of the timerwheel. The timerwheel will now reschedule when it fails to create a packet, instead of setting the flow down immediately. Some new things added are options to store packets for retransmission on the heap, and using non-blocking calls for retransmission. The defaults do not change the current behaviour. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>
* lib: Complete retransmission logicDimitri Staessens2020-09-251-97/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | This completes the retransmission (automated repeat-request, ARQ) logic, sending (delayed) ACK messages when needed. On deallocation, flows will ACK try to retransmit any remaining unacknowledged messages (unless the FRCTFLINGER flag is turned off; this is on by default). Applications can safely shut down as soon as everything is ACK'd (i.e. the current Delta-t run is done). The activity timeout is now passed to the IPCP for it to sleep before completing deallocation (and releasing the flow_id). That should be moved to the IRMd in due time. The timerwheel is revised to be multi-level to reduce memory consumption. The resolution bumps by a factor of 1 << RXMQ_BUMP (16) and each level has RXMQ_SLOTS (1 << 8) slots. The lowest level has a resolution of (1 << RXMQ_RES) (20) ns, which is roughly a millisecond. Currently, 3 levels are defined, so the largest delay we can schedule at each level is: Level 0: 256ms Level 1: 4s Level 2: about a minute. Signed-off-by: Dimitri Staessens <dimitri@ouroboros.rocks> Signed-off-by: Sander Vrijders <sander@ouroboros.rocks>