On Wed, 17 Jul 2019, Paolo Abeni wrote:
I spent some time reviewing the current code-base looking for missing
pieces required for multiple subflow support, and I think there are a
bunch of things that would be needed no matter what.
I'd like to share the list, with some random implementation details, to
possibly avoid duplicate effort and/or large misunderstanding on my
side. Here it goes:
* Update msk data_ack:
- hook in mptcp_attach_dss() - possibly rename the function to
That's a good place for a hook. Peter's current MP_JOIN patch set has a
- cope with 32bit ack
The current prototype cheats by ignoring the incoming acks. With the
single-subflow assumption, the MPTCP-level socket isn't yet keeping a copy
of the transmitted data and there's no action to take with incoming acks.
The peer may send 32-bit acks so we do need to handle both ack lengths
when we add ack handling.
- possibly some additional msk-level [spin-]lock would be required
protect from concurrent updates when multiple sub-flows are running
Possibly. cmpxchg (with appropriate barriers) might work well for updating
the ack since it keeps moving forward. If there was a concurrent update it
would get detected and it would be clear whether the update should be
attempted again. The mptcp-level transmit buffer could be trimmed later
when the mptcp socket can be properly locked.
- while at this, eventually re-consider (again!) conn_list
* Store in msk pending data, to allow retransmissions
- in sendmsg(), queue pending (unacked) xmit msg on msk rtx_queue
- does it need to be a rb tree?
- sort by mptcp seq
- cope with 32bit seq
I don't think we need an rb tree. Most of the usage will be appending new
data and trimming the head of the list as acks arrive. If we need to
retransmit, we start from the first non-acked byte, which is the head of
the list. Since we don't have SACK functionality, it doesn't seem like we
will attempt to retransmit from the middle of the unacked data unless
there's a subflow failure we're trying to recover from - and that's a rare
enough event we can just go through the pending data linearly. Maybe the
kernel has some lessons for us here :)
- allocate the node element from the msk page frag's allocator
- each node must contains: all the data frag, mptcp seq,
Do we need much beyond a list of page frags, the mptcp seq for the
beginning byte, and the length?
If we get fancy about handling retransmissions we could keep track of
which subflow the data was already sent on, but it looks like we don't
need that to get basic functionality up and running.
- NO need to check vs mptcp window before tcp_push()
unless we want to update the window size as per RFC
(max of subflow's window size)
because msk ws is greater equal than subflows ws
- free acked nodes (and data) still in sendmsg(),
just after acquiring msk lock, before doing any real
sendmsg related work
We can also check on the recvmsg side and in a workqueue (maybe shared
with the retransmit check?) so memory gets freed up even when the socket
isn't sending all the time.
* Implement msk-level retransmission
- add msk-level [hr-]timer
- schedule it in sendmsg, after tcp_push()
- e.g. if msk ack != msk seq
- which timeout? subflow's rtt based ?
The RFC mentions "a few subflow-level retransmission timeouts" as a
conservative policy (section 3.3.6).
- when the timer expires, trigger retransmit if msk ack is not
changed (increased) since the timer's scheduling
- reuse/factor-out part of current sendmsg_frag() retransmit
a bunch of already-in-kernel page frags.
That's what I've been thinking. Whether a first-time transmission or a
retransmission, we'll take the data from a list of page frags and
I hope the all above can be quite independent from e.g. path management
Yes, it should be.
If there is agreement on that, my plan would be to work on the listed
items, in order. As usual, any feedback more than welcome!
Ok. I've been planning to work on these tasks too, but am not ready to
start on them today. You should dive in. After I get a couple of other
tasks out of the way I can coordinate with you.