Preparation of patches for netdev: status
by Matthieu Baerts
Hello,
It took me more time than expected but I applied all the patches that
needed to be applied, the list is now empty:
https://patchwork.ozlabs.org/project/mptcp/list/?state=8
There are some new patches that needs review:
https://patchwork.ozlabs.org/project/mptcp/list/
But in summary, here are the items that still need to be done:
1) Rebase of Florian's patches added at the end of the series (at
least the ones impacting code before MPTCPv1 patches)
2) Remove export of __tcp_poll()
3) Review fixes related to TCP fallback from Paolo (RFC)
4) Review the update of the MAINTAINERS file / apply comments (?)
5) Check if there is no issue with MPTCPv1 patches (+ fixes if needed)
6) Review Mat's cover letters (part 1 & 2)
7) New cover letter for part 3
8) Review all commit messages (part 1, 2, 3)
9) Add signed-off of the submitters
10) Tests
*Florian* : is it still OK for you to do the rebase, items 1 & 2? :)
no urgency because we no longer target Monday but the end
of the week.
*Christoph* : may you look at the item 7 please? :)
I can do the item 9.
For the rest, they can be done by anybody, sometimes even better if
there are multiple people.
I hope I listed everything! Please tell me if something is missing or
not correct!
Cheers,
Matt
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
2 years, 6 months
[PATCH 0/9] mptcp: support mptcp joins
by Florian Westphal
This set aims to improve handling of multiple subflows.
There are a few remaining problems, see below, but I wanted to send what
I have now rather than keep this back.
The first two patches are new, following few patches are a resend of the
earlier RFC series, I think they're good for acceptance now.
Patch 6 had a few changes since the RFC based on Paolos feedback, please
see patch description for details.
The last two patches are there to allow cancellation/teardown of
not-yet completed join requests. The connecting subflow flows
are placed on a 'join_list' so mptcp_shutdown can close these sockets
as well.
Last patch moves the 'eof hack' to a work queue so we don't set EOF on the
mptcp socket until all subflows have closed. Maybe this isn't needed, I
would be fine with dropping it in favor of mptcp-level DATA_FIN and
reset handling.
The rest of this work is to have msk->conn_list contain tcp_sock structs
without an outer 'struct socket'.
The problem is that for incoming join requests (which are not subject
to accept/request socket queue), there is no 'struct socket', so the kernel
will crash. Even when fixing those up, a NULL sk->sk_socket is problematic
as tcp stack will assume that such tcp sk is already detached from userspace,
i.e. NOSPACE wakeups do not work for them.
The solution here (suggested by Paolo) is to have sk->sk_socket refer
to the mptcp parent socket structure. This assignment is done by
a private function similar to sock_graft().
mptcp.org kernel will be able to establish multiple connections per
mptcp socket but does not consider those extra joins as established, as the
"4th ack" is missing.
A simple "tcp_send_ack" is enough, but there are two problems with this:
1. It can be lost
2. Given we are not expecting to receive any data on such flows this
doesn't seem to be a problem.
During testing I saw that sometimes mptcp.org kernel will emit further
join requests to mptcp-next but those trigger no response (and those
packets do not even show up in perfs network drop monitor, sigh).
I also saw a UAF at mptcp_close where a sock_hold() in tcp_close()
caused a 0->1 recount transition, but i've been unable to reproduce it so far
(might have been caused by a bug that has been fixed in the mean time).
The following changes since commit dd25ff04840135581b833762571aac08a2a94c6c:
sendmsg: truncate source buffer if mptcp sndbuf size was set from userspace (2019-11-30 02:47:59 +0000)
are available in the Git repository at:
git://git.breakpoint.cc/fw/mptcp-next.git mptcp_multijoin_11
for you to fetch changes up to 010c2e19c2d643a87c991bf1e7d4bfc140dfdba5:
don't flag parent socket as RCV_SHUTDOWN if one one subflow has closed (2019-11-30 23:21:25 +0100)
----------------------------------------------------------------
Florian Westphal (9):
token: handle join-before-accept-completion case
mptcp: fix race from mptcp_close/mptcp join
copy connection id from first subflow to mptcp socket
store tcp_sk, not socket
add mptcp_subflow_shutdown
make accept not allocate kernel socket struct
pass subflow socket to mptcp_finish_connect
subflow: place further subflows on new 'join_list'
don't flag parent socket as RCV_SHUTDOWN if one one subflow has closed
net/mptcp/pm.c | 18 +--
net/mptcp/protocol.c | 313 +++++++++++++++++++++++++++++++++++++--------------
net/mptcp/protocol.h | 16 +--
net/mptcp/subflow.c | 56 +++++----
net/mptcp/token.c | 9 +-
5 files changed, 292 insertions(+), 120 deletions(-)
2 years, 6 months
[PATCH] mptcp: Remove dss_flags field from struct mptcp_options_received
by Peter Krystad
This field is used only temporarily while parsing options,
use a local variable instead.
Formatted to apply directly to Implement MPTCP receive path.
squashto: Implement MPTCP receive path
Signed-off-by: Peter Krystad <peter.krystad(a)linux.intel.com>
---
include/linux/tcp.h | 1 -
net/mptcp/options.c | 12 ++++++------
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 184fc88a140b..a6d6438b27b1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -107,7 +107,6 @@ struct tcp_options_received {
mp_join : 1,
dss : 1,
version : 4;
- u8 dss_flags;
u8 use_map:1,
dsn64:1,
data_fin:1,
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index c5c770d2e711..e69d56bfabc4 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -99,12 +99,12 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
pr_debug("DSS");
ptr++;
- mp_opt->dss_flags = (*ptr++) & MPTCP_DSS_FLAG_MASK;
- mp_opt->data_fin = (mp_opt->dss_flags & MPTCP_DSS_DATA_FIN) != 0;
- mp_opt->dsn64 = (mp_opt->dss_flags & MPTCP_DSS_DSN64) != 0;
- mp_opt->use_map = (mp_opt->dss_flags & MPTCP_DSS_HAS_MAP) != 0;
- mp_opt->ack64 = (mp_opt->dss_flags & MPTCP_DSS_ACK64) != 0;
- mp_opt->use_ack = (mp_opt->dss_flags & MPTCP_DSS_HAS_ACK);
+ flags = (*ptr++) & MPTCP_DSS_FLAG_MASK;
+ mp_opt->data_fin = (flags & MPTCP_DSS_DATA_FIN) != 0;
+ mp_opt->dsn64 = (flags & MPTCP_DSS_DSN64) != 0;
+ mp_opt->use_map = (flags & MPTCP_DSS_HAS_MAP) != 0;
+ mp_opt->ack64 = (flags & MPTCP_DSS_ACK64) != 0;
+ mp_opt->use_ack = (flags & MPTCP_DSS_HAS_ACK);
pr_debug("data_fin=%d dsn64=%d use_map=%d ack64=%d use_ack=%d",
mp_opt->data_fin, mp_opt->dsn64,
--
2.17.2
2 years, 6 months
[PATCH] mptcp: Remove flags field from struct mptcp_options_received
by Peter Krystad
This field is only used temporarily while parsing options,
use a local variable instead.
Also move key fields to final position and fix indents.
Formatted to apply directly to Handle MPTCP TCP options.
squashto: Handle MPTCP TCP options
Signed-off-by: Peter Krystad <peter.krystad(a)linux.intel.com>
---
include/linux/tcp.h | 7 +++----
net/mptcp/options.c | 17 ++++++++---------
2 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index da2a281e2c98..7c8a6f877900 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -97,13 +97,12 @@ struct tcp_options_received {
u16 mss_clamp; /* Maximal mss, negotiated at connection setup */
#if IS_ENABLED(CONFIG_MPTCP)
struct mptcp_options_received {
- u8 mp_capable : 1,
+ u64 sndr_key;
+ u64 rcvr_key;
+ u8 mp_capable : 1,
mp_join : 1,
dss : 1,
version : 4;
- u8 flags;
- u64 sndr_key;
- u64 rcvr_key;
} mptcp;
#endif
};
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index adf65d3ff27b..a3dc787e7408 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -14,6 +14,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
{
struct mptcp_options_received *mp_opt = &opt_rx->mptcp;
u8 subtype = *ptr >> 4;
+ u8 flags;
switch (subtype) {
/* MPTCPOPT_MP_CAPABLE
@@ -31,9 +32,9 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
if (mp_opt->version != 0)
break;
- mp_opt->flags = *ptr++;
- if (!((mp_opt->flags & MPTCP_CAP_FLAG_MASK) == MPTCP_CAP_HMAC_SHA1) ||
- (mp_opt->flags & MPTCP_CAP_EXTENSIBILITY))
+ flags = *ptr++;
+ if (!((flags & MPTCP_CAP_FLAG_MASK) == MPTCP_CAP_HMAC_SHA1) ||
+ (flags & MPTCP_CAP_EXTENSIBILITY))
break;
/* RFC 6824, Section 3.1:
@@ -49,7 +50,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
*
* We don't implement DSS checksum - fall back to TCP.
*/
- if (mp_opt->flags & MPTCP_CAP_CHECKSUM_REQD)
+ if (flags & MPTCP_CAP_CHECKSUM_REQD)
break;
mp_opt->mp_capable = 1;
@@ -59,12 +60,10 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
if (opsize == TCPOLEN_MPTCP_MPC_SYNACK) {
mp_opt->rcvr_key = get_unaligned_be64(ptr);
ptr += 8;
- pr_debug("MP_CAPABLE flags=%x, sndr=%llu, rcvr=%llu",
- mp_opt->flags, mp_opt->sndr_key,
- mp_opt->rcvr_key);
+ pr_debug("MP_CAPABLE sndr=%llu, rcvr=%llu",
+ mp_opt->sndr_key, mp_opt->rcvr_key);
} else {
- pr_debug("MP_CAPABLE flags=%x, sndr=%llu",
- mp_opt->flags, mp_opt->sndr_key);
+ pr_debug("MP_CAPABLE sndr=%llu", mp_opt->sndr_key);
}
break;
--
2.17.2
2 years, 6 months
Fwd: MPTCP implementation feedback for RFC6824bis
by Christoph Paasch
FYI, in case emails aren’t going through yet
Begin forwarded message:
> From: Alan Ford <alan.ford(a)gmail.com>
> Date: December 6, 2019 at 7:58:31 AM PST
> To: Christoph Paasch <cpaasch(a)apple.com>
> Cc: MultiPath TCP - IETF WG <multipathtcp(a)ietf.org>, Yoshifumi Nishida <nsd.ietf(a)gmail.com>, Philip Eardley <philip.eardley(a)bt.com>, Mirja Kuehlewind <ietf(a)kuehlewind.net>, mptcp Upstreaming <mptcp(a)lists.01.org>
> Subject: Re: MPTCP implementation feedback for RFC6824bis
>
> Hi all,
>
> Following on from the discussion of implementation feedback with Christoph, I propose the following edits to RFC6824bis - which is currently in AUTH48 - as clarifications.
>
> ADs, please can you confirm you consider these edits sufficiently editorial to fit into AUTH48.
>
> WG participants, please speak up if you have any concerns.
>
>
> Edit 1, clarifying reliability of MP_CAPABLE
>
> Change the sentence reading:
>
> The SYN with MP_CAPABLE occupies the first octet of data sequence space, although this does not need to be acknowledged at the connection level until the first data is sent (see Section 3.3).
>
> To:
>
> The SYN with MP_CAPABLE occupies the first octet of data sequence space, and this MUST be acknowledged at the connection level at or before the time the first data is sent or received (see Section 3.3).
>
>
> Change the sentence reading:
>
> If B has data to send first, then the reliable delivery of the ACK + MP_CAPABLE can be inferred by the receipt of this data with an MPTCP Data Sequence Signal (DSS) option (Section 3.3).
>
> To:
>
> If B has data to send first, then the reliable delivery of the ACK + MP_CAPABLE is ensured by the receipt of this data with an MPTCP Data Sequence Signal (DSS) option (Section 3.3) containing a DATA_ACK for the MP_CAPABLE (which is the first octet of the data sequence space).
>
>
> In my personal opinion either one of these edits would be sufficient for making the point, however clearly this has caused some confusion amongst the implementor community so making both these changes should make it absolutely clear as to the expected behaviour here.
>
>
> Edit 2, mapping constraint
>
> Change the sentence reading:
>
> A Data Sequence Mapping does not need to be included in every MPTCP packet, as long as the subflow sequence space in that packet is covered by a mapping known at the receiver.
>
> To:
>
> A Data Sequence Mapping MUST appear on a TCP segment which is covered by the mapping. It does not need to be included in every MPTCP packet, as long as the subflow sequence space in that packet is covered by a mapping known at the receiver.
>
>
> Best regards,
> Alan
>
>
2 years, 6 months
[PATCH v4] mptcp: fix option length of mp_capable syn/ack
by Davide Caratti
squashto: mptcp: Handle MPTCP TCP options
in current MPTCP, the server sends the client's key back in the syn/ack
packet. Because of that, both tcpdump and packetdrill refuse to parse
the mp_capable option in the syn/ack. Both RFC6824 and RFC6824bis don't
require this in section 3.1, only the server's key should be sent in
the syn/ack.
- change TCPOLEN_MPTCP_MPC_SYNACK from 20 (4+8+8) to 12 (4+8)
- don't write the client's key in mp_capable syn/ack
- adjust tests in mptcp_parse_option() to cope with the changed
value of TCPOLEN_MPTCP_MPC_SYNACK
3-way handshakes obtained with
# ./mptcp_connect.sh -4 -c -e tx
[...]
# tcpdump -c3 -tnnr ns1-5de04ec0-TGs4tf-ns1-5de04ec0-TGs4tf-MPTCP-MPTCP-10.0.1.1.pcap tcp
unpatched kernel:
IP 10.0.1.1.49752 > 10.0.1.1.10000: Flags [S], seq 2594718977, win 65535, options [mss 65495,sackOK,TS val 3944347269 ecr 0,nop,wscale 8,mptcp capable {0xf931b304deb39a42}], length 0
IP 10.0.1.1.10000 > 10.0.1.1.49752: Flags [S.], seq 3996234497, ack 2594718978, win 65535, options [mss 65495,sackOK,TS val 3944347270 ecr 3944347269,nop,wscale 8,mptcp capable[bad opt]>
^^^ bad option
IP 10.0.1.1.49752 > 10.0.1.1.10000: Flags [.], ack 1, win 256, options [nop,nop,TS val 3944347270 ecr 3944347270,mptcp capable {0xf931b304deb39a42,0x96614f185e633ce}], length 0
patched kernel:
IP 10.0.1.1.50850 > 10.0.1.1.10000: Flags [S], seq 626898452, win 65495, options [mss 65495,sackOK,TS val 2266878231 ecr 0,nop,wscale 7,mptcp capable {0x7fe37bfd872b9f9f}], length 0
IP 10.0.1.1.10000 > 10.0.1.1.50850: Flags [S.], seq 3966041155, ack 626898453, win 65483, options [mss 65495,sackOK,TS val 2266878232 ecr 2266878231,nop,wscale 7,mptcp capable {0x8a167e3af39b5fc1}], length 0
^^^ no more bad option
IP 10.0.1.1.50850 > 10.0.1.1.10000: Flags [.], ack 1, win 512, options [nop,nop,TS val 2266878232 ecr 2266878232,mptcp capable {0x7fe37bfd872b9f9f,0x8a167e3af39b5fc1}], length 0
v2: more conservative behavior when parsing received options
v3: adjust tests in mptcp_parse_option() to catch all possible
values of MP_CAPABLE option length specified by v0 protocol,
thanks Matthieu Baerts
v4: avoid copying the receiver key in mptcp_synack_options(),
thanks Peter Krystad
squashto: "mptcp: Handle MPTCP TCP options"
Fixes: 8fa520f034ed ("mptcp: Handle MPTCP TCP options")
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
---
net/mptcp/options.c | 13 +++++--------
net/mptcp/protocol.h | 2 +-
2 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 98eb0281d196..b150e881ef7b 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -25,7 +25,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
*/
case MPTCPOPT_MP_CAPABLE:
if (opsize != TCPOLEN_MPTCP_MPC_SYN &&
- opsize != TCPOLEN_MPTCP_MPC_SYNACK)
+ opsize != TCPOLEN_MPTCP_MPC_ACK)
break;
mp_opt->version = *ptr++ & MPTCP_VERSION_MASK;
@@ -57,7 +57,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize,
mp_opt->sndr_key = get_unaligned_be64(ptr);
ptr += 8;
- if (opsize == TCPOLEN_MPTCP_MPC_SYNACK) {
+ if (opsize == TCPOLEN_MPTCP_MPC_ACK) {
mp_opt->rcvr_key = get_unaligned_be64(ptr);
ptr += 8;
pr_debug("MP_CAPABLE flags=%x, sndr=%llu, rcvr=%llu",
@@ -516,11 +516,9 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
if (subflow_req->mp_capable) {
opts->suboptions = OPTION_MPTCP_MPC_SYNACK;
opts->sndr_key = subflow_req->local_key;
- opts->rcvr_key = subflow_req->remote_key;
*size = TCPOLEN_MPTCP_MPC_SYNACK;
- pr_debug("req=%p, local_key=%llu, remote_key=%llu",
- subflow_req, subflow_req->local_key,
- subflow_req->remote_key);
+ pr_debug("req=%p, local_key=%llu",
+ subflow_req, subflow_req->local_key);
return true;
} else if (subflow_req->mp_join) {
opts->suboptions = OPTION_MPTCP_MPJ_SYNACK;
@@ -650,8 +648,7 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts)
MPTCP_CAP_HMAC_SHA1);
put_unaligned_be64(opts->sndr_key, ptr);
ptr += 2;
- if ((OPTION_MPTCP_MPC_SYNACK |
- OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
+ if (OPTION_MPTCP_MPC_ACK & opts->suboptions) {
put_unaligned_be64(opts->rcvr_key, ptr);
ptr += 2;
}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index acad61c70de9..10347c9b4dff 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -34,7 +34,7 @@
/* MPTCP suboption lengths */
#define TCPOLEN_MPTCP_MPC_SYN 12
-#define TCPOLEN_MPTCP_MPC_SYNACK 20
+#define TCPOLEN_MPTCP_MPC_SYNACK 12
#define TCPOLEN_MPTCP_MPC_ACK 20
#define TCPOLEN_MPTCP_MPJ_SYN 12
#define TCPOLEN_MPTCP_MPJ_SYNACK 16
--
2.23.0
2 years, 6 months
[PATCH v3 0/3] sendmsg: fix pending issues.
by Paolo Abeni
This includes the change suggested by Mat. Bumped revision 3 to avoid
confusion with intermediated 2/3 v2.
The patch 1 should be inserted just after "tcp: clean ext on tx recycle"
The patches 2 and 3 have explicit individual squash-to tags, but will casue
a lot of conflicts, the uneasy ones to solve are on patch
"mptcp: rework mptcp_sendmsg_frag to accept optional dfrag"
A tree rebased with the above patches applied is available at:
https://github.com/pabeni/mptcp/tree/mptcp_net-next-FIN_v1_recvmsg_2
The original cover letter was as follow
---
This series addresses the known pending issue in mptcp_sendmsg,
specifically:
* the crash repored by matt (patch 3/3)
* the possible data corruption on ext allocation failure - the "TODO"" in
sendmsg_frag (patches 1,2/3)
uunfortunately none of the above is actually tested, since I can't
reproduce the issues - still the problems look real.
This is not squashed yet, as it will conflict with most patches, I'm just
trying to collect feedback.
Open points:
The patch 2/3 introduces a per socket cache of a single skb_ext. As per
recent upstream changes related to TCP rx/tx skb cache, that is debatable,
could have subtle performance and memory usage implications.
We could avoid such cache allways allocating the extension for each
sendmsg_frag() call, and than free it if unused. Both options looks
sub-optiomal to me do you have any preference? - I do have some for the
selected impl ;)
The new API exposed by patch 1/3 is open to misusage. e.g. the caller could
allocate a skb_ext, set arbitrary ext->offset[id] values and than call
__skb_ext_add(). That will lead to memory leak or GPF. I did no try to
address the issue, the '__' prefix should guide the users to extra care.
Can we do any better?
(this is a pre-existing topic) the sk_stream_wait_memory() calls in
sk_stream_wait_memory(), waits on the subflow. Should we wait on the mptcp
socket instead? Otherwise should we release the mptcp socket lock, before such
call?
Paolo Abeni (1):
skb: add helpers to allocate ext independently from sk_buff
mptcp: avoid data stream corruption on allocation failure.
mptcp: properly detect skb collapsing
include/linux/skbuff.h | 3 ++
net/core/skbuff.c | 35 ++++++++++++++++++++++++++++++--
net/mptcp/protocol.c | 53 +++++++++++++++++++++++++++++++------------------
net/mptcp/protocol.h | 1
4 files changed, 71 insertions(+), 21 deletions(-)
--
2.21.0
2 years, 6 months
[PATCH v3 0/3] mptcp: support for rfc6824 MP CAPABLE hanshake
by Christoph Paasch
Third iteration of the v1 MP_CAPABLE handshake, incorporating the
feedback we got from Peter & Matt.
The patches apply just before "mptcp: add basic kselftest for mptcp".
Christoph Paasch (2):
mptcp: parse and emit MP_CAPABLE option according to v1 spec.
mptcp: process MP_CAPABLE data option.
Paolo Abeni (1):
mptcp: move from sha1 (v0) to sha256 (v1)
include/linux/tcp.h | 3 +-
include/net/mptcp.h | 11 +-
net/ipv4/tcp_input.c | 2 +-
net/ipv4/tcp_output.c | 2 +-
net/mptcp/Kconfig | 10 ++
net/mptcp/crypto.c | 139 +++++++++++++++----------
net/mptcp/options.c | 232 ++++++++++++++++++++++++++++++++----------
net/mptcp/protocol.c | 22 ++--
net/mptcp/protocol.h | 20 ++--
net/mptcp/subflow.c | 43 +++++++-
10 files changed, 354 insertions(+), 130 deletions(-)
--
2.23.0
2 years, 6 months
[Weekly meetings] MoM - 5th of December 2019
by Matthieu Baerts
Hello,
Yesterday, we had our 78th meeting with Mat, Peter and Ossama (Intel
OTC), Christoph (Apple), Paolo, Florian and Davide (RedHat) and myself
(Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Accepted patches:
- The list of accepted patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=3
- By Mat Martineau
1199714 [v3] mptcp: Basic single-subflow DATA_FIN
Pending patches:
- The list of pending patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=*
- By Christoph Paasch, Davide Caratti, Florian Westphal, Matthieu
Baerts, Paolo Abeni, Peter Krystad
1194592: Changes Requested: [RFC,1/1] mptcp: Optimize struct mptcp_recei
1196109: RFC: [10/10,RFC] selftests:mptcp: decrease timeout to 100 sec
1202759: Awaiting Upstream: [0/9] mptcp: support mptcp joins
1202758: Awaiting Upstream: [1/9] token: handle join-before-accept-compl
1202760: Awaiting Upstream: [2/9] mptcp: fix race from mptcp_close/mptcp
1202761: Awaiting Upstream: [3/9] copy connection id from first subflow
1202762: Awaiting Upstream: [4/9] store tcp_sk, not socket
1202763: Awaiting Upstream: [5/9] add mptcp_subflow_shutdown
1202764: Awaiting Upstream: [6/9] make accept not allocate kernel socket
1202765: Awaiting Upstream: [7/9] pass subflow socket to mptcp_finish_co
1202766: Awaiting Upstream: [8/9] subflow: place further subflows on new
1203040: Changes Requested: [v3] mptcp: fix option length of mp_capable
1203224: Needs Review / ACK: [1/3] mptcp: move from sha1 (v0) to sha256
1203225: Needs Review / ACK: [2/3] mptcp: parse and emit MP_CAPABLE opti
1203227: Needs Review / ACK: [3/3] mptcp: process MP_CAPABLE data option
1203258: Awaiting Upstream: [v2,9/9] don't flag parent socket as RCV_SHU
1204201: Awaiting Upstream: [v3,1/3] skb: add helpers to allocate ext in
1204203: Awaiting Upstream: [v3,2/3] mptcp: avoid data stream corruption
1204204: Awaiting Upstream: [v3,3/3] mptcp: properly detect skb collapsi
FYI: Current Roadmap:
- Part 1 (mainly TCP changes, will be sent with Part 2):
- MAINTAINERS file [TODO]
- Part 2 (minimum set for MPTCP, up to KSelftests, one subflow):
- MPTCPv1 support [In review]
- opti in TCP options? [to be rebased]
- Send DATA_FIN, no corner cases [Done]
- IPv6 support [Done]
- if the peer never sends MPTCP-level ACK, a lot of memory will
be used [to be rebased but more important for part 3]
- Part 3 (after the KSelftests, to be sent ideally before the end
of the year)
- Full DATA_FIN support [WIP]
- Shared recv window (drop data received on other subflows) [TODO]
- Active backup support [WIP]
- Part 4:
- Shared recv window (full support)
- IPv6 - IPv4 mapped support
- not dropping MPTCP options (ADD_ADDR, etc.)
- FAST_CLOSE
- full MPTCP v1 support (reliable add_addr, etc.)
- Part 5:
- opti/perfs
- TFO
Items for the initial submission: Update:
- MPTCP v1 support:
- Review has been done, waiting for upstream
- *Matth*: connect on IRC in case of issue (on freenode)
- Paolo is reachable as 'pabeni'
- DATA_FIN:
- Done for part 2
- Mat has some fixes (receiver side) but that will be for Part 3
- optimisation of options in TCP "struct mptcp_options_received":
- Peter has shared a patch, to be rebased
- Matth will do the rebase
- MAINTAINERS file:
- Mat will share a patch + instructions about where to squash
it/them
What need to be done before sending the series to netdev:
- new patch:
- fallback handling should be cleaned-up a bit:
- *Paolo* is working on that.
- Might be difficult to have it for Monday, hopefully for
next week.
- But Paolo is suggesting to wait for it
- remove export of __tcp_poll:
- linked to the squash of some patches from Florian
applied at the end of the series
- Will be part of *Florian* 's rebase?
- MAINTAINERS file:
- *Mat* is working on it
- we need a new version for:
- Davide's fix for MPTCPv0 (address comments) → done during the
meeting (instead of following the meeting... :-D )
- apply and squashes all existing patches we have:
- Florian's patches that are currently at the end of the series
→ Florian will do that tomorrow, after Matth's modifications
- Florian's patches related to MP_JOINs (waiting to be applied)
→ Matth will do that (3)
- Paolo's patches related to sendmsg() (waiting to be applied)
→ Matth will do that (2)
- Christoph/Paolo's patches for MPTCPv1 (address comments) →
Matth will do that (1)
- Peter's patch to optimize mptcp_received_options (rebase +
split) → Matth will do that (4)
- fallback cleanup when ready (5)
Net-next:
- open for 8 weeks from likely Sunday
- But in these 8 weeks, there will be holidays (in
Europe/America/Australia, Sorry Asia)
- We should send our patches as soon as possible, ideally next week
if everything in good shape
- We need some real feedback on the code from Net/TCP maintainers.
Submissions:
- we want to send both part 1 and 2
- (they will certainly not accept to merge part 1 if they don't
have part 2)
- Idea from Paolo: create a part 3 with only MPTCPv1 patches:
- to have a smaller patch-set (part 2)
- Christoph can send this part 3
- *Mat* will draft a cover-letter and share it on the ML
Commit message:
- we have to make sure they are still OK, at least up to the kselftests
- *all* : Could be good if this is done by a maximum of people
- We can also propose patches modifying .topmsg file
→ git checkout t/<commit>, e.g. git checkout
t/mptcp-Add-path-manager-interface
→ $EDITOR .topmsg
→ send patch
Submission to netdev:
- we need to add the signed-off of the submitter (if not already
present I guess)
- We need to add Mat' signed-off up to kselftests then Christoph's
one on MPTCPv1 patches (included SHA256 support)
- *TODO* before the submission
Other WIP items:
- Active backup support:
- Florian is working on it, patches have been shared
- DATA_FIN:
- Mat is working on patch for part 3
- next step is sending the ACK
- after that: state machine for MPTCP
Force applications to use MPTCP without modifying the app:
- We can use LD_PRELOAD:
https://github.com/pabeni/mptcp-tools/blob/master/use_mptcp/use_mptcp.sh
- *all* : Please test it and report issues
- For the moment, some (most) apps don't work well because the
fallback is not done properly, Paolo is working on it)
- We could use BPF with CGroup, see ML (in some setups, it might be
difficult to use LD_PRELOAD)
- Question for next week: should we host this in script branch?
Tests:
- could be good to do some comparisons using a perf app (e.g. AB)
with and without MPTCP → *TODO*
RFC6824bis:
- Timing of submission relative to RFC6824bis official publication?:
- They are waiting for us, should converge in a few days
- The RFC is already allocated → 8684
- That should arrive soon:
https://www.rfc-editor.org/authors/rfc8684.html
- should we clarify what should be written in the "version" field
of the 3rd ACK of an MP_CAPABLE?:
- section 3.1: If a host supports multiple versions of MPTCP,
the sender of the MP_CAPABLE option SHOULD signal the highest version
number it supports. In return, in its MP_CAPABLE option, the receiver
will signal the version number it wishes to use, which MUST be equal to
or lower than the version number indicated in the initial MP_CAPABLE.
- Nothing about the 3rd ACK. Nothing for the other bits too
- *Christoph* will look at that
- A question about:
- "If receiving a message with the "B" flag set to 1 and this
is not understood, then the MP_CAPABLE in this SYN MUST be silently
ignored, which triggers a fallback to regular TCP"
- → it means we drop the MP_CAPABLE info but we don't drop the SYN
- what we are currently doing
mptcp.org:
- could be good to have v1
- also with SHA256 (note that the net-next implementation is a bit
different, using Crypto lib)
- Note: SHA1 was exactly like the mptcp.org implementation but
Paolo modified it to look like the Crypto implementation
Next meeting:
- We propose to have the next meeting on Thursday, the 12th of
December.
- Usual time: 17:00 UTC (9am PST, 6pm CET)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20191212
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matt
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
2 years, 6 months
Re: MPTCP implementation feedback for RFC6824bis
by Christoph Paasch
Hello Alan,
> On Dec 5, 2019, at 7:03 AM, Alan Ford <alan.ford(a)gmail.com> wrote:
>> On 5 Dec 2019, at 00:33, Christoph Paasch <cpaasch(a)apple.com <mailto:cpaasch@apple.com>> wrote:
>>> On Dec 4, 2019, at 1:50 PM, Alan Ford <alan.ford(a)gmail.com <mailto:alan.ford@gmail.com>> wrote:
>>> Thank you for the clarifications. I was revisiting the text to see ways to make these clarifications, however I find myself unsure of the need; please see comments below:
>>>
>>>
>>> Section 3.1 clarification
>>>
>>> Towards the end of Section 3.1 we actually say the following:
>>>
>>> The SYN with MP_CAPABLE occupies the first octet of data sequence space, although this does not need to be acknowledged at the connection level until the first data is sent (see Section 3.3).
>>>
>>> Which would seem to cover exactly this concern.
>>
>> I'm not sure how it covers the concern, because the issue I raise is that the client needs to somehow find out if the MP_CAPABLE made it to the receiver. And the clarification I suggest is to spell out that it is the DATA_ACK (see my next comment).
>>
>>> However if you still feel further clarification is required, then we could add this also to the sentence you suggest, i.e.:
>>>
>>> If B has data to send first, then the reliable delivery of the ACK + MP_CAPABLE can be inferred by the receipt of this data with an MPTCP Data Sequence Signal (DSS) option (Section 3.3).
>>>
>>
>> The ambiguity here is that DSS does not imply DATA_ACK. One could send a DSS without a DATA_ACK, but that would not signal to the client the reliable delivery of the MP_CAPABLE. Only the DATA_ACK will do so.
>
> OK well maybe I am not understanding your concern here. The first sentence I quote says that the MP_CAPABLE needs to be DATA_ACKed when the first data is sent.
Actually, strictly reading, the first sentence says that it must not be acknowledged until first data sent. It does not say that a MP_CAPABLE MUST be DATA_ACKed.
> The sentence beginning “If B has Data to send first” means:
>
> - A->B SYN
> - B->A SYN/ACK + MP_CAPABLE
> - A->B ACK + MP_CAPABLE
> - B->A ACK + DATA + DSS
>
> DSS can contain one or both of DSM and DATA_ACK. However the previously quoted sentence says that the MP_CAPABLE needs to be acknowledged at the connection-level (i.e. DATA_ACKed) when the first data is sent.
>
> However in writing this I realise that that text makes no comment about who is sending data. Because the intention is it applies to any data being sent (if B is sending, then it will include a DATA_ACK as well as a DSS; if A is sending then B will acknowledge with a DATA_ACK), but it could be read by someone to mean “only when acknowledging the first data does the MP_CAPABLE need to be included in the DATA_ACK”, which is not the desired meaning.
>
> I am all for improving clarity so I think a clearer version of “If B has data…” would go:
>
> If B has data to send first, then the reliable delivery of the ACK + MP_CAPABLE is ensured by the receipt of this data with an MPTCP Data Sequence Signal (DSS) option (Section 3.3) containing a DATA_ACK for the MP_CAPABLE (which is the first octet of the data sequence space).
>
> Would you be ok with that? It is shorter than the other edits whilst still being unambiguous.
This however does not say that the reception fo a DATA_ACK allows A to stop sending the MP_CAPABLE. I still think that it should be spelled out somewhere.
>
>>> Can change to:
>>>
>>> If B has data to send first, then the reliable delivery of the ACK + MP_CAPABLE can be inferred by the receipt of this data with an MPTCP Data Sequence Signal (DSS) option (Section 3.3) containing a DATA_ACK for the MP_CAPABLE (which is the first octet of the data sequence space). Furthermore, when A receives a DATA_ACK from B it is a signal of the reliable delivery of A's MP_CAPABLE.
>>>
>>> Please confirm if you still feel this is necessary, given the quote I provide.
>>
>> Yes, I still feel that this is necessary.
>>
>>>
>>>
>>> Early Mapping
>>>
>>> A Data Sequence Mapping does not need to be included in every MPTCP
>>> packet, as long as the subflow sequence space in that packet is
>>> covered by a mapping known at the receiver. This can be used to
>>> reduce overhead in cases where the mapping is known in advance. One
>>> such case is when there is a single subflow between the hosts, and
>>> another is when segments of data are scheduled in larger-than-packet-
>>> sized chunks.
>>>
>>> I would suggest simply adding a sentence at the beginning saying “A Data Sequence Mapping MUST appear on a TCP segment which is covered by the mapping”.
>>
>> Yes, adding that sentence would clarify this.
>>
>> I also would remove the sentence "This can be used to..." and "One such case is..." - it adds to the confusion IMO.
>
> I quite like to retain examples where possible; these seem harmless when combined with the clarification from the MUST, it ensures people behave!
Ok, fair enough!
Thanks,
Christoph
>
> Best regards,
> Alan
>
>>>
>>> Regarding late mapping, we say:
>>>
>>> Implementations MAY hold onto such unmapped data for a
>>> short while, in the expectation that a mapping will arrive shortly.
>>> Such unmapped data cannot be counted as being within the connection-
>>> level receive window because this is relative to the data sequence
>>> numbers, so if the receiver runs out of memory to hold this data, it
>>> will have to be discarded. If a mapping for that subflow-level
>>> sequence space does not arrive within a receive window of data, that
>>> subflow SHOULD be treated as broken, closed with a RST, and any
>>> unmapped data silently discarded.
>>>
>>> Note the last sentence. We already bound this suggestion to just a subflow receive window of data, and provide a mechanism to reject (RST and silently discard). If we add the above text re early mapping that would also apply in this case and provide your requirement that the mapping lands on a segment covered by the mapping.
>>
>> Yes, this last sentence, combined with the above-mentioned addition should make things fine.
>>
>>
>> Thanks,
>> Christoph
>>
>>>
>>> Any thoughts?
>>>
>>> Best regards,
>>> Alan
>>>
>>>> On 2 Dec 2019, at 17:27, Christoph Paasch <cpaasch(a)apple.com <mailto:cpaasch@apple.com>> wrote:
>>>>
>>>> Hello Alan,
>>>>
>>>> On 29/11/19 - 21:13:38, Alan Ford wrote:
>>>>>> On 28 Nov 2019, at 19:49, Christoph Paasch <cpaasch(a)apple.com <mailto:cpaasch@apple.com>> wrote:
>>>>>>> On Nov 28, 2019, at 8:16 AM, Alan Ford <alan.ford(a)gmail.com <mailto:alan.ford@gmail.com> <mailto:alan.ford@gmail.com <mailto:alan.ford@gmail.com>>> wrote:
>>>>>>>> On 27 Nov 2019, at 19:29, Christoph Paasch <cpaasch(a)apple.com <mailto:cpaasch@apple.com> <mailto:cpaasch@apple.com <mailto:cpaasch@apple.com>>> wrote:
>>>>>>>> Section 3.3.1, page 32 & 33, "A data sequence mapping does not need..."
>>>>>>>>
>>>>>>>> This paragraph states that it is permissive to send a mapping in advance. Late-mapping is specified a bit higher around the sentence
>>>>>>>> "Implementations MAY hold onto such unmapped data for a short while in the expectation that a mapping will arrive shortly"
>>>>>>>>
>>>>>>>> This kind of early/late mapping announcements are difficult to handle in an implementation. The Linux Kernel implementation of multipath-tcp.org <http://multipath-tcp.org/> <http://multipath-tcp.org/ <http://multipath-tcp.org/>> has always disallowed such kind of mappings. Meaning, whenever a DSS-option is received such that the range specified by the relative subflow-sequence number in the DSS-option and the data-length does not (partially) cover the TCP sequence number of the packet itself, the subflow will be killed with a TCP-RST. The problem around handling such early/late mappings is that it is unclear for how long the stack needs to remember these mappings (in the early-mapping case), or for how long he needs to hold on to the data (in the late-mapping case).
>>>>>>>>
>>>>>>>> We thus suggest to change this to the following:
>>>>>>>> Whenever a DSS-option is received on a packet such that the mapping of the subflow-sequence space does not partially cover the TCP-sequence number of the packet itself, the host MUST discard this mapping and MAY destroy the subflow with a TCP-RST. It should be noted that a DATA_FIN that does not come with data has a relative subflow-sequence number of 0 and thus should be handled separately.
>>>>>>>
>>>>>>> This one I do have an issue with:
>>>>>>>
>>>>>>> - It is a technical change
>>>>>>> - Wording to this effect has been in the document since pretty much the beginning
>>>>>>> - It is a MAY which might as well say “there is no guarantee this would work”
>>>>>>
>>>>>> The problem with the MAY is that the sender can't really know if the receiver accepts it (more regarding this below)
>>>>>>
>>>>>>> Most importantly, the replacement text seems not to address this issue at all. If I read it correctly, it says that the data sequence mapping option MUST partially cover the subflow sequence space of the packet itself. But that has nothing to do with late or early mapping, both could partially cover the subflow sequence space and preceding data.
>>>>>>>
>>>>>>> Can you clarify exactly what you want to permit and prevent, here?
>>>>>>
>>>>>> Let me try to clarify what exactly we mean with early/late mapping so that we are all on the same terms here:
>>>>>>
>>>>>> Early mapping:
>>>>>>
>>>>>> A TCP-segment with sequence-number 1 holds a DSS-option with subflow-sequence number 1001 and data-length 100. This means we need to allocate space to store this DSS-option so that when the TCP-segment with seqno 1001 arrives we can know the mapping. There may be coming more of these DSS-options which all need to be stored in allocated memory. It is unclear what the limit to this is and there is no way to communicate this limit to the sender.
>>>>>
>>>>> I don’t think we have ever intended to support a mapping like this. If the text is not clear here then yes, we might have an issue.
>>>>
>>>> Yes, I do think that the text is not very clear on that - we should clarify
>>>> that.
>>>>
>>>>> We intend only to support: A TCP segment with sequence-number 1 holds a DSS option for SSN 1 and length 10000 (so multiple segments in the future).
>>>>
>>>> Sounds good!
>>>>
>>>>> Or, slightly more convoluted, a TCP segment with SN 100, length 100, which holds a DSS option for 151-250, and bytes 50-150 were already covered by a previous mapping.
>>>>
>>>> Also good.
>>>>
>>>>> I do not believe we have ever intended mappings on data segments where the segments do not include any of the mapped data. We did, however, intend to support mappings on pure ACKs in order to avoid any option space limits.
>>>>
>>>> Mappings on a pure ACK are an unlikely use-case. The problem is that in the
>>>> end the mapping needs to reliably make it to the receiver as otherwise he
>>>> needs to throw the data away. Combining it with data implicitly makes it
>>>> reliable.
>>>>
>>>> In case of option-space limits, it is better to send the options that do
>>>> consume a lot of space (ADD_ADDR,...) on the pure ACK.
>>>>
>>>>>
>>>>>> Late mapping:
>>>>>>
>>>>>> The receiver receives data without DSS-options with TCP-sequence 1 to 1001. The corresponding DSS-option however arrives with the TCP-segment with seqno 2001. Here, the receiver needs to hold on to this data, waiting for the TCP-segment with the DSS-option. At one point the receiver needs to drop the data due to memory limits. Again, the sender has no way for knowing what this limit is.
>>>>>
>>>>> So this is slightly more problematic, and this is what the MAY in the text is designed to discuss.
>>>>>
>>>>> I believe we had intended to support a situation where you could have a segment 1-1000 without a DSS option, and 1001-2000 also without, and then 2001-3000 with the option then providing a mapping for 1 to 2001 or higher.
>>>>
>>>> I think this scenario is fine to some extend with a slight change that the
>>>> provided mapping should be for 1 to 2002 or higher (thus, including the byte
>>>> of the 2001-3000 segment).
>>>>
>>>>> This would allow a sender to start pushing out data before it knew what a mapping might look like.
>>>>> It does, however, seem an unlikely situation but you as a receiver could of course reduce the subflow window size in order to limit buffering. The text does recognise the memory issue and point out that this won’t be DATA ACKed and as such a sender should soon realise and retransmit on a separate subflow and then this subflow may eventually be closed.
>>>>
>>>> Yes, it seems like an unlikerly situation and I'm not sure about the
>>>> use-case for doing this.
>>>>
>>>>> Another situation would involve two mappings on the same TCP segment. If the first 50 bytes of a segment are covered by the mapping provided in that segment, but there are then also another 50 bytes, then the mapping can’t be provided until the next segment.
>>>>
>>>> Yes, two mappings on a single packet are problematic simply because of the
>>>> lack of option-space.
>>>>
>>>>> I guess my initial thought here is that this was intended to cover a number of corner cases but if it does not work for you then there are several compliant ways of dealing with it.
>>>>>
>>>>>> When the DSS-option comes together with the corresponding TCP-sequence it is straight-forward to store it together with the data. There are no issues with memory-allocation,... as all of this is accounted together with the announced window (yes, the memory is not counted against the window, but the receiver can foresee the DSS-option overhead when computing what window he should announce).
>>>>>>
>>>>>> When a receiver gets data without a DSS-option, he can store it for up to 64KB of data as that is the maximum data-level length and the last segment of the 64KB-train could be holding the DSS-option. After that he has to drop the data.
>>>>>>
>>>>>> When the mapping partially covers the segment it also isn't a problem as the unmapped part can safely be dropped and the mapped part can be passed on to the MPTCP-layer.
>>>>>>
>>>>>> All of this does not imply that every segment of a mapping needs to hold the DSS-option. Just one of them needs to have it.
>>>>>
>>>>> That last statement aligns with what was intended in the text. But are you really saying that, or are you saying that the first one needs to have it? Because that would be a change.
>>>>
>>>> No, the first one does not need to have it. Just one of them.
>>>>
>>>>
>>>> Cheers,
>>>> Christoph
>
2 years, 6 months