[RFC mptcp-next] tcp: mptcp: use mptcp receive buffer space to select rcv window
by Florian Westphal
In MPTCP, the receive windo is shared across all subflows, because it
refers to the mptcp-level sequence space.
This commit doesn't change choice of initial window for passive or active
connections.
While it would be possible to change those as well, this adds complexity
(especially when handling MP_JOIN requests).
However, the MPTCP RFC specifically says that a MPTCP sender
'MUST NOT use the RCV.WND field of a TCP segment at the connection level if
it does not also carry a DSS option with a Data ACK field.'
SYN/SYNACK packets do not carry a DSS option with a Data ACK field.
CC: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
This patch would add additional direct call in __tcp_select_window().
I looked at mptcp option writing to check if it could be done there but
that seemed worse.
include/net/mptcp.h | 3 +++
net/ipv4/tcp_output.c | 5 +++++
net/mptcp/subflow.c | 17 +++++++++++++++++
3 files changed, 25 insertions(+)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 7489f9267640..1ef4520f45c3 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -66,6 +66,8 @@ static inline bool rsk_is_mptcp(const struct request_sock *req)
return tcp_rsk(req)->is_mptcp;
}
+void mptcp_space(const struct sock *ssk, int *space, int *full_space);
+
void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr,
int opsize, struct tcp_options_received *opt_rx);
bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
@@ -195,6 +197,7 @@ static inline bool mptcp_sk_is_subflow(const struct sock *sk)
return false;
}
+static inline void mptcp_space(const struct sock *ssk, int *s, int *fs) { }
static inline void mptcp_seq_show(struct seq_file *seq) { }
#endif /* CONFIG_MPTCP */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 306e25d743e8..1a829536a115 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2771,6 +2771,11 @@ u32 __tcp_select_window(struct sock *sk)
int full_space = min_t(int, tp->window_clamp, allowed_space);
int window;
+ if (sk_is_mptcp(sk)) {
+ mptcp_space(sk, &free_space, &allowed_space);
+ full_space = min_t(int, tp->window_clamp, allowed_space);
+ }
+
if (unlikely(mss > full_space)) {
mss = full_space;
if (mss <= 0)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 40ad7995b13b..aefcbb8bb737 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -745,6 +745,23 @@ bool mptcp_subflow_data_available(struct sock *sk)
return subflow->data_avail;
}
+/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
+ * not the ssk one.
+ *
+ * In mptcp, rwin is about the mptcp-level connection data.
+ *
+ * Data that is still on the ssk rx queue can thus be ignored,
+ * as far as mptcp peer is concerened that data is still inflight.
+ */
+void mptcp_space(const struct sock *ssk, int *space, int *full_space)
+{
+ const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+ const struct sock *sk = READ_ONCE(subflow->conn);
+
+ *space = tcp_space(sk);
+ *full_space = tcp_full_space(sk);
+}
+
static void subflow_data_ready(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
--
2.24.1
2 years, 3 months
[PATCH v5 0/5] Add support for v1 of ADD_ADDR option
by Peter Krystad
Support parsing and writing the new option format, with hmac generation
and validation.
Support for echoing ADD_ADDR still to come.
v2: check msg len in mptcp_crypto_hmac_sha().
remove obsolete casts
v3: Rebase and split for squashing, patches ready to be applied.
v4: Get the first two patches of the set right
v5: plug hole in mptcp_options_received
Peter Krystad (5):
mptcp: Re-factor mptcp_crypto_hmac_sha()
mptcp: v1 ADD_ADDR changes: options and parsing
mptcp: v1 ADD_ADDR changes: add_addr_hmac_valid
mptcp: v1 ADD_ADDR changes: add subflow_generate_hmac()
mptcp: v1 ADD_ADDR changes: use subflow_generate_hmac()
include/linux/tcp.h | 19 ++++---
include/net/mptcp.h | 2 +
net/mptcp/crypto.c | 17 ++++--
net/mptcp/options.c | 132 +++++++++++++++++++++++++++++++++----------
net/mptcp/protocol.h | 19 +++++--
5 files changed, 140 insertions(+), 49 deletions(-)
--
2.17.2
2 years, 3 months
rare MPTCP self-test failure
by Paolo Abeni
Hi all,
sharing here some results alreay presented on IRC.
I see a poll() timeout on the client side while running MP_JOIN self-
tests in loop. That happens quite rarely (once every ~500 self-tests
runs), like this:
---
copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
client exit code 2, server 0
\nnetns ns1-0-pjF9LD socket stat for 10000:
State Recv-Q Send-Q Local
Address:Port Peer
Address:Port
CLOSE-WAIT 1 0 10.0.1.1:10000 10.0.3.2:41501
ts sack cubic wscale:7,7 rto:215 rtt:15/7.5 ato:40 mss:1448 rcvmss:536 advmss:1448 cwnd:10 bytes_received:1 segs_out:1 segs_in:3 send 7.7Mbps lastsnd:30699 lastrcv:30702 lastack:30078 pacing_rate 15.4Mbps rcv_space:14600 minrtt:15
\nnetns ns2-0-pjF9LD socket stat for 10000:
State Recv-Q Send-Q Local Address:Port Peer Address:Port
FIN-WAIT-2 0 0 10.0.3.2:41501 10.0.1.1:10000 timer:(timewait,59sec,0)
TIME-WAIT 0 0 10.0.1.2:35014 10.0.1.1:10000 timer:(timewait,29sec,0)
single subflow syn[ ok ] - synack[ ok ] - ack[ ok ]
---
e.g. the MP_JOIN handshake is successful, but we miss a state change on
the msk client socket.
The pcap captures shows that the server _is not_ sending the TCP fin
for the MP_JOIN subflow. Due to the current mptcp_check_for_eof() hack,
that causes the missing state transition (see the attached pcap
capture).
Full DATA FIN support will addresses the above for good, but I was
wondering why the server don't send that FIN pkt?!?
[much later the above self-question, after digging more]
It looks like we need an additional __mptcp_flush_join_list() in
mptcp_shutdown(). I'll test a patch ASAP.
Cheers,
Paolo
2 years, 3 months
[PATCH net-next] mptcp: move msk state update to subflow_syn_recv_sock()
by Paolo Abeni
After commit 58b09919626b ("mptcp: create msk early"), the
msk socket is already available at subflow_syn_recv_sock()
time. Let's move there the state update, to mirror more
closely the first subflow state.
The above will also help multiple subflow supports.
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
The idea is send this one to net-next asap, [well] before
part 3
---
net/mptcp/protocol.c | 9 +++------
net/mptcp/subflow.c | 2 ++
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 04c3caed92df..e959104832ef 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -861,6 +861,9 @@ struct sock *mptcp_sk_clone(const struct sock *sk, struct request_sock *req)
ack_seq++;
msk->ack_seq = ack_seq;
}
+
+ /* will be fully established after successful MPC subflow creation */
+ inet_sk_state_store(nsk, TCP_SYN_RECV);
bh_unlock_sock(nsk);
/* keep a single reference */
@@ -916,10 +919,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
mptcp_copy_inaddrs(newsk, ssk);
list_add(&subflow->node, &msk->conn_list);
- /* will be fully established at mptcp_stream_accept()
- * completion.
- */
- inet_sk_state_store(new_mptcp_sock, TCP_SYN_RECV);
bh_unlock_sock(new_mptcp_sock);
local_bh_enable();
}
@@ -1256,8 +1255,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
if (!ssk->sk_socket)
mptcp_sock_graft(ssk, newsock);
}
-
- inet_sk_state_store(newsock->sk, TCP_ESTABLISHED);
}
sock_put(ssock->sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 8434c7f5f712..052d72a1d3a2 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -234,6 +234,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
/* new mpc subflow takes ownership of the newly
* created mptcp socket
*/
+ inet_sk_state_store((struct sock *)new_msk,
+ TCP_ESTABLISHED);
ctx->conn = new_msk;
new_msk = NULL;
}
--
2.21.1
2 years, 3 months
[PATCH v4 0/5] Add support for v1 of ADD_ADDR option
by Peter Krystad
Support parsing and writing the new option format, with hmac generation
and validation.
Support for echoing ADD_ADDR still to come.
v2: check msg len in mptcp_crypto_hmac_sha().
remove obsolete casts
v3: Rebase and split for squashing, patches ready to be applied.
v4: Get the first two patches of the set right
Peter Krystad (5):
mptcp: Re-factor mptcp_crypto_hmac_sha()
mptcp: v1 ADD_ADDR changes: options and parsing
mptcp: v1 ADD_ADDR changes: add_addr_hmac_valid
mptcp: v1 ADD_ADDR changes: add subflow_generate_hmac()
mptcp: v1 ADD_ADDR changes: use subflow_generate_hmac()
include/linux/tcp.h | 19 ++++---
include/net/mptcp.h | 2 +
net/mptcp/crypto.c | 17 ++++--
net/mptcp/options.c | 132 +++++++++++++++++++++++++++++++++----------
net/mptcp/protocol.h | 19 +++++--
5 files changed, 140 insertions(+), 49 deletions(-)
--
2.17.2
2 years, 3 months
[PATCH] tcp: mptcp: use mptcp receive buffer space to select rcv window
by Florian Westphal
In MPTCP, the receive windo is shared across all subflows, because it
refers to the mptcp-level sequence space.
This commit doesn't change choice of initial window for passive or active
connections.
While it would be possible to change those as well, this adds complexity
(especially when handling MP_JOIN requests).
However, the MPTCP RFC specifically says that a MPTCP sender
'MUST NOT use the RCV.WND field of a TCP segment at the connection level if
it does not also carry a DSS option with a Data ACK field.'
SYN/SYNACK packets do not carry a DSS option with a Data ACK field.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
Unless there are comments i will post this as RFC to netdev and will
CC Eric Dumazet.
include/net/mptcp.h | 13 +++++++++++++
net/ipv4/tcp_output.c | 6 ++++++
net/mptcp/subflow.c | 24 ++++++++++++++++++++++++
3 files changed, 43 insertions(+)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 7489f9267640..611a4d959470 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -66,6 +66,9 @@ static inline bool rsk_is_mptcp(const struct request_sock *req)
return tcp_rsk(req)->is_mptcp;
}
+int mptcp_full_space(const struct sock *ssk);
+int mptcp_space(const struct sock *ssk);
+
void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr,
int opsize, struct tcp_options_received *opt_rx);
bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
@@ -195,6 +198,16 @@ static inline bool mptcp_sk_is_subflow(const struct sock *sk)
return false;
}
+static inline int mptcp_space(struct sock *ssk)
+{
+ return 0;
+}
+
+static inline int mptcp_full_space(struct sock *ssk)
+{
+ return 0;
+}
+
static inline void mptcp_seq_show(struct seq_file *seq) { }
#endif /* CONFIG_MPTCP */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 306e25d743e8..b786da1db76a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2771,6 +2771,12 @@ u32 __tcp_select_window(struct sock *sk)
int full_space = min_t(int, tp->window_clamp, allowed_space);
int window;
+ if (sk_is_mptcp(sk)) {
+ allowed_space = mptcp_full_space(sk);
+ free_space = mptcp_space(sk);
+ full_space = min_t(int, tp->window_clamp, allowed_space);
+ }
+
if (unlikely(mss > full_space)) {
mss = full_space;
if (mss <= 0)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index f9fe60599f79..52d76af75b7f 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -738,6 +738,30 @@ bool mptcp_subflow_data_available(struct sock *sk)
return subflow->data_avail;
}
+/* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy,
+ * not the ssk one.
+ *
+ * In mptcp, rwin is about the mptcp-level connection data.
+ *
+ * Data that is still on the ssk rx queue can thus be ignored,
+ * as far as mptcp peer is concerened that data is still inflight.
+ */
+int mptcp_full_space(const struct sock *ssk)
+{
+ const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+ const struct sock *sk = READ_ONCE(subflow->conn);
+
+ return sk ? tcp_full_space(sk) : tcp_full_space(ssk);
+}
+
+int mptcp_space(const struct sock *ssk)
+{
+ const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+ const struct sock *sk = READ_ONCE(subflow->conn);
+
+ return sk ? tcp_space(sk) : tcp_space(ssk);
+}
+
static void subflow_data_ready(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
--
2.24.1
2 years, 3 months
[GIT] Sync with net-next on 20200316: conflicts
by Matthieu Baerts
Hello,
Recently, MPTCP-related patches from Paolo have been applied in
'net-next' repo:
- 58b09919626b (mptcp: create msk early)
- dc093db5cc05 (mptcp: drop unneeded checks)
This created conflicts:
- cf1a8046cf64: conflict in t/mptcp-Add-path-manager-interface
- 09497b43baab: conflict in
t/mptcp-Add-handling-of-incoming-MP_JOIN-requests
- 08b1bc302164: conflict in
t/mptcp-Add-handling-of-incoming-MP_JOIN-requests - part 2
- 5d58ead3f82c: conflict in
t/mptcp-Add-handling-of-outgoing-MP_JOIN-requests
- 2737f5aa7672: conflict in
t/mptcp-update-per-unacked-sequence-on-pkt-reception
- 2d24b3dc7d8a: conflict in t/mptcp-increment-MIB-counters-in-a-few-places
I hope they have been resolved as expected.
Tests are in progress. The "export" branch will be updated after the tests.
Cheers,
Matt
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
2 years, 3 months
[PATCH] mptcp: copy mptcp receive buffer setting to subflow
by Florian Westphal
For incoming/outgoing connections, initial window is chosen based om
the subflow receive buffer size.
In case mptcp socket has a fixed buffer size, we should copy it to the
subflow/tcp socket as well.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
I'll pass this to netdev@ unless there are objections.
net/mptcp/protocol.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ddfd02ab6e31..a65cedcb905f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -89,6 +89,15 @@ static bool __mptcp_can_create_subflow(const struct mptcp_sock *msk)
return !msk->first;
}
+static void sync_rcvbuf(const struct sock *msk, struct sock *ssk)
+{
+ if (unlikely((msk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
+ ssk->sk_rcvbuf != msk->sk_rcvbuf)) {
+ ssk->sk_userlocks |= SOCK_RCVBUF_LOCK;
+ WRITE_ONCE(ssk->sk_rcvbuf, msk->sk_rcvbuf);
+ }
+}
+
static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state)
{
struct mptcp_subflow_context *subflow;
@@ -107,6 +116,7 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state)
if (err)
return ERR_PTR(err);
+ sync_rcvbuf(sk, ssock->sk);
msk->first = ssock->sk;
msk->subflow = ssock;
subflow = mptcp_subflow_ctx(ssock->sk);
--
2.24.1
2 years, 3 months
[PATCH mptcp] mptcp: re-check dsn before reading from subflow
by Florian Westphal
mptcp_subflow_data_available() is commonly called via
ssk->sk_data_ready(), in this case the mptcp socket lock
cannot be acquired.
Therefore, while we can safely discard subflow data that
was already received up to msk->ack_seq, we cannot be sure
that 'subflow->data_avail' will still be valid at the time
userspace wants to read the data -- a previous read on a
different subflow might have carried this data already.
In that (unlikely) event, msk->ack_seq will have been updated
and will be ahead of the subflow dsn.
We can check for this condition and skip/resync to the expected
sequence number.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
I could also submit this directly for net-next, but this
patch is only needed w. MP_JOIN support.
net/mptcp/protocol.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 53a2b0ba2241..8f2fc72fc5ed 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -121,6 +121,27 @@ static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
MPTCP_SKB_CB(skb)->offset = offset;
}
+/* both sockets must be locked */
+static bool mptcp_subflow_dsn_valid(const struct mptcp_sock *msk,
+ struct sock *ssk)
+{
+ struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+ u64 dsn = mptcp_subflow_get_mapped_dsn(subflow);
+
+ /* revalidate data sequence number.
+ *
+ * mptcp_subflow_data_available() is usually called
+ * without msk lock. Its unlikely (but possible)
+ * that msk->ack_seq has been advanced since the last
+ * call found in-sequence data.
+ */
+ if (likely(dsn == msk->ack_seq))
+ return true;
+
+ subflow->data_avail = 0;
+ return mptcp_subflow_data_available(ssk);
+}
+
static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
struct sock *ssk,
unsigned int *bytes)
@@ -133,6 +154,11 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
bool done = false;
int rcvbuf;
+ if (!mptcp_subflow_dsn_valid(msk, ssk)) {
+ *bytes = 0;
+ return false;
+ }
+
rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
if (rcvbuf > sk->sk_rcvbuf)
sk->sk_rcvbuf = rcvbuf;
--
2.24.1
2 years, 3 months