[MPTCP][PATCH v7 mptcp-next 0/5] Add REMOVE_ADDR support
by Geliang Tang
v7:
- add RM_ADDR mib
- update RM_ADDR selftests test case
v6:
- rename lookup_anno_list_by_saddr to remove_anno_list_by_saddr as
Paolo suggested.
- add msk socket lock when traversing msk->conn_list as Paolo suggested.
- Since the first three patches in v5 have been merged to export
branch, drop them from this patchset.
- add remove addr and subflow selftest test case.
- this patchset is against mptcp_net-next's export branch.
v5:
- merge mptcp_nl_remove_subflow() and mptcp_nl_remove_addr()
- add cond_resched
- reduce the indentation level in mptcp_pm_nl_rm_addr_received
v4:
- update mptcp_subflow_shutdown()'s args.
- add rm_id check to make sure we don't shutdown the first subflow.
- add conn_list empty check.
- move anno_list to mptcp_pm_data.
- add a new patch 'mptcp: add remove subflow support'.
v3:
- fix memory leak and lock issue in v2.
- drop alist in v2.
- fix mptcp_subflow_shutdown's arguments.
- bzero remote in mptcp_pm_create_subflow_or_signal_addr.
- add more commit message.
Geliang Tang (5):
mptcp: remove addr and subflow in PM netlink
mptcp: add RM_ADDR mib
selftests: mptcp: drop first flag in do_rnd_write
selftests: mptcp: add remove addr and subflow test case
selftests: mptcp: add RM_ADDR mib check function
net/mptcp/mib.c | 1 +
net/mptcp/mib.h | 1 +
net/mptcp/options.c | 2 +
net/mptcp/pm.c | 7 +-
net/mptcp/pm_netlink.c | 87 ++++++++++++++++++-
net/mptcp/protocol.c | 2 +
net/mptcp/protocol.h | 2 +
.../selftests/net/mptcp/mptcp_connect.c | 7 +-
.../testing/selftests/net/mptcp/mptcp_join.sh | 67 ++++++++++++++
9 files changed, 166 insertions(+), 10 deletions(-)
--
2.17.1
[MPTCP][PATCH v8 mptcp-next 0/8] Add REMOVE_ADDR support
by Geliang Tang
v8:
- drop anno_list in v7.
We don't need to add a new list, conn_list is enough for the signal
address and local subflow.
- fix local_id and remote_id issues.
The RM_ADDR logic uses an address id to identify the address being
removed, so we must make sure the subflow's local_id and remote_id are
set properly.
- fix mptcp_pm_nl_rm_addr_received logic issue
- update selftests
v7:
- add RM_ADDR mib
- update RM_ADDR selftests test case
v6:
- rename lookup_anno_list_by_saddr to remove_anno_list_by_saddr as
Paolo suggested.
- add msk socket lock when traversing msk->conn_list as Paolo suggested.
- Since the first three patches in v5 have been merged to export
branch, drop them from this patchset.
- add remove addr and subflow selftest test case.
- this patchset is against mptcp_net-next's export branch.
v5:
- merge mptcp_nl_remove_subflow() and mptcp_nl_remove_addr()
- add cond_resched
- reduce the indentation level in mptcp_pm_nl_rm_addr_received
v4:
- update mptcp_subflow_shutdown()'s args.
- add rm_id check to make sure we don't shutdown the first subflow.
- add conn_list empty check.
- move anno_list to mptcp_pm_data.
- add a new patch 'mptcp: add remove subflow support'.
v3:
- fix memory leak and lock issue in v2.
- drop alist in v2.
- fix mptcp_subflow_shutdown's arguments.
- bzero remote in mptcp_pm_create_subflow_or_signal_addr.
- add more commit message.
Geliang Tang (8):
mptcp: remove addr and subflow in PM netlink
mptcp: fix mptcp_pm_nl_rm_addr_received logic issue
mptcp: implementing mptcp_pm_remove_subflow
mptcp: fix every subflow's local_id is zero
mptcp: fix subflow's remote_id is zero issue
mptcp: add RM_ADDR related mibs
selftests: mptcp: add remove cfg in mptcp_connect
selftests: mptcp: add remove addr and subflow test cases
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/pm.c | 25 ++-
net/mptcp/pm_netlink.c | 105 +++++++++++--
net/mptcp/protocol.c | 4 +
net/mptcp/protocol.h | 5 +-
net/mptcp/subflow.c | 8 +-
.../selftests/net/mptcp/mptcp_connect.c | 18 ++-
.../testing/selftests/net/mptcp/mptcp_join.sh | 142 +++++++++++++++++-
9 files changed, 288 insertions(+), 23 deletions(-)
--
2.17.1
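The v8 notes above say that RM_ADDR identifies the address to remove by its id, which is why every subflow needs correct local_id and remote_id values. The following is a minimal user-space sketch of that matching logic, not the kernel code: struct demo_subflow and demo_rm_addr_received() are hypothetical names, and the rule that id 0 belongs to the initial subflow (and is therefore never torn down) is an assumption based on the v4 changelog entry about the rm_id check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a subflow; not the kernel structure. */
struct demo_subflow {
	uint8_t local_id;   /* id of the local address this subflow uses */
	uint8_t remote_id;  /* id of the peer address this subflow uses */
	bool    closed;
};

/* Model of handling a received RM_ADDR(rm_id): close every open subflow
 * whose remote_id matches. Returns the number of subflows closed.
 */
static size_t demo_rm_addr_received(struct demo_subflow *flows, size_t n,
				    uint8_t rm_id)
{
	size_t closed = 0;

	/* Assumption: id 0 is the initial subflow and must stay up. */
	if (rm_id == 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (flows[i].closed || flows[i].remote_id != rm_id)
			continue;
		flows[i].closed = true;
		closed++;
	}
	return closed;
}
```

If every subflow carried remote_id 0, as in the issue that patches 4/8 and 5/8 address, no RM_ADDR with a non-zero id would ever match a subflow in this model.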
[MPTCP][PATCH v7 mptcp-next 0/4] add ADD_ADDR echo flag support
by Geliang Tang
v7:
- add accept_subflow re-check
v6:
- add "drop re-check mechanism for PM's flags"
v5:
- move READ_ONCE(msk->pm.add_addr_echo) into mptcp_pm_add_addr_signal
- move mptcp_pm_announce_addr into mptcp_pm_nl_add_addr_received and
mptcp_pm_add_addr_received
v4:
- Just updated some log messages in mptcp_join.sh
v3:
- move add_addr_echo into mptcp_pm_announce_addr()
- check return value of mptcp_pm_add_addr_received
- hold lock for add_addr_echo writing
v2:
- add ADD_ADDR mibs
- add selftests for ADD_ADDR
v1:
- mptcp: send out ADD_ADDR with echo flag
Geliang Tang (4):
mptcp: send out ADD_ADDR with echo flag
mptcp: add ADD_ADDR related mibs
selftests: mptcp: add ADD_ADDR mibs check function
mptcp: add accept_subflow re-check
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 34 +++++++++-----
net/mptcp/pm.c | 28 ++++++------
net/mptcp/pm_netlink.c | 4 +-
net/mptcp/protocol.h | 6 ++-
.../testing/selftests/net/mptcp/mptcp_join.sh | 44 +++++++++++++++++++
7 files changed, 92 insertions(+), 28 deletions(-)
--
2.17.1
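For readers unfamiliar with the echo flag this series adds: a host that receives an ADD_ADDR announcement answers with the same address and id, with the echo bit set, so the announcer knows it arrived. Below is a small user-space sketch of that rule only. struct demo_add_addr and demo_add_addr_received() are hypothetical names, the address is a plain IPv4 integer for brevity, and nothing here reflects how the kernel stores or sends the option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an ADD_ADDR option. */
struct demo_add_addr {
	uint8_t  id;    /* address id chosen by the announcer */
	uint32_t addr;  /* IPv4 address, kept as a plain integer here */
	bool     echo;  /* set on the reply, clear on the announcement */
};

/* Returns true and fills *reply when an echo has to be sent back.
 * An option that already has the echo bit set is a confirmation of our
 * own announcement, so it is not echoed again.
 */
static bool demo_add_addr_received(const struct demo_add_addr *in,
				   struct demo_add_addr *reply)
{
	if (in->echo)
		return false;

	*reply = *in;        /* same id and address ... */
	reply->echo = true;  /* ... with the echo bit set */
	return true;
}
```

Not echoing an echo is what keeps two peers from bouncing the same option back and forth.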
[PATCH net-next] tcp: propagate MPTCP skb extensions on xmit splits
by Paolo Abeni
When the TCP stack splits a packet on the write queue, the tail
half currently loses the associated skb extensions, and will not
carry the DSM on the wire.
The above does not cause functional problems and is allowed by
the RFC, but interacts badly with GRO and RX coalescing, as possible
candidates for aggregation will carry different TCP options.
This change tries to improve the MPTCP behavior, propagating the
skb extensions on split.
Additionally, we must prevent the MPTCP stack from updating the
mapping after the split occurs: that would both violate the RFC and
fool the reader.
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
---
include/net/mptcp.h | 21 ++++++++++++++++++++-
net/ipv4/tcp_output.c | 3 +++
net/mptcp/protocol.c | 7 +++++--
3 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 3525d2822abe..fbf0849632cb 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -29,7 +29,8 @@ struct mptcp_ext {
use_ack:1,
ack64:1,
mpc_map:1,
- __unused:2;
+ frozen:1,
+ __unused:1;
/* one byte hole */
};
@@ -107,6 +108,19 @@ static inline void mptcp_skb_ext_move(struct sk_buff *to,
from->active_extensions = 0;
}
+static inline void mptcp_skb_ext_copy(struct sk_buff *to,
+ struct sk_buff *from)
+{
+ struct mptcp_ext *from_ext;
+
+ from_ext = skb_ext_find(from, SKB_EXT_MPTCP);
+ if (!from_ext)
+ return;
+
+ skb_ext_copy(to, from);
+ from_ext->frozen = 1;
+}
+
static inline bool mptcp_ext_matches(const struct mptcp_ext *to_ext,
const struct mptcp_ext *from_ext)
{
@@ -195,6 +209,11 @@ static inline void mptcp_skb_ext_move(struct sk_buff *to,
{
}
+static inline void mptcp_skb_ext_copy(struct sk_buff *to,
+ struct sk_buff *from)
+{
+}
+
static inline bool mptcp_skb_can_collapse(const struct sk_buff *to,
const struct sk_buff *from)
{
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 85ff417bda7f..c66f5bd5f64f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1411,6 +1411,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
if (!buff)
return -ENOMEM; /* We'll just try again later. */
skb_copy_decrypted(buff, skb);
+ mptcp_skb_ext_copy(buff, skb);
sk_wmem_queued_add(sk, buff->truesize);
sk_mem_charge(sk, buff->truesize);
@@ -1966,6 +1967,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
if (unlikely(!buff))
return -ENOMEM;
skb_copy_decrypted(buff, skb);
+ mptcp_skb_ext_copy(buff, skb);
sk_wmem_queued_add(sk, buff->truesize);
sk_mem_charge(sk, buff->truesize);
@@ -2236,6 +2238,7 @@ static int tcp_mtu_probe(struct sock *sk)
skb = tcp_send_head(sk);
skb_copy_decrypted(nskb, skb);
+ mptcp_skb_ext_copy(nskb, skb);
TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(skb)->seq;
TCP_SKB_CB(nskb)->end_seq = TCP_SKB_CB(skb)->seq + probe_size;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1fd80f8c7baa..198dcd7b2bd6 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -752,8 +752,11 @@ static bool mptcp_skb_can_collapse_to(u64 write_seq,
if (!tcp_skb_can_collapse_to(skb))
return false;
- /* can collapse only if MPTCP level sequence is in order */
- return mpext && mpext->data_seq + mpext->data_len == write_seq;
+ /* can collapse only if MPTCP level sequence is in order and this
+ * mapping has not been xmitted yet
+ */
+ return mpext && mpext->data_seq + mpext->data_len == write_seq &&
+ !mpext->frozen;
}
static bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk,
--
2.26.2
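The interplay between the copy helper and the new frozen bit can be hard to see from the hunks alone, so here is a user-space sketch of the idea. It is an illustration under assumptions, not the kernel code: struct demo_ext, demo_ext_copy() and demo_can_collapse_to() are hypothetical names, the extension is copied by value, and both halves are marked frozen in this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the MPTCP skb extension. */
struct demo_ext {
	uint64_t data_seq;      /* MPTCP-level sequence of the mapping */
	uint16_t data_len;      /* length covered by the mapping */
	unsigned int present:1; /* models "the skb has an extension" */
	unsigned int frozen:1;  /* mapping must not be extended any more */
};

/* On a split, give the new (tail) buffer the same mapping and freeze it,
 * so that later appends cannot change what both halves advertise.
 */
static void demo_ext_copy(struct demo_ext *to, struct demo_ext *from)
{
	if (!from->present)
		return;

	from->frozen = 1;
	*to = *from;  /* the copy inherits the frozen bit in this sketch */
}

/* New data may be appended only if it is in sequence at the MPTCP level
 * and the mapping has not been frozen by a split.
 */
static bool demo_can_collapse_to(uint64_t write_seq,
				 const struct demo_ext *ext)
{
	return ext->present &&
	       ext->data_seq + ext->data_len == write_seq &&
	       !ext->frozen;
}
```

Before a split, an in-order write can still be merged into the existing mapping; after demo_ext_copy() runs, the same check fails and the sender has to start a new mapping.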
[PATCH net-next] mptcp: Remove unused macro MPTCP_SAME_STATE
by YueHaibing
There is no caller in tree any more.
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
---
net/mptcp/protocol.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1aad411a0e46..e6216c4f308c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -24,8 +24,6 @@
#include "protocol.h"
#include "mib.h"
-#define MPTCP_SAME_STATE TCP_MAX_STATES
-
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
struct mptcp6_sock {
struct mptcp_sock msk;
--
2.17.1
fix for most poll selftest timeouts
by Florian Westphal
It turns out that almost all of the 'poll timeout' test failures are related
to subflow->writable getting stale. It's false even though the subflow is
writeable, i.e. userspace is not woken up after the socket can accept new data.
Rather than fixing the race that leads to the information becoming stale,
Paolo suggested just removing the 'writable' caching.
I've done this by editing the commit that introduced it
("mptcp: rethink 'is writable' conditional").
While doing so I also squashed sk_stream_is_writeable() and removed
"mptcp: fix stale subflow->writeable caching", as it's made obsolete by the
removal of this struct member.
The only remaining occurrence of poll timeouts seems to be related to the
mptcp-level fin not being delivered/processed; investigation is ongoing.
I've pushed the resulting branch here:
https://git.breakpoint.cc/cgit/fw/mptcp_net-next.git/log/?h=export_rebase_5
which can also be pulled via
git://git.breakpoint.cc/fw/mptcp_net-next.git export_rebase_5
Alternatively, edit "mptcp: rethink 'is writable' conditional" and remove
"bool writable" from mptcp_subflow_ctx struct, then fix up the resulting
merge conflicts.
After this, the mmap tests still pass and it takes about 50 normal poll
self-test runs before the first timeout is seen, on average
(before the change, it was fewer than 10).
Expected delta to current export branch is:
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -787,7 +787,7 @@ static bool mptcp_is_writeable(struct mptcp_sock *msk)
return false;
mptcp_for_each_subflow(msk, subflow) {
- if (READ_ONCE(subflow->writable))
+ if (sk_stream_is_writeable(subflow->tcp_sock))
return true;
}
return false;
@@ -1121,7 +1121,7 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk,
if (next_backup || next_ssk)
continue;
- free = READ_ONCE(subflow->writable);
+ free = sk_stream_is_writeable(subflow->tcp_sock);
if (!free)
continue;
@@ -1237,8 +1237,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
lock_sock(ssk);
tx_ok = msg_data_left(msg);
while (tx_ok) {
- struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-
ret = mptcp_sendmsg_frag(sk, ssk, msg, NULL, &timeo, &mss_now,
&size_goal);
if (ret < 0) {
@@ -1256,14 +1254,11 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
msk->snd_burst -= ret;
copied += ret;
- if (!sk_stream_is_writeable(ssk))
- WRITE_ONCE(subflow->writable, false);
-
tx_ok = msg_data_left(msg);
if (!tx_ok)
break;
- if (!subflow->writable ||
+ if (!sk_stream_memory_free(ssk) ||
!mptcp_page_frag_refill(ssk, pfrag) ||
!mptcp_ext_cache_refill(msk)) {
tcp_push(ssk, msg->msg_flags, mss_now,
@@ -1311,9 +1306,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
/* start the timer, if it's not pending */
if (!mptcp_timer_pending(sk))
mptcp_reset_timer(sk);
-
- if (!sk_stream_is_writeable(ssk))
- WRITE_ONCE(mptcp_subflow_ctx(ssk)->writable, false);
}
release_sock(ssk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -311,7 +311,6 @@ struct mptcp_subflow_context {
use_64bit_ack : 1, /* Set when we received a 64-bit DSN */
can_ack : 1; /* only after processing the remote a key */
enum mptcp_data_avail data_avail;
- bool writable;
u32 remote_nonce;
u64 thmac;
u32 local_nonce;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -977,7 +977,6 @@ static void subflow_write_space(struct sock *sk)
if (!sk_stream_is_writeable(sk))
return;
- WRITE_ONCE(subflow->writable, true);
if (sk_stream_is_writeable(parent)) {
set_bit(MPTCP_SEND_SPACE, &mptcp_sk(parent)->flags);
smp_mb__after_atomic();
@@ -1206,7 +1205,6 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk,
rcu_assign_pointer(icsk->icsk_ulp_data, ctx);
INIT_LIST_HEAD(&ctx->node);
- WRITE_ONCE(ctx->writable, true);
pr_debug("subflow=%p", ctx);
--
Florian Westphal <fw(a)strlen.de>
4096R/AD5FF600 2015-09-13
Key fingerprint = 80A9 20C5 B203 E069 F586 AE9F 7091 A8D9 AD5F F600
Phone: +49 151 11132303
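To illustrate why dropping the cached flag helps, the following user-space sketch models a subflow whose writable bit is updated by callbacks. It is not the kernel code: struct demo_subflow, demo_queue() and demo_ack() are hypothetical, the writeability rule is a simplified assumption (free space at least half of the queued bytes), and the lost update is modelled with an explicit notify parameter rather than a real race.

```c
#include <stdbool.h>

/* Hypothetical subflow model with a cached "writable" flag. */
struct demo_subflow {
	int  sndbuf;       /* send buffer limit, in bytes */
	int  wmem_queued;  /* bytes currently queued */
	bool writable;     /* cached state, may go stale */
};

/* Live check, computed from the current counters every time. */
static bool demo_live_writeable(const struct demo_subflow *s)
{
	return s->sndbuf - s->wmem_queued >= s->wmem_queued / 2;
}

/* Sender path: queue data and clear the cache if the buffer filled up. */
static void demo_queue(struct demo_subflow *s, int bytes)
{
	s->wmem_queued += bytes;
	if (!demo_live_writeable(s))
		s->writable = false;
}

/* ACK path: memory is released. With notify == false the cache update is
 * skipped, which stands in for the race described in the mail.
 */
static void demo_ack(struct demo_subflow *s, int bytes, bool notify)
{
	s->wmem_queued -= bytes;
	if (notify && demo_live_writeable(s))
		s->writable = true;
}
```

After a demo_queue() that fills the buffer and a demo_ack() whose update is lost, the cached flag stays false while the live check reports true. Querying the live state on every call, as the delta above does, removes that window at the cost of recomputing the check.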
[Weekly meetings] MoM - 27th of August 2020
by Matthieu Baerts
Hello everyone,
Last Thursday, we had our 113th meeting with Mat and Ossama (Intel OTC),
Christoph (Apple), Paolo, Davide and Florian (RedHat), Geliang (Xiaomi),
Nicolas and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Accepted patches:
- The list of accepted patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=3
netdev (if mptcp ML is in cc) (Florian):
A patch to fix mptcp-connect.sh with "mmap"?
our repo (by: /):
/
Pending patches:
- The list of pending patches can be seen on PatchWork:
https://patchwork.ozlabs.org/project/mptcp/list/?state=*
netdev (if mptcp ML is in cc) (by: Nicolas Rybowski):
1349344 [bpf-next,3/3] bpf: add 'bpf_mptcp_sock' structure and helper
1349341 [bpf-next,2/3] mptcp: attach subflow socket to parent cgroup
our repo (by: Florian Westphal, Geliang Tang, Matthieu Baerts,
Paolo Abeni):
1325171: Changes Requested: [mptcp-next] selftests: mptcp: interpret \n
as a new line:
- TODO Matth
1348371: Awaiting Upstream: [mptcp-next,v3] mptcp: adjust mptcp receive
buffer limit if subflow has larger one
1349244: Awaiting Upstream: [mptcp-next] mptcp: only wake parent if
subflow is writable
1349247: Awaiting Upstream: [mptcp-next] mptcp: free acked data before
waiting for more memory
1349374: Awaiting Upstream: [mptcp-next,1/1] mptcp: fix stale
subflow->writeable caching:
- All 4: To add in the export branch
- Maybe we want to squash some of them
- To add: reviewed-by from Mat
1349790: Changes Requested: [mptcp-next] mptcp: keep track of receivers
advertised window:
- Some discussions on-going with Mat and Florian to follow the RFC
- Sounds a bit more complicated than expected
- Need to check whether we often reach the max of the window
1350294: New: [v8,mptcp-next,1/8] mptcp: remove addr and subflow in PM
netlink
1350295: New: [v8,mptcp-next,2/8] mptcp: fix
mptcp_pm_nl_rm_addr_received logic issue
1350299: New: [v8,mptcp-next,3/8] mptcp: implementing
mptcp_pm_remove_subflow
1350300: New: [v8,mptcp-next,4/8] mptcp: fix every subflow's local_id is
zero
1350305: New: [v8,mptcp-next,5/8] mptcp: fix subflow's remote_id is zero
issue
1350312: New: [v8,mptcp-next,6/8] mptcp: add RM_ADDR related mibs
1350317: New: [v8,mptcp-next,7/8] selftests: mptcp: add remove cfg in
mptcp_connect
1350320: New: [v8,mptcp-next,8/8] selftests: mptcp: add remove addr and
subflow test cases:
- A lot of modifications since Paolo's last review
- Geliang would like Paolo to look at it
- Paolo will try to look at it
1352233: New: [v6,mptcp-next,1/4] mptcp: drop re-check mechanism for
PM's flags
1352235: New: [v6,mptcp-next,2/4] mptcp: send out ADD_ADDR with echo flag
1352238: New: [v6,mptcp-next,3/4] mptcp: add ADD_ADDR related mibs
1352240: New: [v6,mptcp-next,4/4] selftests: mptcp: add ADD_ADDR mibs
check function:
- Mat will look at the latest version
1352466: New: [v2] Squash-to: "mptcp: cleanup mptcp_subflow_discard_data()":
- v2 fixing an issue reported by Mat
- Mat will look at the v2
Issues on Github:
https://github.com/multipath-tcp/mptcp_net-next/issues/
Recently opened (latest from last week: 80)
82 improve packet scheduler for asymmetric links [bug] [enhancement]
@pabeni :
- technically it is an enhancement
- but can be seen as a bug :)
- selftests should pass currently but this is needed
81 packetdrill detects mismatching 'B' flag in mp_join_client.pkt
[bug] @dcaratti :
- needed to support multiple concurrent subflows
- we would need to check what we would like to do:
- extend iproute to set the backup flag (per interface? other?)
- other ways?
- the pull request for packetdrill is already available
Bugs (opened, flagged as "bug" and assigned)
82 improve packet scheduler for asymmetric links [bug] [enhancement]
@pabeni
81 packetdrill detects mismatching 'B' flag in mp_join_client.pkt
[bug] @dcaratti
69 Packetdrill: dss: failing with new DATA_FIN patches [bug] @dcaratti :
- a PR has already been created
- another one is coming (fixing comments from the previous PR)
Bugs (opened and flagged as "bug" and not assigned)
73 Simultaneous xmit issue [bug]
71 [interop] Bad mapping: ssn=16194949 map_seq=16120693
map_data_len=31416 [bug] [interop]
70 [syzkaller] WARNING in mptcp_reset_timer [bug] [syzkaller]
67 `./mptcp_connect.sh -m mmap` test blocks [bug] :
- Already fixed by Florian
- TODO Matth: add in export branch + mark as fixed → done
65 clearing properly the status in listen() [bug]
62 [syzkaller] WARNING in __mptcp_move_skbs_from_subflow [bug]
[syzkaller]
56 msk connection state set without msk lock [bug]
*@Paolo* :
- Report the issue discussed at the meeting related to "mptcp: keep
track of receivers advertised window" patch
In Progress (opened and assigned)
55 ADD_ADDR: IPv6 support [enhancement] @geliangtang
50 REMOVE_ADDR support [enhancement] @geliangtang
49 ADD_ADDR: echo bit support [enhancement] @geliangtang :
- see patches above
43 [syzkaller] Change syzkaller to exercise MPTCP inet_diag
interface [enhancement] [syzkaller] @cpaasch :
- /
17 allow non 'backup' subflows creation [enhancement] @pabeni :
- can be closed
- TODO: Matth: done
Recently closed (since last week)
None.
FYI: Current Roadmap:
- Bugs: https://github.com/multipath-tcp/mptcp_net-next/projects/2
- Current merge window (5.10):
https://github.com/multipath-tcp/mptcp_net-next/projects/5
- For later: https://github.com/multipath-tcp/mptcp_net-next/projects/4
Extra tests:
- news about Syzkaller? (Christoph):
- /
- testing the latest export branch
- news about interop with mptcp.org? (Christoph):
- /
- news about Intel's kbuild? (Mat):
- still no new emails
- some progress seems to have been made
- Mat will continue to monitor that and report issue related to
MPTCP if any
- packetdrill (Davide):
- see discussion issue 81
- Davide is seeing the same failure: MP_JOIN, both client and
server
- TODO: Matth: check if there are issues with the CI when mmap
test is fixed
- @Davide: It seems all packetdrill tests are having issues
(timeout)
- CI (Matth):
- blocked because of the timeout with mmap
- should be fixed soon thx to Florian's patch
Netdev:
- feedback:
- lots of good questions
- positive feedback!
- some questions about alternatives and other topics, but in general it
went well
- when video, slides, others are available, Mat will share them.
Last week of Nicolas as an Intern at Tessares:
- Working on a v2 for the patch sent upstream
- Had fun with BPF!
- Clearer view on what we would do next with BPF.
Poll issue:
- Paolo is looking at fixing that
- quite complex
- makes selftests green again :)
multiple xmits:
- Paolo would like to send it "soon" to get feedback sooner
Data_fin:
- Mat is continuing to look at comments that were recently reported
and linked to DATA_FIN
Next meeting:
- We propose to have the next meeting on Thursday, the 3rd of
September.
- Usual time: 15:00 UTC (8am PDT, 5pm CEST, 11pm UTC+8)
- Still open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20200903
Feel free to comment on these points and propose new ones for the next
meeting!
Talk to you next week,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
[PATCH mptcp-next 1/1] mptcp: fix stale subflow->writeable caching
by Florian Westphal
Even with fixed subflow_write_space(), we may still indicate EPOLLOUT
even if no subflow is writeable.
We need to re-check the last ssk that was used after the final tcp_push().
tcp_push() can allocate new skbs that get charged to the subflow sk
which may bring it above the wmem limit.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
net/mptcp/protocol.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ad91f4588216..4dd5d35a8f39 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1239,7 +1239,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
msk->snd_burst -= ret;
copied += ret;
- if (!sk_stream_memory_free(ssk))
+ if (!sk_stream_is_writeable(ssk))
WRITE_ONCE(subflow->writable, false);
tx_ok = msg_data_left(msg);
@@ -1294,6 +1294,9 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
/* start the timer, if it's not pending */
if (!mptcp_timer_pending(sk))
mptcp_reset_timer(sk);
+
+ if (!sk_stream_is_writeable(ssk))
+ WRITE_ONCE(mptcp_subflow_ctx(ssk)->writable, false);
}
release_sock(ssk);
--
2.26.2
[PATCH mptcp-next] mptcp: free acked data before waiting for more memory
by Florian Westphal
After subflow lock is dropped, more wmem might have been made available.
This fixes a deadlock in mptcp_connect.sh 'mmap' mode: wmem is exhausted.
But as the mptcp socket holds on to already-acked data (for retransmit)
no wakeup will occur.
Using 'goto restart' calls mptcp_clean_una(sk) which will free pages
that have been acked completely in the meantime.
Fixes: fb529e62d3f3 ("mptcp: break and restart in case mptcp sndbuf is full")
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
I suspect Paolo's recent sndbuf tracking changes made this bug trigger more
frequently. AFAICS net-next is just luckier, so it might make sense to
pass this via net-next. Let me know.
net/mptcp/protocol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index bd02c568c4b9..ad91f4588216 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1178,7 +1178,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
goto out;
}
-wait_for_sndbuf:
__mptcp_flush_join_list(msk);
ssk = mptcp_subflow_get_send(msk, &sndbuf);
while (!sk_stream_memory_free(sk) ||
@@ -1282,7 +1281,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
*/
mptcp_set_timeout(sk, ssk);
release_sock(ssk);
- goto wait_for_sndbuf;
+ goto restart;
}
}
}
--
2.26.2
[PATCH mptcp-next] mptcp: only wake parent if subflow is writable
by Florian Westphal
mptcp_connect.sh -d 10 -l 0 -r 0 -e "" -f $((100 * 1024 * 1024)) -m poll
will result in frequent write errors:
ns1 MPTCP -> ns3 (10.0.2.2:10010 ) MPTCP write: Resource temporarily unavailable
We promised write() would not block but then no subflow was ready
to accept more data.
Changing the 'free' check to 'writeable' (which checks the available
space is above min thresh) makes those errors go away.
Note that there are still poll timeouts:
ns1 MPTCP -> ns2 (10.0.1.2:10006 ) MPTCP copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
copyfd_io_poll: poll timed out (events: POLLIN 0, POLLOUT 4)
These appear unrelated; they also occur without this change.
Signed-off-by: Florian Westphal <fw(a)strlen.de>
---
net/mptcp/subflow.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 82f6d2d9e39e..3585fd63c757 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -972,7 +972,7 @@ static void subflow_write_space(struct sock *sk)
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct sock *parent = subflow->conn;
- if (!sk_stream_memory_free(sk))
+ if (!sk_stream_is_writeable(sk))
return;
WRITE_ONCE(subflow->writable, true);
--
2.26.2
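The difference between the 'free' and 'writeable' checks mentioned in the commit message can be shown with a simplified user-space model. This is a sketch under assumptions, not the kernel helpers: struct demo_sock and both functions are hypothetical, 'free' is modelled as "queued bytes are below the send buffer limit", and 'writeable' additionally requires the free space to be at least half of the queued bytes. Protocol-specific hooks are ignored.

```c
#include <stdbool.h>

/* Hypothetical socket model holding only the two counters that matter. */
struct demo_sock {
	int sndbuf;       /* send buffer limit, in bytes */
	int wmem_queued;  /* bytes currently queued for transmit */
};

/* "free": there is any room at all below the limit. */
static bool demo_memory_free(const struct demo_sock *sk)
{
	return sk->wmem_queued < sk->sndbuf;
}

/* "writeable": the free space also clears a minimum threshold, modelled
 * here as half of the bytes already queued.
 */
static bool demo_is_writeable(const struct demo_sock *sk)
{
	int wspace = sk->sndbuf - sk->wmem_queued;
	int min_wspace = sk->wmem_queued >> 1;

	return wspace >= min_wspace && demo_memory_free(sk);
}
```

With sndbuf = 100 and wmem_queued = 90, the model reports the socket as free but not writeable. Waking the parent in that state promises a non-blocking write that a subflow may then be unable to accept, which matches the "Resource temporarily unavailable" errors quoted above.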