Userspace Path Manager Draft
by Stephen Brennan
Hello everyone,
I did not sit in on the meeting since I am feeling sick, so I haven't had a
chance to see whether this is actually a useful contribution. However, over
the past week I've been tinkering with the userspace Generic Netlink
libraries, getting a prototype of what a userspace path manager framework
could look like.
My current idea is built so that it could support multiple path managers,
implemented as shared libraries, which are dynamically loaded at runtime.
There would need to be some policy or logic to decide which connections are
assigned to which path manager; I haven't fleshed out what that looks like.
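
As a rough illustration of what such a policy could look like, here is a minimal sketch in Python. It is not taken from pathmand, which loads its path managers as shared libraries; every name below (Connection, PathManager, Registry, the port-based rule) is hypothetical. It assumes a simple "first matching rule wins" policy with a default path manager as fallback.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Connection:
    """Minimal description of an MPTCP connection (hypothetical fields)."""
    conn_id: int
    remote_addr: str
    remote_port: int


class PathManager:
    """Stand-in for one dynamically loaded path manager plugin."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connections: List[int] = []

    def on_new_connection(self, conn: Connection) -> None:
        # Remember which connections this path manager is responsible for.
        self.connections.append(conn.conn_id)


# A rule inspects a connection and says whether its path manager applies.
Rule = Callable[[Connection], bool]


class Registry:
    """Ordered list of (rule, path manager); the first matching rule wins."""

    def __init__(self, default: PathManager) -> None:
        self.default = default
        self.rules: List[Tuple[Rule, PathManager]] = []

    def register(self, rule: Rule, pm: PathManager) -> None:
        self.rules.append((rule, pm))

    def dispatch(self, conn: Connection) -> PathManager:
        """Assign a new connection to a path manager and notify it."""
        for rule, pm in self.rules:
            if rule(conn):
                pm.on_new_connection(conn)
                return pm
        self.default.on_new_connection(conn)
        return self.default
```

For example, registering `lambda c: c.remote_port == 443` with one path manager sends HTTPS connections to it and everything else to the default.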
Without an exact API reference, I simply based the API on what Ossama
proposed last week. To test, I created a dummy kernel path manager which
sends empty Netlink messages. It seems to work so far.
I don't know if those working on the Netlink API have been developing
userspace tooling in parallel, but if not, maybe this will be useful. You
can find it here:
https://github.com/brenns10/pathmand
Hope this is useful!
Stephen
[RFC 0/1] MPTCP patch requested by Peter Krystad
by rao.shoaib@oracle.com
From: Rao Shoaib <rao.shoaib(a)oracle.com>
This patch has been created at the request of Peter Krystad. This is not a submission, nor code review. The code does not meet Linux coding style standards. It does work and has been tested against latest MPTCP. This is based on mptcp_v0.91.
I will be happy to answer any questions. Please report any issues.
I do have plans to clean it up and make more modifications.
Rao Shoaib (1):
MPTCP changes that work with the modified network code
include/net/mptcp.h | 343 +++++++--------
include/net/mptcp_v4.h | 14 +-
include/net/mptcp_v6.h | 13 -
net/mptcp/mptcp_ctrl.c | 600 +++++++++++++++++++-------
net/mptcp/mptcp_fullmesh.c | 20 +-
net/mptcp/mptcp_input.c | 996 ++++++++++++++++++++++++++++++++------------
net/mptcp/mptcp_ipv4.c | 232 ++---------
net/mptcp/mptcp_ipv6.c | 288 ++-----------
net/mptcp/mptcp_ofo_queue.c | 306 +++++---------
net/mptcp/mptcp_output.c | 475 +++++++++++----------
net/mptcp/mptcp_redundant.c | 6 +-
net/mptcp/mptcp_rr.c | 4 +-
net/mptcp/mptcp_sched.c | 55 ++-
13 files changed, 1793 insertions(+), 1559 deletions(-)
--
2.7.4
[RFC 0/1] MPTCP Changes
by rao.shoaib@oracle.com
From: Rao Shoaib <rao.shoaib(a)oracle.com>
This patch includes MPTCP code that works in conjunction with the previously posted TCP patches.
This is a prototype. IPv4, IPv6 and join have been tested. Join was tested using ndiffports. Join to a non-listener socket is currently not supported.
The original code had a lot of style issues; some have been fixed, but quite a few remain.
Rao Shoaib (1):
Add MPTCP code to work with the modified TCP code
include/net/mptcp.h | 1474 +++++++++++++++++++++
include/net/mptcp_v4.h | 56 +
include/net/mptcp_v6.h | 56 +
include/net/netns/mptcp.h | 52 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/mptcp/Kconfig | 129 ++
net/mptcp/Makefile | 22 +
net/mptcp/mptcp_balia.c | 267 ++++
net/mptcp/mptcp_binder.c | 487 +++++++
net/mptcp/mptcp_coupled.c | 270 ++++
net/mptcp/mptcp_ctrl.c | 2981 ++++++++++++++++++++++++++++++++++++++++++
net/mptcp/mptcp_fullmesh.c | 1877 ++++++++++++++++++++++++++
net/mptcp/mptcp_input.c | 2970 +++++++++++++++++++++++++++++++++++++++++
net/mptcp/mptcp_ipv4.c | 345 +++++
net/mptcp/mptcp_ipv6.c | 316 +++++
net/mptcp/mptcp_ndiffports.c | 169 +++
net/mptcp/mptcp_ofo_queue.c | 177 +++
net/mptcp/mptcp_olia.c | 309 +++++
net/mptcp/mptcp_output.c | 1837 ++++++++++++++++++++++++++
net/mptcp/mptcp_pm.c | 178 +++
net/mptcp/mptcp_redundant.c | 268 ++++
net/mptcp/mptcp_rr.c | 301 +++++
net/mptcp/mptcp_sched.c | 597 +++++++++
net/mptcp/mptcp_wvegas.c | 268 ++++
25 files changed, 15408 insertions(+)
create mode 100644 include/net/mptcp.h
create mode 100644 include/net/mptcp_v4.h
create mode 100644 include/net/mptcp_v6.h
create mode 100644 include/net/netns/mptcp.h
create mode 100644 net/mptcp/Kconfig
create mode 100644 net/mptcp/Makefile
create mode 100644 net/mptcp/mptcp_balia.c
create mode 100644 net/mptcp/mptcp_binder.c
create mode 100644 net/mptcp/mptcp_coupled.c
create mode 100644 net/mptcp/mptcp_ctrl.c
create mode 100644 net/mptcp/mptcp_fullmesh.c
create mode 100644 net/mptcp/mptcp_input.c
create mode 100644 net/mptcp/mptcp_ipv4.c
create mode 100644 net/mptcp/mptcp_ipv6.c
create mode 100644 net/mptcp/mptcp_ndiffports.c
create mode 100644 net/mptcp/mptcp_ofo_queue.c
create mode 100644 net/mptcp/mptcp_olia.c
create mode 100644 net/mptcp/mptcp_output.c
create mode 100644 net/mptcp/mptcp_pm.c
create mode 100644 net/mptcp/mptcp_redundant.c
create mode 100644 net/mptcp/mptcp_rr.c
create mode 100644 net/mptcp/mptcp_sched.c
create mode 100644 net/mptcp/mptcp_wvegas.c
--
2.7.4
[Weekly meetings] MoM - 22nd of March 2018
by Matthieu Baerts
Hello,
We just had our third meeting with Mat, Ossama and Peter (Intel OTC),
Christoph (Apple) and myself (Tessares).
Thanks again for this new good meeting!
Here are the minutes of the meeting:
Work from last 7 days:
- Mat: pretty close to finishing his work to get the DSS ACK from the
SKB. Not fully functional but ready to be tested. We can now parse what
is coming back. To have something working with MPTCP and not having a
fallback to TCP, we need to at least read the DSS options from the "3rd"
ACK. Work was based on net-next. Will first need to have a rebase on
latest net-next before sharing code. Will be ready to be tested "soon".
- Christoph is looking at the lock-less part, patches should arrive
very soon.
- Matthieu: should send patches for the Netlink PM very soon.
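
For reference, reading the Data ACK out of a DSS option could look roughly like the sketch below. It follows the DSS layout from RFC 6824 (option kind 30, subtype 2, flags "A" and "a" in the fourth octet). It works on raw option bytes in userspace Python, not on an SKB, and the function name is made up for illustration; it is not the code Mat is working on.

```python
import struct
from typing import Optional

TCPOPT_MPTCP = 30      # MPTCP option kind (RFC 6824)
MPTCP_SUB_DSS = 0x2    # DSS subtype
DSS_FLAG_A = 0x01      # "A": Data ACK present
DSS_FLAG_a = 0x02      # "a": Data ACK is 8 octets instead of 4


def parse_dss_data_ack(opt: bytes) -> Optional[int]:
    """Return the Data ACK from one DSS option, or None if absent/invalid."""
    # Octet 0 is the kind, octet 1 the total option length.
    if len(opt) < 4 or opt[0] != TCPOPT_MPTCP or opt[1] != len(opt):
        return None
    # The subtype lives in the upper four bits of octet 2.
    if opt[2] >> 4 != MPTCP_SUB_DSS:
        return None
    flags = opt[3]
    if not flags & DSS_FLAG_A:
        return None
    size = 8 if flags & DSS_FLAG_a else 4
    if len(opt) < 4 + size:
        return None
    # The Data ACK immediately follows the 4-octet option header.
    fmt = "!Q" if size == 8 else "!I"
    return struct.unpack_from(fmt, opt, 4)[0]
```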
Indirect function calls:
- we didn't get new input after asking around.
- message from the maintainer: we cannot simply remove them,
sometimes needed (e.g. for the drivers)
- maybe not the best time to introduce new ones (except in RFC).
- we should certainly reduce their usage to only when it is really
needed (seems what DaveM said)
Patches that could be sent to netdev:
- we need to design them for the maintainers: if they are too big
with not so many explanations, the maintainers may not take the time to
go over them in detail. These "details" seem very important for them.
Netdevconf:
- there was a talk in Ottawa (Netdev 0.1)
- what was presented there is still valid at the end.
- maybe tutorial (proposed by Jamal): at least to convince other
devs that there is a very big interest. Then: "we are interested in
upstreaming that, any recommendations from you guys about (...)?"
- please propose ideas on ML. We can discuss them next week.
- We should propose something and discuss upstreaming MPTCP
there and get feedback, advice, ideas, more people interested in that
or in the issues we have in common with other projects, etc.
- Deadline: Proposals must be received on or before May 1, 2018
- Need to check with managers if we can participate in both Netdev
and Linux Plumbers conf.
kTLS:
- https://patchwork.ozlabs.org/patch/888002/
- what they are doing is very similar to what we need to do with
the DSS.
- they are maintaining a look-up table not to affect SKB size.
- callback is used to know when the sent data got acknowledged, so
that they can then purge elements out of the lookup-table
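
The same lookup-table idea, transposed to DSS mappings, might be sketched as follows. This is only a userspace Python model with invented names (MappingTable, on_ack); it is not the kTLS code, and it ignores sequence-number wraparound for simplicity.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class MappingTable:
    """Side table of per-record metadata keyed by end sequence number.

    Keeping the metadata here instead of in each packet buffer means the
    buffer structure itself does not need to grow.
    """

    def __init__(self) -> None:
        # end_seq -> (start_seq, metadata), in the order data was sent
        self._records: "OrderedDict[int, Tuple[int, str]]" = OrderedDict()

    def add(self, start_seq: int, length: int, metadata: str) -> None:
        self._records[start_seq + length] = (start_seq, metadata)

    def lookup(self, seq: int) -> Optional[str]:
        """Find the metadata of the record covering sequence number seq."""
        for end_seq, (start_seq, metadata) in self._records.items():
            if start_seq <= seq < end_seq:
                return metadata
        return None

    def on_ack(self, ack_seq: int) -> List[str]:
        """ACK callback: purge every record fully covered by ack_seq."""
        purged = []
        while self._records:
            end_seq, (_, metadata) = next(iter(self._records.items()))
            if end_seq > ack_seq:
                break  # oldest remaining record is not fully acknowledged
            self._records.popitem(last=False)
            purged.append(metadata)
        return purged

    def __len__(self) -> int:
        return len(self._records)
```

A record is only dropped once the acknowledgment covers its last byte, so a partial ACK leaves the entry in place for a possible retransmission.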
Next steps:
- Mat will try to write an email on the ML to summarize the next
steps and what needs to be done (from what has already been discussed
previously). Note that it is still difficult to have a roadmap because
we don't know exactly what the maintainers want; the list can change a
lot depending on their feedback. Mat will also test his new version
regarding the problem with SKB size.
- Christoph: will continue to polish patches (lock-less) and share
them ASAP.
- Matthieu: publish Netlink PM patches and fix issues.
- Peter will continue the rebase, share it soon. Will also have a
look at patches posted on ML.
- Ossama: will also continue the rebase. Will be off next week.
Happy holidays!
Next meeting:
- proposition: the 29th of March at 16:00 UTC (9am PDT,
6pm CEST)
- open to everyone!
- https://annuel2.framapad.org/p/mptcp_upstreaming_20180329
Feel free to comment on these points and propose new ones for the next meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
--
------------------------------
DISCLAIMER.
This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system manager.
This message contains confidential information and is intended only for the
individual named. If you are not the named addressee you should not
disseminate, distribute or copy this e-mail. Please notify the sender
immediately by e-mail if you have received this e-mail by mistake and
delete this e-mail from your system. If you are not the intended recipient
you are notified that disclosing, copying, distributing or taking any
action in reliance on the contents of this information is strictly
prohibited.
Weekly meeting - 22 March 2018 16:00 UTC (9am PDT, 5pm CET)
by Matthieu Baerts
Hello,
Our MPTCP upstreaming weekly web conference is scheduled for tomorrow.
The list of topics is here:
https://annuel2.framapad.org/p/mptcp_upstreaming_20180322
Feel free to add/modify topics!
Meeting link: https://talky.io/mptcp_upstreaming
Speak to you tomorrow!
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
[Weekly meetings] MoM - 15th of March 2018
by Matthieu Baerts
Hello Everyone,
We just had our second meeting with Mat, Ossama and Peter (Intel OTC),
Christoph (Apple), Stephen Brennan and myself (Tessares).
Thanks again for this good meeting!
Here are the minutes of the meeting:
Feedback about the "Concrete next steps" from last time:
- Christoph: lock-less: first attempt which seems to work but saw
issues with subflow establishment on server-side. @Christoph: feel free
to add more details if needed, it was difficult for me to write the
details :-)
- Mat: Peter is rebasing/cleaning patches. We hope he will be able
to comment on that.
- Mat and Ossama will continue to get more advice from colleagues
about upstreaming stuff: create a workshop at Linux Plumbers conf? meet
these people not alone? presentation at netdevconf?
MPTCP subflow establishment on server-side from Christoph. But:
1. Need to parse TCP-options to demultiplex to the meta-socket;
2. MPTCP-code deep in tcp_check_req,... explore kernel_accept()
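
To make point 1 concrete, the sketch below walks the TCP options of an incoming SYN, extracts the receiver's token from an MP_JOIN option and uses it to find the meta-socket. The MP_JOIN layout (kind 30, subtype 1, 12 octets in a SYN, token at offset 4) is from RFC 6824. Everything else is an assumption for illustration: the function names are invented, and a plain dict stands in for the kernel's token hash table.

```python
import struct
from typing import Dict, Iterator, Optional, Tuple

TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_MPTCP = 30        # MPTCP option kind (RFC 6824)
MPTCP_SUB_JOIN = 0x1     # MP_JOIN subtype
MP_JOIN_SYN_LEN = 12     # length of MP_JOIN as carried in a SYN


def iter_tcp_options(raw: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, whole_option_bytes) for each option in a TCP header."""
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == TCPOPT_EOL:
            return
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(raw):
            return                      # truncated option
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            return                      # malformed length
        yield kind, raw[i:i + length]
        i += length


def mp_join_token(raw_options: bytes) -> Optional[int]:
    """Return the receiver's token from an MP_JOIN SYN, if present."""
    for kind, opt in iter_tcp_options(raw_options):
        if (kind == TCPOPT_MPTCP and len(opt) == MP_JOIN_SYN_LEN
                and opt[2] >> 4 == MPTCP_SUB_JOIN):
            # Octets 4..7 carry the receiver's token.
            return struct.unpack_from("!I", opt, 4)[0]
    return None


def demux_to_meta(raw_options: bytes, metas: Dict[int, str]) -> Optional[str]:
    """Map a SYN's options to a meta-socket (here just a label)."""
    token = mp_join_token(raw_options)
    return None if token is None else metas.get(token)
```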
Netlink API:
- about the proposition sent by Ossama on the ML: one question
about the SYN+ACK only sent after having asked the userspace if it can
establish the subflow:
can be costly, maybe not good for a first version (we can still
reject the new subflow later with a reset). Maybe better to do that with
bpf, it is quite advanced and more about policing.
- Matthieu: I will continue to clean/rebase patches but I was very
busy this week. Will try to work more on that next week. It will help
for the next discussions.
Rao's patches and feedback from upstream:
- please read the dedicated topic on the ML
Scheduler mailing list thread:
- we can simplify versions: send a first version without
"smart/complex" stuff (in the scheduler, pm, etc.). That will ease the
job of the reviewers.
https://www.mail-archive.com/netdev@vger.kernel.org/msg221741.html →
indirect function calls are expensive:
- It seems difficult to avoid them
- instead of indirect function calls, we can have static branches.
Idea: we could also "reduce the jump" → not having an indirect function
call but jump "closer", not trivial, even just to explain :)
- upstream devs didn't propose alternatives. It also means they are
not forbidden!
- no measurement on the cost has been done, we don't know what's
the impact exactly. But we don't have better "solutions". It is a
generic issue in TCP. We guess more ideas will also come from upstream
devs or proposed on netdev.
- even if they are allowed, we will certainly need to avoid abusing them.
Next concrete steps:
- Christoph will continue to work on the lock-less part, bug fixing.
- Peter continues the rebasing stuff, getting closer
- Mat is still working on the receive part of the SKB to avoid
cache misses every time (goal: process the DSS option and generate the
correct ACK)
- Matthieu will continue to rebase patches and publish them ASAP.
- Mat will try to write an email on the ML to summarize the next
steps and what needs to be done (from what has already been discussed
previously). Note that it is still difficult to have a roadmap because
we don't know exactly what the maintainers want; the list can change a
lot depending on their feedback. Note that it seems the next occasion to
meet them would be at the next netdevconf, still no date about that,
maybe in June?
Next meeting:
a proposition to have it next Thursday, same time. Christoph will
be at the IETF in London. He will certainly not be able to join.
Feel free to comment on these points and propose new ones for the next meeting!
Talk to you next week,
Matthieu
--
Matthieu Baerts | R&D Engineer
matthieu.baerts(a)tessares.net
Tessares SA | Hybrid Access Solutions
www.tessares.net
1 Avenue Jean Monnet, 1348 Louvain-la-Neuve, Belgium
[RFC] MPTCP Path Management Generic Netlink API
by Othman, Ossama
Hi,
Following up on a brief exchange between Matthieu and Mat regarding
an MPTCP path manager netlink API, I'd like to share our own
proposed generic netlink API developed in parallel.
Please find the high level description below. It'll be great to
compare the two netlink based APIs to determine if either can be
improved by leveraging different aspects from each one.
Thanks!
--
Ossama Othman
Intel OTC
==============================================
RFC: MPTCP Path Management Generic Netlink API
==============================================
A generic netlink socket is used to facilitate communication between
the kernel and a user space daemon that handles MPTCP path management
related operations, from here on in called the path manager. Several
multicast groups, attributes and operations are exposed by the "mptcp"
generic netlink family, e.g.:
    $ genl ctrl list
    ...
    Name: mptcp
            ID: 0x1d  Version: 0x1  header size: 0  max attribs: 7
            commands supported:
                    #1:  ID-0x0
                    #2:  ID-0x1
                    #3:  ID-0x2
                    #4:  ID-0x3
                    #5:  ID-0x4
            multicast groups:
                    #1:  ID-0xa  name: new_connection
                    #2:  ID-0xb  name: new_addr
                    #3:  ID-0xc  name: join_attempt
                    #4:  ID-0xd  name: new_subflow
                    #5:  ID-0xe  name: subflow_closed
                    #6:  ID-0xf  name: conn_closed
Each of the multicast groups corresponds to MPTCP path manager events
supported by the kernel MPTCP stack.
Kernel Initiated Events
-----------------------
* new_connection
    * Called upon completion of a new MPTCP-capable connection.
      Information for the initial subflow is made available to the
      path manager.
    * Payload
        * Connection ID (globally unique for host)
        * Local address
        * Local port
        * Remote address
        * Remote port
        * Priority
* new_addr
    * Triggered when the host receives an ADD_ADDR MPTCP option, i.e. a
      new address is advertised by the remote side.
    * Payload
        * Connection ID
        * Remote address ID
        * Remote address
        * Remote port
* join_attempt
    * Called when an MP_JOIN has been ACKed. The path manager is
      expected to respond with an allow_join event containing its
      decision based on the configured policy.
    * Payload
        * Connection ID
        * Local address ID
        * Local address
        * Local port
        * Remote address ID
        * Remote address
        * Remote port
* new_subflow
    * Called when the final MP_JOIN ACK has been ACKed.
    * Payload
        * Connection ID
        * Subflow ID
* subflow_closed
    * Called when a subflow has been closed. Allows the path manager
      to clean up subflow related resources.
    * Payload
        * Connection ID
        * Subflow ID
* conn_closed
    * Called when an MPTCP connection as a whole, as opposed to a
      single subflow, has been closed. This is the case when close(2)
      has been called on an MPTCP connection.
    * Payload
        * Connection ID
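
On the path manager side, the events above could be modeled as plain records handed to handlers by multicast group name. The sketch below is one possible shape in Python. The field names are derived from the payload lists above, but the classes, the dispatcher and its methods are hypothetical; decoding real generic netlink messages is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NewConnection:
    conn_id: int
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int
    priority: int


@dataclass(frozen=True)
class NewAddr:
    conn_id: int
    remote_addr_id: int
    remote_addr: str
    remote_port: int


@dataclass(frozen=True)
class SubflowClosed:
    conn_id: int
    subflow_id: int


class EventDispatcher:
    """Routes decoded events to handlers keyed by multicast group name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[object], None]]] = {}

    def subscribe(self, group: str, handler: Callable[[object], None]) -> None:
        self._handlers.setdefault(group, []).append(handler)

    def deliver(self, group: str, event: object) -> int:
        """Call every handler for `group`; return how many were called."""
        handlers = self._handlers.get(group, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

A path manager that only cares about advertised addresses would subscribe to "new_addr" alone and never see the other groups.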
Path Manager Initiated Events (Commands)
----------------------------------------
* send_addr
    * Notify the kernel of the availability of a new address for use
      in MPTCP connections. Triggers an ADD_ADDR to be sent to the
      peer.
    * Payload
        * Connection ID
        * Address ID
        * Local address
        * Local port (optional, use same port as initial subflow if not
          specified)
* add_subflow
    * Add a new subflow to the MPTCP connection. This triggers an
      MP_JOIN to be sent to the peer.
    * Payload
        * Connection ID
        * Local address ID
        * Local address (optional, required if send_addr not previously
          sent to establish the local address ID)
        * Local port (optional, use same port as initial subflow if not
          specified)
        * Remote address ID (e.g. from a previously received new_addr or
          join_attempt event)
        * Backup priority flag (optional, use default priority if not
          specified)
        * Subflow ID
* allow_join
    * Allow MP_JOIN attempt from peer.
    * Payload
        * Connection ID
        * Remote address ID (e.g. from a previously received
          join_attempt event)
        * Local address
        * Local port
        * Allow indication (optional, do not allow join if not
          specified)
        * Backup priority flag (optional, use default priority if not
          specified)
        * Subflow ID
* set_backup
    * Set subflow priority to backup priority.
    * Payload
        * Connection ID
        * Subflow ID
        * Backup priority flag (optional, use default priority if not
          specified)
* remove_subflow
    * Triggers a REMOVE_ADDR MPTCP option to be sent, ultimately
      resulting in subflows routed through that invalidated address
      being closed.
    * Payload
        * Connection ID
        * Subflow ID
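
The optional attributes of add_subflow imply some defaulting logic on the receiving side. The sketch below shows one way that resolution could work, in Python. The names (AddSubflow, resolve_add_subflow, the `announced` table) are hypothetical, and treating "default priority" as "not backup" is an assumption, since the text above does not say what the default is.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AddSubflow:
    conn_id: int
    local_addr_id: int
    remote_addr_id: int
    subflow_id: int
    local_addr: Optional[str] = None   # optional if the ID was announced
    local_port: Optional[int] = None   # optional: initial subflow's port
    backup: Optional[bool] = None      # optional: default priority


def resolve_add_subflow(
    cmd: AddSubflow,
    announced: Dict[Tuple[int, int], str],
    initial_port: int,
    default_backup: bool = False,      # assumption: default is "not backup"
) -> Tuple[str, int, bool]:
    """Fill in the optional attributes as described in the list above.

    `announced` maps (connection ID, address ID) to the local address
    established by an earlier send_addr command.
    """
    local_addr = cmd.local_addr
    if local_addr is None:
        key = (cmd.conn_id, cmd.local_addr_id)
        if key not in announced:
            raise ValueError("local address required: ID not set by send_addr")
        local_addr = announced[key]
    local_port = initial_port if cmd.local_port is None else cmd.local_port
    backup = default_backup if cmd.backup is None else cmd.backup
    return local_addr, local_port, backup
```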
Security
--------
For security reasons, path management operations may only be performed
by privileged processes due to the GENL_ADMIN_PERM generic netlink
flag being set. In particular, access to the MPTCP generic netlink
interface will require CAP_NET_ADMIN privileges.
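
A userspace path manager may want to check for this privilege up front rather than discover it through a failed netlink request. A minimal sketch in Python, assuming a Linux /proc filesystem: CAP_NET_ADMIN is capability number 12 in <linux/capability.h>, and the effective set is the hexadecimal "CapEff" mask in /proc/self/status. The helper names are invented for illustration.

```python
CAP_NET_ADMIN = 12  # capability number from <linux/capability.h>


def has_capability(cap_eff_hex: str, cap: int) -> bool:
    """Test one bit in a CapEff mask as printed in /proc/<pid>/status."""
    return bool(int(cap_eff_hex, 16) & (1 << cap))


def current_process_has_net_admin(status_path: str = "/proc/self/status") -> bool:
    """Read this process's effective capability mask (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return has_capability(line.split()[1], CAP_NET_ADMIN)
    return False
```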