Author: Yizhou Zhao <[email protected]> Date: Thu May 28 13:39:16 2026 +0800 9p: avoid putting oldfid in p9_client_walk() error path commit 1a3860d46e3eb47dbd60339783cdad7904486b9f upstream. When p9_client_walk() is called with clone set to false, fid aliases oldfid. If the walk subsequently fails after the request has been sent, the error path jumps to clunk_fid, which currently calls p9_fid_put(fid) unconditionally. This drops a reference to oldfid even though ownership of oldfid remains with the caller. If this is the last reference, oldfid can be clunked and destroyed while the caller still expects it to be valid. A later use or put of oldfid can then trigger a use-after-free or refcount underflow. Fix this by only putting fid in the clunk_fid error path when it does not alias oldfid, matching the existing guard in the error path below. This can be triggered when a multi-component walk is split into multiple p9_client_walk() calls and a later non-cloning walk fails. A reproducer and refcount warning logs are available on request. Fixes: b48dbb998d70 ("9p fid refcount: add p9_fid_get/put wrappers") Cc: [email protected] Reported-by: Yuxiang Yang <[email protected]> Reported-by: Ao Wang <[email protected]> Reported-by: Xuewei Feng <[email protected]> Reported-by: Qi Li <[email protected]> Reported-by: Ke Xu <[email protected]> Assisted-by: GLM 5.1 Signed-off-by: Yizhou Zhao <[email protected]> Message-ID: <[email protected]> Signed-off-by: Dominique Martinet <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yicong Yang <[email protected]> Date: Wed Jun 24 14:38:16 2026 +0800 ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn() [ Upstream commit 7cf28b3797a81b616bb7eb3e90cf131afc452919 ] The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a system workqueue which is not guaranteed to be finished before entering userspace. This may cause some key devices to be missing when userspace init task tries to find them. Two issues observed on RISCV platforms: - Kernel panic due to userspace init cannot have an opened console. The console device scanning is queued by acpi_scan_clear_dep_queue() and not finished by the time userspace init process running, thus by the time userspace init runs, no console is present. - Entering rescue shell due to the lack of root devices (PCIe nvme in our case). Same reason as above, the PCIe host bridge scanning is queued on a system workqueue and finished after init process runs. The reason is because both devices (console, PCIe host bridge) depend on riscv-aplic irqchip to serve their interrupts (console's wired interrupt and PCI's INTx interrupts). In order to keep the dependency, these devices are scanned and created after initializing riscv-aplic. The riscv-aplic is initialized in device_initcall() and a device scan work is queued via acpi_scan_clear_dep_queue(), which is close to the time userspace init process is run. Since system_dfl_wq is used in acpi_scan_clear_dep_queue() with no synchronization, the issues will happen if userspace init runs before these devices are ready. The solution is to wait for the queued work to complete before entering userspace init. One possible way would be to use a dedicated workqueue instead of system_dfl_wq, and explicitly flush it somewhere in the initcall stage before entering userspace. Another way is to use async_schedule_dev_nocall() for scanning these devices. It's designed for asynchronous initialization and will work in the same way as before because it's using a dedicated unbound workqueue as well, but the kernel init code calls async_synchronize_full() right before entering userspace init which will wait for the work to complete. Compared to a dedicated workqueue, the second approach is simpler because the async schedule framework takes care of all of the details. The ACPI code only needs to focus on its job. A dedicated workqueue for this could also be redundant because some platforms don't need acpi_scan_clear_dep_queue() for their device scanning. Signed-off-by: Yicong Yang <[email protected]> [ rjw: Subject adjustment, changelog edits ] Link: https://patch.msgid.link/[email protected] Signed-off-by: Rafael J. Wysocki <[email protected]> [ Vivian: Adjust system_dfl_wq -> system_unbound_wq in removed lines ] Signed-off-by: Vivian Wang <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Kuniyuki Iwashima <[email protected]> Date: Wed Jul 1 09:53:06 2026 +0300 af_unix: Set gc_in_progress to true in unix_gc(). [ Upstream commit d82ba05263c69fa2437fe93e4e561cc40f4c03af ] Igor Ushakov reported that unix_gc() could run with gc_in_progress being false if the work is scheduled while running: Thread 1 Thread 2 Thread 3 -------- -------- -------- unix_schedule_gc() unix_schedule_gc() `- if (!gc_in_progress) `- if (!gc_in_progress) |- gc_in_progress = true | `- queue_work() | unix_gc() <----------------/ | | |- gc_in_progress = true ... `- queue_work() | | `- gc_in_progress = false | | unix_gc() <---------------------------------------------' | ... /* gc_in_progress == false */ | `- gc_in_progress = false unix_peek_fpl() relies on gc_in_progress not to confuse GC by MSG_PEEK. Let's set gc_in_progress to true in unix_gc(). Fixes: 8b90a9f819dc ("af_unix: Run GC on only one CPU.") Reported-by: Igor Ushakov <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> [ Add setting gc_in_progress in __unix_gc(). Keep the existing set in unix_gc() for wait_for_unix_gc() over-limit throttling. ] Signed-off-by: Igor Ushakov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Mingyu Wang <[email protected]> Date: Mon May 4 15:48:23 2026 +0800 agp/amd64: Fix broken error propagation in agp_amd64_probe() commit b08472db93b1ccff84a7adec5779d47f0e9d3a30 upstream. A NULL pointer dereference was observed in the AMD64 AGP driver when running in a virtualized environment (e.g. qemu/kvm) without a physical AMD northbridge. The crash occurs in amd64_fetch_size() when attempting to dereference the pointer returned by node_to_amd_nb(0). The root cause of this crash is broken error propagation in agp_amd64_probe(): When no AMD northbridges are found, cache_nbs() correctly returns -ENODEV. However, the probe function erroneously checks the return value against exactly -1, rather than < 0. As a result, the hardware absence error is masked, allowing the driver to improperly proceed with initialization. It eventually calls agp_add_bridge(), which invokes amd64_fetch_size(). Since the hardware does not exist, node_to_amd_nb(0) returns NULL, leading to a General Protection Fault (GPF) when accessing its ->misc member. Fix the issue by correcting the error check in agp_amd64_probe() to abort properly when cache_nbs() returns any negative error code. This prevents the driver from erroneously proceeding without hardware, thereby avoiding the subsequent NULL pointer dereference at its source. Fixes: a32073bffc65 ("[PATCH] x86_64: Clean and enhance up K8 northbridge access code") Signed-off-by: Mingyu Wang <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Reviewed-by: Lukas Wunner <[email protected]> Cc: [email protected] # v2.6.18+ Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Ruslan Valiyev <[email protected]> Date: Tue May 26 00:04:46 2026 +0200 apparmor: fix use-after-free in rawdata dedup loop commit 6f060496d03e4dc560a40f73770bd08335cb7a27 upstream. aa_replace_profiles() walks ns->rawdata_list to dedup the incoming policy blob against entries already attached to existing profiles. Per the kernel-doc on struct aa_loaddata, list membership does not hold a reference: profiles hold pcount, and when the last pcount drops, do_ploaddata_rmfs() is queued on a workqueue that takes ns->lock and removes the entry. Between dropping the last pcount and the workqueue running, an entry remains on the list with pcount == 0. aa_get_profile_loaddata() is an unconditional kref_get() on pcount, so when the dedup loop hits such an entry, refcount hardening reports refcount_t: addition on 0; use-after-free. inside aa_replace_profiles(), and the poisoned counter then trips "saturated" and "underflow" warnings on the subsequent uses of the same loaddata. Before commit a0b7091c4de4 ("apparmor: fix race on rawdata dereference") the dedup path used a get_unless_zero-style helper on a single counter, so the existing "if (tmp)" guard was meaningful. The split-refcount refactor introduced aa_get_profile_loaddata(), which has plain kref_get() semantics, and the guard quietly became a no-op. Introduce aa_get_profile_loaddata_not0(), matching the existing _not0 convention used by aa_get_profile_not0(), and use it for the rawdata_list dedup lookup so dying entries are skipped. Reproduced on x86_64 with v7.1-rc5 in QEMU+KVM running Ubuntu 24.04 + stress-ng 0.17.06: stress-ng --apparmor 1 --klog-check --timeout 60s Without this patch the three refcount_t warnings fire within a few seconds. With it the same 60 s run is clean. Coverage is a smoke-test only; a longer soak with CONFIG_KASAN, CONFIG_KCSAN and CONFIG_PROVE_LOCKING would be welcome from anyone with the cycles. Fixes: a0b7091c4de4 ("apparmor: fix race on rawdata dereference") Reported-by: Colin Ian King <[email protected]> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221513 Cc: [email protected] Signed-off-by: Ruslan Valiyev <[email protected]> Signed-off-by: John Johansen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Bryam Vargas <[email protected]> Date: Mon Jun 22 15:57:38 2026 -0500 apparmor: mediate the implicit connect of TCP fast open sendmsg commit 4d587cd8a72155089a627130bbd4716ec0856e21 upstream. sendmsg()/sendto() with MSG_FASTOPEN is a combination of connect(2) and write(2): it opens the connection in the SYN. apparmor_socket_sendmsg() only checks AA_MAY_SEND, so a profile that grants send but denies connect lets a confined task open an outbound TCP/MPTCP connection that connect(2) would have refused, bypassing connect mediation. Mediate the implicit connect when MSG_FASTOPEN is set and a destination is supplied. Add it to apparmor_socket_sendmsg() (not the shared aa_sock_msg_perm() helper, which recvmsg also uses) and call aa_sk_perm() directly, mirroring the selinux and tomoyo fixes. sk_is_tcp() does not cover MPTCP fast open, so the SOCK_STREAM/IPPROTO_MPTCP arm is explicit. Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Cc: [email protected] Signed-off-by: Bryam Vargas <[email protected]> Signed-off-by: John Johansen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:30 2026 -0400 arm64: scripts/sorttable: Implement sorting mcount_loc at boot for arm64 [ Upstream commit b3d09d06e052e1d754645acea4e4d1e96f81c934 ] The mcount_loc section holds the addresses of the functions that get patched by ftrace when enabling function callbacks. It can contain tens of thousands of entries. These addresses must be sorted. If they are not sorted at compile time, they are sorted at boot. Sorting at boot does take some time and does have a small impact on boot performance. x86 and arm32 have the addresses in the mcount_loc section of the ELF file. But for arm64, the section just contains zeros. The .rela.dyn Elf_Rela section holds the addresses and they get patched at boot during the relocation phase. In order to sort these addresses, the Elf_Rela needs to be updated instead of the location in the binary that holds the mcount_loc section. Have the sorttable code, allocate an array to hold the functions, load the addresses from the Elf_Rela entries, sort them, then put them back in order into the Elf_rela entries so that they will be sorted at boot up without having to sort them during boot up. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/[email protected] Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:38 2026 +0200 batman-adv: bla: annotate lasttime access with READ/WRITE_ONCE commit 98b0fb191c878a64cbaebfe231d96d57576acf8c upstream. The lasttime field for claim, backbone_gw, and loopdetect tracks the jiffies value of the most recent activity and is used to detect timeouts. These accesses are not consistently protected by a lock, so READ_ONCE/WRITE_ONCE must be used to prevent data races caused by compiler optimizations. Cc: [email protected] Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:52 2026 +0200 batman-adv: dat: prevent false sharing between VLANs commit 20d7658b74169f86d4ac01b9185b3eadddf71f28 upstream. The local hash of DAT entries is supposed to be VLAN (VID) aware. But the adding to the hash and the search in the hash were not checking the VID information of the hash entries. The entries would therefore only be correctly separated when batadv_hash_dat() didn't select the same buckets for different VIDs. Cc: [email protected] Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:41 2026 +0200 batman-adv: ensure bcast is writable before modifying TTL commit 4cd6d3a4b96a8576f1fed8f9f9f17c2dc2978e0c upstream. Before batman-adv is allowed to write to an skb, it either has to have its own copy of the skb or used skb_cow() to ensure that the data part is not shared. The old implementation used a shared queue and created copies before attempting to write to it. But with the new implementation, the broadcast packet is already modified when it gets received. Potentially writing to shared buffers in this process. Adding a skb_cow() right before this operation avoids this and can at the same time prepare it for the modifications required to rebroadcast the packet. Cc: [email protected] Fixes: 3f69339068f9 ("batman-adv: bcast: queue per interface, if needed") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:42 2026 +0200 batman-adv: fix (m|b)cast csum after decrementing TTL commit e728bbdf32660c8f32b8f5e8d09427a2c131ad60 upstream. The broadcast and multicast packets can be received at the same time by the local system and forwarded to other nodes. Both are simply decrementing the TTL at the beginning of the receive path - independent of chosen paths (receive/forward). But such a modification of the data conflicts with the hw csum. This is not a problem when the packet is directly forwarded but can cause errors in the local receive path. Such a problem can then trigger a "hw csum failure". The receiver path must therefore ensure that the csum is fixed for each modification of the payload before batadv_interface_rx() is reached. Since all batman-adv packet types with a ttl have it as u8 at offset 2, a helper can be used for all of them. But it is only used at the moment for batadv_bcast_packet and batadv_mcast_packet because they are the only ones which deliver the packet locally but unconditionally modify the TTL. Cc: [email protected] Fixes: 3f69339068f9 ("batman-adv: bcast: queue per interface, if needed") Fixes: 07afe1ba288c ("batman-adv: mcast: implement multicast packet reception and forwarding") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:44 2026 +0200 batman-adv: frag: avoid underflow of TTL commit 493d9d2528e1a09b090e4b37f0f553def7bd5ce9 upstream. Packets with a TTL are using it to limit the amount of time this packet can be forwarded. But for batadv_frag_packet, the TTL was always only reduced but it was never evaluated. It could even underflow without any effect. Check the TTL in batadv_frag_skb_fwd() before attempting to prepare it for forwarding. This keeps it in sync with the not fragmented unicast packet. Cc: [email protected] Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:43 2026 +0200 batman-adv: frag: ensure fragment is writable before modifying TTL commit b7293c6e8c15b2db77809b25cf8389e35331b27a upstream. Before batman-adv is allowed to write to an skb, it either has to have its own copy of the skb or use skb_cow() to ensure that the data part is not shared. But batadv_frag_skb_fwd() modifies the TTL even when it is shared. Adding a skb_cow() right before this operation avoids this and can at the same time prepare it for the modifications required to forward the fragment. Cc: [email protected] Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:39 2026 +0200 batman-adv: prevent ELP transmission interval underflow commit 5e50d4b8ae3ea622122d3c6a38d7f6fe68dfddca upstream. batadv_v_elp_start_timer() enqeues a delayed work. The time when it starts is randomly chosen between (elp_interval - BATADV_JITTER) and (elp_interval + BATADV_JITTER). The configured elp_interval must therefore be larger or equal to BATADV_JITTER to avoid that it causes an underflow of the unsigned integer. If this would happen, then a "fast" ELP interval would turn into a "day long" delay. At the same time, it must not be larger than the maximum value the variable can store. Cc: [email protected] Fixes: a10800829040 ("batman-adv: Add elp_interval hardif genl configuration") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:37 2026 +0200 batman-adv: tp_meter: add only finished tp_vars to lists commit 15ccbf685222274f5add1387af58c2a41a95f81e upstream. When the receiver variables (aka "session") are initialized, then they are added to the list of sessions before the timer is set up. A RCU protected reader could therefore find the entry and run mod_setup before batadv_tp_init_recv() finished the timer initialization. The same is true for batadv_tp_start(), which must first initialize the finish_work and the test_length to avoid a similar problem. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:47 2026 +0200 batman-adv: tp_meter: annotate last_recv_time access with READ/WRITE_ONCE commit d67c728f07fca2ee6ffdc6dd4421cf2e8691f4d1 upstream. The last_recv_time field for batadv_tp_receiver tracks the jiffies value of the most recent activity and is used to detect timeouts. These accesses are not consistently protected by a lock, so READ_ONCE/WRITE_ONCE must be used to prevent data races caused by compiler optimizations. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:34 2026 +0200 batman-adv: tp_meter: avoid divide-by-zero for dec_cwnd commit 33ccd52f3cc9ed46ce395199f89aa3234dc83314 upstream. The cwnd is always MSS <= cwnd <= 0x20000000. But the calculation in batadv_tp_update_cwnd() assumes unsigned 32 bit arithmetics. ((mss * 8) ** 2) / (cwnd * 8) In case cwnd is actually 0x20000000, it will be shifted by 3 bit to the left end up at 0x100000000 or U32_MAX + 1. It will therefore wrap around and be 0 - resulting in: ((mss * 8) ** 2) / 0 This is of course invalid and cannot be calculated. The calculation should must be simplified to avoid this overflow: (mss ** 2) * 8 / cwnd It will keep the precision enhancement from the scaling (by 8) but avoid the overflow in the divisor. In theory, there could still be an overflow in the dividend. It is at the moment fixed to BATADV_TP_PLEN in batadv_tp_recv_ack() - so it is not an imminent problem. But allowing it to use the whole u32 bit range, would mean that it can still use up to 67 bits. To keep this calculation safe for 32 bit arithmetic, mss must never use more than floor((32 - 3) / 2) bits - or in other words: must never be larger than 16383. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:33 2026 +0200 batman-adv: tp_meter: avoid window underflow commit 765947b81fb54b6ebb0bc1cfe55c0fa399e002b8 upstream. In batadv_tp_avail(), win_left is calculated with 32-bit unsigned arithmetic: win_left = win_limit - tp_vars->last_sent; During Fast Recovery, cwnd is inflated and last_sent advances rapidly. When Fast Recovery ends, cwnd drops abruptly back to ss_threshold. If the newly shrunk win_limit is less than last_sent, the unsigned subtraction will underflow, wrapping to a massive positive value. Instead of returning that the window is full (unavailable), it returns that the sender can continue sending. To handle this situation, it must be checked whether the windows end sequence number (win_limit) has to be compared with the last sent sequence number. If it would be before the last sent sequence number, then more acks are needed before the transmission can be started again. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:35 2026 +0200 batman-adv: tp_meter: fix fast recovery precondition commit 2b0d08f08ed3b2174f05c43089ec65f3543a025b upstream. The fast recovery precondition checks if the recover (initialized to BATADV_TP_FIRST_SEQ) is bigger than the received ack. But since recover is only updated when this check is successful, it will never enter the fast recovery mode. According to RFC6582 Section 3.2 step 2, the check should actually be different: > When the third duplicate ACK is received, the TCP sender first > checks the value of recover to see if the Cumulative > Acknowledgment field covers more than recover The precondition must therefore check if recover is smaller than the received ack - basically swapping the operands of the current check. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:49 2026 +0200 batman-adv: tp_meter: handle overlapping packets commit cbde75c38b21f022891525078622587ad557b7c1 upstream. If the size of the packets would change during the transmission, it could happen that some retries of packets are overlapping. In this case, precise comparisons of sequence numbers by the receiver would be wrong. It is then necessary to check if the start sequence number to the end sequence number ("seqno + length") would contain a new range. If this is the case then this is enough to accept this packet. In all other cases, the packet still has to be dropped (and not acked). Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Switch to pre-splitted tp_vars structure names ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:36 2026 +0200 batman-adv: tp_meter: handle seqno wrap-around for fast recovery detection commit f54c85ed42a1b27a516cf2a4728f5a612b799e07 upstream. The recover variable and the last_sent sequence number are initialized on purpose as a really high value which will wrap-around after the first 2000 bytes. The fast recovery precondition must therefore not use simple integer comparisons but use helpers which are aware of the sequence number wrap-arounds. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:32 2026 +0200 batman-adv: tp_meter: initialize dec_cwnd explicitly commit febfb1b86224489535312296ecfa3d4bf467f339 upstream. When batadv_tp_update_cwnd() is called, dec_cwnd is increased. But dec_cwnd is only initialixed (to 0) when a duplicate Ack was received or when cwnd is below the ss_threshold. Just initialize the cwnd during the initialization to avoid any potential access of uninitialized data. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:31 2026 +0200 batman-adv: tp_meter: initialize dup_acks explicitly commit b2b68b32a715e0328662801576974aa37b942b00 upstream. When an ack with a sequence number equal to the last_acked is received, the dup_acks counter is increased to decide whether fast retransmit should be performed. Only when the sequence numbers are not equal, the dup_acks is set to the initial value (0). But if the initial packet would have the sequence number BATADV_TP_FIRST_SEQ, dup_acks would not be initialized and atomic_inc would operate on an undefined starting value. It is therefore required to have it explicitly initialized during the start of the sender session. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:40 2026 +0200 batman-adv: tp_meter: initialize last_recv_time during init commit 811cb00fa8cdc3f0a7f6eefc000a6888367c8c8f upstream. The last_recv_time is the most important indicator for a receiver session to figure out whether a session timed out or not. But this information was only initialized after the session was added to the tp_receiver_list and after the timer was started. In the worst case, the timer (function) could have tried to access this information before the actual initialization was reached. Like rest of the variables of the tp_meter receiver session, this field has to be filled out before any other (parallel running) context has the chance to access it. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:30 2026 +0200 batman-adv: tp_meter: keep unacked list in ascending ordered commit 5aa8651527ea0b610e7a09fb3b8204c1398b9525 upstream. When batadv_tp_handle_out_of_order inserts a new entry in the list of unacked (out of order) packets, it searches from the entry with the newest sequence number towards oldest sequence number. If an entry is found which is older than the newly entry, the new entry has to be added after the found one to keep the ascending order. But for this operation list_add_tail() was used. But this function adds an entry _before_ another one. As result, the list would contain a lot of swapped sequence numbers. The consumer of this list (batadv_tp_ack_unordered()) would then fail to correctly ack packets. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:48 2026 +0200 batman-adv: tp_meter: prevent parallel modifications of last_recv commit 6dde0cfcb36e4d5b3de35b75696937478441eed4 upstream. When last_recv is updated to store the last receive sequence number, it is assuming that nothing is modifying in parallel while: * check for outdated packets is done * out of order check is performed (and packets are stored in out-of-order queue) * the out-of-order queue was searched for closed gaps * sequence number for next ack is calculated Nothing of that was actually protected. It could therefore happen that the last_recv was updated multiple times in parallel and the final sequence number was calculated with deltas which had no connection to the sequence number they were added to. Lock this whole region with the same lock which was already used to protect the unacked (out-of-order) list. Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Switch to pre-splitted tp_vars structure names ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:46 2026 +0200 batman-adv: tp_meter: restrict number of unacked list entries commit e7c775110e1858e5a7471a23a9c9658c0af9df89 upstream. When the unacked_list is unbound, an attacker could send messages with small lengths and appropriated seqno + gaps to force the receiver to allocate more and more unacked_list entries. And the end either causing an out-of-memory situation or increase the management overhead for the (large) list that significant portions of CPU cycles are wasted in searching through the list. When limiting the list to a specific number, it is important to still correctly add a new entry to the list. But if the list became larger than the limit, the last entry of the list (with the highest seqno) must be dropped to still allow the earlier seqnos to finish and therefore to continue the process. Otherwise, the process might get stuck with too high seqnos which are not handled by batadv_tp_ack_unordered(). Cc: [email protected] Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Switch to pre-splitted tp_vars structure names ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:50 2026 +0200 batman-adv: tt: don't merge change entries with different VIDs commit f08e06c2d5c3e2434e7c773f2213f4a7dce6bc1e upstream. batadv_tt_local_event() merges/cancels events for the same client which would conflict or be duplicates. The matching of the queued events only compares the MAC address - the VLAN ID stored in each event is ignored. If a MAC would now appear on multiple VID, the two ADD change events (for VID 1 and VID 2) would be merged to a single vid event. The remote can therefore not calculate the correct TT table and desync. A full translation table exchange is required to recover from this state. A check of VID is therefore necessary to avoid such wrong merges/cancels. Cc: [email protected] Fixes: c018ad3de61a ("batman-adv: add the VLAN ID attribute to the TT entry") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:51 2026 +0200 batman-adv: tt: track roam count per VID commit 12407d5f61c2653a64f2ff4b22f3c267f8420ef1 upstream. batadv_tt_check_roam_count() is supposed to track roaming of a TT entry. But TT entries are for a MAC + VID. The VID was completely missed and thus leads to incorrect detection of ROAM counts when a client MAC exists in multiple VLANs. Cc: [email protected] Fixes: c018ad3de61a ("batman-adv: add the VLAN ID attribute to the TT entry") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:54 2026 +0200 batman-adv: tvlv: avoid race of cifsnotfound handler state commit edb557b2ba38fea2c5eb710cf366c797e187218c upstream. TVLV handlers can have the flag BATADV_TVLV_HANDLER_OGM_CIFNOTFND set to signal that the OGM handler should be called (with NULL for data) when the specific TVLV container was not found in the OGM. This is used by: * DAT * GW * Multicast (OGM + Tracker) The state whether the handler was executed was stored in the struct batadv_tvlv_handler. But the TVLV processing is started without any lock. Multiple parallel contexts processing TVLVs would therefore overwrite each others BATADV_TVLV_HANDLER_OGM_CALLED flag in the shared batadv_tvlv_handler. Drop the shared BATADV_TVLV_HANDLER_OGM_CALLED flag and instead determine, per TVLV buffer, whether a matching container was present by scanning the packet's buffer. Cc: [email protected] Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:53 2026 +0200 batman-adv: tvlv: enforce 2-byte alignment commit 32a6799255525d6ea4da0f7e9e0e521ad9560a46 upstream. The fields of an aggregated OGM(v2) are accessed assuming (at least) 2-byte alignment, so a following OGM must start at an even offset. As the header length is even, an odd tvlv_len would misalign it and trigger unaligned accesses on strict-alignment architectures. Such a misaligned TVLV/OGM/OGMv2 is not created by a normal participant in the mesh. Therefore, reject such malformed packets. Cc: [email protected] Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sven Eckelmann <[email protected]> Date: Fri Jun 26 18:11:45 2026 +0200 batman-adv: v: prevent OGM aggregation on disabled hardif commit d11c00b95b2a3b3934007fc003dccc6fdcc061ad upstream. When an interface gets disabled, the worker is correctly disabled by batadv_hardif_disable_interface() -> ... -> batadv_v_ogm_iface_disable(). In this process, the skb aggr_list is also freed. But batadv_v_ogm_send_meshif() can still queue new skbs (via batadv_v_ogm_queue_on_if()) to the aggr_list. This will only stop after all cores can no longer find the RCU protected list of hard interfaces. These queued skbs will never be freed or consumed by batadv_v_ogm_aggr_work. The batadv_v_ogm_iface_disable() function must block batadv_v_ogm_queue_on_if() to avoid leak of skbs. Cc: [email protected] Fixes: f89255a02f1d ("batman-adv: BATMAN_V: introduce per hard-iface OGMv2 queues") [ Context ] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Michal KoutnĂ½ <[email protected]> Date: Thu Feb 5 23:54:23 2026 +0800 blk-cgroup: fix UAF in __blkcg_rstat_flush() commit 0ab5ee5a1badb58cbb2242617cb01a4972b1f2a2 upstream. When multiple blkgs in the same blkcg are released concurrently, a use-after-free can occur. The race happens when one blkg's __blkcg_rstat_flush() removes another blkg's iostat entries via llist_del_all(). The second blkg sees an empty list and proceeds to free itself while the first is still iterating over its entries. Move the flush from __blkg_release() (RCU callback) to blkg_release() (before call_rcu). This ensures the RCU grace period waits for any concurrent flush's rcu_read_lock() section to complete before freeing. Cc: [email protected] Cc: Jay Shin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Waiman Long <[email protected]> Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq") Reported-by: [email protected] Closes: https://lore.kernel.org/linux-block/CAHPqNmwT9oRpem3J3erS_W0uSQND47LGGSBsNxP8E6uSUish1w@mail.gmail.com/ Signed-off-by: Ming Lei <[email protected]> Tested-by: Jose Fernandez (Anthropic) <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Denis Arefev <[email protected]> Date: Thu May 21 10:28:56 2026 +0300 block: Avoid mounting the bdev pseudo-filesystem in userspace commit f73aa66dffcb8e61e78f01b56163ec16a15d06d2 upstream. The bdev pseudo-filesystem is an internal kernel filesystem with which userspace should not interfere. Unregister it so that userspace cannot even attempt to mount it. This fixes a bug [1] that occurs when attempting to access files, because the system call move_mount() uses pointers declared in the inode_operations structure, which for the bdev pseudo-filesystem are always equal to 0. `inode->i_op = &empty_iops;` [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 23380067 P4D 23380067 PUD 23381067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN NOPTI CPU: 2 PID: 17125 Comm: syz-executor.0 Not tainted 6.1.155-syzkaller-00350-g84221fde2681 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:0x0 Call Trace: <TASK> lookup_open.isra.0+0x700/0x1180 fs/namei.c:3460 open_last_lookups fs/namei.c:3550 [inline] path_openat+0x953/0x2700 fs/namei.c:3780 do_filp_open+0x1c5/0x410 fs/namei.c:3810 do_sys_openat2+0x171/0x4d0 fs/open.c:1318 do_sys_open fs/open.c:1334 [inline] __do_sys_openat fs/open.c:1350 [inline] __se_sys_openat fs/open.c:1345 [inline] __x64_sys_openat+0x13c/0x1f0 fs/open.c:1345 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/all/[email protected]/T/# Cc: [email protected] Signed-off-by: Denis Arefev <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Usama Arif <[email protected]> Date: Tue Jun 16 07:15:18 2026 -0700 block: invalidate cached plug timestamp after task switch commit fad156c2af227f42ca796cbb20ddc354a6dd9932 upstream. blk_time_get_ns() caches ktime_get_ns() in current->plug->cur_ktime and marks the task with PF_BLOCK_TS. That cache is only valid while the task keeps running; if the task is switched out, wall-clock time advances and the cached value must not be reused when the task runs again. The existing invalidation covers explicit plug flushes through __blk_flush_plug(), and the schedule() / rtmutex paths through sched_update_worker(). It does not cover in-kernel preemption paths such as preempt_schedule(), preempt_schedule_notrace(), and preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and return without calling sched_update_worker(). As a result, a task preempted while holding a plug with PF_BLOCK_TS set can reuse a stale plug->cur_ktime after it is scheduled back in. blk-iocost then consumes that stale timestamp through ioc_now(), producing stale vnow values for throttle decisions, and through ioc_rqos_done(), inflating on-queue time and feeding false missed-QoS samples into vrate adjustment. Move the schedule-side invalidation to finish_task_switch(), which runs for the scheduled-in task after every actual context switch regardless of which schedule entry point was used. Keep __blk_flush_plug() as the explicit flush/finish-plug invalidation path, and remove only the PF_BLOCK_TS handling from sched_update_worker(). Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption") Cc: [email protected] Signed-off-by: Usama Arif <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Tue Jun 16 15:54:28 2026 +0000 bonding: 3ad: implement proper RCU rules for port->aggregator [ Upstream commit c4f050ce06c56cfb5993268af4a5cb66ed1cd04e ] syzbot found a data-race in bond_3ad_get_active_agg_info / bond_3ad_state_machine_handler [1] which hints at lack of proper RCU implementation. Add __rcu qualifier to port->aggregator, and add proper RCU API. [1] BUG: KCSAN: data-race in bond_3ad_get_active_agg_info / bond_3ad_state_machine_handler write to 0xffff88813cf5c4b0 of 8 bytes by task 36 on cpu 0: ad_port_selection_logic drivers/net/bonding/bond_3ad.c:1659 [inline] bond_3ad_state_machine_handler+0x9d5/0x2d60 drivers/net/bonding/bond_3ad.c:2569 process_one_work kernel/workqueue.c:3302 [inline] process_scheduled_works+0x4f0/0x9c0 kernel/workqueue.c:3385 worker_thread+0x58a/0x780 kernel/workqueue.c:3466 kthread+0x22a/0x280 kernel/kthread.c:436 ret_from_fork+0x146/0x330 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 read to 0xffff88813cf5c4b0 of 8 bytes by task 22063 on cpu 1: __bond_3ad_get_active_agg_info drivers/net/bonding/bond_3ad.c:2858 [inline] bond_3ad_get_active_agg_info+0x8c/0x230 drivers/net/bonding/bond_3ad.c:2881 bond_fill_info+0xe0f/0x10f0 drivers/net/bonding/bond_netlink.c:853 rtnl_link_info_fill net/core/rtnetlink.c:906 [inline] rtnl_link_fill+0x1d7/0x4e0 net/core/rtnetlink.c:927 rtnl_fill_ifinfo+0xf8e/0x1380 net/core/rtnetlink.c:2168 rtmsg_ifinfo_build_skb+0x11c/0x1b0 net/core/rtnetlink.c:4453 rtmsg_ifinfo_event net/core/rtnetlink.c:4486 [inline] rtmsg_ifinfo+0x6d/0x110 net/core/rtnetlink.c:4495 __dev_notify_flags+0x76/0x390 net/core/dev.c:9790 netif_change_flags+0xac/0xd0 net/core/dev.c:9823 do_setlink+0x905/0x2950 net/core/rtnetlink.c:3180 rtnl_group_changelink net/core/rtnetlink.c:3813 [inline] __rtnl_newlink net/core/rtnetlink.c:3981 [inline] rtnl_newlink+0xf55/0x1400 net/core/rtnetlink.c:4109 rtnetlink_rcv_msg+0x64b/0x720 net/core/rtnetlink.c:6995 netlink_rcv_skb+0x123/0x220 net/netlink/af_netlink.c:2550 rtnetlink_rcv+0x1c/0x30 net/core/rtnetlink.c:7022 netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline] netlink_unicast+0x5a8/0x680 net/netlink/af_netlink.c:1344 netlink_sendmsg+0x5c8/0x6f0 net/netlink/af_netlink.c:1894 sock_sendmsg_nosec net/socket.c:787 [inline] __sock_sendmsg net/socket.c:802 [inline] ____sys_sendmsg+0x563/0x5b0 net/socket.c:2698 ___sys_sendmsg+0x195/0x1e0 net/socket.c:2752 __sys_sendmsg net/socket.c:2784 [inline] __do_sys_sendmsg net/socket.c:2789 [inline] __se_sys_sendmsg net/socket.c:2787 [inline] __x64_sys_sendmsg+0xd4/0x160 net/socket.c:2787 x64_sys_call+0x194c/0x3020 arch/x86/include/generated/asm/syscalls_64.h:47 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x12c/0x3b0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x0000000000000000 -> 0xffff88813cf5c400 Reported by Kernel Concurrency Sanitizer on: CPU: 1 UID: 0 PID: 22063 Comm: syz.0.31122 Tainted: G W syzkaller #0 PREEMPT(full) Tainted: [W]=WARN Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/18/2026 Fixes: 47e91f56008b ("bonding: use RCU protection for 3ad xmit path") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Eric Dumazet <[email protected]> Cc: Jay Vosburgh <[email protected]> Cc: Andrew Lunn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hangbin Liu <[email protected]> Date: Tue Jun 16 15:54:26 2026 +0000 bonding: add support for per-port LACP actor priority [ Upstream commit 6b6dc81ee7e8ca87c71a533e1d69cf96a4f1e986 ] Introduce a new netlink attribute 'actor_port_prio' to allow setting the LACP actor port priority on a per-slave basis. This extends the existing bonding infrastructure to support more granular control over LACP negotiations. The priority value is embedded in LACPDU packets and will be used by subsequent patches to influence aggregator selection policies. Signed-off-by: Hangbin Liu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Wed Jun 3 12:35:14 2026 +0000 bonding: annotate data-races arcound churn variables commit b47ff80f280e18ad2310f44293cc057d9b64ff11 upstream. These fields are updated asynchronously by the bonding state machine in ad_churn_machine() while holding bond->mode_lock. bond_info_show_slave() and bond_fill_slave_info() read them without bond->mode_lock being held, we need to add READ_ONCE() and WRITE_ONCE() annotations. Note that AD_CHURN_MONITOR, AD_CHURN, and AD_NO_CHURN are defined exclusively in (kernel private) include/net/bond_3ad.h header. They should be moved to include/uapi/linux/if_bonding.h or userspace tools will have to hardcode their values. Fixes: 4916f2e2f3fc ("bonding: print churn state via netlink") Fixes: 14c9551a32eb ("bonding: Implement port churn-machine (AD standard 43.4.17).") Signed-off-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hangbin Liu <[email protected]> Date: Wed Mar 4 15:13:53 2026 +0800 bonding: do not set usable_slaves for broadcast mode commit 45fc134bcfadde456639c1b1e206e6918d69a553 upstream. After commit e0caeb24f538 ("net: bonding: update the slave array for broadcast mode"), broadcast mode will also set all_slaves and usable_slaves during bond_enslave(). But if we also set updelay, during enslave, the slave init state will be BOND_LINK_BACK. And later bond_update_slave_arr() will alloc usable_slaves but add nothing. This will cause bond_miimon_inspect() to have ignore_updelay always true. So the updelay will be always ignored. e.g. [ 6.498368] bond0: (slave veth2): link status definitely down, disabling slave [ 7.536371] bond0: (slave veth2): link status up, enabling it in 0 ms [ 7.536402] bond0: (slave veth2): link status definitely up, 10000 Mbps full duplex To fix it, we can either always call bond_update_slave_arr() on every place when link changes. Or, let's just not set usable_slaves for broadcast mode. Fixes: e0caeb24f538 ("net: bonding: update the slave array for broadcast mode") Reported-by: Liang Li <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hangbin Liu <[email protected]> Date: Tue Jun 16 15:54:30 2026 +0000 bonding: fix NULL pointer dereference in actor_port_prio setting [ Upstream commit 067bf016e99ad72aa4ff869d6dec1fd62a9c6202 ] Liang reported an issue where setting a slave’s actor_port_prio to predefined values such as 0, 255, or 65535 would cause a system crash. The problem occurs because in bond_opt_parse(), when the provided value matches a predefined table entry, the function returns that table entry, which does not contain slave information. Later, in bond_option_actor_port_prio_set(), calling bond_slave_get_rtnl() leads to a NULL pointer dereference. Since actor_port_prio is defined as a u16 and initialized to the default value of 255 in ad_initialize_port(), there is no need for the bond_actor_port_prio_tbl. Using the BOND_OPTFLAG_RAWVAL flag is sufficient. Fixes: 6b6dc81ee7e8 ("bonding: add support for per-port LACP actor priority") Reported-by: Liang Li <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hangbin Liu <[email protected]> Date: Tue Jun 16 15:54:27 2026 +0000 bonding: print churn state via netlink [ Upstream commit 4916f2e2f3fc9aef289fcd07949301e5c29094c2 ] Currently, the churn state is printed only in sysfs. Add netlink support so users could get the state via netlink. Signed-off-by: Hangbin Liu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Varun R Mallya <[email protected]> Date: Wed Jun 24 14:22:31 2026 +0800 bpf: Reject sleepable kprobe_multi programs at attach time commit eb7024bfcc5f68ed11ed9dd4891a3073c15f04a8 upstream. kprobe.multi programs run in atomic/RCU context and cannot sleep. However, bpf_kprobe_multi_link_attach() did not validate whether the program being attached had the sleepable flag set, allowing sleepable helpers such as bpf_copy_from_user() to be invoked from a non-sleepable context. This causes a "sleeping function called from invalid context" splat: BUG: sleeping function called from invalid context at ./include/linux/uaccess.h:169 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1787, name: sudo preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 0 Fix this by rejecting sleepable programs early in bpf_kprobe_multi_link_attach(), before any further processing. Fixes: 0dcac2725406 ("bpf: Add multi kprobe link") Signed-off-by: Varun R Mallya <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: Leon Hwang <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Shung-Hsi Yu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Dawei Feng <[email protected]> Date: Wed Jun 3 18:53:16 2026 +0800 bpf: use kvfree() for replaced sysctl write buffer commit 4c21b5927d4364bfe7365f2700da5fea0ed0d004 upstream. proc_sys_call_handler() allocates its temporary sysctl buffer with kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since kvzalloc() may fall back to vmalloc() for large allocations, freeing that buffer with kfree() is wrong and can corrupt memory. Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc allocations. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1-rc5. Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the test tree also included the accompanying fix for the stale ret == 1 check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines failslab injections to the proc_sys_call_handler() range, uses stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to /proc/sys/kernel/domainname from a task in the target cgroup. Under that setup, fail-nth=1 triggered the fault: BUG: unable to handle page fault for address: ffffeb0200024d48 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 SMP KASAN NOPTI CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kfree+0x6e/0x510 ... Call Trace: <TASK> ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30 __cgroup_bpf_run_filter_sysctl+0x74d/0xc30 ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? __kvmalloc_node_noprof+0x345/0x870 ? proc_sys_call_handler+0x250/0x480 ? srso_return_thunk+0x5/0x5f proc_sys_call_handler+0x3a2/0x480 ? __pfx_proc_sys_call_handler+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? selinux_file_permission+0x39f/0x500 ? srso_return_thunk+0x5/0x5f ? lock_is_held_type+0x9e/0x120 vfs_write+0x98e/0x1000 ... </TASK> With this fix applied on top of the same test setup, rerunning the reproducer with fail-nth=1 yields no corresponding Oops reports. Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer") Cc: [email protected] Reviewed-by: Emil Tsalapatis <[email protected]> Reviewed-by: Jiayuan Chen <[email protected]> Acked-by: Yonghong Song <[email protected]> Signed-off-by: Zilin Guan <[email protected]> Signed-off-by: Dawei Feng <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Giovanni Cabiddu <[email protected]> Date: Thu Jun 25 10:08:46 2026 -0400 crypto: qat - remove unused character device and IOCTLs [ Upstream commit d237230728c567297f2f98b425d63156ab2ed17f ] The QAT driver exposes a character device (qat_adf_ctl) with IOCTLs for device configuration, start, stop, status query and enumeration. These IOCTLs are not part of any public uAPI header and have no known in-tree or out-of-tree users. Device lifecycle is already managed via sysfs. The ioctl interface also increases the attack surface and is the subject of a number of bug reports. Remove the character device, the IOCTL definitions, and the related data structures (adf_dev_status_info, adf_user_cfg_key_val, adf_user_cfg_section, adf_user_cfg_ctl_data). Drop the now-unused adf_cfg_user.h header and strip adf_ctl_drv.c down to the minimal module_init/module_exit hooks for workqueue, AER, and crypto/compression algorithm registration. Clean up leftover dead code that was only reachable from the removed IOCTL paths: adf_cfg_del_all(), adf_devmgr_verify_id(), adf_devmgr_get_num_dev(), adf_devmgr_get_dev_by_id(), adf_get_vf_real_id() and the unused ADF_CFG macros. Additionally, drop the entry associated to QAT IOCTLs in ioctl-number.rst. Cc: [email protected] Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework") Reported-by: Zhi Wang <[email protected]> Reported-by: Bin Yu <[email protected]> Reported-by: MingYu Wang <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Giovanni Cabiddu <[email protected]> Reviewed-by: Ahsan Atta <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Thorsten Blum <[email protected]> Date: Thu Jun 25 10:08:44 2026 -0400 crypto: qat - Replace kzalloc() + copy_from_user() with memdup_user() [ Upstream commit 1e26339703e2afd397037defa798682b2b93dcc0 ] Replace kzalloc() followed by copy_from_user() with memdup_user() to improve and simplify adf_ctl_alloc_resources(). memdup_user() returns either -ENOMEM or -EFAULT (instead of -EIO) if an error occurs. Remove the unnecessary device id initialization, since memdup_user() (like copy_from_user()) immediately overwrites it. No functional changes intended other than returning the more idiomatic error code -EFAULT. Signed-off-by: Thorsten Blum <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Stable-dep-of: d237230728c5 ("crypto: qat - remove unused character device and IOCTLs") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Herbert Xu <[email protected]> Date: Thu Jun 25 10:08:45 2026 -0400 crypto: qat - Return pointer directly in adf_ctl_alloc_resources [ Upstream commit 5ce9891ea928208a915411ce8227f8c3e37e5ad9 ] Returning values through arguments is confusing and that has upset the compiler with the recent change to memdup_user: ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c: In function ‘adf_ctl_ioctl’: ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:308:26: warning: ‘ctl_data’ may be used uninitialized [-Wmaybe-uninitialized] 308 | ctl_data->device_id); | ^~ ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:294:39: note: ‘ctl_data’ was declared here 294 | struct adf_user_cfg_ctl_data *ctl_data; | ^~~~~~~~ In function ‘adf_ctl_ioctl_dev_stop’, inlined from ‘adf_ctl_ioctl’ at ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:386:9: ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:273:48: warning: ‘ctl_data’ may be used uninitialized [-Wmaybe-uninitialized] 273 | ret = adf_ctl_is_device_in_use(ctl_data->device_id); | ~~~~~~~~^~~~~~~~~~~ ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c: In function ‘adf_ctl_ioctl’: ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:261:39: note: ‘ctl_data’ was declared here 261 | struct adf_user_cfg_ctl_data *ctl_data; | ^~~~~~~~ In function ‘adf_ctl_ioctl_dev_config’, inlined from ‘adf_ctl_ioctl’ at ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:382:9: ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:192:54: warning: ‘ctl_data’ may be used uninitialized [-Wmaybe-uninitialized] 192 | accel_dev = adf_devmgr_get_dev_by_id(ctl_data->device_id); | ~~~~~~~~^~~~~~~~~~~ ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c: In function ‘adf_ctl_ioctl’: ../drivers/crypto/intel/qat/qat_common/adf_ctl_drv.c:185:39: note: ‘ctl_data’ was declared here 185 | struct adf_user_cfg_ctl_data *ctl_data; | ^~~~~~~~ Fix this by returning the pointer directly. Signed-off-by: Herbert Xu <[email protected]> Reviewed-by: Thorsten Blum <[email protected]> Acked-by: Giovanni Cabiddu <[email protected]> Signed-off-by: Herbert Xu <[email protected]> Stable-dep-of: d237230728c5 ("crypto: qat - remove unused character device and IOCTLs") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sebastian Andrzej Siewior <[email protected]> Date: Mon Jun 22 11:55:12 2026 +0200 debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING commit 06e0ae988f6e3499785c407429953ade19c1096b upstream. The pool of free objects is refilled on several occasions such as object initialisation. On PREEMPT_RT refilling is limited to preemptible sections due to sleeping locks used by the memory allocator. The system boots with disabled interrupts so the pool can not be refilled. If too many objects are initialized and the pool gets empty then debugobjects disables itself. Refiling can also happen early in the boot with disabled interrupts as long as the scheduler is not operational. If the scheduler can not preempt a task then a sleeping lock can not be contended. Allow to additionally refill the pool if the scheduler is not operational. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Helen Koike <[email protected]> Date: Mon Jun 22 11:55:21 2026 +0200 debugobjects: Do not fill_pool() if pi_blocked_on commit 5f41161059fd0f1bbf18c90f3180e38cc45a14eb upstream. On RT enabled kernels, fill_pool() ends up calling rtlock_lock(), which asserts if current::pi_blocked_on is set, because a task can obviously only block on one lock as otherwise the priority inheritenace chain gets corrupted. Prevent this by expanding the conditional to take current::pi_blocked_on into account. Fixes: 4bedcc28469a ("debugobjects: Make them PREEMPT_RT aware") Reported-by: [email protected] Signed-off-by: Helen Koike <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://patch.msgid.link/[email protected] Closes: https://syzkaller.appspot.com/bug?extid=b8ca586b9fc235f0c0df Signed-off-by: Sasha Levin <[email protected]>
Author: Waiman Long <[email protected]> Date: Mon Jun 22 11:55:26 2026 +0200 debugobjects: Dont call fill_pool() in early boot hardirq context commit 0d046ae106255cba5eb83b23f78ee93f3620247d upstream. When booting a debug PREEMPT_RT kernel on an ARM64 system, a "inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage" lockdep warning message was reported to the console. During early boot, interrupts are enabled before the scheduler is enabled. In this window (before SYSTEM_SCHEDULING is set) interrupts can fire and in the hard interrupt context handler attempt to fill the pool This can lead to a deadlock when the interrupt occurred when the interrupt hits a region which holds a lock that is required to be taken in the allocation path. Add a new can_fill_pool() helper and reorder the exception rule and forbid this scenario by excluding allocations from hard interrupt context. Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING") Suggested-by: Sebastian Andrzej Siewior <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Sebastian Andrzej Siewior <[email protected]> Date: Mon Jun 22 11:55:17 2026 +0200 debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP commit 37de2dbc318ee10577c1c2704de5a803e75e55a2 upstream. fill_pool_map is used to suppress nesting violations caused by acquiring a spinlock_t (from within the memory allocator) while holding a raw_spinlock_t. The used annotation is wrong. LD_WAIT_SLEEP is for always sleeping lock types such as mutex_t. LD_WAIT_CONFIG is for lock type which are sleeping while spinning on PREEMPT_RT such as spinlock_t. Use LD_WAIT_CONFIG as override. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]>
Author: Thadeu Lima de Souza Cascardo <[email protected]> Date: Mon Feb 10 13:16:22 2025 -0600 dlm: prevent NPD when writing a positive value to event_done commit 8e2bad543eca5c25cd02cbc63d72557934d45f13 upstream. do_uevent returns the value written to event_done. In case it is a positive value, new_lockspace would undo all the work, and lockspace would not be set. __dlm_new_lockspace, however, would treat that positive value as a success due to commit 8511a2728ab8 ("dlm: fix use count with multiple joins"). Down the line, device_create_lockspace would pass that NULL lockspace to dlm_find_lockspace_local, leading to a NULL pointer dereference. Treating such positive values as successes prevents the problem. Given this has been broken for so long, this is unlikely to break userspace expectations. Fixes: 8511a2728ab8 ("dlm: fix use count with multiple joins") Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]> Signed-off-by: David Teigland <[email protected]> Signed-off-by: Nazar Kalashnikov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Bagas Sanjaya <[email protected]> Date: Thu Jun 25 10:08:43 2026 -0400 Documentation: ioctl-number: Extend "Include File" column width [ Upstream commit 15afd5def819e4df2a29cef6fcfa6ae7ba167c0f ] Extend width of "Include File" column to fit full path to papr-physical-attestation.h in later commit. Reviewed-by: Haren Myneni <[email protected]> Signed-off-by: Bagas Sanjaya <[email protected]> Acked-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Stable-dep-of: d237230728c5 ("crypto: qat - remove unused character device and IOCTLs") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Bagas Sanjaya <[email protected]> Date: Thu Jun 25 10:08:42 2026 -0400 Documentation: ioctl-number: Fix linuxppc-dev mailto link [ Upstream commit 3dfa97bd93614c15418ba7b5c727f6c5bb617174 ] Spell out full Linux PPC mailing list address like other subsystem mailing lists listed in the table. Reviewed-by: Haren Myneni <[email protected]> Signed-off-by: Bagas Sanjaya <[email protected]> Acked-by: Madhavan Srinivasan <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Stable-dep-of: d237230728c5 ("crypto: qat - remove unused character device and IOCTLs") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Georgi Djakov <[email protected]> Date: Thu Jun 25 10:06:04 2026 -0400 drivers/base/memory: set mem->altmap after successful device registration [ Upstream commit a2b8d7827f48ee54a686cb80e4a1d0ff954ec42a ] If __add_memory_block() fails at xa_store() (under memory pressure for example), device_unregister() is called, which eventually triggers memory_block_release() with mem->altmap still set, causing a WARN_ON(mem->altmap). This was triggered by modifying virtio-mem driver. Fix this by delaying the assignment of mem->altmap until after __add_memory_block() has succeeded. Link: https://lore.kernel.org/[email protected] Fixes: 1a8c64e11043 ("mm/memory_hotplug: embed vmem_altmap details in memory block") Signed-off-by: Georgi Djakov <[email protected]> Acked-by: Oscar Salvador (SUSE) <[email protected]> Cc: Vishal Verma <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Richard Cheng <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Georgi Djakov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Dexuan Cui <[email protected]> Date: Tue Jun 16 13:01:22 2026 -0400 Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs [ Upstream commit 016a25e4b0df4d77e7c258edee4aaf982e4ee809 ] If vmbus_reserve_fb() in the kdump/kexec kernel fails to properly reserve the framebuffer MMIO range (which is below 4GB) due to a Gen2 VM's screen.lfb_base being zero [1], there is an MMIO conflict between the drivers hyperv-drm and pci-hyperv: when the driver pci-hyperv's hv_allocate_config_window() calls vmbus_allocate_mmio() to get an MMIO range, typically it gets a 32-bit MMIO range that overlaps with the framebuffer MMIO range, and later hv_pci_enter_d0() fails with an error message "PCI Pass-through VSP failed D0 Entry with status" since the host thinks that PCI devices must not use MMIO space that the host has assigned to the framebuffer. This is especially an issue if pci-hyperv is built-in and hyperv-drm is built as a module. Consequently, the kdump/kexec kernel fails to detect PCI devices via pci-hyperv, and may fail to mount the root file system, which may reside in a NVMe disk. The issue described here has existed for SR-IOV VF NICs since day one of the pci-hyperv driver, and has been worked around on x64 when possible. With the recent introduction of ARM64 VMs that boot from NVMe, there is no workaround, so we need a formal fix. On Gen2 VMs, if the screen.lfb_base is 0 in the kdump/kexec kernel [1], fall back to the low MMIO base, which should be equal to the framebuffer MMIO base [2] (the statement is true according to my testing on x64 Windows Server 2016, and on x64 and ARM64 Windows Server 2025 and on Azure. I checked with the Hyper-V team and they said the statement should continue to be true for Gen2 VMs). In the first kernel, screen.lfb_base is not 0; if the user specifies a very high resolution, it's not enough to only reserve 8MB: let's always reserve half of the space below 4GB, but cap the reservation to 128MB, which is the required framebuffer size of the highest resolution 7680*4320 supported by Hyper-V. While at it, fix the comparison "end > VTPM_BASE_ADDRESS" by changing the > to >=. Here the 'end' is an inclusive end (typically, it's 0xFFFF_FFFF for the low MMIO range). Note: vmbus_reserve_fb() now also reserves an MMIO range at the beginning of the low MMIO range on CVMs, which have no framebuffers (the 'screen.lfb_base' in vmbus_reserve_fb() is 0 for CVMs), just in case the host might treat the beginning of the low MMIO range specially [3]. BTW, the OpenHCL kernel is not affected by the change, because that kernel boots with DeviceTree rather than ACPI (so vmbus_reserve_fb() won't run there), and there is no framebuffer device for that kernel. Note: normally Gen1 VMs don't have the MMIO conflict issue because the framebuffer MMIO range (which is hardcoded to base=4GB-128MB and size=64MB for Gen1 VMs by the host) is always reported via the legacy PCI graphics device's BAR, so the kdump/kexec kernel can reserve the 64MB MMIO range; however, if the VM is configured to use a very high resolution and the required framebuffer size exceeds 64MB (AFAIK, in practice, this isn't a typical configuration by users), the hyperv-drm driver may need to allocate an MMIO range above 4GB and change the framebuffer MMIO location to the allocated MMIO range -- in this case, there can still be issues [4] which can't be easily fixed: any possible affected Gen1 users would have to use a resolution whose framebuffer size is <= 64MB, or switch to Gen2 VMs. [1] https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/ [2] https://lore.kernel.org/all/SA1PR21MB69218F955B62DFF62E3E88D2BF222@SA1PR21MB6921.namprd21.prod.outlook.com/ [3] https://lore.kernel.org/all/SN6PR02MB415726B17D5A6027CD1717E8D4342@SN6PR02MB4157.namprd02.prod.outlook.com/ [4] https://lore.kernel.org/all/SA1PR21MB69213486F821CA5A2C793C81BF342@SA1PR21MB6921.namprd21.prod.outlook.com/ Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs") CC: [email protected] Reviewed-by: Michael Kelley <[email protected]> Tested-by: Krister Johansen <[email protected]> Tested-by: Matthew Ruffell <[email protected]> Signed-off-by: Dexuan Cui <[email protected]> Signed-off-by: Wei Liu <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: MaĂra Canal <[email protected]> Date: Tue Jun 2 14:50:15 2026 -0300 drm/v3d: Skip CSD when it has zeroed workgroups [ Upstream commit 7f93fad5ea0affc9e1505dd0f7596c0fdb496213 ] A compute shader dispatch encodes its workgroup counts in the CFG0..CFG2 registers. Kicking off a dispatch with a zero count in any of the three dimensions is invalid. First, the hardware will process 0 as 65536, while the user-space driver exposes a maximum of 65535. Over that, a submission with a zeroed workgroup dimension should be a no-op. These zeroed counts can reach the dispatch path through an indirect CSD job, whose workgroup counts are only known once the indirect buffer is read and may legitimately be zero, but such scenario should only result in a no-op. Overwrite the indirect CSD job workgroup counts with the indirect BO ones, even if they are zeroed, and don't submit the job to the hardware when any of the workgroup counts is zero, so the job completes immediately instead of running the shader. Cc: [email protected] Fixes: d223f98f0209 ("drm/v3d: Add support for compute shader dispatch.") Suggested-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: MaĂra Canal <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: MaĂra Canal <[email protected]> Date: Tue Aug 26 11:18:59 2025 -0300 drm/v3d: Store the active job inside the queue's state [ Upstream commit 0d3768826d38c0ac740f8b45cd13346630535f2b ] Instead of storing the queue's active job in four different variables, store the active job inside the queue's state. This way, it's possible to access all active jobs using an index based in `enum v3d_queue`. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Melissa Wen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: MaĂra Canal <[email protected]> Stable-dep-of: 7f93fad5ea0a ("drm/v3d: Skip CSD when it has zeroed workgroups") Signed-off-by: Sasha Levin <[email protected]>
Author: Jani Nikula <[email protected]> Date: Fri May 15 19:09:20 2026 +0300 drm/xe/display: fix oops in suspend/shutdown without display [ Upstream commit 68938cc08e23a94fd881e845837ff918de005ce7 ] The xe driver keeps track of whether to probe display, and whether display hardware is there, using xe->info.probe_display. It gets set to false if there's no display after intel_display_device_probe(). However, the display may also be disabled via fuses, detected at a later time in intel_display_device_info_runtime_init(). In this case, the xe driver does for_each_intel_crtc() on uninitialized mode config in xe_display_flush_cleanup_work(), leading to a NULL pointer dereference, and generally calls display code with display info cleared. Check for intel_display_device_present() after intel_display_device_info_runtime_init(), and reset xe->info.probe_display as necessary. Also do unset_display_features() for completeness, although display runtime init has already done that. This will need to be unified across all cases later. Move intel_display_device_info_runtime_init() call slightly earlier, similar to i915, to avoid a bunch of unnecessary setup for no display cases. Note #1: The xe driver has no business doing low level display plumbing like for_each_intel_crtc() to begin with. It all needs to happen in display code. Note #2: The actual bug is present already in commit 44e694958b95 ("drm/xe/display: Implement display support"), but the oops was likely introduced later at commit ddf6492e0e50 ("drm/xe/display: Make display suspend/resume work on discrete"). Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7904 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/6150 Cc: [email protected] # v6.8+ Reviewed-by: Suraj Kandpal <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jani Nikula <[email protected]> (cherry picked from commit 7c3eb9f47533220888a67266448185fd0775d4da) Signed-off-by: Matthew Brost <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Arnd Bergmann <[email protected]> Date: Tue May 26 12:18:41 2026 +0200 err.h: use __always_inline on all error pointer helpers commit 94bfc7f3b0c7c33331ba4ff6cc64ff309dfcbce8 upstream. While testing randconfig builds on s390, I came across a link failure with CONFIG_DMA_SHARED_BUFFER disabled: ERROR: modpost: "dma_buf_put" [drivers/iommu/iommufd/iommufd.ko] undefined! The problem here is that IS_ERR() is not inlined and dead code elimination fails as a consequence. The err.h helpers all turn into a trivial assignment of a bit mask and should never result in a function call, so force them to always be inline. This should generally result in better object code aside from avoiding the link failure above. Link: https://lore.kernel.org/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Tested-by: Tamir Duberstein <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Andriy Shevchenko <[email protected]> Cc: Ansuel Smith <[email protected]> Cc: Bjorn Andersson <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:44 2026 +0200 eventpoll: drop vestigial __ prefix from ep_remove_{file,epi}() [ Upstream commit 0feaf644f7180c4a91b6b405a881afbfd958f1cf ] With __ep_remove() gone, the double-underscore on __ep_remove_file() and __ep_remove_epi() no longer contrasts with a __-less parent and just reads as noise. Rename both to ep_remove_file() and ep_remove_epi(). No functional change. Signed-off-by: Christian Brauner (Amutable) <[email protected]> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:47 2026 +0200 eventpoll: fix ep_remove struct eventpoll / struct file UAF [ Upstream commit a6dc643c69311677c574a0f17a3f4d66a5f3744b ] ep_remove() (via ep_remove_file()) cleared file->f_ep under file->f_lock but then kept using @file inside the critical section (is_file_epoll(), hlist_del_rcu() through the head, spin_unlock). A concurrent __fput() taking the eventpoll_release() fastpath in that window observed the transient NULL, skipped eventpoll_release_file() and ran to f_op->release / file_free(). For the epoll-watches-epoll case, f_op->release is ep_eventpoll_release() -> ep_clear_and_put() -> ep_free(), which kfree()s the watched struct eventpoll. Its embedded ->refs hlist_head is exactly where epi->fllink.pprev points, so the subsequent hlist_del_rcu()'s "*pprev = next" scribbles into freed kmalloc-192 memory. In addition, struct file is SLAB_TYPESAFE_BY_RCU, so the slot backing @file could be recycled by alloc_empty_file() -- reinitializing f_lock and f_ep -- while ep_remove() is still nominally inside that lock. The upshot is an attacker-controllable kmem_cache_free() against the wrong slab cache. Pin @file via epi_fget() at the top of ep_remove() and gate the critical section on the pin succeeding. With the pin held @file cannot reach refcount zero, which holds __fput() off and transitively keeps the watched struct eventpoll alive across the hlist_del_rcu() and the f_lock use, closing both UAFs. If the pin fails @file has already reached refcount zero and its __fput() is in flight. Because we bailed before clearing f_ep, that path takes the eventpoll_release() slow path into eventpoll_release_file() and blocks on ep->mtx until the waiter side's ep_clear_and_put() drops it. The bailed epi's share of ep->refcount stays intact, so the trailing ep_refcount_dec_and_test() in ep_clear_and_put() cannot free the eventpoll out from under eventpoll_release_file(); the orphaned epi is then cleaned up there. A successful pin also proves we are not racing eventpoll_release_file() on this epi, so drop the now-redundant re-check of epi->dying under f_lock. The cheap lockless READ_ONCE(epi->dying) fast-path bailout stays. Fixes: 58c9b016e128 ("epoll: use refcount to reduce ep_mutex contention") Reported-by: Jaeyoung Chung <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:43 2026 +0200 eventpoll: kill __ep_remove() [ Upstream commit e9e5cd40d7c403e19f21d0f7b8b8ba3a76b58330 ] Remove the boolean conditional in __ep_remove() and restructure the code so the check for racing with eventpoll_release_file() are only done in the ep_remove_safe() path where they belong. Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:46 2026 +0200 eventpoll: move epi_fget() up [ Upstream commit 86e87059e6d1fd5115a31949726450ed03c1073b ] We'll need it when removing files so move it up. No functional change. Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") [file_ref_get(&file->f_ref) from original commit left as atomic_long_inc_not_zero(&file->f_count) due to v6.12.y missing commit 90ee6ed776c0 ("fs: port files to file_ref") and its dependent commit 08ef26ea9ab3 ("fs: add file_ref")] Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:45 2026 +0200 eventpoll: rename ep_remove_safe() back to ep_remove() [ Upstream commit 0bade234723e40e4937be912e105785d6a51464e ] The current name is just confusing and doesn't clarify anything. Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:42 2026 +0200 eventpoll: split __ep_remove() [ Upstream commit 0f7bdfd413000985de09fc39eb9efa1e091a3ce0 ] Split __ep_remove() to delineate file removal from epoll item removal. Suggested-by: Linus Torvalds <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Christian Brauner <[email protected]> Date: Fri Jun 19 16:58:41 2026 +0200 eventpoll: use hlist_is_singular_node() in __ep_remove() [ Upstream commit 3d9fd0abc94d8cd430cc7cd7d37ce5e5aae2cd2b ] Replace the open-coded "epi is the only entry in file->f_ep" check with hlist_is_singular_node(). Same semantics, and the helper avoids the head-cacheline access in the common false case. Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Stable-dep-of: a6dc643c6931 ("eventpoll: fix ep_remove struct eventpoll / struct file UAF") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Michael Bommarito <[email protected]> Date: Wed Apr 22 11:58:44 2026 -0400 exfat: fix potential use-after-free in exfat_find_dir_entry() commit 3f5f8ee9917cc2b9076ac533492d8a200edcabb8 upstream. In exfat_find_dir_entry(), the buffer_head obtained from exfat_get_dentry() is released with brelse(bh) before the fall-through TYPE_EXTEND branch reads the directory entry through ep (which points into bh->b_data): brelse(bh); if (entry_type == TYPE_EXTEND) { ... len = exfat_extract_uni_name(ep, entry_uniname); ... } After brelse() drops our reference, nothing guarantees that the underlying page backing bh->b_data remains valid for the subsequent exfat_extract_uni_name() read. This is the same pattern fixed in commit fc961522ddbd ("exfat: Fix potential use after free in exfat_load_upcase_table()"). Move brelse(bh) so it runs after ep is no longer dereferenced on each branch. Confirmed on QEMU x86_64 with CONFIG_KASAN=y + CONFIG_DEBUG_PAGEALLOC=y + CONFIG_PAGE_POISONING=y on linux-next, using a crafted exFAT image (long filename with same-hash collisions forcing the TYPE_EXTEND path). With a debug-only invalidate_bdev() inserted between brelse(bh) and the ep read to make the stale-deref window deterministic, the unpatched kernel faults: BUG: KASAN: use-after-free in exfat_find_dir_entry+0x133b/0x15a0 BUG: unable to handle page fault for address: ffff88801a5fa0c2 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI RIP: 0010:exfat_find_dir_entry+0x1188/0x15a0 With this patch applied, the same instrumented harness completes cleanly under the same sanitizer stack. I have not reproduced a crash on an uninstrumented kernel under ordinary reclaim; the instrumented A/B establishes the lifetime violation and that the patch closes it, not an unaided triggerability claim. Fixes: ca06197382bd ("exfat: add directory operations") Cc: [email protected] Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yongpeng Yang <[email protected]> Date: Mon Apr 27 21:10:51 2026 +0800 f2fs: fix incorrect FI_NO_EXTENT handling in __destroy_extent_node() commit 1f70ddb28a3c71df124da5fa4040c808116d6bb9 upstream. When __destroy_extent_node() sets the inode flag FI_NO_EXTENT, it does not reset the length of the largest extent to 0 and update the inode folio. Since modifications to the extent tree are disallowed afterward, the cached largest extent may become stale. This can trigger the following error in xfstests generic/388: F2FS-fs (dm-0): sanity_check_extent_cache: inode (ino=1761) extent info [220057, 57, 6] is incorrect, run fsck to fix In the f2fs_drop_inode path, __destroy_extent_node() does not need to guarantee that et->node_cnt is 0, because concurrency with writeback is expected in this path, and writeback may update the extent cache. This patch reverts commit ed78aeebef05 ("f2fs: fix node_cnt race between extent node destroy and writeback"), and remove the unnecessary zero check of et->node_cnt. Fixes: ed78aeebef05 ("f2fs: fix node_cnt race between extent node destroy and writeback") Cc: [email protected] Reported-by: Chao Yu <[email protected]> Suggested-by: Chao Yu <[email protected]> Signed-off-by: Yongpeng Yang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sunmin Jeong <[email protected]> Date: Mon Jun 22 14:28:17 2026 +0900 f2fs: fix to round down start offset of fallocate for pin file commit 4275b59673eb60b02eec3997816c83f1f4b909c4 upstream. Currently, the length of fallocate for pin file is section-aligned to keep allocated sections from being selected as victims of GC. However, for the case that the start offset of fallocate is not aligned in section, the allocated sections can't be fully utilized. It's because a new section is allocated by f2fs_allocate_pinning_section() after using blks_per_sec blocks regardless of the start offset. As a result, several unexpected dirty segments may be created, including blocks assigned to the pinned file. To address this issue, let's round down the start offset of fallocate to the length of section. The reproducing scenario is as below chunk=$(((2<<20)+4096)) # 2MB + 4KB touch test f2fs_io pinfile set test f2fs_io fallocate 0 0 $chunk test f2fs_io fallocate 0 $chunk $chunk test f2fs_io fallocate 0 $((chunk*2)) $chunk test f2fs_io fiemap 0 $((chunk*3)) test Fiemap: offset = 0 len = 12288 logical addr. physical addr. length flags 0 0000000000000000 000000068c600000 0000000000400000 00001088 1 0000000000400000 000000003d400000 0000000000001000 00001088 2 0000000000401000 00000003eb200000 0000000000200000 00001088 3 0000000000601000 00000005e4200000 0000000000001000 00001088 4 0000000000602000 0000000605400000 0000000000200000 00001089 Cc: [email protected] Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Reviewed-by: Yunji Kang <[email protected]> Reviewed-by: Yeongjin Gil <[email protected]> Reviewed-by: Sungjong Seo <[email protected]> Signed-off-by: Sunmin Jeong <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wenjie Qi <[email protected]> Date: Wed May 27 20:06:28 2026 +0800 f2fs: keep atomic write retry from zeroing original data commit 6d874b65aadce56ac78f76129dbcfc2599b638f8 upstream. A partial atomic write reserves a block in the COW inode before reading the original data page for the untouched bytes in that page. If that read fails, write_begin returns an error but leaves the COW inode entry as NEW_ADDR. A retry of the same partial write then finds the COW entry, treats it as existing COW data, and f2fs_write_begin() zeroes the whole folio because blkaddr is NEW_ADDR. If the retry is committed, the bytes outside the retried write range are committed as zeroes instead of preserving the original file contents. Only use the COW inode as the read source when it already has a real data block. If the COW entry is still NEW_ADDR, treat it as a reservation to reuse: keep reading the old data from the original inode and avoid reserving or accounting the same atomic block again. Cc: [email protected] Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Signed-off-by: Wenjie Qi <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Zhang Cen <[email protected]> Date: Mon Jun 15 15:19:54 2026 +0800 f2fs: validate ACL entry sizes in f2fs_acl_from_disk() commit c4810ada31e80cbe4011467c4f3b1e93f94134f3 upstream. f2fs_acl_count() only validates the aggregate ACL xattr length. A malformed ACL can still place ACL_USER or ACL_GROUP in a slot that only contains struct f2fs_acl_entry_short bytes, and f2fs_acl_from_disk() then reads entry->e_id before verifying that a full entry fits. Require a short entry before reading e_tag and e_perm, and require a full entry before reading e_id for ACL_USER and ACL_GROUP. Return -EFSCORRUPTED from these new truncated-entry checks, while keeping the pre-existing -EINVAL paths unchanged. Validation reproduced this kernel report: KASAN slab-out-of-bounds in __f2fs_get_acl+0x6fb/0x7e0 RIP: 0033:0x7f4b835ea7aa The buggy address belongs to the object at ffff888114589960 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes to the right of allocated 8-byte region [ffff888114589960, ffff888114589968) Read of size 4 Call trace: dump_stack_lvl+0x66/0xa0 (?:?) print_report+0xce/0x630 (?:?) __f2fs_get_acl+0x6fb/0x7e0 (fs/f2fs/acl.c:169) srso_alias_return_thunk+0x5/0xfbef5 (?:?) __virt_addr_valid+0x224/0x430 (?:?) kasan_report+0xe0/0x110 (?:?) __f2fs_get_acl+0x5/0x7e0 (fs/f2fs/acl.c:169) __get_acl+0x281/0x380 (?:?) vfs_get_acl+0x10b/0x190 (?:?) do_get_acl+0x2a/0x410 (?:?) do_get_acl+0x9/0x410 (?:?) do_getxattr+0xe8/0x260 (?:?) filename_getxattr+0xd1/0x140 (?:?) do_getname+0x2d/0x2d0 (?:?) path_getxattrat+0x16c/0x200 (?:?) lock_release+0xc8/0x290 (?:?) cgroup_update_frozen+0x9d/0x320 (?:?) lockdep_hardirqs_on_prepare+0xea/0x1a0 (?:?) trace_hardirqs_on+0x1a/0x170 (?:?) _raw_spin_unlock_irq+0x28/0x50 (?:?) do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87) entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?) Cc: [email protected] Fixes: af48b85b8cd3 ("f2fs: add xattr and acl functionalities") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wenjie Qi <[email protected]> Date: Thu May 21 11:16:18 2026 +0800 f2fs: validate compress cache inode only when enabled commit 5073c66a96a9c23c0c2533ed4ed06e42f9021208 upstream. F2FS_COMPRESS_INO() uses NM_I(sbi)->max_nid as the synthetic inode number for the compressed page cache inode. That inode only exists when the compress_cache mount option is enabled. When compress_cache is disabled, max_nid is outside the valid inode range. A corrupted directory entry that points to ino == max_nid should therefore be rejected by f2fs_check_nid_range(). However, is_meta_ino() currently treats F2FS_COMPRESS_INO() as a meta inode unconditionally, so f2fs_iget() bypasses do_read_inode() and its nid range check, and instantiates a fake internal inode instead. Gate the compressed cache inode case on COMPRESS_CACHE, matching f2fs_init_compress_inode(). With compress_cache disabled, ino == max_nid now follows the normal inode path and is rejected as an out-of-range nid. Cc: [email protected] Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Wenjie Qi <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Ian Bridges <[email protected]> Date: Wed Jun 24 23:13:12 2026 -0500 fbdev: Fix fb_new_modelist to prevent null-ptr-deref in fb_videomode_to_var commit 7f08fc10fa3d3366dc3af723970bd03d7d6d10e3 upstream. info->var, a framebuffer's current mode, is expected to have a matching entry in info->modelist. var_to_display() relies on this and treats a failed fb_match_mode() as "This should not happen". fb_set_var() keeps it true by adding the mode to the list on every change, and do_register_framebuffer() does the same at registration. store_modes() replaces the modelist from userspace. fb_new_modelist() validates the new modes but does not check that info->var still has a match. It relies on fbcon_new_modelist() to re-point consoles, but that only handles consoles mapped to the framebuffer. With fbcon unbound there are none, so info->var is left describing a mode that is no longer in the list. A later console takeover runs var_to_display(), where fb_match_mode() returns NULL and leaves fb_display[i].mode NULL. fbcon_switch() passes it to display_to_var(), and fb_videomode_to_var() dereferences the NULL mode. Keep the current mode in the list in fb_new_modelist(), the same way fb_set_var() does. Cc: [email protected] Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ian Bridges <[email protected]> Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Ian Bridges <[email protected]> Date: Thu Jun 25 23:50:48 2026 -0500 fbdev: fix use-after-free in store_modes() commit 2c1c805c65fb7dc7524e20376d6987721e73a0b1 upstream. store_modes() replaces a framebuffer's modelist with modes from userspace. On success it frees the old modelist with fb_destroy_modelist(). Two fields still point into that freed list. One pointer is fb_display[i].mode, the mode a console is using. fbcon_new_modelist() moves these pointers to the new list. It only does so for consoles still mapped to the framebuffer. An unmapped console is skipped and keeps its stale pointer. Unbinding fbcon, for example, sets con2fb_map[i] to -1 but leaves fb_display[i].mode set. An FBIOPUT_VSCREENINFO ioctl with FB_ACTIVATE_INV_MODE later reaches fbcon_mode_deleted(). That function reads the stale fb_display[i].mode through fb_mode_is_equal(). The read is a use-after-free. The other pointer is fb_info->mode, the current mode. It is set through the mode sysfs attribute. store_modes() does not update fb_info->mode, so it is left pointing into the freed list. show_mode(), the attribute's read handler, dereferences the stale fb_info->mode through mode_string(). The read is a use-after-free. Clear both pointers before freeing the list. Commit a1f305893074 ("fbcon: Set fb_display[i]->mode to NULL when the mode is released") added the helper fbcon_delete_modelist(). It clears every fb_display[i].mode that points into a given list. So far it is called only from the unregister path. Call it from store_modes() too, and set fb_info->mode to NULL. Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=81c7c6b52649fd07299d Cc: [email protected] Link: https://lore.kernel.org/all/ajjoDhAi2y4ArSlz@dev/ Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ian Bridges <[email protected]> Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tuo Li <[email protected]> Date: Wed Jun 10 10:50:14 2026 +0800 fbdev: modedb: fix a possible UAF in fb_find_mode() commit 85b6256469cebdac395e7447147e06b2e151014f upstream. If mode_option is NULL, it is assigned from mode_option_buf: if (!mode_option) { fb_get_options(NULL, &mode_option_buf); mode_option = mode_option_buf; } Later, name is assigned from mode_option: const char *name = mode_option; However, mode_option_buf is freed before name is no longer used: kfree(mode_option_buf); while name is still accessed by: if ((name_matches(db[i], name, namelen) || Since name aliases mode_option_buf, this may result in a use-after-free. Fix this by extending the lifetime of mode_option_buf until the end of the function by using scope-based resource management for cleanup. Signed-off-by: Tuo Li <[email protected]> Cc: [email protected] # v6.5+ Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steffen Persvold <[email protected]> Date: Fri Jun 12 18:40:41 2026 +0200 fbdev: modedb: Fix misaligned fields in the 1920x1080-60 mode commit d894c48a57d78206e4df9c90d4acfaf39394806a upstream. The 1920x1080@60 modedb entry has one too many initializers before its sync field: a stray "0" occupies the sync slot, which shifts the remaining values by one field. The entry therefore decodes as sync = 0, vmode = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT (0x3, i.e. FB_VMODE_INTERLACED | FB_VMODE_DOUBLE), and flag = FB_VMODE_NONINTERLACED, instead of the intended sync = positive H/V, vmode = non-interlaced. fb_find_mode() then returns a 1920x1080 mode flagged as interlaced + doublescan with active-low syncs. Drivers that honour var->vmode and var->sync when programming display timing enable doublescan and the wrong sync polarity, corrupting the output. Drop the stray initializer so sync and vmode hold their intended values (positive H/V sync, non-interlaced), matching the adjacent 1920x1200 entry. Fixes: c8902258b2b8 ("fbdev: modedb: Add 1920x1080 at 60 Hz video mode") Cc: [email protected] Signed-off-by: Steffen Persvold <[email protected]> Signed-off-by: Helge Deller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jann Horn <[email protected]> Date: Tue Jun 16 12:36:09 2026 -0400 fhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh() [ Upstream commit 40ab6644b99685755f740b872c00ef40d9aa870e ] may_decode_fh() accesses mount::mnt_ns without holding any locks; that means the mount can concurrently be unmounted, and the mnt_namespace can concurrently be freed after an RCU grace period. This race can happens as follows, assuming that the mount point was created by open_tree(..., OPEN_TREE_CLONE): thread 1 thread 2 RCU __do_sys_open_by_handle_at do_handle_open handle_to_path may_decode_fh is_mounted [mount::mnt_ns access] [mount::mnt_ns access] __do_sys_close fput_close_sync __fput dissolve_on_fput umount_tree class_namespace_excl_destructor namespace_unlock free_mnt_ns mnt_ns_tree_remove call_rcu(mnt_ns_release_rcu) mnt_ns_release_rcu mnt_ns_release kfree [mnt_namespace::user_ns access] **UAF** Fix it by taking rcu_read_lock() around the mount::mnt_ns access, like in __prepend_path(). Additionally, document the semantics of mount::mnt_ns, and use WRITE_ONCE() for writers that can race with lockless readers. This bug is unreachable unless one of the following is set: - CONFIG_PREEMPTION - CONFIG_RCU_STRICT_GRACE_PERIOD because it requires an RCU grace period to happen during a syscall without an explicit preemption. This doesn't seem to have interesting security impact; worst-case, it could leak the result of an integer comparison to userspace (from the level check in cap_capable()), cause an endless loop, or crash the kernel by dereferencing an invalid address. Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks") Cc: [email protected] Signed-off-by: Jann Horn <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Christian Brauner (Amutable) <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wentao Liang <[email protected]> Date: Wed Apr 8 15:45:34 2026 +0000 fpga: region: fix use-after-free in child_regions_with_firmware() commit 54f3c5643ec523a04b6ec0e7c19eb10f5ebebdd3 upstream. Move of_node_put(child_region) after the error print to avoid accessing freed memory when pr_err() references child_region. Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA") Cc: [email protected] Signed-off-by: Wentao Liang <[email protected]> [ Yilun: Fix the Fixes tag ] Reviewed-by: Xu Yilun <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xu Yilun <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Amir Goldstein <[email protected]> Date: Mon Jun 29 15:03:36 2026 +0800 fs: constify file ptr in backing_file accessor helpers [ Upstream commit 4e301d858af17ae2ce56886296e5458c5a08219a ] Add internal helper backing_file_set_user_path() for the only two cases that need to modify backing_file fields. Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Cai Xinchen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:38 2026 -0400 ftrace: Check against is_kernel_text() instead of kaslr_offset() [ Upstream commit da0f622b344be769ed61e7c1caf95cd0cdb47964 ] As kaslr_offset() is architecture dependent and also may not be defined by all architectures, when zeroing out unused weak functions, do not check against kaslr_offset(), but instead check if the address is within the kernel text sections. If KASLR added a shift to the zeroed out function, it would still not be located in the kernel text. This is a more robust way to test if the text is valid or not. Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: "Arnd Bergmann" <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: ef378c3b8233 ("scripts/sorttable: Zero out weak functions in mcount_loc table") Reported-by: Nathan Chancellor <[email protected]> Reported-by: Mark Brown <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Closes: https://lore.kernel.org/all/20250224180805.GA1536711@ax162/ Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Guenter Roeck <[email protected]> Date: Thu Jun 18 16:15:36 2026 -0400 ftrace: Do not over-allocate ftrace memory [ Upstream commit be55257fab181b93af38f8c4b1b3cb453a78d742 ] The pg_remaining calculation in ftrace_process_locs() assumes that ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE (integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g. 4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages) have significantly more capacity than 256 * 170. This leads to pg_remaining being underestimated, which in turn makes skip (derived from skipped - pg_remaining) larger than expected, causing the WARN(skip != remaining) to trigger. Extra allocated pages for ftrace: 2 with 654 skipped WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0 A similar problem in ftrace_allocate_records() can result in allocating too many pages. This can trigger the second warning in ftrace_process_locs(). Extra allocated pages for ftrace WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580 Use the actual capacity of a page group to determine the number of pages to allocate. Have ftrace_allocate_pages() return the number of allocated pages to avoid having to calculate it. Use the actual page group capacity when validating the number of unused pages due to skipped entries. Drop the definition of ENTRIES_PER_PAGE since it is no longer used. Cc: [email protected] Fixes: 4a3efc6baff93 ("ftrace: Update the mcount_loc check of skipped entries") Link: https://patch.msgid.link/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:35 2026 -0400 ftrace: Have ftrace pages output reflect freed pages [ Upstream commit 264143c4e54412095f4b615e65bf736fc3c60af0 ] The amount of memory that ftrace uses to save the descriptors to manage the functions it can trace is shown at output. But if there are a lot of functions that are skipped because they were weak or the architecture added holes into the tables, then the extra pages that were allocated are freed. But these freed pages are not reflected in the numbers shown, and they can even be inconsistent with what is reported: ftrace: allocating 57482 entries in 225 pages ftrace: allocated 224 pages with 3 groups The above shows the number of original entries that are in the mcount_loc section and the pages needed to save them (225), but the second output reflects the number of pages that were actually used. The two should be consistent as: ftrace: allocating 56739 entries in 224 pages ftrace: allocated 224 pages with 3 groups The above also shows the accurate number of entires that were actually stored and does not include the entries that were removed. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:37 2026 -0400 ftrace: Test mcount_loc addr before calling ftrace_call_addr() [ Upstream commit 6eeca746fa5f1dd03c6ee05cb03f5eb1ddda1c81 ] The addresses in the mcount_loc can be zeroed and then moved by KASLR making them invalid addresses. ftrace_call_addr() for ARM 64 expects a valid address to kernel text. If the addr read from the mcount_loc section is invalid, it must not call ftrace_call_addr(). Move the addr check before calling ftrace_call_addr() in ftrace_process_locs(). Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Brown <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: ef378c3b8233 ("scripts/sorttable: Zero out weak functions in mcount_loc table") Reported-by: Nathan Chancellor <[email protected]> Reported-by: "Arnd Bergmann" <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Closes: https://lore.kernel.org/all/20250225025631.GA271248@ax162/ Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:34 2026 -0400 ftrace: Update the mcount_loc check of skipped entries [ Upstream commit 4a3efc6baff931da9a85c6d2e42c87bd9a827399 ] Now that weak functions turn into skipped entries, update the check to make sure the amount that was allocated would fit both the entries that were allocated as well as those that were skipped. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jann Horn <[email protected]> Date: Tue Jun 16 21:00:23 2026 +0200 fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios [ Upstream commit 4e3d1b2c48ca6c55f1e9ca7f8dccc76f120f276c ] FUSE_NOTIFY_RETRIEVE must be limited to uptodate folios; !uptodate folios can contain uninitialized data. Since FUSE_NOTIFY_RETRIEVE is intended to only return data that is already in the page cache and not wait for data from the FUSE daemon, treat !uptodate folios as if they weren't present. This only has security impact on systems that don't enable automatic zero-initialization of all page allocations via CONFIG_INIT_ON_ALLOC_DEFAULT_ON or init_on_alloc=1. Cc: [email protected] Fixes: 2d45ba381a74 ("fuse: add retrieve request") Signed-off-by: Jann Horn <[email protected]> Link: https://patch.msgid.link/[email protected] Acked-by: Miklos Szeredi <[email protected]> Signed-off-by: Christian Brauner (Amutable) <[email protected]> [adjusted for stable: page instead of folio] Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Joanne Koong <[email protected]> Date: Tue Jun 23 08:05:32 2026 -0400 fuse: re-lock request before replacing page cache folio [ Upstream commit a078484921052d0badd827fcc2770b5cfc1d4120 ] fuse_try_move_folio() unlocks the request on entry but does not re-lock it on the success path. This means fuse_chan_abort() can end the request and free the fuse_io_args (eg fuse_readpages_end()) while the subsequent copy chain logic after fuse_try_move_folio() accesses the fuse_io_args, leading to use-after-free issues. Fix this by calling lock_request() before replace_page_cache_folio(). This ensures the request is locked on the success path which will prevent the fuse_io_args from being freed while the later copying logic runs, and also ensures that the ap->folios[i]->mapping is never null since ap->folios[i] will always point to the newfolio after replace_page_cache_folio(). Fixes: ce534fb05292 ("fuse: allow splice to move pages") Cc: [email protected] Reported-by: Lei Lu <[email protected]> Signed-off-by: Joanne Koong <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tristan Madani <[email protected]> Date: Fri May 1 11:02:03 2026 +0000 gfs2: fix use-after-free in gfs2_qd_dealloc commit f9c9ec2c319f843b70ecdf939d48b52d189bc081 upstream. gfs2_qd_dealloc(), called as an RCU callback from gfs2_qd_dispose(), accesses the superblock object sdp through qd->qd_sbd after freeing qd. It does so to decrement sd_quota_count and wake up sd_kill_wait. However, by the time the RCU callback runs, gfs2_put_super() may have already freed sdp via free_sbd(). This can happen when gfs2_quota_cleanup() is called during unmount: it disposes of quota objects via call_rcu() and then waits on sd_kill_wait with a 60-second timeout. If the timeout expires, or if gfs2_gl_hash_clear() triggers additional qd_put() calls that schedule more RCU callbacks after the wait completes, gfs2_put_super() will proceed to free the superblock while RCU callbacks referencing it are still pending. Add an rcu_barrier() before free_sbd() in gfs2_put_super() to ensure all pending RCU callbacks (including gfs2_qd_dealloc) have completed before the superblock is freed. Fixes: a475c5dd16e5 ("gfs2: Free quota data objects synchronously") Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=42a37bf8045847d8f9d2 Tested-by: [email protected] Cc: [email protected] Signed-off-by: Tristan Madani <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tzung-Bi Shih <[email protected]> Date: Thu Jun 18 18:02:53 2026 +0200 gpio: Fix resource leaks on errors in gpiochip_add_data_with_key() [ Upstream commit 16fdabe143fce2cbf89139677728e17e21b46c28 ] Since commit aab5c6f20023 ("gpio: set device type for GPIO chips"), `gdev->dev.release` is unset. As a result, the reference count to `gdev->dev` isn't dropped on the error handling paths. Drop the reference on errors. Also reorder the instructions to make the error handling simpler. Now gpiochip_add_data_with_key() roughly looks like: >>> Some memory allocation. Go to ERR ZONE 1 on errors. >>> device_initialize(). gpiodev_release() takes over the responsibility for freeing the resources of `gdev->dev`. The subsequent error handling paths shouldn't go through ERR ZONE 1 again which leads to double free. >>> Some initialization mainly on `gdev`. >>> The rest of initialization. Go to ERR ZONE 2 on errors. >>> Chip registration success and exit. >>> ERR ZONE 2. gpio_device_put() and exit. >>> ERR ZONE 1. Cc: [email protected] Fixes: aab5c6f20023 ("gpio: set device type for GPIO chips") Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Tzung-Bi Shih <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> [missing commit fcc8b637c542 ("gpiolib: switch the line state notifier to atomic"), commit dcb73cbaaeb3 ("gpio: cdev: use raw notifier for line state events") and commit d4f335b410dd ("gpiolib: rename GPIO chip printk macros") in 6.12.y. s/gpiochip_err/chip_err/ as well as replaced rwlock_init+RAW_INIT_NOTIFIER_HEAD with BLOCKING_INIT_NOTIFIER_HEAD based on missing commits, following same logic as in 16fdabe143fc.] Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Andy Shevchenko <[email protected]> Date: Thu Jun 18 18:02:51 2026 +0200 gpiolib: Extract gpiochip_choose_fwnode() for wider use [ Upstream commit 375790f18396b2ba706e031b150c58cd37b45a11 ] Extract gpiochip_choose_fwnode() for the future use in another function. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Tested-by: Mathieu Dubois-Briand <[email protected]> Reviewed-by: Mathieu Dubois-Briand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Stable-dep-of: 16fdabe143fc ("gpio: Fix resource leaks on errors in gpiochip_add_data_with_key()") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Andy Shevchenko <[email protected]> Date: Thu Jun 18 18:02:52 2026 +0200 gpiolib: Remove redundant assignment of return variable [ Upstream commit 550300b9a295a591e0721a31f8c964a4bc08d51c ] In some functions the returned variable is assigned to 0 and then reassigned to the actual value. Remove redundant assignments. In one case make it more clear that the assignment is not needed. Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Stable-dep-of: 16fdabe143fc ("gpio: Fix resource leaks on errors in gpiochip_add_data_with_key()") Signed-off-by: Quentin Schulz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Fan Wu <[email protected]> Date: Wed Jun 17 02:05:18 2026 +0000 hdlc_ppp: sync per-proto timers before freeing hdlc state commit c78a4e41ab5ead6193ad8a2dd92e8906bae659fa upstream. Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp registers a timer via timer_setup(). That struct ppp is the hdlc->state allocation, which detach_hdlc_protocol() frees with kfree() in both teardown paths: unregister_hdlc_device() and the re-attach inside attach_hdlc_protocol(). The ppp proto never registered a .detach callback, so detach_hdlc_protocol() performs no timer synchronization before the kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(), is partial (it does not wait for a running callback) and only runs on the ->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A ppp_timer callback already executing (blocked on ppp->lock) survives the kfree and then dereferences proto->state / ppp->lock in freed memory, leading to a use-after-free. Fix this by adding a .detach helper that calls timer_shutdown_sync() on every per-proto timer. detach_hdlc_protocol() invokes proto->detach(dev) before kfree(hdlc->state), so timer_shutdown_sync() now runs on both free paths. timer_shutdown_sync() is used instead of timer_delete_sync() because the keepalive path re-arms the timer through add_timer()/mod_timer() and shutdown blocks any re-activation during teardown. Initialize the per-protocol timers in ppp_ioctl() when the protocol is attached, and remove the now-redundant timer_setup() from ppp_start(), so that the timers are initialized exactly once at attach time and ppp_timer_release() never operates on uninitialized timer_list structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so struct ppp's protos[i].timer is uninitialized garbage until the first timer_setup(); without this init-at-attach, attaching the PPP protocol without ever bringing the device up would leave timer_shutdown_sync() operating on uninitialized memory in .detach. Moving the init out of ppp_start() (which only runs on NETDEV_UP) into the attach path makes the initialization unconditional and avoids initializing the same timer_list twice. This bug was found by static analysis. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: [email protected] Signed-off-by: Fan Wu <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Thorsten Blum <[email protected]> Date: Tue Jun 16 13:01:25 2026 -0400 hv: utils: handle and propagate errors in kvp_register [ Upstream commit 3fcf923302a8f5c0dc3af3d2ca2657cb5fae4297 ] Make kvp_register() return an error code instead of silently ignoring failures, and propagate the error from kvp_handle_handshake() instead of returning success. This propagates both kzalloc_obj() and hvutil_transport_send() failures to kvp_handle_handshake() and thus to kvp_on_msg(). Fixes: 245ba56a52a3 ("Staging: hv: Implement key/value pair (KVP)") Cc: [email protected] Signed-off-by: Thorsten Blum <[email protected]> Reviewed-by: Long Li <[email protected]> Signed-off-by: Wei Liu <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Johan Hovold <[email protected]> Date: Mon May 11 16:37:12 2026 +0200 i2c: core: fix adapter registration race commit ba14d7cf2fe7284610a29854bdff22b2537d3ce6 upstream. Adapters can be looked up based on their id using i2c_get_adapter() which takes a reference to the embedded struct device. Make sure that the adapter (including its struct device) has been initialised before adding it to the IDR to avoid accessing uninitialised data which could, for example, lead to NULL-pointer dereferences or use-after-free. Note that the i2c-dev chardev, which is registered from a bus notifier, currently uses i2c_get_adapter() so the adapter needs to be added to the IDR before registration. Fixes: 6e13e6418418 ("i2c: Add i2c_add_numbered_adapter()") Cc: [email protected] # 2.6.22 Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Weiming Shi <[email protected]> Date: Wed Apr 15 01:23:39 2026 +0800 i2c: stub: Reject I2C block transfers with invalid length commit 6036b5067a8199ba7a2dc7b377d4b9dd276d5f9e upstream. The I2C_SMBUS_I2C_BLOCK_DATA case in stub_xfer() uses data->block[0] as the transfer length. The existing check only clamps it to avoid overrunning the chip->words[256] register array, but does not validate it against I2C_SMBUS_BLOCK_MAX (32), which is the limit of the union i2c_smbus_data.block buffer (34 bytes total). The driver is a development/test tool (CONFIG_I2C_STUB=m, not built by default) that must be loaded with a chip_addr= parameter. A local user with access to /dev/i2c-* can issue an I2C_SMBUS ioctl with I2C_SMBUS_I2C_BLOCK_DATA and data->block[0] > 32, causing stub_xfer() to read or write past the end of the union i2c_smbus_data.block buffer: BUG: KASAN: stack-out-of-bounds in stub_xfer (drivers/i2c/i2c-stub.c:223) Read of size 1 at addr ffff88800abcfd92 by task exploit/81 Call Trace: <TASK> stub_xfer (drivers/i2c/i2c-stub.c:223) __i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:593) i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:536) i2cdev_ioctl_smbus (drivers/i2c/i2c-dev.c:391) i2cdev_ioctl (drivers/i2c/i2c-dev.c:478) __x64_sys_ioctl (fs/ioctl.c:583) do_syscall_64 (arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> The bug exists because i2c-stub implements .smbus_xfer directly, bypassing the I2C_SMBUS_BLOCK_MAX validation in i2c_smbus_xfer_emulated(). The I2C_SMBUS_BLOCK_DATA case in the same function correctly validates against I2C_SMBUS_BLOCK_MAX, but the I2C_SMBUS_I2C_BLOCK_DATA case does not. Fix by rejecting transfers with data->block[0] == 0 or data->block[0] > I2C_SMBUS_BLOCK_MAX with -EINVAL, consistent with both the I2C_SMBUS_BLOCK_DATA case in the same function and the I2C_SMBUS_I2C_BLOCK_DATA validation in i2c_smbus_xfer_emulated(). Fixes: 4710317891e4 ("i2c-stub: Implement I2C block support") Reported-by: Xiang Mei <[email protected]> Signed-off-by: Weiming Shi <[email protected]> Reviewed-by: Jean Delvare <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sam Daly <[email protected]> Date: Thu May 14 18:23:20 2026 +0200 iio: adc: ti-ads1298: add bounds check to pga_settings index commit 95e8a48d7a85d4226934020e57815a3316d3a14b upstream. ads1298_pga_settings has 7 elements but ADS1298_MASK_CH_PGA can yield values 0-7. If it yields a value >= 7, this causes an out-of-bounds array access. Add a bounds check and return -EINVAL if the index is out of range. Note that the remaining value b111 is reserved so should not be seen in a correctly functioning system. Assisted-by: gkh_clanker_2000 Cc: stable <[email protected]> Cc: Jonathan Cameron <[email protected]> Cc: David Lechner <[email protected]> Cc: "Nuno SĂ¡" <[email protected]> Cc: Andy Shevchenko <[email protected]> Signed-off-by: Sam Daly <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Antoniu Miclaus <[email protected]> Date: Fri Jun 19 17:34:43 2026 +0300 iio: light: bh1780: fix PM runtime leak on error path commit dd72e6c3cdea05cad24e99710939086f7a113fb5 upstream. Move pm_runtime_put_autosuspend() before the error check to ensure the PM runtime reference count is always decremented after pm_runtime_get_sync(), regardless of whether the read operation succeeds or fails. Fixes: 1f0477f18306 ("iio: light: new driver for the ROHM BH1780") Signed-off-by: Antoniu Miclaus <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Cc: <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> [ moved both pm_runtime_mark_last_busy() and pm_runtime_put_autosuspend() before the error check instead of just pm_runtime_put_autosuspend() ] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Elizaveta Tereshkina <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sam Daly <[email protected]> Date: Thu May 14 18:23:21 2026 +0200 iio: light: veml6075: add bounds check to veml6075_it_ms index commit 307dc4240bd41852d9e0912921e298160db1c109 upstream. veml6075_it_ms has 5 elements but VEML6075_CONF_IT can yield values 0-7. If it returns a value >= 5, this causes an out-of-bounds array access. Add a bounds check and return -EINVAL if the index is out of range. The problem values are reserved so should never be read from the register. Hence this is hardening against fault device, missprogramming or bus corruption. Assisted-by: gkh_clanker_2000 Cc: stable <[email protected]> Signed-off-by: Sam Daly <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Javier Carrasco <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Tue Dec 3 17:36:17 2024 +0000 inet: add indirect call wrapper for getfrag() calls [ Upstream commit 5204ccbfa22358f95afd031a3f337e6d9a74baea ] UDP send path suffers from one indirect call to ip_generic_getfrag() We can use INDIRECT_CALL_1() to avoid it. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Brian Vazquez <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Stable-dep-of: eca856950f7c ("ipv4: account for fraggap on the paged allocation path") Signed-off-by: Sasha Levin <[email protected]>
Author: Gabriel Krisman Bertazi <[email protected]> Date: Wed Jun 17 13:51:58 2026 -0400 io_uring/net: Avoid msghdr on op_connect/op_bind async data [ Upstream commit 3979840cd858f30f43ea9f4e7f7f1f56de82d698 ] This fixes a memory leak due to the lack of the cleanup hook for the iovec. The stable backport differs from upstream by dropping the io_connect_bpf_populate hunk, which didn't exist at the time and by fixing the merge conflict due to the introduction of io_bind_file_create and by using the older async_data allocation API. Both IORING_OP_CONNECT and IORING_OP_BIND reuse the msghdr object just to store the sockaddr. Beyond allocating a much larger object than needed, msghdr can also wrap an iovec, which will be recycled unnecessarily. This uses the sockaddr directly. Cc: [email protected] Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Eric Dumazet <[email protected]> Date: Mon Jun 8 15:59:18 2026 +0000 ip6_vti: set netns_immutable on the fallback device. [ Upstream commit d289d5307762d1838aaece22c6b6fcad9e8865f9 ] john1988 and Noam Rathaus reported that vti6_init_net() does not set the netns_immutable flag on the per-netns fallback tunnel device (ip6_vti0). Other similar tunnel drivers (like ip6_tunnel, sit, ip6_gre, and ip_tunnel) correctly set this flag during their fallback device initialization to prevent them from being moved to another network namespace. Fixes: 61220ab34948 ("vti6: Enable namespace changing") Reported-by: Noam Rathaus <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Steffen Klassert <[email protected]> Reviewed-by: Nicolas Dichtel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> [Salvatore Bonaccorso: Backport for version without 0c493da86374 ("net: rename netns_local to netns_immutable") in v6.15-rc1 and use netns_local.] Signed-off-by: Salvatore Bonaccorso <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Wongi Lee <[email protected]> Date: Tue Jun 16 22:38:29 2026 +0900 ipv4: account for fraggap on the paged allocation path [ Upstream commit eca856950f7cb1a221e02b99d758409f2c5cec42 ] In __ip_append_data(), when the paged-allocation branch is taken, alloclen and pagedlen are computed as alloclen = fragheaderlen + transhdrlen; pagedlen = datalen - transhdrlen; datalen already includes fraggap, but the fraggap bytes carried over from the previous skb are copied into the new skb's linear area at offset transhdrlen by the subsequent skb_copy_and_csum_bits(). The linear area is therefore undersized by fraggap bytes while pagedlen is overstated by the same amount. The non-paged branch sets alloclen to fraglen, which already accounts for fraggap because datalen does. Bring the paged branch in line by adding fraggap to alloclen and subtracting it from pagedlen. After this adjustment, copy no longer collapses to -fraggap on the paged path, so remove the stale comment describing that old arithmetic. Fixes: 8eb77cc73977 ("ipv4: avoid partial copy for zc") Signed-off-by: Jungwoo Lee <[email protected]> Signed-off-by: Wongi Lee <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Wongi Lee <[email protected]> Date: Tue Jun 16 22:46:17 2026 +0900 ipv6: account for fraggap on the paged allocation path commit 736b380e28d0480c7bc3e022f1950f31fe53a7c5 upstream. In __ip6_append_data(), when the paged-allocation branch is taken (MSG_MORE / NETIF_F_SG / large fraglen), alloclen and pagedlen are computed as alloclen = fragheaderlen + transhdrlen; pagedlen = datalen - transhdrlen; datalen already includes fraggap (datalen = length + fraggap). When fraggap is non-zero, this is not the first skb and transhdrlen is zero. The fraggap bytes carried over from the previous skb are copied just past the fragment headers in the new skb's linear area. The linear area is therefore undersized by fraggap bytes while pagedlen is overstated by the same amount, and the copy writes past skb->end into the trailing skb_shared_info. An unprivileged user can trigger this via a UDPv6 socket using MSG_MORE together with MSG_SPLICE_PAGES. The bad accounting was introduced by commit 773ba4fe9104 ("ipv6: avoid partial copy for zc"). Before commit ce650a166335 ("udp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES"), the negative copy value caused -EINVAL to be returned. That later commit allowed MSG_SPLICE_PAGES to proceed in this case, making the corruption triggerable. The non-paged branch sets alloclen to fraglen, which already accounts for fraggap because datalen does. Bring the paged branch in line by adding fraggap to alloclen and subtracting it from pagedlen. After this adjustment, copy no longer collapses to -fraggap on the paged path, so remove the stale comment describing that old arithmetic. Since a negative copy is no longer expected for a valid MSG_SPLICE_PAGES case, remove the MSG_SPLICE_PAGES exception from the negative copy check. Fixes: 773ba4fe9104 ("ipv6: avoid partial copy for zc") Signed-off-by: Jungwoo Lee <[email protected]> Signed-off-by: Wongi Lee <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Qingshuang Fu <[email protected]> Date: Thu Jun 18 10:13:52 2026 +0800 irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove commit 37738fdf2ab1e504d1c63ce5bc0aeb6452d8f057 upstream. The driver allocates domain generic chips using irq_alloc_domain_generic_chips() during probe and sets up chained handlers using irq_set_chained_handler_and_data(). However, on driver removal, the generic chips are not freed and the chained handlers are not removed. The generic chips remain on the global gc_list and may later be accessed by generic interrupt chip suspend, resume, or shutdown callbacks after the driver has been removed, potentially resulting in a use-after-free and kernel crash. The chained handlers that were installed in probe for peripheral and syswake interrupts are also left dangling, which can lead to spurious interrupts accessing freed memory. Fix these issues by: - Setting IRQ_DOMAIN_FLAG_DESTROY_GC flag in domain->flags, so the core code automatically removes generic chips when irq_domain_remove() is called - Clearing all chained handlers with NULL in pdc_intc_remove() Fixes: b6ef9161e43a ("irq-imgpdc: add ImgTec PDC irqchip driver") Signed-off-by: Qingshuang Fu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Usama Arif <[email protected]> Date: Tue Jun 16 07:15:17 2026 -0700 kernel/fork: clear PF_BLOCK_TS in copy_process() commit fd38b75c4b43295b10d69772a46d1c74dbd6fc81 upstream. PF_BLOCK_TS is only set in blk_time_get_ns() when current->plug is non-NULL, and blk_finish_plug() clears it via __blk_flush_plug() before NULLing the plug pointer. copy_process() breaks the invariant by inheriting PF_BLOCK_TS from the parent while resetting the child's plug to NULL. Clear PF_BLOCK_TS alongside that assignment so callers can rely on "PF_BLOCK_TS set implies current->plug != NULL" and dereference current->plug unguarded. Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption") Cc: [email protected] Signed-off-by: Usama Arif <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jarkko Sakkinen <[email protected]> Date: Mon Jun 1 23:11:54 2026 +0300 KEYS: fix overflow in keyctl_pkey_params_get_2() commit cb481e59ea6cae3b7796ac1d7a22b6b24c3f3c0b upstream. The length for the internal output buffer is calculated incorrectly, which can result overflow when a too small buffer is provided. Fix the bug by allocating internal output with the size of the maximum length of the cryptographic primitive instead of caller provided size. Link: https://lore.kernel.org/keyrings/[email protected]/ Cc: [email protected] # v4.20+ Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]") Reported-by: Alessandro Groppo <[email protected]> Tested-by: Alessandro Groppo <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Shaomin Chen <[email protected]> Date: Wed Jun 10 13:10:05 2026 +0300 keys: Pin request_key_auth payload in instantiate paths commit fd15b457a86939c38aa12116adabd8ff686c5e51 upstream. A: request_key() B: KEYCTL_INSTANTIATE_IOV ================ ========================= create auth key store rka in auth key wait for helper get auth key load rka from auth key copy user payload sleep on #PF helper completed detach and free rka destroy auth key wake up use rka->target_key **USE-AFTER-FREE** Give request_key_auth payloads a refcount. Take a payload reference while authkey->sem stabilizes the payload and revocation state. Hold that reference across the instantiate and reject paths. Drop the auth key owning reference from revoke and destroy. [jarkko: Replaced the first two paragraphs of text with an actual concurrency scenario.] Cc: [email protected] # v5.10+ Fixes: b5f545c880a2 ("[PATCH] keys: Permit running process to instantiate keys") Reported-by: Shaomin Chen <[email protected]> Closes: https://lore.kernel.org/r/[email protected] Signed-off-by: Shaomin Chen <[email protected]> Signed-off-by: Jarkko Sakkinen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Hem Parekh <[email protected]> Date: Tue Jun 2 16:56:46 2026 -0700 ksmbd: fix out-of-bounds read in smb_check_perm_dacl() commit 1ef06004ed4bd6d3ed8c840d9d1a376b66d4935b upstream. The permission-check ACE walk in smb_check_perm_dacl() validates the ACE header size and caps sid.num_subauth at SID_MAX_SUB_AUTHORITIES, but it never checks that ace->size is actually large enough to contain num_subauth sub-authorities before compare_sids() dereferences them. CIFS_SID_BASE_SIZE covers the SID header up to but excluding the sub_auth[] array, and offsetof(struct smb_ace, sid) is the ACE header, so the existing guards only guarantee the 8-byte SID base, i.e. zero sub-authorities. compare_sids() then reads ace->sid.sub_auth[i] for i < min(local_sid->num_subauth, ace->sid.num_subauth). The local comparison SIDs (sid_everyone, sid_unix_NFS_mode, and the id_to_sid() result) always have at least one sub-authority, and an attacker controls the ACE revision and authority bytes (which lie within the in-bounds SID base), so they can match one of those SIDs and force the sub_auth read. A crafted ACE with size == 16 and num_subauth >= 1 placed at the tail of the security descriptor therefore causes a heap out-of-bounds read of up to SID_MAX_SUB_AUTHORITIES * sizeof(__le32) bytes past the pntsd allocation. The security descriptor is loaded by ksmbd_vfs_get_sd_xattr() into a buffer sized exactly to the on-disk data (kzalloc(sd_size) in ndr_decode_v4_ntacl()), so the read lands past the allocation. The malformed descriptor can be stored verbatim via SMB2_SET_INFO (the DACL is not normalised before being written to the security.NTACL xattr) and the read fires on a subsequent SMB2_CREATE access check, making this reachable by an authenticated client on a share that uses ACL xattrs. Add the missing num_subauth-versus-ace_size check, mirroring the identical guards already present in the sibling parsers parse_dacl() and smb_inherit_dacl(). Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()") Cc: [email protected] Signed-off-by: Hem Parekh <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Gil Portnoy <[email protected]> Date: Thu Jun 11 22:59:19 2026 +0900 ksmbd: reject non-VALID session in compound request branch commit 609ca17d869d04ba249e32cdcbf13c0b1c66f43c upstream. smb2_check_user_session() takes a shortcut for any operation that is not the first in a COMPOUND request: it reuses work->sess (the session bound by the first operation) and validates only the SessionId, then returns "valid". It never re-checks work->sess->state == SMB2_SESSION_VALID, and a SessionId of 0xFFFFFFFFFFFFFFFF (ULLONG_MAX, the MS-SMB2 related-operation value) skips even the id comparison. The standalone path (ksmbd_session_lookup_all() plus the SESSION_SETUP state machine) does enforce the VALID state; the compound branch bypasses all of it. A SESSION_SETUP carrying only an NTLM Type-1 (NtLmNegotiate) blob publishes a fresh SMB2_SESSION_IN_PROGRESS session whose sess->user is still NULL (->user is assigned later, by ntlm_authenticate()). Used as operation 1 of a COMPOUND with operation 2 = TREE_CONNECT (related, SessionId=ULLONG_MAX, \\host\IPC$), the tree-connect then runs on that IN_PROGRESS session and reaches ksmbd_ipc_tree_connect_request(), which dereferences user_name(sess->user) with sess->user == NULL (transport_ipc.c:687/701/704) -> remote NULL-pointer dereference and a kernel Oops that wedges the ksmbd worker for all clients. Reject any non-first compound operation that lands on a session which is not SMB2_SESSION_VALID, mirroring the validity the standalone lookup path enforces. SESSION_SETUP itself legitimately runs on an IN_PROGRESS session, but it is never carried as a non-first compound operation, so multi-leg authentication is unaffected by this check. Fixes: 5005bcb42191 ("ksmbd: validate session id and tree id in the compound request") Cc: [email protected] Signed-off-by: Gil Portnoy <[email protected]> Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Fri Jun 12 15:52:41 2026 -0700 KVM: Replace guest-triggerable BUG_ON() in ioeventfd datamatch with get_unaligned() commit f1edbed787ba67988ed34e0132ca128b052b6ce8 upstream. Drop a BUG_ON() that has been reachable since it was first added, way back in 2009, and instead use get_unaligned() to perform potentially-unaligned accesses. For a given store, KVM x86's emulator tracks the entire value in the destination operand, x86_emulate_ctxt.dst. If the destination is memory, and the target splits multiple pages and/or is emulated MMIO, then KVM handles each fragment independently. E.g. on a page split starting at page offset 0xffc, KVM writes 4 bytes to the first page, then the remaining bytes to the second page, using ctxt->dst as the source for both (with appropriate offsets). If the destination splits a page *and* hits emulated MMIO on the second page, then KVM will complete the write to the first page, then emulate the MMIO access to the second page. If there is a datamatch-enabled ioeventfd at offset 0 of the second page, then KVM will process the remainder of the store as a potential ioeventfd signal. Putting it all together, if the guest emits a store that splits a page starting at page offset N, and the second page has a datamatch-enabled ioeventfd at offset 0, then KVM will check for datamatch using &dst.valptr[N] as the source. Due to dst (and thus dst.valptr) being 32-byte aligned, if N is not aligned to @len, the BUG_ON() fires. E.g. with a 16-byte store at page offset 0xffc, to an ioeventfd of len 8, all initial checks in ioeventfd_in_range() will succeed, and the BUG_ON() fires due to @val being 4-byte aligned, but not 8-byte aligned. ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/../../../virt/kvm/eventfd.c:783! Oops: invalid opcode: 0000 [#1] SMP CPU: 0 UID: 1000 PID: 615 Comm: repro Not tainted 7.1.0-rc2-ff238429d1ea #365 PREEMPT Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:ioeventfd_write+0x6c/0x70 [kvm] Call Trace: <TASK> __kvm_io_bus_write+0x85/0xb0 [kvm] kvm_io_bus_write+0x53/0x80 [kvm] vcpu_mmio_write+0x66/0xf0 [kvm] emulator_read_write_onepage+0x12a/0x540 [kvm] emulator_read_write+0x109/0x2b0 [kvm] x86_emulate_insn+0x4f8/0xfb0 [kvm] x86_emulate_instruction+0x181/0x790 [kvm] kvm_mmu_page_fault+0x313/0x630 [kvm] vmx_handle_exit+0x18a/0x590 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xc81/0x1c90 [kvm] kvm_vcpu_ioctl+0x2d5/0x970 [kvm] __x64_sys_ioctl+0x8a/0xd0 do_syscall_64+0xb7/0x890 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f19c931a9bf </TASK> Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0000000000000000 ]--- In a perfect world, the fix would be to simply delete the BUG_ON(), as KVM x86 doesn't perform alignment checks on "normal" memory accesses at CPL0. Sadly, C99 ruins all the fun; while the x86 architecture plays nice, dereferencing an unaligned pointer directly is undefined behavior in C, e.g. triggers splats when running with CONFIG_UBSAN_ALIGNMENT=y. Fixes: d34e6b175e61 ("KVM: add ioeventfd support") Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Fri Jun 26 21:28:54 2026 +0200 KVM: SEV: Ignore MMIO requests of length '0' commit 1aa8a6dc7dac8b83234b53518311bf78231f4fa5 upstream. Explicitly ignore MMIO requests of length '0', so that setting up the software scratch area (and other code) doesn't have to worry about underflowing the length, and to allow for special casing '0' in the future. Fixes: 8f423a80d299 ("KVM: SVM: Support MMIO for an SEV-ES guest") Cc: [email protected] Reviewed-by: Tom Lendacky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Fri Jun 26 21:28:56 2026 +0200 KVM: SEV: Ignore Port I/O requests of length '0' commit 3988bd2723de407ae90fa7a6f6029b4e60238c58 upstream. Explicitly ignore Port I/O requests of length '0' (or count '0'), so that setting up the software scratch area (and other code) doesn't have to worry about underflowing the length, and to allow for WARNing on trying to configure the scratch area with len==0. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: [email protected] Reviewed-by: Tom Lendacky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Tue Jun 30 10:22:03 2026 -0700 KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb() [ Upstream commit 08385c5e1814edee829ffe475d559ed730354335 ] Relocate sev_free_vcpu() down in sev.c so that it's definition comes after sev_es_unmap_ghcb(). This will allow sharing unmap functionality between the two functions without needing a forward declaration (or weird placement of the common code). No functional change intended. Cc: [email protected] Reviewed-by: Tom Lendacky <[email protected]> Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> [sean: Preserve use of sev_es_guest() as is_sev_es_guest() doesn't exist in 6.12, resolve superficial conflict due to pre_sev_run() prototype mismatch.] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Fri Jun 26 21:28:55 2026 +0200 KVM: SEV: Reject MMIO requests larger than 8 bytes with GHCB v2+ commit dcf1b2d4b0564a27e4ca7c654871aab4f9620046 upstream. When using GHCB v2+, reject MMIO requests that are larger than 8 bytes. Per the GHCB spec: SW_EXITINFO2 must be less than or equal to 0x7fffffff for version 1 and less than or equal to 0x8 for all other versions. Fixes: 4af663c2f64a ("KVM: SEV: Allow per-guest configuration of GHCB protocol version") Cc: [email protected] Reviewed-by: Tom Lendacky <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Jack Wang <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Tue Jun 30 10:22:04 2026 -0700 KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free [ Upstream commit db38bcb3311053954f62b865cd2d86e164b04351 ] Unmap and unpin the GHCB as needed when freeing a vCPU. If the VM is destroyed after mapping+pinning the GHCB on #VMGEXIT, without re-running the vCPU, KVM will effectively leak the GHCB and any mappings created for the GHCB. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: [email protected] Tested-by: Michael Roth <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Reviewed-by: Michael Roth <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> [sean: Preserve @dirty=true param to kvm_vcpu_unmap()] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Ashutosh Desai <[email protected]> Date: Fri May 1 13:35:32 2026 -0700 KVM: SVM: Fix page overflow in sev_dbg_crypt() for ENCRYPT path commit 78ee2d50185a037b3d2452a97f3dad69c3f7f389 upstream. In sev_dbg_crypt(), the per-iteration transfer length is bounded by the source page offset (PAGE_SIZE - s_off) but not by the destination page offset (PAGE_SIZE - d_off). When d_off > s_off, the encrypt path (__sev_dbg_encrypt_user) performs a read-modify-write using a single-page intermediate buffer (dst_tpage): 1. __sev_dbg_decrypt() expands the size to round_up(len + (d_off & 15), 16) before issuing the PSP command. If len + (d_off & 15) > PAGE_SIZE, the PSP writes beyond the end of the 4096-byte dst_tpage allocation. 2. The subsequent memcpy()/copy_from_user() into page_address(dst_tpage) + (d_off & 15) of 'len' bytes overflows by up to 15 bytes under the same condition. Trigger example: s_off = 0, d_off = 1, debug.len = PAGE_SIZE - the PSP is instructed to write round_up(4097, 16) = 4112 bytes to a 4096-byte buffer. Fix by also bounding len by (PAGE_SIZE - d_off), the same check that sev_send_update_data() already performs for its single-page guest region. ================================================================== BUG: KASAN: slab-use-after-free in sev_dbg_crypt+0x993/0xd10 [kvm_amd] Write of size 4095 at addr ff110062293bb009 by task sev_dbg_test/228214 CPU: 96 UID: 0 PID: 228214 Comm: sev_dbg_test Tainted: G U W 7.0.0-smp--5ce9b0c48211-dbg #156 PREEMPTLAZY Tainted: [U]=USER, [W]=WARN Hardware name: Google Astoria/astoria, BIOS 0.20250817.1-0 08/25/2025 Call Trace: <TASK> dump_stack_lvl+0x54/0x70 print_report+0xbc/0x260 kasan_report+0xa2/0xd0 kasan_check_range+0x25f/0x2c0 __asan_memcpy+0x40/0x70 sev_dbg_crypt+0x993/0xd10 [kvm_amd] sev_mem_enc_ioctl+0x33c/0x450 [kvm_amd] kvm_vm_ioctl+0x65d/0x6d0 [kvm] __se_sys_ioctl+0xb2/0x100 do_syscall_64+0xe8/0x870 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7fe72b6a0 pfn:0x62293bb memcg:ff11000112827d82 flags: 0x1400000000000000(node=1|zone=1) raw: 1400000000000000 0000000000000000 dead000000000122 0000000000000000 raw: 00000007fe72b6a0 0000000000000000 00000001ffffffff ff11000112827d82 page dumped because: kasan: bad access detected Memory state around the buggy address: ff110062293bbf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ff110062293bbf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ff110062293bc000: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ff110062293bc080: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ff110062293bc100: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: 24f41fb23a39 ("KVM: SVM: Add support for SEV DEBUG_DECRYPT command") Fixes: 7d1594f5d94b ("KVM: SVM: Add support for SEV DEBUG_ENCRYPT command") Cc: [email protected] Signed-off-by: Ashutosh Desai <[email protected]> [sean: add sample KASAN splat, Fixes, and stable@] Link: https://patch.msgid.link/[email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sean Christopherson <[email protected]> Date: Fri Jun 26 13:24:05 2026 +0200 KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level commit ef057cbf825e03b63f6edf5980f96abf3c53089d upstream. When recovering hugepages in the shadow MMU, verify that the base gfn of the shadow page is actually contained within the target memslot, *before* querying the max mapping level given the shadow page's gfn. Failure to pre-check the validity of the gfn can lead to an out-of-bounds access to the slot's lpage_info (which typically manifests as a host #PF because the lpage_info is vmalloc'd) if the guest creates a hugepage mapping (in its PTEs) that extends "below" the bounds of a memslot. When faulting in memory for a guest, and the size of the guest mapping is greater than KVM's (current) max mapping, then KVM will create a "direct" shadow page (direct in that there are no gPTEs to shadow, and so the target gfn is a direct calculation given the base gfn of the shadow page). The hugepage recovery flow looks for such direct shadow pages, as forcing 4KiB mappings when dirty logging generates the guest > host mapping size case. When the 4KiB restriction is lifted, then KVM can replace the shadow page with a hugepage. But if KVM originally used a smaller mapping than the guest because the range of memory covered by the guest hugepage exceeds the bounds of a memslot, then KVM will link a direct shadow page with a gfn that is outside the bounds of the memslot being used to fault in memory. The rmap entry added for the leaf mapping is correct and within bounds, but the gfn of the leaf SPTE's parent shadow page will be out of bounds. BUG: unable to handle page fault for address: ffffc90000806ffc #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 100000067 P4D 100000067 PUD 1002a7067 PMD 10612f067 PTE 0 Oops: Oops: 0000 [#1] SMP CPU: 13 UID: 1000 PID: 757 Comm: mmu_stress_test Not tainted 7.1.0-rc1-48ce1e26eace-x86_pir_to_irr_comments-vm #341 PREEMPT Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_mmu_max_mapping_level+0x79/0x2b0 [kvm] Call Trace: <TASK> kvm_mmu_recover_huge_pages+0x21b/0x320 [kvm] kvm_set_memslot+0x1ee/0x590 [kvm] kvm_set_memory_region.part.0+0x3a1/0x4d0 [kvm] kvm_vm_ioctl+0x9bf/0x15d0 [kvm] __x64_sys_ioctl+0x8a/0xd0 do_syscall_64+0xb7/0xbb0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f21c0f1a9bf </TASK> Don't bother pre-checking the bounds of the potential hugepage, i.e. don't check that e.g. sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1) is also within the memslot, as the checks performed by kvm_mmu_max_mapping_level() are a superset of the basic bounds checks. I.e. pre-checking the full range would be a dubious micro-optimization. Fixes: 9eba50f8d7fc ("KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs") Cc: [email protected] Cc: David Matlack <[email protected]> Cc: James Houghton <[email protected]> Cc: Alexander Bulekov <[email protected]> Cc: Fred Griffoul <[email protected]> Cc: Alexander Graf <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Filippo Sironi <[email protected]> Cc: Ivan Orlov <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Paolo Bonzini <[email protected]> Date: Fri Jun 26 13:24:04 2026 +0200 KVM: x86: Fix shadow paging use-after-free due to unexpected role commit 81ccda30b4e83d8f5cc4fd50503c44e3a33abfeb upstream. Commit 0cb2af2ea66ad ("KVM: x86: Fix shadow paging use-after-free due to unexpected GFN") fixed a shadow paging mismatch between stored and computed GFNs; the bug could be triggered by changing a PDE mapping from outside the guest, and then deleting a memslot. The rmap_remove() call would miss entries created after the PDE change because the GFN of the leaf SPTE does not match the GFN of the struct kvm_mmu_page. A similar hole however remains if the modified PDE points to a non-leaf page. In this case the gfn can be made to match, but the role does not match: the original large 2MB page creates a kvm_mmu_page with direct=1, while the new 4KB needs a kvm_mmu_page with direct=0. However, kvm_mmu_get_child_sp() does not compare the role, and therefore reuses the page. The next step is installing a leaf (4KB) SPTE on the new path which records an rmap entry under the gfn resolved by the walk. But when that child is zapped its parent kvm_mmu_page has direct=1 and kvm_mmu_page_get_gfn() computes the gfn for the 4KB page as sp->gfn + index instead of using sp->shadowed_translation[] (or sp->gfns[] in older kernels). It therefore fails to remove the recorded entry. When the memslot is dropped the shadow page is freed but the rmap entry survives, as in the scenario that was already fixed. Code that later walks that gfn (dirty logging, MMU notifier invalidation, and so on) dereferences an sptep that lies in the freed page, causing the use-after-free. Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") Reported-by: Hyunwoo Kim <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Hyunwoo Kim <[email protected]> Date: Sat Jun 6 23:44:52 2026 +0900 KVM: x86: hyper-v: Bound the bank index when querying sparse banks commit 4721f8160f17554b003e8928bb61e6c9b2fe92a3 upstream. When checking if a VP ID is included in a sparse bank set, explicitly check that the ID can actually be contained in a sparse bank (the TLFS allows for a maximum of 64 banks of 64 vCPUs each). When handling a paravirtual TLB flush for L2, the VP ID is copied verbatim from the enlightened VMCS, without any bounds check, i.e. isn't guaranteed to be under the limit of 4096. Failure to check the bounds of the VP ID leads to an out-of-bounds read when testing the sparse bank, and super strictly speaking could lead to KVM performing an unnecessary TLB flush for an L2 vCPU. ================================================================== BUG: KASAN: use-after-free in hv_is_vp_in_sparse_set+0x85/0x100 [kvm] Read of size 8 at addr ffff88811ba5f598 by task hyperv_evmcs/2802 CPU: 12 UID: 1000 PID: 2802 Comm: hyperv_evmcs Not tainted 7.1.0-rc2 #7 PREEMPT Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x51/0x60 print_report+0xcb/0x5d0 kasan_report+0xb4/0xe0 kasan_check_range+0x35/0x1b0 hv_is_vp_in_sparse_set+0x85/0x100 [kvm] kvm_hv_flush_tlb+0xe9e/0x16c0 [kvm] kvm_hv_hypercall+0xe6b/0x1e60 [kvm] vmx_handle_exit+0x485/0x1b60 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x22e3/0x5070 [kvm] kvm_vcpu_ioctl+0x5d0/0x10c0 [kvm] __x64_sys_ioctl+0x129/0x1a0 do_syscall_64+0xb9/0xcf0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f0e62d1a9bf </TASK> The buggy address belongs to the physical page: page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x11ba5f flags: 0x4000000000000000(zone=1) raw: 4000000000000000 0000000000000000 00000000ffffffff 0000000000000000 raw: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88811ba5f480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88811ba5f500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88811ba5f580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88811ba5f600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88811ba5f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint Opportunistically add a compile time assertion to ensure the maximum number of sparse banks exactly matches the number of possible bits in the passed in mask. Cc: [email protected] Fixes: c58a318f6090 ("KVM: x86: hyper-v: L2 TLB flush") Signed-off-by: Hyunwoo Kim <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Link: https://patch.msgid.link/aiQyZIJtO-2Aj_xN@v4bel [sean: add KASAN splat, drop comment, add assert, massage changelog] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Greg Kroah-Hartman <[email protected]> Date: Sat Jul 4 13:43:36 2026 +0200 Linux 6.12.95 Link: https://lore.kernel.org/r/[email protected] Tested-by: Brett A C Sheffield <[email protected]> Tested-by: Peter Schneider <[email protected]> Tested-by: Salvatore Bonaccorso <[email protected]> Tested-by: Miguel Ojeda <[email protected]> Tested-by: Shung-Hsi Yu <[email protected]> Tested-by: Francesco Dolcini <[email protected]> Tested-by: Pavel Machek (CIP) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Tested-by: Pavel Machek (CIP) <[email protected]> Tested-by: Brett A C Sheffield <[email protected]> Tested-by: Francesco Dolcini <[email protected]> Tested-by: Mark Brown <[email protected]> Tested-by: Peter Schneider <[email protected]> Tested-by: Ron Economos <[email protected]> Tested-by: Miguel Ojeda <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Peter Zijlstra <[email protected]> Date: Tue Jun 16 15:06:00 2026 -0400 locking/mutex: Remove wakeups from under mutex::wait_lock [ Upstream commit 894d1b3db41cf7e6ae0304429a1747b3c3f390bc ] In preparation to nest mutex::wait_lock under rq::lock we need to remove wakeups from under it. Do this by utilizing wake_qs to defer the wakeup until after the lock is dropped. [Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")] [jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Metin Kaya <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Tested-by: Metin Kaya <[email protected]> Link: https://lore.kernel.org/r/[email protected] Stable-dep-of: 40a25d59e85b ("locking/rtmutex: Skip remove_waiter() when waiter is not enqueued") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Davidlohr Bueso <[email protected]> Date: Tue Jun 16 15:06:01 2026 -0400 locking/rtmutex: Skip remove_waiter() when waiter is not enqueued [ Upstream commit 40a25d59e85b3c8709ac2424d44f65610467871e ] syzbot triggered the following splat in remove_waiter() via FUTEX_CMP_REQUEUE_PI: KASAN: null-ptr-deref in range [0x0000000000000a88-0x0000000000000a8f] class_raw_spinlock_constructor remove_waiter+0x159/0x1200 kernel/locking/rtmutex.c:1561 rt_mutex_start_proxy_lock+0x103/0x120 futex_requeue+0x10e4/0x20d0 __x64_sys_futex+0x34f/0x4d0 task_blocks_on_rt_mutex() does not arm the waiter upon deadlock detection, leaving waiter->task nil, where 3bfdc63936dd ("rtmutex: Use waiter::task instead of current in remove_waiter()") made this fatal. Furthermore, rt_mutex_start_proxy_lock() should not be calling into remove_waiter() upon a successfully grabbing the rtmutex. 1a1fb985f2e2 ("futex: Handle early deadlock return correctly"), moved the remove_waiter() out of __rt_mutex_start_proxy_lock() (where 'ret' was only ever 0 or < 0) into the wrapper. Tighten this check to account for try_to_take_rt_mutex(). Fixes: 3bfdc63936dd ("rtmutex: Use waiter::task instead of current in remove_waiter()") Reported-by: [email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Link: https://patch.msgid.link/[email protected] Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: John Stultz <[email protected]> Date: Thu Nov 14 11:00:47 2024 -0800 locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutex commit 82f9cc094975240885c93effbca7f4603f5de1bf upstream. Anders had bisected a crash using PREEMPT_RT with linux-next and isolated it down to commit 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock"), where it seemed the wake_q structure was somehow getting corrupted causing a null pointer traversal. I was able to easily repoduce this with PREEMPT_RT and managed to isolate down that through various call stacks we were actually calling wake_up_q() twice on the same wake_q. I found that in the problematic commit, I had added the wake_up_q() call in task_blocks_on_rt_mutex() around __ww_mutex_add_waiter(), following a similar pattern in __mutex_lock_common(). However, its just wrong. We haven't dropped the lock->wait_lock, so its contrary to the point of the original patch. And it didn't match the __mutex_lock_common() logic of re-initializing the wake_q after calling it midway in the stack. Looking at it now, the wake_up_q() call is incorrect and should just be removed. So drop the erronious logic I had added. Fixes: 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock") Closes: https://lore.kernel.org/lkml/[email protected]/ Reported-by: Anders Roxell <[email protected]> Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Tested-by: Anders Roxell <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Huacai Chen <[email protected]> Date: Thu Jun 25 13:03:49 2026 +0800 LoongArch: Report dying CPU to RCU in stop_this_cpu() commit f2539c56c74691e7a88af6372ba2b48c06ed2fe4 upstream. This is a port of MIPS commit 9f3f3bdc6d9dac1 ("MIPS: smp: report dying CPU to RCU in stop_this_cpu()"). smp_send_stop() parks all secondary CPUs in stop_this_cpu(). And the function marks the CPU offline for the scheduler via set_cpu_online(false) but never informs RCU, so RCU keeps expecting a quiescent state from CPUs that are now spinning forever with interrupts disabled. As long as nothing waits for an RCU grace period after smp_send_stop() this is harmless, which is why it went unnoticed. However, since commit 91840be8f710370 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT"), irq_work_sync() calls synchronize_rcu() on architectures without an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns false. Any irq_work_sync() issued in the reboot/shutdown/halt path after smp_send_stop() then blocks on a grace period that can never complete, hanging the reboot: WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on ... rcu: INFO: rcu_sched detected stalls on CPUs/tasks: rcu: Offline CPU 1 blocking current GP. rcu: Offline CPU 2 blocking current GP. rcu: Offline CPU 3 blocking current GP. This issue needs some hacks to reproduce, and it was not noticed on LoongArch because arch_irq_work_has_interrupt() usually returns true. Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the generic CPU-hotplug offline path, so RCU stops waiting on the parked CPUs and grace periods can still complete. LoongArch shuts down all CPUs here without going through the CPU-hotplug mechanism, so this report is not otherwise issued. Cc: <[email protected]> Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT") Reviewed-by: Guo Ren <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Paul Moore <[email protected]> Date: Mon Jun 29 15:03:37 2026 +0800 lsm: add backing_file LSM hooks [ Upstream commit 6af36aeb147a06dea47c49859cd6ca5659aeb987 ] Stacked filesystems such as overlayfs do not currently provide the necessary mechanisms for LSMs to properly enforce access controls on the mmap() and mprotect() operations. In order to resolve this gap, a LSM security blob is being added to the backing_file struct and the following new LSM hooks are being created: security_backing_file_alloc() security_backing_file_free() security_mmap_backing_file() The first two hooks are to manage the lifecycle of the LSM security blob in the backing_file struct, while the third provides a new mmap() access control point for the underlying backing file. It is also expected that LSMs will likely want to update their security_file_mprotect() callback to address issues with their mprotect() controls, but that does not require a change to the security_file_mprotect() LSM hook. There are a three other small changes to support these new LSM hooks: * Pass the user file associated with a backing file down to alloc_empty_backing_file() so it can be included in the security_backing_file_alloc() hook. * Add getter and setter functions for the backing_file struct LSM blob as the backing_file struct remains private to fs/file_table.c. * Constify the file struct field in the LSM common_audit_data struct to better support LSMs that need to pass a const file struct pointer into the common LSM audit code. Thanks to Arnd Bergmann for identifying the missing EXPORT_SYMBOL_GPL() and supplying a fixup. Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Reviewed-by: Amir Goldstein <[email protected]> Reviewed-by: Serge Hallyn <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Paul Moore <[email protected]> [ Mainline declares lsm_backing_file_cache in security/lsm.h. Linux 6.12.y does not have security/lsm_init.c or security/lsm.h; the cache variable is defined locally as static struct kmem_cache *lsm_backing_file_cache in security/security.c. ] Signed-off-by: Cai Xinchen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Doruk Tan Ozturk <[email protected]> Date: Tue May 26 20:37:26 2026 +0200 mac802154: llsec: add skb_cow_data() before in-place crypto commit 84a04eb5b210643bd67aab81ff805d32f62aa865 upstream. llsec_do_encrypt_unauth(), llsec_do_encrypt_auth(), llsec_do_decrypt_unauth(), and llsec_do_decrypt_auth() all perform in-place cryptographic transformations on skb data. They build a scatterlist with sg_init_one() pointing into the skb's linear data area and then pass the same scatterlist as both src and dst to the crypto API (e.g. crypto_skcipher_encrypt/decrypt, crypto_aead_encrypt/decrypt). On the RX path, __ieee802154_rx_handle_packet() clones the received skb before handing it to each subscriber via ieee802154_subif_frame(). The cloned skb shares the same underlying data buffer via reference counting. When llsec_do_decrypt() subsequently modifies this shared buffer in place, it corrupts data that other clones -- potentially belonging to other sockets or subsystems -- still reference. On the TX path, similar data sharing can occur when an skb's head has been cloned (skb_cloned() returns true). The fix is to call skb_cow_data() before performing any in-place crypto operation. skb_cow_data() ensures that the skb's data area is not shared: if the skb head is cloned or the data spans multiple fragments, it copies the data into a private buffer that can be safely modified in place. This is the same pattern used by: - ESP (net/ipv4/esp4.c, net/ipv6/esp6.c) - MACsec (drivers/net/macsec.c) - WireGuard (drivers/net/wireguard/receive.c) - TIPC (net/tipc/crypto.c) Without this guard, in-place crypto on shared skb data leads to: - Silent data corruption of other skb clones - Use-after-free when the crypto API scatterwalk writes through a page that has already been freed by another clone's kfree_skb() - Kernel crashes under concurrent 802.15.4 traffic with security enabled (KASAN/KMSAN reports slab-use-after-free) Found by 0sec (https://0sec.ai) using automated source analysis. Fixes: 4c14a2fb5d14 ("mac802154: add llsec decryption method") Fixes: 03556e4d0dbb ("mac802154: add llsec encryption method") Cc: [email protected] Reported-by: Doruk Tan Ozturk <[email protected]> Closes: https://lore.kernel.org/linux-wpan/[email protected]/ Reviewed-by: Alexander Lobakin <[email protected]> Signed-off-by: Doruk Tan Ozturk <[email protected]> Closes: <link to your mail on lore> Link: https://lore.kernel.org/[email protected] Signed-off-by: Stefan Schmidt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Ruslan Valiyev <[email protected]> Date: Tue Mar 17 17:05:44 2026 +0000 media: vidtv: fix NULL pointer dereference in vidtv_mux_push_si commit 7d8bf3d8f91073f4db347ed3aa6302b56107499c upstream. syzbot reported a general protection fault in vidtv_psi_ts_psi_write_into [1]. vidtv_mux_get_pid_ctx() can return NULL, but vidtv_mux_push_si() does not check for this before dereferencing the returned pointer to access the continuity counter. This leads to a general protection fault when accessing a near-NULL address. The root cause is that vidtv_mux_pid_ctx_init() does not check the return value of vidtv_mux_create_pid_ctx_once() for PMT section PIDs. If the allocation fails, the PID context is never created, but init returns success. The subsequent vidtv_mux_push_si() call then gets NULL from vidtv_mux_get_pid_ctx() and crashes. Fix both the root cause (add error check in vidtv_mux_pid_ctx_init for PMT PIDs) and add defensive NULL checks in vidtv_mux_push_si for all vidtv_mux_get_pid_ctx() calls. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Workqueue: events vidtv_mux_tick RIP: 0010:vidtv_psi_ts_psi_write_into+0x54a/0xbc0 drivers/media/test-drivers/vidtv/vidtv_psi.c:197 Call Trace: <TASK> vidtv_psi_table_header_write_into drivers/media/test-drivers/vidtv/vidtv_psi.c:799 [inline] vidtv_psi_pmt_write_into+0x3b2/0xa70 drivers/media/test-drivers/vidtv/vidtv_psi.c:1231 vidtv_mux_push_si+0x932/0xe80 drivers/media/test-drivers/vidtv/vidtv_mux.c:196 vidtv_mux_tick+0xe9b/0x1480 drivers/media/test-drivers/vidtv/vidtv_mux.c:408 Fixes: f90cf6079bf67 ("media: vidtv: add a bridge driver") Cc: [email protected] Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=814c351d094f4f1a1b86 Signed-off-by: Ruslan Valiyev <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Maciej W. Rozycki <[email protected]> Date: Wed May 6 23:42:27 2026 +0100 MIPS: DEC: Prevent initial console buffer from landing in XKPHYS commit 7fb13fd35110ebe95eb053faf79d018f51144d85 upstream. In 64-bit configurations calling the initial console output handler from a kernel thread other than the initial one will result in a situation where the stack has been placed in the XKPHYS 64-bit memory segment and consequently so has been the buffer allocated there that is used as the argument corresponding to the `%s' output conversion specifier for the firmware's printf() entry point. This 64-bit address will then be truncated by 32-bit firmware, resulting in an attempt to access the wrong memory location, which in turn will cause all kinds of unpredictable behaviour, such as a kernel crash: Console: colour dummy device 160x64 Calibrating delay loop... 49.36 BogoMIPS (lpj=192512) pid_max: default: 32768 minimum: 301 CPU 0 Unable to handle kernel paging request at virtual address 000000000203bd00, epc == ffffffffbfc08364, ra == ffffffffbfc08800 Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc2-00254-gfb649bda6f56-dirty #121 $ 0 : 0000000000000000 0000000000000001 0000000000000023 ffffffff80684ba0 $ 4 : 000000000203bd00 ffffffffbfc0f3b4 ffffffffffffffff 0000000000000073 $ 8 : 0a303d7469000000 0000000000000000 0000000000000073 ffffffffbfc0f473 $12 : 0000000000000002 0000000000000000 ffffffff80684c1c 0000000000000000 $16 : 0000000000000000 ffffffff80596dc9 0000000000000000 ffffffffbfc09240 $20 : ffffffff80684c40 ffffffffbfc0f400 000000000000002d 000000000000002b $24 : ffffffffffffffbf 000000000203bd00 $28 : ffffffff805f0000 ffffffff80684b58 0000000000000030 ffffffffbfc08800 Hi : 0000000000000000 Lo : 0000000000000aa8 epc : ffffffffbfc08364 0xffffffffbfc08364 ra : ffffffffbfc08800 0xffffffffbfc08800 Status: 140120e2 KX SX UX KERNEL EXL Cause : 00000008 (ExcCode 02) BadVA : 000000000203bd00 PrId : 00000430 (R4000SC) Modules linked in: Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____), tls=0000000000000000) Stack : 0000000000000000 0000000000000000 0000000000000000 0000004d0000004d 80684cc0806a2a40 80596dc80000004d 8061000000000000 bfc0850c80684c38 0000000000000000 000000000203bd00 0000000000000000 0000000000000000 0000000000000000 00000000bfc0f3b4 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000002500000000 0000000000000000 0000000000000000 802c1a7400000000 0203bd0080596dc8 0203bd4d69000000 6c61632000000018 5f746567646e6172 6c616320625f6d6f 5f736e5f6d6f7266 206361323778302b 303d74696e726320 806a0a38806b0000 806a0a38806b0000 00000000806b0000 80683c58806b0000 ... Call Trace: Code: a082ffff 03e00008 00601021 <80820000> 00001821 10400005 24840001 80820000 24630001 ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception in interrupt KN04 V2.1k (PC: 0xa0026768, SP: 0x806848e8) >> In this case the pointer in $4 was truncated from 0x980000000203bd00 to 0x000000000203bd00. This may happen when no final console driver has been enabled in the configuration and consequently the initial console continues being used late into bootstrap or with an upcoming change that will switch the zs driver to use a platform device, which in turn will make the console handover happen only after other kernel threads have already been started. Fix the issue by making the buffer static and initdata, and therefore placed in the CKSEG0 32-bit compatibility segment, observing that the console output handler is called with the console lock held, implying no need for this code to be reentrant. Add an assertion to verify the buffer actually has been placed in a compatibility segment. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej W. Rozycki <[email protected]> Cc: [email protected] # v2.6.12+ Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jonas Jelonek <[email protected]> Date: Mon Jun 8 09:37:29 2026 +0000 MIPS: smp: report dying CPU to RCU in stop_this_cpu() commit 9f3f3bdc6d9dac1a5a8262ee7ad0f2ff1527a7e7 upstream. smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function marks the CPU offline for the scheduler via set_cpu_online(false) but never informs RCU, so RCU keeps expecting a quiescent state from CPUs that are now spinning forever with interrupts disabled. As long as nothing waits for an RCU grace period after smp_send_stop() this is harmless, which is why it went unnoticed. Since commit 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT") however, irq_work_sync() calls synchronize_rcu() on architectures without an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns false. That is the asm-generic default used by MIPS. Any irq_work_sync() issued in the reboot/shutdown path after smp_send_stop() then blocks on a grace period that can never complete, hanging the reboot: WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on ... rcu: INFO: rcu_sched detected stalls on CPUs/tasks: rcu: Offline CPU 1 blocking current GP. rcu: Offline CPU 2 blocking current GP. rcu: Offline CPU 3 blocking current GP. This issue was noticed on several Realtek MIPS switch SoCs (MIPS interAptiv) and came up during kernel bump downstream in OpenWrt from 6.18.33 to 6.18.34, after the backport of the patch to the 6.18 stable branch. The patch also has been backported all the way back to 6.1. Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the generic CPU-hotplug offline path, so RCU stops waiting on the parked CPUs and grace periods can still complete. MIPS shuts down all CPUs here without going through the CPU-hotplug mechanism, so this report is not otherwise issued. Reporting a dying CPU to RCU outside the regular hotplug offline path is not unprecedented: arm64 does the same in cpu_die_early(). There it is an exception for a CPU that was coming online and is aborting bringup, rather than the default shutdown action as on MIPS. Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT") CC: [email protected] Signed-off-by: Jonas Jelonek <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Cheng Ming Lin <[email protected]> Date: Wed Jul 1 10:42:03 2026 +0800 mtd: spi-nor: macronix: Add post_sfdp fixups for Quad Input Page Program commit 798aafeffb369c5eb36e406b18970ef27baa820d upstream. Although certain Macronix NOR flash support the Quad Input Page Program feature, the corresponding information in the 4-byte Address Instruction Table of these flash is not properly filled. As a result, this feature cannot be enabled as expected. To address this issue, a post_sfdp fixups implementation is required to correct the missing information. Signed-off-by: Cheng Ming Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Tudor Ambarus <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Cheng Ming Lin <[email protected]> Date: Wed Jul 1 10:42:04 2026 +0800 mtd: spi-nor: macronix: add support for mx66{l2, u1}g45g commit 797bbaa7531f75985b199e484451fa3f954382b3 upstream. Due to incorrect values in the 4-BAIT table for these two flash IDs, it is necessary to add these two flash IDs with fixups. Signed-off-by: Cheng Ming Lin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Tudor Ambarus <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: HanQuan <[email protected]> Date: Tue Jun 23 01:52:08 2026 +0000 net/tcp-ao: fix use-after-free of key in del_async path commit 5ba9950bc9078e19b69cca1e56d1553b125c6857 upstream. In tcp_ao_delete_key(), the del_async path skips the current_key and rnext_key validity checks present in the synchronous path, assuming these pointers are always NULL on LISTEN sockets. However, if a key was added with set_current=1/set_rnext=1 while the socket was in CLOSE state, current_key and rnext_key will be non-NULL after listen() transitions the socket to LISTEN. When such a key is deleted with del_async=1, hlist_del_rcu() and call_rcu() free the key without clearing the dangling pointers. After the RCU grace period, getsockopt(TCP_AO_INFO) dereferences current_key->sndid and rnext_key->rcvid from freed slab memory. Clear current_key and rnext_key in the del_async path when they reference the key being deleted. Fixes: d6732b95b6fb ("net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)") Signed-off-by: HanQuan <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tonghao Zhang <[email protected]> Date: Tue Jun 16 15:54:25 2026 +0000 net: bonding: add broadcast_neighbor option for 802.3ad [ Upstream commit ce7a381697cb3958ffe0b45e5028ac69444e9288 ] Stacking technology is a type of technology used to expand ports on Ethernet switches. It is widely used as a common access method in large-scale Internet data center architectures. Years of practice have proved that stacking technology has advantages and disadvantages in high-reliability network architecture scenarios. For instance, in stacking networking arch, conventional switch system upgrades require multiple stacked devices to restart at the same time. Therefore, it is inevitable that the business will be interrupted for a while. It is for this reason that "no-stacking" in data centers has become a trend. Additionally, when the stacking link connecting the switches fails or is abnormal, the stack will split. Although it is not common, it still happens in actual operation. The problem is that after the split, it is equivalent to two switches with the same configuration appearing in the network, causing network configuration conflicts and ultimately interrupting the services carried by the stacking system. To improve network stability, "non-stacking" solutions have been increasingly adopted, particularly by public cloud providers and tech companies like Alibaba, Tencent, and Didi. "non-stacking" is a method of mimicing switch stacking that convinces a LACP peer, bonding in this case, connected to a set of "non-stacked" switches that all of its ports are connected to a single switch (i.e., LACP aggregator), as if those switches were stacked. This enables the LACP peer's ports to aggregate together, and requires (a) special switch configuration, described in the linked article, and (b) modifications to the bonding 802.3ad (LACP) mode to send all ARP/ND packets across all ports of the active aggregator. Note that, with multiple aggregators, the current broadcast mode logic will send only packets to the selected aggregator(s). +-----------+ +-----------+ | switch1 | | switch2 | +-----------+ +-----------+ ^ ^ | | +-----------------+ | bond4 lacp | +-----------------+ | | | NIC1 | NIC2 +-----------------+ | server | +-----------------+ - https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/ Cc: Jay Vosburgh <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Simon Horman <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Andrew Lunn <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Signed-off-by: Tonghao Zhang <[email protected]> Signed-off-by: Zengbing Tu <[email protected]> Link: https://patch.msgid.link/84d0a044514157bb856a10b6d03a1028c4883561.1751031306.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Xiang Mei <[email protected]> Date: Tue Jun 16 15:54:29 2026 +0000 net: bonding: fix use-after-free in bond_xmit_broadcast() [ Upstream commit 2884bf72fb8f03409e423397319205de48adca16 ] bond_xmit_broadcast() reuses the original skb for the last slave (determined by bond_is_last_slave()) and clones it for others. Concurrent slave enslave/release can mutate the slave list during RCU-protected iteration, changing which slave is "last" mid-loop. This causes the original skb to be double-consumed (double-freed). Replace the racy bond_is_last_slave() check with a simple index comparison (i + 1 == slaves_count) against the pre-snapshot slave count taken via READ_ONCE() before the loop. This preserves the zero-copy optimization for the last slave while making the "last" determination stable against concurrent list mutations. The UAF can trigger the following crash: ================================================================== BUG: KASAN: slab-use-after-free in skb_clone Read of size 8 at addr ffff888100ef8d40 by task exploit/147 CPU: 1 UID: 0 PID: 147 Comm: exploit Not tainted 7.0.0-rc3+ #4 PREEMPTLAZY Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:123) print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) kasan_report (mm/kasan/report.c:597) skb_clone (include/linux/skbuff.h:1724 include/linux/skbuff.h:1792 include/linux/skbuff.h:3396 net/core/skbuff.c:2108) bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5334) bond_start_xmit (drivers/net/bonding/bond_main.c:5567 drivers/net/bonding/bond_main.c:5593) dev_hard_start_xmit (include/linux/netdevice.h:5325 include/linux/netdevice.h:5334 net/core/dev.c:3871 net/core/dev.c:3887) __dev_queue_xmit (include/linux/netdevice.h:3601 net/core/dev.c:4838) ip6_finish_output2 (include/net/neighbour.h:540 include/net/neighbour.h:554 net/ipv6/ip6_output.c:136) ip6_finish_output (net/ipv6/ip6_output.c:208 net/ipv6/ip6_output.c:219) ip6_output (net/ipv6/ip6_output.c:250) ip6_send_skb (net/ipv6/ip6_output.c:1985) udp_v6_send_skb (net/ipv6/udp.c:1442) udpv6_sendmsg (net/ipv6/udp.c:1733) __sys_sendto (net/socket.c:730 net/socket.c:742 net/socket.c:2206) __x64_sys_sendto (net/socket.c:2209) do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) </TASK> Allocated by task 147: Freed by task 147: The buggy address belongs to the object at ffff888100ef8c80 which belongs to the cache skbuff_head_cache of size 224 The buggy address is located 192 bytes inside of freed 224-byte region [ffff888100ef8c80, ffff888100ef8d60) Memory state around the buggy address: ffff888100ef8c00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc ffff888100ef8c80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888100ef8d00: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc ^ ffff888100ef8d80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888100ef8e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 4e5bd03ae346 ("net: bonding: fix bond_xmit_broadcast return value error bug") Reported-by: Weiming Shi <[email protected]> Signed-off-by: Xiang Mei <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Tonghao Zhang <[email protected]> Date: Thu Oct 16 20:51:36 2025 +0800 net: bonding: update the slave array for broadcast mode commit e0caeb24f538c3c9c94f471882ceeb43d9dc2739 upstream. This patch fixes ce7a381697cb ("net: bonding: add broadcast_neighbor option for 802.3ad"). Before this commit, on the broadcast mode, all devices were traversed using the bond_for_each_slave_rcu. This patch supports traversing devices by using all_slaves. Therefore, we need to update the slave array when enslave or release slave. Fixes: ce7a381697cb ("net: bonding: add broadcast_neighbor option for 802.3ad") Cc: Simon Horman <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Andrew Lunn <[email protected]> Cc: <[email protected]> Reported-by: Jiri Slaby <[email protected]> Tested-by: Jiri Slaby <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Tonghao Zhang <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Acked-by: Jay Vosburgh <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sebastian Andrzej Siewior <[email protected]> Date: Fri Jun 19 17:20:12 2026 +0200 net: Drop the lock in skb_may_tx_timestamp() [ Upstream commit 983512f3a87fd8dc4c94dfa6b596b6e57df5aad7 ] skb_may_tx_timestamp() may acquire sock::sk_callback_lock. The lock must not be taken in IRQ context, only softirq is okay. A few drivers receive the timestamp via a dedicated interrupt and complete the TX timestamp from that handler. This will lead to a deadlock if the lock is already write-locked on the same CPU. Taking the lock can be avoided. The socket (pointed by the skb) will remain valid until the skb is released. The ->sk_socket and ->file member will be set to NULL once the user closes the socket which may happen before the timestamp arrives. If we happen to observe the pointer while the socket is closing but before the pointer is set to NULL then we may use it because both pointer (and the file's cred member) are RCU freed. Drop the lock. Use READ_ONCE() to obtain the individual pointer. Add a matching WRITE_ONCE() where the pointer are cleared. Link: https://lore.kernel.org/all/[email protected] Fixes: b245be1f4db1a ("net-timestamp: no-payload only sysctl") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Reviewed-by: Jason Xing <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> [adapted sk_set_socket() in include/net/sock.h to fix the conflict from not having commit 5d6b58c932ec ("net: lockless sock_i_ino()") and the additional previous changes required by it. It comes down to just now having the lines of if (sock) { WRITE_ONCE(sk->sk_uid, SOCK_INODE(sock)->i_uid); WRITE_ONCE(sk->sk_ino, SOCK_INODE(sock)->i_ino); } below the changed line. I've tested this on a device running an nfs-root and did some additional network stress-testing.] Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Maoyi Xie <[email protected]> Date: Fri Jun 12 16:59:35 2026 +0800 net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink commit 8165f7ff57d9667d2bb477ef6af83ede7fed4ad7 upstream. A tunnel changelink() operates on at most two netns, dev_net(dev) and the tunnel link netns t->net. They differ once the device is created in or moved to a netns other than the one the request runs in. The rtnl changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a caller privileged there but not in t->net can rewrite a tunnel that lives in t->net. Add rtnl_dev_link_net_capable() next to rtnl_get_net_ns_capable() in net/core/rtnetlink.c. It requires CAP_NET_ADMIN in the link netns and is skipped when the link netns is dev_net(dev), where the rtnl path already checked it. The other patches in this series use the same helper. Gate ipgre_changelink() and erspan_changelink() with it, at the top of the op before any attribute is parsed, because the parsers update live tunnel fields first. ipgre_netlink_parms() sets t->collect_md before ip_tunnel_changelink() runs. Commit 8b484efd5cb4 ("ip6: vti: Use ip6_tnl.net in vti6_siocdevprivate().") added the same check on the ioctl path. This adds it on RTM_NEWLINK. Reported-by: Xiao Liang <[email protected]> Closes: https://lore.kernel.org/netdev/CABAhCOSzP1vaThGV35_VnsRCb=87_CPjPVsTHbq905k8A+BuUg@mail.gmail.com/ Fixes: b57708add314 ("gre: add x-netns support") Cc: [email protected] Signed-off-by: Maoyi Xie <[email protected]> Reviewed-by: Kuniyuki Iwashima <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Petr Machata <[email protected]> Date: Thu Jun 25 11:24:41 2026 +0300 net: ipv6: Make udp_tunnel6_xmit_skb() void commit 6a7d88ca15f73c5c570c372238f71d63da1fda55 upstream. The function always returns zero, thus the return value does not carry any signal. Just make it void. Most callers already ignore the return value. However: - Refold arguments of the call from sctp_v6_xmit() so that they fit into the 80-column limit. - tipc_udp_xmit() initializes err from the return value, but that should already be always zero at that point. So there's no practical change, but elision of the assignment prompts a couple more tweaks to clean up the function. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Link: https://patch.msgid.link/7facacf9d8ca3ca9391a4aee88160913671b868d.1750113335.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexander Martyniuk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Santosh Kalluri <[email protected]> Date: Wed Jun 17 10:28:30 2026 -0400 net: phonet: free phonet_device after RCU grace period [ Upstream commit 71de0177b28da751f407581a4515cf4d762f6296 ] phonet_device_destroy() removes a phonet_device from the per-net device list with list_del_rcu(), but frees it immediately. RCU readers walking the same list can still hold a pointer to the object after it has been removed, leading to a slab-use-after-free. Use kfree_rcu(), matching the lifetime rule already used by phonet_address_del() for the same object type. Fixes: eeb74a9d45f7 ("Phonet: convert devices list to RCU") Cc: [email protected] Signed-off-by: Santosh Kalluri <[email protected]> Acked-by: Rémi Denis-Courmont <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Weiming Shi <[email protected]> Date: Thu May 14 05:25:12 2026 -0700 net: qualcomm: rmnet: fix endpoint use-after-free in rmnet_dellink() commit d00c953a8f69921f484b629801766da68f27f658 upstream. rmnet_dellink() removes the endpoint from the hash table with hlist_del_init_rcu() and then immediately frees it with kfree(). However, RCU readers on the receive path (rmnet_rx_handler -> __rmnet_map_ingress_handler) may still hold a reference to the endpoint and dereference ep->egress_dev after the memory has been freed. The endpoint is a kmalloc-32 object, and the stale read at offset 8 corresponds to the egress_dev pointer. BUG: unable to handle page fault for address: ffffffffde942eef Oops: 0002 [#1] SMP NOPTI CPU: 1 UID: 0 PID: 137 Comm: poc_write Not tainted 7.0.0+ #4 PREEMPTLAZY RIP: 0010:rmnet_vnd_rx_fixup (rmnet_vnd.c:27) Call Trace: <TASK> __rmnet_map_ingress_handler (rmnet_handlers.c:48 rmnet_handlers.c:101) rmnet_rx_handler (rmnet_handlers.c:129 rmnet_handlers.c:235) __netif_receive_skb_core.constprop.0 (net/core/dev.c:6096) __netif_receive_skb_one_core (net/core/dev.c:6208) netif_receive_skb (net/core/dev.c:6467) tun_get_user (drivers/net/tun.c:1955) tun_chr_write_iter (drivers/net/tun.c:2003) vfs_write (fs/read_write.c:688) ksys_write (fs/read_write.c:740) </TASK> Add an rcu_head field to struct rmnet_endpoint and replace kfree() with kfree_rcu() so the endpoint memory remains valid through the RCU grace period. Also remove the rmnet_vnd_dellink() call and inline only the nr_rmnet_devs decrement, since rmnet_vnd_dellink() would set ep->egress_dev to NULL during the grace period, creating a data race with lockless readers. Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Reported-by: Xiang Mei <[email protected]> Signed-off-by: Weiming Shi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yiming Qian <[email protected]> Date: Wed Jun 10 06:21:36 2026 +0000 net: skmsg: preserve sg.copy across SG transforms commit 406e8a651a7b854c41fecd5117bb282b3a6c2c6b upstream. The sk_msg sg.copy bitmap is part of the scatterlist entry ownership state. A set bit tells sk_msg_compute_data_pointers() not to expose the entry through writable BPF ctx->data. This protects entries backed by pages that are not private to the sk_msg, such as splice-backed file page-cache pages. Several sk_msg transform paths move, copy, split, or compact msg->sg.data[] entries without moving the matching sg.copy bit. This can make an externally backed entry arrive at a new slot with a clear copy bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable ctx->data and BPF stores can modify the original page cache. Keep sg.copy synchronized with sg.data[] whenever entries are transferred, shifted, split, or copied into a new sk_msg. Clear the bit when an entry is replaced by a newly allocated private page or freed. This covers the BPF pull/push/pop helpers, sk_msg_shift_left/right(), sk_msg_xfer(), and tls_split_open_record(), including the partial tail entry created during TLS open-record splitting. Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Cc: [email protected] Reported-by: Yiming Qian <[email protected]> Reported-by: Keenan Dong <[email protected]> Signed-off-by: Yiming Qian <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Markus Elfring <[email protected]> Date: Sun Jun 14 09:56:35 2026 +0200 NFS: Prevent resource leak in nfs_alloc_server() commit d189f224308c8ac3feeea8e442c99922bd18f1b2 upstream. It was overlooked to call ida_free() after a failed nfs_alloc_iostats() call. Thus add the missed function call in an if branch. Fixes: 1c7251187dc067a6d460cf33ca67da9c1dd87807 ("NFS: add superblock sysfs entries") Cc: [email protected] Reported-by: Christophe Jaillet <[email protected]> Closes: https://lore.kernel.org/linux-nfs/[email protected]/ Signed-off-by: Markus Elfring <[email protected]> Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jeff Layton <[email protected]> Date: Fri May 22 10:36:14 2026 -0400 nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race commit 57aee7a35bb12753057c5b65d72d1f46c0e95b07 upstream. When find_or_alloc_open_stateowner() encounters an unconfirmed owner, it calls release_openowner() and sets oo = NULL. Control then falls through past the `if (oo)` guard -- which would have freed any pre-allocated `new` -- and unconditionally executes `new = alloc_stateowner(...)`. If `new` was already allocated on a prior iteration, the pointer is silently overwritten and the previous allocation (slab object + owner name buffer) is leaked. This requires a race: two NFSv4.0 OPEN threads with the same owner string, where a concurrent thread inserts a new unconfirmed owner into the hash between retry iterations. The window is narrow but repeatable under adversarial conditions. Fix by adding `goto retry` after `oo = NULL` so the already-allocated `new` is reused on the next iteration rather than overwritten. Reported-by: Chris Mason <[email protected]> Fixes: 23df17788c62 ("nfsd: perform all find_openstateowner_str calls in the one place.") Cc: [email protected] Assisted-by: kres:claude-opus-4-6 Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Dominik Woźniak <[email protected]> Date: Thu May 21 17:46:56 2026 +0200 nfsd: check get_user() return when reading princhashlen commit e186fa1c057f5eccb22afb1e83e34c0627085868 upstream. In __cld_pipe_inprogress_downcall(), the get_user() that reads princhashlen from the userspace cld_msg_v2 buffer does not check its return value. A failing copy leaves princhashlen with uninitialised stack contents, which are then used to drive memdup_user() and stored as princhash.len on the resulting reclaim record. The other get_user() calls in this function all check the return; only this one is missed, which is most likely a copy-paste oversight from when v2 upcalls were introduced. Mirror the existing pattern used a few lines above for namelen. namecopy is declared with __free(kfree) so the early return cleans up the already-allocated buffer automatically. Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2") Cc: [email protected] Signed-off-by: Dominik Woźniak <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jeff Layton <[email protected]> Date: Thu May 21 13:51:43 2026 -0400 nfsd: fix posix_acl leak on SETACL decode failure commit 0853ac544c590880d797b04daa33fcb72b6be0e1 upstream. nfsaclsvc_decode_setaclargs() and nfs3svc_decode_setaclargs() each call nfs_stream_decode_acl() twice, first for NFS_ACL and then for NFS_DFACL. Each successful call transfers ownership of a freshly allocated posix_acl into argp->acl_access or argp->acl_default. If the first call succeeds but the second fails, the decoder returns false and argp->acl_access is left dangling. ACLPROC2_SETACL.pc_release was wired to nfssvc_release_attrstat and ACLPROC3_SETACL.pc_release was wired to nfs3svc_release_fhandle. Both only call fh_put() and have no knowledge of the ACL fields on argp. The posix_acl_release() pairs sat at the out: labels inside nfsacld_proc_setacl() and nfsd3_proc_setacl(), but svc_process() skips pc_func when pc_decode returns false, so that cleanup is unreachable on decode failure: svc_process_common() pc_decode() /* decode_setaclargs: false */ /* pc_func skipped */ pc_release() /* fh_put only -- ACLs leaked */ The orphaned posix_acl is leaked for the lifetime of the server. Fix by adding nfsaclsvc_release_setacl() and nfs3svc_release_setacl(), which release both argp->acl_access and argp->acl_default in addition to fh_put(), and wiring them as pc_release for their respective SETACL procedures. pc_release runs on every path svc_process() takes after decode, including decode failure, so the posix_acl_release() pairs are removed from the proc functions' out: labels to keep ownership in one place. This matches the existing release_getacl() pattern used by the sibling GETACL procedures. Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.") Cc: [email protected] Assisted-by: kres:claude-opus-4-7 Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Guannan Wang <[email protected]> Date: Thu May 21 16:03:32 2026 +0800 NFSD: Fix SECINFO_NO_NAME decode error cleanup commit 9e18e83b8846a5c3fe13fc8a464b4865d33996c6 upstream. nfsd4_decode_secinfo_no_name() currently initializes sin_exp after decoding sin_style. If the XDR stream is truncated, the decoder returns nfserr_bad_xdr before sin_exp is initialized. Since commit 3fdc54646234 ("NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing"), the inline iops array is not cleared between RPC calls. A failed SECINFO_NO_NAME decode can therefore leave sin_exp holding stale union contents from a previous operation. The error response path still invokes nfsd4_secinfo_no_name_release(), which calls exp_put() on a non-NULL sin_exp. Initialize sin_exp before the first failable decode step, matching nfsd4_decode_secinfo(). Fixes: 3fdc54646234 ("NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing") Cc: [email protected] Signed-off-by: Guannan Wang <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jeff Layton <[email protected]> Date: Fri May 22 12:44:19 2026 -0400 nfsd: reset write verifier on deferred writeback errors commit 2090b05803faab8a9fa62fbff871007862cac1b7 upstream. nfsd_vfs_write() and nfsd_commit() both call filemap_check_wb_err() to detect deferred writeback errors, but neither rotates the server's write verifier (nn->writeverf) when this check fails. Every other durable-storage-failure path in these functions calls commit_reset_write_verifier() before returning an error. The missing rotation means clients holding UNSTABLE write data under the current verifier will COMMIT, receive the unchanged verifier back, and conclude their data is durable — silently dropping data that failed writeback. This violates the UNSTABLE+COMMIT durability contract (RFC 1813 §3.3.7, RFC 8881 §18.32). Add commit_reset_write_verifier() calls at both filemap_check_wb_err() error sites, matching the pattern used by adjacent error paths in the same functions. The helper already filters -EAGAIN and -ESTALE internally, so the calls are unconditionally safe. Reported-by: Chris Mason <[email protected]> Fixes: 555dbf1a9aac ("nfsd: Replace use of rwsem with errseq_t") Cc: [email protected] Assisted-by: kres:claude-opus-4-6 Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Michael Bommarito <[email protected]> Date: Wed May 27 12:30:35 2026 -0400 NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr commit 41fe0f7b84f0cb822ae10ab08592996a592b2a25 upstream. nfs4_decode_mp_ds_addr() decodes the r_netid and r_addr opaques of a netaddr4 from a GETDEVICEINFO multipath-DS body, then immediately calls strrchr(buf, '.') to locate the port separator. Both decodes use xdr_stream_decode_string_dup(), and the current code checks only "nlen < 0" / "rlen < 0" before dereferencing the returned string. When the on-wire opaque has length zero, xdr_stream_decode_opaque_inline() returns 0 and xdr_stream_decode_string_dup() falls through to its "*str = NULL; return ret" tail, leaving buf NULL with a return value of 0. The "< 0" check does not catch this, and the next line is strrchr(NULL, '.'), a kernel NULL pointer dereference reachable from any pNFS-flexfile client mounted against a malicious or compromised metadata server. Reject the zero-length cases explicitly so the decoder fails with -EBADMSG (treated as a malformed GETDEVICEINFO body) instead of panicking the client. Cc: [email protected] Fixes: 6b7f3cf96364 ("nfs41: pull decode_ds_addr from file layout to generic pnfs") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <[email protected]> Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Koichiro Den <[email protected]> Date: Wed Mar 4 11:05:27 2026 +0900 NTB: epf: Avoid pci_iounmap() with offset when PEER_SPAD and CONFIG share BAR commit d876153680e3d721d385e554def919bce3d18c74 upstream. When BAR_PEER_SPAD and BAR_CONFIG share one PCI BAR, the module teardown path ends up calling pci_iounmap() on the same iomem with some offset, which is unnecessary and triggers a kernel warning like the following: Trying to vunmap() nonexistent vm area (0000000069a5ffe8) WARNING: mm/vmalloc.c:3470 at vunmap+0x58/0x68, CPU#5: modprobe/2937 [...] Call trace: vunmap+0x58/0x68 (P) iounmap+0x34/0x48 pci_iounmap+0x2c/0x40 ntb_epf_pci_remove+0x44/0x80 [ntb_hw_epf] pci_device_remove+0x48/0xf8 device_remove+0x50/0x88 device_release_driver_internal+0x1c8/0x228 driver_detach+0x50/0xb0 bus_remove_driver+0x74/0x100 driver_unregister+0x34/0x68 pci_unregister_driver+0x34/0xa0 ntb_epf_pci_driver_exit+0x14/0xfe0 [ntb_hw_epf] [...] Fix it by unmapping only when PEER_SPAD and CONFIG use difference bars. Cc: [email protected] Fixes: e75d5ae8ab88 ("NTB: epf: Allow more flexibility in the memory BAR map method") Reviewed-by: Frank Li <[email protected]> Signed-off-by: Koichiro Den <[email protected]> Reviewed-by: Dave Jiang <[email protected]> Signed-off-by: Jon Mason <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Konstantin Komarov <[email protected]> Date: Wed Jun 10 12:31:01 2026 +0200 ntfs3: reject direct userspace writes to reserved $LX* xattrs commit 5b08dccecf825cbf905f348bc6ccb497507e28e2 upstream. NTFS3 uses $LXUID, $LXGID, $LXMOD and $LXDEV as internal WSL permission metadata and reloads them into i_uid, i_gid and i_mode from ntfs_get_wsl_perm(). Because the empty-prefix xattr handler also lets file owners call setxattr() on these names directly, an unprivileged writer on a writable ntfs3 mount can plant root ownership and S_ISUID on their own file and gain euid 0 after inode reload. Reject direct userspace writes to the reserved $LX* names. Internal ntfs3 metadata updates are unchanged because ntfs_save_wsl_perm() writes them via ntfs_set_ea() directly. Signed-off-by: Zhen Yan <[email protected]> [[email protected]: added an additional check for non privileged users] Signed-off-by: Konstantin Komarov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Zhang Cen <[email protected]> Date: Sun May 24 19:12:48 2026 +0800 ocfs2: reject oversized group bitmap descriptors commit 9bd541e09dffff27e5bec0f9f45b0228173a5375 upstream. ocfs2_validate_gd_parent() only bounds bg_bits against the parent allocator's chain geometry. A malicious descriptor can still claim a bg_size/bg_bits pair that exceeds the bitmap bytes that physically fit in the group descriptor block, so later bitmap scans and bit updates can run past bg_bitmap. Add a physical-cap check based on ocfs2_group_bitmap_size() for the parent allocator type and reject descriptors whose bg_size or bg_bits exceed that capacity. Keep the existing chain geometry check so both the on-disk bitmap layout and the allocator metadata must agree before the descriptor is used. Validation reproduced this kernel report: KASAN use-after-free in _find_next_bit+0x7f/0xc0 Read of size 8 Call trace: dump_stack_lvl+0x66/0xa0 (?:?) print_report+0xd0/0x630 (?:?) _find_next_bit+0x7f/0xc0 (?:?) srso_alias_return_thunk+0x5/0xfbef5 (?:?) __virt_addr_valid+0x188/0x2f0 (?:?) kasan_report+0xe4/0x120 (?:?) ocfs2_find_max_contig_free_bits+0x35/0x70 (fs/ocfs2/suballoc.c:1375) ocfs2_block_group_set_bits+0x472/0x4b0 (fs/ocfs2/suballoc.c:1457) ocfs2_cluster_group_search+0x16b/0x440 (fs/ocfs2/suballoc.c:86) ocfs2_bg_discontig_fix_result+0x1ef/0x230 (fs/ocfs2/suballoc.c:1786) ocfs2_search_chain+0x8f8/0x10a0 (fs/ocfs2/suballoc.c:1886) get_page_from_freelist+0x70e/0x2370 (?:?) lock_release+0xc6/0x290 (?:?) do_raw_spin_unlock+0x9a/0x100 (?:?) kasan_unpoison+0x27/0x60 (?:?) __bfs+0x147/0x240 (?:?) get_page_from_freelist+0x83d/0x2370 (?:?) ocfs2_claim_suballoc_bits+0x38c/0xe70 (fs/ocfs2/suballoc.c:96) sched_domains_numa_masks_clear+0x70/0xd0 (?:?) check_irq_usage+0xe8/0xb70 (?:?) __ocfs2_claim_clusters+0x18d/0x4c0 (fs/ocfs2/suballoc.c:2497) check_path+0x24/0x50 (?:?) rcu_is_watching+0x20/0x50 (?:?) check_prev_add+0xfd/0xd00 (?:?) ocfs2_add_clusters_in_btree+0x17d/0x810 (fs/ocfs2/suballoc.c:?) __folio_batch_add_and_move+0x1f5/0x3d0 (?:?) ocfs2_add_inode_data+0xd9/0x120 (fs/ocfs2/suballoc.c:?) filemap_add_folio+0x105/0x1f0 (?:?) ocfs2_write_begin_nolock+0x29f7/0x2f80 (fs/ocfs2/suballoc.c:3043) ocfs2_read_inode_block+0xb5/0x110 (fs/ocfs2/suballoc.c:?) down_write+0xf5/0x180 (?:?) ocfs2_write_begin+0x180/0x240 (fs/ocfs2/suballoc.c:?) __mark_inode_dirty+0x758/0x9a0 (?:?) inode_to_bdi+0x41/0x90 (?:?) balance_dirty_pages_ratelimited_flags+0xf8/0x1d0 (?:?) generic_perform_write+0x252/0x440 (?:?) mnt_put_write_access_file+0x16/0x70 (?:?) file_update_time_flags+0xe4/0x200 (?:?) ocfs2_file_write_iter+0x80a/0x1320 (fs/ocfs2/suballoc.c:?) lock_acquire+0x184/0x2f0 (?:?) ksys_write+0xd2/0x170 (?:?) apparmor_file_permission+0xf5/0x310 (?:?) read_zero+0x8d/0x140 (?:?) lock_is_held_type+0x8f/0x100 (?:?) Link: https://lore.kernel.org/[email protected] Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Cc: Heming Zhao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kuniyuki Iwashima <[email protected]> Date: Wed Jun 17 10:28:28 2026 -0400 phonet: Pass ifindex to fill_addr(). [ Upstream commit 08a9572be36819b5d9011604edfa5db6c5062a7a ] We will convert addr_doit() and getaddr_dumpit() to RCU, both of which call fill_addr(). The former will call phonet_address_notify() outside of RCU due to GFP_KERNEL, so dev will not be available in fill_addr(). Let's pass ifindex directly to fill_addr(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Stable-dep-of: 71de0177b28d ("net: phonet: free phonet_device after RCU grace period") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kuniyuki Iwashima <[email protected]> Date: Wed Jun 17 10:28:29 2026 -0400 phonet: Pass net and ifindex to phonet_address_notify(). [ Upstream commit 68ed5c38b512b734caf3da1f87db4a99fcfe3002 ] Currently, phonet_address_notify() fetches netns and ifindex from dev. Once addr_doit() is converted to RCU, phonet_address_notify() will be called outside of RCU due to GFP_KERNEL, and dev will be unavailable there. Let's pass net and ifindex to phonet_address_notify(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Stable-dep-of: 71de0177b28d ("net: phonet: free phonet_device after RCU grace period") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wentao Liang <[email protected]> Date: Mon May 18 13:10:36 2026 +0000 pNFS: Fix use-after-free in pnfs_update_layout() commit 13e198a90ca4050f4bee8a3f23680389a6563ccc upstream. When hitting the NFS_LAYOUT_RETURN branch in pnfs_update_layout(), the code calls pnfs_prepare_to_retry_layoutget(lo). If it succeeds, pnfs_put_layout_hdr(lo) is called before trace_pnfs_update_layout(), which still references 'lo'. This results in a use-after-free when the tracepoint accesses lo's fields. Fix this by moving the tracepoint call before pnfs_put_layout_hdr(lo). Fixes: 2c8d5fc37fe2 ("pNFS: Stricter ordering of layoutget and layoutreturn") Cc: [email protected] Signed-off-by: Wentao Liang <[email protected]> Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wentao Liang <[email protected]> Date: Tue Apr 7 07:30:25 2026 +0000 power: reset: linkstation-poweroff: fix use-after-free in the linkstation_poweroff_init() commit 8eec545cde69e46e9a1d2b7d915ce4f5df85b3bd upstream. Move of_node_put(dn) after the of_match_node() call, which still needs the node pointer. The node reference is correctly released after use. Fixes: e2f471efe1d6 ("power: reset: linkstation-poweroff: prepare for new devices") Cc: [email protected] Signed-off-by: Wentao Liang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Sebastian Reichel <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Wentao Liang <[email protected]> Date: Tue Jun 16 15:10:49 2026 +0000 pwrseq: core: fix use-after-free in pwrseq_debugfs_seq_next() commit 257595adf9dac15ae1edd9d07753fbc576a7583d upstream. pwrseq_debugfs_seq_next() declares 'next' with __free(put_device), which causes put_device() to be called on the returned pointer when the variable goes out of scope. This results in a use-after-free since the seq_file framework receives a pointer whose reference has already been dropped. Simply removing __free(put_device) would fix the UAF but would leak the reference acquired by bus_find_next_device(), as stop() only calls up_read(&pwrseq_sem) and never releases the device reference. Fix this by making the reference counting consistent across all seq_file callbacks, matching the standard pattern used by PCI and SCSI: - start(): use get_device() so it returns a referenced pointer. - next(): explicitly put_device(curr) to release the previous device's reference (no NULL check needed - the seq_file framework only calls next() while the previous return was non-NULL). - stop(): put_device(data) to release the last iterated device's reference, with a NULL guard since stop() may be called with NULL when start() returned NULL or next() reached end-of-sequence. Cc: [email protected] Fixes: 249ebf3f65f8 ("power: sequencing: implement the pwrseq core") Signed-off-by: Wentao Liang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Lord Ulf Henrik Holmberg <[email protected]> Date: Sat May 9 10:40:11 2026 +0200 RDMA/bnxt_re: zero shared page before exposing to userspace commit f6b079629becfa977f9c51fe53ad2e6dcc55ef44 upstream. bnxt_re_alloc_ucontext() allocates uctx->shpg via __get_free_page(GFP_KERNEL). The buddy allocator does not zero pages without __GFP_ZERO, so the page contains stale kernel data from whatever object most recently freed it. The page is then mapped into userspace via vm_insert_page() under BNXT_RE_MMAP_SH_PAGE in bnxt_re_mmap(). The driver only ever writes 4 bytes (a u32 AVID) at offset BNXT_RE_AVID_OFFT (0x10) inside bnxt_re_create_ah(); the remaining 4092 bytes of the page are exposed to userspace unsanitised, leaking kernel memory contents. Any user with access to /dev/infiniband/uverbsX on a host with a bnxt_re device (typically rdma group membership) can read this data via a single mmap() at pgoff 0 after IB_USER_VERBS_CMD_GET_CONTEXT. Other shared pages in the same file already use get_zeroed_page() correctly: drivers/infiniband/hw/bnxt_re/ib_verbs.c srq->uctx_srq_page = (void *)get_zeroed_page(GFP_KERNEL); cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL); uctx->shpg is the only outlier. Bring it in line with the existing convention by switching to get_zeroed_page(). Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Lord Ulf Henrik Holmberg <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Petr Machata <[email protected]> Date: Mon Jun 22 13:53:25 2026 +0200 Reapply "selftest/ptp: update ptp selftest to exercise the gettimex options" This reverts commit 6b2176a5c99b33f3c4acc04faadaa9c75da7b163, which in turn reverts commit fa361565a7275cc43c6ca1abec9ec4fcc9ec51f1, which is commit 3d07b691ee707c00afaf365440975e81bb96cd9b upstream. The reason for the original revert was that struct ptp_sys_offset_extended does not contain the field clock_id in 6.12.y. However the claim was false: 6.12.y does in fact contain the field. Reapply therefore the original patch. Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: André Draszik <[email protected]> Date: Fri Jan 9 08:38:38 2026 +0000 regulator: core: fix locking in regulator_resolve_supply() error path commit 497330b203d2c59c5ff3fa4c34d14494d7203bc3 upstream. If late enabling of a supply regulator fails in regulator_resolve_supply(), the code currently triggers a lockdep warning: WARNING: drivers/regulator/core.c:2649 at _regulator_put+0x80/0xa0, CPU#6: kworker/u32:4/596 ... Call trace: _regulator_put+0x80/0xa0 (P) regulator_resolve_supply+0x7cc/0xbe0 regulator_register_resolve_supply+0x28/0xb8 as the regulator_list_mutex must be held when calling _regulator_put(). To solve this, simply switch to using regulator_put(). While at it, we should also make sure that no concurrent access happens to our rdev while we clear out the supply pointer. Add appropriate locking to ensure that. While the code in question will be removed altogether in a follow-up commit, I believe it is still beneficial to have this corrected before removal for future reference. Fixes: 36a1f1b6ddc6 ("regulator: core: Fix memory leak in regulator_resolve_supply()") Fixes: 8e5356a73604 ("regulator: core: Clear the supply pointer if enabling fails") Signed-off-by: André Draszik <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Nazar Kalashnikov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Kevin Berry <[email protected]> Date: Tue Jun 16 15:54:24 2026 +0000 Revert "net: bonding: fix use-after-free in bond_xmit_broadcast()" This reverts commit 3453882f36c40d2339267093676585a89808a73d. There are two versions of this use-after-free fix commit: this one, which was written to avoid taking a dependency on ce7a381697cb3 ("net: bonding: add broadcast_neighbor option for 802.3ad"), and the original, simpler version 2884bf72fb8f ("net: bonding: fix use-after-free in bond_xmit_broadcast()"), which implicitly depends on the slave counting changes in ce7a381697cb3. In both the 6.1 and 6.6 stable branches, commit ce7a381697cb3 was included as a stable dep of c4f050ce06c56 ("bonding: 3ad: implement proper RCU rules for port->aggregator"), and the original version of this fix was subsequently applied. For consistency, and to be able to apply both bug fixes, we should revert this commit, apply the series for ce7a381697cb3 ("net: bonding: add broadcast_neighbor option for 802.3ad"), and then apply the original version of this fix, 2884bf72fb8f ("net: bonding: fix use-after-free in bond_xmit_broadcast()"). Signed-off-by: Kevin Berry <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Sasha Levin <[email protected]> Date: Sat Jun 27 10:53:51 2026 -0400 Revert "PCI: qcom: Advertise Hotplug Slot Capability with no Command Completion support" This reverts commit 480c94d3affbc11b9e98ca223a9fa19d90b84fbb. Signed-off-by: Sasha Levin <[email protected]>
Author: Vivian Wang <[email protected]> Date: Tue Mar 3 13:29:46 2026 +0800 riscv: kfence: Call mark_new_valid_map() for kfence_unprotect() commit 8d6c8c40e733b3fcaf92fed0a078bba2f6941a3b upstream. In kfence_protect_page(), which kfence_unprotect() calls, we cannot send IPIs to other CPUs to ask them to flush TLB. This may lead to those CPUs spuriously faulting on a recently allocated kfence object despite it being valid, leading to false positive use-after-free reports. Fix this by calling mark_new_valid_map() so that the page fault handling code path notices the spurious fault and flushes TLB then retries the access. Update the comment in handle_exception to indicate that new_valid_map_cpus_check also handles kfence_unprotect() spurious faults. Note that kfence_protect() has the same stale TLB entries problem, but that leads to false negatives, which is fine with kfence. Cc: [email protected] Reported-by: Yanko Kaneti <[email protected]> Fixes: b3431a8bb336 ("riscv: Fix IPIs usage in kfence_protect_page()") Signed-off-by: Vivian Wang <[email protected]> Link: https://patch.msgid.link/20260303-handle-kfence-protect-spurious-fault-v2-2-f80d8354d79d@iscas.ac.cn Signed-off-by: Paul Walmsley <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Vivian Wang <[email protected]> Date: Tue Mar 3 13:29:45 2026 +0800 riscv: mm: Extract helper mark_new_valid_map() commit 9ee25d0a70ff4494b4e1d266b962d0a574ef318a upstream. In preparation of a future patch using the same mechanism for non-vmalloc addresses, extract the mark_new_valid_map() helper from flush_cache_vmap(). No functional change intended. Cc: [email protected] Signed-off-by: Vivian Wang <[email protected]> Link: https://patch.msgid.link/20260303-handle-kfence-protect-spurious-fault-v2-1-f80d8354d79d@iscas.ac.cn Signed-off-by: Paul Walmsley <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yuho Choi <[email protected]> Date: Mon Jun 1 14:32:47 2026 -0400 rpmsg: char: Fix use-after-free on probe error path commit 1ff3f528e67d20e2b1483dcaba899dc7832b2e6b upstream. rpmsg_chrdev_probe() stores the newly allocated eptdev in the default endpoint's priv pointer before calling rpmsg_chrdev_eptdev_add(). If rpmsg_chrdev_eptdev_add() then fails, its error path frees eptdev while the default endpoint may still dispatch callbacks with the stale priv pointer. Avoid publishing eptdev through the default endpoint until rpmsg_chrdev_eptdev_add() succeeds. Messages received before the priv pointer is published should be ignored by rpmsg_ept_cb(). Flow-control updates can hit rpmsg_ept_flow_cb() in the same window, so make both callbacks return success when priv is NULL. Fixes: bc69d1066569 ("rpmsg: char: Introduce the "rpmsg-raw" channel") Signed-off-by: Yuho Choi <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mathieu Poirier <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: David Howells <[email protected]> Date: Wed Jun 17 13:21:26 2026 -0400 rxrpc: Fix the ACK parser to extract the SACK table for parsing [ Upstream commit 333b6d5bb9f87827ac2639c737bf9613dbae7253 ] Fix modification of the received skbuff in rxrpc_input_soft_acks() and a potential incorrect access of the buffer in a fragmented UDP packet (the packet would probably have to be deliberately pre-generated as fragmented) when AF_RXRPC tries to extract the contents of the SACK table by copying out the contents of the SACK table into a buffer before attempting to parse AF_RXRPC assumes that it can just call skb_condense() and then validly access the SACK table from skb->data and that it will be a flat buffer - but skb_condense() can silently fail to do anything under some circumstances. Note that whilst rxrpc_input_soft_acks() should be able to parse extended ACKs, the rest of AF_RXRPC doesn't currently support that. Further, there's then no need to call skb_condense() in rxrpc_input_ack(), so don't. Fixes: d57a3a151660 ("rxrpc: Save last ACK's SACK table rather than marking txbufs") Reported-by: Michael Bommarito <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: Jeffrey Altman <[email protected]> cc: Eric Dumazet <[email protected]> cc: "David S. Miller" <[email protected]> cc: Jakub Kicinski <[email protected]> cc: Paolo Abeni <[email protected]> cc: Simon Horman <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:23 2026 -0400 scripts/sorttable: Add helper functions for Elf_Ehdr [ Upstream commit 1dfb59a228dde59ad7d99b2fa2104e90004995c7 ] In order to remove the double #include of sorttable.h for 64 and 32 bit to create duplicate functions, add helper functions for Elf_Ehdr. This will create a function pointer for each helper that will get assigned to the appropriate function to handle either the 64bit or 32bit version. This also moves the _r()/r() wrappers for the Elf_Ehdr references that handle endian and size differences between the different architectures, into the helper function and out of the open code which is more error prone. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:24 2026 -0400 scripts/sorttable: Add helper functions for Elf_Shdr [ Upstream commit 67afb7f504400e5b4e5ff895459fbb3eb63d4450 ] In order to remove the double #include of sorttable.h for 64 and 32 bit to create duplicate functions, add helper functions for Elf_Shdr. This will create a function pointer for each helper that will get assigned to the appropriate function to handle either the 64bit or 32bit version. This also moves the _r()/r() wrappers for the Elf_Shdr references that handle endian and size differences between the different architectures, into the helper function and out of the open code which is more error prone. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:25 2026 -0400 scripts/sorttable: Add helper functions for Elf_Sym [ Upstream commit 17bed33ac12f011f4695059960e1b1d6457229a7 ] In order to remove the double #include of sorttable.h for 64 and 32 bit to create duplicate functions, add helper functions for Elf_Sym. This will create a function pointer for each helper that will get assigned to the appropriate function to handle either the 64bit or 32bit version. This also removes the last references of etype and _r() macros from the sorttable.h file as their references are now just defined in the appropriate architecture version of the helper functions. All read functions now exist in the helper functions which makes it easier to maintain, as the helper functions define the necessary architecture sizes. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:40 2026 -0400 scripts/sorttable: Allow matches to functions before function entry [ Upstream commit dc208c69c033d3caba0509da1ae065d2b5ff165f ] ARM 64 uses -fpatchable-function-entry=4,2 which adds padding before the function and the addresses in the mcount_loc point there instead of the function entry that is returned by nm. In order to find a function from nm to make sure it's not an unused weak function, the entries in the mcount_loc section needs to match the entries from nm. Since it can be an instruction before the entry, add a before_func variable that ARM 64 can set to 8, and if the mcount_loc entry is within 8 bytes of the nm function entry, then it will be considered a match. Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: "Arnd Bergmann" <[email protected]> Cc: Mark Brown <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: ef378c3b82338 ("scripts/sorttable: Zero out weak functions in mcount_loc table") Tested-by: Nathan Chancellor <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:32 2026 -0400 scripts/sorttable: Always use an array for the mcount_loc sorting [ Upstream commit 5fb964f5ba53afda0e2b6dbc00b8205461ffe04a ] The sorting of the mcount_loc section is done directly to the section for x86 and arm32 but it uses a separate array for arm64 as arm64 has the values for the mcount_loc stored in the rela sections of the vmlinux ELF file. In order to use the same code to remove weak functions, always use a separate array to do the sorting. This requires splitting up the filling of the array into one function and the placing the contents of the array back into the rela sections or into the mcount_loc section into a separate file. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:20 2026 -0400 scripts/sorttable: Convert Elf_Ehdr to union [ Upstream commit 157fb5b3cfd2cb5950314f926a76e567fc1921c5 ] In order to remove the double #include of sorttable.h for 64 and 32 bit to create duplicate functions for both, replace the Elf_Ehdr macro with a union that defines both Elf64_Ehdr and Elf32_Ehdr, with field e64 for the 64bit version, and e32 for the 32bit version. Then a macro etype can be used instead to get to the proper value. This will eventually be replaced with just single functions that can handle both 32bit and 64bit ELF parsing. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:22 2026 -0400 scripts/sorttable: Convert Elf_Sym MACRO over to a union [ Upstream commit 200d015e73b4da69bcd8212a7c58695452b12bad ] In order to remove the double #include of sorttable.h for 64 and 32 bit to create duplicate functions for both, replace the Elf_Sym macro with a union that defines both Elf64_Sym and Elf32_Sym, with field e64 for the 64bit version, and e32 for the 32bit version. It can then use the macro etype to get the proper value. This will eventually be replaced with just single functions that can handle both 32bit and 64bit ELF parsing. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Vasily Gorbik <[email protected]> Date: Thu Jun 18 16:15:41 2026 -0400 scripts/sorttable: Fix endianness handling in build-time mcount sort [ Upstream commit 023f124a64174c47e18340ded7e2a39b96eb9523 ] Kernel cross-compilation with BUILDTIME_MCOUNT_SORT produces zeroed mcount values if the build-host endianness does not match the ELF file endianness. The mcount values array is converted from ELF file endianness to build-host endianness during initialization in fill_relocs()/fill_addrs(). Avoid extra conversion of these values during weak-function zeroing; otherwise, they do not match nm-parsed addresses and all mcount values are zeroed out. Cc: Masami Hiramatsu <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/patch.git-dca31444b0f1.your-ad-here.call-01743554658-ext-8692@work.hours Fixes: ef378c3b8233 ("scripts/sorttable: Zero out weak functions in mcount_loc table") Reported-by: Ilya Leoshkevich <[email protected]> Reported-by: Ihor Solodrai <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:28 2026 -0400 scripts/sorttable: Get start/stop_mcount_loc from ELF file directly [ Upstream commit 4acda8edefa1ce66d3de845f1c12745721cd14c3 ] The get_mcount_loc() does a cheesy trick to find the start_mcount_loc and stop_mcount_loc values. That trick is: file_start = popen(" grep start_mcount System.map | awk '{print $1}' ", "r"); and file_stop = popen(" grep stop_mcount System.map | awk '{print $1}' ", "r"); Those values are stored in the Elf symbol table. Use that to capture those values. Using the symbol table is more efficient and more robust. The above could fail if another variable had "start_mcount" or "stop_mcount" as part of its name. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:31 2026 -0400 scripts/sorttable: Have mcount rela sort use direct values [ Upstream commit a0265659322540d656727b9e132edfb6f06b6c1a ] The mcount_loc sorting for when the values are stored in the Elf_Rela entries uses the compare_extable() function to do the compares in the qsort(). That function does handle byte swapping if the machine being compiled for is a different endian than the host machine. But the sort_relocs() function sorts an array that pulled in the values from the Elf_Rela section and has already done the swapping. Create two new compare functions that will sort the direct values. One will sort 32 bit values and the other will sort the 64 bit value. One of these will be assigned to a compare_values function pointer and that will be used for sorting the Elf_Rela mcount values. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:18 2026 -0400 scripts/sorttable: Have the ORC code use the _r() functions to read [ Upstream commit 66990c003306c240d570b3ba274ec4f68cf18c91 ] The ORC code reads the section information directly from the file. This currently works because the default read function is for 64bit little endian machines. But if for some reason that ever changes, this will break. Instead of having a surprise breakage, use the _r() functions that will read the values from the file properly. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:19 2026 -0400 scripts/sorttable: Make compare_extable() into two functions [ Upstream commit 7ffc0d0819f438779ed592e2e2e3576f43ce14f0 ] Instead of having the compare_extable() part of the sorttable.h header where it get's defined twice, since it is a very simple function, just define it twice in sorttable.c, and then it can use the proper read functions for the word size and endianess and the Elf_Addr macro can be removed from sorttable.h. Also add a micro optimization. Instead of: if (a < b) return -1; if (a > b) return 1; return 0; That can be shorten to: if (a < b) return -1; return a > b; Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:27 2026 -0400 scripts/sorttable: Move code from sorttable.h into sorttable.c [ Upstream commit 58d87678a0f46c6120904b4326aaf5ebf4454c69 ] Instead of having the main code live in a header file and included twice with MACROs that define the Elf structures for 64 bit or 32 bit, move the code in the C file now that the Elf structures are defined in a union that has both. All accesses to the Elf structure fields are done through helper function pointers. If the file being parsed if for a 64 bit architecture, all the helper functions point to the 64 bit versions to retrieve the Elf fields. The same is true if the architecture is 32 bit, where the function pointers will point to the 32 bit helper functions. Note, when the value of a field can be either 32 bit or 64 bit, a 64 bit is always returned, as it works for the 32 bit code as well. This makes the code easier to read and maintain, and it now all exists in sorttable.c and sorttable.h may be removed. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:17 2026 -0400 scripts/sorttable: Remove unneeded Elf_Rel [ Upstream commit 6f2c2f93a190467cebd6ebd03feb49514fead5ca ] The code had references to initialize the Elf_Rel relocation tables, but it was never used. Remove it. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:15 2026 -0400 scripts/sorttable: Remove unused macro defines [ Upstream commit 28b24394c6e9a3166fcb4480cba054562526657c ] The code of sorttable.h was copied from the recordmcount.h which defined a bunch of Elf MACROs so that they could be used between 32bit and 64bit functions. But there's several MACROs that sorttable.h does not use but was copied over. Remove them to clean up the code. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:16 2026 -0400 scripts/sorttable: Remove unused write functions [ Upstream commit 4f48a28b37d594dab38092514a42ae9f4b781553 ] The code of sorttable.h was copied from the recordmcount.h which defined various write functions for different sizes (2, 4, 8 byte lengths). But sorttable only uses the 4 byte writes. Remove the extra versions as they are not used. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:21 2026 -0400 scripts/sorttable: Replace Elf_Shdr Macro with a union [ Upstream commit 545f6cf8f4c9a268e0bab2637f1d279679befdbf ] In order to remove the double #include of sorttable.h for 64 and 32 bit to create duplicate functions for both, replace the Elf_Shdr macro with a union that defines both Elf64_Shdr and Elf32_Shdr, with field e64 for the 64bit version, and e32 for the 32bit version. It can then use the macro etype to get the proper value. This will eventually be replaced with just single functions that can handle both 32bit and 64bit ELF parsing. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:29 2026 -0400 scripts/sorttable: Use a structure of function pointers for elf helpers [ Upstream commit 1e5f6771c247b28135307058d2cfe3b0153733dc ] Instead of having a series of function pointers that gets assigned to the Elf64 or Elf32 versions, put them all into a single structure and use that. Add the helper function that chooses the structure into the macros that build the different versions of the elf functions. Link: https://lore.kernel.org/all/CAHk-=wiafEyX7UgOeZgvd6fvuByE5WXUPh9599kwOc_d-pdeug@mail.gmail.com/ Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: https://lore.kernel.org/[email protected] Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:39 2026 -0400 scripts/sorttable: Use normal sort if theres no relocs in the mcount section [ Upstream commit 46514b3c2c17c67cefe84b0c1a59e0aaf6093131 ] When ARM 64 is compiled with gcc, the mcount_loc section will be filled with zeros and the addresses will be located in the Elf_Rela sections. To sort the mcount_loc section, the addresses from the Elf_Rela need to be placed into an array and that is sorted. But when ARM 64 is compiled with clang, it does it the same way as other architectures and leaves the addresses as is in the mcount_loc section. To handle both cases, ARM 64 will first try to sort the Elf_Rela section, and if it doesn't find any functions, it will then fall back to the sorting of the addresses in the mcount_loc section itself. Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Brown <[email protected]> Link: https://lore.kernel.org/[email protected] Fixes: b3d09d06e052 ("arm64: scripts/sorttable: Implement sorting mcount_loc at boot for arm64") Reported-by: "Arnd Bergmann" <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:26 2026 -0400 scripts/sorttable: Use uint64_t for mcount sorting [ Upstream commit 1b649e6ab8dc9188d82c64069493afe66ca0edad ] The mcount sorting defines uint_t to uint64_t on 64bit architectures and uint32_t on 32bit architectures. It can work with just using uint64_t as that will hold the values of both, and they are not used to point into the ELF file. sizeof(uint_t) is used for defining the size of the mcount_loc section. Instead of using a type, define long_size and use that instead. This will allow the header code to be moved into the C file as generic functions and not need to include sorttable.h twice, once for 64bit and once for 32bit. Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Steven Rostedt <[email protected]> Date: Thu Jun 18 16:15:33 2026 -0400 scripts/sorttable: Zero out weak functions in mcount_loc table [ Upstream commit ef378c3b8233855497a414b9d67bf22592c928a4 ] When a function is annotated as "weak" and is overridden, the code is not removed. If it is traced, the fentry/mcount location in the weak function will be referenced by the "__mcount_loc" section. This will then be added to the available_filter_functions list. Since only the address of the functions are listed, to find the name to show, a search of kallsyms is used. Since kallsyms will return the function by simply finding the function that the address is after but before the next function, an address of a weak function will show up as the function before it. This is because kallsyms does not save names of weak functions. This has caused issues in the past, as now the traced weak function will be listed in available_filter_functions with the name of the function before it. At best, this will cause the previous function's name to be listed twice. At worse, if the previous function was marked notrace, it will now show up as a function that can be traced. Note that it only shows up that it can be traced but will not be if enabled, which causes confusion. https://lore.kernel.org/all/[email protected]/ The commit b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") was a workaround to this by checking the function address before printing its name. If the address was too far from the function given by the name then instead of printing the name it would print: __ftrace_invalid_address___<invalid-offset> The real issue is that these invalid addresses are listed in the ftrace table look up which available_filter_functions is derived from. A place holder must be listed in that file because set_ftrace_filter may take a series of indexes into that file instead of names to be able to do O(1) lookups to enable filtering (many tools use this method). Even if kallsyms saved the size of the function, it does not remove the need of having these place holders. The real solution is to not add a weak function into the ftrace table in the first place. To solve this, the sorttable.c code that sorts the mcount regions during the build is modified to take a "nm -S vmlinux" input, sort it, and any function listed in the mcount_loc section that is not within a boundary of the function list given by nm is considered a weak function and is zeroed out. Note, this does not mean they will remain zero when booting as KASLR will still shift those addresses. To handle this, the entries in the mcount_loc section will be ignored if they are zero or match the kaslr_offset() value. Before: ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l 551 After: ~# grep __ftrace_invalid_address___ /sys/kernel/tracing/available_filter_functions | wc -l 0 Cc: bpf <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Zheng Yejian <[email protected]> Cc: Martin Kelly <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Alexander Gordeev <[email protected]> Link: https://lore.kernel.org/[email protected] Signed-off-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Xin Long <[email protected]> Date: Thu Jun 25 11:24:42 2026 +0300 sctp: disable BH before calling udp_tunnel_xmit_skb() commit 2cd7e6971fc2787408ceef17906ea152791448cf upstream. udp_tunnel_xmit_skb() / udp_tunnel6_xmit_skb() are expected to run with BH disabled. After commit 6f1a9140ecda ("add xmit recursion limit to tunnel xmit functions"), on the path: udp(6)_tunnel_xmit_skb() -> ip(6)tunnel_xmit() dev_xmit_recursion_inc()/dec() must stay balanced on the same CPU. Without local_bh_disable(), the context may move between CPUs, which can break the inc/dec pairing. This may lead to incorrect recursion level detection and cause packets to be dropped in ip(6)_tunnel_xmit() or __dev_queue_xmit(). Fix it by disabling BH around both IPv4 and IPv6 SCTP UDP xmit paths. In my testing, after enabling the SCTP over UDP: # ip net exec ha sysctl -w net.sctp.udp_port=9899 # ip net exec ha sysctl -w net.sctp.encap_port=9899 # ip net exec hb sysctl -w net.sctp.udp_port=9899 # ip net exec hb sysctl -w net.sctp.encap_port=9899 # ip net exec ha iperf3 -s - without this patch: # ip net exec hb iperf3 -c 192.168.0.1 --sctp [ 5] 0.00-10.00 sec 37.2 MBytes 31.2 Mbits/sec sender [ 5] 0.00-10.00 sec 37.1 MBytes 31.1 Mbits/sec receiver - with this patch: # ip net exec hb iperf3 -c 192.168.0.1 --sctp [ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec sender [ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec receiver Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions") Fixes: 046c052b475e ("sctp: enable udp tunneling socks") Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Link: https://patch.msgid.link/c874a8548221dcd56ff03c65ba75a74e6cf99119.1776017727.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Alexander Martyniuk <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Varun R Mallya <[email protected]> Date: Wed Jun 24 14:22:32 2026 +0800 selftests/bpf: Add test to ensure kprobe_multi is not sleepable commit c7cab53f9d5273f0cf2a26bdf178c4e074bdfb50 upstream. Add a selftest to ensure that kprobe_multi programs cannot be attached using the BPF_F_SLEEPABLE flag. This test succeeds when the kernel rejects attachment of kprobe_multi when the BPF_F_SLEEPABLE flag is set. Suggested-by: Leon Hwang <[email protected]> Signed-off-by: Varun R Mallya <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Shung-Hsi Yu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Paul Moore <[email protected]> Date: Mon Jun 29 15:03:38 2026 +0800 selinux: fix overlayfs mmap() and mprotect() access checks [ Upstream commit 82544d36b1729153c8aeb179e84750f0c085d3b1 ] The existing SELinux security model for overlayfs is to allow access if the current task is able to access the top level file (the "user" file) and the mounter's credentials are sufficient to access the lower level file (the "backing" file). Unfortunately, the current code does not properly enforce these access controls for both mmap() and mprotect() operations on overlayfs filesystems. This patch makes use of the newly created security_mmap_backing_file() LSM hook to provide the missing backing file enforcement for mmap() operations, and leverages the backing file API and new LSM blob to provide the necessary information to properly enforce the mprotect() access controls. Cc: [email protected] Acked-by: Amir Goldstein <[email protected]> Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Cai Xinchen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Stepan Ionichev <[email protected]> Date: Thu Jun 25 09:49:39 2026 -0400 serial: 8250_dw: unregister 8250 port if clk_notifier_register() fails [ Upstream commit 10fc708b4de7f86002d2d735a2dbf3b5b7f65692 ] dw8250_probe() registers the 8250 port via serial8250_register_8250_port() and then, if the device has a clock, registers a clock notifier. If clk_notifier_register() fails, probe returns the error but leaves the 8250 port registered. The matching serial8250_unregister_port() lives in dw8250_remove(), which is not called when probe fails, so the port slot stays occupied until the device is rebound or the system is rebooted. The devm-allocated driver data is freed while the port still references it (via the saved private_data and serial_in/serial_out callbacks), so any access to that port slot before a rebind is a use-after-free hazard. Unregister the port on the clk_notifier_register() error path. Fixes: cc816969d7b5 ("serial: 8250_dw: Fix common clocks usage race condition") Cc: [email protected] Signed-off-by: Stepan Ionichev <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Viken Dadhaniya <[email protected]> Date: Thu May 28 22:48:07 2026 +0530 serial: qcom_geni: Fix RX DMA stall when SE_DMA_RX_LEN_IN is zero commit b93062b6d8a1b2d9bad235cac25558a909819026 upstream. In qcom_geni_serial_handle_rx_dma(), geni_se_rx_dma_unprep() clears port->rx_dma_addr before SE_DMA_RX_LEN_IN is read. If the register is zero, for example when the RX stale counter fires on an idle line, the handler returns without calling geni_se_rx_dma_prep(). The next RX DMA interrupt then hits the !port->rx_dma_addr guard and returns immediately, so the RX DMA buffer is never rearmed and later input is lost. Keep the handler on the rearm path when rx_in is zero. Warn about the unexpected zero-length DMA completion, skip received-data handling, and always call geni_se_rx_dma_prep(). Fixes: 2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA") Cc: [email protected] Reviewed-by: Bartosz Golaszewski <[email protected]> Signed-off-by: Viken Dadhaniya <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Salman Alghamdi <[email protected]> Date: Tue Jun 16 11:55:20 2026 -0400 staging: rtl8723bs: fix buffer over-read in rtw_update_protection [ Upstream commit 514ab98364595007d4557ecc85d7e5f012c504d3 ] rtw_update_protection() is called with a pointer offset into the ies buffer but the full ie_length is passed, causing a potential buffer over-read. Fixes: e945c43df60b ("Staging: rtl8723bs: Delete dead code from update_current_network()") Fixes: d3fcee1b78a5 ("staging: rtl8723bs: fix camel case in struct wlan_bssid_ex") Reported-by: Luka Gejak <[email protected]> Closes: https://lore.kernel.org/linux-staging/[email protected] Cc: [email protected] Signed-off-by: Salman Alghamdi <[email protected]> Reviewed-by: Luka Gejak <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Doruk Tan Ozturk <[email protected]> Date: Wed Jun 17 09:58:18 2026 +0200 tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done commit bda3348872a2ef0d19f2df6aa8cb5025adce2f20 upstream. tipc_aead_decrypt() goes straight from tipc_bearer_hold(b) to crypto_aead_decrypt(req) without taking a reference on the netns, unlike the encrypt path. When crypto_aead_decrypt() is offloaded asynchronously (e.g. the SIMD aead wrapper queuing to cryptd), the cryptd worker runs tipc_aead_decrypt_done() later. If the bearer's netns is torn down in the meantime, cleanup_net() -> tipc_exit_net() -> tipc_crypto_stop() frees the per-netns tipc_crypto, and the completion then reads it: tipc_aead_decrypt_done() dereferences aead->crypto->stats and aead->crypto->net, and tipc_crypto_rcv_complete() dereferences aead->crypto->aead[] and the node table -- reading freed memory. Decoded KASAN splat (v7.1-rc7, CONFIG_KASAN_INLINE + TIPC + TIPC_CRYPTO): BUG: KASAN: slab-use-after-free in tipc_aead_decrypt_done (net/tipc/crypto.c:999) Read of size 8 at addr ffff8881056258a8 by task kworker/u16:2/51 Workqueue: events_unbound Call Trace: tipc_aead_decrypt_done (net/tipc/crypto.c:999) process_one_work (kernel/workqueue.c:3314) worker_thread (kernel/workqueue.c:3397 kernel/workqueue.c:3478) kthread (kernel/kthread.c:436) ret_from_fork (arch/x86/kernel/process.c:158) ret_from_fork_asm (arch/x86/entry/entry_64.S:245) Allocated by task 169: __kasan_kmalloc (mm/kasan/common.c:398 mm/kasan/common.c:415) tipc_crypto_start (net/tipc/crypto.c:1502) tipc_init_net (net/tipc/core.c:72) ops_init (net/core/net_namespace.c:137) setup_net (net/core/net_namespace.c:446) copy_net_ns (net/core/net_namespace.c:579) create_new_namespaces (kernel/nsproxy.c:132) __x64_sys_unshare (kernel/fork.c:3316) do_syscall_64 (arch/x86/entry/syscall_64.c:63) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) Freed by task 8: kfree (mm/slub.c:6566) tipc_exit_net (net/tipc/core.c:119) cleanup_net (net/core/net_namespace.c:704) process_one_work (kernel/workqueue.c:3314) kthread (kernel/kthread.c:436) This is the same class of bug that commit e279024617134 ("net/tipc: fix slab-use-after-free Read in tipc_aead_encrypt_done") fixed for the encrypt side. The encrypt path takes maybe_get_net(aead->crypto->net) before crypto_aead_encrypt() and drops it with put_net() on the synchronous return paths and in tipc_aead_encrypt_done(); the -EINPROGRESS/-EBUSY return keeps the reference for the async callback to release. The decrypt path was left without the equivalent guard. Mirror the encrypt-side fix on the decrypt path: take a net reference before crypto_aead_decrypt() (failing with -ENODEV and the matching bearer put if it cannot be acquired), keep it across the -EINPROGRESS/-EBUSY async return, and drop it with put_net() on the synchronous success/error return and at the end of tipc_aead_decrypt_done(). Reproduced under KASAN on v7.1-rc7: a UDP bearer with a cluster key is flooded with crafted encrypted frames from an unknown peer (driving the cluster-key decrypt path) while the bearer's netns is repeatedly torn down. The completion must run asynchronously to outlive tipc_crypto_stop(); on x86 the stock aesni gcm(aes) now decrypts synchronously, so the async path was exercised via cryptd offload. The unguarded aead->crypto dereference in tipc_aead_decrypt_done() is the unpatched upstream path; tipc_aead_decrypt() still lacks maybe_get_net(aead->crypto->net), so the completion can outlive the free on any config where crypto_aead_decrypt() goes async. Found by 0sec automated security-research tooling (https://0sec.ai). Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Cc: [email protected] Signed-off-by: Doruk Tan Ozturk <[email protected]> Reviewed-by: Alexander Lobakin <[email protected]> Reviewed-by: Tung Nguyen <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yi Yang <[email protected]> Date: Thu Jun 4 06:07:34 2026 +0000 vc_screen: fix null-ptr-deref in vcs_notifier() during concurrent vcs_write commit a287620312dc6dcb9a093417a0e589bf30fcf38a upstream. A KASAN null-ptr-deref was observed in vcs_notifier(): BUG: KASAN: null-ptr-deref in vcs_notifier+0x98/0x130 Read of size 2 at addr qmp_cmd_name: qmp_capabilities, arguments: {} The issue is a race condition in vcs_write(). When the console_lock is temporarily dropped (to copy data from userspace), the vc_data pointer obtained from vcs_vc() may become stale. After re-acquiring the lock, vcs_vc() is called again to re-validate the pointer. If the vc has been deallocated in the meantime, vcs_vc() returns NULL, and the while loop breaks (with written > 0). However, after the loop, vcs_scr_updated(vc) is still called with the now-NULL vc pointer, leading to a null pointer dereference in the notifier chain (vcs_notifier dereferences param->vc). Fix this by adding a NULL check for vc before calling vcs_scr_updated(). Fixes: 8fb9ea65c9d1 ("vc_screen: reload load of struct vc_data pointer in vcs_write() to avoid UAF") Cc: [email protected] Signed-off-by: Yi Yang <[email protected]> Reviewed-by: Jiri Slaby <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Miklos Szeredi <[email protected]> Date: Thu May 28 10:58:24 2026 +0200 virtiofs: fix UAF on submount umount commit 06b41351779e9289e8785694ade9042ae85e41ea upstream. iput() called from fuse_release_end() can Oops if the super block has already been destroyed. Normally this is prevented by waiting for num_waiting to go down to zero before commencing with super block shutdown. This only works, however, for the last submount instance, as the wait counter is per connection, not per superblock. Revert to using synchronous release requests for the auto_submounts case, which is virtiofs only at this time. Reported-by: Aurélien Bombo <[email protected]> Reported-by: Zhihao Cheng <[email protected]> Cc: Greg Kurz <[email protected]> Closes: https://github.com/kata-containers/kata-containers/issues/12589 Fixes: 26e5c67deb2e ("fuse: fix livelock in synchronous file put from fuseblk workers") Cc: [email protected] Reviewed-by: Greg Kurz <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Jose Ignacio Tornos Martinez <[email protected]> Date: Mon Apr 20 13:01:29 2026 +0200 wifi: ath11k: fix warning when unbinding commit 8b7a26b6681922a38cd5a7829ace61f8e54df9b7 upstream. If there is an error during some initialization related to firmware, the buffers dp->tx_ring[i].tx_status are released. However this is released again when the device is unbinded (ath11k_pci), and we get: WARNING: CPU: 0 PID: 6231 at mm/slub.c:4368 free_large_kmalloc+0x57/0x90 Call Trace: free_large_kmalloc ath11k_dp_free ath11k_core_deinit ath11k_pci_remove ... The issue is always reproducible from a VM because the MSI addressing initialization is failing. In order to fix the issue, just set the buffers to NULL after releasing in order to avoid the double free. Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Cc: [email protected] Signed-off-by: Jose Ignacio Tornos Martinez <[email protected]> Reviewed-by: Baochen Qiang <[email protected]> Reviewed-by: Rameshkumar Sundaram <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jeff Johnson <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Junjie Cao <[email protected]> Date: Thu Feb 12 20:50:34 2026 +0800 wifi: iwlwifi: mvm: fix race condition in PTP removal commit 65150c9cc3e06ab54bc4e8134a47f6f5d095a4e3 upstream. iwl_mvm_ptp_remove() calls cancel_delayed_work_sync() only after ptp_clock_unregister() and clearing ptp_data state (ptp_clock, ptp_clock_info, last_gp2). This creates a race where the delayed work iwl_mvm_ptp_work() can execute between ptp_clock_unregister() and cancel_delayed_work_sync(), observing partially cleared PTP state. Move cancel_delayed_work_sync() before ptp_clock_unregister() to ensure the delayed work is fully stopped before any PTP cleanup begins. Cc: [email protected] Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Vadim Fedorenko <[email protected]> Signed-off-by: Junjie Cao <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Miri Korenblit <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Zenm Chen <[email protected]> Date: Tue Apr 7 23:44:30 2026 +0800 wifi: mt76: mt76x2u: Add support for ELECOM WDC-867SU3S commit f4ce0664e9f0387873b181777891741c33e19465 upstream. Add the ID 056e:400a to the table to support an additional MT7612U adapter: ELECOM WDC-867SU3S. Compile tested only. Cc: [email protected] # 5.10.x Signed-off-by: Zenm Chen <[email protected]> Acked-by: Lorenzo Bianconi <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Leon Yen <[email protected]> Date: Thu Jun 18 11:14:11 2026 +0300 wifi: mt76: mt7921: avoid undesired changes of the preset regulatory domain commit 2425dc7beaadc39c2636f97f8bdc22dc3cf88149 upstream. Some countries have strict RF restrictions where changing the regulatory domain dynamically based on the connected AP is not acceptable. This patch disables Beacon country IE hinting when a valid country code is set from usersland (e.g., by system using iw or CRDA). Signed-off-by: Leon Yen <[email protected]> Signed-off-by: Ming Yen Hsieh <[email protected]> Tested-by: David Ruth <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Ajrat Makhmutov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Quan Zhou <[email protected]> Date: Thu Jun 18 11:14:12 2026 +0300 wifi: mt76: mt7921: fix a potential scan no APs commit 5ed54896b6bd444223092cab361b0785932119ab upstream. In multi-channel scenarios, the granted channel must be aborted before station remove. Otherwise, the firmware will be put into a wrong state, resulting in have chance to make subsequence scan no APs. With this patch, the granted channel will be always aborted before station remove. Signed-off-by: Quan Zhou <[email protected]> Reviewed-by: Sean Wang <[email protected]> Tested-by: David Ruth <[email protected]> Reviewed-by: David Ruth <[email protected]> Link: https://patch.msgid.link/1ac1ae779db86d4012199a24ea2ca74050ed4af6.1721300411.git.quan.zhou@mediatek.com Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Ajrat Makhmutov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: Sean Wang <[email protected]> Date: Thu Jun 18 11:14:13 2026 +0300 wifi: mt76: mt7921: fix potential deadlock in mt7921_roc_abort_sync commit d5059e52fd8bc624ec4255c9fa01a266513d126b upstream. roc_abort_sync() can deadlock with roc_work(). roc_work() holds dev->mt76.mutex, while cancel_work_sync() waits for roc_work() to finish. If the caller already owns the same mutex, both sides block and no progress is possible. This deadlock can occur during station removal when mt76_sta_state() -> mt76_sta_remove() -> mt7921_mac_sta_remove() -> mt7921_roc_abort_sync() invokes cancel_work_sync() while roc_work() is still running and holding dev->mt76.mutex. This avoids the mutex deadlock and preserves exactly-once work ownership. Fixes: 352d966126e6 ("wifi: mt76: mt7921: fix a potential association failure upon resuming") Co-developed-by: Quan Zhou <[email protected]> Signed-off-by: Quan Zhou <[email protected]> Signed-off-by: Sean Wang <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Felix Fietkau <[email protected]> [Ajrat: keep del_timer_sync() instead of timer_delete_sync() -- the timer API rename is not present in 6.12.y. ] Signed-off-by: Ajrat Makhmutov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Author: ElXreno <[email protected]> Date: Wed May 6 04:39:16 2026 +0300 wifi: mt76: mt7925: don't disable AP BSS when removing TDLS peer commit 37d65384aa6f9cbe45f4052b13b378af1aab3e95 upstream. On a STATION vif, removing a TDLS peer takes the mt7925_mac_sta_remove -> mt7925_mac_sta_remove_links path. The first loop in that function calls mt7925_mcu_add_bss_info(..., enable=false) for every link of the station being removed. For a non-MLO STATION vif there is exactly one link, link 0, whose bss_conf is the AP's. TDLS peers do not have their own bss_conf - they share the AP's BSS. The result is that every TDLS peer teardown sends a BSS_INFO_UPDATE with enable=0 for the AP's BSS to the firmware, which wipes the AP-side rate-control context. The connection stays associated and TX from the host still works at the negotiated rate, but the AP's downlink to us collapses to the lowest mandatory OFDM rate (HE-MCS 0 / 6 Mbit/s OFDM) and only slowly recovers as rate adaptation re-learns under sustained traffic. With brief or bursty traffic the link can stay at 6-72 Mbit/s indefinitely, requiring a manual reconnect. mt7925_mac_link_sta_remove() already guards its own mt7925_mcu_add_bss_info(..., false) call with "vif->type == NL80211_IFTYPE_STATION && !link_sta->sta->tdls". Add the equivalent guard at the top of the cleanup loop in mt7925_mac_sta_remove_links(), above the link_sta / link_conf / mlink / mconf lookups, so TDLS peer teardown skips the loop body entirely without doing the per-link work that would just be thrown away. Verified on mt7925e by triggering Samsung-S938B auto-TDLS via iperf3 and watching iw rx bitrate after teardown: Before: rx bitrate collapses to 6.0-72.0 Mbit/s, oscillates 17/72/ 137/288/432 Mbit/s for 30+ seconds, no full recovery without a manual reassoc. After: rx bitrate stays at 1200.9 Mbit/s HE-MCS 11 NSS 2 80 MHz across the entire TDLS lifecycle. bpftrace confirms a single mt7925_mcu_add_bss_info(enable=0) call per teardown before the fix; zero such calls after. Fixes: 3878b4333602 ("wifi: mt76: mt7925: update mt7925_mac_link_sta_[add, assoc, remove] for MLO") Cc: [email protected] Signed-off-by: ElXreno <[email protected]> Assisted-by: Claude:claude-opus-4-7 bpftrace Link: https://patch.msgid.link/[email protected] Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Bitterblue Smith <[email protected]> Date: Sat Apr 25 22:32:58 2026 +0300 wifi: rtlwifi: rtl8821ae: Fix C2H bit location in RX descriptor commit 83d38df6929118c3f996b9e3351c2d5014073d87 upstream. Bit 28 of double word 2 in the RX descriptor indicates if the packet is a normal 802.11 frame, or a message from the wifi firmware to the driver (Card 2 Host). Commit f5678bfe1cdc ("rtlwifi: rtl8821ae: Replace local bit manipulation macros") mistakenly made the driver look for this bit in double word 1, causing packet loss and Bluetooth coexistence problems. Fixes: f5678bfe1cdc ("rtlwifi: rtl8821ae: Replace local bit manipulation macros") Cc: <[email protected]> Signed-off-by: Bitterblue Smith <[email protected]> Acked-by: Ping-Ke Shih <[email protected]> Signed-off-by: Ping-Ke Shih <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Luka Gejak <[email protected]> Date: Mon May 18 16:23:10 2026 +0200 wifi: rtw88: increase TX report timeout to fix race condition commit c80788f7c5aed8d420366b821f867a8a353d83a5 upstream. The driver expects the firmware to report TX status within 500ms. However, a timeout can be triggered when the hardware performs background scans while under TX load. During these scans, the firmware stays off-channel for periods exceeding 500ms, delaying the delivery of TX reports back to the driver. When this occurs, the purge timer fires prematurely and drops the tracking skbs from the queue. This results in the host stack interpreting the missing status as packet loss, leading to TCP window collapse. In testing with iperf3, this causes throughput to drop from ~90 Mbps to near-zero for approximately 2 seconds until the connection recovers. Increase RTW_TX_PROBE_TIMEOUT to 2500ms for RTL8723DU. This duration is sufficient to accommodate off-channel dwell time during full background scans, ensuring the purge timer only trips during genuine firmware lockups and preventing unnecessary TCP retransmission cycles. Fixes: a82dfd33d123 ("wifi: rtw88: Add common USB chip support") Cc: [email protected] Acked-by: Ping-Ke Shih <[email protected]> Tested-by: Luka Gejak <[email protected]> Signed-off-by: Luka Gejak <[email protected]> Signed-off-by: Ping-Ke Shih <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Luka Gejak <[email protected]> Date: Mon May 18 16:23:11 2026 +0200 wifi: rtw88: usb: fix memory leaks on USB write failures commit 6b964941bbfe6e0f18b1a5e008486dbb62df440a upstream. When rtw_usb_write_port() fails to submit a USB Request Block (URB) (e.g., due to device disconnect or ENOMEM), the completion callback is never executed. Currently, the driver ignores the return value of rtw_usb_write_port() in rtw_usb_write_data() and rtw_usb_tx_agg_skb(). Because these functions rely on the completion callback to free the socket buffers (skbs) and the transaction control block (txcb), a submission failure results in: 1. A memory leak of the allocated skb in rtw_usb_write_data(). 2. A memory leak of the txcb structure and all aggregated skbs in rtw_usb_tx_agg_skb(). Fix this by checking the return value of rtw_usb_write_port(). If it fails, explicitly free the skb in rtw_usb_write_data(), and properly purge the tx_ack_queue and free the txcb in rtw_usb_tx_agg_skb(). The issue was discovered in practice during device disconnect/reconnect scenarios and memory pressure conditions. Tested by verifying normal TX operation continues after the fix without regressions. Fixes: a82dfd33d123 ("wifi: rtw88: Add common USB chip support") Cc: [email protected] Acked-by: Ping-Ke Shih <[email protected]> Tested-by: Luka Gejak <[email protected]> Signed-off-by: Luka Gejak <[email protected]> Signed-off-by: Ping-Ke Shih <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Yingjie Gao <[email protected]> Date: Tue Jun 16 10:02:14 2026 -0400 xfs: fix error returns in CoW fork repair [ Upstream commit fcf4faba9f986b3bb528da11913c9ec5d6e8f689 ] xrep_cow_find_bad() returns success after the cleanup labels even if AG setup, btree queries, or bitmap updates failed. This can make repair continue with an incomplete bad-file-offset bitmap instead of stopping at the original error. The force-rebuild path has a related cleanup problem. If xrep_cow_mark_file_range() fails, the function returns directly and skips the scrub AG context and perag cleanup. Let the force-rebuild path fall through to the existing cleanup code and return the saved error after cleanup. Fixes: dbbdbd008632 ("xfs: repair problems in CoW forks") Cc: <[email protected]> # v6.8 Signed-off-by: Yingjie Gao <[email protected]> Reviewed-by: "Darrick J. Wong" <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Author: Christoph Hellwig <[email protected]> Date: Tue Jun 16 10:02:13 2026 -0400 xfs: remove the expr argument to XFS_TEST_ERROR [ Upstream commit 807df3227d7674d7957c576551d552acf15bb96f ] Don't pass expr to XFS_TEST_ERROR. Most calls pass a constant false, and the places that do pass an expression become cleaner by moving it out. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Carlos Maiolino <[email protected]> Stable-dep-of: fcf4faba9f98 ("xfs: fix error returns in CoW fork repair") Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>