Changelog

ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro [+ + +]

Author: Hamidreza H. Fard <[email protected]>
Date:   Tue Mar 7 16:37:41 2023 +0000

    ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book2 Pro
    
    commit a86e79e3015f5dd8e1b01ccfa49bd5c6e41047a1 upstream.
    
    Samsung Galaxy Book2 Pro (13" 2022 NP930XED-KA1DE) with codec SSID
    144d:c868 requires the same workaround for enabling the speaker amp
    as other Samsung models with the ALC298 codec.
    
    Signed-off-by: Hamidreza H. Fard <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: hda: intel-dsp-config: add MTL PCI id [+ + +]

Author: Bard Liao <[email protected]>
Date:   Mon Mar 6 15:41:01 2023 +0800

    ALSA: hda: intel-dsp-config: add MTL PCI id
    
    commit bbdf904b13a62bb8b1272d92a7dde082dff86fbb upstream.
    
    Use SOF as default audio driver.
    
    Signed-off-by: Bard Liao <[email protected]>
    Reviewed-by: Gongjun Song <[email protected]>
    Reviewed-by: Kai Vehmanen <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ALSA: hda: Match only Intel devices with CONTROLLER_IN_GPU() [+ + +]

Author: Bjorn Helgaas <[email protected]>
Date:   Tue Mar 7 15:40:54 2023 -0600

    ALSA: hda: Match only Intel devices with CONTROLLER_IN_GPU()
    
    [ Upstream commit ff447886e675979d66b2bc01810035d3baea1b3a ]
    
    CONTROLLER_IN_GPU() is clearly intended to match only Intel devices, but
    previously it checked only the PCI Device ID, not the Vendor ID, so it
    could match devices from other vendors that happened to use the same Device
    ID.
    
    Update CONTROLLER_IN_GPU() so it matches only Intel devices.
    
    Fixes: 535115b5ff51 ("ALSA: hda - Abort the probe without i915 binding for HSW/B")
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Takashi Iwai <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
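
The difference between matching on the Device ID alone and matching on Vendor ID plus Device ID can be sketched in user-space C. This is only an illustration: the struct, the ID list, and the function names are hypothetical stand-ins, not the driver's macro. The only value taken as fact is 0x8086, the PCI Vendor ID assigned to Intel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ID_INTEL 0x8086u  /* PCI Vendor ID assigned to Intel */

/* Hypothetical stand-in for the PCI identity of a device. */
struct pci_ident {
    uint16_t vendor;
    uint16_t device;
};

/* Illustrative Device IDs only; not the driver's real list. */
static const uint16_t gpu_audio_ids[] = { 0x0a0c, 0x0c0c };

static bool device_id_listed(uint16_t device)
{
    for (size_t i = 0; i < sizeof(gpu_audio_ids) / sizeof(gpu_audio_ids[0]); i++)
        if (gpu_audio_ids[i] == device)
            return true;
    return false;
}

/* Old behaviour: Device ID only, so another vendor can match by accident. */
static bool controller_in_gpu_old(const struct pci_ident *p)
{
    return device_id_listed(p->device);
}

/* New behaviour: the Vendor ID must be Intel as well. */
static bool controller_in_gpu_new(const struct pci_ident *p)
{
    return p->vendor == VENDOR_ID_INTEL && device_id_listed(p->device);
}
```

With a device from a made-up vendor 0x1234 that reuses Device ID 0x0a0c, the old check matches and the new one does not.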

attr: add in_group_or_capable() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Sat Mar 18 12:15:24 2023 +0200

    attr: add in_group_or_capable()
    
    commit 11c2a8700cdcabf9b639b7204a1e38e2a0b6798e upstream.
    
    [backported to 5.10.y, prior to idmapped mounts]
    
    In setattr_{copy,prepare}() we need to perform the same permission
    checks to determine whether we need to drop the setgid bit or not.
    Instead of open-coding it twice, add a simple helper that encapsulates the
    logic. We will reuse this helper to make dropping the setgid bit during
    write operations more consistent in a follow-up patch.
    
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

attr: add setattr_should_drop_sgid() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Sat Mar 18 12:15:26 2023 +0200

    attr: add setattr_should_drop_sgid()
    
    commit 72ae017c5451860443a16fb2a8c243bff3e396b8 upstream.
    
    [backported to 5.10.y, prior to idmapped mounts]
    
    The current setgid stripping logic during write and ownership change
    operations is inconsistent and strewn over multiple places. In order to
    consolidate it and make it more consistent we'll add a new helper
    setattr_should_drop_sgid(). The function retains the old behavior where
    we remove the S_ISGID bit unconditionally when S_IXGRP is set but also
    when it isn't set and the caller is neither in the group of the inode
    nor privileged over the inode.
    
    We will use this helper both in write operation permission removal such
    as file_remove_privs() as well as in ownership change operations.
    
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

attr: use consistent sgid stripping checks [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Sat Mar 18 12:15:27 2023 +0200

    attr: use consistent sgid stripping checks
    
    commit ed5a7047d2011cb6b2bf84ceb6680124cc6a7d95 upstream.
    
    [backported to 5.10.y, prior to idmapped mounts]
    
    Currently setgid stripping in file_remove_privs()'s should_remove_suid()
    helper is inconsistent with other parts of the vfs. Specifically, it only
    raises ATTR_KILL_SGID if the inode is S_ISGID and S_IXGRP but not if the
    inode isn't in the caller's groups and the caller isn't privileged over the
    inode although we require this already in setattr_prepare() and
    setattr_copy() and so all filesystems implement this requirement implicitly
    because they have to use setattr_{prepare,copy}() anyway.
    
    But the inconsistency shows up in setgid stripping bugs for overlayfs in
    xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
    generic/687). For example, we test whether suid and setgid stripping works
    correctly when performing various write-like operations as an unprivileged
    user (fallocate, reflink, write, etc.):
    
    echo "Test 1 - qa_user, non-exec file $verb"
    setup_testfile
    chmod a+rws $junk_file
    commit_and_check "$qa_user" "$verb" 64k 64k
    
    The test basically creates a file with 6666 permissions. While the file has
    the S_ISUID and S_ISGID bits set it does not have the S_IXGRP set. On a
    regular filesystem like xfs what will happen is:
    
    sys_fallocate()
    -> vfs_fallocate()
       -> xfs_file_fallocate()
          -> file_modified()
             -> __file_remove_privs()
                -> dentry_needs_remove_privs()
                   -> should_remove_suid()
                -> __remove_privs()
                   newattrs.ia_valid = ATTR_FORCE | kill;
                   -> notify_change()
                      -> setattr_copy()
    
    In should_remove_suid() we can see that ATTR_KILL_SUID is raised
    unconditionally because the file in the test has S_ISUID set.
    
    But we also see that ATTR_KILL_SGID won't be set because while the file
    is S_ISGID it is not S_IXGRP (see above) which is a condition for
    ATTR_KILL_SGID being raised.
    
    So by the time we call notify_change() we have attr->ia_valid set to
    ATTR_KILL_SUID | ATTR_FORCE. Now notify_change() sees that
    ATTR_KILL_SUID is set and does:
    
    ia_valid = attr->ia_valid |= ATTR_MODE
    attr->ia_mode = (inode->i_mode & ~S_ISUID);
    
    which means that when we call setattr_copy() later we will definitely
    update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
    
    Now we call into the filesystem's ->setattr() inode operation which will
    end up calling setattr_copy(). Since ATTR_MODE is set we will hit:
    
    if (ia_valid & ATTR_MODE) {
            umode_t mode = attr->ia_mode;
            vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
            if (!vfsgid_in_group_p(vfsgid) &&
                !capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
                    mode &= ~S_ISGID;
            inode->i_mode = mode;
    }
    
    and since the caller in the test is neither capable nor in the group of the
    inode the S_ISGID bit is stripped.
    
    But if the file isn't suid then ATTR_KILL_SUID won't be raised, which
    has the consequence that neither the setgid nor the suid bit is touched,
    even though the setgid bit should be stripped because the inode isn't in
    the caller's groups and the caller isn't privileged over the inode.
    
    If overlayfs is in the mix things become a bit more complicated and the bug
    shows up more clearly. When e.g., ovl_setattr() is hit from
    ovl_fallocate()'s call to file_remove_privs() then ATTR_KILL_SUID and
    ATTR_KILL_SGID might be raised but because the check in notify_change() is
    questioning the ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be
    stripped the S_ISGID bit isn't removed even though it should be stripped:
    
    sys_fallocate()
    -> vfs_fallocate()
       -> ovl_fallocate()
          -> file_remove_privs()
             -> dentry_needs_remove_privs()
                -> should_remove_suid()
             -> __remove_privs()
                newattrs.ia_valid = ATTR_FORCE | kill;
                -> notify_change()
                   -> ovl_setattr()
                      // TAKE ON MOUNTER'S CREDS
                      -> ovl_do_notify_change()
                         -> notify_change()
                      // GIVE UP MOUNTER'S CREDS
         // TAKE ON MOUNTER'S CREDS
         -> vfs_fallocate()
            -> xfs_file_fallocate()
               -> file_modified()
                  -> __file_remove_privs()
                     -> dentry_needs_remove_privs()
                        -> should_remove_suid()
                     -> __remove_privs()
                        newattrs.ia_valid = ATTR_FORCE | kill;
                        -> notify_change()
    
    The fix for all of this is to make file_remove_privs()'s
    should_remove_suid() helper perform the same checks as we already
    require in setattr_prepare() and setattr_copy() and have notify_change()
    not pointlessly require S_IXGRP again. It doesn't make any sense in the
    first place because the caller must calculate the flags via
    should_remove_suid() anyway which would raise ATTR_KILL_SGID.
    
    While we're at it we move should_remove_suid() from inode.c to attr.c
    where it belongs with the rest of the iattr helpers. Especially since it
    returns ATTR_KILL_S{G,U}ID flags. We also rename it to
    setattr_should_drop_suidgid() to better reflect that it indicates both
    setuid and setgid bit removal and also that it returns attr flags.
    
    Running xfstests with this doesn't report any regressions. We should really
    try and use consistent checks.
    
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

block: null_blk: Fix handling of fake timeout request [+ + +]

Author: Damien Le Moal <[email protected]>
Date:   Tue Mar 14 13:11:05 2023 +0900

    block: null_blk: Fix handling of fake timeout request
    
    [ Upstream commit 63f886597085f346276e3b3c8974de0100d65f32 ]
    
    When injecting a fake timeout into the null_blk driver using
    fail_io_timeout, the request timeout handler does not execute
    blk_mq_complete_request(), so the complete callback is never executed
    for a timed-out request.
    
    The null_blk driver also has a driver-specific fake timeout mechanism
    which does not have this problem. Fix the problem with fail_io_timeout
    by using the same mechanism as the null_blk internal timeout feature, using
    the fake_timeout field of null_blk commands.
    
    Reported-by: Akinobu Mita <[email protected]>
    Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling")
    Signed-off-by: Damien Le Moal <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

block: sunvdc: add check for mdesc_grab() returning NULL [+ + +]

Author: Liang He <[email protected]>
Date:   Wed Mar 15 14:20:32 2023 +0800

    block: sunvdc: add check for mdesc_grab() returning NULL
    
    [ Upstream commit 6030363199e3a6341afb467ddddbed56640cbf6a ]
    
    In vdc_port_probe(), we should check the return value of mdesc_grab() as
    it may return NULL, which can cause a NULL pointer dereference (NPD) bug.
    
    Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.")
    Signed-off-by: Liang He <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    [axboe: style cleanup]
    Signed-off-by: Jens Axboe <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

cifs: Fix smb2_set_path_size() [+ + +]

Author: Volker Lendecke <[email protected]>
Date:   Mon Mar 13 16:09:54 2023 +0100

    cifs: Fix smb2_set_path_size()
    
    commit 211baef0eabf4169ce4f73ebd917749d1a7edd74 upstream.
    
    If cifs_get_writable_path() finds a writable file, smb2_compound_op()
    must use that file's FID and not the COMPOUND_FID.
    
    Cc: [email protected]
    Signed-off-by: Volker Lendecke <[email protected]>
    Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

cifs: Move the in_send statistic to __smb_send_rqst() [+ + +]

Author: Zhang Xiaoxu <[email protected]>
Date:   Wed Nov 16 11:11:36 2022 +0800

    cifs: Move the in_send statistic to __smb_send_rqst()
    
    [ Upstream commit d0dc41119905f740e8d5594adce277f7c0de8c92 ]
    
    When sending SMB_COM_NT_CANCEL and RFC1002_SESSION_REQUEST, the
    in_send statistic was lost.
    
    Let's move the in_send statistic to the send function to avoid
    this scenario.
    
    Fixes: 7ee1af765dfa ("[CIFS]")
    Signed-off-by: Zhang Xiaoxu <[email protected]>
    Signed-off-by: Steve French <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

clk: HI655X: select REGMAP instead of depending on it [+ + +]

Author: Randy Dunlap <[email protected]>
Date:   Sat Feb 25 21:39:47 2023 -0800

    clk: HI655X: select REGMAP instead of depending on it
    
    [ Upstream commit 0ffad67784a097beccf34d297ddd1b0773b3b8a3 ]
    
    REGMAP is a hidden (not user visible) symbol. Users cannot set it
    directly thru "make *config", so drivers should select it instead of
    depending on it if they need it.
    
    Consistently using "select" or "depends on" can also help reduce
    Kconfig circular dependency issues.
    
    Therefore, change the use of "depends on REGMAP" to "select REGMAP".
    
    Fixes: 3a49afb84ca0 ("clk: enable hi655x common clk automatically")
    Signed-off-by: Randy Dunlap <[email protected]>
    Cc: Riku Voipio <[email protected]>
    Cc: Stephen Boyd <[email protected]>
    Cc: Michael Turquette <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Stephen Boyd <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

cpuidle: psci: Iterate backwards over list in psci_pd_remove() [+ + +]

Author: Shawn Guo <[email protected]>
Date:   Sat Mar 4 15:41:07 2023 +0800

    cpuidle: psci: Iterate backwards over list in psci_pd_remove()
    
    commit 6b0313c2fa3d2cf991c9ffef6fae6e7ef592ce6d upstream.
    
    In case that psci_pd_init_topology() fails for some reason,
    psci_pd_remove() will be responsible for deleting provider and removing
    genpd from psci_pd_providers list.  There will be a failure when removing
    the cluster PD, because the cpu (child) PDs haven't been removed.
    
    [    0.050232] CPUidle PSCI: init PM domain cpu0
    [    0.050278] CPUidle PSCI: init PM domain cpu1
    [    0.050329] CPUidle PSCI: init PM domain cpu2
    [    0.050370] CPUidle PSCI: init PM domain cpu3
    [    0.050422] CPUidle PSCI: init PM domain cpu-cluster0
    [    0.050475] PM: genpd_remove: unable to remove cpu-cluster0
    [    0.051412] PM: genpd_remove: removed cpu3
    [    0.051449] PM: genpd_remove: removed cpu2
    [    0.051499] PM: genpd_remove: removed cpu1
    [    0.051546] PM: genpd_remove: removed cpu0
    
    Fix the problem by iterating over the provider list in reverse, so that
    the parent PD gets removed after its child PDs, as shown below.
    
    [    0.029052] CPUidle PSCI: init PM domain cpu0
    [    0.029076] CPUidle PSCI: init PM domain cpu1
    [    0.029103] CPUidle PSCI: init PM domain cpu2
    [    0.029124] CPUidle PSCI: init PM domain cpu3
    [    0.029151] CPUidle PSCI: init PM domain cpu-cluster0
    [    0.029647] PM: genpd_remove: removed cpu0
    [    0.029666] PM: genpd_remove: removed cpu1
    [    0.029690] PM: genpd_remove: removed cpu2
    [    0.029714] PM: genpd_remove: removed cpu3
    [    0.029738] PM: genpd_remove: removed cpu-cluster0
    
    Fixes: a65a397f2451 ("cpuidle: psci: Add support for PM domains by using genpd")
    Reviewed-by: Sudeep Holla <[email protected]>
    Reviewed-by: Ulf Hansson <[email protected]>
    Signed-off-by: Shawn Guo <[email protected]>
    Cc: 5.10+ <[email protected]> # 5.10+
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
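
Why the iteration direction matters can be shown with a toy model in user-space C. Everything here is hypothetical: the struct, the helpers, and the rule that a domain cannot be removed while a child still exists are simplified stand-ins for the genpd behaviour visible in the logs. The model also assumes, as the logs suggest, that the most recently added provider sits at the front of the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a power domain: parent is an index, or -1 for none. */
struct pd {
    const char *name;
    int parent;
    bool removed;
};

/* Removal fails while any not-yet-removed domain still names this one as parent. */
static bool pd_remove(struct pd *pds, size_t n, size_t idx)
{
    for (size_t i = 0; i < n; i++)
        if (!pds[i].removed && pds[i].parent == (int)idx)
            return false;
    pds[idx].removed = true;
    return true;
}

/* Walk the list front to back or back to front; return the number of failed removals. */
static int remove_all(struct pd *pds, size_t n, bool backwards)
{
    int failures = 0;

    for (size_t k = 0; k < n; k++) {
        size_t idx = backwards ? n - 1 - k : k;

        if (!pd_remove(pds, n, idx))
            failures++;
    }
    return failures;
}
```

With the list ordered cpu-cluster0, cpu3, cpu2, cpu1, cpu0, a forward walk fails once on the cluster and leaves it in place, while a backward walk removes cpu0 through cpu3 first and then the cluster.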

docs: Correct missing "d_" prefix for dentry_operations member d_weak_revalidate [+ + +]

Author: Glenn Washburn <[email protected]>
Date:   Mon Feb 27 12:40:42 2023 -0600

    docs: Correct missing "d_" prefix for dentry_operations member d_weak_revalidate
    
    [ Upstream commit 74596085796fae0cfce3e42ee46bf4f8acbdac55 ]
    
    The details for struct dentry_operations member d_weak_revalidate are
    missing a "d_" prefix.
    
    Fixes: af96c1e304f7 ("docs: filesystems: vfs: Convert vfs.txt to RST")
    Signed-off-by: Glenn Washburn <[email protected]>
    Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jonathan Corbet <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes [+ + +]

Author: Alex Hung <[email protected]>
Date:   Wed Jan 11 09:54:11 2023 -0700

    drm/amd/display: fix shift-out-of-bounds in CalculateVMAndRowBytes
    
    [ Upstream commit 031f196d1b1b6d5dfcb0533b431e3ab1750e6189 ]
    
    [WHY]
    When PTEBufferSizeInRequests is zero, UBSAN reports the following
    warning because dml_log2 returns an unexpected negative value:
    
      shift exponent 4294966273 is too large for 32-bit type 'int'
    
    [HOW]
    
    In the case PTEBufferSizeInRequests is zero, skip the dml_log2() and
    assign the result directly.
    
    Reviewed-by: Jun Lei <[email protected]>
    Acked-by: Qingqing Zhuo <[email protected]>
    Signed-off-by: Alex Hung <[email protected]>
    Tested-by: Daniel Wheeler <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

drm/amdkfd: Fix an illegal memory access [+ + +]

Author: Qu Huang <[email protected]>
Date:   Tue Feb 21 11:35:16 2023 +0000

    drm/amdkfd: Fix an illegal memory access
    
    [ Upstream commit 4fc8fff378b2f2039f2a666d9f8c570f4e58352c ]
    
    In the kfd_wait_on_events() function, the kfd_event_waiter structure is
    allocated by alloc_event_waiters(), but the event field of the waiter
    structure is not initialized. When copy_from_user() fails in the
    kfd_wait_on_events() function, it enters the error handling path to
    release the previously allocated memory of the waiter structure.
    Because the event field of the waiter structure is accessed
    in the free_waiters() function, this results in an illegal memory access
    and a system crash. Here is the crash log:
    
    localhost kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x185/0x1e0
    localhost kernel: RSP: 0018:ffffaa53c362bd60 EFLAGS: 00010082
    localhost kernel: RAX: ff3d3d6bff4007cb RBX: 0000000000000282 RCX: 00000000002c0000
    localhost kernel: RDX: ffff9e855eeacb80 RSI: 000000000000279c RDI: ffffe7088f6a21d0
    localhost kernel: RBP: ffffe7088f6a21d0 R08: 00000000002c0000 R09: ffffaa53c362be64
    localhost kernel: R10: ffffaa53c362bbd8 R11: 0000000000000001 R12: 0000000000000002
    localhost kernel: R13: ffff9e7ead15d600 R14: 0000000000000000 R15: ffff9e7ead15d698
    localhost kernel: FS:  0000152a3d111700(0000) GS:ffff9e855ee80000(0000) knlGS:0000000000000000
    localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    localhost kernel: CR2: 0000152938000010 CR3: 000000044d7a4000 CR4: 00000000003506e0
    localhost kernel: Call Trace:
    localhost kernel: _raw_spin_lock_irqsave+0x30/0x40
    localhost kernel: remove_wait_queue+0x12/0x50
    localhost kernel: kfd_wait_on_events+0x1b6/0x490 [hydcu]
    localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
    localhost kernel: kfd_ioctl+0x38c/0x4a0 [hydcu]
    localhost kernel: ? kfd_ioctl_set_trap_handler+0x70/0x70 [hydcu]
    localhost kernel: ? kfd_ioctl_create_queue+0x5a0/0x5a0 [hydcu]
    localhost kernel: ? ftrace_graph_caller+0xa0/0xa0
    localhost kernel: __x64_sys_ioctl+0x8e/0xd0
    localhost kernel: ? syscall_trace_enter.isra.18+0x143/0x1b0
    localhost kernel: do_syscall_64+0x33/0x80
    localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    localhost kernel: RIP: 0033:0x152a4dff68d7
    
    Allocate the structure with kcalloc, and remove redundant 0-initialization
    and a redundant loop condition check.
    
    Signed-off-by: Qu Huang <[email protected]>
    Signed-off-by: Felix Kuehling <[email protected]>
    Reviewed-by: Felix Kuehling <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
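
The idea behind switching to a zeroing allocator can be sketched in user-space C, with calloc() standing in for kcalloc. The struct and both helpers are hypothetical simplifications that only borrow the names used in the description. The point is that a zero-filled array lets the cleanup path tell attached waiters from untouched ones.

```c
#include <stdlib.h>

/* Hypothetical user-space stand-in for an event waiter. */
struct waiter {
    int *event;        /* NULL until the waiter is attached to an event */
    int activated;
};

/* calloc() zero-fills, so every event pointer starts out as NULL. */
static struct waiter *alloc_waiters(size_t n)
{
    return calloc(n, sizeof(struct waiter));
}

/* Cleanup touches only waiters that were actually attached; returns how many. */
static size_t free_waiters(struct waiter *w, size_t n)
{
    size_t detached = 0;

    for (size_t i = 0; i < n; i++) {
        if (w[i].event) {
            *w[i].event -= 1;   /* stand-in for leaving the event's wait queue */
            detached++;
        }
    }
    free(w);
    return detached;
}
```

If an early failure triggers cleanup before any waiter is attached, every event pointer is still NULL and the loop skips all entries instead of following uninitialized pointers.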

drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc [+ + +]

Author: Liu Ying <[email protected]>
Date:   Tue Mar 14 13:50:35 2023 +0800

    drm/bridge: Fix returned array size name for atomic_get_input_bus_fmts kdoc
    
    [ Upstream commit 0d3c9333d976af41d7dbc6bf4d9d2e95fbdf9c89 ]
    
    The returned array size for input formats is set through
    atomic_get_input_bus_fmts()'s 'num_input_fmts' argument, so use
    'num_input_fmts' to represent the array size in the function's kdoc,
    not 'num_output_fmts'.
    
    Fixes: 91ea83306bfa ("drm/bridge: Fix the bridge kernel doc")
    Fixes: f32df58acc68 ("drm/bridge: Add the necessary bits to support bus format negotiation")
    Signed-off-by: Liu Ying <[email protected]>
    Reviewed-by: Robert Foss <[email protected]>
    Signed-off-by: Neil Armstrong <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/i915/active: Fix misuse of non-idle barriers as fence trackers [+ + +]

Author: Janusz Krzysztofik <[email protected]>
Date:   Thu Mar 2 13:08:20 2023 +0100

    drm/i915/active: Fix misuse of non-idle barriers as fence trackers
    
    commit e0e6b416b25ee14716f3549e0cbec1011b193809 upstream.
    
    Users reported oopses on list corruptions when using i915 perf with a
    number of concurrently running graphics applications.  Root cause analysis
    pointed at an issue in barrier processing code -- a race among perf open /
    close replacing active barriers with perf requests on kernel context and
    concurrent barrier preallocate / acquire operations performed during user
    context first pin / last unpin.
    
    When adding a request to a composite tracker, we try to reuse an existing
    fence tracker, already allocated and registered with that composite.  The
    tracker we obtain may already track another fence, may be an idle barrier,
    or an active barrier.
    
    If the tracker we get turns out to be a non-idle barrier then we try to delete that
    barrier from a list of barrier tasks it belongs to.  However, while doing
    that we don't respect return value from a function that performs the
    barrier deletion.  Should the deletion ever fail, we would end up reusing
    the tracker still registered as a barrier task.  Since the same structure
    field is reused with both fence callback lists and barrier tasks list,
    list corruptions would likely occur.
    
    Barriers are now deleted from a barrier tasks list by temporarily removing
    the list content, traversing that content while skipping the node to be
    deleted, then populating the list back with the modified content.  Should
    those intentionally racy concurrent deletion attempts not be serialized,
    one or more of them may fail because the list is temporarily empty.
    
    Related code that ignores the results of barrier deletion was initially
    introduced in v5.4 by commit d8af05ff38ae ("drm/i915: Allow sharing the
    idle-barrier from other kernel requests").  However, all users of the
    barrier deletion routine were apparently serialized at that time, so the
    issue didn't exhibit itself.  Results of git bisect with help of a newly
    developed igt@gem_barrier_race@remote-request IGT test indicate that list
    corruptions might start to appear after commit 311770173fac ("drm/i915/gt:
    Schedule request retirement when timeline idles"), introduced in v5.5.
    
    Respect results of barrier deletion attempts -- mark the barrier as idle
    only if successfully deleted from the list.  Then, before proceeding with
    setting our fence as the one currently tracked, make sure that the tracker
    we've got is not a non-idle barrier.  If that check fails then don't use
    that tracker but go back and try to acquire a new, usable one.
    
    v3: use unlikely() to document what outcome we expect (Andi),
      - fix bad grammar in commit description.
    v2: no code changes,
      - blame commit 311770173fac ("drm/i915/gt: Schedule request retirement
        when timeline idles"), v5.5, not commit d8af05ff38ae ("drm/i915: Allow
        sharing the idle-barrier from other kernel requests"), v5.4,
      - reword commit description.
    
    Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
    Fixes: 311770173fac ("drm/i915/gt: Schedule request retirement when timeline idles")
    Cc: Chris Wilson <[email protected]>
    Cc: [email protected] # v5.5
    Cc: Andi Shyti <[email protected]>
    Signed-off-by: Janusz Krzysztofik <[email protected]>
    Reviewed-by: Andi Shyti <[email protected]>
    Signed-off-by: Andi Shyti <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 506006055769b10d1b2b4e22f636f3b45e0e9fc7)
    Signed-off-by: Jani Nikula <[email protected]>
    Signed-off-by: Janusz Krzysztofik <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/i915: Don't use stolen memory for ring buffers with LLC [+ + +]

Author: John Harrison <[email protected]>
Date:   Wed Feb 15 17:11:00 2023 -0800

    drm/i915: Don't use stolen memory for ring buffers with LLC
    
    commit 690e0ec8e63da9a29b39fedc6ed5da09c7c82651 upstream.
    
    Direction from hardware is that stolen memory should never be used for
    ring buffer allocations on platforms with LLC. There are too many
    caching pitfalls due to the way stolen memory accesses are routed. So
    it is safest to just not use it.
    
    Signed-off-by: John Harrison <[email protected]>
    Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
    Cc: Chris Wilson <[email protected]>
    Cc: Joonas Lahtinen <[email protected]>
    Cc: Jani Nikula <[email protected]>
    Cc: Rodrigo Vivi <[email protected]>
    Cc: Tvrtko Ursulin <[email protected]>
    Cc: [email protected]
    Cc: <[email protected]> # v4.9+
    Tested-by: Jouni Högander <[email protected]>
    Reviewed-by: Daniele Ceraolo Spurio <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit f54c1f6c697c4297f7ed94283c184acc338a5cf8)
    Signed-off-by: Jani Nikula <[email protected]>
    Signed-off-by: John Harrison <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

drm/meson: fix 1px pink line on GXM when scaling video overlay [+ + +]

Author: Christian Hewitt <[email protected]>
Date:   Fri Mar 3 12:33:12 2023 +0000

    drm/meson: fix 1px pink line on GXM when scaling video overlay
    
    [ Upstream commit 5c8cf1664f288098a971a1d1e65716a2b6a279e1 ]
    
    Playing media with a resolution smaller than the crtc size requires the
    video overlay to be scaled for output and GXM boards display a 1px pink
    line on the bottom of the scaled overlay. Comparing with the downstream
    vendor driver revealed VPP_DUMMY_DATA not being set [0].
    
    Setting VPP_DUMMY_DATA prevents the 1px pink line from being seen.
    
    [0] https://github.com/endlessm/linux-s905x/blob/master/drivers/amlogic/amports/video.c#L7869
    
    Fixes: bbbe775ec5b5 ("drm: Add support for Amlogic Meson Graphic Controller")
    Suggested-by: Martin Blumenstingl <[email protected]>
    Signed-off-by: Christian Hewitt <[email protected]>
    Acked-by: Martin Blumenstingl <[email protected]>
    Signed-off-by: Neil Armstrong <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Sasha Levin <[email protected]>

drm/panfrost: Don't sync rpm suspension after mmu flushing [+ + +]

Author: Dmitry Osipenko <[email protected]>
Date:   Thu Nov 17 04:40:38 2022 +0300

    drm/panfrost: Don't sync rpm suspension after mmu flushing
    
    [ Upstream commit ba3be66f11c3c49afaa9f49b99e21d88756229ef ]
    
    Lockdep warns about potential circular locking dependency of devfreq
    with the fs_reclaim caused by immediate device suspension when mapping is
    released by shrinker. Fix it by doing the suspension asynchronously.
    
    Reviewed-by: Steven Price <[email protected]>
    Fixes: ec7eba47da86 ("drm/panfrost: Rework page table flushing and runtime PM interaction")
    Signed-off-by: Dmitry Osipenko <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Signed-off-by: Sasha Levin <[email protected]>

drm/shmem-helper: Remove another errant put in error path [+ + +]

Author: Dmitry Osipenko <[email protected]>
Date:   Mon Jan 9 00:13:11 2023 +0300

    drm/shmem-helper: Remove another errant put in error path
    
    commit ee9adb7a45516cfa536ca92253d7ae59d56db9e4 upstream.
    
    drm_gem_shmem_mmap() doesn't own a reference in the error code path, resulting
    in the dma-buf shmem GEM object getting prematurely freed leading to a
    later use-after-free.
    
    Fixes: f49a51bfdc8e ("drm/shme-helpers: Fix dma_buf_mmap forwarding bug")
    Cc: [email protected]
    Signed-off-by: Dmitry Osipenko <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ethernet: sun: add check for the mdesc_grab() [+ + +]

Author: Liang He <[email protected]>
Date:   Wed Mar 15 14:00:21 2023 +0800

    ethernet: sun: add check for the mdesc_grab()
    
    [ Upstream commit 90de546d9a0b3c771667af18bb3f80567eabb89b ]
    
    In vnet_port_probe() and vsw_port_probe(), we should
    check the return value of mdesc_grab() as it may
    return NULL, which can cause NULL pointer dereference (NPD) bugs.
    
    Fixes: 5d01fa0c6bd8 ("ldmvsw: Add ldmvsw.c driver code")
    Fixes: 43fdf27470b2 ("[SPARC64]: Abstract out mdesc accesses for better MD update handling.")
    Signed-off-by: Liang He <[email protected]>
    Reviewed-by: Piotr Raczynski <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fail ext4_iget if special inode unallocated [+ + +]

Author: Baokun Li <[email protected]>
Date:   Sat Jan 7 11:21:25 2023 +0800

    ext4: fail ext4_iget if special inode unallocated
    
    [ Upstream commit 5cd740287ae5e3f9d1c46f5bfe8778972fd6d3fe ]
    
    In ext4_fill_super(), EXT4_ORPHAN_FS flag is cleared after
    ext4_orphan_cleanup() is executed. Therefore, when __ext4_iget() is
    called to get an inode whose i_nlink is 0 when the flag exists, no error
    is returned. If the inode is a special inode, a null pointer dereference
    may occur. If the value of i_nlink is 0 for any inodes (except boot loader
    inodes) got by using the EXT4_IGET_SPECIAL flag, the current file system
    is corrupted. Therefore, make the ext4_iget() function return an error if
    it gets such an abnormal special inode.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=199179
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216541
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216539
    Reported-by: Luís Henriques <[email protected]>
    Suggested-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Baokun Li <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ext4: fix possible double unlock when moving a directory [+ + +]

Author: Theodore Ts'o <[email protected]>
Date:   Fri Mar 17 21:53:52 2023 -0400

    ext4: fix possible double unlock when moving a directory
    
    commit 70e42feab2e20618ddd0cbfc4ab4b08628236ecd upstream.
    
    Fixes: 0813299c586b ("ext4: Fix possible corruption when moving a directory")
    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Dan Carpenter <[email protected]>
    Reported-by: [email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ext4: fix task hung in ext4_xattr_delete_inode [+ + +]

Author: Baokun Li <[email protected]>
Date:   Tue Jan 10 21:34:36 2023 +0800

    ext4: fix task hung in ext4_xattr_delete_inode
    
    [ Upstream commit 0f7bfd6f8164be32dbbdf36aa1e5d00485c53cd7 ]
    
    Syzbot reported a hung task problem:
    ==================================================================
    INFO: task syz-executor232:5073 blocked for more than 143 seconds.
          Not tainted 6.2.0-rc2-syzkaller-00024-g512dee0c00ad #0
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:syz-exec232 state:D stack:21024 pid:5073 ppid:5072 flags:0x00004004
    Call Trace:
     <TASK>
     context_switch kernel/sched/core.c:5244 [inline]
     __schedule+0x995/0xe20 kernel/sched/core.c:6555
     schedule+0xcb/0x190 kernel/sched/core.c:6631
     __wait_on_freeing_inode fs/inode.c:2196 [inline]
     find_inode_fast+0x35a/0x4c0 fs/inode.c:950
     iget_locked+0xb1/0x830 fs/inode.c:1273
     __ext4_iget+0x22e/0x3ed0 fs/ext4/inode.c:4861
     ext4_xattr_inode_iget+0x68/0x4e0 fs/ext4/xattr.c:389
     ext4_xattr_inode_dec_ref_all+0x1a7/0xe50 fs/ext4/xattr.c:1148
     ext4_xattr_delete_inode+0xb04/0xcd0 fs/ext4/xattr.c:2880
     ext4_evict_inode+0xd7c/0x10b0 fs/ext4/inode.c:296
     evict+0x2a4/0x620 fs/inode.c:664
     ext4_orphan_cleanup+0xb60/0x1340 fs/ext4/orphan.c:474
     __ext4_fill_super fs/ext4/super.c:5516 [inline]
     ext4_fill_super+0x81cd/0x8700 fs/ext4/super.c:5644
     get_tree_bdev+0x400/0x620 fs/super.c:1282
     vfs_get_tree+0x88/0x270 fs/super.c:1489
     do_new_mount+0x289/0xad0 fs/namespace.c:3145
     do_mount fs/namespace.c:3488 [inline]
     __do_sys_mount fs/namespace.c:3697 [inline]
     __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    RIP: 0033:0x7fa5406fd5ea
    RSP: 002b:00007ffc7232f968 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa5406fd5ea
    RDX: 0000000020000440 RSI: 0000000020000000 RDI: 00007ffc7232f970
    RBP: 00007ffc7232f970 R08: 00007ffc7232f9b0 R09: 0000000000000432
    R10: 0000000000804a03 R11: 0000000000000202 R12: 0000000000000004
    R13: 0000555556a7a2c0 R14: 00007ffc7232f9b0 R15: 0000000000000000
     </TASK>
    ==================================================================
    
    The problem is that, while cleaning up orphan inode <15>, the inode
    contains an xattr entry whose ea_inum is also 15. When inode <15> is
    evicted, the reference count of the corresponding EA inode is decreased.
    When EA inode <15> is looked up by find_inode_fast() in __ext4_iget(), it
    is found to hold the I_FREEING flag, so the caller waits for the EA inode
    to complete deletion. As a result, while inode <15> is being deleted, we
    wait for inode <15> to complete the deletion, which never happens and
    triggers the hung task report. To solve this problem, we only need to
    check whether the EA inode and its parent have the same inode number
    before getting the EA inode.
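
    A minimal userspace sketch of that check is shown below. The function
    name ea_inode_lookup_check() and the EFSCORRUPTED_SKETCH constant are
    hypothetical, and the choice of error code is an assumption; the commit
    message only says that the two inode numbers are compared before the EA
    inode is looked up.

```c
#include <errno.h>

/* Assumption: userspace stand-in for a "filesystem corrupted" error code. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif
#define EFSCORRUPTED_SKETCH EUCLEAN

/*
 * Refuse to look up an EA inode whose number equals that of the inode
 * that references it; waiting on it would mean waiting on ourselves.
 */
static int ea_inode_lookup_check(unsigned long parent_ino, unsigned long ea_ino)
{
	if (parent_ino == ea_ino)
		return -EFSCORRUPTED_SKETCH;
	return 0;
}
```

    With parent inode 15 and ea_inum 15 the check fails early, so the
    eviction path never reaches the wait on I_FREEING.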
    
    Link: https://syzkaller.appspot.com/bug?extid=77d6fcc37bbb92f26048
    Reported-by: [email protected]
    Signed-off-by: Baokun Li <[email protected]>
    Reviewed-by: Jan Kara <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Theodore Ts'o <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

fbdev: stifb: Provide valid pixelclock and add fb_check_var() checks [+ + +]

Author: Helge Deller <[email protected]>
Date:   Thu Mar 16 11:38:19 2023 +0100

    fbdev: stifb: Provide valid pixelclock and add fb_check_var() checks
    
    commit 203873a535d627c668f293be0cb73e26c30f9cc7 upstream.
    
    Find a valid modeline depending on the machine's graphics card
    configuration and add the fb_check_var() function to validate
    Xorg-provided graphics settings.
    
    Signed-off-by: Helge Deller <[email protected]>
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

firmware: xilinx: don't make a sleepable memory allocation from an atomic context [+ + +]

Author: Roman Gushchin <[email protected]>
Date:   Wed Mar 8 14:26:02 2023 -0800

    firmware: xilinx: don't make a sleepable memory allocation from an atomic context
    
    commit 38ed310c22e7a0fc978b1f8292136a4a4a8b3051 upstream.
    
    The following issue was discovered using lockdep:
    [    6.691371] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:209
    [    6.694602] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1, name: swapper/0
    [    6.702431] 2 locks held by swapper/0/1:
    [    6.706300]  #0: ffffff8800f6f188 (&dev->mutex){....}-{3:3}, at: __device_driver_lock+0x4c/0x90
    [    6.714900]  #1: ffffffc009a2abb8 (enable_lock){....}-{2:2}, at: clk_enable_lock+0x4c/0x140
    [    6.723156] irq event stamp: 304030
    [    6.726596] hardirqs last  enabled at (304029): [<ffffffc008d17ee0>] _raw_spin_unlock_irqrestore+0xc0/0xd0
    [    6.736142] hardirqs last disabled at (304030): [<ffffffc00876bc5c>] clk_enable_lock+0xfc/0x140
    [    6.744742] softirqs last  enabled at (303958): [<ffffffc0080904f0>] _stext+0x4f0/0x894
    [    6.752655] softirqs last disabled at (303951): [<ffffffc0080e53b8>] irq_exit+0x238/0x280
    [    6.760744] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G     U            5.15.36 #2
    [    6.768048] Hardware name: xlnx,zynqmp (DT)
    [    6.772179] Call trace:
    [    6.774584]  dump_backtrace+0x0/0x300
    [    6.778197]  show_stack+0x18/0x30
    [    6.781465]  dump_stack_lvl+0xb8/0xec
    [    6.785077]  dump_stack+0x1c/0x38
    [    6.788345]  ___might_sleep+0x1a8/0x2a0
    [    6.792129]  __might_sleep+0x6c/0xd0
    [    6.795655]  kmem_cache_alloc_trace+0x270/0x3d0
    [    6.800127]  do_feature_check_call+0x100/0x220
    [    6.804513]  zynqmp_pm_invoke_fn+0x8c/0xb0
    [    6.808555]  zynqmp_pm_clock_getstate+0x90/0xe0
    [    6.813027]  zynqmp_pll_is_enabled+0x8c/0x120
    [    6.817327]  zynqmp_pll_enable+0x38/0xc0
    [    6.821197]  clk_core_enable+0x144/0x400
    [    6.825067]  clk_core_enable+0xd4/0x400
    [    6.828851]  clk_core_enable+0xd4/0x400
    [    6.832635]  clk_core_enable+0xd4/0x400
    [    6.836419]  clk_core_enable+0xd4/0x400
    [    6.840203]  clk_core_enable+0xd4/0x400
    [    6.843987]  clk_core_enable+0xd4/0x400
    [    6.847771]  clk_core_enable+0xd4/0x400
    [    6.851555]  clk_core_enable_lock+0x24/0x50
    [    6.855683]  clk_enable+0x24/0x40
    [    6.858952]  fclk_probe+0x84/0xf0
    [    6.862220]  platform_probe+0x8c/0x110
    [    6.865918]  really_probe+0x110/0x5f0
    [    6.869530]  __driver_probe_device+0xcc/0x210
    [    6.873830]  driver_probe_device+0x64/0x140
    [    6.877958]  __driver_attach+0x114/0x1f0
    [    6.881828]  bus_for_each_dev+0xe8/0x160
    [    6.885698]  driver_attach+0x34/0x50
    [    6.889224]  bus_add_driver+0x228/0x300
    [    6.893008]  driver_register+0xc0/0x1e0
    [    6.896792]  __platform_driver_register+0x44/0x60
    [    6.901436]  fclk_driver_init+0x1c/0x28
    [    6.905220]  do_one_initcall+0x104/0x590
    [    6.909091]  kernel_init_freeable+0x254/0x2bc
    [    6.913390]  kernel_init+0x24/0x130
    [    6.916831]  ret_from_fork+0x10/0x20
    
    Fix it by passing the GFP_ATOMIC gfp flag for the corresponding
    memory allocation.
    
    Fixes: acfdd18591ea ("firmware: xilinx: Use hash-table for api feature check")
    Cc: stable <[email protected]>
    Signed-off-by: Roman Gushchin <[email protected]>
    Cc: Amit Sunil Dhamne <[email protected]>
    Cc: Michal Simek <[email protected]>
    Cc: [email protected]
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

fs: add mode_strip_sgid() helper [+ + +]

Author: Yang Xu <[email protected]>
Date:   Sat Mar 18 12:15:22 2023 +0200

    fs: add mode_strip_sgid() helper
    
    commit 2b3416ceff5e6bd4922f6d1c61fb68113dd82302 upstream.
    
    [remove userns argument of helper for 5.10.y backport]
    
    Add a dedicated helper to handle the setgid bit when creating a new file
    in a setgid directory. This is a preparatory patch for moving setgid
    stripping into the vfs. The patch contains no functional changes.
    
    Currently the setgid stripping logic is open-coded directly in
    inode_init_owner() and the individual filesystems are responsible for
    handling setgid inheritance. Since this has proven to be brittle as
    evidenced by old issues we uncovered over the last months (see [1] to
    [3] below) we will try to move this logic into the vfs.
    
    Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1]
    Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2]
    Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3]
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
    Reviewed-and-Tested-by: Jeff Layton <[email protected]>
    Signed-off-by: Yang Xu <[email protected]>
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

fs: move S_ISGID stripping into the vfs_*() helpers [+ + +]

Author: Yang Xu <[email protected]>
Date:   Sat Mar 18 12:15:23 2023 +0200

    fs: move S_ISGID stripping into the vfs_*() helpers
    
    commit 1639a49ccdce58ea248841ed9b23babcce6dbb0b upstream.
    
    [remove userns argument of helpers for 5.10.y backport]
    
    Move setgid handling out of individual filesystems and into the VFS
    itself to stop the proliferation of setgid inheritance bugs.
    
    Creating files that have both the S_IXGRP and S_ISGID bit raised in
    directories that themselves have the S_ISGID bit set requires additional
    privileges to avoid security issues.
    
    When a filesystem creates a new inode it needs to take care that the
    caller is either in the group of the newly created inode or they have
    CAP_FSETID in their current user namespace and are privileged over the
    parent directory of the new inode. If either of these two conditions is
    true then the S_ISGID bit can be raised for an S_IXGRP file and if not
    it needs to be stripped.
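
    The rule above can be sketched as a pure function. This is a simplified
    userspace illustration, not the kernel helper: strip_sgid_sketch() is a
    hypothetical name, and the group-membership and CAP_FSETID checks are
    reduced to boolean parameters supplied by the caller.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * dir_mode:        mode of the parent directory
 * mode:            requested permission bits of the new object
 * is_dir:          true if the new object is a directory
 * in_group:        caller is in the group of the new inode
 * fsetid_over_dir: caller has CAP_FSETID and is privileged over the parent
 */
static mode_t strip_sgid_sketch(mode_t dir_mode, mode_t mode, bool is_dir,
				bool in_group, bool fsetid_over_dir)
{
	/* Only objects with both S_ISGID and S_IXGRP raised are affected. */
	if ((mode & (S_ISGID | S_IXGRP)) != (S_ISGID | S_IXGRP))
		return mode;
	/* Directories inherit S_ISGID; a non-setgid parent needs no stripping. */
	if (is_dir || !(dir_mode & S_ISGID))
		return mode;
	/* Either condition lets the caller keep the bit. */
	if (in_group || fsetid_over_dir)
		return mode;
	return mode & ~(mode_t)S_ISGID;
}
```

    For an unprivileged caller outside the group, a request for S_ISGID
    plus 0770 inside a setgid directory comes back as plain 0770.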
    
    However, there are several key issues with the current implementation:
    
    * S_ISGID stripping logic is entangled with umask stripping.
    
      If a filesystem doesn't support or enable POSIX ACLs then umask
      stripping is done directly in the vfs before calling into the
      filesystem.
      If the filesystem does support POSIX ACLs then umask stripping may be
      done in the filesystem itself when calling posix_acl_create().
    
      Since umask stripping has an effect on S_ISGID inheritance, e.g., by
      stripping the S_IXGRP bit from the file to be created, and all relevant
      filesystems have to call posix_acl_create() before inode_init_owner(),
      where we currently take care of S_ISGID handling, S_ISGID handling is
      order dependent. IOW, whether or not you get a setgid bit depends on
      POSIX ACLs and umask and in what order they are called.
    
      Note that technically filesystems are free to impose their own
      ordering between posix_acl_create() and inode_init_owner(), meaning
      that there are additional ordering issues that influence S_ISGID
      inheritance.
    
    * Filesystems that don't rely on inode_init_owner() don't get S_ISGID
      stripping logic.
    
      While that may be intentional (e.g. network filesystems might just
      defer setgid stripping to a server) it is often just a security issue.
    
    This is not just ugly, it's unsustainably messy, especially since we do
    still have bugs in this area years after the initial round of setgid
    bugfixes.
    
    So the current state is quite messy and while we won't be able to make
    it completely clean as posix_acl_create() is still a filesystem specific
    call we can improve the S_ISGID stripping situation quite a bit by
    hoisting it out of inode_init_owner() and into the vfs creation
    operations. This means we alleviate the burden for filesystems to handle
    S_ISGID stripping correctly and can standardize the ordering between
    S_ISGID and umask stripping in the vfs.
    
    We add a new helper vfs_prepare_mode() so S_ISGID handling is now done
    in the VFS before umask handling. This means S_ISGID handling is
    unaffected by whether umask stripping is done by the VFS
    itself (if no POSIX ACLs are supported or enabled) or in the filesystem
    in posix_acl_create() (if POSIX ACLs are supported).
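
    Why the order matters can be shown with a small userspace sketch. All
    names here are hypothetical, and the S_ISGID check is deliberately
    reduced to the case of an unprivileged caller creating a non-directory
    in a setgid directory; it is not the kernel's vfs_prepare_mode().

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Simplified S_ISGID check: strip when both S_ISGID and S_IXGRP are raised. */
static mode_t strip_sgid_unpriv(mode_t mode)
{
	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP))
		mode &= ~(mode_t)S_ISGID;
	return mode;
}

static mode_t strip_umask(mode_t mode, mode_t mask)
{
	return mode & ~mask;
}

/* Order described above: S_ISGID handling first, then umask. */
static mode_t prepare_mode_sgid_first(mode_t mode, mode_t mask)
{
	mode = strip_sgid_unpriv(mode);
	return strip_umask(mode, mask);
}

/* Problematic order: umask may clear S_IXGRP before the S_ISGID check. */
static mode_t prepare_mode_umask_first(mode_t mode, mode_t mask)
{
	mode = strip_umask(mode, mask);
	return strip_sgid_unpriv(mode);
}
```

    With a requested mode of S_ISGID plus 0770 and a umask of 0010, the
    S_ISGID-first order yields 0760, whereas the umask-first order clears
    S_IXGRP early and leaves the setgid bit in place.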
    
    The vfs_prepare_mode() helper is called directly in vfs_*() helpers that
    create new filesystem objects. We need to move them into there to make
    sure that filesystems like overlayfs that have callchains like:
    
    sys_mknod()
    -> do_mknodat(mode)
       -> .mknod = ovl_mknod(mode)
          -> ovl_create(mode)
             -> vfs_mknod(mode)
    
    get S_ISGID stripping done when calling into lower filesystems via
    vfs_*() creation helpers. Moving vfs_prepare_mode() into e.g.
    vfs_mknod() takes care of that. This is in any case semantically cleaner
    because S_ISGID stripping is a VFS security requirement.
    
    Security hooks so far have seen the mode with the umask applied but
    without S_ISGID handling done. The relevant hooks are called outside of
    vfs_*() creation helpers so by calling vfs_prepare_mode() from vfs_*()
    helpers the security hooks would now see the mode without umask
    stripping applied. For now we fix this by passing the mode with umask
    settings applied to not risk any regressions for LSM hooks. IOW, nothing
    changes for LSM hooks. It is worth pointing out that security hooks
    never saw the mode that is seen by the filesystem when actually creating
    the file. They have always been completely misplaced for that to work.
    
    The following filesystems use inode_init_owner() and thus relied on
    S_ISGID stripping: spufs, 9p, bfs, btrfs, ext2, ext4, f2fs, hfsplus,
    hugetlbfs, jfs, minix, nilfs2, ntfs3, ocfs2, omfs, overlayfs, ramfs,
    reiserfs, sysv, ubifs, udf, ufs, xfs, zonefs, bpf, tmpfs.
    
    All of the above filesystems end up calling inode_init_owner() when new
    filesystem objects are created through the ->mkdir(), ->mknod(),
    ->create(), ->tmpfile(), ->rename() inode operations.
    
    Since directories always inherit the S_ISGID bit, with the exception of
    xfs when irix_sgid_inherit mode is turned on, S_ISGID stripping doesn't
    apply. The ->symlink() and ->link() inode operations trivially inherit
    the mode from the target and the ->rename() inode operation inherits the
    mode from the source inode. All other creation inode operations will get
    S_ISGID handling via vfs_prepare_mode() when called from their relevant
    vfs_*() helpers.
    
    In addition to this there are filesystems which allow the creation of
    filesystem objects through ioctl()s or - in the case of spufs -
    circumventing the vfs in other ways. If filesystem objects are created
    through ioctl()s the vfs doesn't know about it and can't apply regular
    permission checking including S_ISGID logic. Therefore, a filesystem
    relying on S_ISGID stripping in inode_init_owner() in its ioctl()
    callpath will be affected by moving this logic into the vfs. We audited
    those filesystems:
    
    * btrfs allows the creation of filesystem objects through various
      ioctls(). Snapshot creation literally takes a snapshot and so the mode
      is fully preserved and S_ISGID stripping doesn't apply.
    
      Creating a new subvolume relies on inode_init_owner() in
      btrfs_new_subvol_inode() but only creates directories and doesn't
      raise S_ISGID.
    
    * ocfs2 has a peculiar implementation of reflinks. In contrast to e.g.
      xfs and btrfs FICLONE/FICLONERANGE ioctl() that is only concerned with
      the actual extents ocfs2 uses a separate ioctl() that also creates the
      target file.
    
      Iow, ocfs2 circumvents the vfs entirely here and did indeed rely on
      inode_init_owner() to strip the S_ISGID bit. This is the only place
      where a filesystem needs to call mode_strip_sgid() directly but this
      is self-inflicted pain.
    
    * spufs doesn't go through the vfs at all and doesn't use ioctl()s
      either. Instead it has a dedicated system call spufs_create() which
      allows the creation of filesystem objects. But spufs only creates
      directories and doesn't allow S_ISGID bits, i.e. it specifically only
      allows 0777 bits.
    
    * bpf uses vfs_mkobj() but also doesn't allow S_ISGID bits to be created.
    
    The patch will have an effect on ext2 when the EXT2_MOUNT_GRPID mount
    option is used, on ext4 when the EXT4_MOUNT_GRPID mount option is used,
    and on xfs when the XFS_FEAT_GRPID mount option is used. When any of
    these filesystems are mounted with their respective GRPID option then
    newly created files inherit the parent directory's group
    unconditionally. In these cases none of the filesystems call
    inode_init_owner() and thus never stripped the S_ISGID bit for newly
    created files. Moving this logic into the VFS means that they now get
    the S_ISGID bit stripped. This is a user visible change. If this leads
    to regressions we will either need to figure out a better way or we need
    to revert. However, given the various setgid bugs that we found just in
    the last two years this is a regression risk we should take.
    
    Associated with this change is a new set of fstests to enforce the
    semantics for all new filesystems.
    
    Link: https://lore.kernel.org/ceph-devel/20220427092201.wvsdjbnc7b4dttaw@wittgenstein [1]
    Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [2]
    Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [3]
    Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [4]
    Link: https://lore.kernel.org/r/[email protected]
    Suggested-by: Dave Chinner <[email protected]>
    Suggested-by: Christian Brauner (Microsoft) <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Reviewed-and-Tested-by: Jeff Layton <[email protected]>
    Signed-off-by: Yang Xu <[email protected]>
    [<[email protected]>: rewrote commit message]
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

fs: move should_remove_suid() [+ + +]

Author: Amir Goldstein <[email protected]>
Date:   Sat Mar 18 12:15:25 2023 +0200

    fs: move should_remove_suid()
    
    commit e243e3f94c804ecca9a8241b5babe28f35258ef4 upstream.
    
    Move the helper from inode.c to attr.c. This keeps the core of the
    set{g,u}id stripping logic in one place when we add follow-up changes.
    It is the better place anyway, since should_remove_suid() returns
    ATTR_KILL_S{G,U}ID flags.
    
    Reviewed-by: Amir Goldstein <[email protected]>
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

fs: use consistent setgid checks in is_sxid() [+ + +]

Author: Christian Brauner <[email protected]>
Date:   Sat Mar 18 12:15:28 2023 +0200

    fs: use consistent setgid checks in is_sxid()
    
    commit 8d84e39d76bd83474b26cb44f4b338635676e7e8 upstream.
    
    Now that we made the VFS setgid checking consistent an inode can't be
    marked security irrelevant even if the setgid bit is still set. Make
    this function consistent with all other helpers.
    
    Note that enforcing consistent setgid stripping checks for file
    modification and mode- and ownership changes will cause the setgid bit
    to be lost in more cases than used to be the case. If an unprivileged
    user wrote to a non-executable setgid file that they don't have
    privilege over, the setgid bit will be dropped. This will lead to
    temporary failures in some xfstests until they have been updated.
    
    Reported-by: Miklos Szeredi <[email protected]>
    Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ftrace: Fix invalid address access in lookup_rec() when index is 0 [+ + +]

Author: Chen Zhongjin <[email protected]>
Date:   Thu Mar 9 16:02:30 2023 +0800

    ftrace: Fix invalid address access in lookup_rec() when index is 0
    
    commit ee92fa443358f4fc0017c1d0d325c27b37802504 upstream.
    
    KASAN reported the following problem:
    
     BUG: KASAN: use-after-free in lookup_rec
     Read of size 8 at addr ffff000199270ff0 by task modprobe
     CPU: 2 Comm: modprobe
     Call trace:
      kasan_report
      __asan_load8
      lookup_rec
      ftrace_location
      arch_check_ftrace_location
      check_kprobe_address_safe
      register_kprobe
    
    When checking pg->records[pg->index - 1].ip in lookup_rec(), it can get a
    pg which is newly added to ftrace_pages_start in ftrace_process_locs().
    Before the first pg->index++, index is 0 and accessing pg->records[-1].ip
    will cause this problem.
    
    Don't check the ip when pg->index is 0.
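
    The guard can be illustrated with a small userspace sketch. The struct
    and function names are hypothetical reductions of the ftrace page
    bookkeeping, and INSN_SIZE_SKETCH is an assumed stand-in for the
    architecture's instruction size constant.

```c
#include <stdbool.h>
#include <stddef.h>

#define INSN_SIZE_SKETCH 4UL	/* assumed instruction size */

struct rec_sketch {
	unsigned long ip;
};

struct page_sketch {
	struct rec_sketch *records;
	int index;		/* number of records filled in so far */
};

/* Returns true if the address range [start, end] may fall inside this page. */
static bool page_may_contain(const struct page_sketch *pg,
			     unsigned long start, unsigned long end)
{
	/* A freshly added page has index 0; records[index - 1] would be records[-1]. */
	if (pg->index == 0)
		return false;
	if (end < pg->records[0].ip)
		return false;
	if (start >= pg->records[pg->index - 1].ip + INSN_SIZE_SKETCH)
		return false;
	return true;
}
```

    An empty page is skipped before any record is read, so the
    out-of-bounds access at records[-1] can no longer happen.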
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Fixes: 9644302e3315 ("ftrace: Speed up search by skipping pages by address")
    Suggested-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Chen Zhongjin <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: core: Provide new max_buffer_size attribute to over-ride the default [+ + +]

Author: Lee Jones <[email protected]>
Date:   Mon Mar 20 13:06:31 2023 +0000

    HID: core: Provide new max_buffer_size attribute to over-ride the default
    
    commit b1a37ed00d7908a991c1d0f18a8cba3c2aa99bdc upstream.
    
    Presently, when a report is processed, its proposed size, provided by
    the user of the API (as Report Size * Report Count) is compared against
    the subsystem default HID_MAX_BUFFER_SIZE (16k).  However, some
    low-level HID drivers allocate a reduced amount of memory to their
    buffers (e.g. UHID only allocates UHID_DATA_MAX (4k) buffers), rendering
    this check inadequate in some cases.
    
    In these circumstances, if the received report ends up being smaller
    than the proposed report size, the remainder of the buffer is zeroed.
    That is, the space between sizeof(csize) (size of the current report)
    and the rsize (size proposed i.e. Report Size * Report Count), which can
    be handled up to HID_MAX_BUFFER_SIZE (16k).  Meaning that memset()
    shoots straight past the end of the buffer boundary and starts zeroing
    out in-use values, often resulting in calamity.
    
    This patch introduces a new variable into 'struct hid_ll_driver' where
    individual low-level drivers can over-ride the default maximum value of
    HID_MAX_BUFFER_SIZE (16k) with something more sympathetic to the
    interface.
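
    A simplified userspace sketch of the override follows. The struct
    ll_driver_sketch type and both function names are hypothetical, and the
    clamping shown here omits details of the real report handling; it only
    illustrates how a per-driver maximum bounds the memset().

```c
#include <stddef.h>
#include <string.h>

#define DEFAULT_MAX_BUFFER_SIZE 16384u	/* 16k subsystem default from the text */

/* Hypothetical reduction of the low-level driver struct to the new field. */
struct ll_driver_sketch {
	unsigned int max_buffer_size;	/* 0 means "use the default" */
};

static unsigned int effective_max(const struct ll_driver_sketch *drv)
{
	return drv->max_buffer_size ? drv->max_buffer_size
				    : DEFAULT_MAX_BUFFER_SIZE;
}

/* Zero the tail of a short report without running past the driver's buffer. */
static unsigned int pad_report(const struct ll_driver_sketch *drv,
			       unsigned char *buf,
			       unsigned int csize, unsigned int rsize)
{
	unsigned int max = effective_max(drv);

	if (rsize > max)
		rsize = max;	/* clamp the proposed size to the driver limit */
	if (csize < rsize)
		memset(buf + csize, 0, rsize - csize);
	return rsize;
}
```

    With a 4k driver limit, a proposed size of 16k is clamped to 4k, so
    the zeroing stops at the end of the 4k buffer.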
    
    Signed-off-by: Lee Jones <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    [Lee: Backported to v5.10.y]
    Signed-off-by: Lee Jones <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

HID: uhid: Over-ride the default maximum data buffer value with our own [+ + +]

Author: Lee Jones <[email protected]>
Date:   Mon Mar 20 13:06:32 2023 +0000

    HID: uhid: Over-ride the default maximum data buffer value with our own
    
    commit 1c5d4221240a233df2440fe75c881465cdf8da07 upstream.
    
    The default maximum data buffer size for this interface is UHID_DATA_MAX
    (4k).  When data buffers are being processed, ensure this value is used
    for the sanity check, rather than a value between the user provided
    value and HID_MAX_BUFFER_SIZE (16k).
    
    Signed-off-by: Lee Jones <[email protected]>
    Signed-off-by: Jiri Kosina <[email protected]>
    Signed-off-by: Lee Jones <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

hwmon: (adm1266) Set `can_sleep` flag for GPIO chip [+ + +]

Author: Lars-Peter Clausen <[email protected]>
Date:   Tue Mar 14 02:31:45 2023 -0700

    hwmon: (adm1266) Set `can_sleep` flag for GPIO chip
    
    [ Upstream commit a5bb73b3f5db1a4e91402ad132b59b13d2651ed9 ]
    
    The adm1266 driver uses I2C bus access in its GPIO chip `set` and `get`
    implementation. This means these functions can sleep and the GPIO chip
    should set the `can_sleep` property to true.
    
    This will ensure that a warning is printed when trying to set or get the
    GPIO value from a context that potentially can't sleep.
    
    Fixes: d98dfad35c38 ("hwmon: (pmbus/adm1266) Add support for GPIOs")
    Signed-off-by: Lars-Peter Clausen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: (adt7475) Display smoothing attributes in correct order [+ + +]

Author: Tony O'Brien <[email protected]>
Date:   Wed Feb 22 13:52:27 2023 +1300

    hwmon: (adt7475) Display smoothing attributes in correct order
    
    [ Upstream commit 5f8d1e3b6f9b5971f9c06d5846ce00c49e3a8d94 ]
    
    Throughout the ADT7475 driver, attributes relating to the temperature
    sensors are displayed in the order Remote 1, Local, Remote 2.  Make
    temp_st_show() conform to this expectation so that values set by
    temp_st_store() can be displayed using the correct attribute.
    
    Fixes: 8f05bcc33e74 ("hwmon: (adt7475) temperature smoothing")
    Signed-off-by: Tony O'Brien <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: (adt7475) Fix masking of hysteresis registers [+ + +]

Author: Tony O'Brien <[email protected]>
Date:   Wed Feb 22 13:52:28 2023 +1300

    hwmon: (adt7475) Fix masking of hysteresis registers
    
    [ Upstream commit 48e8186870d9d0902e712d601ccb7098cb220688 ]
    
    The wrong bits are masked in the hysteresis register; indices 0 and 2
    should zero bits [7:4] and preserve bits [3:0], and index 1 should zero
    bits [3:0] and preserve bits [7:4].
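
    The corrected masking can be sketched as below. set_hyst_nibble() is a
    hypothetical helper name, and writing the new 4-bit value into the
    cleared nibble is an assumption about how the mask is used; the commit
    message itself only states which bits are zeroed and which are kept.

```c
#include <stdint.h>

/* Update the 4-bit hysteresis field for the given sensor index. */
static uint8_t set_hyst_nibble(uint8_t reg, int index, uint8_t val)
{
	if (index != 1) {
		/* indices 0 and 2: zero bits [7:4], preserve bits [3:0] */
		reg &= 0x0f;
		reg |= (uint8_t)((val & 0x0f) << 4);
	} else {
		/* index 1: zero bits [3:0], preserve bits [7:4] */
		reg &= 0xf0;
		reg |= (uint8_t)(val & 0x0f);
	}
	return reg;
}
```

    Starting from register value 0xab, writing 0x5 for index 0 gives 0x5b,
    and writing 0x5 for index 1 gives 0xa5.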
    
    Fixes: 1c301fc5394f ("hwmon: Add a driver for the ADT7475 hardware monitoring chip")
    Signed-off-by: Tony O'Brien <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: (ina3221) return prober error code [+ + +]

Author: Marcus Folkesson <[email protected]>
Date:   Fri Mar 10 08:50:35 2023 +0100

    hwmon: (ina3221) return prober error code
    
    [ Upstream commit c93f5e2ab53243b17febabb9422a697017d3d49a ]
    
    ret is set to 0, which does not indicate an error.
    Return -EINVAL instead.
    
    Fixes: a9e9dd9c6de5 ("hwmon: (ina3221) Read channel input source info from DT")
    Signed-off-by: Marcus Folkesson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: (ucd90320) Add minimum delay between bus accesses [+ + +]

Author: Lars-Peter Clausen <[email protected]>
Date:   Sun Mar 12 09:03:12 2023 -0700

    hwmon: (ucd90320) Add minimum delay between bus accesses
    
    [ Upstream commit 8d655e65237643c48ada2c131b83679bf1105373 ]
    
    When probing the ucd90320, access to some of the registers randomly fails.
    Sometimes it NACKs a transfer, sometimes it returns just random data and
    the PEC check fails.
    
    Experimentation shows that this seems to be triggered by a register access
    directly back to back with a previous register write. Experimentation also
    shows that inserting a small delay after register writes makes the issue go
    away.
    
    Use a similar solution to what the max15301 driver does to solve the same
    problem. Create a custom set of bus read and write functions that make sure
    that the delay is added.
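
    The timing part of such a wrapper can be sketched as a pure function.
    remaining_delay_us() is a hypothetical name, the timestamps are plain
    microsecond counters supplied by the caller, and the minimum delay is
    a parameter because its value is not given in this commit message.

```c
#include <stdint.h>

/*
 * How long to wait before the next bus access, given when the last
 * register write happened. Assumes now_us >= last_write_us.
 */
static uint64_t remaining_delay_us(uint64_t last_write_us, uint64_t now_us,
				   uint64_t min_delay_us)
{
	uint64_t elapsed = now_us - last_write_us;

	if (elapsed >= min_delay_us)
		return 0;
	return min_delay_us - elapsed;
}
```

    A custom read or write function would sleep for the returned time
    before touching the bus and record a new timestamp after each write.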
    
    Fixes: a470f11c5ba2 ("hwmon: (pmbus/ucd9000) Add support for UCD90320 Power Sequencer")
    Signed-off-by: Lars-Peter Clausen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: (xgene) Fix use after free bug in xgene_hwmon_remove due to race condition [+ + +]

Author: Zheng Wang <[email protected]>
Date:   Fri Mar 10 16:40:07 2023 +0800

    hwmon: (xgene) Fix use after free bug in xgene_hwmon_remove due to race condition
    
    [ Upstream commit cb090e64cf25602b9adaf32d5dfc9c8bec493cd1 ]
    
    In xgene_hwmon_probe, &ctx->workq is bound with xgene_hwmon_evt_work.
    Then it will be started.
    
    If we remove the driver, which calls xgene_hwmon_remove to clean up,
    there may be unfinished work.
    
    The possible sequence is as follows:
    
    CPU0                             |CPU1
                                     |xgene_hwmon_evt_work
    xgene_hwmon_remove               |
    kfifo_free(&ctx->async_msg_fifo);|
                                     |
                                     |kfifo_out_spinlocked
                                     |//use &ctx->async_msg_fifo
    
    Fix it by finishing the work before cleanup in xgene_hwmon_remove.
    
    Fixes: 2ca492e22cb7 ("hwmon: (xgene) Fix crash when alarm occurs before driver probe")
    Signed-off-by: Zheng Wang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

hwmon: tmp512: drop of_match_ptr for ID table [+ + +]

Author: Krzysztof Kozlowski <[email protected]>
Date:   Sun Mar 12 20:37:23 2023 +0100

    hwmon: tmp512: drop of_match_ptr for ID table
    
    [ Upstream commit 00d85e81796b17a29a0e096c5a4735daa47adef8 ]
    
    The driver will match mostly by DT table (even though there is a regular
    ID table), so there is little benefit in of_match_ptr (dropping it also
    allows ACPI matching via PRP0001, even though it might not be relevant
    here). This also fixes a !CONFIG_OF error:
    
      drivers/hwmon/tmp513.c:610:34: error: ‘tmp51x_of_match’ defined but not used [-Werror=unused-const-variable=]
    
    Fixes: 59dfa75e5d82 ("hwmon: Add driver for Texas Instruments TMP512/513 sensor chips.")
    Signed-off-by: Krzysztof Kozlowski <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Guenter Roeck <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

i40e: Fix kernel crash during reboot when adapter is in recovery mode [+ + +]

Author: Ivan Vecera <[email protected]>
Date:   Thu Mar 9 10:45:09 2023 -0800

    i40e: Fix kernel crash during reboot when adapter is in recovery mode
    
    [ Upstream commit 7e4f8a0c495413a50413e8c9f1032ce1bc633bae ]
    
    If the driver detects during probe that firmware is in recovery
    mode then i40e_init_recovery_mode() is called and the rest of
    probe function is skipped including pci_set_drvdata(). Subsequent
    i40e_shutdown() called during shutdown/reboot dereferences NULL
    pointer as pci_get_drvdata() returns NULL.
    
    To fix this, also call pci_set_drvdata() when entering recovery mode.
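    
    The failure mode can be modeled in plain C. This is only a sketch: the
    fake_* types and functions are hypothetical stand-ins for the PCI
    device, the driver private data and the probe/shutdown callbacks, and
    the NULL check in fake_shutdown() stands in for the crash.

```c
#include <stddef.h>

struct fake_pdev { void *drvdata; };	/* stand-in for the PCI device */
struct fake_pf { int state; };		/* stand-in for driver private data */

static void fake_set_drvdata(struct fake_pdev *pdev, void *data)
{
	pdev->drvdata = data;
}

static void *fake_get_drvdata(const struct fake_pdev *pdev)
{
	return pdev->drvdata;
}

/* Probe: the recovery-mode path returns early, so it must store drvdata too. */
static int fake_probe(struct fake_pdev *pdev, struct fake_pf *pf, int recovery)
{
	if (recovery) {
		fake_set_drvdata(pdev, pf);	/* the step the fix adds */
		return 0;
	}
	/* ... the rest of a normal probe would run here ... */
	fake_set_drvdata(pdev, pf);
	return 0;
}

/* Shutdown: returns -1 where the unfixed driver would dereference NULL. */
static int fake_shutdown(struct fake_pdev *pdev)
{
	struct fake_pf *pf = fake_get_drvdata(pdev);

	if (!pf)
		return -1;
	pf->state |= 0x4;	/* arbitrary flag, for illustration only */
	return 0;
}
```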
    
    Reproducer:
    1) Have an i40e NIC with firmware in recovery mode
    2) Run reboot
    
    Result:
    [  139.084698] i40e: Intel(R) Ethernet Connection XL710 Network Driver
    [  139.090959] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
    [  139.108438] i40e 0000:02:00.0: Firmware recovery mode detected. Limiting functionality.
    [  139.116439] i40e 0000:02:00.0: Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode.
    [  139.129499] i40e 0000:02:00.0: fw 8.3.64775 api 1.13 nvm 8.30 0x8000b78d 1.3106.0 [8086:1583] [15d9:084a]
    [  139.215932] i40e 0000:02:00.0 enp2s0f0: renamed from eth0
    [  139.223292] i40e 0000:02:00.1: Firmware recovery mode detected. Limiting functionality.
    [  139.231292] i40e 0000:02:00.1: Refer to the Intel(R) Ethernet Adapters and Devices User Guide for details on firmware recovery mode.
    [  139.244406] i40e 0000:02:00.1: fw 8.3.64775 api 1.13 nvm 8.30 0x8000b78d 1.3106.0 [8086:1583] [15d9:084a]
    [  139.329209] i40e 0000:02:00.1 enp2s0f1: renamed from eth0
    ...
    [  156.311376] BUG: kernel NULL pointer dereference, address: 00000000000006c2
    [  156.318330] #PF: supervisor write access in kernel mode
    [  156.323546] #PF: error_code(0x0002) - not-present page
    [  156.328679] PGD 0 P4D 0
    [  156.331210] Oops: 0002 [#1] PREEMPT SMP NOPTI
    [  156.335567] CPU: 26 PID: 15119 Comm: reboot Tainted: G            E      6.2.0+ #1
    [  156.343126] Hardware name: Abacus electric, s.r.o. - [email protected] Super Server/H12SSW-iN, BIOS 2.4 04/13/2022
    [  156.353369] RIP: 0010:i40e_shutdown+0x15/0x130 [i40e]
    [  156.358430] Code: c1 fc ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 fd 53 48 8b 9f 48 01 00 00 <f0> 80 8b c2 06 00 00 04 f0 80 8b c0 06 00 00 08 48 8d bb 08 08 00
    [  156.377168] RSP: 0018:ffffb223c8447d90 EFLAGS: 00010282
    [  156.382384] RAX: ffffffffc073ee70 RBX: 0000000000000000 RCX: 0000000000000001
    [  156.389510] RDX: 0000000080000001 RSI: 0000000000000246 RDI: ffff95db49988000
    [  156.396634] RBP: ffff95db49988000 R08: ffffffffffffffff R09: ffffffff8bd17d40
    [  156.403759] R10: 0000000000000001 R11: ffffffff8a5e3d28 R12: ffff95db49988000
    [  156.410882] R13: ffffffff89a6fe17 R14: ffff95db49988150 R15: 0000000000000000
    [  156.418007] FS:  00007fe7c0cc3980(0000) GS:ffff95ea8ee80000(0000) knlGS:0000000000000000
    [  156.426083] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  156.431819] CR2: 00000000000006c2 CR3: 00000003092fc005 CR4: 0000000000770ee0
    [  156.438944] PKRU: 55555554
    [  156.441647] Call Trace:
    [  156.444096]  <TASK>
    [  156.446199]  pci_device_shutdown+0x38/0x60
    [  156.450297]  device_shutdown+0x163/0x210
    [  156.454215]  kernel_restart+0x12/0x70
    [  156.457872]  __do_sys_reboot+0x1ab/0x230
    [  156.461789]  ? vfs_writev+0xa6/0x1a0
    [  156.465362]  ? __pfx_file_free_rcu+0x10/0x10
    [  156.469635]  ? __call_rcu_common.constprop.85+0x109/0x5a0
    [  156.475034]  do_syscall_64+0x3e/0x90
    [  156.478611]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
    [  156.483658] RIP: 0033:0x7fe7bff37ab7
    
    Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support")
    Signed-off-by: Ivan Vecera <[email protected]>
    Tested-by: Arpana Arland <[email protected]> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ice: xsk: disable txq irq before flushing hw [+ + +]

Author: Maciej Fijalkowski <[email protected]>
Date:   Tue Mar 14 10:45:43 2023 -0700

    ice: xsk: disable txq irq before flushing hw
    
    [ Upstream commit b830c9642386867863ac64295185f896ff2928ac ]
    
    ice_qp_dis() intends to stop a given queue pair that is a target of xsk
    pool attach/detach. One of the steps is to disable interrupts on these
    queues. This is currently broken: the txq irq is turned off *after* the
    HW flush, so disabling it takes no effect.
    
    ice_qp_dis():
    -> ice_qvec_dis_irq()
    --> disable rxq irq
    --> flush hw
    -> ice_vsi_stop_tx_ring()
    -->disable txq irq
    
    The splat below can be triggered by the following steps:
    - start xdpsock WITHOUT loading xdp prog
    - run xdp_rxq_info with XDP_TX action on this interface
    - start traffic
    - terminate xdpsock
    
    [  256.312485] BUG: kernel NULL pointer dereference, address: 0000000000000018
    [  256.319560] #PF: supervisor read access in kernel mode
    [  256.324775] #PF: error_code(0x0000) - not-present page
    [  256.329994] PGD 0 P4D 0
    [  256.332574] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [  256.337006] CPU: 3 PID: 32 Comm: ksoftirqd/3 Tainted: G           OE      6.2.0-rc5+ #51
    [  256.345218] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
    [  256.355807] RIP: 0010:ice_clean_rx_irq_zc+0x9c/0x7d0 [ice]
    [  256.361423] Code: b7 8f 8a 00 00 00 66 39 ca 0f 84 f1 04 00 00 49 8b 47 40 4c 8b 24 d0 41 0f b7 45 04 66 25 ff 3f 66 89 04 24 0f 84 85 02 00 00 <49> 8b 44 24 18 0f b7 14 24 48 05 00 01 00 00 49 89 04 24 49 89 44
    [  256.380463] RSP: 0018:ffffc900088bfd20 EFLAGS: 00010206
    [  256.385765] RAX: 000000000000003c RBX: 0000000000000035 RCX: 000000000000067f
    [  256.393012] RDX: 0000000000000775 RSI: 0000000000000000 RDI: ffff8881deb3ac80
    [  256.400256] RBP: 000000000000003c R08: ffff889847982710 R09: 0000000000010000
    [  256.407500] R10: ffffffff82c060c0 R11: 0000000000000004 R12: 0000000000000000
    [  256.414746] R13: ffff88811165eea0 R14: ffffc9000d255000 R15: ffff888119b37600
    [  256.421990] FS:  0000000000000000(0000) GS:ffff8897e0cc0000(0000) knlGS:0000000000000000
    [  256.430207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  256.436036] CR2: 0000000000000018 CR3: 0000000005c0a006 CR4: 00000000007706e0
    [  256.443283] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  256.450527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  256.457770] PKRU: 55555554
    [  256.460529] Call Trace:
    [  256.463015]  <TASK>
    [  256.465157]  ? ice_xmit_zc+0x6e/0x150 [ice]
    [  256.469437]  ice_napi_poll+0x46d/0x680 [ice]
    [  256.473815]  ? _raw_spin_unlock_irqrestore+0x1b/0x40
    [  256.478863]  __napi_poll+0x29/0x160
    [  256.482409]  net_rx_action+0x136/0x260
    [  256.486222]  __do_softirq+0xe8/0x2e5
    [  256.489853]  ? smpboot_thread_fn+0x2c/0x270
    [  256.494108]  run_ksoftirqd+0x2a/0x50
    [  256.497747]  smpboot_thread_fn+0x1c1/0x270
    [  256.501907]  ? __pfx_smpboot_thread_fn+0x10/0x10
    [  256.506594]  kthread+0xea/0x120
    [  256.509785]  ? __pfx_kthread+0x10/0x10
    [  256.513597]  ret_from_fork+0x29/0x50
    [  256.517238]  </TASK>
    
    In fact, irqs were not disabled, and napi managed to be scheduled and run
    while the xsk_pool pointer was still valid but the SW ring of xdp_buff
    pointers was already freed.
    
    To fix this, call ice_qvec_dis_irq() after ice_vsi_stop_tx_ring(). While
    at it, also remove the redundant ice_clean_rx_ring() call, since this is
    handled in ice_qp_clean_rings().
    
    Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
    Signed-off-by: Maciej Fijalkowski <[email protected]>
    Reviewed-by: Larysa Zaremba <[email protected]>
    Tested-by: Chandan Kumar Rout <[email protected]> (A Contingent Worker at Intel)
    Acked-by: John Fastabend <[email protected]>
    Signed-off-by: Tony Nguyen <[email protected]>
    Reviewed-by: Leon Romanovsky <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

interconnect: fix mem leak when freeing nodes [+ + +]

Author: Johan Hovold <[email protected]>
Date:   Mon Mar 6 08:56:29 2023 +0100

    interconnect: fix mem leak when freeing nodes
    
    commit a5904f415e1af72fa8fe6665aa4f554dc2099a95 upstream.
    
    The node link array is allocated when adding links to a node but is not
    deallocated when nodes are destroyed.
    
    Fixes: 11f1ceca7031 ("interconnect: Add generic on-chip interconnect API")
    Cc: [email protected]      # 5.1
    Reviewed-by: Konrad Dybcio <[email protected]>
    Signed-off-by: Johan Hovold <[email protected]>
    Tested-by: Luca Ceresoli <[email protected]> # i.MX8MP MSC SM2-MB-EP1 Board
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Georgi Djakov <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

io_uring: avoid null-ptr-deref in io_arm_poll_handler [+ + +]

Author: Fedor Pchelkin <[email protected]>
Date:   Thu Mar 16 21:56:16 2023 +0300

    io_uring: avoid null-ptr-deref in io_arm_poll_handler
    
    No upstream commit exists for this commit.
    
    The issue was introduced with backporting upstream commit c16bda37594f
    ("io_uring/poll: allow some retries for poll triggering spuriously").
    
    Memory allocation can fail, in which case an invalid pointer is
    dereferenced just before it is compared to NULL.
    
    Move the pointer check to the proper place (upstream has the check in a
    similar location). If the request has the REQ_F_POLLED flag set, apoll
    can't be NULL, so there is no need to check it there.
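    
    The ordering issue can be shown with a small userspace sketch. The
    names poll_ctx, poll_ctx_get and failing_alloc are hypothetical, and
    the allocator is passed in only so that a failure can be simulated.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the per-request poll data. */
struct poll_ctx {
	void *double_poll;
};

/* Allocate and initialise a context; alloc_fn lets callers inject failure. */
static struct poll_ctx *poll_ctx_get(void *(*alloc_fn)(size_t))
{
	struct poll_ctx *ctx = alloc_fn(sizeof(*ctx));

	/*
	 * Check the result first: touching ctx before this line would
	 * dereference NULL when the allocation fails.
	 */
	if (!ctx)
		return NULL;

	ctx->double_poll = NULL;
	return ctx;
}

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```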
    
    Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
    
    Signed-off-by: Fedor Pchelkin <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

ipv4: Fix incorrect table ID in IOCTL path [+ + +]

Author: Ido Schimmel <[email protected]>
Date:   Wed Mar 15 14:40:09 2023 +0200

    ipv4: Fix incorrect table ID in IOCTL path
    
    [ Upstream commit 8a2618e14f81604a9b6ad305d57e0c8da939cd65 ]
    
    Commit f96a3d74554d ("ipv4: Fix incorrect route flushing when source
    address is deleted") started to take the table ID field in the FIB info
    structure into account when determining if two structures are identical
    or not. This field is initialized using the 'fc_table' field in the
    route configuration structure, which is not set when adding a route via
    IOCTL.
    
    The above can result in user space being able to install two identical
    routes that only differ in the table ID field of their associated FIB
    info.
    
    Fix by initializing the table ID field in the route configuration
    structure in the IOCTL path.
    
    Before the fix:
    
     # ip route add default via 192.0.2.2
     # route add default gw 192.0.2.2
     # ip -4 r show default
     default via 192.0.2.2 dev dummy10
     default via 192.0.2.2 dev dummy10
    
    After the fix:
    
     # ip route add default via 192.0.2.2
     # route add default gw 192.0.2.2
     SIOCADDRT: File exists
     # ip -4 r show default
     default via 192.0.2.2 dev dummy10
    
    Audited the code paths to ensure there are no other paths that do not
    properly initialize the route configuration structure when installing a
    route.
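    
    A simplified model of the comparison, assuming both routes are meant
    for the main table (RT_TABLE_MAIN, 254). The structures and helpers
    below are hypothetical and far smaller than the kernel's; they only
    show why an uninitialized table ID defeats the duplicate check.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAIN_TABLE 254u	/* value of RT_TABLE_MAIN */

struct route_cfg { uint32_t table; uint32_t gw; };	/* route configuration */
struct fib_info_model { uint32_t tb_id; uint32_t gw; };	/* FIB info */

/* The FIB info copies its table ID from the route configuration. */
static struct fib_info_model make_info(const struct route_cfg *cfg)
{
	struct fib_info_model fi = { cfg->table, cfg->gw };

	return fi;
}

/* Two infos are identical only if the table ID matches as well. */
static bool same_info(const struct fib_info_model *a,
		      const struct fib_info_model *b)
{
	return a->tb_id == b->tb_id && a->gw == b->gw;
}

/* IOCTL path: before the fix the table field was left at zero. */
static struct route_cfg cfg_from_ioctl(uint32_t gw, bool fixed)
{
	struct route_cfg cfg = { fixed ? MAIN_TABLE : 0u, gw };

	return cfg;
}
```

    With fixed == false the IOCTL route does not compare equal to the one
    added via netlink, so the duplicate is accepted; with fixed == true it
    is detected.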
    
    Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
    Fixes: f96a3d74554d ("ipv4: Fix incorrect route flushing when source address is deleted")
    Reported-by: gaoxingwang <[email protected]>
    Link: https://lore.kernel.org/netdev/[email protected]/
    Tested-by: gaoxingwang <[email protected]>
    Signed-off-by: Ido Schimmel <[email protected]>
    Reviewed-by: David Ahern <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

ipvlan: Make skb->skb_iif track skb->dev for l3s mode [+ + +]

Author: Jianguo Wu <[email protected]>
Date:   Thu Mar 9 10:03:36 2023 +0800

    ipvlan: Make skb->skb_iif track skb->dev for l3s mode
    
    [ Upstream commit 59a0b022aa249e3f5735d93de0849341722c4754 ]
    
    For l3s mode, skb->dev is set to ipvlan interface in ipvlan_nf_input():
      skb->dev = addr->master->dev
    but skb->skb_iif remains unchanged. This causes socket lookup to fail
    if the target socket is bound to an interface, as in the following example:
    
      ip link add ipvlan0 link eth0 type ipvlan mode l3s
      ip addr add dev ipvlan0 192.168.124.111/24
      ip link set ipvlan0 up
    
      ping -c 1 -I ipvlan0 8.8.8.8
      100% packet loss
    
    This is because no sk matches in __raw_v4_lookup(), as
    sk->sk_bound_dev_if != dif (skb->skb_iif).
    Fix this by making skb->skb_iif track skb->dev in ipvlan_nf_input().
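    
    The lookup mismatch can be reduced to a few lines. This is a sketch
    with hypothetical types (skb_model, sock_model) and made-up interface
    indexes; the real lookup compares more fields than shown here.

```c
#include <stdbool.h>

struct skb_model { int dev_ifindex; int skb_iif; };	/* packet */
struct sock_model { int bound_dev_if; };		/* socket */

/* A bound socket only accepts packets whose input ifindex (dif) matches. */
static bool sock_matches(const struct sock_model *sk, int dif)
{
	return sk->bound_dev_if == 0 || sk->bound_dev_if == dif;
}

/* l3s input: retarget the packet to the ipvlan device. */
static void l3s_input(struct skb_model *skb, int ipvlan_ifindex, bool track_iif)
{
	skb->dev_ifindex = ipvlan_ifindex;
	if (track_iif)
		skb->skb_iif = ipvlan_ifindex;	/* the behaviour the fix adds */
}
```

    With track_iif == false, skb_iif still names the lower device, and a
    socket bound to the ipvlan interface never matches.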
    
    Fixes: c675e06a98a4 ("ipvlan: decouple l3s mode dependencies from other modes")
    Signed-off-by: Jianguo Wu <[email protected]>
    Reviewed-by: Jiri Pirko <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

jffs2: correct logic when creating a hole in jffs2_write_begin [+ + +]

Author: Yifei Liu <[email protected]>
Date:   Wed Aug 3 15:53:12 2022 +0000

    jffs2: correct logic when creating a hole in jffs2_write_begin
    
    [ Upstream commit 23892d383bee15b64f5463bd7195615734bb2415 ]
    
    Bug description and fix:
    
    1. Write data to a file, say all 1s from offset 0 to 16.
    
    2. Truncate the file to a smaller size, say 8 bytes.
    
    3. Write new bytes (say 2s) from an offset past the original size of the
    file, say at offset 20, for 4 bytes.  This is supposed to create a "hole"
    in the file, meaning that the bytes from offset 8 (where it was truncated
    above) up to the new write at offset 20, should all be 0s (zeros).
    
    4. Flush all caches using "echo 3 > /proc/sys/vm/drop_caches" (or unmount
    and remount the f/s).
    
    5. Check the content of the file.  It is wrong.  The 1s that used to be
    between bytes 9 and 16, before the truncation, have REAPPEARED (they should
    be 0s).
    
    We wrote a script and helper C program to reproduce the bug
    (reproduce_jffs2_write_begin_issue.sh, write_file.c, and Makefile).  We can
    make them available to anyone.
    
    The above example is shown when writing a small file within the same first
    page.  But the bug happens for larger files, as long as steps 1, 2, and 3
    above all happen within the same page.
    
    The problem was traced to the jffs2_write_begin code, where it goes into an
    'if' statement intended to handle writes past the current EOF (i.e., writes
    that may create a hole).  The code computes a 'pageofs' that is the floor
    of the write position (pos), aligned to the page size boundary.  In other
    words, 'pageofs' will never be larger than 'pos'.  The code then sets the
    internal jffs2_raw_inode->isize to the size of max(current inode size,
    pageofs) but that is wrong: the new file size should be the 'pos', which is
    larger than both the current inode size and pageofs.
    
    Similarly, the code incorrectly sets the internal jffs2_raw_inode->dsize to
    pageofs minus the current inode size; instead it should be the current pos
    minus the current inode size.  Finally, inode->i_size was also set
    incorrectly.
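    
    The intended arithmetic, worked through for the example above (inode
    size 8, write at offset 20), can be sketched as follows. The names
    hole_node, page_floor and hole_for_write are hypothetical, and a page
    size of 4096 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* The two size fields of the hole node discussed above. */
struct hole_node {
	uint32_t isize;	/* file size after the hole is created */
	uint32_t dsize;	/* number of zero bytes the hole covers */
};

/* Floor of pos aligned to the page boundary; never larger than pos. */
static uint32_t page_floor(uint32_t pos)
{
	return pos & ~(SKETCH_PAGE_SIZE - 1u);
}

/* Sizes for a write starting at pos, past the current inode size. */
static struct hole_node hole_for_write(uint32_t i_size, uint32_t pos)
{
	struct hole_node n;

	assert(pos > i_size);	/* only writes past EOF create a hole */
	n.isize = pos;
	n.dsize = pos - i_size;
	return n;
}
```

    For i_size = 8 and pos = 20 this gives isize = 20 and dsize = 12, while
    page_floor(20) is 0, which is why sizes derived from pageofs cannot
    describe a hole that lies within a single page.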
    
    The patch below fixes this bug.  The bug was discovered using a new tool
    for finding f/s bugs using model checking, called MCFS (Model Checking File
    Systems).
    
    Signed-off-by: Yifei Liu <[email protected]>
    Signed-off-by: Erez Zadok <[email protected]>
    Signed-off-by: Manish Adkar <[email protected]>
    Signed-off-by: Richard Weinberger <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

KVM: nVMX: add missing consistency checks for CR0 and CR4 [+ + +]

Author: Paolo Bonzini <[email protected]>
Date:   Fri Mar 10 11:10:56 2023 -0500

    KVM: nVMX: add missing consistency checks for CR0 and CR4
    
    commit 112e66017bff7f2837030f34c2bc19501e9212d5 upstream.
    
    The effective values of the guest CR0 and CR4 registers may differ from
    those included in the VMCS12.  In particular, disabling EPT forces
    CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1.
    
    Therefore, checks on these bits cannot be delegated to the processor
    and must be performed by KVM.
    
    Reported-by: Reima ISHII <[email protected]>
    Cc: [email protected]
    Signed-off-by: Paolo Bonzini <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

Linux: Linux 5.10.176 [+ + +]

Author: Greg Kroah-Hartman <[email protected]>
Date:   Wed Mar 22 13:30:08 2023 +0100

    Linux 5.10.176
    
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Chris Paterson (CIP) <[email protected]>
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Tested-by: Shuah Khan <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Tested-by: Guenter Roeck <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

media: m5mols: fix off-by-one loop termination error [+ + +]

Author: Linus Torvalds <[email protected]>
Date:   Fri Mar 17 13:51:17 2023 -0700

    media: m5mols: fix off-by-one loop termination error
    
    [ Upstream commit efbcbb12ee99f750c9f25c873b55ad774871de2a ]
    
    The __find_restype() function loops over the m5mols_default_ffmt[]
    array, and the termination condition ends up being wrong: instead of
    stopping when the iterator becomes the size of the array it traverses,
    it stops after it has already overshot the array.
    
    Now, in practice this likely doesn't matter, because the code will
    always find the entry it looks for, and will thus return early and never
    hit that last extra iteration.
    
    But it turns out that clang will unroll the loop fully, because it has
    only two iterations (well, three due to the off-by-one bug), and then
    clang will end up just giving up in the middle of the loop unrolling
    when it notices that the code walks past the end of the array.
    
    And that made 'objtool' very unhappy indeed, because the generated code
    just falls off the edge of the universe, and ends up falling through to
    the next function, causing this warning:
    
       drivers/media/i2c/m5mols/m5mols.o: warning: objtool: m5mols_set_fmt() falls through to next function m5mols_get_frame_desc()
    
    Fix the loop ending condition.
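    
    A minimal illustration of the termination condition, using a
    hypothetical two-entry table with arbitrary codes (not the driver's
    actual data or function):

```c
#include <stddef.h>

struct fmt_entry {
	unsigned int code;
};

/* Hypothetical two-entry table; the codes are arbitrary. */
static const struct fmt_entry default_fmt[] = {
	{ 0x2008 },
	{ 0x4001 },
};

#define DEFAULT_FMT_COUNT (sizeof(default_fmt) / sizeof(default_fmt[0]))

/* Return the index of the entry matching code, or -1 if there is none. */
static int find_restype(unsigned int code)
{
	size_t i = 0;

	do {
		if (default_fmt[i].code == code)
			return (int)i;
	} while (++i < DEFAULT_FMT_COUNT);	/* '<=' would read one past the end */

	return -1;
}
```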
    
    Reported-by: Jens Axboe <[email protected]>
    Analyzed-by: Miguel Ojeda <[email protected]>
    Analyzed-by: Nick Desaulniers <[email protected]>
    Link: https://lore.kernel.org/linux-block/CAHk-=wgTSdKYbmB1JYM5vmHMcD9J9UZr0mn7BOYM_LudrP+Xvw@mail.gmail.com/
    Fixes: bc125106f8af ("[media] Add support for M-5MOLS 8 Mega Pixel camera ISP")
    Cc: HeungJun, Kim <[email protected]>
    Cc: Sylwester Nawrocki <[email protected]>
    Cc: Kyungmin Park <[email protected]>
    Cc: Mauro Carvalho Chehab <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage [+ + +]

Author: David Hildenbrand <[email protected]>
Date:   Thu Mar 2 18:54:23 2023 +0100

    mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage
    
    commit 42b2af2c9b7eede8ef21d0943f84d135e21a32a3 upstream.
    
    Currently, we'd lose the userfaultfd-wp marker when PTE-mapping a huge
    zeropage, resulting in the next write faults in the PMD range not
    triggering uffd-wp events.
    
    Various actions (partial MADV_DONTNEED, partial mremap, partial munmap,
    partial mprotect) could trigger this.  However, most importantly,
    un-protecting a single sub-page from the userfaultfd-wp handler when
    processing a uffd-wp event will PTE-map the shared huge zeropage and lose
    the uffd-wp bit for the remainder of the PMD.
    
    Let's properly propagate the uffd-wp bit to the PMDs.
    
     #define _GNU_SOURCE
     #include <stdio.h>
     #include <stdlib.h>
     #include <stdint.h>
     #include <stdbool.h>
     #include <inttypes.h>
     #include <fcntl.h>
     #include <unistd.h>
     #include <errno.h>
     #include <poll.h>
     #include <pthread.h>
     #include <sys/mman.h>
     #include <sys/syscall.h>
     #include <sys/ioctl.h>
     #include <linux/userfaultfd.h>
    
     static size_t pagesize;
     static int uffd;
     static volatile bool uffd_triggered;
    
     #define barrier() __asm__ __volatile__("": : :"memory")
    
     static void uffd_wp_range(char *start, size_t size, bool wp)
     {
            struct uffdio_writeprotect uffd_writeprotect;
    
            uffd_writeprotect.range.start = (unsigned long) start;
            uffd_writeprotect.range.len = size;
            if (wp) {
                    uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
            } else {
                    uffd_writeprotect.mode = 0;
            }
            if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
                    fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
                    exit(1);
            }
     }
    
     static void *uffd_thread_fn(void *arg)
     {
            static struct uffd_msg msg;
            ssize_t nread;
    
            while (1) {
                    struct pollfd pollfd;
                    int nready;
    
                    pollfd.fd = uffd;
                    pollfd.events = POLLIN;
                    nready = poll(&pollfd, 1, -1);
                    if (nready == -1) {
                            fprintf(stderr, "poll() failed: %d\n", errno);
                            exit(1);
                    }
    
                    nread = read(uffd, &msg, sizeof(msg));
                    if (nread <= 0)
                            continue;
    
                    if (msg.event != UFFD_EVENT_PAGEFAULT ||
                        !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
                            printf("FAIL: wrong uffd-wp event fired\n");
                            exit(1);
                    }
    
                    /* un-protect the single page. */
                    uffd_triggered = true;
                    uffd_wp_range((char *)(uintptr_t)msg.arg.pagefault.address,
                                  pagesize, false);
            }
            return arg;
     }
    
     static int setup_uffd(char *map, size_t size)
     {
            struct uffdio_api uffdio_api;
            struct uffdio_register uffdio_register;
            pthread_t thread;
    
            uffd = syscall(__NR_userfaultfd,
                           O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
            if (uffd < 0) {
                    fprintf(stderr, "syscall() failed: %d\n", errno);
                    return -errno;
            }
    
            uffdio_api.api = UFFD_API;
            uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
            if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
                    fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
                    return -errno;
            }
    
            if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
                    fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
                    return -ENOSYS;
            }
    
            uffdio_register.range.start = (unsigned long) map;
            uffdio_register.range.len = size;
            uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
            if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
                    fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
                    return -errno;
            }
    
            pthread_create(&thread, NULL, uffd_thread_fn, NULL);
    
            return 0;
     }
    
     int main(void)
     {
            const size_t size = 4 * 1024 * 1024ull;
            char *map, *cur;
    
            pagesize = getpagesize();
    
            map = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
            if (map == MAP_FAILED) {
                    fprintf(stderr, "mmap() failed\n");
                    return -errno;
            }
    
            if (madvise(map, size, MADV_HUGEPAGE)) {
                    fprintf(stderr, "MADV_HUGEPAGE failed\n");
                    return -errno;
            }
    
            if (setup_uffd(map, size))
                    return 1;
    
            /* Read the whole range, populating zeropages. */
            madvise(map, size, MADV_POPULATE_READ);
    
            /* Write-protect the whole range. */
            uffd_wp_range(map, size, true);
    
            /* Make sure uffd-wp triggers on each page. */
            for (cur = map; cur < map + size; cur += pagesize) {
                    uffd_triggered = false;
    
                    barrier();
                    /* Trigger a write fault. */
                    *cur = 1;
                    barrier();
    
                    if (!uffd_triggered) {
                            printf("FAIL: uffd-wp did not trigger\n");
                            return 1;
                    }
            }
    
            printf("PASS: uffd-wp triggered\n");
            return 0;
     }
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: e06f1e1dd499 ("userfaultfd: wp: enabled write protection in userfaultfd API")
    Signed-off-by: David Hildenbrand <[email protected]>
    Acked-by: Peter Xu <[email protected]>
    Cc: Mike Rapoport <[email protected]>
    Cc: Andrea Arcangeli <[email protected]>
    Cc: Jerome Glisse <[email protected]>
    Cc: Shaohua Li <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mmc: atmel-mci: fix race between stop command and start of next command [+ + +]

Author: Tobias Schramm <[email protected]>
Date:   Fri Dec 30 20:43:15 2022 +0100

    mmc: atmel-mci: fix race between stop command and start of next command
    
    [ Upstream commit eca5bd666b0aa7dc0bca63292e4778968241134e ]
    
    This commit fixes a race between completion of stop command and start of a
    new command.
    Previously the command ready interrupt was enabled before stop command
    was written to the command register. This caused the command ready
    interrupt to fire immediately since the CMDRDY flag is asserted constantly
    while there is no command in progress.
    Consequently the command state machine will immediately advance to the
    next state when the tasklet function is executed again, regardless of the
    actual completion state of the stop command.
    Thus a new command can then be dispatched immediately, interrupting and
    corrupting the stop command on the CMD line.
    Fix that by dropping the command ready interrupt enable before calling
    atmci_send_stop_cmd. atmci_send_stop_cmd already enables the command
    ready interrupt, so no further writes to ATMCI_IER are necessary.
    
    Signed-off-by: Tobias Schramm <[email protected]>
    Acked-by: Ludovic Desroches <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

mmc: sdhci_am654: lower power-on failed message severity [+ + +]

Author: Francesco Dolcini <[email protected]>
Date:   Mon Mar 6 17:27:51 2023 +0100

    mmc: sdhci_am654: lower power-on failed message severity
    
    commit 11440da77d6020831ee6f9ce4551b545dea789ee upstream.
    
    Lower the power-on failed message severity from warn to info when the
    controller does not power-up. It's normal to have this situation when
    the SD card slot is empty, therefore we should not warn the user about
    it.
    
    Fixes: 7ca0f166f5b2 ("mmc: sdhci_am654: Add workaround for card detect debounce timer")
    Signed-off-by: Francesco Dolcini <[email protected]>
    Acked-by: Adrian Hunter <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Ulf Hansson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

mptcp: avoid setting TCP_CLOSE state twice [+ + +]

Author: Matthieu Baerts <[email protected]>
Date:   Thu Mar 9 15:50:03 2023 +0100

    mptcp: avoid setting TCP_CLOSE state twice
    
    commit 3ba14528684f528566fb7d956bfbfb958b591d86 upstream.
    
    tcp_set_state() is called from tcp_done() already.
    
    There is then no need to first set the state to TCP_CLOSE, then call
    tcp_done().
    
    Fixes: d582484726c4 ("mptcp: fix fallback for MP_JOIN subflows")
    Cc: [email protected]
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/362
    Acked-by: Paolo Abeni <[email protected]>
    Signed-off-by: Matthieu Baerts <[email protected]>
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/iucv: Fix size of interrupt data [+ + +]

Author: Alexandra Winter <[email protected]>
Date:   Wed Mar 15 14:14:35 2023 +0100

    net/iucv: Fix size of interrupt data
    
    [ Upstream commit 3d87debb8ed2649608ff432699e7c961c0c6f03b ]
    
    iucv_irq_data needs to be 4 bytes larger.
    These bytes are not used by the iucv module, but written by
    the z/VM hypervisor in case a CPU is deconfigured.
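    
    The size arithmetic can be checked against the addresses in the report
    quoted in this commit message (object at 0x400540, overwritten bytes at
    0x400564-0x400567). The layouts below are hypothetical stand-ins with
    illustrative field names; only their sizes, 36 and 40 bytes, matter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 36-byte layout: 4 header bytes plus 8 reserved words. */
struct irq_data_old {
	uint16_t pathid;
	uint8_t flags;
	uint8_t type;
	uint32_t rest[8];
};

/* Same layout with one more reserved word, i.e. 4 bytes larger. */
struct irq_data_new {
	uint16_t pathid;
	uint8_t flags;
	uint8_t type;
	uint32_t rest[9];
};

_Static_assert(sizeof(struct irq_data_old) == 36, "old layout is 36 bytes");
_Static_assert(sizeof(struct irq_data_new) == 40, "new layout is 40 bytes");

/* Bytes written past an object of obj_size, given the last address written. */
static size_t bytes_overrun(uintptr_t obj, uintptr_t last_written,
			    size_t obj_size)
{
	size_t end_off = (size_t)(last_written - obj) + 1;

	return end_off > obj_size ? end_off - obj_size : 0;
}
```

    With the reported addresses, a 36-byte object is overrun by 4 bytes and
    a 40-byte object is not overrun at all.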
    
    Reported as:
    BUG dma-kmalloc-64 (Not tainted): kmalloc Redzone overwritten
    -----------------------------------------------------------------------------
    0x0000000000400564-0x0000000000400567 @offset=1380. First byte 0x80 instead of 0xcc
    Allocated in iucv_cpu_prepare+0x44/0xd0 age=167839 cpu=2 pid=1
    __kmem_cache_alloc_node+0x166/0x450
    kmalloc_node_trace+0x3a/0x70
    iucv_cpu_prepare+0x44/0xd0
    cpuhp_invoke_callback+0x156/0x2f0
    cpuhp_issue_call+0xf0/0x298
    __cpuhp_setup_state_cpuslocked+0x136/0x338
    __cpuhp_setup_state+0xf4/0x288
    iucv_init+0xf4/0x280
    do_one_initcall+0x78/0x390
    do_initcalls+0x11a/0x140
    kernel_init_freeable+0x25e/0x2a0
    kernel_init+0x2e/0x170
    __ret_from_fork+0x3c/0x58
    ret_from_fork+0xa/0x40
    Freed in iucv_init+0x92/0x280 age=167839 cpu=2 pid=1
    __kmem_cache_free+0x308/0x358
    iucv_init+0x92/0x280
    do_one_initcall+0x78/0x390
    do_initcalls+0x11a/0x140
    kernel_init_freeable+0x25e/0x2a0
    kernel_init+0x2e/0x170
    __ret_from_fork+0x3c/0x58
    ret_from_fork+0xa/0x40
    Slab 0x0000037200010000 objects=32 used=30 fp=0x0000000000400640 flags=0x1ffff00000010200(slab|head|node=0|zone=0|
    Object 0x0000000000400540 @offset=1344 fp=0x0000000000000000
    Redzone  0000000000400500: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    Redzone  0000000000400510: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    Redzone  0000000000400520: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    Redzone  0000000000400530: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    Object   0000000000400540: 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00 00  ................
    Object   0000000000400550: f3 86 81 f2 f4 82 f8 82 f0 f0 f0 f0 f0 f0 f0 f2  ................
    Object   0000000000400560: 00 00 00 00 80 00 00 00 cc cc cc cc cc cc cc cc  ................
    Object   0000000000400570: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
    Redzone  0000000000400580: cc cc cc cc cc cc cc cc                          ........
    Padding  00000000004005d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    Padding  00000000004005e4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    Padding  00000000004005f4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a              ZZZZZZZZZZZZ
    CPU: 6 PID: 121030 Comm: 116-pai-crypto. Not tainted 6.3.0-20230221.rc0.git4.99b8246b2d71.300.fc37.s390x+debug #1
    Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
    Call Trace:
    [<000000032aa034ec>] dump_stack_lvl+0xac/0x100
    [<0000000329f5a6cc>] check_bytes_and_report+0x104/0x140
    [<0000000329f5aa78>] check_object+0x370/0x3c0
    [<0000000329f5ede6>] free_debug_processing+0x15e/0x348
    [<0000000329f5f06a>] free_to_partial_list+0x9a/0x2f0
    [<0000000329f5f4a4>] __slab_free+0x1e4/0x3a8
    [<0000000329f61768>] __kmem_cache_free+0x308/0x358
    [<000000032a91465c>] iucv_cpu_dead+0x6c/0x88
    [<0000000329c2fc66>] cpuhp_invoke_callback+0x156/0x2f0
    [<000000032aa062da>] _cpu_down.constprop.0+0x22a/0x5e0
    [<0000000329c3243e>] cpu_device_down+0x4e/0x78
    [<000000032a61dee0>] device_offline+0xc8/0x118
    [<000000032a61e048>] online_store+0x60/0xe0
    [<000000032a08b6b0>] kernfs_fop_write_iter+0x150/0x1e8
    [<0000000329fab65c>] vfs_write+0x174/0x360
    [<0000000329fab9fc>] ksys_write+0x74/0x100
    [<000000032aa03a5a>] __do_syscall+0x1da/0x208
    [<000000032aa177b2>] system_call+0x82/0xb0
    INFO: lockdep is turned off.
    FIX dma-kmalloc-64: Restoring kmalloc Redzone 0x0000000000400564-0x0000000000400567=0xcc
    FIX dma-kmalloc-64: Object at 0x0000000000400540 not freed
    
    Fixes: 2356f4cb1911 ("[S390]: Rewrite of the IUCV base code, part 2")
    Signed-off-by: Alexandra Winter <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
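
Note on the size arithmetic: in the report above, the object starts at 0x400540 and the overwritten redzone bytes are 0x400564-0x400567, which is bytes 36 to 39 of the object. The following is a minimal userspace sketch of that arithmetic, not the kernel definition; the struct and field names are hypothetical, and the layout is only assumed from the offsets in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the fix: 4 header bytes + 32 reserved bytes. */
struct irq_data_short {
	uint16_t path_id;
	uint8_t  flags;
	uint8_t  type;
	uint32_t reserved[8];
};

/* Hypothetical layout after the fix: 4 more reserved bytes for the hypervisor. */
struct irq_data_padded {
	uint16_t path_id;
	uint8_t  flags;
	uint8_t  type;
	uint32_t reserved[9];
};

static_assert(sizeof(struct irq_data_short) == 36, "short layout is 36 bytes");
static_assert(sizeof(struct irq_data_padded) == 40, "padded layout is 40 bytes");

/* Number of bytes an external writer spills past the end of an allocation. */
size_t overrun_bytes(size_t allocated, size_t written)
{
	return written > allocated ? written - allocated : 0;
}
```

With these assumed sizes, a 40-byte write into the 36-byte layout overruns by 4 bytes, while the padded layout absorbs it.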

net/smc: fix deadlock triggered by cancel_delayed_work_syn() [+ + +]

Author: Wenjia Zhang <[email protected]>
Date:   Mon Mar 13 11:08:28 2023 +0100

    net/smc: fix deadlock triggered by cancel_delayed_work_syn()
    
    [ Upstream commit 13085e1b5cab8ad802904d72e6a6dae85ae0cd20 ]
    
    The following LOCKDEP was detected:
                    Workqueue: events smc_lgr_free_work [smc]
                    WARNING: possible circular locking dependency detected
                    6.1.0-20221027.rc2.git8.56bc5b569087.300.fc36.s390x+debug #1 Not tainted
                    ------------------------------------------------------
                    kworker/3:0/176251 is trying to acquire lock:
                    00000000f1467148 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0},
                            at: __flush_workqueue+0x7a/0x4f0
                    but task is already holding lock:
                    0000037fffe97dc8 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0},
                            at: process_one_work+0x232/0x730
                    which lock already depends on the new lock.
                    the existing dependency chain (in reverse order) is:
                    -> #4 ((work_completion)(&(&lgr->free_work)->work)){+.+.}-{0:0}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           __flush_work+0x76/0xf0
                           __cancel_work_timer+0x170/0x220
                           __smc_lgr_terminate.part.0+0x34/0x1c0 [smc]
                           smc_connect_rdma+0x15e/0x418 [smc]
                           __smc_connect+0x234/0x480 [smc]
                           smc_connect+0x1d6/0x230 [smc]
                           __sys_connect+0x90/0xc0
                           __do_sys_socketcall+0x186/0x370
                           __do_syscall+0x1da/0x208
                           system_call+0x82/0xb0
                    -> #3 (smc_client_lgr_pending){+.+.}-{3:3}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           __mutex_lock+0x96/0x8e8
                           mutex_lock_nested+0x32/0x40
                           smc_connect_rdma+0xa4/0x418 [smc]
                           __smc_connect+0x234/0x480 [smc]
                           smc_connect+0x1d6/0x230 [smc]
                           __sys_connect+0x90/0xc0
                           __do_sys_socketcall+0x186/0x370
                           __do_syscall+0x1da/0x208
                           system_call+0x82/0xb0
                    -> #2 (sk_lock-AF_SMC){+.+.}-{0:0}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           lock_sock_nested+0x46/0xa8
                           smc_tx_work+0x34/0x50 [smc]
                           process_one_work+0x30c/0x730
                           worker_thread+0x62/0x420
                           kthread+0x138/0x150
                           __ret_from_fork+0x3c/0x58
                           ret_from_fork+0xa/0x40
                    -> #1 ((work_completion)(&(&smc->conn.tx_work)->work)){+.+.}-{0:0}:
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           process_one_work+0x2bc/0x730
                           worker_thread+0x62/0x420
                           kthread+0x138/0x150
                           __ret_from_fork+0x3c/0x58
                           ret_from_fork+0xa/0x40
                    -> #0 ((wq_completion)smc_tx_wq-00000000#2){+.+.}-{0:0}:
                           check_prev_add+0xd8/0xe88
                           validate_chain+0x70c/0xb20
                           __lock_acquire+0x58e/0xbd8
                           lock_acquire.part.0+0xe2/0x248
                           lock_acquire+0xac/0x1c8
                           __flush_workqueue+0xaa/0x4f0
                           drain_workqueue+0xaa/0x158
                           destroy_workqueue+0x44/0x2d8
                           smc_lgr_free+0x9e/0xf8 [smc]
                           process_one_work+0x30c/0x730
                           worker_thread+0x62/0x420
                           kthread+0x138/0x150
                           __ret_from_fork+0x3c/0x58
                           ret_from_fork+0xa/0x40
                    other info that might help us debug this:
                    Chain exists of:
                      (wq_completion)smc_tx_wq-00000000#2
                      --> smc_client_lgr_pending
                      --> (work_completion)(&(&lgr->free_work)->work)
                     Possible unsafe locking scenario:
                           CPU0                    CPU1
                           ----                    ----
                      lock((work_completion)(&(&lgr->free_work)->work));
                                       lock(smc_client_lgr_pending);
                                       lock((work_completion)
                                            (&(&lgr->free_work)->work));
                      lock((wq_completion)smc_tx_wq-00000000#2);
                     *** DEADLOCK ***
                    2 locks held by kworker/3:0/176251:
                     #0: 0000000080183548
                            ((wq_completion)events){+.+.}-{0:0},
                                    at: process_one_work+0x232/0x730
                     #1: 0000037fffe97dc8
                            ((work_completion)
                             (&(&lgr->free_work)->work)){+.+.}-{0:0},
                                    at: process_one_work+0x232/0x730
                    stack backtrace:
                    CPU: 3 PID: 176251 Comm: kworker/3:0 Not tainted
                    Hardware name: IBM 8561 T01 701 (z/VM 7.2.0)
                    Call Trace:
                     [<000000002983c3e4>] dump_stack_lvl+0xac/0x100
                     [<0000000028b477ae>] check_noncircular+0x13e/0x160
                     [<0000000028b48808>] check_prev_add+0xd8/0xe88
                     [<0000000028b49cc4>] validate_chain+0x70c/0xb20
                     [<0000000028b4bd26>] __lock_acquire+0x58e/0xbd8
                     [<0000000028b4cf6a>] lock_acquire.part.0+0xe2/0x248
                     [<0000000028b4d17c>] lock_acquire+0xac/0x1c8
                     [<0000000028addaaa>] __flush_workqueue+0xaa/0x4f0
                     [<0000000028addf9a>] drain_workqueue+0xaa/0x158
                     [<0000000028ae303c>] destroy_workqueue+0x44/0x2d8
                     [<000003ff8029af26>] smc_lgr_free+0x9e/0xf8 [smc]
                     [<0000000028adf3d4>] process_one_work+0x30c/0x730
                     [<0000000028adf85a>] worker_thread+0x62/0x420
                     [<0000000028aeac50>] kthread+0x138/0x150
                     [<0000000028a63914>] __ret_from_fork+0x3c/0x58
                     [<00000000298503da>] ret_from_fork+0xa/0x40
                    INFO: lockdep is turned off.
    ===================================================================
    
    This deadlock occurs because cancel_delayed_work_sync() waits for
    the work item &lgr->free_work to finish, while &lgr->free_work in
    turn waits for the work queued on lgr->tx_wq. That work needs
    sk_lock-AF_SMC, which is already held in the mutex_lock path.
    
    The solution is to use cancel_delayed_work() instead, which cancels
    pending work without waiting for a running work item to finish.
    
    Fixes: a52bcc919b14 ("net/smc: improve termination processing")
    Signed-off-by: Wenjia Zhang <[email protected]>
    Reviewed-by: Jan Karcher <[email protected]>
    Reviewed-by: Karsten Graul <[email protected]>
    Reviewed-by: Tony Lu <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler() [+ + +]

Author: D. Wythe <[email protected]>
Date:   Wed Mar 8 16:17:12 2023 +0800

    net/smc: fix NULL sndbuf_desc in smc_cdc_tx_handler()
    
    [ Upstream commit 22a825c541d775c1dbe7b2402786025acad6727b ]
    
    When performing a stress test on SMC-R by running rmmod on the mlx5_ib
    driver during the wrk/nginx test, we found that a panic can be
    triggered while terminating all link groups.
    
    This issue is due to the race between smc_smcr_terminate_all()
    and smc_buf_create().
    
                            smc_smcr_terminate_all
    
    smc_buf_create
    /* init */
    conn->sndbuf_desc = NULL;
    ...
    
                            __smc_lgr_terminate
                                    smc_conn_kill
                                            smc_close_abort
                                                    smc_cdc_get_slot_and_msg_send
    
                            __softirqentry_text_start
                                    smc_wr_tx_process_cqe
                                            smc_cdc_tx_handler
                                                    READ(conn->sndbuf_desc->len);
                                                    /* panic due to NULL sndbuf_desc */
    
    conn->sndbuf_desc = xxx;
    
    This patch tries to fix the issue by always checking sndbuf_desc
    before sending any CDC message, to make sure that no NULL pointer
    is seen during CQE processing.
    
    Fixes: 0b29ec643613 ("net/smc: immediate termination for SMCR link groups")
    Signed-off-by: D. Wythe <[email protected]>
    Reviewed-by: Tony Lu <[email protected]>
    Reviewed-by: Wenjia Zhang <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
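
Note on the guard described above: the idea of refusing to send while the send buffer descriptor is still NULL can be sketched in plain userspace C. All names below are hypothetical stand-ins, and the -ENOBUFS return value is an assumption made for illustration rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the connection and its send buffer descriptor. */
struct fake_buf_desc {
	int len;
};

struct fake_conn {
	struct fake_buf_desc *sndbuf_desc;
	int sent;                       /* counts messages actually sent */
};

/* Refuse to send while the buffer descriptor has not been set up yet. */
int fake_cdc_msg_send(struct fake_conn *conn)
{
	assert(conn != NULL);

	if (conn->sndbuf_desc == NULL)
		return -ENOBUFS;        /* assumed error code */

	conn->sent++;                   /* a real sender would post the message here */
	return 0;
}
```

Because nothing is sent before the descriptor exists, no completion handler can later run against a NULL descriptor for that message. In kernel code the pointer read would additionally need to be safe against concurrent updates, which this sketch does not model.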

net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290 [+ + +]

Author: Vladimir Oltean <[email protected]>
Date:   Tue Mar 14 20:24:05 2023 +0200

    net: dsa: mv88e6xxx: fix max_mtu of 1492 on 6165, 6191, 6220, 6250, 6290
    
    [ Upstream commit 7e9517375a14f44ee830ca1c3278076dd65fcc8f ]
    
    There are 3 classes of switch families that the driver is aware of, as
    far as mv88e6xxx_change_mtu() is concerned:
    
    - MTU configuration is available per port. Here, the
      chip->info->ops->port_set_jumbo_size() method will be present.
    
    - MTU configuration is global to the switch. Here, the
      chip->info->ops->set_max_frame_size() method will be present.
    
    - We don't know how to change the MTU. Here, none of the above methods
      will be present.
    
    Switch families MV88E6165, MV88E6191, MV88E6220, MV88E6250 and MV88E6290
    fall in category 3.
    
    The blamed commit has adjusted the MTU for all 3 categories by EDSA_HLEN
    (8 bytes), resulting in a new maximum MTU of 1492 being reported by the
    driver for these switches.
    
    I don't have the hardware to test, but I do have a MV88E6390 switch on
    which I can simulate this by commenting out its .port_set_jumbo_size
    definition from mv88e6390_ops. The result is this set of messages at
    probe time:
    
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 1
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 2
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 3
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 4
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 5
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 6
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 7
    mv88e6085 d0032004.mdio-mii:10: nonfatal error -34 setting MTU to 1500 on port 8
    
    It is highly implausible that there exist Ethernet switches which don't
    support the standard MTU of 1500 octets, and this is what the DSA
    framework says as well - the error comes from dsa_slave_create() ->
    dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN).
    
    But the error messages are alarming, and it would be good to suppress
    them.
    
    Because such switches are unlikely to exist, we reimplement
    mv88e6xxx_get_max_mtu() and mv88e6xxx_change_mtu() on switches from
    the 3rd category as follows: the maximum supported MTU is 1500, and
    any request to set the MTU to a value larger than that fails in
    dev_validate_mtu().
    
    Fixes: b9c587fed61c ("dsa: mv88e6xxx: Include tagger overhead when setting MTU for DSA and CPU ports")
    Signed-off-by: Vladimir Oltean <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Reviewed-by: Florian Fainelli <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
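
Note on where 1492 comes from: the arithmetic for category 3 can be sketched as below. The 1522-byte maximum frame size is an assumption made for this illustration; the header lengths are the usual Ethernet values, and EDSA_HLEN is the 8 bytes named in the commit message. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ETH_DATA_LEN      1500  /* standard Ethernet payload size */
#define VLAN_ETH_HLEN       18  /* Ethernet header plus one VLAN tag */
#define ETH_FCS_LEN          4  /* frame check sequence */
#define EDSA_HLEN            8  /* tagger overhead named in the commit */
#define ASSUMED_MAX_FRAME 1522  /* assumption: fixed maximum frame size */

/* Category 3 before the fix: tagger overhead is subtracted as well. */
int cat3_max_mtu_before(void)
{
	return ASSUMED_MAX_FRAME - VLAN_ETH_HLEN - EDSA_HLEN - ETH_FCS_LEN;
}

/* Category 3 after the fix: simply report the standard MTU. */
int cat3_max_mtu_after(void)
{
	return ETH_DATA_LEN;
}

/* A request succeeds only if it does not exceed the reported maximum. */
bool mtu_request_ok(int max_mtu, int new_mtu)
{
	assert(max_mtu > 0);
	return new_mtu <= max_mtu;
}
```

Under these assumptions the old maximum is 1522 - 18 - 8 - 4 = 1492, so the default request for 1500 is rejected, which matches the "nonfatal error -34" messages quoted above.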

net: phy: smsc: bail out in lan87xx_read_status if genphy_read_status fails [+ + +]

Author: Heiner Kallweit <[email protected]>
Date:   Sat Mar 11 19:34:45 2023 +0100

    net: phy: smsc: bail out in lan87xx_read_status if genphy_read_status fails
    
    [ Upstream commit c22c3bbf351e4ce905f082649cffa1ff893ea8c1 ]
    
    If genphy_read_status fails, further access to the PHY may result
    in unpredictable behavior. To prevent this, bail out immediately if
    genphy_read_status fails.
    
    Fixes: 4223dbffed9f ("net: phy: smsc: Re-enable EDPD mode for LAN87xx")
    Signed-off-by: Heiner Kallweit <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
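
Note on the early return: the pattern described above is the usual "propagate the error before touching the device again" idiom. The sketch below uses hypothetical stand-in types and functions (the fake_ prefix marks them as such) and only models the control flow, not real PHY access.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical PHY model: a canned status result and an access counter. */
struct fake_phy {
	int read_status_result;  /* what the generic status read returns */
	int extra_accesses;      /* device accesses made after the status read */
};

/* Stand-in for the generic status read. */
int fake_genphy_read_status(struct fake_phy *phy)
{
	return phy->read_status_result;
}

/* Bail out immediately when the generic read fails. */
int fake_lan87xx_read_status(struct fake_phy *phy)
{
	int err;

	assert(phy != NULL);

	err = fake_genphy_read_status(phy);
	if (err)
		return err;          /* no further access to the PHY */

	phy->extra_accesses++;       /* driver-specific follow-up work */
	return 0;
}
```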

net: tunnels: annotate lockless accesses to dev->needed_headroom [+ + +]

Author: Eric Dumazet <[email protected]>
Date:   Fri Mar 10 19:11:09 2023 +0000

    net: tunnels: annotate lockless accesses to dev->needed_headroom
    
    [ Upstream commit 4b397c06cb987935b1b097336532aa6b4210e091 ]
    
    IP tunnels can apparently update dev->needed_headroom
    in their xmit path.
    
    This patch takes care of three tunnel xmit paths, and also the
    core LL_RESERVED_SPACE() and LL_RESERVED_SPACE_EXTRA()
    helpers.
    
    More changes might be needed for completeness.
    
    BUG: KCSAN: data-race in ip_tunnel_xmit / ip_tunnel_xmit
    
    read to 0xffff88815b9da0ec of 2 bytes by task 888 on cpu 1:
    ip_tunnel_xmit+0x1270/0x1730 net/ipv4/ip_tunnel.c:803
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
    ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
    iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
    ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
    ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
    iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
    ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
    ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
    iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
    ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
    ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
    iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
    ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
    ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
    iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
    ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip_finish_output2+0x740/0x840 net/ipv4/ip_output.c:228
    ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:316
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:430
    dst_output include/net/dst.h:444 [inline]
    ip_local_out+0x64/0x80 net/ipv4/ip_output.c:126
    iptunnel_xmit+0x34a/0x4b0 net/ipv4/ip_tunnel_core.c:82
    ip_tunnel_xmit+0x1451/0x1730 net/ipv4/ip_tunnel.c:813
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    
    write to 0xffff88815b9da0ec of 2 bytes by task 2379 on cpu 0:
    ip_tunnel_xmit+0x1294/0x1730 net/ipv4/ip_tunnel.c:804
    __gre_xmit net/ipv4/ip_gre.c:469 [inline]
    ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:661
    __netdev_start_xmit include/linux/netdevice.h:4881 [inline]
    netdev_start_xmit include/linux/netdevice.h:4895 [inline]
    xmit_one net/core/dev.c:3580 [inline]
    dev_hard_start_xmit+0x127/0x400 net/core/dev.c:3596
    __dev_queue_xmit+0x1007/0x1eb0 net/core/dev.c:4246
    dev_queue_xmit include/linux/netdevice.h:3051 [inline]
    neigh_direct_output+0x17/0x20 net/core/neighbour.c:1623
    neigh_output include/net/neighbour.h:546 [inline]
    ip6_finish_output2+0x9bc/0xc50 net/ipv6/ip6_output.c:134
    __ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
    ip6_finish_output+0x39a/0x4e0 net/ipv6/ip6_output.c:206
    NF_HOOK_COND include/linux/netfilter.h:291 [inline]
    ip6_output+0xeb/0x220 net/ipv6/ip6_output.c:227
    dst_output include/net/dst.h:444 [inline]
    NF_HOOK include/linux/netfilter.h:302 [inline]
    mld_sendpack+0x438/0x6a0 net/ipv6/mcast.c:1820
    mld_send_cr net/ipv6/mcast.c:2121 [inline]
    mld_ifc_work+0x519/0x7b0 net/ipv6/mcast.c:2653
    process_one_work+0x3e6/0x750 kernel/workqueue.c:2390
    worker_thread+0x5f2/0xa10 kernel/workqueue.c:2537
    kthread+0x1ac/0x1e0 kernel/kthread.c:376
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
    
    value changed: 0x0dd4 -> 0x0e14
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 2379 Comm: kworker/0:0 Not tainted 6.3.0-rc1-syzkaller-00002-g8ca09d5fa354-dirty #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
    Workqueue: mld mld_ifc_work
    
    Fixes: 8eb30be0352d ("ipv6: Create ip6_tnl_xmit")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
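
Note on the annotation: marking a lockless access usually means wrapping the read and the write in READ_ONCE() and WRITE_ONCE(). The sketch below uses simplified userspace stand-ins for those macros (they rely on the GCC/Clang __typeof__ extension) and a hypothetical device struct; it illustrates the pattern only and is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros: force a single volatile access. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical device with a 2-byte headroom field, as in the report. */
struct fake_dev {
	unsigned short needed_headroom;
};

/* Grow the headroom if needed, using annotated accesses only. */
void fake_update_headroom(struct fake_dev *dev, unsigned short headroom)
{
	assert(dev != NULL);

	if (headroom > READ_ONCE(dev->needed_headroom))
		WRITE_ONCE(dev->needed_headroom, headroom);
}
```

The annotations do not add locking; they document that the race is intentional and keep the compiler from tearing or repeating the access.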

net: usb: smsc75xx: Limit packet length to skb->len [+ + +]

Author: Szymon Heidrich <[email protected]>
Date:   Mon Mar 13 23:00:45 2023 +0100

    net: usb: smsc75xx: Limit packet length to skb->len
    
    [ Upstream commit d8b228318935044dafe3a5bc07ee71a1f1424b8d ]
    
    The packet length retrieved from skb data may be larger than
    the actual socket buffer length (up to 9026 bytes). In such a
    case, the cloned skb passed up the network stack will leak
    kernel memory contents.
    
    Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver")
    Signed-off-by: Szymon Heidrich <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
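
Note on the bounds check: a length field read from device data must be checked against the number of bytes actually received before it is used. The sketch below assumes a made-up frame format (a 4-byte header whose first two bytes hold a little-endian length); it is not the smsc75xx RX format, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_HDR_LEN 4  /* hypothetical header size */

/*
 * Return the payload length claimed by the header, or -1 if the header
 * claims more data than the buffer really holds.
 */
int fake_parse_frame(const uint8_t *buf, size_t buf_len)
{
	uint32_t claimed;

	assert(buf != NULL);

	if (buf_len < FAKE_HDR_LEN)
		return -1;

	/* Hypothetical 16-bit little-endian length field. */
	claimed = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8);

	if (claimed > buf_len - FAKE_HDR_LEN)
		return -1;           /* would read past the received data */

	return (int)claimed;
}
```

A header claiming 9026 bytes (0x2342) in an 8-byte buffer is rejected instead of being handed on with out-of-bounds contents.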

net: usb: smsc75xx: Move packet length check to prevent kernel panic in skb_pull [+ + +]

Author: Szymon Heidrich <[email protected]>
Date:   Thu Mar 16 12:05:40 2023 +0100

    net: usb: smsc75xx: Move packet length check to prevent kernel panic in skb_pull
    
    [ Upstream commit 43ffe6caccc7a1bb9d7442fbab521efbf6c1378c ]
    
    The packet length check must be located after the size and
    align_count calculation to prevent a kernel panic in skb_pull()
    when rx_cmd_a & RX_CMD_A_RED evaluates to true.
    
    Fixes: d8b228318935 ("net: usb: smsc75xx: Limit packet length to skb->len")
    Signed-off-by: Szymon Heidrich <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nft_masq: correct length for loading protocol registers [+ + +]

Author: Jeremy Sowden <[email protected]>
Date:   Tue Mar 7 23:22:57 2023 +0000

    netfilter: nft_masq: correct length for loading protocol registers
    
    [ Upstream commit ec2c5917eb858428b2083d1c74f445aabbe8316b ]
    
    The values in the protocol registers are two bytes wide.  However, when
    parsing the register loads, the code currently uses the larger 16-byte
    size of a `union nf_inet_addr`.  Change it to use the (correct) size of
    a `union nf_conntrack_man_proto` instead.
    
    Fixes: 8a6bf5da1aef ("netfilter: nft_masq: support port range")
    Signed-off-by: Jeremy Sowden <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>
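
Note on the two sizes: the 16-byte versus 2-byte difference can be checked with simplified unions. The definitions below are hypothetical reductions made for this sketch, not the kernel's, and the range check is a generic illustration of how an over-wide length could reject an otherwise valid load near the end of a register file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of an address union: large enough for IPv6. */
union fake_inet_addr {
	uint32_t all[4];
	uint32_t ip;
	uint32_t ip6[4];
};

/* Hypothetical reduction of a per-protocol union: a single 16-bit value. */
union fake_man_proto {
	uint16_t all;
	struct { uint16_t port; } tcp;
	struct { uint16_t port; } udp;
};

static_assert(sizeof(union fake_inet_addr) == 16, "address union is 16 bytes");
static_assert(sizeof(union fake_man_proto) == 2, "protocol union is 2 bytes");

/* Return 0 if [offset, offset + len) fits in a register file of regs_size bytes. */
int fake_validate_load(unsigned int offset, unsigned int len,
		       unsigned int regs_size)
{
	if (offset > regs_size || len > regs_size - offset)
		return -1;
	return 0;
}
```

With an assumed 80-byte register file, a load at offset 76 passes when validated with the 2-byte size and fails with the 16-byte size.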

netfilter: nft_nat: correct length for loading protocol registers [+ + +]

Author: Jeremy Sowden <[email protected]>
Date:   Tue Mar 7 23:22:56 2023 +0000

    netfilter: nft_nat: correct length for loading protocol registers
    
    [ Upstream commit 068d82e75d537b444303b8c449a11e51ea659565 ]
    
    The values in the protocol registers are two bytes wide.  However, when
    parsing the register loads, the code currently uses the larger 16-byte
    size of a `union nf_inet_addr`.  Change it to use the (correct) size of
    a `union nf_conntrack_man_proto` instead.
    
    Fixes: d07db9884a5f ("netfilter: nf_tables: introduce nft_validate_register_load()")
    Signed-off-by: Jeremy Sowden <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nft_redir: correct length for loading protocol registers [+ + +]

Author: Jeremy Sowden <[email protected]>
Date:   Tue Mar 7 23:22:58 2023 +0000

    netfilter: nft_redir: correct length for loading protocol registers
    
    [ Upstream commit 1f617b6b4c7a3d5ea7a56abb83a4c27733b60c2f ]
    
    The values in the protocol registers are two bytes wide.  However, when
    parsing the register loads, the code currently uses the larger 16-byte
    size of a `union nf_inet_addr`.  Change it to use the (correct) size of
    a `union nf_conntrack_man_proto` instead.
    
    Fixes: d07db9884a5f ("netfilter: nf_tables: introduce nft_validate_register_load()")
    Signed-off-by: Jeremy Sowden <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

netfilter: nft_redir: correct value of inet type `.maxattrs` [+ + +]

Author: Jeremy Sowden <[email protected]>
Date:   Tue Mar 7 23:22:59 2023 +0000

    netfilter: nft_redir: correct value of inet type `.maxattrs`
    
    [ Upstream commit 493924519b1fe3faab13ee621a43b0d0939abab1 ]
    
    `nft_redir_inet_type.maxattrs` was being set, presumably because of a
    cut-and-paste error, to `NFTA_MASQ_MAX`, instead of `NFTA_REDIR_MAX`.
    
    Fixes: 63ce3940f3ab ("netfilter: nft_redir: add inet support")
    Signed-off-by: Jeremy Sowden <[email protected]>
    Reviewed-by: Florian Westphal <[email protected]>
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nfc: pn533: initialize struct pn533_out_arg properly [+ + +]

Author: Fedor Pchelkin <[email protected]>
Date:   Thu Mar 9 19:50:50 2023 +0300

    nfc: pn533: initialize struct pn533_out_arg properly
    
    [ Upstream commit 484b7059796e3bc1cb527caa61dfc60da649b4f6 ]
    
    struct pn533_out_arg used as a temporary context for out_urb is not
    initialized properly. Its uninitialized 'phy' field can be dereferenced in
    error cases inside pn533_out_complete() callback function. It causes the
    following failure:
    
    general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.2.0-rc3-next-20230110-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
    RIP: 0010:pn533_out_complete.cold+0x15/0x44 drivers/nfc/pn533/usb.c:441
    Call Trace:
     <IRQ>
     __usb_hcd_giveback_urb+0x2b6/0x5c0 drivers/usb/core/hcd.c:1671
     usb_hcd_giveback_urb+0x384/0x430 drivers/usb/core/hcd.c:1754
     dummy_timer+0x1203/0x32d0 drivers/usb/gadget/udc/dummy_hcd.c:1988
     call_timer_fn+0x1da/0x800 kernel/time/timer.c:1700
     expire_timers+0x234/0x330 kernel/time/timer.c:1751
     __run_timers kernel/time/timer.c:2022 [inline]
     __run_timers kernel/time/timer.c:1995 [inline]
     run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
     __do_softirq+0x1fb/0xaf6 kernel/softirq.c:571
     invoke_softirq kernel/softirq.c:445 [inline]
     __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
     irq_exit_rcu+0x9/0x20 kernel/softirq.c:662
     sysvec_apic_timer_interrupt+0x97/0xc0 arch/x86/kernel/apic/apic.c:1107
    
    Initialize the field with the pn533_usb_phy currently used.
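
    As a rough userspace sketch of the fix (the demo_* types and functions
    below are hypothetical stand-ins, not the driver's real definitions),
    the temporary context is built with every field set explicitly, so the
    error path in the completion callback never sees an uninitialized
    pointer:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real definitions. */
struct demo_phy {
	int id;
};

struct demo_out_arg {
	struct demo_phy *phy;
	int done;
};

/* Completion-style callback that dereferences arg->phy on the error path. */
static int demo_out_complete(const struct demo_out_arg *arg, int status)
{
	if (status != 0)
		return arg->phy ? arg->phy->id : -1; /* -1: phy was never set */
	return 0;
}

/* Build the temporary context with every field set explicitly. */
static struct demo_out_arg demo_make_arg(struct demo_phy *phy)
{
	struct demo_out_arg arg = { .phy = phy, .done = 0 };

	return arg;
}
```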
    
    Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
    
    Fixes: 9dab880d675b ("nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame()")
    Reported-by: [email protected]
    Signed-off-by: Fedor Pchelkin <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nfc: st-nci: Fix use after free bug in ndlc_remove due to race condition [+ + +]

Author: Zheng Wang <[email protected]>
Date:   Mon Mar 13 00:08:37 2023 +0800

    nfc: st-nci: Fix use after free bug in ndlc_remove due to race condition
    
    [ Upstream commit 5000fe6c27827a61d8250a7e4a1d26c3298ef4f6 ]
    
    This bug influences both st_nci_i2c_remove and st_nci_spi_remove.
    Take st_nci_i2c_remove as an example.
    
    In st_nci_i2c_probe, it called ndlc_probe and bound &ndlc->sm_work
    with llt_ndlc_sm_work.
    
    When it calls ndlc_recv or timeout handler, it will finally call
    schedule_work to start the work.
    
    When we call st_nci_i2c_remove to remove the driver, there
    may be a sequence as follows:
    
    CPU0                  CPU1
    
                        |llt_ndlc_sm_work
    st_nci_i2c_remove   |
      ndlc_remove       |
         st_nci_remove  |
         nci_free_device|
         kfree(ndev)    |
    //free ndlc->ndev   |
                        |llt_ndlc_rcv_queue
                        |nci_recv_frame
                        |//use ndlc->ndev
    
    Fix it by finishing the work before cleanup in ndlc_remove.
    
    Fixes: 35630df68d60 ("NFC: st21nfcb: Add driver for STMicroelectronics ST21NFCB NFC chip")
    Signed-off-by: Zheng Wang <[email protected]>
    Reviewed-by: Krzysztof Kozlowski <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

null_blk: Move driver into its own directory [+ + +]

Author: Damien Le Moal <[email protected]>
Date:   Fri Nov 20 10:55:19 2020 +0900

    null_blk: Move driver into its own directory
    
    [ Upstream commit eebf34a85c8c724676eba502d15202854f199b05 ]
    
    Move null_blk driver code into the new sub-directory
    drivers/block/null_blk.
    
    Suggested-by: Bart Van Assche <[email protected]>
    Signed-off-by: Damien Le Moal <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Jens Axboe <[email protected]>
    Stable-dep-of: 63f886597085 ("block: null_blk: Fix handling of fake timeout request")
    Signed-off-by: Sasha Levin <[email protected]>

nvme: fix handling single range discard request [+ + +]

Author: Ming Lei <[email protected]>
Date:   Sat Mar 4 07:13:45 2023 +0800

    nvme: fix handling single range discard request
    
    [ Upstream commit 37f0dc2ec78af0c3f35dd05578763de059f6fe77 ]
    
    When investigating one customer report on warning in nvme_setup_discard,
    we observed the controller(nvme/tcp) actually exposes
    queue_max_discard_segments(req->q) == 1.
    
    The current code cannot handle this situation, because with a single
    discard segment the requests are merged by contiguity, like normal RW
    requests.
    
    Fix the issue by building range from request sector/nr_sectors directly.
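
    A minimal sketch of building one range from the request's start sector
    and length follows. The demo_* names, the 512-byte sector unit and the
    lba_shift parameter are assumptions made for illustration; endianness
    conversion of the on-wire fields is left out:

```c
#include <stdint.h>

/* Hypothetical, host-endian stand-in for one discard range. */
struct demo_dsm_range {
	uint32_t nlb;  /* number of logical blocks */
	uint64_t slba; /* starting logical block address */
};

/*
 * sector:    start of the request in 512-byte units
 * nr_bytes:  total length of the request in bytes
 * lba_shift: log2 of the logical block size, assumed >= 9
 */
static struct demo_dsm_range demo_single_range(uint64_t sector,
					       uint32_t nr_bytes,
					       unsigned int lba_shift)
{
	struct demo_dsm_range r;

	r.slba = sector >> (lba_shift - 9);
	r.nlb = nr_bytes >> lba_shift;
	return r;
}
```

    The range covers the whole request regardless of how many merged
    pieces it was assembled from.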
    
    Fixes: b35ba01ea697 ("nvme: support ranged discard requests")
    Signed-off-by: Ming Lei <[email protected]>
    Reviewed-by: Chaitanya Kulkarni <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

nvmet: avoid potential UAF in nvmet_req_complete() [+ + +]

Author: Damien Le Moal <[email protected]>
Date:   Mon Mar 6 10:13:13 2023 +0900

    nvmet: avoid potential UAF in nvmet_req_complete()
    
    [ Upstream commit 6173a77b7e9d3e202bdb9897b23f2a8afe7bf286 ]
    
    An nvme target ->queue_response() operation implementation may free the
    request passed as argument. Such an implementation could potentially
    result in a use after free of the request pointer when percpu_ref_put()
    is called in nvmet_req_complete().
    
    Avoid this problem by using a local variable to save the sq pointer
    before calling __nvmet_req_complete(), thus avoiding dereferencing the
    req pointer after that function call.
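
    The pattern can be sketched in userspace as follows. The demo_* types
    are hypothetical, and a plain counter stands in for the percpu
    reference:

```c
#include <stdlib.h>

/* Hypothetical stand-ins; a plain counter replaces the percpu ref. */
struct demo_sq {
	int refs;
};

struct demo_req {
	struct demo_sq *sq;
};

/* May free the request, as a ->queue_response() implementation could. */
static void demo_queue_response(struct demo_req *req)
{
	free(req);
}

static void demo_req_complete(struct demo_req *req)
{
	struct demo_sq *sq = req->sq; /* save before req may be freed */

	demo_queue_response(req);
	sq->refs--; /* uses the saved pointer, never req->sq */
}
```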
    
    Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
    Signed-off-by: Damien Le Moal <[email protected]>
    Reviewed-by: Chaitanya Kulkarni <[email protected]>
    Signed-off-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

PCI/DPC: Await readiness of secondary bus after reset [+ + +]

Author: Lukas Wunner <[email protected]>
Date:   Sun Jan 15 09:20:33 2023 +0100

    PCI/DPC: Await readiness of secondary bus after reset
    
    commit 53b54ad074de1896f8b021615f65b27f557ce874 upstream.
    
    pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
    Reset, but not after a DPC-induced Hot Reset.
    
    As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
    observed and devices on the secondary bus may be accessed before
    they're ready.
    
    One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
    PCIe switch whose upstream port is not immediately ready after reset.
    Because its config space is restored too early, it remains in
    D0uninitialized, its subordinate devices remain inaccessible and DPC
    recovery fails with messages such as:
    
      i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
      intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
      pcieport 0000:89:02.0: AER: device recovery failed
    
    Fix it.
    
    Link: https://lore.kernel.org/r/9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de
    Tested-by: Ravi Kishore Koppuravuri <[email protected]>
    Signed-off-by: Lukas Wunner <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Mika Westerberg <[email protected]>
    Cc: [email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

PCI: Unify delay handling for reset and resume [+ + +]

Author: Lukas Wunner <[email protected]>
Date:   Sun Jan 15 09:20:32 2023 +0100

    PCI: Unify delay handling for reset and resume
    
    commit ac91e6980563ed53afadd925fa6585ffd2bc4a2c upstream.
    
    Sheng Bi reports that pci_bridge_secondary_bus_reset() may fail to wait
    for devices on the secondary bus to become accessible after reset:
    
    Although it does call pci_dev_wait(), it erroneously passes the bridge's
    pci_dev rather than that of a child.  The bridge of course is always
    accessible while its secondary bus is reset, so pci_dev_wait() returns
    immediately.
    
    Sheng Bi proposes introducing a new pci_bridge_secondary_bus_wait()
    function which is called from pci_bridge_secondary_bus_reset():
    
    https://lore.kernel.org/linux-pci/[email protected]/
    
    However we already have pci_bridge_wait_for_secondary_bus() which does
    almost exactly what we need.  So far it's only called on resume from
    D3cold (which implies a Fundamental Reset per PCIe r6.0 sec 5.8).
    Re-using it for Secondary Bus Resets is a leaner and more rational
    approach than introducing a new function.
    
    That only requires a few minor tweaks:
    
    - Amend pci_bridge_wait_for_secondary_bus() to await accessibility of
      the first device on the secondary bus by calling pci_dev_wait() after
      performing the prescribed delays.  pci_dev_wait() needs two parameters,
      a reset reason and a timeout, which callers must now pass to
      pci_bridge_wait_for_secondary_bus().  The timeout is 1 sec for resume
      (PCIe r6.0 sec 6.6.1) and 60 sec for reset (commit 821cdad5c46c ("PCI:
      Wait up to 60 seconds for device to become ready after FLR")).
      Introduce a PCI_RESET_WAIT macro for the 1 sec timeout.
    
    - Amend pci_bridge_wait_for_secondary_bus() to return 0 on success or
      -ENOTTY on error for consumption by pci_bridge_secondary_bus_reset().
    
    - Drop an unnecessary 1 sec delay from pci_reset_secondary_bus() which
      is now performed by pci_bridge_wait_for_secondary_bus().  A static
      delay this long is only necessary for Conventional PCI, so modern
      PCIe systems benefit from shorter reset times as a side effect.
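
    The calling convention described above (a caller-supplied timeout, and
    0 or -ENOTTY as the result) can be sketched with a simplified polling
    loop. The demo_* functions are hypothetical, and the doubling delay is
    an assumption of this sketch, not a statement about the kernel code:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical readiness probe: reports ready once *polls_left hits 0. */
static bool demo_dev_ready(int *polls_left)
{
	if (*polls_left > 0) {
		(*polls_left)--;
		return false;
	}
	return true;
}

/*
 * Poll the child device with a doubling delay until it is ready or the
 * caller-supplied timeout is exceeded.
 * Returns 0 on success or -ENOTTY on timeout.
 */
static int demo_wait_for_child(int *polls_left, int timeout_ms)
{
	int delay = 1;

	while (!demo_dev_ready(polls_left)) {
		if (delay > timeout_ms)
			return -ENOTTY;
		/* a real implementation would sleep for 'delay' ms here */
		delay *= 2;
	}
	return 0;
}
```

    A resume path would pass a short timeout and a reset path a long one,
    and both can propagate the return value unchanged.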
    
    Fixes: 6b2f1351af56 ("PCI: Wait for device to become ready after secondary bus reset")
    Link: https://lore.kernel.org/r/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de
    Reported-by: Sheng Bi <[email protected]>
    Tested-by: Ravi Kishore Koppuravuri <[email protected]>
    Signed-off-by: Lukas Wunner <[email protected]>
    Signed-off-by: Bjorn Helgaas <[email protected]>
    Reviewed-by: Mika Westerberg <[email protected]>
    Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
    Cc: [email protected] # v4.17+
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

qed/qed_dev: guard against a possible division by zero [+ + +]

Author: Daniil Tatianin <[email protected]>
Date:   Thu Mar 9 23:15:56 2023 +0300

    qed/qed_dev: guard against a possible division by zero
    
    [ Upstream commit 1a9dc5610ef89d807acdcfbff93a558f341a44da ]
    
    Previously we would divide total_left_rate by zero if num_vports
    happened to be 1 because non_requested_count is calculated as
    num_vports - req_count. Guard against this by validating num_vports at
    the beginning and returning an error otherwise.
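
    A minimal sketch of the guard, using a hypothetical helper
    demo_left_rate_per_vport() that is not the driver's function. The
    second check on the computed divisor is an extra precaution added for
    this sketch only:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Split the remaining rate evenly across the vports that did not
 * request a specific rate. Returns 0 and stores the share in *out,
 * or -EINVAL if the inputs would lead to a division by zero.
 */
static int demo_left_rate_per_vport(uint32_t total_left_rate,
				    uint8_t num_vports, uint8_t req_count,
				    uint32_t *out)
{
	uint32_t non_requested_count;

	/* Validate up front, as the commit message describes. */
	if (num_vports < 2)
		return -EINVAL;

	/* Extra precaution for this sketch: never divide by zero. */
	if (req_count >= num_vports)
		return -EINVAL;

	non_requested_count = (uint32_t)num_vports - req_count;
	*out = total_left_rate / non_requested_count;
	return 0;
}
```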
    
    Found by Linux Verification Center (linuxtesting.org) with the SVACE
    static analysis tool.
    
    Fixes: bcd197c81f63 ("qed: Add vport WFQ configuration APIs")
    Signed-off-by: Daniil Tatianin <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

qed/qed_mng_tlv: correctly zero out ->min instead of ->hour [+ + +]

Author: Daniil Tatianin <[email protected]>
Date:   Wed Mar 15 22:46:18 2023 +0300

    qed/qed_mng_tlv: correctly zero out ->min instead of ->hour
    
    [ Upstream commit 470efd68a4653d9819d391489886432cd31bcd0b ]
    
    This fixes an issue where ->hour would erroneously get zeroed out
    instead of ->min because of a bad copy paste.
    
    Found by Linux Verification Center (linuxtesting.org) with the SVACE
    static analysis tool.
    
    Fixes: f240b6882211 ("qed: Add support for processing fcoe tlv request.")
    Signed-off-by: Daniil Tatianin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

s390/ipl: add missing intersection check to ipl_report handling [+ + +]

Author: Sven Schnelle <[email protected]>
Date:   Tue Mar 7 14:35:23 2023 +0100

    s390/ipl: add missing intersection check to ipl_report handling
    
    commit a52e5cdbe8016d4e3e6322fd93d71afddb9a5af9 upstream.
    
    The code which handles the ipl report is searching for a free location
    in memory where it could copy the component and certificate entries to.
    It checks for intersection between the sections required for the kernel
    and the component/certificate data area, but fails to check whether
    the data structures linking these data areas together intersect.
    
    This might cause the iplreport copy code to overwrite the iplreport
    itself. Fix this by adding two additional intersection checks.
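
    An intersection check between two memory ranges can be sketched as
    below. demo_intersects() is a hypothetical helper written for
    illustration; it treats each range as half-open, [addr, addr + size):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [addr0, addr0 + size0) and [addr1, addr1 + size1)
 * share at least one byte. Overflow of addr + size is not handled.
 */
static bool demo_intersects(uint64_t addr0, uint64_t size0,
			    uint64_t addr1, uint64_t size1)
{
	return addr0 + size0 > addr1 && addr1 + size1 > addr0;
}
```

    A candidate destination is acceptable only if this check is false for
    every area that must be preserved, including the linking structures.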
    
    Cc: <[email protected]>
    Fixes: 9641b8cc733f ("s390/ipl: read IPL report at early boot")
    Signed-off-by: Sven Schnelle <[email protected]>
    Reviewed-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Vasily Gorbik <[email protected]>
    Signed-off-by: Sven Schnelle <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

scsi: core: Fix a comment in function scsi_host_dev_release() [+ + +]

Author: Xiang Chen <[email protected]>
Date:   Mon May 10 19:35:26 2021 +0800

    scsi: core: Fix a comment in function scsi_host_dev_release()
    
    [ Upstream commit 2dde5c8d912efea43be94d6a83ac9cb74879fa12 ]
    
    Commit 3be8828fc507 ("scsi: core: Avoid that ATA error handling can
    trigger a kernel hang or oops") moved rcu to scsi_cmnd instead of
    shost. Modify "shost->rcu" to "scmd->rcu" in a comment.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Xiang Chen <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Stable-dep-of: be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression")
    Signed-off-by: Sasha Levin <[email protected]>

scsi: core: Fix a procfs host directory removal regression [+ + +]

Author: Bart Van Assche <[email protected]>
Date:   Tue Mar 7 13:44:28 2023 -0800

    scsi: core: Fix a procfs host directory removal regression
    
    [ Upstream commit be03df3d4bfe7e8866d4aa43d62e648ffe884f5f ]
    
    scsi_proc_hostdir_rm() decreases a reference counter and hence must only be
    called once per host that is removed. This change does not require a
    scsi_add_host_with_dma() change since scsi_add_host_with_dma() will return
    0 (success) if scsi_proc_host_add() is called.
    
    Fixes: fc663711b944 ("scsi: core: Remove the /proc/scsi/${proc_name} directory earlier")
    Cc: John Garry <[email protected]>
    Reported-by: John Garry <[email protected]>
    Link: https://lore.kernel.org/all/[email protected]/
    Reported-by: [email protected]
    Link: https://lore.kernel.org/linux-scsi/[email protected]/
    Signed-off-by: Bart Van Assche <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: John Garry <[email protected]>
    Tested-by: Shin'ichiro Kawasaki <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

scsi: mpt3sas: Fix NULL pointer access in mpt3sas_transport_port_add() [+ + +]

Author: Wenchao Hao <[email protected]>
Date:   Sat Feb 25 18:01:36 2023 +0800

    scsi: mpt3sas: Fix NULL pointer access in mpt3sas_transport_port_add()
    
    [ Upstream commit d3c57724f1569311e4b81e98fad0931028b9bdcd ]
    
    Port is allocated by sas_port_alloc_num() and rphy is allocated by either
    sas_end_device_alloc() or sas_expander_alloc(), all of which may return
    NULL. So we need to check the rphy to avoid possible NULL pointer access.
    
    If sas_rphy_add() fails, rphy is set to NULL. The lines that follow
    would then access rphy, which would also result in a NULL pointer
    access.
    
    Fixes: 78316e9dfc24 ("scsi: mpt3sas: Fix possible resource leaks in mpt3sas_transport_port_add()")
    Signed-off-by: Wenchao Hao <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Acked-by: Sathya Prakash Veerichetty <[email protected]>
    Signed-off-by: Martin K. Petersen <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

selftests: net: devlink_port_split.py: skip test if no suitable device available [+ + +]

Author: Po-Hsu Lin <[email protected]>
Date:   Thu Mar 16 00:53:53 2023 +0800

    selftests: net: devlink_port_split.py: skip test if no suitable device available
    
    [ Upstream commit 24994513ad13ff2c47ba91d2b5df82c3d496c370 ]
    
    The `devlink -j port show` command output may not contain the "flavour"
    key, an example from Ubuntu 22.10 s390x LPAR(5.19.0-37-generic), with
    mlx4 driver and iproute2-5.15.0:
      {"port":{"pci/0001:00:00.0/1":{"type":"eth","netdev":"ens301"},
               "pci/0001:00:00.0/2":{"type":"eth","netdev":"ens301d1"},
               "pci/0002:00:00.0/1":{"type":"eth","netdev":"ens317"},
               "pci/0002:00:00.0/2":{"type":"eth","netdev":"ens317d1"}}}
    
    This will cause a KeyError exception.
    
    Create a validate_devlink_output() to check for this "flavour" from
    devlink command output to avoid this KeyError exception. Also let
    it handle the check for `devlink -j dev show` output in main().
    
    Apart from this, if the test is not started because the max lanes of
    the designated device is 0, the script still returns 0, causing a
    false-negative test result.
    
    Use a found_max_lanes flag to determine if these tests were skipped
    for this reason and return KSFT_SKIP to make it clearer.
    
    Link: https://bugs.launchpad.net/bugs/1937133
    Fixes: f3348a82e727 ("selftests: net: Add port split test")
    Signed-off-by: Po-Hsu Lin <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

serial: 8250_em: Fix UART port type [+ + +]

Author: Biju Das <[email protected]>
Date:   Mon Feb 27 11:41:46 2023 +0000

    serial: 8250_em: Fix UART port type
    
    commit 32e293be736b853f168cd065d9cbc1b0c69f545d upstream.
    
    As per the HW manual for EMEV2 "R19UH0040EJ0400 Rev.4.00", the UART
    IP found on the EMMA Mobile SoC is register-compatible with the
    general-purpose 16750 UART chip. Fix the UART port type as 16750 and
    enable 64-byte FIFO support.
    
    Fixes: 22886ee96895 ("serial8250-em: Emma Mobile UART driver V2")
    Cc: [email protected]
    Signed-off-by: Biju Das <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

sh: intc: Avoid spurious sizeof-pointer-div warning [+ + +]

Author: Michael Karcher <[email protected]>
Date:   Tue Jan 24 22:48:16 2023 +0100

    sh: intc: Avoid spurious sizeof-pointer-div warning
    
    [ Upstream commit 250870824c1cf199b032b1ef889c8e8d69d9123a ]
    
    GCC warns about the pattern sizeof(void*)/sizeof(void), as it looks like
    the abuse of a pattern to calculate the array size. This pattern appears
    in the unevaluated part of the ternary operator in _INTC_ARRAY if the
    parameter is NULL.
    
    The replacement uses an alternate approach to return 0 in case of NULL
    which does not generate the pattern sizeof(void*)/sizeof(void), but still
    emits the warning if _INTC_ARRAY is called with a nonarray parameter.
    
    This patch is required for successful compilation with -Werror enabled.
    
    The idea to use _Generic for type distinction is taken from Comment #7
    in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483 by Jakub Jelinek
    
    Signed-off-by: Michael Karcher <[email protected]>
    Acked-by: Randy Dunlap <[email protected]> # build-tested
    Link: https://lore.kernel.org/r/619fa552-c988-35e5-b1d7-fe256c46a272@mkarcher.dialup.fu-berlin.de
    Signed-off-by: John Paul Adrian Glaubitz <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tcp: tcp_make_synack() can be called from process context [+ + +]

Author: Breno Leitao <[email protected]>
Date:   Wed Mar 8 11:07:45 2023 -0800

    tcp: tcp_make_synack() can be called from process context
    
    [ Upstream commit bced3f7db95ff2e6ca29dc4d1c9751ab5e736a09 ]
    
    tcp_rtx_synack() now could be called in process context as explained in
    0a375c822497 ("tcp: tcp_rtx_synack() can be called from process
    context").
    
    tcp_rtx_synack() might call tcp_make_synack(), which will touch per-CPU
    variables with preemption enabled. This causes the following BUG:
    
        BUG: using __this_cpu_add() in preemptible [00000000] code: ThriftIO1/5464
        caller is tcp_make_synack+0x841/0xac0
        Call Trace:
         <TASK>
         dump_stack_lvl+0x10d/0x1a0
         check_preemption_disabled+0x104/0x110
         tcp_make_synack+0x841/0xac0
         tcp_v6_send_synack+0x5c/0x450
         tcp_rtx_synack+0xeb/0x1f0
         inet_rtx_syn_ack+0x34/0x60
         tcp_check_req+0x3af/0x9e0
         tcp_rcv_state_process+0x59b/0x2030
         tcp_v6_do_rcv+0x5f5/0x700
         release_sock+0x3a/0xf0
         tcp_sendmsg+0x33/0x40
         ____sys_sendmsg+0x2f2/0x490
         __sys_sendmsg+0x184/0x230
         do_syscall_64+0x3d/0x90
    
    Avoid calling __TCP_INC_STATS(), which will touch per-CPU variables.
    Use TCP_INC_STATS(), which is safe to call from process context.
    
    Fixes: 8336886f786f ("tcp: TCP Fast Open Server - support TFO listeners")
    Signed-off-by: Breno Leitao <[email protected]>
    Reviewed-by: Eric Dumazet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

tracing: Check field value in hist_field_name() [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Wed Mar 1 20:00:53 2023 -0500

    tracing: Check field value in hist_field_name()
    
    commit 9f116f76fa8c04c81aef33ad870dbf9a158e5b70 upstream.
    
    The function hist_field_name() cannot handle being passed a NULL field
    parameter. It should never be NULL, but due to a previous bug, NULL was
    passed to the function and the kernel crashed due to a NULL dereference.
    Mark Rutland reported this to me on IRC.
    
    The bug was fixed, but to prevent future bugs from crashing the kernel,
    check the field and add a WARN_ON() if it is NULL.
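
    The defensive check can be sketched in userspace as follows. The
    demo_* names are hypothetical, and a message on stderr stands in for
    the kernel's warning:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a histogram field. */
struct demo_field {
	const char *name;
};

/* Return the field's name, or an empty string if there is none. */
static const char *demo_field_name(const struct demo_field *field)
{
	if (field == NULL) {
		/* Should never happen; warn instead of crashing. */
		fprintf(stderr, "warning: NULL field passed\n");
		return "";
	}
	return field->name ? field->name : "";
}
```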
    
    Link: https://lkml.kernel.org/r/[email protected]
    
    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Andrew Morton <[email protected]>
    Reported-by: Mark Rutland <[email protected]>
    Fixes: c6afad49d127f ("tracing: Add hist trigger 'sym' and 'sym-offset' modifiers")
    Tested-by: Mark Rutland <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tracing: Make splice_read available again [+ + +]

Author: Sung-hun Kim <[email protected]>
Date:   Tue Mar 14 10:37:07 2023 +0900

    tracing: Make splice_read available again
    
    commit e400be674a1a40e9dcb2e95f84d6c1fd2d88f31d upstream.
    
    Since commit 36e2c7421f02 ("fs: don't allow splice read/write
    without explicit ops") was applied to the kernel, splice() and
    sendfile() calls on the trace file
    (/sys/kernel/debug/tracing/trace) return EINVAL.
    
    This patch restores these system calls by initializing splice_read
    in file_operations of the trace file. This patch only enables such
    functionalities for the read case.
    
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
    Signed-off-by: Sung-hun Kim <[email protected]>
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tracing: Make tracepoint lockdep check actually test something [+ + +]

Author: Steven Rostedt (Google) <[email protected]>
Date:   Fri Mar 10 17:28:56 2023 -0500

    tracing: Make tracepoint lockdep check actually test something
    
    commit c2679254b9c9980d9045f0f722cf093a2b1f7590 upstream.
    
    A while ago the trace events had the following:
    
       rcu_read_lock_sched_notrace();
       rcu_dereference_sched(...);
       rcu_read_unlock_sched_notrace();
    
    If the tracepoint is enabled, it could trigger RCU issues if called in
    the wrong place. And this warning was only triggered if lockdep was
    enabled. If the tracepoint was never enabled with lockdep, the bug would
    not be caught. To handle this, the above sequence was done when lockdep
    was enabled regardless if the tracepoint was enabled or not (although the
    always enabled code really didn't do anything, it would still trigger a
    warning).
    
    But a lot has changed since that lockdep code was added. One is, that
    sequence no longer triggers any warning. Another is, the tracepoint when
    enabled doesn't even do that sequence anymore.
    
    The main check we care about today is whether RCU is "watching" or not.
    So if lockdep is enabled, always check if rcu_is_watching() which will
    trigger a warning if it is not (tracepoints require RCU to be watching).
    
    Note, that old sequence did add a bit of overhead when lockdep was enabled,
    and with the latest kernel updates, would cause the system to slow down
    enough to trigger kernel "stalled" warnings.
    
    Link: http://lore.kernel.org/lkml/[email protected]
    Link: http://lore.kernel.org/lkml/[email protected]
    Link: https://lore.kernel.org/lkml/[email protected]/
    Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
    
    Cc: [email protected]
    Cc: Masami Hiramatsu <[email protected]>
    Cc: Dave Hansen <[email protected]>
    Cc: "Paul E. McKenney" <[email protected]>
    Cc: Mathieu Desnoyers <[email protected]>
    Cc: Joel Fernandes <[email protected]>
    Acked-by: Peter Zijlstra (Intel) <[email protected]>
    Acked-by: Paul E. McKenney <[email protected]>
    Fixes: e6753f23d961 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
    Signed-off-by: Steven Rostedt (Google) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted [+ + +]

Author: Sherry Sun <[email protected]>
Date:   Thu Feb 23 17:39:41 2023 +0800

    tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted
    
    commit 2411fd94ceaa6e11326e95d6ebf876cbfed28d23 upstream.
    
    According to the LPUART RM, the Transmission Complete Flag becomes 0
    when a break character is queued by writing 1 to CTRL[SBK], so we need
    to skip waiting for transmission complete when UARTCTRL_SBK is
    asserted; otherwise the kernel may get stuck here.
    And actually set_termios() adds transmission completion waiting to avoid
    data loss or data breakage when changing the baud rate, but we don't
    need to worry about this when queuing break characters.
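
    The control flow can be sketched as below. The demo_* names and bit
    positions are placeholders chosen for this sketch, and an array of
    sampled status values stands in for repeated register reads:

```c
#include <stdint.h>

/* Placeholder bit positions for this sketch only. */
#define DEMO_CTRL_SBK (1u << 16) /* send break */
#define DEMO_STAT_TC  (1u << 22) /* transmission complete */

/*
 * Wait for transmission complete unless a break is being sent.
 * Returns the number of status polls performed (0 when the wait is
 * skipped), or -1 if the flag never showed up in the samples.
 */
static int demo_wait_tx_complete(uint32_t ctrl,
				 const uint32_t *stat_samples, int nsamples)
{
	int i;

	/* With SBK set the TC flag stays 0, so waiting would never end. */
	if (ctrl & DEMO_CTRL_SBK)
		return 0;

	for (i = 0; i < nsamples; i++) {
		if (stat_samples[i] & DEMO_STAT_TC)
			return i + 1;
	}
	return -1;
}
```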
    
    Signed-off-by: Sherry Sun <[email protected]>
    Cc: stable <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/mce: Make sure logged MCEs are processed after sysfs update [+ + +]

Author: Yazen Ghannam <[email protected]>
Date:   Wed Mar 1 22:14:20 2023 +0000

    x86/mce: Make sure logged MCEs are processed after sysfs update
    
    commit 4783b9cb374af02d49740e00e2da19fd4ed6dec4 upstream.
    
    A recent change introduced a flag to queue up errors found during
    boot-time polling. These errors will be processed during late init once
    the MCE subsystem is fully set up.
    
    A number of sysfs updates call mce_restart() which goes through a subset
    of the CPU init flow. This includes polling MCA banks and logging any
    errors found. Since the same function is used as boot-time polling,
    errors will be queued. However, the system is now past late init, so the
    errors will remain queued until another error is found and the workqueue
    is triggered.
    
    Call mce_schedule_work() at the end of mce_restart() so that queued
    errors are processed.
    
    Fixes: 3bff147b187d ("x86/mce: Defer processing of early errors")
    Signed-off-by: Yazen Ghannam <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Reviewed-by: Tony Luck <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

x86/mm: Fix use of uninitialized buffer in sme_enable() [+ + +]

Author: Nikita Zhandarovich <[email protected]>
Date:   Mon Mar 6 08:06:56 2023 -0800

    x86/mm: Fix use of uninitialized buffer in sme_enable()
    
    commit cbebd68f59f03633469f3ecf9bea99cd6cce3854 upstream.
    
    cmdline_find_option() may fail before doing any initialization of
    the buffer array. This may lead to unpredictable results when the same
    buffer is used later in calls to strncmp() function.  Fix the issue by
    returning early if cmdline_find_option() returns an error.
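
    A userspace sketch of the early return follows. demo_find_option() is
    a hypothetical lookup written for this example, not the kernel's
    cmdline_find_option(); like the case described above, it can fail
    without writing to the buffer:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup: copy the value of "key=value" from cmdline into
 * buf. Returns the value length, or -1 WITHOUT touching buf on failure.
 */
static int demo_find_option(const char *cmdline, const char *key,
			    char *buf, size_t bufsize)
{
	const char *p = cmdline;
	size_t klen;

	if (!cmdline || !key || !buf || bufsize == 0)
		return -1;
	klen = strlen(key);
	if (klen == 0)
		return -1;

	while ((p = strstr(p, key)) != NULL) {
		int at_start = (p == cmdline) || (p[-1] == ' ');

		if (at_start && p[klen] == '=') {
			const char *v = p + klen + 1;
			size_t n = strcspn(v, " ");

			if (n >= bufsize)
				n = bufsize - 1;
			memcpy(buf, v, n);
			buf[n] = '\0';
			return (int)n;
		}
		p += klen;
	}
	return -1;
}

/* Returns 1 if "mem_encrypt=on" is present, 0 otherwise. */
static int demo_option_is_on(const char *cmdline)
{
	char buf[16]; /* deliberately left uninitialized */

	if (demo_find_option(cmdline, "mem_encrypt", buf, sizeof(buf)) < 0)
		return 0; /* return early: buf was never written */

	return strncmp(buf, "on", sizeof(buf)) == 0;
}
```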
    
    Found by Linux Verification Center (linuxtesting.org) with static
    analysis tool SVACE.
    
    Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption")
    Signed-off-by: Nikita Zhandarovich <[email protected]>
    Signed-off-by: Borislav Petkov (AMD) <[email protected]>
    Acked-by: Tom Lendacky <[email protected]>
    Cc: <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfrm: Allow transport-mode states with AF_UNSPEC selector [+ + +]

Author: Herbert Xu <[email protected]>
Date:   Tue Feb 21 13:54:00 2023 +0800

    xfrm: Allow transport-mode states with AF_UNSPEC selector
    
    [ Upstream commit c276a706ea1f51cf9723ed8484feceaf961b8f89 ]
    
    xfrm state selectors are matched against the inner-most flow
    which can be of any address family.  Therefore middle states
    in nested configurations need to carry a wildcard selector in
    order to work at all.
    
    However, this is currently forbidden for transport-mode states.
    
    Fix this by removing the unnecessary check.
    
    Fixes: 13996378e658 ("[IPSEC]: Rename mode to outer_mode and add inner_mode")
    Reported-by: David George <[email protected]>
    Signed-off-by: Herbert Xu <[email protected]>
    Signed-off-by: Steffen Klassert <[email protected]>
    Signed-off-by: Sasha Levin <[email protected]>

xfs: don't assert fail on perag references on teardown [+ + +]

Author: Dave Chinner <[email protected]>
Date:   Sat Mar 18 12:15:15 2023 +0200

    xfs: don't assert fail on perag references on teardown
    
    commit 5b55cbc2d72632e874e50d2e36bce608e55aaaea upstream.
    
    [backport for 5.10.y, prior to perag refactoring in v5.14]
    
    Not fatal, the assert is there to catch developer attention. I'm
    seeing this occasionally during recoveryloop testing after a
    shutdown, and I don't want this to stop an overnight recoveryloop
    run as it is currently doing.
    
    Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
    corruption report into the log and cause a test failure that way,
    but it won't stop the machine dead.
    
    Signed-off-by: Dave Chinner <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Signed-off-by: Dave Chinner <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfs: don't leak btree cursor when insrec fails after a split [+ + +]

Author: Darrick J. Wong <[email protected]>
Date:   Sat Mar 18 12:15:17 2023 +0200

    xfs: don't leak btree cursor when insrec fails after a split
    
    commit a54f78def73d847cb060b18c4e4a3d1d26c9ca6d upstream.
    
    The recent patch to improve btree cycle checking caused a regression
    when I rebased the in-memory btree branch atop the 5.19 for-next branch,
    because in-memory short-pointer btrees do not have AG numbers.  This
    produced the following complaint from kmemleak:
    
    unreferenced object 0xffff88803d47dde8 (size 264):
      comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
      hex dump (first 32 bytes):
        90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff  .M..............
        e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff  .D:.............
      backtrace:
        [<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
        [<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
        [<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
        [<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
        [<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
        [<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
        [<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
        [<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
        [<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
        [<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
        [<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
        [<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
        [<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
        [<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
        [<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
        [<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]
    
    I noticed that xfs_btree_insrec has a bunch of debug code that returns
    out of the function immediately, without freeing the "new" btree cursor
    that can be returned when _make_block_unfull calls xfs_btree_split.  Fix
    the error return in this function to free the btree cursor.
    
    Signed-off-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>
    Signed-off-by: Dave Chinner <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
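
The leak pattern described above can be sketched in user-space C. This is a minimal model, not the XFS code: struct cursor, cursor_dup(), cursor_free(), insert_step() and the live_cursors counter are all hypothetical names invented for illustration. It only shows why an early return after a second cursor has been allocated needs to go through a cleanup label.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a btree cursor; not the real struct xfs_btree_cur. */
struct cursor {
	int level;
};

/* Counts live cursors so that a leak becomes observable. */
static int live_cursors;

static struct cursor *cursor_dup(void)
{
	struct cursor *cur = malloc(sizeof(*cur));

	if (cur) {
		cur->level = 0;
		live_cursors++;
	}
	return cur;
}

static void cursor_free(struct cursor *cur)
{
	if (cur) {
		live_cursors--;
		free(cur);
	}
}

/*
 * Models an insert step that allocates a second cursor (as a split would)
 * and then runs a check that may fail. On success the new cursor is handed
 * to the caller through *curp; on failure it is freed before returning.
 */
static int insert_step(int fail_check, struct cursor **curp)
{
	struct cursor *ncur = cursor_dup();
	int error = 0;

	if (!ncur)
		return -1;

	if (fail_check) {
		error = -1;
		goto error0;	/* a bare "return error" here would leak ncur */
	}

	*curp = ncur;
	return 0;

error0:
	cursor_free(ncur);
	return error;
}
```

In this model, a failing call leaves live_cursors unchanged, while a successful call transfers ownership of the new cursor to the caller, who is then responsible for freeing it.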

xfs: fallocate() should call file_modified() [+ + +]

Author: Dave Chinner <[email protected]>
Date:   Sat Mar 18 12:15:19 2023 +0200

    xfs: fallocate() should call file_modified()
    
    commit fbe7e520036583a783b13ff9744e35c2a329d9a4 upstream.
    
    In XFS, we always update the inode change and modification time when
    any fallocate() operation succeeds.  Furthermore, as various
    fallocate modes can change the file contents (extending EOF,
    punching holes, zeroing things, shifting extents), we should drop
    file privileges like suid just like we do for a regular write().
    There's already a VFS helper that figures all this out for us, so
    use that.
    
    The net effect of this is that we no longer drop suid/sgid if the
    caller is root, but we also now drop file capabilities.
    
    We also move the xfs_update_prealloc_flags() function so that it now
    is only called by the scope that needs to set the prealloc flag.
    
    Based on a patch from Darrick Wong.
    
    Signed-off-by: Dave Chinner <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfs: purge dquots after inode walk fails during quotacheck [+ + +]

Author: Darrick J. Wong <[email protected]>
Date:   Sat Mar 18 12:15:16 2023 +0200

    xfs: purge dquots after inode walk fails during quotacheck
    
    commit 86d40f1e49e9a909d25c35ba01bea80dbcd758cb upstream.
    
    [add XFS_QMOPT_QUOTALL flag to xfs_qm_dqpurge_all() for 5.10.y backport]
    
    xfs/434 and xfs/436 have been reporting occasional memory leaks of
    xfs_dquot objects.  These tests themselves were the messenger, not the
    culprit, since they unload the xfs module, which trips the slub
    debugging code while tearing down all the xfs slab caches:
    
    =============================================================================
    BUG xfs_dquot (Tainted: G        W        ): Objects remaining in xfs_dquot on __kmem_cache_shutdown()
    -----------------------------------------------------------------------------
    
    Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
    CPU: 0 PID: 3953166 Comm: modprobe Tainted: G        W         5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
    
    Since we don't generally rmmod the xfs module between fstests, this
    means that xfs/434 is really just the canary in the coal mine --
    something leaked a dquot, but we don't know who.  After days of pounding
    on fstests with kmemleak enabled, I finally got it to spit this out:
    
    unreferenced object 0xffff8880465654c0 (size 536):
      comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s)
      hex dump (first 32 bytes):
        60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff  `JVF....X..\....
        00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00  ..RI............
      backtrace:
        [<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs]
        [<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs]
        [<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs]
        [<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs]
        [<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs]
        [<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs]
        [<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs]
        [<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs]
        [<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs]
        [<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs]
        [<ffffffff81170bb2>] process_one_work+0x672/0x1040
        [<ffffffff81171b1b>] worker_thread+0x59b/0xec0
        [<ffffffff8118711e>] kthread+0x29e/0x340
        [<ffffffff810032bf>] ret_from_fork+0x1f/0x30
    
    Now we know that quotacheck is at fault, but even this report was
    canaryish -- it was triggered by xfs/494, which doesn't actually mount
    any filesystems.  (kmemleak can be a little slow to notice leaks, even
    with fstests repeatedly whacking it to look for them.)  Looking at the
    *previous* fstest, however, showed that the test run before xfs/494 was
    xfs/117.  The tipoff to the problem is in this excerpt from dmesg:
    
    XFS (sda4): Quotacheck needed: Please wait.
    XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode
    XFS (sda4): Unmount and run xfs_repair
    XFS (sda4): First 128 bytes of corrupted metadata buffer:
    00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00  IN..............
    00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68  ..........WTT.Lh
    00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00  ..}.m...4.}.m...
    00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00  4.}.m...........
    00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab  ................
    00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03  .....W{.........
    00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08  ................
    XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas.
    
    The dinode verifier decided that the inode was corrupt, which causes
    iget to return with EFSCORRUPTED.  Since this happened during
    quotacheck, it is obvious that the kernel aborted the inode walk on
    account of the corruption error and disabled quotas.  Unfortunately, we
    neglect to purge the dquot cache before doing that, which is how the
    dquots leaked.
    
    The problems started 10 years ago in commit b84a3a, when the dquot lists
    were converted to a radix tree, but the error handling behavior was not
    correctly preserved -- in that commit, if the bulkstat failed and
    usrquota was enabled, the bulkstat failure code would be overwritten by
    the result of flushing all the dquots to disk.  As long as that
    succeeds, we'd continue the quota mount as if everything were ok, but
    instead we're now operating with a corrupt inode and incorrect quota
    usage counts.  I didn't notice this bug in 2019 when I wrote commit
    ebd126a, which changed quotacheck to skip the dqflush when the scan
    doesn't complete due to inode walk failures.
    
    Introduced-by: b84a3a96751f ("xfs: remove the per-filesystem list of dquots")
    Fixes: ebd126a651f8 ("xfs: convert quotacheck to use the new iwalk functions")
    Signed-off-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>
    Signed-off-by: Dave Chinner <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
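
The three error-handling behaviours this commit message walks through can be modelled in a few lines of user-space C. This is a rough sketch under stated assumptions: struct quota_state, flush_dquots(), purge_dquots() and the three quotacheck_*() functions are hypothetical names, the flush is assumed to succeed, and the value 117 is taken only from the "Error -117" line in the dmesg excerpt. It is not the XFS implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the corruption error seen in the dmesg excerpt. */
#define MODEL_EFSCORRUPTED 117

/* Hypothetical model of the in-memory quota cache. */
struct quota_state {
	int cached_dquots;
};

/* Assumed to succeed in this model. */
static int flush_dquots(struct quota_state *qs)
{
	(void)qs;
	return 0;
}

static void purge_dquots(struct quota_state *qs)
{
	qs->cached_dquots = 0;
}

/* Oldest behaviour: with usrquota on, the flush result overwrites the walk error. */
static int quotacheck_overwrite(struct quota_state *qs, int walk_error, bool usrquota)
{
	int error = walk_error;

	if (usrquota)
		error = flush_dquots(qs);
	return error;
}

/* Later behaviour: the flush is skipped on walk failure, but the cache is left populated. */
static int quotacheck_no_purge(struct quota_state *qs, int walk_error)
{
	if (walk_error)
		return walk_error;
	return flush_dquots(qs);
}

/* Behaviour after this fix: purge the cached dquots before reporting the walk error. */
static int quotacheck_purge(struct quota_state *qs, int walk_error)
{
	if (walk_error) {
		purge_dquots(qs);
		return walk_error;
	}
	return flush_dquots(qs);
}
```

In the model, only the last variant both reports the walk error and leaves no cached dquots behind, which corresponds to the leak that the slab debugging code flagged at module unload.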

xfs: remove XFS_PREALLOC_SYNC [+ + +]

Author: Dave Chinner <[email protected]>
Date:   Sat Mar 18 12:15:18 2023 +0200

    xfs: remove XFS_PREALLOC_SYNC
    
    commit 472c6e46f589c26057596dcba160712a5b3e02c5 upstream.
    
    [partial backport for dependency -
     xfs_ioc_space() still uses XFS_PREALLOC_SYNC]
    
    Callers can achieve the same thing by calling xfs_log_force_inode()
    after making their modifications. There is no need for
    xfs_update_prealloc_flags() to do this.
    
    Signed-off-by: Dave Chinner <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfs: remove xfs_setattr_time() declaration [+ + +]

Author: Gaosheng Cui <[email protected]>
Date:   Sat Mar 18 12:15:29 2023 +0200

    xfs: remove xfs_setattr_time() declaration
    
    commit b0463b9dd7030a766133ad2f1571f97f204d7bdf upstream.
    
    The definition of xfs_setattr_time() was removed in
    commit e014f37db1a2 ("xfs: use setattr_copy to set vfs inode
    attributes"), so remove the stale declaration as well.
    
    Signed-off-by: Gaosheng Cui <[email protected]>
    Reviewed-by: Carlos Maiolino <[email protected]>
    Signed-off-by: Dave Chinner <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfs: set prealloc flag in xfs_alloc_file_space() [+ + +]

Author: Dave Chinner <[email protected]>
Date:   Sat Mar 18 12:15:20 2023 +0200

    xfs: set prealloc flag in xfs_alloc_file_space()
    
    commit 0b02c8c0d75a738c98c35f02efb36217c170d78c upstream.
    
    [backport for 5.10.y]
    
    Now that we only call xfs_update_prealloc_flags() from
    xfs_file_fallocate() in the case where we need to set the
    preallocation flag, do this in xfs_alloc_file_space() where we
    already have the inode joined into a transaction and get
    rid of the call to xfs_update_prealloc_flags() from the fallocate
    code.
    
    This also means that we now correctly avoid setting the
    XFS_DIFLAG_PREALLOC flag when xfs_is_always_cow_inode() is true, as
    these inodes will never have preallocated extents.
    
    Signed-off-by: Dave Chinner <[email protected]>
    Reviewed-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Darrick J. Wong <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

xfs: use setattr_copy to set vfs inode attributes [+ + +]

Author: Darrick J. Wong <[email protected]>
Date:   Sat Mar 18 12:15:21 2023 +0200

    xfs: use setattr_copy to set vfs inode attributes
    
    commit e014f37db1a2d109afa750042ac4d69cf3e3d88e upstream.
    
    [remove userns argument of setattr_copy() for 5.10.y backport]
    
    Filipe Manana pointed out that XFS' behavior w.r.t. setuid/setgid
    revocation isn't consistent with btrfs[1] or ext4.  Those two
    filesystems use the VFS function setattr_copy to convey certain
    attributes from struct iattr into the VFS inode structure.
    
    Andrey Zhadchenko reported[2] that XFS uses the wrong user namespace to
    decide if it should clear setgid and setuid on a file attribute update.
    This is a second symptom of the problem that Filipe noticed.
    
    XFS, on the other hand, open-codes setattr_copy in xfs_setattr_mode,
    xfs_setattr_nonsize, and xfs_setattr_time.  Regrettably, setattr_copy is
    /not/ a simple copy function; it contains additional logic to clear the
    setgid bit when setting the mode, and XFS' version no longer matches.
    
    The VFS implements its own setuid/setgid stripping logic, which
    establishes consistent behavior.  It's a tad unfortunate that it's
    scattered across notify_change, should_remove_suid, and setattr_copy but
    XFS should really follow the Linux VFS.  Adapt XFS to use the VFS
    functions and get rid of the old functions.
    
    [1] https://lore.kernel.org/fstests/CAL3q7H47iNQ=Wmk83WcGB-KBJVOEtR9+qGczzCeXJ9Y2KCV25Q@mail.gmail.com/
    [2] https://lore.kernel.org/linux-xfs/[email protected]/
    
    Fixes: 7fa294c8991c ("userns: Allow chown and setgid preservation")
    Signed-off-by: Darrick J. Wong <[email protected]>
    Reviewed-by: Dave Chinner <[email protected]>
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Christian Brauner <[email protected]>
    Signed-off-by: Amir Goldstein <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
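
The "additional logic" mentioned above, clearing the setgid bit when the mode is set, can be illustrated with a small user-space model. This is a sketch, not kernel code: model_new_mode() and its two boolean parameters are hypothetical, and they stand in for the group-membership and CAP_FSETID-style privilege checks that the VFS is assumed to perform here.

```c
#include <stdbool.h>
#include <sys/stat.h>	/* S_ISGID */
#include <sys/types.h>	/* mode_t */

/*
 * Models the mode-update step: the setgid bit survives only if the caller
 * belongs to the file's group or holds the relevant privilege. Both flags
 * are hypothetical inputs replacing the kernel's own checks.
 */
static mode_t model_new_mode(mode_t requested, bool caller_in_group,
			     bool has_fsetid)
{
	if (!caller_in_group && !has_fsetid)
		requested &= ~(mode_t)S_ISGID;
	return requested;
}
```

A filesystem that open-codes a plain copy of the requested mode would skip this masking step, which is the kind of divergence the commit message describes.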

Changelog for Linux 5.10.176