Список изменений в Linux 5.10.218

 
btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks() [+ + +]
Author: Dominique Martinet <[email protected]>
Date:   Fri Apr 19 11:22:48 2024 +0900

    btrfs: add missing mutex_unlock in btrfs_relocate_sys_chunks()
    
    commit 9af503d91298c3f2945e73703f0e00995be08c30 upstream.
    
    The previous patch that replaced BUG_ON by error handling forgot to
    unlock the mutex in the error path.
    
    Link: https://lore.kernel.org/all/Zh%[email protected]
    Reported-by: Pavel Machek <[email protected]>
    Fixes: 7411055db5ce ("btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()")
    CC: [email protected]
    Reviewed-by: Pavel Machek <[email protected]>
    Signed-off-by: Dominique Martinet <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Signed-off-by: Dominique Martinet <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
docs: kernel_include.py: Cope with docutils 0.21 [+ + +]
Author: Akira Yokosawa <[email protected]>
Date:   Wed May 1 12:16:11 2024 +0900

    docs: kernel_include.py: Cope with docutils 0.21
    
    commit d43ddd5c91802a46354fa4c4381416ef760676e2 upstream.
    
    Running "make htmldocs" on a newly installed Sphinx 7.3.7 ends up in
    a build error:
    
        Sphinx parallel build error:
        AttributeError: module 'docutils.nodes' has no attribute 'reprunicode'
    
    docutils 0.21 has removed nodes.reprunicode, quote from release note [1]:
    
      * Removed objects:
    
        docutils.nodes.reprunicode, docutils.nodes.ensure_str()
            Python 2 compatibility hacks
    
    Sphinx 7.3.0 supports docutils 0.21 [2]:
    
    kernel_include.py, whose origin is misc.py of docutils, uses reprunicode.
    
    Upstream docutils removed the offending line from the corresponding file
    (docutils/docutils/parsers/rst/directives/misc.py) in January 2022.
    Quoting the changelog [3]:
    
        Deprecate `nodes.reprunicode` and `nodes.ensure_str()`.
    
        Drop uses of the deprecated constructs (not required with Python 3).
    
    Do the same for kernel_include.py.
    
    Tested against:
      - Sphinx 2.4.5 (docutils 0.17.1)
      - Sphinx 3.4.3 (docutils 0.17.1)
      - Sphinx 5.3.0 (docutils 0.18.1)
      - Sphinx 6.2.1 (docutils 0.19)
      - Sphinx 7.2.6 (docutils 0.20.1)
      - Sphinx 7.3.7 (docutils 0.21.2)
    
    Link: http://www.docutils.org/RELEASE-NOTES.html#release-0-21-2024-04-09 [1]
    Link: https://www.sphinx-doc.org/en/master/changes.html#release-7-3-0-released-apr-16-2024 [2]
    Link: https://github.com/docutils/docutils/commit/c8471ce47a24 [3]
    Signed-off-by: Akira Yokosawa <[email protected]>
    Cc: [email protected]
    Signed-off-by: Jonathan Corbet <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper() [+ + +]
Author: Srinivasan Shanmugam <[email protected]>
Date:   Tue Dec 26 15:32:19 2023 +0530

    drm/amdgpu: Fix possible NULL dereference in amdgpu_ras_query_error_status_helper()
    
    commit b8d55a90fd55b767c25687747e2b24abd1ef8680 upstream.
    
    Return invalid error code -EINVAL for invalid block id.
    
    Fixes the below:
    
    drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1183 amdgpu_ras_query_error_status_helper() error: we previously assumed 'info' could be null (see line 1176)
    
    Suggested-by: Hawking Zhang <[email protected]>
    Cc: Tao Zhou <[email protected]>
    Cc: Hawking Zhang <[email protected]>
    Cc: Christian Kц╤nig <[email protected]>
    Cc: Alex Deucher <[email protected]>
    Signed-off-by: Srinivasan Shanmugam <[email protected]>
    Reviewed-by: Hawking Zhang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    [Ajay: applied AMDGPU_RAS_BLOCK_COUNT condition to amdgpu_ras_error_query()
           as amdgpu_ras_query_error_status_helper() not present in v5.10, v5.4
           amdgpu_ras_query_error_status_helper() was introduced in 8cc0f5669eb6]
    Signed-off-by: Ajay Kaher <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
firmware: arm_scmi: Harden accesses to the reset domains [+ + +]
Author: Cristian Marussi <[email protected]>
Date:   Wed Aug 17 18:27:29 2022 +0100

    firmware: arm_scmi: Harden accesses to the reset domains
    
    commit e9076ffbcaed5da6c182b144ef9f6e24554af268 upstream.
    
    Accessing reset domains descriptors by the index upon the SCMI drivers
    requests through the SCMI reset operations interface can potentially
    lead to out-of-bound violations if the SCMI driver misbehave.
    
    Add an internal consistency check before any such domains descriptors
    accesses.
    
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Cristian Marussi <[email protected]>
    Signed-off-by: Sudeep Holla <[email protected]>
    Signed-off-by: Dominique Martinet <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
ima: fix deadlock when traversing "ima_default_rules". [+ + +]
Author: liqiong <[email protected]>
Date:   Sat Oct 9 18:38:21 2021 +0800

    ima: fix deadlock when traversing "ima_default_rules".
    
    commit eb0782bbdfd0d7c4786216659277c3fd585afc0e upstream.
    
    The current IMA ruleset is identified by the variable "ima_rules"
    that default to "&ima_default_rules". When loading a custom policy
    for the first time, the variable is updated to "&ima_policy_rules"
    instead. That update isn't RCU-safe, and deadlocks are possible.
    Indeed, some functions like ima_match_policy() may loop indefinitely
    when traversing "ima_default_rules" with list_for_each_entry_rcu().
    
    When iterating over the default ruleset back to head, if the list
    head is "ima_default_rules", and "ima_rules" have been updated to
    "&ima_policy_rules", the loop condition (&entry->list != ima_rules)
    stays always true, traversing won't terminate, causing a soft lockup
    and RCU stalls.
    
    Introduce a temporary value for "ima_rules" when iterating over
    the ruleset to avoid the deadlocks.
    
    Signed-off-by: liqiong <[email protected]>
    Reviewed-by: THOBY Simon <[email protected]>
    Fixes: 38d859f991f3 ("IMA: policy can now be updated multiple times")
    Reported-by: kernel test robot <[email protected]> (Fix sparse: incompatible types in comparison expression.)
    Signed-off-by: Mimi Zohar <[email protected]>
    Signed-off-by: GUO Zihua <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection [+ + +]
Author: Sean Christopherson <[email protected]>
Date:   Wed Mar 22 07:32:59 2023 -0700

    KVM: x86: Clear "has_error_code", not "error_code", for RM exception injection
    
    commit 6c41468c7c12d74843bb414fc00307ea8a6318c3 upstream.
    
    When injecting an exception into a vCPU in Real Mode, suppress the error
    code by clearing the flag that tracks whether the error code is valid, not
    by clearing the error code itself.  The "typo" was introduced by recent
    fix for SVM's funky Paged Real Mode.
    
    Opportunistically hoist the logic above the tracepoint so that the trace
    is coherent with respect to what is actually injected (this was also the
    behavior prior to the buggy commit).
    
    Fixes: b97f07458373 ("KVM: x86: determine if an exception has an error code only when injecting it.")
    Cc: [email protected]
    Cc: Maxim Levitsky <[email protected]>
    Signed-off-by: Sean Christopherson <[email protected]>
    Message-Id: <[email protected]>
    Signed-off-by: Paolo Bonzini <[email protected]>
    [nsaenz: backport to 5.10.y]
    Signed-off-by: Nicolas Saenz Julienne <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
    Acked-by: Sean Christopherson <[email protected]>

 
Linux: Linux 5.10.218 [+ + +]
Author: Greg Kroah-Hartman <[email protected]>
Date:   Sat May 25 16:19:07 2024 +0200

    Linux 5.10.218
    
    Link: https://lore.kernel.org/r/[email protected]
    Tested-by: Mark Brown <[email protected]>
    Tested-by: Florian Fainelli <[email protected]>
    Tested-by: Dominique Martinet <[email protected]>
    Tested-by: Pavel Machek (CIP) <[email protected]>
    Tested-by: Linux Kernel Functional Testing <[email protected]>
    Tested-by: Jon Hunter <[email protected]>
    Tested-by: Salvatore Bonaccorso <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
mptcp: ensure snd_nxt is properly initialized on connect [+ + +]
Author: Paolo Abeni <[email protected]>
Date:   Mon Apr 29 20:00:31 2024 +0200

    mptcp: ensure snd_nxt is properly initialized on connect
    
    commit fb7a0d334894206ae35f023a82cad5a290fd7386 upstream.
    
    Christoph reported a splat hinting at a corrupted snd_una:
    
      WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005
      Modules linked in:
      CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      Workqueue: events mptcp_worker
      RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005
      Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8
            8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe
            <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9
      RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4
      RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001
      RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000
      R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000
      FS:  0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0
      Call Trace:
       <TASK>
       __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline]
       mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline]
       __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615
       mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767
       process_one_work+0x1e0/0x560 kernel/workqueue.c:3254
       process_scheduled_works kernel/workqueue.c:3335 [inline]
       worker_thread+0x3c7/0x640 kernel/workqueue.c:3416
       kthread+0x121/0x170 kernel/kthread.c:388
       ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
       </TASK>
    
    When fallback to TCP happens early on a client socket, snd_nxt
    is not yet initialized and any incoming ack will copy such value
    into snd_una. If the mptcp worker (dumbly) tries mptcp-level
    re-injection after such ack, that would unconditionally trigger a send
    buffer cleanup using 'bad' snd_una values.
    
    We could easily disable re-injection for fallback sockets, but such
    dumb behavior already helped catching a few subtle issues and a very
    low to zero impact in practice.
    
    Instead address the issue always initializing snd_nxt (and write_seq,
    for consistency) at connect time.
    
    Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect")
    Cc: [email protected]
    Reported-by: Christoph Paasch <[email protected]>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485
    Tested-by: Christoph Paasch <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Reviewed-by: Mat Martineau <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://lore.kernel.org/r/20240429-upstream-net-20240429-mptcp-snd_nxt-init-connect-v1-1-59ceac0a7dcb@kernel.org
    Signed-off-by: Jakub Kicinski <[email protected]>
    [ snd_nxt field is not available in v5.10.y: before, only write_seq was
      used, see commit eaa2ffabfc35 ("mptcp: introduce MPTCP snd_nxt") for
      more details about that. ]
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access [+ + +]
Author: Doug Berger <[email protected]>
Date:   Thu Apr 25 15:27:19 2024 -0700

    net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access
    
    commit d85cf67a339685beae1d0aee27b7f61da95455be upstream.
    
    The EXT_RGMII_OOB_CTRL register can be written from different
    contexts. It is predominantly written from the adjust_link
    handler which is synchronized by the phydev->lock, but can
    also be written from a different context when configuring the
    mii in bcmgenet_mii_config().
    
    The chances of contention are quite low, but it is conceivable
    that adjust_link could occur during resume when WoL is enabled
    so use the phydev->lock synchronizer in bcmgenet_mii_config()
    to be sure.
    
    Fixes: afe3f907d20f ("net: bcmgenet: power on MII block for all MII modes")
    Cc: [email protected]
    Signed-off-by: Doug Berger <[email protected]>
    Acked-by: Florian Fainelli <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

net: bcmgenet: synchronize UMAC_CMD access [+ + +]
Author: Doug Berger <[email protected]>
Date:   Thu Apr 25 15:27:21 2024 -0700

    net: bcmgenet: synchronize UMAC_CMD access
    
    commit 0d5e2a82232605b337972fb2c7d0cbc46898aca1 upstream.
    
    The UMAC_CMD register is written from different execution
    contexts and has insufficient synchronization protections to
    prevent possible corruption. Of particular concern are the
    acceses from the phy_device delayed work context used by the
    adjust_link call and the BH context that may be used by the
    ndo_set_rx_mode call.
    
    A spinlock is added to the driver to protect contended register
    accesses (i.e. reg_lock) and it is used to synchronize accesses
    to UMAC_CMD.
    
    Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file")
    Cc: [email protected]
    Signed-off-by: Doug Berger <[email protected]>
    Acked-by: Florian Fainelli <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
netlink: annotate lockless accesses to nlk->max_recvmsg_len [+ + +]
Author: Eric Dumazet <[email protected]>
Date:   Mon Apr 3 21:46:43 2023 +0000

    netlink: annotate lockless accesses to nlk->max_recvmsg_len
    
    commit a1865f2e7d10dde00d35a2122b38d2e469ae67ed upstream.
    
    syzbot reported a data-race in data-race in netlink_recvmsg() [1]
    
    Indeed, netlink_recvmsg() can be run concurrently,
    and netlink_dump() also needs protection.
    
    [1]
    BUG: KCSAN: data-race in netlink_recvmsg / netlink_recvmsg
    
    read to 0xffff888141840b38 of 8 bytes by task 23057 on cpu 0:
    netlink_recvmsg+0xea/0x730 net/netlink/af_netlink.c:1988
    sock_recvmsg_nosec net/socket.c:1017 [inline]
    sock_recvmsg net/socket.c:1038 [inline]
    __sys_recvfrom+0x1ee/0x2e0 net/socket.c:2194
    __do_sys_recvfrom net/socket.c:2212 [inline]
    __se_sys_recvfrom net/socket.c:2208 [inline]
    __x64_sys_recvfrom+0x78/0x90 net/socket.c:2208
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    write to 0xffff888141840b38 of 8 bytes by task 23037 on cpu 1:
    netlink_recvmsg+0x114/0x730 net/netlink/af_netlink.c:1989
    sock_recvmsg_nosec net/socket.c:1017 [inline]
    sock_recvmsg net/socket.c:1038 [inline]
    ____sys_recvmsg+0x156/0x310 net/socket.c:2720
    ___sys_recvmsg net/socket.c:2762 [inline]
    do_recvmmsg+0x2e5/0x710 net/socket.c:2856
    __sys_recvmmsg net/socket.c:2935 [inline]
    __do_sys_recvmmsg net/socket.c:2958 [inline]
    __se_sys_recvmmsg net/socket.c:2951 [inline]
    __x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
    do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
    entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    value changed: 0x0000000000000000 -> 0x0000000000001000
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 23037 Comm: syz-executor.2 Not tainted 6.3.0-rc4-syzkaller-00195-g5a57b48fdfcb #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
    
    Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
    Reported-by: syzbot <[email protected]>
    Signed-off-by: Eric Dumazet <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Signed-off-by: yenchia.chen <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
pinctrl: core: handle radix_tree_insert() errors in pinctrl_register_one_pin() [+ + +]
Author: Sergey Shtylyov <[email protected]>
Date:   Wed Jul 19 23:22:52 2023 +0300

    pinctrl: core: handle radix_tree_insert() errors in pinctrl_register_one_pin()
    
    commit ecfe9a015d3e1e46504d5b3de7eef1f2d186194a upstream.
    
    pinctrl_register_one_pin() doesn't check the result of radix_tree_insert()
    despite they both may return a negative error code.  Linus Walleij said he
    has copied the radix tree code from kernel/irq/ where the functions calling
    radix_tree_insert() are *void* themselves; I think it makes more sense to
    propagate the errors from radix_tree_insert() upstream if we can do that...
    
    Found by Linux Verification Center (linuxtesting.org) with the Svace static
    analysis tool.
    
    Signed-off-by: Sergey Shtylyov <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Linus Walleij <[email protected]>
    Cc: "Hemdan, Hagar Gamal Halim" <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>
 
Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems" [+ + +]
Author: Harshit Mogalapalli <[email protected]>
Date:   Mon May 6 01:49:26 2024 -0700

    Revert "selftests: mm: fix map_hugetlb failure on 64K page size systems"
    
    This reverts commit c9c3cc6a13bddc76bb533ad8147a5528cac5ba5a which is
    commit 91b80cc5b39f00399e8e2d17527cad2c7fa535e2 upstream.
    
    map_hugetlb.c:18:10: fatal error: vm_util.h: No such file or directory
       18 | #include "vm_util.h"
          |          ^~~~~~~~~~~
    compilation terminated.
    
    vm_util.h is not present in 5.10.y, as commit:642bc52aed9c ("selftests:
    vm: bring common functions to a new file") is not present in stable
    kernels <=6.1.y
    
    Signed-off-by: Harshit Mogalapalli <[email protected]>

 
serial: kgdboc: Fix NMI-safety problems from keyboard reset code [+ + +]
Author: Daniel Thompson <[email protected]>
Date:   Wed Apr 24 15:21:41 2024 +0100

    serial: kgdboc: Fix NMI-safety problems from keyboard reset code
    
    commit b2aba15ad6f908d1a620fd97f6af5620c3639742 upstream.
    
    Currently, when kdb is compiled with keyboard support, then we will use
    schedule_work() to provoke reset of the keyboard status.  Unfortunately
    schedule_work() gets called from the kgdboc post-debug-exception
    handler.  That risks deadlock since schedule_work() is not NMI-safe and,
    even on platforms where the NMI is not directly used for debugging, the
    debug trap can have NMI-like behaviour depending on where breakpoints
    are placed.
    
    Fix this by using the irq work system, which is NMI-safe, to defer the
    call to schedule_work() to a point when it is safe to call.
    
    Reported-by: Liuye <[email protected]>
    Closes: https://lore.kernel.org/all/[email protected]/
    Cc: [email protected]
    Reviewed-by: Douglas Anderson <[email protected]>
    Acked-by: Greg Kroah-Hartman <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Daniel Thompson <[email protected]>
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
usb: typec: ucsi: displayport: Fix potential deadlock [+ + +]
Author: Heikki Krogerus <[email protected]>
Date:   Tue May 7 16:43:16 2024 +0300

    usb: typec: ucsi: displayport: Fix potential deadlock
    
    commit b791a67f68121d69108640d4a3e591d210ffe850 upstream.
    
    The function ucsi_displayport_work() does not access the
    connector, so it also must not acquire the connector lock.
    
    This fixes a potential deadlock scenario:
    
    ucsi_displayport_work() -> lock(&con->lock)
    typec_altmode_vdm()
    dp_altmode_vdm()
    dp_altmode_work()
    typec_altmode_enter()
    ucsi_displayport_enter() -> lock(&con->lock)
    
    Reported-by: Mathias Nyman <[email protected]>
    Fixes: af8622f6a585 ("usb: typec: ucsi: Support for DisplayPort alt mode")
    Cc: [email protected]
    Signed-off-by: Heikki Krogerus <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>

 
x86/xen: Drop USERGS_SYSRET64 paravirt call [+ + +]
Author: Juergen Gross <[email protected]>
Date:   Wed Jan 20 14:55:45 2021 +0100

    x86/xen: Drop USERGS_SYSRET64 paravirt call
    
    commit afd30525a659ac0ae0904f0cb4a2ca75522c3123 upstream.
    
    USERGS_SYSRET64 is used to return from a syscall via SYSRET, but
    a Xen PV guest will nevertheless use the IRET hypercall, as there
    is no sysret PV hypercall defined.
    
    So instead of testing all the prerequisites for doing a sysret and
    then mangling the stack for Xen PV again for doing an iret just use
    the iret exit from the beginning.
    
    This can easily be done via an ALTERNATIVE like it is done for the
    sysenter compat case already.
    
    It should be noted that this drops the optimization in Xen for not
    restoring a few registers when returning to user mode, but it seems
    as if the saved instructions in the kernel more than compensate for
    this drop (a kernel build in a Xen PV guest was slightly faster with
    this patch applied).
    
    While at it remove the stale sysret32 remnants.
    
      [ pawan: Brad Spengler and Salvatore Bonaccorso <[email protected]>
               reported a problem with the 5.10 backport commit edc702b4a820
               ("x86/entry_64: Add VERW just before userspace transition").
    
               When CONFIG_PARAVIRT_XXL=y, CLEAR_CPU_BUFFERS is not executed in
               syscall_return_via_sysret path as USERGS_SYSRET64 is runtime
               patched to:
    
            .cpu_usergs_sysret64    = { 0x0f, 0x01, 0xf8,
                                        0x48, 0x0f, 0x07 }, // swapgs; sysretq
    
               which is missing CLEAR_CPU_BUFFERS. It turns out dropping
               USERGS_SYSRET64 simplifies the code, allowing CLEAR_CPU_BUFFERS
               to be explicitly added to syscall_return_via_sysret path. Below
               is with CONFIG_PARAVIRT_XXL=y and this patch applied:
    
               syscall_return_via_sysret:
               ...
               <+342>:   swapgs
               <+345>:   xchg   %ax,%ax
               <+347>:   verw   -0x1a2(%rip)  <------
               <+354>:   sysretq
      ]
    
    Signed-off-by: Juergen Gross <[email protected]>
    Signed-off-by: Borislav Petkov <[email protected]>
    Signed-off-by: Pawan Gupta <[email protected]>
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Greg Kroah-Hartman <[email protected]>