Hi all, sorry if I'm posting this in the wrong place.

I have a laptop running Mint with qtile that sometimes freezes, to the point where nothing responds and I need to kill it. I’ve tried: sudo journalctl, but I don’t get any information that helps me.

Can anyone help me debug it?

  • drillepind42@feddit.dk (OP)
    8 hours ago

    journalctl -b | grep -v rtkit-daemon

    I just had a freeze that the system recovered from by itself (which rarely happens). After filtering the log a bit, this is what I get from around the time it happened.

    Feb 07 00:00:58 tuxedo systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
    Feb 07 00:00:58 tuxedo systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
    Feb 07 00:00:58 tuxedo systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
    Feb 07 00:00:58 tuxedo systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
    Feb 07 00:58:35 tuxedo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=780519, emitted seq=780521
    Feb 07 00:58:35 tuxedo kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process floorp pid 1548 thread floorp:cs0 pid 1606
    Feb 07 00:58:35 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: GPU reset begin!
    Feb 07 00:58:35 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:35 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:35 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:35 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:36 tuxedo kernel: [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
    Feb 07 00:58:36 tuxedo kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
    Feb 07 00:58:37 tuxedo kernel: [drm:gfx_v11_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: MODE2 reset
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: GPU reset succeeded, trying to resume
    Feb 07 00:58:37 tuxedo kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: SMU is resuming...
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: SMU is resumed successfully!
    Feb 07 00:58:37 tuxedo kernel: [drm] DMUB hardware initialized: version=0x08000500
    Feb 07 00:58:37 tuxedo kernel: [drm] DPIA AUX failed on 0x600(1), error 7
    Feb 07 00:58:37 tuxedo kernel: [drm] DPIA AUX failed on 0x600(1), error 7
    ...
    Feb 07 00:58:37 tuxedo kernel: [drm] DPIA AUX failed on 0x600(1), error 7
    Feb 07 00:58:37 tuxedo kernel: [drm] DPIA AUX failed on 0x600(1), error 7
    Feb 07 00:58:37 tuxedo kernel: [drm] kiq ring mec 3 pipe 1 q 0
    Feb 07 00:58:37 tuxedo kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 1
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: recover vram bo from shadow start
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: recover vram bo from shadow done
    Feb 07 00:58:37 tuxedo kernel: [drm] ring gfx_32775.1.1 was added
    Feb 07 00:58:37 tuxedo kernel: [drm] ring compute_32775.2.2 was added
    Feb 07 00:58:37 tuxedo kernel: [drm] ring sdma_32775.3.3 was added
    Feb 07 00:58:37 tuxedo kernel: [drm] ring gfx_32775.1.1 test pass
    Feb 07 00:58:37 tuxedo kernel: [drm] ring gfx_32775.1.1 ib test pass
    Feb 07 00:58:37 tuxedo kernel: [drm] ring compute_32775.2.2 test pass
    Feb 07 00:58:37 tuxedo kernel: [drm] ring compute_32775.2.2 ib test pass
    Feb 07 00:58:37 tuxedo kernel: [drm] ring sdma_32775.3.3 test pass
    Feb 07 00:58:37 tuxedo kernel: [drm] ring sdma_32775.3.3 ib test pass
    Feb 07 00:58:37 tuxedo kernel: amdgpu 0000:65:00.0: amdgpu: GPU reset(2) succeeded!
    • A_norny_mousse@piefed.zip
      41 minutes ago

      Nice, that looks pretty obvious.

      In addition to the other reply, you should search for reports of your distro* having problems with (certain) AMD GPUs; maybe all you need is a backported kernel.

      * I don’t think you ever mentioned which one. If it’s Ubuntu-based, search for Ubuntu.
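
      For that kind of search it helps to have the distro name and the running kernel version at hand. A minimal sketch, assuming a distro that ships /etc/os-release (most current ones do); system_summary is just a made-up helper name:

      ```shell
      # Print the two facts a bug search usually needs: distro name and running kernel.
      system_summary() {
          # /etc/os-release is present on most modern distros; skip quietly if it is not.
          if [ -r /etc/os-release ]; then
              # Source it in a subshell so its variables do not leak into the caller.
              ( . /etc/os-release && printf 'distro: %s\n' "${PRETTY_NAME:-unknown}" )
          fi
          printf 'kernel: %s\n' "$(uname -r)"
      }

      system_summary
      ```

      Comparing the kernel line against what other people with the same GPU report should show whether a newer or backported kernel is worth trying.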

    • FishFace@piefed.social
      8 hours ago

      Looks like the chipset/graphics driver is crashing. That can be because of the driver or the hardware.

      It will be hard to diagnose, but you can search for the most detailed of those log lines together with your laptop model and see if that yields anything. The problem is that it’s never possible to know whether you have a software issue or a hardware issue that is exposed by particular software.

      You can try installing a completely different OS (e.g. Windows) to see if the same problem occurs. If it does, you can be fairly sure it’s hardware.
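
      To get usable search terms out of a log like the one posted above, something along these lines might help. It is only a sketch: distinct_errors is a hypothetical helper name, and it assumes the journal's default line format (timestamp, hostname, then "kernel: "). It keeps the lines the kernel marked with *ERROR*, strips the prefix, and counts the duplicates.

      ```shell
      # Collapse journal-style kernel lines into distinct *ERROR* messages with counts.
      # Reads log lines on stdin; distinct_errors is a hypothetical helper name.
      distinct_errors() {
          grep -F '*ERROR*' |               # keep only lines the kernel flagged as errors
              sed -E 's/^.* kernel: //' |   # drop timestamp, hostname and "kernel: " prefix
              sort | uniq -c | sort -rn     # count duplicates, most frequent first
      }

      # Possible usage on a systemd machine (not run here):
      #   journalctl -k -b --no-pager | distinct_errors      # current boot
      #   journalctl -k -b -1 --no-pager | distinct_errors   # previous boot, needs a persistent journal
      ```

      The previous-boot variant is the one to use after a freeze that needed a hard power-off, since the current boot's log will not contain the crash.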