Happy HW Failure day

No backups, no fun

The server that hosted this website, along with a bunch of my data-hoarding and data-intensive operations, has decided to fuck off, with roughly this failure:

[  134.611236] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32992
[  134.611245] {1}[Hardware Error]: event severity: fatal
[  134.611261] {1}[Hardware Error]:  Error 0, type: fatal
[  134.611266] {1}[Hardware Error]:   section_type: PCIe error
[  134.611269] {1}[Hardware Error]:   port_type: 4, root port
[  134.611272] {1}[Hardware Error]:   version: 1.0
[  134.611275] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
[  134.611279] {1}[Hardware Error]:   device_id: 0000:00:01.0
[  134.611284] {1}[Hardware Error]:   slot: 0
[  134.611286] {1}[Hardware Error]:   secondary_bus: 0x01
[  134.611289] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x3c02
[  134.611291] {1}[Hardware Error]:   class_code: 060400
[  134.611294] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0003
[  134.611297] {1}[Hardware Error]:   aer_cor_status: 0x00000001, aer_cor_mask: 0x000031c1
[  134.611300] {1}[Hardware Error]:   aer_uncor_status: 0x00000020, aer_uncor_mask: 0x00318000
[  134.611304] {1}[Hardware Error]:   aer_uncor_severity: 0x00067030
[  134.611306] {1}[Hardware Error]:   TLP Header: 00000000 00000000 00000000 00000000
[  134.611312] GHES: Fatal hardware error but panic disabled
[  134.611315] Kernel panic - not syncing: GHES: Fatal hardware error
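
The AER status words in that dump can be decoded by hand. Below is a minimal sketch, assuming the standard PCIe AER bit layout (the same masks as the `PCI_ERR_UNC_*` and `PCI_ERR_COR_*` constants in Linux's `pci_regs.h`). The tables and the `decode` helper are just for illustration, not something the kernel provides, and the tables only list the commonly defined bits.

```python
# Bit positions in the AER Uncorrectable Error Status register.
# Assumption: standard PCIe AER layout; only commonly defined bits are listed.
UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}

# Bit positions in the AER Correctable Error Status register.
COR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}


def decode(status, table):
    """Return the names of all known bits that are set in a status word."""
    return [name for bit, name in sorted(table.items()) if status & (1 << bit)]


if __name__ == "__main__":
    # Values copied from the log above.
    uncor_status = 0x00000020
    uncor_severity = 0x00067030
    cor_status = 0x00000001

    print("uncorrectable:", decode(uncor_status, UNCOR_BITS))
    print("correctable:  ", decode(cor_status, COR_BITS))
    # A set bit in the severity register marks that error type as fatal.
    print("fatal:        ", decode(uncor_status & uncor_severity, UNCOR_BITS))
```

Going by that layout, the `0x00000020` in `aer_uncor_status` is bit 5, Surprise Down, and the severity mask marks it as fatal. That would fit a link dropping out from under the root port at `00:01.0`, but it is a reading of the registers, not a diagnosis.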

Backups exist, but not for all the crap that I ran there. So I’m on a data diet until we attempt a replacement two weeks from now.
