Difference between revisions of "Bite"

From Bitraf
Current revision as of 26 Oct 2024, 18:27



Network

br0 has a static address on the local network, 10.13.37.3. br1 has a static address on the external network, 77.40.158.102. bite's router belongs to Powertech and is at 77.40.158.97.

Virtual machines can join br0 and/or br1, depending on which networks they need to be on.
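A minimal sketch of how a guest could be attached to one of the two bridges with `virsh attach-interface`. The helper name `attach_cmd` and the guest name `example-vm` are hypothetical, and the `virtio` model and `--config` persistence are assumptions rather than this host's documented settings. The helper only prints the command, so nothing changes until the output is run by hand.

```shell
# Hypothetical helper: print the virsh command that would attach a guest
# to one of bite's bridges. It only prints; it does not call virsh itself.
attach_cmd() {
  vm=$1      # libvirt domain name (hypothetical example below)
  bridge=$2  # br0 = local network, br1 = external network
  case $bridge in
    br0|br1) ;;
    *) echo "unknown bridge: $bridge" >&2; return 1 ;;
  esac
  printf 'virsh attach-interface --domain %s --type bridge --source %s --model virtio --config\n' "$vm" "$bridge"
}

attach_cmd example-vm br0
```

A guest that needs both networks would get one interface per bridge, that is, one call for br0 and one for br1.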

Virtual machines

Bite is mostly a host for a number of VMs. We use both QEMU and LXC for virtualization.

VMs on bite

$ sudo virsh list
 Id   Name                      State
-----------------------------------------
 1    iot                       running
 2    zabbix.karlsbakk.net      running
 3    heim                      running
 4    francesco.karlsbakk.net   running
 5    p2k16-production          running
 6    riemann                   running
 7    p2k16-staging             running
 8    p2k16                     running
 9    ssh.karlsbakk.net         running
 10   dlock                     running
 11   unifi                     running

One of these is intended to run Docker containers.


Home Assistant Docker

As an example, Home Assistant is (soon) set up as a Docker container on one of the machines.

Home directories for members

Heim is a separate virtual server.

Managing virtual machines

The virtual machines on bite can be managed over the network: Remote Management of Guests
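As a rough illustration, libvirt can reach a remote host through a `qemu+ssh://` connection URI. The sketch below only builds the URI and prints the resulting command. The user name `youruser` is a placeholder, and it is an assumption that the host name `bite` resolves from your machine and that your account may talk to libvirt there.

```shell
# Build a libvirt connection URI for managing guests over SSH.
# Both arguments are placeholders supplied by the caller.
libvirt_uri() {
  printf 'qemu+ssh://%s@%s/system\n' "$1" "$2"
}

# Print the command instead of running it, so the sketch works without libvirt.
echo "virsh -c $(libvirt_uri youruser bite) list --all"
```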

LXC

lxc-create \
 -B zfs --zfsroot=pool0/lxc \
 -t debian \
 --name=$NAME -- \
 -r stretch

HP Smart Array P410i

=> controller slot=0 show config detail 

[...]

   Firmware Version: 6.00-2

[...]

=> controller slot=0 modify hbamode=on forced

Error: Syntax error at "hbamode"

=>

Drives

Logical drives in the machine.

tingo@bite:~$ sudo hpacucli controller slot=0 logicaldrive all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (465.7 GB, RAID 1, OK)

Physical disks in the machine.

tingo@bite:~$ sudo hpacucli controller slot=0 physicaldrive all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, Predictive Failure)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)
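
The listing above includes one drive in "Predictive Failure" state. A small filter like the following could make such drives stand out. It is only a sketch: the function name `non_ok_drives` is made up here, and it assumes the status is always the last comma-separated item inside the parentheses, as in the output shown.

```shell
# Read "physicaldrive all show" output on stdin and print the ID and status
# of every drive whose status (the last item in parentheses) is not OK.
non_ok_drives() {
  awk '/physicaldrive/ {
    status = $0
    sub(/.*, /, "", status)   # keep only the text after the last ", "
    sub(/\).*/, "", status)   # drop the closing parenthesis
    if (status != "OK") print $2 ": " status
  }'
}

# Intended use (needs root and the HP tool, so it is not run here):
#   sudo hpacucli controller slot=0 physicaldrive all show | non_ok_drives
```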

Log

(Sign the log by adding --~~~~ on a separate line at the end of the entry, or press the signature button in the menu.)

2024

2024-10-26 
power to the server room was out (a circuit breaker had tripped), so bite and every other electrical device (including the internet gateway) were down. Fixed (by Nikolai) by resetting the breaker. bite was up again at around 16:00 hours. Tingo (diskusjon) 26. okt. 2024 kl. 19:27 (CEST)

2023

2023-08-24 
this evening, trygvis installed the newest kernel for Debian buster on the server:
tingo@bite:~$ uname -a
Linux bite 5.10.0-0.deb10.24-amd64 #1 SMP Debian 5.10.179-5~deb10u1 (2023-08-08) x86_64 GNU/Linux

package

tingo@bite:~$ sudo apt list --installed linux-image*
Listing... Done
linux-image-5.10-amd64/oldoldstable,now 5.10.179-5~deb10u1 amd64 [installed]
linux-image-5.10.0-0.deb10.16-amd64/buster-backports,now 5.10.127-2~bpo10+1 amd64 [installed,automatic]
linux-image-5.10.0-0.deb10.24-amd64/oldoldstable,now 5.10.179-5~deb10u1 amd64 [installed,automatic]
linux-image-amd64/buster-backports,now 5.10.127-2~bpo10+1 amd64 [installed]

Tingo (diskusjon) 24. aug. 2023 kl. 21:41 (CEST)

2023-08-24 
new uptime status and reboot log
tingo@bite:~$ date;uptime
Thu 24 Aug 2023 05:27:24 PM CEST
 17:27:24 up 22:10,  2 users,  load average: 0.19, 0.19, 0.22

reboot log

tingo@bite:~$ last | grep reboot
reboot   system boot  5.10.0-0.deb10.1 Wed Aug 23 19:16   still running
reboot   system boot  5.10.0-0.deb10.1 Tue Aug 22 18:44   still running
reboot   system boot  5.10.0-0.deb10.1 Tue Aug 22 13:52   still running
reboot   system boot  5.10.0-0.deb10.1 Tue Aug 22 13:30   still running
reboot   system boot  5.10.0-0.deb10.1 Fri Aug  4 09:34   still running
reboot   system boot  5.10.0-0.deb10.1 Tue Dec 20 17:58   still running
reboot   system boot  5.10.0-0.deb10.1 Tue Dec 20 12:30   still running
reboot   system boot  5.10.0-0.deb10.1 Wed Nov 30 09:06   still running
reboot   system boot  5.10.0-0.bpo.8-a Sat Aug 27 13:04   still running
reboot   system boot  5.10.0-0.bpo.8-a Tue Aug 16 10:11   still running
reboot   system boot  5.10.0-0.bpo.8-a Fri Aug 12 08:45   still running
reboot   system boot  5.10.0-0.bpo.8-a Sat Jul 23 14:18   still running
reboot   system boot  5.10.0-0.bpo.8-a Sat May  7 14:06   still running
reboot   system boot  5.10.0-0.bpo.8-a Thu Nov  4 10:28   still running
reboot   system boot  5.10.0-0.bpo.8-a Tue Sep 14 18:01   still running
reboot   system boot  5.10.0-0.bpo.8-a Sun Sep 12 23:14   still running
reboot   system boot  5.10.0-0.bpo.8-a Sun Sep 12 22:26 - 23:06  (00:39)
reboot   system boot  5.10.0-0.bpo.8-a Sun Sep 12 18:42 - 22:22  (03:39)
reboot   system boot  5.10.0-0.bpo.8-a Sat Sep 11 21:50 - 18:28  (20:37)
reboot   system boot  5.10.0-0.bpo.8-a Sat Sep 11 18:57 - 21:33  (02:36)

Tingo (diskusjon) 24. aug. 2023 kl. 17:29 (CEST)

2023-08-23 
restarted bite this evening. On first start, it hung (no unusual error messages). On second start it started normally. Tingo (diskusjon) 23. aug. 2023 kl. 19:44 (CEST) No new entries in the error (sel) log. Tingo (diskusjon) 23. aug. 2023 kl. 19:46 (CEST)
2023-08-22 
uptime status
tingo@bite:~$ date;uptime
Tue 22 Aug 2023 09:42:20 PM CEST
 21:42:20 up  2:58,  1 user,  load average: 0.38, 0.39, 0.41

Tingo (diskusjon) 22. aug. 2023 kl. 21:43 (CEST)

2023-08-22 
physical and logical drive info
tingo@bite:~$ sudo hpacucli controller slot=0 physicaldrive all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)

tingo@bite:~$ sudo hpacucli controller slot=0 logicaldrive all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (465.7 GB, RAID 1, OK)

Tingo (diskusjon) 22. aug. 2023 kl. 19:39 (CEST)

2023-08-22 
memory info
tingo@bite:~$ sudo lshw -c memory -short
H/W path             Device           Class          Description
================================================================
/0/0                                  memory         64KiB BIOS
/0/400/710                            memory         192KiB L1 cache
/0/400/720                            memory         1536KiB L2 cache
/0/400/730                            memory         12MiB L3 cache
/0/406/716                            memory         192KiB L1 cache
/0/406/726                            memory         1536KiB L2 cache
/0/406/736                            memory         12MiB L3 cache
/0/1000                               memory         192GiB System Memory
/0/1000/0                             memory         DIMM DDR3 Synchronous [empty]
/0/1000/1                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/2                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/3                             memory         DIMM DDR3 Synchronous [empty]
/0/1000/4                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/5                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/6                             memory         DIMM DDR3 Synchronous [empty]
/0/1000/7                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/8                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/9                             memory         DIMM DDR3 Synchronous [empty]
/0/1000/a                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/b                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/c                             memory         DIMM DDR3 Synchronous [empty]
/0/1000/d                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/e                             memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/f                             memory         DIMM DDR3 Synchronous [empty]
/0/1000/10                            memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)
/0/1000/11                            memory         16GiB DIMM DDR3 Synchronous 1067 MHz (0.9 ns)

So 192 GiB memory in total. Tingo (diskusjon) 22. aug. 2023 kl. 19:22 (CEST)
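
The total can be cross-checked by summing the populated DIMM slots instead of reading the "System Memory" line. This is a sketch with a made-up function name, and it assumes every populated slot is reported in GiB, as in the listing above.

```shell
# Sum the sizes of populated DIMMs from "lshw -c memory -short" on stdin.
# Empty slots and the "System Memory" summary line are skipped.
total_dimm_gib() {
  awk '/DIMM/ && !/\[empty\]/ { sub(/GiB/, "", $3); sum += $3 }
       END { print sum + 0 }'
}

# Intended use (needs root, so it is not run here):
#   sudo lshw -c memory -short | total_dimm_gib
```

With twelve populated 16 GiB slots, as listed above, the sum is 192.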

2023-08-22 
CPU info
tingo@bite:~$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       40 bits physical, 48 bits virtual
CPU(s):              24
On-line CPU(s) list: 0-23
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               44
Model name:          Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
Stepping:            2
CPU MHz:             1910.524
CPU max MHz:         2933.0000
CPU min MHz:         1600.0000
BogoMIPS:            5864.98
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            12288K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d

So two 6-core, 12-thread Xeon CPUs. Tingo (diskusjon) 22. aug. 2023 kl. 19:17 (CEST)
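
The "CPU(s): 24" line follows from the socket, core and thread counts in the same output. A tiny worked check, with a hypothetical helper name:

```shell
# Logical CPUs = sockets * cores per socket * threads per core.
logical_cpus() {
  echo $(( $1 * $2 * $3 ))
}

logical_cpus 2 6 2   # values from the lscpu output; prints 24
```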

2023-08-22 
 Debian info
tingo@bite:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

tingo@bite:~$ cat /etc/debian_version 
10.13

kernel

tingo@bite:~$ uname -a
Linux bite 5.10.0-0.deb10.16-amd64 #1 SMP Debian 5.10.127-2~bpo10+1 (2022-07-28) x86_64 GNU/Linux

perhaps we should upgrade. Tingo (diskusjon) 22. aug. 2023 kl. 19:09 (CEST)

2023-08-22 
Power button off / on. No errors, it just powered on.
ipmi info - log
tingo@bite:~$ sudo ipmitool sel info
SEL Information
Version          : 1.5 (v1.5, v2 compliant)
Entries          : 15
Free Space       : 784 bytes 
Percent Used     : 23%
Last Add Time    : 08/04/2023 08:18:35
Last Del Time    : 05/05/2011 21:03:22
Overflow         : false
Supported Cmds   : 'Reserve' 
tingo@bite:~$ sudo ipmitool sel list
   1 | 01/27/2012 | 18:38:47 | Power Supply #0x03 | Failure detected | Asserted
   2 | 01/27/2012 | 18:59:50 | Power Supply #0x03 | Failure detected | Asserted
   3 | 01/27/2012 | 19:32:23 | Power Supply #0x03 | Failure detected | Asserted
   4 | 01/27/2012 | 19:32:30 | Power Supply #0x03 | Failure detected | Asserted
   5 | 02/18/2015 | 10:53:37 | Power Supply #0x04 | Failure detected | Asserted
   6 | 02/18/2015 | 11:28:47 | Power Supply #0x04 | Failure detected | Asserted
   7 | 02/18/2015 | 11:49:09 | Power Supply #0x03 | Failure detected | Asserted
   8 | 09/21/2017 | 14:41:16 | Fan #0x08 | Transition to Off Line | Asserted
   9 | 09/21/2017 | 14:41:16 | Fan #0x08 | Transition to Running | Deasserted
   a | 09/21/2017 | 14:42:57 | Fan #0x08 | Transition to Running | Deasserted
   b | 09/21/2017 | 14:43:10 | Fan #0x08 | Transition to Off Line | Asserted
   c | 09/21/2017 | 14:43:54 | Power Supply #0x04 | Failure detected | Asserted
   d | 09/23/2017 | 15:31:23 | Power Supply #0x04 | Failure detected | Asserted
   e | 09/23/2017 | 18:40:21 | Power Supply #0x04 | Failure detected | Asserted
   f | 08/04/2023 | 08:18:35 | Power Supply #0x03 | Failure detected | Asserted

no new entries since the last time I checked.

Sensors

tingo@bite:~$ sudo ipmitool sdr
UID Light        | 0x00              | ok
Sys. Health LED  | 0x00              | ok
Power Supply 1   | 60 Watts          | ok
Power Supply 2   | 80 Watts          | ok
Power Supplies   | 0x00              | ok
Fan 1            | 29.40 percent     | ok
Fan 2            | 34.89 percent     | ok
Fan 3            | 41.16 percent     | ok
Fan 4            | 41.16 percent     | ok
Fan 5            | 34.89 percent     | ok
Fan 6            | 13.72 percent     | ok
Fans             | 0x00              | ok
Temp 1           | 29 degrees C      | ok
Temp 2           | 40 degrees C      | ok
Temp 3           | 40 degrees C      | ok
Temp 4           | 41 degrees C      | ok
Temp 5           | 42 degrees C      | ok
Temp 6           | 45 degrees C      | ok
Temp 7           | 47 degrees C      | ok
Temp 8           | 45 degrees C      | ok
Temp 9           | 42 degrees C      | ok
Temp 10          | 50 degrees C      | ok
Temp 11          | 40 degrees C      | ok
Temp 12          | 47 degrees C      | ok
Temp 13          | 37 degrees C      | ok
Temp 14          | 38 degrees C      | ok
Temp 15          | 37 degrees C      | ok
Temp 16          | disabled          | ns
Temp 17          | disabled          | ns
Temp 18          | disabled          | ns
Temp 19          | 31 degrees C      | ok
Temp 20          | 36 degrees C      | ok
Temp 21          | 38 degrees C      | ok
Temp 22          | 37 degrees C      | ok
Temp 23          | 43 degrees C      | ok
Temp 24          | 41 degrees C      | ok
Temp 25          | 39 degrees C      | ok
Temp 26          | 41 degrees C      | ok
Temp 27          | disabled          | ns
Temp 28          | disabled          | ns
Temp 29          | 35 degrees C      | ok
Temp 30          | 69 degrees C      | ok
Memory           | 0x00              | ok
Power Meter      | 172 Watts         | ok

Values are reasonable, and the server currently draws 172 Watts.

General health

tingo@bite:~$ sudo ipmitool -I open chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : previous
Last Power Event     : 
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
Front Panel Control  : none

looks ok. Tingo (diskusjon) 22. aug. 2023 kl. 19:02 (CEST)

2023-08-14 
log from ipmi
tingo@bite:~$ sudo ipmitool sel info 
SEL Information
Version          : 1.5 (v1.5, v2 compliant)
Entries          : 15
Free Space       : 784 bytes 
Percent Used     : 23%
Last Add Time    : 08/04/2023 08:18:35
Last Del Time    : 05/05/2011 21:03:22
Overflow         : false
Supported Cmds   : 'Reserve' 
tingo@bite:~$ sudo ipmitool sel list
   1 | 01/27/2012 | 18:38:47 | Power Supply #0x03 | Failure detected | Asserted
   2 | 01/27/2012 | 18:59:50 | Power Supply #0x03 | Failure detected | Asserted
   3 | 01/27/2012 | 19:32:23 | Power Supply #0x03 | Failure detected | Asserted
   4 | 01/27/2012 | 19:32:30 | Power Supply #0x03 | Failure detected | Asserted
   5 | 02/18/2015 | 10:53:37 | Power Supply #0x04 | Failure detected | Asserted
   6 | 02/18/2015 | 11:28:47 | Power Supply #0x04 | Failure detected | Asserted
   7 | 02/18/2015 | 11:49:09 | Power Supply #0x03 | Failure detected | Asserted
   8 | 09/21/2017 | 14:41:16 | Fan #0x08 | Transition to Off Line | Asserted
   9 | 09/21/2017 | 14:41:16 | Fan #0x08 | Transition to Running | Deasserted
   a | 09/21/2017 | 14:42:57 | Fan #0x08 | Transition to Running | Deasserted
   b | 09/21/2017 | 14:43:10 | Fan #0x08 | Transition to Off Line | Asserted
   c | 09/21/2017 | 14:43:54 | Power Supply #0x04 | Failure detected | Asserted
   d | 09/23/2017 | 15:31:23 | Power Supply #0x04 | Failure detected | Asserted
   e | 09/23/2017 | 18:40:21 | Power Supply #0x04 | Failure detected | Asserted
   f | 08/04/2023 | 08:18:35 | Power Supply #0x03 | Failure detected | Asserted

The time and date are not guaranteed to have always been correct. Tingo (diskusjon) 14. aug. 2023 kl. 20:03 (CEST)
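
To see at a glance which sensors keep showing up in the event log, the entries can be grouped by sensor name. The following is a sketch only; the function name `sel_counts` is invented here, and it assumes the " | " separated layout shown above.

```shell
# Count SEL entries per sensor from "ipmitool sel list" output on stdin.
# Fields are separated by " | "; the fourth field is the sensor name.
sel_counts() {
  awk -F' [|] ' 'NF >= 4 { n[$4]++ }
                 END { for (s in n) print n[s], s }' | sort -rn
}

# Intended use (needs root and a BMC, so it is not run here):
#   sudo ipmitool sel list | sel_counts
```

For the 15 entries listed above this gives 6 for Power Supply #0x03, 5 for Power Supply #0x04 and 4 for Fan #0x08.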

2023-08-14 
state of server hardware
tingo@bite:~$ sudo ipmitool -I open chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : previous
Last Power Event     : 
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
Front Panel Control  : none

Tingo (diskusjon) 14. aug. 2023 kl. 19:51 (CEST)

2023-08-14 
ipmi - various info gathered
Device ID                 : 19
Device Revision           : 1
Firmware Revision         : 1.26
IPMI Version              : 2.0
Manufacturer ID           : 11
Manufacturer Name         : Hewlett-Packard
Product ID                : 8192 (0x2000)
Product Name              : Unknown (0x2000)
Device Available          : yes
Provides Device SDRs      : yes
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device

tingo@bite:~$ sudo ipmitool mc getsysinfo system_name
ProLiant DL380 G7

sensors

tingo@bite:~$ sudo ipmitool sdr
UID Light        | 0x00              | ok
Sys. Health LED  | 0x00              | ok
Power Supply 1   | 55 Watts          | ok
Power Supply 2   | 80 Watts          | ok
Power Supplies   | 0x00              | ok
Fan 1            | 29.40 percent     | ok
Fan 2            | 33.71 percent     | ok
Fan 3            | 39.98 percent     | ok
Fan 4            | 39.98 percent     | ok
Fan 5            | 33.71 percent     | ok
Fan 6            | 13.72 percent     | ok
Fans             | 0x00              | ok
Temp 1           | 26 degrees C      | ok
Temp 2           | 40 degrees C      | ok
Temp 3           | 40 degrees C      | ok
Temp 4           | 40 degrees C      | ok
Temp 5           | 41 degrees C      | ok
Temp 6           | 41 degrees C      | ok
Temp 7           | 43 degrees C      | ok
Temp 8           | 42 degrees C      | ok
Temp 9           | 39 degrees C      | ok
Temp 10          | 47 degrees C      | ok
Temp 11          | 38 degrees C      | ok
Temp 12          | 45 degrees C      | ok
Temp 13          | 34 degrees C      | ok
Temp 14          | 35 degrees C      | ok
Temp 15          | 34 degrees C      | ok
Temp 16          | disabled          | ns
Temp 17          | disabled          | ns
Temp 18          | disabled          | ns
Temp 19          | 30 degrees C      | ok
Temp 20          | 33 degrees C      | ok
Temp 21          | 35 degrees C      | ok
Temp 22          | 34 degrees C      | ok
Temp 23          | 41 degrees C      | ok
Temp 24          | 39 degrees C      | ok
Temp 25          | 36 degrees C      | ok
Temp 26          | 37 degrees C      | ok
Temp 27          | disabled          | ns
Temp 28          | disabled          | ns
Temp 29          | 35 degrees C      | ok
Temp 30          | 67 degrees C      | ok
Memory           | 0x00              | ok
Power Meter      | 172 Watts         | ok

Tingo (diskusjon) 14. aug. 2023 kl. 19:48 (CEST)
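
With thirty temperature sensors it is easy to miss the hottest one. A small filter such as this could pick it out; it is a sketch with a made-up function name, and it assumes the "name | value | status" layout shown above, with disabled sensors lacking a "degrees C" reading.

```shell
# Print the hottest enabled temperature sensor from "ipmitool sdr" on stdin.
hottest_temp() {
  awk -F'|' '/degrees C/ {
    t = $2 + 0                 # numeric prefix of e.g. " 67 degrees C "
    if (t > max) { max = t; name = $1 }
  }
  END { sub(/ +$/, "", name); print name ": " max " C" }'
}

# Intended use (needs root and a BMC, so it is not run here):
#   sudo ipmitool sdr | hottest_temp
```

In the readings above the hottest sensor is Temp 30 at 67 degrees C.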

2022

2022-08-16 
the server was down due to a local power problem (the circuit breaker for the server room had tripped) and came back up automatically when power was restored. Tingo (diskusjon) 16. aug. 2022 kl. 11:15 (CEST)
2022-08-12 
after a planned power outage (work on the local power grid), bite restarted automatically when power returned, and all services on it started too. Tingo (diskusjon) 12. aug. 2022 kl. 11:48 (CEST)
2022-07-23 
bite was down / offline. Initial reports yesterday, about 21:00 hours local time. I got notified this morning. Accessed bite via iLO; everything looked ok - green lights across the board. No indication in logs (iLo Event Log, Integrated Management Log) that anything was wrong. Tried restarting the server (graceful power off / on, forced power off / on, reset) nothing worked. Even tried the physical power button on the server - it didn't work either. Got remote console working (integrated Java Remote Console, via JavaFox) and was able to see this error message on the console
Fatal PCI Express Device Error PCI Slot ?
 B00/D00/F00

relevant hits on the internet suggested re-seating the PCI cards in the server (one of the PCI cards is the RAID controller), so I did that. Unplug power, take off top cover, unscrew three thumbscrews on the back, lift out the PCI cage using the handles, take out the two PCI cards and put them back in. Re-assemble everything, power on the server. This time the remote console showed that it booted to Debian login. Some minutes later (10 - 15 minutes) all the virtual machines were up and running, and p2k16 worked again. Relevant internet info:

  • Proliant DL380 G7 Fatal PCI Express Device Error PCI ? B00/D00/F00[1]
  • HP ProLiant Servers - Error "PCI express device error PCI slot? 86" When the Server Powers On[2]
Tingo (diskusjon) 23. jul. 2022 kl. 14:53 (CEST)
2022-05-07 
today, after a power outage, bite didn't boot fully. power was on, and it was possible to access the machine via iLO (web or ssh). Unfortunately, the remote console via iLO doesn't work - web requires java (but Javafox doesn't work), ssh to iLO and then textcons works, but Linux uses an "unsupported graphics mode" for its console. Anyway, no error messages or problems seen in iLO and logs there. Rebooting from iLO (cold boot) didn't help. Finally I restarted the machine with the power button (I also had a local console attached: VGA monitor plus USB keyboard), and then it came up normally. Everything seems to work now. Maybe this hardware is getting too old? Tingo (diskusjon) 7. mai 2022 kl. 14:46 (CEST)

2021

2021-02-16 
tonight we swapped the two upper disk drives in the external chassis, to figure out if the anomaly is with the drive or the slot in the chassis.
tingo@bite:~$ sudo smartctl -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7LN54
Firmware Version: 0958
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Tue Feb 16 20:18:06 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7KZNL
LU WWN Device Id: 5 000c50 0c6c55c55
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 20:19:19 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdd
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7M1EN
LU WWN Device Id: 5 000c50 0c6c8b047
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 20:19:36 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7P7JN
LU WWN Device Id: 5 000c50 0c6da4787
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 20:19:59 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Unfortunately, the anomaly follows the slot (it is the topmost slot in the external chassis), not the drive. Tingo (diskusjon) 16. feb. 2021 kl. 21:12 (CET)
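
The comparison behind that conclusion can be made easier to read by reducing each `smartctl -i` report to the serial number and the reported firmware version. This is a sketch; the function name is hypothetical, and it assumes the "Key: value" layout shown above.

```shell
# Reduce "smartctl -i" output on stdin to "serial firmware" on one line.
serial_and_firmware() {
  awk -F': +' '
    /^Serial Number/    { serial = $2 }
    /^Firmware Version/ { fw = $2 }
    END { print serial, fw }'
}

# Intended use (needs root, so it is not run here):
#   for d in sdb sdc sdd sde; do sudo smartctl -i /dev/$d | serial_and_firmware; done
```

In the logs, /dev/sdb reported serial ZGY7KZNL with firmware 0958 on 2021-02-12 and serial ZGY7LN54, again with firmware 0958, after the swap, while both drives report SC60 when seen as /dev/sdc. The odd report stays with the slot.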

2021-02-12 
we had some trouble with the disk drives in the external chassis; here I document what smartctl reports for the drives.
tingo@bite:~$ sudo smartctl -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7KZNL
Firmware Version: 0958
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Fri Feb 12 09:36:26 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7LN54
LU WWN Device Id: 5 000c50 0c6c868a0
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 12 09:36:33 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdd
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7M1EN
LU WWN Device Id: 5 000c50 0c6c8b047
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 12 09:36:51 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7P7JN
LU WWN Device Id: 5 000c50 0c6da4787
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 12 09:36:55 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Tingo (diskusjon) 16. feb. 2021 kl. 21:06 (CET)

2021-02-10 
Bite has been upgraded to Debian 10.8 by Trygvis.
tingo@bite:~$ cat /etc/debian_version 
10.8

kernel

tingo@bite:~$ uname -a
Linux bite 5.9.0-0.bpo.5-amd64 #1 SMP Debian 5.9.15-1~bpo10+1 (2020-12-31) x86_64 GNU/Linux

Tingo (diskusjon) 13. feb. 2021 kl. 16:28 (CET)

2020

2020-11-11 
external disk enclosure - switched from USB to eSATA connection. First I removed the disks from the operating system using # echo 1 > /sys/block/<disk>/device/delete, for example # echo 1 > /sys/block/sdb/device/delete, then I unplugged the USB cable from the server and plugged in the eSATA cable. The disks showed up automagically on the server. Tingo (diskusjon) 11. nov. 2020 kl. 11:55 (CET)
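
A sketch of the sysfs detach step for several disks at once. The function name and the `DRY_RUN` variable are made up for this example; by default it only prints the commands, so nothing is detached by accident.

```shell
# Print (dry run) or perform the sysfs detach for each named disk.
# DRY_RUN defaults to 1 so the sketch never touches a device by accident.
detach_disks() {
  for disk in "$@"; do
    target="/sys/block/$disk/device/delete"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "echo 1 > $target"
    else
      echo 1 > "$target"   # needs root; the disk then disappears from lsblk
    fi
  done
}

detach_disks sdb sdc sdd sde
```

Setting `DRY_RUN=0` as root would perform the writes; check the device names against `lsblk` first, since sda holds the root filesystem on this host.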
2020-11-09 
external disk enclosure - installed new disks, 4 x 4TB Seagate IronWolf. The new disks show up in lsblk
tingo@bite:~$ sudo lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465.7G  0 disk 
└─sda1   8:1    0 465.7G  0 part /
sdb      8:16   0   3.7T  0 disk 
sdc      8:32   0   3.7T  0 disk 
sdd      8:48   0   3.7T  0 disk 
sde      8:64   0   3.7T  0 disk 
sr0     11:0    1  1024M  0 rom  

the new disks are sdb, sdc, sdd, sde. Tingo (diskusjon) 9. nov. 2020 kl. 13:37 (CET)

2020-10-14 
the external disk enclosure is an IcyBox IB-RD3640SU3[3] from RaidSonic[4]. Tingo (diskusjon) 14. okt. 2020 kl. 10:26 (CEST)
2020-06-19 
bite has been upgraded to the latest Debian stretch (9.12) with a little help from trygvis. Tingo (diskusjon) 19. jun. 2020 kl. 16:58 (CEST)
tingo@bite:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.12 (stretch)
Release:	9.12
Codename:	stretch
tingo@bite:~$ uname -a
Linux bite 4.19.0-0.bpo.9-amd64 #1 SMP Debian 4.19.118-2~bpo9+1 (2020-05-20) x86_64 GNU/Linux

and rebooted.

2020-06-19 
bite was off (shut down); it was very hot in the room, so the shutdown may have been caused by high temperature. Measures to bring the temperature down were already under way, so I pressed the power button and turned bite back on. It came up, and so did all the VMs on it. Tingo (talk) 19 Jun 2020, 12:09 (CEST)

2018

2018-10-12: Disks and misc. maintenance

Over the last few days Bite has received a new disk enclosure with 4x 3TB disks and a new eSATA card (USB did not work very well).

The Ansible setup has been greatly improved for automatic rollout of users, admins and roles. Access on a number of machines has been cleaned up.

Marvin has been set up as a disk VM, and Bitraf's Dropbox account will eventually be available as //marvin/dropbox on the tool machines (CNC and laser). The share is available to everyone at Bitraf; no username or password is required.

--Trygvis (talk) 12 Oct 2018, 07:55 (UTC)

2017

2017-10-12

mastensg

added the virtual machine p2k16-staging for haavares. It is available as p2k16-staging.local at Bitraf and as p2k16-staging.bitraf.no.

2017-09-30

mastensg

I found hpssacli and installed it.

bite ~ $ sudo /opt/hp/hpssacli/bld/hpssacli
HPE Smart Storage Administrator CLI 2.40.13.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.

=> ctrl slot=0 modify hbamode=on

Error: This operation is not supported with the current configuration. Use the 
       "show" command on devices to show additional details about the
       configuration.
Reason: Not supported

=> ctrl slot=0 show             

Smart Array P410i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: 5001438017522FA0
   Cache Serial Number: PAAVPID1126167V
   Controller Status: OK
   Hardware Revision: C
   Firmware Version: 6.00-2
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   Total Cache Memory Available: 400 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Number of Ports: 2 Internal only
   Driver Name: hpsa
   Driver Version: 3.4.16
   Driver Supports HPE SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:05:00.0
   Host Serial Number: CZ21450B5J
   Sanitize Erase Supported: False
   Primary Boot Volume: logicaldrive 1 (600508B1001C7A7F3E2649601E8275E0)
   Secondary Boot Volume: None

=> 

from the thread above:

> P410i HBA mode support was only released for Integrity servers.

that looks like a bad sign, since bite is a ProLiant server.

I am giving up on HBA mode.
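For later checks of the controller (firmware version, cache and battery status), a single field can be pulled out of the "ctrl slot=0 show" listing instead of reading all of it. A minimal sketch follows; the helper name ctrl_field is made up, and passing the command directly on the hpssacli command line instead of typing it at the => prompt is an assumption that was not tried in the session above.

```shell
# ctrl_field LABEL
# Hypothetical helper: reads a "ctrl slot=0 show" listing on stdin and
# prints the value that follows "LABEL: ".
ctrl_field() {
    awk -F': ' -v key="$1" '
        {
            label = $1
            sub(/^[[:space:]]+/, "", label)   # drop the leading indentation
        }
        label == key { print $2; exit }
    '
}
```

Assuming non-interactive invocation works, usage would look like sudo /opt/hp/hpssacli/bld/hpssacli ctrl slot=0 show | ctrl_field "Firmware Version". Values that themselves contain ": " would be cut short by this simple split.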


virtual machines.

I did more work on heim and created accounts for all members.

2017-09-23

eliasbakken, odinho, mastensg

we configured iLO: 10.13.38.102

the username and password are in https://github.com/bitraf/infrastructure

this machine used to be: vmwarehost07.sb1a.sparebank1.no


there are complaints about the RAM modules that were installed

mastensg removed the modules in slots 1, 4 and 7 on processor 1. These were the ones that had been complained about.


the machine has a Smart Array P410i, which automatically creates a RAID 1

installed Debian 9 on the Smart Array's RAID 1


packages for the network cards (non-free):

  • firmware-bnx2
  • firmware-bnx2x
  • firmware-qlogic
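Because these packages live in Debian's non-free component, installation only works once that component is enabled in the APT sources. The sketch below checks for it before installing; the function names has_non_free and install_nic_firmware are made up for illustration, and it assumes the classic one-line sources.list format used by Debian 9.

```shell
# has_non_free FILE
# Succeeds if an active (uncommented) "deb" line in FILE lists non-free.
has_non_free() {
    grep -Eq '^[[:space:]]*deb[[:space:]].*[[:space:]]non-free([[:space:]]|$)' "$1"
}

# install_nic_firmware [FILE]
# Hypothetical wrapper: installs the three firmware packages listed above,
# but only if non-free is enabled in FILE (default /etc/apt/sources.list).
install_nic_firmware() {
    sources=${1:-/etc/apt/sources.list}
    if ! has_non_free "$sources"; then
        echo "non-free is not enabled in $sources" >&2
        return 1
    fi
    sudo apt-get install firmware-bnx2 firmware-bnx2x firmware-qlogic
}
```

Sources kept under /etc/apt/sources.list.d/ are not examined by this sketch.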

network configuration:

  • lan: dhcp / enp4s0f0 / nic 3
  • wan: 77.40.158.102 / enp4s0f1 / nic 4
  • ilo: 10.13.38.102

created an Ansible configuration for bite: https://github.com/bitraf/infrastructure/tree/master/bite

put the virtual machine heim.bitraf.no into experimental operation

useful virt commands:

bite ~ $ sudo virsh define heim.xml
Domain heim defined from heim.xml

bite ~ $ sudo virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     heim                           shut off

bite ~ $ sudo virsh start heim
Domain heim started

bite ~ $ sudo virsh destroy heim
Domain heim destroyed

bite ~ $
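The commands above can be combined into a small helper that starts a VM only when it is actually shut off. This is a sketch, not something from bite's setup: the function name ensure_running and the VIRSH variable are assumptions for illustration, while virsh domstate and virsh start are standard virsh subcommands.

```shell
# VIRSH can be overridden, e.g. VIRSH="sudo virsh" (assumption for this sketch).
VIRSH=${VIRSH:-virsh}

# ensure_running DOMAIN
# Starts DOMAIN if it is shut off, leaves it alone if it is already running,
# and refuses to act on any other state (paused, crashed, ...).
ensure_running() {
    domain=$1
    state=$($VIRSH domstate "$domain") || return 1
    case $state in
        running)
            echo "$domain is already running" ;;
        "shut off")
            $VIRSH start "$domain" ;;
        *)
            echo "$domain is in state '$state', not touching it" >&2
            return 1 ;;
    esac
}
```

Note that virsh destroy, as used in the transcript, is a hard power-off of the guest and does not delete it; virsh shutdown asks the guest to shut down cleanly instead.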

see what is happening on a virtual machine on bite, from your own machine:

~ $ virt-viewer -c qemu+ssh://mastensg@bite.bitraf.no/system heim

About the machine

Bite is an HP ProLiant DL380 G7[5] server.

Images

See also

References