Difference between revisions of "Bite"

From Bitraf

Revision as of 16 Aug 2022, 10:15



Network

br0 has a static address on the local network, 10.13.37.3. br1 has a static address on the external network, 77.40.158.102. bite's router is Powertech's, namely 77.40.158.97.

Virtual machines can join br0 and/or br1, depending on which networks they should be on.

Virtual machines

Bite mostly serves as a host for a number of VMs. We use both QEMU and LXC for virtualization.

VMs on bite

$ sudo virsh list
 Id   Name                      State
-----------------------------------------
 1    iot                       running
 2    zabbix.karlsbakk.net      running
 3    heim                      running
 4    francesco.karlsbakk.net   running
 5    p2k16-production          running
 6    riemann                   running
 7    p2k16-staging             running
 8    p2k16                     running
 9    ssh.karlsbakk.net         running
 10   dlock                     running
 11   unifi                     running

One of these is intended to run Docker containers.


Home Assistant Docker

As an example, Home Assistant is (soon to be) set up as a Docker container on one of the machines.

Home area for members

Heim is a separate virtual server.

Managing virtual machines

The virtual machines on bite can be managed over the network: Remote Management of Guests

LXC

lxc-create \
 -B zfs --zfsroot=pool0/lxc \
 -t debian \
 --name=$NAME -- \
 -r stretch

HP Smart Array P410i

=> controller slot=0 show config detail 

[...]

   Firmware Version: 6.00-2

[...]

=> controller slot=0 modify hbamode=on forced

Error: Syntax error at "hbamode"

=>

Drives

Logical drives in the machine.

tingo@bite:~$ sudo hpacucli controller slot=0 logicaldrive all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (465.7 GB, RAID 1, OK)

Physical disks in the machine.

tingo@bite:~$ sudo hpacucli controller slot=0 physicaldrive all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, Solid State SATA, 500 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, Solid State SATA, 500 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 146 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 146 GB, OK)
      physicaldrive 2I:1:5 (port 2I:box 1:bay 5, SAS, 146 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 146 GB, Predictive Failure)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 146 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 146 GB, OK)
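
The listing above shows one drive (bay 6) reporting Predictive Failure. As a minimal sketch of how such drives could be picked out automatically, the helper below filters the `physicaldrive all show` output for lines whose status is anything other than OK. The function name `failing_drives` is hypothetical, and the sketch assumes the status is always the last field inside the parentheses, as it is in the output above.

```shell
#!/bin/sh
# Hypothetical helper: read `hpacucli ... physicaldrive all show` output on
# stdin and print only the physicaldrive lines whose status is not "OK".
# Assumes the status is the last comma-separated field before the closing ")".
failing_drives() {
    awk '/physicaldrive/ && !/, OK\)[[:space:]]*$/'
}

# Possible usage on bite (needs root and hpacucli installed):
#   sudo hpacucli controller slot=0 physicaldrive all show | failing_drives
```

An empty result would mean every listed drive reported OK.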

Log

(Sign the log by adding --~~~~ on a separate line at the end of the entry, or click the signature button in the menu.)

2022

2022-08-16 
the server was down due to a local power problem (the circuit breaker for the server room had tripped) and came back up automatically when power was restored. Tingo (talk) 16 Aug 2022, 11:15 (CEST)
2022-08-12 
after a planned power outage (work on the local power grid), bite restarted automatically when power returned, and all services on it started too. Tingo (talk) 12 Aug 2022, 11:48 (CEST)
2022-07-23 
bite was down / offline. Initial reports came in yesterday at about 21:00 local time; I was notified this morning. I accessed bite via iLO; everything looked OK, with green lights across the board. Nothing in the logs (iLO Event Log, Integrated Management Log) indicated that anything was wrong. I tried restarting the server (graceful power off / on, forced power off / on, reset) but nothing worked. Even the physical power button on the server did not work. I got the remote console working (Integrated Java Remote Console, via JavaFox) and was able to see this error message on the console:
Fatal PCI Express Device Error PCI Slot ?
 B00/D00/F00

relevant hits on the internet suggested re-seating the PCI cards in the server (one of the PCI cards is the RAID controller), so I did that: unplug power, take off the top cover, unscrew the three thumbscrews on the back, lift out the PCI cage using the handles, take out the two PCI cards and put them back in. I re-assembled everything and powered on the server. This time the remote console showed that it booted to the Debian login. Some minutes later (10-15 minutes) all the virtual machines were up and running, and p2k16 worked again. Relevant internet info:

  • Proliant DL380 G7 Fatal PCI Express Device Error PCI ? B00/D00/F00[1]
  • HP ProLiant Servers - Error "PCI express device error PCI slot? 86" When the Server Powers On[2]
Tingo (talk) 23 Jul 2022, 14:53 (CEST)
2022-05-07 
today, after a power outage, bite didn't boot fully. Power was on, and it was possible to access the machine via iLO (web or ssh). Unfortunately, the remote console via iLO doesn't work: the web console requires Java (but JavaFox doesn't work), and ssh to iLO followed by textcons works, but Linux uses an "unsupported graphics mode" for its console. Anyway, no error messages or problems were seen in iLO or its logs. Rebooting from iLO (cold boot) didn't help. Finally I restarted the machine with the power button (I also had a local console attached: VGA monitor plus USB keyboard), and then it came up normally. Everything seems to work now. Maybe this hardware is getting too old? Tingo (talk) 7 May 2022, 14:46 (CEST)

2021

2021-02-16 
tonight we swapped the two upper disk drives in the external chassis, to figure out whether the anomaly is with the drive or with the slot in the chassis.
tingo@bite:~$ sudo smartctl -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7LN54
Firmware Version: 0958
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Tue Feb 16 20:18:06 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7KZNL
LU WWN Device Id: 5 000c50 0c6c55c55
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 20:19:19 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdd
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7M1EN
LU WWN Device Id: 5 000c50 0c6c8b047
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 20:19:36 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7P7JN
LU WWN Device Id: 5 000c50 0c6da4787
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Feb 16 20:19:59 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Unfortunately, the anomaly follows the slot (it is the topmost slot in the external chassis), not the drive. Tingo (talk) 16 Feb 2021, 21:12 (CET)
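
The conclusion above comes from comparing the two sets of reports: on 2021-02-12 /dev/sdb had serial ZGY7KZNL and firmware 0958, and after the swap /dev/sdb had serial ZGY7LN54 but still firmware 0958. A minimal sketch of how that comparison could be condensed to one line per device is shown below; the function name `drive_id` is hypothetical, and it only assumes the `Serial Number:` and `Firmware Version:` lines that appear in the `smartctl -i` output above.

```shell
#!/bin/sh
# Hypothetical helper: read one `smartctl -i` report on stdin and print
# "<serial> <firmware>" so reports taken before and after a swap are easy to diff.
drive_id() {
    awk -F': *' '
        /^Serial Number:/    { serial = $2 }
        /^Firmware Version:/ { fw = $2 }
        END { print serial, fw }
    '
}

# Possible usage on bite (needs root and smartmontools):
#   for d in sdb sdc sdd sde; do
#       printf '%s ' "$d"; sudo smartctl -i "/dev/$d" | drive_id
#   done
```

If the odd firmware string stays on the same device name while the serial changes, the anomaly follows the slot; if it moves with the serial, it follows the drive.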

2021-02-12 
we had some trouble with the disk drives in the external chassis; here I document what smartctl reports for the drives.
tingo@bite:~$ sudo smartctl -i /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7KZNL
Firmware Version: 0958
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Fri Feb 12 09:36:26 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7LN54
LU WWN Device Id: 5 000c50 0c6c868a0
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 12 09:36:33 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sdd
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7M1EN
LU WWN Device Id: 5 000c50 0c6c8b047
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 12 09:36:51 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

tingo@bite:~$ sudo smartctl -i /dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.9.0-0.bpo.5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST4000VN008-2DR166
Serial Number:    ZGY7P7JN
LU WWN Device Id: 5 000c50 0c6da4787
Firmware Version: SC60
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5980 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 12 09:36:55 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Tingo (talk) 16 Feb 2021, 21:06 (CET)

2021-02-10 
Bite was upgraded to Debian 10.8 by Trygvis.
tingo@bite:~$ cat /etc/debian_version 
10.8

kernel

tingo@bite:~$ uname -a
Linux bite 5.9.0-0.bpo.5-amd64 #1 SMP Debian 5.9.15-1~bpo10+1 (2020-12-31) x86_64 GNU/Linux

Tingo (talk) 13 Feb 2021, 16:28 (CET)

2020

2020-11-11 
external disk enclosure - switched from USB to eSATA connection. First I removed the disks from the operating system using # echo 1 > /sys/block/<disk>/device/delete, for example # echo 1 > /sys/block/sdb/device/delete, then I unplugged the USB cable from the server and plugged in the eSATA cable. The disks showed up automagically on the server. Tingo (talk) 11 Nov 2020, 11:55 (CET)
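
The per-disk sysfs write described in this entry could be wrapped in a small loop when several disks share one enclosure. This is only a sketch under stated assumptions: the function name `detach_disks` and the `SYSFS_ROOT` override are hypothetical (the override exists so the loop can be tried against a scratch directory), and it assumes any filesystems on the disks have already been unmounted.

```shell
#!/bin/sh
# Root of sysfs; overridable (hypothetical knob) for trying the loop safely.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Hypothetical helper: ask the kernel to drop each named disk (e.g. sdb)
# by writing 1 to its device/delete node, as done by hand in the log entry.
detach_disks() {
    for disk in "$@"; do
        node="$SYSFS_ROOT/block/$disk/device/delete"
        if [ -w "$node" ]; then
            echo 1 > "$node"
            echo "detached $disk"
        else
            echo "skipping $disk: $node is not writable" >&2
        fi
    done
}

# Possible usage on bite, as root, after unmounting:
#   detach_disks sdb sdc sdd sde
```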
2020-11-09 
external disk enclosure - installed new disks, 4 x 4 TB Seagate IronWolf. The new disks show up in lsblk:
tingo@bite:~$ sudo lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465.7G  0 disk 
└─sda1   8:1    0 465.7G  0 part /
sdb      8:16   0   3.7T  0 disk 
sdc      8:32   0   3.7T  0 disk 
sdd      8:48   0   3.7T  0 disk 
sde      8:64   0   3.7T  0 disk 
sr0     11:0    1  1024M  0 rom  

the new disks are sdb, sdc, sdd and sde. Tingo (talk) 9 Nov 2020, 13:37 (CET)

2020-10-14 
the external disk enclosure is an IcyBox IB-RD3640SU3[3] from RaidSonic[4]. Tingo (talk) 14 Oct 2020, 10:26 (CEST)
2020-06-19 
bite was upgraded to the latest Debian stretch (9.12) with a little help from trygvis. Tingo (talk) 19 Jun 2020, 16:58 (CEST)
tingo@bite:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.12 (stretch)
Release:	9.12
Codename:	stretch
tingo@bite:~$ uname -a
Linux bite 4.19.0-0.bpo.9-amd64 #1 SMP Debian 4.19.118-2~bpo9+1 (2020-05-20) x86_64 GNU/Linux

and rebooted.

2020-06-19 
bite was off (shut down); the room was very hot, so this may have been caused by high temperature. Measures to bring the temperature down were already under way, so I pressed the power button and turned bite back on. It came up, and so did all the VMs on it. Tingo (talk) 19 Jun 2020, 12:09 (CEST)

2018

2018-10-12: Disks and miscellaneous maintenance

Over the last few days Bite has received a new disk enclosure with 4x 3 TB disks and a new eSATA card (USB did not work very well).

The Ansible setup has been greatly improved for automatic rollout of users, admins and roles. Access rights on a number of machines have been cleaned up.

Marvin has been set up as a disk VM, and Bitraf's Dropbox account will eventually be available as //marvin/dropbox on the tool machines (CNC and laser). The share is available to everyone at Bitraf; no username or password is required.

--Trygvis (talk) 12 Oct 2018, 07:55 (UTC)

2017

2017-10-12

mastensg

added the virtual machine p2k16-staging for haavares. available as p2k16-staging.local at Bitraf and as p2k16-staging.bitraf.no.

2017-09-30

mastensg

I found hpssacli and installed it.

bite ~ $ sudo /opt/hp/hpssacli/bld/hpssacli
HPE Smart Storage Administrator CLI 2.40.13.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.

=> ctrl slot=0 modify hbamode=on

Error: This operation is not supported with the current configuration. Use the 
       "show" command on devices to show additional details about the
       configuration.
Reason: Not supported

=> ctrl slot=0 show             

Smart Array P410i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   Serial Number: 5001438017522FA0
   Cache Serial Number: PAAVPID1126167V
   Controller Status: OK
   Hardware Revision: C
   Firmware Version: 6.00-2
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Parallel Surface Scan Supported: No
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 25% Read / 75% Write
   Drive Write Cache: Disabled
   Total Cache Size: 512 MB
   Total Cache Memory Available: 400 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True
   Number of Ports: 2 Internal only
   Driver Name: hpsa
   Driver Version: 3.4.16
   Driver Supports HPE SSD Smart Path: True
   PCI Address (Domain:Bus:Device.Function): 0000:05:00.0
   Host Serial Number: CZ21450B5J
   Sanitize Erase Supported: False
   Primary Boot Volume: logicaldrive 1 (600508B1001C7A7F3E2649601E8275E0)
   Secondary Boot Volume: None

=> 

from the thread above:

> P410i HBA mode support was only released for Integrity servers.

that seems like a bad sign, since bite is a ProLiant server.

I am giving up on HBA mode.


virtual machines.

I worked more on heim. I created accounts for all members.

2017-09-23

eliasbakken, odinho, mastensg

we configured iLO: 10.13.38.102

username and password are in https://github.com/bitraf/infrastructure

this machine used to be: vmwarehost07.sb1a.sparebank1.no


there are complaints about the RAM modules that were installed

mastensg removed the modules in slots 1, 4 and 7 on processor 1. these were the ones that had drawn complaints


the machine has a Smart Array P410i, which automatically creates a RAID 1

installed Debian 9 on the Smart Array's RAID 1


packages for the network cards (non-free):

  • firmware-bnx2
  • firmware-bnx2x
  • firmware-qlogic

network configuration:

  • lan: dhcp / enp4s0f0 / nic 3
  • wan: 77.40.158.102 / enp4s0f1 / nic 4
  • ilo: 10.13.38.102

created Ansible configuration for bite: https://github.com/bitraf/infrastructure/tree/master/bite

put the virtual machine heim.bitraf.no into experimental operation

useful virt commands:

bite ~ $ sudo virsh define heim.xml
Domain heim defined from heim.xml

bite ~ $ sudo virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     heim                           shut off

bite ~ $ sudo virsh start heim
Domain heim started

bite ~ $ sudo virsh destroy heim
Domain heim destroyed

bite ~ $

see what is happening on a virtual machine on bite, from your own machine:

~ $ virt-viewer -c qemu+ssh://mastensg@bite.bitraf.no/system heim
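
The same `qemu+ssh://` connection URI used with virt-viewer above can also be passed to `virsh -c` for listing or controlling guests remotely. As a small sketch, the helper below builds that URI for a given user; the function name `bite_uri` is hypothetical, and it assumes you have an ssh account on bite with access to libvirt.

```shell
#!/bin/sh
# Hypothetical helper: build the libvirt connection URI for bite.
# $1 = ssh user name, $2 = host (defaults to bite.bitraf.no, as used above).
bite_uri() {
    printf 'qemu+ssh://%s@%s/system\n' "$1" "${2:-bite.bitraf.no}"
}

# Possible usage from your own machine (needs virsh and ssh access):
#   virsh -c "$(bite_uri mastensg)" list --all
#   virsh -c "$(bite_uri mastensg)" start heim
```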

About the machine

Bite is an HP ProLiant DL380 G7[5] server.

Images

See also

References