To reset the ILOM password to “changeme” for user ID 0x02, run this command:

# /usr/sbin/ipmitool user set password 0x02 changeme

What is a snapshot?

A snapshot preserves the state and data of a virtual machine at a specific point in time.
State includes the virtual machine’s power state (powered on, powered off, suspended, etc.).
Data includes all the files that make up the virtual machine, including disks, memory, and other devices, such as virtual network interface cards.

To commit all snapshots by using the command line:

Log in to the ESX host as root via the console or an SSH session. For more information about SSH, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807).

Note: The following commands can also be executed remotely using the vSphere Command Line for both ESX and ESXi hosts. For more information, see vSphere Command Line Interface documentation.

Input

vmware-cmd -l

and press Enter.

The output appears similar to:

/vmfs/volumes/UUID/VMNAME/VMNAME.vmx

Input

vmware-cmd /vmfs/volumes/UUID/VMNAME/VMNAME.vmx hassnapshot

and press Enter to confirm that there is a snapshot. If the output displays a value of 1, a snapshot is present. If the output displays a value of 0, there is no snapshot present.

Input

vmware-cmd /vmfs/volumes/UUID/VMNAME/VMNAME.vmx createsnapshot

and press Enter to create a new snapshot.

For example, the command vmware-cmd /vmfs/volumes/UUID/VMNAME/VMNAME.vmx createsnapshot "test" "" 0 0 makes a snapshot called test with no description, no quiescing, and no memory.

Note: You can use any name you like. The name appears in the snapshot manager. For more information about the syntax of the vmware-cmd command, see vSphere Command Line Interface documentation.

Input

vmware-cmd /vmfs/volumes/UUID/VMNAME/VMNAME.vmx removesnapshots

and press Enter to remove the snapshot.
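
The command-line steps above can be strung together in a small shell function. This is only a sketch: commit_snapshots and the VMWARE_CMD variable are names made up for this example, and it assumes the hassnapshot output ends in 1 when a snapshot exists, as described above.

```shell
# Hypothetical helper; set VMWARE_CMD to something else for a dry run.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

commit_snapshots() {
    vmx=$1
    # Assumption: the hassnapshot output ends in 1 when a snapshot is present.
    case "$($VMWARE_CMD "$vmx" hassnapshot)" in
        *1) ;;
        *)  echo "no snapshot: $vmx"; return 0 ;;
    esac
    # Create a throwaway snapshot (no memory, no quiescing), then remove all.
    $VMWARE_CMD "$vmx" createsnapshot "test" "" 0 0 &&
    $VMWARE_CMD "$vmx" removesnapshots &&
    echo "committed: $vmx"
}
```

Called as commit_snapshots /vmfs/volumes/UUID/VMNAME/VMNAME.vmx, it does nothing when the virtual machine has no snapshot and stops at the first command that fails.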

GUI

1. Select Inventory > Virtual Machine > Snapshot > Take Snapshot.
You can also right-click the virtual machine and select Snapshot > Take Snapshot.
The Take Virtual Machine Snapshot window appears.
2. Type a name for your snapshot.
3. (Optional) Type a description for your snapshot.
4. (Optional) Select the Snapshot the virtual machine’s memory check box if you want to capture the memory of the virtual machine.
5. (Optional) Select the Quiesce guest file system (Needs VMware Tools installed) check box to pause running processes on the guest operating system so that file system contents are in a known consistent state when the snapshot is taken. This applies only to virtual machines that are powered on.
6. Click OK. When the snapshot has been successfully taken, it is listed in the Recent Tasks field at the bottom of the vSphere Client.
7. Click the target virtual machine to display tasks and events for this machine or, while the virtual machine is selected, click the Tasks & Events tab.

Reference:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002310
http://pubs.vmware.com/vsp40u1/wwhelp/wwhimpl/js/html/wwhelp.htm#href=admin/t_take_a_snapshot.html
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180

To find the physical memory capacity, use the “top” command.
top

top - 13:15:47 up 12 days, 6:18, 1 user, load average: 0.01, 0.03, 0.01
Tasks: 22 total, 1 running, 21 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 1048576k total, 206424k used, 842152k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached

Pay attention to the “Mem” line:

Mem: 1048576k total, 206424k used, 842152k free, 0k buffers

Physical Memory Capacity: 1048576k
Currently Used Memory: 206424k
Free Memory: 842152k

To get the values in MB, divide them by 1024. For example,

Physical Memory Capacity in MB
= 1048576/1024
= 1024 MB

Currently Used Memory in MB
= 206424/1024
~ 201.6 MB

Free Memory in MB
= 842152/1024
~ 822.4 MB
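
If you do this often, the division can be left to the shell. The two functions below are a sketch with made-up names (kb_to_mb and mem_line_to_mb); they round down to whole megabytes and assume the “Mem:” line has the same layout as the one shown above.

```shell
# Convert a value in kilobytes (as printed by top) to whole megabytes.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

# Take a "Mem:" line from top and print total, used and free in MB.
# awk reads "1048576k" as the number 1048576 (leading digits only),
# and %d drops the fractional part.
mem_line_to_mb() {
    echo "$1" | awk '{ printf "total=%d used=%d free=%d\n", $2/1024, $4/1024, $6/1024 }'
}

kb_to_mb 1048576
mem_line_to_mb 'Mem: 1048576k total, 206424k used, 842152k free, 0k buffers'
```

The last two lines print 1024 and total=1024 used=201 free=822.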

Easy !!

There are many commands available to reboot a server, but which one will reboot your Solaris server gracefully? Just use the “reboot” command?

NO NO NO.

That’s not the right thing to do unless you have no choice. Why? Because “reboot” does not run the shutdown scripts; it simply reboots straight away. init 6 is the better choice, and on top of that, this is my favorite:

shutdown -y -i6 -g 180 "Message to users"

It reboots the server in 3 minutes (180/60 seconds) and broadcasts the message to the console and to any logged-in users.

For less hassle, make sure auto-boot? is set to true so that you don’t have to access ALOM and boot manually from the OK prompt.

root@jebat10 # eeprom |grep auto-boot
auto-boot?=true

It’s always nice to warn users first and give them ample time to save their work before you reboot. And make sure that you’re rebooting the intended server!!

root@jebat10 # uname -a
SunOS jebat10 5.10 Generic_127127-11 sun4u sparc SUNW,SPARC-Enterprise

root@jebat10 # uptime
8:14pm up 288 day(s), 9:50, 7 users, load average: 0.06, 0.08, 0.13

root@jebat10 # shutdown -y -i6 -g 180 "===== server going down for reboot in 3 minutes. finish your work immediately  !! =="

Shutdown started.    Wed Jun 10 20:43:08 EEST 2009

Broadcast Message from root (pts/12) on jebat10 Wed Jun 10 20:43:09…
The system jebat10 will be shut down in 3 minutes
===== server going down for reboot in 3 minutes. finish your work immediately  !! ==
showmount: jebat10: RPC: Program not registered
Broadcast Message from root (pts/12) on jebat10 Wed Jun 10 20:44:09…
The system jebat10 will be shut down in 2 minutes
===== server going down for reboot in 3 minutes. finish your work immediately  !! ==
showmount: jebat10: RPC: Program not registered
Broadcast Message from root (pts/12) on jebat10 Wed Jun 10 20:45:09…
The system jebat10 will be shut down in 1 minute
===== server going down for reboot in 3 minutes. finish your work immediately  !! ==
showmount: jebat10: RPC: Program not registered
Broadcast Message from root (pts/12) on jebat10 Wed Jun 10 20:45:39…
The system jebat10 will be shut down in 30 seconds
===== server going down for reboot in 3 minutes. finish your work immediately  !! ==
showmount: jebat10: RPC: Program not registered
Broadcast Message from root (pts/12) on jebat10 Wed Jun 10 20:45:59…
THE SYSTEM jebat10 IS BEING SHUT DOWN NOW ! ! !
Log off now or risk your files being damaged
===== server going down for reboot in 3 minutes. finish your work immediately  !! ==
showmount: jebat10: RPC: Program not registered
Changing to init state 6 - please wait
root@jebat10 # Connection to jebat10-m.europe.nokia.com closed.
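
To make the “am I on the right server?” check harder to forget, it can be wrapped around the shutdown command. This is a sketch only: safe_reboot is a made-up name, and it prints the shutdown command instead of running it, so you can see what would happen first.

```shell
# Refuse to go on unless the local host name matches the one you expect.
safe_reboot() {
    expected=$1
    grace=${2:-180}          # grace period in seconds, 3 minutes by default
    actual=$(uname -n)
    if [ "$actual" != "$expected" ]; then
        echo "refusing: this is $actual, not $expected" >&2
        return 1
    fi
    # Dry run: remove the leading "echo" to really reboot.
    echo shutdown -y -i6 -g "$grace" "rebooting in $((grace / 60)) minutes"
}
```

For example, safe_reboot jebat10 180 only prints the shutdown line when it is run on jebat10; anywhere else it complains and returns an error.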

Fun With Find

The find command combined with -exec can do just about anything you want. For example, to list every file owned by user “darkwan” and write the listing to a file:

find / -user darkwan -exec ls -ld {} \; > /tmp/output

Cool! How about changing the owner of those files from “darkwan” to “root”?

find / -user darkwan -exec chown root {} \;

Anyway, be warned about how destructive this command can be if you combine it with rm!
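
One way to stay safe is to print what find matches before attaching any -exec to it. A minimal sketch, with owned_by as a made-up helper name:

```shell
# Print every regular file under a directory that belongs to a user.
# $1 = start directory, $2 = user name (or numeric UID)
owned_by() {
    find "$1" -type f -user "$2" -print
}

# Example: look before you leap.
# owned_by /tmp darkwan
```

Only when the list looks right would you rerun the same find expression with -exec chown or -exec rm.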

Locking User In Solaris

Locking a user in Solaris is easy, and I wonder why I should write about it at all. But just in case you bother to read, this is how.

The status of a normal account is PS (Password Set):

jebat8#passwd -s gojilla
gojilla PS

Lock it using this command:

passwd -l username

jebat8#passwd -l gojilla
passwd: password information changed for gojilla

Now the account is locked (LK). The user can no longer log in to the system.

jebat8#passwd -s gojilla
gojilla LK
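
If you check many accounts, a tiny filter can spell out the status codes. This is just a sketch: account_state is a made-up name, and it only knows the two codes shown above (PS and LK).

```shell
# Read `passwd -s` output on stdin and describe the status field.
account_state() {
    awk '{
        if ($2 == "PS")      print $1 " has a password set"
        else if ($2 == "LK") print $1 " is locked"
        else                 print $1 " has status " $2
    }'
}

# Example: passwd -s gojilla | account_state
```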

ALOM For Dummies

In the day-to-day work of a remote administrator like me, the most important thing is to make sure the server is always accessible for troubleshooting and other administration tasks. In the unlucky situation where the server doesn’t accept logins, ALOM is the lifeboat.

What is ALOM? Sun Advanced Lights Out Manager (ALOM) is the standard System Controller (SC) for remote out-of-band management on many current and future Sun servers. In non-geek words, ALOM lets you work on the server as if it were in front of you when in fact it is far away. So I can tell the machine to reboot without bothering the Data Center guy in the middle of the night to do the job for me.

To do that, log in to ALOM from a terminal and reboot the server by issuing the poweroff command first, then poweron after a few minutes.

To shut down the server, use

poweroff

To put it back online, use

poweron

It’s good to know that we can check the status of the server using showenvironment to make sure the hardware is still healthy.

The output of that command on my (client’s) Sun Fire v490 is shown below:

rsc> showenvironment

=============== Environmental Status ===============

System Temperatures (Celsius):
------------------------------
P0      42
P1      45
P2      43
P3      43
DBP0    22

=================================

Front Status Panel:
-------------------

Keyswitch position is in On mode.

System LED Status:    LOCATOR    FAULT    POWER
                      [OFF]      [OFF]    [ ON]

=================================

Disk LED Status:    OK = GREEN    ERROR = YELLOW
DISK 1: [OK]
DISK 0: [OK]

=================================

Fan Tray :
----------

Tray                   Speed    Status
----                   -----    ------
FAN TRAY0 CPU FAN0     5769     [OK]
FAN TRAY0 CPU FAN1     4000     [OK]
FAN TRAY0 CPU FAN2     3896     [OK]
FAN TRAY1 IO FAN0      4000     [OK]
FAN TRAY1 IO FAN1      4225     [OK]

=================================

Power Supplies:
---------------

Supply    Status          PS Fault    Fan Fault    Temp Fault
------    ------------    --------    ---------    ----------
0         OK              OFF         OFF          OFF
1         OK              OFF         OFF          OFF

=================================

Super nice, isn’t it?

The first time I used ALOM, the server still didn’t come “online” after poweron. I didn’t have a clue until a more senior technical guy showed me that the server was actually stuck at the OK prompt. The reason was that auto-boot? was set to false. If you ever face the same thing, go to the console after poweron.

rsc> console

Here, check auto-boot? with the printenv command.

ok printenv auto-boot?
auto-boot = false
ok setenv auto-boot? true
auto-boot = true
ok reset-all

In the example above, I changed auto-boot? from false to true. Use this if you want auto-boot? permanently set to true. If you want to leave the configuration as it is but still continue booting, issue the boot command at the ok prompt instead; the server will boot after that.

It’s also good practice to save the OK prompt variables for future reference. In peaceful times, run the eeprom command in your terminal to print them.
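
As a sketch of that habit, the helper below checks saved or live eeprom output for the auto-boot? setting. The name autoboot_enabled and the file path in the comments are made up for this example; it assumes the line looks exactly like the auto-boot?=true output shown earlier.

```shell
# Succeed when stdin (eeprom output) contains the line auto-boot?=true.
# In a basic regular expression the "?" is matched literally.
autoboot_enabled() {
    grep '^auto-boot?=true$' >/dev/null
}

# Examples:
# eeprom > /var/tmp/eeprom.saved          # keep a copy for later
# eeprom | autoboot_enabled || echo "auto-boot? is not true"
```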

See the Sun ALOM Software User’s Guide for more details. Happy ALOM’ing!

On one of the Linux machines that I recently started administering, I found the libvorbis package, which I obviously don’t need on a server system. So I tried to get rid of it, but “error: Failed dependencies” showed up in my remote terminal.

[root@kasturi ~]# rpm -e libvorbis
error: Failed dependencies:
libvorbis.so.0 is needed by (installed) sox-12.17.5-3.i386
libvorbisenc.so.2 is needed by (installed) sox-12.17.5-3.i386
libvorbisfile.so.3 is needed by (installed) sox-12.17.5-3.i386

So what is the next step? Get rid of the two packages together!! The syntax is

rpm -e package1 package2

[root@kasturi ~]# rpm -e sox libvorbis
error: Failed dependencies:
sox is needed by (installed) system-config-soundcard-1.2.10-2.EL4.noarch

Oh, not again. system-config-soundcard, get out of my way!

[root@kasturi ~]# rpm -e system-config-soundcard

And now your turn, sox and libvorbis !

[root@kasturi ~]# rpm -e sox libvorbis

DONE !

PS: If you’re using FC, you can use “yum remove [package-name]” too!
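
To see at a glance which packages are in the way, the error text can be filtered. A small sketch, assuming the “is needed by (installed)” wording shown above; blockers is a made-up name.

```shell
# Print the unique package names from "is needed by (installed) PKG" lines.
blockers() {
    sed -n 's/.* is needed by (installed) //p' | sort -u
}

# Example: rpm prints the errors on stderr, so merge it into the pipe.
# rpm -e libvorbis 2>&1 | blockers
```

For the first error above this prints a single line, sox-12.17.5-3.i386.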

DIAGNOSTIC SYSTEM WARNING

Issue :

What should you do when “DIAGNOSTIC SYSTEM WARNING” suddenly appears in dmesg?

tuahB1100:/#dmesg
Aug 12 04:56
NOTICE: autofs_link(): File system was registered at index 3.
NOTICE: nfs3_link(): File system was registered at index 5.
8 ccio
8/0 bc
8/0/0 mux2
8/4 c720
8/4.5 tgt
8/4.5.0 sdisk
8/4.7 tgt
8/4.7.0 sctl
8/4.8 tgt
8/4.8.0 sdisk
8/4.9 tgt
8/4.9.0 sdisk
8/4.10 tgt
8/4.10.0 sdisk
8/16 bus_adapter
8/16/5 c720
8/16/5.0 tgt
8/16/5.0.0 stape
8/16/5.2 tgt
8/16/5.2.0 sdisk
8/16/5.7 tgt
8/16/5.7.0 sctl
8/16/0 CentIf
ps2_readbyte_timeout: no byte after 500 uSec
ps2_readbyte_timeout: no byte after 500 uSec
8/16/7 ps2
8/20 bus_adapter
8/20/5 eisa
8/20/2 asio0
10 ccio
10/12 GSCtoPCI
10/12/1/0 btlan4
32 processor
49 memory
btlan4: Initializing 10/100BASE-TX card at 10/12/1/0....
Logical volume 64, 0x3 configured as ROOT
Logical volume 64, 0x2 configured as SWAP
Logical volume 64, 0x2 configured as DUMP
Swap device table: (start & size given in 512-byte blocks)
entry 0 - major is 64, minor is 0x2; start = 0, size = 4096000
Dump device table: (start & size given in 1-Kbyte blocks)
entry 0 - major is 31, minor is 0xa000; start = 88928, size = 2048000
Starting the STREAMS daemons-phase 1
Create STCP device files
B2352B/9245XB HP-UX (B.11.00) #1: Thu Nov 6 01:58:21 PST 1997
Memory Information:
physical page size = 4096 bytes, logical page size = 4096 bytes
Physical: 524288 Kbytes, lockable: 389252 Kbytes, available: 451348 Kbytes
vxfs: mesg 004: vx_mapbad - /var/opt/OV file system free inode bitmap in au 1 marked bad
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
SCSI TAPE: dev = 0xcd010000 I/O error during close
DIAGNOSTIC SYSTEM WARNING:
The diagnostic logging facility has started receiving excessive
errors from the I/O subsystem. I/O error entries will be lost
until the cause of the excessive I/O logging is corrected.
If the diaglogd daemon is not active, use the Daemon Startup command
in stm to start it.
If the diaglogd daemon is active, use the logtool utility in stm
to determine which I/O subsystem is logging excessive errors.
SCSI TAPE: dev = 0xcd010000 I/O error during close

Method :

1) Is diaglogd running ?

tuahB1100:/# ps -ef |grep diaglogd |grep -v grep
root 1758 1350 0 Jun 26 ? 6:24 diaglogd

2) Diagnose using STM

  • Run logtool using mstm:
    tuahB1100:/ # mstm
    Tools | Utility | Run | (select "logtool")

or

  • Run logtool using cstm:
    tuahB1100:/ # cstm
    cstm>ru logtool

Check Also :

tuahB1100:/# cat /var/adm/syslog/syslog.log
tuahB1100:/# cat /var/opt/resmon/log/event.log
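
While waiting for logtool, a quick count of the repeated errors per device already hints at the culprit. This is a sketch that assumes the “SCSI TAPE: dev = ...” line format from the dmesg output above; tape_errors_by_dev is a made-up name.

```shell
# Count "SCSI TAPE: dev = X I/O error ..." lines per device on stdin.
tape_errors_by_dev() {
    awk '/^SCSI TAPE: dev = / { n[$5]++ } END { for (d in n) print d, n[d] }'
}

# Example: dmesg | tape_errors_by_dev
```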

Glossary :

STM : Support Tool Manager
UUT : Unit Under Test

Disclaimer :

This article is prepared for personal reference based on original discussion in ITRC forums.

Issue :

Problem starting OpC agent.

Description :

Error encountered when trying to start OpC agent

Error Log :

Error opcctla (Control Agent)(20813) : Initialize of the ITO Control Agent failed. (OpC30-1036)
Can't lookup servers: Connection request rejected (dce / rpc). (OpC20-108)
Can't lookup servers: Connection request rejected (dce / rpc). (OpC20-108)
Stopping all ITO Agent processes... (OpC30-1192)

Suggestion :

1) Stop monitoring agent
2) Stop reporting agent
3) Restart dce

HP-UX

# /sbin/init.d/Rpcd stop
# rm /var/opt/dce/dced/*.db (to clear the DCE endpoint mapper DB)
# /sbin/init.d/Rpcd start
# opcagt -start

Solaris

# /etc/init.d/hplwdce stop
# rm /opt/dcelocal/var/dced/Ep.db /opt/dcelocal/var/dced/Llb.db
# /etc/init.d/hplwdce start
# opcagt -start

Linux

# /etc/rc.d/init.d/rc.dcerpcd stop
# /etc/rc.d/init.d/rc.dce-clean start
# /etc/rc.d/init.d/rc.dcerpcd start
# opcagt -start

If dce is missing (not installed at all), reinstall the monitoring agent, and dce will be reinstalled along with it.
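
The three variants above can be kept in one place and picked by operating system. The function below is a sketch with a made-up name (dce_restart_plan); it only prints the commands listed above so you can review them, it does not run anything.

```shell
# Print the DCE restart steps for an OS name as reported by `uname -s`.
dce_restart_plan() {
    case "$1" in
        HP-UX)
            printf '%s\n' \
                '/sbin/init.d/Rpcd stop' \
                'rm /var/opt/dce/dced/*.db' \
                '/sbin/init.d/Rpcd start' ;;
        SunOS)
            printf '%s\n' \
                '/etc/init.d/hplwdce stop' \
                'rm /opt/dcelocal/var/dced/Ep.db /opt/dcelocal/var/dced/Llb.db' \
                '/etc/init.d/hplwdce start' ;;
        Linux)
            printf '%s\n' \
                '/etc/rc.d/init.d/rc.dcerpcd stop' \
                '/etc/rc.d/init.d/rc.dce-clean start' \
                '/etc/rc.d/init.d/rc.dcerpcd start' ;;
        *)
            echo "unknown OS: $1" >&2
            return 1 ;;
    esac
    # The last step is the same everywhere.
    echo 'opcagt -start'
}

# Example: dce_restart_plan "$(uname -s)"
```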

Credit :

khursi
Radko
IT Resource Center forums

