paultclark.com

Network Administrator. Here are some things you may need to know to be a good network administrator.

Event Management is the process that monitors all events that occur throughout the IT infrastructure, both to allow for normal operation and to detect and escalate exception conditions.

Significance of events:

  • Informational: This refers to an event that does not require any action and does not represent an exception.
    • Configuration changes
    • Critical file edits
      • /boot
      • /etc/group
      • /etc/passwd
      • hosts file
      • httpd.conf
      • resolv.conf
      • /sbin/*
      • tnsnames.ora
    • Database access
    • Failed login attempts
    • OS commands
      • ifconfig
    • Software licensing usage
  • Warning: A warning is an event that is generated when a service or device is approaching a threshold.
    • Application (re)starts
    • Capacity
      • CPU utilization
      • I/O thresholds
      • Memory usage
      • Partition usage
      • Swap usage
    • Dependency availability
    • Loss of redundancy
    • Server restarts
    • syslog - crit, alert, emerg, and panic events
  • Exception: An exception means that a service or device is currently operating abnormally (however "abnormal" has been defined for it).
    • Availability, Reliability
    • Customer Experience, KPIs
    • Network or port connectivity
      • (D)DoS attacks
      • ICMP connectivity
      • Security or network intrusions
      • SNMP connectivity
    • NFS server connectivity
    • Process taking too long
    • Process returns errors
    • Response Time
    • Too many / few processes
    • VIP failures
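
The three significance levels above can be tied to thresholds. The following is a minimal Python sketch, assuming a capacity reading expressed as a percentage; the names classify, partition_usage_percent, warn_at and fail_at are hypothetical, and the 80/95 thresholds are only illustrative, not recommended values.

```python
import shutil
from enum import Enum


class Significance(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def classify(value, warn_at, fail_at):
    """Map a utilization percentage to an event significance."""
    if value >= fail_at:
        # The limit has been reached or passed: operating abnormally.
        return Significance.EXCEPTION
    if value >= warn_at:
        # Approaching the limit: worth a warning.
        return Significance.WARNING
    return Significance.INFORMATIONAL


def partition_usage_percent(path="/"):
    """Return how full the partition holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    percent = partition_usage_percent("/")
    # Illustrative thresholds only; pick values that fit your environment.
    print(f"/ is {percent:.1f}% full: {classify(percent, 80, 95).value}")
```

The same classify function could be reused for CPU, memory, or swap readings, since it only needs a number and two thresholds.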

Protocols

Problem Solving

Redundancy and failovers

  • Active – active
  • Active – passive
  • Failover testing
  • Networking
    • Layer 2
    • Layer 3
    • tcpdump
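
A failover test ultimately asks whether the service is still reachable when the primary endpoint goes away. Below is a rough Python sketch of that check for an active/passive pair, assuming both nodes expose a plain TCP port; is_reachable and first_reachable are hypothetical helper names, and the host names in the usage comment are placeholders.

```python
import socket


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def first_reachable(endpoints, timeout=2.0):
    """Return the first (host, port) that accepts a connection, or None.

    `endpoints` is ordered by preference: primary first, then standbys.
    """
    for host, port in endpoints:
        if is_reachable(host, port, timeout):
            return (host, port)
    return None


# Example (placeholder names): take the primary down, then confirm that
# first_reachable([("primary.example", 443), ("standby.example", 443)])
# returns the standby endpoint.
```

This only proves TCP reachability (layer 3/4); it says nothing about whether the application behind the port is healthy.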

Security

  • Close open ports
  • Disable unneeded services
  • Monitor unauthorized access attempts
  • ACLs
  • Firewalls
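
Before closing ports you need to know which ones are open. The sketch below, in Python, probes a list of TCP ports and reports any that are not on an allow-list. The function names open_tcp_ports and unexpected_ports are hypothetical, and the allow-list in the usage comment is an assumption for illustration. Only probe hosts you are authorized to scan.

```python
import socket


def open_tcp_ports(host, ports, timeout=0.5):
    """Return the ports from `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


def unexpected_ports(found, allowed):
    """Return open ports that are not on the allow-list, sorted."""
    return sorted(set(found) - set(allowed))


# Example: check the well-known port range on the local machine against
# an allow-list of SSH and HTTPS.
# print(unexpected_ports(open_tcp_ports("127.0.0.1", range(1, 1025)), {22, 443}))
```

A connect scan like this sees only TCP listeners reachable from where it runs; a firewall or ACL in between can hide ports that are open on the host itself.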

Syslog

  • crit and above should be monitored
  • Helpful in troubleshooting
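
"crit and above" refers to syslog severities, where a lower number is more severe: emerg (0), alert (1), crit (2). In the syslog wire format each message starts with a priority value in angle brackets, equal to facility × 8 + severity. A small Python sketch of filtering on that value follows; severity_of and needs_attention are hypothetical names, and it assumes the lines still carry the priority prefix, which many on-disk log files do not.

```python
import re

# Syslog severities indexed by their numeric code (0 is most severe).
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

# Leading priority field, for example "<34>".
PRI_RE = re.compile(r"^<(\d{1,3})>")


def severity_of(line):
    """Return the severity name encoded in a syslog line, or None."""
    match = PRI_RE.match(line)
    if match is None:
        return None
    priority = int(match.group(1))
    # priority = facility * 8 + severity, so severity is the remainder.
    return SEVERITIES[priority % 8]


def needs_attention(line):
    """True for crit, alert, and emerg messages."""
    severity = severity_of(line)
    if severity is None:
        return False
    return SEVERITIES.index(severity) <= SEVERITIES.index("crit")


if __name__ == "__main__":
    sample = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick"
    print(severity_of(sample), needs_attention(sample))
```

For the sample line, 34 = 4 × 8 + 2, so the facility is 4 and the severity is 2 (crit).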

Scripting

  • Must be able to read other people’s code
  • Must test it before implementing in production
  • Better learn how to write in at least one of the following:

Documentation

  • 75% of all code is written after release
  • Sharing knowledge with team members frees your time
  • Do not revisit problems you have already solved
  • Searching accounts for 50% of your work

Lead by example

  • If you see an opportunity, fix it. Don’t wait on someone else to fix it for you.
  • It is easier to ask for forgiveness than to ask for permission.
  • Communicate 3x or more.