Tue Nov 5 10:40:51 2013

Emacs registers as bookmarks

  • I spend most of my day in Emacs and constantly need to jump to the same files and directories.
  • It's easy to lose your place while navigating to what you were looking for.
  • With over a hundred buffers open at any time, using registers as bookmarks has saved me a lot of effort.
(set-register ?i '(file . "~/.emacs.d/init.el"))
  • Anytime I want to get to my init file I just type
C-x r j i

Sun Oct 27 22:51:54 2013

Monitoring the ever-changing cloud.

  • Autoscaling in AWS creates an interesting use case for most monitoring tools out there (e.g. Nagios/Icinga).
  • With this in mind, I decided to implement a monitoring system that removes the alert conditions from the endpoint.
  • Set up each node to export the information of interest, and let one or more services decide whether an alert is warranted.
  • So we created Pinky. It builds on OpenResty (nginx + Lua and others).
  • REST controllers provide the information in JSON format.
  • A collector written in Go can pull down all of the JSON.
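  • To make the pull model concrete, here is a minimal sketch of a collector in Python. It is not Labrat itself (that is written in Go); the function names and the fetcher hook are made up for illustration. The default port and the host-endpoint.json file naming follow the Labrat examples in this post.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def fetch(host, port, url, timeout=5):
    """Fetch one pinky endpoint over HTTP and return the raw body as text."""
    with urlopen(f"http://{host}:{port}{url}", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def collect(hosts, url, outdir, port=44444, workers=10, fetcher=fetch):
    """Query `url` on every host in parallel and write one file per host.

    Returns a dict mapping host -> error text for the hosts that failed.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    name = url.strip("/").split("/")[-1]  # "/pinky/disk" -> "disk"
    failures = {}

    def work(host):
        try:
            body = fetcher(host, port, url)
            (outdir / f"{host}-{name}.json").write_text(body)
        except Exception as exc:  # one unreachable node must not stop the rest
            failures[host] = str(exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, hosts))  # drain the iterator so every host runs
    return failures
```

  • The agents never decide what is alert-worthy. The collector only records what each node says, and a node that does not answer simply shows up in the failure map.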
Pinky - The agent
The Pinkies, and their purposes:
  • chef: return 'last cooked' time.
  • disk: return df(1) output.
  • dpkg: Outstanding security updates.
  • ec2meta: return all ec2metadata from any node.
  • hello: Hello world (used for test)
  • load: Return the output of load
  • log: Return the last # lines of a log.
  • memcache: Memcache monitoring
  • memfree: "free -m" output
  • mydb: Mysql Slave delay monitor.
  • netstat: netstat(1) output
  • nginx: Nginx log parser for returning all lines matching a given date. (under development)
  • passenger: Return passenger-status output bypassing ruby.
  • ping: Ask a given node to ping another host 1 time.
  • port: Ask a given node to check a port via nc(1)
  • proc: Test pinky for walking the /proc tree.
  • process: Return the full process tree.
  • redis: Query info, as well as other arbitrary values.
  • runit: Return the status of all runit(1) services.
  • rvm: Return rubies, their gems, and versions.
  • stat: return fstat(2) on any file.
  • unicorn: Verify if any unicorns are running on older code.
  • vmstat: Return vmstat(1) output.
Labrat - The collector
  • The collector is available here
  • Usage is as follows:
ubuntu@labrat:/tmp/temp$ /data/labrat/labrat --help
Usage of /data/labrat/labrat:
  -c=1: number of parallel requests
  -p="44444": default pinky port
  -s="./servers.txt": List of servers to hit
  -u="/pinky/disk": url to hit
  • Using ec2din we generate a servers.txt containing all of the instances (internal names) in the current directory.
ubuntu@labrat:/tmp/temp$ cat servers.txt|wc -l
  • Now let's query the disk endpoint on all of them, saving each response into a host-monitor.json file.
ubuntu@labrat:/tmp/temp$ time /data/labrat/labrat -c=100 -u="/pinky/disk"
real    0m4.311s
user    0m0.044s
sys     0m0.144s
  • Each file looks like
$ cat ip-10-112-12-9.ec2.internal-disk.json
{"status":{"error":"","value":"OK"},"data":{"\/run":["tmpfs","1525900","188","1525712","1%"],"\/dev":["udev","3806708    ","12","3806696","1%"],"\/run\/lock":["none","5120","0","5120","0%"],"\/run\/shm":["none","3814740","0","3814740","0%"],"\/":["\/dev\/xvda1","82569904","15344892","63031632","20%"],"\/mnt":["\/dev\/xvdb","433455904","221932","411215668","1%"]},"system":{"name":"prod-lin-app-s2-i-3ad14d7","time":1382940141}}
  • Given a working directory with hundreds of JSON files, one can easily see the potential: anything that reads JSON can now act on the state of the whole fleet.
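  • As one example of what the directory makes possible, here is a small Python sketch that scans the disk files for full filesystems. It assumes each entry under "data" is keyed by mount point and that its last field is the df(1) use percentage, which matches the sample file above. The function names and the 90% threshold are made up for illustration.

```python
import json
from pathlib import Path


def full_filesystems(doc, threshold=90):
    """Return sorted (mount, percent) pairs at or above the threshold."""
    hits = []
    for mount, fields in doc.get("data", {}).items():
        # Assumed layout: [device, 1K-blocks, used, available, "NN%"]
        percent = int(fields[-1].strip().rstrip("%"))
        if percent >= threshold:
            hits.append((mount, percent))
    return sorted(hits)


def scan(directory, threshold=90):
    """Check every *-disk.json file in a directory; map system name -> hits."""
    report = {}
    for path in sorted(Path(directory).glob("*-disk.json")):
        doc = json.loads(path.read_text())
        if doc["status"]["value"] != "OK":
            continue  # the agent reported its own error; handle that elsewhere
        hits = full_filesystems(doc, threshold)
        if hits:
            report[doc["system"]["name"]] = hits
    return report
```

  • Calling scan(".") in the working directory returns only the systems that have at least one filesystem over the threshold.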
Can my app servers reach mysql?
  • Another example: finding network partitions that a centralized monitoring system may not see, because each node runs the ping itself.
$ time /data/labrat/labrat -c=200 -u="/pinky/ping/"
real    0m4.242s
user    0m0.196s
sys     0m0.308s
$ cat ip-10-81-56-37.ec2.internal-ping.json # A fail
  {"ip":"","system":{"name":"prod-lin-app-s1-i-ba8d191e","time":1382945706},"status":{"error":"100%% packet loss,","value":"FAIL"},"data":"PING ( 56(84) bytes of data.--- ping statistics ---1 packets transmitted, 0 received, 100% packet loss, time 0ms"}
$ cat ip-10-81-21-87.ec2.internal-ping.json # A pass
  {"ip":"","system":{"name":"prod-lin-app-s1-i-fa8f099e","time":1382946218},"status":{"error":"","value":"OK"},"ping_time":"26.4","data":"PING ( 56(84) bytes of data.64 bytes from icmp_req=1 ttl=234 time=26.4 ms--- ping statistics ---1 packets transmitted, 1 received, 0% packet loss, time 0msrtt min\/avg\/max\/mdev = 26.478\/26.478\/26.478\/0.000 ms"}
  • This can also monitor changes in security groups and network rules that allow/disallow access to a given host from a given group.
  • We also have a port endpoint to check specific services.
  • Processing the json is trivial and allows for easy integration into multiple projects.
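  • A minimal Python sketch of that processing, using only the fields visible in the two ping files above ("status", "system", "ping_time"). The function name and the 100 ms limit are assumptions for illustration.

```python
import json
from pathlib import Path


def summarize_pings(directory, slow_ms=100.0):
    """Split ping results into failed and slow system names.

    Files are read in file-name order, so the result order is stable.
    """
    failed, slow = [], []
    for path in sorted(Path(directory).glob("*-ping.json")):
        doc = json.loads(path.read_text())
        name = doc["system"]["name"]
        if doc["status"]["value"] != "OK":
            failed.append(name)
        elif float(doc.get("ping_time", 0)) > slow_ms:
            slow.append(name)
    return failed, slow
```

  • Any system in the failed list could not reach the target from where it sits, which is exactly the view a central poller does not have.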
Icinga - Modified to read our directory of json
  • Icinga is able to gather the data much faster via local disk read than nrpe calls.
  • The load on the Icinga server is much lower.
  • Using a tmpfs drive to store all the json helps us not run out of space and gives us a performance bump.
  • Server and group configs are automatically generated and reloaded once a minute. This prevents autoscaling from causing alerts.
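  • A sketch of the config generation step in Python, assuming servers.txt holds one host name per line. The "define host" block is standard Nagios/Icinga object syntax, but the "generic-host" template name and the function names are assumptions, not our actual generator.

```python
def read_hosts(path):
    """Read servers.txt: one host per line, blank lines ignored."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def host_definitions(hosts, template="generic-host"):
    """Render one `define host` block per unique host name, sorted."""
    blocks = []
    for host in sorted(set(hosts)):
        blocks.append(
            "define host {\n"
            f"    use        {template}\n"
            f"    host_name  {host}\n"
            f"    address    {host}\n"
            "}\n"
        )
    return "\n".join(blocks)
```

  • Regenerating the file from the current instance list each minute means a terminated instance drops out of the config instead of turning into a host-down alert.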
Librato - Long term metrics storage.
  • Using the directory of json we can service multiple needs.
  • Monitors can use historical data from Librato, as well as the current state (kept in the directory of JSON).
Building complex monitors
  • Using Cucumber, we can build feature files that consider multiple conditions before alerting.
Feature: Memcache connectivity
 Scenario: Memcache error rate due to network connectivity
   Given that the app is reporting memcache issues
   And memcache monitor is not reporting memcache errors
   And network latency between the app and memcache is elevated
   Then send alert to ops about ec2 network issue.
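  • The logic behind that scenario can be sketched in Python as a single predicate over three observations. The field names, the thresholds, and the use of ping round-trip time as the network signal are all assumptions for illustration, not our step definitions.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    app_memcache_errors: int     # errors the app logged while talking to memcache
    memcache_server_errors: int  # errors reported by the memcache monitor itself
    ping_ms: float               # app -> memcache round-trip time


def ec2_network_alert(obs, error_limit=0, ping_limit_ms=50.0):
    """True only when all three conditions of the scenario hold together."""
    return (
        obs.app_memcache_errors > error_limit
        and obs.memcache_server_errors == 0
        and obs.ping_ms > ping_limit_ms
    )
```

  • No single condition fires the alert. App errors with a healthy memcache and a slow path between them point at the network rather than at either service.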
Current status:
  • Pinky and Pinky-server are both actively developed.
  • New endpoints get created as needed.
  • Developers are able to help build out requirements for new projects.
Other notes
  • Lua is fast. (duh)
  • Nginx is a known entity and has proved to be robust.
  • Endpoints are much easier to debug with curl(1) or the pinky command than with nrpe.
  • We stay DRY by combining metrics, monitoring, and system acceptance tests.

Sat Oct 26 23:28:42 2013 :wireshark: memcache:

Debugging memcache remotely in real time with Wireshark and the command line.

  • First ensure you have tshark installed (assuming OS X)
    brew install wireshark
  • Next, build a script to display all of the memcache values on the command line output.
    $ tshark -G|awk -v f="'" 'BEGIN{ print "ssh \$1 \"tcpdump -w- -s0|gzip\" | tshark -r- -Nn -tad -R \"memcache\" \\" } ($3 ~ "memcache") {print " -z \"proto,colinfo," $3 "," $3 "\" \\" } END{ print "|LANG=C sed -e " f "s# == ##g" f " -e " f "s#memcache.##g" f }' > read-memcache
    $ chmod a+rx read-memcache
    $ ./read-memcache some-remote-memcacheserver.mydomain.com

How it works.

  • 'tshark -G' dumps every field name known to Wireshark's dissectors.
  • The awk script keeps the memcache fields and builds a -z "proto,colinfo,..." display option for each one.
  • Running it we get output such as this:
1 2013-10-26 23:38:54.062904 -> MEMCACHE 123 get linbsd2:breaking_news_banners:tag:site-wide   command"get"  key"linbsd2:breaking_news_banners:tag:site-wide"
4 2013-10-26 23:38:54.062992 -> MEMCACHE 71 END   response"END"
6 2013-10-26 23:38:54.063428 -> MEMCACHE 113 get linbsd2:views/tag_smurfs_217   command"get"  key"linbsd2:views/tag_smurfs_217"
7 2013-10-26 23:38:54.063459 -> MEMCACHE 674 VALUE linbsd2:views/tag_smurfs_217 0 546   flags0  value"\x04\x08\"\x02\x1c\x02<!-- \"tag_smurfs_217\" cached 10/26/2013 at 06:03 PM -->\x0a\x09\x09\x09...
10 2013-10-26 23:38:54.065539 -> MEMCACHE 119 get linbsd2:views/tag_smurfs_16_mobile   command"get"  key"linbsd2:views/tag_smurfs_16_mobile"
11 2013-10-26 23:38:54.065575 -> MEMCACHE 71 END   response"END"
  • We can filter for errors by extending the -R filter expression.
  • I will add more filters later for spotting memcache errors other than "missing key".
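  • Until then, the output itself can be post-processed. Here is a small Python sketch that classifies each get as a hit or a "missing key" miss. It assumes the line layout shown above and that requests and responses are not interleaved across connections. The function name is made up.

```python
import re

# Matches the protocol column and the verb that follows it, e.g.
# "MEMCACHE 123 get some:key" or "MEMCACHE 71 END".
LINE = re.compile(r"MEMCACHE \d+ (get|VALUE|END)(?: (\S+))?")


def hits_and_misses(lines):
    """A get answered by VALUE is a hit; a get answered by a bare END is a miss."""
    hits, misses, pending = [], [], None
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        verb, key = m.groups()
        if verb == "get":
            pending = key
        elif verb == "VALUE":
            hits.append(key)
            pending = None  # the END that closes this VALUE is not a miss
        elif verb == "END" and pending is not None:
            misses.append(pending)
            pending = None
    return hits, misses
```

  • Piping the script's output through this (for example by passing sys.stdin to the function) gives a running list of the keys that are being asked for but are not in the cache.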

2013-10-26 Sat

Rejoined NetBSD to help work on #lua in the kernel. Want to extend Pinky to work with the tinyhttpd that mbalmer extended with lua support.