Coworkers of mine have told me that I run into some of the weirdest problems they have ever heard of. They also suggested that I put them online and blog about them so other people can find solutions to these problems ... if anyone else in the world even has them. Let's see if anyone else really does have software-related issues like I do.
Problems Only Tyler Has
I have a need at work to import gpg keys automatically from keyservers. To make sure I trust the keys, I fully intended to use hkps keyservers. To that end I found these:
Time to get the GPG command working. Here's what works on all of the systems I tried EXCEPT CentOS 6. Debugging options (
It typically works like a charm, but CentOS 6 throws this error regardless of which server I use.
Wait a sec, that looks like a CA cert problem. Yet I can curl the URL directly from CentOS 6, and I also used openssl to make the request manually. It works every time, except from gpg. I didn't dig deep enough to see why gpg fails, because I was much more interested in getting this fixed and CentOS 6 is already quite ancient. So here's what I did as a workaround.
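Something along these lines: let curl handle the TLS (curl honors the system CA bundle just fine on CentOS 6) and pipe the key into gpg. This is a sketch; the key ID and keyserver host are placeholders.

# Fetch the key over HTTPS with curl, then import it into gpg locally.
# 0x12345678 and the keyserver host are placeholders, not the real ones.
curl -s "https://hkps.pool.sks-keyservers.net/pks/lookup?op=get&search=0x12345678" | gpg --import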
And you can see here that it works perfectly fine:
I have a CentOS 6.7 image running in AWS. It reads
Please note that I fixed the first two lines! The stock version of this file did not have semicolons at the end of the
The idea is that I would prefer to use the local dnsmasq before falling back to other domain name servers. Sounds like a typical use case, right? I can use "
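As a sketch, that part of dhclient.conf looks something like this, assuming the standard prepend directive:

# /etc/dhcp/dhclient.conf (sketch) -- put the local dnsmasq first.
# The trailing semicolon is required by dhclient's parser.
prepend domain-name-servers 127.0.0.1;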
I believe that this works just fine and so I go ahead and reboot the box. Just to be sure, I double check the
Where did it go? It totally worked before! Running "
It turns out that the version of dhclient that's installed (version 4.1.1-49.P1.el6.centos) is not properly setting
The strange thing is that the environment variables are different for those two calls. For REBOOT the
I tried taking a peek at the source code but did not invest enough time to determine the cause for this issue. I mostly gave up for these reasons:
Let's talk about that workaround. On CentOS, the
A simple "
It is possible that this isn't a "Problem Only Tyler Has". Here are a few posts from people who could be hitting the same issue as me, or possibly a related one. They didn't solve it the same way I did, and I didn't investigate their problems further to determine whether they were really experiencing the same issue as me.
I hope the workaround works for you. If you do manage to figure out and fix dhclient, let me know.
This was a problem that stumped me for quite some time. I'm working to create a pagination plugin where you have a single parent
I add some styles to
Remember, this is just an example to help illustrate what I am trying to do. You'll need quite a bit more code to make a working pager plugin for jQuery. Anyway, this will make it appear to the browser that there's a sliding series of
Except in IE8. It also fails in IE9 when rendering in IE8 mode, but only sometimes.
The problem boils down to the heights of the elements. When IE8 slides the
You'll see output like this when at the top of the list:
Now use a little jQuery magic to scroll down by setting a negative margin-top CSS property on
The height of the first three shrank to just the padding I had on
My divs used "
The fix: Do not use float. Yep, I tried several variations, but nothing ever worked with dynamically sizing content and floats. In the end "
I was helping to diagnose a problem where web requests to a service were being troublesome. The service always enabled compression on the output stream, whether or not the client asked for it. Normally that is not a problem. We were using PHP to make SOAP calls and tied that to PHP's curl library because we had some special requirements regarding request and response headers.
PHP's SOAP library (when fetching via the curl module) reported that there was no response or that there were problems decompressing the stream. Wget did not work; the curl command-line tool did. A sniffer on the network showed me that data was coming across the wire. When that data was written to disk, gzip would not decompress it but zcat would.
Everything worked like a charm when compression was disabled, but it was absolutely necessary that the compression was enabled and forced on in our production environment.
We analyzed the responses from the server more carefully and found random-ish looking data (as is expected for compressed output) for most of the response, and then perhaps about a third was NULL bytes or (even worse) XML from some other SOAP request. It looked like we were leaking memory contents. Very undesirable.
We obtained the source code at about the time I noticed all response lengths were powers of 2: 256 bytes, 512 bytes, 1k, 2k, 4k, 8k. We were sending back an entire allocated buffer. Here's the code that was affected -- you may notice it looks a lot like many other copies of this code on the web.
This actually comes from one version of an example that Microsoft produced. In our case we first suspected Ionic.Zlib, but the above code uses System.IO.Compression.GZipStream, so it isn't related to the compression library; that part works like a charm. What's broken about this code is the byteArray and how many bytes are copied into it. That last line should instead look like this:
Once you make this change, your HTTP responses should no longer be exactly equal to powers of 2. You can double-check this by looking for the Content-Length headers when you sniff the traffic or use some tool that will show you the full response headers.
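For example, something like this will show just the headers; the URL is a placeholder for your own endpoint:

# Dump only the response headers. A Content-Length that is an exact
# power of two (256, 512, 1024, ...) hints that a whole allocated
# buffer was sent back instead of just the compressed bytes.
curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://service.example.com/endpoint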
I hope that others can spread this knowledge to the various other forums for when people have problems with this. I believe this is the reason Chrome has issues with compressed data when people do things like this; I found forum postings mentioning that Chrome is extra picky about compressed data and that compressed responses from some C# services were not working in Chrome.
Once again, I had a problem with Opscode Chef, but for a very understandable reason. First, while trying to spin up an instance, I saw messages like this at the end.
I then peeked into /var/log/chef/server.log and saw some messages. Not quite helpful, but perhaps a clue?
The file_location was set correctly, so now I was stumped. I restarted the Chef server, rebuilt the database as I mentioned in a previous blog post, and uploaded everything to the server again. No luck with any of those. The failure point wasn't always the same package; it seemed to hop around to different packages at different times.
So, now I check versions of the packages that are installed.
You'll see that one package is at 0.10.8 and the rest are all 0.10.4. Could that be it? Reinstalling the chef package didn't force upgrades of the others, so I just manually used
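For reference, the check and the manual fix look something like this. This is a sketch assuming a RubyGems-installed Chef 0.10 server; the component names and target version are illustrative, not copied from my session.

# See which Chef components disagree on version
gem list | grep -i chef

# Bring the stragglers up to match (names and version are illustrative)
sudo gem install chef-solr -v 0.10.8
sudo gem install chef-expander -v 0.10.8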
Years ago, I had the task of creating a very large network share. I decided to build a Linux box with six RAIDed 1.5 TB drives. At the time, it was a hefty cost. So, when we were planning this whole thing out, it was decided that there would really be no possibility of a backup, since getting tapes and building a secondary machine were both cost prohibitive. Yes, it was a risk, but an acceptable one. To counter the
And yesterday the machine failed.
Now, I don't have another Linux box with six SATA ports on it, so I made a trip to Microcenter and purchased some handy SATA-to-USB devices in order to get five drives running. That way I could run in degraded mode and mount the filesystem read-only to get the data off the drives. I discovered that one of the adapters I picked up was actually IDE-to-USB, and so I made trip #2 to Microcenter. After that, while I was wiring things together, one of the enclosures failed to work. Trip #3. At least they're really nice at the returns counter.
I plugged the drives into a USB hub, then plugged the hub and additional destination drives into my laptop. I'm recovering at a mere 20 MB/s, so it will take a long time, but at least the drives weren't full when I started.
So here I am, pondering what went well and what was terrible about this strategy, and I have to say that I am quite pleased with how everything is panning out. I figure I should give you an overview of the various pieces that were considered while building the system and how well they worked for me during this failure. It might keep my mind off the fact that I'm now recovering my RAID on a hodgepodge of cabling, my kids are poking at the flashing lights, I'm pretty sure one of the enclosures has touchy wiring that makes it motion sensitive, and there's a thunderstorm coming. I wish I had plugged all this into a UPS before I started.
I knew that I'd be building a custom system with more space on it than all of the servers, NAS devices and desktops (combined) at my then-current place of business. When it failed, how would I get data off the machine? Have a backup plan. Mine was really to regenerate the information through a very long and painful process, because I could not afford to double my costs.
To mitigate the chance of loss, I decided that I could always afford one more drive to be used by the RAID for the "R" part (redundant). At least two drives would need to fail before I lost any data.
When you purchase the drives for your devices, you want to get them from different batches. This is because hard drives manufactured at the same time tend to break at about the same time. I didn't do this either due to time constraints, but you should do what you can.
Alerts were set up to monitor the drives and let me know immediately if the data was at risk. I'd just go out and buy a new hard drive and add it to the RAID to recover. Not a big deal... as long as the other five drives stayed running.
If my machine died, part of the recovery plan was to go out and purchase USB adapters for the drives. At the time those were a little expensive, but they have since come down greatly in price. I figured USB 3 might be everywhere by the time a drive failed, which would give me improved recovery speeds.
One big thing to avoid is a hardware-based RAID array. Yes, it offloads the RAID work to another device, but benchmarks show that software-based RAID isn't very expensive computationally. Another advantage of software RAID is that you can use multiple channels on the board to fetch and store information instead of passing everything through a single controller. Lastly, you avoid proprietary RAID formats. That last point is a huge hurdle.
When you use a hardware RAID card, I strongly suggest you buy no fewer than two at the exact same time and confirm that they have the same firmware on them. I've experienced, and heard from others about, problems recovering a RAID with newer cards, different models, and even minor firmware changes. If your one controller dies, you will need a backup controller that can get the data off the RAID; otherwise you've got a lot of useless disks.
Now, compare these problems to software RAID. If I keep a CD of the distribution I used to make the RAID, I'll be able to install it again and recover. Plus, it is usually forward compatible with future versions of that software. Years ago I used mdadm to set up the RAID and today I used the current mdadm version to recover the data from the drives. No hassle at all.
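The recovery itself is only a couple of commands. A sketch, with device names standing in for wherever the USB adapters happened to land:

# Assemble the degraded five-of-six array; --run starts it with a
# member missing and --readonly keeps md from writing to the disks.
mdadm --assemble --run --readonly /dev/md0 /dev/sd[b-f]
mount -o ro /dev/md0 /mnt/recovery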
Since you are investing all this time and energy in making a bulletproof system, you probably want to put it on a UPS to help your hardware last longer. The local power grid goes through brownouts, power outages, spikes and has lots of noise from adjacent buildings, blenders, fluorescent lights and other computers. A UPS stops that and conditions the power so your hardware doesn't get beaten up nearly as much. I have a feeling that something like that fried the big computer so that it can only stay on for two minutes at a time, which is why I'm trying to recover this data with my laptop.
I've worked at places where the backup job appeared to run for months but never actually wrote data to the disk. We were able to painfully recover some of the data (a RAID failure there as well), but it also taught us to try restoring files from our backups every now and then. Acrobats test that their net will hold their weight before they trust it with their lives. Your data is depending on you; test your "safety net" backups before you rely on them.
Keep an eye on the current safety of your systems. Set up monitoring to ensure the health of your system is consistently good. Backups are good, redundancy is good. Plan for failure and test your failure plans when you can.
Thankfully my drives were not full, otherwise I'd be spending about 110 hours recovering them. As it is, I only have perhaps another 12 hours. The hardest part is that I'm juggling data onto drives that are significantly smaller, but I would much rather have my data than try to regenerate it again!
I sunk an obsessive amount of hours into Diablo and Diablo 2. Now Diablo 3 is newly released and I cracked under the pressure. I don't run Windows - I use Linux. Ubuntu 12.04 Precise Pangolin, to be ... precise. I also have an interesting set of criteria for whatever solution I find.
Now install the updated packages. I also installed S3TC texture compression, which may be illegal where you are.
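Something like this, though the PPA and package names here are my guesses rather than the exact ones from the original steps:

# Wine from the ubuntu-wine PPA (assumption: a 1.5-era build)
sudo add-apt-repository ppa:ubuntu-wine/ppa
sudo apt-get update
sudo apt-get install wine1.5

# S3TC texture compression; patent-encumbered, so possibly illegal
# where you are. The package name is a guess and varies by release.
sudo apt-get install libtxc-dxtn0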
Lastly, we'll need to tweak things a bit when we run wine. First, go download the installer. You can just double-click on it and it will install Diablo 3 and start downloading the gigs of data. Once you get done with the download, or at least to a place where it will let you play the game, stop it. Edit the link to Diablo 3. Run "
Almost done. Now we just need to disable some security. You have two options: run a command as root whenever you want to run Diablo 3, or you can put it in your
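My guess (an assumption on my part) is that the security in question is Ubuntu 12.04's Yama ptrace restriction, which Wine-run games commonly trip over; assuming that, the two options look like this:

# Option 1: run as root each time before playing
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

# Option 2: make it permanent across reboots
echo 'kernel.yama.ptrace_scope = 0' | sudo tee -a /etc/sysctl.conf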
And now you can perhaps play. I can't because the framerate is exceedingly slow, but perhaps that's just one last hurdle to getting the game to play.
The problem with VirtualBox is that on the Mac it names its virtual host-only adapters "vboxnet0" and the like, while on Windows they are called "VirtualBox Host-Only Ethernet Adapter", maybe with "#2" added at the end. Normally this is not a problem, but it is if you work in a Macintosh-dominated environment that uses Opscode Chef, Vagrant and VirtualBox to bundle development environments into boxes.

These virtual machines may be scripted to enable specific networking configurations, such as making a host-only virtual ethernet adapter available to the virtual machine so your VMs can easily network to just each other. Because the default host-only adapter name changes based on your OS, the configuration stored in the Vagrant box expects an adapter named "vboxnet0", and mine isn't called that at all. Starting the VM in VirtualBox will cause problems, and then Vagrant will think the install failed.
You'd think it would be as easy as just going to the network settings in the Windows control panel and then right-clicking the adapter and hitting "Rename". No, it's unfortunately not nearly that simple.
This is where we hit a snag. On Windows XP you may need administrator privileges to set this value. On Windows 7 you need to use the "SYSTEM" account (not the administrator account) or else you will get the wrath of the "access denied" alert. Don't fret, I've got you covered.
What this will do is first scan all net adapters for a VirtualBox network adapter. If it finds one with the name "vboxnet0" it will exit since we don't need to do any work. Failing that, it will scan again to find the first VirtualBox network adapter and attempt to rename it to vboxnet0. This will return either success or failure. If no VirtualBox network adapters were found, this script fails.
Next up, the escalating of privileges. Either you can write a real program or else you can perhaps use PsExec to grant you the right privileges when running a command-line tool.
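With PsExec, launching the rename script under the SYSTEM account can be a one-liner; the script name here is a placeholder for whatever actually does the registry edit:

:: Run the adapter-rename script as SYSTEM using Sysinternals PsExec
psexec -accepteula -s cmd.exe /c C:\scripts\rename-vboxnet.cmd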
While trying to get my solution to work, I found perhaps a half dozen ways that UAC didn't cooperate with batch files and Windows Scripting Host. I guess there were enough skript kiddiez out there abusing these tools that Microsoft needed to clamp down on the interaction between the shell and programs. I can't blame them, but it sure is hard to pop open a UAC prompt on Windows 7 from a command line; I certainly didn't find a good way.
I never thought I would find what feels like a really bad problem with the Linux scheduler, but it's hard to argue with my results. I have an Acer Aspire One netbook with a 1.5 GHz Intel Atom N550 inside. It is a dual-core CPU with hyperthreading enabled.
At first I thought I was crazy, or that something was fundamentally broken with my recent Ubuntu install on this fine machine. I had been used to an HP Mini 110 with a dual-core 1 GHz AMD processor, and I expected better performance from this one. Instead, I found that my programs seemed to frequently hang, crawl slowly, or only sporadically operate well. Very odd behavior. I found, through use of my Mad Google Skillz, that it could be due to hyperthreading on the processor.

You see, a hyperthread isn't a real processing core. It's more like sharing parts of the same processing unit: while one thread is doing an addition, another could use the unused multiplication circuitry. If they both want to use bits of the CPU that overlap, then one process just has to wait. In my case, that starved process waited and waited and waited. It looked like Linux treated each hyperthread as another core and assumed it could safely and quickly run threads on any of the available cores. Thus, lots of jobs were running on the first core and few were running on the second, and the ones sharing a real core all got stalled.
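Testing the theory is easy with the standard sysfs CPU hotplug files; this is a sketch, where cpu1 and cpu3 are example sibling threads rather than the numbers from my machine:

# Take the hyperthread siblings offline (cpu numbers are examples;
# check /sys/devices/system/cpu/cpu*/topology/thread_siblings_list)
echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online
echo 0 | sudo tee /sys/devices/system/cpu/cpu3/online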
Your CPU numbers may not match mine, so I don't suggest you use the above.
Now you might be wondering how you can get this to happen automatically when machines start up. You can edit /etc/rc.local and add those two lines above the 'exit' line, but if the machine changes and no longer has hyperthreading, then maybe you've just disabled two processors in your quad-core machine. Yikes! Since programs are supposed to detect things like this and do the work for you, I scoured the internet for a way to detect whether a CPU is a hyperthreading CPU or not. I didn't come up with anything at all.
But that didn't stop me.
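A minimal sketch of such a script: walk each CPU's thread_siblings_list and take everything but the first listed sibling offline. This version handles both the dash and the comma separator (see the update below), and it assumes the sibling lists are pairs like "0-1" or "0,2".

#!/bin/bash
# Disable hyperthread siblings: for each CPU, keep the first sibling
# listed and take the rest offline. Must run as root.
for LIST in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -f "$LIST" ] || continue
    KEEP=""
    for CPU in $(tr ',-' '  ' < "$LIST"); do
        if [ -z "$KEEP" ]; then
            KEEP=$CPU
        elif [ -e "/sys/devices/system/cpu/cpu$CPU/online" ]; then
            echo 0 > "/sys/devices/system/cpu/cpu$CPU/online"
        fi
    done
done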
Update 2013-03-22: Linux 2.6.x uses a comma as a separator, so changed
With the above script saved safely on my hard drive and /etc/rc.local running this shell command, I automatically disable hyperthreading just after boot... until my machine gets cloned to another netbook that doesn't have hyperthreading, and then no CPUs are disabled. The best of both worlds.
I have the privilege of working with Opscode Chef at work, maintaining the recipes as various projects move to "the cloud" or otherwise want to script the setup of their environments. While spinning up many new machines with knife, very rarely I will hit this problem. After trying pretty hard to find the root cause on the internet, I'll sum up what I found on this one page. Maybe it will help solve this problem for you too?
I really hate when things like that happen.
At this point, the server started acting funny and I couldn't spin up any instances, so I rebooted.
Uh-oh. Why would the server say that the client already exists? Let's go see what the logs say. I browse
What is this Bunny and why is it down? Turns out that RabbitMQ will not start because it has corrupted databases. Bummer. You may ask "how can I fix such a thing?" Well, you can't really repair the databases. Instead, we just delete them.
# Fix RabbitMQ by removing the databases
service rabbitmq-server stop
if [ -d /var/lib/rabbitmq/mnesia ]; then
    echo "Removing mnesia directory"
    rm -r /var/lib/rabbitmq/mnesia
fi
service rabbitmq-server start
# Add the Chef vhost, username, password, and permissions
rabbitmqctl add_vhost /chef
PASS=$(grep ^amqp_pass /etc/chef/solr.rb | cut -d '"' -f 2)
rabbitmqctl add_user chef "$PASS"
rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"