Coworkers of mine have told me that I run into some of the weirdest problems they have ever heard of. They also suggested that I put them online and blog about them so other people can find solutions to these problems ... if anyone else in the world even has them. Let's see if anyone else really does have software-related issues like I do.
Problems Only Tyler Has
I have a need at work to import gpg keys automatically from keyservers. To make sure I trust the keys, I fully intended to use hkps keyservers. To that end I found these:
Time to get the GPG command working. Here's what works on all of the systems I tried EXCEPT CentOS 6. Debugging options (
It typically works like a charm, but CentOS 6 throws this error regardless of which server I use.
Wait a sec, that looks like a CA cert problem. Yet I can curl the URL directly from CentOS 6, and I also used openssl to make the request manually. It works every time, except from gpg. I didn't dig deep enough to see why gpg fails, because I was much more interested in getting this fixed and CentOS 6 is already quite ancient. So here's what I did as a workaround.
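Something along these lines: let curl handle the TLS (curl honors the system CA bundle just fine on CentOS 6) and pipe the key into gpg. This is a sketch; the key ID and keyserver host are placeholders.

# Fetch the key over HTTPS with curl, then import it into gpg locally.
# 0x12345678 and the keyserver host are placeholders, not the real ones.
curl -s "https://hkps.pool.sks-keyservers.net/pks/lookup?op=get&search=0x12345678" | gpg --import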
And you can see here that it works perfectly fine:
I have a CentOS 6.7 image running in AWS. It reads
Please note that I fixed the first two lines! The stock version of this file did not have semicolons at the end of the
The idea is that I would prefer to use the local dnsmasq before falling back to other domain name servers. Sounds like a typical use case, right? I can use "
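As a sketch, that part of dhclient.conf looks something like this, assuming the standard prepend directive:

# /etc/dhcp/dhclient.conf (sketch) -- put the local dnsmasq first.
# The trailing semicolon is required by dhclient's parser.
prepend domain-name-servers 127.0.0.1;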
I believe that this works just fine and so I go ahead and reboot the box. Just to be sure, I double check the
Where did it go? It totally worked before! Running "
It turns out that the version of dhclient that's installed (version 4.1.1-49.P1.el6.centos) is not properly setting
The strange thing is that the environment variables are different for those two calls. For REBOOT the
I tried taking a peek at the source code but did not invest enough time to determine the cause for this issue. I mostly gave up for these reasons:
Let's talk about that workaround. On CentOS, the
A simple "
It is possible that this isn't a "Problem Only Tyler Has". Here are a few posts from people who could be hitting the same issue as me, or possibly a related one. They didn't solve it the same way I did, and I didn't investigate their problems further to determine whether they were really experiencing the same issue as me.
I hope the workaround works for you. If you do manage to figure out and fix dhclient, let me know.
This was a problem that stumped me for quite some time. I'm working to create a pagination plugin where you have a single parent
I add some styles to
Remember, this is just an example to help illustrate what I am trying to do. You'll need quite a bit more code to make a working pager plugin for jQuery. Anyway, this will make it appear to the browser that there's a sliding series of
Except in IE8. It also fails in IE9 when rendering in IE8 mode, but only sometimes.
The problem boils down to the heights of the elements. When IE8 slides the
You'll see output like this when at the top of the list:
Now use a little jQuery magic to scroll down by setting a negative margin-top CSS property on
The height of the first three shrank to just the padding I had on
My divs used "
The fix: Do not use float. Yep, I tried several variations, but nothing ever worked with dynamically sizing content and floats. In the end "
I was helping to diagnose a problem where web requests to a service were being troublesome. The service always enabled compression on the output stream, whether or not the client asked for it. Normally that is not a problem. We were using PHP to make SOAP calls and tied that to PHP's curl library because we had some special requirements regarding request and response headers.
PHP's SOAP library (when fetching via the curl module) reported that there was no response or that there were problems decompressing the stream. Wget did not work; the curl command-line tool did. A sniffer on the network showed me that data was coming across the wire. When that data was written to disk, gzip would not decompress it but zcat would.
Everything worked like a charm when compression was disabled, but it was absolutely necessary that the compression was enabled and forced on in our production environment.
We analyzed the responses from the server more carefully and found random-ish looking data (as is expected for compressed output) for most of the response, and then perhaps about a third was NULL bytes or (even worse) XML from some other SOAP request. It looked like we were leaking memory contents. Very undesirable.
We obtained the source code at about the time I noticed all response lengths were powers of 2: 256 bytes, 512 bytes, 1k, 2k, 4k, 8k. We were sending back an entire allocated buffer. Here's the code that was affected -- you may notice it looks a lot like many other copies of this code on the web.
This actually comes from one version of an example that Microsoft produced. In our case we first suspected Ionic.Zlib, but the above code uses System.IO.Compression.GZipStream, so it isn't related to the compression library; that part works like a charm. What's broken about this code is the byteArray and how many bytes are copied into it. That last line should instead look like this:
Once you make this change, your HTTP responses should no longer be exactly equal to powers of 2. You can double-check this by looking for the Content-Length headers when you sniff the traffic or use some tool that will show you the full response headers.
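For example, something like this will show just the headers; the URL is a placeholder for your own endpoint:

# Dump only the response headers. A Content-Length that is an exact
# power of two (256, 512, 1024, ...) hints that a whole allocated
# buffer was sent back instead of just the compressed bytes.
curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://service.example.com/endpoint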
I hope that others can spread this knowledge to the various other forums for when people have problems with this. I believe this is the reason Chrome has issues with compressed data when people do things like this; I found forum postings mentioning that Chrome is extra picky about compressed data and that compressed responses from some C# services were not working in Chrome.
Once again, I had a problem with Opscode Chef, but for a very understandable reason. First, while trying to spin up an instance, I saw messages like this at the end.
I then peeked into /var/log/chef/server.log and saw some messages. Not quite helpful, but perhaps a clue?
The file_location was set correctly, so now I was stumped. I restarted the Chef server, rebuilt the database as I mentioned in a previous blog post, and uploaded everything to the server again. No luck with any of those. The failure point wasn't always the same package; it seemed to hop around to different packages at different times.
So, now I check versions of the packages that are installed.
You'll see that one package is at 0.10.8 and the rest are all 0.10.4. Could that be it? Reinstalling the chef package didn't force upgrades of the others, so I just manually used
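For reference, the check and the manual fix look something like this. This is a sketch assuming a RubyGems-installed Chef 0.10 server; the component names and target version are illustrative, not copied from my session.

# See which Chef components disagree on version
gem list | grep -i chef

# Bring the stragglers up to match (names and version are illustrative)
sudo gem install chef-solr -v 0.10.8
sudo gem install chef-expander -v 0.10.8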
Years ago, I had the task of creating a very large network share. I decided to build a Linux box with six RAIDed 1.5 TB drives. At the time, it was a hefty cost. So, when we were planning this whole thing out, it was decided that there would really be no possibility of a backup, since getting tapes and building a secondary machine were both cost prohibitive. Yes, it was a risk, but an acceptable one. To counter the
And yesterday the machine failed.
Now, I don't have another Linux box with six SATA ports on it, so I made a trip to Microcenter and purchased some handy SATA-to-USB devices in order to get five drives running. That way I could run in degraded mode and mount the filesystem read-only to get the data off the drives. I discovered that one of the adapters I picked up was actually IDE-to-USB, and so I made trip #2 to Microcenter. After that, while I was wiring things together, one of the enclosures failed to work. Trip #3. At least they're really nice at the returns counter.
I plugged the drives into a USB hub, then plugged the hub and additional destination drives into my laptop. I'm recovering at a mere 20 MB/s, so it will take a long time, but at least the drives weren't full when I started.
So here I am, pondering what went well and what was terrible about this strategy, and I have to say that I am quite pleased with how everything is panning out. I figure I should give you an overview of the various pieces that were considered while building the system and how well they worked for me during this failure. It might keep my mind off the fact that I'm now recovering my RAID on a hodgepodge of cabling, my kids are poking at the flashing lights, I'm pretty sure one of the enclosures has touchy wiring that makes it motion sensitive, and there's a thunderstorm coming. I wish I had plugged all this into a UPS before I started.
I knew that I'd be building a custom system with more space on it than all of the servers, NAS devices and desktops (combined) at my then-current place of business. When it failed, how would I get data off the machine? Have a backup plan. Mine was really to regenerate the information through a very long and painful process, because I could not afford to double my costs.
To mitigate the chance of loss, I decided that I could always afford one more drive to be used by the RAID for the "R" part (redundant). At least two drives would need to fail before I lost any data.
When you purchase the drives for your devices, you want to get them from different batches. This is because hard drives manufactured at the same time tend to break at about the same time. I didn't do this either due to time constraints, but you should do what you can.
Alerts were set up to monitor the drives and let me know immediately if the data was at risk. I'd just go out and buy a new hard drive and add it to the RAID to recover. Not a big deal... as long as the other five drives stayed running.
If my machine died, part of the recovery plan was to go out and purchase USB adapters for the drives. At the time those were a little expensive, but they have since come down greatly in price. I figured USB 3 might be everywhere by the time a drive failed, which would give me improved recovery speeds.
One big thing to avoid is a hardware-based RAID array. Yes, it offloads the RAID work to another device, but benchmarks show that software-based RAID isn't very expensive computationally. Another advantage of software RAID is that you can use multiple channels on the board to fetch and store information instead of passing everything through a single controller. Lastly, you avoid proprietary RAID formats. That last point is a huge hurdle.
When you use a hardware RAID card, I strongly suggest you buy no fewer than two at the exact same time and confirm that they have the same firmware on them. I've experienced, and heard from others about, problems recovering a RAID with newer cards, different models, and even minor firmware changes. If your one controller dies, you will need a backup controller that can get the data off the RAID; otherwise you've got a lot of useless disks.
Now, compare these problems to software RAID. If I keep a CD of the distribution I used to make the RAID, I'll be able to install it again and recover. Plus, it is usually forward compatible with future versions of that software. Years ago I used mdadm to set up the RAID and today I used the current mdadm version to recover the data from the drives. No hassle at all.
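The recovery itself is only a couple of commands. A sketch, with device names standing in for wherever the USB adapters happened to land:

# Assemble the degraded five-of-six array; --run starts it with a
# member missing and --readonly keeps md from writing to the disks.
mdadm --assemble --run --readonly /dev/md0 /dev/sd[b-f]
mount -o ro /dev/md0 /mnt/recovery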
Since you are investing all this time and energy in making a bulletproof system, you probably want to put it on a UPS to help your hardware last longer. The local power grid goes through brownouts, power outages, spikes and has lots of noise from adjacent buildings, blenders, fluorescent lights and other computers. A UPS stops that and conditions the power so your hardware doesn't get beaten up nearly as much. I have a feeling that something like that fried the big computer so that it can only stay on for two minutes at a time, which is why I'm trying to recover this data with my laptop.
I've worked at places where the backup job appeared to run for months but never actually wrote data to the disk. We were able to painfully recover some of the data (a RAID failure there as well), but it also taught us to try restoring files from our backups every now and then. Acrobats test that their net will hold their weight before they trust it with their lives. Your data is depending on you; test your "safety net" backups before you rely on them.
Keep an eye on the current safety of your systems. Set up monitoring to ensure the health of your system is consistently good. Backups are good, redundancy is good. Plan for failure and test your failure plans when you can.
Thankfully my drives were not full, otherwise I'd be spending about 110 hours recovering them. As it is, I only have perhaps another 12 hours. The hardest part is that I'm juggling data onto drives that are significantly smaller, but I would much rather have my data than try to regenerate it again!
I sunk an obsessive amount of hours into Diablo and Diablo 2. Now Diablo 3 is newly released and I cracked under the pressure. I don't run Windows - I use Linux. Ubuntu 12.04 Precise Pangolin, to be ... precise. I also have an interesting set of criteria for whatever solution I find.
Now install the updated packages. I also installed S3TC texture compression, which may be illegal where you are.
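Something like this, though the PPA and package names here are my guesses rather than the exact ones from the original steps:

# Wine from the ubuntu-wine PPA (assumption: a 1.5-era build)
sudo add-apt-repository ppa:ubuntu-wine/ppa
sudo apt-get update
sudo apt-get install wine1.5

# S3TC texture compression; patent-encumbered, so possibly illegal
# where you are. The package name is a guess and varies by release.
sudo apt-get install libtxc-dxtn0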
Lastly, we'll need to tweak things a bit when we run wine. First, go download the installer. You can just double-click on it and it will install Diablo 3 and start downloading the gigs of data. Once you get done with the download, or at least to a place where it will let you play the game, stop it. Edit the link to Diablo 3. Run "
Almost done. Now we just need to disable some security. You have two options: run a command as root whenever you want to run Diablo 3, or you can put it in your
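My guess (an assumption on my part) is that the security in question is Ubuntu 12.04's Yama ptrace restriction, which Wine-run games commonly trip over; assuming that, the two options look like this:

# Option 1: run as root each time before playing
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

# Option 2: make it permanent across reboots
echo 'kernel.yama.ptrace_scope = 0' | sudo tee -a /etc/sysctl.conf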
And now you can perhaps play. I can't because the framerate is exceedingly slow, but perhaps that's just one last hurdle to getting the game to play.
The problem with VirtualBox is that on the Mac it names its virtual host-only adapters "vboxnet0" and the like, while on Windows they are called "VirtualBox Host-Only Ethernet Adapter", maybe with "#2" added at the end. Normally this is not a problem, but it is if you work in a Macintosh-dominated environment that uses Opscode Chef, Vagrant and VirtualBox to bundle development environments into boxes.

These virtual machines may be scripted to enable specific networking configurations, such as making a host-only virtual ethernet adapter available to the virtual machine so your VMs can easily network to just each other. Because the default host-only adapter name changes based on your OS, the configuration stored in the Vagrant box expects an adapter named "vboxnet0", and mine isn't called that at all. Starting the VM in VirtualBox will cause problems, and then Vagrant will think the install failed.
You'd think it would be as easy as just going to the network settings in the Windows control panel and then right-clicking the adapter and hitting "Rename". No, it's unfortunately not nearly that simple.
This is where we hit a snag. On Windows XP you may need administrator privileges to set this value. On Windows 7 you need to use the "SYSTEM" account (not the administrator account) or else you will get the wrath of the "access denied" alert. Don't fret, I've got you covered.
What this will do is first scan all net adapters for a VirtualBox network adapter. If it finds one with the name "vboxnet0" it will exit since we don't need to do any work. Failing that, it will scan again to find the first VirtualBox network adapter and attempt to rename it to vboxnet0. This will return either success or failure. If no VirtualBox network adapters were found, this script fails.
Next up, the escalating of privileges. Either you can write a real program or else you can perhaps use PsExec to grant you the right privileges when running a command-line tool.
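With PsExec, launching the rename script under the SYSTEM account can be a one-liner; the script name here is a placeholder for whatever actually does the registry edit:

:: Run the adapter-rename script as SYSTEM using Sysinternals PsExec
psexec -accepteula -s cmd.exe /c C:\scripts\rename-vboxnet.cmd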
While trying to get my solution to work, I found perhaps a half dozen ways that UAC didn't cooperate with batch files and Windows Scripting Host. I guess there were enough skript kiddiez out there abusing these tools that Microsoft needed to clamp down on the interaction between the shell and programs. I can't blame them, but it sure is hard to pop open a UAC prompt on Windows 7 from a command line; I certainly didn't find a good way.
I never thought I would find what feels like a really bad problem with the Linux scheduler, but it's hard to argue with my results. I have an Acer Aspire One netbook with a 1.5 GHz Intel Atom N550 inside. It is a dual-core CPU with hyperthreading enabled.
At first I thought I was crazy, or that something was fundamentally broken with my recent Ubuntu install on this fine machine. I had been used to an HP Mini 110 with a dual-core 1 GHz AMD processor, and I expected better performance from this one. Instead, I found that my programs seemed to frequently hang, crawl slowly, or only sporadically operate well. Very odd behavior. I found, through use of my Mad Google Skillz, that it could be due to hyperthreading on the processor.

You see, a hyperthread isn't a real processing core. It's more like sharing parts of the same processing unit: while one thread is doing an addition, another could use the unused multiplication circuitry. If they both want to use bits of the CPU that overlap, then one process just has to wait. In my case, that starved process waited and waited and waited. It looked like Linux treated each hyperthread as another core and assumed it could safely and quickly run threads on any of the available cores. Thus, lots of jobs were running on the first core and few were running on the second, and the ones sharing a real core all got stalled.
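Testing the theory is easy with the standard sysfs CPU hotplug files; this is a sketch, where cpu1 and cpu3 are example sibling threads rather than the numbers from my machine:

# Take the hyperthread siblings offline (cpu numbers are examples;
# check /sys/devices/system/cpu/cpu*/topology/thread_siblings_list)
echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online
echo 0 | sudo tee /sys/devices/system/cpu/cpu3/online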
Your CPU numbers may not match mine, so I don't suggest you use the above.
Now you might be wondering how you can get this to happen automatically when machines start up. You can edit /etc/rc.local and add those two lines above the 'exit' line, but if the machine changes and no longer has hyperthreading, then maybe you've just disabled two processors in your quad-core machine. Yikes! Since programs are supposed to detect things like this and do the work for you, I scoured the internet for a way to detect whether a CPU is a hyperthreading CPU or not. I didn't come up with anything at all.
But that didn't stop me.
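A minimal sketch of such a script: walk each CPU's thread_siblings_list and take everything but the first listed sibling offline. This version handles both the dash and the comma separator (see the update below), and it assumes the sibling lists are pairs like "0-1" or "0,2".

#!/bin/bash
# Disable hyperthread siblings: for each CPU, keep the first sibling
# listed and take the rest offline. Must run as root.
for LIST in /sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list; do
    [ -f "$LIST" ] || continue
    KEEP=""
    for CPU in $(tr ',-' '  ' < "$LIST"); do
        if [ -z "$KEEP" ]; then
            KEEP=$CPU
        elif [ -e "/sys/devices/system/cpu/cpu$CPU/online" ]; then
            echo 0 > "/sys/devices/system/cpu/cpu$CPU/online"
        fi
    done
done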
Update 2013-03-22: Linux 2.6.x uses a comma as a separator, so changed
With the above script saved safely on my hard drive and /etc/rc.local running this shell command, I automatically disable hyperthreading just after boot... until my machine gets cloned to another netbook that doesn't have hyperthreading, and then no CPUs are disabled. The best of both worlds.
I have the privilege of working with Opscode Chef at work, maintaining the recipes as various projects move to "the cloud" or otherwise want to script the setup of their environments. While spinning up many new machines with knife, very rarely I will hit this problem. After trying pretty hard to find the root cause on the internet, I'll sum up what I found on this one page. Maybe it will help solve this problem for you too?
I really hate when things like that happen.
At this point, the server started acting funny and I couldn't spin up any instances, so I rebooted.
Uh-oh. Why would the server say that the client already exists? Let's go see what the logs say. I browse
What is this Bunny and why is it down? Turns out that RabbitMQ will not start because it has corrupted databases. Bummer. You may ask "how can I fix such a thing?" Well, you can't really repair the databases. Instead, we just delete them.
# Fix RabbitMQ by removing the databases
service rabbitmq-server stop
if [ -d /var/lib/rabbitmq/mnesia ]; then
    echo "Removing mnesia directory"
    rm -r /var/lib/rabbitmq/mnesia
fi
service rabbitmq-server start
# Add the Chef vhost, username, password, and permissions
rabbitmqctl add_vhost /chef
PASS=$(grep ^amqp_pass /etc/chef/solr.rb | cut -d '"' -f 2)
rabbitmqctl add_user chef "$PASS"
rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"