Programmer Tips

One day I might become a teacher and help people learn to program computers.  There are several skills I need in order to achieve that goal.  Until I get there, here are my one-way communications to the world regarding specific topics.  I don't call it a blog ... yet.

Parameter Order

posted Sep 4, 2013, 3:35 PM by Tyler Akins

When functions are created, they often start out as little baby functions that don't take parameters or only accept a couple.  Then they grow to include additional functionality or they need to pass additional data to other routines and their signature grows.  When this happens, the parameter order is often overlooked.

Let's take a function in JavaScript that might seem a bit weird.  Imagine it uses an AJAX service to retrieve a web page.  It might send additional data and take a timeout.

function getWebPage(method, uri, timeout, additionalData, ajaxService) { ... }

This might seem good, but there are a few problems.

Optional At The End

The above example might have seemed good because you felt the most important things are listed first.  You can't do a web request without a method or URI, right?  The rest probably just don't matter as much.  Unfortunately, they actually matter more.

It is a good practice to place optional parameters at the end.  That way you don't have function calls like this:

callMyFunction('some parameter', undefined, 'another needed parameter');

That's just ugly.  Plus the code to do automatic insertion of default values in the middle of a list of arguments often looks terrible.  Don't get me wrong; there are times when that makes sense, but consider those times carefully.  It's not a good habit.

Back to our getWebPage function.  Moving the optional additionalData parameter and the timeout value to the end gives us the following:

function getWebPage(method, uri, ajaxService, timeout, additionalData) { ... }

Order Optional Parameters

Optional parameters should be arranged by the likelihood of them being used.  It is far more likely for a timeout to not be specified than additionalData.  Perhaps you would always want the browser's default timeout, but you do want to POST a lot of information.  Let's reorder those optional parameters one more time.

function getWebPage(method, uri, ajaxService, additionalData, timeout) { ... }
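Here's a minimal sketch of why trailing optionals are pleasant to call.  The function body, the default timeout of 30000 and the `send` method on the fake service are all made up for illustration; only the parameter order comes from the signature above.

```javascript
// Hypothetical stand-in for getWebPage; only the parameter order matters here.
function getWebPage(method, uri, ajaxService, additionalData, timeout) {
    // Trailing optionals are simply undefined when the caller omits them.
    if (additionalData === undefined) {
        additionalData = null;
    }
    if (timeout === undefined) {
        timeout = 30000; // assumed default, in milliseconds
    }
    return ajaxService.send({ method: method, uri: uri, data: additionalData, timeout: timeout });
}

// A fake service that just echoes the request it was given.
const fakeService = { send: function (request) { return request; } };

// Callers leave off whatever they don't need; no undefined placeholders.
const simple = getWebPage('GET', '/index.html', fakeService);
const withData = getWebPage('POST', '/save', fakeService, { name: 'Tyler' });
console.log(simple.timeout, withData.data.name);
```

Neither call needs a placeholder argument, which is the whole point of keeping optional parameters at the end.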

Order Mandatory Parameters

Mandatory parameters should be ordered by how likely they are to change from invocation to invocation.  Let's say that our ajaxService never changes for the life of our application.  We might only want to change it for tests.  With JavaScript we can bind the default AJAX service, reducing the number of arguments.  Move that ajaxService up front and here's an example of using bind.

var getWebPage;
function getWebPageOriginal(ajaxService, method, uri, additionalData, timeout) { ... }
getWebPage = getWebPageOriginal.bind(null, myDefaultAjaxService);
getWebPage.get = getWebPage.bind(null, 'GET');
getWebPage.post = getWebPage.bind(null, 'POST');
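To see the bound versions in action, here is a small self-contained sketch.  The function body and the contents of `myDefaultAjaxService` are stand-ins invented for this example; the point is only how bind fills in the leading arguments.

```javascript
// Stand-in service (hypothetical): reports what it was asked to do.
var myDefaultAjaxService = {
    send: function (method, uri, data) {
        return method + ' ' + uri + (data ? ' with data' : '');
    }
};

function getWebPageOriginal(ajaxService, method, uri, additionalData, timeout) {
    return ajaxService.send(method, uri, additionalData);
}

// Bind the service once, then bind the method on top of that.
var getWebPage = getWebPageOriginal.bind(null, myDefaultAjaxService);
getWebPage.get = getWebPage.bind(null, 'GET');
getWebPage.post = getWebPage.bind(null, 'POST');

console.log(getWebPage.get('/index.html'));
console.log(getWebPage.post('/save', { id: 1 }));
```

A test can still call getWebPageOriginal directly with a different service.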

Summary

If you pay attention to parameter order and keep it in a sane format, you will make it easier for other developers to use your functions.  There are a lot of other tips, like not having too many parameters and using descriptive names for variables, but those are well discussed on other sites.

Password Security

posted Jul 3, 2012, 9:12 AM by Tyler Akins

Sites are compromised all the time and the public, in general, just isn't aware of most of the minor ones.  Only when passwords are leaked from larger sites and the story makes the local news do people care.

At the time of this article's writing, LinkedIn was the most recent big hack where 6.5 million password hashes were leaked to the public.  At least they were doing things a little better than many places - they hashed their passwords but unfortunately didn't use salts.  More on that later.

In order to understand good password security, I will first give you a breakdown on what hackers will do in order to find your password.  With this knowledge, you can understand why you need to use better passwords and different passwords for each site.  I will also give you real-world examples for password cracking times, password cracking keyspace, and more.  For reference, this is on an HP Pavilion g6 with an Intel Core i3 2350M (2 core with hyperthreading) processor, running at 2.3 gigahertz and using an OCZ Agility3 SSD, which makes loading passwords and hashes faster.

What is password hashing?

Thankfully, most places know that storing your plain password in a database is a really bad idea.  For example, a plain password will look like "Testing".  Instead, developers will use a variety of techniques to turn this into a really large set of letters and numbers.  This is typically called hashing.  "Testing" as a hash looks like "fa6a5a3224d7da66d9e0bdec25f62cf0" or "0820b32b206b7352858e8903a838ed14319acdfd".

Hashes are computed using a method that doesn't allow you to go backwards.  You might have one of the hashes above, but you can't go backwards and get "Testing" from it.  You can only go forwards.  When you log into a system, it generates a new hash with the password you just typed.  If the two hashes match, you can log in.
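As a rough illustration of the login check, here is a sketch using Node's built-in crypto module.  The two example hashes above are 32 and 40 hex characters long, which are the lengths MD5 and SHA-1 produce; treating them as MD5 and SHA-1 is an assumption, and the helper names are made up.

```javascript
const crypto = require('crypto');

// Hash a password with the named algorithm and return lowercase hex.
function hashPassword(algorithm, password) {
    return crypto.createHash(algorithm).update(password, 'utf8').digest('hex');
}

// What the site would store instead of the plain password.
const storedHash = hashPassword('sha1', 'Testing');

// At login, hash what was typed and compare the two hashes.
function canLogIn(typedPassword) {
    return hashPassword('sha1', typedPassword) === storedHash;
}

console.log(hashPassword('md5', 'Testing').length);  // 32 hex characters
console.log(storedHash.length);                      // 40 hex characters
console.log(canLogIn('Testing'), canLogIn('testing'));
```

The site never needs the original password again; it only compares hashes.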

While it is impossible to go backwards, hackers can certainly take an entire dictionary of words and hash them all really quickly, checking if any of those hashes equal your password.

For the LinkedIn breach, I can scan all 6.5 million hashes against various wordlists, each time starting from scratch.  It takes 12 seconds to load an empty wordlist since it takes that long just to read and prepare for the cracking attempt.  Against my list of the top 7184 passwords, I found 3854 hashes in 14 seconds.  Against my list of 97 thousand English words, I recovered 22,572 in a mere 15 seconds.  When I threw in my huge wordlist of over 18 million words from various languages and other password breaches and mixed in the mutation engine to generate more likely passwords from the wordlist, I was able to crack over 390,000 of the hashes in a mere 129 seconds.
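The core of a wordlist attack fits in a few lines.  This sketch assumes unsalted SHA-1 hashes and uses a tiny made-up leak and wordlist; a real cracker is far faster and adds mutation rules, but the idea is the same.

```javascript
const crypto = require('crypto');

function sha1Hex(text) {
    return crypto.createHash('sha1').update(text, 'utf8').digest('hex');
}

// Pretend leak: unsalted hashes with no passwords attached (made-up data).
const leakedHashes = new Set(['password', 'letmein', 'gpdswoir'].map(sha1Hex));

// Hash every word in the list and see which hashes appear in the leak.
function crackWithWordlist(wordlist, hashes) {
    const recovered = new Map();
    for (const word of wordlist) {
        const hash = sha1Hex(word);
        if (hashes.has(hash)) {
            recovered.set(hash, word);
        }
    }
    return recovered;
}

const wordlist = ['123456', 'password', 'qwerty', 'letmein'];
const found = crackWithWordlist(wordlist, leakedHashes);
console.log(found.size + ' of ' + leakedHashes.size + ' hashes recovered');
```

The random-looking password survives because it is not in the wordlist.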

Hashing methods vary, and newer ones are typically more secure (i.e., fewer "collisions") and both harder and slower to compute than older ones.  It's good to know that people who take security seriously aren't making the hackers' lives too easy.

Brute Force Attacks

I was using a wordlist to generate possible passwords to try, but another approach is to just use "brute force" and guess every possible password combination.  There are even rules that can be applied to the guesses to make it more likely that a real password gets generated.  It comes from us being humans and how we build our words.  For example, the letter "C" is often followed by "H" and is only rarely followed by "K" at the beginning of a word.

I flipped the password cracker into brute force mode and let it run for almost exactly 48 hours.  On its own and without any help from any wordlists, it provided over 2 million passwords.
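Plain brute force, without the letter-frequency rules mentioned above, is just an enumeration of every string in order.  A minimal sketch, with function names invented for the example:

```javascript
// Yield every string over `alphabet` with exactly `length` characters.
function* candidatesOfLength(alphabet, length) {
    if (length === 0) {
        yield '';
        return;
    }
    for (const prefix of candidatesOfLength(alphabet, length - 1)) {
        for (const ch of alphabet) {
            yield prefix + ch;
        }
    }
}

// Yield every candidate from 1 character up to maxLength characters.
function* bruteForce(alphabet, maxLength) {
    for (let length = 1; length <= maxLength; length++) {
        yield* candidatesOfLength(alphabet, length);
    }
}

const lowercase = 'abcdefghijklmnopqrstuvwxyz';
let count = 0;
for (const guess of bruteForce(lowercase, 2)) {
    count++;
}
console.log(count); // 26 + 26 * 26 = 702
```

Each guess would be hashed and compared against the leaked hashes, exactly as with a wordlist.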

Distributed Cracking

The numbers I have been listing are for me cracking hashes on my own with just my laptop.  Imagine if I had a bunch of computers at a college available to try passwords simultaneously.  What if I spun up 20 Amazon cluster compute instances to crunch numbers for a mere day?  What if a bunch of hackers got serious about cracking a set of passwords and decided to pool all of their resources together?

Rainbow Tables and Salts

There are additional password cracking techniques out there to speed up the cracking.  One of them is called "rainbow tables", which is where chains of hashes are computed ahead of time and stored, so that work does not have to be repeated for every cracking attempt.  It really speeds up the efforts when used against a susceptible hash.  By "really speeds up", I am talking about cutting days of cracking down to just minutes.

System administrators have tried to combat this issue by using "salts" with passwords.  A simple way to think of the salt is to take the hash of your password with "blah blah blah blah blah" at the end of it.  When each account gets its own random salt, a table that was generated ahead of time no longer matches, and two people with the same password end up with different hashes, so each hash has to be attacked separately.  A better scheme would add the salt to the hash and then hash that again, many times over, which makes every single guess slower, but we don't need to go into that here.  I merely wanted to point out that there are techniques that can be applied to slow down the process.
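Here is a small sketch of a per-account salt.  The record layout and function names are made up, and plain SHA-256 is used only to keep the example short; real systems generally use a deliberately slow password hash such as bcrypt, scrypt or PBKDF2.

```javascript
const crypto = require('crypto');

// Hash the password with the salt added at the end of it.
function saltedHash(password, salt) {
    return crypto.createHash('sha256').update(password + salt, 'utf8').digest('hex');
}

// Create a record for storage: a random per-account salt plus the salted hash.
function createRecord(password) {
    const salt = crypto.randomBytes(16).toString('hex');
    return { salt: salt, hash: saltedHash(password, salt) };
}

// Re-hash the typed password with the stored salt and compare.
function verify(record, typedPassword) {
    return saltedHash(typedPassword, record.salt) === record.hash;
}

const alice = createRecord('Testing');
const bob = createRecord('Testing');
console.log(alice.hash === bob.hash); // same password, different salts
console.log(verify(alice, 'Testing'), verify(alice, 'testing'));
```

Because the two records do not share a hash, cracking one account says nothing about the other.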

Precautions You Need To Take

Never reuse passwords!

If Mr. Evil Hacker gets your username and password from one website you visit, would you want Mr. Evil Hacker to also be able to get into your email?  Bank account?  Many people made the mistake of using the same password for LinkedIn as they did for other sites.  Now, one must assume that all of the sites are compromised and your personal information may have been leaked to unsavory characters.

Don't write passwords down

You're probably thinking "How am I to remember my crazy passwords for each site?"  Writing them down leaves them in plain text.  It could be hiding on your desk or maybe a scrap of paper in your pocket, but it's insecure.  Someone could easily walk over and read your passwords.

Use a password manager if that helps.  Depending on your needs, some software that runs on your phone may be ideal.  Others use secure storage of passwords in their web browsers, like Chrome and Firefox; it is best to guard this with a "master password".  There are online password storage solutions like LastPass and Clipperz that can integrate into your browser.  Just make sure you can back up the storage of the passwords and that you can get the passwords easily whenever you need them.

Make up stories and use the third letter of each word for your password.  Or the first.  Or use poems.  Keep a book with you and assign each site a page number, then use the 10th letter from each line down the page.

Use Diceware or another system to generate truly random passwords.  Diceware uses a large table of English words, so use this method only when you can make passwords of tolerable length.  I would use at least five different words before feeling good about a random website.
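A Diceware-style generator can be sketched like this.  The eight-word list is a made-up stand-in (the real Diceware list has 7776 words, one for each roll of five dice), and the function names are hypothetical.

```javascript
const crypto = require('crypto');

// Tiny stand-in list; the real Diceware list has 7776 words (6^5, five dice).
const words = ['apple', 'brick', 'cloud', 'delta', 'ember', 'flint', 'grove', 'haste'];

// Pick `count` words uniformly at random using a cryptographic generator.
function makePassphrase(wordlist, count) {
    const picked = [];
    for (let i = 0; i < count; i++) {
        picked.push(wordlist[crypto.randomInt(wordlist.length)]);
    }
    return picked.join(' ');
}

// Number of possible passphrases for a list size and word count.
function passphraseCount(listSize, count) {
    return BigInt(listSize) ** BigInt(count);
}

console.log(makePassphrase(words, 5));
console.log(passphraseCount(7776, 5).toString());
```

With the full list, five words give roughly 2.8x10^19 possible passphrases.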

Use randomly generated passwords

This eliminates the bias that people use when generating passwords.  If an attacker knows you speak English, they will probably generate passwords that look like English words.  "gpdswoir" is far stronger than "homework" even though they are the same length.  That's because Mr. Evil Hacker will first attempt to use a wordlist like I did to get as many passwords as possible.

Even the tips of "add a number at the end" and "change i into ! and a into @" are represented in mutation rules that can be applied to wordlists.  Your password really isn't much stronger since it is, in the end, just a dictionary word.

Use different types of characters

Randomly generated passwords usually mix in uppercase, lowercase, numbers, and symbols.  Let's say you only used lowercase letters.  You would have only 26 options available at each position of your password.  This is called your keyspace.  To calculate the number of possibilities for a given password length, you multiply the keyspace by itself once for each character in the password.  If you are looking for any single-letter lowercase "password", you have a mere 26 options.  If you want all two-letter passwords, that would be 26 * 26 (26^2) = 676 options.  If a site forces all passwords to lowercase or uppercase and you type in an 8-letter word, that's only 208 billion possibilities, or 2.0x10^11.
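The arithmetic is easy to check.  A minimal sketch, using BigInt so the larger results stay exact (the function name is made up):

```javascript
// Number of possible passwords: keyspace multiplied by itself `length` times.
function possibilities(keyspace, length) {
    return BigInt(keyspace) ** BigInt(length);
}

console.log(possibilities(26, 1).toString()); // 26
console.log(possibilities(26, 2).toString()); // 676
console.log(possibilities(26, 8).toString()); // 208827064576, about 208 billion
```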

208 billion!  That's a lot, you may think.  Even with hundreds of billions of possibilities, it looks like it would only take 414 hours to go through them all on my computer.  Yep, if you used 8 or fewer characters in your password, it doesn't really pose a challenge and I would certainly get it if I tried.  With the advanced algorithms out there and wordlists, I'll probably still be able to crack most password hashes in the first 24 hours.

So, those 208 billion possibilities (or 2.0x10^11 possibilities) are not nearly enough to thwart a concentrated attack.  We're going to start dealing with really large numbers here and your goal is to make the exponent (the "11") much larger.  Each time you can get the exponent even a single digit larger, it takes 10 times the computing power to search all possibilities.

By using different types of characters, such as uppercase, numbers and symbols, you increase the keyspace dramatically.  Instead of a keyspace of merely 26, now you increase it to 26 (lowercase) + 26 (uppercase) + 10 (numbers) + 32 (symbols) = 94 characters.  A randomly generated 8-character password using a keyspace this large can make about 6.1 quadrillion different passwords, or 6.1x10^15.  Again, we should focus on that "15".  We just made your password more than 10,000 times harder to guess.

According to my computer's statistics, my laptop could crack any 8-character password in about 1,100 days.  Assume hackers coordinate their attack and pool their resources.  Let's say we get a team of merely 100 hackers, each with 10 big machines (potentially a REALLY low estimate).  With this dedicated group of hackers and access to more powerful machines, all 8-character passwords could be cracked in just over a day (about 26-27 hours).  With botnets and hundreds of thousands of drone computers at your disposal, you could crack this in hours or minutes.
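The pooled estimate is just a division.  This sketch assumes every machine is only as fast as the laptop, which undersells the "big machines", so treat it as an upper bound on the time; the names are made up.

```javascript
// Days needed when the work is split evenly across identical machines.
function daysToSearch(daysOnOneMachine, machineCount) {
    return daysOnOneMachine / machineCount;
}

const laptopDays = 1100;          // figure quoted above for one laptop
const machines = 100 * 10;        // 100 hackers with 10 machines each
const days = daysToSearch(laptopDays, machines);
console.log(days.toFixed(1) + ' days, or about ' + (days * 24).toFixed(1) + ' hours');
```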

Size really matters

Each character increases the difficulty of the hack exponentially.  Depending on your keyspace, this could mean significant changes.  Assuming your keyspace of 95 characters and a length of 8, there are 6.6x10^15 possibilities.  By including just one more random character, we can generate 6.3x10^17.  One extra keypress means it is almost 100 times harder to guess.  The cracking time for my laptop went from about 1,100 days to about 105,000 days.  The dedicated group of hackers now would spend 1/3 of a year instead of a day.  A botnet equivalent to 100,000 of my laptops would still get this password in just one day.

If a site lets you use 12 characters, that's far better.  If the site doesn't restrict length, you could use 20 or more characters.  With a 95-character keyspace, 12 characters can produce 5.4x10^23 possibilities and 20 characters can make 3.6x10^39 different combinations.  We're going for computationally infeasible, and this certainly qualifies.
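To watch the growth with length, here is a quick sketch for a 95-character keyspace.  It uses ordinary floating point numbers, so the results are approximations, which is fine at this scale.

```javascript
// Possible passwords for a keyspace and length, as an approximate Number.
function possibilities(keyspace, length) {
    return Math.pow(keyspace, length);
}

[8, 9, 12, 20].forEach(function (length) {
    console.log(length + ' characters: ' + possibilities(95, length).toExponential(1));
});

// Each extra character multiplies the work by the keyspace size.
console.log(possibilities(95, 9) / possibilities(95, 8));
```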

Use spaces too

There are a lot of password crackers out there that don't crack multi-word passwords by default.  At least add the 96th character to the keyspace.  With an 8-character password, we increase from 6.6x10^15 to 7.2x10^15, which is only a minor jump, but we've now eliminated the normal use of wordlists and people will have to crack your password using non-default techniques.

Change your password

The longer that someone has to crack your password, the more likely they will get it.  Why leave that window of opportunity open for so long?  I'm not advocating changing your passwords daily (which can also be a security risk), but perhaps change them yearly, or change the ones you care about with every season.

If there is a breach, change your password right away

It's likely hackers had your password in their hands for quite a while before a company admits it was hacked.  Before I got an email from LinkedIn, I had the 6.5 million password hashes in my hands and already found that a password matching mine was leaked.

Use two-factor authentication when possible

Not many places let you do this.  It is difficult for people to guess a password, but it is next to impossible for them to guess both your password and the number from a two-factor authentication method.  There is software for smartphones, and there are key fobs, that can be tied to web sites to generate a new number every minute automatically.  Instead of just relying on something you know (your password), these sites also rely on something you have (the number generator).
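One common way such number generators work is the time-based one-time password scheme from RFC 6238, which builds on RFC 4226.  Whether any particular fob or app uses it is an assumption; this sketch uses the RFC's published test key and a 60-second step to match the "every minute" above (many apps use 30 seconds).

```javascript
const crypto = require('crypto');

// HOTP (RFC 4226): HMAC-SHA1 over an 8-byte counter, then dynamic truncation.
function hotp(secret, counter, digits) {
    const counterBytes = Buffer.alloc(8);
    counterBytes.writeBigUInt64BE(BigInt(counter));
    const mac = crypto.createHmac('sha1', secret).update(counterBytes).digest();
    const offset = mac[mac.length - 1] & 0x0f;
    const binary = mac.readUInt32BE(offset) & 0x7fffffff;
    return String(binary % Math.pow(10, digits)).padStart(digits, '0');
}

// TOTP (RFC 6238): the counter is the number of time steps since the epoch.
function totp(secret, unixSeconds, stepSeconds, digits) {
    return hotp(secret, Math.floor(unixSeconds / stepSeconds), digits);
}

const sharedSecret = Buffer.from('12345678901234567890', 'ascii'); // RFC test key
const now = Math.floor(Date.now() / 1000);
console.log(totp(sharedSecret, now, 60, 6));
```

The site and the device share the secret, so both can compute the same number without it ever being sent ahead of time.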

Assume there is no security

Often there isn't any.  Lots of sites store your password without any encryption or in a way where they can get the original password back.  Other sites mess up and encrypt the password poorly or rely on obfuscation instead of real security.

If someone gets into one of your accounts, they may try that username and email address with that password elsewhere.  They might be able to see the password recovery questions and answers, then try to use those on other sites.  If they hacked your email address, they might try getting your password reset on sites and intercept the email so they can now gain access to additional sites.  Be careful.  If your information gets exposed, you may be at a bigger risk than you realize.  Plan carefully and try to make each account as individual and separate as possible.

When you assume your password will be compromised and you plan for it, then news of password leaks at LinkedIn (or any other place) won't have you worried at all.

VMWare Tweaks

posted Feb 25, 2012, 10:34 AM by Tyler Akins   [ updated Jun 8, 2012, 10:29 AM ]

I use a virtual machine at work.  I've tried using VirtualBox, but I like VMWare Player a little more, especially since I have tweaked some settings to make it work better and changed it to be less obtrusive.

Stopping Continual Writes To Disk

First and foremost, my largest complaint with VMWare is how it always writes to the hard drive.  At times my Windows laptop would hang or be severely bogged down.  My hard drive light would blink every second or two.  I thought it might be the anti-virus software scanning or noticing changes in the *.vmdk file where the virtual disk image is stored.  It turns out that VMWare is constantly writing out the contents of the virtual machine's memory to disk.  To disable this feature, we need to edit config.ini; on Windows 7 this file is located at "C:\Users\YourUsername\Application Data\VMWare\config.ini".  You will have a difficult time browsing to this location since the Application Data folder is hidden by Windows, but you could open Notepad, use File -> Open, then paste in the filename.

There might not be a config.ini that exists already.  If that's the case, you will need to make a new file.  Once you have this file open in Notepad, add this line anywhere inside.

mainMem.useNamedFile = FALSE

Save the file and close Notepad.  If you had VMWare Player open, you will need to close that too before opening another virtual machine.  I've heard that this sort of option works for VMWare Workstation too, but you should reboot or restart all of the VMWare services.  Once you start up the next virtual machine, you shouldn't see the weird hard drive blinking going on any longer.

Hotkey and Fullscreen Preferences

There are a bunch of additional settings you can change which are not available in the VMWare Player's GUI, but are still honored by the program.  Your preferences file on Windows 7 is saved in "C:\Users\YourUsername\Application Data\VMWare\preferences.ini", and you can use Notepad (described above) to open this file as well.

pref.hotkey.gui = "true"
pref.hotkey.shift = "true"
pref.hotkey.control = "true"
pref.hotkey.alt = "true"

Setting these properties will change what your hotkey is.  By default it is control + alt, which is also how I switch desktops in my Ubuntu guest OS.  I switched it to use all four keys - the Windows key (gui), shift, control and alt.

pref.vmplayer.fullscreen.autohide = "TRUE"
pref.fullscreen.toolbarpixels = "0"

The above two properties will change the toolbar when you fullscreen the application into one that will automatically hide.  This gets almost all of the VMWare Player UI out of the way so you can focus entirely on the guest OS.  I also shrink the size of the hidden toolbar; I specify 0 pixels, but it still has a couple there on top.  It's hardly noticeable so I don't mind much.

hints.hideAll = "true"
pref.vmplayer.exit.vmAction = "poweroff"

Lastly, I hide all of the silly hints that VMWare Player likes to keep popping up, plus I like changing the close button's behavior to power off the machine instead of suspend.  Both of these are available in the GUI, but I thought I'd include them here since I use them all the time.

Best Programming Language?

posted Jan 13, 2012, 7:47 PM by Tyler Akins   [ updated Jun 8, 2012, 10:25 AM ]

Occasionally, people will foolishly ask me what I believe is the best programming language available.  My answer is usually, "What are you trying to do?"  My belief is that some languages are better than others for particular tasks.  For instance, one would want to write graphics drivers in C, C++ or assembly for speed.  Likewise, we write applications on the web using a mix of JavaScript and some sort of scripting language backend such as Ruby or PHP.  To get at the core of the issue, first we must understand our restrictions and our goal.  After we truly understand where we want to go, often a particular language will win over the others.

Often it will be a language that is interpreted, because interpreted languages handle more of the nitty-gritty things for the programmer.  This is a huge benefit since programmers will only pump out X many lines of code per day.  You'll want those lines of code to be in as high-level a language as possible; in assembler you can probably write an itty bitty function in 7 lines of code, but in node.js you can write an HTTP proxy.  Higher level languages will let programmers be far more productive.

Hackers and Painters

In Hackers and Painters by Paul Graham, there is much talk about Lisp and how this mathematical language is still very powerful.  Graham attributes this to nine aspects that Lisp has that were new to languages of that era.

1 - Conditionals

This would be your standard if / else block.  At the time, there were machine-specific comparisons and then you would use a goto.  Today, a language that doesn't support "if" statements is not considered a real language.

2 - A function type

In Lisp, a variable can hold a number, string, or a function.  Functions can even be passed as arguments. Some languages today implement this, but certainly not all.
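The same idea, shown in JavaScript rather than Lisp since that is one of the languages that picked it up.  The names are made up for the example.

```javascript
// A function stored in a variable, like any other value.
const double = function (n) {
    return n * 2;
};

// A function that accepts another function as an argument.
function applyToAll(list, fn) {
    const result = [];
    for (const item of list) {
        result.push(fn(item));
    }
    return result;
}

console.log(applyToAll([1, 2, 3], double)); // [ 2, 4, 6 ]
```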

3 - Recursion

Lisp functions can call themselves.  Other languages perhaps looped, had to manually handle the stack for arguments, or needed to employ tricky tactics to get the job done in a similar method.

4 - Dynamic typing

A variable can hold a string, number, function, or other forms of data.  One does not need to declare the datatype ahead of time.  With Lisp, variables are essentially pointers to data and the data is what has a type.  Assigning a value to a variable merely copies the pointer.
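JavaScript behaves much the same way, so a short sketch in it may help; this illustrates the idea and is not Lisp itself.

```javascript
// The variable has no declared type; the value it points at does.
let thing = 42;
const seen = [typeof thing];

thing = 'forty-two';
seen.push(typeof thing);

thing = function () { return 42; };
seen.push(typeof thing);

// Assigning an object copies the reference, not the data.
const original = { count: 1 };
const alias = original;
alias.count = 2;

console.log(seen.join(', '), original.count);
```

Changing the object through one variable is visible through the other, because both point at the same data.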

5 - Garbage collection

When you were done using up memory, you would normally have to free it.  With Lisp, the variables would be deallocated eventually and automatically.

6 - Programs composed of expressions

Lisp doesn't distinguish between expressions and statements.  An expression calculates a result (like "4 + 5") and can be viewed like a phrase.  A statement also does something with the result (like "x = 4 + 5") and can be understood like a whole sentence.  In Lisp, when an expression is evaluated it always produces a result, which can be fed into another expression to build another result.

7 - A symbol type

References to a symbol are really pointers to strings in a hash table.  Ruby has this concept, and it makes comparisons much faster.  If you have a certain string in your code multiple times because you are comparing it here and there and everywhere, symbols would have one copy of the string in memory and every time you used it there would just be a pointer to that one string.  On the other hand, in C you would end up with multiple copies of your string.

8 - A notation for code using trees of symbols and constants

When in memory, programs are written as lists of symbols and constants.  These lists may also contain lists, and those might contain lists, etc.  Thus, all of Lisp is stored in trees internally.

9 - The whole language there all the time

This one is aimed at the parser, compiler and runtime being always available.  While reading code it could execute code to reprogram Lisp's syntax.  While executing code it could compile more code to extend the language.  One could run code to parse more data, perhaps as a form of communication with another piece of software.
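JavaScript offers a small taste of this through eval and the Function constructor.  It cannot reprogram its own syntax the way Lisp can, so treat this as a partial illustration only.

```javascript
// Source code that only exists as a string at run time.
const source = 'return a * b + 1;';

// Compile it into a real function while the program is running.
const compiled = new Function('a', 'b', source);
console.log(compiled(4, 5));

// eval evaluates an expression and hands back its value.
console.log(eval('4 + 5'));
```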

One problem with the book Hackers and Painters is that it does not go into why this sort of thing would be powerful and why it gives Lisp such an advantage over other languages.  It partly discusses that since programs are just lists and functions can be treated the same as any other variable, one can mutate functions and alter how they run without much fuss.  That sort of flexibility might give huge amounts of power, but unfortunately I have not yet been witness to where that's a good idea and other languages didn't have a solution that would work for them as well.

Today's Languages

Lisp has the above things and was designed in the 1950s, and it appears that other languages have been trying to achieve many of the above items after learning that they do work well.  Here, I shall attempt to compare other languages with some of the above criteria that set Lisp apart from other languages.  I'm going to exclude some that are in every modern language and also remove the "notation for code using trees" since that's how one language works internally, which may or may not actually empower the programmer.

Attributes

  1. [Fn] Functions can be assigned to variables
  2. [Dyn] Dynamic typing
  3. [Gc] Garbage collection
  4. [Sym] Symbol type (natively)
  5. [All] Whole language there (eg. can use "eval")
  6. [R] REPL: Read, eval, print loops

Language comparison

Language      [Fn]     [Dyn]  [Gc]  [Sym]  [All]  [R]
C             No       No     No    No     No     No
Java          No       No     Yes   No     No     No
JavaScript    Yes      Yes    Yes   No     Yes    Yes
Lisp          Yes      Yes    Yes   Yes    Yes    Yes
PHP           Partial  Yes    Yes   No     Yes    Partial
Ruby          Yes      Yes    Yes   Yes    Yes    Yes
Visual Basic  No       Yes    Yes   No     Yes    No

PHP can pass closures as functions but doesn't treat all functions this way.  The command-line interface does have an interactive mode, but it won't load library functions by default.

Shrinking VM Disk Images

posted Jan 12, 2012, 3:19 PM by Tyler Akins   [ updated Jun 8, 2012, 10:28 AM ]

I have been asked to compress dynamically-sized virtual disk images more than once.  These instructions can apply to VMDK files (common for VMWare) and VDI files (VirtualBox).  This sort of request seems to come up every year or two for me.  Usually it is because some place is gearing up to distribute these disk images and serving up gigs of data is undesirable.

I come up with the same sort of steps time and time again.  Instead of recreating this work for the next time I get asked, I'm posting these instructions online to record them publicly.  I've found that they are more thorough than what I find on other sites, so perhaps you could benefit from these instructions too.

First Step - Backup

Make a backup.  The steps below can really destroy images; follow them AT YOUR OWN RISK.

Reconfigure The Machine

Before you distribute the disk image around, you may need to tweak the configuration so that other virtualization tools will work with your image correctly.

Disable Network Configs Via MAC Addresses

If you use kudzu, you should disable it or it may prompt you when you start up the VM and it has a new MAC address.  Kudzu ships with older Red Hat and CentOS.

chkconfig kudzu off

Newer systems use udev.  You can disable the persistent network connections with these commands.

rm /etc/udev/rules.d/70-persistent-net.rules
cd /lib/udev/rules.d
rm 75-persistent-net-generator.rules && touch 75-persistent-net-generator.rules

You will also need to make sure your networking scripts don't have hardcoded HWADDR lines.

cd /etc/sysconfig/network-scripts
perl -pi -e "s/^HWADDR/#HWADDR/" ifcfg-eth*

Free More Space

It is a common misconception that deleting files on your dynamically sized disk image will make the disk image shrink.  This is not true - the virtual machine software doesn't peek into the filesystem to determine that sectors aren't needed any longer.  More on this later ...  For now, let's focus on making some room.

Delete temp files.  They shouldn't be needed.

# Linux variants
find /tmp /var/tmp -mindepth 1 -maxdepth 1 -exec rm -rf {} \;

If you are on Windows, you can usually delete the files in C:\Windows\Temp or use a program like CCleaner to thoroughly scrub away temporary files and unused garbage.

Clean your package manager's cache on Linux:

# Red Hat, CentOS
yum clean all

# Debian, Ubuntu
apt-get clean
apt-get autoclean

Also, you could clean out logs in /var/log.  This section could be improved - just let me know other things that could be cleaned out.

Defragment The Drive

This step really isn't needed, but it could help to squeeze out a few more bytes if you are really concerned.  There's really no defrag for Linux.  For Windows, I suggest using UltraDefrag.

Wiping Free Space

Even after you delete the files, the hard drive image still has the contents of the old file on it.  This is why programs like photorec can work.  We need to wipe the data clean off the drive by writing NULL (hex 0x00) bytes to all of the free areas on the drive.  This still doesn't make the image any smaller.  More on this later ...
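A toy model may make this clearer.  It is not how any real image format works internally; it is a made-up class in which only written blocks are stored and compaction can drop only blocks that are all zeros.

```javascript
// Toy model of a dynamically sized image: only written blocks are stored.
class ToyDiskImage {
    constructor() {
        this.blocks = new Map(); // block number -> contents
    }
    write(blockNumber, contents) {
        this.blocks.set(blockNumber, contents);
    }
    sizeInBlocks() {
        return this.blocks.size;
    }
    // Compacting can only drop blocks that contain nothing but zeros.
    compact() {
        for (const [blockNumber, contents] of this.blocks) {
            if (/^0*$/.test(contents)) {
                this.blocks.delete(blockNumber);
            }
        }
    }
}

const image = new ToyDiskImage();
image.write(0, 'file table: big.iso at blocks 1-2');
image.write(1, 'DATA');
image.write(2, 'DATA');

// Deleting the file only rewrites the guest's file table.
image.write(0, 'file table: empty');
image.compact();
const afterDelete = image.sizeInBlocks();

// Zeroing the freed blocks is what lets compaction reclaim them.
image.write(1, '0000');
image.write(2, '0000');
image.compact();
const afterWipe = image.sizeInBlocks();

console.log(afterDelete, afterWipe);
```

In the model, the image stays at three blocks after the delete and only drops to one after the freed blocks are zeroed.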

Wiping Linux From CD

The easiest way to wipe extfs filesystems (ext2, ext3, ext4) is with zerofree.  It's the faster choice.  You can download the iso image of Parted Magic and configure your VM to mount that as a virtual CD-ROM.  Boot from it, then open a terminal by clicking on the black monitor icon at the bottom.  From there, it is a few simple commands:

# Wipe a hard drive partition.  Let's say that /dev/sda1 is for /boot and /dev/sda2 is for / (root)
zerofree -v /dev/sda1
zerofree -v /dev/sda2

# Do you use LVM?  Don't forget that your device name may differ from mine.
# If you are unsure which device, look in /dev/mapper
vgchange -a y
zerofree -v /dev/mapper/VolGroup00_LogVol00

Now we can also wipe the swap.

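# If swap is a normal partition ... make sure you know which partition it is!
# You can also use ddrescue or dd_rescue instead
dd if=/dev/zero of=/dev/sda3 bs=1M
# Zeroing erases the swap signature, so recreate it afterwards.
# If /etc/fstab refers to swap by UUID, it will need updating.
mkswap /dev/sda3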

# If you use LVM, the device will be under /dev/mapper instead
dd if=/dev/zero of=/dev/mapper/ bs=1M

Guarantee we're done and shut down.

sync
shutdown -h now

Wiping Linux From Itself

You can also use zerofree on the system if you prevent things from writing to the disk.  This is a bit more risky.

# Shut down to single user mode
init 1

# Remount the drive read only
# If you get errors, use "fuser -m /" to see process IDs that have open files
# Once they get killed or handled, then try this command again
mount -o remount,ro /


# Start wiping, assuming that the drive mounted to / is /dev/sda2
zerofree -v /dev/sda2

# Wipe the boot partition, assuming you have one and it is /dev/sda1
zerofree -v /dev/sda1

# Clear the swap space, assuming it is /dev/sda4
swapoff /dev/sda4
dd if=/dev/zero of=/dev/sda4 bs=1M
mkswap /dev/sda4

# Guarantee we did things
sync

# Done.  Shut down the machine now!
shutdown -h now

Wiping Windows From Itself

There are a couple tools that I'd recommend.  First, CCleaner has added a built-in disk wiping utility.  It's easy to use.  Alternately, Eraser also is a nice tool with a GUI.  If command-line tools are more your thing, check out SDelete from Microsoft's site, then open a command prompt and run it.  Current versions use -z to zero free space; check the usage text for the version you have.

sdelete -z c:

Wiping Linux By Filling The Disk

As a last resort, you can use "dd" to fill a disk.  I suggest you boot from a bootable CD image, mount your filesystem, then run dd.  That way you can prevent many bad things from happening.  Either way, you'd be executing commands such as this:

cd /mount_point_I_want_to_fill
dd if=/dev/zero of=empty_file bs=1M
rm empty_file

Wiping Windows By Filling The Disk

It is better to not fill your hard drive with a big empty file, but these instructions are provided in case you really don't have a better way.

You can use nullfile to create this huge, empty file; Harddisk Image Cloning for PCs has links to the software.  I'd suggest using Control-F to find "nullfile" instead of scrolling.  All you need to do is double-click to run it.

There is also a Windows port of dd that works very similarly to the Unix version.  Open a command prompt and run these commands to fill your disk:

dd if=/dev/zero of=empty_file bs=1M
del empty_file

Resizing The Disk Image

Finally, after we freed up lots of space and wiped it with zeros, we can shrink the image.  Shrinking the disk image on the fly would be too big of a burden for the virtualization software.  Maybe it can be smart and do this in the background someday, but for now we are forced to shrink the file ourselves.  Here are all of the solutions I know about.  I haven't had any need to start VMs on Linux or MacOS, but the instructions below should be similar for your install.

VMWare Server's Utility - Windows Host

VMWare Server comes with vmware-vdiskmanager, which can shrink .vmdk files.  Open up a command prompt and run vmware-vdiskmanager.

"C:\Program Files\VMware\VMWare Server\vmware-vdiskmanager" -k my_disk_image.vmdk

VMWare Player - Windows Host

Open up VMWare Player and edit the virtual machine.  Select the hard disk, then there's a button on the right that says Utilities.  Under that drop-down menu is an option, "Compact".  Presto-chango, you are done.

VirtualBox - Windows Host

The VBoxManage command can use "modifyhd" to shrink .vdi files, but there is no support for shrinking .vmdk files, even though VirtualBox can read and write them.

"C:\Program Files\Oracle\VirtualBox\VBoxManage" modifyhd my_disk_image.vdi --compact

Alternately you can copy the disk to a new image, but this changes the UUID.  By changing the format parameter, you can change it to a VMDK file or other types.  When you do this, you'll need to go to VirtualBox and remove the old disk image and attach the new one, then you can finally delete the big disk image.

"C:\Program Files\Oracle\VirtualBox\VBoxManage" clonehd my_disk_image.vdi my_shrunken_image.vdi --format VDI

Vagrant

When Vagrant makes a new package, it will automatically clone the hard drive, which uses the virtualization software's process for copying a hard drive.  This won't copy the blank sections of the disk, making a smaller image automatically.

Public DNS Pointing To localhost (127.0.0.1)

posted Oct 7, 2011, 9:01 AM by Tyler Akins   [ updated Aug 30, 2012, 6:06 AM ]

When you are developing and using a local development environment, you typically need to hit your own site.  A lot.  You'd use URLs that look like this:

http://localhost/
http://127.0.0.1/

When you get slightly more advanced, you would want to run multiple sites off your installation.  You can easily do this with name-based virtual hosts (e.g. with VirtualHost directives in Apache's config).  Now you want to use URLs like this:

http://client1.local/
http://client2.dev/
http://client3/

Those URLs don't work, so now we need to find some way to map our domain names to the "localhost" address.

What if we could map hostnames to 127.0.0.1 and make this work?

Hosts File

The first and easiest method is to edit your hosts file (/etc/hosts in Linux, C:\Windows\System32\Drivers\etc\hosts for some versions of Windows) and add lines like this:

127.0.0.1 client1.local
127.0.0.1 client2.dev
127.0.0.1 client3

At work, we have up to five different hostnames for each of our clients.  Adding yet another client means dozens of developers that now need to edit their hosts file.  Oh, the pain and agony when you have to do this for hundreds of domains!

What if we could have a single top-level domain that always resolved to localhost?

DNS Entries - Windows

If you are using Windows DNS, you can create a new zone named local and add wildcard records to it:

dnscmd /RecordAdd local * 3600 A 127.0.0.1
dnscmd /RecordAdd local @ 3600 A 127.0.0.1

dnsmasq - Linux, MacOS

On Linux and MacOS systems, you can install dnsmasq to pretend to be a real DNS server and respond with 127.0.0.1 for all subdomains of a top level domain.  If you set up *.local to always resolve to your own machine, you can use URLs like this:

http://client1.local/
http://client2.local/
http://client3.local/

You only need to install and set up dnsmasq.  There are some well-written instructions at http://drhevans.com/blog/posts/106-wildcard-subdomains-of-localhost that you can follow; I won't repeat them here.

The drawback of this setup is that you now have to install and configure dnsmasq on every machine where you want to use this trick.

What if someone set up DNS entries and basically did this for you?

Available Wildcarded DNS Domains

It turns out that some kind-hearted people have already set up wildcarded domains for you.  You can use any domain below, or any subdomain of it, and it will always resolve back to 127.0.0.1 (your local machine).  Here's the list of ones I know about.  Let me know if there are more!

  • fuf.me - Managed by me; it will always point to localhost for IPv4 and IPv6
  • localtest.me - Also has an SSL cert - see http://readme.localtest.me
  • ulh.us
  • 127-0-0-1.org.uk
  • ratchetlocal.com
  • smackaho.st
  • 42foo.com
  • vcap.me
  • beweb.com
  • yoogle.com
  • ortkut.com
  • feacebook.com
  • lvh.me

Now, with these wildcarded domains, you don't need to do any modification of your system for requests to come back to your own server.  For instance, you can go to http://client1.127-0-0-1.org.uk/ and the web page request will always head back to your own server.  You'll still need to configure your web server to answer on this hostname, but at least the DNS portion of the problem is now solved.
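
If you want to confirm that one of these names really comes back to your own machine, a small Node.js check can help.  This is only a sketch: the function names isLoopback and pointsToLocalhost are made up for this example, and it assumes you run it with Node.js, whose built-in dns module does the lookup.

```javascript
// Hypothetical helper names; only the built-in "dns" module is real.
var dns = require("dns").promises;

// True for IPv4 addresses in 127.0.0.0/8 and for the IPv6 loopback ::1.
function isLoopback(address) {
  return address === "::1" || /^127(\.\d{1,3}){3}$/.test(address);
}

// Ask the system resolver (which also reads the hosts file) for the
// hostname's address and report whether it is a loopback address.
async function pointsToLocalhost(hostname) {
  var result = await dns.lookup(hostname);
  return isLoopback(result.address);
}

// Example usage (needs a working resolver):
// pointsToLocalhost("client1.lvh.me").then(function (ok) { console.log(ok); });
```

The check only covers the DNS half of the problem; the web server still has to answer for that hostname.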

Escaping Strings

posted Sep 12, 2011, 3:30 PM by Tyler Akins   [ updated Sep 13, 2011, 10:45 AM ]

There is a lot of confusion out there about the proper way to escape strings in different languages for different purposes.  I recently had a discussion with an acquaintance regarding the correct way to escape a regular expression in PHP.  To that end, I wrote up an email to him in an attempt to explain why I said he didn't have enough backslashes.

Let's pretend we want to write a regular expression to remove all periods.  I'm using only a couple languages to better illustrate my point, and please don't mention that the JavaScript one doesn't really need a RegExp object.  Remember, this code is designed to show you a tricky part about escaping.

// Version 1

// PHP
$result = preg_replace("/./", "", $input);

// JavaScript
var regexp = new RegExp(".");
var result = input.replace(regexp, "");

There, done.  Oh wait, I forgot to escape the period in the string!  In a regular expression, a period matches any character, so we need to put a backslash before the period so the engine knows we want to match just periods.

// Version 2

// PHP
$result = preg_replace("/\./", "", $input);

// JavaScript
var regexp = new RegExp("\.");
var result = input.replace(regexp, "");

There, done.

Or am I?  When doing escaping in strings, the backslash character is often the indicator that the next character is treated differently.  For instance, \n translates to a newline character, \t becomes a tab character and \\ means to put in a literal backslash character.  The real string that we want passed into the Regular Expression engine is literally, \. (a backslash and a period = an escaped period).  We need to take that string and then escape it again to embed it in our code properly.

// Version 3

// PHP
$result = preg_replace("/\\./", "", $input);

// JavaScript
var regexp = new RegExp("\\.");
var result = input.replace(regexp, "");

Finally our code is correct.  I've received several questions about the multiple levels of escaping, so let's anticipate some questions and provide useful answers right away!

Why are we escaping the period twice?

It's because the string goes through two levels of unescaping before being used - first it goes through the language's string unescaping (PHP's or JavaScript's) and then through the regular expression engine's unescaping.
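
The two levels can be watched one at a time.  Here's a minimal JavaScript sketch of Version 3; the variable names are mine and the sample inputs are just for illustration.

```javascript
// Level 1: the JavaScript parser unescapes the string literal.
// The source has three characters between the quotes, but the string
// that results holds only two: a backslash and a period.
var pattern = "\\.";
console.log(pattern.length);          // 2

// Level 2: the regular expression engine unescapes the pattern.
// It sees \. and treats that as "a literal period".
var regexp = new RegExp(pattern);
console.log(regexp.test("a.b"));      // true, there is a period
console.log(regexp.test("abc"));      // false, no period anywhere
```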

Why does Version 2 still work?

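Great question, and I think this is the source of the confusion about string escaping.  Version 2 still works in PHP because \. doesn't "unescape" to anything.  Instead of choking and dying, PHP will just let the two characters go through.  JavaScript is not as kind: it silently drops the backslash from an escape it doesn't recognize, so "\." becomes a plain period and Version 2 quietly behaves like Version 1.  Better yet is this list of how strings get unescaped in PHP: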

Input    Output
\        \
\\       \
\"       "
\\"      invalid
\n       newline
\\n      \n
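
One caveat you can check for yourself: JavaScript string literals do not pass an unrecognized escape like \. through untouched.  They keep the period and drop the backslash, so the pass-through behavior is PHP's.  A minimal sketch, with variable names of my own choosing:

```javascript
var version2 = "\.";    // unrecognized escape: JavaScript keeps only the period
var version3 = "\\.";   // an escaped backslash followed by a period

console.log(version2 === ".");                    // true
console.log(version2.length);                     // 1
console.log(version3.length);                     // 2

// With the backslash gone, Version 2 matches any character at all.
console.log(new RegExp(version2).test("abc"));    // true
console.log(new RegExp(version3).test("abc"));    // false
```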

If Version 2 works, why worry about proper escaping?

It is doubtful that the string processing engines of the different languages will change much in the future.  However, escaping properly helps you avoid problems today.  Let's pretend you wanted to match a literal backslash and any character.  You'd want the pattern \\. and in both of the example languages it should be escaped as "\\\\." (yeah, four backslashes = 1 literal backslash because it gets unescaped twice).  If you get your escaping messed up or don't know how many levels of unescaping will happen, you would get unexpected results.  If you only used the string "\\." in either language, it would match only periods, not a backslash followed by any character.
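
Here's that difference as a short JavaScript sketch.  The variable names and the sample path are made up for the example.

```javascript
// Four backslashes in the source become two in the string, and the
// regular expression engine reads those two as one literal backslash.
var backslashThenAny = "\\\\.";
console.log(backslashThenAny.length);                      // 3

// Two backslashes in the source become one in the string, which
// escapes the period for the regular expression engine.
var justPeriod = "\\.";

var path = "C:\\temp";   // the text C:\temp, no period in it
console.log(new RegExp(backslashThenAny).test(path));      // true
console.log(new RegExp(justPeriod).test(path));            // false
console.log(new RegExp(backslashThenAny).test("a.b"));     // false
```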

In PHP there are these single-quoted strings where you don't need to escape ...

Sorry, that's wrong.  You must still escape there.  Try echo '\\' or echo '\'' (that's two single-quotes, not a double quote at the end).  Without the escaping in single-quoted strings, you would not be able to embed an apostrophe.  Most of the escape characters are disabled, however, so sequences like \n and \t will not produce a newline or a tab.

In conclusion ...

So, armed with this knowledge, I could ask you to escape the regular expression that looks for a backslash, a period, a double quote, and a slash.  You'd be able to produce the following:

// Looking for:  \."/
// Escaped for Regular Expression:  \\\."/ 

// PHP - Escape backslashes, slash, double quote
$result = preg_replace("/\\\\\\.\"\\//", "", $input);

// JavaScript - Escape backslashes and double quote
var regexp = new RegExp("\\\\\\.\"/");
var result = input.replace(regexp, "");
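
If you'd like to double-check the JavaScript answer, here's a small sketch.  The target variable, the use of String.raw, and the regular expression literal are my additions.  A literal skips the string level, so only the regular expression escaping remains, plus the slash because it ends the literal.

```javascript
// The text we are looking for:  \."/
// String.raw leaves backslashes alone, so nothing is unescaped here.
var target = String.raw`\."/`;
console.log(target.length);               // 4

// Built from a string: two levels of unescaping.
var fromString = new RegExp("\\\\\\.\"/");

// Written as a literal: one level, but the slash must be escaped.
var fromLiteral = /\\\."\//;

console.log(fromString.test(target));     // true
console.log(fromLiteral.test(target));    // true
console.log(fromString.test("a.b"));      // false
```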
