Google Voice turned loose and finding duplicated files

Google Voice goes public and three tools for finding duplicated files

If you've been waiting for an account on Google Voice (what was once called "GrandCentral"), you, my friend, need wait no longer … Google has just announced that it is now open to the public.

Should you, for some unfathomable reason, not be familiar with Google Voice, let me give you the skinny: GV provides you with a telephone number, for free no less, which Google claims is yours forever, and then throws in a heap of cool features.

You can configure your GV number to route incoming calls to one or more of your phones and, if you like, do so only at certain times and/or for certain callers or groups of callers. You can also customize greetings for individuals and groups, screen voicemail, record calls, conference multiple incoming callers, and make free outgoing national calls. Google also offers low-cost international calling.

You really should check it out. I now give everyone my GV number so that if I, for whatever reason, change phone numbers or add a new number to my collection of phone services (which currently includes a Vonage account, two cell phones and a Gizmo account), I can still be contacted. Way cool.

So, this week I've been spring cleaning. My house? Nah. My storage? As the French would have it, "mais oui!"

I currently have about 7TB here at the Gearhead Secret Underground Bunker and I know I have what could be described as significantly unoptimized storage.

My big problem is not fragmentation. To deal with that, I rely on Raxco PerfectDisk on my servers and workstations, and my SAN is based on the Synology RackStation and DiskStation products I reviewed at the end of last year.

Synology devices use the ext3 filesystem, which, according to Chapter 5 ("Using Disks and Other Storage Media") of the Linux System Administrators' Guide, keeps "fragmentation at a minimum by keeping all blocks in a file close together, even if they can't be stored in consecutive sectors. Some filesystems, like ext3, effectively allocate the free block that is nearest to other blocks in a file. Therefore it is not necessary to worry about fragmentation in a Linux system."

This assertion appears to be disputed for very heavy use cases involving ext3, but as far as we here at GSUB are concerned, the Synology devices aren't showing any performance degradation or other symptoms of fragmentation so far.

So, what could need cleaning out? The answer: Duplicate files.

On my storage systems I have literally hundreds of projects, along with various resources that simply exist for our entertainment (such as iTunes), and I know that many of them contain multiple copies of the same files.

So, in an attempt to streamline my stuff, I decided to try out three de-duplication products from Mindgems: Visual Similarity Duplicate Image Finder ($24.95), Fast Duplicate File Finder (Free), and Audio Dedupe ($29.95).

All of these products share a similar user interface design: a panel that lists duplicated files, a panel for selecting the folders to be scanned, and a configuration and preview panel.

Once you've specified which folders to examine and the program has run to completion, you'll have a list of files organized into groups of suspected duplicates, each file with an estimate of its similarity to the rest of its group.
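The Mindgems programs don't say how they build their groups, so what follows is only a minimal sketch of the most common approach to exact duplicates: bucket files by size, then hash only the files whose sizes collide. The function names (`file_digest`, `find_duplicate_groups`) are hypothetical, and this sketch finds byte-for-byte copies only; it does not produce the kind of similarity estimate the Mindgems tools report.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(roots):
    """Return sorted lists of paths whose contents are byte-for-byte identical."""
    # Pass 1: bucket regular files by size; files of different sizes can't match.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks so the same data isn't counted twice.
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Checking sizes first means most files are never read at all, which matters when the scan covers terabytes.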

I found the image tool remarkably good at detecting similar images, as was the file similarity search, while the audio search turned up some odd matches whose relationship I couldn't quite figure out (underlining the need, in all three utilities, to pay close attention to exactly what is detected).
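How the image tool scores near-matches isn't stated, so here is a stand-in technique rather than Mindgems' method: a "difference hash" (dHash), which records whether each pixel is brighter than its right-hand neighbour and then compares the resulting bit patterns. This sketch assumes the images have already been converted to grayscale and shrunk to the same small size (commonly 9 x 8 pixels, giving 64 bits); `dhash_bits` and `similarity` are hypothetical names.

```python
def dhash_bits(gray):
    """Difference hash: one bit per horizontally adjacent pixel pair.

    `gray` is a list of rows of grayscale values (0-255), assumed to be
    already downscaled to a small, fixed size.
    """
    return [
        1 if row[c] > row[c + 1] else 0
        for row in gray
        for c in range(len(row) - 1)
    ]


def similarity(gray_a, gray_b):
    """Fraction of matching dHash bits: 1.0 means the same gradient pattern."""
    a, b = dhash_bits(gray_a), dhash_bits(gray_b)
    if len(a) != len(b):
        raise ValueError("images must be downscaled to the same size")
    hamming = sum(x != y for x, y in zip(a, b))
    return 1.0 - hamming / len(a)
```

With Pillow, `Image.open(path).convert("L").resize((9, 8))` would produce a suitable thumbnail whose pixel rows you'd pass in as lists. Because dHash only looks at gradients, a uniformly brightened copy scores 1.0, which is the sort of match you'd want flagged for review rather than deleted automatically.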

The Mindgems programs will, by default, check off all of the files they determine to be duplicates except one, which is retained as the original. The problem with this is that Windows applications have a ridiculous number of duplicated and shared resources, so if you allow many of these automatically selected copies to be deleted, you are going to be very, very sorry.
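A more cautious default than "select everything but one" is to leave alone any group that includes a copy inside an application directory. The sketch below assumes Windows paths; the `PROTECTED_ROOTS` list, the shortest-path rule for choosing which copy to keep, and the function names are all hypothetical choices for illustration, not options the Mindgems tools offer.

```python
from pathlib import PureWindowsPath

# Hypothetical list of locations holding shared application resources.
PROTECTED_ROOTS = [
    PureWindowsPath(r"C:\Windows"),
    PureWindowsPath(r"C:\Program Files"),
    PureWindowsPath(r"C:\Program Files (x86)"),
]


def is_protected(path):
    """True if `path` is, or lies under, a protected root (case-insensitive)."""
    p = PureWindowsPath(path)
    return any(p == root or root in p.parents for root in PROTECTED_ROOTS)


def select_for_deletion(group):
    """Keep one copy per duplicate group and return the rest as candidates.

    If any copy in the group is protected, return nothing: an application
    may depend on every one of those copies staying exactly where it is.
    """
    if any(is_protected(p) for p in group):
        return []
    # Keep the copy with the shortest path (ties broken alphabetically).
    keeper = min(group, key=lambda p: (len(p), p))
    return [p for p in group if p != keeper]
```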

What is really needed is a way to replace the duplicates with links to a single shared file, but that also leads to all sorts of hideous complexities (many Windows applications are, as you are no doubt well aware, architecturally messy) that can break software. This implies that the Mindgems tools are best used on archival storage that is isolated from applications and their default resources, or else that you should be very careful in selecting which areas of PC storage you include in scans.
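For archival storage, the link idea can be sketched with hard links: after re-confirming that two files are byte-for-byte identical, create a hard link to the original under a temporary name and atomically rename it over the duplicate. This is a minimal sketch assuming both files live on the same volume (hard links can't span volumes) and that nothing already uses the hypothetical `.dedupe-tmp` suffix. Afterwards both names point at the same data, so editing either "copy" changes both, which is exactly the kind of surprise that can break applications.

```python
import filecmp
import os


def replace_with_hardlink(original, duplicate):
    """Replace `duplicate` with a hard link to `original` (same volume only)."""
    if os.path.samefile(original, duplicate):
        return  # already the same file, e.g. an existing hard link
    # Re-check byte-for-byte equality right before touching anything.
    if not filecmp.cmp(original, duplicate, shallow=False):
        raise ValueError(f"{duplicate} no longer matches {original}")
    tmp = duplicate + ".dedupe-tmp"  # hypothetical temporary name
    os.link(original, tmp)      # new link beside the duplicate (fails if tmp exists)
    os.replace(tmp, duplicate)  # atomically swap it in for the duplicate
```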

In my tests, all of these products ran as fast as the available resources (processor cycles, memory, and storage performance) allowed; they were robust, fairly easy to use, and actually did a good job. One thing to watch out for with music files is duplicate tracks released on different albums: even though these duplicates waste space, you'll want to keep them so that you don't "break" playlists.

I'll give all three products a rating of 4 out of 5.

Gibbs isn't duplicated in Ventura, Calif. Copy your thoughts to

This story, "Google Voice turned loose and finding duplicated files," was originally published by Network World.
