Identifying and Working with File Types

If you're new to Linux, it won't take long before you begin seeing files with extensions that may seem foreign. A file's extension is the last part of a file's name, after the final dot (in the file sneakers.txt, "txt" is that file's extension).

Here's a brief listing of extensions and their meanings:

Compressed/Archived Files

File Formats

System Files

Programming and Scripting Files

But file extensions are not always used, or used consistently. So what happens when a file doesn't have an extension, or the file doesn't seem to be what the extension says it's supposed to be?

That's when the file command can come in handy.

In Chapter 15, we created a file called saturday -- without an extension. Using the file command, we can tell what the file is by typing:

file saturday
	    

and we'll see it's a text file. Any file that's designated a text file should be readable using cat, more, or less.
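As a minimal sketch of checking a file that has no extension, the following creates a throwaway text file and asks file what it is. The scratch directory and the file's contents are assumptions made for illustration, and the exact wording that file prints varies from version to version.

```shell
# Work in a scratch directory so nothing in your login directory is touched.
workdir=$(mktemp -d)

# Create a small text file with no extension (the contents are made up).
printf 'buy some sneakers\nthen go to the coffee shop\n' > "$workdir/saturday"

# Ask file what kind of file it is; skip quietly if file is not installed.
if command -v file >/dev/null 2>&1; then
    file "$workdir/saturday"
fi
```

Because file inspects the contents rather than the name, the missing extension makes no difference to its answer.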

Tip: Read the man page
 

To learn more about file, read the file man page by typing man file.

And speaking of reading files…

There are plenty of ways to read files in Linux. In Chapter 15, for example, we covered the pagers more and less -- they're called pagers because you can "page" through documents one screen at a time. We also learned how we can not only view but manipulate files with cat.

But there are even more options when it comes time to take a look at README files, man pages or documents you've created.

You have a number of tools to help you read text files, among them the text editors pico, emacs, and vim; the pagers more and less; and the viewers head, tail, cat, and grep.

Let's take a look at some of the features in these tools.

The less Command

In Chapter 15, we introduced the pager less. Less is the pager that's used to display man pages.

Let's view the man page for less to see less in action.

man less
	      

To move forward a screen, press Space; to move back a screen, press B; and to quit, press Q.

There are other powerful features to less, as well, including the ability to scroll horizontally and specify the number of lines to scroll.

The more Command

Odd as it may seem, more offers less than less (actually, less was inspired by more).

Let's take a look at the man page for more, but this time, we'll open the page using more -- by piping man's output to more.

man more | more
	      

It may not look too different at first, but more has fewer enhancements than less. Probably the most striking difference is that you can't go backwards through piped input like this -- although moving forward by pressing Space and quitting by pressing Q are the same.

The head Command

You can use the head command if you just want to look at the beginning of a file. The command is:

head <filename>
	      

Head can be useful, but because it shows only the beginning of a file, it won't tell you how long the file actually is. By default, head displays the first 10 lines of a file; to see a different number, such as the first 20 lines, use the -n option:

head -n 20 <filename>
            

Read head's man page (man head) for more information. You'll probably find that less or more are more helpful, because you can page through the file if you find that the information you're looking for is further into the file than you originally thought.
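Here is a short sketch of head in action, assuming a 30-line sample file built with seq as a stand-in for a longer document. It also uses wc -l to answer the question head can't: how long the file really is.

```shell
# Build a 30-line sample file (a hypothetical stand-in for a real document).
sample=$(mktemp)
seq 1 30 > "$sample"

# Default behavior: the first 10 lines.
head "$sample"

# Explicit count with -n: only the first 3 lines.
first_three=$(head -n 3 "$sample")
echo "$first_three"

# head does not say how long the file is, but wc -l does.
total_lines=$(wc -l < "$sample")
echo "Total lines: $total_lines"

rm -f "$sample"
```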

The tail Command

The reverse of head is, of course, tail. With tail, you can review the end of a file: by default, the last 10 lines.
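As a quick sketch, assuming the same kind of made-up 30-line sample file, tail accepts a line count through the -n option just as head does:

```shell
# Build a 30-line sample file (a hypothetical stand-in for a real document).
sample=$(mktemp)
seq 1 30 > "$sample"

# Default behavior: the last 10 lines (21 through 30 here).
tail "$sample"

# Explicit count with -n: only the last 3 lines.
last_three=$(tail -n 3 "$sample")
echo "$last_three"

rm -f "$sample"
```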

The cat Command

The command cat, short for concatenate, dumps the entire contents of a file to the screen. Using cat can be handy if the file is fairly short, such as sneakers.txt, which we created earlier. But if a file is fairly long, it will scroll past you on the screen, since cat displays the whole file at once.
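The name makes more sense once you see cat join files together. In this rough sketch, the file names part1.txt, part2.txt, and combined.txt and their contents are invented for illustration:

```shell
# Create two short files in a scratch directory (names and text are made up).
workdir=$(mktemp -d)
printf 'buy some sneakers\n' > "$workdir/part1.txt"
printf 'then go to the coffee shop\n' > "$workdir/part2.txt"

# Show one short file on the screen.
cat "$workdir/part1.txt"

# Concatenate both files, in order, into a third file.
cat "$workdir/part1.txt" "$workdir/part2.txt" > "$workdir/combined.txt"
combined=$(cat "$workdir/combined.txt")
echo "$combined"

rm -rf "$workdir"
```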

The grep Command

The grep command is pretty nifty for finding specific character strings in a file. Let's say we want to find every reference we made to "coffee" in the file sneakers.txt, which we created in our login directory. We could type:

grep coffee sneakers.txt
          

and we would see every line in which the word "coffee" could be found.

Tip: Remember case
 

Unless otherwise specified, grep searches are case sensitive. That means that searching for Coffee is different from searching for coffee. Among grep's options is -i, which makes the search case insensitive. Read the grep man page for more about this command.
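To see the difference, here is a small sketch that runs the same search with and without -i. The three-line sample file is an assumption made for illustration, not the actual contents of sneakers.txt.

```shell
# A small sample file with "coffee" in two different capitalizations.
sample=$(mktemp)
printf 'Coffee first\nbuy some sneakers\nmake some coffee\n' > "$sample"

# Case-sensitive (the default): only the lowercase line matches.
sensitive=$(grep coffee "$sample")
echo "$sensitive"

# Case-insensitive with -i: both "Coffee" and "coffee" match.
insensitive=$(grep -i coffee "$sample")
echo "$insensitive"

rm -f "$sample"
```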

I/O Redirection and Pipes

And don't forget about using pipes and output redirection when you want to store and/or print information to read at a later time.

You can, for example, use grep to search for particular contents of a file, then have those results either saved as a file or sent to a printer.

To print the information about references to "coffee" in sneakers.txt, for example, just type:

grep coffee sneakers.txt | lpr
	      

This command behaves similarly to the command ls -al /etc | more. You may have used that command in Chapter 15 to list the contents of the /etc directory then send the results through the more command for viewing on the screen.
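A pipe can feed any command that reads standard input, not just lpr or more. As a sketch you can try without a printer, the following sends grep's output to wc -l and to sort instead; the sample file contents are made up for illustration.

```shell
# A small sample file (an assumed stand-in for sneakers.txt).
sample=$(mktemp)
printf 'then go to the coffee shop\nbuy some sneakers\nmake some coffee\n' > "$sample"

# Pipe grep's output to wc -l to count the matching lines.
match_count=$(grep coffee "$sample" | wc -l)
echo "Lines mentioning coffee: $match_count"

# Pipe grep's output to sort to see the matches in alphabetical order.
sorted=$(grep coffee "$sample" | sort)
echo "$sorted"

rm -f "$sample"
```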

Tip: It's safest to use >>
 

Remember the distinction between > and >>: using > overwrites a file, while >> appends the information to the end of it. Unless you're certain you want to replace a file's contents, it's safer to use >>, because you won't lose potentially valuable information (though you may have to edit the file if you didn't want to append information to it).
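The difference is easy to see on a scratch file. In this minimal sketch, the file and the text written to it are invented for illustration:

```shell
# A scratch file to experiment on.
notes=$(mktemp)

echo "first line" > "$notes"      # > creates or overwrites the file
echo "second line" >> "$notes"    # >> appends; the first line survives
after_append=$(cat "$notes")

echo "replacement" > "$notes"     # > again: both earlier lines are gone
after_overwrite=$(cat "$notes")

echo "$after_append"
echo "$after_overwrite"

rm -f "$notes"
```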

Wildcards and Regular Expressions

What if you forget the name of the file you're looking for? You can't say to your computer, "Find a file called 'sneak' or 'sneak-something'."

Well, yes you can, in a way. Using wildcards or regular expressions, you can perform actions on a file or files without knowing the complete filename. Just fill out what you know, then substitute the remainder with a wildcard.

Tip: Find out more about wildcards and regular expressions
 

To read more about wildcards and regular expressions, take a look at the bash man page (man bash). Remember that you can save the man page to a text file by typing man bash | col -b > bash.txt. Then, you can open and read the file with less or pico (pico bash.txt). If you want to print the file, be prepared: It's quite long.

We know the file's called "sneak-something.txt," so just type:

ls sneak*.txt
	      

and there's the name of the file:

sneakers.txt
	      

You'll probably use the asterisk (*) most frequently when you're searching. The asterisk will search out everything that matches the pattern you're looking for. So even by typing:

ls *.txt
	      

or:

ls sn*
	      

you'd find sneakers.txt -- except that as time goes on, there will be more text files, and they'll all show up because they match the pattern you're searching for.

It helps, then, to narrow your search as much as possible.

One way to narrow that search might be to use the question mark symbol (?). Like the asterisk, using ? can help locate a file matching a search pattern.

In this case, though, ? is useful for matching a single character -- so if you were searching for sneaker?.txt, you'd get sneakers.txt as a result -- and/or sneakerz.txt, if there were such a filename.
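To compare the two wildcards side by side, here is a rough sketch that works in a scratch directory. The extra file names (sneak.txt, sneakerz.txt, notes.txt) are assumptions added so that the patterns have something to tell apart.

```shell
# Create a few empty files in a scratch directory (names are illustrative).
workdir=$(mktemp -d)
touch "$workdir/sneak.txt" "$workdir/sneakers.txt" "$workdir/sneakerz.txt" "$workdir/notes.txt"

# Subshells keep the cd from changing your current directory.
# * matches any run of characters, including none at all.
(cd "$workdir" && ls sneak*.txt)
star_count=$(cd "$workdir" && set -- sneak*.txt && echo "$#")

# ? matches exactly one character, so sneak.txt is left out.
(cd "$workdir" && ls sneaker?.txt)
question_count=$(cd "$workdir" && set -- sneaker?.txt && echo "$#")

echo "sneak*.txt matched $star_count files; sneaker?.txt matched $question_count"

rm -rf "$workdir"
```

Note that the shell expands the pattern before ls ever runs, so ls simply receives the list of matching names.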

When an asterisk, for example, just happens to be part of a filename, such as might be the case if the file sneakers.txt was called sneak*.txt, that's when regular expressions can come in handy.

Regular expressions are more complex than the straightforward asterisk or question mark.

By placing a backslash (\) in front of the asterisk, you tell the shell to treat it as a literal character rather than as a wildcard, so you match only a file with an asterisk in its name.

If the file is called sneak*.txt, then, to list it, type:

ls sneak\*.txt
	      

Here is a brief list of wildcards and regular expressions:

  • * -- Matches any string of characters, including none

  • ? -- Matches one character in a string (such as sneaker?.txt)

  • \* -- Matches the * character

  • \? -- Matches the ? character

  • \) -- Matches the ) character
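The effect of the backslash can be sketched in a scratch directory holding both an ordinary file and one with a literal asterisk in its name. Both file names here are created purely for illustration.

```shell
# Create an ordinary file and one whose name contains a literal asterisk.
# The quotes keep the shell from expanding the * while creating it.
workdir=$(mktemp -d)
touch "$workdir/sneakers.txt" "$workdir/sneak*.txt"

# Unescaped: * acts as a wildcard, so both files match.
unescaped_count=$(cd "$workdir" && set -- sneak*.txt && echo "$#")

# Escaped: \* is a literal asterisk, so only the oddly named file matches.
escaped_match=$(cd "$workdir" && ls sneak\*.txt)

echo "unescaped matches: $unescaped_count"
echo "escaped match: $escaped_match"

rm -rf "$workdir"
```

Putting the name in single quotes ('sneak*.txt') has the same effect as the backslash.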

You can also use wildcards for more than searching: they can come in handy when you want to move and rename files. And regular expressions can help you rename files with characters like * and ? in them.

For more on that, read on.