16.81 25.87 A-f: um so do you think that public or private school

To see more of the file, hit the "return" key.

It seems that we have one utterance per line. The numbers at the beginning of each line are the start and end times of the utterance. Participants are coded as A and B, and we have gender information (-f or -m appended to the participant code).

Ok, we have a pretty decent understanding of how the file is structured.

A small trick: when you hit the "tab" key, Unix will fill in the rest of the command for you. Now we just type "01" and hit "tab" again, and we are seeing the content of the file "fe_03_06501.txt".

Another small trick: using the up and down arrows, we can repeat commands we just typed (there is a history of the commands we typed).

Can we roughly quantify the amount of data that's available here? E.g., how many files do we have? How many words?

We can count stuff easily with the "wc" command. What happens when we type the "wc" command? Its manual page describes it like this:

wc - word, line, character, and byte count
The wc utility displays the number of lines, words, and bytes contained in each input file, or standard input (if no file is specified) to the standard output. A line is defined as a string of characters delimited by a <newline> character. Characters beyond the final <newline> character will not be included in the line count.

So what are the different numbers we see?

173 is the number of lines in the file "fe_03_06501.txt".
2,729 is the number of words in the file "fe_03_06501.txt".
14,055 is the number of bytes in the file "fe_03_06501.txt".

Now I want to know how many words I have in the whole directory "065", and not only in this particular file. We will use *, which serves as a wildcard character. So *.txt means "anything that ends with .txt".

Can we know how many files we have in this "065" directory? We can list the directory content and count the lines of the output. We use the vertical bar "|" to connect two commands together so that the output from one command becomes the input of the next command. Two (or more) commands connected in this way form what's called a pipe. So we have 100 files in the "065" directory.

Now count the number of files in the "058" directory.

There are other useful Unix commands, and you might want to take a look at Unix for Poets. But for our purpose, we just want to be able to navigate the directory tree and have quick peeks into files. Sometimes a file might be really big, and you don't want to open it in Word :-)

The "grep" command ("Select-String" in PowerShell) might be quite useful to you, but we will not dwell on it. Regular expressions are quite handy but tricky too (see xkcd comic). Just for the fun of it, I will quickly illustrate what you can do with regular expressions.

"grep" stands for "Globally search a Regular Expression and Print". It searches plain-text data for lines matching a regular expression.

We can first try regular expressions in an interactive mode. We will type "grep" followed by the regular expression we want to look for, then "enter". We can then type some text (one line) and hit enter. If the regular expression matches the text, the line will be echoed back to us. It can be useful to use some colors, to see what exactly gets matched. To do this, we add the option "--color=auto".

Using "grep", we can get an idea of how much people laugh in these conversations. Laughter is coded explicitly in the transcriptions.
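If you want to try the counting commands from this walkthrough without the real transcripts at hand, here is a minimal sketch on a made-up directory. Everything in it is an assumption for illustration only: the temporary directory, the "demo_*.txt" file names, their content, and in particular the "[laughter]" tag, which stands in for whatever marker your transcripts actually use.

```shell
# Build a tiny stand-in corpus (hypothetical files, not the real transcripts).
demo_dir=$(mktemp -d)
printf '0.10 1.20 A-f: hello there\n1.30 2.40 B-m: [laughter] yeah\n' > "$demo_dir/demo_01.txt"
printf '0.50 1.00 A-m: [laughter]\n' > "$demo_dir/demo_02.txt"
printf '0.00 0.90 B-f: okay\n' > "$demo_dir/demo_03.txt"

# wc on one file prints its lines, words and bytes, then the file name.
wc "$demo_dir/demo_01.txt"

# *.txt expands to every file ending in .txt; with several files,
# wc -w prints one word count per file plus a "total" line.
wc -w "$demo_dir"/*.txt

# Pipe: the listing from ls becomes the input of wc -l, which counts lines.
# tr removes the padding spaces that some versions of wc print.
n_files=$(ls "$demo_dir" | wc -l | tr -d ' ')

# grep prints the matching lines; piping them to wc -l counts them.
# The brackets are escaped because [ ] is special in a regular expression.
n_laugh=$(grep '\[laughter\]' "$demo_dir"/*.txt | wc -l | tr -d ' ')

echo "files: $n_files, lines with laughter: $n_laugh"

# Clean up the temporary directory.
rm -r "$demo_dir"
```

On this toy data the last line reports 3 files and 2 lines containing the tag. Note that this counts matching lines, not individual occurrences, so a line with two tags would still count once.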