How can I replace a line ending with fixed text when the next line begins with a defined set of characters?
I have several big files with some measurements.
They look like this:
N 12344;PE 9.9999999;...
#S 0 0 31 44 75 130 165 196...
#S_+ "2 5 2 3 3 1 1 2 3 1 2 2...
N 12345;PE 9.9999999;...
#S 0 0 34 57 84 133 152...
#S_+ "1 0 1 1 2 3 0 0 0...
N 12346;PE 9.9999999;...
#S 0 0 31 44 73 140 169...
#S_+ "3 3 4 0 0 2 1 2 4...
N 25104;PE 9.9999999;...
#S 0 0 36 52 102 108 145...
#S_+ "1 1 0 1 0 0 3 0 1...
N 25105;PE 9.9999999;...
#S 0 0 32 58 88 130 143...
Sample is here:
http://pasted.co/d9806b7c4
The file is much bigger but I replaced part of the data with "..." to make it shorter.
I need to somehow replace the line ends before "#S"; in effect, simply merge each "N" line with the two lines that follow it into one line (or with the following three, so that I also get rid of the blank lines). The expected output looks like this:
N 12344;PE 9.9999999; #S 0 0 31 44 75 130 165 196 #S_+ "2 5 2 3 3 1 1 2 3 1 2 2...
N 12345;PE 9.9999999; #S 0 0 34 57 84 133 152 #S_+ "1 0 1 1 2 3 0 0 0...
N 12346;PE 9.9999999; #S 0 0 31 44 73 140 169 #S_+ "3 3 4 0 0 2 1 2 4...
N 25104;PE 9.9999999; #S 0 0 36 52 102 108 145 #S_+ "1 1 0 1 0 0 3 0 1...
N 25105;PE 9.9999999; #S 0 0 32 58 88 130 143...
Is it possible to achieve this using some command-line utility in Linux?
My knowledge is quite limited in this area so I would appreciate any help.
thanks
linux command-line regex
edited Nov 22 at 11:39
Toto
asked Nov 21 at 13:52
Juhele
thanks to grawity for helping me with the code :-)
– Juhele
Nov 21 at 14:09
@Pimp Juice IT: OK, I updated the question.
– Juhele
Nov 21 at 14:14
Hi @Juhele, can you specify the output format more precisely? Do you need to cut the first line after e.g. PE 9.9999999; and do you need to cut the second line after the 7th (8th) number or, as you write, merge the "N" line with the following two? What about the " present only in the output? I made some edits to your post, please check them. Could it be an incomplete file? BTW, for the simplest case you already have more than one good answer.
– Hastur
Nov 22 at 9:09
The " is in both the input and the output, as in #S_+ "2 5. It is not an important character for me (I am going to remove it in the next processing step); it is simply in the input data. "Do you need to cut the second after the 7th (8th) number": no, I just shortened the example, as the data has several hundred "columns".
– Juhele
Dec 4 at 9:05
From personal experience with acquisition devices: do redundancy checks (the more the better). I suppose you already know this, but with enough lines it does not matter how small the likelihood of corrupted data is; if it is not zero, it will occur. Many times it is enough to run
awk '{print NF}' YourFILE | sort -n | uniq -c
to know whether you have the same number of columns in (almost, because of headers) each line, or a consistent structure (3 lines of data, 1 blank...).
– Hastur
Dec 4 at 11:09
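The consistency check suggested in the comment above could be sketched as follows. This is only an illustration under assumptions: the sample data is made up, the name check_structure is hypothetical, and the strict cycle of N line, #S line, #S_+ line and blank line is taken from the description in the question.

```shell
# Hypothetical sample: two complete groups (3 data lines + 1 blank line each).
sample=$(mktemp)
printf 'N 1;PE 9.9;\n#S 0 0 31\n#S_+ "2 5 2\n\nN 2;PE 9.9;\n#S 0 0 34\n#S_+ "1 0 1\n\n' > "$sample"

# How many lines have each field count (the check from the comment).
field_counts=$(awk '{print NF}' "$sample" | sort -n | uniq -c)
echo "$field_counts"

# Report every line that breaks the assumed N / #S / #S_+ / blank cycle.
# Exit status is 0 when the whole file follows the cycle, 1 otherwise.
check_structure() {
    awk '
        NR % 4 == 1 && index($0, "N ") != 1    { print "line " NR ": expected an N line";   bad = 1 }
        NR % 4 == 2 && index($0, "#S ") != 1   { print "line " NR ": expected a #S line";   bad = 1 }
        NR % 4 == 3 && index($0, "#S_+ ") != 1 { print "line " NR ": expected a #S_+ line"; bad = 1 }
        NR % 4 == 0 && NF > 0                  { print "line " NR ": expected a blank line"; bad = 1 }
        END { exit bad + 0 }
    ' "$1"
}

if check_structure "$sample"; then structure_ok=yes; else structure_ok=no; fi
rm -f "$sample"
```

Running such a check before merging makes a truncated or shifted file visible early instead of producing silently misaligned output lines.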
6 Answers
This is a portable solution with POSIX sed, implementing the following rules:
- empty lines shall be deleted;
- any line starting with #S shall be merged with the previous non-empty line, with a single space character between them, unless there is no previous non-empty line.
The code:
<data sed '/^$/ d; :start; N; s/\n$//; t start; s/\n#S/ #S/; t start; P; D'
The same with comments (still working code):
<data sed '
/^$/ d        # If empty line read, delete it and start a new cycle.
:start        # A label.
N             # Read additional line, there are now two lines in the pattern space.
s/\n$//       # If the second line is empty, replace the newline with nothing.
t start       # If the above replacement occurred, go to start (to add another line).
              # Otherwise
s/\n#S/ #S/   # if the second line starts with #S, replace the newline with space.
t start       # If the above replacement occurred, go to start (to add another line).
              # Otherwise
              # (i.e. when a non-empty line not starting with #S occurred)
P             # print the pattern space up to the first newline and...
D             # delete the initial segment of the pattern space
              # through the first newline (i.e. everything just printed),
              # and start the next cycle with the resultant pattern space
              # and without reading any new input
              # (in our case the new input will be explicitly read by N then).
'
Note the solution uses sed pattern space to accumulate many input lines. This remark applies:
The pattern and hold spaces shall each be able to hold at least 8192 bytes.
Just before the P command the pattern space holds one (relatively long) line meant to be printed and a single (relatively short) input line, plus a newline in between. Obviously it depends on your data, whether or not such structure exceeds 8192 bytes at some point. If it does, some sed implementations may fail.
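If the 8192-byte minimum is a concern, the same two rules can be sketched in awk, which writes each piece as soon as it is read instead of accumulating the merged line. This is not part of the answer above, just an illustration: the sample data is made up, and the names merge_groups and started are hypothetical.

```shell
# Hypothetical sample: one complete group, a blank line, one incomplete group.
sample=$(mktemp)
printf 'N 1;PE 9.9;\n#S 0 0 31\n#S_+ "2 5 2\n\nN 2;PE 9.9;\n#S 0 0 34\n' > "$sample"

merge_groups() {
    awk '
        /^$/             { next }                    # rule 1: drop empty lines
        /^#S/ && started { printf " %s", $0; next }  # rule 2: append to the previous line
        started          { printf "\n" }             # close the previous output line
                         { printf "%s", $0; started = 1 }  # begin a new output line
        END              { if (started) printf "\n" }      # terminate the last line
    ' "$1"
}

merged=$(merge_groups "$sample")
printf '%s\n' "$merged"
rm -f "$sample"
```

Because nothing is buffered beyond the current input line, the length of the merged output line does not matter here.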
Using Perl:
perl -0 -ape 's/\R(?=\RN|#)/ /g' file.txt
N 12344;PE 9.9999999;... #S 0 0 31 44 75 130 165 196... #S_+ "2 5 2 3 3 1 1 2 3 1 2 2...
N 12345;PE 9.9999999;... #S 0 0 34 57 84 133 152... #S_+ "1 0 1 1 2 3 0 0 0...
N 12346;PE 9.9999999;... #S 0 0 31 44 73 140 169... #S_+ "3 3 4 0 0 2 1 2 4...
N 25104;PE 9.9999999;... #S 0 0 36 52 102 108 145... #S_+ "1 1 0 1 0 0 3 0 1...
N 25105;PE 9.9999999;... #S 0 0 32 58 88 130 143...
Regex explanation:
s/      : substitute
\R      : any kind of line break (i.e. \r, \n, \r\n)
(?=     : positive lookahead, a zero-length assertion that makes sure what follows is
  \RN   : a line break followed by the letter N
  |     : OR
  #     : a # character
)       : end of lookahead
/ /g    : replace with a space, globally
awk (gawk [1])
As usual, besides sed you can use awk (and in many different ways...):
awk 'ORS=" "; NR % 4 == 0 && ORS="\n" ' data
where
- ORS=" " sets the output record separator, by default a newline, to a space (you can change it)
- NR % 4 == 0 && ORS="\n" sets it back to the newline \n on every 4th line
- if nothing else is specified, awk prints the full line
- data is your data file.
If you want, you can use a regex as in sed (in a similar way).
A format check version with awk
Even if not requested, you may want to handle a truncated file by dropping the corrupted output line and producing an error message and an error exit code.
awk '{a=$0; getline b; getline c;
if ( getline > 0 ) {print a, b, c, $0 }
else { print "Ohi " > "/dev/stderr" ; exit 65; } }' data
where
- a=$0; puts the full line in the variable a
- getline b; reads a line and puts it in the variable b
- getline c; obscure unfathomable command :-)
- if ( getline > 0 ) if it is able to read a line...
- {print a, b, c, $0} prints the 4 lines
- else prints an error on the stderr device (screen or other); you can customize it here...
- exit 65 returns an exit code different from 0, i.e. an error
Bonus: why 65?
Searching for a good value for your exit code [2] you may find that it is suggested to look in /usr/include/sysexits.h, among some C standards...
#define EX_DATAERR 65 /* data format error */
65 is the most appropriate for a data format error...
Honestly, as an answer I preferred 42, but any value different from zero (and not reserved [2]) could be good, and 65 is the specific one...
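A calling script can act on that exit code. The sketch below runs the format-check command on a made-up truncated sample whose second group lacks its blank fourth line; the variable names are hypothetical and the message text is only an example.

```shell
# Hypothetical truncated sample: the second group has only three lines.
truncated=$(mktemp)
printf 'N 1;PE 9.9;\n#S 0 0 31\n#S_+ "2 5 2\n\nN 2;PE 9.9;\n#S 0 0 34\n#S_+ "1 0 1\n' > "$truncated"

# Run the check; remember the exit status instead of aborting the script.
status=0
merged=$(awk '{a=$0; getline b; getline c;
    if ( getline > 0 ) {print a, b, c, $0 }
    else { print "Ohi " > "/dev/stderr" ; exit 65; } }' "$truncated" 2>/dev/null) || status=$?

if [ "$status" -eq 65 ]; then
    echo "data format error: input is not a whole number of 4-line groups" >&2
fi
rm -f "$truncated"
```

Only the complete first group reaches the output; the incomplete second group is dropped and reported through the exit status.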
One disadvantage though: the last pack of lines may consist of three of them (i.e. no empty line at the very end); or may not. If three, then the last character of your output is a space, not a newline. POSIX defines "line" as "a sequence of zero or more non-<newline> characters plus a terminating <newline> character". This will probably backfire if the output is parsed further.
– Kamil Maciorowski
Nov 22 at 9:36
Nice thought, but the OP, among some other points not completely specified, states that there are sets of 4 lines, the last of them blank. With a truncated file, the next (unknown) processing step may be compromised anyway. An unrequested format check is out of the scope of this thread, and IMHO good practice is to generate an error. If you require robustness it is better to opt for a script (awk, sed, perl are scripting languages), which also allows you to reproduce the data processing. Then you have to decide how to deal with errors, but that is another question... :-) I just try to keep it simple.
– Hastur
Nov 22 at 10:45
@KamilMaciorowski ... nonetheless I added another version with error check...
– Hastur
Nov 22 at 11:29
You can do it with any text editor that supports regular expressions, like Notepad++.
A new line is just a non-printable character, or two characters: on Windows usually CarriageReturn and LineFeed, and on Unix-based systems usually LineFeed only.
To see them you need to turn on the display of non-printable characters (usually a paragraph icon).
See here: https://imgur.com/cqiTvrp
Now what you need to do is use the regular expression replace (CTRL + H) to replace CRLF#S with #S.
The symbol for CR is \r and for LF is \n, so you are going to end up replacing \r\n#S or \n#S with #S.
https://imgur.com/GoeVn70
Or you can replace it with a SPACE if you need.
The question is tagged "Linux"....
– xenoid
Nov 21 at 14:16
I think regular expressions in Geany are the same. I used Notepad++ as an example because I am currently on Windows.
– KaRolthas
Nov 21 at 14:20
The question also asks for a command-line utility...
– xenoid
Nov 21 at 14:22
Nice, works. I need to somehow process at least few files now so even Notepad++ helps when I am working on my other machine with Windows. thanks
– Juhele
Nov 21 at 14:44
With sed:
sed -z -e 's/\n#S/ #S/g' -e 's/\nN /N /g' data
In slow-mo:
-z makes sed consider the file as a single line (so the line ends are plain characters)
's/\n#S/ #S/g' replaces all LFs occurring just before a #S by a space
-e 's/\nN /N /g' removes all LFs before N (i.e., the blank lines)
answered Nov 21 at 14:32
xenoid
Hmm, it looks like I will have to adjust it a little bit, as "sed -z -e 's/\n#S/ #S/g' -e 's/\nN /N /g' test1.txt >> test1_mod.txt" still does not do the same as replacing "\r\n#S" with "#S" in Notepad++, so there is still a CR left on the previous line before "#S".
– Juhele
Dec 4 at 10:59
OK, "sed -z -e 's/\r\n#S/ #S/g' -e 's/\nN /N /g' test1.txt >> test1_mod.txt" finally turns the multiple lines into one. Only a blank line remains, as the "N..." line ends with CR LF (this is fine) and then there is a blank line with CR.
– Juhele
Dec 4 at 11:09
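One possible way around the leftover CR described in these comments is to drop every CR before the merge, so that the original two substitutions apply unchanged. This is a sketch with made-up sample data, assuming GNU sed (for -z) and assuming the output may use plain LF line endings.

```shell
# Hypothetical CRLF sample: one group, a blank line, the start of a second group.
crlf=$(mktemp)
printf 'N 1;PE 9.9;\r\n#S 0 0 31\r\n#S_+ "2 5 2\r\n\r\nN 2;PE 9.9;\r\n#S 0 0 34\r\n' > "$crlf"

# Strip all CR characters first, then apply the two substitutions from the answer.
merged=$(tr -d '\r' < "$crlf" | sed -z -e 's/\n#S/ #S/g' -e 's/\nN /N /g')
printf '%s\n' "$merged"
rm -f "$crlf"
```

With the CRs gone, the blank separator line is a bare LF again, so the second expression removes it as intended.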
With paste (this requires always having groups of 4 lines):
paste -s -d '   \n' data
In slo-mo:
paste -s concatenates the lines from the file
-d specifies characters to be inserted as delimiters. When there are several characters, they are used in a round-robin fashion, so with 3 spaces and a LF:
- the first space is used on the first splice (N to #S),
- the second space is used on the second splice (#S to #S),
- the third space is used on the third splice (#S to blank line),
- the last delimiter, a LF, is used on the fourth splice (blank line to N),
- and the cycle repeats for the next 4 lines.
answered Nov 21 at 14:42
xenoid
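The round-robin behaviour can be tried on a small made-up sample. Because the third space joins the #S_+ line to the empty fourth line, each output line ends with a space; the second command below trims it with sed. That trimming step is an addition for illustration, not part of the answer.

```shell
# Hypothetical sample: two complete groups of 4 lines (the 4th one is blank).
sample=$(mktemp)
printf 'N 1;PE 9.9;\n#S 0 0 31\n#S_+ "2 5 2\n\nN 2;PE 9.9;\n#S 0 0 34\n#S_+ "1 0 1\n\n' > "$sample"

# Delimiter list: three spaces followed by a newline, used cyclically.
merged=$(paste -s -d '   \n' "$sample")

# Same merge, with the trailing space of each line removed afterwards.
trimmed=$(paste -s -d '   \n' "$sample" | sed 's/ $//')
printf '%s\n' "$trimmed"
rm -f "$sample"
```

If a group is missing a line, the delimiter cycle shifts and every following output line is misaligned, which is why the answer requires strict groups of 4 lines.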
3,5533718
3,5533718
add a comment |
add a comment |
up vote
4
down vote
This is a portable solution with POSIX sed
, implementing the following rules:
- empty lines shall be deleted;
- any line starting with
#S
shall be merged with the previous non-empty line, with a single space character between them, unless there is no previous non-empty line.
The code:
<data sed '/^$/ d; :start; N; s/n$//; t start; s/n#S/ #S/; t start; P; D'
The same with comments (still working code):
<data sed '
/^$/ d # If empty line read, delete it and start a new cycle.
:start # A label.
N # Read additional line, there are now two lines in the pattern space.
s/n$// # If the second line is empty, replace the newline with nothing.
t start # If the above replacement occurred, go to start (to add another line).
# Otherwise
s/n#S/ #S/ # if the second line starts with #S, replace the newline with space.
t start # If the above replacement occurred, go to start (to add another line).
# Otherwise
# (i.e when non-empty line not starting with #S occurred)
P # print the pattern space up to the first newline and...
D # delete the initial segment of the pattern space
# through the first newline (i.e. everything just printed),
# and start the next cycle with the resultant pattern space
# and without reading any new input
# (in our case the new input will be explicitly read by N then).
'
Note the solution uses sed
pattern space to accumulate many input lines. This remark applies:
The pattern and hold spaces shall each be able to hold at least 8192 bytes.
Just before the P
command the pattern space holds one (relatively long) line meant to be printed and a single (relatively short) input line, plus a newline in between. Obviously it depends on your data, whether or not such structure exceeds 8192 bytes at some point. If it does, some sed
implementations may fail.
add a comment |
up vote
4
down vote
This is a portable solution with POSIX sed
, implementing the following rules:
- empty lines shall be deleted;
- any line starting with
#S
shall be merged with the previous non-empty line, with a single space character between them, unless there is no previous non-empty line.
The code:
<data sed '/^$/ d; :start; N; s/n$//; t start; s/n#S/ #S/; t start; P; D'
The same with comments (still working code):
<data sed '
/^$/ d # If empty line read, delete it and start a new cycle.
:start # A label.
N # Read additional line, there are now two lines in the pattern space.
s/n$// # If the second line is empty, replace the newline with nothing.
t start # If the above replacement occurred, go to start (to add another line).
# Otherwise
s/n#S/ #S/ # if the second line starts with #S, replace the newline with space.
t start # If the above replacement occurred, go to start (to add another line).
# Otherwise
# (i.e when non-empty line not starting with #S occurred)
P # print the pattern space up to the first newline and...
D # delete the initial segment of the pattern space
# through the first newline (i.e. everything just printed),
# and start the next cycle with the resultant pattern space
# and without reading any new input
# (in our case the new input will be explicitly read by N then).
'
Note the solution uses sed
pattern space to accumulate many input lines. This remark applies:
The pattern and hold spaces shall each be able to hold at least 8192 bytes.
Just before the P
command the pattern space holds one (relatively long) line meant to be printed and a single (relatively short) input line, plus a newline in between. Obviously it depends on your data, whether or not such structure exceeds 8192 bytes at some point. If it does, some sed
implementations may fail.
edited Nov 22 at 6:35
answered Nov 21 at 18:17
Kamil Maciorowski
23.1k155072
up vote
3
down vote
Using Perl:
perl -0 -ape 's/\R(?=\RN|#)/ /g' file.txt
N 12344;PE 9.9999999;... #S 0 0 31 44 75 130 165 196... #S_+ "2 5 2 3 3 1 1 2 3 1 2 2...
N 12345;PE 9.9999999;... #S 0 0 34 57 84 133 152... #S_+ "1 0 1 1 2 3 0 0 0...
N 12346;PE 9.9999999;... #S 0 0 31 44 73 140 169... #S_+ "3 3 4 0 0 2 1 2 4...
N 25104;PE 9.9999999;... #S 0 0 36 52 102 108 145... #S_+ "1 1 0 1 0 0 3 0 1...
N 25105;PE 9.9999999;... #S 0 0 32 58 88 130 143...
Regex explanation:
s/ : substitute
\R : any kind of line break (i.e. \r, \n, \r\n)
(?= : positive lookahead, a zero-length assertion that makes sure what follows is
\RN : a line break followed by the letter N
| : OR
# : a # character
) : end lookahead
/ /g : replace with a space, global
edited Nov 21 at 16:31
answered Nov 21 at 15:58
Toto
3,37191126
up vote
3
down vote
awk (gawk [1])
As usual, besides sed you can use awk (and in many different ways...)
awk 'ORS=" "; NR % 4 == 0 && ORS="\n" ' data
where
- ORS=" " sets the output record separator, by default a newline, to a space (you can change it)
- NR % 4 == 0 && ORS="\n" sets it back to a newline on every 4th line
- if nothing else is specified, awk prints the full line
- data is your data file.
If you want, you can use a regex as in sed (in a similar way).
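A regex-based variant could look roughly like the sketch below. It is an illustration rather than the answer's own code: it keys on the #S prefix instead of counting lines, skips empty lines, and always ends the output with a newline. The function name merge_by_prefix is made up.

```shell
# merge_by_prefix [FILE...]
# Hypothetical helper: joins every "#S" line onto the previous non-empty
# line with a single space; reads stdin when no file is given.
merge_by_prefix() {
    awk '
        /^$/ { next }                               # drop empty lines
        /^#S/ && started { printf " %s", $0; next } # continue the current record
        {                                           # any other line starts a record
            if (started) printf "\n"
            printf "%s", $0
            started = 1
        }
        END { if (started) printf "\n" }            # terminate the last record
    ' "$@"
}
```

Because it does not depend on groups of exactly four lines, this form also copes with a missing blank line at the end of the file.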
A format check version with awk
Even if not requested, you may want to handle a truncated file by dropping the corrupted output line and producing an error message and an error exit code.
awk '{a=$0; getline b; getline c;
if ( getline > 0 ) {print a, b, c, $0 }
else { print "Ohi " > "/dev/stderr" ; exit 65; } }' data
where
- a=$0; puts the full line in the variable a
- getline b; reads a line and puts it in the variable b
- getline c; obscure unfathomable command :-)
- if ( getline > 0 ) if it is able to read a line...
- {print a, b, c, $0} prints the 4 lines
- else prints an error on the stderr device (screen or other); you can customize it here...
- exit 65 returns an exit code different from 0, i.e. an error
Bonus: why 65?
Searching for a good value for your exit code [2] you may find that it is suggested to look in /usr/include/sysexits.h, among some C standards...
#define EX_DATAERR 65 /* data format error */
65 is the most appropriate for a data format error...
Honestly, as an answer I preferred 42, but any value different from zero (and not reserved [2]) could be good, and 65 is the specific one...
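A caller might act on that exit code along the following lines. This is a sketch under assumptions: the function name merge_or_report, the ".tmp" suffix and the error text are all invented for illustration, and the awk program is a lightly reformatted copy of the one above.

```shell
# merge_or_report INPUT OUTPUT
# Hypothetical wrapper: writes the merged lines to OUTPUT only when every
# record has all four lines; otherwise leaves no OUTPUT and returns 65.
merge_or_report() {
    awk '{
        a = $0; getline b; getline c
        if ((getline) > 0) {
            print a, b, c, $0
        } else {
            print "truncated record" > "/dev/stderr"
            exit 65                      # EX_DATAERR from sysexits.h
        }
    }' "$1" > "$2.tmp"
    status=$?
    if [ "$status" -eq 0 ]; then
        mv "$2.tmp" "$2"                 # keep the result only on success
    else
        rm -f "$2.tmp"                   # discard partial output
    fi
    return "$status"
}
```

Writing to a temporary name first means a later processing step never sees a half-written result from a truncated input.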
edited Nov 22 at 11:49
answered Nov 21 at 22:28
Hastur
13k53266
One disadvantage though: the last pack of lines may consist of three of them (i.e. no empty line at the very end); or may not. If three, then the last character of your output is a space, not a newline. POSIX defines "line" as "a sequence of zero or more non-<newline> characters plus a terminating <newline> character". This will probably backfire if the output is parsed further.
– Kamil Maciorowski
Nov 22 at 9:36
Nice thought, but the OP, among some other points not completely specified, states that there are sets of 4 lines, the last of them blank. With a truncated file the next, unknown processing may be compromised anyway. An unrequested format check is out of this thread's scope, and IMHO a good practice is to generate an error. If you require robustness it is better to opt for a script (awk, sed, perl are scripting languages) that also allows you to reproduce the data processing. Then you have to decide how to deal with errors, but that is another question... :-)
I just try to keep it simple.
– Hastur
Nov 22 at 10:45
@KamilMaciorowski ... nonetheless I added another version with error check...
– Hastur
Nov 22 at 11:29
up vote
0
down vote
You can do it with any text editor that supports regular expressions, like Notepad++.
The new line is just a non-printable character, or two characters: in Windows usually Carriage Return and Line Feed (CRLF), and in Unix-based systems usually Line Feed only.
To see them you need to turn on showing non-printable characters (usually a paragraph icon).
See here: https://imgur.com/cqiTvrp
Now what you need to do is use the regular expression replace dialog (CTRL + H) to replace CRLF#S with #S.
The symbol for CR is \r and for LF is \n, so you end up replacing \r\n#S or \n#S with #S.
https://imgur.com/GoeVn70
Or you can replace the line break with a space if you need.
answered Nov 21 at 14:15
KaRolthas
1
The question is tagged "Linux"...
– xenoid
Nov 21 at 14:16
I think regular expressions in Geany are the same. I used Notepad++ as an example because I am currently on Windows.
– KaRolthas
Nov 21 at 14:20
The question also asks for a command-line utility...
– xenoid
Nov 21 at 14:22
Nice, works. I need to somehow process at least a few files now, so even Notepad++ helps when I am working on my other machine with Windows. Thanks.
– Juhele
Nov 21 at 14:44
thanks to grawity for helping me with the code :-)
– Juhele
Nov 21 at 14:09
@Pimp Juice IT: OK, I updated the question.
– Juhele
Nov 21 at 14:14
Hi @Juhele, can you specify the output format better: do you need to cut the first line after e.g. PE 9.9999999;, do you need to cut the second after the 7th (8th) number or, as you write, merge the "N" line with the following two ones? What about the " present only in the output?! I made some edits to your post, please check them. Can the file be incomplete? BTW, for the simplest case you already have more than one good answer.
– Hastur
Nov 22 at 9:09
The " is in both input and output, like #S_+ "2 5. It is not an important character for me (I am going to remove it in the next processing step), it is just in the input data. "do you need to cut the second after the 7th (8th) number": no, I just shortened the example as the data has several hundred "columns".
– Juhele
Dec 4 at 9:05
From personal experience with acquisition devices: do redundancy checks (the more the better). I suppose you already know, but with enough lines it does not matter how small the likelihood of corrupted data is; if it is not zero, it will occur. Many times it is enough to run
awk '{print NF}' YourFILE | sort -n -u | uniq -c
to know that you have the same number of columns in (almost, because of headers) each line, or a consistent structure (3 lines of data, 1 blank...)
– Hastur
Dec 4 at 11:09
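A variant of that check which also reports how many lines have each field count might look like the sketch below. The function name field_count_histogram is hypothetical; the only change from the command in the comment is dropping -u from sort, since uniq -c can only count repeated lines if sort has not already removed them.

```shell
# field_count_histogram [FILE...]
# Hypothetical helper: prints "count fields" pairs, one per distinct
# number of whitespace-separated fields; reads stdin when no file is given.
field_count_histogram() {
    awk '{ print NF }' "$@" | sort -n | uniq -c
}
```

For a consistent file this gives only a handful of rows, for example one for the blank lines (0 fields) and one for each of the three data line types, so an unexpected extra row points at a damaged line.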