How to get sum of values in column based on variables in other column separately? [duplicate]
This question already has an answer here:
How to calculate the sum of the data that have the same ID in the first column?
4 answers
I have tabular data like this:
abc 1 1 1
bcd 2 2 4
bcd 12 23 3
cde 3 5 5
cde 3 4 5
cde 14 2 25
I want to sum the values in each column, grouped by the value in the first column. The desired result is:
abc 1 1 1
bcd 14 25 7
cde 20 11 35
I used an awk command like this:
awk -F"\t" '{for(n=2;n<=NF; ++n)a[$1]+=$n}END{for(i in a ) print i, a[i] }' tablefilepath
and I got this result:
abc 3
bcd 46
cde 66
I think the end of my code is wrong, but I don't know how to fix it.
Could someone point me in the right direction?
shell-script text-processing awk numeric-data
marked as duplicate by Jeff Schaller, elbarna, RalfFriedl, roaima, Isaac Nov 27 at 23:37
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
edited Nov 27 at 11:40
terdon♦
128k31249423
asked Nov 27 at 6:05
awkprob
232
3 Answers
You were fairly close.
You see what you were doing wrong, don't you?
You were keeping one total for each column 1 value,
when you should have been keeping three.
This is similar to Inian's answer,
but trivially extendable to handle any number of columns:
awk -F"\t" '{for(n=2;n<=NF; ++n) a[$1][n]+=$n}   # a[key][column]: needs gawk 4.0 or later
       END {for(i in a) {
                        printf "%s", i
                        for (n=2; n<=4; ++n) printf "\t%s", a[i][n]
                        printf "\n"
                }
           }' tablefilepath
Rather than keep three arrays, like Inian's answer,
it keeps a two-dimensional array.
answered Nov 27 at 6:27
Scott
6,83152750
Why limit it at all? Why not
awk '{for(n=2;n<=NF; ++n){a[$1][n]+=$n}}END{for(i in a){ printf "%s ", i; for(k in a[i]){printf "%s ",a[i][k]} print ""}}'
? I mean, why use for (n=2; n<=4; ++n) in the END{} block instead of just iterating over the array, so you don't need to keep track of its size?
– terdon♦
Nov 27 at 11:46
@terdon: Thanks for dropping by. "for (variable in array) [which] shall iterate, assigning each index of array to variable in an unspecified order." (POSIX) Inian and I failed to mention that our answers produce output in random order (specifically, I get bcd, abc, cde); but that can be fixed by piping awk into sort. Your enhancement would output the columns in random order, with no way to fix it by post-processing.
– Scott
Nov 27 at 19:10
Ah, yes indeed. Fair point.
– terdon♦
Nov 27 at 19:23
@Scott: Thanks for the direction! Now I can see what was wrong with my code. But when I try your code, I get the syntax error "awk: line 1: syntax error at or near [". Is this caused by a variable expansion problem or an escaping problem? I can't find the reason.
– awkprob
Nov 28 at 1:55
@Scott: I'm running Ubuntu 14.04 and, after installing GNU awk, 'awk --version' says GNU Awk 3.1.8. But I still get the syntax error.
– awkprob
Nov 28 at 4:54
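For readers who hit the same syntax error: the a[i][n] form (arrays of arrays) is a gawk extension that arrived in gawk 4.0, so gawk 3.1.8 and mawk would be expected to reject it. Below is a minimal sketch, not Scott's code, that emulates the two-dimensional array with the POSIX sum[key, column] subscript form. It assumes tab-separated input; the names sum_by_key, seen, order, nkeys, maxnf and sum are made up for illustration.

```shell
# sum_by_key FILE: sum every column from 2 onward, grouped by column 1.
# Uses only POSIX awk features (no arrays of arrays).
sum_by_key() {
    awk -F'\t' '
        {
            if (!($1 in seen)) {              # first time this key appears
                seen[$1] = 1
                order[++nkeys] = $1           # remember keys in input order
            }
            if (NF > maxnf) maxnf = NF        # widest row sets the column count
            for (n = 2; n <= NF; ++n)
                sum[$1, n] += $n              # one total per (key, column) pair
        }
        END {
            for (j = 1; j <= nkeys; ++j) {    # rows in order of first appearance
                k = order[j]
                printf "%s", k
                for (n = 2; n <= maxnf; ++n)  # columns in numeric order
                    printf "\t%s", sum[k, n] + 0
                printf "\n"
            }
        }' "$1"
}
```

Called as sum_by_key tablefilepath, this should print one line per key. Because the keys are replayed from the order array and the columns from a counting loop, neither rows nor columns depend on the unspecified order of for (i in a).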
So long as your file is tab-delimited, datamash is a good fit for this.
$ datamash groupby 1 sum 2 sum 3 sum 4 < tablefilepath
abc 1 1 1
bcd 14 25 7
cde 20 11 35
Datamash can also work with non-tab delimiters if you specify -t <delimiter>, but tabs seem closest to the example input you have provided.
By default, datamash won't work if your input is delimited by arbitrary whitespace (i.e. possibly multiple spaces intended to "look like" a tab). Still, even if that is what your data looks like, it is easily munged into the form expected by datamash (GNU sed syntax):
sed -i 's/ \+/\t/g' tablefilepath
answered Nov 27 at 6:12
cryptarch
4546
At least in recent versions, there's a -W (--whitespace) option that should allow arbitrary whitespace delimiters
– steeldriver
Nov 27 at 6:17
@steeldriver Thanks!
– cryptarch
Nov 27 at 6:57
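If the installed datamash lacks -W, the runs of spaces can also be collapsed in a pipeline, without editing the file in place. This is a minimal sketch using POSIX tr, assuming that no field contains a space and no field is empty (tr -s also squeezes adjacent tabs); the function name spaces_to_tabs is illustrative.

```shell
# spaces_to_tabs: read stdin, turn each run of spaces into a single tab.
spaces_to_tabs() {
    tr -s ' ' '\t'    # map space to tab, then squeeze repeated tabs
}

printf 'abc   1  1 1\nbcd 2    2 4\n' | spaces_to_tabs
```

The result could then be fed to the command from the answer, for example spaces_to_tabs < tablefilepath | datamash groupby 1 sum 2 sum 3 sum 4.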
Using awk, summing up columns 2-4 based on column 1:
awk -v FS="\t" -v OFS="\t" '{ col1[$1]+=$2; col2[$1]+=$3; col3[$1]+=$4; next } END { for ( i in col1) print i, col1[i], col2[i], col3[i] }' file
answered Nov 27 at 6:17
Inian
3,860824
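As the comments on Scott's answer point out, the row order produced by for (i in col1) is unspecified. Here is a minimal sketch of the sort fix mentioned there, assuming tab-separated input on stdin; the function name sum3 and the array names c2, c3 and c4 are made up for illustration.

```shell
# sum3: per-key totals of columns 2-4, read from stdin, rows sorted by key.
sum3() {
    awk -F'\t' -v OFS='\t' '
        { c2[$1] += $2; c3[$1] += $3; c4[$1] += $4 }
        END { for (k in c2) print k, c2[k], c3[k], c4[k] }   # unspecified order
    ' | sort -t "$(printf '\t')" -k1,1                      # order rows by key
}

printf 'cde\t3\t5\t5\nabc\t1\t1\t1\nbcd\t2\t2\t4\nbcd\t12\t23\t3\n' | sum3
```

Sorting outside awk keeps the awk part portable. The columns need no such fix here because print names them explicitly.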