A pythonic and uFunc-y way to turn pandas column into “increasing” index?
up vote
7
down vote
favorite
Let's say I have a pandas df like so:
Index A B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5
What's a good fast way to add a column like so:
Index A B Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2
I.e. adding an increasing index for each unique value?
I know I could use df.unique()
, then use a dict and enumerate
to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby
with some special function?
python pandas
add a comment |
up vote
7
down vote
favorite
Let's say I have a pandas df like so:
Index A B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5
What's a good fast way to add a column like so:
Index A B Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2
I.e. adding an increasing index for each unique value?
I know I could use df.unique()
, then use a dict and enumerate
to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby
with some special function?
python pandas
add a comment |
up vote
7
down vote
favorite
up vote
7
down vote
favorite
Let's say I have a pandas df like so:
Index A B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5
What's a good fast way to add a column like so:
Index A B Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2
I.e. adding an increasing index for each unique value?
I know I could use df.unique()
, then use a dict and enumerate
to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby
with some special function?
python pandas
Let's say I have a pandas df like so:
Index A B
0 foo 3
1 foo 2
2 foo 5
3 bar 3
4 bar 4
5 baz 5
What's a good fast way to add a column like so:
Index A B Aidx
0 foo 3 0
1 foo 2 0
2 foo 5 0
3 bar 3 1
4 bar 4 1
5 baz 5 2
I.e. adding an increasing index for each unique value?
I know I could use df.unique()
, then use a dict and enumerate
to create a lookup, and then apply that dictionary lookup to create the column. But I feel like there should be faster way, possibly involving groupby
with some special function?
python pandas
python pandas
asked 2 hours ago
Lagerbaer
2,6231124
2,6231124
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
up vote
7
down vote
One way is to use ngroup
. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False
:
df['Aidx'] = df.groupby('A',sort=False).ngroup()
>>> df
Index A B Aidx
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
add a comment |
up vote
6
down vote
No need groupby
using
Method 1factorize
pd.factorize(df.A)[0]
array([0, 0, 0, 1, 1, 2], dtype=int64)
#df['Aidx']=pd.factorize(df.A)[0]
Method 2 sklearn
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.A)
LabelEncoder()
le.transform(df.A)
array([2, 2, 2, 0, 0, 1])
Method 3 cat.codes
df.A.astype('category').cat.codes
Method 4 map
+ unique
l=df.A.unique()
df.A.map(dict(zip(l,range(len(l)))))
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int64
Method 5 np.unique
x,y=np.unique(df.A.values,return_inverse=True)
y
array([2, 2, 2, 0, 0, 1], dtype=int64)
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
add a comment |
up vote
4
down vote
One more method of doing so could be.
df['C'] = i.ne(df.A.shift()).cumsum()-1
df
When we print df
value it will be as follows.
Index A B C
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
Explanation of solution: Let's break above solution into parts for understanding purposes.
1st step: Compare df's A column by shifting its value down to itself as follows.
i.ne(df.A.shift())
Output we will get is:
0 True
1 False
2 False
3 True
4 False
5 True
2nd step: Use of cumsum()
function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum()
function and its value will be increased.
i.ne(df.A.shift()).cumsum()-1
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int32
3rd step: Save command's value into df['C']
which will create a new column named C
in df
.
1
Nice method ve++ for you
– W-B
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53772121%2fa-pythonic-and-ufunc-y-way-to-turn-pandas-column-into-increasing-index%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
7
down vote
One way is to use ngroup
. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False
:
df['Aidx'] = df.groupby('A',sort=False).ngroup()
>>> df
Index A B Aidx
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
add a comment |
up vote
7
down vote
One way is to use ngroup
. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False
:
df['Aidx'] = df.groupby('A',sort=False).ngroup()
>>> df
Index A B Aidx
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
add a comment |
up vote
7
down vote
up vote
7
down vote
One way is to use ngroup
. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False
:
df['Aidx'] = df.groupby('A',sort=False).ngroup()
>>> df
Index A B Aidx
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
One way is to use ngroup
. Just remember you have to make sure your groupby isn't resorting the groups to get your desired output, so set sort=False
:
df['Aidx'] = df.groupby('A',sort=False).ngroup()
>>> df
Index A B Aidx
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
edited 1 hour ago
answered 2 hours ago
sacul
29.7k41640
29.7k41640
add a comment |
add a comment |
up vote
6
down vote
No need groupby
using
Method 1factorize
pd.factorize(df.A)[0]
array([0, 0, 0, 1, 1, 2], dtype=int64)
#df['Aidx']=pd.factorize(df.A)[0]
Method 2 sklearn
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.A)
LabelEncoder()
le.transform(df.A)
array([2, 2, 2, 0, 0, 1])
Method 3 cat.codes
df.A.astype('category').cat.codes
Method 4 map
+ unique
l=df.A.unique()
df.A.map(dict(zip(l,range(len(l)))))
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int64
Method 5 np.unique
x,y=np.unique(df.A.values,return_inverse=True)
y
array([2, 2, 2, 0, 0, 1], dtype=int64)
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
add a comment |
up vote
6
down vote
No need groupby
using
Method 1factorize
pd.factorize(df.A)[0]
array([0, 0, 0, 1, 1, 2], dtype=int64)
#df['Aidx']=pd.factorize(df.A)[0]
Method 2 sklearn
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.A)
LabelEncoder()
le.transform(df.A)
array([2, 2, 2, 0, 0, 1])
Method 3 cat.codes
df.A.astype('category').cat.codes
Method 4 map
+ unique
l=df.A.unique()
df.A.map(dict(zip(l,range(len(l)))))
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int64
Method 5 np.unique
x,y=np.unique(df.A.values,return_inverse=True)
y
array([2, 2, 2, 0, 0, 1], dtype=int64)
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
add a comment |
up vote
6
down vote
up vote
6
down vote
No need groupby
using
Method 1factorize
pd.factorize(df.A)[0]
array([0, 0, 0, 1, 1, 2], dtype=int64)
#df['Aidx']=pd.factorize(df.A)[0]
Method 2 sklearn
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.A)
LabelEncoder()
le.transform(df.A)
array([2, 2, 2, 0, 0, 1])
Method 3 cat.codes
df.A.astype('category').cat.codes
Method 4 map
+ unique
l=df.A.unique()
df.A.map(dict(zip(l,range(len(l)))))
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int64
Method 5 np.unique
x,y=np.unique(df.A.values,return_inverse=True)
y
array([2, 2, 2, 0, 0, 1], dtype=int64)
No need groupby
using
Method 1factorize
pd.factorize(df.A)[0]
array([0, 0, 0, 1, 1, 2], dtype=int64)
#df['Aidx']=pd.factorize(df.A)[0]
Method 2 sklearn
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(df.A)
LabelEncoder()
le.transform(df.A)
array([2, 2, 2, 0, 0, 1])
Method 3 cat.codes
df.A.astype('category').cat.codes
Method 4 map
+ unique
l=df.A.unique()
df.A.map(dict(zip(l,range(len(l)))))
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int64
Method 5 np.unique
x,y=np.unique(df.A.values,return_inverse=True)
y
array([2, 2, 2, 0, 0, 1], dtype=int64)
edited 35 mins ago
answered 1 hour ago
W-B
97.8k73162
97.8k73162
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
add a comment |
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
Good solutions, they should be really fast as well. May be add time comparison as OP is looking for the most efficient solution
– Vaishali
45 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
@Vaishali sorry it is hard for me to get the timing , would you mind add that for me , thanks a lot
– W-B
33 mins ago
add a comment |
up vote
4
down vote
One more method of doing so could be.
df['C'] = i.ne(df.A.shift()).cumsum()-1
df
When we print df
value it will be as follows.
Index A B C
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
Explanation of solution: Let's break above solution into parts for understanding purposes.
1st step: Compare df's A column by shifting its value down to itself as follows.
i.ne(df.A.shift())
Output we will get is:
0 True
1 False
2 False
3 True
4 False
5 True
2nd step: Use of cumsum()
function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum()
function and its value will be increased.
i.ne(df.A.shift()).cumsum()-1
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int32
3rd step: Save command's value into df['C']
which will create a new column named C
in df
.
1
Nice method ve++ for you
– W-B
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
add a comment |
up vote
4
down vote
One more method of doing so could be.
df['C'] = i.ne(df.A.shift()).cumsum()-1
df
When we print df
value it will be as follows.
Index A B C
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
Explanation of solution: Let's break above solution into parts for understanding purposes.
1st step: Compare df's A column by shifting its value down to itself as follows.
i.ne(df.A.shift())
Output we will get is:
0 True
1 False
2 False
3 True
4 False
5 True
2nd step: Use of cumsum()
function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum()
function and its value will be increased.
i.ne(df.A.shift()).cumsum()-1
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int32
3rd step: Save command's value into df['C']
which will create a new column named C
in df
.
1
Nice method ve++ for you
– W-B
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
add a comment |
up vote
4
down vote
up vote
4
down vote
One more method of doing so could be.
df['C'] = i.ne(df.A.shift()).cumsum()-1
df
When we print df
value it will be as follows.
Index A B C
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
Explanation of solution: Let's break above solution into parts for understanding purposes.
1st step: Compare df's A column by shifting its value down to itself as follows.
i.ne(df.A.shift())
Output we will get is:
0 True
1 False
2 False
3 True
4 False
5 True
2nd step: Use of cumsum()
function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum()
function and its value will be increased.
i.ne(df.A.shift()).cumsum()-1
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int32
3rd step: Save command's value into df['C']
which will create a new column named C
in df
.
One more method of doing so could be.
df['C'] = i.ne(df.A.shift()).cumsum()-1
df
When we print df
value it will be as follows.
Index A B C
0 0 foo 3 0
1 1 foo 2 0
2 2 foo 5 0
3 3 bar 3 1
4 4 bar 4 1
5 5 baz 5 2
Explanation of solution: Let's break above solution into parts for understanding purposes.
1st step: Compare df's A column by shifting its value down to itself as follows.
i.ne(df.A.shift())
Output we will get is:
0 True
1 False
2 False
3 True
4 False
5 True
2nd step: Use of cumsum()
function, so wherever TRUE value is coming(which will come when a match of A column and its shift is NOT found) it will call cumsum()
function and its value will be increased.
i.ne(df.A.shift()).cumsum()-1
0 0
1 0
2 0
3 1
4 1
5 2
Name: A, dtype: int32
3rd step: Save command's value into df['C']
which will create a new column named C
in df
.
edited 1 hour ago
answered 1 hour ago
RavinderSingh13
25k41437
25k41437
1
Nice method ve++ for you
– W-B
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
add a comment |
1
Nice method ve++ for you
– W-B
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
1
1
Nice method ve++ for you
– W-B
1 hour ago
Nice method ve++ for you
– W-B
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
@W-B, thank you for encouragement sir, ++ve for your unique style already :)
– RavinderSingh13
1 hour ago
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53772121%2fa-pythonic-and-ufunc-y-way-to-turn-pandas-column-into-increasing-index%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown