How to calculate the average of the three most recent non-NaN values using Python

I have a DataFrame df that looks like the following. For each row, I want to calculate the average of the last 3 non-NaN columns. If a row has fewer than three non-missing columns, the average should be missing.

    name  day1  day2  day3  day4  day5  day6  day7
    A     1     1     nan   2     3     0     3
    B     nan   nan   nan   nan   nan   nan   3
    C     1     1     0     1     1     1     1
    D     1     1     0     1     nan   1     4

The expected output should look like the following:

    name  day1  day2  day3  day4  day5  day6  day7  expected
    A     1     1     nan   2     3     0     3     2         <- 1/3 * (day5 + day6 + day7)
    B     nan   nan   nan   nan   nan   nan   3     nan       <- fewer than 3 non-missing values
    C     1     1     0     1     1     1     1     1         <- 1/3 * (day5 + day6 + day7)
    D     1     1     0     1     nan   1     4     2         <- 1/3 * (day4 + day6 + day7)

I know how to calculate the average of the last three columns and how to count the non-missing observations in them:

    df.iloc[:, 5:8].mean(axis=1)   # average of the last three columns
    df.iloc[:, 5:8].count(axis=1)  # number of non-NaN values in the last three columns

If there are fewer than 3 non-missing observations, I know how to set the average to missing using df.iloc[:, 1:8].count(axis=1) < 3.

But I am struggling to find a way to calculate the average of the last three non-missing columns. Can anyone teach me how to solve this please?
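For anyone who wants to reproduce the example, the frame in the question could be built roughly as follows. This is only a sketch: it assumes name is an ordinary column (not the index), and day_cols is a helper name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'name' is assumed to be a regular column
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "day1": [1, np.nan, 1, 1],
    "day2": [1, np.nan, 1, 1],
    "day3": [np.nan, np.nan, 0, 0],
    "day4": [2, np.nan, 1, 1],
    "day5": [3, np.nan, 1, np.nan],
    "day6": [0, np.nan, 1, 1],
    "day7": [3, 3, 1, 4],
})

# Hypothetical helper name: the numeric day columns, excluding 'name'
day_cols = df.columns[1:]

# Number of non-missing day values per row
valid_counts = df[day_cols].count(axis=1)
print(valid_counts)
```

Only row B has fewer than three non-missing values, so it is the only row whose average should come out as NaN.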

edited Dec 26 '18 at 20:56

asked Dec 26 '18 at 20:44

fly36

python pandas numpy

3 Answers

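A vectorized approach using justify, a custom helper function that is not part of NumPy or pandas and must be defined separately. It pushes the non-NaN values of each row to the right, so a row with fewer than N valid values still has NaN among its last N entries and its mean comes out as NaN.

    import numpy as np

    N = 3  # number of trailing non-NaN entries to average

    # Assumes df holds only the numeric day columns (e.g. name is the index)
    avg = np.mean(justify(df.values, invalid_val=np.nan, axis=1, side='right')[:, -N:], axis=1)
    df['expected'] = avg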
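The snippet above calls justify, which the answer itself does not define. As a rough illustration of the idea only, and not necessarily the function the answer has in mind, a right-justifying helper for 2D float arrays might look like the sketch below. The name justify_right and the hard-coded sample array are assumptions made for this example.

```python
import numpy as np

def justify_right(a):
    """Push the non-NaN values of each row of a 2D float array to the right."""
    mask = ~np.isnan(a)
    # Sorting booleans puts False before True, so each row's True slots end up on the right
    target = np.sort(mask, axis=1)
    out = np.full(a.shape, np.nan)
    # Boolean indexing walks both arrays in row-major order, and every row has the
    # same number of True entries in both masks, so values stay in their own row
    # and keep their original left-to-right order
    out[target] = a[mask]
    return out

N = 3  # number of trailing non-NaN entries to average

# Day values for rows A-D from the question
values = np.array([
    [1, 1, np.nan, 2, 3, 0, 3],
    [np.nan] * 6 + [3],
    [1, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, np.nan, 1, 4],
], dtype=float)

# Rows with fewer than N valid values keep NaN in the last N slots,
# and a plain mean propagates that NaN
avg = justify_right(values)[:, -N:].mean(axis=1)
print(avg)
```

For the sample rows this gives 2, NaN, 1 and 2, matching the expected column in the question.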

edited Dec 26 '18 at 21:02

answered Dec 26 '18 at 20:58

Divakar

I know I will see justify here :-)

– Wen-Ben
Dec 26 '18 at 21:00

You can use pd.DataFrame.apply with a custom function. This is only partially vectorised.

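    import numpy as np

    def mean_calculator(row):
        # Boolean mask of the non-missing entries in this row
        non_nulls = row.notnull()
        # Fewer than three valid values: the average is missing
        if non_nulls.sum() < 3:
            return np.nan
        # Average the last three valid values
        return row[non_nulls].values[-3:].mean()

    # Skip the first column ('name') and apply the function row by row
    df['expected'] = df.iloc[:, 1:].apply(mean_calculator, axis=1)

    print(df)

      name  day1  day2  day3  day4  day5  day6  day7  expected
    0    A   1.0   1.0   NaN   2.0   3.0   0.0     3       2.0
    1    B   NaN   NaN   NaN   NaN   NaN   NaN     3       NaN
    2    C   1.0   1.0   0.0   1.0   1.0   1.0     1       1.0
    3    D   1.0   1.0   0.0   1.0   NaN   1.0     4       2.0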

answered Dec 26 '18 at 20:56

jpp

You can start by calculating the expected column by applying the following function to each row. This assumes name is the index, as in the output below:

    expected = df.apply(lambda x: x[~x.isnull()][-3:].mean(), axis=1)

Then insert these values only in the rows that have at least 3 valid values:

    # True for rows with fewer than 3 non-missing values
    m = df.notnull().sum(axis=1) < 3
    df.loc[~m, 'expected'] = expected.mask(m)

          day1  day2  day3  day4  day5  day6  day7  expected
    name
    A      1.0   1.0   NaN   2.0   3.0   0.0     3       2.0
    B      NaN   NaN   NaN   NaN   NaN   NaN     3       NaN
    C      1.0   1.0   0.0   1.0   1.0   1.0     1       1.0
    D      1.0   1.0   0.0   1.0   NaN   1.0     4       2.0
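The sample data never exercises the boundary case, so a quick check may be worthwhile: a row with exactly three non-missing values should get an average, while a row with only two should not. The sketch below is one way to test that. The rows E and F and the names days, too_few, and result are made up for this example and do not come from the question.

```python
import numpy as np
import pandas as pd

N = 3  # number of trailing non-missing values to average

# Hypothetical rows: E has exactly three valid values, F has only two
days = pd.DataFrame(
    [[np.nan, 2, np.nan, 4, np.nan, np.nan, 6],
     [np.nan, np.nan, np.nan, np.nan, np.nan, 5, 7]],
    index=["E", "F"],
    columns=[f"day{i}" for i in range(1, 8)],
)

# Rows with fewer than N non-missing values
too_few = days.count(axis=1) < N

# Mean of the last N non-missing values in each row
last_n_mean = days.apply(lambda row: row.dropna().tail(N).mean(), axis=1)

# Blank out the rows that do not have enough values
result = last_n_mean.mask(too_few)
print(result)
```

Counting non-missing values directly with count(axis=1) keeps the threshold tied to N rather than to the total number of columns, so the check does not need adjusting when columns are added.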

edited Dec 26 '18 at 21:29

answered Dec 26 '18 at 20:58

yatu
