Making a function that returns a length (count) based on a condition

Last posted: 2020-07-09


Question

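I have 2 DataFrames. The first contains stock tickers, their period max/min price range, and some other columns.

The second DataFrame has dates as its index and is grouped by ticker, with various metrics such as Open, Close, High, Low, etc. From this DataFrame, I want to count the number of days on which a given stock's close price was above the period minimum price.

Here is where I'm stuck: I now want to find, for example, how many days AMZN traded below the period's highest price.

Using the values in the first DataFrame, I want to count days in the second DataFrame: the number of days on which the close price was below the period max (or, conversely, above the period min) price.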
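For a single ticker, the core idea can be sketched in a few lines. This is a minimal illustration, assuming the AMZN close prices are already in a Series; the values are the first five AMZN closes and the AMZN Period_High_Max from the JSON samples further down. A comparison produces a boolean mask, and summing it counts the True rows, which equals the length of the filtered Series.

```python
import pandas as pd

# AMZN closes: the first five rows of the question's JSON sample.
amzn_close = pd.Series([1572.0799560547, 1580.9499511719, 1600.1400146484,
                        1592.3900146484, 1608.0])
period_high_max = 2475.0  # AMZN's Period_High_Max from the first JSON sample

# The comparison yields a boolean mask; True counts as 1 when summed,
# so the sum equals the length of the filtered Series.
below = amzn_close < period_high_max
print(below.sum(), len(amzn_close[below]))
```

Counting per ticker is the same operation applied inside a groupby.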

I've added code to reproduce the DataFrames.

Please check the screenshots: First DataFrame, Second DataFrame.

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
import yfinance as yf

start=datetime.datetime.today()-relativedelta(years=2)
end=datetime.datetime.today()

us_stock_list='FB AMZN BABA'
data_metric = yf.download(us_stock_list, start=start, end=end,group_by='column',auto_adjust=True)
data_ticker= yf.download(us_stock_list, start=start, end=end,group_by='ticker',auto_adjust=True)

stock_list=[stock for stock in data_ticker.stack()]

# max_price
max_values=pd.DataFrame(data_ticker.max().unstack()['High'])
# min_price
min_values=pd.DataFrame(data_ticker.min().unstack()['Low'])


# latest_price
latest_day=pd.DataFrame(data_ticker.tail(1).unstack())
latest_day=latest_day.unstack().unstack().unstack().reset_index()

# latest_day=latest_day.unstack().reset_index()
latest_day=latest_day.drop(columns=['level_0','Date'])
latest_day.set_index('level_3',inplace=True)

latest_day.rename(columns={0:'Values'},inplace=True)

latest_day=latest_day.groupby(by=['level_3','level_2']).max().unstack()

latest_day.columns=[ '_'.join(x) for x in latest_day.columns ]

latest_day=latest_day.join(max_values,how='inner')

latest_day=latest_day.join(min_values,how='inner')

latest_day.rename(columns={'High':'Period_High_Max','Low':'Period_Low_Min'},inplace=True)

close_price_data=pd.DataFrame(data_metric['Close'].unstack().reset_index())
close_price_data= close_price_data.rename(columns={'level_0':'Stock',0:'Close_price'})
close_price_data.set_index('Stock',inplace=True)
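The chain of unstack() calls above can be hard to follow. A minimal sketch of an alternative, assuming the daily bars are already in long format with one row per ticker and date (how to get there from the yfinance download depends on the yfinance version's column layout): a single groupby().agg() with named aggregation (pandas 0.25+) builds the per-ticker period columns directly. The bars below are hypothetical toy values, not downloaded data, and "last" assumes rows are sorted by date within each ticker.

```python
import pandas as pd

# Hypothetical long-format daily bars: one row per (ticker, date), date-sorted.
bars = pd.DataFrame({
    "Stock": ["AMZN", "AMZN", "AMZN", "FB", "FB"],
    "High":  [1590.0, 1610.0, 1605.0, 185.0, 190.0],
    "Low":   [1560.0, 1575.0, 1590.0, 180.0, 183.0],
    "Close": [1580.0, 1600.0, 1598.0, 184.0, 188.0],
})

# One row per ticker: period max of High, period min of Low, and the last Close.
summary = bars.groupby("Stock").agg(
    Period_High_Max=("High", "max"),
    Period_Low_Min=("Low", "min"),
    Values_Close=("Close", "last"),
)
print(summary)
```

The column names mirror the ones used in latest_day so the result can be joined on the ticker index the same way.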

Use this to reproduce:

{"Values_Close":{"AMZN":2286.0400390625,"BABA":194.4799957275,"FB":202.2700042725},"Values_High":{"AMZN":2362.4399414062,"BABA":197.3800048828,"FB":207.2799987793},"Values_Low":{"AMZN":2258.1899414062,"BABA":192.8600006104,"FB":199.0500030518},"Values_Open":{"AMZN":2336.8000488281,"BABA":195.75,"FB":201.6000061035},"Values_Volume":{"AMZN":9754900.0,"BABA":22268800.0,"FB":30399600.0},"Period_High_Max":{"AMZN":2475.0,"BABA":231.1399993896,"FB":224.1999969482},"Period_Low_Min":{"AMZN":1307.0,"BABA":129.7700042725,"FB":123.0199966431},"%_Position":{"AMZN":0.8382192115,"BABA":0.6383544892,"FB":0.7832576338}}


{"Stock":{
  "0":"AMZN",
  "1":"AMZN",
  "2":"AMZN",
  "3":"AMZN",
  "4":"AMZN",
  "5":"AMZN",
  "6":"AMZN",
  "7":"AMZN",
  "8":"AMZN",
  "9":"AMZN",
  "10":"AMZN",
  "11":"AMZN",
  "12":"AMZN",
  "13":"AMZN",
  "14":"AMZN",
  "15":"AMZN",
  "16":"AMZN",
  "17":"AMZN",
  "18":"AMZN",
  "19":"AMZN"},
"Date":{
  "0":1525305600000,
  "1":1525392000000,
  "2":1525651200000,
  "3":1525737600000,
  "4":1525824000000,
  "5":1525910400000,
  "6":1525996800000,
  "7":1526256000000,
  "8":1526342400000,
  "9":1526428800000,
  "10":1526515200000,
  "11":1526601600000,
  "12":1526860800000,
  "13":1526947200000,
  "14":1527033600000,
  "15":1527120000000,
  "16":1527206400000,
  "17":1527552000000,
  "18":1527638400000,
  "19":1527724800000 },
"Close_price":{
  "0":1572.0799560547,
  "1":1580.9499511719,
  "2":1600.1400146484,
  "3":1592.3900146484,
  "4":1608.0,
  "5":1609.0799560547,
  "6":1602.9100341797,
  "7":1601.5400390625,
  "8":1576.1199951172,
  "9":1587.2800292969,
  "10":1581.7600097656,
  "11":1574.3699951172,
  "12":1585.4599609375,
  "13":1581.4000244141,
  "14":1601.8599853516,
  "15":1603.0699462891,
  "16":1610.1500244141,
  "17":1612.8699951172,
  "18":1624.8900146484,
  "19":1629.6199951172}}
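If downloading with yfinance is not an option, the JSON samples above can be loaded directly. A minimal sketch, using trimmed copies of the two samples (only the columns needed for the count): pd.DataFrame on a dict of dicts turns outer keys into columns and inner keys into the row index, and the Date values appear to be Unix epoch milliseconds, so pd.to_datetime(..., unit="ms") converts them back.

```python
import json

import pandas as pd

# Trimmed copies of the two JSON samples above.
first_json = '{"Period_High_Max":{"AMZN":2475.0},"Period_Low_Min":{"AMZN":1307.0}}'
second_json = """{"Stock":{"0":"AMZN","1":"AMZN","2":"AMZN"},
 "Date":{"0":1525305600000,"1":1525392000000,"2":1525651200000},
 "Close_price":{"0":1572.0799560547,"1":1580.9499511719,"2":1600.1400146484}}"""

# Outer keys become columns; inner keys (tickers) become the row index.
latest_day = pd.DataFrame(json.loads(first_json))

close_price_data = pd.DataFrame(json.loads(second_json))
# Dates were serialized as Unix epoch milliseconds.
close_price_data["Date"] = pd.to_datetime(close_price_data["Date"], unit="ms")
close_price_data = close_price_data.set_index("Stock")

print(latest_day)
print(close_price_data)
```

Both frames end up indexed by ticker, which is what an index-on-index merge needs.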
Tags: python, pandas, pandas-groupby
Answer

Merge the two DataFrames, then groupby the company (level=0) and apply a custom function.

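# Attach each ticker's period bounds to every daily row (both frames are indexed by ticker)
df_merge = close_price_data.merge(
    latest_day[['Period_High_Max', 'Period_Low_Min']],
    left_index=True,
    right_index=True)

def fun(df):
    # df holds all rows for one ticker; summing a boolean mask counts the True rows
    d = {}
    d['days_above_min'] = (df.Close_price > df.Period_Low_Min).sum()
    d['days_below_max'] = (df.Close_price < df.Period_High_Max).sum()

    return pd.Series(d)

# One output row per ticker (index level 0)
df_merge.groupby(level=0).apply(fun)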
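A vectorized variant of the same count, offered as an alternative sketch rather than the method above: build both boolean columns on the merged frame with assign(), then let groupby(level=0).sum() count them, which avoids calling a Python function per group. The merged frame here is hypothetical toy data shaped like df_merge (ticker index, one row per day).

```python
import pandas as pd

# Hypothetical merged frame shaped like df_merge: ticker index, one row per day.
df_merge = pd.DataFrame(
    {
        "Close_price": [1580.0, 1600.0, 1560.0, 188.0, 180.0],
        "Period_High_Max": [1610.0] * 3 + [190.0] * 2,
        "Period_Low_Min": [1560.0] * 3 + [180.0] * 2,
    },
    index=pd.Index(["AMZN"] * 3 + ["FB"] * 2, name="Stock"),
)

# Compute both masks for all rows at once, then count True values per ticker.
flags = df_merge.assign(
    days_above_min=df_merge["Close_price"] > df_merge["Period_Low_Min"],
    days_below_max=df_merge["Close_price"] < df_merge["Period_High_Max"],
)[["days_above_min", "days_below_max"]]
counts = flags.groupby(level=0).sum()
print(counts)
```

Note that a close exactly equal to a bound (1560.0 for AMZN, 180.0 for FB) is not counted, because the comparisons are strict.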

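Period_Low_Min and Period_High_Max are the period minimum and maximum, respectively, so every close price falls within that range: the strict comparisons only exclude days where the close exactly equals one of the bounds. If that isn't what you're trying to achieve, let me know.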
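Since the period bounds make the counts nearly trivial, a more general helper may be closer to what the title asks for: a function that counts days against any per-ticker threshold. This is a sketch; count_days and its parameters are hypothetical names, and the 1600.0 threshold is an arbitrary value chosen only to make the counts non-trivial. Any column of latest_day (for example Values_Close) could be passed as thresholds.

```python
import operator

import pandas as pd


def count_days(close_price_data, thresholds, op=operator.lt):
    """Count, per ticker, the rows where op(Close_price, threshold) is True.

    close_price_data: DataFrame indexed by ticker with a 'Close_price' column.
    thresholds: Series mapping ticker -> threshold (e.g. a latest_day column).
    Tickers missing from thresholds map to NaN, and NaN comparisons are False.
    """
    # Look up the threshold for each row's ticker via the index.
    bound = close_price_data.index.map(thresholds).to_numpy(dtype=float)
    hits = op(close_price_data["Close_price"].to_numpy(), bound)
    # Summing booleans per ticker gives the number of matching days.
    return pd.Series(hits, index=close_price_data.index).groupby(level=0).sum()


# AMZN closes: the first five rows of the question's JSON sample.
close_price_data = pd.DataFrame(
    {"Close_price": [1572.0799560547, 1580.9499511719, 1600.1400146484,
                     1592.3900146484, 1608.0]},
    index=pd.Index(["AMZN"] * 5, name="Stock"),
)
# Hypothetical threshold, chosen only to make the counts non-trivial.
thresholds = pd.Series({"AMZN": 1600.0})

below = count_days(close_price_data, thresholds)                  # close < 1600
above = count_days(close_price_data, thresholds, op=operator.gt)  # close > 1600
print(below, above, sep="\n")
```

Passing the comparison as an operator function keeps one helper for both "above" and "below" counts.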