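Lesson 02 Exercise Solutions
## Lesson 2 Exercises
### 1. Add automatic backoff delays to akshare
If we fetch data through akshare at high frequency, the server may ban our IP. Fortunately, we have observed that the server always issues a warning first. If we back off at that point and retry after a wait, the problem usually goes away. Using the retry library, write code that fetches stock quotes and automatically retries with a delay when the akshare call fails. The retry delays should be 1, 2, 4, and 8 seconds.
Hint: the retry library can be installed with `pip install retry`.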
```python
%pip install retry
```
```python
from retry import retry
import logging
import sys

logging.basicConfig()
logger = logging.getLogger("test")
logger.addHandler(logging.StreamHandler(stream=sys.stdout))

@retry(Exception, tries=5, backoff=2, delay=1, logger=logger)
def foo_may_go_wrong():
    raise ValueError("something went wrong")

# You will see retries after waiting 1, 2, 4, and 8 seconds; then retrying stops and the error is raised
foo_may_go_wrong()
```
```
something went wrong, retrying in 1 seconds...
WARNING:test:something went wrong, retrying in 1 seconds...
something went wrong, retrying in 2 seconds...
WARNING:test:something went wrong, retrying in 2 seconds...
something went wrong, retrying in 4 seconds...
WARNING:test:something went wrong, retrying in 4 seconds...
something went wrong, retrying in 8 seconds...
WARNING:test:something went wrong, retrying in 8 seconds...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In, line 16
13 raise ValueError("something went wrong")
     15 # You will see retries after waiting 1, 2, 4, and 8 seconds; then retrying stops and the error is raised
---> 16 foo_may_go_wrong()
File ~\software\anaconda3\envs\lianghua\Lib\site-packages\decorator.py:232, in decorate.<locals>.fun(*args, **kw)
230 if not kwsyntax:
231 args, kw = fix(args, kw, sig)
--> 232 return caller(func, *(extras + args), **kw)
File ~\software\anaconda3\envs\lianghua\Lib\site-packages\retry\api.py:73, in retry.<locals>.retry_decorator(f, *fargs, **fkwargs)
71 args = fargs if fargs else list()
72 kwargs = fkwargs if fkwargs else dict()
---> 73 return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
74 logger)
File ~\software\anaconda3\envs\lianghua\Lib\site-packages\retry\api.py:33, in __retry_internal(f, exceptions, tries, delay, max_delay, backoff, jitter, logger)
31 while _tries:
32 try:
---> 33 return f()
34 except exceptions as e:
35 _tries -= 1
Cell In, line 13, in foo_may_go_wrong()
11 @retry(Exception, tries=5, backoff=2, delay=1, logger=logger)
12 def foo_may_go_wrong():
---> 13 raise ValueError("something went wrong")
ValueError: something went wrong
```
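To make the 1, 2, 4, 8 schedule concrete, here is a minimal hand-rolled sketch of exponential backoff. It is not the retry library's implementation; `retry_with_backoff` and its `sleep` parameter are hypothetical names introduced for illustration, and the waits are recorded in a list instead of actually sleeping.

```python
import functools
import time

def retry_with_backoff(tries=5, delay=1, backoff=2, sleep=time.sleep):
    """Retry the wrapped function, multiplying the wait by `backoff` each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == tries:
                        raise  # out of attempts: let the last error propagate
                    sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator

# Record the waits instead of actually sleeping, so the schedule is visible
waits = []

@retry_with_backoff(tries=5, delay=1, backoff=2, sleep=waits.append)
def always_fails():
    raise ValueError("something went wrong")

try:
    always_fails()
except ValueError:
    pass

print(waits)
```

Five tries give four waits, because nothing follows the last failure. In practice the same `@retry(Exception, tries=5, backoff=2, delay=1)` decorator shown earlier can be placed on a function that wraps the akshare call.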
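### 2. Fix the bug where tqdm cannot run without a terminal
tqdm expects a terminal at run time and raises an error otherwise. We can make it run by replacing stderr in the current context with a StringIO object.
Hint: you will need contextlib.redirect_stderr. This exercise does not require you to reproduce the error (which is fairly hard), but your code must be correct and run successfully.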
```python
from tqdm import trange
import io
import contextlib
import time

print("show a normal progress bar below...")
for i in trange(10):
    pass

print("replace stderr with StringIO(), you'll see no progress bar anymore")

def foo_calling_tqdm():
    # Progress output goes to the in-memory buffer instead of the real stderr
    with contextlib.redirect_stderr(io.StringIO()):
        for i in trange(5):
            time.sleep(1)

foo_calling_tqdm()
```
```
show a normal progress bar below...
100%|██████████| 10/10
replace stderr with StringIO(), you'll see no progress bar anymore
```
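**Explanation**
Calling `for i in trange(10)` displays a progress bar. The example makes this call twice. The second time, because we replaced standard error (where tqdm writes by default) with a string buffer, the progress bar is not shown and no error is raised.
This result shows that the same approach also lets akshare work without a terminal.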
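The redirection itself can be checked without tqdm. The sketch below uses only the standard library and a made-up progress string to show that anything written to `sys.stderr` inside the `with` block ends up in the buffer rather than on the terminal.

```python
import contextlib
import io
import sys

buffer = io.StringIO()

# Inside the with-block, sys.stderr points at the in-memory buffer
with contextlib.redirect_stderr(buffer):
    print("50%|#####     | 5/10", file=sys.stderr)

# Outside the block the real stderr is restored; the text was captured
captured = buffer.getvalue()
print(repr(captured))
```

Keeping a reference to the buffer, as done here, also makes it possible to inspect or log the captured progress text afterwards.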
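### 3. Deriving qfq and hfq prices from each other
Fetch the historical data of a stock from akshare twice, once unadjusted and once hfq-adjusted (后复权), compute its adjustment factor, and use it to derive the qfq-adjusted (前复权) prices. Then fetch the qfq data from akshare once more and compare it with your derived series.
Hint: the hfq data for Ping An Bank (平安银行) returned by akshare is not correct. We also checked the Sina server, the Tonghuashun (同花顺) quote software, and East Money (东方财富), and found that their hfq prices all disagree with one another.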
```python
import akshare as ak
import numpy as np

code = "000001"
start = "20220103"
end = "20230414"

# Unadjusted close prices
no_adjust = ak.stock_zh_a_hist(code, "daily", start, end)["收盘"].to_numpy()

# hfq (后复权) close prices
adjust_forward = ak.stock_zh_a_hist(code, "daily", start, end, "hfq")["收盘"].to_numpy()

# Derive qfq (前复权) prices: normalize the hfq series by its last value,
# then multiply by the latest unadjusted price
adjust_backward_hat = adjust_forward / adjust_forward[-1] * no_adjust[-1]

# qfq prices as returned by akshare
adjust_backward = ak.stock_zh_a_hist(code, "daily", start, end, "qfq")["收盘"].to_numpy()

# Check whether the derived qfq prices equal the fetched ones.
# Both are arrays, so we compare them with numpy.testing.assert_array_almost_equal.
try:
    np.testing.assert_array_almost_equal(
        adjust_backward_hat[-10:], adjust_backward[-10:], decimal=2
    )
except AssertionError as e:
    print("The derived qfq prices do not equal the fetched qfq prices" + str(e))
```
```
The derived qfq prices do not equal the fetched qfq prices
Arrays are not almost equal to 2 decimals
Mismatched elements: 10 / 10 (100%)
Max absolute difference among violations: 1.01906651
Max relative difference among violations: 0.08876886
ACTUAL: array([12.54, 12.67, 12.65, 12.59, 12.63, 12.68, 12.55, 12.5 , 12.57,
12.69])
DESIRED: array([11.53, 11.67, 11.65, 11.58, 11.62, 11.68, 11.54, 11.48, 11.56,
11.69])
```
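What went wrong? Isn't the relationship between qfq and hfq prices a linear transformation?
The cause is that when akshare returns qfq data, the adjustment is not anchored at the `end` date we pass in but at the time we call the API. As a result, the qfq price returned by the API for the `end` date may not equal the unadjusted price on that date; it will be somewhat lower.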
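A small sketch on made-up numbers illustrates the point. It assumes the adjustment is purely multiplicative, which may not hold exactly for a real data feed: two qfq series anchored at different dates then differ in level by a constant ratio, while their daily returns agree.

```python
# Hypothetical hfq closes for five days (made-up numbers)
hfq = [100.0, 102.0, 101.0, 105.0, 104.0]
raw_last = 10.4  # made-up unadjusted close on the last day of the window

# Anchor the derived series at the last day of the window, as in the exercise
qfq_hat = [p / hfq[-1] * raw_last for p in hfq]

# Assume the data source anchors at a later date, so its series is the same
# shape scaled by a constant (0.92 is made up for illustration)
qfq_source = [p * 0.92 for p in qfq_hat]

def daily_returns(prices):
    """Simple day-over-day returns of a price list."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]

# Price levels differ by a constant ratio, but daily returns agree
ratios = [s / h for s, h in zip(qfq_source, qfq_hat)]
same_returns = all(
    abs(x - y) < 1e-12
    for x, y in zip(daily_returns(qfq_hat), daily_returns(qfq_source))
)
print(ratios[0], same_returns)
```

Under that assumption, comparing ratios or returns rather than price levels is a more robust way to test whether two adjusted series are consistent.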
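We use this example to remind you that when working with data sources, and in quantitative work generally, seeing is not believing. Many details, especially in third-party libraries, are not documented thoroughly, so we need to test them fully ourselves and confirm their behavior before relying on them. Otherwise these errors accumulate and propagate, our work ends up built on top of errors, and we cannot get correct results.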
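### 4. Providing data for the CAPM model
The capital asset pricing model (CAPM) describes the relationship between an asset's expected return and systematic market risk. It was proposed by William Sharpe and others in the 1960s. William Sharpe received the 1990 Nobel Prize in Economics. CAPM is regarded as one of the seven fundamental theories of economics.
CAPM states that an asset's expected return equals the risk-free return plus a risk premium. It assumes that investors are rational and want to maximize return while keeping risk as low as possible. The goal of CAPM is therefore to calculate the return an investor can expect for a given risk premium over the risk-free rate.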
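In symbols, this relationship is commonly written as follows. The symbol names are our own choice rather than part of the exercise: $R_f$ is the risk-free rate, $E[R_m]$ the expected market return, and $\beta_i$ the sensitivity of asset $i$ to the market, so that $\beta_i\,(E[R_m] - R_f)$ is the risk premium.

$$E[R_i] = R_f + \beta_i \,\bigl(E[R_m] - R_f\bigr)$$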
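The CAPM model needs the following data:
1. Government bond yields
2. Index quotes and index constituents
3. Individual stock data
In this exercise you do not need to understand the CAPM model; you only need to supply it with data.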
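#### 4.1 Fetching government bond yields
Taking the current time as the reference, fetch one year of government bond yield data going back from today, and use its mean as the yield.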
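The one-year window can also be built with the standard library alone. The sketch below is an alternative to the arrow-based solution, not a replacement for it; `one_year_window` is a hypothetical helper, and mapping February 29 to February 28 is our own assumption about how to treat leap days.

```python
from datetime import date

def one_year_window(today):
    """Return (start, end) as YYYYMMDD strings covering the year before `today`."""
    try:
        start = today.replace(year=today.year - 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28
        start = today.replace(year=today.year - 1, day=28)
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")

start, end = one_year_window(date(2024, 9, 3))
print(start, end)
```

The resulting strings have the same `YYYYMMDD` shape that is passed as `start_date` and `end_date` in the solution.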
```python
# Write your code here
import akshare as ak
import arrow
import numpy as np
import random
import pandas as pd
random.seed(78)
now = arrow.now()
start = now.shift(years=-1)
end = f"{now.year}{now.month:02d}{now.day:02d}"
start = f"{start.year}{start.month:02d}{start.day:02d}"
bond = ak.bond_china_yield(start_date=start, end_date=end)
bond.set_index(keys='曲线名称', inplace=True)
bond
```
<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>日期</th>
<th>3月</th>
<th>6月</th>
<th>1年</th>
<th>3年</th>
<th>5年</th>
<th>7年</th>
<th>10年</th>
<th>30年</th>
</tr>
<tr>
<th>曲线名称</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<th>中债中短期票据收益率曲线(AAA)</th>
<td>2023-09-04</td>
<td>2.1850</td>
<td>2.2844</td>
<td>2.4498</td>
<td>2.7611</td>
<td>2.9544</td>
<td>3.0965</td>
<td>3.1503</td>
<td>NaN</td>
</tr>
<tr>
<th>中债商业银行普通债收益率曲线(AAA)</th>
<td>2023-09-04</td>
<td>2.0385</td>
<td>2.2395</td>
<td>2.3659</td>
<td>2.6721</td>
<td>2.7264</td>
<td>2.8224</td>
<td>2.9037</td>
<td>3.1167</td>
</tr>
<tr>
<th>中债国债收益率曲线</th>
<td>2023-09-04</td>
<td>1.6526</td>
<td>1.9795</td>
<td>1.9822</td>
<td>2.2752</td>
<td>2.4691</td>
<td>2.6249</td>
<td>2.6124</td>
<td>2.9611</td>
</tr>
<tr>
<th>中债国债收益率曲线</th>
<td>2023-09-05</td>
<td>1.6186</td>
<td>1.9587</td>
<td>1.9929</td>
<td>2.2704</td>
<td>2.4697</td>
<td>2.6224</td>
<td>2.6150</td>
<td>2.9666</td>
</tr>
<tr>
<th>中债中短期票据收益率曲线(AAA)</th>
<td>2023-09-05</td>
<td>2.1801</td>
<td>2.2941</td>
<td>2.4545</td>
<td>2.7563</td>
<td>2.9571</td>
<td>3.1052</td>
<td>3.1555</td>
<td>NaN</td>
</tr>
<tr>
<th>...</th>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<th>中债商业银行普通债收益率曲线(AAA)</th>
<td>2024-09-02</td>
<td>1.8257</td>
<td>1.9364</td>
<td>1.9565</td>
<td>2.0424</td>
<td>2.1155</td>
<td>2.2025</td>
<td>2.3300</td>
<td>2.5247</td>
</tr>
<tr>
<th>中债国债收益率曲线</th>
<td>2024-09-02</td>
<td>1.4566</td>
<td>1.4889</td>
<td>1.4650</td>
<td>1.6157</td>
<td>1.8085</td>
<td>2.0483</td>
<td>2.1457</td>
<td>2.3425</td>
</tr>
<tr>
<th>中债中短期票据收益率曲线(AAA)</th>
<td>2024-09-03</td>
<td>1.9470</td>
<td>2.0018</td>
<td>2.0300</td>
<td>2.1003</td>
<td>2.2204</td>
<td>2.2585</td>
<td>2.4600</td>
<td>NaN</td>
</tr>
<tr>
<th>中债商业银行普通债收益率曲线(AAA)</th>
<td>2024-09-03</td>
<td>1.8184</td>
<td>1.9363</td>
<td>1.9557</td>
<td>2.0388</td>
<td>2.1090</td>
<td>2.1964</td>
<td>2.3302</td>
<td>2.5247</td>
</tr>
<tr>
<th>中债国债收益率曲线</th>
<td>2024-09-03</td>
<td>1.4457</td>
<td>1.4796</td>
<td>1.4300</td>
<td>1.5705</td>
<td>1.7853</td>
<td>2.0419</td>
<td>2.1432</td>
<td>2.3390</td>
</tr>
</tbody>
</table>
<p>756 rows × 9 columns</p>
</div>
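The output should look similar to the table above.

This gives us the yields of various bonds over the past year. We can use the mean of the 1-year tenor of the '中债国债收益率曲线' (ChinaBond government bond yield curve) as the annualized risk-free rate: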
```python
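# Keep only the government bond curve, then average its 1-year tenor (in percent)
rf = bond.loc['中债国债收益率曲线', '1年'].mean()
print(rf)
# Convert from percent to a decimal fraction
rf = rf / 100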
```
```
1.871846031746032
```
On October 24, 2023, rf came out at about 0.0207.
On September 3, 2024, rf came out at about 0.0187.
#### 4.2 Fetching CSI 300 quotes
Next, we use akshare to fetch CSI 300 (沪深300) quotes covering the past year:
```python
# Write your code here. Use the variable hs300 to refer to the CSI 300 quotes.
import akshare as ak
hs300 = ak.stock_zh_index_daily(symbol="sz399300")
hs300.index = pd.to_datetime(hs300["date"])
print(hs300)
```
```
                  date      open      high       low     close       volume
date
2002-01-04  2002-01-04  1316.455  1316.455  1316.455  1316.455            0
2002-01-07  2002-01-07  1302.084  1302.084  1302.084  1302.084            0
2002-01-08  2002-01-08  1292.714  1292.714  1292.714  1292.714            0
2002-01-09  2002-01-09  1272.645  1272.645  1272.645  1272.645            0
2002-01-10  2002-01-10  1281.261  1281.261  1281.261  1281.261            0
...                ...       ...       ...       ...       ...          ...
2024-08-28  2024-08-28  3298.909  3305.350  3274.588  3286.496   9269665000
2024-08-29  2024-08-29  3273.541  3290.786  3269.529  3277.681  13070233300
2024-08-30  2024-08-30  3273.753  3351.625  3273.650  3321.432  18869544300
2024-09-02  2024-09-02  3307.543  3307.763  3264.757  3265.011  14761235300
2024-09-03  2024-09-03  3261.741  3281.594  3261.188  3273.428  12289730600
```
Using the hs300 data you fetched, let's look at the annualized return of the CSI 300.
```python
import arrow

end = arrow.get("2024-09-03")
year_ago = end.shift(years=-1)

# Restrict to the most recent year; slicing a sorted DatetimeIndex by
# date strings includes both endpoints
close = hs300.loc[year_ago.format("YYYY-MM-DD"):end.format("YYYY-MM-DD"), "close"]

# First trading day on or after the date one year ago
print(close.index[0])

# Buy-and-hold return over the most recent year
buy_price = close.iloc[0]
buy_and_hold = close.iloc[-1] / buy_price - 1
print(f"Buy-and-hold return: {buy_and_hold:.2%}")

# Annualized return estimated from the mean daily return (242 trading days)
market_returns = close.pct_change().dropna()
market_annual = (1 + market_returns.mean()) ** 242 - 1
print(f"Annualized return: {market_annual:.2%}")
```
```
2023-09-04 00:00:00
Buy-and-hold return: -14.95%
Annualized return: -12.87%
```
Both estimates put the CSI 300's return over the past year in the range of about -13% to -15%.
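The two estimates need not coincide. The following sketch on made-up prices shows why: compounding the arithmetic mean of daily returns ignores the drag from volatility, so it tends to come out above the buy-and-hold figure.

```python
# Made-up closing prices for three days
prices = [100.0, 110.0, 99.0]

# Method 1: buy on the first day and hold to the last
buy_and_hold = prices[-1] / prices[0] - 1

# Method 2: compound the arithmetic mean of the daily returns
daily = [b / a - 1 for a, b in zip(prices, prices[1:])]
mean_daily = sum(daily) / len(daily)
compounded = (1 + mean_daily) ** len(daily) - 1

print(f"buy and hold: {buy_and_hold:.2%}, compounded mean: {compounded:.2%}")
```

Here a +10% day followed by a -10% day leaves the holder down 1%, while the mean daily return is essentially zero. The gap between the two figures for the index has the same origin.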
Next, we fetch the CSI 300 constituents so that we can sample individual stocks for testing. Store the constituent data in a variable named index_stock_cons_df.
```python
# Write your code here
import akshare as ak
index_stock_cons_df = ak.index_stock_cons(symbol="399300")
# Print the result
print(index_stock_cons_df)
```
In the end, we should get output similar to:
```
       品种代码  品种名称        纳入日期
0    001965  招商公路  2024-06-17
1    000807  云铝股份  2024-06-17
2    300442  润泽科技  2024-06-17
3    600415  小商品城  2024-06-17
4    603296  华勤技术  2024-06-17
..      ...   ...         ...
295  600660  福耀玻璃  2005-04-08
296  600690  青岛海尔  2005-04-08
297  600741  巴士股份  2005-04-08
298  600795  国电电力  2005-04-08
299  600900  长江电力  2005-04-08
```
Next, we randomly pick 10 stocks, fetch their quotes, and compute daily returns:
```python
# Randomly pick 10 stocks, fetch their quotes, and compute daily returns
# Write your code here
random.seed(78)  # random.sample uses the random module, so seed that one
stocks = random.sample(index_stock_cons_df['品种代码'].to_list(), 10)

frames = {}

now = arrow.now()
start = now.shift(years=-1).format("YYYYMMDD")
end = now.format("YYYYMMDD")
print(end)
print(start)

# Fetch quotes for the 10 stocks, keyed by stock code
for code in stocks:
    bars = ak.stock_zh_a_hist(symbol=code, period="daily", start_date=start, end_date=end, adjust="qfq")
    bars.index = pd.to_datetime(bars["日期"])
    frames[code] = bars["收盘"]

# Merge with the index quotes over the same one-year window
start = np.datetime64(now.shift(years=-1).naive)
frames["399300"] = hs300.loc[hs300.index >= start, "close"]
df = pd.DataFrame(frames)

# Compute daily returns
returns = df.pct_change()

# Rows with NaN would break the regression later, so drop them
returns.dropna(how='any', inplace=True)
returns.head().style.format('{:,.2%}')
```
So that the later code can run, store the quote data in the variable df and the daily returns in returns. In the end, returns should look similar to:
```
688396 600941 600346 002410 000776 601916 300896 600332 601601 601788 399300
2023-09-05 00:00:00 -1.29% -1.06% 0.55% -0.62% -1.13% -1.22% -0.69% -0.43% -1.26% -1.41% -0.74%
2023-09-06 00:00:00 -1.46% -1.02% 1.17% 1.05% -0.34% 0.00% -0.22% -0.13% 1.59% 0.12% -0.22%
2023-09-07 00:00:00 -1.92% 0.28% -0.95% -1.16% -1.01% -0.41% -2.59% -1.73% -0.48% 0.42% -1.40%
2023-09-08 00:00:00 -0.59% 1.48% -0.76% -1.17% 0.34% -0.41% -1.63% -0.34% -0.27% 0.59% -0.49%
2023-09-11 00:00:00 0.07% 0.85% 0.07% 2.30% 1.29% 0.42% 3.18% 2.10% -1.37% 0.94% 0.74%
```
Hint: you may need the following methods:
1. random.sample
2. stock_zh_a_hist
3. pd.to_datetime
4. np.datetime64
5. df.pct_change
6. df.dropna
7. df.style.format
Next, we can apply the CAPM model. Here we only use polynomial fitting.
```python
cols = df.columns

betas = {}
for name in cols:
    # Degree-1 fit with the stock's returns as x and the index's returns as y
    beta, alpha = np.polyfit(returns[name], returns["399300"], deg=1)
    print(name, f"{beta:.2%} {alpha:.2%}")
    betas[name] = beta
```
```
688396 24.96% -0.02%
600941 1.46% -0.06%
600346 30.01% -0.06%
002410 18.09% 0.00%
000776 58.21% -0.02%
601916 33.28% -0.07%
300896 15.09% -0.02%
600332 36.72% -0.05%
601601 23.79% -0.07%
601788 33.63% -0.05%
399300 100.00% 0.00%
```
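A note on argument order: `np.polyfit(x, y, deg=1)` fits `y` against `x`, so which series goes first decides what the slope means. The CAPM beta is usually defined as $\mathrm{Cov}(R_i, R_m) / \mathrm{Var}(R_m)$, which corresponds to putting the market on the x-axis. The sketch below uses made-up returns and a hand-written least-squares helper (`slope_intercept` is hypothetical, not part of NumPy) to show how the two directions differ.

```python
def slope_intercept(x, y):
    """Least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Made-up daily returns: the stock moves 1.5x the market plus a small constant
market = [0.010, -0.020, 0.015, 0.000, -0.005]
stock = [1.5 * m + 0.001 for m in market]

# Conventional direction: market on x, stock on y
beta, alpha = slope_intercept(market, stock)

# Swapped direction: stock on x, market on y
swapped_slope, _ = slope_intercept(stock, market)

print(f"beta={beta:.4f}, alpha={alpha:.4f}, swapped slope={swapped_slope:.4f}")
```

With the market on x the fit recovers the 1.5 used to build the data; with the series swapped the slope is a different quantity, so it is worth checking which direction a given fit uses before reading its slope as a CAPM beta.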
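The best one is 002410, which has the highest alpha in the output above. Let's look at its beta and the expected return it implies if we buy this stock: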
```python
code = "002410"
beta = betas[code]

# Expected return implied by the regression beta
expected_return = rf + beta * (market_annual - rf)
print(f"code beta: {beta:.2f}, Er: {expected_return:.2%}")
```
```
code beta: 0.18, Er: -0.80%
```