[xdoctest] reformat example code with google style in paddle/io
#55732
Conversation
Your PR was submitted successfully. Thank you for contributing to this open source project!
LGTM~
I have a few small questions; hoping you could help answer them.
Also, it looks like CI hasn't passed.
>>> # doctest: +SKIP
0 1
1 3
2 9
>>> # doctest: -SKIP
Just a couple of small questions:
- Why does this part need to be skipped?
- I noticed that some API examples under dataset.py include the output and some don't. Should we add outputs everywhere, or decide based on whether the original example had them?
I'll go with whatever Shun says (well, actually, that's what I was thinking too).
...
...     def __len__(self):
...         return self.num_samples
...
Just a small question: would it still work if the `...` prompts were removed from this part?
It would indeed ~ xdoctest can handle examples that use only `>>>`, without `...` prompts on continuation lines.
However, this is an xdoctest-specific feature; if we ever stop using xdoctest some day, such examples might break~
So, personally, I'd suggest keeping Python's standard doctest as the minimum compatibility baseline~
... simple_net.clear_gradients()
... print("Epoch {} batch {}: loss = {}".format(e, i, np.mean(loss.numpy())))

Notes:
Actually `.. note::` would also work (doge); anyway~
If I remember correctly, this is how @Ligoml taught us to write it back in the day [doge]
Of course, I may be misremembering [doge][doge][doge]
...
...     def __len__(self):
...         return self.num_samples
...
It would indeed ~ xdoctest can handle examples that use only `>>>`, without `...` prompts on continuation lines.
However, this is an xdoctest-specific feature; if we ever stop using xdoctest some day, such examples might break~
So, personally, I'd suggest keeping Python's standard doctest as the minimum compatibility baseline~
>>> for batch_indices in bs:
...     print(batch_indices)
...
>>> # init with sampler
>>> sampler = RandomSampler(RandomDataset(100))
>>> bs = BatchSampler(sampler=sampler,
...                   batch_size=8,
...                   drop_last=True)
...
>>> for batch_indices in bs:
...     print(batch_indices)
Should we add the expected output for these print calls?
...
>>> dataset = RandomDataset(10)
>>> for i in range(len(dataset)):
...     print(dataset[i])
Same as above: should we add the output, or add a comment?
...
>>> dataset = RandomDataset(10)
>>> for img, lbl in dataset:
...     print(img, lbl)
Same as above, should we add the output?
Also, let's add seeds for numpy and paddle, and make the random arrays smaller:
>>> import numpy as np
>>> import paddle
>>> from paddle.io import IterableDataset
>>> np.random.seed(2023)
>>> paddle.seed(2023)
>>> # define a random dataset
>>> class RandomDataset(IterableDataset):
... def __init__(self, num_samples):
... self.num_samples = num_samples
...
... def __iter__(self):
... for i in range(self.num_samples):
... image = np.random.random([3]).astype('float32')
... label = np.random.randint(0, 9, (1, )).astype('int64')
... yield image, label
...
>>> dataset = RandomDataset(3)
>>> for img, lbl in dataset:
... print(img, lbl)
[0.3219883 0.89042246 0.5880523 ] [3]
[0.04382154 0.766739 0.33634686] [1]
[0.7272747 0.52438736 0.5449352 ] [8]
Does this look OK? Same applies to the earlier ones~
>>> for i in range(len(dataset)):
...     input, label = dataset[i]
...     print(input, label)
Same as above~
...     print(index)
0
1
2
...
99
This one is great! I'll add it to the issue description as an example; it will also come in handy when we update the docs later~
... avg_loss.backward()
... opt.minimize(avg_loss)
... simple_net.clear_gradients()
... print("Epoch {} batch {}: loss = {}".format(e, i, np.mean(loss.numpy())))
This is a relatively complete case, so I think it's fine either way whether the output is added~
...     # do something
...     break
Alternatively, changing the other print spots to this style would also work well~
I've pushed a revision. For the image examples I applied this strategy across the board, since most of them don't need print; I just added a `# do something` comment.
For the others, outputs were added wherever needed.
...     print(data)
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
With num_workers=2 here, the output order isn't deterministic, is it? Should we add a skip? Same for the several below~ The failures I see in the CI logs are all of this kind~
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
I tried it; the output here actually seems to be fixed?
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
>>> for idx, v in enumerate(a_list[0]):
...     print(idx, v)
0 8
1 2
2 5

>>> # output of the second subset
>>> for idx, v in enumerate(a_list[1]):
...     print(idx, v)
0 9
1 6
2 3
3 4
4 1
5 0
6 7
Could this be an issue with the API itself..? I tried it on AI Studio in both GPU and CPU environments, and the two results differed (even with the seed and cpuplace specified).
LGTM
PR types
Others
PR changes
Others
Description
Reformat the example code in the following files to the new style, and pass the xdoctest check:
python/paddle/io/dataloader/batch_sampler.py
python/paddle/io/dataloader/dataset.py
python/paddle/io/dataloader/sampler.py
python/paddle/io/dataloader/worker.py
python/paddle/io/reader.py
Preview:
@sunzhongkai588 @SigureMo @megemini
PCard-66962