Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[xdoctest] reformat example code with google style in paddle/io #55732

Merged
merged 7 commits into from
Aug 3, 2023

Conversation

SigureMo
Copy link
Member

@SigureMo SigureMo commented Jul 26, 2023

PR types

Others

PR changes

Others

Description

修改如下文件的示例代码为新的格式,并通过 xdoctest 检查:

  • python/paddle/io/dataloader/batch_sampler.py
  • python/paddle/io/dataloader/dataset.py
  • python/paddle/io/dataloader/sampler.py
  • python/paddle/io/dataloader/worker.py
  • python/paddle/io/reader.py

预览:

@sunzhongkai588 @SigureMo @megemini

PCard-66962

@paddle-bot
Copy link

paddle-bot bot commented Jul 26, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

sunzhongkai588
sunzhongkai588 previously approved these changes Aug 1, 2023
Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM~
有一些小疑问辛苦一师傅解答一下
and 好像ci没过

Comment on lines 507 to 511
>>> # doctest: +SKIP
0 1
1 3
2 9
>>> # doctest: -SKIP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just一些小疑问

  • 为什么这部分要skip
  • 我看dataset.py下有的api示例,输出给结果了,有的没给。是否要写结果,要不就根据原示例是否有结果,来决定

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#55849 (comment)

都听顺师傅的(啊不是,其实我本来也是这么想的)

...
... def __len__(self):
... return self.num_samples
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just一点小疑问,这部分如果去掉 ... 是不是也没问题

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实是 ~ xdoctest 可以兼容只用 >>> ,多行也不加 ... 提示。

但是,这只是 xdoctest 的特性,保不齐哪天不用 xdoctest 了,这样可能就挂了~

所以,个人建议,还是以 python 的 doctest 为最低兼容标准吧~

... simple_net.clear_gradients()
... print("Epoch {} batch {}: loss = {}".format(e, i, np.mean(loss.numpy())))

Notes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其实用 .. note:: 也可以(狗头),anyway

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果没记错的话,这是当年 @Ligoml 她老人家教导的,要改成这样的 [doge]

当然也有可能是我记错了 [doge][doge][doge]

...
... def __len__(self):
... return self.num_samples
...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实是 ~ xdoctest 可以兼容只用 >>> ,多行也不加 ... 提示。

但是,这只是 xdoctest 的特性,保不齐哪天不用 xdoctest 了,这样可能就挂了~

所以,个人建议,还是以 python 的 doctest 为最低兼容标准吧~

Comment on lines 81 to 91
>>> for batch_indices in bs:
... print(batch_indices)
...
>>> # init with sampler
>>> sampler = RandomSampler(RandomDataset(100))
>>> bs = BatchSampler(sampler=sampler,
... batch_size=8,
... drop_last=True)
...
>>> for batch_indices in bs:
... print(batch_indices)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这部分的 print 要不要加个输出?

...
>>> dataset = RandomDataset(10)
>>> for i in range(len(dataset)):
... print(dataset[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上,要不要加个输出,或者加个 comment?

...
>>> dataset = RandomDataset(10)
>>> for img, lbl in dataset:
... print(img, lbl)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上,要不要加个输出?

另外,numpy 和 paddle 加个 seed,随即数量也改小点:

>>> import numpy as np
>>> import paddle
>>> from paddle.io import IterableDataset
>>> np.random.seed(2023)
>>> paddle.seed(2023)
>>> # define a random dataset
>>> class RandomDataset(IterableDataset):
...     def __init__(self, num_samples):
...         self.num_samples = num_samples
...
...     def __iter__(self):
...         for i in range(self.num_samples):
...             image = np.random.random([3]).astype('float32')
...             label = np.random.randint(0, 9, (1, )).astype('int64')
...             yield image, label
...
>>> dataset = RandomDataset(3)
>>> for img, lbl in dataset:
...     print(img, lbl)
[0.3219883  0.89042246 0.5880523 ] [3]
[0.04382154 0.766739   0.33634686] [1]
[0.7272747  0.52438736 0.5449352 ] [8]

看看这样行不?之前的也是~


>>> for i in range(len(dataset)):
... input, label = dataset[i]
... print(input, label)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上~

Comment on lines +74 to +79
... print(index)
0
1
2
...
99
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个好!我加到 issue 的描述里面做个范例,后面更新文档也用的上~

python/paddle/io/dataloader/sampler.py Show resolved Hide resolved
python/paddle/io/dataloader/sampler.py Show resolved Hide resolved
... avg_loss.backward()
... opt.minimize(avg_loss)
... simple_net.clear_gradients()
... print("Epoch {} batch {}: loss = {}".format(e, i, np.mean(loss.numpy())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个地方算是一个相对完整的 case,输出加不加我觉得都行~

Comment on lines +224 to +225
... # do something
... break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

或者其他的几个 print 的地方都改成这样也不错~

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

修改了一版,整体对于 image 的都采用这个策略,因为大多数不需要 print,直接加注释 do something

其他的该加输出的也都加了

Comment on lines 161 to 175
... print(data)
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里 num_workers=2,输出行为不可控吧?是不是加个 skip?下面的几个也是~ 我看日志里面 fail 的都是这样~

Comment on lines 141 to 154
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

试了下,这儿的输出好像是固定的?

Suggested change
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[2]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[6]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[3]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[7]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[4]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[8]])
Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
[[5]])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

离谱,我本地可稳定的 2345678 了

image

开发机和 CI 一样

image

按照 CI 来吧

Comment on lines 507 to 522
>>> for idx, v in enumerate(a_list[0]):
... print(idx, v)
0 8
1 2
2 5

>>> # output of the second subset
>>> for idx, v in enumerate(a_list[1]):
... print(idx, v)
0 9
1 6
2 3
3 4
4 1
5 0
6 7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这是不是api本身有问题..?我在aistudio上的gpu和cpu的环境分别试了下,两个结果都不一样(即使指定seedcpuplace

Copy link
Member Author

@SigureMo SigureMo Aug 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image image

对哦,为啥

Copy link
Contributor

@sunzhongkai588 sunzhongkai588 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luotao1 luotao1 merged commit 7c9b1ab into PaddlePaddle:develop Aug 3, 2023
@SigureMo SigureMo deleted the xdoctest/io branch August 3, 2023 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants