[xdoctest] reformat example code with google style in `paddle/io` #55732

SigureMo · 2023-07-26T19:08:57Z

PR types

Others

PR changes

Others

Description

Convert the example code in the following files to the new format and make it pass the xdoctest check:

python/paddle/io/dataloader/batch_sampler.py
python/paddle/io/dataloader/dataset.py
python/paddle/io/dataloader/sampler.py
python/paddle/io/dataloader/worker.py
python/paddle/io/reader.py

Preview:

@sunzhongkai588 @SigureMo @megemini

[xdoctest] Update the existing example code in batches #55629

PCard-66962

paddle-bot · 2023-07-26T19:09:01Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

sunzhongkai588

LGTM
I have a few small questions; 一师傅, could you help answer them?
Also, CI doesn't seem to have passed.

sunzhongkai588 · 2023-08-01T09:44:47Z

python/paddle/io/dataloader/dataset.py

+            >>> # doctest: +SKIP
+            0 1
+            1 3
+            2 9
+            >>> # doctest: -SKIP


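Just a few small questions.

Why does this part need to be skipped?

I see that some API examples in dataset.py show their output and others don't. Should the output be written out? If unsure, we could decide based on whether the original example had output.

#55849 (comment)

I'll go with whatever 顺师傅 says (er, I mean, that's actually what I had in mind anyway).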

sunzhongkai588 · 2023-08-01T09:52:51Z

python/paddle/io/dataloader/batch_sampler.py

+                ...
+                ...     def __len__(self):
+                ...         return self.num_samples
+                ...


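Just a small question: would this still work if the ... prompts were removed?

That's right. xdoctest accepts examples that use only >>> and omit the ... continuation prompt on multi-line statements.

However, that is an xdoctest-specific feature. If we ever stop using xdoctest, such examples could break.

So my personal suggestion is to treat Python's built-in doctest as the minimum compatibility baseline.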
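The compatibility point above can be checked against Python's built-in doctest. The following is a minimal sketch, not code from this PR: `Sized`, `double`, and `run_with_stdlib_doctest` are hypothetical names chosen for illustration. It runs one example written with ... continuation prompts and one written with >>> only, and reports how many examples the built-in doctest counts as failed.

```python
import doctest

# Uses "..." continuation prompts, the form Python's built-in doctest expects.
WITH_CONTINUATION = '''
>>> class Sized:
...     def __init__(self, num_samples):
...         self.num_samples = num_samples
...
...     def __len__(self):
...         return self.num_samples
...
>>> len(Sized(3))
3
'''

# Uses only ">>>" on every line of a multi-line statement.
ONLY_PRIMARY_PROMPT = '''
>>> def double(x):
>>>     return 2 * x
>>> double(2)
4
'''


def run_with_stdlib_doctest(text):
    """Parse and run `text` with the built-in doctest; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(text, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure reports so that only the counts are printed below.
    result = runner.run(test, out=lambda message: None)
    return result.failed, result.attempted


ok_failed, ok_attempted = run_with_stdlib_doctest(WITH_CONTINUATION)
bad_failed, bad_attempted = run_with_stdlib_doctest(ONLY_PRIMARY_PROMPT)
print("with '...' prompts:", ok_failed, "failed of", ok_attempted)
print("only '>>>' prompts:", bad_failed, "failed of", bad_attempted)
```

The built-in doctest treats every >>> line as a separate example, so the second form fails there even though xdoctest is reported in this thread to accept it.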

sunzhongkai588 · 2023-08-01T09:56:20Z

python/paddle/io/reader.py

+            ...         simple_net.clear_gradients()
+            ...         print("Epoch {} batch {}: loss = {}".format(e, i, np.mean(loss.numpy())))
+
+    Notes:


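Actually, .. note:: would work too (doge). Anyway, no big deal.

If I remember correctly, @Ligoml taught us back in the day that it should be changed to this form [doge]

Of course, I may be misremembering [doge][doge][doge]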

megemini · 2023-08-01T10:04:07Z

python/paddle/io/dataloader/batch_sampler.py

+                ...
+                ...     def __len__(self):
+                ...         return self.num_samples
+                ...


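That's right. xdoctest accepts examples that use only >>> and omit the ... continuation prompt on multi-line statements.

However, that is an xdoctest-specific feature. If we ever stop using xdoctest, such examples could break.

So my personal suggestion is to treat Python's built-in doctest as the minimum compatibility baseline.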

megemini · 2023-08-01T10:05:02Z

python/paddle/io/dataloader/batch_sampler.py

+            >>> for batch_indices in bs:
+            ...     print(batch_indices)
+            ...
+            >>> # init with sampler
+            >>> sampler = RandomSampler(RandomDataset(100))
+            >>> bs = BatchSampler(sampler=sampler,
+            ...                     batch_size=8,
+            ...                     drop_last=True)
+            ...
+            >>> for batch_indices in bs:
+            ...     print(batch_indices)


Should the print calls in this part have expected output added?

megemini · 2023-08-01T10:05:29Z

python/paddle/io/dataloader/dataset.py

+            ...
+            >>> dataset = RandomDataset(10)
+            >>> for i in range(len(dataset)):
+            ...     print(dataset[i])


Same as above: should we add expected output, or a comment?

megemini · 2023-08-01T10:10:09Z

python/paddle/io/dataloader/dataset.py

+            ...
+            >>> dataset = RandomDataset(10)
+            >>> for img, lbl in dataset:
+            ...     print(img, lbl)


Same as above: should we add expected output?

Also, seed numpy and paddle, and reduce the number of random values:

>>> import numpy as np
>>> import paddle
>>> from paddle.io import IterableDataset
>>> np.random.seed(2023)
>>> paddle.seed(2023)
>>> # define a random dataset
>>> class RandomDataset(IterableDataset):
...     def __init__(self, num_samples):
...         self.num_samples = num_samples
...
...     def __iter__(self):
...         for i in range(self.num_samples):
...             image = np.random.random([3]).astype('float32')
...             label = np.random.randint(0, 9, (1, )).astype('int64')
...             yield image, label
...
>>> dataset = RandomDataset(3)
>>> for img, lbl in dataset:
...     print(img, lbl)
[0.3219883 0.89042246 0.5880523 ] [3]
[0.04382154 0.766739 0.33634686] [1]
[0.7272747 0.52438736 0.5449352 ] [8]

Does this look OK? The same applies to the earlier ones.
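The reason seeding helps: once the generator is seeded, the "random" values are reproducible, so they can be written down as expected output. A minimal sketch of that idea follows. It uses the standard library's random module as a stand-in for numpy and paddle (an assumption, made only to keep the block dependency-free), and `sample_rows` is a hypothetical helper.

```python
import random


def sample_rows(num_samples, seed):
    """Return `num_samples` (value, label) pairs drawn after seeding the generator."""
    random.seed(seed)
    return [(random.random(), random.randint(0, 8)) for _ in range(num_samples)]


first = sample_rows(3, seed=2023)
second = sample_rows(3, seed=2023)

# Same seed, same sequence: the printed values can serve as stable expected output.
print(first == second)  # True
```

Keeping the sample count small, as suggested above, also keeps the expected output short enough to read in the rendered docs.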

megemini · 2023-08-01T10:10:52Z

python/paddle/io/dataloader/dataset.py


+            >>> for i in range(len(dataset)):
+            ...     input, label = dataset[i]
+            ...     print(input, label)


megemini · 2023-08-01T10:14:45Z

python/paddle/io/dataloader/sampler.py

+            ...     print(index)
+            0
+            1
+            2
+            ...
+            99


This one is great! I'll add it to the issue description as a model example; it will also come in handy when we update the docs later.
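For reference, the abbreviated output above relies on ellipsis matching. In Python's built-in doctest this has to be enabled with the ELLIPSIS option; whether the project's xdoctest setup enables it by default is not stated in this thread. A minimal sketch, with `count_failures` as a hypothetical helper:

```python
import doctest

# The expected output abbreviates the middle lines with "...".
ABBREVIATED = '''
>>> for index in range(100):
...     print(index)
0
1
2
...
99
'''


def count_failures(text, optionflags=0):
    """Run `text` with the built-in doctest and return the number of failed examples."""
    test = doctest.DocTestParser().get_doctest(text, {}, "abbreviated", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the count matters here.
    return runner.run(test, out=lambda message: None).failed


strict_failures = count_failures(ABBREVIATED)
ellipsis_failures = count_failures(ABBREVIATED, optionflags=doctest.ELLIPSIS)
print(strict_failures, ellipsis_failures)  # 1 0
```

Note that the "..." must not be the first line of the expected output, because the built-in parser would then read it as a continuation prompt.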

python/paddle/io/dataloader/sampler.py

megemini · 2023-08-01T10:29:13Z

python/paddle/io/reader.py

+            ...         avg_loss.backward()
+            ...         opt.minimize(avg_loss)
+            ...         simple_net.clear_gradients()
+            ...         print("Epoch {} batch {}: loss = {}".format(e, i, np.mean(loss.numpy())))


This is a relatively complete case, so I think it's fine whether or not output is added.

megemini · 2023-08-01T10:30:10Z

python/paddle/io/dataloader/batch_sampler.py

+            ...     # do something
+            ...     break


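Alternatively, changing the other print spots to this form would also work well.

I've pushed a revision. All the image examples now use this strategy: most of them don't need a print, so they just carry a "do something" comment.

Expected output has been added everywhere else it was needed.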
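A minimal sketch of the "do something, then break" pattern, checked with Python's built-in doctest. The `batches` list is a hypothetical stand-in for a real BatchSampler. The loop prints nothing, so the example needs no expected output at all.

```python
import doctest

# The loop body only carries a comment and a break, so nothing is printed
# and no expected output has to be maintained for it.
NO_PRINT = '''
>>> batches = [[0, 1, 2], [3, 4, 5]]
>>> for batch_indices in batches:
...     # do something
...     break
...
>>> batch_indices
[0, 1, 2]
'''

test = doctest.DocTestParser().get_doctest(NO_PRINT, {}, "no_print", None, 0)
result = doctest.DocTestRunner(verbose=False).run(test)
print(result.failed)  # 0
```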

megemini · 2023-08-01T10:39:54Z

python/paddle/io/dataloader/dataset.py

+            ...     print(data)
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[2]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[3]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[4]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[5]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[6]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[7]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+                [[8]])


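With num_workers=2 here, the output order isn't deterministic, is it? Should we add a skip? The same goes for the next few examples. From the logs, these are exactly the ones that failed.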
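A minimal sketch of what a skip does, using the inline form of the directive that Python's built-in doctest supports. The function `collect_from_workers` is hypothetical and deliberately left undefined: a skipped example is never executed, so its expected output is documentation only. The block form used in this PR's diff (a standalone `>>> # doctest: +SKIP` line closed by `-SKIP`) is, as far as I know, an xdoctest extension and is not shown here.

```python
import doctest

SKIPPED = '''
>>> print(sorted([3, 1, 2]))
[1, 2, 3]
>>> print(collect_from_workers())  # doctest: +SKIP
[2, 6, 3, 7, 4, 8, 5]
'''

test = doctest.DocTestParser().get_doctest(SKIPPED, {}, "skipped", None, 0)
result = doctest.DocTestRunner(verbose=False).run(test)

# The skipped line is not run, so the undefined name never raises NameError.
print(result.failed)  # 0
```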

sunzhongkai588 · 2023-08-01T18:47:47Z

python/paddle/io/dataloader/worker.py

+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[2]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[3]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[4]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[5]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[6]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[7]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[8]])


I tried it, and the output here seems to be fixed?

Suggested change

-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[2]])
-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[3]])
-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[4]])
-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[5]])
-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[6]])
-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[7]])
-            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
-            [[8]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[2]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[6]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[3]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[7]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[4]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[8]])
+            Tensor(shape=[1, 1], dtype=int64, place=Place(cpu), stop_gradient=True,
+            [[5]])

That's bizarre: locally I consistently get 2345678.

The dev machine gives the same result as CI.

Let's go with the CI output.

sunzhongkai588 · 2023-08-01T19:11:26Z

python/paddle/io/dataloader/dataset.py

+            >>> for idx, v in enumerate(a_list[0]):
+            ...     print(idx, v)
+            0 8
+            1 2
+            2 5
+
+            >>> # output of the second subset
+            >>> for idx, v in enumerate(a_list[1]):
+            ...     print(idx, v)
+            0 9
+            1 6
+            2 3
+            3 4
+            4 1
+            5 0
+            6 7


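Could this be a problem with the API itself? I tried it in both the GPU and the CPU environment on AI Studio, and the two results differ (even with the seed and CPUPlace specified).

Good point, why is that?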

sunzhongkai588

LGTM

[xdoctest] reformat example code with google style in paddle/io

c5eb488

preview, test=docs_preview

e8a6b31

SigureMo mentioned this pull request Jul 27, 2023

[xdoctest] Update the existing example code in batches #55629

Closed

luotao1 assigned luotao1 and sunzhongkai588 Jul 31, 2023

sunzhongkai588 previously approved these changes Aug 1, 2023

megemini reviewed Aug 1, 2023

update example code, test=docs_preview

fbce310

SigureMo dismissed sunzhongkai588’s stale review via fbce310 August 1, 2023 13:20

sunzhongkai588 reviewed Aug 1, 2023

SigureMo added 4 commits August 2, 2023 09:57

update output, test=docs_preview

71c80e8

remove unused imports, test=docs_preview

90343f5

skip some device depends apis, test=docs_preview

8619edd

add skip reason, test=docs_preview

a89232c

sunzhongkai588 approved these changes Aug 2, 2023

luotao1 merged commit 7c9b1ab into PaddlePaddle:develop Aug 3, 2023

SigureMo deleted the xdoctest/io branch August 3, 2023 12:56

SigureMo mentioned this pull request Aug 3, 2023

[xdoctest] No.44-47 and No.50-59 doc style #55813

Merged
