[xdoctest] reformat example code with google style in paddle/jit #55645

Merged
merged 12 commits into from
Jul 31, 2023

Conversation

SigureMo
Member

@SigureMo SigureMo commented Jul 24, 2023

PR types

Others

PR changes

Others

Description

Update the example code in the following files to the new format and make it pass the xdoctest check:

  • python/paddle/jit/api.py
  • python/paddle/jit/dy2static/convert_call_func.py
  • python/paddle/jit/dy2static/logging_utils.py
  • python/paddle/jit/dy2static/program_translator.py
  • python/paddle/jit/translated_layer.py

Preview:

@sunzhongkai588 @SigureMo @megemini

PCard-66962

@SigureMo
Member Author

Some example code relies on source that cannot be retrieved in an interactive environment. A very simple example:

import inspect

def foo(): ...
inspect.getsource(foo)

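The source cannot be retrieved in an interactive environment, so this raises an error. How should such cases be skipped? Mark the whole block with # doctest: +SKIP? @megemini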
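The failure described above can be reproduced outside a REPL. The sketch below is only a stand-in for the interactive case: it compiles a function from a string under a made-up filename (`<hypothetical-repl>`), so no file on disk backs the code object, and then shows that `inspect.getsource` raises `OSError`.

```python
import inspect

# Source that never lives in a file on disk, similar to code typed into a REPL.
# The filename "<hypothetical-repl>" is made up for this illustration.
SOURCE = "def foo(): ...\n"

namespace = {}
exec(compile(SOURCE, "<hypothetical-repl>", "exec"), namespace)
foo = namespace["foo"]

try:
    inspect.getsource(foo)
    error = None
except OSError as exc:
    # inspect cannot map the code object back to any source text.
    error = exc

print(type(error).__name__ if error else "source was available")
```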

@SigureMo
Member Author

SigureMo commented Jul 24, 2023

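@megemini I also found another problem.

Non-contiguous ... lines are swallowed, which ends with an error saying __len__ cannot be found.

Is this a known xdoctest issue? So I need to add ... on the blank lines, right?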

@megemini
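Contributor

The source cannot be retrieved in an interactive environment, so this raises an error. How should such cases be skipped? Mark the whole block with # doctest: +SKIP?

Yes! Anything that cannot be executed can simply be skipped.

If later code in the same block still needs to run, use # doctest: -SKIP to cancel the skip directive.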
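As an illustration of the advice above, here is a minimal sketch of a docstring that skips only the part that cannot run and then re-enables checking. The function name `show_source_example` is hypothetical, and the behaviour of the standalone directive lines is assumed to be the block-level behaviour described in this thread.

```python
def show_source_example():
    """
    Examples:
        >>> import inspect
        >>> def foo(): ...
        >>> # doctest: +SKIP
        >>> inspect.getsource(foo)  # source is unavailable in this environment
        >>> # doctest: -SKIP
        >>> print(foo.__name__)
        foo
    """


# The directives are plain comments inside the docstring text.
doc = show_source_example.__doc__
print("+SKIP" in doc, "-SKIP" in doc)
```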

@megemini
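Contributor

megemini commented Jul 24, 2023

Is this a known xdoctest issue? So I need to add ... on the blank lines, right?

A non-contiguous ... marks the end of a code segment, and a segment can only start with >>>, so everything after a blank line inside a ... block gets cut off.

You can add the ... manually or remove the blank line.

I'll change convert_doctest so that it adds ... on blank lines.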
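The fix described above (putting `...` on blank lines inside a continued statement) can be sketched as follows. This is not the actual convert_doctest code; `fill_blank_continuations` is a hypothetical helper, and it assumes that a blank line sitting between a `>>>`/`...` line and a following `...` line belongs to the same statement. It does not try to distinguish a `...` continuation from a literal `...` in expected output.

```python
def fill_blank_continuations(docstring: str) -> str:
    """Replace blank lines inside a >>> / ... block with a bare '...' line."""
    lines = docstring.splitlines()
    out = []
    for i, line in enumerate(lines):
        if line.strip():
            out.append(line)
            continue
        # Nearest non-blank neighbours on both sides of the blank line.
        prev = next((l for l in reversed(out) if l.strip()), "")
        nxt = next((l for l in lines[i + 1:] if l.strip()), "")
        in_block = prev.lstrip().startswith((">>>", "..."))
        continues = nxt.lstrip().startswith("...")
        if in_block and continues:
            # Reuse the indentation of the continuation line that follows.
            indent = nxt[: len(nxt) - len(nxt.lstrip())]
            out.append(indent + "...")
        else:
            out.append(line)
    return "\n".join(out)


sample = ">>> class A:\n...     x = 1\n\n...     y = 2\n\n>>> print(A.y)"
fixed = fill_blank_continuations(sample)
print(fixed)
```

Only the blank line between the two `...` lines is filled; the blank line before the next `>>>` statement is left alone.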

@megemini
Contributor

#55629 adds steps for local verification; you can try them out as well.

@SigureMo
Member Author

#55629 adds steps for local verification; you can try them out as well.

That works! Otherwise, going back and forth with CI is never quick enough.

@SigureMo
Member Author

SigureMo commented Jul 24, 2023

#55629 adds steps for local verification; you can try them out as well.

Strangely, though, running it locally keeps failing with this error:

      File "/Users/.../Projects/Paddle/python/paddle/fluid/__init__.py", line 36, in <module>
        from . import framework
    
      File "/Users/.../Projects/Paddle/python/paddle/fluid/framework.py", line 33, in <module>
        from .proto import framework_pb2, data_feed_pb2
    
    ModuleNotFoundError: No module named 'python.paddle.fluid.proto'
    
    
    During handling of the above exception, another exception occurred:
    
    
    Traceback (most recent call last):
    
      File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/doctest_example.py", line 749, in run
        self._import_module()
    
      File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/doctest_example.py", line 564, in _import_module
        self.module = utils.import_module_from_path(self.modpath, index=-1)
    
      File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/utils/util_import.py", line 355, in import_module_from_path
        module = _custom_import_modpath(modpath, index=index)
    
      File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/utils/util_import.py", line 200, in _custom_import_modpath
        raise RuntimeError('\n'.join(msg_parts))
    
    RuntimeError: ERROR: Failed to import modname=python.paddle.jit.api with modpath=python/paddle/jit/api.py
    Caused by: ModuleNotFoundError("No module named 'python.paddle.fluid.proto'")

It looks like python/paddle/fluid/framework.py uses a relative import of the auto-generated proto module, which does not exist in the source tree.

To reproduce:

xdoctest --global-exec "import paddle\npaddle.device.set_device('cpu')" python/paddle/jit/api.py

@megemini
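Contributor

xdoctest --global-exec "import paddle\npaddle.device.set_device('cpu')" python/paddle/jit/api.py

Testing a file inside the project directly will produce exactly this error!

This is also why xdoctest cannot be run directly against the whole paddle project: its module lookup works differently from ours.

That is why I suggested in the description to wrap the docstring in an empty function in a separate, otherwise empty file and test that instead.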
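The workaround above (wrapping a docstring in an empty function in a separate file) can be sketched like this. `wrap_docstring` and the function name `wrapped_example` are hypothetical, and the sketch assumes the docstring contains no triple double quotes.

```python
import textwrap


def wrap_docstring(docstring: str, func_name: str = "wrapped_example") -> str:
    """Return source for an empty function whose only content is `docstring`.

    Assumption: the docstring contains no triple double quotes.
    """
    body = textwrap.indent(textwrap.dedent(docstring).strip("\n"), "    ")
    return f'def {func_name}():\n    r"""\n{body}\n    """\n'


example = """
    >>> print(1 + 1)
    2
"""
source = wrap_docstring(example)
print(source)
```

Writing `source` to a scratch file outside the package tree gives xdoctest a module it can import without resolving `python.paddle...`, in the same way as the test.py reproduction in this thread.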

@SigureMo
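Member Author

That is why I suggested in the description to wrap the docstring in an empty function in a separate, otherwise empty file and test that instead.

OK, I'll try that later. By the way, does xdoctest have an option to prevent this behavior?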

@megemini
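Contributor

megemini commented Jul 25, 2023

OK, I'll try that later. By the way, does xdoctest have an option to prevent this behavior?

Probably not. Running xdoctest from the command line always goes looking for the module.

However, the Xdoctester we built earlier can do it: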

In [31]: from sampcd_processor_xdoctest import Xdoctester

In [32]: dt = Xdoctester()

In [33]: with open('/paddle/python/paddle/distribution/bernoulli.py') as f:
    ...:     codes = ''.join(f.readlines())
    ...:     

In [34]: codes = dt.convert_directive(codes)

In [35]: dt.run('bernoulli', codes)
====== <exec> ======
* DOCTEST : <modpath?>::bernoulli:0, line 75 <- wrt source file
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
       0.30000001)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
       0.21000001)
Tensor(shape=[], dtype=float32, place=Place(cpu), stop_gradient=True,
       0.61086434)
[100, 1]
[100]
[100, 2]
[100, 2, 2]
[100, 1]
[100]
[100, 2]
[100, 2, 2]
DOCTEST RESULT
* SUCCESS: <modpath?>::bernoulli:0
====== </exec> ======
Out[35]: [TestResult(name='<DocTest(<modname?> bernoulli:0 ln 75)>', nocode=False, passed=True, skipped=False, failed=False, time=0.004740715026855469, test_msg=None, extra_info=None)]

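Checking a whole file this way has one problem: the entire Python file is treated as a single docstring, so if there is a # doctest: +SKIP somewhere with no # doctest: -SKIP after it, the code that follows cannot be checked.

So, if the file has no skip directives, the method above should work. Alternatively, replace codes above with each individual docstring.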
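To check each docstring separately, so that a `+SKIP` in one docstring cannot leak into the next, the file can be split with the standard `ast` module first. This is a minimal sketch; `iter_docstrings` is a hypothetical helper and not part of Xdoctester.

```python
import ast


def iter_docstrings(source: str):
    """Yield (name, docstring) for the module and every class or function."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef,
                             ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                yield getattr(node, "name", "<module>"), doc


sample = '''
def a():
    """
    >>> # doctest: +SKIP
    >>> 1 / 0
    """

def b():
    """
    >>> print("checked")
    checked
    """
'''
docs = dict(iter_docstrings(sample))
for name, doc in docs.items():
    print(name, "->", doc.splitlines()[0])
```

Each `(name, doc)` pair could then be passed to a runner one at a time, for example in place of the whole-file `codes` string in the session above.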

@SigureMo
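Member Author

This PR updates everything under jit. Unfortunately, AST-based dynamic-to-static conversion needs inspect.getsource to obtain the source, and the xdoctest execution environment, like a REPL, cannot provide it, so all the related cases are skipped.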

@SigureMo
Member Author

@megemini I found that some blank lines make xdoctest raise an error, for example:

# test.py
def foo():
    """
    >>> import numpy as np
    >>> import paddle
    >>> from paddle import nn
    >>> import paddle.optimizer as opt

    >>> BATCH_SIZE = 16
    >>> BATCH_NUM = 4
    >>> EPOCH_NUM = 4

    >>> IMAGE_SIZE = 784
    >>> CLASS_NUM = 10

    >>> # define a random dataset
    >>> class RandomDataset(paddle.io.Dataset):
    ...     def __init__(self, num_samples):
    ...         self.num_samples = num_samples
    ...
    ...     def __getitem__(self, idx):
    ...         image = np.random.random([IMAGE_SIZE]).astype('float32')
    ...         label = np.random.randint(0, CLASS_NUM - 1, (1, )).astype('int64')
    ...         return image, label
    ...
    ...     def __len__(self):
    ...         return self.num_samples
    ...
    >>> class LinearNet(nn.Layer):
    ...     def __init__(self):
    ...         super().__init__()
    ...         self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)
    ...
    ...     @paddle.jit.to_static
    ...     def forward(self, x):
    ...         return self._linear(x)
    ...
    >>> def train(layer, loader, loss_fn, opt):
    ...     for epoch_id in range(EPOCH_NUM):
    ...         for batch_id, (image, label) in enumerate(loader()):
    ...             out = layer(image)
    ...             loss = loss_fn(out, label)
    ...             loss.backward()
    ...             opt.step()
    ...             opt.clear_grad()
    ...             print("Epoch {} batch {}: loss = {}".format(
    ...                 epoch_id, batch_id, np.mean(loss.numpy())))
    ...
    >>> # create network
    >>> layer = LinearNet()
    >>> loss_fn = nn.CrossEntropyLoss()
    >>> adam = opt.Adam(learning_rate=0.001, parameters=layer.parameters())

    >>> # create data loader
    >>> dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
    >>> loader = paddle.io.DataLoader(dataset,
    ...     batch_size=BATCH_SIZE,
    ...     shuffle=True,
    ...     drop_last=True,
    ...     num_workers=2)
    ...
    >>> # train
    >>> train(layer, loader, loss_fn, adam)

    >>> # save
    >>> model_path = "linear.example.model"
    >>> paddle.jit.save(layer, model_path)

    >>> # load
    >>> translated_layer = paddle.jit.load(model_path)

    >>> # get program
    >>> program = translated_layer.program()
    """

xdoctest --global-exec "import paddle\npaddle.device.set_device('cpu')" test.py

This fails with:

CommandLine:
    python -m xdoctest test.py foo:0
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/bin/xdoctest", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/__main__.py", line 167, in main
    run_summary = xdoctest.doctest_module(modname, argv=[], style=style,
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/runner.py", line 322, in doctest_module
    run_summary = _run_examples(enabled_examples, verbose, config,
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/runner.py", line 644, in _run_examples
    summary = example.run(verbose=verbose, on_error=on_error)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/doctest_example.py", line 1053, in run
    summary = self._post_run(verbose)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/doctest_example.py", line 1488, in _post_run
    lines = self.repr_failure()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/doctest_example.py", line 1418, in repr_failure
    new_tblines = _alter_traceback_linenos(self, tblines)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/paddle/lib/python3.10/site-packages/xdoctest/doctest_example.py", line 1408, in _alter_traceback_linenos
    failed_ctx = self.failed_part.orig_lines[tb_lineno - 1]
IndexError: list index out of range

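Removing the blank line above # create data loader makes this particular error go away (the example code itself still fails). But in many cases a blank line really is the better choice, and PyTorch has plenty of examples like this. Does this look like an xdoctest bug?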

@megemini
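Contributor

Does this look like an xdoctest bug?

It should be.

I looked into xdoctest. The flow above is:

  • Run the example <- ours
  • An exception is raised: raise OSError('could not get source code') <- ours
  • IndexError: list index out of range is raised while xdoctest's repr_failure renders the exception <- xdoctest's

If either of the first two steps goes wrong, the problem is on our side. repr_failure is the method xdoctest uses to render an exception for output.

It records the line number of the exception in failed_tb_lineno, and repr_failure then applies a long chain of offset additions to it.

The exact cause can be tracked down when there is time, but it is fairly safe to say this is an xdoctest bug... 😮‍💨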

@SigureMo SigureMo changed the title [xdoctest] reformat example code for paddle.jit.api [xdoctest] reformat example code for paddle/jit Jul 26, 2023
@SigureMo SigureMo changed the title [xdoctest] reformat example code for paddle/jit [xdoctest] reformat example code with google style in paddle/jit Jul 26, 2023
@SigureMo
Member Author

@megemini @sunzhongkai588 This PR is ready for review. For CI, only PR-CI-Static-Check and PR-CI-APPROVAL need attention; the others should pass unless something unexpected happens.

@megemini
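Contributor

megemini commented Jul 26, 2023

Besides the review comments above, the PR-CI-Static-Check log shows that no API was extracted from the following files:

  • convert_call_func.py
  • program_translator.py
  • api.py :: TracedLayer

That said, when xdoctest was introduced, everything before the API doc extraction step was inherited from the original sampcd_processor.py. Has it always been handled this way?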

2023-07-26 13:52:29 API check -- Example Code
2023-07-26 13:52:29 sample_test running under python 3.7.0
2023-07-26 13:52:31 [2023-07-26 05:52:31,307] [    INFO] dygraph_sharding_optimizer.py:27 - g_shard_use_reduce 0
2023-07-26 13:52:31 [2023-07-26 05:52:31,308] [    INFO] dygraph_sharding_optimizer.py:29 - g_shard_norm_align_dp 1
2023-07-26 13:52:31 [2023-07-26 05:52:31,308] [    INFO] hybrid_parallel_optimizer.py:43 - g_shard_norm_align_dp 1
2023-07-26 13:52:31 [2023-07-26 05:52:31,315] [    INFO] pipeline_parallel.py:48 - g_shard_use_reduce 0
2023-07-26 13:52:31 paddle.jit.TranslatedLayer in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.TranslatedLayer.program in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.enable_to_static in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.ignore_module in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.load in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.not_to_static in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.save in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.set_code_level in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.set_verbosity in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.to_static in dev is *******, different from pr's *******
2023-07-26 13:52:31 paddle.jit.TranslatedLayer' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.TranslatedLayer.program' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.enable_to_static' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.ignore_module' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.load' code block (name:code-example1, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.load' code block (name:code-example2, id:2) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.not_to_static' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.save' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.set_code_level' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.set_verbosity' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 paddle.jit.to_static' code block (name:None, id:1) is wrapped by PS1(>>> ), which will be tested by xdoctest.
2023-07-26 13:52:31 -----API_PR.spec is the same as API_DEV.spec-----

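Also, private methods are not extracted. That comes from how sampcd_processor.py extracts docs. Private methods are not exposed, so it does not matter much; it just feels awkward.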

@SigureMo
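Member Author

Besides the review comments above, the PR-CI-Static-Check log shows that no API was extracted from the following files:

Which review comments above? Were they perhaps never submitted and are still pending?

That said, when xdoctest was introduced, everything before the API doc extraction step was inherited from the original sampcd_processor.py. Has it always been handled this way?

I previously read part of https://github.com/PaddlePaddle/docs/blob/develop/docs/api/gen_doc.py. Its logic treats whatever is listed in __all__ as the exposed API, for example paddle.jit.__all__: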

__all__ = [ # noqa
'save',
'load',
'to_static',
'ignore_module',
'TranslatedLayer',
'set_code_level',
'set_verbosity',
'not_to_static',
'enable_to_static',
]

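So paddle.jit.save, paddle.jit.load, and so on are exposed; other APIs are not exposed and do not appear in the docs.

no API was extracted from the following files:

These were not extracted because they are not public APIs. Seen this way, example-code extraction follows the same logic as public-API extraction: both are based on __all__. If that has always been the logic, there is no problem.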
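The `__all__`-based rule described above can be sketched in a few lines. This is not the gen_doc.py or sampcd_processor.py implementation; `public_api_names` and the `fake_jit` module are hypothetical stand-ins used only to show the idea.

```python
import types


def public_api_names(module):
    """List fully qualified names a module exposes through __all__."""
    return [f"{module.__name__}.{name}" for name in getattr(module, "__all__", [])]


# A stand-in module: TracedLayer is defined but not listed in __all__.
fake_jit = types.ModuleType("fake_jit")
fake_jit.__all__ = ["save", "load"]
fake_jit.save = lambda: None
fake_jit.load = lambda: None
fake_jit.TracedLayer = type("TracedLayer", (), {})

names = public_api_names(fake_jit)
print(names)
```

Under this rule, anything missing from `__all__` (here `TracedLayer`) is neither documented nor picked up for example-code checking.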

Comment on lines 1453 to 1463
>>> # create data loader
>>> dataset = RandomDataset(BATCH_NUM * BATCH_SIZE)
>>> loader = paddle.io.DataLoader(dataset,
>>> feed_list=[image, label],
>>> places=place,
>>> batch_size=BATCH_SIZE,
>>> shuffle=True,
>>> drop_last=True,
>>> return_list=False,
>>> num_workers=2)

Contributor

This was generated with the old version of convert_doctest, right?

The new version has already changed this.

The indentation of the whole .. code-block:: python block is aligned as well.

Comment on lines 872 to 874
... self._linear = nn.Linear(IMAGE_SIZE, CLASS_NUM)

... @paddle.jit.to_static
Contributor

Missing ...

Comment on lines 1492 to 1499
>>> adam = opt.Adam(learning_rate=0.001, parameters=fc.parameters())
>>> loader = paddle.io.DataLoader(dataset,
>>> places=place,
>>> batch_size=BATCH_SIZE,
>>> shuffle=True,
>>> drop_last=True,
>>> num_workers=2)
>>> for epoch_id in range(EPOCH_NUM):
Contributor

Same as above.

Comment on lines +1625 to +1628
>>> print([i.name for i in inputs])
>>> # [u'generated_tensor_0'] the feed input Tensor name representing x
>>> print([o.name for o in outputs])
>>> # [u'_generated_var_4'] the fetch output Tensor name representing x_v
Contributor

What is the output supposed to be here?! Running it directly in the paddle 2.5 docker image raises an error for me.

Member Author

ProgramTranslator is a deprecated API and will not be made public.

@megemini
Contributor

lgtm :)

Contributor

@sunzhongkai588 sunzhongkai588 left a comment

LGTM
Just one question: are the examples whose first line is blank all cases of ( >>> # doctest: +SKIP)?

@SigureMo
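Member Author

Just one question: are the examples whose first line is blank all cases of ( >>> # doctest: +SKIP)?

Hmm, for lines like this, does PyTorch also remove only the # doctest: +SKIP and keep the >>>? It looks a bit awkward 😂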

@SigureMo
Member Author

https://pytorch.org/docs/stable/generated/torch.einsum.html?highlight=einsum#torch.einsum

It looks like torch removes the whole line. How do they manage that?

@megemini
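Contributor

Just one question: are the examples whose first line is blank all cases of ( >>> # doctest: +SKIP)?

Yes...

Hmm, for lines like this, does PyTorch also remove only the # doctest: +SKIP and keep the >>>? It looks a bit awkward 😂

PyTorch removes it.

Let's ask the website developers for help :)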

@SigureMo
Member Author

Let's ask the website developers for help :)

Ah... if we file yet another request, I feel the website team will want to hit us...

@megemini
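Contributor

It looks like torch removes the whole line. How do they manage that?

Just detect lines that contain only >>> and delete them, I suppose. I would think this kind of Python format is not hard for the frontend to handle 🫣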

@SigureMo
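Member Author

Just detect lines that contain only >>> and delete them, I suppose. I would think this kind of Python format is not hard for the frontend to handle 🫣

The frontend receives the rendered HTML, right? Having the frontend delete DOM nodes does not seem like a good approach to me.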

@SigureMo
Member Author

SigureMo commented Jul 28, 2023

I looked at how torch handles it; it seems to be done here:

https://github.com/pytorch/pytorch/blob/4fe407ad7387763aacbf3f168659198acdaceb31/docs/source/conf.py#L545-L586

Could we use autodoc-process-docstring to drop this line?

UPDATE:

It works in the Sphinx environment I set up locally.

[screenshots: source, before, after]

@megemini
Contributor

@SigureMo
Member Author

How about opening a PR to try it?

Sure.

@megemini
Contributor

PaddlePaddle/docs#6072

@luotao1 luotao1 merged commit 0c3b369 into PaddlePaddle:develop Jul 31, 2023
@SigureMo SigureMo deleted the xdoctest/jit-api branch July 31, 2023 06:45