Skip to content

Commit

Permalink
Update comments / docs for Spanned (apache#1549)
Browse files Browse the repository at this point in the history
Co-authored-by: Ifeanyi Ubah <[email protected]>
  • Loading branch information
alamb and iffyio authored Nov 30, 2024
1 parent f4f112d commit 4ab3ab9
Show file tree
Hide file tree
Showing 8 changed files with 306 additions and 91 deletions.
19 changes: 12 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -100,13 +100,18 @@ similar semantics are represented with the same AST. We welcome PRs to fix such
issues and distinguish different syntaxes in the AST.


## WIP: Extracting source locations from AST nodes
## Source Locations (Work in Progress)

This crate allows recovering source locations from AST nodes via the [Spanned](https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html) trait, which can be used for advanced diagnostics tooling. Note that this feature is a work in progress and many nodes report missing or inaccurate spans. Please see [this document](./docs/source_spans.md#source-span-contributing-guidelines) for information on how to contribute missing improvements.
This crate allows recovering source locations from AST nodes via the [Spanned]
trait, which can be used for advanced diagnostics tooling. Note that this
feature is a work in progress and many nodes report missing or inaccurate spans.
Please see [this ticket] for information on how to contribute missing
improvements.

```rust
use sqlparser::ast::Spanned;
[Spanned]: https://docs.rs/sqlparser/latest/sqlparser/ast/trait.Spanned.html
[this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548

```rust
// Parse SQL
let ast = Parser::parse_sql(&GenericDialect, "SELECT A FROM B").unwrap();

Expand All @@ -123,9 +128,9 @@ SQL was first standardized in 1987, and revisions of the standard have been
published regularly since. Most revisions have added significant new features to
the language, and as a result no database claims to support the full breadth of
features. This parser currently supports most of the SQL-92 syntax, plus some
syntax from newer versions that have been explicitly requested, plus some MSSQL,
PostgreSQL, and other dialect-specific syntax. Whenever possible, the [online
SQL:2016 grammar][sql-2016-grammar] is used to guide what syntax to accept.
syntax from newer versions that have been explicitly requested, plus various
other dialect-specific syntax. Whenever possible, the [online SQL:2016
grammar][sql-2016-grammar] is used to guide what syntax to accept.

Unfortunately, stating anything more specific about compliance is difficult.
There is no publicly available test suite that can assess compliance
Expand Down
52 changes: 0 additions & 52 deletions docs/source_spans.md

This file was deleted.

64 changes: 59 additions & 5 deletions src/ast/helpers/attached_token.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,25 +19,73 @@ use core::cmp::{Eq, Ord, Ordering, PartialEq, PartialOrd};
use core::fmt::{self, Debug, Formatter};
use core::hash::{Hash, Hasher};

use crate::tokenizer::{Token, TokenWithSpan};
use crate::tokenizer::TokenWithSpan;

#[cfg(feature = "serde")]
use serde::{Deserialize, Serialize};

#[cfg(feature = "visitor")]
use sqlparser_derive::{Visit, VisitMut};

/// A wrapper type for attaching tokens to AST nodes that should be ignored in comparisons and hashing.
/// This should be used when a token is not relevant for semantics, but is still needed for
/// accurate source location tracking.
/// A wrapper over [`TokenWithSpan`]s that ignores the token and source
/// location in comparisons and hashing.
///
/// This type is used when the token and location is not relevant for semantics,
/// but is still needed for accurate source location tracking, for example, in
/// the nodes in the [ast](crate::ast) module.
///
/// Note: **All** `AttachedTokens` are equal.
///
/// # Examples
///
/// Same token, different location are equal
/// ```
/// # use sqlparser::ast::helpers::attached_token::AttachedToken;
/// # use sqlparser::tokenizer::{Location, Span, Token, TokenWithLocation};
/// // commas @ line 1, column 10
/// let tok1 = TokenWithLocation::new(
/// Token::Comma,
/// Span::new(Location::new(1, 10), Location::new(1, 11)),
/// );
/// // commas @ line 2, column 20
/// let tok2 = TokenWithLocation::new(
/// Token::Comma,
/// Span::new(Location::new(2, 20), Location::new(2, 21)),
/// );
///
/// assert_ne!(tok1, tok2); // token with locations are *not* equal
/// assert_eq!(AttachedToken(tok1), AttachedToken(tok2)); // attached tokens are
/// ```
///
/// Different token, different location are equal 🤯
///
/// ```
/// # use sqlparser::ast::helpers::attached_token::AttachedToken;
/// # use sqlparser::tokenizer::{Location, Span, Token, TokenWithLocation};
/// // commas @ line 1, column 10
/// let tok1 = TokenWithLocation::new(
/// Token::Comma,
/// Span::new(Location::new(1, 10), Location::new(1, 11)),
/// );
/// // period @ line 2, column 20
/// let tok2 = TokenWithLocation::new(
/// Token::Period,
/// Span::new(Location::new(2, 10), Location::new(2, 21)),
/// );
///
/// assert_ne!(tok1, tok2); // token with locations are *not* equal
/// assert_eq!(AttachedToken(tok1), AttachedToken(tok2)); // attached tokens are
/// ```
/// // period @ line 2, column 20
#[derive(Clone)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct AttachedToken(pub TokenWithSpan);

impl AttachedToken {
/// Return a new Empty AttachedToken
pub fn empty() -> Self {
AttachedToken(TokenWithSpan::wrap(Token::EOF))
AttachedToken(TokenWithSpan::new_eof())
}
}

Expand Down Expand Up @@ -80,3 +128,9 @@ impl From<TokenWithSpan> for AttachedToken {
AttachedToken(value)
}
}

impl From<AttachedToken> for TokenWithSpan {
fn from(value: AttachedToken) -> Self {
value.0
}
}
16 changes: 14 additions & 2 deletions src/ast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -596,9 +596,21 @@ pub enum CeilFloorKind {

/// An SQL expression of any type.
///
/// # Semantics / Type Checking
///
/// The parser does not distinguish between expressions of different types
/// (e.g. boolean vs string), so the caller must handle expressions of
/// inappropriate type, like `WHERE 1` or `SELECT 1=1`, as necessary.
/// (e.g. boolean vs string). The caller is responsible for detecting and
/// validating types as necessary (for example `WHERE 1` vs `SELECT 1=1`)
/// See the [README.md] for more details.
///
/// [README.md]: https://github.com/apache/datafusion-sqlparser-rs/blob/main/README.md#syntax-vs-semantics
///
/// # Equality and Hashing Does not Include Source Locations
///
/// The `Expr` type implements `PartialEq` and `Eq` based on the semantic value
/// of the expression (not bitwise comparison). This means that `Expr` instances
/// that are semantically equivalent but have different spans (locations in the
/// source tree) will compare as equal.
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(
Expand Down
5 changes: 3 additions & 2 deletions src/ast/query.rs
Original file line number Diff line number Diff line change
Expand Up @@ -282,6 +282,7 @@ impl fmt::Display for Table {
pub struct Select {
/// Token for the `SELECT` keyword
pub select_token: AttachedToken,
/// `SELECT [DISTINCT] ...`
pub distinct: Option<Distinct>,
/// MSSQL syntax: `TOP (<N>) [ PERCENT ] [ WITH TIES ]`
pub top: Option<Top>,
Expand Down Expand Up @@ -511,7 +512,7 @@ impl fmt::Display for NamedWindowDefinition {
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct With {
// Token for the "WITH" keyword
/// Token for the "WITH" keyword
pub with_token: AttachedToken,
pub recursive: bool,
pub cte_tables: Vec<Cte>,
Expand Down Expand Up @@ -564,7 +565,7 @@ pub struct Cte {
pub query: Box<Query>,
pub from: Option<Ident>,
pub materialized: Option<CteAsMaterialized>,
// Token for the closing parenthesis
/// Token for the closing parenthesis
pub closing_paren_token: AttachedToken,
}

Expand Down
50 changes: 40 additions & 10 deletions src/ast/spans.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,21 +21,51 @@ use super::{

/// Given an iterator of spans, return the [Span::union] of all spans.
fn union_spans<I: Iterator<Item = Span>>(iter: I) -> Span {
iter.reduce(|acc, item| acc.union(&item))
.unwrap_or(Span::empty())
Span::union_iter(iter)
}

/// A trait for AST nodes that have a source span for use in diagnostics.
/// Trait for AST nodes that have a source location information.
///
/// Source spans are not guaranteed to be entirely accurate. They may
/// be missing keywords or other tokens. Some nodes may not have a computable
/// span at all, in which case they return [`Span::empty()`].
/// # Notes:
///
/// Source [`Span`] are not yet complete. They may be missing:
///
/// 1. keywords or other tokens
/// 2. span information entirely, in which case they return [`Span::empty()`].
///
/// Note Some impl blocks (rendered below) are annotated with which nodes are
/// missing spans. See [this ticket] for additional information and status.
///
/// [this ticket]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
///
/// # Example
/// ```
/// # use sqlparser::parser::{Parser, ParserError};
/// # use sqlparser::ast::Spanned;
/// # use sqlparser::dialect::GenericDialect;
/// # use sqlparser::tokenizer::Location;
/// # fn main() -> Result<(), ParserError> {
/// let dialect = GenericDialect {};
/// let sql = r#"SELECT *
/// FROM table_1"#;
/// let statements = Parser::new(&dialect)
/// .try_with_sql(sql)?
/// .parse_statements()?;
/// // Get the span of the first statement (SELECT)
/// let span = statements[0].span();
/// // statement starts at line 1, column 1 (1 based, not 0 based)
/// assert_eq!(span.start, Location::new(1, 1));
/// // statement ends on line 2, column 15
/// assert_eq!(span.end, Location::new(2, 15));
/// # Ok(())
/// # }
/// ```
///
/// Some impl blocks may contain doc comments with information
/// on which nodes are missing spans.
pub trait Spanned {
/// Compute the source span for this AST node, by recursively
/// combining the spans of its children.
/// Return the [`Span`] (the minimum and maximum [`Location`]) for this AST
/// node, by recursively combining the spans of its children.
///
/// [`Location`]: crate::tokenizer::Location
fn span(&self) -> Span;
}

Expand Down
59 changes: 58 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,9 @@
//! 1. [`Parser::parse_sql`] and [`Parser::new`] for the Parsing API
//! 2. [`ast`] for the AST structure
//! 3. [`Dialect`] for supported SQL dialects
//! 4. [`Spanned`] for source text locations (see "Source Spans" below for details)
//!
//! [`Spanned`]: ast::Spanned
//!
//! # Example parsing SQL text
//!
Expand Down Expand Up @@ -61,13 +64,67 @@
//! // The original SQL text can be generated from the AST
//! assert_eq!(ast[0].to_string(), sql);
//! ```
//!
//! [sqlparser crates.io page]: https://crates.io/crates/sqlparser
//! [`Parser::parse_sql`]: crate::parser::Parser::parse_sql
//! [`Parser::new`]: crate::parser::Parser::new
//! [`AST`]: crate::ast
//! [`ast`]: crate::ast
//! [`Dialect`]: crate::dialect::Dialect
//!
//! # Source Spans
//!
//! Starting with version `0.53.0` sqlparser introduced source spans to the
//! AST. This feature provides source information for syntax errors, enabling
//! better error messages. See [issue #1548] for more information and the
//! [`Spanned`] trait to access the spans.
//!
//! [issue #1548]: https://github.com/apache/datafusion-sqlparser-rs/issues/1548
//! [`Spanned`]: ast::Spanned
//!
//! ## Migration Guide
//!
//! For the next few releases, we will be incrementally adding source spans to the
//! AST nodes, trying to minimize the impact on existing users. Some breaking
//! changes are inevitable, and the following is a summary of the changes:
//!
//! #### New fields for spans (must be added to any existing pattern matches)
//!
//! The primary change is that new fields will be added to AST nodes to store the source `Span` or `TokenWithLocation`.
//!
//! This will require
//! 1. Adding new fields to existing pattern matches.
//! 2. Filling in the proper span information when constructing AST nodes.
//!
//! For example, since `Ident` now stores a `Span`, to construct an `Ident` you
//! must provide now provide one:
//!
//! Previously:
//! ```text
//! # use sqlparser::ast::Ident;
//! Ident {
//! value: "name".into(),
//! quote_style: None,
//! }
//! ```
//! Now
//! ```rust
//! # use sqlparser::ast::Ident;
//! # use sqlparser::tokenizer::Span;
//! Ident {
//! value: "name".into(),
//! quote_style: None,
//! span: Span::empty(),
//! };
//! ```
//!
//! Similarly, when pattern matching on `Ident`, you must now account for the
//! `span` field.
//!
//! #### Misc.
//! - [`TokenWithLocation`] stores a full `Span`, rather than just a source location.
//! Users relying on `token.location` should use `token.location.start` instead.
//!
//![`TokenWithLocation`]: tokenizer::TokenWithLocation
#![cfg_attr(not(feature = "std"), no_std)]
#![allow(clippy::upper_case_acronyms)]
Expand Down
Loading

0 comments on commit 4ab3ab9

Please sign in to comment.