StringFormat Explained
StringFormat is an easy-to-use string extraction library.
Without further explanation, see if you can guess what the following code does:
Optional<LogFile> log =
    new StringFormat("/home/{usr}/log/{year}/{month}/{day}/job-{shard_id}.log")
        .parse(
            logFileName,
            (usr, year, month, day, shardId) ->
                LogFile.builder()
                    .setUser(usr)
                    .setDate(parseInt(year), parseInt(month), parseInt(day))
                    .setShard(shardId)
                    .build());
Yeah, just trust your intuition: it does exactly what it looks like it does!
(Starting from v7.0, there is a convenient parseOrThrow() method that throws if the input can't be parsed, with a reasonably informative error message.)
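To make the idea concrete, here is a minimal sketch of how a template with `{placeholder}` names could be matched against an input using only `java.util.regex`. It is not how StringFormat is implemented: the class `TemplateParser` and its `parse()` method are hypothetical names used for illustration, and the sketch returns the captured values as a list instead of passing them to a lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the library. */
final class TemplateParser {
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\w+\\}");
  private final Pattern pattern;

  TemplateParser(String template) {
    StringBuilder regex = new StringBuilder();
    Matcher m = PLACEHOLDER.matcher(template);
    int end = 0;
    while (m.find()) {
      // Literal text between placeholders must match exactly.
      regex.append(Pattern.quote(template.substring(end, m.start())));
      // Each placeholder captures as few characters as possible.
      regex.append("(.*?)");
      end = m.end();
    }
    regex.append(Pattern.quote(template.substring(end)));
    this.pattern = Pattern.compile(regex.toString(), Pattern.DOTALL);
  }

  /** Returns the placeholder values in order, or empty if the whole input doesn't match. */
  Optional<List<String>> parse(String input) {
    Matcher m = pattern.matcher(input);
    if (!m.matches()) {
      return Optional.empty();
    }
    List<String> values = new ArrayList<>();
    for (int i = 1; i <= m.groupCount(); i++) {
      values.add(m.group(i));
    }
    return Optional.of(values);
  }
}
```

With the log path template above, parsing `/home/alice/log/2023/10/05/job-7.log` would yield the values `alice`, `2023`, `10`, `05` and `7` in placeholder order, while an input that doesn't fit the template yields an empty `Optional`.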
Sometimes you need to search the input string for a sub-pattern that may occur zero, one, or multiple times. You can use the scan() method for these use cases. For example, if the input string contains multiple breakpoint specs:
List<Breakpoint> breakpoints =
    new StringFormat("breakpoint: {line={line}, color={color}}")
        .scan(inputString, Breakpoint::new)
        .collect(toList());
Both the parse() and scan() methods have overloads that support from 1 to 8 placeholders.
You can also post-filter to ignore matches that don't satisfy a post-condition. For example, if you want to ignore invalid breakpoint specs, just return null for the invalid matches:
List<Breakpoint> breakpoints =
    new StringFormat("breakpoint: {line={line}, color={color}}")
        .scan(
            inputString,
            (line, color) ->
                isNumeric(line) && isValidColor(color) ? new Breakpoint(line, color) : null)
        .collect(toList());
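The scan-and-filter behavior described above can be approximated with a plain `Matcher.find()` loop. The following is a sketch under stated assumptions, not the library's implementation: `BreakpointScanner` is a hypothetical class, and the regex is a hand-written reading of the breakpoint template in which `{line}` and `{color}` are the only placeholders and the outer braces are literal text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the library. */
final class BreakpointScanner {
  // Hand-written regex for "breakpoint: {line={line}, color={color}}" (assumed reading).
  private static final Pattern BREAKPOINT =
      Pattern.compile("breakpoint: \\{line=(.*?), color=(.*?)\\}");

  /** Applies mapper to every occurrence in input; occurrences mapped to null are skipped. */
  static <T> List<T> scan(String input, BiFunction<String, String, T> mapper) {
    List<T> results = new ArrayList<>();
    Matcher m = BREAKPOINT.matcher(input);
    while (m.find()) {
      T mapped = mapper.apply(m.group(1), m.group(2));
      if (mapped != null) {
        results.add(mapped);
      }
    }
    return results;
  }
}
```

Text between occurrences is ignored, so the input may contain zero, one, or many specs. Returning null from the mapper plays the role of the post-filter.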
If you use Bazel as your build tool, the compile-time checks are provided out of the box.
If you use Maven, we strongly recommend adding both ErrorProne and the mug-errorprone plugin to your annotationProcessor paths. For example:
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <annotationProcessorPaths>
            <path>
              <groupId>com.google.errorprone</groupId>
              <artifactId>error_prone_core</artifactId>
              <version>2.23.0</version>
            </path>
            <path>
              <groupId>com.google.mug</groupId>
              <artifactId>mug-errorprone</artifactId>
              <version>8.0</version>
            </path>
          </annotationProcessorPaths>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
This plugin checks against common programming errors including:
- The number of lambda parameters doesn't match the number of format placeholders
- The names of the lambda parameters don't match the placeholders
With the compile-time checks, you can safely define StringFormat instances as private class constants and reference them many lines away.
The compile-time checks make the StringFormat.format() method a safer alternative to String.format() (it's faster too). For example:
private static final StringFormat JOB_ID_FORMAT =
    new StringFormat("{project_id}@{location}:{job}");
// 200 lines later
.setJobId(JOB_ID_FORMAT.format(projectId, location, job));
Compared to String.format(), the benefits are:
- The format string is more human-readable with the placeholder names.
- The StringFormat can be defined as a class constant and safely reused throughout the file, because the compile-time check ensures that the format arguments match the placeholder names. You can't pass the wrong number of arguments or pass them in the wrong order!
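For intuition, named-placeholder formatting can be sketched in a few lines of plain Java. This is only an illustration: `NamedFormat` is a hypothetical class, and it detects a wrong number of arguments at runtime, whereas the plugin described above reports such mistakes (and mismatched names) at compile time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the library. */
final class NamedFormat {
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\w+\\}");
  private final String template;
  private final int placeholderCount;

  NamedFormat(String template) {
    this.template = template;
    int count = 0;
    Matcher m = PLACEHOLDER.matcher(template);
    while (m.find()) {
      count++;
    }
    this.placeholderCount = count;
  }

  /** Replaces the placeholders, in order, with the given arguments. */
  String format(Object... args) {
    if (args.length != placeholderCount) {
      throw new IllegalArgumentException(
          "Expected " + placeholderCount + " arguments but got " + args.length);
    }
    StringBuilder out = new StringBuilder();
    Matcher m = PLACEHOLDER.matcher(template);
    int end = 0;
    for (int i = 0; m.find(); i++) {
      // Copy the literal text before the placeholder, then the argument.
      out.append(template, end, m.start()).append(args[i]);
      end = m.end();
    }
    return out.append(template, end, template.length()).toString();
  }
}
```

With this sketch, `new NamedFormat("{project_id}@{location}:{job}").format("p1", "us", "j9")` returns `p1@us:j9`, and passing only two arguments throws an `IllegalArgumentException`.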
By combining the parsing and formatting capabilities, you can round-trip between POJOs and their string formats.
Suppose you have a SQL library that supports injection-safe parameterization. It may look like this:
public class Query {
  private String sql;
  /** Don't allow users to pass in arbitrary unsafe SQL. */
  private Query(String unsafeSql) {...}
  /** Users have to use compile-time string literals. No dynamic values. */
  public static Query create(@CompileTimeConstant String sql);
  /** Dynamic parameters sent to the server */
  public Query addParameter(String paramName, Object paramValue);
  /** Append trusted sql snippet. */
  public Query append(TrustedSql snippet);
  public Result execute(Db db);
}
The usual usage pattern looks like this:
Query query = Query.create("select name from Students where id = @id")
    .addParameter("id", idString);
The API lets you define parameters in a template and then pass the parameter values by chaining addParameter() calls.
It still leaves a few things to be desired:
- Sometimes the SQL can be long and complex. It'd be nice to extract the query template as a constant. But in doing so, the template with the placeholder names ends up far away from the addParameter() calls. If you make a typo in a placeholder name, or pass fewer parameters, you get a runtime error.
- Occasionally it's desirable to also parameterize by table names, column names or even sub-queries.
For the second point, the usual workaround is to use the append() method (similar to StringBuilder):
private static final TrustedSql STUDENTS_TABLE = TrustedSql.fromFlag(studentsTableFlag);
Query getStudentName = Query.create("select name from ")
    .append(STUDENTS_TABLE)
    .append(" where id = @id")
    .addParameter("id", studentId);
But the SQL gets fragmented and becomes harder to read.
Let's see if we can use StringFormat to help address these issues. We'll use the StringFormat.template() SPI to provide the same template syntax used by StringFormat, but plug in our custom rules.
public class Query {
  ...
  public static StringFormat.Template<Query> template(@CompileTimeConstant String sqlTemplate) {
    return StringFormat.template(
        sqlTemplate,
        // For template("select * from {tbl} where id = {id};").with(tableName, id)
        //   fragments = ["select * from ", " where id = ", ";"],
        //   placeholders = [`{tbl}`: tableName, `{id}`: id]
        (List<String> fragments, BiStream<Substring.Match, Object> placeholders) -> {
          Iterator<String> it = fragments.iterator();
          BiStream.Builder<String, Object> parameters = BiStream.builder();
          StringBuilder builder = new StringBuilder();
          placeholders.forEachOrdered((placeholder, value) -> {
            builder.append(it.next()); // append the fragment preceding this placeholder
            if (value instanceof TrustedSql) { // trusted, just add to the sql string
              builder.append(value);
            } else {
              // translate "{id}" to "@id".
              String paramName = "@" + placeholder.skip(1, 1);
              builder.append(paramName);
              parameters.add(paramName, value);
            }
          });
          builder.append(it.next()); // append the last fragment (";" in the example above)
          // Create the Query and bind all parameter values
          return parameters.build().collect(new Query(builder.toString()), Query::addParameter);
        });
  }
}
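The interpolation rule used above (inline trusted snippets, turn everything else into a named parameter) can be tried out in isolation with plain Java. The sketch below is self-contained and makes several assumptions: `TrustedSnippet` and `BoundSql` are hypothetical stand-ins for the `TrustedSql` and `Query` types, the placeholders are found with a regex instead of the StringFormat.template() SPI, and parameter names are stored without the leading `@`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a trusted SQL snippet type. */
final class TrustedSnippet {
  final String sql;

  TrustedSnippet(String sql) {
    this.sql = sql;
  }
}

/** Hypothetical illustration only: the interpolated SQL text plus its bound parameters. */
final class BoundSql {
  private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");
  final String sql;
  final Map<String, Object> parameters;

  private BoundSql(String sql, Map<String, Object> parameters) {
    this.sql = sql;
    this.parameters = parameters;
  }

  static BoundSql interpolate(String template, Object... args) {
    StringBuilder sql = new StringBuilder();
    Map<String, Object> parameters = new LinkedHashMap<>();
    Matcher m = PLACEHOLDER.matcher(template);
    int end = 0;
    int i = 0;
    while (m.find()) {
      if (i >= args.length) {
        throw new IllegalArgumentException("Too few arguments for: " + template);
      }
      sql.append(template, end, m.start()); // fragment before the placeholder
      Object value = args[i++];
      if (value instanceof TrustedSnippet) {
        sql.append(((TrustedSnippet) value).sql); // trusted: inline as SQL text
      } else {
        String name = m.group(1); // "{id}" becomes "@id"
        sql.append('@').append(name);
        parameters.put(name, value); // untrusted: bind as a parameter
      }
      end = m.end();
    }
    if (i != args.length) {
      throw new IllegalArgumentException("Too many arguments for: " + template);
    }
    sql.append(template, end, template.length()); // trailing fragment
    return new BoundSql(sql.toString(), parameters);
  }
}
```

Calling `BoundSql.interpolate("select name from {table} where id = {id}", new TrustedSnippet("Students"), 42)` produces the SQL text `select name from Students where id = @id` with a single bound parameter `id` = 42. The untrusted value never becomes part of the SQL string.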
We can use this method to parameterize by both table names and values:
private static final StringFormat.Template<Query> GET_NAME_BY_ID =
    Query.template("select name from {table} where id = {id}");
private static final TrustedSql STUDENTS_TABLE = TrustedSql.fromFlag(studentsTableFlag);
private static final TrustedSql TEACHERS_TABLE = TrustedSql.fromFlag(teachersTableFlag);
// 200 lines later
Query getStudentName = GET_NAME_BY_ID.with(STUDENTS_TABLE, studentId);
Query getTeacherName = GET_NAME_BY_ID.with(TEACHERS_TABLE, teacherId);
What we have accomplished:
- Retain the safety provided by the original Query API.
- Parameterize by table name (or any other part of the query) without compromising SQL readability.
- Define the query templates as class constants with StringFormat's compile-time safety to ensure parameter correctness.
- Lightweight syntax without having to chain the addParameter() calls.