Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with persisting multiline texts with trailing backslashes #603

Open
neongrau opened this issue Jun 23, 2017 · 13 comments
Open

Problem with persisting multiline texts with trailing backslashes #603

neongrau opened this issue Jun 23, 2017 · 13 comments
Labels

Comments

@neongrau
Copy link

First i must say i'm not sure where exactly the error happens.
Maybe it's the adapter or maybe it's ActiveRecord's serializer logic.

I've encountered this error on Rails 4.2 as well as in Rails 5.0.

Having a table with a text/nvarchar(max) column, let's call the column 'data' that's supposed to be serialized.

Migration for replication:

class CreateTestings < ActiveRecord::Migration[5.0]
  def change
    create_table :testings do |t|
      t.text :data
    end
  end
end

Model

class Testing < ApplicationRecord
  serialize :data
end

When having a Hash with a String value that ends in a backslash and at the same time more data after that line in the serialized YAML this will result in an exception making the record pretty much broken.

x = Testing.create(data:{ foo:'foo\\', bar:'bar' })
#  SQL (0.0ms)  BEGIN TRANSACTION
#  SQL (0.0ms)  EXEC sp_executesql N'INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] #VALUES (@0)', N'@0 nvarchar(max)', @0 = N'---
#:foo: foo\
#:bar: bar
#'  [["data", "---\n:foo: foo\\\n:bar: bar\n"]]
#  SQL (15.7ms)  COMMIT TRANSACTION
#=> #<Testing id: 1, data: {:foo=>"foo\\", :bar=>"bar"}>
x.reload
Psych::SyntaxError: (<unknown>): mapping values are not allowed in this context at line 2 column 14

Looking into the database i see this was stored:

---\n:foo: foo:bar: bar\n

but actually there should be this:

---\n:foo: foo\\\n:bar: bar\n

Further tries

# will not trigger an exception will just lose the backslash
x = Testing.create(data:{ foo:'foo\\' })
# will not trigger an exception will just lose the last backslash, the first one will stay
x = Testing.create(data:{ foo:'foo\\\\' })

Any thoughts?

@metaskills
Copy link
Member

Sorry for the late reply! This sounds like a bug in the adapter's quoting to use sp_excutesql. Really surprised this has not come up before too. Let me take a quick look...

@metaskills
Copy link
Member

metaskills commented Sep 7, 2017

So just playing around with some debugging on the YAML example. That new literal format was new to me.

{foo:'foo', bar:'bar'}
=> {:foo=>"foo", :bar=>"bar"}
YAML.load "---\n:foo: foo\n:bar: bar\n"
=> {:foo=>"foo", :bar=>"bar"}
YAML.dump({foo:'foo', bar:'bar'})
=> "---\n" + ":foo: foo\n" + ":bar: bar\n"

Now playing with reproducing your issue using the scratchpad test we have and a model with a nvarchar(max) column.

SSTestDatatypeMigration.columns_hash['text_col'].sql_type
=> "nvarchar(max)"
o = SSTestDatatypeMigration.create! text_col: "---\n:foo: foo\n:bar: bar\n"
YAML.load(o.reload.text_col)
=> {:foo=>"foo", :bar=>"bar"}
  SQL (1.8ms)  EXEC sp_executesql N'INSERT INTO [sst_datatypes_migration] ([text_col]) OUTPUT INSERTED.[id] VALUES (@0)', N'@0 nvarchar(max)', @0 = N'---
:foo: foo
:bar: bar
'  [["text_col", nil]]

So everything looks good here. We are passing the ActiveRecord tests for serialized data too. So why were are getting different results? It can't be that composite primary keys gem again can it ;)

@neongrau
Copy link
Author

neongrau commented Sep 7, 2017

Definitely not the composite keys gem, i kicked that for good.

But in your examples that work and would here too i assume you don't have the problematic trailing backslash in a value that's not the last one.

This is what was causing the problem. I had user data where someone (accidentally) appended a backslash.

Pleas try with this change:

o = SSTestDatatypeMigration.create! text_col: "---\n:foo: foo\\\n:bar: bar\n"
YAML.load(o.reload.text_col)

the value of foo is supposed to be foo\

Real world scenario could be having a Windows-style directory like

{ folder: 'c:\\data\\foo\\', file: 'example.yml' }

@metaskills
Copy link
Member

Hmm... this is ODD. Can you tell me how to insert SQL like this more low level? I did a quick raw test and saw these results. Does that look odd? I wonder what SMS does?

sql = "INSERT INTO [sst_datatypes_migration] ([text_col]) VALUES (N'---\n:foo: foo\\\n:bar: bar\n')"
SSTestDatatypeMigration.delete_all
ActiveRecord::Base.connection.execute(sql)
SSTestDatatypeMigration.first.text_col # => "---\n:foo: foo:bar: bar\n"
INSERT INTO [sst_datatypes_migration] ([text_col]) VALUES (N'---
:foo: foo\
:bar: bar
')

@neongrau
Copy link
Author

This is really weird.

INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] VALUES (N'---
:folder: c:\data\foo\
:file: example.yml
')

I'm seeing exactly the same problem when executing this sample in SMS and also in the macOS client SQLPro Studio which is using FreeTDS afaik.

in both SMS and SQLPro Studio i can manually edit an existing record by copy and pasting the YAML string which result in the following UPDATE statement which i grabbed from SQL Server Profiler

BEGIN

SET QUOTED_IDENTIFIER OFF
UPDATE
	dbo.testings
SET
	data = '---
:folder: c:\data\foo\
:file: example.yml
'
WHERE
	id = 2

END

The formatting of the string looks absolutely identical, yet the result is correct.
We then presumed it's the line SET QUOTED_IDENTIFIER OFF that has something to do with it. But an INSERT with that line in front still does not work.

What works is manually adding a blank space after the trailing backslash. Or wrapping the path into quotation marks thereby modifying the YAML formatting.

It's just as if this YAML is plain incompatible to be persisted in SQL.

@neongrau
Copy link
Author

I guess i've found the root of this problem.

This is a special thing that SQL Server is doing.
A backslash at an end of a line is treated as
\ breaks a long string constant into two or more lines for readability.

https://docs.microsoft.com/en-us/sql/t-sql/language-elements/sql-server-utilities-statements-backslash

This means it's not a YAML specific issue. This would happen with any multi-line string that has a trailing backslash in a line.

INSERT INTO [testings] ([data]) 
VALUES (N'There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.
')

This will result in a 2 liner instead of the 3 lines when done from SMS.

And same when attempted in Ruby

note = "There is a file in the folder c:\\data\\random-texts\\\nwith Lorem Ipsum inside.\nThis note is supposed to be 3 lines long."

irb(main):013:0> puts note
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.

x = Testing.create(data: note)
  SQL (0.0ms)  BEGIN TRANSACTION
  SQL (15.3ms)  EXEC sp_executesql N'INSERT INTO [testings] ([data]) OUTPUT INSERTED.[id] VALUES (@0)', N'@0 nvarchar(max)', @0 = N'--- |-
  There is a file in the folder c:\data\random-texts\
  with Lorem Ipsum inside.
  This note is supposed to be 3 lines long.
'  [["data", nil]]
  SQL (0.0ms)  COMMIT TRANSACTION

irb(main):016:0> puts x.data
There is a file in the folder c:\data\random-texts\
with Lorem Ipsum inside.
This note is supposed to be 3 lines long.

x.reload

#  Testing Load (0.0ms)  EXEC sp_executesql N'SELECT  [testings].* FROM [testings] WHERE [testings].[id] = @0  ORDER BY [testings].[id] ASC OFFSET 0 ROWS FETCH NEXT @1 ROWS ONLY', N'@0 int, @1 int', @0 = 6, @1 = 1  [["id", nil], ["LIMIT", nil]]
irb(main):018:0> puts x.data
There is a file in the folder c:\data\random-texts  with Lorem Ipsum inside.
This note is supposed to be 3 lines long.

@neongrau neongrau changed the title Problem with persisting serialized YAML data Problem with persisting multiline texts with trailing backslashes Oct 19, 2017
@neongrau
Copy link
Author

@metaskills
Copy link
Member

metaskills commented Oct 21, 2017

Damn! Incredible sleuthing. I'm guessing there is nothing the adapter needs to do with this. Perhaps you can use JSON instead or a custom YAML serializer for this column/attribute. I did a quick search it seems it is possible to enforce quoting if needed. Maybe there is a simpler way. ruby/psych#322

@neongrau
Copy link
Author

Eh? No it actually has nothing to do with YAML. It‘s an adapter thing and so far the only solution i can rhink of is to Regex parse any strings to handle the special way SQL Server treats backslashes at EOL.

Like the old Monkey patch from 2008 does.

@metaskills
Copy link
Member

I'm not so sure we should. Ideally we should act just like SMS does and what you showed me is that this is the case right? Supporting patches for any string, YAML or otherwise, looks like a game that we cant keep up with. So the hacks should be contextual on the end user. IE, if you want to support YAML with possible strings that end in , then you will have to ensure that. Make sense?

@neongrau
Copy link
Author

Again: this has nothing to do with YAML. Any textarea can suffer from thus. Just enter a text with a lne that ends with a backslash and a following linebreak will be removed alongside the backslash.

It‘s a braindead feature for creating more readable SQL that affects user input in programmatically generated SQL.

@neongrau
Copy link
Author

In any text to persist losing that backslash and the linebreak sure is just 2 bytes lost. In YAML you will notice quick because that breaks all following data.

@metaskills
Copy link
Member

Again: this has nothing to do with YAML

I KNOW. Seriously :) I knew that all along. This is the point I am making tho. I only used YAML as an example of what an end user would have to deal with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants