[SOLVED] Multi-line replacing in large text files.

JmaCWQ
Forum Novice
Posts: 50
Joined: Fri Sep 09, 2016 4:42 am

#1 Post by JmaCWQ » Sun Sep 16, 2018 12:57 am

Hello All,
I have a 2 GB text file (.mp, Polish Map format) and want to replace multi-line sequences of text in it.
I think either Gvim or GNU Emacs 24 can do what I want, but I can't figure out how to make a multi-line, whole-word-only search work.
Searching around for an answer, the results I'm getting are all gobbledegook to me... I'm not much good at understanding Terminal stuff and much prefer a GUI way of doing things when working with files.
I can do the single line, whole word only replacement with Gvim ok but not multi-line.
I can do multi-line with GNU Emacs 24, but not the 'whole word only' option.

An example of what I'm trying to replace is the text:

[POLYGON]
Type=0x1


With the text:

[POLYGON]
Type=0x4b


It has to be searched for as that two-line combination, using the whole-word-only function, because there's also:

[POLYLINE]
Type=0x1

Just changing all the Type=0x1's makes the required changes to the format of the polygons but messes up the format of the polylines using the same type.
There are other types that have more than just '0x1', like Type=0x1f, Type=0x1c etc., so it must be a 'whole word only' scenario.
I know I can just do them one at a time, but there are several hundred thousand of them in a file that has over 30 million lines, plus other similar combinations I want to change, so I'm asking for a little help from those who understand these things.

Hopefully that makes some kind of sense...thanks.
Last edited by JmaCWQ on Sun Sep 16, 2018 12:40 pm, edited 1 time in total.

thomasl
Forum Novice
Posts: 92
Joined: Sun Feb 04, 2018 10:26 am

#2 Post by thomasl » Sun Sep 16, 2018 4:49 am

Given the size of the file, I think you're perhaps better off using a stream editor (sed, awk etc) rather than a full-blown text editor (one of the text editors I use under Windows can do that but many text editors are limited by available RAM).

This post looks as if it might help you to do the job for files of any size and there are many helpful comments as to what the cryptic command lines actually do ;) .
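A minimal sketch of the stream-editor approach using awk (not taken from the linked post): awk reads one line at a time, so memory use stays small however big the file is. It assumes exact lines with Unix (LF) line endings and no trailing spaces; sample.mp is a hypothetical test file.

```shell
# Hypothetical test file; substitute your own file name for sample.mp.
printf '%s\n' '[POLYGON]' 'Type=0x1' '[POLYLINE]' 'Type=0x1' '[POLYGON]' 'Type=0x1c' > sample.mp

# prev holds the previous line. A line is changed only when it is exactly
# "Type=0x1" and the line before it was exactly "[POLYGON]".
awk '
  prev == "[POLYGON]" && $0 == "Type=0x1" { $0 = "Type=0x4b" }
  { print; prev = $0 }
' sample.mp > sample.mp.new && mv sample.mp.new sample.mp

cat sample.mp
```

awk has no portable in-place option, hence the write-to-new-file-then-mv step; to keep a backup, write to a different name and skip the mv.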
Dual-boot MX17.1/64 frugal root persistence + Windows 7 on Lenovo Edge72 i5-3470S/12GB and Tosh R950 i5-3340M/8GB
“In foreign countries they fear baldness. They are so rich in foreign countries, they can afford to fear all kinds of silly things.”

fehlix
Forum Guide
Posts: 1717
Joined: Wed Apr 11, 2018 5:09 pm

#3 Post by fehlix » Sun Sep 16, 2018 5:28 am

JmaCWQ wrote:
Sun Sep 16, 2018 12:57 am
replace is the text:

[POLYGON]
Type=0x1


With the text:

[POLYGON]
Type=0x4b

Either on the command line with something like this

Code: Select all

sed -i.orig -E '/^\[POLYGON\]/{n; s/^Type=0x1/Type=0x4b/; }' MyTestFileName
or
using the GUI text editor Geany (latest version from the MX test repo, installed with MX Package Installer):

Open Text file
Press Ctrl+H for Search/Replace and
Mark:
Use regular expression
Use multi-line matching
Search for : \[POLYGON\]\nType=0x1
Replace with : [POLYGON]\nType=0x4b

-> Replace All -> In Document
[Screenshot: multiline-replace-with-geany.png]
:puppy:
Gigabyte Z77M-D3H, Intel Xeon E3-1240 V2 (Quad core), 32GB RAM,
GeForce GTX 770, Samsung SSD 850 EVO 500GB, Seagate Barracuda 4TB

thomasl
Forum Novice
Posts: 92
Joined: Sun Feb 04, 2018 10:26 am

#4 Post by thomasl » Sun Sep 16, 2018 6:09 am

fehlix wrote:
Sun Sep 16, 2018 5:28 am
using Gui text editor Geany using latest version from MX test-repo
Hm... isn't geany based on Scintilla? I seem to remember that Scintilla (actually it was SciTE) became extremely slow when file sizes reached several hundred MB, let alone GBs. That's one reason why I normally don't look into editors that are Scintilla-based.

But perhaps I'm wrong and things have changed with recent releases (or perhaps the GTK version has no such limit in the first place)?

If that's the case, I might look into this editor as I still haven't found a really good visual text editor for MX.

fehlix
Forum Guide
Posts: 1717
Joined: Wed Apr 11, 2018 5:09 pm

#5 Post by fehlix » Sun Sep 16, 2018 6:49 am

thomasl wrote:
Sun Sep 16, 2018 6:09 am
isn't geany based on Scintilla?
Not that I'm aware of. A quick glance through the "History" page
on their home page https://www.geany.org/ doesn't indicate this either.

Re large-file handling: sure, I would recommend sed for 'simple' tasks, but prefer perl for more advanced ones.

JmaCWQ
Forum Novice
Posts: 50
Joined: Fri Sep 09, 2016 4:42 am

#6 Post by JmaCWQ » Sun Sep 16, 2018 7:11 am

Thanks for the replies people.
I like using Geany, I've been using it for years as my default editor, and it does the search/replace stuff very well, but only on smaller files.
Not even the testing version of Geany can open or view the file I'm working on, too big I guess (2.1 GB), it tries for a short while then stops & just opens a blank tab.
I've seen many pages similar to that StackOverflow one in the last few days, which is where the 'gobbledygook' above comes in, lol.
I just can't understand it all yet.

The sed command appears to have worked, thank you fehlix :cool:
I can now hopefully adapt that to make the rest of the changes I require.
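Adapting the command for several changes can be done in a single pass, so the 2 GB file is only read once. A sketch assuming GNU sed and Unix line endings; only the first rule comes from this thread, while Type=0x2 -> Type=0x5 and [POLYLINE] Type=0x1 -> Type=0x6 are hypothetical placeholders to show the pattern. The $ after each type is what makes the match exact.

```shell
# Hypothetical test file.
printf '%s\n' '[POLYGON]' 'Type=0x1' '[POLYGON]' 'Type=0x2' '[POLYLINE]' 'Type=0x1' > sample.mp

# Each -e adds one rule; sed runs all of them in one pass over the file.
#   [POLYGON]  then Type=0x1 -> Type=0x4b  (from this thread)
#   [POLYGON]  then Type=0x2 -> Type=0x5   (hypothetical placeholder)
#   [POLYLINE] then Type=0x1 -> Type=0x6   (hypothetical placeholder)
sed -i.orig \
  -e '/^\[POLYGON\]/{n; s/^Type=0x1$/Type=0x4b/; s/^Type=0x2$/Type=0x5/; }' \
  -e '/^\[POLYLINE\]/{n; s/^Type=0x1$/Type=0x6/; }' \
  sample.mp

cat sample.mp
```

The substitutions inside one block run in order on the same line, so avoid a rule whose output is the input of a later rule in that block.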

How I did it, for noobs like me trying to figure this stuff out
My file is on the Desktop, named Australia.mp
So I opened a Terminal and cd'd to the Desktop, then ran:

Code: Select all

sed -i.orig -E '/^\[POLYGON\]/{n; s/^Type=0x1/Type=0x4b/; }' Australia.mp
Hit Enter.

What's the "for dummies like me" explanation of this above command please?
I think the only way I'll pick this up is to understand what each of those character combinations means.
I guess sed -i.orig runs sed and creates the backup (.orig) file.
The other character combinations I don't know.

Gvim opens, searches, finds and replaces matching whole words in this large file OK, but when I try to select multiple lines to paste into the Replace box, an unknown character appears when pasted and the search fails. From researching this I see there are command options in Vim for matching newlines in a search, but I couldn't understand them or make them work.
The character that appears:
[Screenshot: Screenshot.png]
I've tried replacing that character with \n which I think is Vim's command for a new line when searching, and tried a few other combinations of things I found when researching too, but none of them worked.
It'll only search for [POLYGON] if "Match whole word only" is un-checked, and finds nothing if it's checked.
I was thinking there'd be a way to get it to work using the GUI (Gvim) but perhaps there isn't?
Not much about it on the web that I could find.

Emacs also opens and edits this large file fine, and lets me paste multiple lines into search & replace, which does work, but I can't figure out how to force "match whole word only" via the GUI: when I tried, it replaced every string containing Type=0x1, not just the lines containing only Type=0x1.

Thanks.

thomasl
Forum Novice
Posts: 92
Joined: Sun Feb 04, 2018 10:26 am

#7 Post by thomasl » Sun Sep 16, 2018 7:58 am

JmaCWQ wrote:
Sun Sep 16, 2018 7:11 am
What's the "for dummies like me" explanation of this above command please?
I think the only way I'll pick this up is to understand what each of those character combinations means.
I guess sed -i.orig runs sed and creates the backup (.orig) file.
The other character combinations I don't know.
That's the reason why I gave you a link to an answer in my post above... that post has detailed explanations of how this works.
fehlix wrote:
Sun Sep 16, 2018 6:49 am
Not that I'm aware of. A quick glance through the "History" page
on their home page https://www.geany.org/ doesn't indicate this either.
So I went and checked the geany sources and alas, it's definitely based on Scintilla. See https://github.com/geany/geany/ . A pity.

fehlix
Forum Guide
Posts: 1717
Joined: Wed Apr 11, 2018 5:09 pm

#8 Post by fehlix » Sun Sep 16, 2018 9:15 am

JmaCWQ wrote:
Sun Sep 16, 2018 7:11 am
So opened Terminal & cd to the Desktop, then:

Code: Select all

sed -i.orig -E '/^\[POLYGON\]/{n; s/^Type=0x1/Type=0x4b/; }' Australia.mp
Hit Enter.

What's the "for dummies like me" explanation of this above command please?
OK, I know it looks ugly, but it's not too hard to understand.
Let's go through it.

Multi-line replace with sed

Code: Select all

sed -i.orig -E '/^\[POLYGON\]/{n; s/^Type=0x1/Type=0x4b/; }' Australia.mp
sed reads the file line by line and writes every line (after any manipulation) to its output; with -i that output goes back into the file.

-i.orig : save original file to file.orig
-E : use extended regular expressions

/^\[POLYGON\]/ : for all lines starting (^) with [POLYGON]; the [ and ] must be escaped with a backslash, otherwise they would be read as a regex character class

{ ... } : on a matching line, run the commands within the curly brackets, separated by semicolons ';'
Other lines are passed through unchanged.

n : print the current line and read the next line into the buffer (sed calls it the "pattern space")

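s/^Type=0x1/Type=0x4b/; : if that line starts (^) with Type=0x1, replace that text with Type=0x4b. Without an end-of-line anchor this also matches Type=0x1c, Type=0x1f etc.; use s/^Type=0x1$/Type=0x4b/ to change only lines that are exactly Type=0x1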
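To see the command in action, a quick sketch that runs it unchanged on a tiny hypothetical test file (test.mp) and shows the .orig backup and the result side by side with paste. Note the last pair: with no $ after Type=0x1, Type=0x1c is turned into Type=0x4bc.

```shell
# Tiny hypothetical test file.
printf '%s\n' '[POLYGON]' 'Type=0x1' '[POLYLINE]' 'Type=0x1' '[POLYGON]' 'Type=0x1c' > test.mp

# The command exactly as posted above.
sed -i.orig -E '/^\[POLYGON\]/{n; s/^Type=0x1/Type=0x4b/; }' test.mp

# Left column: test.mp.orig (before), right column: test.mp (after), tab-separated.
paste test.mp.orig test.mp
# [POLYGON]   [POLYGON]
# Type=0x1    Type=0x4b
# [POLYLINE]  [POLYLINE]
# Type=0x1    Type=0x1
# [POLYGON]   [POLYGON]
# Type=0x1c   Type=0x4bc
```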


Multi-line replace with vim - not Gvim

Open terminal.
vim myfilename

Type a colon ':' to go to the vim command line (Esc to go back).

Enter the search-and-replace command:

Code: Select all

:%s/^\[POLYGON\]\nType=0x1/[POLYGON]^MType=0x4b/
Within the search part (between the first two slashes) you search for the new line with "\n".

In the replacement part you put a "real" new-line char instead, displayed here as "^M".

You enter that new-line char by pressing first Ctrl+V and then Ctrl+M.
[Screenshot: multi-line-replace-vim.png]
:puppy:

EDIT:
To save the file with vim type :w
To quit/close vim type :q
and always press "Enter" on the ":"-commandline to do the action ;=)
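Instead of typing Ctrl+V Ctrl+M, Vim also accepts \r in the replacement part, which splits the line the same way (\n in the replacement inserts a NUL character instead, not a line break). A sketch that runs the substitution from the terminal without opening the editor, assuming vim is installed; it also adds the $ anchor so Type=0x1c etc. stay untouched. test.mp is a hypothetical test file. Vim still loads the whole file into memory, so for a 2 GB file sed remains the lighter option.

```shell
# Hypothetical test file.
printf '%s\n' '[POLYGON]' 'Type=0x1' '[POLYLINE]' 'Type=0x1' '[POLYGON]' 'Type=0x1c' > test.mp

# -es    : silent Ex mode (no screen, startup files skipped)
# -c CMD : run CMD after the file is loaded
# \n     : matches the line break in the search part
# \r     : inserts a line break in the replacement (same as Ctrl+V Ctrl+M)
# e flag : don't treat "pattern not found" as an error
vim -es -c '%s/^\[POLYGON\]\nType=0x1$/[POLYGON]\rType=0x4b/e' -c 'wq' test.mp

cat test.mp
```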

JmaCWQ
Forum Novice
Posts: 50
Joined: Fri Sep 09, 2016 4:42 am

#9 Post by JmaCWQ » Sun Sep 16, 2018 10:03 am

thomasl wrote:
Sun Sep 16, 2018 7:58 am
That's the reason why I gave you a link to an answer in my post above... there are detailed explanations in that post how this works.
Yes, I understand why you posted the link. I read that page right through a couple of days ago, and again today when you posted it, but I couldn't understand the explanations or how to apply them to what I'm trying to achieve.
Thanks for posting it though, and trying to help, I appreciate it.
Cheers.

JmaCWQ
Forum Novice
Posts: 50
Joined: Fri Sep 09, 2016 4:42 am

#10 Post by JmaCWQ » Sun Sep 16, 2018 12:38 pm

fehlix wrote:
Sun Sep 16, 2018 9:15 am
OK, I know it looks ugly, but it's not too hard to understand.
let's go through it......
Thanks again fehlix.
It appears I'm learning something finally.
Had a play with Vim and got it replacing the correct text after a couple of tries.

Code: Select all

:%s/^\[POLYGON\]\nType=0x1/[POLYGON]^MType=0x4b/
It was replacing all instances of Type=0x1, including the 0x1c's, 0x1f's etc., instead of just the exact matches.
It needed a $ (end of line) after Type=0x1.

Code: Select all

:%s/^\[POLYGON\]\nType=0x1$/[POLYGON]^MType=0x4b/
works correctly :cool:

EDIT:
Forgot to mention: when using Gvim, SHIFT+; (a colon) lets you type or paste commands at the bottom of the window, the same as in Vim running in a Terminal.
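The $ anchor that fixed the vim command works the same way in the sed one-liner posted earlier in the thread. A sketch assuming Unix (LF) line endings, since a trailing carriage return would stop $ from matching; test.mp is a hypothetical sample, so replace it with the real file name. count_pairs is an illustrative helper name: it streams the file through awk and counts [POLYGON] lines followed by exactly Type=0x1, so the .orig backup and the result can be compared without loading either file into memory.

```shell
# Hypothetical small test file.
printf '%s\n' '[POLYGON]' 'Type=0x1' '[POLYLINE]' 'Type=0x1' '[POLYGON]' 'Type=0x1c' > test.mp

# Anchored version: $ means the line must end right after Type=0x1,
# so Type=0x1c, Type=0x1f etc. are left alone.
sed -i.orig -E '/^\[POLYGON\]/{n; s/^Type=0x1$/Type=0x4b/; }' test.mp

# Count [POLYGON] lines directly followed by a line that is exactly Type=0x1.
count_pairs() {
  awk 'prev_poly && $0 == "Type=0x1" { n++ }
       { prev_poly = ($0 ~ /^\[POLYGON\]/) }
       END { print n + 0 }' "$1"
}

count_pairs test.mp.orig   # pairs before the edit (1 for this sample)
count_pairs test.mp        # pairs left afterwards (should be 0)
```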
