Manipulating large data sets

This forum is intended for discussion around the use of MEPIS in an office setting, as distinct from home or individual usage. Examples include mail merge, database construction, hardware sharing, advertising, etc.
Jerry3904
Forum Veteran
Posts: 21913
Joined: Wed Jul 19, 2006 6:13 am

Manipulating large data sets

#1 Post by Jerry3904 » Mon Mar 07, 2016 10:33 am

I am currently engaged in a research project that involves merging and analyzing large data sets (not "big data" sets, which is an entirely different story). While setting up my methodology, I ran across this excellent (and admittedly geeky) document and thought I would share it in case anyone else comes along with the same needs.

http://moo.nac.uci.edu/~hjm/Manipulatin ... Linux.html

It may be that LO Calc will be sufficient, but I am not sure yet and want also to be careful to accommodate future statistical needs (probably R).
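For the merging step, the linked document covers command-line approaches; a minimal Python sketch of the same idea, joining two CSV files on a shared key column, might look like this (file and column names are hypothetical):

```python
import csv

# Merge two CSV files on a shared "id" column without loading
# everything into a spreadsheet: index the smaller file by key,
# then stream the larger one past it row by row.
def merge_csv(left_path, right_path, key="id"):
    with open(right_path, newline="") as f:
        right = {row[key]: row for row in csv.DictReader(f)}
    with open(left_path, newline="") as f:
        for row in csv.DictReader(f):
            match = right.get(row[key])
            if match:
                merged = dict(row)
                merged.update(match)
                yield merged
```

Because the left file is streamed rather than loaded, only the smaller file needs to fit in memory.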
Production: 4.15.0-1-amd64, MX-17.1, AMD FX-4130 Quad-Core, GeForce GT 630/PCIe/SSE2, 8 GB, Kingston SSD 120 GB and WesternDigital 1TB
Testing: AAO 722: 4.15.0-1-386. MX-17.1, AMD C-60 APU, 4 GB

Richard
Posts: 1814
Joined: Fri Dec 12, 2008 10:31 am

Re: Manipulating large data sets

#2 Post by Richard » Mon Mar 07, 2016 11:34 am

https://wiki.documentfoundation.org/Faq/Calc/022

What is the maximum number of cells, rows, columns and sheets in a LibreOffice spreadsheet?
The maximum number of columns is 1,024 (from column A to column AMJ);
the maximum number of rows is 1,048,576 (2^20);
the maximum number of cells in one sheet is 1,073,741,824 (2^30, which is more than 1 billion cells);
the maximum number of individual sheets in a complete worksheet is 256.


How large is large?
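As a quick sanity check against those limits before opening a file in Calc, a small sketch (the constants are copied from the FAQ quoted above):

```python
# Will a data set fit inside LibreOffice Calc's sheet limits?
# Limits per the Document Foundation FAQ quoted above.
CALC_MAX_ROWS = 2 ** 20   # 1,048,576 rows
CALC_MAX_COLS = 1024      # columns A..AMJ

def fits_in_calc(n_rows, n_cols):
    return n_rows <= CALC_MAX_ROWS and n_cols <= CALC_MAX_COLS
```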
MX17.1____: T430-2017, 8 GB RAM, 4.15.0-1-amd64, 119 SSD
antiX-/MX-171: AA1/Eee, 1 GB RAM, 4.15.0-1-686-pae, 149 HDD
DC9, LibO605, Dropbox, FF61, FFesr, mPDFed, Py3, CherryT, Vbox
linux counter #288562

Jerry3904
Forum Veteran
Posts: 21913
Joined: Wed Jul 19, 2006 6:13 am

Re: Manipulating large data sets

#3 Post by Jerry3904 » Mon Mar 07, 2016 12:09 pm

Thanks. It's not just a question of capacity (which I don't know yet). As my title indicates, I am concerned with manipulation and data analysis capabilities. Gnumeric used to be considered better in this area, but I am not sure that is still the case.

One of the alternatives I am considering is SQLite, in conjunction with SQLiteBrowser. I will know more in another few weeks about what the data actually look like.
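Getting a CSV into SQLite for browsing in SQLiteBrowser can be sketched with the Python standard library alone (table and file names are hypothetical, and all columns end up stored as text):

```python
import csv
import sqlite3

# Load a CSV file into a fresh SQLite table so it can then be
# browsed and queried from a GUI such as SQLiteBrowser.
def csv_to_sqlite(csv_path, db_path, table="data"):
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)                      # first row = column names
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        with con:                                  # one transaction for the whole load
            con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
            con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        con.close()
```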
Production: 4.15.0-1-amd64, MX-17.1, AMD FX-4130 Quad-Core, GeForce GT 630/PCIe/SSE2, 8 GB, Kingston SSD 120 GB and WesternDigital 1TB
Testing: AAO 722: 4.15.0-1-386. MX-17.1, AMD C-60 APU, 4 GB

dolphin_oracle
Forum Veteran
Posts: 8773
Joined: Sun Dec 16, 2007 1:17 pm

Re: Manipulating large data sets

#4 Post by dolphin_oracle » Mon Mar 07, 2016 12:30 pm

Jerry3904 wrote:Thanks. It's not just a question of capacity (which I don't know yet). As my title indicates, I am concerned with manipulation and data analysis capabilities. Gnumeric used to be considered better in this area, but I am not sure that is still the case.

One of the alternatives I am considering is SQLite, in conjunction with SQLiteBrowser. I will know more in another few weeks about what the data actually look like.
Be careful if you use LibreOffice Base to interface with SQLite. My dad (with a gazillion years of experience as a senior database maintainer and analyst for a couple of different large chemical and materials companies) borked two separate installations of a weight-lifting tracking database app he wrote using LibreOffice, and the crashes corrupted his data. Luckily he had a backup.

Of course, he is also used to working in Oracle DB, so it might have been user-driven problems.
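On the backup point: SQLite has an online-backup API, and Python's sqlite3 module exposes it directly, so taking a consistent snapshot of a live database is a few lines (paths hypothetical):

```python
import sqlite3

# Snapshot a live SQLite database with the built-in online-backup
# API, so a crashing front-end can't take the only copy with it.
def backup_db(src_path, dest_path):
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    src.backup(dest)   # copies the whole database page by page
    src.close()
    dest.close()
```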
http://www.youtube.com/runwiththedolphin
lenovo ThinkPad T530 - MX-17
lenovo s21e & 100s - antiX-17, MX17(live-usb)
FYI: mx "test" repo is not the same thing as debian testing repo.

Jerry3904
Forum Veteran
Posts: 21913
Joined: Wed Jul 19, 2006 6:13 am

Re: Manipulating large data sets

#5 Post by Jerry3904 » Mon Mar 07, 2016 12:32 pm

Thanks for that warning!

My *plan* atm is to set up an SQL database from the beginning, and use SQLiteBrowser to manipulate and analyze the data.
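Once the data are in SQLite, the manipulation a browser tool performs is just SQL underneath; a toy aggregation of that kind, run from Python (table and column names are hypothetical):

```python
import sqlite3

# Per-group counts and averages straight from SQL -- the sort of
# summary a GUI browser would generate from a query pane.
def summarize(db_path, table="data", group="category", value="amount"):
    con = sqlite3.connect(db_path)
    rows = con.execute(
        f'SELECT "{group}", COUNT(*), AVG("{value}") '
        f'FROM "{table}" GROUP BY "{group}" ORDER BY "{group}"'
    ).fetchall()
    con.close()
    return rows
```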
Production: 4.15.0-1-amd64, MX-17.1, AMD FX-4130 Quad-Core, GeForce GT 630/PCIe/SSE2, 8 GB, Kingston SSD 120 GB and WesternDigital 1TB
Testing: AAO 722: 4.15.0-1-386. MX-17.1, AMD C-60 APU, 4 GB

Old Giza
Forum Regular
Posts: 340
Joined: Wed Apr 16, 2014 10:31 pm

Re: Manipulating large data sets

#6 Post by Old Giza » Mon Mar 07, 2016 7:04 pm

Probably irrelevant but in case it's of interest ...

I used to support the processing of thousands of large data sets (surveys/diaries) of several gigabytes or more using a little-known product called TPL, originally created by the US Bureau of Labor Statistics and now supported by ex-employees at QQQSoft. I used both the IBM mainframe version and later the Linux version.

The advantages were: no special expertise required (just the ability to read a comprehensive manual), flexible input data description language, non-procedural results type language, publication-ready (PDF) or Web-ready (HTML) output formatting, super-fast results, unlimited input data sets. The restriction was that the analysis was limited to what could be done using sequential input, i.e. tabulations, percentages, etc. But it was surprising what requests could be accomplished doing it that way. Had considered using SQL databases tied into another analysis package, but that would have slowed everything to a crawl. (Disclaimer: I have no ties to the product.)
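That sequential, one-pass style of tabulation is easy to reproduce in a few lines of Python (file and column names are hypothetical):

```python
import csv
from collections import Counter

# One sequential pass over a CSV: counts per category plus
# percentages, without ever holding the whole file in memory.
def tabulate(csv_path, column):
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[column]] += 1
    total = sum(counts.values())
    return {k: (n, 100.0 * n / total) for k, n in counts.items()}
```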

Gordon Cooper
Forum Guide
Posts: 1895
Joined: Mon Nov 21, 2011 5:50 pm

Re: Manipulating large data sets

#7 Post by Gordon Cooper » Mon Mar 07, 2016 7:25 pm

I was thinking that MariaDB could be another option; it is GPL, but I have not looked at it for years.
Homebrew64 bit Intel duo core 2 GB RAM, 120 GB Kingston SSD, Seagate1TB.
Primary OS : MX-17.1 64bit. Also MX17, Kubuntu14.04 & Puppy 6.3.
Dell9010, MX-17.1, Win7

Jerry3904
Forum Veteran
Posts: 21913
Joined: Wed Jul 19, 2006 6:13 am

Re: Manipulating large data sets

#8 Post by Jerry3904 » Mon Mar 07, 2016 7:50 pm

10.0 is in the repos, so I will take a look.
Production: 4.15.0-1-amd64, MX-17.1, AMD FX-4130 Quad-Core, GeForce GT 630/PCIe/SSE2, 8 GB, Kingston SSD 120 GB and WesternDigital 1TB
Testing: AAO 722: 4.15.0-1-386. MX-17.1, AMD C-60 APU, 4 GB

Gordon Cooper
Forum Guide
Posts: 1895
Joined: Mon Nov 21, 2011 5:50 pm

Re: Manipulating large data sets

#9 Post by Gordon Cooper » Mon Mar 07, 2016 11:39 pm

10.1 was released late last year, but I have never used it. Also, DBEdit might provide the manipulation you are looking for.
Homebrew64 bit Intel duo core 2 GB RAM, 120 GB Kingston SSD, Seagate1TB.
Primary OS : MX-17.1 64bit. Also MX17, Kubuntu14.04 & Puppy 6.3.
Dell9010, MX-17.1, Win7
